Jay Parikh, Facebook’s vice president of infrastructure engineering, walked reporters today through a list of stats to break down the massive amount of data the social network processes each day. Parikh claimed that Facebook’s data cluster is larger than any comparable cluster at other companies. Here are some of those numbers:
- Scans 105 terabytes of data every 30 minutes
- Has more than 100 petabytes of disk space
- Processes 2.7 billion Likes made daily, on and off the Facebook site
- Handles 300 million photo uploads daily
- Executes 70,000 queries, run by both people and automated systems (our site surely accounts for at least one of them)
- “Ingests” 500+ terabytes of new data
Instead of partitioning the data — essentially dividing it up and storing it based on criteria — like most companies do to make data more manageable, Facebook keeps it in one place for easy access.
That means an engineer who wants to identify stats or trends in a feature, like how quickly people respond to messages, can easily pull the data, write some code, and get results.
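As a rough illustration of that “pull the data, write some code, get results” workflow, the sketch below computes the median reply latency from a handful of message timestamp pairs. The function name, data shape, and sample values are all hypothetical; this is not Facebook’s actual schema or tooling, just what such a one-off analysis might look like.

```python
from datetime import datetime

def median_reply_latency(events):
    """Given (sent, replied) datetime pairs, return the median
    reply latency in seconds. Names and data are illustrative."""
    latencies = sorted(
        (replied - sent).total_seconds() for sent, replied in events
    )
    n = len(latencies)
    mid = n // 2
    if n % 2:
        return latencies[mid]
    return (latencies[mid - 1] + latencies[mid]) / 2

# Hypothetical sample: three messages and their replies.
events = [
    (datetime(2012, 8, 22, 9, 0), datetime(2012, 8, 22, 9, 2)),
    (datetime(2012, 8, 22, 9, 5), datetime(2012, 8, 22, 9, 6)),
    (datetime(2012, 8, 22, 9, 10), datetime(2012, 8, 22, 9, 30)),
]
print(median_reply_latency(events))  # 120.0 (two minutes)
```

With everything in one warehouse rather than split across partitions, the same few lines would run against the full dataset instead of a per-criteria slice.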
When pressed by reporters, Parikh said Facebook has a zero-tolerance policy for any abuse of this broad access. Additionally, all access is logged and heavily monitored, he said.
If you want to see Parikh’s short presentation of eye-popping numbers and complex flow charts of Facebook’s data system, check out this Scribd link.