Facebook invited a handful of tech journalists to its Menlo Park campus on Wednesday for a briefing on how the world’s biggest social network approaches the technical challenges of handling big data.
As in, lots of data.
Every day, Facebook users share 2.5 billion unique pieces of content – including photos, videos, wall posts, updates and comments. Those users hit the “like” button, either on Facebook or on other sites, some 2.7 billion times a day.
They also upload 300 million photos a day. And all told, Facebook’s system must ingest more than 500 terabytes of data every day.
Jay Parikh, a Facebook vice president for infrastructure, gave the group a quick rundown on some of the software systems that Facebook has developed to handle that deluge of data. He also said the company is constantly finding new ways to use its data – creating new lists of suggested friends, for example, refining its system for auctioning advertisements, or tracking how different ads are received by different groups of users, based on gender, location and other characteristics.
The company also makes extensive use of “A/B testing,” a concept also employed heavily at Google. The idea is to offer different versions of each new idea or feature to different subsets of users, to see which is better received. Facebook runs “tens of thousands” of such experiments “at any point in time,” Parikh said.
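For readers unfamiliar with the technique, the minimal Python sketch below shows one common way to split users into A/B groups. It is not a description of Facebook's actual system, which the briefing did not detail. It hashes a user ID together with an experiment name so that each user always lands in the same variant of a given experiment. The function names, the experiment name and the user IDs are hypothetical, chosen only for illustration.

```python
import hashlib
from typing import Sequence


def assign_variant(user_id: str, experiment: str, variants: Sequence[str]) -> str:
    """Deterministically map a user to one variant of an experiment (illustrative only)."""
    # Hashing the experiment name with the user ID keeps a user's group stable
    # within one experiment while letting groups differ across experiments.
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


def conversion_rate(clicks: int, impressions: int) -> float:
    """Share of impressions that led to a click; 0.0 when nothing was shown."""
    return clicks / impressions if impressions else 0.0


if __name__ == "__main__":
    variants = ["control", "new_feature"]
    counts = {v: 0 for v in variants}
    for i in range(10_000):
        counts[assign_variant(f"user{i}", "hypothetical_feed_test", variants)] += 1
    print(counts)  # roughly an even split between the two groups
```

Comparing a metric such as `conversion_rate` between the two groups then indicates which version users received better. A real system would also need statistical significance tests before declaring a winner.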
Facebook approaches data somewhat differently than other big companies, Parikh boasted, explaining that Facebook makes all of its data available for all of its divisions, so individual product groups don’t face bureaucratic hurdles when they want to examine data from other segments of the company.
“We build one product. On our home page you get recommendations, advertisements, notifications and messages. There are hundreds of different systems but it all has to come together in one unified user experience,” he said.
But in answer to a question, Parikh said the company has strict rules governing how employees can use that data. User information is anonymized, and the company keeps track every time an engineer accesses any data. “We train people on how they can use data and we have zero tolerance for abuse,” he added.