[OSCON 2008] Caching and Performance: Lessons from Facebook by Lucas Nealan
Facebook is not just a Social Networking Site, they're a "Social Utility". They have the 4th most trafficked site in the world and over 90 million active users. Of those users, the average usage is 50 pages per day. There's currently over 24,000 platform applications. There's thousands of Apache web servers and hundreds of MySQL and Memcached servers.
The biggest problem with scaling for Facebook is the complexity. Connecting to all the databases is impossible. They have a very large codebase - their homepage has 500 library files and 10,000 functions. Scaling affects resources, particularly with regards to memory consumption and socket connection limits. Cache retrieval is ~10% cpu-user of most pages.
Caching Layers: $GLOBALS, APC, Memcached, Database, Browser Cache, Third Party CDN.
The Globals Cache is a PHP function called "cache_get". The Globals Cache works nicely in that it avoids calling APC and Memcached, but it still requires the overhead of a function call. APC (Alternative PHP Cache) is used for opcode caching (hundreds of included libraries, thousands of functions) and variable caching (hundreds of MB's of data). They use the APC for non-user specific data: network/school information, database information, useragent strings, hot application data, site variables and language strings (now the largest consumer of memory). They don't use it for User data because they don't send users back to the same server each time.
Friends page with a normal run takes 4050ms, with APC enabled it takes 135ms. You can also set apc.stat=0 to gain even more speed (128ms). To bust client-caching, they use APC+SVN with the SVN tag on the file and get the latest version from SVN and store it. Of course, this is a "prime the pump" thing that doesn't happen in production at runtime.
The next layer of caching is Memcached. Facebook currently utilizes over 400 instances of Memcached and has made contributions back to the project. They use Memcached for user-specific data: long profile, short profile, friends and applications. They don't use the timeout feature, but rather use cache invalidation on SQL insert and update. It's harder to do when writing your application, but it's easier to maintain in the long run. To make Memcached faster, they created a PHP extension that reduced PHP function calling overhead and allowed UDP support. The Memcached extension runs ~10% faster realtime than in PHP space.
Facebook likes for each page to render in under 250ms on the backend. To see how long a page took to load, you can mouseover the copyright at the bottom of the page, and a tooltip will show you the elapsed time.
This presentation is available online at http://sizzo.org/talks.
Posted by Montana Harkin on July 24, 2008 at 12:08 AM MDT #