2. Introduction
• 1.15 billion monthly active users
• 2.5 billion content items shared per day (status updates + wall posts +
photos + videos + comments)
• 2.7 billion Likes per day
• 300 million photos uploaded per day
• 500+ terabytes of new data ingested into the databases every day
Given these statistics, Facebook has to use highly scalable technology to
handle this traffic and give its users a faster and safer social experience.
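To make the scale concrete, the daily figures above can be converted to average per-second rates. This is only a rough sketch: it assumes traffic is spread evenly over the day, which real traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Daily figures taken from the list above.
daily = {
    "content items shared": 2.5e9,
    "likes": 2.7e9,
    "photos uploaded": 300e6,
}

# Average rate per second, assuming an even spread over the day.
per_second = {name: count / SECONDS_PER_DAY for name, count in daily.items()}

for name, rate in per_second.items():
    print(f"{name}: about {rate:,.0f} per second")
```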
3. Technologies
For faster data transfer
• Cookies and Caches
• GZip compression
• AJAX and JSON
• XMPP messaging
For data storage
• HBase & Haystack
• Zookeeper
• Memcached
• Scribe
4. Cookies and Caches
Cookies are small pieces of data that are stored on your
computer, mobile phone or other device.
A cache is a type of memory used by the web browser. When a page's
CSS/JS does not change for a long time, the browser caches it and reads
it from memory to reduce data transfer.
These technologies help provide and understand a range of products and services.
Facebook uses these technologies to do things like:
• make Facebook easier or faster to use;
• enable features and store information about you (including on your
device or in your browser cache) and your use of Facebook;
• deliver, understand and improve advertising;
• monitor and understand the use of FB products and services;
• protect you, others and Facebook.
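As a minimal sketch of how a cookie travels between server and browser, the example below uses Python's standard http.cookies module. The cookie name "theme" and its value are hypothetical and are not taken from Facebook.

```python
from http.cookies import SimpleCookie

# Server side: build a cookie to send in a Set-Cookie response header.
cookie = SimpleCookie()
cookie["theme"] = "dark"            # hypothetical name and value
cookie["theme"]["max-age"] = 3600   # the browser may keep it for one hour
cookie["theme"]["path"] = "/"
set_cookie_header = cookie["theme"].OutputString()

# Later request: the browser sends the stored cookie back in a Cookie header.
returned = SimpleCookie()
returned.load("theme=dark")
theme = returned["theme"].value
```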
6. Gzip Compression
Gzip is a software application used for file compression and
decompression.
The server compresses the images, CSS and JS it sends, and the client
machine decompresses them on load. The data and UI are unchanged, but the
amount of data transferred is reduced. Facebook's servers use Gzip
compression to make the web faster.
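The round trip can be sketched with Python's standard gzip module. The CSS text here is made-up sample data; the point is only that the decompressed bytes equal the original while fewer bytes cross the network.

```python
import gzip

# Made-up, repetitive CSS standing in for a real stylesheet.
css = ("body { margin: 0; padding: 0; }\n" * 200).encode("utf-8")

compressed = gzip.compress(css)         # what the server would send
restored = gzip.decompress(compressed)  # what the client ends up with

print(len(css), "bytes before,", len(compressed), "bytes after compression")
```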
7. AJAX and JSON
AJAX is a group of interrelated web development techniques used on the
client side to create asynchronous web applications; JSON is a
lightweight format for the data exchanged.
With AJAX, web applications can send data to, and retrieve data
from, a server asynchronously (in the background) without interfering
with the display and behavior of the existing page.
Data can be retrieved using the XMLHttpRequest object.
Where AJAX and JSON are mainly used in Facebook:
• Like, Comment, Share
• Post story
• Send message
• Load feed
• Dialog boxes: likes, mutual friends, etc.
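A minimal sketch of the JSON exchanged in such a background request, using a "Like" as the example. The field names (action, user_id, post_id, like_count) and both functions are hypothetical, not Facebook's actual API. In a browser the request would be sent with XMLHttpRequest; here only the encoding and decoding are shown.

```python
import json

like_counts = {}  # stand-in for server-side storage

def build_like_request(user_id, post_id):
    """Client side: encode the action as a JSON request body."""
    return json.dumps({"action": "like", "user_id": user_id, "post_id": post_id})

def handle_request(body):
    """Server side: decode the request, update state, reply in JSON."""
    data = json.loads(body)
    post_id = data["post_id"]
    like_counts[post_id] = like_counts.get(post_id, 0) + 1
    return json.dumps({"ok": True, "post_id": post_id,
                       "like_count": like_counts[post_id]})

# The page updates only the like counter from the reply, without reloading.
reply = json.loads(handle_request(build_like_request(42, 1001)))
```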
9. XMPP Messaging
XMPP stands for Extensible Messaging and Presence Protocol.
XMPP is also called the Jabber protocol.
Facebook chat and messages work on this platform.
Every Facebook user has a unique id and a personal chat address like
100000874067290@chat.facebook.com. When someone wants to send a
message to that user, the core script converts it to XML and sends it to
the Jabber server.
The recipient then receives the message almost instantly, thanks to
highly reliable servers.
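The conversion to XML can be sketched as building an XMPP message stanza. This is an illustration with Python's standard xml.etree module, not Facebook's core script; the sender address below is hypothetical.

```python
import xml.etree.ElementTree as ET

def build_message_stanza(sender, recipient, text):
    """Wrap a chat message in an XMPP <message/> stanza."""
    message = ET.Element("message", {"from": sender, "to": recipient, "type": "chat"})
    body = ET.SubElement(message, "body")
    body.text = text
    return ET.tostring(message, encoding="unicode")

stanza = build_message_stanza(
    "100000000000001@chat.facebook.com",  # hypothetical sender id
    "100000874067290@chat.facebook.com",  # address from the example above
    "Hello!",
)
print(stanza)
```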
12. HBase and Haystack
HDFS (Hadoop Distributed File System)
• HBase & HDFS are elastic by design
• Multiple table shards (regions) per physical server
• On node additions:
• Load balancer automatically reassigns shards from overloaded
nodes to new nodes
• Because the file system underneath is itself distributed, data for
reassigned regions is instantly servable from the new nodes.
• Regions can be dynamically split into smaller regions.
• Pre-sharding is not necessary
• Splits are near instantaneous!
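The reassignment on node addition can be illustrated with a toy balancer. This is a simplified sketch, not HBase's actual load balancer: it treats every region as equal in load, and the node and region names are made up.

```python
def rebalance(assignment, new_node):
    """Move regions from the busiest nodes to new_node until the
    region counts differ by at most one."""
    result = {node: list(regions) for node, regions in assignment.items()}
    result[new_node] = []
    while True:
        busiest = max(result, key=lambda node: len(result[node]))
        if len(result[busiest]) - len(result[new_node]) <= 1:
            return result
        # Only the assignment changes; the data itself stays in the
        # distributed file system, so nothing is copied here.
        result[new_node].append(result[busiest].pop())

before = {"node1": ["r1", "r2", "r3", "r4"], "node2": ["r5", "r6", "r7", "r8"]}
after = rebalance(before, "node3")
```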
13. HBase and Haystack
Automatic failover
• Node failures automatically detected by HBase Master
• Regions on failed node are distributed evenly among surviving
nodes.
• Multiple regions/server model avoids need for substantial
overprovisioning
• HBase Master failover
• 1 active, rest standby
• When active master fails, a standby automatically takes over
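The even redistribution after a node failure can be sketched the same way. This toy version is an assumption-laden illustration, not HBase Master code: it simply hands each orphaned region to whichever surviving node currently holds the fewest regions.

```python
def fail_over(assignment, failed_node):
    """Spread the failed node's regions evenly over the survivors."""
    survivors = {node: list(regions)
                 for node, regions in assignment.items() if node != failed_node}
    if not survivors:
        raise RuntimeError("no surviving nodes to take over regions")
    for region in assignment[failed_node]:
        target = min(survivors, key=lambda node: len(survivors[node]))
        survivors[target].append(region)
    return survivors

cluster = {"a": ["r1", "r2"], "b": ["r3", "r4"], "c": ["r5", "r6"]}
recovered = fail_over(cluster, "c")
```

Because each server holds several small regions, the load of a failed node is shared by all survivors instead of landing on a single standby machine.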
15. Zookeeper
Zookeeper is open source software that FB uses mainly for two purposes:
• As the controller for implementing sharding and failover of
application servers
• As a store for their discovery service.
Since Zookeeper provides FB with a highly available repository and
notification mechanism, it goes a long way towards helping FB build a
highly available service.
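The repository-plus-notification idea behind a discovery service can be sketched with an in-memory toy. ToyRegistry and its methods are hypothetical names for illustration only; this is not the ZooKeeper API and has none of its replication or fault tolerance.

```python
class ToyRegistry:
    """In-memory stand-in for a discovery store with change notifications."""

    def __init__(self):
        self._services = {}
        self._watchers = {}

    def watch(self, name, callback):
        """Call callback(addresses) whenever the service's address list changes."""
        self._watchers.setdefault(name, []).append(callback)

    def register(self, name, address):
        self._services.setdefault(name, set()).add(address)
        self._notify(name)

    def unregister(self, name, address):
        self._services.get(name, set()).discard(address)
        self._notify(name)

    def lookup(self, name):
        return sorted(self._services.get(name, set()))

    def _notify(self, name):
        for callback in self._watchers.get(name, []):
            callback(self.lookup(name))

registry = ToyRegistry()
events = []
registry.watch("chat", events.append)
registry.register("chat", "10.0.0.1:80")
registry.register("chat", "10.0.0.2:80")
registry.unregister("chat", "10.0.0.1:80")  # e.g. a server went down
```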
16. Memcached
If you've read anything about scaling large websites, you've probably
heard about memcached.
Memcached is a high-performance, distributed memory object
caching system.
It speeds up Facebook by alleviating database load.
Memcached is an in-memory key-value store for small chunks of
arbitrary data (strings, objects) from results of database calls, API calls, or
page rendering.
Facebook is the world's largest user of memcached.
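How a cache alleviates database load can be sketched with the common look-aside pattern. A plain dict stands in for memcached here, and UserStore with its methods is a hypothetical name; a real deployment would use a memcached client and set expiry times.

```python
class UserStore:
    """Look-aside cache: check the cache first, fall back to the database."""

    def __init__(self):
        self.cache = {}    # stand-in for memcached (key -> value)
        self.db_calls = 0  # counts how often the database is hit

    def _query_database(self, user_id):
        # Stand-in for an expensive database query.
        self.db_calls += 1
        return {"id": user_id, "name": f"user-{user_id}"}

    def get_user(self, user_id):
        key = f"user:{user_id}"
        if key in self.cache:
            return self.cache[key]            # cache hit: no database work
        value = self._query_database(user_id)
        self.cache[key] = value               # store for later requests
        return value

store = UserStore()
first = store.get_user(7)
second = store.get_user(7)  # served from the cache
```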
17. Scribe – Log server
Scribe was developed at Facebook using Apache Thrift and released in
2008 as open source.
Scribe is a server for aggregating log data streamed in real-time from a
large number of servers. It is designed to be scalable, extensible without
client-side modification, and robust to failure of the network or any
specific machine.
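One way to be robust to network failure is to buffer log entries locally and resend them once the central store is reachable again. The sketch below illustrates that idea under stated assumptions: CentralStore and LocalAggregator are hypothetical classes, not Scribe's actual code or API.

```python
class CentralStore:
    """Stand-in for the central log store; 'up' simulates the network."""

    def __init__(self):
        self.up = True
        self.logs = {}

    def receive(self, category, message):
        if not self.up:
            raise ConnectionError("central store unreachable")
        self.logs.setdefault(category, []).append(message)

class LocalAggregator:
    """Runs next to the application and forwards (category, message) pairs."""

    def __init__(self, central):
        self.central = central
        self.buffer = []

    def log(self, category, message):
        self.buffer.append((category, message))
        self.flush()

    def flush(self):
        # Send buffered entries in order; stop at the first failure.
        while self.buffer:
            category, message = self.buffer[0]
            try:
                self.central.receive(category, message)
            except ConnectionError:
                return
            self.buffer.pop(0)

central = CentralStore()
local = LocalAggregator(central)
central.up = False
local.log("web", "page view 1")
local.log("web", "page view 2")
buffered_while_down = len(local.buffer)
central.up = True
local.log("web", "page view 3")  # also flushes the earlier entries
```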
[Figure: log sources (Desktop site, Mobile site, Application, Legacy, SMS/Email) feeding into Scribe servers]