Erlang at Facebook
Eugene Letuchy
Apr 30, 2009
2
Agenda
1 Facebook ... and Erlang
2 Story of Facebook Chat
3 Facebook Chat Architecture
4 Key Erlang Features
5 Then and Now
3
Facebook ... and Erlang
4
The Facebook Environment
▪ The Site
▪ More than 200 million active users
▪ More than 3.5 billion minutes are spent on Facebook each day
▪ Fewer than 900 employees
▪ The Engineering Team
▪ Fast iteration: code gets out to production within a week
▪ Polyglot programming: interoperability is key
▪ Practical: high-leverage tools win
5
Erlang Projects
▪ Chat: the biggest and best known user
▪ AIM Presence: a JSONP validator
▪ Chat Jabber support (ejabberd)
6
Facebook Chat
7
2007: Facebook needs Chat
Messages, Wall, Links aren’t enough
8
Enter a Hackathon (Jan 2007)
▪ Chat started in one night of coding
▪ Floating conversation windows
▪ No buddy list
▪ One server (no distribution)
▪ Erlang was there!
9
Enter Eugene (Feb 2007)
▪ I joined Facebook after Chat Hackathon
▪ What is this Erlang?
▪ Spring 2007:
▪ Learning Erlang from Joe Armstrong's thesis
▪ Lots of prototyping
▪ Evaluating infrastructure needs
▪ Summer 2007:
▪ Chris Piro works on Erlang Thrift bindings
10
Let’s do this!
▪ Mid-Fall 2007: Chat becomes a “real” project
▪ 4 engineers, 0.5 designer
▪ Infrastructure components get built and improved
▪ Feb 2008: “Dark launch” testing begins
▪ Simulates load on the Erlang servers ... they hold up
▪ Apr 6, 2008: First real Chat message sent
▪ Apr 23, 2008: 100% rollout (Facebook has 70M users at the time)
11
Launch: April 2008
▪ Apr 6, 2008: gradual live rollout starts
▪ First message: "msn chat?"
▪ Apr 23, 2008: 100% rollout (to Facebook’s 70M users)
▪ Graph of sends in the first days of launch (y-axis: millions of sends per hour, 0 to 15; x-axis ticks: Tue 00:00, 12:00, Wed 00:00, 12:00)
12
Chat ... one year later
▪ Facebook has 200M active users
▪ 800+ million user messages / day
▪ 7+ million active channels at peak
▪ 1GB+ in / sec at peak
▪ 100+ channel machines
▪ ~9-10 times the work at launch; ~2 times as many machines
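A rough back-of-the-envelope reading of these figures, as a Python sketch. It assumes traffic is spread evenly over the day (the peak figures on the slide show it is not), and it treats the "+" figures as exact, so every result is a loose estimate. All variable names are illustrative.

```python
# Back-of-the-envelope arithmetic from the slide's figures.
# Assumption: messages are spread evenly over 24 hours (real traffic peaks).
SECONDS_PER_DAY = 24 * 60 * 60

messages_per_day = 800_000_000      # "800+ million user messages / day"
avg_messages_per_sec = messages_per_day / SECONDS_PER_DAY

peak_channels = 7_000_000           # "7+ million active channels at peak"
channel_machines = 100              # "100+ channel machines"
channels_per_machine = peak_channels / channel_machines

work_growth_low, work_growth_high = 9, 10   # "~9-10 times the work at launch"
machine_growth = 2                          # roughly twice as many machines

# Work handled per machine, relative to launch.
per_machine_low = work_growth_low / machine_growth
per_machine_high = work_growth_high / machine_growth

print(f"average messages/sec: {avg_messages_per_sec:,.0f}")
print(f"peak channels per machine: {channels_per_machine:,.0f}")
print(f"per-machine work vs. launch: {per_machine_low:.1f}x to {per_machine_high:.1f}x")
```

Under these assumptions each machine handles roughly 4.5 to 5 times the work it did at launch.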
13
Chat Architecture
14
System challenges
▪ How does synchronous messaging work on the Web?
▪ “Presence” is hard to scale
▪ Need a system to queue and deliver messages
▪ Millions of connections, mostly idle
▪ Need logging, at least between page loads
▪ Make it work in Facebook’s environment
15
System overview
16
System overview - User Interface
Chat in the browser?
▪ Chat bar affixed to the bottom of each Facebook page
▪ Mix of client-side JavaScript and server-side PHP
▪ Works around transport errors, browser differences
▪ Regular AJAX for sending messages, fetching conversation history
▪ Periodic AJAX polling for list of online friends
▪ AJAX long-polling for messages (Comet)
17
System Overview - Back End
How does the back end service requests?
▪ Discrete responsibilities for each service
▪ Communicate via Thrift
▪ Channel (Erlang): message queuing and delivery
▪ Queue messages in each user’s “channel”
▪ Deliver messages as responses to long-polling HTTP requests
▪ Presence (C++): aggregates online info in memory (pull-based presence)
▪ Chatlogger (C++): stores conversations between page loads
▪ Web tier (PHP): serves our vanilla web requests
18
System overview
19
Message send
[Diagram: chat windows showing "Me: Lunch?" and "Eugene: Lunch?", with steps labeled 1 - ajax, 2a - thrift, 2b - thrift, 3 - long poll]
20
Channel servers (Erlang)
21
Channel servers
Architectural overview
▪ One channel per user
▪ Web tier delivers messages for that user
▪ Channel State: short queue of sequenced messages
▪ Long poll for streaming (Comet)
▪ Clients make an HTTP request
▪ Server replies when a message is ready
▪ One active request per browser tab
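The channel model above (a short queue of sequenced messages, served by long polls) might be sketched as follows. This is a Python illustration only: the real channel servers are Erlang processes behind Mochiweb, and the class name, the `max_queue` bound, the `after_seq` parameter, and the timeout values are all assumptions made for the example.

```python
import threading
from collections import deque


class Channel:
    """One user's channel: a short queue of sequenced messages (illustrative)."""

    def __init__(self, max_queue=50):
        # max_queue is a hypothetical bound; the slides only say "short queue".
        self._messages = deque(maxlen=max_queue)
        self._next_seq = 0
        self._cond = threading.Condition()

    def publish(self, payload):
        """Queue a message for this user and wake any waiting long polls."""
        with self._cond:
            seq = self._next_seq
            self._next_seq += 1
            self._messages.append((seq, payload))
            self._cond.notify_all()  # one waiter per open browser tab
            return seq

    def long_poll(self, after_seq, timeout=30.0):
        """Block until a message newer than after_seq exists, or time out.

        Returns a list of (seq, payload). An empty list means the poll timed
        out and the client should simply issue a new request.
        """
        def pending():
            return [(s, p) for (s, p) in self._messages if s > after_seq]

        with self._cond:
            self._cond.wait_for(lambda: bool(pending()), timeout=timeout)
            return pending()


def demo():
    """Start a long poll in a thread, publish one message, return the reply."""
    channel = Channel()
    results = []
    waiter = threading.Thread(
        target=lambda: results.append(channel.long_poll(after_seq=-1, timeout=5.0))
    )
    waiter.start()
    channel.publish("Lunch?")
    waiter.join()
    return results[0]
```

Sequence numbers let a client that reconnects ask only for what it has not yet seen, which is why a short queue is enough state per channel.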
22
[Diagram: channel application. Labels: messages; authentication; online list messages]
23
Channel servers
Architectural details
▪ Distributed design
▪ User id space is partitioned (division of labor)
▪ Each partition is serviced by a cluster (availability)
▪ Presence aggregation
▪ Channel servers are authoritative
▪ Periodically shipped to presence servers
▪ Open source: Erlang, Mochiweb, Thrift, Scribe, fb303, et al.
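The partitioned design could look roughly like the Python sketch below. The slides do not say how user ids map to partitions or how a node is chosen within a cluster, so the modulo scheme, the partition count, the host names, and the first-live-node rule are all assumptions made for illustration.

```python
# Hypothetical partitioning scheme: modulo is an assumption chosen for
# simplicity; the slides only say the user id space is partitioned.
NUM_PARTITIONS = 4  # illustrative value

# Each partition is served by a cluster of nodes (host names are made up).
CLUSTERS = {
    p: [f"channel{p}-{i}.example.com" for i in range(3)]
    for p in range(NUM_PARTITIONS)
}


def partition_for(user_id):
    """Map a user id to the partition that owns that user's channel."""
    return user_id % NUM_PARTITIONS


def nodes_for(user_id):
    """Return the cluster nodes that serve this user's partition."""
    return CLUSTERS[partition_for(user_id)]


def pick_node(user_id, up):
    """Pick the first live node in the user's cluster (naive failover)."""
    for node in nodes_for(user_id):
        if node in up:
            return node
    raise RuntimeError(f"no live node for partition {partition_for(user_id)}")
```

Partitioning divides the work, and having several nodes per partition means a single node failure does not make a slice of users unreachable.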
24
Key Erlang Features we love
25
Concurrency
▪ Cheap parallelism at massive scale
▪ Simplifies modeling concurrent interactions
▪ Chat users are independent and concurrent
▪ Mapping onto traditional OS threads is unnatural
▪ Locality of reference
▪ Bonus: carries over to non-Erlang concurrent programming
26
Distribution
▪ Connected network of nodes
▪ Remote processes look like local processes
▪ Any node in a channel server cluster can route requests
▪ Naive load balancing
▪ Distributed Erlang works out-of-the-box (all nodes are trusted)
27
Fault Isolation
▪ Bugs in the initial versions of Chat:
▪ Process leaks in the Thrift bindings
▪ Unintended multicasting of messages
▪ Bad return state for presence aggregators
▪ (Horrible) bugs don’t kill a mostly functional system:
▪ C/C++ segfault takes down the OS process and your server state
▪ Erlang badmatch takes down an Erlang process
▪ ... and notifies linked processes
28
Error logging (Crash Reports)
▪ Any proc_lib-compliant process generates crash reports
▪ Error reports can be handled out of band (not where generated)
▪ Stacktraces point the way to bugs (functional languages win big here)
▪ ... but they could be improved with source line numbers
▪ Writing error_logger handlers is simple:
▪ gen_event behavior
▪ Allows for massaging of the crash and error messages (binaries!)
▪ Thrift client in the error log
▪ WARNING: error logging can OOM the Erlang node
29
Hot code swapping
▪ Restart-free upgrades are awesome (!)
▪ Pushing new functional code for Chat takes ~20 seconds
▪ No state is lost
▪ Test on a running system
▪ Provides a safety net ... rolling back bad code is easy
▪ NOTE: we don’t use the OTP release/upgrade strategies
30
Monitoring and Error Recovery
▪ Supervision hierarchies
▪ Organize (and control) processes
▪ Organize thoughts
▪ Systematize restarts and error recovery
▪ simple_one_for_one for dynamic child processes
▪ net_kernel (Distributed Erlang)
▪ sends nodedown, nodeup messages
▪ any process can subscribe
▪ heart: monitors and restarts the OS process
31
Remote Shell
▪ To invoke:
> erl -name hidden -hidden -remsh <node_name> -setcookie <cookie>
Eshell V5.7.1 (abort with ^G)
(<node_name>)1>
▪ Ad-hoc inspection of a running node
▪ Command-and-control from a console
▪ Combines with hot code loading
32
Erlang top (etop)
▪ Shows Erlang processes, sorted by reductions, memory and message queue
▪ OS functionality ... for free
33
Hibernation
▪ Drastically shrink memory usage with erlang:hibernate/3
▪ Throws away the call stack
▪ Minimizes the heap
▪ Enters a wait state for new messages
▪ “Jumps” into a passed-in function for a received message
▪ Perfect for a long-running, idling HTTP request handler
▪ But ... not compatible with gen_server:call (and gen_server:reply)
▪ gen_server:call has its own receive() loop
▪ hibernate() doesn’t support an explicit timeout
▪ Fixed with a few hours and a look at gen.erl
34
Symmetric MultiProcessing (SMP)
▪ Take advantage of multi-core servers
▪ erl -smp runs multiple scheduler threads inside the node
▪ SMP is emphasized in recent Erlang development
▪ Added to Erlang R11B
▪ Erlang R12B-0 through R13B include fixes and perf boosts
▪ Smart people have been optimizing our code for a year (!)
▪ Upgraded to R13B last night with about 1/3 less load
35
hipe_bifs
Cheating single assignment
▪ Erlang is opinionated:
▪ Destructive assignment is hard because it should be
▪ hipe_bifs:bytearray_update() allows for destructive array assignment
▪ Necessary for aggregating Chat users’ presence
▪ Don’t tell anyone!
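Why destructive updates matter for a large presence table can be shown with a loose Python analogy. This is not Erlang and not the author's code: Python's mutable `bytearray` stands in for the byte array that `hipe_bifs:bytearray_update()` modifies in place, and immutable `bytes` stands in for ordinary single-assignment data. The one-byte-per-user layout and the state encoding are assumptions.

```python
# Analogy only: bytearray is mutable (destructive update), bytes is immutable.
OFFLINE, ONLINE = 0, 1  # hypothetical encoding: one byte per user slot


def set_presence_in_place(table, slot, state):
    """Destructive update: overwrites one byte, no copy of the table."""
    table[slot] = state


def set_presence_by_copy(table, slot, state):
    """Non-destructive update: builds a whole new table for every change."""
    return table[:slot] + bytes([state]) + table[slot + 1:]


mutable_table = bytearray(4)
set_presence_in_place(mutable_table, 2, ONLINE)

immutable_table = bytes(4)
updated_copy = set_presence_by_copy(immutable_table, 2, ONLINE)
```

With millions of users per table, copying on every presence change costs time proportional to the table size, while the in-place write is constant time.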
36
Then and now: Erlang in Progress
37
Then ... a steep learning curve
▪ Start of 2007:
▪ Few industry-focused English-language resources
▪ Few blogs (outside of Yariv’s and Joel Reymont’s)
▪ Code examples spread out and disorganized
▪ U.S. Erlang community limited in number and visibility
38
Now ...
▪ Programming Erlang (Jun 2007)
▪ Erlang Programming (upcoming...)
▪ More blogs and blog aggregators:
▪ Planet Erlang, Planet TrapExit
▪ Erlang Factory aggregates Erlang developments
▪ More code available:
▪ GitHub, CEAN
▪ More general-purpose Open Source Libraries
▪ U.S.-located conference and ErlLounges
39
(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved.
40
