SlideShare ist ein Scribd-Unternehmen logo
1 von 49
Downloaden Sie, um offline zu lesen
Billions of Hits:
Scaling Twitter
John Adams
Twitter Operations
#chirpscale
John Adams                            @netik
•   Early Twitter employee (mid-2008)

•   Lead engineer: Outward Facing Services (Apache,
    Unicorn, SMTP), Auth, Security

•   Keynote Speaker: O’Reilly Velocity 2009

•   O’Reilly Web 2.0 Speaker (2008, 2010)

•   Previous companies: Inktomi, Apple, c|net

•   Working on Web Operations book with John Alspaw
    (flickr, etsy), out in June
Growth.
752%
                       2008 Growth
source: comscore.com - (based only on www traffic, not API)
1358%
                       2009 Growth
source: comscore.com - (based only on www traffic, not API)
12 th
                    most popular
source: alexa.com
55M
                   Tweets per day
                  (640 TPS/sec, 1000 TPS/sec peak)
source: twitter.com internal
600M
                   Searches/Day
source: twitter.com internal
25%




Web               API
            75%
Operations
•   What do we do?

    •   Site Availability

    •   Capacity Planning (metrics-driven)

    •   Configuration Management

    •   Security

    •   Much more than basic Sysadmin
What have we done?
•   Improved response time, reduced latency

•   Less errors during deploys (Unicorn!)

•   Faster performance

•   Lower MTTD (Mean time to Detect)

•   Lower MTTR (Mean time to Recovery)
Operations Mantra

                                  Move to
     Find            Take
                                   Next
    Weakest        Corrective
                                  Weakest
     Point          Action
                                   Point


   Metrics +
Logs + Science =    Process     Repeatability
   Analysis
Make an attack plan.
 Symptom    Bottleneck   Vector     Solution

                          HTTP
Bandwidth   Network                Servers++
                         Latency
 Timeline                Update      Better
            Database
  Delay                   Delay    algorithm
  Status                             Flock
            Database     Delays
 Growth                            Cassandra
 Updates    Algorithm    Latency   Algorithms
Finding Weakness
•   Metrics + Graphs

    •   Individual metrics are irrelevant

    •   We aggregate metrics to find knowledge

•   Logs

•   SCIENCE!
Monitoring
•   Twitter graphs and reports critical metrics in
    as near real time as possible

•   If you build tools against our API, you should
    too.

    •   RRD, other Time-Series DB solutions

    •   Ganglia + custom gmetric scripts

•   dev.twitter.com - API availability
Analyze
•   Turn data into information

    •   Where is the code base going?

    •   Are things worse than they were?

        •   Understand the impact of the last software
            deploy

        •   Run check scripts during and after deploys

•   Capacity Planning, not Fire Fighting!
Data Analysis
•   Instrumenting the world pays off.

•   “Data analysis, visualization, and other
    techniques for seeing patterns in data are
    going to be an increasingly valuable skill set.
    Employers take notice!”
          “Web Squared: Web 2.0 Five Years On”, Tim O’Reilly, Web 2.0 Summit, 2009
Forecasting             Curve-fitting for capacity planning
                        (R, fityk, Mathematica, CurveFit)



              unsigned int (32 bit)
                Twitpocolypse



  status_id

                                      signed int (32 bit)
                                        Twitpocolypse




                                                  r2=0.99
Internal Dashboard
External API Dashbord




   http://dev.twitter.com/status
What’s a Robot ?
•   Actual error in the Rails stack (HTTP 500)

•   Uncaught Exception

•   Code problem, or failure / nil result

•   Increases our exception count

•   Shows up in Reports
What’s a Whale ?
•   HTTP Error 502, 503

•   Twitter has a hard and fast five second timeout

•   We’d rather fail fast than block on requests

•   We also kill long-running queries (mkill)

•   Timeout
Whale Watcher
•   Simple shell script,

    •   MASSIVE WIN by @ronpepsi

•   Whale = HTTP 503 (timeout)

•   Robot = HTTP 500 (error)

•   Examines last 60 seconds of
    aggregated daemon / www logs

•   “Whales per Second” > Wthreshold

    •   Thar be whales! Call in ops.
Deploy Watcher
Sample window: 300.0 seconds

First start time:
Mon Apr 5 15:30:00 2010 (Mon Apr   5 08:30:00 PDT 2010)
Second start time:
Tue Apr 6 02:09:40 2010 (Mon Apr   5 19:09:40 PDT 2010)

PRODUCTION APACHE: ALL OK
PRODUCTION OTHER: ALL OK
WEB0049 CANARY APACHE: ALL OK
WEB0049 CANARY BACKEND SERVICES: ALL OK
DAEMON0031 CANARY BACKEND SERVICES: ALL OK
DAEMON0031 CANARY OTHER: ALL OK
Feature “Darkmode”
•   Specific site controls to enable and disable
    computationally or IO-Heavy site function

•   The “Emergency Stop” button

•   Changes logged and reported to all teams

•   Around 60 switches we can throw

•   Static / Read-only mode
request flow
           Load Balancers

         Apache mod_proxy

           Rails (Unicorn)

 Flock      memcached        Kestrel

         MySQL      Cassandra

             Daemons
Servers
•   Co-located, dedicated machines at NTT America

•   No clouds; Only for monitoring, not serving

    •   Need raw processing power, latency too high
        in existing cloud offerings

•   Frees us to deal with real, intellectual, computer
    science problems.

•   Moving to our own data center soon
unicorn
•   A single socket Rails application Server (Rack)

•   Zero Downtime Deploys (!)

    •   Controlled, shuffled transfer to new code

•   Less memory, 30% less CPU

•   Shift from mod_proxy_balancer to
    mod_proxy_pass

    •   HAProxy, Ngnix wasn’t any better. really.
Rails
•   Mostly only for front-end.

•   Back end mostly Scala and pure ruby

•   Not to blame for our issues. Analysis found:

    •   Caching + Cache invalidation problems

    •   Bad queries generated by ActiveRecord, resulting in
        slow queries against the db

    •   Queue Latency

•   Replication Lag
memcached
•   memcached isn’t perfect.

    •   Memcached SEGVs hurt us early on.

•   Evictions make the cache unreliable for
    important configuration data
    (loss of darkmode flags, for example)

•   Network Memory Bus isn’t infinite

•   Segmented into pools for better performance
Loony
•   Central machine database (MySQL)

    •   Python, Django, Paraminko SSH

        •   Paraminko - Twitter OSS (@robey)

    •   Ties into LDAP groups

•   When data center sends us email, machine
    definitions built in real-time
Murder
•   @lg rocks!

•   Bittorrent based replication for deploys

•   ~30-60 seconds to update >1k machines

•   P2P - Legal, valid, Awesome.
Kestrel
•   @robey

•   Works like memcache (same protocol)

•   SET = enqueue | GET = dequeue

•   No strict ordering of jobs

•   No shared state between servers

•   Written in Scala.
Asynchronous Requests
•   Inbound traffic consumes a unicorn worker

•   Outbound traffic consumes a unicorn worker

•   The request pipeline should not be used to
    handle 3rd party communications or
    back-end work.

•   Reroute traffic to daemons
Daemons
•   Daemons touch every tweet

•   Many different daemon types at Twitter

•   Old way: One daemon per type (Rails)

    •   New way: Fewer Daemons (Pure Ruby)

•   Daemon Slayer - A Multi Daemon that could
    do many different jobs, all at once.
Disk is the new Tape.
•   Social Networking application profile has
    many O(ny) operations.

•   Page requests have to happen in < 500mS or
    users start to notice. Goal: 250-300mS

•   Web 2.0 isn’t possible without lots of RAM

•   SSDs? What to do?
Caching
•   We’re the real-time web, but lots of caching
    opportunity. You should cache what you get from us.

•   Most caching strategies rely on long TTLs (>60 s)

•   Separate memcache pools for different data types to
    prevent eviction

•   Optimize Ruby Gem to libmemcached + FNV Hash
    instead of Ruby + MD5

•   Twitter now largest contributor to libmemcached
MySQL
•   Sharding large volumes of data is hard

•   Replication delay and cache eviction produce
    inconsistent results to the end user.

•   Locks create resource contention for popular
    data
MySQL Challenges
•   Replication Delay

    •   Single threaded. Slow.

•   Social Networking not good for RDBMS

    •   N x N relationships and social graph / tree
        traversal

    •   Disk issues (FS Choice, noatime, scheduling
        algorithm)
Relational Databases
not a Panacea
•   Good for:

    •   Users, Relational Data, Transactions

•   Bad:

    •   Queues. Polling operations. Social Graph.

•   You don’t need ACID for everything.
Database Replication
•   Major issues around users and statuses tables

•   Multiple functional masters (FRP, FWP)

•   Make sure your code reads and writes to the
    write DBs. Reading from master = slow death

    •   Monitor the DB. Find slow / poorly designed
        queries

•   Kill long running queries before they kill you
    (mkill)
Flock
                                          Flock
•   Scalable Social Graph Store

•   Sharding via Gizzard
                                          Gizzard
•   MySQL backend (many.)

•   13 billion edges,
    100K reads/second
                                  Mysql   Mysql     Mysql
•   Open Source!
Cassandra
•   Originally written by Facebook

•   Distributed Data Store

•   @rk’s changes to Cassandra Open Sourced

•   Currently double-writing into it

•   Transitioning to 100% soon.
Lessons Learned
•   Instrument everything. Start graphing early.

•   Cache as much as possible

•   Start working on scaling early.

•   Don’t rely on memcache, and don’t rely on the
    database

•   Don’t use mongrel. Use Unicorn.
Join Us!
@jointheflock
Q&A
Thanks!
•   @jointheflock

•   http://twitter.com/jobs

•   Download our work

    •   http://twitter.com/about/opensource

Weitere ähnliche Inhalte

Was ist angesagt?

Red Teaming macOS Environments with Hermes the Swift Messenger
Red Teaming macOS Environments with Hermes the Swift MessengerRed Teaming macOS Environments with Hermes the Swift Messenger
Red Teaming macOS Environments with Hermes the Swift MessengerJustin Bui
 
Various Types of OpenSSL Commands and Keytool
Various Types of OpenSSL Commands and KeytoolVarious Types of OpenSSL Commands and Keytool
Various Types of OpenSSL Commands and KeytoolCheapSSLsecurity
 
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlowTensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlowDatabricks
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionDataWorks Summit
 
High-Performance Hibernate Devoxx France 2016
High-Performance Hibernate Devoxx France 2016High-Performance Hibernate Devoxx France 2016
High-Performance Hibernate Devoxx France 2016Vlad Mihalcea
 
시스템 보안에 대해 최종본
시스템 보안에 대해   최종본시스템 보안에 대해   최종본
시스템 보안에 대해 최종본승표 홍
 
GC free coding in @Java presented @Geecon
GC free coding in @Java presented @GeeconGC free coding in @Java presented @Geecon
GC free coding in @Java presented @GeeconPeter Lawrey
 
Stream Processing Frameworks
Stream Processing FrameworksStream Processing Frameworks
Stream Processing FrameworksSirKetchup
 
OrientDB - the 2nd generation of (Multi-Model) NoSQL - Devoxx Belgium 2015
OrientDB - the 2nd generation  of  (Multi-Model) NoSQL - Devoxx Belgium 2015OrientDB - the 2nd generation  of  (Multi-Model) NoSQL - Devoxx Belgium 2015
OrientDB - the 2nd generation of (Multi-Model) NoSQL - Devoxx Belgium 2015Luigi Dell'Aquila
 
Vulnerabilities in modern web applications
Vulnerabilities in modern web applicationsVulnerabilities in modern web applications
Vulnerabilities in modern web applicationsNiyas Nazar
 
[NDC 2018] Spark, Flintrock, Airflow 로 구현하는 탄력적이고 유연한 데이터 분산처리 자동화 인프라 구축
[NDC 2018] Spark, Flintrock, Airflow 로 구현하는 탄력적이고 유연한 데이터 분산처리 자동화 인프라 구축[NDC 2018] Spark, Flintrock, Airflow 로 구현하는 탄력적이고 유연한 데이터 분산처리 자동화 인프라 구축
[NDC 2018] Spark, Flintrock, Airflow 로 구현하는 탄력적이고 유연한 데이터 분산처리 자동화 인프라 구축Juhong Park
 
AI 연구자를 위한 클린코드 - GDG DevFest Seoul 2019
AI 연구자를 위한 클린코드 - GDG DevFest Seoul 2019AI 연구자를 위한 클린코드 - GDG DevFest Seoul 2019
AI 연구자를 위한 클린코드 - GDG DevFest Seoul 2019Kenneth Ceyer
 
Grokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKIGrokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKIGrokking VN
 
Sync, async and multithreading
Sync, async and multithreadingSync, async and multithreading
Sync, async and multithreadingTuan Chau
 
State: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVM
State: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVMState: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVM
State: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVMJonas Bonér
 
Advanced JavaScript
Advanced JavaScriptAdvanced JavaScript
Advanced JavaScriptNascenia IT
 
Low level java programming
Low level java programmingLow level java programming
Low level java programmingPeter Lawrey
 
MongoDB 모바일 게임 개발에 사용
MongoDB 모바일 게임 개발에 사용MongoDB 모바일 게임 개발에 사용
MongoDB 모바일 게임 개발에 사용흥배 최
 
今更聞けない!?Microsoft Graph で始める Office 365 データ活用と事例の紹介
今更聞けない!?Microsoft Graph で始める Office 365 データ活用と事例の紹介今更聞けない!?Microsoft Graph で始める Office 365 データ活用と事例の紹介
今更聞けない!?Microsoft Graph で始める Office 365 データ活用と事例の紹介Kenichiro Nakamura
 

Was ist angesagt? (20)

Logstash
LogstashLogstash
Logstash
 
Red Teaming macOS Environments with Hermes the Swift Messenger
Red Teaming macOS Environments with Hermes the Swift MessengerRed Teaming macOS Environments with Hermes the Swift Messenger
Red Teaming macOS Environments with Hermes the Swift Messenger
 
Various Types of OpenSSL Commands and Keytool
Various Types of OpenSSL Commands and KeytoolVarious Types of OpenSSL Commands and Keytool
Various Types of OpenSSL Commands and Keytool
 
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlowTensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
 
High-Performance Hibernate Devoxx France 2016
High-Performance Hibernate Devoxx France 2016High-Performance Hibernate Devoxx France 2016
High-Performance Hibernate Devoxx France 2016
 
시스템 보안에 대해 최종본
시스템 보안에 대해   최종본시스템 보안에 대해   최종본
시스템 보안에 대해 최종본
 
GC free coding in @Java presented @Geecon
GC free coding in @Java presented @GeeconGC free coding in @Java presented @Geecon
GC free coding in @Java presented @Geecon
 
Stream Processing Frameworks
Stream Processing FrameworksStream Processing Frameworks
Stream Processing Frameworks
 
OrientDB - the 2nd generation of (Multi-Model) NoSQL - Devoxx Belgium 2015
OrientDB - the 2nd generation  of  (Multi-Model) NoSQL - Devoxx Belgium 2015OrientDB - the 2nd generation  of  (Multi-Model) NoSQL - Devoxx Belgium 2015
OrientDB - the 2nd generation of (Multi-Model) NoSQL - Devoxx Belgium 2015
 
Vulnerabilities in modern web applications
Vulnerabilities in modern web applicationsVulnerabilities in modern web applications
Vulnerabilities in modern web applications
 
[NDC 2018] Spark, Flintrock, Airflow 로 구현하는 탄력적이고 유연한 데이터 분산처리 자동화 인프라 구축
[NDC 2018] Spark, Flintrock, Airflow 로 구현하는 탄력적이고 유연한 데이터 분산처리 자동화 인프라 구축[NDC 2018] Spark, Flintrock, Airflow 로 구현하는 탄력적이고 유연한 데이터 분산처리 자동화 인프라 구축
[NDC 2018] Spark, Flintrock, Airflow 로 구현하는 탄력적이고 유연한 데이터 분산처리 자동화 인프라 구축
 
AI 연구자를 위한 클린코드 - GDG DevFest Seoul 2019
AI 연구자를 위한 클린코드 - GDG DevFest Seoul 2019AI 연구자를 위한 클린코드 - GDG DevFest Seoul 2019
AI 연구자를 위한 클린코드 - GDG DevFest Seoul 2019
 
Grokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKIGrokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKI
 
Sync, async and multithreading
Sync, async and multithreadingSync, async and multithreading
Sync, async and multithreading
 
State: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVM
State: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVMState: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVM
State: You're Doing It Wrong - Alternative Concurrency Paradigms For The JVM
 
Advanced JavaScript
Advanced JavaScriptAdvanced JavaScript
Advanced JavaScript
 
Low level java programming
Low level java programmingLow level java programming
Low level java programming
 
MongoDB 모바일 게임 개발에 사용
MongoDB 모바일 게임 개발에 사용MongoDB 모바일 게임 개발에 사용
MongoDB 모바일 게임 개발에 사용
 
今更聞けない!?Microsoft Graph で始める Office 365 データ活用と事例の紹介
今更聞けない!?Microsoft Graph で始める Office 365 データ活用と事例の紹介今更聞けない!?Microsoft Graph で始める Office 365 データ活用と事例の紹介
今更聞けない!?Microsoft Graph で始める Office 365 データ活用と事例の紹介
 

Andere mochten auch

Scaling Instagram
Scaling InstagramScaling Instagram
Scaling Instagramiammutex
 
Embracing Open Source: Practice and Experience from Alibaba
Embracing Open Source: Practice and Experience from AlibabaEmbracing Open Source: Practice and Experience from Alibaba
Embracing Open Source: Practice and Experience from AlibabaWensong Zhang
 
Message Queuing on a Large Scale: IMVUs stateful real-time message queue for ...
Message Queuing on a Large Scale: IMVUs stateful real-time message queue for ...Message Queuing on a Large Scale: IMVUs stateful real-time message queue for ...
Message Queuing on a Large Scale: IMVUs stateful real-time message queue for ...Jon Watte
 
Us history shaping a new nation
Us history shaping a new nationUs history shaping a new nation
Us history shaping a new nationMrO97
 
LVS development and experience
LVS development and experienceLVS development and experience
LVS development and experienceWensong Zhang
 
3. shaping a new nation [1782 1788]
3. shaping a new nation [1782 1788]3. shaping a new nation [1782 1788]
3. shaping a new nation [1782 1788]jtoma84
 
Product design: How to create a product
Product design: How to create a productProduct design: How to create a product
Product design: How to create a productPress42
 
Scaling Twitter with Cassandra
Scaling Twitter with CassandraScaling Twitter with Cassandra
Scaling Twitter with CassandraRyan King
 
Washington Presidency
Washington PresidencyWashington Presidency
Washington Presidencymrsvogel
 
The presidency of george washingtion ppt for notes
The presidency of george washingtion ppt for notesThe presidency of george washingtion ppt for notes
The presidency of george washingtion ppt for notesMatthew Fulghum
 
Lesson Plan_Us History_the birth of a new nation
Lesson Plan_Us History_the birth of a new nationLesson Plan_Us History_the birth of a new nation
Lesson Plan_Us History_the birth of a new nationscott severance
 
The presidency of john adams
The presidency of john adamsThe presidency of john adams
The presidency of john adamsAllison Barnette
 
The First Five Presidents of the United States
The First Five Presidents of the United StatesThe First Five Presidents of the United States
The First Five Presidents of the United Statesmentzers
 
American History - Chapter 6
American History - Chapter 6American History - Chapter 6
American History - Chapter 6Alison Kurtz
 
Adams To Jefferson
Adams To JeffersonAdams To Jefferson
Adams To JeffersonJames Henry
 

Andere mochten auch (20)

Scaling Instagram
Scaling InstagramScaling Instagram
Scaling Instagram
 
Embracing Open Source: Practice and Experience from Alibaba
Embracing Open Source: Practice and Experience from AlibabaEmbracing Open Source: Practice and Experience from Alibaba
Embracing Open Source: Practice and Experience from Alibaba
 
Load balancing at tuenti
Load balancing at tuentiLoad balancing at tuenti
Load balancing at tuenti
 
Message Queuing on a Large Scale: IMVUs stateful real-time message queue for ...
Message Queuing on a Large Scale: IMVUs stateful real-time message queue for ...Message Queuing on a Large Scale: IMVUs stateful real-time message queue for ...
Message Queuing on a Large Scale: IMVUs stateful real-time message queue for ...
 
Xyz affair
Xyz affairXyz affair
Xyz affair
 
Us history shaping a new nation
Us history shaping a new nationUs history shaping a new nation
Us history shaping a new nation
 
Thomas jefferson
Thomas jeffersonThomas jefferson
Thomas jefferson
 
LVS development and experience
LVS development and experienceLVS development and experience
LVS development and experience
 
3. shaping a new nation [1782 1788]
3. shaping a new nation [1782 1788]3. shaping a new nation [1782 1788]
3. shaping a new nation [1782 1788]
 
Product design: How to create a product
Product design: How to create a productProduct design: How to create a product
Product design: How to create a product
 
Scaling Twitter with Cassandra
Scaling Twitter with CassandraScaling Twitter with Cassandra
Scaling Twitter with Cassandra
 
Washington Presidency
Washington PresidencyWashington Presidency
Washington Presidency
 
The presidency of george washingtion ppt for notes
The presidency of george washingtion ppt for notesThe presidency of george washingtion ppt for notes
The presidency of george washingtion ppt for notes
 
Lesson Plan_Us History_the birth of a new nation
Lesson Plan_Us History_the birth of a new nationLesson Plan_Us History_the birth of a new nation
Lesson Plan_Us History_the birth of a new nation
 
The presidency of john adams
The presidency of john adamsThe presidency of john adams
The presidency of john adams
 
The First Five Presidents of the United States
The First Five Presidents of the United StatesThe First Five Presidents of the United States
The First Five Presidents of the United States
 
John adams presidency ppt
John adams presidency pptJohn adams presidency ppt
John adams presidency ppt
 
American History - Chapter 6
American History - Chapter 6American History - Chapter 6
American History - Chapter 6
 
Adams To Jefferson
Adams To JeffersonAdams To Jefferson
Adams To Jefferson
 
Chapter 10 Sections 1 & 2
Chapter 10 Sections 1 & 2Chapter 10 Sections 1 & 2
Chapter 10 Sections 1 & 2
 

Ähnlich wie Chirp 2010: Scaling Twitter

John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Fixing Twitter Velocity2009
Fixing Twitter Velocity2009Fixing Twitter Velocity2009
Fixing Twitter Velocity2009John Adams
 
.Net Microservices with Event Sourcing, CQRS, Docker and... Windows Server 20...
.Net Microservices with Event Sourcing, CQRS, Docker and... Windows Server 20....Net Microservices with Event Sourcing, CQRS, Docker and... Windows Server 20...
.Net Microservices with Event Sourcing, CQRS, Docker and... Windows Server 20...Javier García Magna
 
Capacity Planning for fun & profit
Capacity Planning for fun & profitCapacity Planning for fun & profit
Capacity Planning for fun & profitRodrigo Campos
 
Using Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.comUsing Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.comDamien Krotkine
 
Hacklu2011 tricaud
Hacklu2011 tricaudHacklu2011 tricaud
Hacklu2011 tricaudstricaud
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Open Analytics
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenChristopher Whitaker
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDogRedis Labs
 
Asynchronous design with Spring and RTI: 1M events per second
Asynchronous design with Spring and RTI: 1M events per secondAsynchronous design with Spring and RTI: 1M events per second
Asynchronous design with Spring and RTI: 1M events per secondStuart (Pid) Williams
 
Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scaleOvais Tariq
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?DATAVERSITY
 
In Memory Databases: A Real Time Analytics Solution
In Memory Databases: A Real Time Analytics SolutionIn Memory Databases: A Real Time Analytics Solution
In Memory Databases: A Real Time Analytics SolutionAdaryl "Bob" Wakefield, MBA
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDBFoundationDB
 

Ähnlich wie Chirp 2010: Scaling Twitter (20)

John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Velocity2009
Fixing Twitter Velocity2009Fixing Twitter Velocity2009
Fixing Twitter Velocity2009
 
.Net Microservices with Event Sourcing, CQRS, Docker and... Windows Server 20...
.Net Microservices with Event Sourcing, CQRS, Docker and... Windows Server 20....Net Microservices with Event Sourcing, CQRS, Docker and... Windows Server 20...
.Net Microservices with Event Sourcing, CQRS, Docker and... Windows Server 20...
 
Dibi Conference 2012
Dibi Conference 2012Dibi Conference 2012
Dibi Conference 2012
 
Capacity Planning for fun & profit
Capacity Planning for fun & profitCapacity Planning for fun & profit
Capacity Planning for fun & profit
 
Using Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.comUsing Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.com
 
Hacklu2011 tricaud
Hacklu2011 tricaudHacklu2011 tricaud
Hacklu2011 tricaud
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 
Be faster then rabbits
Be faster then rabbitsBe faster then rabbits
Be faster then rabbits
 
Asynchronous design with Spring and RTI: 1M events per second
Asynchronous design with Spring and RTI: 1M events per secondAsynchronous design with Spring and RTI: 1M events per second
Asynchronous design with Spring and RTI: 1M events per second
 
Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scale
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
 
In Memory Databases: A Real Time Analytics Solution
In Memory Databases: A Real Time Analytics SolutionIn Memory Databases: A Real Time Analytics Solution
In Memory Databases: A Real Time Analytics Solution
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
 

Kürzlich hochgeladen

Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Kürzlich hochgeladen (20)

Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Chirp 2010: Scaling Twitter

  • 1.
  • 2. Billions of Hits: Scaling Twitter John Adams Twitter Operations
  • 4. John Adams @netik • Early Twitter employee (mid-2008) • Lead engineer: Outward Facing Services (Apache, Unicorn, SMTP), Auth, Security • Keynote Speaker: O’Reilly Velocity 2009 • O’Reilly Web 2.0 Speaker (2008, 2010) • Previous companies: Inktomi, Apple, c|net • Working on Web Operations book with John Alspaw (flickr, etsy), out in June
  • 6. 752% 2008 Growth source: comscore.com - (based only on www traffic, not API)
  • 7. 1358% 2009 Growth source: comscore.com - (based only on www traffic, not API)
  • 8. 12 th most popular source: alexa.com
  • 9. 55M Tweets per day (640 TPS/sec, 1000 TPS/sec peak) source: twitter.com internal
  • 10. 600M Searches/Day source: twitter.com internal
  • 11. 25% Web API 75%
  • 12. Operations • What do we do? • Site Availability • Capacity Planning (metrics-driven) • Configuration Management • Security • Much more than basic Sysadmin
  • 13. What have we done? • Improved response time, reduced latency • Less errors during deploys (Unicorn!) • Faster performance • Lower MTTD (Mean time to Detect) • Lower MTTR (Mean time to Recovery)
  • 14. Operations Mantra Move to Find Take Next Weakest Corrective Weakest Point Action Point Metrics + Logs + Science = Process Repeatability Analysis
  • 15. Make an attack plan. Symptom Bottleneck Vector Solution HTTP Bandwidth Network Servers++ Latency Timeline Update Better Database Delay Delay algorithm Status Flock Database Delays Growth Cassandra Updates Algorithm Latency Algorithms
  • 16. Finding Weakness • Metrics + Graphs • Individual metrics are irrelevant • We aggregate metrics to find knowledge • Logs • SCIENCE!
  • 17. Monitoring • Twitter graphs and reports critical metrics in as near real time as possible • If you build tools against our API, you should too. • RRD, other Time-Series DB solutions • Ganglia + custom gmetric scripts • dev.twitter.com - API availability
  • 18. Analyze • Turn data into information • Where is the code base going? • Are things worse than they were? • Understand the impact of the last software deploy • Run check scripts during and after deploys • Capacity Planning, not Fire Fighting!
  • 19. Data Analysis • Instrumenting the world pays off. • “Data analysis, visualization, and other techniques for seeing patterns in data are going to be an increasingly valuable skill set. Employers take notice!” “Web Squared: Web 2.0 Five Years On”, Tim O’Reilly, Web 2.0 Summit, 2009
  • 20. Forecasting Curve-fitting for capacity planning (R, fityk, Mathematica, CurveFit) unsigned int (32 bit) Twitpocolypse status_id signed int (32 bit) Twitpocolypse r2=0.99
  • 22. External API Dashbord http://dev.twitter.com/status
  • 23. What’s a Robot ? • Actual error in the Rails stack (HTTP 500) • Uncaught Exception • Code problem, or failure / nil result • Increases our exception count • Shows up in Reports
  • 24. What’s a Whale ? • HTTP Error 502, 503 • Twitter has a hard and fast five second timeout • We’d rather fail fast than block on requests • We also kill long-running queries (mkill) • Timeout
  • 25. Whale Watcher • Simple shell script, • MASSIVE WIN by @ronpepsi • Whale = HTTP 503 (timeout) • Robot = HTTP 500 (error) • Examines last 60 seconds of aggregated daemon / www logs • “Whales per Second” > Wthreshold • Thar be whales! Call in ops.
  • 26. Deploy Watcher Sample window: 300.0 seconds First start time: Mon Apr 5 15:30:00 2010 (Mon Apr 5 08:30:00 PDT 2010) Second start time: Tue Apr 6 02:09:40 2010 (Mon Apr 5 19:09:40 PDT 2010) PRODUCTION APACHE: ALL OK PRODUCTION OTHER: ALL OK WEB0049 CANARY APACHE: ALL OK WEB0049 CANARY BACKEND SERVICES: ALL OK DAEMON0031 CANARY BACKEND SERVICES: ALL OK DAEMON0031 CANARY OTHER: ALL OK
  • 27. Feature “Darkmode” • Specific site controls to enable and disable computationally or IO-Heavy site function • The “Emergency Stop” button • Changes logged and reported to all teams • Around 60 switches we can throw • Static / Read-only mode
  • 28. request flow Load Balancers Apache mod_proxy Rails (Unicorn) Flock memcached Kestrel MySQL Cassandra Daemons
  • 29. Servers • Co-located, dedicated machines at NTT America • No clouds; Only for monitoring, not serving • Need raw processing power, latency too high in existing cloud offerings • Frees us to deal with real, intellectual, computer science problems. • Moving to our own data center soon
  • 30. unicorn • A single socket Rails application Server (Rack) • Zero Downtime Deploys (!) • Controlled, shuffled transfer to new code • Less memory, 30% less CPU • Shift from mod_proxy_balancer to mod_proxy_pass • HAProxy, Ngnix wasn’t any better. really.
  • 31. Rails • Mostly only for front-end. • Back end mostly Scala and pure ruby • Not to blame for our issues. Analysis found: • Caching + Cache invalidation problems • Bad queries generated by ActiveRecord, resulting in slow queries against the db • Queue Latency • Replication Lag
  • 32. memcached • memcached isn’t perfect. • Memcached SEGVs hurt us early on. • Evictions make the cache unreliable for important configuration data (loss of darkmode flags, for example) • Network Memory Bus isn’t infinite • Segmented into pools for better performance
  • 33. Loony • Central machine database (MySQL) • Python, Django, Paraminko SSH • Paraminko - Twitter OSS (@robey) • Ties into LDAP groups • When data center sends us email, machine definitions built in real-time
  • 34. Murder • @lg rocks! • Bittorrent based replication for deploys • ~30-60 seconds to update >1k machines • P2P - Legal, valid, Awesome.
  • 35. Kestrel • @robey • Works like memcache (same protocol) • SET = enqueue | GET = dequeue • No strict ordering of jobs • No shared state between servers • Written in Scala.
  • 36. Asynchronous Requests • Inbound traffic consumes a unicorn worker • Outbound traffic consumes a unicorn worker • The request pipeline should not be used to handle 3rd party communications or back-end work. • Reroute traffic to daemons
  • 37. Daemons • Daemons touch every tweet • Many different daemon types at Twitter • Old way: One daemon per type (Rails) • New way: Fewer Daemons (Pure Ruby) • Daemon Slayer - A Multi Daemon that could do many different jobs, all at once.
  • 38. Disk is the new Tape. • Social Networking application profile has many O(ny) operations. • Page requests have to happen in < 500mS or users start to notice. Goal: 250-300mS • Web 2.0 isn’t possible without lots of RAM • SSDs? What to do?
  • 39. Caching • We’re the real-time web, but lots of caching opportunity. You should cache what you get from us. • Most caching strategies rely on long TTLs (>60 s) • Separate memcache pools for different data types to prevent eviction • Optimize Ruby Gem to libmemcached + FNV Hash instead of Ruby + MD5 • Twitter now largest contributor to libmemcached
  • 40. MySQL • Sharding large volumes of data is hard • Replication delay and cache eviction produce inconsistent results to the end user. • Locks create resource contention for popular data
  • 41. MySQL Challenges • Replication Delay • Single threaded. Slow. • Social Networking not good for RDBMS • N x N relationships and social graph / tree traversal • Disk issues (FS Choice, noatime, scheduling algorithm)
  • 42. Relational Databases not a Panacea • Good for: • Users, Relational Data, Transactions • Bad: • Queues. Polling operations. Social Graph. • You don’t need ACID for everything.
  • 43. Database Replication • Major issues around users and statuses tables • Multiple functional masters (FRP, FWP) • Make sure your code reads and writes to the write DBs. Reading from master = slow death • Monitor the DB. Find slow / poorly designed queries • Kill long running queries before they kill you (mkill)
  • 44. Flock Flock • Scalable Social Graph Store • Sharding via Gizzard Gizzard • MySQL backend (many.) • 13 billion edges, 100K reads/second Mysql Mysql Mysql • Open Source!
  • 45. Cassandra • Originally written by Facebook • Distributed Data Store • @rk’s changes to Cassandra Open Sourced • Currently double-writing into it • Transitioning to 100% soon.
  • 46. Lessons Learned • Instrument everything. Start graphing early. • Cache as much as possible • Start working on scaling early. • Don’t rely on memcache, and don’t rely on the database • Don’t use mongrel. Use Unicorn.
  • 48. Q&A
  • 49. Thanks! • @jointheflock • http://twitter.com/jobs • Download our work • http://twitter.com/about/opensource