SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Building a Highly
    Scalable, Open
Source Twitter Clone
 Dan Diephouse (dan@netzooid.com)
  Paul Brown (prb@mult.ifario.us)
Motivation
★   Wide (and growing) variety of
    non-relational databases.
    (viz. NoSQL — http://bit.ly/pLhqQ, http://bit.ly/17MmTk)


★   Twitter application model
    presents interesting
    challenges of scope and
    scale.
    (viz. “Fixing Twitter” http://bit.ly/2VmZdz)
Storage Metaphors
★   Key/Value Store
    Opaque values; fast and simple.
★   Examples:
    ★   Cassandra* — http://bit.ly/EdUEt
    ★   Dynomite — http://bit.ly/12AYmf
    ★   Redis — http://bit.ly/LBtCh
    ★   Tokyo Tyrant — http://bit.ly/oU4uV
    ★   Voldemort – http://bit.ly/oU4uV
Key/Value
Key Value

1




2




3
Storage Metaphors
★   Document-Oriented
    Unstructured content; rich queries.
★   Examples:
    ★   CouchDB — http://bit.ly/JAgUM
    ★   MongoDB — http://bit.ly/HDDOV
    ★   SOLR — http://bit.ly/q4gyi
    ★   XML databases...
Document-Oriented
ID=“dan-tweet-1”,
TEXT=“hello world”

ID=dan-tweet-2,
TEXT=“Twirp!”,
IN-REPLY-TO=“paul-tweet-5”
Storage Metaphors
★   Column-Oriented

    Organized in columns; easily scanned.
★   Examples:
    ★   Cassandra* — http://bit.ly/EdUEt

    ★   BigTable — http://bit.ly/QqMYA
        (available within AppEngine)


    ★   HBase — http://bit.ly/Zck7F
    ★   SimpleDB — http://bit.ly/toh0P
        (Typica library for Java — http://bit.ly/22kxZ4)
Column-Oriented
    Name          Date                  Tweet Text
    Bob           20090506              Eating dinner.
    Dan           20090507              Is it Friday yet?
    Dan           20090506              Beer me!
    Ralph         20090508              My bum itches.



Index   Name         Index   Date         Index   Tweet Text
0       Bob          0       20090506     0       Eating dinner.
1       Dan          1       20090507     1       Is it Friday yet?
2       Dan          2       20090506     2       Beer me!
3       Ralph        3       20090508     3       My bum itches.

        Storage          Storage                  Storage
Every Store is Special.

★   Lots of different little tweaks
    to the storage model.
★   Widely varying levels of
    maturity.
★   Growing communities.
★   Limited (but growing) tooling,
    libraries, and production
    adoption.
Reliability Through
        Replication
★   Consistent hashing to assign
    keys to partitions.
★   Partitions replicated on
    multiple nodes for
    redundancy.
★   Minimum number of successful
    reads to consider a write
    complete.
Reliability Through
         Replication
    PUT (k,v)

Client
Web UI
http://tat1.datapr0n.com:8080
Stores
★   Tweets
    Individual tweets.

★   Friends’ Timeline
    Fixed-length timelines.

★   Users
    Info and followers.

★   Command Queue
    Actions to perform (tweet, follow, etc.).
Data
★   Command (Java serialization)
    Keyed by node name, increasing ID.
★   Tweets (Java serialization)
    Keyed by user name, increasing ID.
★   FriendsTimeline (Java serialization)
    Keyed by username.
    List of date, tweet ID.
★   Users (Java serialization)
    Keyed by username.
    Followers (list), Followed (list), last tweet ID.
Life of a Tweet, Part I
                 1

                     Beer me.                    Users
1.User tweets.                             2


2.Find next
  tweet ID for                             3   Commands




                                Web Tier
  user.
3.Store “tweet                                  Friends
                                                Timeline

  for user”
  command.
                                                Tweets
Life of a Tweet, Part II
                         Where's
1. Read next command.   Demi with
                                                          Users
                        my beer?!?
2. Store tweet in
   user’s timeline                              1
   (Tweets).                                            Commands

                                                    4




                                     Web Tier
3. Store tweet ID in
   friends’
                                                    3
   timelines.                                            Friends
                                                         Timeline
   (Requires *many*
   operations.)
                                                    2

4. DELETE command.                                       Tweets
Some Patterns
★   “Sequences” are implemented
    as race-for-non-collision.
★   “Joins” are common keys or
    keys referenced from values.
★   “Transactions” are idempotent
    operations with DELETE at the
    end.
Operations
★   Deploy to Amazon EC2
    ★   2 nodes for Voldemort
    ★   2 nodes for Tomcat
    ★   1 node for Cacti
★   All “small” instances w/RightScale CentOS
    5.2 image.
★   Minor inconvenience of “EBS” volume for
    MySQL for Cacti.
    (follow Eric Hammond’s tutorial — http://bit.ly/OK5LZ)
Deployment
★   Lots of choices for automated rollout
    (Chef, Capistrano, etc.)
★   Took simplest path — Maven build, Ant
    (scp/ssh and property substitution
    tasks), and bash scripts.
    for i in vn1 vn2; do

      ant -Dnode=${i} setup-v-node

    done

★   Takes ~30 seconds to provision a Tomcat
    or Voldemort node.
Dashboarding
★   As above, lots of choices
    (Cacti — http://bit.ly/qV4gz, Graphite — http://bit.ly/466NAx, etc.)


★   Cacti as simplest choice.
    yum install -y cacti

★   Vanilla SNMP on nodes for host
    data.
★   Minimal extensions to Voldemort
    for stats in Cacti-friendly
    format.
Dashboarding
Performance
★   270 req/sec for getFriendsTimeline against
    web tier.
    ★   21 GETs on V stores to pull data.
    ★   5600 req/sec for V is similar to
        performance reported at NoSQL meetup (20k
        req/sec) when adjusted for hardware.
    ★   Cache on the web tier could make this
        faster...
★   Some hassles when hammering individual keys
    with rapid updates.
Take Aways
★   Linked-list representation deserves some thought
    (and experiments).
    Dynomite + Osmos (http://bit.ly/BYMdW)

★   Additional use cases (search, rich API, replies,
    direct messages, etc.) might alter design.
★   BigTable/HBase approach deserves another look.
★   Source code is available; come and git it.

    http://github.com/prb/bigbird

    git://github.com/prb/bigbird.git
Coordinates
★   Dan Diephouse (@dandiep)
    dan@netzooid.com
    http://netzooid.com
★   Paul Brown (@paulrbrown)
    prb@mult.ifario.us
    http://mult.ifario.us/a

Weitere ähnliche Inhalte

Was ist angesagt?

Apache Superset at Airbnb
Apache Superset at AirbnbApache Superset at Airbnb
Apache Superset at AirbnbBill Liu
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBMike Dirolf
 
REST APIs with Spring
REST APIs with SpringREST APIs with Spring
REST APIs with SpringJoshua Long
 
The career path of software engineers and how to navigate it
The career path of software engineers and how to navigate itThe career path of software engineers and how to navigate it
The career path of software engineers and how to navigate itNikolay Stoitsev
 
Entity Framework Core
Entity Framework CoreEntity Framework Core
Entity Framework CoreKiran Shahi
 
Introduction to Django
Introduction to DjangoIntroduction to Django
Introduction to DjangoKnoldus Inc.
 
서버 아키텍처 이해를 위한 프로세스와 쓰레드
서버 아키텍처 이해를 위한 프로세스와 쓰레드서버 아키텍처 이해를 위한 프로세스와 쓰레드
서버 아키텍처 이해를 위한 프로세스와 쓰레드KwangSeob Jeong
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...Databricks
 
SecDevOps - The Operationalisation of Security
SecDevOps -  The Operationalisation of SecuritySecDevOps -  The Operationalisation of Security
SecDevOps - The Operationalisation of SecurityDinis Cruz
 
Importance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowImportance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowDatabricks
 
Introduction to Spring Boot
Introduction to Spring BootIntroduction to Spring Boot
Introduction to Spring BootTrey Howard
 
Mongo DB schema design patterns
Mongo DB schema design patternsMongo DB schema design patterns
Mongo DB schema design patternsjoergreichert
 
Managing user's data with Spring Session
Managing user's data with Spring SessionManaging user's data with Spring Session
Managing user's data with Spring SessionDavid Gómez García
 
RESTful services
RESTful servicesRESTful services
RESTful servicesgouthamrv
 
코딩 테스트 및 알고리즘 문제해결 공부 방법 (고려대학교 KUCC, 2022년 4월)
코딩 테스트 및 알고리즘 문제해결 공부 방법 (고려대학교 KUCC, 2022년 4월)코딩 테스트 및 알고리즘 문제해결 공부 방법 (고려대학교 KUCC, 2022년 4월)
코딩 테스트 및 알고리즘 문제해결 공부 방법 (고려대학교 KUCC, 2022년 4월)Suhyun Park
 
Nginx Internals
Nginx InternalsNginx Internals
Nginx InternalsJoshua Zhu
 

Was ist angesagt? (20)

Apache Superset at Airbnb
Apache Superset at AirbnbApache Superset at Airbnb
Apache Superset at Airbnb
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
REST APIs with Spring
REST APIs with SpringREST APIs with Spring
REST APIs with Spring
 
The career path of software engineers and how to navigate it
The career path of software engineers and how to navigate itThe career path of software engineers and how to navigate it
The career path of software engineers and how to navigate it
 
MVC Framework
MVC FrameworkMVC Framework
MVC Framework
 
Entity Framework Core
Entity Framework CoreEntity Framework Core
Entity Framework Core
 
Introduction to Django
Introduction to DjangoIntroduction to Django
Introduction to Django
 
Arquitetura Limpa em .NET Core
Arquitetura Limpa em .NET CoreArquitetura Limpa em .NET Core
Arquitetura Limpa em .NET Core
 
서버 아키텍처 이해를 위한 프로세스와 쓰레드
서버 아키텍처 이해를 위한 프로세스와 쓰레드서버 아키텍처 이해를 위한 프로세스와 쓰레드
서버 아키텍처 이해를 위한 프로세스와 쓰레드
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
 
SecDevOps - The Operationalisation of Security
SecDevOps -  The Operationalisation of SecuritySecDevOps -  The Operationalisation of Security
SecDevOps - The Operationalisation of Security
 
Importance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowImportance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLow
 
Introduction to Spring Boot
Introduction to Spring BootIntroduction to Spring Boot
Introduction to Spring Boot
 
JSON WEB TOKEN
JSON WEB TOKENJSON WEB TOKEN
JSON WEB TOKEN
 
Mongo DB schema design patterns
Mongo DB schema design patternsMongo DB schema design patterns
Mongo DB schema design patterns
 
Managing user's data with Spring Session
Managing user's data with Spring SessionManaging user's data with Spring Session
Managing user's data with Spring Session
 
RESTful services
RESTful servicesRESTful services
RESTful services
 
Introduction to mongodb
Introduction to mongodbIntroduction to mongodb
Introduction to mongodb
 
코딩 테스트 및 알고리즘 문제해결 공부 방법 (고려대학교 KUCC, 2022년 4월)
코딩 테스트 및 알고리즘 문제해결 공부 방법 (고려대학교 KUCC, 2022년 4월)코딩 테스트 및 알고리즘 문제해결 공부 방법 (고려대학교 KUCC, 2022년 4월)
코딩 테스트 및 알고리즘 문제해결 공부 방법 (고려대학교 KUCC, 2022년 4월)
 
Nginx Internals
Nginx InternalsNginx Internals
Nginx Internals
 

Ähnlich wie Building a Highly Scalable, Open Source Twitter Clone

Modeling Tricks My Relational Database Never Taught Me
Modeling Tricks My Relational Database Never Taught MeModeling Tricks My Relational Database Never Taught Me
Modeling Tricks My Relational Database Never Taught MeDavid Boike
 
Docker interview Questions-3.pdf
Docker interview Questions-3.pdfDocker interview Questions-3.pdf
Docker interview Questions-3.pdfYogeshwaran R
 
DevoxxFR 2016 - 3 degrees of MoM
DevoxxFR 2016 - 3 degrees of MoMDevoxxFR 2016 - 3 degrees of MoM
DevoxxFR 2016 - 3 degrees of MoMGuillaume Arnaud
 
Advanced WCF Workshop
Advanced WCF WorkshopAdvanced WCF Workshop
Advanced WCF WorkshopIdo Flatow
 
Apache Wizardry - Ohio Linux 2011
Apache Wizardry - Ohio Linux 2011Apache Wizardry - Ohio Linux 2011
Apache Wizardry - Ohio Linux 2011Rich Bowen
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage SystemsSATOSHI TAGOMORI
 
Learning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormEugene Dvorkin
 
Grand Central Dispatch
Grand Central DispatchGrand Central Dispatch
Grand Central DispatchRobert Brown
 
Real world cloud formation feb 2014 final
Real world cloud formation feb 2014 finalReal world cloud formation feb 2014 final
Real world cloud formation feb 2014 finalHoward Glynn
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterLucidworks
 
OSCON 2011 - Node.js Tutorial
OSCON 2011 - Node.js TutorialOSCON 2011 - Node.js Tutorial
OSCON 2011 - Node.js TutorialTom Croucher
 
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytes
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytesWindows Kernel Exploitation : This Time Font hunt you down in 4 bytes
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytesPeter Hlavaty
 
Voltdb: Shard It by V. Torshyn
Voltdb: Shard It by V. TorshynVoltdb: Shard It by V. Torshyn
Voltdb: Shard It by V. Torshynvtors
 
Celery: The Distributed Task Queue
Celery: The Distributed Task QueueCelery: The Distributed Task Queue
Celery: The Distributed Task QueueRichard Leland
 
Scaling Rails With Torquebox Presented at JUDCon:2011 Boston
Scaling Rails With Torquebox Presented at JUDCon:2011 BostonScaling Rails With Torquebox Presented at JUDCon:2011 Boston
Scaling Rails With Torquebox Presented at JUDCon:2011 Bostonbenbrowning
 
DCSF19 Containers for Beginners
DCSF19 Containers for BeginnersDCSF19 Containers for Beginners
DCSF19 Containers for BeginnersDocker, Inc.
 
Post Metasploitation
Post MetasploitationPost Metasploitation
Post Metasploitationegypt
 

Ähnlich wie Building a Highly Scalable, Open Source Twitter Clone (20)

Modeling Tricks My Relational Database Never Taught Me
Modeling Tricks My Relational Database Never Taught MeModeling Tricks My Relational Database Never Taught Me
Modeling Tricks My Relational Database Never Taught Me
 
IP Multicast on ec2
IP Multicast on ec2IP Multicast on ec2
IP Multicast on ec2
 
Docker interview Questions-3.pdf
Docker interview Questions-3.pdfDocker interview Questions-3.pdf
Docker interview Questions-3.pdf
 
DevoxxFR 2016 - 3 degrees of MoM
DevoxxFR 2016 - 3 degrees of MoMDevoxxFR 2016 - 3 degrees of MoM
DevoxxFR 2016 - 3 degrees of MoM
 
Advanced WCF Workshop
Advanced WCF WorkshopAdvanced WCF Workshop
Advanced WCF Workshop
 
Apache Wizardry - Ohio Linux 2011
Apache Wizardry - Ohio Linux 2011Apache Wizardry - Ohio Linux 2011
Apache Wizardry - Ohio Linux 2011
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
spdy
spdyspdy
spdy
 
Learning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache Storm
 
Grand Central Dispatch
Grand Central DispatchGrand Central Dispatch
Grand Central Dispatch
 
Real world cloud formation feb 2014 final
Real world cloud formation feb 2014 finalReal world cloud formation feb 2014 final
Real world cloud formation feb 2014 final
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, Twitter
 
OSCON 2011 - Node.js Tutorial
OSCON 2011 - Node.js TutorialOSCON 2011 - Node.js Tutorial
OSCON 2011 - Node.js Tutorial
 
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytes
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytesWindows Kernel Exploitation : This Time Font hunt you down in 4 bytes
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytes
 
Voltdb: Shard It by V. Torshyn
Voltdb: Shard It by V. TorshynVoltdb: Shard It by V. Torshyn
Voltdb: Shard It by V. Torshyn
 
Celery: The Distributed Task Queue
Celery: The Distributed Task QueueCelery: The Distributed Task Queue
Celery: The Distributed Task Queue
 
Scaling Rails With Torquebox Presented at JUDCon:2011 Boston
Scaling Rails With Torquebox Presented at JUDCon:2011 BostonScaling Rails With Torquebox Presented at JUDCon:2011 Boston
Scaling Rails With Torquebox Presented at JUDCon:2011 Boston
 
Demystfying container-networking
Demystfying container-networkingDemystfying container-networking
Demystfying container-networking
 
DCSF19 Containers for Beginners
DCSF19 Containers for BeginnersDCSF19 Containers for Beginners
DCSF19 Containers for Beginners
 
Post Metasploitation
Post MetasploitationPost Metasploitation
Post Metasploitation
 

Kürzlich hochgeladen

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

Building a Highly Scalable, Open Source Twitter Clone

  • 1. Building a Highly Scalable, Open Source Twitter Clone Dan Diephouse (dan@netzooid.com) Paul Brown (prb@mult.ifario.us)
  • 2. Motivation ★ Wide (and growing) variety of non-relational databases. (viz. NoSQL — http://bit.ly/pLhqQ, http://bit.ly/17MmTk) ★ Twitter application model presents interesting challenges of scope and scale. (viz. “Fixing Twitter” http://bit.ly/2VmZdz)
  • 3. Storage Metaphors ★ Key/Value Store Opaque values; fast and simple. ★ Examples: ★ Cassandra* — http://bit.ly/EdUEt ★ Dynomite — http://bit.ly/12AYmf ★ Redis — http://bit.ly/LBtCh ★ Tokyo Tyrant — http://bit.ly/oU4uV ★ Voldemort – http://bit.ly/oU4uV
  • 5. Storage Metaphors ★ Document-Oriented Unstructured content; rich queries. ★ Examples: ★ CouchDB — http://bit.ly/JAgUM ★ MongoDB — http://bit.ly/HDDOV ★ SOLR — http://bit.ly/q4gyi ★ XML databases...
  • 7. Storage Metaphors ★ Column-Oriented Organized in columns; easily scanned. ★ Examples: ★ Cassandra* — http://bit.ly/EdUEt ★ BigTable — http://bit.ly/QqMYA (available within AppEngine) ★ HBase — http://bit.ly/Zck7F ★ SimpleDB — http://bit.ly/toh0P (Typica library for Java — http://bit.ly/22kxZ4)
  • 8. Column-Oriented Name Date Tweet Text Bob 20090506 Eating dinner. Dan 20090507 Is it Friday yet? Dan 20090506 Beer me! Ralph 20090508 My bum itches. Index Name Index Date Index Tweet Text 0 Bob 0 20090506 0 Eating dinner. 1 Dan 1 20090507 1 Is it Friday yet? 2 Dan 2 20090506 2 Beer me! 3 Ralph 3 20090508 3 My bum itches. Storage Storage Storage
  • 9. Every Store is Special. ★ Lots of different little tweaks to the storage model. ★ Widely varying levels of maturity. ★ Growing communities. ★ Limited (but growing) tooling, libraries, and production adoption.
  • 10. Reliability Through Replication ★ Consistent hashing to assign keys to partitions. ★ Partitions replicated on multiple nodes for redundancy. ★ Minimum number of successful reads to consider a write complete.
  • 11. Reliability Through Replication PUT (k,v) Client
  • 13. Stores ★ Tweets Individual tweets. ★ Friends’ Timeline Fixed-length timelines. ★ Users Info and followers. ★ Command Queue Actions to perform (tweet, follow, etc.).
  • 14. Data ★ Command (Java serialization) Keyed by node name, increasing ID. ★ Tweets (Java serialization) Keyed by user name, increasing ID. ★ FriendsTimeline (Java serialization) Keyed by username. List of date, tweet ID. ★ Users (Java serialization) Keyed by username. Followers (list), Followed (list), last tweet ID.
  • 15. Life of a Tweet, Part I 1 Beer me. Users 1.User tweets. 2 2.Find next tweet ID for 3 Commands Web Tier user. 3.Store “tweet Friends Timeline for user” command. Tweets
  • 16. Life of a Tweet, Part II Where's 1. Read next command. Demi with Users my beer?!? 2. Store tweet in user’s timeline 1 (Tweets). Commands 4 Web Tier 3. Store tweet ID in friends’ 3 timelines. Friends Timeline (Requires *many* operations.) 2 4. DELETE command. Tweets
  • 17. Some Patterns ★ “Sequences” are implemented as race-for-non-collision. ★ “Joins” are common keys or keys referenced from values. ★ “Transactions” are idempotent operations with DELETE at the end.
  • 18. Operations ★ Deploy to Amazon EC2 ★ 2 nodes for Voldemort ★ 2 nodes for Tomcat ★ 1 node for Cacti ★ All “small” instances w/RightScale CentOS 5.2 image. ★ Minor inconvenience of “EBS” volume for MySQL for Cacti. (follow Eric Hammond’s tutorial — http://bit.ly/OK5LZ)
  • 19. Deployment ★ Lots of choices for automated rollout (Chef, Capistrano, etc.) ★ Took simplest path — Maven build, Ant (scp/ssh and property substitution tasks), and bash scripts. for i in vn1 vn2; do ant -Dnode=${i} setup-v-node done ★ Takes ~30 seconds to provision a Tomcat or Voldemort node.
  • 20. Dashboarding ★ As above, lots of choices (Cacti — http://bit.ly/qV4gz, Graphite — http://bit.ly/466NAx, etc.) ★ Cacti as simplest choice. yum install -y cacti ★ Vanilla SNMP on nodes for host data. ★ Minimal extensions to Voldemort for stats in Cacti-friendly format.
  • 22. Performance ★ 270 req/sec for getFriendsTimeline against web tier. ★ 21 GETs on V stores to pull data. ★ 5600 req/sec for V is similar to performance reported at NoSQL meetup (20k req/sec) when adjusted for hardware. ★ Cache on the web tier could make this faster... ★ Some hassles when hammering individual keys with rapid updates.
  • 23. Take Aways ★ Linked-list representation deserves some thought (and experiments). Dynomite + Osmos (http://bit.ly/BYMdW) ★ Additional use cases (search, rich API, replies, direct messages, etc.) might alter design. ★ BigTable/HBase approach deserves another look. ★ Source code is available; come and git it. http://github.com/prb/bigbird git://github.com/prb/bigbird.git
  • 24. Coordinates ★ Dan Diephouse (@dandiep) dan@netzooid.com http://netzooid.com ★ Paul Brown (@paulrbrown) prb@mult.ifario.us http://mult.ifario.us/a