SlideShare ist ein Scribd-Unternehmen logo
1 von 71
Downloaden Sie, um offline zu lesen
Big Data in Real-Time
at Twitter
by Nick Kallen (@nk)
Follow along
http://www.slideshare.net/nkallen/qcon
What is Real-Time Data?
• On-line queries for a single web request
• Off-line computations with very low latency
• Latency and throughput are equally important
• Not talking about Hadoop and other high-latency,
Big Data tools
The four data problems
• Tweets
• Timelines
• Social graphs
• Search indices
What is a Tweet?
• 140 character message, plus some metadata
• Query patterns:
  • by id
  • by author
  • (also @replies, but not discussed here)
• Row Storage
Find by primary key: 4376167936
Find all by user_id: 749863
Original Implementation
id      user_id                  text                       created_at

20        12          just setting up my twttr          2006-03-21 20:50:14

29        12             inviting coworkers             2006-03-21 21:02:56

34        16      Oh shit, I just twittered a little.   2006-03-21 21:08:09



     • Relational
     • Single table, vertically scaled
     • Master-Slave replication and Memcached for
     read throughput.
Original Implementation
 Master-Slave Replication   Memcached for reads
Problems w/ solution
• Disk space: did not want to support disk arrays larger
than 800GB
• At 2,954,291,678 tweets, disk was over 90% utilized.
PARTITION
Possible implementations
           Partition by primary key
        Partition 1             Partition 2

   id         user_id      id         user_id

   20            ...      21             ...

   22            ...      23             ...

   24            ...      25             ...
Possible implementations
           Partition by primary key
        Partition 1               Partition 2

   id         user_id        id         user_id

   20            ...         21            ...

   22            ...         23            ...

   24            ...    Finding recent tweets
                              25     ...

                        by user_id queries N
                        partitions
Possible implementations
                Partition by user id
         Partition 1               Partition 2

   id          user_id        id         user_id

   ...             1          21             2

   ...             1          23             2

   ...             3          25             2
Possible implementations
                Partition by user id
         Partition 1               Partition 2

   id          user_id        id         user_id

   ...             1          21             2

   ...             1          23             2

   ...             3          25             2

                         Finding a tweet by id
                         queries N partitions
Current Implementation
            Partition by time
                   id    user_id
                   24      ...
     Partition 2
                   23      ...

                   id    user_id
     Partition 1   22      ...
                   21      ...
Current Implementation
                                   Queries try each
             Partition by time partition in order
                    id      user_iduntil enough data
     Partition 2
                   24          ...
                                    is accumulated
                   23     ...

                   id   user_id
     Partition 1   22     ...
                   21     ...
LOCALITY
Low Latency
                                PK Lookup

        Memcached                  1ms
          MySQL                 <10ms*




   * Depends on the number of partitions searched
Principles
• Partition and index
• Exploit locality (in this case, temporal locality)
  • New tweets are requested most frequently, so
usually only 1 partition is checked
Problems w/ solution
• Write throughput
• Have encountered deadlocks in MySQL at crazy
tweet velocity
• Creating a new temporal shard is a manual process
and takes too long; it involves setting up a parallel
replication hierarchy. Our DBA hates us
Future solution
         Partition k1               Partition k2
    id          user_id        id          user_id
    20             ...         21             ...
    22             ...         23             ...
         Partition u1               Partition u2
  user_id         ids        user_id         ids
    12         20, 21, ...     13         48, 27, ...

    14         25, 32, ...     15         23, 51, ...


   • Cassandra (non-relational)
   • Primary Key partitioning
   • Manual secondary index on user_id
   • Memcached for 90+% of reads
The four data problems
• Tweets
• Timelines
• Social graphs
• Search indices
What is a Timeline?
• Sequence of tweet ids
• Query pattern: get by user_id
• Operations:
  • append
  • merge
  • truncate
• High-velocity bounded vector
• Space-based (in-place mutation)
Tweets from 3
different people
Original Implementation
    SELECT * FROM tweets
    WHERE user_id IN
      (SELECT source_id
       FROM followers
       WHERE destination_id = ?)
    ORDER BY created_at DESC
    LIMIT 20
Original Implementation
    SELECT * FROM tweets
    WHERE user_id IN
      (SELECT source_id
       FROM followers
       WHERE destination_id = ?)
    ORDER BY created_at DESC
    LIMIT 20

          Crazy slow if you have lots
         of friends or indices can’t be
                  kept in RAM
OFF-LINE VS.
ONLINE
COMPUTATION
Current Implementation



• Sequences stored in Memcached
• Fanout off-line, but has a low latency SLA
• Truncate at random intervals to ensure bounded
length
• On cache miss, merge user timelines
Throughput Statistics

   date     average tps   peak tps   fanout ratio    deliveries


10/7/2008      30          120        175:1          21,000

4/15/2010     700         2,000       600:1         1,200,000
1.2m
Deliveries per second
MEMORY
HIERARCHY
Possible implementations
• Fanout to disk
  • Ridonculous number of IOPS required, even with
fancy buffering techniques
  • Cost of rebuilding data from other durable stores not
too expensive
• Fanout to memory
 • Good if cardinality of corpus * bytes/datum not too
many GB
Low Latency

          get            append          fanout



         1ms             1ms             <1s*



   * Depends on the number of followers of the tweeter
Principles
• Off-line vs. Online computation
  • The answer to some problems can be pre-computed
if the amount of work is bounded and the query
pattern is very limited
• Keep the memory hierarchy in mind
• The efficiency of a system includes the cost of
generating data from another source (such as a
backup) times the probability of needing to
The four data problems
• Tweets
• Timelines
• Social graphs
• Search indices
What is a Social Graph?
• List of who follows whom, who blocks whom, etc.
• Operations:
  • Enumerate by time
  • Intersection, Union, Difference
  • Inclusion
  • Cardinality
  • Mass-deletes for spam
• Medium-velocity unbounded vectors
• Complex, predetermined queries
Temporal enumeration
Inclusion
Temporal enumeration
Inclusion
Temporal enumeration




                           Cardinality
Intersection: Deliver to people
who follow both @aplusk and
        @foursquare
Original Implementation
      source_id                     destination_id

         20                              12

         29                              12

         34                              16



• Single table, vertically scaled
• Master-Slave replication
Index

Original Implementation
      source_id                     destination_id

         20                              12

         29                              12

         34                              16



• Single table, vertically scaled
• Master-Slave replication
Index                             Index

Original Implementation
      source_id                     destination_id

         20                              12

         29                              12

         34                              16



• Single table, vertically scaled
• Master-Slave replication
Problems w/ solution
• Write throughput
• Indices couldn’t be kept in RAM
Current solution
               Forward                                     Backward
source_id destination_id   updated_at   x   destination_id source_id   updated_at   x
   20           12          20:50:14    x        12           20        20:50:14    x
   20           13          20:51:32             12           32        20:51:32
   20           16                               12           16



        • Partitioned by user id
        • Edges stored in “forward” and “backward” directions
        • Indexed by time
        • Indexed by element (for set algebra)
        • Denormalized cardinality
Current solution
               Forward                                     Backward
source_id destination_id   updated_at   x   destination_id source_id   updated_at   x
   20           12          20:50:14    x        12           20        20:50:14    x
   20           13          20:51:32             12           32        20:51:32
   20           16                               12           16



        • Partitioned by user id
                   Partitioned by user
        • Edges stored in “forward” and “backward” directions
        • Indexed by time
        • Indexed by element (for set algebra)
        • Denormalized cardinality
Edges stored in both directions

          Current solution
               Forward                                     Backward
source_id destination_id   updated_at   x   destination_id source_id   updated_at   x
   20           12          20:50:14    x        12           20        20:50:14    x
   20           13          20:51:32             12           32        20:51:32
   20           16                               12           16



        • Partitioned by user id
                   Partitioned by user
        • Edges stored in “forward” and “backward” directions
        • Indexed by time
        • Indexed by element (for set algebra)
        • Denormalized cardinality
Challenges
• Data consistency in the presence of failures
• Write operations are idempotent: retry until success
• Last-Write Wins for edges
  • (with an ordering relation on State for time
conflicts)
• Other commutative strategies for mass-writes
Low Latency

                                                   write
cardinality      iteration          write ack                  inclusion
                                                 materialize


  1ms         100edges/ms*           1ms          16ms          1ms



                             * 2ms lower bound
Principles
• It is not possible to pre-compute set algebra queries
• Simple distributed coordination techniques work
• Partition, replicate, index. Many efficiency and
scalability problems are solved the same way
The four data problems
• Tweets
• Timelines
• Social graphs
• Search indices
What is a Search Index?
   • “Find me all tweets with these words in it...”
   • Posting list
   • Boolean and/or queries
   • Complex, ad hoc queries
   • Relevance is recency*



* Note: there is a non-real-time component to search, but it is not discussed here
Intersection of
three posting lists
Original Implementation
      term_id                      doc_id

        20                          12

        20                          86

        34                          16



• Single table, vertically scaled
• Master-Slave replication for read throughput
Problems w/ solution
• Index could not be kept in memory
Current Implementation
                   term_id   doc_id
                     24        ...
     Partition 2
                     23        ...

                   term_id   doc_id
     Partition 1     22        ...
                     21        ...


       • Partitioned by time
       • Uses MySQL
       • Uses delayed key-write
Problems w/ solution
• Write throughput
• Queries for rare terms need to search many
partitions
• Space efficiency/recall
  • MySQL requires lots of memory
DATA NEAR
COMPUTATION
Future solution




   • Document partitioning
   • Time partitioning too
   • Merge layer
   • May use Lucene instead of MySQL
Principles
• Partition so that work can be parallelized
• Temporal locality is not always enough
The four data problems
• Tweets
• Timelines
• Social graphs
• Search indices
Summary Statistics
                           writes/
            reads/second                 cardinality   bytes/item   durability
                           second

 Tweets       100k          850            12b          300b        durable

Timelines      80k         1.2m            a lot        3.2k         non

 Graphs       100k          20k            13b           110        durable

 Search        13k         21k†          315m‡            1k        durable


                       † tps * 25 postings
                       ‡ 75 partitions * 4.2m tweets
Principles
• All engineering solutions are transient
• Nothing’s perfect but some solutions are good enough
for a while
• Scalability solutions aren’t magic. They involve
partitioning, indexing, and replication
• All data for real-time queries MUST be in memory.
Disk is for writes only.
• Some problems can be solved with pre-computation,
but a lot can’t
• Exploit locality where possible
Appendix

Weitere ähnliche Inhalte

Andere mochten auch

Andere mochten auch (19)

P2
P2P2
P2
 
Empenho
EmpenhoEmpenho
Empenho
 
Programacarnaval (1)
Programacarnaval (1)Programacarnaval (1)
Programacarnaval (1)
 
312435221388975647331243500000000000000000000000000000000000000000000
312435221388975647331243500000000000000000000000000000000000000000000312435221388975647331243500000000000000000000000000000000000000000000
312435221388975647331243500000000000000000000000000000000000000000000
 
Kate mather fmp schedule 14.01.13
Kate mather fmp schedule 14.01.13Kate mather fmp schedule 14.01.13
Kate mather fmp schedule 14.01.13
 
Market Watch December 2012
Market Watch December 2012Market Watch December 2012
Market Watch December 2012
 
Laplanche e pontalis vocabulário de psicanálise
Laplanche e pontalis   vocabulário de psicanáliseLaplanche e pontalis   vocabulário de psicanálise
Laplanche e pontalis vocabulário de psicanálise
 
Sustentabilidade e gestão urbana
Sustentabilidade e gestão urbanaSustentabilidade e gestão urbana
Sustentabilidade e gestão urbana
 
De mae para_filhos
De mae para_filhosDe mae para_filhos
De mae para_filhos
 
showroom thiet bi nha bep cao cap Bepthaison.vn
showroom thiet bi nha bep cao cap Bepthaison.vnshowroom thiet bi nha bep cao cap Bepthaison.vn
showroom thiet bi nha bep cao cap Bepthaison.vn
 
Test preso
Test presoTest preso
Test preso
 
Test
TestTest
Test
 
Imagenes de sam valentin
Imagenes  de  sam  valentinImagenes  de  sam  valentin
Imagenes de sam valentin
 
dan laybourn
dan laybourndan laybourn
dan laybourn
 
Acertijo NavideñO Adc Ell
Acertijo NavideñO Adc EllAcertijo NavideñO Adc Ell
Acertijo NavideñO Adc Ell
 
Saúde coletiva
Saúde coletivaSaúde coletiva
Saúde coletiva
 
Equação modular
Equação modularEquação modular
Equação modular
 
CV_Alen_Basa
CV_Alen_BasaCV_Alen_Basa
CV_Alen_Basa
 
Os games antigos
Os games antigosOs games antigos
Os games antigos
 

Ähnlich wie Nick twitter

Big Data in Real-Time at Twitter
Big Data in Real-Time at TwitterBig Data in Real-Time at Twitter
Big Data in Real-Time at Twitternkallen
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterAttila Szegedi
 
Rainbow Over the Windows: More Colors Than You Could Expect
Rainbow Over the Windows: More Colors Than You Could ExpectRainbow Over the Windows: More Colors Than You Could Expect
Rainbow Over the Windows: More Colors Than You Could ExpectPeter Hlavaty
 
Relational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the CloudRelational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the CloudHossein Riasati
 
Replication and Replica Sets
Replication and Replica SetsReplication and Replica Sets
Replication and Replica SetsMongoDB
 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systemsDave Gardner
 
Infrastructure for cloud_computing
Infrastructure for cloud_computingInfrastructure for cloud_computing
Infrastructure for cloud_computingJULIO GONZALEZ SANZ
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
 
Automatically Defined Functions for Learning Classifier Systems
Automatically Defined Functions for Learning Classifier SystemsAutomatically Defined Functions for Learning Classifier Systems
Automatically Defined Functions for Learning Classifier SystemsDaniele Loiacono
 
Reversing banking trojan: an in-depth look into Gataka
Reversing banking trojan: an in-depth look into GatakaReversing banking trojan: an in-depth look into Gataka
Reversing banking trojan: an in-depth look into Gatakajiboutin
 
Boutin reversing banking trojan. an in-depth look into gataka
Boutin   reversing banking trojan. an in-depth look into gatakaBoutin   reversing banking trojan. an in-depth look into gataka
Boutin reversing banking trojan. an in-depth look into gatakaDefconRussia
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...DataStax Academy
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsJulien Anguenot
 
Development with Qt for Windows CE
Development with Qt for Windows CEDevelopment with Qt for Windows CE
Development with Qt for Windows CEaccount inactive
 
Performance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsPerformance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsMichael Kopp
 
Cassandra under the hood
Cassandra under the hoodCassandra under the hood
Cassandra under the hoodAndriy Rymar
 
Key-value databases in practice Redis @ DotNetToscana
Key-value databases in practice Redis @ DotNetToscanaKey-value databases in practice Redis @ DotNetToscana
Key-value databases in practice Redis @ DotNetToscanaMatteo Baglini
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreducehansen3032
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 

Ähnlich wie Nick twitter (20)

Big Data in Real-Time at Twitter
Big Data in Real-Time at TwitterBig Data in Real-Time at Twitter
Big Data in Real-Time at Twitter
 
VAST-Tree, EDBT'12
VAST-Tree, EDBT'12VAST-Tree, EDBT'12
VAST-Tree, EDBT'12
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @Twitter
 
Rainbow Over the Windows: More Colors Than You Could Expect
Rainbow Over the Windows: More Colors Than You Could ExpectRainbow Over the Windows: More Colors Than You Could Expect
Rainbow Over the Windows: More Colors Than You Could Expect
 
Relational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the CloudRelational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the Cloud
 
Replication and Replica Sets
Replication and Replica SetsReplication and Replica Sets
Replication and Replica Sets
 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systems
 
Infrastructure for cloud_computing
Infrastructure for cloud_computingInfrastructure for cloud_computing
Infrastructure for cloud_computing
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 
Automatically Defined Functions for Learning Classifier Systems
Automatically Defined Functions for Learning Classifier SystemsAutomatically Defined Functions for Learning Classifier Systems
Automatically Defined Functions for Learning Classifier Systems
 
Reversing banking trojan: an in-depth look into Gataka
Reversing banking trojan: an in-depth look into GatakaReversing banking trojan: an in-depth look into Gataka
Reversing banking trojan: an in-depth look into Gataka
 
Boutin reversing banking trojan. an in-depth look into gataka
Boutin   reversing banking trojan. an in-depth look into gatakaBoutin   reversing banking trojan. an in-depth look into gataka
Boutin reversing banking trojan. an in-depth look into gataka
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
 
Development with Qt for Windows CE
Development with Qt for Windows CEDevelopment with Qt for Windows CE
Development with Qt for Windows CE
 
Performance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsPerformance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ Applications
 
Cassandra under the hood
Cassandra under the hoodCassandra under the hood
Cassandra under the hood
 
Key-value databases in practice Redis @ DotNetToscana
Key-value databases in practice Redis @ DotNetToscanaKey-value databases in practice Redis @ DotNetToscana
Key-value databases in practice Redis @ DotNetToscana
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreduce
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 

Mehr von d0nn9n

腾讯大讲堂:62 拇指下的精彩(手机qq交互设计经验分享)
腾讯大讲堂:62 拇指下的精彩(手机qq交互设计经验分享)腾讯大讲堂:62 拇指下的精彩(手机qq交互设计经验分享)
腾讯大讲堂:62 拇指下的精彩(手机qq交互设计经验分享)d0nn9n
 
腾讯大讲堂:55 企业法律风险防范
腾讯大讲堂:55 企业法律风险防范腾讯大讲堂:55 企业法律风险防范
腾讯大讲堂:55 企业法律风险防范d0nn9n
 
腾讯大讲堂:56 qzone安全之路
腾讯大讲堂:56 qzone安全之路腾讯大讲堂:56 qzone安全之路
腾讯大讲堂:56 qzone安全之路d0nn9n
 
腾讯大讲堂:59 数据蕴含商机,挖掘决胜千里
腾讯大讲堂:59 数据蕴含商机,挖掘决胜千里腾讯大讲堂:59 数据蕴含商机,挖掘决胜千里
腾讯大讲堂:59 数据蕴含商机,挖掘决胜千里d0nn9n
 
腾讯大讲堂:57 超级qq的千万之路
腾讯大讲堂:57 超级qq的千万之路 腾讯大讲堂:57 超级qq的千万之路
腾讯大讲堂:57 超级qq的千万之路 d0nn9n
 
蔡学镛 Rebol漫谈
蔡学镛   Rebol漫谈蔡学镛   Rebol漫谈
蔡学镛 Rebol漫谈d0nn9n
 
赵泽欣 - 淘宝网前端应用与发展
赵泽欣 - 淘宝网前端应用与发展赵泽欣 - 淘宝网前端应用与发展
赵泽欣 - 淘宝网前端应用与发展d0nn9n
 
Yanggang wps
Yanggang wpsYanggang wps
Yanggang wpsd0nn9n
 
熊节 - 软件工厂的精益之路
熊节 - 软件工厂的精益之路熊节 - 软件工厂的精益之路
熊节 - 软件工厂的精益之路d0nn9n
 
谢恩伟 - 微软在云端
谢恩伟 - 微软在云端谢恩伟 - 微软在云端
谢恩伟 - 微软在云端d0nn9n
 
去哪儿平台技术
去哪儿平台技术去哪儿平台技术
去哪儿平台技术d0nn9n
 
吴磊 - Silverlight企业级RIA
吴磊 - Silverlight企业级RIA吴磊 - Silverlight企业级RIA
吴磊 - Silverlight企业级RIAd0nn9n
 
Tom - Scrum
Tom - ScrumTom - Scrum
Tom - Scrumd0nn9n
 
Tim - FSharp
Tim - FSharpTim - FSharp
Tim - FSharpd0nn9n
 
Tiger oracle
Tiger oracleTiger oracle
Tiger oracled0nn9n
 
Paulking groovy
Paulking groovyPaulking groovy
Paulking groovyd0nn9n
 
Paulking dlp
Paulking dlpPaulking dlp
Paulking dlpd0nn9n
 
Patrick jcp
Patrick jcpPatrick jcp
Patrick jcpd0nn9n
 
Marc facebook
Marc facebookMarc facebook
Marc facebookd0nn9n
 
Kane debt
Kane debtKane debt
Kane debtd0nn9n
 

Mehr von d0nn9n (20)

腾讯大讲堂:62 拇指下的精彩(手机qq交互设计经验分享)
腾讯大讲堂:62 拇指下的精彩(手机qq交互设计经验分享)腾讯大讲堂:62 拇指下的精彩(手机qq交互设计经验分享)
腾讯大讲堂:62 拇指下的精彩(手机qq交互设计经验分享)
 
腾讯大讲堂:55 企业法律风险防范
腾讯大讲堂:55 企业法律风险防范腾讯大讲堂:55 企业法律风险防范
腾讯大讲堂:55 企业法律风险防范
 
腾讯大讲堂:56 qzone安全之路
腾讯大讲堂:56 qzone安全之路腾讯大讲堂:56 qzone安全之路
腾讯大讲堂:56 qzone安全之路
 
腾讯大讲堂:59 数据蕴含商机,挖掘决胜千里
腾讯大讲堂:59 数据蕴含商机,挖掘决胜千里腾讯大讲堂:59 数据蕴含商机,挖掘决胜千里
腾讯大讲堂:59 数据蕴含商机,挖掘决胜千里
 
腾讯大讲堂:57 超级qq的千万之路
腾讯大讲堂:57 超级qq的千万之路 腾讯大讲堂:57 超级qq的千万之路
腾讯大讲堂:57 超级qq的千万之路
 
蔡学镛 Rebol漫谈
蔡学镛   Rebol漫谈蔡学镛   Rebol漫谈
蔡学镛 Rebol漫谈
 
赵泽欣 - 淘宝网前端应用与发展
赵泽欣 - 淘宝网前端应用与发展赵泽欣 - 淘宝网前端应用与发展
赵泽欣 - 淘宝网前端应用与发展
 
Yanggang wps
Yanggang wpsYanggang wps
Yanggang wps
 
熊节 - 软件工厂的精益之路
熊节 - 软件工厂的精益之路熊节 - 软件工厂的精益之路
熊节 - 软件工厂的精益之路
 
谢恩伟 - 微软在云端
谢恩伟 - 微软在云端谢恩伟 - 微软在云端
谢恩伟 - 微软在云端
 
去哪儿平台技术
去哪儿平台技术去哪儿平台技术
去哪儿平台技术
 
吴磊 - Silverlight企业级RIA
吴磊 - Silverlight企业级RIA吴磊 - Silverlight企业级RIA
吴磊 - Silverlight企业级RIA
 
Tom - Scrum
Tom - ScrumTom - Scrum
Tom - Scrum
 
Tim - FSharp
Tim - FSharpTim - FSharp
Tim - FSharp
 
Tiger oracle
Tiger oracleTiger oracle
Tiger oracle
 
Paulking groovy
Paulking groovyPaulking groovy
Paulking groovy
 
Paulking dlp
Paulking dlpPaulking dlp
Paulking dlp
 
Patrick jcp
Patrick jcpPatrick jcp
Patrick jcp
 
Marc facebook
Marc facebookMarc facebook
Marc facebook
 
Kane debt
Kane debtKane debt
Kane debt
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Kürzlich hochgeladen (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Nick twitter

  • 1. Big Data in Real-Time at Twitter by Nick Kallen (@nk)
  • 3. What is Real-Time Data? • On-line queries for a single web request • Off-line computations with very low latency • Latency and throughput are equally important • Not talking about Hadoop and other high-latency, Big Data tools
  • 4. The four data problems • Tweets • Timelines • Social graphs • Search indices
  • 5.
  • 6. What is a Tweet? • 140 character message, plus some metadata • Query patterns: • by id • by author • (also @replies, but not discussed here) • Row Storage
  • 7. Find by primary key: 4376167936
  • 8. Find all by user_id: 749863
  • 9. Original Implementation id user_id text created_at 20 12 just setting up my twttr 2006-03-21 20:50:14 29 12 inviting coworkers 2006-03-21 21:02:56 34 16 Oh shit, I just twittered a little. 2006-03-21 21:08:09 • Relational • Single table, vertically scaled • Master-Slave replication and Memcached for read throughput.
  • 10. Original Implementation Master-Slave Replication Memcached for reads
  • 11. Problems w/ solution • Disk space: did not want to support disk arrays larger than 800GB • At 2,954,291,678 tweets, disk was over 90% utilized.
  • 13. Possible implementations Partition by primary key Partition 1 Partition 2 id user_id id user_id 20 ... 21 ... 22 ... 23 ... 24 ... 25 ...
  • 14. Possible implementations Partition by primary key Partition 1 Partition 2 id user_id id user_id 20 ... 21 ... 22 ... 23 ... 24 ... Finding recent tweets 25 ... by user_id queries N partitions
  • 15. Possible implementations Partition by user id Partition 1 Partition 2 id user_id id user_id ... 1 21 2 ... 1 23 2 ... 3 25 2
  • 16. Possible implementations Partition by user id Partition 1 Partition 2 id user_id id user_id ... 1 21 2 ... 1 23 2 ... 3 25 2 Finding a tweet by id queries N partitions
  • 17. Current Implementation Partition by time id user_id 24 ... Partition 2 23 ... id user_id Partition 1 22 ... 21 ...
  • 18. Current Implementation Queries try each Partition by time partition in order id user_iduntil enough data Partition 2 24 ... is accumulated 23 ... id user_id Partition 1 22 ... 21 ...
  • 20. Low Latency PK Lookup Memcached 1ms MySQL <10ms* * Depends on the number of partitions searched
  • 21. Principles • Partition and index • Exploit locality (in this case, temporal locality) • New tweets are requested most frequently, so usually only 1 partition is checked
  • 22. Problems w/ solution • Write throughput • Have encountered deadlocks in MySQL at crazy tweet velocity • Creating a new temporal shard is a manual process and takes too long; it involves setting up a parallel replication hierarchy. Our DBA hates us
  • 23. Future solution Partition k1 Partition k2 id user_id id user_id 20 ... 21 ... 22 ... 23 ... Partition u1 Partition u2 user_id ids user_id ids 12 20, 21, ... 13 48, 27, ... 14 25, 32, ... 15 23, 51, ... • Cassandra (non-relational) • Primary Key partitioning • Manual secondary index on user_id • Memcached for 90+% of reads
  • 24. The four data problems • Tweets • Timelines • Social graphs • Search indices
  • 25.
  • 26. What is a Timeline? • Sequence of tweet ids • Query pattern: get by user_id • Operations: • append • merge • truncate • High-velocity bounded vector • Space-based (in-place mutation)
  • 28. Original Implementation SELECT * FROM tweets WHERE user_id IN (SELECT source_id FROM followers WHERE destination_id = ?) ORDER BY created_at DESC LIMIT 20
  • 29. Original Implementation SELECT * FROM tweets WHERE user_id IN (SELECT source_id FROM followers WHERE destination_id = ?) ORDER BY created_at DESC LIMIT 20 Crazy slow if you have lots of friends or indices can’t be kept in RAM
  • 31. Current Implementation • Sequences stored in Memcached • Fanout off-line, but has a low latency SLA • Truncate at random intervals to ensure bounded length • On cache miss, merge user timelines
  • 32. Throughput Statistics date average tps peak tps fanout ratio deliveries 10/7/2008 30 120 175:1 21,000 4/15/2010 700 2,000 600:1 1,200,000
  • 35. Possible implementations • Fanout to disk • Ridonculous number of IOPS required, even with fancy buffering techniques • Cost of rebuilding data from other durable stores not too expensive • Fanout to memory • Good if cardinality of corpus * bytes/datum not too many GB
  • 36. Low Latency get append fanout 1ms 1ms <1s* * Depends on the number of followers of the tweeter
  • 37. Principles • Off-line vs. Online computation • The answer to some problems can be pre-computed if the amount of work is bounded and the query pattern is very limited • Keep the memory hierarchy in mind • The efficiency of a system includes the cost of generating data from another source (such as a backup) times the probability of needing to
  • 38. The four data problems • Tweets • Timelines • Social graphs • Search indices
  • 39.
  • 40. What is a Social Graph? • List of who follows whom, who blocks whom, etc. • Operations: • Enumerate by time • Intersection, Union, Difference • Inclusion • Cardinality • Mass-deletes for spam • Medium-velocity unbounded vectors • Complex, predetermined queries
  • 41.
  • 45.
  • 46. Intersection: Deliver to people who follow both @aplusk and @foursquare
  • 47. Original Implementation source_id destination_id 20 12 29 12 34 16 • Single table, vertically scaled • Master-Slave replication
  • 48. Index Original Implementation source_id destination_id 20 12 29 12 34 16 • Single table, vertically scaled • Master-Slave replication
  • 49. Index Index Original Implementation source_id destination_id 20 12 29 12 34 16 • Single table, vertically scaled • Master-Slave replication
  • 50. Problems w/ solution • Write throughput • Indices couldn’t be kept in RAM
  • 51. Current solution Forward Backward source_id destination_id updated_at x destination_id source_id updated_at x 20 12 20:50:14 x 12 20 20:50:14 x 20 13 20:51:32 12 32 20:51:32 20 16 12 16 • Partitioned by user id • Edges stored in “forward” and “backward” directions • Indexed by time • Indexed by element (for set algebra) • Denormalized cardinality
  • 52. Current solution Forward Backward source_id destination_id updated_at x destination_id source_id updated_at x 20 12 20:50:14 x 12 20 20:50:14 x 20 13 20:51:32 12 32 20:51:32 20 16 12 16 • Partitioned by user id Partitioned by user • Edges stored in “forward” and “backward” directions • Indexed by time • Indexed by element (for set algebra) • Denormalized cardinality
  • 53. Edges stored in both directions Current solution Forward Backward source_id destination_id updated_at x destination_id source_id updated_at x 20 12 20:50:14 x 12 20 20:50:14 x 20 13 20:51:32 12 32 20:51:32 20 16 12 16 • Partitioned by user id Partitioned by user • Edges stored in “forward” and “backward” directions • Indexed by time • Indexed by element (for set algebra) • Denormalized cardinality
  • 54. Challenges • Data consistency in the presence of failures • Write operations are idempotent: retry until success • Last-Write Wins for edges • (with an ordering relation on State for time conflicts) • Other commutative strategies for mass-writes
  • 55. Low Latency write cardinality iteration write ack inclusion materialize 1ms 100edges/ms* 1ms 16ms 1ms * 2ms lower bound
  • 56. Principles • It is not possible to pre-compute set algebra queries • Simple distributed coordination techniques work • Partition, replicate, index. Many efficiency and scalability problems are solved the same way
  • 57. The four data problems • Tweets • Timelines • Social graphs • Search indices
  • 58.
  • 59. What is a Search Index? • “Find me all tweets with these words in it...” • Posting list • Boolean and/or queries • Complex, ad hoc queries • Relevance is recency* * Note: there is a non-real-time component to search, but it is not discussed here
  • 61. Original Implementation term_id doc_id 20 12 20 86 34 16 • Single table, vertically scaled • Master-Slave replication for read throughput
  • 62. Problems w/ solution • Index could not be kept in memory
  • 63. Current Implementation term_id doc_id 24 ... Partition 2 23 ... term_id doc_id Partition 1 22 ... 21 ... • Partitioned by time • Uses MySQL • Uses delayed key-write
  • 64. Problems w/ solution • Write throughput • Queries for rare terms need to search many partitions • Space efficiency/recall • MySQL requires lots of memory
  • 66. Future solution • Document partitioning • Time partitioning too • Merge layer • May use Lucene instead of MySQL
  • 67. Principles • Partition so that work can be parallelized • Temporal locality is not always enough
  • 68. The four data problems • Tweets • Timelines • Social graphs • Search indices
  • 69. Summary Statistics writes/ reads/second cardinality bytes/item durability second Tweets 100k 850 12b 300b durable Timelines 80k 1.2m a lot 3.2k non Graphs 100k 20k 13b 110 durable Search 13k 21k† 315m‡ 1k durable † tps * 25 postings ‡ 75 partitions * 4.2m tweets
  • 70. Principles • All engineering solutions are transient • Nothing’s perfect but some solutions are good enough for a while • Scalability solutions aren’t magic. They involve partitioning, indexing, and replication • All data for real-time queries MUST be in memory. Disk is for writes only. • Some problems can be solved with pre-computation, but a lot can’t • Exploit locality where possible