PUTTING THE X FACTOR INTO CASSANDRA: ADVENTURES IN COUNTING
  MALCOLM BOX, CTO, LIVE TALKBACK
  BIG DATA LONDON MEETUP, 18TH JANUARY 2012



                                                    1
INTRO
  Malcolm Box, CTO & Co-Founder

  @malcolmbox

  malcolm@tellybug.com

  http://tellybug.com




                                  2
WHAT WE DID FOR X FACTOR




                           2
X-FACTOR: THE RESULTS
  Over 1 Million app downloads




  Over 260 Million boos/claps




  Massive peak loads on CTA




                                 4
BUT SURELY COUNTING IS EASY?
  Need real time results

     How many boos?

     How many claps?

     Rate of boos

     Rate of claps

  Design for scale

     Goal of handling 10K per second coming into our servers




                                                               5
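
The totals and rates listed on the slide can be tracked together by bucketing events per second. What follows is only a minimal single-process sketch, not the system described in the talk: the `RateCounter` class, its window size, and the explicit `now` argument are all assumptions made for illustration.

```python
from collections import deque


class RateCounter:
    """Running total plus events per second over a sliding window.

    Hypothetical helper; time is passed in explicitly so the behaviour
    is deterministic.
    """

    def __init__(self, window_seconds=10):
        self.window_seconds = window_seconds
        self.total = 0
        self._buckets = deque()  # (second, count) pairs, oldest first

    def add(self, now, n=1):
        second = int(now)
        self.total += n
        if self._buckets and self._buckets[-1][0] == second:
            self._buckets[-1] = (second, self._buckets[-1][1] + n)
        else:
            self._buckets.append((second, n))
        self._expire(second)

    def _expire(self, second):
        # Drop buckets that have fallen out of the window.
        while self._buckets and self._buckets[0][0] <= second - self.window_seconds:
            self._buckets.popleft()

    def rate(self, now):
        self._expire(int(now))
        return sum(count for _, count in self._buckets) / self.window_seconds


boos = RateCounter(window_seconds=10)
for t in range(20):
    boos.add(now=t, n=5)  # 5 boos in each of 20 consecutive seconds
```

This works while everything fits in one process. The rest of the deck is about what happens when it does not.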
DISTRIBUTED COUNTING
  “Hey, my CPU can do 22305 MIPS!”

  “Stick it in Memcache!”

  “How about Redis?”

  “OK, how about sharding?”

  “Well, I hear Cassandra 0.8 has counters”




                                              6
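MEMCACHE CAN’T COUNT
  cache.set('key', 1)

  cache.decr('key', 1)

  >>> 0L

  cache.decr('key', 1)     # decrementing below zero stays at 0

  >>> 0L

  cache.incr('key', -1)    # a negative delta does not subtract: the result is 2**32 - 1

  >>> 4294967295L

  cache.incr('key', 1)     # the counter then carries on past 2**32

  >>> 4294967296L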

                          -1
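
The outputs on this slide can be reproduced with a small in-memory model. It rests on two assumptions that the slide does not state: that the client casts the delta to an unsigned 32-bit integer, and that the server keeps an unsigned 64-bit value whose decrement stops at zero. `ModelCounterCache` is a hypothetical stand-in, not a real memcache client.

```python
UINT32 = 2**32
UINT64 = 2**64


class ModelCounterCache:
    """Toy model of the counter behaviour shown on the slide."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key, delta=1):
        # Assumption: the delta is cast to an unsigned 32-bit integer,
        # so -1 becomes 2**32 - 1.
        delta %= UINT32
        self._data[key] = (self._data[key] + delta) % UINT64
        return self._data[key]

    def decr(self, key, delta=1):
        delta %= UINT32
        # Assumption: decrement never goes below zero.
        self._data[key] = max(0, self._data[key] - delta)
        return self._data[key]


cache = ModelCounterCache()
cache.set('key', 1)
results = [
    cache.decr('key', 1),
    cache.decr('key', 1),
    cache.incr('key', -1),
    cache.incr('key', 1),
]
```

Under those assumptions `results` matches the four values on the slide, which suggests the surprise comes from unsigned arithmetic rather than from lost updates.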
MEMCACHE CAN’T COUNT PART 3
  EC2 limits

     Single Memcache server runs out of network I/O

     What then?

  Redis?

     Benchmarked on EC2

     m1.large -> m1.large, 28K INCR/s

     Network I/O limited

     Can’t horizontally scale


                                                      8
SHARDED COUNTERS
  Implemented 2 level cache on web tier (https://gist.github.com/953524)

  But a counter is more complicated

  Sharded counter

     Store (count, delta, timestamp) locally

     Store count in L2 cache

     Increment changes local delta

     Push deltas to central every N seconds & refresh count

  Eventually consistent

     Maybe....unless something crashes


                                                                           9
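
The sharded counter described on this slide might look roughly like the sketch below. It is not the code from the gist linked above. `CentralStore` stands in for the shared L2 cache, and the class names, the flush interval, and the explicit `now` argument are all assumptions.

```python
class CentralStore:
    """Stand-in for the shared L2 cache or central counter (hypothetical)."""

    def __init__(self):
        self.counts = {}

    def add(self, key, delta):
        self.counts[key] = self.counts.get(key, 0) + delta
        return self.counts[key]


class LocalShardedCounter:
    """Per-web-server view of a counter: (count, delta, timestamp)."""

    def __init__(self, key, central, flush_interval=5.0):
        self.key = key
        self.central = central
        self.flush_interval = flush_interval
        self.count = 0        # last total seen from the central store
        self.delta = 0        # local increments not yet pushed
        self.timestamp = 0.0  # time of the last push

    def incr(self, now, n=1):
        self.delta += n
        if now - self.timestamp >= self.flush_interval:
            self.flush(now)

    def flush(self, now):
        # Push the local delta and refresh the cached total in one step.
        self.count = self.central.add(self.key, self.delta)
        self.delta = 0
        self.timestamp = now

    def value(self):
        # Best local estimate: last known total plus unpushed increments.
        return self.count + self.delta


central = CentralStore()
web1 = LocalShardedCounter('claps', central)
web2 = LocalShardedCounter('claps', central)
for t in range(4):
    web1.incr(now=t)        # 4 claps seen by web1
    web2.incr(now=t, n=2)   # 8 claps seen by web2
before_flush = central.counts.get('claps', 0)
web1.flush(now=5)
web2.flush(now=5)
```

After both flushes the central total is 12, but `web1` still reports 4 until its next flush. This is the eventual consistency the slide mentions. A process that dies before flushing takes its unpushed delta with it.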
CASSANDRA HAS COUNTERS
  New feature in Cassandra 0.8

  Special column type - CounterColumnType as the validator

  Distributed 64 bit counter, with eventual consistency

     CL.ONE writes recommended to avoid implicit reads impacting performance

     Reads tot up values from replicas to give value

  Simple functionality

     incr()/decr(), get()




                                                                               10
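
The idea that a read adds up per-replica values can be illustrated with a toy model. It is increment-only and deliberately simplified. It is an assumption made for illustration, not Cassandra's actual implementation or client API, and the `Replica` class and its methods are hypothetical.

```python
class Replica:
    """Toy replica that tracks a partial count per replica name."""

    def __init__(self, name):
        self.name = name
        self.shards = {}  # replica name -> partial count as known here

    def local_increment(self, n):
        # A write handled by this replica only touches its own partial count.
        self.shards[self.name] = self.shards.get(self.name, 0) + n

    def merge_from(self, other):
        # Partial counts only grow in this model, so max() merges safely.
        for name, count in other.shards.items():
            self.shards[name] = max(self.shards.get(name, 0), count)

    def read(self):
        # The counter value is the sum of every partial count seen so far.
        return sum(self.shards.values())


a, b, c = Replica('a'), Replica('b'), Replica('c')
a.local_increment(3)
b.local_increment(2)
stale = c.read()   # c has not heard from a or b yet
c.merge_from(a)
c.merge_from(b)
total = c.read()
```

In this model a replica that has not yet received the others' partial counts returns a stale value, which is one way to picture the eventual consistency mentioned on the slide.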
CAN CASSANDRA COUNT?
  Yes, But....

  Performance can suck

     Switch off replicate_on_write, tune RF & cluster size

  Not scalable

     Scales as function of RF up to 4 nodes

     Above that ... you’re out of luck

     Best we achieved is ~10K/s increments to single counter value

  What do you do if an operation fails?


                                                                     11
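
One reading of the closing question is that an increment is not idempotent: after a timeout the client cannot tell whether the write was applied. The sketch below simulates that case with a hypothetical `FlakyCounter`. It does not use any real Cassandra client.

```python
class WriteTimeout(Exception):
    """No acknowledgement arrived; the write may or may not have applied."""


class FlakyCounter:
    """Hypothetical counter whose first increment is applied but never acknowledged."""

    def __init__(self):
        self.value = 0
        self._lose_next_ack = True

    def incr(self, n=1):
        self.value += n  # the write is applied...
        if self._lose_next_ack:
            self._lose_next_ack = False
            raise WriteTimeout()  # ...but the client never hears about it


def incr_with_retry(counter, n=1, retries=1):
    for _ in range(retries + 1):
        try:
            counter.incr(n)
            return True
        except WriteTimeout:
            continue
    return False


counter = FlakyCounter()
ok = incr_with_retry(counter)  # one logical clap
```

Here the retry succeeds but the counter ends at 2 for a single clap. Not retrying risks undercounting instead, so either choice can leave the total wrong.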
CASSANDRA - MAKE IT COUNT *FASTER*
  Recommendation (from Cassandra committers...):




           SHARD YOUR COUNTERS




                                                   12
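
A minimal sketch of what sharding a counter can mean: spread increments over several sub-counters and sum them on read. The `ShardedCounter` class, the key format, and the dict used as the store are assumptions made for illustration. In a real deployment each shard would be a separate counter in the datastore, incremented with the store's own atomic operation.

```python
import random


class ShardedCounter:
    """Spread writes over num_shards sub-counters; sum them on read."""

    def __init__(self, store, name, num_shards=8, rng=None):
        self.store = store            # any dict-like mapping of key -> int
        self.name = name
        self.num_shards = num_shards
        self.rng = rng or random.Random()

    def _shard_key(self, i):
        return '%s:%d' % (self.name, i)

    def incr(self, n=1):
        # Pick a shard at random so no single key takes every write.
        key = self._shard_key(self.rng.randrange(self.num_shards))
        self.store[key] = self.store.get(key, 0) + n

    def get(self):
        # Reads cost one lookup per shard.
        return sum(self.store.get(self._shard_key(i), 0)
                   for i in range(self.num_shards))


store = {}
claps = ShardedCounter(store, 'claps', num_shards=4, rng=random.Random(0))
for _ in range(1000):
    claps.incr()
```

The trade-off is that writes spread across `num_shards` keys, while every read has to fetch and add all of them.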
YOU’RE NOT COUNTING IT RIGHT
  When 1+1+1 is 2




  Write Only Databases




                               13
CONCLUSION
  Counting is easy.....

  Unless you want to do it really, really fast



  If you’re inside the I/O limits for a single box, all is peachy



  Above that, there are no good off-the-shelf answers




                                                                    14
ANY QUESTIONS?


      We’re hiring - if you’re interested in helping us count, get in touch!

                            malcolm@tellybug.com

                                 @malcolmbox




                                                                               15


Editor's notes

  18-19. redis - what if you need 30K/s?
  26. reveal - “shard your counters”
  27-28. 1+1+1 = 2 - eventual consistency. Cache consistency.
         Write only DB - Cassandra bug where get_range() wasn’t returning all the data in the DB.
  29. single box - failures? Cass counters don’t scale :(