SlideShare a Scribd company logo
1 of 19
Download to read offline
Realtime Analytics
 with Cassandra
   or: How I Learned to
  Stopped Worrying and
       Love Counting
Combining “big” and “real-time” is hard

    Live & historical                    Drill downs
                         Trends...
      aggregates...                      and roll ups




2
                                                        Analytics
What is Realtime Analytics?
    eg “show me the number of mentions of
        ‘Acunu’ per day, between May and
          November 2011, on Twitter”


       Batch (Hadoop) approach would
    require processing ~30 billion tweets,
              or ~4.2 TB of data
                 http://blog.twitter.com/2011/03/numbers.html
Okay, so how are we
            going to do it?
                                counter
                                updates
             tweets
Twitter                   ?

  •   Push processing into ingest phase
  •   Make queries fast
Okay, so how are we
     going to do it?
For each tweet,
increment a bunch of counters,
such that answering a query
is as easy as reading some counters
Preparing the data
                              12:32:15 I like #trafficlights
Step 1: Get a feed of    12:33:43 Nobody expects...
        the tweets     12:33:49 I ate a #bee; woe is...
                      12:34:04 Man, @acunu rocks!

Step 2: Tokenise the
        tweet

Step 3: Increment counters            [1234, man]   +1
        in time buckets for           [1234, acunu] +1
        each token                    [1234, rock] +1
Querying
                            start: [01/05/11, acunu]
Step 1: Do a range query    end:   [30/05/11, acunu]

                                       Key            #Mentions
                              [01/05/11 00:01, acunu]    3
Step 2: Result table          [01/05/11 00:02, acunu]    5
                                        ...              ...


                              90

Step 3: Plot pretty graph     45
                               0
                                   May Jun Jul Aug Sept Oct Nov
Except it’s not that easy...
• Cassandra best practice is to use RandomPartitioner,
  so not possible to range queries on rows
• Could manually work out each row in range, do lots of
  point gets
  • This would suck - each query would be 100’s of random
    IOs on disk
• Need to use wide rows, range query is a column slice,
  each query ~1 IO - Denormalisation
So instead of this...
                              Key            #Mentions
                     [01/05/11 00:01, acunu]    3
                     [01/05/11 00:02, acunu]    5
                               ...              ...




                     We do this
                  Key           00:01       00:02        ...
            [01/05/11, acunu]     3           5          ...
            [02/05/11, acunu]    12           4          ...
                    ...           ...                    ...

Row key is ‘big’                     Column key is ‘small’
 time bucket                             time bucket
Demo
./painbird.py -u tom_wilkie

    http://ec2-176-34-212-226.eu-
 west-1.compute.amazonaws.com:8000
Now its your
  turn.....
1. Get a twitter account - http://twitter.com

2. Get some Cassandra VMs - http://goo.gl/Ruqlt

3. Cluster them up

4. Get the code - http://goo.gl/VxXKB

5. Implement the missing bits!

6. (Prizes for the ones that spot bugs!)
Get some Cassandra
       VMs


http://goo.gl/O9hkv
Cluster them up
• SSH in, set password (on both!)
• Check you can connect to the UI
• Use UI (click add host)
Get the code
SSH into one of the VMs:
# curl https://acunu-
oss.s3.amazonaws.com/
painbird-2.tar.gz | tar zxf -
# cd release
# ./painbird.py -u tom_wilkie
Implement the “core”

• In core.py
• def insert_tweet(cassandra, tweet):
• def do_query(cassandra, term, start, finish):
Check you data
-bash-3.2$ cassandra-cli
Connected to: "Test Cluster" on localhost/9160
Welcome to Cassandra CLI version 1.0.8.acunu2
Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] use painbird;
Authenticated to keyspace: painbird
[default@painbird] list keywords;
Using default limit of 100
-------------------
RowKey: m-5-"woe
=> (counter=11, value=1)
Extensions
Extensions
UI
• Pretty graphs
• Automatically periodically update?
• Search multiple terms
Painbird
•   mentions of multiple terms
•   sentiment analysis - http://www.nltk.org/
•   filtering by multiple fields (geo + keyword)

More Related Content

Similar to Realtime Analytics on the Twitter Firehose with Apache Cassandra - Denormalized London 2012

Earthquake shakes twitter users
Earthquake shakes twitter usersEarthquake shakes twitter users
Earthquake shakes twitter usersEshan Mudwel
 
It's all about the timing
It's all about the timingIt's all about the timing
It's all about the timingSensePost
 
Reflex - How Does It Work? (extended dance remix)
Reflex - How Does It Work? (extended dance remix)Reflex - How Does It Work? (extended dance remix)
Reflex - How Does It Work? (extended dance remix)Rocco Caputo
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
 
Psychtoolbox (PTB) practical course by Volodymyr B. Bogdanov, Lyon/Kyiv 2018...
Psychtoolbox (PTB) practical course  by Volodymyr B. Bogdanov, Lyon/Kyiv 2018...Psychtoolbox (PTB) practical course  by Volodymyr B. Bogdanov, Lyon/Kyiv 2018...
Psychtoolbox (PTB) practical course by Volodymyr B. Bogdanov, Lyon/Kyiv 2018...Volodymyr Bogdanov
 
Pointers lesson 3 (data types and pointer arithmetics)
Pointers lesson 3 (data types and pointer arithmetics)Pointers lesson 3 (data types and pointer arithmetics)
Pointers lesson 3 (data types and pointer arithmetics)SetuMaheshwari1
 
200603ash.pdf Performance Tuning Oracle DB
200603ash.pdf Performance Tuning Oracle DB200603ash.pdf Performance Tuning Oracle DB
200603ash.pdf Performance Tuning Oracle DBcookie1969
 
Analyze database system using a 3 d method
Analyze database system using a 3 d methodAnalyze database system using a 3 d method
Analyze database system using a 3 d methodAjith Narayanan
 
Asynchronous Awesome
Asynchronous AwesomeAsynchronous Awesome
Asynchronous AwesomeFlip Sasser
 
Upgrade and Self Help Tools for Atlassian Admins
Upgrade and Self Help Tools for Atlassian AdminsUpgrade and Self Help Tools for Atlassian Admins
Upgrade and Self Help Tools for Atlassian AdminsAtlassian
 
SKB Kontur: Digging Cassandra cluster
SKB Kontur: Digging Cassandra clusterSKB Kontur: Digging Cassandra cluster
SKB Kontur: Digging Cassandra clusterDataStax Academy
 
Digging Cassandra Cluster
Digging Cassandra ClusterDigging Cassandra Cluster
Digging Cassandra ClusterIvan Burmistrov
 
Beyond PHP - it's not (just) about the code
Beyond PHP - it's not (just) about the codeBeyond PHP - it's not (just) about the code
Beyond PHP - it's not (just) about the codeWim Godden
 

Similar to Realtime Analytics on the Twitter Firehose with Apache Cassandra - Denormalized London 2012 (15)

More Mining
More MiningMore Mining
More Mining
 
Earthquake shakes twitter users
Earthquake shakes twitter usersEarthquake shakes twitter users
Earthquake shakes twitter users
 
It's all about the timing
It's all about the timingIt's all about the timing
It's all about the timing
 
Reflex - How Does It Work? (extended dance remix)
Reflex - How Does It Work? (extended dance remix)Reflex - How Does It Work? (extended dance remix)
Reflex - How Does It Work? (extended dance remix)
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data Processing
 
Psychtoolbox (PTB) practical course by Volodymyr B. Bogdanov, Lyon/Kyiv 2018...
Psychtoolbox (PTB) practical course  by Volodymyr B. Bogdanov, Lyon/Kyiv 2018...Psychtoolbox (PTB) practical course  by Volodymyr B. Bogdanov, Lyon/Kyiv 2018...
Psychtoolbox (PTB) practical course by Volodymyr B. Bogdanov, Lyon/Kyiv 2018...
 
Pointers lesson 3 (data types and pointer arithmetics)
Pointers lesson 3 (data types and pointer arithmetics)Pointers lesson 3 (data types and pointer arithmetics)
Pointers lesson 3 (data types and pointer arithmetics)
 
200603ash.pdf Performance Tuning Oracle DB
200603ash.pdf Performance Tuning Oracle DB200603ash.pdf Performance Tuning Oracle DB
200603ash.pdf Performance Tuning Oracle DB
 
Analyze database system using a 3 d method
Analyze database system using a 3 d methodAnalyze database system using a 3 d method
Analyze database system using a 3 d method
 
Asynchronous Awesome
Asynchronous AwesomeAsynchronous Awesome
Asynchronous Awesome
 
Upgrade and Self Help Tools for Atlassian Admins
Upgrade and Self Help Tools for Atlassian AdminsUpgrade and Self Help Tools for Atlassian Admins
Upgrade and Self Help Tools for Atlassian Admins
 
SKB Kontur: Digging Cassandra cluster
SKB Kontur: Digging Cassandra clusterSKB Kontur: Digging Cassandra cluster
SKB Kontur: Digging Cassandra cluster
 
Digging Cassandra Cluster
Digging Cassandra ClusterDigging Cassandra Cluster
Digging Cassandra Cluster
 
Storm 2012 03-29
Storm 2012 03-29Storm 2012 03-29
Storm 2012 03-29
 
Beyond PHP - it's not (just) about the code
Beyond PHP - it's not (just) about the codeBeyond PHP - it's not (just) about the code
Beyond PHP - it's not (just) about the code
 

More from Acunu

Acunu and Hailo: a realtime analytics case study on Cassandra
Acunu and Hailo: a realtime analytics case study on CassandraAcunu and Hailo: a realtime analytics case study on Cassandra
Acunu and Hailo: a realtime analytics case study on CassandraAcunu
 
Virtual nodes: Operational Aspirin
Virtual nodes: Operational AspirinVirtual nodes: Operational Aspirin
Virtual nodes: Operational AspirinAcunu
 
Acunu Analytics and Cassandra at Hailo All Your Base 2013
Acunu Analytics and Cassandra at Hailo All Your Base 2013 Acunu Analytics and Cassandra at Hailo All Your Base 2013
Acunu Analytics and Cassandra at Hailo All Your Base 2013 Acunu
 
Understanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsUnderstanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsAcunu
 
Acunu Analytics: Simpler Real-Time Cassandra Apps
Acunu Analytics: Simpler Real-Time Cassandra AppsAcunu Analytics: Simpler Real-Time Cassandra Apps
Acunu Analytics: Simpler Real-Time Cassandra AppsAcunu
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time CassandraAcunu
 
Realtime Analytics with Cassandra
Realtime Analytics with CassandraRealtime Analytics with Cassandra
Realtime Analytics with CassandraAcunu
 
Acunu Analytics @ Cassandra London
Acunu Analytics @ Cassandra LondonAcunu Analytics @ Cassandra London
Acunu Analytics @ Cassandra LondonAcunu
 
Exploring Big Data value for your business
Exploring Big Data value for your businessExploring Big Data value for your business
Exploring Big Data value for your businessAcunu
 
Progressive NOSQL: Cassandra
Progressive NOSQL: CassandraProgressive NOSQL: Cassandra
Progressive NOSQL: CassandraAcunu
 
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...Acunu
 
Cassandra EU 2012 - Putting the X Factor into Cassandra
Cassandra EU 2012 - Putting the X Factor into CassandraCassandra EU 2012 - Putting the X Factor into Cassandra
Cassandra EU 2012 - Putting the X Factor into CassandraAcunu
 
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source EffortsCassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source EffortsAcunu
 
Next Generation Cassandra
Next Generation CassandraNext Generation Cassandra
Next Generation CassandraAcunu
 
Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans
Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans
Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans Acunu
 
Cassandra EU 2012 - Storage Internals by Nicolas Favre-Felix
Cassandra EU 2012 - Storage Internals by Nicolas Favre-FelixCassandra EU 2012 - Storage Internals by Nicolas Favre-Felix
Cassandra EU 2012 - Storage Internals by Nicolas Favre-FelixAcunu
 
Cassandra EU 2012 - Highly Available: The Cassandra Distribution Model by Sam...
Cassandra EU 2012 - Highly Available: The Cassandra Distribution Model by Sam...Cassandra EU 2012 - Highly Available: The Cassandra Distribution Model by Sam...
Cassandra EU 2012 - Highly Available: The Cassandra Distribution Model by Sam...Acunu
 
Cassandra EU 2012 - Data modelling workshop by Richard Low
Cassandra EU 2012 - Data modelling workshop by Richard LowCassandra EU 2012 - Data modelling workshop by Richard Low
Cassandra EU 2012 - Data modelling workshop by Richard LowAcunu
 
Acunu Analytics
Acunu AnalyticsAcunu Analytics
Acunu AnalyticsAcunu
 
Cassandra Performance: Past, present & future
Cassandra Performance: Past, present & futureCassandra Performance: Past, present & future
Cassandra Performance: Past, present & futureAcunu
 

More from Acunu (20)

Acunu and Hailo: a realtime analytics case study on Cassandra
Acunu and Hailo: a realtime analytics case study on CassandraAcunu and Hailo: a realtime analytics case study on Cassandra
Acunu and Hailo: a realtime analytics case study on Cassandra
 
Virtual nodes: Operational Aspirin
Virtual nodes: Operational AspirinVirtual nodes: Operational Aspirin
Virtual nodes: Operational Aspirin
 
Acunu Analytics and Cassandra at Hailo All Your Base 2013
Acunu Analytics and Cassandra at Hailo All Your Base 2013 Acunu Analytics and Cassandra at Hailo All Your Base 2013
Acunu Analytics and Cassandra at Hailo All Your Base 2013
 
Understanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsUnderstanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problems
 
Acunu Analytics: Simpler Real-Time Cassandra Apps
Acunu Analytics: Simpler Real-Time Cassandra AppsAcunu Analytics: Simpler Real-Time Cassandra Apps
Acunu Analytics: Simpler Real-Time Cassandra Apps
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time Cassandra
 
Realtime Analytics with Cassandra
Realtime Analytics with CassandraRealtime Analytics with Cassandra
Realtime Analytics with Cassandra
 
Acunu Analytics @ Cassandra London
Acunu Analytics @ Cassandra LondonAcunu Analytics @ Cassandra London
Acunu Analytics @ Cassandra London
 
Exploring Big Data value for your business
Exploring Big Data value for your businessExploring Big Data value for your business
Exploring Big Data value for your business
 
Progressive NOSQL: Cassandra
Progressive NOSQL: CassandraProgressive NOSQL: Cassandra
Progressive NOSQL: Cassandra
 
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
 
Cassandra EU 2012 - Putting the X Factor into Cassandra
Cassandra EU 2012 - Putting the X Factor into CassandraCassandra EU 2012 - Putting the X Factor into Cassandra
Cassandra EU 2012 - Putting the X Factor into Cassandra
 
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source EffortsCassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
 
Next Generation Cassandra
Next Generation CassandraNext Generation Cassandra
Next Generation Cassandra
 
Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans
Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans
Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans
 
Cassandra EU 2012 - Storage Internals by Nicolas Favre-Felix
Cassandra EU 2012 - Storage Internals by Nicolas Favre-FelixCassandra EU 2012 - Storage Internals by Nicolas Favre-Felix
Cassandra EU 2012 - Storage Internals by Nicolas Favre-Felix
 
Cassandra EU 2012 - Highly Available: The Cassandra Distribution Model by Sam...
Cassandra EU 2012 - Highly Available: The Cassandra Distribution Model by Sam...Cassandra EU 2012 - Highly Available: The Cassandra Distribution Model by Sam...
Cassandra EU 2012 - Highly Available: The Cassandra Distribution Model by Sam...
 
Cassandra EU 2012 - Data modelling workshop by Richard Low
Cassandra EU 2012 - Data modelling workshop by Richard LowCassandra EU 2012 - Data modelling workshop by Richard Low
Cassandra EU 2012 - Data modelling workshop by Richard Low
 
Acunu Analytics
Acunu AnalyticsAcunu Analytics
Acunu Analytics
 
Cassandra Performance: Past, present & future
Cassandra Performance: Past, present & futureCassandra Performance: Past, present & future
Cassandra Performance: Past, present & future
 

Recently uploaded

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Realtime Analytics on the Twitter Firehose with Apache Cassandra - Denormalized London 2012

  • 1. Realtime Analytics with Cassandra or: How I Learned to Stopped Worrying and Love Counting
  • 2. Combining “big” and “real-time” is hard Live & historical Drill downs Trends... aggregates... and roll ups 2 Analytics
  • 3. What is Realtime Analytics? eg “show me the number of mentions of ‘Acunu’ per day, between May and November 2011, on Twitter” Batch (Hadoop) approach would require processing ~30 billion tweets, or ~4.2 TB of data http://blog.twitter.com/2011/03/numbers.html
  • 4. Okay, so how are we going to do it? counter updates tweets Twitter ? • Push processing into ingest phase • Make queries fast
  • 5. Okay, so how are we going to do it? For each tweet, increment a bunch of counters, such that answering a query is as easy as reading some counters
  • 6. Preparing the data 12:32:15 I like #trafficlights Step 1: Get a feed of 12:33:43 Nobody expects... the tweets 12:33:49 I ate a #bee; woe is... 12:34:04 Man, @acunu rocks! Step 2: Tokenise the tweet Step 3: Increment counters [1234, man] +1 in time buckets for [1234, acunu] +1 each token [1234, rock] +1
  • 7. Querying start: [01/05/11, acunu] Step 1: Do a range query end: [30/05/11, acunu] Key #Mentions [01/05/11 00:01, acunu] 3 Step 2: Result table [01/05/11 00:02, acunu] 5 ... ... 90 Step 3: Plot pretty graph 45 0 May Jun Jul Aug Sept Oct Nov
  • 8. Except it’s not that easy... • Cassandra best practice is to use RandomPartitioner, so not possible to range queries on rows • Could manually work out each row in range, do lots of point gets • This would suck - each query would be 100’s of random IOs on disk • Need to use wide rows, range query is a column slice, each query ~1 IO - Denormalisation
  • 9. So instead of this... Key #Mentions [01/05/11 00:01, acunu] 3 [01/05/11 00:02, acunu] 5 ... ... We do this Key 00:01 00:02 ... [01/05/11, acunu] 3 5 ... [02/05/11, acunu] 12 4 ... ... ... ... Row key is ‘big’ Column key is ‘small’ time bucket time bucket
  • 10. Demo ./painbird.py -u tom_wilkie http://ec2-176-34-212-226.eu- west-1.compute.amazonaws.com:8000
  • 11. Now its your turn.....
  • 12. 1. Get a twitter account - http://twitter.com 2. Get some Cassandra VMs - http://goo.gl/Ruqlt 3. Cluster them up 4. Get the code - http://goo.gl/VxXKB 5. Implement the missing bits! 6. (Prizes for the ones that spot bugs!)
  • 13. Get some Cassandra VMs http://goo.gl/O9hkv
  • 14. Cluster them up • SSH in, set password (on both!) • Check you can connect to the UI • Use UI (click add host)
  • 15. Get the code SSH into one of the VMs: # curl https://acunu- oss.s3.amazonaws.com/ painbird-2.tar.gz | tar zxf - # cd release # ./painbird.py -u tom_wilkie
  • 16. Implement the “core” • In core.py • def insert_tweet(cassandra, tweet): • def do_query(cassandra, term, start, finish):
  • 17. Check you data -bash-3.2$ cassandra-cli Connected to: "Test Cluster" on localhost/9160 Welcome to Cassandra CLI version 1.0.8.acunu2 Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit. [default@unknown] use painbird; Authenticated to keyspace: painbird [default@painbird] list keywords; Using default limit of 100 ------------------- RowKey: m-5-"woe => (counter=11, value=1)
  • 19. Extensions UI • Pretty graphs • Automatically periodically update? • Search multiple terms Painbird • mentions of multiple terms • sentiment analysis - http://www.nltk.org/ • filtering by multiple fields (geo + keyword)