SlideShare ist ein Scribd-Unternehmen logo
1 von 26
Real Time Analytics for Big Data
A Twitter Inspired Case Study



                                   @natishalom
Big Data Predictions




         ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
                                                                2
The Two Vs of Big Data

         Velocity                                                   Volume




3            ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
We’re Living in a Real Time World…
        Social                           User Tracking &                 Homeland Security
                                          Engagement




      eCommerce                       Financial Services                 Real Time Search




4                 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
The Flavors of Big Data Analytics




       Counting                                Correlating               Research




5                 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Analytics @ Twitter – Counting

     How many signups,
      tweets, retweets for a
      topic?
     What’s the average
      latency?
     Demographics
          Countries and cities
          Gender
          Age groups
          Device types
          …



6                     ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Analytics @ Twitter – Correlating

     What devices fail at the
      same time?
     What features get user
      hooked?
     What places on the
      globe are “happening”?




7                 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Analytics @ Twitter – Research

     Sentiment analysis
        “Obama is popular”
     Trends
        “People like to tweet
         after watching
         American Idol”
     Spam patterns
        How can you tell
         when a user spams?




8                   ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
It’s All about Timing




       “Real time”                      Reasonably Quick                     Batch
     (< few Seconds)                   (seconds - minutes)                (hours/days)




9                  ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
It’s All about Timing
               • Event driven / stream processing
               • High resolution – every tweet gets counted



               • Ad-hoc querying          This is what
               • Medium resolution (aggregations)
                                          we’re here              we’re here
                                                                  to discuss 
               • Long running batch jobs (ETL, map/reduce)
               • Low resolution (trends & patterns)

10         ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Challenge – Word Count
        Tweets




11
                                  ?
          ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
                                                                       Count




                                                                 • URL mentions
                                                                 • etc.
                                                                                Word:Count




                                                                 • Hottest topics
URL Mentions – Here’s One Use Case




12        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers (March 2011)



     It takes a week for users to
     send   1 billion Tweets.
                                                      Source: http://blog.twitter.com/2011/03/numbers.html

13          ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers (March 2011)



                On average,
        140 million
     tweets get sent every day.
                                                    Source: http://blog.twitter.com/2011/03/numbers.html

14        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers (March 2011)



         The highest
     throughput to date is
6,939 tweets/sec.
                                                    Source: http://blog.twitter.com/2011/03/numbers.html

15        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers (March 2011)



      460,000 new
       accounts
        are created daily.
                                                    Source: http://blog.twitter.com/2011/03/numbers.html

16        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers




     5% of the users generate
      75% of the content.
                                                            Source: http://www.sysomos.com/insidetwitter/

17        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Analyze the Problem

  (Tens of) thousands of tweets per second to
   process
      Assumption: Need to process in near real time
  Aggregate counters for each word
      A few 10s of thousands of words (or hundreds of
       thousands if we include URLs)
  System needs to linearly scale
  System needs to be fault tolerant


18            ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Key Elements in
                        Real Time Big Data Analytics




19   ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Sharding (Partitioning)

           Tokenizer                 Counter
                       Filterer 1
              1                     Updater 1

           Tokenizer                 Counter
                       Filterer 2   Updater 2
              2
           Tokenizer                 Counter
                       Filterer 3
              3                     Updater 3




           Tokenizer                 Counter
                       Filterer n
               n                    Updater n
Keep Things In Memory

   Facebook keeps 80% of its
   data in Memory
   (Stanford research)

   RAM is 100-1000x faster
   than Disk (Random seek)
   • Disk: 5 -10ms
   • RAM: ~0.001msec
Use EDA (Event Driven Architecture)




22        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Putting it all together




23         ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Know Your Toolset




24        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
References
  Writing your own twitter analytics:
   http://ht.ly/d8j4I
  Detailed blog post
     http://bit.ly/gs-bigdata-analytics
  Twitter in numbers:
     http://blog.twitter.com/2011/03/numbers.html
  Twitter Storm:
     http://bit.ly/twitter-storm
  Apache S4
     http://incubator.apache.org/s4/


25               ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
26   ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved

Weitere ähnliche Inhalte

Mehr von Nati Shalom

Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Nati Shalom
 
Case Studies for moving apps to the cloud - DLD 2013
Case Studies for moving apps to the cloud - DLD 2013Case Studies for moving apps to the cloud - DLD 2013
Case Studies for moving apps to the cloud - DLD 2013
Nati Shalom
 
Disaster recovery on demand on the cloud
Disaster recovery on demand on the cloudDisaster recovery on demand on the cloud
Disaster recovery on demand on the cloud
Nati Shalom
 
Giga spaces cloudify road map-3 (citi)
Giga spaces cloudify road map-3 (citi)Giga spaces cloudify road map-3 (citi)
Giga spaces cloudify road map-3 (citi)
Nati Shalom
 

Mehr von Nati Shalom (20)

What A No Compromises Hybrid Cloud Looks Like
What A No Compromises Hybrid Cloud Looks Like What A No Compromises Hybrid Cloud Looks Like
What A No Compromises Hybrid Cloud Looks Like
 
Running OpenStack in Production
Running OpenStack in Production Running OpenStack in Production
Running OpenStack in Production
 
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs...
Orchestration tool roundup   kubernetes vs. docker vs. heat vs. terra form vs...Orchestration tool roundup   kubernetes vs. docker vs. heat vs. terra form vs...
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs...
 
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStack
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStackReal World Example of Orchestrating Docker, Node JS, NFV on OpenStack
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStack
 
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
 
OpenStack Juno The Complete Lowdown and Tales from the Summit
OpenStack Juno The Complete Lowdown and Tales from the SummitOpenStack Juno The Complete Lowdown and Tales from the Summit
OpenStack Juno The Complete Lowdown and Tales from the Summit
 
Application and Network Orchestration using Heat & Tosca
Application and Network Orchestration using Heat & ToscaApplication and Network Orchestration using Heat & Tosca
Application and Network Orchestration using Heat & Tosca
 
Introduction to Cloudify for OpenStack users
Introduction to Cloudify for OpenStack users Introduction to Cloudify for OpenStack users
Introduction to Cloudify for OpenStack users
 
Software Defined Operator
Software Defined OperatorSoftware Defined Operator
Software Defined Operator
 
Complex Analytics with NoSQL Data Store in Real Time
Complex Analytics with NoSQL Data Store in Real TimeComplex Analytics with NoSQL Data Store in Real Time
Complex Analytics with NoSQL Data Store in Real Time
 
Is Orchestration the Next Big Thing in DevOps
Is Orchestration the Next Big Thing in DevOpsIs Orchestration the Next Big Thing in DevOps
Is Orchestration the Next Big Thing in DevOps
 
When networks meets apps (open stack atlanta)
When networks meets apps (open stack atlanta)When networks meets apps (open stack atlanta)
When networks meets apps (open stack atlanta)
 
Application Centric Approach to Devops
Application Centric Approach to DevopsApplication Centric Approach to Devops
Application Centric Approach to Devops
 
Case Studies for moving apps to the cloud - DLD 2013
Case Studies for moving apps to the cloud - DLD 2013Case Studies for moving apps to the cloud - DLD 2013
Case Studies for moving apps to the cloud - DLD 2013
 
Application Centric DevOps
Application Centric DevOpsApplication Centric DevOps
Application Centric DevOps
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using Storm
 
Disaster Recovery on Demand on the Cloud
Disaster Recovery on Demand on the CloudDisaster Recovery on Demand on the Cloud
Disaster Recovery on Demand on the Cloud
 
Disaster recovery on demand on the cloud
Disaster recovery on demand on the cloudDisaster recovery on demand on the cloud
Disaster recovery on demand on the cloud
 
Giga spaces cloudify road map-3 (citi)
Giga spaces cloudify road map-3 (citi)Giga spaces cloudify road map-3 (citi)
Giga spaces cloudify road map-3 (citi)
 
Big Data on OpenStack
Big Data on OpenStackBig Data on OpenStack
Big Data on OpenStack
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Kürzlich hochgeladen (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 

Real Time Analytics for Big Data a Twiiter Case Study

  • 1. Real Time Analytics for Big Data A Twitter Inspired Case Study @natishalom
  • 2. Big Data Predictions ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved 2
  • 3. The Two Vs of Big Data Velocity Volume 3 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 4. We’re Living in a Real Time World… Social User Tracking & Homeland Security Engagement eCommerce Financial Services Real Time Search 4 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 5. The Flavors of Big Data Analytics Counting Correlating Research 5 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 6. Analytics @ Twitter – Counting  How many signups, tweets, retweets for a topic?  What’s the average latency?  Demographics  Countries and cities  Gender  Age groups  Device types  … 6 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 7. Analytics @ Twitter – Correlating  What devices fail at the same time?  What features get user hooked?  What places on the globe are “happening”? 7 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 8. Analytics @ Twitter – Research  Sentiment analysis  “Obama is popular”  Trends  “People like to tweet after watching American Idol”  Spam patterns  How can you tell when a user spams? 8 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 9. It’s All about Timing “Real time” Reasonably Quick Batch (< few Seconds) (seconds - minutes) (hours/days) 9 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 10. It’s All about Timing • Event driven / stream processing • High resolution – every tweet gets counted • Ad-hoc querying This is what • Medium resolution (aggregations) we’re here we’re here to discuss  • Long running batch jobs (ETL, map/reduce) • Low resolution (trends & patterns) 10 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 11. Challenge – Word Count Tweets 11 ? ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved Count • URL mentions • etc. Word:Count • Hottest topics
  • 12. URL Mentions – Here’s One Use Case 12 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 13. Twitter in Numbers (March 2011) It takes a week for users to send 1 billion Tweets. Source: http://blog.twitter.com/2011/03/numbers.html 13 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 14. Twitter in Numbers (March 2011) On average, 140 million tweets get sent every day. Source: http://blog.twitter.com/2011/03/numbers.html 14 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 15. Twitter in Numbers (March 2011) The highest throughput to date is 6,939 tweets/sec. Source: http://blog.twitter.com/2011/03/numbers.html 15 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 16. Twitter in Numbers (March 2011) 460,000 new accounts are created daily. Source: http://blog.twitter.com/2011/03/numbers.html 16 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 17. Twitter in Numbers 5% of the users generate 75% of the content. Source: http://www.sysomos.com/insidetwitter/ 17 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 18. Analyze the Problem  (Tens of) thousands of tweets per second to process  Assumption: Need to process in near real time  Aggregate counters for each word  A few 10s of thousands of words (or hundreds of thousands if we include URLs)  System needs to linearly scale  System needs to be fault tolerant 18 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 19. Key Elements in Real Time Big Data Analytics 19 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 20. Sharding (Partitioning) Tokenizer Counter Filterer 1 1 Updater 1 Tokenizer Counter Filterer 2 Updater 2 2 Tokenizer Counter Filterer 3 3 Updater 3 Tokenizer Counter Filterer n n Updater n
  • 21. Keep Things In Memory Facebook keeps 80% of its data in Memory (Stanford research) RAM is 100-1000x faster than Disk (Random seek) • Disk: 5 -10ms • RAM: ~0.001msec
  • 22. Use EDA (Event Driven Architecture) 22 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 23. Putting it all together 23 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 24. Know Your Toolset 24 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 25. References  Writing your own twitter analytics: http://ht.ly/d8j4I  Detailed blog post http://bit.ly/gs-bigdata-analytics  Twitter in numbers: http://blog.twitter.com/2011/03/numbers.html  Twitter Storm: http://bit.ly/twitter-storm  Apache S4 http://incubator.apache.org/s4/ 25 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 26. 26 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved

Hinweis der Redaktion

  1. ActiveInsight