“If everything seems under control, you're not going fast enough.”
realtime analysis of #debate hashtag

Davide Palmisano @dpalmisano
when size matters: the 4Vs of big data

Volume, Velocity, Variety, and Veracity

let’s focus on Velocity
at peak time, ~35 people per second
top up their Oyster card*




http://www.tfl.gov.uk/corporate/modesoftransport/londonunderground/1608.aspx
every second, ~58 new pictures
are uploaded to Instagram*




http://www.digitalbuzzblog.com/infographic-instagram-stats/
on the night of the first #debate,
2615 tweets per second were recorded*


http://www.nbcnews.com/technology/technolog/presidential-debate-sets-twitter-record-6281796
What were the most influential URLs?

What were the implicit concepts underlying the conversation?

How did these concepts evolve during the discussion?
every single tweet
potentially contains some
   hidden information
extract such information,
   making it explicit,
     analysing it
 and doing it at a rate of
   ~2000 tweets/sec?
real-time analytics

Storm, a free and open source distributed realtime computation
system. Storm makes it easy to reliably process unbounded
streams of data, doing for realtime processing what Hadoop did
for batch processing.
batch analyses

The Apache Hadoop software library is a framework that allows
for the distributed processing of large data sets across
clusters of computers using simple programming models.

+ HDFS, a distributed file system
data gathering from the Social Web




    crunching the Social Web, in real-time.



formerly known as          Beancounter
beancounter.io is a SaaS
  platform to profile your
users from their activities on
      the Social Web
now powering part
of the Italian public
        broadcaster
            #socialtv
       environment
(a quick parenthesis)

or ...

“how a butterfly flapping its wings in Asia might
cause a hurricane in the Atlantic”*
http://www.amazon.com/Strategic-Thinking-New-Science-Complexity/dp/0684842688
beancounter.io uses Twitter OAuth authorisation
to perform check-ins to social TV events
while beancounter.io was handling more than 100 check-ins per minute,

at 13:32 UTC-8 Twitter had an outage*
https://status.io.watchmouse.com/7617/125017//statuses/home_timeline-(OAuth-1.0a)
[Chart: Facebook and Twitter check-ins rate. Y-axis: 0 to 200. X-axis: 2012-11-06T20:45 to 2012-11-06T23:30. Annotation: "Nov 6, 2012 13:32 UTC-8, Twitter service disruption".]
[Chart: Facebook and Twitter overall comments. Y-axis: 0 to 1500. X-axis: 2012-11-06T20:45 to 2012-11-06T23:00. Series: Facebook, Twitter. Annotation: "Nov 6, 2012 13:32 UTC-8, Twitter service disruption".]
lesson learnt: the real-time Web is a hyper-connected
graph of a myriad of different live systems

always mind the butterflies,
even if you can’t see them
back to #debate
<timestamp, <c0...cn>>

concepts are extracted using NLP
  technologies for each tweet
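
A minimal sketch of how such `<timestamp, <c0...cn>>` records could be bucketed over time to follow how concepts evolve during the discussion. The `Snapshot` class, the `concepts_per_minute` helper, the one-minute bucket size and the sample data are all assumptions made for illustration; the slides do not describe the actual data model.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class Snapshot:
    """One <timestamp, <c0...cn>> record (hypothetical shape)."""
    timestamp: datetime
    concepts: Tuple[str, ...]


def concepts_per_minute(snapshots):
    """Count, for every one-minute bucket, how many tweets mention each concept."""
    buckets = defaultdict(Counter)
    for snap in snapshots:
        minute = snap.timestamp.replace(second=0, microsecond=0)
        # set() so a concept repeated inside one tweet is counted once
        buckets[minute].update(set(snap.concepts))
    return dict(buckets)


# Made-up sample data, for illustration only.
snapshots = [
    Snapshot(datetime(2012, 1, 1, 21, 0, 5), ("Iran", "Israel")),
    Snapshot(datetime(2012, 1, 1, 21, 0, 40), ("Iran",)),
    Snapshot(datetime(2012, 1, 1, 21, 1, 10), ("Russia", "Middle East")),
]
timeline = concepts_per_minute(snapshots)
for minute in sorted(timeline):
    print(minute.isoformat(), dict(timeline[minute]))
```

Plotting one line per concept over these buckets would give a rough picture of which topics rise and fall as the conversation goes on.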
we’ve tied together beancounter.io,
              Storm and Hadoop


  please note, this was only
       10% of the firehose


[Architecture diagram, labels: Storm, real-time analytics, batch analytics, hdfs (distributed FS)]
more than 500k tweets processed in 2h,
for an average rate of ~70 tweets/sec

each tweet produced a snapshot (~10 kB each),
for an overall size of 4.6 GB of data
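
As a quick sanity check, the figures above can be recomputed from each other. This sketch assumes exactly 500,000 tweets (the slide gives it as a lower bound) and decimal gigabytes.

```python
tweets = 500_000          # lower bound quoted on the slide
duration_s = 2 * 60 * 60  # 2 hours, in seconds
total_bytes = 4.6e9       # 4.6 GB, assuming 1 GB = 10**9 bytes

avg_rate = tweets / duration_s       # tweets per second
avg_snapshot = total_bytes / tweets  # bytes per snapshot

print(f"average rate: {avg_rate:.1f} tweets/sec")              # 69.4
print(f"average snapshot: {avg_snapshot / 1000:.1f} kB")       # 9.2
```

Both results agree with the rounded values on the slide: roughly 70 tweets/sec and roughly 10 kB per snapshot.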
more than 18k different URLs shared

highest peak: 253 tweets/sec

5 Amazon EC2 x-large instances
    + 2 mid-sized for HDFS
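
A minimal sketch of how the distinct-URL count and the peak rate could be computed from a stream of tweets. The input shape (`(epoch_second, text)` pairs), the `summarise` helper, the URL regex and the `example.com` links are all assumptions for illustration, not the pipeline used in the talk.

```python
import re
from collections import Counter

# Simple URL matcher; an assumption, real tweets would need more careful parsing.
URL_RE = re.compile(r"https?://\S+")


def summarise(tweets):
    """tweets: iterable of (epoch_second, text).

    Returns (url_counts, peak_second, peak_rate).
    """
    per_second = Counter()
    urls = Counter()
    for second, text in tweets:
        per_second[second] += 1
        urls.update(URL_RE.findall(text))
    if not per_second:
        return urls, None, 0
    peak_second, peak_rate = per_second.most_common(1)[0]
    return urls, peak_second, peak_rate


# Made-up sample data, for illustration only.
sample = [
    (0, "watch http://example.com/a"),
    (0, "see http://example.com/a and http://example.com/b"),
    (1, "no links here"),
]
urls, peak_second, peak_rate = summarise(sample)
print(len(urls), "distinct URLs; peak of", peak_rate, "tweets/sec at second", peak_second)
print(urls.most_common(1))
```

Ranking `urls.most_common()` by raw share count is only one possible proxy for how influential a URL was.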
recurring concepts

[Bar chart. Y-axis: 0 to 70000. Concepts shown: Osama Bin Laden, Iran, Israel, Middle East, Pakistan, Iraq, Afghanistan, Russia.]
most co-occurring concepts

Iran - Israel: 35.356%
Russia - Middle East: 24.7%
...
...
Wikileaks - Richard Nixon: 93.5%
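
The slides do not say how these percentages were defined. A minimal sketch, assuming a Jaccard-style ratio (tweets mentioning both concepts divided by tweets mentioning either one), which would explain why a rare pair can score very high. The `cooccurrence` helper and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(concept_lists):
    """Return {(a, b): share}, share = |tweets with a and b| / |tweets with a or b|."""
    single = Counter()  # tweets mentioning each concept
    pair = Counter()    # tweets mentioning both concepts of a pair
    for concepts in concept_lists:
        unique = sorted(set(concepts))  # sorted so each pair has one canonical order
        single.update(unique)
        pair.update(combinations(unique, 2))
    return {
        (a, b): both / (single[a] + single[b] - both)
        for (a, b), both in pair.items()
    }


# Made-up concept lists, one per tweet, for illustration only.
tweets = [
    ["Iran", "Israel"],
    ["Iran", "Israel", "Russia"],
    ["Iran"],
    ["Russia", "Middle East"],
]
scores = cooccurrence(tweets)
for (a, b), share in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{a} - {b}: {share:.1%}")
```

Other definitions (for example a conditional share, both / tweets with one of the two) would give different numbers, so the measure should be stated alongside any such ranking.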
[Figure with values: 5321, 17284, 6960]
facts

data viz is a completely different job

mining data requires science skills: it’s not
just about technology, it’s about math

forget about controlling everything when data
flows at that speed: make reasoned approximations
?
Davide Palmisano
@dpalmisano
http://davidepalmisano.com
