How we process half a billion mentions a day
George & Shrikar
Agenda
• Who we are
• Some numbers about our system
• Open-source technologies we use
• Architecture of the system
• Component overview
Who We Are
• Social media analytics, monitoring, and engagement company (www.viralheat.com)
• We are based in San Mateo, CA
Data Crunched Daily
• In total, we ingest around 1 TB of social data into our infrastructure every day.
• Social data: Twitter, Facebook, LinkedIn, Pinterest, blogs, etc.
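
As a rough sanity check on these figures, the sketch below converts the daily volume into per-second and per-mention numbers. It assumes a decimal terabyte (10^12 bytes) and uses the deck's figure of about 100 million mentions every 5 hours. The constant and function names are illustrative, and the per-mention size is only a loose estimate, since the 1 TB may include data other than mentions.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_DAY = 10**12              # assumption: "1TB" means a decimal terabyte
MENTIONS_PER_5_HOURS = 100_000_000  # figure quoted in the deck


def mentions_per_day(per_5_hours: int) -> float:
    """Scale a 5-hour mention count up to a 24-hour day."""
    return per_5_hours * 24 / 5


def summarize() -> dict:
    """Derive average rates from the two headline numbers."""
    daily = mentions_per_day(MENTIONS_PER_5_HOURS)
    return {
        "mentions_per_day": daily,
        "mentions_per_second": daily / SECONDS_PER_DAY,
        "megabytes_per_second": BYTES_PER_DAY / SECONDS_PER_DAY / 1e6,
        "avg_bytes_per_mention": BYTES_PER_DAY / daily,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the daily total comes to 480 million mentions, which matches the "half a billion" in the title.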
How We Manage It
• Redis
• MySQL
• Riak
• Elasticsearch
• Memcache
• Storm (real-time data processing)
• Beanstalk
Data Pipeline
[Diagram of the pipeline components: Crawlers, Beanstalk, Processor, Elasticsearch, Memcache, Stats, Redis, Storm Cluster, Riak]
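
One plausible reading of the diagram is that crawlers push raw mentions onto a Beanstalk queue and the Processor consumes them. The sketch below illustrates that decoupling with Python's standard `queue.Queue` standing in for Beanstalk. The function names, the message shape, and the flow itself are assumptions made for illustration, not the actual Viralheat code.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream in this toy example


def crawler(out_q: queue.Queue, raw_mentions: list) -> None:
    """Producer: push each crawled mention onto the queue."""
    for mention in raw_mentions:
        out_q.put(mention)
    out_q.put(SENTINEL)


def processor(in_q: queue.Queue, results: list) -> None:
    """Consumer: pull mentions off the queue and record a processed form."""
    while True:
        mention = in_q.get()
        if mention is SENTINEL:
            break
        results.append({"text": mention, "length": len(mention)})


def run_pipeline(raw_mentions: list) -> list:
    """Run one crawler and one processor connected by a bounded queue."""
    q = queue.Queue(maxsize=100)  # bounded, so a slow consumer throttles the producer
    results = []
    worker = threading.Thread(target=processor, args=(q, results))
    worker.start()
    crawler(q, raw_mentions)
    worker.join()
    return results
```

The queue lets crawlers and processors scale independently, which is the usual reason for placing a work queue between ingestion and processing.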
Deep Dive
• The Processor tags each social mention with sentiment and intent.
• Around 100 million social mentions are processed every 5 hours.
• Elasticsearch indexes and ranks the social data.
• Stats calculates the analytics for each keyword, grouped by sentiment and intent.
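
The tagging and grouping steps above can be sketched as follows. This is a minimal illustration only: the word lists, label names, and the `tag_mention` and `keyword_stats` functions are hypothetical, and a toy word lookup stands in for whatever sentiment and intent models the real Processor uses.

```python
from collections import Counter

# Toy vocabularies, assumed purely for illustration.
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "terrible", "broken"}
PURCHASE_WORDS = {"buy", "purchase", "order"}


def tag_mention(text: str) -> dict:
    """Attach a sentiment label and an intent label to one mention."""
    words = set(text.lower().split())
    if words & POSITIVE:
        sentiment = "positive"
    elif words & NEGATIVE:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    intent = "purchase" if words & PURCHASE_WORDS else "none"
    return {"text": text, "sentiment": sentiment, "intent": intent}


def keyword_stats(mentions: list, keywords: list) -> Counter:
    """Count mentions per (keyword, sentiment, intent) combination."""
    stats = Counter()
    for text in mentions:
        tagged = tag_mention(text)
        words = set(text.lower().split())
        for keyword in keywords:
            if keyword in words:
                stats[(keyword, tagged["sentiment"], tagged["intent"])] += 1
    return stats
```

Keying the counter on the full (keyword, sentiment, intent) tuple keeps every breakdown available from a single pass over the mentions.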
Near Real Time
• We use Storm for our near-real-time data pipeline.
• Benefits: scalable, fault tolerant, and easy to operate.
• Easy to load and store data from existing databases/queues.
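
In Storm, spouts are the sources that feed tuples into a topology and bolts are the processing steps. The sketch below mimics that model in plain Python to show how a source-backed spout and a chain of bolts fit together. It does not use Storm's API: the class names, method names, and the single-threaded `run_topology` loop are all assumptions made for illustration.

```python
from collections import deque


class ListSpout:
    """Emits tuples from an in-memory list; stands in for a queue- or database-backed spout."""

    def __init__(self, items):
        self._pending = deque(items)

    def next_tuple(self):
        # Return the next tuple, or None when the source is drained.
        return self._pending.popleft() if self._pending else None


class UppercaseBolt:
    """Transforms each tuple and passes it downstream."""

    def process(self, tup):
        return tup.upper()


class CountBolt:
    """Keeps a running count per tuple value and passes the tuple through."""

    def __init__(self):
        self.counts = {}

    def process(self, tup):
        self.counts[tup] = self.counts.get(tup, 0) + 1
        return tup


def run_topology(spout, bolts):
    """Drain the spout, sending each tuple through the bolts in order."""
    emitted = []
    while (tup := spout.next_tuple()) is not None:
        for bolt in bolts:
            tup = bolt.process(tup)
        emitted.append(tup)
    return emitted
```

A real topology distributes spouts and bolts across a cluster and tracks acknowledgements for fault tolerance, which this single-process loop leaves out.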
Q&A
Thank You
We are hiring!
www.viralheat.com/company/careers/

Editor's Notes

  1. Talk about why each component is used.
  2. Individual component overview.
  3. Existing spouts/bolts, such as MysqlSpout, a Redis pub/sub spout, etc.