Realtime Distributed Analysis of Datastreams

Philipp Nolte – University of Passau – January 2014

1
Learn
Why we need fancy Big Data frameworks.
What the lambda architecture looks like.
How Twitter used to do real-time analytics.
Why Twitter created Storm.
How Storm works.
2
Limits

Imagine a traditional web analytics application:
Every page view increments the URL’s database row.

3
First Aid

Queue your writes and write in batches.
Shard your data: Partition horizontally.
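
The two first-aid measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `BatchingWriter`, the function `shard_for`, the shard count, and the batch size are all hypothetical, and plain dictionaries stand in for the database servers.

```python
import zlib
from collections import Counter, defaultdict

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(url: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard with a stable hash so a URL always maps to the same shard."""
    return zlib.crc32(url.encode("utf-8")) % num_shards


class BatchingWriter:
    """Queues page views and writes them to the shards in batches."""

    def __init__(self, batch_size: int = 3):
        self.batch_size = batch_size
        self.pending = []  # queued page views, not yet written
        # One dict per shard stands in for one database server.
        self.shards = [defaultdict(int) for _ in range(NUM_SHARDS)]
        self.writes = 0  # number of batched writes issued so far

    def record(self, url: str) -> None:
        """Queue one page view; flush once the batch is full."""
        self.pending.append(url)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Collapse the queue into one increment per URL, grouped by shard."""
        per_shard = defaultdict(Counter)
        for url in self.pending:
            per_shard[shard_for(url)][url] += 1
        for shard_id, counts in per_shard.items():
            for url, n in counts.items():
                self.shards[shard_id][url] += n
            self.writes += 1  # one batched write per touched shard
        self.pending.clear()

    def views(self, url: str) -> int:
        """Read the written count for a URL (pending views are not included)."""
        return self.shards[shard_for(url)][url]
```

Note the trade-off: views that are still queued are invisible to readers and are lost if the process dies before `flush()` runs, which hints at why fault-tolerance becomes the next problem.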

4
Chronic Issues
Fault-tolerance is hard.
Applications become more and more complex.
You have to do all the work.

5
New Tools
Large-scale computation systems such as Hadoop.
Scalable databases such as Cassandra and Riak.
Easy-to-use frameworks such as Storm and Dempsy.

6
Lambda Architecture
Theoretical, abstract architecture for working with big data.

Speed Layer
Serving Layer
Batch Layer
7
Goal

Compute arbitrary functions on arbitrary data.
query = function(all data)
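
Read literally, the equation says that every query is a pure function applied to the complete dataset. A minimal Python sketch, assuming a hypothetical `PageView` record and a tiny in-memory dataset:

```python
from typing import Callable, Iterable, NamedTuple


class PageView(NamedTuple):
    """One raw fact; the field names are assumptions for this example."""
    url: str
    user: str
    timestamp: int  # seconds since some epoch


# "All data": raw facts that are only ever appended, never updated.
ALL_DATA = [
    PageView("/home", "alice", 100),
    PageView("/home", "bob", 160),
    PageView("/docs", "alice", 200),
    PageView("/home", "alice", 260),
]


def pageviews(url: str) -> Callable[[Iterable[PageView]], int]:
    """Build a query that counts all views of one URL."""
    return lambda data: sum(1 for view in data if view.url == url)


def unique_visitors(url: str) -> Callable[[Iterable[PageView]], int]:
    """Build a query that counts distinct users who viewed one URL."""
    return lambda data: len({view.user for view in data if view.url == url})


print(pageviews("/home")(ALL_DATA))        # prints 3
print(unique_visitors("/home")(ALL_DATA))  # prints 2
```

Scanning everything on every query is far too slow at scale, which is the gap the three layers are meant to close.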

8
Properties
Robust and fault-tolerant.
Low latency reads and updates.
Scalable.
Minimal maintenance.

9
Batch Layer

Stores the immutable master dataset.
Precomputes arbitrary batch views.
Home of batch processing and MapReduce systems such as Hadoop.
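
How a batch view could be precomputed in map/reduce style is sketched below in plain Python. This is not Hadoop's API; the record layout and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain

# Immutable master dataset: (url, hour) page-view records (made-up values).
MASTER_DATASET = (
    ("/home", 0),
    ("/docs", 0),
    ("/home", 1),
    ("/home", 1),
)


def map_phase(record):
    """Emit one (key, 1) pair per raw record."""
    url, hour = record
    yield (url, hour), 1


def reduce_phase(pairs):
    """Sum the values of all pairs that share a key."""
    view = defaultdict(int)
    for key, value in pairs:
        view[key] += value
    return dict(view)


def precompute_batch_view(dataset):
    """Recompute the whole view from scratch; the raw data is never modified."""
    return reduce_phase(chain.from_iterable(map_phase(r) for r in dataset))


BATCH_VIEW = precompute_batch_view(MASTER_DATASET)
print(BATCH_VIEW[("/home", 1)])  # prints 2
```

Because the view is always rebuilt from the raw records, a bug in the view logic can be fixed by correcting the code and rerunning the batch.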

10

Serving Layer

Read-only random access to batch views.
Updated by the batch layer.
Indexes batch views.
Home of real-time query systems such as Cloudera Impala for Hadoop.
11
Speed Layer

Compensates for high-latency batch views.
Fast, incremental algorithms.
More complex because of random writes.
Home of Apache HBase or Storm.

12
Lambda Architecture
[Diagram: data enters the batch layer and the speed layer. The batch layer produces batch views for the serving layer; the speed layer produces realtime views.]
13

Query
[Diagram: available data over time. The batch view covers the older data; the realtime view covers the most recent data.]
Discard the realtime view as soon as its data is represented in the batch view.
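
A query therefore merges both views. The sketch below assumes integer timestamps that only ever increase; the class `LambdaCounter` and its methods are hypothetical names, and a real system would spread these roles across separate systems.

```python
class LambdaCounter:
    """Toy page-view counter that answers queries from two views."""

    def __init__(self):
        self.master = []         # append-only log of (timestamp, url)
        self.batch_view = {}     # url -> count, rebuilt by each batch run
        self.realtime_view = {}  # timestamp -> {url: count}, recent data only

    def ingest(self, timestamp, url):
        """New data goes to the master dataset and to the realtime view."""
        self.master.append((timestamp, url))
        bucket = self.realtime_view.setdefault(timestamp, {})
        bucket[url] = bucket.get(url, 0) + 1

    def run_batch(self, up_to):
        """Recompute the batch view from all raw data up to `up_to`."""
        view = {}
        for ts, url in self.master:
            if ts <= up_to:
                view[url] = view.get(url, 0) + 1
        self.batch_view = view
        # Discard realtime data that the batch view now represents.
        self.realtime_view = {
            ts: bucket for ts, bucket in self.realtime_view.items() if ts > up_to
        }

    def query(self, url):
        """Merge the batch view with whatever the realtime view still holds."""
        recent = sum(b.get(url, 0) for b in self.realtime_view.values())
        return self.batch_view.get(url, 0) + recent
```

For example, after ingesting views at timestamps 1, 2, and 3 and calling `run_batch(up_to=2)`, only the bucket for timestamp 3 remains in the realtime view, and `query` still returns the full count.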

14
Twitter’s Early Days
[Diagram: a network of workers connected by queues. Labels: Tweets, URLs, Map, Hadoop, Cassandra.]
15
Storm
Guaranteed message processing without message brokers.
Horizontal scalability.
Fault-tolerance.
High level of abstraction.
Just works.
16
Storm Topologies
[Diagram: spouts emit streams that flow into bolts; bolts can feed further bolts. Labels: Stream, Spout, Bolt.]
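
The shape of a topology can be imitated with plain Python generators. This is an analogy only, not Storm's API: the spout is a fixed list instead of a live feed, every function name is made up, and everything runs in a single process.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def sentence_spout() -> Iterator[str]:
    """Spout: the source of a stream."""
    yield from ["storm processes streams", "streams of tuples", "storm scales"]


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Bolt: consumes sentences and emits one word per tuple."""
    for sentence in stream:
        yield from sentence.split()


def count_bolt(stream: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Bolt: keeps running counts and emits (word, count) after every word."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology() -> Dict[str, int]:
    """Wire spout -> split bolt -> count bolt and keep the latest counts."""
    latest = {}
    for word, count in count_bolt(split_bolt(sentence_spout())):
        latest[word] = count
    return latest


print(run_topology()["storm"])  # prints 2
```

Each stage only sees the tuples of the stage before it, which is what allows the stages of a real topology to be placed on different machines.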

17
Parallel Tasks
[Diagram: each spout and bolt runs as several parallel tasks (T); streams connect the tasks of one component to the tasks of the next.]
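
One way to spread a stream over the parallel tasks of a bolt is to hash a field of each tuple, which is the idea behind Storm's fields grouping. The sketch below is plain Python with assumed names (`CountTask`, `route`, `run`), and the tasks run one after another here rather than in parallel.

```python
import zlib
from collections import Counter


class CountTask:
    """One task of a counting bolt; it owns the counts for the words routed to it."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.counts = Counter()

    def execute(self, word):
        self.counts[word] += 1


def route(word, num_tasks):
    """Stable hash: the same word always lands on the same task."""
    return zlib.crc32(word.encode("utf-8")) % num_tasks


def run(words, num_tasks=3):
    """Send every word to exactly one of the tasks and return the tasks."""
    tasks = [CountTask(i) for i in range(num_tasks)]
    for word in words:
        tasks[route(word, num_tasks)].execute(word)
    return tasks
```

Because a given word is always handled by the same task, no two tasks ever have to coordinate on a shared counter.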

18

Demo

Storm in action

19
Know
Why we need fancy Big Data frameworks.
What the lambda architecture looks like.
How Twitter used to do real-time analytics.
Why Twitter created Storm.
How Storm works.
20
The End.

Questions?

21
