Streaming Computing
Some thoughts and technology choices for event-driven processing
Natalino Busa - 29 Aug. 2013
Outline
● Concurrency
● Streaming computing
● Technologies
○ Gigaspaces
○ Storm
○ Akka
● Comparison matrix
● Opportunities
Algorithms: a tribute
Numbers and Algorithms:
9th century Persian Muslim
mathematician Abu Abdullah
Muhammad ibn Musa Al-Khwarizmi,
whose work built upon that of the 7th
century Indian mathematician
Brahmagupta.
We owe a lot to these guys!
Why do we need parallelism?
It gets bigger,
It doesn’t get much faster
BUT
We get more cores in a chip.
More cores = more parallelism
We are happy now, right?
Moore’s law
Every 18 months, the number of CPU
cores doubles
Another
interpretation:
Every 18 months, the number of idle
CPU cores doubles
More parallelism
We trade:
Time vs (CPU, Memory, I/O)
Modern applications
Scalability:
Vertical: concurrency
(use all the cores, memory and I/O of a given machine)
Horizontal: distribution
(use all the machines in the cluster)
High availability:
Fault tolerance: all levels (local, distributed)
(the terminator effect: you can stop it but can’t kill it)
Streaming applications
Performance:
Efficient use of resources:
CPU and memory, but also OS threads and sockets
Asynchronous:
event-driven, reacts to new data
Distributed:
more machines = more performance
the algorithm is partitioned and/or replicated on the cluster
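The "partitioned" part of this slide can be illustrated with key-based routing. This is a minimal sketch in Python, assuming each event carries a key and that a stable hash assigns every key to one of N workers. The names `partition_for` and `route` are hypothetical and do not come from Gigaspaces, Storm, or Akka.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_workers: int) -> int:
    """Map a key to a worker index with a stable hash (CRC32).

    Python's built-in hash() for strings is salted per process, so it is
    not suitable for routing that must agree across machines.
    """
    return zlib.crc32(key.encode("utf-8")) % num_workers


def route(events, num_workers):
    """Group (key, value) events by the worker that owns each key."""
    per_worker = defaultdict(list)
    for key, value in events:
        per_worker[partition_for(key, num_workers)].append((key, value))
    return dict(per_worker)


if __name__ == "__main__":
    events = [("user-1", 10), ("user-2", 5), ("user-1", 7), ("user-3", 1)]
    for worker, batch in sorted(route(events, num_workers=4).items()):
        print(worker, batch)
```

All events with the same key land on the same worker, so per-key state can stay local to that worker. Replication would instead send the same event to several workers.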
What to increase?
More CPU: It helps when there is
computation involved
More MEMORY: It helps when there is
more state to keep
More I/O: It helps when there are
more messages to transfer
Streaming or batch?
[Figure: Data flows from the source system, through processing in our system, to the target system. Natalino Busa - 12 Feb. 2013]
What differentiates Streaming from Batch?
● Granularity of Data
● Granularity of Processing
Granularity impacts:
Throughput, Latency, and the Cost of the system!
The choice is yours
1000 events/sec (1 KB/event)
running on 100 cores all day long
BATCH: Hadoop
“Wait a day, then process”
86.4 M events = 86.4 GB of data
Latency: 24 hours
Throughput: 1 update/day
STREAMING: Akka
“Do not wait”
Process the 1 KB of data each msec.
Latency: 1 ms
Throughput: 1000 updates/sec
“Both are valid options. It depends on the application domain and the
requirements/specs of the target and source systems”
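The daily volume in the batch case follows directly from the event rate. Here is a quick check in Python, assuming a constant rate and 1 KB = 1000 bytes (both simplifications). The function name `daily_volume` is illustrative only.

```python
EVENTS_PER_SEC = 1_000
EVENT_SIZE_BYTES = 1_000  # 1 KB/event, taken here as 1000 bytes
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume(events_per_sec: int, event_size_bytes: int):
    """Return (events per day, gigabytes per day) at a constant rate."""
    events = events_per_sec * SECONDS_PER_DAY
    gigabytes = events * event_size_bytes / 1e9
    return events, gigabytes


events, gigabytes = daily_volume(EVENTS_PER_SEC, EVENT_SIZE_BYTES)
print(f"{events / 1e6:.1f} M events/day, {gigabytes:.1f} GB/day")

# Average spacing between events at this rate, in milliseconds.
print(f"{1000 / EVENTS_PER_SEC:.1f} ms between events")
```

At 1000 events/sec, one day holds 86.4 million events, or about 86 GB at 1 KB each. In the streaming case, one event arrives roughly every millisecond.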
Mapping it to existing applications
Granularity of Data    Granularity of Processing    Example
256 GB                 1 CPU                        Traditional DB systems
256 GB                 100 CPUs                     Big Data (Hadoop)
1 KB                   1 CPU                        Traditional mail server
1 KB                   100 CPUs                     Web application server
Technologies: Gigaspaces
Technologies: Storm
Topology
Supervising
Scaling
Technologies: Akka
Supervising: tree of actors
Topology (static and dynamic actors)
Scaling and distributed processing
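The "tree of actors" idea can be illustrated with a toy parent that restarts a failed child. This is a minimal sketch in Python and not Akka's API. The classes `Worker` and `Supervisor` are hypothetical, and message handling is synchronous here, whereas real actors process messages asynchronously from a mailbox.

```python
class Worker:
    """A toy 'actor': handles one message at a time and keeps local state."""

    def __init__(self, name):
        self.name = name
        self.processed = 0

    def receive(self, message):
        if message == "boom":
            raise ValueError(f"{self.name} failed on {message!r}")
        self.processed += 1
        return f"{self.name} handled {message!r}"


class Supervisor:
    """Parent node: owns its children and restarts a child that fails."""

    def __init__(self, child_names):
        self.children = {name: Worker(name) for name in child_names}
        self.restarts = 0

    def tell(self, child_name, message):
        child = self.children[child_name]
        try:
            return child.receive(message)
        except Exception:
            # "Restart" strategy (an assumption for this sketch): replace the
            # failed child with a fresh instance. Its in-memory state is lost;
            # sibling children are left untouched.
            self.children[child_name] = Worker(child_name)
            self.restarts += 1
            return None


if __name__ == "__main__":
    sup = Supervisor(["a", "b"])
    print(sup.tell("a", "event-1"))
    print(sup.tell("a", "boom"))
    print(sup.tell("a", "event-2"))
    print(sup.restarts, sup.children["a"].processed)
```

The failure is contained at the parent: the sender never sees the exception, and the rest of the tree keeps running.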
Technology matrix
Granularity of Data (rows) vs. Granularity of Processing (columns)
Small Big
Small Akka Akka
Gigaspaces
Big ? Storm
System end-to-end throughput
High (~10,000 events/sec): Akka
Medium (~100 events/sec): Storm / Gigaspaces
Low (~10 events/sec): Scripting languages
Big Data in motion
All three are:
Distributed, fault-tolerant, streaming
- Storm
++ multi-language
-- not user/admin friendly
-- slow supervising
processing elements are JVMs
ideal when data is coarse grained
- Akka
++ high throughput, fine grained actors
++ dynamic topologies
-- low-level, but high performance
processing elements are small and lightweight
ideal for millions of transactions per second
- Gigaspaces
++ combines memory + application distribution
-- framework api is not very flexible
processing elements are JVMs
ideal for all-in-one solution, with little customization
Opportunity: Lambda Architecture
Logic layer
Software as a Service
e.g. real-time predictor
from http://www.manning.com/marz/
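The slide only names the Lambda Architecture. As a rough illustration of the idea described in the referenced book, a query merges a periodically recomputed batch view with an incrementally updated realtime view. This is a minimal sketch in Python, assuming a simple per-key event count; `batch_view`, `SpeedLayer`, and `query` are hypothetical names.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute counts per key from all stored events."""
    return Counter(key for key, _timestamp in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events newer than the last batch run."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, key):
        self.realtime_view[key] += 1


def query(key, batch, speed):
    """Serving side: merge the precomputed batch view with the realtime view."""
    return batch[key] + speed.realtime_view[key]


if __name__ == "__main__":
    stored = [("page-a", 1), ("page-b", 2), ("page-a", 3)]
    batch = batch_view(stored)

    speed = SpeedLayer()
    speed.on_event("page-a")  # arrived after the last batch run

    print(query("page-a", batch, speed))
```

The batch path gives completeness at high latency, the streaming path gives freshness, and the merge at query time combines the two.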
Opportunity: Batch + Streaming
[Architecture diagram; recoverable labels: Data Warehouses; Messaging Buses; Batch Computing; Streaming Computing; In-Memory Distributed Databases; Front End Services; low-latency HTTP API services; HTML5 Client / Responsive App; FETCH (refresh); PUSH (SSE, notifications)]
Thanks
linkedin:
www.linkedin.com/in/natalinobusa
blog:
www.natalinobusa.com
twitter:
@natalinobusa

Advantages of Hiring UIUX Design Service Providers for Your Business
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Streaming computing: architectures and technologies

  • 1. Streaming Computing Some thoughts and technology choices for event-driven processing Natalino Busa - 29 Aug. 2013
  • 2. Outline ● Concurrency ● Streaming computing ● Technologies ○ Gigaspaces ○ Storm ○ Akka ● Comparison matrix ● Opportunities
  • 3. Algorithms: a tribute Numbers and Algorithms: 9th century Persian Muslim mathematician Abu Abdullah Muhammad ibn Musa Al-Khwarizmi, whose work built upon that of the 7th century Indian mathematician Brahmagupta. We owe a lot to these guys!
  • 4. Why do we need parallelism? It gets bigger, It doesn’t get much faster BUT We get more cores in a chip. More cores = more parallelism We are happy now, right?
  • 5. Moore’s law Every 18 months, the number of CPU cores doubles. Another interpretation: every 18 months, the number of idle CPU cores doubles.
  • 6. More parallelism We trade: Time vs ( CPU, Memory, I/O)
  • 7. Modern applications Scalability: Vertical: concurrency (use all the cores, memory and I/O of a given machine) Horizontal: distribution (use all the machines in the cluster) High availability: Fault tolerance: all levels (local, distributed) (the terminator effect: you can stop it but can’t kill it )
  • 8. Streaming applications Performance: efficient use of resources: CPU and memory, but also OS threads and sockets. Asynchronous: event driven, reacts to new data. Distributed: more machines = more performance; the algorithm is partitioned and/or replicated on the cluster.
  • 9. What to increase? More CPU: It helps when there is computation involved More MEMORY: It helps when there is more state to keep More I/O: It helps when there are more messages to transfer
  • 10. Streaming or batch? [Diagram: data flows from the source system, through processing in our system, to the target system. Natalino Busa - 12 Feb. 2013] What differentiates streaming from batch? ● Granularity of data ● Granularity of processing. Granularity impacts throughput, latency, and the cost of the system!
  • 11. The choice is yours 1000 events/sec (1 KB/event) running on 100 cores all day long. BATCH (Hadoop): “Wait a day, then process”. 86 M events = 86 GB of data. Latency: 24 hours. Throughput: 1 update/day. STREAMING (Akka): “Do not wait”. Process the 1 KB of data each msec. Latency: 1 ms. Throughput: 1000 updates/sec. “Both are valid options. It depends on the application domain and the requirements/specs of the target and source systems”
  • 12. Mapping it to existing applications (granularity of data / granularity of processing): 256 GB on 1 CPU: traditional DB systems; 256 GB on 100 CPUs: Big Data (Hadoop); 1 KB on 1 CPU: traditional mail server; 1 KB on 100 CPUs: web application server.
  • 15. Technologies: Akka Supervision: tree of actors. Topology (static and dynamic actors). Scaling and distributed processing.
  • 16. Technology matrix Granularity of data vs. granularity of processing: small data, small processing: Akka; small data, big processing: Akka, Gigaspaces; big data, small processing: ?; big data, big processing: Storm. System end-to-end throughput: High (~10,000 events/sec): Akka; Medium (~100 events/sec): Storm/Gigaspaces; Low (~10 events/sec): scripting languages.
  • 17. Big Data in motion All three are: distributed, fault-tolerant, streaming. - Storm: ++ multi-language; -- not user/admin friendly; -- slow supervising; processing elements are JVMs; ideal when data is coarse grained. - Akka: ++ high throughput, fine-grained actors; ++ dynamic topologies; -- low-level, but high performance; processing elements are small and lightweight; ideal for millions of transactions per second. - Gigaspaces: ++ combines memory + application distribution; -- framework API is not very flexible; processing elements are JVMs; ideal for an all-in-one solution with little customization.
  • 18. Opportunity: Lambda Architecture Logic layer: Software as a Service, e.g. real-time predictor. Natalino Busa - 12 Feb. 2013, from http://www.manning.com/marz/
  • 19. Opportunity: Batch + Streaming [Diagram labels: Batch Computing; Streaming Computing; Data Warehouses; Messaging Buses; In-Memory Distributed Database; Front End Services; low-latency HTTP API services; HTML5 Client / Responsive App; FETCH (refresh); PUSH (SSE, notifications)]
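
The batch-versus-streaming arithmetic on slide 11 can be checked with a short Python sketch. It assumes a steady rate of 1000 events/sec, 1 KB taken as 1000 bytes, and an idealized streaming case in which every event is handled as it arrives. The function names (`daily_volume`, `batch_profile`, `streaming_profile`) are illustrative and do not come from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume(events_per_sec, bytes_per_event):
    """Return (events per day, bytes per day) for a steady event rate."""
    events = events_per_sec * SECONDS_PER_DAY
    return events, events * bytes_per_event


def batch_profile(events_per_sec, bytes_per_event):
    """'Wait a day, then process': one large update per day."""
    events, size = daily_volume(events_per_sec, bytes_per_event)
    return {
        "latency_s": SECONDS_PER_DAY,   # results are up to a day old
        "updates_per_day": 1,
        "events_per_update": events,
        "bytes_per_update": size,
    }


def streaming_profile(events_per_sec, bytes_per_event):
    """'Do not wait': one small update per event (idealized assumption)."""
    events, _ = daily_volume(events_per_sec, bytes_per_event)
    return {
        "latency_s": 1 / events_per_sec,  # time between consecutive events
        "updates_per_day": events,
        "events_per_update": 1,
        "bytes_per_update": bytes_per_event,
    }


# Slide 11 inputs: 1000 events/sec, 1 KB/event (1 KB assumed to be 1000 bytes).
batch = batch_profile(1000, 1000)
stream = streaming_profile(1000, 1000)
print(f"batch:     {batch['events_per_update']:,} events, "
      f"{batch['bytes_per_update'] / 1e9:.1f} GB per update")
print(f"streaming: {stream['bytes_per_update']} bytes per update, "
      f"{stream['latency_s'] * 1000:.0f} ms between updates")
```

With the slide's inputs, the batch case accumulates 86.4 million events (about 86 GB) for its single daily update, while the streaming case handles one 1 KB event per millisecond. The total volume is the same in both cases; only the granularity of data and processing changes.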