SlideShare ist ein Scribd-Unternehmen logo
1 von 9
Downloaden Sie, um offline zu lesen
YeezyScore
A comparison of stream
processing software
By: Kat Chuang
@katychuang
10 mins
High level overview
Kat Chuang @katychuang
Batch
Streaming
Microbatching
Storm Trident Spark Streaming
Released 2011 2010
Delivery
Semantics
Exactly Once Exactly once
State Management Yes Yes
Latency Seconds Seconds
Output MapState Resilient Distributed
Dataset (RDD)
Throughput 10k/nodes/sec? 400k/nodes/sec?
Test Cases Metrics
1. Does every message pass
through the pipeline?
2. How fast does each message
take to process?
Data
1. Timestamps
Kat Chuang @katychuang
Timestamp1 (Timestamp1,
Timestamp2)
(Timestamp1,
Timestamp2)
Timestamp1
Pipelines
Kat Chuang @katychuang
1. Does every message pass
through the pipeline?
Kat Chuang @katychuang
This is a scatterplot
2. How fast does each
message take to
process?
Kat Chuang @katychuang
This is a scatterplot
Storm Trident Vs Spark Streaming
Storm Trident Spark Streaming
Stream processing framework
that also does micro-batching.
Great for transforming or
computing as data flows in.
Complex event processing
(CEP), continuous computation.
Task-Parallel Computations, i.
e. reading Twitter streams
Batch processing framework
that also does micro-batching.
Great for combining with
historical data.
ML algos included. Requires
HDFS-backed data source.
Data-Parallel Computations, i.
e. offering recommendations
Kat Chuang
Data Engineering Fellow
#DE-2015c
hello@katychuang.com
Github: katychuang
Twitter: katychuang
IG: katychuang.nyc

Weitere ähnliche Inhalte

Was ist angesagt?

Principles in Data Stream Processing | Matthias J Sax, Confluent
Principles in Data Stream Processing | Matthias J Sax, ConfluentPrinciples in Data Stream Processing | Matthias J Sax, Confluent
Principles in Data Stream Processing | Matthias J Sax, ConfluentHostedbyConfluent
 
VeriFlow Presentation
VeriFlow PresentationVeriFlow Presentation
VeriFlow PresentationKrystle Bates
 
Dataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice WayDataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice WayQAware GmbH
 
Monitoring - deeper dive
Monitoring  - deeper diveMonitoring  - deeper dive
Monitoring - deeper diveRobert Kubiś
 
Reactive programming with examples
Reactive programming with examplesReactive programming with examples
Reactive programming with examplesPeter Lawrey
 
Low latency in java 8 v5
Low latency in java 8 v5Low latency in java 8 v5
Low latency in java 8 v5Peter Lawrey
 
Chronicle accelerate building a digital currency
Chronicle accelerate   building a digital currencyChronicle accelerate   building a digital currency
Chronicle accelerate building a digital currencyPeter Lawrey
 
Dataservices: Processing Big Data the Microservice Way
Dataservices: Processing Big Data the Microservice WayDataservices: Processing Big Data the Microservice Way
Dataservices: Processing Big Data the Microservice WayQAware GmbH
 
Leveraging open source tools to gain insight into OpenStack Swift
Leveraging open source tools to gain insight into OpenStack SwiftLeveraging open source tools to gain insight into OpenStack Swift
Leveraging open source tools to gain insight into OpenStack SwiftDmitry Sotnikov
 
Investigating the Impact of Network Topology on the Processing Times of SDN C...
Investigating the Impact of Network Topology on the Processing Times of SDN C...Investigating the Impact of Network Topology on the Processing Times of SDN C...
Investigating the Impact of Network Topology on the Processing Times of SDN C...Steffen Gebert
 
Logical Clocks (Distributed computing)
Logical Clocks (Distributed computing)Logical Clocks (Distributed computing)
Logical Clocks (Distributed computing)Sri Prasanna
 
Anatomy of an action
Anatomy of an actionAnatomy of an action
Anatomy of an actionGordon Chung
 
An Overview of Distributed Debugging
An Overview of Distributed DebuggingAn Overview of Distributed Debugging
An Overview of Distributed DebuggingAnant Narayanan
 
Metrics lightning talk
Metrics lightning talkMetrics lightning talk
Metrics lightning talkChris Lohfink
 
Batch Indexing & Near Real Time, keeping things fast
Batch Indexing & Near Real Time, keeping things fastBatch Indexing & Near Real Time, keeping things fast
Batch Indexing & Near Real Time, keeping things fastMarc Sturlese
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)Brian Brazil
 
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamAljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamVerverica
 

Was ist angesagt? (20)

Principles in Data Stream Processing | Matthias J Sax, Confluent
Principles in Data Stream Processing | Matthias J Sax, ConfluentPrinciples in Data Stream Processing | Matthias J Sax, Confluent
Principles in Data Stream Processing | Matthias J Sax, Confluent
 
VeriFlow Presentation
VeriFlow PresentationVeriFlow Presentation
VeriFlow Presentation
 
Dataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice WayDataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice Way
 
Monitoring - deeper dive
Monitoring  - deeper diveMonitoring  - deeper dive
Monitoring - deeper dive
 
Reactive programming with examples
Reactive programming with examplesReactive programming with examples
Reactive programming with examples
 
Low latency in java 8 v5
Low latency in java 8 v5Low latency in java 8 v5
Low latency in java 8 v5
 
Chronicle accelerate building a digital currency
Chronicle accelerate   building a digital currencyChronicle accelerate   building a digital currency
Chronicle accelerate building a digital currency
 
Clocks
ClocksClocks
Clocks
 
Dataservices: Processing Big Data the Microservice Way
Dataservices: Processing Big Data the Microservice WayDataservices: Processing Big Data the Microservice Way
Dataservices: Processing Big Data the Microservice Way
 
Leveraging open source tools to gain insight into OpenStack Swift
Leveraging open source tools to gain insight into OpenStack SwiftLeveraging open source tools to gain insight into OpenStack Swift
Leveraging open source tools to gain insight into OpenStack Swift
 
Spanner osdi2012
Spanner osdi2012Spanner osdi2012
Spanner osdi2012
 
Investigating the Impact of Network Topology on the Processing Times of SDN C...
Investigating the Impact of Network Topology on the Processing Times of SDN C...Investigating the Impact of Network Topology on the Processing Times of SDN C...
Investigating the Impact of Network Topology on the Processing Times of SDN C...
 
Logical Clocks (Distributed computing)
Logical Clocks (Distributed computing)Logical Clocks (Distributed computing)
Logical Clocks (Distributed computing)
 
Anatomy of an action
Anatomy of an actionAnatomy of an action
Anatomy of an action
 
An Overview of Distributed Debugging
An Overview of Distributed DebuggingAn Overview of Distributed Debugging
An Overview of Distributed Debugging
 
Metrics lightning talk
Metrics lightning talkMetrics lightning talk
Metrics lightning talk
 
Batch Indexing & Near Real Time, keeping things fast
Batch Indexing & Near Real Time, keeping things fastBatch Indexing & Near Real Time, keeping things fast
Batch Indexing & Near Real Time, keeping things fast
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)
 
Bathcamp 2010-riak
Bathcamp 2010-riakBathcamp 2010-riak
Bathcamp 2010-riak
 
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamAljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
 

Andere mochten auch

Comparison of supportive interactions
Comparison of supportive interactionsComparison of supportive interactions
Comparison of supportive interactionsKat Chuang
 
DjangoCon 2013 - Rapid prototyping and communicating with clients
DjangoCon 2013 - Rapid prototyping and communicating with clientsDjangoCon 2013 - Rapid prototyping and communicating with clients
DjangoCon 2013 - Rapid prototyping and communicating with clientsKat Chuang
 
Dissertation defense
Dissertation defenseDissertation defense
Dissertation defenseKat Chuang
 
NYC Pyladies talk May 2, 2013
NYC Pyladies talk May 2, 2013NYC Pyladies talk May 2, 2013
NYC Pyladies talk May 2, 2013Kat Chuang
 
Python talk web frameworks
Python talk web frameworksPython talk web frameworks
Python talk web frameworksKat Chuang
 

Andere mochten auch (6)

Seven
SevenSeven
Seven
 
Comparison of supportive interactions
Comparison of supportive interactionsComparison of supportive interactions
Comparison of supportive interactions
 
DjangoCon 2013 - Rapid prototyping and communicating with clients
DjangoCon 2013 - Rapid prototyping and communicating with clientsDjangoCon 2013 - Rapid prototyping and communicating with clients
DjangoCon 2013 - Rapid prototyping and communicating with clients
 
Dissertation defense
Dissertation defenseDissertation defense
Dissertation defense
 
NYC Pyladies talk May 2, 2013
NYC Pyladies talk May 2, 2013NYC Pyladies talk May 2, 2013
NYC Pyladies talk May 2, 2013
 
Python talk web frameworks
Python talk web frameworksPython talk web frameworks
Python talk web frameworks
 

Ähnlich wie Insight DE project

strata_spark_streaming.ppt
strata_spark_streaming.pptstrata_spark_streaming.ppt
strata_spark_streaming.pptrveiga100
 
The Pill for Your Migration Hell
The Pill for Your Migration HellThe Pill for Your Migration Hell
The Pill for Your Migration HellDatabricks
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Summit
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineMonal Daxini
 
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das Databricks
 
The Future of Real-Time in Spark
The Future of Real-Time in SparkThe Future of Real-Time in Spark
The Future of Real-Time in SparkReynold Xin
 
The Future of Real-Time in Spark
The Future of Real-Time in SparkThe Future of Real-Time in Spark
The Future of Real-Time in SparkDatabricks
 
Tecnicas e Instrumentos de Recoleccion de Datos
Tecnicas e Instrumentos de Recoleccion de DatosTecnicas e Instrumentos de Recoleccion de Datos
Tecnicas e Instrumentos de Recoleccion de DatosAngel Giraldo
 
Spanning Tree Protocol (STP)
Spanning Tree Protocol (STP)Spanning Tree Protocol (STP)
Spanning Tree Protocol (STP)NetProtocol Xpert
 
strata_spark_streaming.ppt
strata_spark_streaming.pptstrata_spark_streaming.ppt
strata_spark_streaming.pptAbhijitManna19
 
strata_spark_streaming.ppt
strata_spark_streaming.pptstrata_spark_streaming.ppt
strata_spark_streaming.pptsnowflakebatch
 
strata spark streaming strata spark streamingsrata spark streaming
strata spark streaming strata spark streamingsrata spark streamingstrata spark streaming strata spark streamingsrata spark streaming
strata spark streaming strata spark streamingsrata spark streamingShidrokhGoudarzi1
 
A Framework to Evaluate the Effectiveness of Different Load Testing Analysis ...
A Framework to Evaluate the Effectiveness of Different Load Testing Analysis ...A Framework to Evaluate the Effectiveness of Different Load Testing Analysis ...
A Framework to Evaluate the Effectiveness of Different Load Testing Analysis ...Zhen Ming (Jack) Jiang
 
An Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed DatabaseAn Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed DatabaseBenjamin Bengfort
 
stream-processing-at-linkedin-with-apache-samza
stream-processing-at-linkedin-with-apache-samzastream-processing-at-linkedin-with-apache-samza
stream-processing-at-linkedin-with-apache-samzaAbhishek Shivanna
 
Debunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingDebunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingKostas Tzoumas
 
JConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with JavaJConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with JavaTimothy Spann
 

Ähnlich wie Insight DE project (20)

strata_spark_streaming.ppt
strata_spark_streaming.pptstrata_spark_streaming.ppt
strata_spark_streaming.ppt
 
The Pill for Your Migration Hell
The Pill for Your Migration HellThe Pill for Your Migration Hell
The Pill for Your Migration Hell
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike Freedman
 
Malstone KDD 2010
Malstone KDD 2010Malstone KDD 2010
Malstone KDD 2010
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipeline
 
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
 
The Future of Real-Time in Spark
The Future of Real-Time in SparkThe Future of Real-Time in Spark
The Future of Real-Time in Spark
 
The Future of Real-Time in Spark
The Future of Real-Time in SparkThe Future of Real-Time in Spark
The Future of Real-Time in Spark
 
Tecnicas e Instrumentos de Recoleccion de Datos
Tecnicas e Instrumentos de Recoleccion de DatosTecnicas e Instrumentos de Recoleccion de Datos
Tecnicas e Instrumentos de Recoleccion de Datos
 
Spanning Tree Protocol (STP)
Spanning Tree Protocol (STP)Spanning Tree Protocol (STP)
Spanning Tree Protocol (STP)
 
strata_spark_streaming.ppt
strata_spark_streaming.pptstrata_spark_streaming.ppt
strata_spark_streaming.ppt
 
strata_spark_streaming.ppt
strata_spark_streaming.pptstrata_spark_streaming.ppt
strata_spark_streaming.ppt
 
strata spark streaming strata spark streamingsrata spark streaming
strata spark streaming strata spark streamingsrata spark streamingstrata spark streaming strata spark streamingsrata spark streaming
strata spark streaming strata spark streamingsrata spark streaming
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
 
A Framework to Evaluate the Effectiveness of Different Load Testing Analysis ...
A Framework to Evaluate the Effectiveness of Different Load Testing Analysis ...A Framework to Evaluate the Effectiveness of Different Load Testing Analysis ...
A Framework to Evaluate the Effectiveness of Different Load Testing Analysis ...
 
An Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed DatabaseAn Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed Database
 
stream-processing-at-linkedin-with-apache-samza
stream-processing-at-linkedin-with-apache-samzastream-processing-at-linkedin-with-apache-samza
stream-processing-at-linkedin-with-apache-samza
 
Debunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingDebunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream Processing
 
JConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with JavaJConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with Java
 

Mehr von Kat Chuang

Android NFC Nuance
Android NFC NuanceAndroid NFC Nuance
Android NFC NuanceKat Chuang
 
rheumatological diseases
rheumatological diseasesrheumatological diseases
rheumatological diseasesKat Chuang
 
Mayans and Chocolate
Mayans and ChocolateMayans and Chocolate
Mayans and ChocolateKat Chuang
 
Nurturant Support in Online Health Social Networking
Nurturant Support in Online Health Social NetworkingNurturant Support in Online Health Social Networking
Nurturant Support in Online Health Social NetworkingKat Chuang
 
Revolutionizing the hypatia metadata experience
Revolutionizing the hypatia metadata experienceRevolutionizing the hypatia metadata experience
Revolutionizing the hypatia metadata experienceKat Chuang
 
Candidacy Exam
Candidacy ExamCandidacy Exam
Candidacy ExamKat Chuang
 
How to become a top notch scholar
How to become a top notch scholarHow to become a top notch scholar
How to become a top notch scholarKat Chuang
 
Helping you to help me (slides)
Helping you to help me (slides)Helping you to help me (slides)
Helping you to help me (slides)Kat Chuang
 
Helping you to help me
Helping you to help meHelping you to help me
Helping you to help meKat Chuang
 

Mehr von Kat Chuang (10)

Android NFC Nuance
Android NFC NuanceAndroid NFC Nuance
Android NFC Nuance
 
rheumatological diseases
rheumatological diseasesrheumatological diseases
rheumatological diseases
 
Mayans and Chocolate
Mayans and ChocolateMayans and Chocolate
Mayans and Chocolate
 
Nurturant Support in Online Health Social Networking
Nurturant Support in Online Health Social NetworkingNurturant Support in Online Health Social Networking
Nurturant Support in Online Health Social Networking
 
Revolutionizing the hypatia metadata experience
Revolutionizing the hypatia metadata experienceRevolutionizing the hypatia metadata experience
Revolutionizing the hypatia metadata experience
 
Candidacy Exam
Candidacy ExamCandidacy Exam
Candidacy Exam
 
How to become a top notch scholar
How to become a top notch scholarHow to become a top notch scholar
How to become a top notch scholar
 
Prospectus
Prospectus Prospectus
Prospectus
 
Helping you to help me (slides)
Helping you to help me (slides)Helping you to help me (slides)
Helping you to help me (slides)
 
Helping you to help me
Helping you to help meHelping you to help me
Helping you to help me
 

Kürzlich hochgeladen

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Kürzlich hochgeladen (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Insight DE project

  • 1. YeezyScore A comparison of stream processing software By: Kat Chuang @katychuang
  • 3. High level overview Kat Chuang @katychuang Batch Streaming Microbatching Storm Trident Spark Streaming Released 2011 2010 Delivery Semantics Exactly Once Exactly once State Management Yes Yes Latency Seconds Seconds Output MapState Resilient Distributed Dataset (RDD) Throughput 10k/nodes/sec? 400k/nodes/sec?
  • 4. Test Cases Metrics 1. Does every message pass through the pipeline? 2. How fast does each message take to process? Data 1. Timestamps Kat Chuang @katychuang
  • 6. 1. Does every message pass through the pipeline? Kat Chuang @katychuang This is a scatterplot
  • 7. 2. How fast does each message take to process? Kat Chuang @katychuang This is a scatterplot
  • 8. Storm Trident Vs Spark Streaming Storm Trident Spark Streaming Stream processing framework that also does micro-batching. Great for transforming or computing as data flows in. Complex event processing (CEP), continuous computation. Task-Parallel Computations, i. e. reading Twitter streams Batch processing framework that also does micro-batching. Great for combining with historical data. ML algos included. Requires HDFS-backed data source. Data-Parallel Computations, i. e. offering recommendations
  • 9. Kat Chuang Data Engineering Fellow #DE-2015c hello@katychuang.com Github: katychuang Twitter: katychuang IG: katychuang.nyc