Scale of Event Processing at LinkedIn
Apache Kafka
• 2.1 trillion messages ingested per day
• 0.5 PB in, 2 PB out per day (compressed)
• Peaks of 16 million messages/sec
Apache Samza
• Over 500 applications running in production on 10,000+ containers
• Applications with several TB of local state
Why Apache Samza?
Best-in-Class Support for Stateful Stream Processing
• Incremental checkpointing for large state and fast recovery
• Local state that works seamlessly across upgrades and failures
• Asynchronous processing for efficient remote I/O
Hardened at Internet Scale
• In use at LinkedIn, Uber, Netflix, Intuit, Metamarkets, TripAdvisor, VMware, Optimizely, Redfin, and others
• Processes events from Kafka, Kinesis, EventHub, HDFS, ZeroMQ, DynamoDB Streams, MongoDB, Databus, Brooklin, and more
Unified API for Stream and Batch Processing
• Process data in streams or in Hadoop without any code changes
Run as a Service or a Library
• Write once, run anywhere
• Deploy in a managed cluster, or embed as a library in another application
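The asynchronous-processing point can be illustrated without Samza itself. The sketch below is plain Java, not Samza's `AsyncStreamTask` API: `remoteLookup` is a hypothetical stand-in for a remote call with a simulated 50 ms latency, and a `Semaphore` caps the number of outstanding calls, similar in spirit to Samza's `task.max.concurrency` setting. With 8 lookups and 4 allowed in flight, the run should take roughly two round trips rather than eight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class AsyncEnrichmentSketch {
    // Hypothetical remote call: stands in for a REST or database lookup.
    static CompletableFuture<String> remoteLookup(int memberId, ExecutorService io) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(50); // simulated network latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "profile-" + memberId;
        }, io);
    }

    // Issues lookups without waiting for each one to finish; at most maxInFlight
    // calls are outstanding at any time. Results keep the input order.
    static List<String> processAll(List<Integer> memberIds, int maxInFlight) throws InterruptedException {
        ExecutorService io = Executors.newFixedThreadPool(maxInFlight);
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<String>> pending = new ArrayList<>();
        try {
            for (int id : memberIds) {
                permits.acquire(); // back-pressure once maxInFlight calls are pending
                pending.add(remoteLookup(id, io).whenComplete((result, error) -> permits.release()));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> f : pending) {
                results.add(f.join());
            }
            return results;
        } finally {
            io.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        List<String> out = processAll(List.of(1, 2, 3, 4, 5, 6, 7, 8), 4);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(out + " in ~" + elapsedMs + " ms");
    }
}
```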
Stream (Data in Motion) Processing
• Click stream processing, interactive user feeds
• Security, fraud detection
• Application monitoring
• Internet of Things
• Ads, gaming, trading, etc.
Multi-Stage Dataflow Example
[Diagram: Page View (input stream) → Repartition by member id → Window → Map → SendTo → Page View per Member (output stream)]
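// Illustrative pre-1.0 Samza high-level API as presented on the slide; exact signatures varied between releases.
// PageViewEvent and MyStreamOutput are application classes not shown on the slide.
import java.time.Duration;
import org.apache.samza.application.StreamApplication;
import org.apache.samza.config.Config;
import org.apache.samza.operators.MessageStream;
import org.apache.samza.operators.OutputStream;
import org.apache.samza.operators.StreamGraph;
import org.apache.samza.operators.windows.Windows;

public class PageViewCountApplication implements StreamApplication {
  @Override public void init(StreamGraph graph, Config config) {
    MessageStream<PageViewEvent> pageViewEvents = graph.getInputStream("pageViewStream");
    OutputStream<MyStreamOutput> pageViewPerMember = graph.getOutputStream("pageViewPerMemberStream");
    pageViewEvents
      .partitionBy(m -> m.memberId)                 // repartition so each member's events share a partition
      .window(Windows.keyedTumblingWindow(m -> m.memberId, Duration.ofMinutes(5),
          () -> 0, (m, c) -> c + 1))                // 5-minute tumbling window: start at 0, add 1 per view
      .map(MyStreamOutput::new)                     // turn each window result into an output record
      .sendTo(pageViewPerMember);
  }
}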
partitionBy, window, map, and sendTo are built-in transform functions.
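To make the repartition and window steps concrete, here is a plain-Java sketch of what they compute, not how Samza executes them. It assumes a hypothetical `PageView` record (Java 16+), hashes the member id to pick a partition (Kafka's default partitioner hashes the serialized key with murmur2 instead), and counts views per member in 5-minute tumbling windows. It processes a finite list in one pass, whereas Samza emits a result as each window closes.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowCount {
    // Hypothetical event type; only the fields this example needs.
    record PageView(String memberId, long timestampMs) {}

    // Hash partitioning by key: every event for a given member lands in the same partition.
    static int partitionFor(String memberId, int numPartitions) {
        return Math.floorMod(memberId.hashCode(), numPartitions);
    }

    // Counts views per member in fixed, non-overlapping windows of the given size.
    // Result: window start (ms) -> member id -> count.
    static Map<Long, Map<String, Integer>> countPerMember(List<PageView> events, Duration size) {
        long sizeMs = size.toMillis();
        Map<Long, Map<String, Integer>> counts = new TreeMap<>();
        for (PageView pv : events) {
            long windowStart = Math.floorDiv(pv.timestampMs(), sizeMs) * sizeMs;
            counts.computeIfAbsent(windowStart, w -> new TreeMap<>())
                  .merge(pv.memberId(), 1, Integer::sum); // same as starting at 0 and applying c + 1
        }
        return counts;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        List<PageView> views = List.of(
                new PageView("alice", 0),
                new PageView("bob", minute),
                new PageView("alice", 4 * minute),
                new PageView("alice", 5 * minute), // first event of the second window
                new PageView("bob", 9 * minute));
        System.out.println(countPerMember(views, Duration.ofMinutes(5)));
        // {0={alice=2, bob=1}, 300000={alice=1, bob=1}}
        System.out.println("alice -> partition " + partitionFor("alice", 8));
    }
}
```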
Stream Application in Batch
Application logic: count the number of page views for each member in a 5-minute window and send the counts to the Page View per Member stream.
[Diagram: the same dataflow, Page View (input) → Repartition by member id → Window → Map → SendTo → Page View per Member (output), with both streams backed by HDFS]
• PageView: hdfs://mydbsnapshot/PageViewFiles/
• PageViewPerMember: hdfs://myoutputdb/PageViewPerMemberFiles
Zero code changes
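As a rough illustration of "zero code changes", the stream IDs used in `init` can be pointed at HDFS purely through configuration. The fragment below is a sketch assuming Samza's per-stream `streams.<stream-id>.samza.system` and `streams.<stream-id>.samza.physical.name` settings and the `HdfsSystemFactory` from the samza-hdfs module; the system name `hdfs` is an arbitrary choice, and a working batch job needs further HDFS reader and writer settings that are not shown here.

```properties
# Hypothetical system name "hdfs", backed by the samza-hdfs system factory.
systems.hdfs.samza.factory=org.apache.samza.system.hdfs.HdfsSystemFactory

# Bind the logical stream IDs from PageViewCountApplication to HDFS locations.
streams.pageViewStream.samza.system=hdfs
streams.pageViewStream.samza.physical.name=hdfs://mydbsnapshot/PageViewFiles/
streams.pageViewPerMemberStream.samza.system=hdfs
streams.pageViewPerMemberStream.samza.physical.name=hdfs://myoutputdb/PageViewPerMemberFiles
```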
Stream Processing as a Library
[Diagram: Page View → Repartition by member id → Window → Map → SendTo → Page View per Member]
Launch the stream processor:
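import joptsimple.OptionSet;
import org.apache.samza.config.Config;
import org.apache.samza.runtime.LocalApplicationRunner;
import org.apache.samza.util.CommandLine;

public static void main(String[] args) {
  // Parse the command-line options (--config-factory, --config-path) and load the job config.
  CommandLine cmdLine = new CommandLine();
  OptionSet options = cmdLine.parser().parse(args);
  Config config = cmdLine.loadConfig(options);
  // Run the application embedded in this JVM instead of on a managed cluster.
  LocalApplicationRunner runner = new LocalApplicationRunner(config);
  PageViewCountApplication app = new PageViewCountApplication();
  runner.run(app);
  runner.waitForFinish(); // block until the processor shuts down
}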
Configuration for ZooKeeper-based coordination:
job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
job.coordinator.zk.connect=my-zk.server:2191
Zero code changes
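In library mode, several embedded processors share one job, and the ZooKeeper-based coordinator has them agree on who owns which input partitions. The sketch below models only the assignment step under an assumed round-robin policy over a sorted list of processor IDs; it is not Samza's actual grouper, and the processor IDs are hypothetical. Sorting first means every processor that sees the same membership computes the same answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionAssignmentSketch {
    // Deterministically spreads partitions 0..numPartitions-1 over the live processors.
    static Map<String, List<Integer>> assign(List<String> processorIds, int numPartitions) {
        if (processorIds.isEmpty()) {
            throw new IllegalArgumentException("need at least one processor");
        }
        List<String> sorted = new ArrayList<>(processorIds);
        sorted.sort(null); // natural order, so the result does not depend on input order
        Map<String, List<Integer>> assignment = new TreeMap<>();
        for (String id : sorted) {
            assignment.put(id, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(sorted.get(p % sorted.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(List.of("proc-b", "proc-a"), 5));
        // {proc-a=[0, 2, 4], proc-b=[1, 3]}
        // If proc-b leaves, the coordinator recomputes and proc-a takes every partition.
        System.out.println(assign(List.of("proc-a"), 5));
        // {proc-a=[0, 1, 2, 3, 4]}
    }
}
```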
Data Ingestion at LinkedIn
[Diagram components: clients (browsers, devices), services tier, Espresso, Oracle, AWS Kinesis, Azure EventHub, Brooklin, Apache Kafka; stages: ingestion, then real-time processing with Apache Samza]
Backup
Local State: Throughput
• Remote state: 30-150x lower throughput than local state
• On-disk local state with caching: comparable to in-memory state
• Changelog: adds minimal overhead
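The changelog result is easier to interpret with the mechanism in mind: reads and writes hit a local store, and each write is also appended to a changelog from which the store can be rebuilt. The sketch below is an in-memory stand-in, assuming a `TreeMap` for the local store (Samza typically uses RocksDB on disk) and a `List` for the changelog (a compacted Kafka topic in practice); the class and record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ChangelogStoreSketch {
    // One changelog record; a null value marks a deletion (a tombstone).
    record Change(String key, String value) {}

    private final Map<String, String> local = new TreeMap<>(); // stand-in for the on-disk store
    private final List<Change> changelog;                      // stand-in for a compacted topic

    ChangelogStoreSketch(List<Change> changelog) {
        this.changelog = changelog;
    }

    // Reads are served locally: no remote round trip.
    String get(String key) {
        return local.get(key);
    }

    // Writes update local state and append one record to the changelog.
    void put(String key, String value) {
        local.put(key, value);
        changelog.add(new Change(key, value));
    }

    void delete(String key) {
        local.remove(key);
        changelog.add(new Change(key, null));
    }

    // Rebuilds a store from scratch by replaying the changelog in order.
    static ChangelogStoreSketch restore(List<Change> changelog) {
        ChangelogStoreSketch store = new ChangelogStoreSketch(changelog);
        for (Change c : changelog) {
            if (c.value() == null) store.local.remove(c.key());
            else store.local.put(c.key(), c.value());
        }
        return store;
    }

    public static void main(String[] args) {
        List<Change> log = new ArrayList<>();
        ChangelogStoreSketch store = new ChangelogStoreSketch(log);
        store.put("alice", "3");
        store.put("bob", "1");
        store.put("alice", "4");
        store.delete("bob");
        ChangelogStoreSketch restored = restore(log);
        System.out.println(restored.get("alice") + " " + restored.get("bob")); // 4 null
    }
}
```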
Failure Recovery
• Host affinity: roughly constant recovery overhead
• Parallel recovery: recovery time stays the same regardless of the number of failed containers
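One way to read the host-affinity result: if a restarted container lands on the host that still holds its on-disk store, it only has to replay the changelog records written after that store was last flushed, instead of the whole changelog. The sketch below models just that replay arithmetic; the `restoreFrom` helper, the `Change` record, and the offsets are hypothetical, and the replayed-record count is only a proxy for recovery time.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HostAffinityRestoreSketch {
    // One changelog record; a null value marks a deletion.
    record Change(String key, String value) {}

    // Replays changelog entries from fromOffset onward into the given store and
    // returns how many entries were replayed.
    static int restoreFrom(Map<String, String> store, List<Change> changelog, int fromOffset) {
        int replayed = 0;
        for (int offset = fromOffset; offset < changelog.size(); offset++) {
            Change c = changelog.get(offset);
            if (c.value() == null) store.remove(c.key());
            else store.put(c.key(), c.value());
            replayed++;
        }
        return replayed;
    }

    public static void main(String[] args) {
        List<Change> changelog = List.of(
                new Change("a", "1"), new Change("b", "2"), new Change("a", "3"),
                new Change("c", "4"), new Change("b", "5"));

        // Without host affinity: a new host starts from an empty store and replays everything.
        Map<String, String> fresh = new TreeMap<>();
        int fullReplay = restoreFrom(fresh, changelog, 0);

        // With host affinity: the old store survives on disk holding every change before offset 3.
        Map<String, String> surviving = new TreeMap<>(Map.of("a", "3", "b", "2"));
        int deltaReplay = restoreFrom(surviving, changelog, 3);

        System.out.println(fullReplay + " vs " + deltaReplay + ", same state: " + fresh.equals(surviving));
        // 5 vs 2, same state: true
    }
}
```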
Samza HDFS Benchmark
• Workload: profile count, grouped by country
• Input: 500 files, 250 GB
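For reference, the benchmark's computation (count profiles, grouped by country) reduces to a group-by count. The sketch below expresses it with Java streams over an in-memory list; the `Profile` record and its fields are hypothetical, and the sketch says nothing about how Samza reads or parallelizes the 500 HDFS input files.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ProfileCountByCountry {
    // Hypothetical profile record; only the fields the query needs.
    record Profile(long memberId, String country) {}

    // Equivalent of: SELECT country, COUNT(*) FROM profiles GROUP BY country
    static Map<String, Long> countByCountry(List<Profile> profiles) {
        return profiles.stream()
                .collect(Collectors.groupingBy(Profile::country, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Profile> profiles = List.of(
                new Profile(1, "US"), new Profile(2, "IN"), new Profile(3, "US"), new Profile(4, "DE"));
        System.out.println(countByCountry(profiles)); // {DE=1, IN=1, US=2}
    }
}
```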

Samza Demo @scale 2017