© 2016 MapR Technologies. MapR Confidential.
CEP - A Simplified Enterprise Architecture
for Real-time Stream Processing
Mathieu Dumoulin, Data Engineer (mdumoulin@mapr.com, @lordxar)
Mathieu Dumoulin
• Living in Tokyo, Japan for the last 3 years
• Data Engineer for MapR Professional Services
• Other jobs: Data Scientist, Search Engineer
• Connect with me:
–Read my blog posts:
https://www.mapr.com/blog/author/mathieu-dumoulin
–Twitter: @Lordxar
–Email: mdumoulin@mapr.com
Content Summary
1. Complex Event Processing
2. Streaming Architecture
3. Rules Engines for CEP
4. Simplified Hadoop-based CEP Architecture
5. Live Demo
6. Does it scale?
7. Conclusion
Complex Event Processing (CEP)
Some terminology:
• Event: Data with a timestamp (a log event, a transaction, ...)
• Event processing: Track and analyze streaming event data
• Complex event processing: Identify meaningful events and respond to them as quickly as possible, usually over a sliding window on the stream of event data.
CEP is just a fancy way to do
business rules on streaming data
IoT: Needs some CEP in There Somewhere
CEP in Action
The power of CEP comes from being able to detect complex situations that could not be detected from any individual data point directly.
[Figure: example sensor events: window opened, motion sensor, light turned on, door opened]
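As a rough illustration of detecting a situation that no single event reveals, the sketch below raises an alert only when all four sensor events occur close together in time. The event names, the 60-second window, and the function detect_complex_events are assumptions made for this example; they are not from the talk or from any CEP product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str         # e.g. "window_opened" (hypothetical event name)
    timestamp: float  # seconds since an arbitrary start time


# Assumed composite pattern: every kind below must be seen within the window.
PATTERN = {"window_opened", "motion", "light_on", "door_opened"}
WINDOW_SECONDS = 60.0


def detect_complex_events(events):
    """Return the timestamps at which the whole pattern was seen within the window."""
    last_seen = {}  # kind -> most recent timestamp
    alerts = []
    for event in sorted(events, key=lambda e: e.timestamp):
        last_seen[event.kind] = event.timestamp
        # Kinds whose latest occurrence is still inside the sliding time window.
        recent = {kind for kind, ts in last_seen.items()
                  if event.timestamp - ts <= WINDOW_SECONDS}
        if PATTERN <= recent:
            alerts.append(event.timestamp)
    return alerts


events = [
    Event("window_opened", 0.0),
    Event("motion", 10.0),
    Event("light_on", 20.0),
    Event("door_opened", 30.0),
    Event("motion", 200.0),  # isolated event, long after the others
]
alerts = detect_complex_events(events)
```

Each event taken alone is unremarkable; only the combination inside the window produces an alert.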
Actually, CEP Has Been Around For a While
Taken from the March 2010 issue of the Dutch Java Magazine
Technology Has Been Holding Rule Engines Back
• Rule engines are not new
– First papers from the 1990s, many implementations in the early 2000s
• The engine runs in memory on a single node
– A few GB of memory (or less) was a severe limitation
– A single-core CPU can only do so much
• Need modern stream messaging (Kafka, MapR Streams)
– Need persistence
– Need speed
• No standard, no dominant sponsor
– The 1990s and early 2000s were dominated by Microsoft
– OSS had not come of age in enterprise IT
CEP in a Modern Enterprise Data Pipeline
Source: Oracle / Rittman Mead Information Management Reference Architecture
Modern Streaming Architecture
• Build flexible systems
– More efficient and easier to build
– Decoupled dependencies
• Better model the way business processes take place.
• More value now
– Aggregates data from many sources once
– Serves data to one or many projects immediately
• More value later
– Run batch analytics on the data later
– Reprocess the data with different algorithms later
Kafka-esque Messaging for Rule Engines
• Stream persistence is a key feature
• CEP is only one use case
– Support batch analytics and ad-hoc analysis from the same data stream
• Compensate for current rule engine limitations
– Enables hot replacement for fault tolerance
– Enables simple horizontal scaling by partitioning data and rules
• Convergence
– Run this use case on your existing, standard, big data technology
– Use OSS frameworks and Open APIs
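To make the link between stream persistence and hot replacement concrete, here is a minimal sketch under stated assumptions: PersistedStream is a toy in-memory append-only log standing in for a Kafka or MapR Streams topic, and CountingConsumer is a hypothetical stand-in for a rule engine session. Neither is a real client API.

```python
class PersistedStream:
    """Toy append-only log; a stand-in for a persisted topic, not a real client."""

    def __init__(self):
        self._log = []

    def append(self, message):
        self._log.append(message)
        return len(self._log) - 1  # offset of the message just written

    def read_from(self, offset):
        # Return a copy of every message at or after the given offset.
        return self._log[offset:]


class CountingConsumer:
    """Hypothetical rule-engine stand-in that counts "alarm" messages."""

    def __init__(self, stream):
        self.stream = stream
        self.next_offset = 0
        self.alarm_count = 0

    def poll(self):
        for message in self.stream.read_from(self.next_offset):
            if message == "alarm":
                self.alarm_count += 1
            self.next_offset += 1


stream = PersistedStream()
for message in ["ok", "alarm", "ok", "alarm"]:
    stream.append(message)

primary = CountingConsumer(stream)
primary.poll()

# Suppose the primary fails here. Because the stream keeps its messages,
# a replacement can replay from offset 0 and rebuild the same in-memory state.
stream.append("alarm")
replacement = CountingConsumer(stream)
replacement.poll()
```

The in-memory state of the engine is disposable as long as the input that produced it can be replayed.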
“Most CEP in IoT [...] is custom coded [...] rather than [using a] general purpose stream platform.”
Roy Schulte, vice president, Gartner
See: Complex Event Processing and The Future Of Business Decisions by David Luckham and W. Roy Schulte
Custom Coded CEP: The Good and The Bad
The Good:
• Made to order with a modern framework
• “No limit” to potential for performance and scalability
• Fit to purpose technology
The Bad:
• Engineers aren’t business domain experts
• Lots of work to build from scratch every time
• Changes to logic are a pain point (from the business side)
• Lack of available talent/organizational capability
Declarative Makes Sense For Business
Manage complex behavior through simple rules working together, executed by a rules engine.
An Open Source Rule Engine: Drools
Drools is a business rule management system (BRMS) with a forward- and backward-chaining, inference-based rules engine.
• Project homepage: http://www.drools.org/
• Developer: Red Hat
• Enterprise-supported version available
– JBoss Enterprise BRMS
• Enhanced implementation of the Rete algorithm
– A state-of-the-art algorithm for rules engines
• Has a GUI rules editor: Workbench
An Open Source Rule Engine: Drools
[Diagram: rule engine components: Domain Expert, Rules Editor, Production Memory (Rules), Working Memory (Facts), Pattern Matcher, Agenda, Actions]
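The components named in the diagram can be sketched as a toy forward-chaining loop. This is only an illustration: it uses a naive matcher that rescans every fact on each pass, not the Rete algorithm, and it is not Drools code. The Rule class, run_engine, and the two sample rules are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]     # tested against a single fact
    action: Callable[[dict, list], None]  # may insert new facts


def run_engine(rules, facts):
    """Fire rules until no new activation remains; return (facts, firing log)."""
    working_memory = list(facts)  # facts
    fired = set()                 # (rule name, fact index) pairs already fired
    log = []
    while True:
        # Pattern matcher: collect rule/fact activations that have not fired yet.
        agenda = [(rule, i)
                  for rule in rules  # production memory
                  for i, fact in enumerate(working_memory)
                  if (rule.name, i) not in fired and rule.condition(fact)]
        if not agenda:
            return working_memory, log
        rule, i = agenda[0]       # fire the first activation on the agenda
        fired.add((rule.name, i))
        log.append(rule.name)
        rule.action(working_memory[i], working_memory)


rules = [
    Rule("high_temperature",
         lambda f: f.get("type") == "reading" and f["celsius"] > 80,
         lambda f, wm: wm.append({"type": "alert", "sensor": f["sensor"]})),
    Rule("notify",
         lambda f: f.get("type") == "alert",
         lambda f, wm: wm.append({"type": "notification", "sensor": f["sensor"]})),
]
facts = [
    {"type": "reading", "sensor": "s1", "celsius": 95},
    {"type": "reading", "sensor": "s2", "celsius": 20},
]
final_facts, firing_log = run_engine(rules, facts)
```

The second rule fires only because the first one inserted an alert fact, which is the forward-chaining behavior the diagram describes.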
CEP in Drools: Stateful Session + Sliding Window
• Stateless session
– Rule: Is the ball red?
• Stateful session
– Rule: Are there 2+ red balls in the last 4 balls I’ve seen?
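The difference between the two rules can be sketched in a few lines. The stateless rule needs only the current ball, while the stateful rule must remember the last four. Drools would express this declaratively in its own rule language; the Python below is only an illustration, and the names is_red and RedBallWindow are made up for this example.

```python
from collections import deque


def is_red(ball):
    """Stateless rule: decided from the current fact alone."""
    return ball == "red"


class RedBallWindow:
    """Stateful rule: keeps a sliding window of the most recent balls."""

    def __init__(self, size=4, threshold=2):
        self.window = deque(maxlen=size)  # the oldest ball drops out automatically
        self.threshold = threshold

    def observe(self, ball):
        """Record one ball and report whether the window holds enough red ones."""
        self.window.append(ball)
        red_count = sum(1 for b in self.window if is_red(b))
        return red_count >= self.threshold


balls = ["red", "blue", "blue", "blue", "red", "green", "red"]
session = RedBallWindow()
results = [session.observe(ball) for ball in balls]
```

Here the rule first matches on the seventh ball, when the window holds blue, red, green, red. The first red ball has already slid out of the window by then, which is why the fifth ball does not trigger a match.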
Streaming Architecture for CEP
[Diagram: Sensors (real-time data) → Producer → Distributed Cluster (Kafka, MapR) → Consumer Server (edge node, cluster node) → Integrate with other systems]
The Case for CEP on Streaming Architecture
• Decouple rules maintenance from code and infrastructure
– Manage the cluster separately
– The application code may need only minimal maintenance
• Rules maintenance in the hands of the business domain experts
– Easily supports multiple projects & teams
• Data is persisted in the stream (input and output)
– Open to new use cases
• Send data back to the stream
– Integrate with other downstream use cases
But Does It Scale? Yes, But Only to a Point
• Drools and other rule engines are in-memory, and the memory is not distributed
– This is only a technical limitation that can be overcome (e.g., Alluxio, Apache Ignite)
• Streams make it easy to provide reasonable fault tolerance and quick disaster recovery
• Run multiple servers, split rules logically, fan out data into multiple topics
• A single session can handle 100K+ events/sec. How much scale is needed?
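One way to picture the fan-out idea is to route each event to a topic chosen from a stable hash of its key, so that every event from one sensor reaches the same rule engine instance. This is a sketch under assumptions: the sensor_id field, the topic numbering, and the functions topic_for and fan_out are invented for this example and do not come from the talk or from a Kafka or MapR Streams API.

```python
import zlib
from collections import defaultdict


def topic_for(sensor_id, num_topics):
    """Pick a topic index from a hash that is stable across processes and runs."""
    # zlib.crc32 is deterministic, unlike the built-in hash() for strings,
    # which is randomized per Python process.
    return zlib.crc32(sensor_id.encode("utf-8")) % num_topics


def fan_out(events, num_topics):
    """Group events by topic index; each topic could feed one rule engine server."""
    topics = defaultdict(list)
    for event in events:
        topics[topic_for(event["sensor_id"], num_topics)].append(event)
    return topics


events = [{"sensor_id": f"sensor-{i % 5}", "value": i} for i in range(20)]
topics = fan_out(events, num_topics=3)

# For each sensor, collect the set of topics its events landed in.
topics_per_sensor = defaultdict(set)
for topic_index, topic_events in topics.items():
    for event in topic_events:
        topics_per_sensor[event["sensor_id"]].add(topic_index)
```

Keeping each key on a single topic matters for stateful rules, because a sliding window over one sensor's events only works if a single session sees all of them.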
Live Demo: Smart City Traffic Management
● Try out integration with Spark Streaming and Flink
● Run serious performance benchmarks
● Deploy into production
Recap
• It’s not Rule Engine vs. Spark and Flink Stream processing
– It’s Rules + Stream Processing
– Spark, Flink, and Java are just implementation choices
• Focus on business value from applying rules to data
– Think of benefits of SQL vs. Java, C++, Scala, …
• Great use case for a Streaming Architecture and microservices
An in-depth blog post on this talk's topic will be available on the MapR blog: https://www.mapr.com/blog/author/mathieu-dumoulin
Suggested Reading
● Get Ted & Ellen’s book and many more for free:
○ https://www.mapr.com/ebooks/
● More great blog content about CEP and IoT applications
○ Eric Bruno on LinkedIn
○ Karzel et al. on InfoQ
Q & A
@mapr
mdumoulin@mapr.com
@lordxar
Engage with us!
mapr-technologies

More Related Content

What's hot

Reusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modulesReusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modulesYevgeniy Brikman
 
Isv cloud business readiness assessment
Isv cloud business readiness assessmentIsv cloud business readiness assessment
Isv cloud business readiness assessmentMIS
 
Prometheus
PrometheusPrometheus
Prometheuswyukawa
 
(책 소개) 레거시 코드 활용 전략
(책 소개) 레거시 코드 활용 전략(책 소개) 레거시 코드 활용 전략
(책 소개) 레거시 코드 활용 전략Jay Park
 
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...Red Hat Developers
 
Apache Flink Stream Processing
Apache Flink Stream ProcessingApache Flink Stream Processing
Apache Flink Stream ProcessingSuneel Marthi
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusGrafana Labs
 
DevSecOps: Key Controls for Modern Security Success
DevSecOps: Key Controls for Modern Security SuccessDevSecOps: Key Controls for Modern Security Success
DevSecOps: Key Controls for Modern Security SuccessPuma Security, LLC
 
Openshift Container Platform
Openshift Container PlatformOpenshift Container Platform
Openshift Container PlatformDLT Solutions
 
Loki - like prometheus, but for logs
Loki - like prometheus, but for logsLoki - like prometheus, but for logs
Loki - like prometheus, but for logsJuraj Hantak
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to sparkHome
 
End to-end monitoring with the prometheus operator - Max Inden
End to-end monitoring with the prometheus operator - Max IndenEnd to-end monitoring with the prometheus operator - Max Inden
End to-end monitoring with the prometheus operator - Max IndenParis Container Day
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufVerverica
 
Tracking and improving software quality with SonarQube
Tracking and improving software quality with SonarQubeTracking and improving software quality with SonarQube
Tracking and improving software quality with SonarQubePatroklos Papapetrou (Pat)
 
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...Amazon Web Services
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David AndersonVerverica
 

What's hot (20)

Reusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modulesReusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modules
 
Isv cloud business readiness assessment
Isv cloud business readiness assessmentIsv cloud business readiness assessment
Isv cloud business readiness assessment
 
Prometheus
PrometheusPrometheus
Prometheus
 
(책 소개) 레거시 코드 활용 전략
(책 소개) 레거시 코드 활용 전략(책 소개) 레거시 코드 활용 전략
(책 소개) 레거시 코드 활용 전략
 
Hands-on Helm
Hands-on Helm Hands-on Helm
Hands-on Helm
 
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...
 
Apache Flink Stream Processing
Apache Flink Stream ProcessingApache Flink Stream Processing
Apache Flink Stream Processing
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
DevSecOps: Key Controls for Modern Security Success
DevSecOps: Key Controls for Modern Security SuccessDevSecOps: Key Controls for Modern Security Success
DevSecOps: Key Controls for Modern Security Success
 
Openshift Container Platform
Openshift Container PlatformOpenshift Container Platform
Openshift Container Platform
 
Loki - like prometheus, but for logs
Loki - like prometheus, but for logsLoki - like prometheus, but for logs
Loki - like prometheus, but for logs
 
Terraform
TerraformTerraform
Terraform
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
End to-end monitoring with the prometheus operator - Max Inden
End to-end monitoring with the prometheus operator - Max IndenEnd to-end monitoring with the prometheus operator - Max Inden
End to-end monitoring with the prometheus operator - Max Inden
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
 
Tracking and improving software quality with SonarQube
Tracking and improving software quality with SonarQubeTracking and improving software quality with SonarQube
Tracking and improving software quality with SonarQube
 
DevSecOps: What Why and How : Blackhat 2019
DevSecOps: What Why and How : Blackhat 2019DevSecOps: What Why and How : Blackhat 2019
DevSecOps: What Why and How : Blackhat 2019
 
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...
Using HashiCorp’s Terraform to build your infrastructure on AWS - Pop-up Loft...
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David Anderson
 

Similar to CEP - simplified streaming architecture - Strata Singapore 2016

Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainMapR Technologies
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Mathieu Dumoulin
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop DataWorks Summit/Hadoop Summit
 
How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications MapR Technologies
 
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Mathieu Dumoulin
 
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...MapR Technologies
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
How Spark is Enabling the New Wave of Converged Applications
How Spark is Enabling  the New Wave of Converged ApplicationsHow Spark is Enabling  the New Wave of Converged Applications
How Spark is Enabling the New Wave of Converged ApplicationsMapR Technologies
 
Map r seattle streams meetup oct 2016
Map r seattle streams meetup   oct 2016Map r seattle streams meetup   oct 2016
Map r seattle streams meetup oct 2016Nitin Kumar
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteTed Dunning
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning PrimerMathieu Dumoulin
 
Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1Tugdual Grall
 
MapR 5.2: Getting More Value from the MapR Converged Data Platform
MapR 5.2: Getting More Value from the MapR Converged Data PlatformMapR 5.2: Getting More Value from the MapR Converged Data Platform
MapR 5.2: Getting More Value from the MapR Converged Data PlatformMapR Technologies
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...DataWorks Summit/Hadoop Summit
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
 
Distributed Deep Learning on Spark
Distributed Deep Learning on SparkDistributed Deep Learning on Spark
Distributed Deep Learning on SparkMathieu Dumoulin
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningTed Dunning
 
MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR Technologies
 
Predictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksPredictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksJustin Brandenburg
 

Similar to CEP - simplified streaming architecture - Strata Singapore 2016 (20)

Streaming in the Extreme
Streaming in the ExtremeStreaming in the Extreme
Streaming in the Extreme
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
 
How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications
 
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
 
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in Production
 
How Spark is Enabling the New Wave of Converged Applications
How Spark is Enabling  the New Wave of Converged ApplicationsHow Spark is Enabling  the New Wave of Converged Applications
How Spark is Enabling the New Wave of Converged Applications
 
Map r seattle streams meetup oct 2016
Map r seattle streams meetup   oct 2016Map r seattle streams meetup   oct 2016
Map r seattle streams meetup oct 2016
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC Keynote
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning Primer
 
Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1
 
MapR 5.2: Getting More Value from the MapR Converged Data Platform
MapR 5.2: Getting More Value from the MapR Converged Data PlatformMapR 5.2: Getting More Value from the MapR Converged Data Platform
MapR 5.2: Getting More Value from the MapR Converged Data Platform
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
Distributed Deep Learning on Spark
Distributed Deep Learning on SparkDistributed Deep Learning on Spark
Distributed Deep Learning on Spark
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -
 
Predictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksPredictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural Networks
 

More from Mathieu Dumoulin

Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Mathieu Dumoulin
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataMathieu Dumoulin
 
Real world machine learning with Java for Fumankaitori.com
Real world machine learning with Java for Fumankaitori.comReal world machine learning with Java for Fumankaitori.com
Real world machine learning with Java for Fumankaitori.comMathieu Dumoulin
 
Introduction aux algorithmes map reduce
Introduction aux algorithmes map reduceIntroduction aux algorithmes map reduce
Introduction aux algorithmes map reduceMathieu Dumoulin
 
MapReduce: Traitement de données distribué à grande échelle simplifié
MapReduce: Traitement de données distribué à grande échelle simplifiéMapReduce: Traitement de données distribué à grande échelle simplifié
MapReduce: Traitement de données distribué à grande échelle simplifiéMathieu Dumoulin
 
Presentation Hadoop Québec
Presentation Hadoop QuébecPresentation Hadoop Québec
Presentation Hadoop QuébecMathieu Dumoulin
 

More from Mathieu Dumoulin (7)

Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
 
Real world machine learning with Java for Fumankaitori.com
Real world machine learning with Java for Fumankaitori.comReal world machine learning with Java for Fumankaitori.com
Real world machine learning with Java for Fumankaitori.com
 
Introduction aux algorithmes map reduce
Introduction aux algorithmes map reduceIntroduction aux algorithmes map reduce
Introduction aux algorithmes map reduce
 
MapReduce: Traitement de données distribué à grande échelle simplifié
MapReduce: Traitement de données distribué à grande échelle simplifiéMapReduce: Traitement de données distribué à grande échelle simplifié
MapReduce: Traitement de données distribué à grande échelle simplifié
 
Presentation Hadoop Québec
Presentation Hadoop QuébecPresentation Hadoop Québec
Presentation Hadoop Québec
 
Introduction à Hadoop
Introduction à HadoopIntroduction à Hadoop
Introduction à Hadoop
 

Recently uploaded

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
CEP - simplified streaming architecture - Strata Singapore 2016

  • 1. CEP - A Simplified Enterprise Architecture for Real-time Stream Processing
      Mathieu Dumoulin, Data Engineer (mdumoulin@mapr.com, @lordxar)
  • 2. Mathieu Dumoulin
      • Living in Tokyo, Japan last 3 years
      • Data Engineer for MapR Professional Services
      • Other jobs: Data Scientist, Search Engineer
      • Connect with me:
        – Read my blog posts: https://www.mapr.com/blog/author/mathieu-dumoulin
        – Twitter: @Lordxar
        – Email: mdumoulin@mapr.com
  • 3. Content Summary
      1. Complex Event Processing
      2. Streaming Architecture
      3. Rules Engines for CEP
      4. Simplified Hadoop-based CEP Architecture
      5. Live Demo
      6. Does it scale?
      7. Conclusion
  • 4. Complex Event Processing (CEP)
      Some terminology:
      • Event: Data with a timestamp (a log event, a transaction, ...)
      • Event processing: Track and analyze streaming event data
      • Complex event processing: Identify meaningful events and respond to them as quickly as possible, usually over a sliding window on the stream of event data
      CEP is just a fancy way to do business rules on streaming data
  • 5. IoT: Needs some CEP in There Somewhere
  • 6. CEP in Action
      The power of CEP comes from being able to detect complex situations that could not be detected from any individual event alone.
      • Window opened
      • Motion Sensor
      • Light turned on
      • Door opened
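The idea on slide 6 can be sketched in a few lines of plain Python. This is only an illustration, not code from the talk or from any rule engine: the event names are adapted from the slide's labels, while the `SituationDetector` class, the 60-second window, and the assumption that timestamps arrive in non-decreasing order are all hypothetical.

```python
from collections import deque

# Hypothetical event kinds, adapted from the labels on the slide.
REQUIRED = {"window_opened", "motion_detected", "light_on", "door_opened"}
WINDOW_SECONDS = 60  # assumed window length, not taken from the talk


class SituationDetector:
    """Fires when every required event kind occurs within one sliding time window."""

    def __init__(self, required=REQUIRED, window_seconds=WINDOW_SECONDS):
        self.required = set(required)
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, kind) pairs, oldest first

    def on_event(self, timestamp, kind):
        """Add one event; return True if the complex situation is detected."""
        self.events.append((timestamp, kind))
        # Evict events that have slid out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            self.events.popleft()
        seen = {k for _, k in self.events}
        return self.required <= seen


detector = SituationDetector()
stream = [(0, "window_opened"), (10, "motion_detected"),
          (20, "light_on"), (30, "door_opened")]
results = [detector.on_event(t, k) for t, k in stream]
# results == [False, False, False, True]
```

No single event triggers the detection; only the combination inside the window does. If the same four events were spread over more than 60 seconds, the detector would stay silent.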
  • 7. Actually, CEP Has Been Around For a While
      Taken from March 2010 issue of the Dutch Java Magazine (source)
  • 8. Technology Has Been Holding Rule Engines Back
      • Rule engines are not new
        – First papers from the 90s, many implementations in the early 2000s
      • Engine is running in-memory on a single node
        – A few GB of memory (or less) was a severe limitation
        – Single core CPU can only do so much
      • Need modern stream messaging (Kafka, MapR Streams)
        – Need persistence
        – Need speed
      • No standard, no dominant sponsor
        – 90s and early 2000s dominated by Microsoft
        – OSS had not come of age in enterprise IT
  • 9. CEP in a Modern Enterprise Data Pipeline
      Source: Oracle / Rittman Mead Information Management Reference Architecture
  • 10. Modern Streaming Architecture
      • Build flexible systems
        – More efficient and easier to build
        – Decouples dependencies
      • Better model the way business processes take place
      • More value now
        – Aggregates data from many sources once
        – Serves data to one or many projects immediately
      • More value later
        – Run batch analytics on the data later
        – Reprocess the data with different algorithms later
  • 11. Kafka-esque Messaging for Rule Engines
      • Stream persistence is a key feature
      • CEP is only one use case
        – Support batch analytics and ad-hoc analysis from the same data stream
      • Compensate for current rule engine limitations
        – Enables hot replacement for fault-tolerance
        – Enables simple horizontal scaling by partitioning data and rules
      • Convergence
        – Run this use case on your existing, standard, big data technology
        – Use OSS frameworks and open APIs
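To make the "hot replacement" point on slide 11 concrete, here is a minimal Python sketch under stated assumptions: a plain list stands in for a persisted, replayable topic partition, and `RedCounter` is a hypothetical stateful consumer. It does not use any Kafka or MapR Streams API.

```python
class RedCounter:
    """Toy stateful consumer: counts 'red' events and tracks its stream offset."""

    def __init__(self):
        self.offset = 0      # index of the next record to read
        self.red_count = 0

    def consume(self, log):
        """Process every record from the current offset to the end of the log."""
        for record in log[self.offset:]:
            if record == "red":
                self.red_count += 1
            self.offset += 1


# A Python list stands in for a persisted, replayable topic partition.
log = ["red", "blue", "red"]

primary = RedCounter()
primary.consume(log)          # primary.red_count is now 2

# The primary "fails"; more events arrive while it is down.
log += ["red", "green"]

# A fresh replacement rebuilds the same state by replaying from offset 0.
replacement = RedCounter()
replacement.consume(log)      # replacement.red_count is now 3
```

Because the events stay in the stream, the in-memory state of the failed instance does not need to be rescued; the replacement recomputes it from the log. In practice a periodic snapshot would shorten the replay, but that is outside this sketch.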
  • 12. Roy Schulte, vice president, Gartner:
      "Most CEP in IoT [...] is custom coded [...] rather than [using a] general purpose stream platform."
      See: Complex Event Processing and The Future Of Business Decisions by David Luckham and W. Roy Schulte
  • 13. Custom Coded CEP: The Good and The Bad
      The Good:
      • Made to order with a modern framework
      • “No limit” to potential for performance and scalability
      • Fit to purpose technology
      The Bad:
      • Engineers aren’t business domain experts
      • Lots of work to build from scratch every time
      • Changes to logic are a pain point (from the business side)
      • Lack of available talent/organizational capability
  • 14. Declarative Makes Sense For Business
      Manage complex behavior through simple rules working together, executed by a rules engine.
  • 15. An Open Source Rule Engine:
      Drools is a business rule management system (BRMS) with a forward and backward chaining inference based rules engine.
      • Project homepage: http://www.drools.org/
      • Developer: Red Hat
      • Enterprise supported version available
        – JBoss Enterprise BRMS
      • Enhanced implementation of the Rete algorithm
        – A state of the art algorithm for rules engines
      • Has a GUI Rules Editor: Workbench
  • 16. An Open Source Rule Engine:
      (Diagram labels) Production Memory (Rules), Working Memory (Facts), Pattern Matcher, Agenda, Domain Expert, Rules Editor, Actions
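The components named on slide 16 can be illustrated with a toy forward-chaining loop in Python. This is a sketch for intuition only: it is not the Rete algorithm and not the Drools API, and the `Rule` class, the `run` function, and the example facts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[set], bool]   # pattern evaluated against working memory
    action: Callable[[set], None]      # may insert new facts into working memory


def run(rules, facts):
    """Naive forward chaining: match, build an agenda, fire, repeat until idle."""
    working_memory = set(facts)        # working memory (facts)
    fired = []                         # each rule fires at most once in this sketch
    while True:
        # Pattern matcher: rules whose condition holds and that have not fired yet.
        agenda = [r for r in rules
                  if r.name not in fired and r.condition(working_memory)]
        if not agenda:
            return working_memory, fired
        rule = agenda[0]               # trivial conflict resolution: first match
        rule.action(working_memory)
        fired.append(rule.name)


# Production memory (rules). Listing order does not dictate firing order.
rules = [
    Rule("alert",
         lambda wm: "possible_intrusion" in wm,
         lambda wm: wm.add("alert_sent")),
    Rule("intrusion",
         lambda wm: {"window_opened", "motion_detected"} <= wm,
         lambda wm: wm.add("possible_intrusion")),
]

memory, fired = run(rules, {"window_opened", "motion_detected"})
# fired == ["intrusion", "alert"]
```

The "alert" rule only becomes eligible after "intrusion" has inserted a new fact, which is the chaining behavior that lets simple rules combine into more complex behavior. A real engine avoids re-evaluating every rule on every cycle; this sketch does not.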
  • 17. CEP in Drools: Stateful Session and Sliding Window
      • STATELESS Session
        – Rule: Is the ball red?
        – Rule: Are there 2+ red balls in the last 4 balls I’ve seen?
  • 18. CEP in Drools: Stateful Session + Sliding Window
      • STATELESS Session
        – Rule: Is the ball red?
      • STATEFUL Session
        – Rule: Are there 2+ red balls in the last 4 balls I’ve seen?
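The two rules on these slides differ in whether they need memory of earlier events. A small Python sketch of that difference follows; it mirrors the slide's red-ball example but is not Drools code, and `is_red` and `RedBallWindow` are hypothetical names.

```python
from collections import deque


def is_red(ball):
    """Stateless rule: the answer depends only on the current ball."""
    return ball == "red"


class RedBallWindow:
    """Stateful rule: are there 2+ red balls among the last 4 balls seen?"""

    def __init__(self, size=4, threshold=2):
        self.window = deque(maxlen=size)   # oldest ball drops out automatically
        self.threshold = threshold

    def insert(self, ball):
        """Add one ball to the window and evaluate the rule."""
        self.window.append(ball)
        return sum(is_red(b) for b in self.window) >= self.threshold


session = RedBallWindow()
balls = ["red", "blue", "blue", "blue", "red", "red"]
matches = [session.insert(b) for b in balls]
# matches == [False, False, False, False, False, True]
```

The first red ball has already left the window by the time the fifth ball arrives, so the rule only matches on the sixth ball, when two red balls sit inside the last four.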
  • 19. Streaming Architecture for CEP
      (Diagram labels) Sensors - Real-time Data, Producer, Distributed Cluster (Kafka, MapR), Consumer, Server (Edge node, cluster node), Integrate with other systems
  • 20. The Case for CEP on Streaming Architecture
      • Decouple rules maintenance from code and infrastructure
        – Manage the cluster separately
        – The application code may need only minimal maintenance
      • Rules maintenance in the hands of the business domain experts
        – Easily supports multiple projects & teams
      • Data is persisted in the stream (input and output)
        – Open to new use cases
      • Send data back to the stream
        – Integrate with other downstream use cases
  • 21. But Does It Scale? Yes, But Only to a Point
      • Drools and other rule engines are in-memory and the memory is not distributed
        – This is only a technical limitation that can be overcome (Ex: Alluxio, Apache Ignite)
      • Streams make it easy to provide reasonable fault-tolerance and quick disaster recovery
      • Run multiple servers, split rules logically, fan out data into multiple topics
      • A single session can handle 100K+ events/sec. How much scale is needed?
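The "fan out data into multiple topics" advice on slide 21 can be sketched as key-based partitioning, so that each rule-engine instance sees every event for the keys it owns. The Python below is an assumption-laden illustration: the CRC32 hash is a stand-in chosen for determinism, not the partitioner of any particular messaging system, and the sensor names are made up.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Stable hash, so the same key always lands on the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def fan_out(events, num_partitions):
    """Route (key, payload) events to partitions, preserving per-key order."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


events = [("sensor-a", 1), ("sensor-b", 2), ("sensor-a", 3), ("sensor-c", 4)]
partitions = fan_out(events, num_partitions=3)

# Every event for "sensor-a" sits in one partition, in arrival order,
# so a single in-memory session can evaluate windowed rules for that key.
home = partition_for("sensor-a", 3)
sensor_a_events = [e for e in partitions[home] if e[0] == "sensor-a"]
```

This only works when rules do not need to correlate events across keys that land on different partitions; choosing the partitioning key is therefore part of splitting the rules logically.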
  • 22. Live Demo: Smart City Traffic Management
  • 23.
      ● Try out integration with Spark Streaming and Flink
      ● Run serious performance benchmarks
      ● Deploy into production
  • 24. Recap
      • It’s not Rule Engine vs. Spark and Flink Stream processing
        – It’s Rules + Stream Processing
        – Spark, Flink, Java are just an implementation choice
      • Focus on business value from applying rules to data
        – Think of benefits of SQL vs. Java, C++, Scala, …
      • Great use case for a Streaming Architecture and microservices
      An in-depth blog post on this talk topic will be available on MapR blog: https://www.mapr.com/blog/author/mathieu-dumoulin
  • 25. Suggested Reading
      ● Get Ted & Ellen’s book and many more for free:
        ○ https://www.mapr.com/ebooks/
      ● More great blog content about CEP and IoT applications
        ○ Eric Bruno on Linkedin
        ○ Karzel et al. on InfoQ
  • 26. Q & A
      Engage with us! @mapr, mapr-technologies, mdumoulin@mapr.com, @lordxar

Editor's Notes

  1. It’s just not true that ML solves all problems. ML seeks to make predictions, which is very useful. But most business processes don’t need a prediction at every step of the way; they are more like a series of steps with conditionals arranged in a DAG.
  2. Rules need to be:
      • Independent
      • Easily updated (add, change, delete)
      • Applied to only the minimum set of relevant data
      • Open to contributions from business domain experts
  3. Integrate Flink/Spark Streaming with Drools. Performance and scalability testing. Flink brings lots of benefits “for free”:
      • State is saved automatically by checkpoints
      • Fault-recovery for Drools state is simplified
      • Record-at-a-time processing is a good model to add data to KieSession