SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Downloaden Sie, um offline zu lesen
Applying multiple
ML Pipelines
to heterogenous data streams
Gevorg Soghomonian, AI Research Engineer
Maciej Dabrowski, Chief Data Scientist
1
About Altocloud
2
• Understand customers better
• Improve customer experience
• Predict customer behaviour and engage
• Accelerate revenue conversion
• Reduce shopping cart abandons
Customer journey analytics
EVENT STREAMS EVENT PROCESSORS
Altocloud platform
BATCH
MODEL
LEARNING
ENRICHMENT PREDICTIONS
STORAGE
QUEUES
WEB EVENTS
CALL, IVR, MESSAGE EVENTS
ACTIONS
SEGMENTATION
Web
Hook
AGGREGATION
ACTION
CREATION
OUTCOME PROBABILITIES
REAL-TIME CUSTOMER JOURNEY
TICKET, LEAD,
GENERIC EVENTS
CONTEXT
Heterogenous data
5
Background
6
Spark Machine Learning framework
7
Spark Pipeline
8
Spark Transformers
High level interface for all operations on Datasets
Act as stages in Pipelines
Require various configuration: inputColumn, outputColumn, data, …
Dataset is the only input/output format
9
The challenge
10
Spark Pipeline deployment flow
11
Altocloud deployment flow
12
Altocloud deployment flow
13
How to operationalise
hundreds of models in one
Spark Streaming job?
Possible (deployment) solutions
Serialisation
✓ Ability to use any streaming technology
x Increased complexity
x Potentially increased deployment latency
Spark ML Pipelines
✓ Simplicity - one technology for training and evaluation
x Support only for the “one model scenario”
14
Heterogeneous datasets in Spark
Spark Pipelines can be applied only to Datasets
Spark Pipelines cannot be combined in a composite pipeline to be
applied on Pipeline-per-row basis
Questions:
How to apply different Pipeline(s) for each row in the Datasets?
How to identify which Pipelines should be applied to a row?
15
Applying hundreds of models to
heterogenous streams in
Spark Streaming
16
The foundation
Redefine Transformer API to allow compositional flexibility.
Adapt Spark Pipelines to redefined Transformers
17
Transformer API definition
18
One-to-One correspondence / adaptation
19
Adapt DataFrame Pipeline to Stream Pipeline
20
Adapt & combine
21
Composition example
22
Composition example
23
Apache Spark ML Pipeline flow revisited
24
Challenges
Re-implementation of transformer logic
Train-serving skew
Scaling beyond 1000s of models
25
Summary
26
Summary
Applying Pipelines to heterogenous data streams is hard.
Spark Pipelines cannot be combined and applied in a composite pipeline
on Pipeline-per-row basis
Our solution:
Extend Transformer API to apply different Pipeline(s) for each row
Extend Streams to include Pipeline composition
macdab@altocloud.com & gevorg@altocloud.com
27

Weitere ähnliche Inhalte

Ähnlich wie Spark Summit Europe 2017 - Applying multiple ML pipelines to heterogenous data streams

Webinar september 2013
Webinar september 2013Webinar september 2013
Webinar september 2013
Marc Gille
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise
Smart Enterprise Big Data Bus for the Modern Responsive EnterpriseSmart Enterprise Big Data Bus for the Modern Responsive Enterprise
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise
DataWorks Summit
 

Ähnlich wie Spark Summit Europe 2017 - Applying multiple ML pipelines to heterogenous data streams (20)

Data Analytics at Altocloud
Data Analytics at Altocloud Data Analytics at Altocloud
Data Analytics at Altocloud
 
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics PlatformWSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
 
Introduction to WSO2 Analytics Platform: 2016 Q2 Update
Introduction to WSO2 Analytics Platform: 2016 Q2 UpdateIntroduction to WSO2 Analytics Platform: 2016 Q2 Update
Introduction to WSO2 Analytics Platform: 2016 Q2 Update
 
Webinar september 2013
Webinar september 2013Webinar september 2013
Webinar september 2013
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
 
Everything you want to know about microservices
Everything you want to know about microservicesEverything you want to know about microservices
Everything you want to know about microservices
 
Internet of things
Internet of things  Internet of things
Internet of things
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics Platform
 
Bootstrapping Microservices with Kafka, Akka and Spark
Bootstrapping Microservices with Kafka, Akka and SparkBootstrapping Microservices with Kafka, Akka and Spark
Bootstrapping Microservices with Kafka, Akka and Spark
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
2017 05 Hadoop User Group Meetup Dublin
2017 05 Hadoop User Group Meetup Dublin2017 05 Hadoop User Group Meetup Dublin
2017 05 Hadoop User Group Meetup Dublin
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise
Smart Enterprise Big Data Bus for the Modern Responsive EnterpriseSmart Enterprise Big Data Bus for the Modern Responsive Enterprise
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise
 
Flash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonFlash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lon
 
126 Golconde
126 Golconde126 Golconde
126 Golconde
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael Moore
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
 
Roadmap for Enterprise Graph Strategy
Roadmap for Enterprise Graph StrategyRoadmap for Enterprise Graph Strategy
Roadmap for Enterprise Graph Strategy
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
 

Mehr von mdabrowski

Mehr von mdabrowski (10)

The true meaning of data
The true meaning of dataThe true meaning of data
The true meaning of data
 
Near real-time recommendations in enterprise social networks
Near real-time recommendations in enterprise social networksNear real-time recommendations in enterprise social networks
Near real-time recommendations in enterprise social networks
 
Applications of the Social Semantic Web
Applications of the Social Semantic WebApplications of the Social Semantic Web
Applications of the Social Semantic Web
 
Short guide to the Semantic Web
Short guide to the Semantic WebShort guide to the Semantic Web
Short guide to the Semantic Web
 
Introduction to the Social Semantic Web
Introduction to the Social Semantic WebIntroduction to the Social Semantic Web
Introduction to the Social Semantic Web
 
Introduction to the Social Web and its applications
Introduction to the Social Web and its applicationsIntroduction to the Social Web and its applications
Introduction to the Social Web and its applications
 
Geo-annotations in Semantic Digital Libraries
Geo-annotations in Semantic Digital Libraries Geo-annotations in Semantic Digital Libraries
Geo-annotations in Semantic Digital Libraries
 
MarcOnt Initiative - Protege meeting
MarcOnt Initiative - Protege meetingMarcOnt Initiative - Protege meeting
MarcOnt Initiative - Protege meeting
 
Philosophy and Atrificial Inteligence
Philosophy and Atrificial Inteligence Philosophy and Atrificial Inteligence
Philosophy and Atrificial Inteligence
 
MarcOnt Initiative
MarcOnt InitiativeMarcOnt Initiative
MarcOnt Initiative
 

Kürzlich hochgeladen

notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
MsecMca
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 

Kürzlich hochgeladen (20)

data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdf
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 

Spark Summit Europe 2017 - Applying multiple ML pipelines to heterogenous data streams