prototype -> production
Make your ML app rock
Agenda
• Problems with current workflow
• Interactive exploration to enterprise API
• Data Science Platforms
• My recommendation
About me @geoHeil
• Data Scientist at T-Mobile Austria
• Business Informatics at Vienna University of Technology
• Built predictive startup (predictr.eu)
• Data science projects at university
Ed, 41
Professional developer
Cares about testing, CI, stability
John, 28
PhD, cool kid
Wants to build an awesome app
Simple?
Goal: smart application improves business processes
John’s smart app
Ed’s business process
ML modes: similarity of environments?
Exploration
• Flexibility
• Easy to use
• Reusability
Production
• Performance
• Scalability
• Monitoring
• API
Interaction required to improve business process
ML modes
from https://www.youtube.com/watch?v=R-6nAwLyWCI
flexibility performance
Stackup
Problems
• Move to production means redevelopment from scratch
Solutions
• Notebooks as API
Prototype problem at current project
Easy move to the JVM?
Consultant
R
Me
Python
Production
JVM
native C dependencies
Stackup
Problems
• Move to production means redevelopment from scratch
• Enterprise operations handle JVM only
Solutions
• Notebooks as API
• Redevelop from scratch
Data exchange possibilities (API)
Pickle – Python only
Hadoop file formats (Avro/Parquet)
Thrift, Protobuf
Message queue
REST
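The "Python only" limitation of pickle can be shown with a small sketch. The parameter record below is hypothetical, and JSON stands in for the language-neutral options on this slide (Avro, Parquet, Thrift and Protobuf need third-party libraries).

```python
import json
import pickle

# Hypothetical model parameters; the names and values are illustrative only.
model_params = {"intercept": 0.5, "coefficients": {"age": 0.1, "income": 0.02}}


def to_pickle(params):
    """Serialize with pickle: convenient, but only Python can read it back."""
    return pickle.dumps(params)


def to_json(params):
    """Serialize as JSON text: readable from the JVM, R, or any REST client."""
    return json.dumps(params, sort_keys=True).encode("utf-8")


def from_json(payload):
    """Parse the JSON payload back into a plain dictionary."""
    return json.loads(payload.decode("utf-8"))


pickled = to_pickle(model_params)
as_json = to_json(model_params)
print(as_json.decode("utf-8"))
```

Both round-trip inside Python, but only the JSON payload can be consumed by a JVM backend without a Python runtime.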
Stackup
Problems
• Move to production means redevelopment from scratch
• Enterprise operations handle JVM only
Solutions
• Notebooks as API
• Use analytics via an API
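A minimal sketch of "analytics via an API", assuming a hypothetical linear model (`WEIGHTS`, `INTERCEPT`) and a single JSON POST endpoint. It uses only the Python standard library and is not the setup used in the project; it only illustrates that the JVM side needs nothing but an HTTP client.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical linear model; in practice this would come from the prototype.
WEIGHTS = {"x1": 2.0, "x2": -1.0}
INTERCEPT = 0.5


def predict(features):
    """Score one record with the hypothetical linear model."""
    return INTERCEPT + sum(WEIGHTS[name] * float(features[name]) for name in WEIGHTS)


def handle_predict(body):
    """Turn a JSON request body into (HTTP status, JSON response body)."""
    try:
        features = json.loads(body.decode("utf-8"))
        result = {"prediction": predict(features)}
        return 200, json.dumps(result).encode("utf-8")
    except (ValueError, KeyError, TypeError) as err:
        # Malformed JSON or missing/invalid features become a client error.
        return 400, json.dumps({"error": str(err)}).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the service on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Keeping the request logic in `handle_predict` means it can be unit-tested without opening a socket.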
Big data starts at 20 GB. Want to use a fancy Hadoop cluster.
We can buy a server with 6 TB of RAM.
3 types of big data
1. Fits in memory (6 TB of RAM …)
2. Raw data too large for memory, but aggregated data works well
3. Too big => ML needs to be big as well
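Type 2 can be sketched as follows: stream the raw rows once, keep only a running aggregate per key, and hand the small result to in-memory tools. The column names and sample rows are hypothetical; a real source would be a large file or a database cursor.

```python
import csv
import io
from collections import defaultdict


def aggregate_stream(lines, key_field, value_field):
    """Read rows one at a time and keep only a running sum and count per key."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in csv.DictReader(lines):
        entry = totals[row[key_field]]
        entry[0] += float(row[value_field])
        entry[1] += 1
    # The aggregate is tiny compared with the raw rows and fits in memory.
    return {
        key: {"sum": s, "count": n, "mean": s / n}
        for key, (s, n) in totals.items()
    }


# Hypothetical raw data standing in for a file that is too large to load at once.
raw = io.StringIO("customer,amount\na,10\nb,5\na,30\n")
summary = aggregate_stream(raw, "customer", "amount")
print(summary)
```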
Stackup
Problems
• Move to production means redevelopment from scratch
• Enterprise operations handle JVM only
• Inflexible big data tools
Solutions
• Notebooks as API
• Use analytics via an API
• Your data is not “really big” and still fits in memory
Security is not my job
Disagree / InfoSec
Stackup
Problems
• Move to production means redevelopment from scratch
• Enterprise operations handle JVM only
• Inflexible big data tools
• Security not taken care of
Solutions
• Notebooks as API
• Use analytics via an API
• Your data is not “really big” and still fits in memory -> keep using Python / R / notebooks
• Kerberized Hadoop cluster :(
Exploration to Enterprise API
small data & R prototype
Separation of concerns.
Startup data science – predicting cash flows
• Custom backend (JVM)
• Data science via an API (OpenCPU / R)
• Partly in backend (Renjin)
Other possibilities
• JNI (Java Native Interface) :(
• JNA (Java Native Access)
• rkafka (we did not have a message queue in our infrastructure)
• Custom service (REST call) to a JNA-enabled server (too costly)
Music streaming
Anomaly detection big data
Source
https://www.youtube.com/watch?v=t63SF2UkD0A&feature=youtu.be
project facts
• We were using an MS SQL backup (600 GB)
• Spark + Parquet compressed it to 3 GB
• No cluster during development of the project, only laptops + a server with 60 GB RAM
• Most of the time was spent in garbage collection (15 seconds on a real cluster, 17 minutes on a laptop)
Data science stack
• Type 2 big data (aggregation allows for local in-memory processing in Python/R)
• Spark as (REST) API
POST /jars/app_name jobserver:port/jars/myjob
POST jobserver:port/contexts/context_for_myapp
POST "paramKey = paramValue" jobserver:port/jobs?appName=myjob&classPath=path.to.main&context=context_for_myapp
• Aggregated data fed to R via REST-API
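The three job server calls listed above can be sketched as a helper that only builds the requests and sends nothing. The host and port are hypothetical, and the `/jobs` submission path is an assumption based on the spark-jobserver REST API; verify it against the version in use.

```python
from urllib.parse import urlencode


def job_server_requests(base_url, app_name, context, class_path, params):
    """Build (method, url, body) tuples for the three calls; nothing is sent."""
    query = urlencode(
        {"appName": app_name, "classPath": class_path, "context": context}
    )
    # Job parameters are passed as "key = value" lines in the request body.
    body = "\n".join(f"{key} = {value}" for key, value in params.items())
    return [
        ("POST", f"{base_url}/jars/{app_name}", "<binary jar content>"),
        ("POST", f"{base_url}/contexts/{context}", ""),
        ("POST", f"{base_url}/jobs?{query}", body),
    ]


# Hypothetical host and port; the other names mirror the listing above.
requests_to_send = job_server_requests(
    "http://jobserver:8090",
    "myjob",
    "context_for_myapp",
    "path.to.main",
    {"paramKey": "paramValue"},
)
for method, url, body in requests_to_send:
    print(method, url)
```

Any HTTP client on the backend side can send these tuples in order: upload the jar, create the context, then submit the job.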
Frontend Backend
Data-science
SQL aggregation / spark job-server
Spark cluster
Laptop :)
R
via OpenCPU
Spark aggregation & R as API
REST call
API
incompatibilities :(
Data science platform
Can the architecture be simplified?
Cloud solutions
• Notebook as API: Databricks workflows / Domino data lab
• Google, Microsoft, Amazon
• Several data science platform startups: BigML, Dataiku, ...
(+) cluster deploy on click
(+) some integrate notebooks well
(-) control over data?
What is missing?
Custom models, Control over data,
Testing, CI, AB testing, retraining
Several solutions – same problem
Let's try lean
Back to spark architecture overview …
Missing API layer / model deployment
Hydrospheredata/mist notebook, CI -> e2e
CI & testing +1
Notebook e2e +1
But again: a lot of moving parts
Highly experimental
Seldon – e2e ML platform for enterprise
Seldon architecture
K8s for high availability
Hot model deployments
A-B testing
Holdout group
Containerized microservices conforming to Seldon’s REST API
Overall very good
But: outdated Python 2.xx
Kubernetes mandatory
In an ideal world
What I dream of …
Wish list
• Flexibility to experiment (notebooks) on big enough hardware
• Make these easily available as an API in a pre-production environment to gain quick business feedback
• A-B testing, holdout group, containers
• More “developer” mindset (testing, CI, security) for data scientists
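The A-B testing and holdout group item can be sketched as a deterministic, hash-based assignment. The share values and the user ID format are hypothetical, and this is not how Seldon or any platform named here implements it.

```python
import hashlib


def assign_group(user_id, holdout_share=0.1, variant_b_share=0.5):
    """Map a user to 'holdout', 'A', or 'B' deterministically from a hash."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    # Use the first 8 hex digits as a stable number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    if bucket < holdout_share:
        # Holdout users never see model output; they are the baseline.
        return "holdout"
    # Rescale the remaining range to [0, 1) and split it between A and B.
    rest = (bucket - holdout_share) / (1.0 - holdout_share)
    return "B" if rest < variant_b_share else "A"


groups = [assign_group(f"user-{i}") for i in range(1000)]
print({g: groups.count(g) for g in ("holdout", "A", "B")})
```

Because the group depends only on the hashed ID, the same user gets the same model variant on every request without storing any assignment state.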
Reality is different.
How I will move forward with my current project
Write a JVM-based custom backend which operations and existing developers
can maintain. Apparently this is a better fit than a platform turnkey solution.
How to integrate spark?
Spark deployment modes revisited ...
Spark deployment scenarios
• Batch / bulk prediction in cluster -> job scheduling overhead
• Long-running Spark application? (SJS, pipeline persistence -> local Spark context)
• Predictive service without Spark
• PMML? jpmml/sklearn2pmml
• Scoring without Spark -> MLeap and SPARK-16365
What is your approach?
Thanks. @geoHeil
PMML - Openscoring
• Based on PMML (Predictive Model Markup Language)
(+) stay in Java/XML world (enterprise operations :))
(+) quick predictions
(+) mature
(-) not all models suitable for PMML / some algorithms not implemented
(-) XML
PMML + retraining oryx.io
prediction.IO
h2o steam
E2e platform
Build + deploy
Interoperability
Enterprise permissions
Based on h2o-flow
pipeline.io notebook -> prediction, e2e
“Extend ml pipelines to serve production users”
How do tools stack up regarding security?
https://www.youtube.com/watch?v=t63SF2UkD0A&feature=youtu.be
Python (what I learnt later on)
• Can easily be deployed on its own (if ops can handle this)
• Python4j / PySpark / spylon?
Science in Python, production in Java – spylon, Video
• Bring code via custom UDF to data in PySpark
• Model = fitted scikit-learn model
• Requires model to be parallelizable
Others
• Jupyter notebook to REST API (IBM interactive dashboard http://blog.ibmjstart.net/2016/01/28/jupyter-notebooks-as-restful-microservices/)
• Apache Toree (interactive Spark as notebook)

FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 

Machine learning model to production

  • 1. prototype -> production Make your ML app rock
  • 2. Agenda • Problems with current workflow • Interactive exploration to enterprise API • Data Science Platforms • My recommendation
  • 3. About me @geoHeil • Data Scientist at T-Mobile Austria • Business Informatics at Vienna University of Technology • Built predictive startup (predictr.eu) • Data science projects at university
  • 4. Ed, 41 Professional developer Cares about testing, CI, stability John, 28 PhD, cool kid Wants to build an awesome app
  • 5. Simple? Goal: smart application improves business processes John’s Smart app Ed’s Business process
  • 6. Simple? Goal: smart application improves business processes Ed’s Business process
  • 7. ML modes: similarity of environments? Exploration • Flexibility • Easy to use • reusability Production • Performance • Scalability • Monitoring • API Interaction required to improve business process ML modes
  • 9. Stackup Problems • Move to production means redevelopment from scratch Solutions • Notebooks as API
  • 10. Prototype problem at current project Easy move to the JVM? Consultant R Me Python Production JVM native C dependencies
  • 11. Stackup Problems • Move to production means redevelopment from scratch • Enterprise operations handle JVM only Solutions • Notebooks as API • Re develop from scratch
  • 12. Prototype problem at current project Easy move to the JVM? Consultant R Me Python Production JVM native C dependencies
  • 13. Data exchange possibilities (API) Pickle – python only Hadoop file formats (avro/parquet) Thrift, protobuf Message queue REST
  • 14. Stackup Problems • Move to production means redevelopment from scratch • Enterprise operations handle JVM only Solutions • Notebooks as API • Use analytics via an API
  • 15. Big data starts at 20GB. Want to use fancy hadoop cluster We can buy a server with 6 TB RAM
  • 16. 3 types of big data 1. Fits in memory (6 TB of RAM …) 2. Raw data too large for memory, but aggregated data works well 3. Too big => ml needs to be big as well
  • 17. Stackup Problems • Move to production means redevelopment from scratch • Enterprise operations handle JVM only • Inflexible big data tools Solutions • Notebooks as API • Use analytics via an API • Your data is not “really big” and still fits in memory
  • 18. Security is not my job Disagree / infoSec
  • 19. Stackup Problems • Move to production means redevelopment from scratch • Enterprise operations handle JVM only • Inflexible big data tools • Security not taken care of Solutions • Notebooks as API • Use analytics via an API • Your data is not “really big” and still fits in memory ->keep using python / R / notebooks • Kerberized hadoop cluster :(
  • 21. small data & R prototype Separation of concerns.
  • 22. Startup data science – predicting cash flows • Custom backend (JVM) • Data science via an API (OpenCPU / R) • Partly in backend (Renjin)
  • 23. Other possibilities • JNI (java native interface) :( • JNA (java native access) • Rkafka (did not have a MQ in infrastructure) • Custom service (rest call) to JNA enabled server (too costly)
  • 29. project facts • We were using a ms-sql backup (600 GB) • Spark + parquet compressed it to 3 GB • No cluster during development of the project, only laptops + 60 GB RAM server • Most of the time spent in garbage collection (15 sec on real cluster, 17 Minutes on laptop)
  • 30. Data science stack • Type 2 big data (aggregation allows for local in-memory processing in python/R) • Spark as (REST) API POST /jars/app_name jobserver:port/jars/myjob POST jobserver:port/contexts/context_for_myapp POST "paramKey = paramValue" jobserver:port/myjob?appName=myjob&classPath=path.to.main&context=context_for_myapp • Aggregated data fed to R via REST API
  • 31. Frontend Backend Data-science SQL aggregation / spark job-server Spark cluster Laptop :) R via opencpu Spark aggregation & R as API REST call API incompatibilities :(
  • 32. Data science platform Can the architecture be simplified?
  • 33. Cloud solutions • Notebook as API: Databricks workflows / Domino data lab • Google, Microsoft, Amazon • Several data science platform startups bigml, dataiku, ... (+) cluster deploy on click (+) some integrate notebooks well (-) control over data?
  • 34. What is missing? Custom models, Control over data, Testing, CI, AB testing, retraining
  • 35. Several solutions – same problem
  • 36. Let's try lean Back to spark architecture overview …
  • 37. Missing API layer / model deployment
  • 39. CI & testing +1 Notebook e2e +1 But again: a lot of moving parts Highly experimental
  • 40. Seldon – e2e ML platform for enterprise
  • 41. Seldon architecture K8s for high availability Hot model deployments A-B testing Holdout group Containerized micro services conforming to seldon’s REST API Overall very good But: outdated python 2.xx Kubernetes mandatory
  • 42. In an ideal world What I dream of …
  • 43. Wish list • Flexibility to experiment (notebooks) on big enough hardware • Make these easily available as an API in a pre-production environment to gain quick business feedback • A-B testing, holdout group, containers • More “developer” mindset (Testing, CI, security) for data scientists
  • 44. Reality is different. How I will move forward with my current project
  • 45. Write a JVM-based custom backend which operations and existing developers can maintain. Apparently this is a better fit than a platform turnkey solution.
  • 46. How to integrate spark? Spark deployment modes revisited ...
  • 47. Spark deployment scenarios • Batch / bulk prediction in cluster -> job scheduling overhead • Long running spark application? (SJS, pipeline persistence -> local spark context) • Predictive service without spark • PMML? jpmml/sklearn2pmml • scoring without spark -> mleap and SPARK-16365
  • 48. What is your approach? Thanks. @geoHeil
  • 49. PMML - Openscoring • Based on PMML (Predictive Model Markup Language) (+) stay in java/xml world (enterprise operations :)) (+) quick predictions (+) mature (-) not all models suitable for PMML / some algorithms not implemented (-) xml
  • 50. PMML + retraining oryx.io
  • 52. h2o steam E2e platform Build + deploy interoperability Enterprise permissions Based on h2o-flow
  • 54. pipeline.io notebook -> prediction, e2e “Extend ml pipelines to serve production users“
  • 55. How do tools stack up regarding security? https://www.youtube.com/watch?v=t63SF2UkD0A&feature=youtu.be
  • 56. Python (what I learnt later on) • Can easily be deployed on its own (if ops can handle this) • Python4j/ pyspark/ spylon?
  • 57. Science in Python, production in java – spylon, Video • Bring code via custom UDF to data in pySpark • Model = fitted sk-learn model • Requires model to be parallelizable
  • 58. others • Jupyter notebook to REST API (IBM interactive dashboard http://blog.ibmjstart.net/2016/01/28/jupyter-notebooks-as-restful-microservices/) • Apache toree (interactive spark as notebook)
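Slides 13 and 47 contrast pickle (Python only) with language-neutral exchange formats and mention scoring without Spark. The following is a minimal sketch of that idea, not the method used in the project: the parameters of a fitted model are exported as JSON, and predictions are computed from the exported file alone, so a JVM backend could reimplement `score` in a few lines. The model dictionary, its field names, and the helpers `export_model` and `score` are hypothetical and exist only for illustration.

```python
import json
import math
import pickle

# Hypothetical parameters of an already fitted logistic regression.
# In practice they would be read from the trained model object.
model = {
    "type": "logistic_regression",
    "features": ["x1", "x2"],
    "coefficients": [2.0, -1.0],
    "intercept": 0.0,
}

# pickle round-trips the object, but the bytes can only be read from Python.
python_only_blob = pickle.dumps(model)


def export_model(params):
    """Serialize model parameters to JSON, which any language can parse."""
    return json.dumps(params, sort_keys=True)


def score(exported, row):
    """Compute the predicted probability using only the exported parameters."""
    params = json.loads(exported)
    z = params["intercept"] + sum(
        coef * row[name]
        for name, coef in zip(params["features"], params["coefficients"])
    )
    return 1.0 / (1.0 + math.exp(-z))


exported = export_model(model)
probability = score(exported, {"x1": 1.0, "x2": 2.0})
```

This only works for models whose prediction function is simple enough to reimplement by hand; for anything more complex, the slides point to PMML or mleap as existing formats.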

Editor's notes

  1. Hi Georg. Talk about how to not have a smart prototype script rot in the corner. First talk ;) Questions: Who has played with machine learning? Who is familiar with R / python? Who is using big data technology in production? Who is driving business decisions with ML?
  2. Discussion about how you deploy models
  3. Apache Toree, Jupyter notebooks as REST API (IBM)
  4. Notebooks can execute JVM code as well
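The recurring solution in the talk, "use analytics via an API", can be sketched with nothing but the Python standard library. This is an illustrative assumption, not the IBM notebook-to-REST tooling or OpenCPU mentioned in the slides: the `/predict` route, the weights, and the names `predict`, `handle_predict`, `PredictHandler` and `make_server` are all made up for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical model parameters; a real service would load a fitted model.
WEIGHTS = {"age": 0.5, "income": 0.25}
BIAS = 1.0


def predict(features):
    """Linear score: bias plus weighted features; missing features count as 0."""
    return BIAS + sum(
        weight * float(features.get(name, 0.0))
        for name, weight in WEIGHTS.items()
    )


def handle_predict(body):
    """Map a raw JSON request body to (status_code, response_bytes)."""
    try:
        features = json.loads(body)
        if not isinstance(features, dict):
            raise ValueError("expected a JSON object")
        payload = {"prediction": predict(features)}
        return 200, json.dumps(payload).encode("utf-8")
    except (ValueError, TypeError) as err:
        return 400, json.dumps({"error": str(err)}).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    """Expose handle_predict at POST /predict."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8080):
    """Build the server without starting it; call serve_forever() to run."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

Calling `make_server().serve_forever()` starts the service, after which a JVM backend can POST a JSON object of features to `/predict`. Keeping the request logic in `handle_predict`, separate from the HTTP handler, makes it testable in CI without opening a socket, which matches the "developer mindset" item on the wish list.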