Productionizing Data Science at Experience
A Startup's (ongoing) Journey building Machine Learning Products
Agenda
• Intro
• Setting the Scene
• Proposed Pipeline
• A Look Back
Who am I?
• Matt Mills
• Born and raised in Atlanta
• BS in Industrial and Systems Engineering 2014, MS in Analytics 2015
• @statmills or www.statmills.com
What is Experience?
Experience's mobile commerce, ticketing, and data solutions empower sports and entertainment leaders to generate new revenue streams, sell more tickets, and make smarter decisions.
www.expapp.com/solutions
Setting the Scene
Data Science at the end of 2016
• ~13 Engineers and 1 (me!) Data Scientist
• What happens to my work?
[Diagram labels: Manager / Management; Other Departments; Partners]
Goal for 2017
• Make an Impact on our Customers (Fans)
[Diagram labels: Predictive Model; Influence Fan Behavior; Manager / Management; Other Departments; Partners]
Goal for 2017: Continued
• Create a process to deploy models into production and use predictions in real time
• Some considerations
    • Minimal use of limited Engineering Resources
    • Scalable (speed and processing power)
    • Cheap, like, super cheap (read: Free)
    • Must handle data cleansing
Some Potential Solutions
• Build own R/Python Server
• Learn Scala/Spark
• Pay for ML Service
Scaling Experience Data Science with h2o
• ML Algorithms written in pure Java
• APIs written for R, Python, Scala, Spark
• Built for scale
    • parallel and distributed out of the box
• Open Source
• Models exportable as Java Objects to embed in other apps
• Can embed Python pre-processing scripts within the POJO
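To make the "exportable as Java Objects" point concrete, here is a minimal sketch of training a model through the h2o Python API and writing it out as a POJO (Plain Old Java Object). It is not the code used at Experience: the function names, the choice of a GBM, the `ntrees` value, and the CSV input are all assumptions for illustration. It needs the `h2o` package and a Java runtime to actually run.

```python
def feature_columns(all_columns, target):
    """Treat every column except the target as a predictor (an assumption)."""
    return [c for c in all_columns if c != target]


def train_and_export_pojo(csv_path, target, out_dir):
    """Train a small GBM on a CSV file and write the model out as a POJO.

    Hypothetical helper: the algorithm and its settings are placeholders.
    """
    # Imported lazily so this file can be loaded on machines without h2o.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()  # starts or connects to a local h2o cluster (needs Java)
    frame = h2o.import_file(csv_path)

    model = H2OGradientBoostingEstimator(ntrees=50)
    model.train(
        x=feature_columns(frame.columns, target),
        y=target,
        training_frame=frame,
    )

    # Writes the generated .java source and h2o-genmodel.jar into out_dir.
    return model.download_pojo(path=out_dir, get_genmodel_jar=True)
```

The generated Java file can then be compiled against `h2o-genmodel.jar` and called from another JVM application, which is what makes the "embed in other apps" bullet possible without a running h2o cluster.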
Proposed Pipeline
h2o Architecture
https://github.com/h2oai/h2o-meetups/blob/master/2017_09_12_Dublin/2017_09_12_H2O_Intro_and_AutoML.pdf
h2o Algorithm List
h2o vs scikit-learn Syntax and Process
https://github.com/h2oai/h2o-meetups/blob/master/2015_05_14_H2O_Overview/H2O_Overview.pdf
Experience Production Pipeline
Experience Predictive Modeling Pipeline
1. App Sends Data
2. Data Cleaning in Python
3. Predictions done in h2o
4. App Gets Prediction
[Diagram labels: input(), sys.stdout.flush(), {JSON}]
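The diagram labels suggest that the Python cleaning step exchanges JSON with its caller over standard input and output. A minimal sketch of that pattern follows, as one plausible reading of the slide rather than the actual Experience script. The cleaning rules and the names `clean_record` and `serve` are hypothetical.

```python
import json
import sys


def clean_record(raw):
    """Hypothetical cleaning rules: lowercase keys, trim strings, blank -> None."""
    cleaned = {}
    for key, value in raw.items():
        if isinstance(value, str):
            value = value.strip()
            if value == "":
                value = None
        cleaned[key.strip().lower()] = value
    return cleaned


def serve():
    """Read one JSON object per line from stdin, write the cleaned JSON to stdout."""
    while True:
        try:
            line = input()  # blocks until the caller sends a line
        except EOFError:
            break  # caller closed the pipe
        if not line.strip():
            continue
        print(json.dumps(clean_record(json.loads(line))))
        # Flush so the caller sees the reply immediately instead of
        # waiting for Python's output buffer to fill.
        sys.stdout.flush()
```

Calling `serve()` at the bottom of the script would start the loop. The explicit flush matters when the script runs as a long-lived child process: without it, replies can sit in the buffer and the caller waits indefinitely.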
Benefits of Using Open Source Software
Experience Predictive Modeling Setup
Model Deployment
• Code: pulled via GitHub
• Terraform: to create infrastructure and manage state
• Served: via ECS
• Dockerize: via Dockerfile and stored in ECR
• Discovery: via Consul
A Look Back
Pros and Cons of Current Set-Up
Pros
• Automated process to deploy models into production
• Can iterate models with no/limited effort from engineering
Cons
• Can only use algorithms available in h2o (e.g. no multilevel models, GAMs, or Bayesian models)
• h2o drives Python, why not the other way around?
Conclusion and Questions
1. Lack of skills and/or support doesn’t have to stop you from putting models into production
2. What’s best for your Data Scientists might not be best for your Engineers and vice-versa
www.statmills.com
http://docs.h2o.ai/
https://www.expapp.com/about/#careers
