3. Who am I?
• Matt Mills
• Born and raised in Atlanta
• BS in Industrial and Systems Engineering 2014, MS in Analytics 2015
• @statmills or www.statmills.com
4. What is Experience?
Experience's mobile commerce, ticketing, and data solutions empower sports and entertainment leaders to generate new revenue streams, sell more tickets, and make smarter decisions.
www.expapp.com/solutions
10. Data Science at the end of 2016
• ~13 Engineers and 1 (me!) Data Scientist
• What happens to my work?
[Diagram labels: Manager / Management; Other Departments; Partners]
13. Goal for 2017
• Make an Impact on our Customers (Fans)
[Diagram labels: Predictive Model; Influence Fan Behavior; Manager / Management; Other Departments; Partners]
15. Goal for 2017: Continued
• Create a process to deploy models into production and use predictions in real time
• Some considerations
• Minimal use of limited Engineering resources
• Scalable (speed and processing power)
• Cheap, like, super cheap (read: free)
• Must handle data cleansing
22. Scaling Experience Data Science with h2o
• ML Algorithms written in pure Java
• APIs written for R, Python, Scala, Spark
• Built for scale
• parallel and distributed out of the box
• Open Source
• Models exportable as Java objects (POJOs) to embed in other apps
• Can embed Python pre-processing scripts within the POJO
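The bullets above can be sketched with the h2o Python API: train a model, then export it as a POJO. This is an illustration only. The function names, the CSV path and target column passed by the caller, and the choice of a gradient boosting model are all assumptions; the talk does not say which algorithm or data were used.

```python
from typing import List


def feature_columns(all_columns: List[str], target: str) -> List[str]:
    """Return every column except the target, preserving order."""
    return [c for c in all_columns if c != target]


def train_and_export(csv_path: str, target: str, out_dir: str) -> None:
    """Train a GBM classifier on a CSV file and export it as a POJO.

    Hypothetical helper. h2o is imported lazily because it needs the
    h2o package and a Java runtime to be installed.
    """
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()  # start or connect to a local H2O cluster
    frame = h2o.import_file(csv_path)
    frame[target] = frame[target].asfactor()  # treat the target as categorical

    model = H2OGradientBoostingEstimator(ntrees=50, seed=42)
    model.train(
        x=feature_columns(frame.columns, target),
        y=target,
        training_frame=frame,
    )

    # Write the trained model as Java source into out_dir
    model.download_pojo(path=out_dir)
```

A call such as `train_and_export("fans.csv", "purchased", ".")` (hypothetical file and column names) would leave a Java source file in the current directory that an engineering team could compile into another application.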
32. Model Deployment
• Code: pulled via GitHub
• Terraform: creates infrastructure and manages state
• Served: via ECS
• Dockerize: via Dockerfile, image stored in ECR
• Discovery: via Consul
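As a rough illustration of the discovery step, a client could ask a Consul agent for healthy instances of the scoring service through Consul's HTTP health endpoint. The helper names and the agent address are assumptions; the talk does not show how the deployed models were actually looked up.

```python
import json
from typing import List


def healthy_endpoints(consul_health_json: str) -> List[str]:
    """Convert a /v1/health/service/<name> response body into host:port strings."""
    endpoints = []
    for entry in json.loads(consul_health_json):
        service = entry["Service"]
        # Service.Address is empty when the service is reachable at the node's address
        host = service.get("Address") or entry["Node"]["Address"]
        endpoints.append(f"{host}:{service['Port']}")
    return endpoints


def discover(service_name: str, consul_url: str = "http://localhost:8500") -> List[str]:
    """Query a Consul agent (assumed to run locally) for passing instances."""
    from urllib.request import urlopen

    url = f"{consul_url}/v1/health/service/{service_name}?passing"
    with urlopen(url) as resp:
        return healthy_endpoints(resp.read().decode("utf-8"))
```

Keeping the parsing separate from the network call means the lookup logic can be checked without a running Consul agent.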
34. Pros and Cons of Current Set-Up
Pros
• Automated process to deploy models into production
• Can iterate models with no/limited effort from engineering
Cons
• Can only use algorithms available in h2o (e.g. no multilevel models, GAMs, Bayesian models)
• h2o drives Python, why not the other way around?
36. Conclusion and Questions
1. Lack of skills and/or support doesn’t have to stop you from putting models into production
2. What’s best for your Data Scientists might not be best for your Engineers and vice-versa
www.statmills.com
http://docs.h2o.ai/
https://www.expapp.com/about/#careers