2. Agenda
● What is machine learning
● What’s the same as regular service development
● What’s different (and can go wrong)
● Building ML teams and their place in the company
● How we do ML at Ibotta
● The future
3. Background
● I'm Matt Reynolds, a principal platform engineer on the
machine learning team at Ibotta
● Ibotta is a rewarded shopping company with mobile, web,
and white-label platform components
● Based here in Denver but now fully "remote-friendly"
● We're hiring - https://home.ibotta.com/work-with-us/careers/
4. What Is ML?
Machine learning (ML): A program or system that builds
(trains) a predictive model from input data. The system uses
the learned model to make useful predictions from new
(never-before-seen) data drawn from the same distribution
as the one used to train the model.
https://developers.google.com/machine-learning/glossary#machine-learning
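To make the definition concrete, here is a minimal sketch of "train, then predict on new data" using plain least squares. The data and the function names (fit_line, predict) are made up for illustration; a real project would use an ML library rather than hand-rolled math.

```python
def fit_line(xs, ys):
    """Train: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict: apply the learned parameters to an input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data that follows y = 2x + 1 exactly.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)   # "builds (trains) a predictive model"
prediction = predict(model, 10.0)    # "new (never-before-seen) data"
```

The model is just two numbers here, but the shape is the same as in larger systems: training produces an artifact, and inference applies that artifact to inputs it has not seen.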
5. Types Of ML
● “Analytical” ML
One-off and exploratory; findings are used in reports to
management
● “Engineering” ML
Models deployed to production, called by services
6. What’s The Same For ML?
● Frameworks/libraries/tools
● Git/PRs for code
● CI/CD - process automation for repeatability
● Provide a service in production
● Service monitoring*
7. Some Companies Struggle…
55% of companies surveyed did not have a model in production
https://info.algorithmia.com/hubfs/2019/Whitepapers/The-State-of-Enterprise-ML-2020/Algorithmia_2020_State_of_Enterprise_ML.pdf
87% of data science projects don’t make it to production
https://venturebeat.com/2019/07/19/why-do-87-of-data-science-projects-never-make-it-into-production/
10. Exploratory Data Analysis (EDA)
New models require:
● Finding data sources that may be suitable
● Checking data quality and distribution
● Figuring out label generation
● Building initial features
● Testing with algorithm(s)
● Validating results and tuning
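As an illustration of the "checking data quality and distribution" step, here is a minimal sketch using only the Python standard library. The column name (basket_totals), its values, and the helper profile_column are hypothetical; in practice this kind of check usually happens in a notebook with a dataframe library.

```python
import statistics


def profile_column(values):
    """Summarize one column: missing rate plus basic distribution stats."""
    present = [v for v in values if v is not None]
    missing_rate = 1 - len(present) / len(values)
    return {
        "missing_rate": missing_rate,
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
        "min": min(present),
        "max": max(present),
    }


# Hypothetical column with missing values (None) and one large outlier.
basket_totals = [12.5, 40.0, None, 18.0, 22.5, None, 250.0, 15.0]
profile = profile_column(basket_totals)

# A mean far above the median hints at skew or outliers worth a closer look.
looks_skewed = profile["mean"] > 2 * profile["median"]
```

The "2x the median" rule is an arbitrary threshold chosen for the example, not a standard; the point is that a few summary numbers can flag a column before any model is trained on it.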
12. People
“Data Scientists” have different skill sets:
● Have their own jargon
● May not be used to writing “production ready” code
● May not be used to being on call or providing production support
● Mostly work in Python
14. Different Tools
As well as the tooling to run a “regular” service, you also need:
● Data pipeline
● Feature engineering
● Feature store
● Training & hyperparameter tuning infrastructure
● Maybe specialized inference hardware (GPU)
● Inference monitoring (data drift)
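One common way to quantify data drift for a single numeric feature is the Population Stability Index (PSI), which compares how training-time and inference-time values fall into the same bins. The sketch below is an assumption about what such a check could look like, not a description of any particular team's monitoring; the bin edges and sample values are made up.

```python
import bisect
import math


def bin_fractions(sample, edges, eps=1e-6):
    """Fraction of the sample in each bin; out-of-range values go to the end bins."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in sample:
        idx = bisect.bisect_right(edges, v) - 1
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    # eps keeps empty bins from producing log(0) or division by zero.
    return [max(c / len(sample), eps) for c in counts]


def psi(expected, actual, edges):
    """Population Stability Index between two samples over shared bin edges."""
    e = bin_fractions(expected, edges)
    a = bin_fractions(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


edges = [0.0, 0.25, 0.5, 0.75, 1.0]
training = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
serving = [0.6, 0.7, 0.8, 0.8, 0.9, 0.9, 0.95, 0.7]  # shifted toward high values

no_drift = psi(training, training, edges)
drift = psi(training, serving, edges)
```

A PSI of zero means the two binned distributions match; larger values mean the serving data has moved away from what the model was trained on. Alert thresholds vary by team and feature.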
18. How Can You Help?
● Take some time to learn the lay of the land
● Look for pain points - local dev, process automation
● Make suggestions, listen to feedback
● Jump in and learn the ropes
● Work from the more “engineering” side toward the more “ML” side
● Teach what you do and learn what they do
● Encourage collaboration, standardization
● Explain why
19. ML In The Larger Organization
● Need to work with Data/Analytics & Engineering orgs
● Involve product
● Advocate for big-picture concerns like:
● Data catalog, more metadata
● Data quality
● More (timely) data - events from engineering services
20. Our Process - Data & Training
● Airflow for job orchestration
● PySpark for data transformation
● SageMaker for managing training and hyperparameter tuning jobs
● Local dev with Docker for Airflow DAGs
● Jupyter notebooks for EDA and troubleshooting
21. Our Process - Inference
● SageMaker endpoints using Docker images built on top of
AWS-supplied base images
● Postgres DB for storing real-time features
● All behind API gateway for consistent API
● Lambda for A/B test, model aggregation
● Local dev with Docker for inference testing, along with Jupyter
notebooks; integration testing in the staging environment
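For the A/B test step, one widely used approach is to assign each user to a variant deterministically by hashing their ID, so the same user always gets the same model without storing any state. The sketch below shows that idea only; the function name, experiment key, and variant labels are hypothetical, and the slides do not say how the Lambda actually splits traffic.

```python
import hashlib


def assign_variant(user_id, experiment, treatment_share=0.5):
    """Deterministically map a user to 'treatment' or 'control'."""
    # Including the experiment name gives each experiment an independent split.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Turn the first 8 bytes into a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"


# Hypothetical experiment: send roughly 20% of users to a new model.
variants = [assign_variant(f"user-{i}", "new-ranker", 0.2) for i in range(1000)]
treatment_count = variants.count("treatment")
```

Because the assignment depends only on the inputs, it can run in a stateless function and be reproduced later when analyzing results.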
22. The Future
● How to scale
● Quality monitoring
● More real-time feature generation
● Serverless inference