Talk by Borys Biletskyy at Data Science Amsterdam and Data Science Utrecht. The talk covers the Machine Learning Engineer role and how it can improve the success rate of Data Science projects.
1. Role of Machine Learning Engineer
Borys Biletskyy
Data Science Amsterdam
28-05-2019
2. Agenda
1. About Myself
2. Motivation
3. Data Science Process
4. Roles in Data Analytics
5. 3 Challenges for ML Engineer
3. About Myself
● Software Engineer since 2004
○ Low level, C++ -> Enterprise, Java -> Data Driven, Scala
○ Dev, Tech Lead, Architect, Consultant
● Researcher since 2004
○ PhD in Theoretical Computer Science
○ Complexity and Scalability of ML Methods
● Machine Learning Engineer since 2017
○ Python, Scala
○ LeasePlan, Randstad, VodafoneZiggo
4. Motivation
● Low success rate of Data Analytics projects
○ Gartner: 60% of Data Analytics projects fail*
● General C-level recommendations
○ The Data Economy: Why do so many analytics projects fail?**
○ 8 Reasons why Data Analytics projects fail***
○ ...
● Often the problem is in a team structure
○ How the Machine Learning Engineer role can help
* - https://www.techrepublic.com/article/85-of-big-data-projects-fail-but-your-developers-can-help-yours-succeed/
** - https://www.dataversity.net/many-data-analytics-projects-fail-save/
*** - https://www.eastbanctech.com/technology-insights/what-the-tech/why-so-many-analytics-projects-fail.html
7. Data Scientist & Data Engineer
[Skill spectrum: Adv. Analytics, Math/Stats, ML/AI, Scripting, Programming, Distributed Sys., Data Pipelines]
Data Scientist:
● Driven by fast insights
● Small applications
● Highly dynamic development
● Interactive notebook scripts
● Running on a laptop
● Academic background
● Interacts with business/domain experts
Data Engineer:
● Agile
● Production systems
● QA and processes
● Modular, reusable, maintainable, scalable
● Running on a cluster
● Engineering background
● Interacts with platform engineers
8. Data Analytics Skills*
[Skill spectrum: Data Science covers Adv. Analytics, Math/Stats, ML/AI, Scripting; Data Engineering covers Programming, Distributed Sys., Data Pipelines]
● Typical ratio: 1 DS ~ 5 DE
* https://www.oreilly.com/ideas/data-engineers-vs-data-scientists
9. DataOps Teams*
[Same Data Science / Data Engineering skill spectrum]
● Ratio within a DataOps team: 1 DS ~ 3 DE
10. DataOps Team
● DataOps Team
○ cross-functional
○ owns the whole feature life cycle
○ dynamic
○ T-shaped
● Guilds & Feature Teams
● Data Platform AAS (as a service)
○ Platform Engineers
11. Machine Learning Engineer Role (Fill The Gap)
[Skill spectrum diagram: the ML Engineer fills the gap between Data Science and Data Engineering skills]
12. Machine Learning Engineer Role (Coordinating)
[Skill spectrum diagram: the ML Engineer coordinates across Data Science and Data Engineering]
13. ML Engineer
● Coordinates
● Improves communication
● Guards pragmatic development standards
● Sets (Agile) processes
● Makes the DE <-> DS handover smooth
● Balances the number of DEs and DSs
● Can work in both disciplines
● ML Engineer specific skills:
○ Custom ML algorithms
○ Custom ML solutions
○ ML model logistics
○ ML pipelines
[Skill spectrum diagram: ML Engineering spans both Data Science and Data Engineering; team mix of DS, ML and DE roles]
15. Challenge 1: Data Platform
Data Science process: Define Goal → Data Collection (DS) → Data Pre-Processing (DE) → Exploratory Data Analysis (DS, DE) → Feature Engineering (DS) → Modeling (DS) → Validation (DS, DE) → Deploy Model (DE) → Serve Model (Request|Batch|Stream) (DE) → Monitor (DS, DE)
Pain point: "Poor data quality"
16. Challenge 1: Data Platform
● Before:
○ Insights from data samples
○ Different teams: DS, DE, PE
○ Unsynchronized sprints
○ Loss of focus
○ Long time to market
○ Problem solving at different levels
■ Connectivity (PE)
■ Data Ingestion (DE)
■ EDA & Feature Engineering (DS)
● After:
○ Feature teams: DE, DS, ME (PE)
○ Continuous Data Platform improvements
○ Unified:
■ Data storage
■ Data ingestion
■ Data pre-processing
○ Early data ingestion from new sources
○ All data available for experimenting
○ Fewer rework and handover iterations
○ Faster time to market
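The "unified ingestion and pre-processing" idea can be sketched as a common source interface: every new data source implements it, so its data enters the shared pipeline (and becomes available for experimenting) early. A minimal pure-Python sketch; the class and function names are hypothetical, not from the talk:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]

class DataSource(ABC):
    """Common ingestion interface: each new source plugs into the
    same pipeline instead of getting its own one-off loader."""

    @abstractmethod
    def read(self) -> Iterable[Record]:
        ...

class CsvSource(DataSource):
    """Illustrative source: parses in-memory CSV text."""

    def __init__(self, text: str):
        self.text = text

    def read(self) -> Iterable[Record]:
        lines = self.text.strip().splitlines()
        header = lines[0].split(",")
        for line in lines[1:]:
            yield dict(zip(header, line.split(",")))

def ingest(source: DataSource) -> List[Record]:
    """Shared pre-processing, applied uniformly to any source."""
    return [{k: v.strip() for k, v in rec.items()} for rec in source.read()]

rows = ingest(CsvSource("id,city\n1, Amsterdam\n2, Utrecht"))
# rows: [{'id': '1', 'city': 'Amsterdam'}, {'id': '2', 'city': 'Utrecht'}]
```

The point of the interface is organizational as much as technical: a DS, DE or ML Engineer adding a source only writes a `read()`, and everything downstream stays unified.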
17. Challenge 2: Scalability of ML Methods (Tools)
[Same Data Science process diagram as Challenge 1]
Pain point: "This method is not scalable"
18. Challenge 2: Scalability of ML Methods (Tools)
● Before:
○ Horizontally scalable Data Platform AAS
○ Different teams
■ Different tools and standards
■ Unsynchronized sprints
○ No DE-DS coordination before deployment
■ Rework iterations
○ Lack of understanding of scalability
■ horizontal vs. vertical
○ Lack of understanding of ML stages
■ training vs. scoring
○ Unscalable tools: scikit-learn, R
○ Unscalable methods: Neural Nets
● After:
○ Feature teams: DE, DS, ME (PE)
○ Shared codebase
○ Standardised tooling
○ Reusable building blocks for ML pipelines:
■ Notebooks (easy to use)
■ Cluster (production ready)
○ Testing strategy
○ Automated deployment
○ DSs modify and deploy ML pipelines themselves
○ Faster time to market
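The "reusable building blocks" point can be made concrete: pipeline steps written as small composable functions run unchanged in a notebook (on a sample) and on a cluster (behind a distributed runner), so only the execution backend differs. A minimal pure-Python sketch; the step names and composition helper are illustrative, not from the talk, and in practice this would sit on top of something like Spark ML pipelines:

```python
from typing import Callable, List

Row = dict
Step = Callable[[List[Row]], List[Row]]

def pipeline(*steps: Step) -> Step:
    """Compose reusable building blocks into one pipeline stage."""
    def run(data: List[Row]) -> List[Row]:
        for step in steps:
            data = step(data)
        return data
    return run

# Illustrative building blocks:
def drop_nulls(rows: List[Row]) -> List[Row]:
    """Remove rows containing missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def scale_amount(rows: List[Row]) -> List[Row]:
    """Rescale 'amount' to [0, 1] by dividing by the maximum."""
    hi = max(r["amount"] for r in rows)
    return [{**r, "amount": r["amount"] / hi} for r in rows]

prep = pipeline(drop_nulls, scale_amount)
out = prep([{"amount": 50}, {"amount": None}, {"amount": 100}])
# out: [{'amount': 0.5}, {'amount': 1.0}]
```

Because each block is a plain function with a uniform signature, the same code path is easy to test, which is what makes "DSs modify and deploy ML pipelines themselves" viable.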
19. Challenge 3: Model Serving
[Same Data Science process diagram as Challenge 1]
Pain point: "This model is too slow for real-time scoring"
20. Challenge 3: Model Serving
● Before:
○ Single team: DS, DE
○ Lack of DS-DE coordination
○ Poorly scalable design
■ In-memory (big) data processing
○ Poorly scalable methods
■ Cosine nearest-neighbour search
■ O(n) lookup instead of O(1)
○ Rework
○ Problems with real-time scoring
● After:
○ Single team: DE, DS, ME
○ Model serving is planned early
○ Efficient refinements
○ Serving strategy drives solution design
○ Less rework
○ Faster time to market
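The O(n)-vs-constant point above can be illustrated: brute-force cosine nearest-neighbour search scans every stored vector on each request, so scoring latency grows linearly with catalogue size; fine in a batch job, too slow for real-time serving at scale, which is why approximate indexes (e.g. LSH) that trade exactness for near-constant lookup are chosen when serving strategy drives the design. A minimal sketch with illustrative data:

```python
import math
from typing import List

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest(query: List[float], vectors: List[List[float]]) -> int:
    """Brute-force nearest neighbour: one full scan per request,
    i.e. O(n) scoring latency in the number of stored vectors."""
    best, best_sim = -1, -2.0
    for i, v in enumerate(vectors):
        sim = cosine(query, v)
        if sim > best_sim:
            best, best_sim = i, sim
    return best

idx = nearest([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]])
# idx == 1: the second vector points almost the same way as the query
```

Spotting this kind of complexity mismatch between a notebook experiment and a real-time serving requirement is exactly the DS-DE coordination the ML Engineer provides.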