Role of Machine Learning Engineer
Borys Biletskyy
Data Science Amsterdam
28-05-2019
Agenda
1. About Myself
2. Motivation
3. Data Science Process
4. Roles in Data Analytics
5. 3 Challenges for ML Engineer
About Myself
● Software Engineer since 2004
○ Low level, C++ -> Enterprise, Java -> Data Driven, Scala
○ Dev, Tech Lead, Architect, Consultant
● Researcher since 2004
○ PhD in Theoretical Computer Science
○ Complexity and Scalability of ML Methods
● Machine Learning Engineer since 2017
○ Python, Scala
○ LeasePlan, Randstad, VodafoneZiggo
Motivation
● Low success rate of Data Analytics projects
○ Gartner: 60% of Data Analytics projects fail*
● General C-level recommendations
○ The Data Economy: Why do so many analytics projects fail?**
○ 8 Reasons why Data Analytics projects fail***
○ ...
● Often the problem lies in the team structure
○ How the Machine Learning Engineer role can help
* - https://www.techrepublic.com/article/85-of-big-data-projects-fail-but-your-developers-can-help-yours-succeed/
** - https://www.dataversity.net/many-data-analytics-projects-fail-save/
*** - https://www.eastbanctech.com/technology-insights/what-the-tech/why-so-many-analytics-projects-fail.html
Data Science Process*
[Diagram: stages of the process]
● Define Goal
● Data Collection
● Data Pre-Processing
● Exploratory Data Analysis
● Feature Engineering
● Modeling
● Validation
● Deploy Model
● Serve Model (Request|Batch|Stream)
● Monitor
*https://www.youtube.com/watch?v=XoBJwxuPynk&feature=youtu.be
Data Science Process*
[Diagram: process stages annotated with the responsible role; DS = Data Scientist, DE = Data Engineer]
● Define Goal
● Data Collection: DS
● Data Pre-Processing: DE
● Exploratory Data Analysis: DS, DE
● Feature Engineering: DS
● Modeling: DS
● Validation: DS, DE
● Deploy Model: DE
● Serve Model (Request|Batch|Stream): DE
● Monitor: DS, DE
Pain points:
● Poor data quality
● Can't scale this method horizontally
● Model is too slow for streaming
● DE-DS handover is slow
*https://www.youtube.com/watch?v=XoBJwxuPynk&feature=youtu.be
Data Scientist & Data Engineer
Skill spectrum: Adv. Analytics, Math/Stats, ML/AI, Scripting, Programming, Distributed Sys., Data Pipelines
● Data Scientist
○ Fast insights driven
○ Small applications
○ Highly dynamic development
○ Interactive notebook scripts
○ Running on laptop
○ Academic background
○ Interacts with business/domain experts
● Data Engineer
○ Agile
○ Production systems
○ QA and processes
○ Modular, reusable, maintainable, scalable
○ Running on cluster
○ Engineering background
○ Interacts with platform engineers
Data Analytics Skills
● Skill spectrum, from Data Science to Data Engineering: Adv. Analytics, Math/Stats, ML/AI, Scripting, Programming, Distributed Sys., Data Pipelines
● Team ratio: 1 DS ~ 5 DE
* https://www.oreilly.com/ideas/data-engineers-vs-data-scientists
DataOps Teams*
● Team ratio: 1 DS ~ 3 DE
DataOps Team
● DataOps Team
○ cross-functional
○ owns whole feature life cycle
○ dynamic
○ T-shaped
● Guilds & Feature Teams
● Data Platform as a Service
○ Platform Engineers
Machine Learning Engineer Role (Fill The Gap)
[Diagram: skill spectrum from Data Science to Data Engineering]
Machine Learning Engineer Role (Coordinating)
[Diagram: skill spectrum from Data Science to Data Engineering]
ML Engineer
● Coordination
● Improves communication
● Guards pragmatic development standards
● Sets (Agile) processes
● Makes DE<->DS handover smooth
● Balances the number of DEs and DSs
● Can work in both disciplines
● ML Engineer-specific skills:
○ Custom ML algorithms
○ Custom ML solutions
○ ML model logistics
○ ML pipelines
[Diagram: ML Engineering on the Data Science to Data Engineering skill spectrum; teams mixing DE, ML and DS roles]
3 Challenges for ML Engineer
Challenge 1: Data Platform
[Diagram: data science process with DS/DE role annotations; highlighted problem: poor data quality]
● Before:
○ Insights based on data samples only
○ Different teams: DS, DE, PE
○ Unsynchronized sprints
○ Loss of focus
○ Long time to market
○ Different levels of problem solving
■ Connectivity (PE)
■ Data Ingestion (DE)
■ EDA & Feature Engineering (DS)
● After:
○ Feature teams: DE, DS, ME (PE)
○ Continuous Data Platform improvements
○ Unified:
■ Data storage
■ Data Ingestion
■ Data Pre-processing
○ Early data ingestion from new sources
○ All data is available for experimentation
○ Less rework and fewer handover iterations
○ Faster time to market
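A unified pre-processing step like the one listed above is a natural place to catch poor data quality early. The following is a minimal sketch in Python, assuming records arrive as dictionaries. The field names, the two rules and the `check_quality` helper are hypothetical illustrations, not something shown in the talk.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Rule = Callable[[Record], bool]

# Hypothetical rules; the field names are illustrative only.
RULES: Dict[str, Rule] = {
    "customer_id present": lambda r: r.get("customer_id") is not None,
    "age in plausible range": lambda r: isinstance(r.get("age"), int)
    and 0 <= r["age"] <= 120,
}


def check_quality(records: List[Record], rules: Dict[str, Rule] = RULES) -> Dict[str, float]:
    """Return, per rule, the share of records that violate it."""
    if not records:
        return {name: 0.0 for name in rules}
    return {
        name: sum(not rule(r) for r in records) / len(records)
        for name, rule in rules.items()
    }


sample = [
    {"customer_id": 1, "age": 34},
    {"customer_id": None, "age": 51},   # missing id
    {"customer_id": 3, "age": 230},     # implausible age
    {"customer_id": 4, "age": 28},
]
report = check_quality(sample)
print(report)
```

Running such a report at ingestion time, rather than during EDA, would give DE, DS and PE the same view of data quality before any modeling starts.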
Challenge 2: Scalability of ML Methods (Tools)
[Diagram: data science process with DS/DE role annotations; highlighted problem: this method is not scalable]
● Before:
○ Horizontally scalable Data Platform as a Service
○ Different teams
■ Different tools and standards
■ Unsynchronized sprints
○ No DE-DS coordination before deployment
■ Rework iterations
○ Lack of understanding of scalability
■ horizontal / vertical
○ Lack of understanding of ML stages
■ training / scoring
○ Tools that do not scale horizontally: scikit-learn, R
○ Methods that are hard to scale: Neural Nets
● After:
○ Feature teams: DE, DS, ME (PE)
○ Shared codebase
○ Standardised tooling
○ Reusable building blocks for ML Pipelines:
■ Notebooks (easy to use)
■ Cluster (production ready)
○ Testing strategy
○ Automated Deployment
○ DS modifying and deploying ML pipelines
○ Faster time to market
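The training / scoring distinction and the idea of one building block that works in a notebook and on a cluster can be illustrated with a small sketch. It assumes a toy standardisation model as a stand-in for a real ML method. The names `ScalerModel`, `train` and `score_partition` are hypothetical, and a plain loop stands in for the cluster.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Sequence


@dataclass(frozen=True)
class ScalerModel:
    """Fitted parameters: small, immutable, cheap to ship to workers."""
    mu: float
    sigma: float

    def score(self, x: float) -> float:
        # Scoring touches one record at a time and needs no shared state.
        return (x - self.mu) / self.sigma


def train(values: Sequence[float]) -> ScalerModel:
    """Training needs a global view of the data (here: mean and std)."""
    sigma = pstdev(values)
    return ScalerModel(mu=mean(values), sigma=sigma if sigma > 0 else 1.0)


def score_partition(model: ScalerModel, partition: Sequence[float]) -> List[float]:
    """Score one partition; partitions can be processed independently."""
    return [model.score(x) for x in partition]


# Notebook-style use: train on data that fits in memory.
model = train([1.0, 2.0, 3.0, 4.0, 5.0])

# Cluster-style use: the same function mapped over independent partitions
# (a plain loop stands in for the cluster in this sketch).
partitions = [[1.0, 2.0], [3.0], [4.0, 5.0]]
scored = [score_partition(model, p) for p in partitions]
print(scored)
```

In this sketch scoring scales horizontally because each partition only needs the fitted parameters, while training is the stage that needs the global view. Real methods differ in how easily their training step can be distributed.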
Challenge 3: Model Serving
[Diagram: data science process with DS/DE role annotations; highlighted problem: this model is too slow for real-time scoring]
● Before:
○ Single team: DS, DE
○ Lack of DS-DE coordination
○ Poorly scalable design
■ In-memory (big) data processing
○ Poorly scalable methods
■ Cosine nearest-neighbor search
■ O(n) per query instead of constant time
○ Rework
○ Problems with real-time scoring
● After:
○ Single team: DE, DS, ME
○ Model serving is planned early
○ Efficient refinements
○ Serving strategy drives solution design
○ Less rework
○ Faster time to market
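To make the O(n) remark concrete, the sketch below contrasts an exact cosine nearest-neighbor scan with an approximate index based on random-hyperplane hashing. The talk does not say which technique was actually used; the hashing approach, the class name `HyperplaneIndex` and all parameters are assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_nn(query: Vector, items: List[Vector]) -> int:
    """Exact search: compares the query with every item, O(n) per request."""
    return max(range(len(items)), key=lambda i: cosine(query, items[i]))


class HyperplaneIndex:
    """Approximate search via random-hyperplane hashing (illustrative only)."""

    def __init__(self, items: List[Vector], n_planes: int = 8, seed: int = 0):
        rng = random.Random(seed)
        dim = len(items[0])
        self.items = items
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets: Dict[Tuple[bool, ...], List[int]] = defaultdict(list)
        for i, v in enumerate(items):  # index is built once, offline
            self.buckets[self._key(v)].append(i)

    def _key(self, v: Vector) -> Tuple[bool, ...]:
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in self.planes)

    def query(self, q: Vector) -> int:
        """Hash lookup plus a scan of one bucket; exact search as fallback."""
        candidates = self.buckets.get(self._key(q))
        if not candidates:
            return brute_force_nn(q, self.items)
        return max(candidates, key=lambda i: cosine(q, self.items[i]))


rng = random.Random(42)
items = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(1000)]
index = HyperplaneIndex(items)
q = items[123]
exact = brute_force_nn(q, items)   # scans all 1000 items
approx = index.query(q)            # scans only the items in q's bucket
print(exact, approx)
```

The index only scans the bucket the query hashes into, so per-request work depends on bucket size rather than on the full catalogue. The price is approximate results, which is the kind of trade-off that is easier to make when the serving strategy is planned early.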
Q&A
