A DATA SCIENTIST JOURNEY TO INDUSTRIALIZATION OF MACHINE LEARNING MODELS
DataXDay 2018
17th May 2018
@DataXDay
DATA SCIENCE
FOUNDATIONS FOR DATA SCIENCE AT AIR FRANCE
• Adoption of Operations Research for crew scheduling
• Extension to other business domains: Revenue Management, Cargo, Ground services, …
• Adoption of Hadoop
• Focus on Machine Learning
• Ops Research is now 120 engineers in Paris and Amsterdam
Adoption of data science within AFKL IT was favored by the existing Operations Research practice
MACHINE LEARNING, SPONSORED BY ORGANIZATION
Organization, through Customer Data Management, is one of the key sponsors of industrialized data science within AFKL
Customer Data Management:
• Customer data strategy
• Customer knowledge
• Personalization
• Coordinates IT efforts
THE STARTING POINT FOR A DATA SCIENCE PROJECT IS A POC LOGIC
[Diagram: DWH (Historical Data, Business Intelligence); LOCAL (Data Sample, Proof of Concept)]
WHAT IS AN « INDUSTRIALIZED » ENGINE?
• Jupyter notebook, R → Executable package
• On my own → Integrated within AFKL IT live ecosystem
• Manual launch or crontab → Automated calibration and prediction
• I guess my code is flawless → Unit tested
• Theoretical performance → Live feedback on performance
FROM LOCAL STUDIES… TO A ROBUST LIVE DATA PRODUCT
[Diagram: DWH (Historical Data, Business Intelligence); LOCAL (Data Sample, Proof of Concept); LIVE (Data feed); EXPLORATION (Historical Data, Proof of Concept); MODELS (Repository); Predictions; DATA API]
DATA SCIENTISTS X DATA ENGINEERS
Fellowship
IT TAKES TWO TO BRING DATA PRODUCTS LIVE (AT LEAST)
• Timeline: PoC, start of industrialization, live product V1
• Help! How to ingest and expose data?
• Data Scientist: translates business ideas into data science (stats, ML, AI)
• Data Engineer: dev, big data, project architecture
KEEP THE FRONTIER LOOSE
• Data scientist and data engineer are roles, not persons
• Awareness of the data scientist's role in live environments is key
A LIVE ECOSYSTEM
[Diagram: DWH (Historical Data, Business Intelligence); LIVE (Data feed); EXPLORATION (Historical Data, Proof of Concept); MODELS (Repository); Predictions; DATA API]
PACKAGING DATA SCIENCE
Spark and PEX
WHAT DO YOU EXPECT?
• Setup: features engineering + algorithm = « Model »
• Train: model + training data → trained model
• Predict: trained model + prediction data → predictions
We expect two main functionalities: training and predicting
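A minimal sketch of what a package exposing these two functionalities could look like. The toy threshold model and the names `TrainedModel`, `train` and `predict` are assumptions made for illustration, not the interface used at AFKL.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TrainedModel:
    """Hypothetical trained model: a single decision threshold."""
    threshold: float


def train(training_data):
    """Training entry point: the threshold is the midpoint between class means."""
    positives = [x for x, label in training_data if label == 1]
    negatives = [x for x, label in training_data if label == 0]
    return TrainedModel(threshold=(mean(positives) + mean(negatives)) / 2)


def predict(model, prediction_data):
    """Prediction entry point: one label per input value."""
    return [1 if x > model.threshold else 0 for x in prediction_data]


# Train on labeled history, then predict on new inputs.
model = train([(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)])
predictions = predict(model, [0.5, 7.5])
```

Keeping `train` and `predict` as separate entry points lets calibration and prediction be scheduled independently once the package runs in a live environment.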
STANDARDIZATION WITH THE PIPELINE PATTERN
• Feature Engineering: SQLTransformer, VectorAssembler
• Model training: LogisticRegression.fit(dataset)
• Pipeline Model: LogisticRegressionModel.transform(dataset) turns a Dataset into Dataset + Predictions
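The pipeline pattern can be sketched in plain Python. This does not use Spark: every class below is a hypothetical stand-in that only mirrors the fit/transform convention named on the slide, and the column names are invented for the example.

```python
class AddRatio:
    """Hypothetical feature-engineering stage: adds a derived column."""

    def transform(self, rows):
        return [{**r, "ratio": r["spend"] / r["trips"]} for r in rows]


class ThresholdEstimator:
    """Hypothetical estimator: fit() returns a fitted model with transform()."""

    def fit(self, rows):
        mean_ratio = sum(r["ratio"] for r in rows) / len(rows)
        return ThresholdModel(mean_ratio)


class ThresholdModel:
    """Fitted model: appends a prediction column to each row."""

    def __init__(self, threshold):
        self.threshold = threshold

    def transform(self, rows):
        return [{**r, "prediction": int(r["ratio"] > self.threshold)} for r in rows]


class Pipeline:
    """Chains stages; fit() returns a PipelineModel made of fitted stages."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)   # estimator becomes a fitted model
            rows = stage.transform(rows)  # feed the next stage
            fitted.append(stage)
        return PipelineModel(fitted)


class PipelineModel:
    """Applies every fitted stage in order: dataset in, dataset + predictions out."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


training = [{"spend": 100.0, "trips": 4}, {"spend": 900.0, "trips": 3}]
pipeline_model = Pipeline([AddRatio(), ThresholdEstimator()]).fit(training)
scored = pipeline_model.transform([{"spend": 500.0, "trips": 2}])
```

Because feature engineering lives inside the fitted pipeline, the same transformations are replayed at prediction time without extra code.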
PEX, JUST LIKE UBERJAR
[Diagram: a PEX containing the project package, company packages and external packages]
A LIVE ECOSYSTEM
[Diagram: DWH (Historical Data, Business Intelligence); LIVE (Data feed); EXPLORATION (Historical Data, Proof of Concept); MODELS (Repository); Predictions; DATA API]
A LIVE ECOSYSTEM… BUT TRAINING DATA AND LIVE DATA ARE DIFFERENT
[Diagram: DWH (Historical Data, Business Intelligence); LIVE (Data feed); EXPLORATION (Historical Data, Proof of Concept); MODELS (Repository); Predictions; DATA API]
FROM DWH TO DATALAKE
A detour
TRAINING DATA MUST BE THE SAME AS PRODUCTION
• The data warehouse holds the full historical data
• The production platform processes only what live apps need from the raw data
• Data processing on the two sides is not identical
• The production platform therefore has to build its own full historical data
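One way to read these points: if a single processing function rebuilds the history from raw data and also handles live events, training rows and live rows cannot drift apart. The sketch below assumes hypothetical raw fields (`customer_id`, `amount_cents`, `weekday`) and is not the AFKL processing code.

```python
def build_features(raw_event):
    """Single processing function shared by history rebuild and live scoring."""
    return {
        "customer_id": raw_event["customer_id"],
        "amount_eur": round(raw_event["amount_cents"] / 100, 2),
        "is_weekend": raw_event["weekday"] in (5, 6),
    }


# History rebuilt on the production platform from raw events.
raw_history = [
    {"customer_id": "c1", "amount_cents": 12050, "weekday": 5},
    {"customer_id": "c2", "amount_cents": 990, "weekday": 1},
]
training_rows = [build_features(event) for event in raw_history]

# A live event goes through exactly the same code path.
live_row = build_features({"customer_id": "c3", "amount_cents": 4500, "weekday": 6})

# Sanity check: training rows and live rows expose the same columns.
same_schema = all(row.keys() == live_row.keys() for row in training_rows)
```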
FROM A HISTORICAL/LIVE SYSTEM
[Diagram: DWH (Historical Data, Business Intelligence); LIVE (Data feed); EXPLORATION (Historical Data, Proof of Concept); MODELS (Repository); Predictions; DATA API]
TO A FULL LIVE SYSTEM
[Diagram: LIVE (Data feed, Historical Data); EXPLORATION (Historical Data, Proof of Concept); MODELS (Repository); Predictions; DATA API]
CONTINUOUS IMPROVEMENT
Growing up
FROM BUD TO FLOWER
• Easy deployment of new models
• Easy extraction of new features
• Easy access to new data
• Stay innovative
• Time to market
CRAFTSMANSHIP ON THE DATA SCIENTIST SIDE
CONTINUOUS INTEGRATION
• Goal: make sure no code modification breaks anything
• What to do? Regularly fetch sources, build the project and run tests
• Needs: tools to automate all tedious and repetitive tasks (because we are lazy)
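As an illustration of the kind of test a continuous integration job could run on every change, here is a minimal sketch using Python's standard `unittest` module. The feature function `days_before_departure` is a hypothetical example, not code from the talk.

```python
import unittest


def days_before_departure(booking_day, departure_day):
    """Hypothetical feature: number of days between booking and departure."""
    if departure_day < booking_day:
        raise ValueError("departure cannot precede booking")
    return departure_day - booking_day


class DaysBeforeDepartureTest(unittest.TestCase):
    def test_nominal(self):
        self.assertEqual(days_before_departure(10, 25), 15)

    def test_rejects_inverted_dates(self):
        with self.assertRaises(ValueError):
            days_before_departure(25, 10)


# Run the suite programmatically, as a CI job would after fetching and building.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DaysBeforeDepartureTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A CI server would typically fail the build when `result.wasSuccessful()` is false.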
DATA SCIENTIST - SOFTWARE FACTORY
• Development: exploration
• CI: build PEX
• CD: expose PEX for other IT teams
TRACK MODEL VERSIONING
• Calibration metadata
  • Dataset used
  • Timestamp + code version
• Keep track of the link between models and predictions
  • Model used
  • Unique ID of prediction
  • Input dataset
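These tracking fields could be captured in two small record types. This is a sketch under assumptions: the class names, field names and sample values are hypothetical, and how the records are persisted is left out.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CalibrationRecord:
    """Metadata stored with each trained model."""
    model_id: str
    dataset_ref: str    # dataset used for calibration
    code_version: str   # e.g. a commit hash
    trained_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass(frozen=True)
class PredictionRecord:
    """Links one prediction back to the model and input that produced it."""
    model_id: str
    input_ref: str
    value: float
    prediction_id: str = field(default_factory=lambda: str(uuid.uuid4()))


calibration = CalibrationRecord(
    model_id="churn-v3", dataset_ref="snapshot-2018-05-01", code_version="a1b2c3d"
)
prediction = PredictionRecord(
    model_id=calibration.model_id, input_ref="feed-batch-42", value=0.87
)
```

With the shared `model_id`, any prediction can be traced to the dataset and code version that produced its model.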
FEEDBACK LOOP
[Diagram: LIVE (Data feed, Historical Data); EXPLORATION (Historical Data, Proof of Concept); MODELS (Repository); Predictions; DATA API; feedback; Metrics]
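A feedback loop of this kind needs a metric computed from stored predictions and the outcomes observed later. The sketch below assumes predictions are keyed by a unique ID and uses plain accuracy per model version; the record layout and the choice of metric are assumptions for illustration.

```python
from collections import defaultdict

# Stored predictions and the outcomes observed afterwards (hypothetical data).
predictions = [
    {"prediction_id": "p1", "model_id": "v1", "predicted": 1},
    {"prediction_id": "p2", "model_id": "v1", "predicted": 0},
    {"prediction_id": "p3", "model_id": "v2", "predicted": 1},
    {"prediction_id": "p4", "model_id": "v2", "predicted": 1},
]
outcomes = {"p1": 1, "p2": 1, "p3": 1, "p4": 1}


def accuracy_by_model(predictions, outcomes):
    """Join predictions with outcomes by ID and compute accuracy per model."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for p in predictions:
        observed = outcomes.get(p["prediction_id"])
        if observed is None:
            continue  # outcome not known yet, skip
        totals[p["model_id"]] += 1
        hits[p["model_id"]] += int(p["predicted"] == observed)
    return {model: hits[model] / totals[model] for model in totals}


metrics = accuracy_by_model(predictions, outcomes)
```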
NEXT STEP
Improve and share best practices
TOO MANY JOURNEYS
• How to maintain the momentum after a few teams have started the adventure?
• Every team experienced a different journey
• But every team finds different paths
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

DataXDay - A data scientist journey to industrialization of machine learning

  • 1. A DATA SCIENTIST JOURNEY TO INDUSTRIALIZATION OF MACHINE LEARNING MODELS. DataXDay 2018, 17th May 2018
  • 2. @DataXDay DATA SCIENCE: FOUNDATIONS FOR DATA SCIENCE AT AIR FRANCE. Adoption of Operations Research for crew scheduling; extension to other business domains: Revenue Management, Cargo, Ground services, …; adoption of Hadoop; focus on Machine Learning. Ops Research is now 120 engineers in Paris and Amsterdam. Adoption of data science within AFKL IT was favored by the existing Operations Research practice.
  • 3. DATA SCIENCE: MACHINE LEARNING, SPONSORED BY ORGANIZATION. Organization, through Customer Data Management, is one of the key sponsors of industrialized data science within AFKL. Customer Data Management: customer data strategy; customer knowledge; personalization; coordinates IT efforts.
  • 4. DATA SCIENCE: STARTING POINT FOR DATA SCIENCE PROJECT IS A POC LOGIC. Diagram labels: DWH, Historical Data, Business Intelligence; LOCAL, Data Sample, Proof of Concept.
  • 5. DATA SCIENCE: WHAT IS AN « INDUSTRIALIZED » ENGINE? Jupyter notebook, R → Executable package. On my own → Integrated within AFKL IT live ecosystem. Manual launch or crontab → Automated calibration and prediction. I guess my code is flawless → Unit tested. Theoretical performance → Live feedback on performance.
  • 6. DATA SCIENCE: FROM LOCAL STUDIES… TO A ROBUST LIVE DATA PRODUCT. Diagram labels: LOCAL, Data Sample, Proof of Concept; LIVE, Data feed; DWH, Historical Data, Business Intelligence; EXPLORATION, Historical Data, Proof of Concept; MODELS, Repository; Predictions; DATA; API.
  • 7. DATA SCIENTISTS X DATA ENGINEERS: Fellowship
  • 8. DATA SCIENTISTS X DATA ENGINEERS: IT TAKES TWO TO BRING DATA PRODUCTS LIVE (AT LEAST). PoC; start of industrialization ("Help! How to ingest and expose data?"); Live Product V1. Data Scientist: translates business ideas into data science; stats, ML, AI. Data Engineer: dev, big data, project architecture.
  • 9. DATA SCIENTISTS X DATA ENGINEERS: KEEP THE FRONTIER LOOSE. Data scientist and data engineer are roles, not persons. Awareness of the data scientist's role on live environments is key.
  • 10. DATA SCIENTISTS X DATA ENGINEERS: A LIVE ECOSYSTEM. Diagram labels: LIVE, Data feed; DWH, Historical Data, Business Intelligence; EXPLORATION, Historical Data, Proof of Concept; MODELS, Repository; Predictions; DATA; API.
  • 12. PACKAGING DATA SCIENCE: WHAT DO YOU EXPECT? Setup: features engineering + algorithm → « Model ». Train: model + training data → trained model. Predict: trained model + prediction data → predictions. We are expecting two main functionalities, training and predicting.
  • 13. PACKAGING DATA SCIENCE: STANDARDIZATION WITH THE PIPELINE PATTERN. Pipeline: feature engineering (SQLTransformer, VectorAssembler), then model training with LogisticRegression.fit(dataset). Pipeline Model: LogisticRegressionModel.transform(dataset) turns a Dataset into Dataset + Predictions.
  • 14. PACKAGING DATA SCIENCE: PEX, JUST LIKE UBERJAR. Diagram labels: PEX containing the project package, company packages, and external packages.
  • 15. PACKAGING DATA SCIENCE: A LIVE ECOSYSTEM. Diagram labels: LIVE, Data feed; DWH, Historical Data, Business Intelligence; EXPLORATION, Historical Data, Proof of Concept; MODELS, Repository; Predictions; DATA; API.
  • 16. PACKAGING DATA SCIENCE: A LIVE ECOSYSTEM… BUT TRAINING DATA AND LIVE DATA ARE DIFFERENT. Diagram labels: LIVE, Data feed; DWH, Historical Data, Business Intelligence; EXPLORATION, Historical Data, Proof of Concept; MODELS, Repository; Predictions; DATA; API.
  • 17. FROM DWH TO DATALAKE: A detour
  • 18. FROM DWH TO DATALAKE: TRAINING DATA MUST BE THE SAME AS PRODUCTION. The data warehouse holds the full historical data. The production platform processes only what live apps need from the raw data. Data processing on the two sides is not identical. The production platform has to build a full historical dataset.
  • 19. FROM DWH TO DATALAKE: FROM A HISTORICAL/LIVE SYSTEM. Diagram labels: LIVE, Data feed; DWH, Historical Data, Business Intelligence; EXPLORATION, Historical Data, Proof of Concept; MODELS, Repository; Predictions; DATA; API.
  • 20. FROM DWH TO DATALAKE: TO A FULL LIVE SYSTEM. Diagram labels: LIVE, Data feed, Historical Data; EXPLORATION, Historical Data, Proof of Concept; MODELS, Repository; Predictions; DATA; API.
  • 22. CONTINUOUS IMPROVEMENT: FROM BUD TO FLOWER. Easy deployment of new models; easy extraction of new features; easy access to new data; stay innovative; time to market.
  • 24. CONTINUOUS IMPROVEMENT: CONTINUOUS INTEGRATION. Goal: make sure each code modification is not breaking anything. What to do? Regularly fetch sources, build the project and run tests. Needs: tools to automate all tedious and repetitive tasks. Because we are lazy.
  • 25. CONTINUOUS IMPROVEMENT: DATA SCIENTIST - SOFTWARE FACTORY. Diagram labels: Development, CI, CD; Exploration, Build PEX, Expose PEX for other IT teams.
  • 26. CONTINUOUS IMPROVEMENT: TRACK MODEL VERSIONING. Calibration metadata: dataset used; timestamp + code version. Keep track between models and predictions: model used; unique ID of prediction; input dataset.
  • 27. CONTINUOUS IMPROVEMENT: FEEDBACK LOOP. Diagram labels: LIVE, Data feed, Historical Data; EXPLORATION, Historical Data, Proof of Concept; MODELS, Repository; Predictions; DATA; API; feedback; Metrics.
  • 28. NEXT STEP: Improve and share best practices
  • 29. NEXT STEP: TOO MANY JOURNEYS. How to maintain the momentum after a few teams have started the adventure? Every team experienced a different journey. But every team finds different paths.
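
A minimal Python sketch of the pipeline pattern from slide 13, combined with the traceability items listed on slide 26. It is not the Air France code. Plain Python stands in for the Spark ML classes named on the slide, and the feature step `add_total_delay`, the column names, the toy mean-threshold "algorithm", and the `code_version` value are all hypothetical, chosen only to show how `fit` (training) and `transform` (prediction) could carry calibration metadata and prediction IDs.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List

Row = Dict[str, float]
FeatureStep = Callable[[Row], Row]


def fingerprint(rows: List[Row]) -> str:
    """Stable hash of a dataset, recording which data calibrated a model."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def add_total_delay(row: Row) -> Row:
    """Hypothetical feature-engineering step: derive one column from two."""
    return {**row, "total_delay": row["departure_delay"] + row["taxi_delay"]}


@dataclass
class PipelineModel:
    """Trained pipeline: feature steps, learned parameter, calibration metadata."""
    steps: List[FeatureStep]
    threshold: float
    dataset_fingerprint: str  # dataset used
    code_version: str         # code version (hypothetical value below)
    trained_at: str           # calibration timestamp
    model_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def transform(self, rows: List[Row]) -> List[dict]:
        """Predict: apply the same feature steps as in training, then score."""
        predictions = []
        for row in rows:
            features = row
            for step in self.steps:
                features = step(features)
            predictions.append({
                "prediction_id": uuid.uuid4().hex,  # unique ID of prediction
                "model_id": self.model_id,          # model used
                "input": row,                       # input data
                "prediction": int(features["total_delay"] > self.threshold),
            })
        return predictions


@dataclass
class Pipeline:
    """Untrained pipeline: feature steps followed by a toy estimator."""
    steps: List[FeatureStep]

    def fit(self, rows: List[Row], code_version: str) -> PipelineModel:
        """Train: the toy estimator learns the mean total delay as a threshold."""
        features = rows
        for step in self.steps:
            features = [step(r) for r in features]
        threshold = sum(r["total_delay"] for r in features) / len(features)
        return PipelineModel(
            steps=self.steps,
            threshold=threshold,
            dataset_fingerprint=fingerprint(rows),
            code_version=code_version,
            trained_at=datetime.now(timezone.utc).isoformat(),
        )


training_rows = [
    {"departure_delay": 5.0, "taxi_delay": 5.0},
    {"departure_delay": 20.0, "taxi_delay": 10.0},
]
model = Pipeline(steps=[add_total_delay]).fit(training_rows, code_version="abc123")
results = model.transform([
    {"departure_delay": 30.0, "taxi_delay": 0.0},
    {"departure_delay": 1.0, "taxi_delay": 2.0},
])
```

With these two training rows the learned threshold is 20.0 (the mean of 10 and 30), so the two live rows are scored 1 and 0. Each prediction record carries the `model_id`, so it can be traced back to the dataset fingerprint, timestamp, and code version stored on the model.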