Afterwork Big Data - Data Science & Machine Learning: explore, understand and predict

For our third Afterwork on the theme of “Big Data”, we offer an introduction to the practices and benefits of Data Science. While the previous sessions showed how to store and process large volumes of data at low cost, this one addresses a new aspect: how to uncover the treasures of information hidden in your data.

We will present the main principles of Machine Learning and the power of visualization. Drawing on OCTO's field experience, we will survey the methods and tools available.

By the end of this presentation, you will have discovered pragmatic approaches to exploring and understanding your data, and perhaps even to predicting your future…


  1. Data Science & Machine Learning. © OCTO 2015. Tel: +41 (0) 21 312 94 15. www.octo.com. Avenue du Théâtre 7, 1005 Lausanne, Switzerland
  2. Alexandre Masselot, amasselot@octo.com, @alex_mass. Catherine Zwahlen, czwahlen@octo.com
  3. 2016 is the Year of Big Data @OCTO Switzerland. Big Data Romandie
  4. OCTO PUBLICATIONS. OCTO TECHNOLOGY > THERE IS A BETTER WAY
  5. WE ARE CONSUMING DATA SCIENCE EVERY DAY! Facial recognition
  6. WE ARE CONSUMING DATA SCIENCE EVERY DAY! Spam detection
  7. WE ARE CONSUMING DATA SCIENCE EVERY DAY! Voice recognition
  8. WE ARE CONSUMING DATA SCIENCE EVERY DAY! Movie recommendation
  9. DATA SCIENCE, A DOMAIN DRIVEN BY COMPETITION. To solve your business problems! Problem; Data; Crowd; Knowledge & Tools; Model for Prediction
  10. OCTO Folks Work Hard, Play Hard
      ◉ Caisse de dépôts - score for the granting of a European patent
      ◉ Argus - prediction of the sale price of used vehicles
      ◉ SNCF - prediction of station ridership in Île-de-France
      ◉ Imperial College London - Loan Default Prediction
      ◉ Allstate - purchase prediction challenge
      ◉ Tradeshift - Text classification
      ◉ Microsoft - Malware classification
      Rankings shown on the slide: 1st, 2&4, 3rd, 6th, 13th, 2nd, 5th
      OCTO, there is a better way to learn, recruit and have fun!
  11. DATA SCIENCE TONIGHT. Four agenda items (numbered 1 to 4 on the slide): Visualization; Why the buzz about data science?; Demystifying machine learning; Data science in your business
  12. “Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data” (https://en.wikipedia.org/wiki/Data_science)
  13. 1 Cray 2 = 1 iPhone 4
  14. (no text on slide)
  15. AGILE DATA SCIENCE
  16. DATA SCIENCE TONIGHT. Four agenda items (numbered 1 to 4 on the slide): Visualization; Why the buzz about data science?; Demystifying machine learning; Data science in your business
  17. “Machine learning explores the study and construction of algorithms that can learn from and make predictions on data” (https://en.wikipedia.org/wiki/Machine_learning)
  18. MACHINE LEARNING: Conditions
      1. A pattern exists
      2. The problem cannot be described analytically by a mathematical formula
      3. Data, data, data
      Machine learning algorithms have existed for many years. In general, model performance improves with more data
  19. (no text on slide)
  20. FLIGHT CHARACTERISTICS

      Flight #  Dep Airport  Dep Hour  Dep Week Day  Aircraft Model  …
      1         SYD          8:10      1             A330
      2         SYD          14:15     2             B777
      3         MEL          18:10     1             B777
      4         PER          6:50      4             A320
      5         SYD          9:50      3             A320
      6         PER          12:10     1             A320
      7         TZN          14:50     1             B777
      8         MEL          14:15     4             A320
      9         SYD          8:30      3             A320
      10        MEL          16:40     1             A320
      11        MEL          9:30      3             B747
      12        TZN          9:30      1             A320
      13        PER          9:50      3             A320
      14        SYD          13:10     1             A320

  21. EVENTS. Same flight table with an added Actual Delay column; values for flights 1 to 14: 0, 3, 0, 17, 0, 23, 0, 0, 0, 12, 32, 20, 0, 9
  22. EVENTS. Same table as slide 21. A flight is labeled “delayed” if actual delay >= 15 min
  23. LABEL

      Flight #  Dep Airport  Dep Hour  Dep Week Day  Aircraft Model  …  Actual Delay  Class
      1         SYD          8:10      1             A330               0             0
      2         SYD          14:15     2             B777               3             0
      3         MEL          18:10     1             B777               0             0
      4         PER          6:50      4             A320               17            1
      5         SYD          9:50      3             A320               0             0
      6         PER          12:10     1             A320               23            1
      7         TZN          14:50     1             B777               0             0
      8         MEL          14:15     4             A320               0             0
      9         SYD          8:30      3             A320               0             0
      10        MEL          16:40     1             A320               12            0
      11        MEL          9:30      3             B747               32            1
      12        TZN          9:30      1             A320               20            1
      13        PER          9:50      3             A320               0             0
      14        SYD          13:10     1             A320               9             0
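
The labeling rule from the EVENTS and LABEL slides can be sketched in a few lines of Python. This is only an illustration: the delay values are copied from the slide, while the function and variable names are hypothetical and not part of the original deck.

```python
# Actual delays (in minutes) for flights 1 to 14, copied from the slide.
actual_delay = [0, 3, 0, 17, 0, 23, 0, 0, 0, 12, 32, 20, 0, 9]

# Rule stated on the slide: a flight is "delayed" if actual delay >= 15 min.
DELAY_THRESHOLD_MIN = 15


def label_delays(delays, threshold=DELAY_THRESHOLD_MIN):
    """Return 1 (delayed) or 0 (on time) for each delay value."""
    return [1 if delay >= threshold else 0 for delay in delays]


classes = label_delays(actual_delay)
print(classes)  # [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0]
```

The output matches the Class column shown on the LABEL slide.
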
  24. BUILD A MODEL. The labeled flight table is split into features X (Flight #, Dep Airport, Dep Hour, Dep Week Day, Aircraft Model) and label Y (Delay class), with model parameters θ1, θ2, θ3, …, θn. Rows as on the LABEL slide, with an elided block (…) between flights 10 and 11; Delay classes for flights 1 to 14: 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0
  25. LOGISTIC REGRESSION: Classification algorithm
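
As a rough illustration of how a logistic regression turns the parameters θ1 … θn into a score, the sketch below applies the logistic (sigmoid) function to a weighted sum of features. Everything numeric here is an assumption made for the example: the feature encoding, the weights and the bias are hypothetical and do not come from the slides, and no training step is shown.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_score(features, theta, bias=0.0):
    """Scoring step of logistic regression: sigmoid(bias + theta . x)."""
    z = bias + sum(t * x for t, x in zip(theta, features))
    return sigmoid(z)


# HYPOTHETICAL encoding of one flight: [is_SYD, is_PER, dep_hour / 24, is_A320]
x_flight = [0.0, 1.0, 0.28, 1.0]

# HYPOTHETICAL parameters; in practice they are fitted on the labeled data.
theta = [-0.5, 1.2, 0.3, 0.4]
bias = -1.0

score = predict_score(x_flight, theta, bias)
print(f"score of being delayed: {score:.3f}")
```

A score above a chosen cutoff (for example 0.5) would be read as class 1 (delayed), otherwise class 0 (on time).
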
  26. DECISION TREE: Classification algorithm. Tree nodes: DoW > 5; Month > 5; PAX > 35%; AoD = “SYD”. Branches are labeled yes or no; leaves are labeled + or -
  27. RANDOM FOREST: Classification algorithm
  28. TEST CLASSIFIER. For one flight (Flight # 9, Dep Airport SYD, Dep Hour 8:30, Dep Week Day 3, Aircraft Model A320, …), the classifier outputs 1 = positive (delayed) or 0 = negative (on time)
  29. A PERFECT CLASSIFIER. Flight table for flights 1 to 14 (columns as on slide 20). Classes shown: 1 1 1 1 0 0 0 0 0 0 0 0 0 0
  30. A MORE REALISTIC CLASSIFIER. Flight table for flights 1 to 14 (columns as on slide 20). Classes shown: 1 1 0 1 0 1 0 0 0 0 0 0 1 0. Some entries are marked “Wrongly classified”
  31. CONFUSION MATRIX: The summary to optimize

                               Actually delayed      Actually on time
      Predicted + (delayed)    3 (True Positive)     2 (False Positive)
      Predicted - (on time)    1 (False Negative)    8 (True Negative)

  32. PERFORMANCE INDICATORS

                               Actually delayed      Actually on time
      Predicted + (delayed)    3 (TP)                2 (FP)
      Predicted - (on time)    1 (FN)                8 (TN)

      False Positive Rate (1 - Specificity) = FP / (FP + TN)
      True Positive Rate (Sensitivity) = TP / (TP + FN)
      Precision = TP / (TP + FP)
      Recall = TP / (TP + FN)
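
The four indicators can be computed directly from the confusion matrix counts. The sketch below uses the counts shown on the slide (TP = 3, FP = 2, FN = 1, TN = 8); the function name and the dictionary keys are hypothetical.

```python
def performance_indicators(tp, fp, fn, tn):
    """Compute the indicators from the slide out of confusion matrix counts."""
    return {
        "true_positive_rate": tp / (tp + fn),   # sensitivity, same as recall
        "false_positive_rate": fp / (fp + tn),  # 1 - specificity
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Counts from the confusion matrix slide.
metrics = performance_indicators(tp=3, fp=2, fn=1, tn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
# true_positive_rate: 0.75
# false_positive_rate: 0.20
# precision: 0.60
# recall: 0.75
```
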
  33. CLASSIFIER: Assigning a continuous score of being delayed. Scores shown: 0.9 0.8 0.8 0.3 0.2 0.1 0.5 0.4 0.5 0.4 0.3 0.7 0.8 0.5. Scale from 0 (-) to 1 (+)
  34. PREDICTOR SCORE DISTRIBUTION. Plot of event count versus score, with one distribution for delayed flights and one for on-time flights. A perfect score cutoff
  35. PREDICTOR SCORE DISTRIBUTION: Fixing a score cutoff leads to false positives and false negatives. Plot of event count versus score, with the False Positive and False Negative regions marked
  36. ROC CURVES TO COMPARE CLASSIFIERS: Different score cutoffs lead to different false positive and false negative rates. Plot of True Positive Rate (0 to 1) versus False Positive Rate (0 to 1)
  37. ROC AND ROLL: A few comments about ROC curves
      • ROC curves allow different models to be compared
      • Area Under the Curve (AUC) is only a projection of the overall performance
      • Significantly different models can have similar ROC curves
      • Other comparison methods exist (and are closely related to ROC): Precision/Recall, LIFT
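
To make the link between score cutoffs, ROC points and AUC concrete, here is a minimal sketch in plain Python. The scores are the ones shown on the CLASSIFIER slide; the pairing of each score with a true class is an assumption made for this example (the transcript does not say which score belongs to which flight), and the function names are hypothetical.

```python
# Scores copied from the CLASSIFIER slide.
scores = [0.9, 0.8, 0.8, 0.3, 0.2, 0.1, 0.5, 0.4, 0.5, 0.4, 0.3, 0.7, 0.8, 0.5]
# ASSUMED true classes (4 delayed, 10 on time); the pairing is illustrative.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) for each score cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # A flight is predicted "delayed" when its score is >= the cutoff.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points)
print(f"AUC = {auc:.4f}")
```

Each cutoff yields one point of the curve; lowering the cutoff moves the point towards the upper right corner (1, 1), trading more true positives for more false positives.
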
  38. MODELS & DATA: Precision score for the TOP 20%. Chart of precision for: Traditional models; Advanced models; Advanced models with more data; Advanced models with more data and more features
  39. MODELS & DATA: Precision score for the TOP 20%. Chart of precision for: Traditional models; Advanced models; Advanced models with more data; Advanced models with more data and more features
  40. MODELS & DATA: Precision score for the TOP 20%. Chart of precision for: Traditional models; Advanced models; Advanced models with more data; Advanced models with more data and more features
  41. FLIGHT DELAY PREDICTION: RESULTS (10% LIFT score)
      • All reasons for delays: overall improvement by a factor of 3
      • Focus on air traffic: overall improvement by a factor of 6
      • Delay caused by passengers: no improvement
  42. PREDICT NUMBER OF PASSENGERS ON A PLANE: Optimize catering. Timeline: t0 - 4 hours, t0 - 1 hour, t0. About 50 explanatory variables (X); target (y): final number of passengers

      Flight Number  Booked  Departure port  …  Departure hour  Final number of passengers
      0777           152     PER             …  14              164
      1116           201     SYD             …  9               186
      0961           92      BNE             …  6               125
      0538           189     MEL             …  12              189
      1078           136     SYD             …  23              87

  43. RESULTS. $1-2M per year

      Passenger difference  No model  Model
      < 5                   55%       69%
      < 10                  80%       89%
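
The RESULTS table appears to report the share of flights whose predicted passenger count falls within a given difference of the final count. Under that reading, the metric can be sketched as follows, using the five example flights from the previous slide. Treating the booked count as the “no model” prediction is an assumption made for this example, and the function name is hypothetical; five rows are far too few to reproduce the percentages on the slide.

```python
# Example rows from the passenger prediction slide.
booked = [152, 201, 92, 189, 136]
final_passengers = [164, 186, 125, 189, 87]


def share_within(predicted, actual, tolerance):
    """Share of flights whose absolute passenger difference is < tolerance."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) < tolerance)
    return hits / len(actual)


# ASSUMPTION: the "no model" baseline simply uses the booked count.
baseline_under_5 = share_within(booked, final_passengers, tolerance=5)
baseline_under_10 = share_within(booked, final_passengers, tolerance=10)
print(baseline_under_5, baseline_under_10)  # 0.2 0.2
```

A model's predictions would be passed in place of `booked` to obtain the “Model” column in the same way.
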
  44. UNSTRUCTURED DATA
  45. 1
  46. WHAT ARE THE FEATURES? m images for training (rows), n features (columns) in X; labels in Y (…, 6, …)
  47. WHAT ARE THE FEATURES? Intensity scale: 5 4 3 2 1 0. Pixel grid, 8 rows of 8 values:

      0 0 0 0 0 0 0 0
      0 0 0 1 0 1 0 0
      0 4 5 5 5 5 4 1
      4 4 1 0 1 5 4 5
      1 0 0 0 1 5 1 5
      0 0 0 0 0 5 4 4
      0 0 0 0 0 2 5 2
      0 0 0 0 0 0 0 0

  48. WHAT ARE THE FEATURES? The same pixel grid = 6. In the data matrix, X has n features (columns) and m images for training (rows); values shown for Y: (…) 6, 6, 3, …, 0, 7
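
One simple way to read these slides is that each pixel intensity becomes one feature, so an image turns into one row of X and its digit label into one entry of Y. The sketch below flattens the grid from the slide under the assumption that it is 8 × 8 (the transcript lists 64 values); the names are hypothetical.

```python
# Pixel intensities from the slide, ASSUMED to form an 8 x 8 grid.
image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 0],
    [0, 4, 5, 5, 5, 5, 4, 1],
    [4, 4, 1, 0, 1, 5, 4, 5],
    [1, 0, 0, 0, 1, 5, 1, 5],
    [0, 0, 0, 0, 0, 5, 4, 4],
    [0, 0, 0, 0, 0, 2, 5, 2],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
label = 6  # label shown next to the grid on the slide


def flatten(grid):
    """Turn a 2D pixel grid into a flat feature vector, row by row."""
    return [pixel for row in grid for pixel in row]


features = flatten(image)

# One training example: a row of X with n = 64 features, and its label in Y.
X = [features]
Y = [label]
print(len(features))  # 64
```

With m training images, X would have m such rows and Y would hold m labels.
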
  49. NEURAL NETWORK
  50. CAN COMPUTER VISION SPOT DISTRACTED DRIVERS? 24 June 2016, Julien Krywyk. Classes: Phone right; Safe; Text right; Phone left; Text left; Speaking; Makeup; Behind; Drink; Radio
  51. Build classifier: train on 22K images (X, Y). Make predictions: test on 80K images to obtain the predicted classes
  52. DEEP LEARNING. Identify pixels; identify edges and simple shapes; identify complex shapes and objects; identify which shapes to use to define a human face
  53. DEEP LEARNING: Transfer learning. Feature extraction with a pre-trained CNN yields n features (X), used with the labels (Y)
  54. DATA SCIENCE TONIGHT. Four agenda items (numbered 1 to 4 on the slide): Visualization; Why the buzz about data science?; Demystifying machine learning; Data science in your business
  55. VISUALIZATION. Understand. Communicate results & analysis
  56. 1880: TEXTILE PRODUCTION IN ENGLAND (OTTO NEURATH, ~1920). Changing the world by educating people about the world around them
  57. NAPOLEON 1812 CAMPAIGN (CHARLES MINARD, 1869)
  58. HOW TRUMP PUSHED THE ELECTION MAP TO THE RIGHT (NEW YORK TIMES)
  59. VISUALIZATION TO GET ACQUAINTED WITH DATA
  60. EXPLORATION: FLIGHT DELAY PER MONTH AND DAY OF WEEK
  61. DATA VISUALIZATION: Correlation between ‘Departure Hour’ and passenger delta
  62. NOTEBOOKS: Interactive data analysis
  63. VISUALIZATION AS A GAME CHANGER
  64. VALIDATION
  65. https://github.com/genentech/fishtones-js
  66. DATA SCIENCE TONIGHT. Four agenda items (numbered 1 to 4 on the slide): Visualization; Why the buzz about data science?; Demystifying machine learning; Data science in your business
  67. I WANT A DATA SCIENTIST!
  68. (no text on slide)
  69. AGILE DATA SCIENCE
  70. Agile Data science. Feature team: Operations; Business analyst; Developer; Tech expert; Project manager; Data scientist; Architect
      • Individuals and interactions over processes and tools
      • Working software over comprehensive documentation
      • Customer collaboration over contract negotiation
      • Responding to change over following a plan
      That is, while there is value in the items on the right, we value the items on the left more
  71. (no text on slide)
  72. BUILDING A DATALAB. Stages: Source system; Collection, storage and data preparation; Analysis delivery. Components: External sources; Datalab; Existing infrastructure (multiple sources); ETL (extract, cleanup, transform, load); Staging area; Data warehouse technical layer (referential/operation); Technical datamart (collection zone); Datamart (management, marketing, sales); User access (Reporting, Analytics). Batch: Analyses, Indicators, Statistics. Online: Dashboards, Reporting, Requests. Administration: Admini, Validation
  73. DEVOPS: EMBRACING NEW KNOW-HOW. And new collaborations…
      Data Scientist: innovates, with new technologies. “What!? A unit test on my neural network???”
      OPS: looks after rationalization. “What!? Your piece of Scala calls a Python library embedding C???”
  74. (no text on slide)
  75. DEMOCRATIZATION. Courses: 1 million enrollments
  76. (no text on slide)
  77. BUSINESS & DATA SCIENCE. Business must be aware of opportunities to use algorithms. Data must be easily accessible. Focus on the shortest possible time to market
  78. USE CASE CLASSES AND THEIR BUSINESS VALUE. The prediction is a support for decision; Analyses support data understanding; The prediction is the decision. Axis: Business value
  79. ??? ???
