NASDAG.org
Data Science in the Automotive Industry
I am an Automotive Management Professional and a Computer
Science Engineer from France, with extensive experience in managing
complex projects in Supply Chain and IT, as well as in starting, developing
and acquiring businesses in France, Russia, the USA and the Middle East.
I came to Metis to understand, learn and practice how data science is
transforming the Automotive Business. During my projects, I focused on:
● Sentiment Analysis / Topic Modeling
● Predictive Behavior Modeling
● Driver Telematics
Philippe Dagher
Objective:
Categorize drivers based on their behaviour on the road: their driving style
and the type of roads they follow.
Challenge:
Uniquely identify a driver (and hence his or her own “driving behaviour”) based on
the GPS log of a mobile phone located inside the car.
Idea:
Experiment with Topic Modeling techniques, especially Latent Semantic
Indexing/Analysis (LSI/LSA) and Latent Dirichlet Allocation (LDA), to explain the
observed trips by the unobserved behaviour of drivers.
Final Project @ Metis
Raw data for one trip
Machine learning approach (1/2)
❖ Preprocess the data using statistical smoothing and compression algorithms
➢ Kalman Filtering
➢ Ramer–Douglas–Peucker
❖ Extract road and driving style features
➢ per Segment: Length, Slip Angle, Convexity, Radius
➢ per Meter: Speed, Accelerations (tangential and normal), Jerk, Yaw, Pauses
❖ Bin the output and generate the Driving Alphabet
➢ e.g. d0, d1, d2… v0, v1, v2… a0, a1, a2… etc.
❖ Build the Driving Vocabulary - “Driving Slides” per meter
➢ e.g. d3L4v2n3y1
➢ for various preprocessing sensitivities or feature combinations (languages)
❖ Translate trips from GPS log into documents
➢ Tokenize, filter, … data is ready!
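The compression step named above can be illustrated with a minimal sketch of Ramer–Douglas–Peucker. It assumes a trip is a list of planar (x, y) points in metres; the function names and the `epsilon` tolerance are hypothetical and not taken from the slides.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from `point` to the straight line through `start` and `end`."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate chord: fall back to the distance to the single point.
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def rdp(points, epsilon):
    """Simplify a polyline, dropping points closer than `epsilon` to the chord."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord joining the endpoints.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # Keep the farthest point and simplify both halves recursively.
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right
```

The surviving points delimit the road segments from which per-segment features such as length or radius could then be measured. For very long trips an iterative variant may be preferable to avoid deep recursion.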
d1L6Br1 d1L8Sr1 d1L5Sr2 d1L6Ur2 d2L8Ur2 d3L4Sr3 d2L5Ur3 d3L4Ur4 d3L6Sr4 d3L7Sr3 d4L4Ur5 d4L3Ur5 d4L2Ur7 d5L4Sr6 d3L3Ur5 d4L3Sr6 d5L4Ur6 d4L3Ur7 d5L9Sr5
d2L5Ur4 d3L2Ur7 d6L1Sr9 d5L0Sr9 d5L1Sr9 d5L7Ur5 d2L6Ur2 d2L3Ur5 d4L1Ur8 d5L2Ur7 d6L10Sr5 d6L8Sr5 d2L4Ur3 d3L3Ur6 d5L4Srp1 v2a6n0j0y0p1 v1a6n0j3y0p1
v1a1n0j6y0p1 v1a11n0j6y0p1 v1a7n0j11y0p1 v1a16n0j7y0p1 v2a7n0j1y0p1 v2a6n0j2y0p1 v2a10n0j2y0p1 v3a6n1j3y0p1 v3a2n2j3y0p1 v3a5n2j3y0p1 v4a2n2j3y1p1
v4a5n2j5y1p1 v4a5n3j5y1p1 v4a4n3j1y1p1 v4a6n3j6y1p1 v4a5n4j5y1p1 v4a4n3j6y1p1 v4a5n4j0y1p1 v4a5n3j6y1p1 v4a5n2j9y1p1 v4a11n3j7y1p1 v3a2n2j7y0p1 v3a12n2j7y0p1
v2a1n1j3y0p1 v2a5n1j9y0p1 v2a11n1j9y0p1 v3a6n1j7y0p1 v3a5n1j7y0p1 v3a6n2j6y0p1 v3a6n1j34y0p1 v3a62n2j71y0p1 v8a56n11j38y2p1 v4a13n3j7y1p1 v4a4n3j4y1p1
v4a5n3j6y1p1 v4a4n2j6y1p1 v4a6n3j1y1p1 v3a5n2j2y0p1 v3a3n2j6y0p1 v3a11n1j4y0p1 v2a8n1j0y0p1 v2a7n1j7y0p1 v2a17n1j1y0p1 v2a10p1 v6a0n3j4y0p1 v6a6n3j7y0p1
v6a6n3j3y0p1 v6a1n3j3y0p1 v6a6n3j3y0p1 v6a5n2j1y0p1 v5a6n2j4y0p1 v5a6n2j3y0p1 v5a12n1j2y0p1 v4a9n1j0y0p1 v3a9n1j2y0p1 v3a5n0j3y0p1 v3a1n0j6y0p1 v3a11n0j6y0p1
v3a0n1j3y0p1 v3a6n1j0y0p1 v3a5n1j3y0p1 v3a11n0j6y0p1 v4a1n0j4y0p1 v4a6n0j3y0p1 v4a2n0j7y0p1 v4a13n0j11y0p1 v5a7n0j4y0p1 v5a1n0j0y0p1 v5a1n0j3y0p1
v5a6n0j6y0p1 v5a6n0j2y0p1 v5a2n0j7y0p1 v6a11n0j10y0p1 v6a6n0j3y0p1 v6a0n0j3y0p1 v6a5n0j6y0p1 v6a5n0j2y0p1 v6a1n0j1y0p1 v6a0n0j3y0p1 v6a6n0j7y0p1 v6a6n0j7y0p1
v6a6n0j7y0p1 v6a6n0j3y0p1 v6a0n0j2y0p1 v6a5n0j6y0p1 v6a5n0j7y0p1 v6a6n0j4y0p1 v6a0n1j3y1j3y0p1 v6a6n1j6y0p1 v6a5n1j2y0p1 v7a1n1j4y0p1 v5a3n1j1y0p1
v5a6n1j3y0p1 v5a10n1j3y0p1 v4a8n0j0y0p1 v3a8n0j0y0p1 v3a8n0j3y0p1 v2a10n0j1y0p1 v2a7n0j3y0p1 v2a6n0j7y0p1 v3a7n0j3y0p1 v2a7n0j6y0p1 v3a14n0j7y0p1
v3a4n0j4y0p1 v3a2n0j6y0p1 v3a12n0j3y0p1 v3a8n0j2y0p1 v3a5n0j0y0p1 v3a6n0j4y0p1 v4a1n0j3y0p1 v4a5n0j2y0p1 v4a1n0j0y0p1 v4a0n0j0y0p1 v4a0n0j0y0p1 v4a0n0j0y0p2
v4a1n0j3y0p1 v4a6n0j7y0p1 v4a6n0j10y0p1 v4a11n0j6y0p1 v3a2n0j0y0p1 v3a1n0j3y0p1 v3a6n0j0y0p1 v3a6n0j0y0p1 v2a5n0j2y0p1 v2a3n0j5y0p1 v2a10n0j5y0p1
v1a2n0j0y0p1 v1a1n0j3y0p1 v1a5n0j10y0p1 v1a11n0j7y0p1 v1a3n0j7y0p1 v1a12n0j7y0p1 v2a3n0j1y0p1 v2a1n0j6y0p1 v2a11n0j10y0p1 v3a6n0j10y0p1 v3a12n0j7y0p1
v4a1n0j3y0p1 v4a5n0j10y0p1 v3a11n0j6y0p1 v4a2n0j3y0p1 v4a6n0j3y0p1 v5a0n0j7y0p1 v5a12n0j8y0p1 v5a4n0j4y0p1 v5a2n3j3y0p1 v5a3n3j4y0p1 v5a6n3j7y0p1
v5a6n3j5y0p1 v5a4n3j2y0p1 v5a1n3j3y0p1 v5a6n3j2y0p1 v5a1n2j4y0p1 v5a6n2j3y0p1 v5a2n3j4y0p1 v5a6n3j2y0p1 v5a6n2j3y0p1 v4a0n2j1y0p1 v4a2n2j1y0p1 v4a0n2j4y0p1
v4a6n2j7y0p1 v5a6n2j4y0p1 v4a5n2j0y0p1 v4a5n2j2y0p1 v4a9n2j2y0p1 v5a5n2j3y0p1 v5a9n3j1y0p1 v5a9n3j1y0p1 v5a7n1j2y0p1 d6L1v5n0y0 d6L1v4n0y0 d6L1v4n0y0
d6L1v5n0y0 d6L1v4n0y0 d6L1v4n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v5n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v3n0y0 d5L0v3n0y0 d5L0v2n0y0 d5L0v2n0y0
d5L0v2n0y0 d5L0v2n0y0 d5L0v3n0y0 d5L0v2n0y0 d5L0v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1vy1 d5L7v4n4y1 d5L7v4n3y1
d5L7v0n0y0 d5L7v0n0y0 d5L7v0n0y0 d5L7v1n0y0 d2L6v1n6y5 d2L6v2n8y6 d2L3v2n0y0 d2L3v2n0y0 d4L1v3n0y0 d4L1v3n0y0 d4L1v3n0y0 d4L1v4n0y0 d4L1v4n0y0
d4L1v4n0y0 d4L1v4n0y0 d4L1v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v5n0y0 d5L2v5n0y0 d5L2v5n0y0
d5L2v5n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v5n0y0 d5L2v4n0y0 d5L2v5n0y0 d5L2v4n0y0 d d6L10v3n2y0 d6L10v4n2y0
d6L10v3n1y0 d6L10v3n1y0 d6L10v2n1y0 d6L10v2n1y0 d6L10v1n0y0 d6L10v2n0y0 d6L10v1n0y0 d6L10v1n0y0 d6L10v2n0y0 d6L10v1n0y0 d6L10v1n0y0 d6L10v1n0y0
d6L8v1n0y0 d6L8v1n0y0 d6L8v2n0y0 d6L8v2n0y0 d6L8v2n0y0 d6L8v3n1y0 d6L8v3n2y0 d6L8v3n2y0 d6L8v4n2y1 d6L8v4n2y1 d6L8v4n3y1 d6L8v4n3y1 d6L8v4n3y1
d6L8v4n4y1 d6L8v4n3y1 d6L8v4n4y1 d6L8v4n3y1 d6L8v4n2y1 d6L8v4n3y1 d6L8v3n2y0 d6L8v3n2y0 d6L8v2n1y0 d6L8v2n1y0 d6L8v2n1y0 d6L8v3n1y0 d2L5v1n3y2
d2L5v1n2y2 d3L5v1n2y1 d3L5v2n3y2 d3L5v2n4y2 d3L5v2n6y3 d3L5v2n2y1 d3L5v2n2y1 d3L5v3n4y2 d4L6v2n5y3 d4L6v2n6y3 d4L6v3n8y3 d4L6v3n7y3 d4L6v3n7y3
d4L6v2n6y3 d4L6v2n4y2 d4L6v2n3y2 d2L6v1n12y11 d2L6v1n10y10 d1L1v1n0y0 d3L3v1n1y1 d3L3v1n1y0 d3L3v1n0y0 d3L3v1n0y0 d3L3v1n0y0 d2L8v0n3y6
Example of a translated trip
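Words of this kind could be produced along the following lines. This is only a sketch under stated assumptions: positions are taken to be (x, y) coordinates in metres sampled at a fixed interval `dt`, only the speed (`v`) and tangential acceleration (`a`) letters are generated, and the bin edges are hypothetical since the slides do not give the actual thresholds.

```python
import math
from bisect import bisect_right

# Hypothetical bin edges; the real thresholds are not given in the slides.
SPEED_EDGES = [2.0, 5.0, 10.0, 15.0, 20.0, 30.0]  # m/s
ACCEL_EDGES = [0.2, 0.5, 1.0, 2.0]                # m/s^2, absolute value


def bin_index(value, edges):
    """Return the index of the bin that `value` falls into."""
    return bisect_right(edges, value)


def trip_to_words(points, dt=1.0):
    """Turn (x, y) positions sampled every `dt` seconds into binned words."""
    speeds = [
        math.hypot(x2 - x1, y2 - y1) / dt
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]
    words = []
    for v_prev, v_next in zip(speeds, speeds[1:]):
        accel = abs(v_next - v_prev) / dt
        v_bin = bin_index(v_next, SPEED_EDGES)
        a_bin = bin_index(accel, ACCEL_EDGES)
        words.append(f"v{v_bin}a{a_bin}")
    return words


def trip_to_document(points, dt=1.0):
    """Join the words of one trip into a whitespace-separated document."""
    return " ".join(trip_to_words(points, dt))
```

Changing the bin edges or the set of letters gives a different "language" for the same trip, which is one way to read the "various preprocessing sensitivities" mentioned earlier.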
LDA: Bayesian Topic Model
❖ Dirichlet parameter
➢ Corpus: possible “Driving Behaviour” distributions for trips
❖ Per-trip “Driving Behaviour” proportions
➢ for each trip, select a distribution of “Driving Behaviours”
❖ Per-“Driving Slide” “Driving Behaviour” assignment
➢ for each “Driving Slide”, select a “Driving Behaviour”
❖ Observed “Driving Slide”
➢ select the actual “Driving Slide” from the selected “Driving Behaviour”
❖ “Driving Behaviours”
➢ each “Driving Behaviour” is a distribution of “Driving Slides”
❖ “Driving Behaviour” hyperparameter
➢ possible “Driving Slide” distributions for “Driving Behaviours”
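The generative story described by this model can be simulated in a few lines. The sketch below uses only the Python standard library; the symmetric hyperparameter values `alpha` and `eta`, the function names and the toy vocabulary are assumptions chosen for illustration, not values from the project.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocabulary, n_behaviours, n_trips, trip_length,
                    alpha=0.5, eta=0.5, seed=0):
    """Simulate trips as mixtures of latent driving behaviours."""
    rng = random.Random(seed)
    # Each behaviour is a distribution over driving slides (hyperparameter eta).
    behaviours = [
        sample_dirichlet([eta] * len(vocabulary), rng)
        for _ in range(n_behaviours)
    ]
    trips = []
    for _ in range(n_trips):
        # Per-trip behaviour proportions (Dirichlet parameter alpha).
        theta = sample_dirichlet([alpha] * n_behaviours, rng)
        slides = []
        for _ in range(trip_length):
            # Pick a behaviour for this slide, then a slide from that behaviour.
            z = rng.choices(range(n_behaviours), weights=theta)[0]
            slides.append(rng.choices(vocabulary, weights=behaviours[z])[0])
        trips.append((theta, slides))
    return behaviours, trips
```

Inference runs this story backwards: only the slides are observed, and the behaviours and per-trip proportions have to be recovered from them.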
Posterior Inference in LDA
❖ Goal is to obtain this posterior:
➢ How much of each “Driving Behaviour” k a trip contains (θ) and
➢ the “Driving Behaviour” assignments z of the “Driving Slides”
❖ Which means that I need to calculate:
❖ GENSIM Library
➢ a Python+NumPy implementation of online LDA for inputs larger than the available RAM
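In the usual LDA notation, the posterior and the normalising term it requires can be written as follows. The symbols are assumptions borrowed from the standard formulation: α and η for the two Dirichlet hyperparameters, β for the “Driving Behaviours”, θ for the per-trip proportions, z for the assignments and w for the observed “Driving Slides”. This is the textbook form and may differ in detail from the expression shown on the slide.

```latex
% Posterior over proportions, assignments and behaviours given the observed slides
p(\theta, z, \beta \mid w, \alpha, \eta)
  = \frac{p(\theta, z, \beta, w \mid \alpha, \eta)}{p(w \mid \alpha, \eta)}

% Normalising term (marginal likelihood of the observed slides)
p(w \mid \alpha, \eta)
  = \int_{\beta} \int_{\theta} \sum_{z}
    p(\theta, z, \beta, w \mid \alpha, \eta) \, d\theta \, d\beta
```

The sum over all possible assignments z makes the denominator intractable to compute exactly, which is why an approximate method such as the online variational inference implemented in Gensim is used.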
Example trip in the new LDA space
❖ 2736 drivers
❖ 200 trips/driver
Total: 547,200 CSV files (5.92 GB)
Challenge:
To come up with a "telematic fingerprint" capable of telling whether a trip
was driven by a given driver, knowing that among the 200 trips provided for
each driver, a small number were not driven by him/her.
Submissions are judged on the area under the ROC curve, calculated globally (all predictions
together).
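The global scoring can be reproduced locally by pooling the predictions for all drivers into one list and computing a single AUC. The sketch below uses the rank-sum identity and the standard library only; the function name `roc_auc` is hypothetical and this is not the competition's own scoring code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    pairs = sorted(zip(scores, labels))
    # Assign average ranks so that tied scores are treated consistently.
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

Because the score is computed over all predictions together, the probabilities produced for different drivers need to be on a comparable scale; a model that ranks trips well within each driver but uses different score ranges across drivers can still score poorly.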
Validation on a Kaggle Competition
❖ Transpose all trips into the new Driving Behaviours Space
❖ Take each trip of a selected Driver, one at a time
❖ Build a prediction model trained on all the other trips in the dataset, labelled:
➢ True if they belong to the selected Driver
➢ False if they do not belong to this Driver
❖ Use the trained model to predict whether the selected Trip belongs to the Driver, then ensemble
several predictions obtained with various sensitivities to improve the score...
For performance reasons, I proceed in batches of 10 or 20 selected trips and compare each
batch to a limited, randomly selected number of False trips.
Other outlier detection / clustering techniques appeared to perform less well.
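The batching scheme above amounts to building, for each batch of held-out trips, a labelled training set from the driver's remaining trips plus a random sample of other drivers' trips. A minimal sketch follows, assuming each trip is already a feature vector in the Driving Behaviours space; the function name, the data layout and the parameters are hypothetical.

```python
import random


def build_training_set(trips_by_driver, driver, held_out, n_negatives, seed=0):
    """Return (features, labels) for one driver, excluding held-out trips.

    trips_by_driver maps a driver id to a list of feature vectors, one per trip.
    held_out is a set of indices into trips_by_driver[driver] to score later.
    """
    rng = random.Random(seed)
    # True examples: the selected driver's trips that are not in the batch.
    positives = [
        vec for i, vec in enumerate(trips_by_driver[driver])
        if i not in held_out
    ]
    # False examples: a limited random sample of trips from other drivers.
    others = [
        vec
        for other, trips in trips_by_driver.items()
        if other != driver
        for vec in trips
    ]
    negatives = rng.sample(others, min(n_negatives, len(others)))
    features = positives + negatives
    labels = [1] * len(positives) + [0] * len(negatives)
    return features, labels
```

Any binary classifier that outputs a probability can then be fitted on `features` and `labels` and applied to the held-out batch. The choice of classifier is left open here because the slides do not name one.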
Machine learning approach (2/2)
MongoDB to hold the 3.3 million generated documents
Parallel processing setup on 4 DigitalOcean Droplets with 8 CPUs each
Gensim library, which implements three methods:
❖ latent semantic indexing (LSI, or LSA - A for Analysis)
❖ latent Dirichlet allocation (LDA)
❖ random projections (RP)
It also implements online versions of each technique.
Setting the infrastructure
Predicting
❖ Achieved an AUC of 0.9 on Kaggle without any ensembling technique,
which confirms the robustness of my approach...
Thank you
http://nasdag.org

Driving Behaviour as a Telematic Fingerprint

  • 1. NASDAG.org Data Science in the Automotive Industry
  • 2. I am an automotive management professional and a computer science engineer from France, with extensive experience in managing complex projects in supply chain and IT, as well as in starting, developing and acquiring businesses in France, Russia, the USA and the Middle East. I came to Metis to understand, learn and practice how data science is transforming the automotive business. During my projects, I focused on: ● Sentiment Analysis / Topic Modeling ● Predictive Behavior Modeling ● Driver Telematics Philippe Dagher
  • 3. Final Project @ Metis. Objective: Categorize drivers based on their behaviour on the road, that is, their driving style and the types of roads they follow. Challenge: Uniquely identify a driver (and hence his or her own “driving behaviour”) from the GPS log of a mobile phone located inside the car. Idea: Experiment with topic modeling techniques, especially Latent Semantic Indexing/Analysis (LSI/LSA) and Latent Dirichlet Allocation (LDA), to explain the observed trips by the unobserved behaviour of drivers.
  • 4. Raw data for one trip
  • 5. Machine learning approach (1/2) ❖ Preprocess the data using statistical smoothing and compression algorithms ➢ Kalman filtering ➢ Ramer–Douglas–Peucker ❖ Extract road and driving style features ➢ per segment: length, slip angle, convexity, radius ➢ per meter: speed, accelerations (tangential and normal), jerk, yaw, pauses ❖ Bin the output and generate the Driving Alphabet ➢ e.g. d0, d1, d2… v0, v1, v2… a0, a1, a2… etc. ❖ Build the Driving Vocabulary: “Driving Slides” per meter ➢ e.g. d3L4v2n3y1 ➢ for various preprocessing sensitivities or feature combinations (languages) ❖ Translate trips from GPS logs into documents ➢ Tokenize, filter, … data is ready!
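The binning and vocabulary steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the author's code: the bin edges, the feature subset (`v`, `n`, `y`) and the helper names `letter`, `driving_slide` and `trip_to_document` are assumptions made for the example, since the slides do not give the actual thresholds or implementation.

```python
from bisect import bisect_right

# Hypothetical bin edges (assumed for illustration; the slides give no thresholds).
BIN_EDGES = {
    "v": [5.0, 10.0, 15.0, 20.0, 25.0],  # speed
    "n": [0.5, 1.0, 2.0, 3.0],           # normal acceleration
    "y": [0.05, 0.1, 0.2],               # yaw
}


def letter(feature, value):
    """Map one measurement to a letter of the alphabet, e.g. 'v2'."""
    bin_index = bisect_right(BIN_EDGES[feature], abs(value))
    return f"{feature}{bin_index}"


def driving_slide(sample, order=("v", "n", "y")):
    """Concatenate the letters of one per-meter sample into one word."""
    return "".join(letter(feature, sample[feature]) for feature in order)


def trip_to_document(samples):
    """Translate a trip (list of per-meter samples) into a list of words."""
    return [driving_slide(sample) for sample in samples]


trip = [
    {"v": 12.3, "n": 0.2, "y": 0.01},
    {"v": 12.9, "n": 1.4, "y": 0.12},
    {"v": 4.0, "n": -3.5, "y": 0.30},
]
document = trip_to_document(trip)
print(" ".join(document))  # v2n0y0 v2n2y2 v0n4y3
```

Changing the bin edges or the feature order yields a different "language" for the same trip, which is one way to read the slide's "various preprocessing sensitivities or feature combinations".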
  • 6. d1L6Br1 d1L8Sr1 d1L5Sr2 d1L6Ur2 d2L8Ur2 d3L4Sr3 d2L5Ur3 d3L4Ur4 d3L6Sr4 d3L7Sr3 d4L4Ur5 d4L3Ur5 d4L2Ur7 d5L4Sr6 d3L3Ur5 d4L3Sr6 d5L4Ur6 d4L3Ur7 d5L9Sr5 d2L5Ur4 d3L2Ur7 d6L1Sr9 d5L0Sr9 d5L1Sr9 d5L7Ur5 d2L6Ur2 d2L3Ur5 d4L1Ur8 d5L2Ur7 d6L10Sr5 d6L8Sr5 d2L4Ur3 d3L3Ur6 d5L4Srp1 v2a6n0j0y0p1 v1a6n0j3y0p1 v1a1n0j6y0p1 v1a11n0j6y0p1 v1a7n0j11y0p1 v1a16n0j7y0p1 v2a7n0j1y0p1 v2a6n0j2y0p1 v2a10n0j2y0p1 v3a6n1j3y0p1 v3a2n2j3y0p1 v3a5n2j3y0p1 v4a2n2j3y1p1 v4a5n2j5y1p1 v4a5n3j5y1p1 v4a4n3j1y1p1 v4a6n3j6y1p1 v4a5n4j5y1p1 v4a4n3j6y1p1 v4a5n4j0y1p1 v4a5n3j6y1p1 v4a5n2j9y1p1 v4a11n3j7y1p1 v3a2n2j7y0p1 v3a12n2j7y0p1 v2a1n1j3y0p1 v2a5n1j9y0p1 v2a11n1j9y0p1 v3a6n1j7y0p1 v3a5n1j7y0p1 v3a6n2j6y0p1 v3a6n1j34y0p1 v3a62n2j71y0p1 v8a56n11j38y2p1 v4a13n3j7y1p1 v4a4n3j4y1p1 v4a5n3j6y1p1 v4a4n2j6y1p1 v4a6n3j1y1p1 v3a5n2j2y0p1 v3a3n2j6y0p1 v3a11n1j4y0p1 v2a8n1j0y0p1 v2a7n1j7y0p1 v2a17n1j1y0p1 v2a10p1 v6a0n3j4y0p1 v6a6n3j7y0p1 v6a6n3j3y0p1 v6a1n3j3y0p1 v6a6n3j3y0p1 v6a5n2j1y0p1 v5a6n2j4y0p1 v5a6n2j3y0p1 v5a12n1j2y0p1 v4a9n1j0y0p1 v3a9n1j2y0p1 v3a5n0j3y0p1 v3a1n0j6y0p1 v3a11n0j6y0p1 v3a0n1j3y0p1 v3a6n1j0y0p1 v3a5n1j3y0p1 v3a11n0j6y0p1 v4a1n0j4y0p1 v4a6n0j3y0p1 v4a2n0j7y0p1 v4a13n0j11y0p1 v5a7n0j4y0p1 v5a1n0j0y0p1 v5a1n0j3y0p1 v5a6n0j6y0p1 v5a6n0j2y0p1 v5a2n0j7y0p1 v6a11n0j10y0p1 v6a6n0j3y0p1 v6a0n0j3y0p1 v6a5n0j6y0p1 v6a5n0j2y0p1 v6a1n0j1y0p1 v6a0n0j3y0p1 v6a6n0j7y0p1 v6a6n0j7y0p1 v6a6n0j7y0p1 v6a6n0j3y0p1 v6a0n0j2y0p1 v6a5n0j6y0p1 v6a5n0j7y0p1 v6a6n0j4y0p1 v6a0n1j3y1j3y0p1 v6a6n1j6y0p1 v6a5n1j2y0p1 v7a1n1j4y0p1 v5a3n1j1y0p1 v5a6n1j3y0p1 v5a10n1j3y0p1 v4a8n0j0y0p1 v3a8n0j0y0p1 v3a8n0j3y0p1 v2a10n0j1y0p1 v2a7n0j3y0p1 v2a6n0j7y0p1 v3a7n0j3y0p1 v2a7n0j6y0p1 v3a14n0j7y0p1 v3a4n0j4y0p1 v3a2n0j6y0p1 v3a12n0j3y0p1 v3a8n0j2y0p1 v3a5n0j0y0p1 v3a6n0j4y0p1 v4a1n0j3y0p1 v4a5n0j2y0p1 v4a1n0j0y0p1 v4a0n0j0y0p1 v4a0n0j0y0p1 v4a0n0j0y0p2 v4a1n0j3y0p1 v4a6n0j7y0p1 v4a6n0j10y0p1 v4a11n0j6y0p1 v3a2n0j0y0p1 v3a1n0j3y0p1 v3a6n0j0y0p1 v3a6n0j0y0p1 v2a5n0j2y0p1 v2a3n0j5y0p1 v2a10n0j5y0p1 v1a2n0j0y0p1 v1a1n0j3y0p1 
v1a5n0j10y0p1 v1a11n0j7y0p1 v1a3n0j7y0p1 v1a12n0j7y0p1 v2a3n0j1y0p1 v2a1n0j6y0p1 v2a11n0j10y0p1 v3a6n0j10y0p1 v3a12n0j7y0p1 v4a1n0j3y0p1 v4a5n0j10y0p1 v3a11n0j6y0p1 v4a2n0j3y0p1 v4a6n0j3y0p1 v5a0n0j7y0p1 v5a12n0j8y0p1 v5a4n0j4y0p1 v5a2n3j3y0p1 v5a3n3j4y0p1 v5a6n3j7y0p1 v5a6n3j5y0p1 v5a4n3j2y0p1 v5a1n3j3y0p1 v5a6n3j2y0p1 v5a1n2j4y0p1 v5a6n2j3y0p1 v5a2n3j4y0p1 v5a6n3j2y0p1 v5a6n2j3y0p1 v4a0n2j1y0p1 v4a2n2j1y0p1 v4a0n2j4y0p1 v4a6n2j7y0p1 v5a6n2j4y0p1 v4a5n2j0y0p1 v4a5n2j2y0p1 v4a9n2j2y0p1 v5a5n2j3y0p1 v5a9n3j1y0p1 v5a9n3j1y0p1 v5a7n1j2y0p1 d6L1v5n0y0 d6L1v4n0y0 d6L1v4n0y0 d6L1v5n0y0 d6L1v4n0y0 d6L1v4n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v5n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v4n0y0 d5L0v3n0y0 d5L0v3n0y0 d5L0v2n0y0 d5L0v2n0y0 d5L0v2n0y0 d5L0v2n0y0 d5L0v3n0y0 d5L0v2n0y0 d5L0v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1v3n0y0 d5L1vy1 d5L7v4n4y1 d5L7v4n3y1 d5L7v0n0y0 d5L7v0n0y0 d5L7v0n0y0 d5L7v1n0y0 d2L6v1n6y5 d2L6v2n8y6 d2L3v2n0y0 d2L3v2n0y0 d4L1v3n0y0 d4L1v3n0y0 d4L1v3n0y0 d4L1v4n0y0 d4L1v4n0y0 d4L1v4n0y0 d4L1v4n0y0 d4L1v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v5n0y0 d5L2v5n0y0 d5L2v5n0y0 d5L2v5n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v4n0y0 d5L2v5n0y0 d5L2v4n0y0 d5L2v5n0y0 d5L2v4n0y0 d d6L10v3n2y0 d6L10v4n2y0 d6L10v3n1y0 d6L10v3n1y0 d6L10v2n1y0 d6L10v2n1y0 d6L10v1n0y0 d6L10v2n0y0 d6L10v1n0y0 d6L10v1n0y0 d6L10v2n0y0 d6L10v1n0y0 d6L10v1n0y0 d6L10v1n0y0 d6L8v1n0y0 d6L8v1n0y0 d6L8v2n0y0 d6L8v2n0y0 d6L8v2n0y0 d6L8v3n1y0 d6L8v3n2y0 d6L8v3n2y0 d6L8v4n2y1 d6L8v4n2y1 d6L8v4n3y1 d6L8v4n3y1 d6L8v4n3y1 d6L8v4n4y1 d6L8v4n3y1 d6L8v4n4y1 d6L8v4n3y1 d6L8v4n2y1 d6L8v4n3y1 d6L8v3n2y0 d6L8v3n2y0 d6L8v2n1y0 d6L8v2n1y0 d6L8v2n1y0 d6L8v3n1y0 d2L5v1n3y2 d2L5v1n2y2 d3L5v1n2y1 d3L5v2n3y2 d3L5v2n4y2 d3L5v2n6y3 d3L5v2n2y1 d3L5v2n2y1 d3L5v3n4y2 d4L6v2n5y3 d4L6v2n6y3 d4L6v3n8y3 d4L6v3n7y3 d4L6v3n7y3 d4L6v2n6y3 d4L6v2n4y2 d4L6v2n3y2 d2L6v1n12y11 d2L6v1n10y10 d1L1v1n0y0 d3L3v1n1y1 d3L3v1n1y0 d3L3v1n0y0 d3L3v1n0y0 
d3L3v1n0y0 d2L8v0n3y6 (Example of a translated trip)
  • 7. LDA: Bayesian Topic Model. Dirichlet parameter: the corpus-level prior over the possible “Driving Behaviour” distributions for trips. Per-trip “Driving Behaviour” proportions: for each trip, select a distribution of “Driving Behaviours”. Per-“Driving Slide” “Driving Behaviour” assignment: for each “Driving Slide”, select a “Driving Behaviour”. Observed “Driving Slide”: select the actual “Driving Slide” from the selected “Driving Behaviour”. “Driving Behaviours”: each “Driving Behaviour” is a distribution over “Driving Slides”. “Driving Behaviour” hyperparameter: the prior over the possible “Driving Slide” distributions for “Driving Behaviours”.
  • 8. Posterior Inference in LDA ❖ The goal is to obtain the posterior over: ➢ how much of “Driving Behaviour” k each trip contains, and ➢ the “Driving Behaviour” assignments z of the “Driving Slides” ❖ This means that I need to compute the posterior distribution (the formula is shown as an image on the slide) ❖ Gensim library ➢ a Python + NumPy implementation of online LDA for inputs larger than the available RAM
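The generative story on this slide can be made concrete with a small sampler. This is a sketch under assumptions: the vocabulary, the number of behaviours and the Dirichlet parameter values are made up, and the names `theta`, `z`, `w`, `alpha` and `eta` follow the usual LDA notation rather than anything in the transcript.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_trip(vocab, behaviours, alpha, n_slides, rng):
    """Generate one trip following the LDA generative process."""
    theta = sample_dirichlet(alpha, rng)      # per-trip behaviour proportions
    trip = []
    for _ in range(n_slides):
        z = sample_index(theta, rng)          # behaviour assigned to this slide
        w = sample_index(behaviours[z], rng)  # observed slide drawn from it
        trip.append(vocab[w])
    return theta, trip


rng = random.Random(0)
vocab = ["v1n0y0", "v4n0y0", "v4n3y1", "v2n6y3"]  # toy driving vocabulary
n_behaviours = 3                                   # assumed number of topics
eta = [0.5] * len(vocab)                           # assumed behaviour hyperparameter
alpha = [0.5] * n_behaviours                       # assumed Dirichlet parameter

# Each behaviour is a distribution over driving slides.
behaviours = [sample_dirichlet(eta, rng) for _ in range(n_behaviours)]
theta, trip = generate_trip(vocab, behaviours, alpha, 20, rng)
print(theta)
print(" ".join(trip))
```

Inference runs this story backwards: given only the observed slides, it estimates `theta` for each trip and the `behaviours` distributions.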
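The formula on this slide did not survive in the transcript. In the standard LDA formulation, the posterior has the form sketched below. The symbols are assumptions taken from the usual notation, not from the slide: θ for the per-trip behaviour proportions, z for the per-slide assignments, β for the behaviours, w for the observed driving slides, and α, η for the two Dirichlet parameters.

```latex
p(\theta, z, \beta \mid w, \alpha, \eta)
  = \frac{p(\theta, z, \beta, w \mid \alpha, \eta)}{p(w \mid \alpha, \eta)}
```

The denominator, the marginal likelihood of the observed slides, is intractable to compute exactly, so in practice the posterior is approximated. Gensim's online LDA does this with variational Bayes.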
  • 9. Example trip in the new LDA space
  • 10. Validation on a Kaggle Competition ❖ 2736 drivers ❖ 200 trips/driver Total: 547,200 CSV files (5.92 GB). Challenge: come up with a "telematic fingerprint" capable of distinguishing when a trip was driven by a given driver, knowing that among the 200 trips provided for each driver, a small number were not driven by him/her. Submissions are judged on the area under the ROC curve, calculated globally (all predictions together).
  • 11. Machine learning approach (2/2) ❖ Transpose all trips into the new Driving Behaviours space ❖ Take each trip from a selected driver, one by one ❖ Build a prediction model trained on all other trips in the dataset, labelled: ➢ True if they belong to the selected driver ➢ False if they do not belong to this driver ❖ Use the trained model to predict whether the selected trip belongs to the driver, then ensemble several predictions made with various sensitivities to improve the score. For performance reasons, I proceed in batches of 10 or 20 selected trips and compare each batch to a limited number of randomly selected False trips. Other outlier detection and clustering techniques appear to perform worse.
  • 12. Setting the infrastructure. MongoDB to hold the 3.3 million documents generated. Parallel processing set up on 4 DigitalOcean Droplets with 8 CPUs each. Gensim library, which implements three methods: ❖ latent semantic indexing (LSI, or LSA, with A for Analysis) ❖ latent Dirichlet allocation (LDA) ❖ random projections (RP). It also implements online versions of each technique.
  • 13. Predicting ❖ Achieved an AUC of 0.9 on Kaggle without any ensembling technique, which confirms the robustness of my approach.
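To illustrate what "calculated globally" means, the sketch below pools the predictions of two drivers before computing a single AUC. The function `roc_auc` and the toy labels and scores are assumptions for the example. It uses the pairwise-comparison definition of AUC, which is simple but quadratic in the number of predictions.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy predictions for two drivers: label 1 = trip really driven by the driver.
driver_a = {"labels": [1, 1, 0], "scores": [0.9, 0.8, 0.7]}
driver_b = {"labels": [1, 0, 1], "scores": [0.6, 0.2, 0.95]}

# Global evaluation: concatenate all predictions, then compute one AUC.
all_labels = driver_a["labels"] + driver_b["labels"]
all_scores = driver_a["scores"] + driver_b["scores"]
auc = roc_auc(all_labels, all_scores)
print(auc)  # 0.875
```

Because all predictions are ranked together, scores need to be comparable across drivers, not just well ordered within each driver.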
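The batching scheme described on this slide can be sketched as follows. This is an illustration under assumptions: the data layout (`trips_by_driver` mapping a driver to a dictionary of trip vectors), the function name `build_training_set` and the toy sizes are all hypothetical, and the classifier itself is left out.

```python
import random


def build_training_set(trips_by_driver, driver, batch, n_false, rng):
    """Build labelled training data for one batch of held-out trips.

    True examples: the driver's own trips outside the held-out batch.
    False examples: a limited random sample of other drivers' trips.
    """
    own_trips = trips_by_driver[driver]
    positives = [(vec, 1) for trip_id, vec in own_trips.items()
                 if trip_id not in batch]
    pool = [vec
            for other, trips in trips_by_driver.items() if other != driver
            for vec in trips.values()]
    sampled = rng.sample(pool, min(n_false, len(pool)))
    negatives = [(vec, 0) for vec in sampled]
    return positives + negatives


rng = random.Random(42)

# Toy data: 3 drivers, 5 trips each, each trip a 3-dimensional vector
# standing in for its coordinates in the Driving Behaviours space.
trips_by_driver = {
    driver: {trip_id: [rng.random() for _ in range(3)] for trip_id in range(5)}
    for driver in ("driver_a", "driver_b", "driver_c")
}

batch = {0, 1}  # trips of driver_a to be scored in this round
training = build_training_set(trips_by_driver, "driver_a", batch,
                              n_false=4, rng=rng)
n_true = sum(label for _, label in training)
n_false = len(training) - n_true
print(n_true, n_false)  # 3 4
```

A model fitted on `training` would then score the trips in `batch`, and the procedure repeats for the next batch of the same driver.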