Driving Style and Behavior Analysis based on Trip Segmentation over GPS Information. Comparison of three unsupervised approaches

Driving Style Analysis
based on Trip Segmentation.
A Comparative Multi-Technique Approach
Marco Brambilla, Andrea Mauri, Paolo Mascetti
@marcobrambi

Agenda
Intro
Problem Definition
Dataset
Data Exploration and Preliminaries
Trip Segmentation Techniques
Validation
Conclusions

Intro: Relevance
1.24 million traffic-related fatalities occur annually
worldwide
Traffic accidents are currently the leading cause of death
for people aged between 15 and 29
Majority of cases due to improper or risky driving
behavior
Source: World Health Organisation (WHO)

Intro: Driving Process
Driving Process: driving a car is a complex task that requires
taking informed decisions based on information at different
levels, such as the driver's own state and other drivers'
behavior.

Intro: Relevant Information
Vehicle’s Status
Contextual Info
• Road State
• Weather Conditions
• Traffic Info
• Road Risk
• Traffic

Problem Statement
Data-driven driver profiling
with respect to driving risk
Essentially: Multivariate Time Series Segmentation
Application scenarios in insurance, promoting
pay-how-you-drive (PHYD) business models

State of the Art and Challenges
State of the art: many works on identification and
recognition of behavioural patterns (line following,
accelerations, braking, etc.) and maneuver
recognition, behavioural scoring, prediction of
driver intentions.
Supervised learning techniques require an intensive
and expensive data gathering process.

Proposed Solution
Unsupervised techniques to profile drivers'
behaviour based on recurrent patterns identified
through driving path segmentation
Comparison of 3 different approaches and use of
all of them for consolidated results
1. Unsupervised Segmentation Based on Clustering
2. Unsupervised Segmentation Based on HMM
3. Unsupervised Topic Extraction

Contextual Scenes
Observed driving behaviours that are
repeated in each driver's behaviour and
also across different drivers.
A reduced representation of the original
Multivariate Time Series conveying a
simplified characterization
Further reasoning is then applied

ETL Process
3 Steps:
Extract: read collected files and select candidate features
Transform:
Filtering and grouping
Feature computation
Load: produce a single dataset
Diagram: Trip File.csv -> Extract -> PreProcessing / Transform -> Load -> Global dataset.csv

Datasets
Collection devices:
Xsens MTi-G-710 (27 users)
and cell phones (10 users)
Retrieved signals:
Acceleration measurements
Altitude
GPS positioning
Speed
Orientation
Devices mounted in-vehicle, aligned with the
direction of movement.
No ground-truth knowledge

Features Selected
Acceleration (on Y and X axes),
Speed (on Y and X axes)
Difference in yaw
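
The yaw-difference feature can be illustrated with a small sketch. It assumes yaw is given in degrees and that the difference is taken between consecutive samples; neither is stated on the slide. The function name `yaw_differences` is hypothetical. The wrap-around step keeps a turn from 350° to 10° from showing up as a -340° jump.

```python
def yaw_differences(yaw_deg):
    """Signed change in heading between consecutive samples, in degrees.

    Assumption: yaw is in degrees. Each difference is wrapped into
    [-180, 180) so crossing the 0/360 boundary gives a small value.
    """
    diffs = []
    for prev, cur in zip(yaw_deg, yaw_deg[1:]):
        diffs.append((cur - prev + 180.0) % 360.0 - 180.0)
    return diffs


# Hypothetical yaw readings for three consecutive samples.
print(yaw_differences([350.0, 10.0, 5.0]))
```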

Pre-Analysis 1: Data Exploration

Pre-Analysis 2: Application of Driving
Safety Existing Analyses
Vaiana et al. propose a Driving Safety Diagram based on the analysis of longitudinal and
lateral accelerations.
Aggressiveness Index formulation:
(A = Aggressive, S = Safe points)
Graphical representation:

1. Unsupervised Segmentation Based
on DP-Means Clustering
Problem: Bayesian nonparametric techniques require expensive sampling methods or
variational techniques.
DP-means: proposed by Kulis et al., revisiting k-means: k-means-like objective function +
penalty on the number of clusters
A new cluster is created whenever a point is farther than λ away from every existing centroid.
Note:
Clustering results depend on data ordering.
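
A minimal pure-Python sketch of the DP-means loop described above, not the authors' implementation. It follows the slide's wording and compares the plain Euclidean distance with λ (some formulations compare the squared distance instead). The names `dp_means`, `euclidean`, `lam` and `max_iter` are illustrative assumptions, as is the choice to start from a single centroid at the global mean.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dp_means(points, lam, max_iter=100):
    """Return (labels, centroids) for a list of equal-length tuples."""
    dim = len(points[0])
    # Assumption: start with one centroid at the global mean.
    centroids = [[sum(p[d] for p in points) / len(points) for d in range(dim)]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        # Assignment step: points are visited in order, so order matters.
        for i, p in enumerate(points):
            dists = [euclidean(p, c) for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            if dists[best] > lam:
                # Farther than lam from every centroid: open a new cluster.
                centroids.append(list(p))
                best = len(centroids) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Update step: move each non-empty centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
        if not changed:
            break
    # Drop clusters that ended up empty and renumber the rest.
    used = sorted(set(labels))
    remap = {old: new for new, old in enumerate(used)}
    return [remap[lab] for lab in labels], [centroids[k] for k in used]


# Hypothetical 2-D points forming two well-separated groups.
pts = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
labels, centroids = dp_means(pts, lam=2.0)
print(labels, len(centroids))
```

Because new clusters are opened while the points are being scanned, shuffling the input can change the result, which is the ordering dependence noted above.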

Distribution of features across
clusters

Unsupervised Segmentation based on
HMM
Goal: identify latent structure given observed data points,
assuming the existence of Gaussian hidden states.
Assign to each observed point the corresponding hidden state.
Hidden Markov Models (HMM):
Observations and hidden states
Markov property
Continuous observations

Unsupervised Segmentation based on
HMM
Training:
Baum-Welch EM algorithm to learn model parameters
Decoding:
Viterbi decoding to assign to each observed point the most
likely hidden state
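
The decoding step can be sketched as follows. This is an illustrative Viterbi decoder for a one-dimensional Gaussian HMM, assuming the parameters (`start`, `trans`, `means`, `stds`) have already been learned, for instance by Baum-Welch, which is not shown here. All parameter values in the usage example are made up, and every probability is assumed to be strictly positive so that logarithms are defined.

```python
import math


def gaussian_logpdf(x, mean, std):
    return (-0.5 * math.log(2 * math.pi * std * std)
            - (x - mean) ** 2 / (2 * std * std))


def viterbi(obs, start, trans, means, stds):
    """Most likely hidden-state sequence for 1-D Gaussian emissions."""
    n = len(start)
    # delta[s]: best log-probability of any path that ends in state s.
    delta = [math.log(start[s]) + gaussian_logpdf(obs[0], means[s], stds[s])
             for s in range(n)]
    backptr = []
    for x in obs[1:]:
        new_delta, ptr = [], []
        for s in range(n):
            prev = max(range(n), key=lambda r: delta[r] + math.log(trans[r][s]))
            ptr.append(prev)
            new_delta.append(delta[prev] + math.log(trans[prev][s])
                             + gaussian_logpdf(x, means[s], stds[s]))
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n), key=delta.__getitem__)
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-state model: state 0 around 0.0, state 1 around 5.0.
sticky = [[0.9, 0.1], [0.1, 0.9]]
print(viterbi([0.1, -0.2, 0.0, 5.1, 4.9, 5.0],
              [0.5, 0.5], sticky, [0.0, 5.0], [1.0, 1.0]))
```

With a heavy diagonal in the transition matrix, a single ambiguous observation is not enough to change state, whereas a uniform transition matrix lets the decoder switch on every point.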

HMM Results
A variation was also applied: inertial HMM, with lower transition
probabilities that enforce state persistence. Sensible for driving.

HMM Results
Clusters as hidden states.

Topic Extraction Approach
What is topic extraction?
Model topical concepts belonging to a set of textual documents.
Data are described as documents, and the components are distributions of
terms that reflect recurring patterns, named topics.
Hierarchical Dirichlet Processes (HDPs)
Soft-clustering technique based on non-parametric Bayesian theory.
The number of topics is not set a priori, but learned from data.
Posterior probability approximated by the variational inference algorithm of
Wang et al.
Results:
Most relevant topics for each document and term distribution in each topic.

Topic Extraction Process
Data Quantization
Documents creation
Topics Extraction
Topics Evaluation

Quantization – Binning Process
with static binning strategy
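
One way to read static binning is sketched below: each feature has fixed bin edges, each numeric value becomes a discrete term, and the terms of one trip form a document. Treating a trip as a document, the bin edges, and the names `to_term` and `trip_to_document` are all assumptions made for illustration.

```python
from bisect import bisect_right


def to_term(feature, value, edges):
    """Map a numeric value to a term such as 'speed_b2' using fixed edges."""
    return f"{feature}_b{bisect_right(edges, value)}"


def trip_to_document(samples, bin_edges):
    """Turn a trip (list of {feature: value} dicts) into a list of terms."""
    return [to_term(name, sample[name], bin_edges[name])
            for sample in samples
            for name in bin_edges]


# Hypothetical bin edges and samples; real edges would come from the data.
bin_edges = {"speed": [10, 30, 50], "acc": [-1.0, 1.0]}
trip = [{"speed": 5.0, "acc": 0.5}, {"speed": 42.0, "acc": -2.0}]
print(trip_to_document(trip, bin_edges))
```

The resulting term lists can be passed to any topic model that accepts tokenized documents.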

Terms Relevance on Top 7 Topics
Linguist…

Terms Relevance on Top 7 Topics
… and data analyst perspectives
…

Big Issue: How to Compare?
1) Point-to-point or point distribution
2) Resulting grouping of trips
3) Perceived user similarity of trips

Solution 1: Point-to-Point
Overlap of clusters? Per trip? Overall?

Solution 2: Moving from Points to
Trips
Can we cluster trips based on how observation points have
been clustered?
-> Simple K-means clustering of trips for each approach.
-> Comparison of overlap of the different clusters
Coherent with original question: grouping of trips (and thus
drivers) by driving behavior
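
A possible way to move from points to trips, sketched under assumptions: each trip is summarized by the share of its observation points that fall into each point-level cluster (or hidden state, or topic), and these fixed-length profiles are what K-means would then cluster. The slides do not specify the trip representation; `trip_profile` is a hypothetical helper.

```python
from collections import Counter


def trip_profile(point_labels, n_clusters):
    """Fraction of a trip's points assigned to each point-level cluster."""
    counts = Counter(point_labels)
    total = len(point_labels)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


# Hypothetical trip whose four points were labelled 0, 0, 1, 2.
print(trip_profile([0, 0, 1, 2], n_clusters=3))
```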

Result of overlap analysis
K-means with K=6 clusters.
DP-means vs. HMM: 74% overlap
DP-means vs. Topic: 44%
HMM vs. Topic: 48%
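
The slides do not define how overlap is computed. One plausible definition, shown here as an assumption, is the share of trips placed in the same group by two methods after the arbitrary cluster labels of one method are renamed to best match the other. For K=6 an exhaustive search over label permutations (720 of them) is cheap. The function name `overlap` is hypothetical.

```python
from itertools import permutations


def overlap(labels_a, labels_b, k):
    """Best-case agreement between two clusterings of the same trips."""
    best = 0
    for perm in permutations(range(k)):
        # perm[a] is the label in clustering B that cluster a is renamed to.
        matches = sum(1 for a, b in zip(labels_a, labels_b) if perm[a] == b)
        best = max(best, matches)
    return best / len(labels_a)


# Hypothetical trip-level labels from two methods, with K=3 for brevity.
print(overlap([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 1], k=3))
```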

Human Validation of Trip Groups
Experts (knowledgeable about driving styles and driving
paths recorded) identify possible groups of trips in the
dataset
Problem:
- Unable to distinguish 6 categories of groups
- Only 3 categories are feasible
- Best matching 6 -> 3 categories for each method

Conclusions
Three different clustering techniques of driving
behavior over trips
-> segmentation
Clustering of trips based on behavior
-> up to 74% overlap over 6 clusters
-> 100% overlap over 3 clusters
User Validation
-> 96% precision over 3 clusters

Future Work
About collection process:
Gathering process including contextual information (road
risk, traffic status, weather conditions)
Larger dataset to improve inference performance
About implemented methods:
Smarter data ordering for DP-means
Relax independence assumption in HMM
Improvements in data discretization process for HDP

Marco Brambilla, @marcobrambi, marco.brambilla@polimi.it
http://datascience.deib.polimi.it
Thanks! Questions?
