Over one billion cars interact with each other on the road every day. Each driver has their own driving style, which can affect safety, fuel economy and road congestion. Knowledge of a driver's style could be used to encourage "better" driving behaviour through immediate feedback while driving, or to scale auto insurance rates according to the aggressiveness of the driving style.
In this work we report on our study of driving behaviour profiling based on unsupervised data mining methods. The main goal is to detect the different driving behaviours, and thus to cluster drivers with similar behaviour.
This paves the way to new business models related to the driving sector, such as Pay-How-You-Drive insurance
policies and car rentals.
Driver behavioural characteristics are studied by collecting information from GPS sensors on the cars and by applying three different analysis approaches (DP-means, Hidden Markov Models, and Behavioural Topic Extraction) to the contextual scene detection problem on car trips, in order to detect the different behaviours along each trip. Drivers are then clustered into similar profiles based on the detected behaviours, and the results are compared with a human-defined ground truth for driver classification. The proposed framework is tested on a real dataset containing sampled car signals. While the different approaches show relevant differences in trip segment classification, the coherence of the final driver clustering results is surprisingly high.
Driving Style and Behavior Analysis based on Trip Segmentation over GPS Information. Comparison of three unsupervised approaches
1. Driving Style Analysis
based on Trip Segmentation.
A Comparative Multi-Technique Approach
Marco Brambilla, Andrea Mauri, Paolo Mascetti
@marcobrambi
3. Intro: Relevance
1.24 million traffic-related fatalities occur annually
worldwide
Currently the leading cause of death for people aged
between 15 and 29 years
Majority of cases due to improper or risky driving
behavior
Source: World Health Organisation (WHO)
4. Intro: Driving Process
Driving Process: driving a car is a complex task
that requires taking informed decisions
based on information at different levels,
such as the driver's own state and other drivers'
behavior.
6. Problem Statement
Data-driven driver profiling
with respect to driving risk
Essentially: Multivariate Time Series Segmentation
Application scenarios in insurance, promoting
pay-how-you-drive (PHYD) business models
7. State of the Art and Challenges
State of the art: many works on identification and
recognition of behavioural patterns (lane following,
accelerations, braking, etc.), maneuver
recognition, behavioural scoring, and prediction of
driver intentions.
Supervised learning techniques require an intensive
and expensive data gathering process.
8. Proposed Solution
Unsupervised techniques to profile drivers'
behaviour, based on recurrent patterns identified
through driving path segmentation
Comparison of 3 different approaches and use of
all of them for consolidated results
1. Unsupervised Segmentation Based on Clustering
2. Unsupervised Segmentation Based on HMM
3. Unsupervised Topic Extraction
9. Contextual Scenes
Observed driving behaviours that are
repeated in each driver's behaviour and
also across different drivers.
A reduced representation of the original
Multivariate Time Series conveying a
simplified characterization
Further reasoning is then applied
10. ETL Process
3 Steps:
Extract: read collected files and select candidate features
Transform:
Filter and Grouping
Features computation
Load: produce a single dataset
[Diagram: pre-processing pipeline from Trip File.csv through Extract, Transform and Load to Global dataset.csv]
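The three ETL steps can be illustrated with a minimal Python sketch. The column names (`speed`, `acc_x`, `altitude`), the window size and the two computed features are assumptions made for illustration only; the slides do not list the actual features or file layout.

```python
import csv
import io
from statistics import mean

def extract(trip_file, columns):
    """Read one trip file and keep only the candidate feature columns."""
    rows = []
    for row in csv.DictReader(trip_file):
        # Filter: drop samples with a missing value in any selected column.
        if all(row[c] not in ("", None) for c in columns):
            rows.append({c: float(row[c]) for c in columns})
    return rows

def transform(samples, window):
    """Group samples into fixed-size windows and compute one feature row each."""
    rows = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rows.append({
            "mean_speed": mean(s["speed"] for s in chunk),          # hypothetical feature
            "max_abs_acc": max(abs(s["acc_x"]) for s in chunk),     # hypothetical feature
        })
    return rows

def load(trips, out_file):
    """Write the feature rows of all trips into one global dataset."""
    writer = csv.DictWriter(out_file, fieldnames=["trip_id", "mean_speed", "max_abs_acc"])
    writer.writeheader()
    for trip_id, rows in trips.items():
        for row in rows:
            writer.writerow({"trip_id": trip_id, **row})

# In-memory stand-in for a trip file (made-up values, one incomplete sample).
trip_csv = io.StringIO(
    "speed,acc_x,altitude\n10,0.1,100\n12,-0.4,101\n,0.0,101\n14,0.2,102\n16,0.9,103\n"
)
samples = extract(trip_csv, ["speed", "acc_x"])
features = transform(samples, window=2)
out = io.StringIO()
load({"trip_001": features}, out)
```

Here `out` plays the role of the global dataset file; in practice one would loop over all trip files and write to disk.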
11. Datasets
Collection Device :
Xsens MTi-G-710 (27 users)
And cell phones (10 users)
Retrieved Signals :
Acceleration measurements
Altitude
GPS Positioning
Speed
Orientation
Mounted in-vehicle aligned with
direction of movement.
No Ground truth knowledge
15. Pre-Analysis 2: Application of Driving
Safety Existing Analyses
Vaiana et al. propose a Driving Safety Diagram based on the analysis of longitudinal and
lateral accelerations.
Aggressiveness Index formulation:
(A = Aggressive, S = Safe points)
Graphical representation:
17. 1. Unsupervised Segmentation Based
on DP-Means Clustering
Problem: Bayesian nonparametric techniques require expensive sampling methods or
variational techniques.
DP-means: proposed by Kulis et al., revisiting k-means: a k-means-like objective function plus a
penalty on the number of clusters
A new cluster is created whenever a point is farther than λ away from every already existing centroid.
Note:
Clustering results depend on data ordering.
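The cluster-creation rule can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the study: the sample points and the value of `lam` are made up, and the sketch compares the squared Euclidean distance with `lam`, which is an assumption about how the threshold is applied.

```python
def dp_means(points, lam, max_iter=100):
    """Cluster points (tuples of floats); lam is the penalty for opening a cluster."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def centroid(members):
        return tuple(sum(col) / len(members) for col in zip(*members))

    centroids = [centroid(points)]        # start from a single global cluster
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):    # visiting order influences the result
            dists = [sq_dist(p, c) for c in centroids]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] > lam:         # too far from every centroid: new cluster
                centroids.append(p)
                best = len(centroids) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Drop clusters that ended up empty and recompute the remaining centroids.
        used = sorted(set(labels))
        remap = {old: new for new, old in enumerate(used)}
        labels = [remap[l] for l in labels]
        centroids = [
            centroid([p for p, l in zip(points, labels) if l == k])
            for k in range(len(used))
        ]
        if not changed:
            break
    return labels, centroids

# Made-up (longitudinal, lateral) acceleration samples.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.2, 4.9)]
labels, centroids = dp_means(points, lam=1.0)
```

Because points are visited in list order and a new centroid is opened as soon as a point is too far away, shuffling `points` can change which clusters are created, which is the ordering dependence noted above.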
27. Unsupervised Segmentation based on
HMM
Goal: identify latent structure given observed data points,
assuming the existence of Gaussian hidden states.
Assign to each observed point the corresponding hidden state.
Hidden Markov Models (HMM):
Observation and hidden states
Markovian properties
Continuous observations
28. Unsupervised Segmentation based on
HMM
Training:
Baum-Welch EM algorithm to learn model parameters
Decoding:
Viterbi decoding to assign to each observed point the most
likely hidden state
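The decoding step can be sketched as follows for a one-dimensional signal with Gaussian emissions. The model parameters here are set by hand and are purely illustrative; in the approach described above they would come from Baum-Welch training, which this sketch does not implement.

```python
import math

def log_gauss(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def viterbi(obs, start, trans, means, stds):
    """Return the most likely hidden state for each observation (log-space Viterbi)."""
    n = len(start)
    score = [math.log(start[s]) + log_gauss(obs[0], means[s], stds[s]) for s in range(n)]
    back = []                                  # back-pointers, one list per time step
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            # Best predecessor state for landing in state s at this step.
            best = max(range(n), key=lambda r: prev[r] + math.log(trans[r][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans[best][s])
                         + log_gauss(x, means[s], stds[s]))
        back.append(ptr)
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for ptr in reversed(back):                 # follow back-pointers to the start
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical two-state model: state 0 around 0.0, state 1 around 3.0.
obs = [0.1, 0.0, 0.2, 3.1, 2.9, 3.0]
path = viterbi(obs, start=[0.5, 0.5],
               trans=[[0.9, 0.1], [0.1, 0.9]],
               means=[0.0, 3.0], stds=[0.5, 0.5])
```

Each contiguous run of equal states in `path` is one trip segment. Raising the diagonal of `trans` makes state switches more costly and therefore yields longer segments.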
29. HMM Results
A variation was also applied: inertial HMM, which lowers the probability of
switching between states and so enforces state persistence. This is sensible for driving.
34. Topic Extraction Approach
What is topic extraction?
It models topical concepts belonging to a set of textual documents.
Data are described as documents, and the components are distributions of
terms that reflect recurring patterns, named topics.
Hierarchical Dirichlet Processes (HDPs)
Soft-clustering technique based on non-parametric Bayesian theory.
The number of topics is not set a priori, but learned from data.
Posterior probability approximated by the variational inference algorithm of
Wang et al.
Results:
Most relevant topics for each document and terms distribution in each topic.
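To apply a text model to driving signals, continuous samples first have to be turned into terms and documents. The slides do not describe how this discretization was done, so the sketch below is only one possible scheme: the bin widths, the term format and the fixed document length are all assumptions.

```python
from collections import Counter

def to_word(speed, acc, speed_step=20.0, acc_step=1.0):
    """Map one sample to a discrete term such as 's2_a-1' (hypothetical vocabulary)."""
    return f"s{int(speed // speed_step)}_a{int(acc // acc_step)}"

def to_documents(samples, doc_len):
    """Cut a trip into consecutive windows and count the terms in each window."""
    words = [to_word(speed, acc) for speed, acc in samples]
    return [Counter(words[i:i + doc_len]) for i in range(0, len(words), doc_len)]

# Made-up (speed, longitudinal acceleration) samples for one trip.
trip = [(45.0, 0.3), (47.0, 0.4), (52.0, -1.2), (30.0, -1.5)]
docs = to_documents(trip, doc_len=2)
```

Each `Counter` is a bag-of-words document of the kind a topic model takes as input; the topics learned from such documents would then be distributions over terms like `s2_a0`.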
45. Solution 2: Moving from Points to
Trips
Can we cluster trips based on how observation points have
been clustered?
-> Simple K-means clustering of trips for each approach.
-> Comparison of overlap of the different clusters
Coherent with original question: grouping of trips (and thus
drivers) by driving behavior
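One way to move from points to trips is to describe each trip by the share of its observation points assigned to each segment class, and then to run K-means on those vectors. The slides do not state which trip representation was used, so the histogram below is an assumption, and the trip data is made up.

```python
def trip_histogram(point_labels, n_states):
    """Fraction of a trip's observation points that fall in each segment class."""
    counts = [0] * n_states
    for label in point_labels:
        counts[label] += 1
    return [c / len(point_labels) for c in counts]

def k_means(vectors, k, n_iter=20):
    """Plain Lloyd iterations, initialised on the first k vectors for repeatability."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: mean of the members of each non-empty cluster.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Per-point segment labels of four hypothetical trips (three segment classes).
trips = {"t1": [0, 0, 0, 1], "t2": [2, 2, 1, 2], "t3": [0, 0, 1, 0], "t4": [2, 2, 2, 2]}
vectors = [trip_histogram(labels, n_states=3) for labels in trips.values()]
trip_clusters = k_means(vectors, k=2)
```

Running this once per segmentation approach gives one trip clustering per approach, which is what the overlap comparison needs.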
46. Result of overlap analysis
K-means with K=6 clusters.
DP-means vs. HMM: 74% overlap
DP-means vs. Topic: 44%
HMM vs. Topic: 48%
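Cluster identifiers are arbitrary, so comparing two clusterings requires matching their labels first. The slides do not say how the overlap was computed; the sketch below uses one plausible definition, the largest fraction of trips on which two clusterings agree over all relabelings, and the label lists are made up.

```python
from itertools import permutations

def overlap(labels_a, labels_b, k):
    """Largest fraction of items on which two clusterings agree, over all relabelings of b."""
    best = 0
    for perm in permutations(range(k)):   # k = 6 gives 720 candidate relabelings
        agree = sum(1 for a, b in zip(labels_a, labels_b) if a == perm[b])
        best = max(best, agree)
    return best / len(labels_a)

# Two hypothetical clusterings of the same six trips.
clusters_a = [0, 0, 1, 1, 2, 2]
clusters_b = [1, 1, 0, 0, 2, 0]
score = overlap(clusters_a, clusters_b, k=3)
```

Exhaustive search over permutations is fine for a handful of clusters; for larger `k` an assignment solver would be the usual replacement.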
47. Human Validation of Trip Groups
Experts (knowledgeable about driving styles and the recorded
driving paths) identify possible groups of trips in the
dataset
Problem:
- Unable to distinguish 6 categories of groups
- Only 3 categories are feasible
- Best matching 6 -> 3 categories for each method
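Matching six clusters to three expert categories can be sketched as a majority vote per cluster, followed by counting how many trips land in the category the experts gave them. Both the majority-vote rule and the category names are assumptions; the slides do not specify the matching procedure or the exact precision measure.

```python
from collections import Counter, defaultdict

def map_clusters(cluster_labels, expert_labels):
    """Map every cluster to the expert category most of its trips belong to."""
    votes = defaultdict(Counter)
    for c, e in zip(cluster_labels, expert_labels):
        votes[c][e] += 1
    return {c: counter.most_common(1)[0][0] for c, counter in votes.items()}

def agreement(cluster_labels, expert_labels):
    """Fraction of trips whose mapped cluster category equals the expert category."""
    mapping = map_clusters(cluster_labels, expert_labels)
    hits = sum(1 for c, e in zip(cluster_labels, expert_labels) if mapping[c] == e)
    return hits / len(cluster_labels)

# Ten hypothetical trips: six automatic clusters versus three expert categories.
clusters = [0, 0, 1, 2, 2, 3, 4, 5, 5, 5]
experts = ["calm", "calm", "calm", "normal", "normal", "normal",
           "aggressive", "aggressive", "aggressive", "normal"]
mapping = map_clusters(clusters, experts)
ratio = agreement(clusters, experts)
```

In this toy data only the last trip disagrees with its cluster's majority category.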
49. Conclusions
Three different clustering techniques of driving
behavior over trips
-> segmentation
Clustering of trips based on behavior
-> up to 74% overlap over 6 clusters
-> 100% overlap over 3 clusters
User Validation
-> 96% precision over 3 clusters
50. Future Work
About collection process:
Gathering process including contextual information (road
risk, traffic status, weather conditions)
Larger dataset to improve inference performance
About implemented methods:
Smarter data ordering for DP-means
Relax independence assumption in HMM
Improvements in data discretization process for HDP