Catherine Slesnick, Agero @AgeroNews
Scott Frye, Agero @scott_frye
Automobile Route Matching
with Dynamic Time Warping
Using PySpark
#DD3SAIS
Who Is Agero?
Agero is a mission-driven organization obsessed with making driving safer. Its award-winning services combine innovative technologies with human-powered solutions to safeguard the driving experience, and ultimately save lives.
What Agero Offers
• Mobile Telematics
• Consumer Affairs
• Data & Analytics
• Lead Generation
• Vehicle Breakdown Assistance
• Accident Scene Management
Who Is Agero?
• 1 in 3 licensed drivers
• Services cover 80M consumers
• 10M+ annual events
• 7 / 10 top insurance companies
• 1M+ accident recoveries per year
• Connecting drivers to over 14,000 dealerships
• Connecting drivers to over 74,000 repair shops
• Operating 24 / 7 / 365
• 6 emergency response enabled locations
Data Science at Agero
• 12 full-time people
• Processed more than 3 PB of
telematics data
• Python, Spark, AWS
• Spans a wide range of algorithms
We build algorithms to help Agero make the roads a safer place to drive
• Crash detection & prevention
• Bringing services to stranded drivers
MileUp™ − Agero’s Mobile Crash Detection and
Crash Prevention App
MileUp Crowdsources 100% Natural Driving
and Crash Data from Everyday Drivers
Official Launch at
https://youtu.be/RyrFqd0jeKo
The Initial MileUp Went Viral and
Reached #10 Most Popular App
On December 14th 2016, Agero posted “Get Gift Cards for Normal Driving” on the Beer Money Reddit thread. The initial post received over 240 comments and 180 upvotes within the first day.
[Chart: Lifestyle App Ranking (iOS), rank axis with ticks at 1, 10, 100, 1,000 and 10,000]
Top 10 in the Apple Lifestyle category
MileUp Beta by the Numbers
• 300K+ active users
• 11K+ iOS accidents detected
• 450+ severe accidents verified
• 200M trips captured
• 2 billion miles driven
© 2018 AGERO, INC. PROPRIETARY AND CONFIDENTIAL. A CROSS COUNTRY GROUP COMPANY.
MILEUP CAPTURED
~1 MILLION
TRIPS PER DAY
How MileUp Works
• Ingest sensor data from smartphones
• Process the data using machine learning
• Detect accidents in real time
• Detect driving behavior patterns
Analyzing Driving Patterns Is Data Intensive
Analyze Route Familiarity Driving Patterns
• People become familiar with routes they drive often
  • Daily commute
  • Home to grocery store / school
• Route familiarity affects crash risk
  • Most accidents occur on familiar roads
  • Drivers more likely to be distracted
  • Increased speeding
  • More aggressive cornering
  • Accidents on familiar roads tend to be less severe
Analyze Route Familiarity Driving Patterns
Comparing a user’s trips is challenging
• Comparing location data point-by-point is very expensive
• Velocities will differ between every trip
• We want to identify similar routes in addition to identical routes
We have A LOT of data for each user
Reduce Amount of Data to Process by Looking
at Trip Endpoints
Constraints:
• Process all of a user’s data together
• Work in Python / PySpark
[Diagram: User trip data (RDDs) → Python function: endpoints (a, b) → compare midpoints, distances → matched trip candidates (A, B), (A, C), (A, E), (D, F), (G, H), …]
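The endpoint comparison above can be sketched in plain Python. This is a minimal illustration, not Agero's implementation. It assumes each trip has already been reduced to a start and an end (lat, lon) pair, and it checks only endpoint proximity (the midpoint and distance checks named on the slide are left out). The names `haversine_m` and `candidate_pairs`, the sample coordinates, and the 200 m tolerance are all hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def candidate_pairs(trips, tol_m=200.0):
    """Return pairs of trip ids whose starts and ends are both within tol_m.

    trips: dict mapping trip id -> (start, end), each a (lat, lon) tuple.
    """
    pairs = []
    for (id_a, (s_a, e_a)), (id_b, (s_b, e_b)) in combinations(sorted(trips.items()), 2):
        if haversine_m(s_a, s_b) <= tol_m and haversine_m(e_a, e_b) <= tol_m:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical trips for one user: A and B share both endpoints, C ends elsewhere.
trips = {
    "A": ((42.3601, -71.0589), (42.3736, -71.1097)),
    "B": ((42.3603, -71.0591), (42.3734, -71.1099)),
    "C": ((42.3601, -71.0589), (42.4430, -71.2290)),
}
matches = candidate_pairs(trips)
```

In PySpark, a function like this could be applied to each user's grouped trips, for example with `groupByKey` followed by `flatMapValues` on an RDD keyed by user, after adapting the input shape.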
Reduce Amount of Data to Process by Looking
at Trip Endpoints
Processing results for part 1 of analysis:
• Ran on 2 weeks of data
• ~50k driver sample
• 112 cores
→ Took ~1 hr
→ Produced ~400k trip pairs
Worked sometimes, but not all of the time → need more sophisticated analysis to refine results
Use Dynamic Time Warping (DTW)¹ to Refine Trip Pairs
DTW is an algorithm for measuring the similarity between 2 temporal
sequences that may vary in speed
Any distance (Euclidean, Manhattan, …) which aligns the i-th point on one time series with the i-th point on the other will produce a poor similarity score
[Figure: two time series plotted against time, with point i on one series aligned to point i on the other]
¹ The majority of the contents of slides 16-18 borrowed from presentations by:
• Tim Oates: Workshop slides from Boston Big Data Tech Con 2015
• Elena Tsiporkova: http://www.psb.ugent.be/cbd/papers/gentxwarper/DTWAlgorithm.ppt
Use Dynamic Time Warping (DTW) to Refine Trip Pairs
A non-linear (elastic) alignment produces a more intuitive similarity measure, allowing similar shapes to match even if they are out of phase in the time axis
[Figure: two time series plotted against time, with point i on one series aligned to point i+2 on the other]
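The difference between a lock-step distance and an elastic alignment can be seen on a toy example. The sketch below is illustrative only: it uses 1-D sequences and the absolute difference as the local cost, and the function names and sample values are made up for the example.

```python
def lockstep_distance(a, b):
    """Sum of |a[i] - b[i]|: the i-th point is always compared with the i-th point."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Same shape, but trip_b reaches the peak one step earlier (out of phase).
trip_a = [0, 0, 1, 2, 3, 2, 1, 0, 0, 0]
trip_b = [0, 1, 2, 3, 2, 1, 0, 0, 0, 0]

lockstep = lockstep_distance(trip_a, trip_b)  # penalizes the phase shift
elastic = dtw_distance(trip_a, trip_b)        # the shift is absorbed by warping
```

Here the lock-step sum is 6 while the DTW distance is 0, because the elastic alignment lets the repeated leading and trailing zeros absorb the one-step shift.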
Use Dynamic Time Warping to Refine Trip Pairs
To find the best alignment between A and B one needs to find the path through the grid
$P = p_1, \ldots, p_s, \ldots, p_k$, where $p_s = (i_s, j_s)$,
which minimizes the total distance between them
P is called a warping function
DTW is expensive computationally!!
We used FastDTW
• A multilevel approach that recursively uses sampling and a space constraint to compute the warping function
• Stan Salvador and Philip Chan:
http://cs.fit.edu/~pkc/papers/tdm04.pdf
• Open-source Python implementation available:
https://github.com/slaypni/fastdtw
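To give a feel for why constraining the search space helps, here is a hedged sketch of DTW restricted to a fixed-width band around the grid diagonal. This is not FastDTW and not the `fastdtw` package: FastDTW builds its search window from a coarser-resolution solution, whereas this example swaps in a simpler fixed band (in the style of a Sakoe-Chiba band) purely to show how fewer grid cells get evaluated. The function name, the sample routes, and the band width are assumptions.

```python
import math


def banded_dtw(a, b, band):
    """DTW over 2-D points, evaluating only grid cells with |i - j| <= band.

    Returns (distance, cells_evaluated). The distance is an upper bound on
    unconstrained DTW and equals it when the best path stays inside the band.
    The full cost matrix is still allocated here, for clarity only.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    band = max(band, abs(n - m))  # the end cell (n, m) must lie inside the band
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    cells = 0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = math.hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cells += 1
    return cost[n][m], cells


# Hypothetical routes: the same straight segment traversed at different paces.
route_a = [(t / 199, 0.0) for t in range(200)]
route_b = [((t / 199) ** 2, 0.0) for t in range(200)]

full_dist, full_cells = banded_dtw(route_a, route_b, band=200)  # whole grid
band_dist, band_cells = banded_dtw(route_a, route_b, band=20)   # narrow band
```

The narrow band evaluates far fewer cells than the full 200 × 200 grid, at the price of possibly missing the best path when the two trips drift far out of phase. A multilevel method such as FastDTW aims to choose the window more carefully.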
Algorithm Parallelization/Optimization Needed
• ~400k trip pairs to check
• Average driver commute in U.S. is ~25 minutes
→ 1,500 data points
• 1 DTW comparison took ~5.3 seconds
→ 600 hours of compute time
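The estimate above is straightforward arithmetic, reproduced here as a quick check. The one-sample-per-second rate is an inference from 25 minutes corresponding to 1,500 points and is not stated on the slide. The per-core figure is an idealized lower bound that assumes perfectly even work distribution.

```python
pairs = 400_000                  # candidate trip pairs from the endpoint step
seconds_per_comparison = 5.3     # measured cost of one DTW comparison
points_per_trip = 25 * 60        # 25 minutes at an assumed 1 sample per second
cores = 112

serial_hours = pairs * seconds_per_comparison / 3600  # single-core estimate
ideal_parallel_hours = serial_hours / cores           # best case with even load
total_points = pairs * points_per_trip                # one trip's worth of points per pair

print(f"serial: ~{serial_hours:.0f} h, ideal on {cores} cores: ~{ideal_parallel_hours:.1f} h")
print(f"points across all pairs: {total_points:,}")
```

The serial figure comes to roughly 589 hours, which the slide rounds to 600.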
Algorithm Parallelization/Optimization Needed
First tried to add DTW onto previous function by driver
• How I was thinking about the analysis
• How I had written POC in Python
Algorithm Parallelization/Optimization Needed
First tried to add DTW onto previous function by driver
Problem:
Because different users had very different numbers of trip pairs to check, the cores were being used unevenly
Solution:
1. Turned candidate trip pair results into a Spark data frame (DF)
2. Created a new user-defined function (UDF) to perform DTW on each row of the data frame
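The two steps above might look roughly like the following in PySpark. This is a sketch under stated assumptions, not the authors' code. The column names `trip_a` and `trip_b`, the function names, and the explicit `repartition` call are hypothetical, and a plain DTW stands in for FastDTW so the example has no extra dependencies. PySpark is imported lazily inside `add_dtw_column`, so the scoring function can be tried without a Spark installation.

```python
import math


def dtw_distance(trip_a, trip_b):
    """Plain DTW between two lists of [lat, lon] points (stand-in for FastDTW)."""
    inf = float("inf")
    n, m = len(trip_a), len(trip_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.hypot(trip_a[i - 1][0] - trip_b[j - 1][0],
                           trip_a[i - 1][1] - trip_b[j - 1][1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return float(cost[n][m])


def add_dtw_column(pairs_df, num_partitions=112):
    """Score every candidate-pair row of a Spark DataFrame with a Python UDF.

    Assumes pairs_df has array<array<double>> columns named trip_a and trip_b
    (hypothetical names), one row per candidate trip pair.
    """
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    dtw_udf = F.udf(dtw_distance, DoubleType())
    # One row per pair means work is split by pair rather than by driver;
    # repartitioning spreads the rows evenly across partitions.
    return (pairs_df
            .repartition(num_partitions)
            .withColumn("dtw_distance", dtw_udf(F.col("trip_a"), F.col("trip_b"))))
```

Because each row is an independent unit of work, a driver with many candidate pairs no longer ties up a single core while others sit idle.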
What We Learned
• Detecting a driver’s familiar routes is possible even with LOTS of data
• Transitioning Python algorithms into PySpark can require a shift in thinking
about how to structure the code (and some trial & error)
• When working in PySpark, you can use RDDs and DFs together to
parallelize different parts of the analysis
Processed ~ 600 million points in 5 hours on 112 cores
Questions?
Cathy Slesnick: cslesnick@agero.com, @AgeroNews
Scott Frye: sfrye@agero.com, @scott_frye

Automobile Route Matching with Dynamic Time Warping Using PySpark with Catherine Slesnick and Scott Frye

  • 1. Catherine Slesnick, Agero @AgeroNews Scott Frye, Agero @scott_frye Automobile Route Matching with Dynamic Time Warping Using PySpark #DD3SAIS
  • 2. 2#DD3SAIS Who Is Agero? Agero is a mission-driven organization obsessed with making driving safer. Its award-winning services combine innovative technologies with human-powered solutions to safeguard the driving experience, and ultimately save lives.
  • 4. Who Is Agero? 4#DD3SAIS 1 in 3 licensed drivers Services cover 80M consumers 10M+ annual events 7 / 10 top insurance companies 1M+ accident recoveries per Year Connecting drivers to over 14,000 dealerships Connecting drivers to over 74,000 repair shops Operating 24 / 7 / 365 6 emergency response enabled locations
  • 5. Data Science at Agero • 12 full-time people • Processed more than 3 PB of telematics data • Python, Spark, AWS • Span wide range of algorithms 5#DD3SAIS We build algorithms to help Agero make the roads a safer place to drive • Crash detection & prevention • Bringing services to stranded drivers
  • 6. MileUp™ − Agero’s Mobile Crash Detection and Crash Prevention App MileUp Crowdsources 100% Natural Driving and Crash Data from Everyday Drivers Official Launch at #DD3SAIS 6 https://youtu.be/RyrFqd0jeKo
  • 7. The Initial MileUp Went Viral and Reached #10 Most Popular App On December 14th 2016, Agero posted: on the Beer Money Reddit thread. The initial post received over 240 comments and 180 up votes within the first day. 1 10 100 1,000 10,000 Lifestyle App Ranking (iOS) Top 10 in the Apple Lifestyle category #DD3SAIS “Get Gift Cards for Normal Driving” 7
  • 8. MileUp Beta by the Numbers 300K+ active users 11K+ iOS accidents detected 450+severe accidents verified 200Mtrips captured 2 billion miles driven #DD3SAIS 8
  • 9. © 2018 AGERO, INC. PROPRIETARY AND CONFIDENTIAL. A CROSS COUNTRY GROUP COMPANY. MILEUP CAPTURED ~1 MILLION TRIPS PER DAY © 2018 AGERO, INC. PROPRIETARY AND CONFIDENTIAL. A CROSS COUNTRY GROUP COMPANY.
  • 10. How MileUp Works 10#DD3SAIS Process the data using machine learning Ingest sensor data from smartphones Detect accidents in real time Detect driving behavior patterns
  • 11. Analyzing Driving Patterns Is Data Intensive 11#DD3SAIS
  • 12. Analyze Route Familiarity Driving Patterns 12#DD3SAIS • People become familiar with routes they drive often • Daily commute • Home grocery store / school • Route familiarity affects crash risk • Most accidents occur on familiar roads • Drivers more likely to be distracted • Increased speeding • More aggressive cornering • Accidents on familiar roads tend to be less severe
  • 13. Analyze Route Familiarity Driving Patterns. Comparing a user’s trips is challenging: comparing location data point by point is very expensive; velocities will differ between every trip; we want to identify similar routes in addition to identical routes. We have A LOT of data for each user.
  • 14. Reduce Amount of Data to Process by Looking at Trip Endpoints. Constraints: process all of a user’s data together; work in Python / PySpark RDDs. [Diagram: User Trip Data → Python Function (Endpoints; Compare Midpoints, Distances) → Matched Trip Candidates (a, b): (A, B), (A, C), (A, E), (D, F), (G, H), ….]
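The endpoint filter on slide 14 could be sketched roughly as follows in plain Python. The slides do not give the actual matching rules, so everything here is an assumption made for illustration: the trip record layout (`id`, `start`, `end`, `length_m`), the `haversine_m` helper, the 200 m endpoint radius, and the 1.25 length ratio. The midpoint comparison named on the slide is omitted for brevity.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def candidate_pairs(trips, max_endpoint_m=200.0, max_length_ratio=1.25):
    """Return (id, id) pairs whose starts and ends are close and whose lengths are similar.

    The thresholds are illustrative assumptions, not values from the talk.
    """
    pairs = []
    for a, b in combinations(trips, 2):
        starts_close = haversine_m(a["start"], b["start"]) <= max_endpoint_m
        ends_close = haversine_m(a["end"], b["end"]) <= max_endpoint_m
        longer = max(a["length_m"], b["length_m"])
        shorter = min(a["length_m"], b["length_m"])
        similar_length = shorter > 0 and longer / shorter <= max_length_ratio
        if starts_close and ends_close and similar_length:
            pairs.append((a["id"], b["id"]))
    return pairs


# Hypothetical trips for one driver: two near-identical commutes and one unrelated trip.
trips = [
    {"id": "A", "start": (42.3601, -71.0589), "end": (42.3736, -71.1097), "length_m": 6200},
    {"id": "B", "start": (42.3602, -71.0590), "end": (42.3737, -71.1096), "length_m": 6400},
    {"id": "C", "start": (42.3601, -71.0589), "end": (42.4500, -71.2300), "length_m": 21000},
]
print(candidate_pairs(trips))
```

In an RDD pipeline, a function of this shape could be applied to each driver's grouped trips (for example with `groupByKey()` followed by `mapValues(...)`), which matches the slide's constraint of processing all of a user's data together.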
  • 15. Reduce Amount of Data to Process by Looking at Trip Endpoints. Processing results for part 1 of the analysis: ran on 2 weeks of data; ~50k driver sample; 112 cores → took ~1 hr → produced ~400k trip pairs. Worked sometimes, but not all of the time → need more sophisticated analysis to refine results.
  • 16. Use Dynamic Time Warping (DTW)¹ to Refine Trip Pairs. DTW is an algorithm for measuring the similarity between two temporal sequences that may vary in speed. Any distance (Euclidean, Manhattan, …) that aligns the i-th point on one time series with the i-th point on the other will produce a poor similarity score. [Figure: two time series over time, with point i on one aligned to point i on the other] ¹ The majority of the contents of slides 16-18 is borrowed from presentations by Tim Oates (workshop slides from Boston Big Data Tech Con 2015) and Elena Tsiporkova (http://www.psb.ugent.be/cbd/papers/gentxwarper/DTWAlgorithm.ppt).
  • 17. Use Dynamic Time Warping (DTW) to Refine Trip Pairs. DTW is an algorithm for measuring the similarity between two temporal sequences that may vary in speed. A non-linear (elastic) alignment produces a more intuitive similarity measure, allowing similar shapes to match even if they are out of phase on the time axis. [Figure: two time series over time, with point i on one aligned to point i+2 on the other]
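The contrast drawn on slides 16 and 17 can be illustrated with a small sketch. It uses one-dimensional toy sequences and an absolute-difference cost, both of which are assumptions made for the example. Real trips would be sequences of location points with a geographic distance. The function names are hypothetical.

```python
def lockstep_distance(a, b):
    """Sum of |a[i] - b[i]|: the i-th point is always compared with the i-th point."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_distance(a, b):
    """Classic dynamic-programming DTW, O(len(a) * len(b)), absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The same bump, shifted by one sample in time.
a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]
print(lockstep_distance(a, b))
print(dtw_distance(a, b))
```

For these two sequences the lock-step distance is 4, while the DTW distance is 0, because the elastic alignment absorbs the one-sample shift.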
  • 18. Use Dynamic Time Warping to Refine Trip Pairs. To find the best alignment between A and B, one needs to find the path through the grid, $P = p_1, \ldots, p_s, \ldots, p_k$ with $p_s = (i_s, j_s)$, that minimizes the total distance between them. P is called a warping function. DTW is computationally expensive! We used FastDTW: a multilevel approach that recursively uses sampling and space constraints to compute the warping function; paper by Stan Salvador and Philip Chan: http://cs.fit.edu/~pkc/papers/tdm04.pdf; open-source Python implementation available: https://github.com/slaypni/fastdtw
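As a rough illustration of the warping function on slide 18, the sketch below fills the full cost grid and then walks back from the last cell to recover the index pairs (i_s, j_s). This is the exact quadratic algorithm, not FastDTW, which approximates the same path at lower cost. The function name `dtw_path` and the absolute-difference cost on one-dimensional values are assumptions for the example.

```python
def dtw_path(a, b):
    """Return (total_cost, path), where path is the list of 0-based index pairs (i_s, j_s)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Walk back from the final grid cell to the origin, always moving to the
    # cheapest of the three allowed predecessor cells.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


total, path = dtw_path([1, 2, 3], [1, 2, 2, 3])
print(total)
print(path)
```

In this toy case the second point of the first sequence is matched to both the second and third points of the other, which is the elastic stretching that lets two trips driven at different speeds still line up.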
  • 19. Algorithm Parallelization/Optimization Needed: ~400k trip pairs to check; average driver commute in the U.S. is ~25 minutes → 1,500 data points; 1 DTW comparison took ~5.3 seconds → 600 hours of compute time. First tried to add DTW onto the previous per-driver function: this was how I was thinking about the analysis, and how I had written the POC in Python.
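The figures on slide 19 can be checked with a few lines of arithmetic. The sampling rate of one point per second is an assumption, inferred from 25 minutes corresponding to 1,500 data points. The 112-core figure is taken from slides 15 and 22, and the per-core estimate assumes perfectly even load.

```python
trip_pairs = 400_000       # ~400k candidate trip pairs
seconds_per_dtw = 5.3      # ~5.3 s per DTW comparison
commute_minutes = 25       # average U.S. commute
samples_per_second = 1     # assumption: implied by 25 min -> 1,500 points
cores = 112                # cluster size quoted elsewhere in the talk

points_per_trip = commute_minutes * 60 * samples_per_second
serial_hours = trip_pairs * seconds_per_dtw / 3600
ideal_parallel_hours = serial_hours / cores   # only if the load is perfectly even
grid_cells = points_per_trip ** 2             # cells in a full DTW grid for two such trips

print(points_per_trip)
print(round(serial_hours))
print(round(ideal_parallel_hours, 1))
print(grid_cells)
```

The serial estimate comes out at about 589 hours, which the slide rounds to 600. Spread evenly over 112 cores that is a little over 5 hours, in line with the run time reported on slide 22.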
  • 20. Algorithm Parallelization/Optimization Needed. First tried to add DTW onto the previous per-driver function. Problem: because different users had very different numbers of trip pairs to check, the cores were being used unevenly.
  • 21. Algorithm Parallelization/Optimization Needed. Solution: (1) turned the candidate trip pair results into a Spark data frame (DF); (2) created a new user-defined function (UDF) to perform DTW on each row of the data frame.
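The two-step solution on slide 21 might look something like the sketch below. It is not the authors' code: the column names (`user_id`, `trip_a`, `trip_b`, `dtw`), the one-dimensional toy trips, the plain quadratic `dtw_distance` used in place of FastDTW, and the partition count are all assumptions for illustration. The PySpark imports sit inside the functions so that the pure-Python part can be read and tested without a Spark installation.

```python
def dtw_distance(a, b):
    """Plain quadratic DTW with absolute-difference cost (stand-in for FastDTW)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def add_dtw_column(pairs_df):
    """Append a 'dtw' column to a DataFrame that has array columns 'trip_a' and 'trip_b'."""
    # Imported lazily so dtw_distance stays usable without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    dtw_udf = F.udf(lambda a, b: float(dtw_distance(a, b)), DoubleType())
    return pairs_df.withColumn("dtw", dtw_udf(F.col("trip_a"), F.col("trip_b")))


def main():
    """Run the sketch on a local Spark session with two hypothetical trip pairs."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").appName("dtw-sketch").getOrCreate()
    rows = [
        ("u1", [0.0, 0.0, 1.0, 2.0, 1.0], [0.0, 1.0, 2.0, 1.0, 1.0]),
        ("u2", [0.0, 1.0, 2.0], [2.0, 1.0, 0.0]),
    ]
    pairs_df = spark.createDataFrame(rows, ["user_id", "trip_a", "trip_b"])
    # One row per trip pair: repartitioning spreads pairs, not drivers, across cores.
    scored = add_dtw_column(pairs_df.repartition(4))
    scored.select("user_id", "dtw").show()
    spark.stop()
```

Calling `main()` requires a local PySpark installation. The design point is that each row is an independent unit of work, so a driver with many candidate pairs no longer ties up a single core the way a per-driver function does.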
  • 22. What We Learned: detecting a driver’s familiar routes is possible even with LOTS of data; transitioning Python algorithms into PySpark can require a shift in thinking about how to structure the code (and some trial and error); when working in PySpark, you can use RDDs and DFs together to parallelize different parts of the analysis. Processed ~600 million points in 5 hours on 112 cores.
  • 23. Questions? Cathy Slesnick: cslesnick@agero.com, @AgeroNews. Scott Frye: sfrye@agero.com, @scott_frye