Full paper: https://abstracts.aetransport.org/paper/download/id/5479
Public transport users increasingly expect better service and up-to-date information in pursuit of a seamless journey experience. To meet these expectations, many transport operators already offer free mobile apps that help customers plan their journeys and access real-time travel information. Leveraging the spatio-temporal data that such apps can produce at scale (i.e. timestamped GPS traces) opens an opportunity to bridge the gap between passenger expectations and the capabilities of operators by providing a real-time, 360-degree view of the transport network based on the ‘Apps as infrastructure’ paradigm. The first step towards fulfilling this vision is to understand which routes and services passengers are travelling on at any given time.
Mapping a GPS trace onto a particular transport network is known as ‘network matching’. In this paper, the problem is formulated as a supervised sequence classification task, where each sequence is made of geographic coordinates and timestamps, and the label is the line and direction of travel. We present and compare two data-driven approaches to this problem: (i) a heuristic algorithm, used as a baseline, which looks for nearby stops and makes an estimate based on their timetables, and (ii) a deep learning approach using a recurrent neural network (RNN). RNNs require considerable amounts of data to train a good model, and collecting and labelling this data from real users is challenging (e.g. asking too often can be overwhelming; sharing GPS location raises privacy concerns; labels can be unreliable due to mistakes or misuse), so one of our contributions is a synthetic journey data generator. The generated datasets have been made as realistic as possible by querying real timetables and adding positional and temporal noise to simulate variable GPS accuracy and vehicle delays, sampled from empirical distributions estimated using thousands of real location reports. To validate our approach we have used a separate dataset made of hundreds of real user journeys provided by a UK-based bus operator. Our experimental results are promising and our next step is to deploy a solution in a production environment. From the operator’s point of view, this will enable multiple smart applications such as account-based ticketing, identification of disruptions, real-time passenger counting, and network analysis. Passengers will in turn benefit from a better service and higher-quality information as a result of this large-scale data processing.
Automatic Transport Network Matching using Deep Learning
1. Automatic Transport Network Matching using Deep Learning
Manuel Martin Salvador, Marcin Budka, Tom Quay
European Transport Conference 2017
04/09/2017 - Barcelona
discoverpassenger.com · wearebase.com · bournemouth.ac.uk
3. Problems to solve
- Passenger counting (real-time and historical)
- Match passenger feedback to a particular vehicle
- Negative: bus is dirty, wifi is not working
- Positive: bus driver has been very friendly
- Customer profiling based on ticket usage and frequent routes
- Micro-targeting campaigns
- Pro-active notification of disruptions
4. Hardware infrastructure is expensive
- Counting sensors in each door
- On-board processing unit in each bus
- Antennas to send data
- Bluetooth beacons
Source: Infodev
5. Apps as infrastructure
- Mobile tickets are replacing paper tickets and smartcards
- Smartphones are powerful devices with many sensors
- Almost everybody owns a smartphone
Source: Synopsys
10. Data collection
- We asked app users to share their journeys and indicate which line they were travelling on.
- We collected about 200 journeys and kept 164 journeys after manual cleaning.
- Journey length varied from 6 to 56 minutes.
- One GPS point was recorded every minute.
11. Heuristic
1. Find the nearest stops to the GPS points.
2. Get the list of candidate lines based on location.
3. Verify the candidates based on direction and time.
4. Return the most likely line + direction.
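The four steps above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `STOPS` and `TIMETABLE` data, the single trip per service, the function names and the 3-minute tolerance are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical toy network: stop id -> (lat, lon).
STOPS = {"A": (50.7200, -1.8800), "B": (50.7250, -1.8700), "C": (50.7300, -1.8600)}

# Hypothetical timetable: (line, direction) -> ordered (stop, scheduled minute of day).
TIMETABLE = {
    ("1", "outbound"): [("A", 600), ("B", 605), ("C", 610)],
    ("1", "inbound"): [("C", 600), ("B", 605), ("A", 610)],
    ("2", "outbound"): [("A", 630), ("B", 640)],
}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points."""
    radius = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def nearest_stop(lat, lon):
    """Step 1: nearest stop to a GPS point."""
    return min(STOPS, key=lambda s: haversine_m(lat, lon, *STOPS[s]))


def match_trace(trace, tolerance_min=3):
    """trace: list of (lat, lon, minute_of_day). Returns (line, direction) or None."""
    visited = [(nearest_stop(lat, lon), t) for lat, lon, t in trace]
    votes = Counter()
    for service, calls in TIMETABLE.items():
        schedule = dict(calls)
        order = {stop: i for i, (stop, _) in enumerate(calls)}
        # Step 2: a candidate must serve every stop seen in the trace.
        if not all(stop in schedule for stop, _ in visited):
            continue
        # Step 3a: direction check, stops must appear in timetable order.
        positions = [order[stop] for stop, _ in visited]
        if positions != sorted(positions):
            continue
        # Step 3b: time check, count points close to the scheduled time.
        hits = sum(abs(schedule[stop] - t) <= tolerance_min for stop, t in visited)
        if hits:
            votes[service] = hits
    # Step 4: most likely line + direction.
    return votes.most_common(1)[0][0] if votes else None


trace = [(50.7201, -1.8801, 600), (50.7249, -1.8702, 606), (50.7301, -1.8599, 611)]
print(match_trace(trace))
```

Even in this toy form, every trace is compared against every candidate service and every timetable entry, which hints at why such a heuristic becomes slow on a full network.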
12. Problems of heuristic
- Slow → not scalable
- Based only on timetabling information and bus stop positions
- In real life, GPS points might not be close to bus stops, and buses are delayed
- We need a model able to cope with uncertainty
13. Deep Learning approach
Build a classifier based on input data.
Sequence classification:
(lat_1, lon_1, time_1), …, (lat_n, lon_n, time_n) → label
Classic machine learning approaches typically expect fixed-length inputs, so they don’t work directly with sequential data of different lengths. Let’s try with Recurrent Neural Networks!
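To make the variable-length issue concrete: one common way to batch journeys of different lengths for an RNN is to pad them to a common length and keep the true lengths alongside, so the padded steps can be ignored. The slides do not say how batching was done, so the `pad_batch` helper below is purely an illustrative assumption.

```python
def pad_batch(sequences, pad_value=(0.0, 0.0, 0.0)):
    """Pad (lat, lon, time) sequences to the longest one; return padded batch and true lengths."""
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


# Two hypothetical journeys of 2 and 3 points (time in minutes since the first point).
journeys = [
    [(50.720, -1.880, 0.0), (50.722, -1.876, 1.0)],
    [(50.730, -1.860, 0.0), (50.728, -1.864, 1.0), (50.726, -1.868, 2.0)],
]
batch, lengths = pad_batch(journeys)
print(lengths)  # true lengths, used to ignore the padded steps
```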
15. Recurrent Neural Network (RNN)
[Figure: an RNN cell with input x_t = (lat_t, lon_t, time_t) and hidden state h_t, unfolded over time steps 0, 1, …, n. Each step takes (lat_i, lon_i, time_i) as input and outputs a predicted line & direction.]
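The unfolding idea can be illustrated with a few lines of plain Python. This is only a sketch with a basic tanh cell and random, untrained weights; the experiments in these slides use GRU and LSTM cells in TensorFlow. The hidden size of 8, the helper names and the input values are assumptions for illustration.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(x, h, w_xh, w_hh, b):
    """One cell application: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    a = matvec(w_xh, x)
    r = matvec(w_hh, h)
    return [math.tanh(ai + ri + bi) for ai, ri, bi in zip(a, r, b)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict_per_step(sequence, params):
    """Apply the same cell at every step; emit class probabilities after each point."""
    w_xh, w_hh, b, w_out = params
    h = [0.0] * len(b)
    outputs = []
    for x in sequence:  # the "unfolded" network is just this loop
        h = rnn_step(x, h, w_xh, w_hh, b)
        outputs.append(softmax(matvec(w_out, h)))
    return outputs


rng = random.Random(0)
hidden, n_classes = 8, 46  # 46 = 23 lines x 2 directions
params = (
    make_matrix(hidden, 3, rng),       # input (lat, lon, time) -> hidden
    make_matrix(hidden, hidden, rng),  # hidden -> hidden
    [0.0] * hidden,                    # bias
    make_matrix(n_classes, hidden, rng),
)
# Hypothetical, already-normalised (lat, lon, time) inputs.
sequence = [(0.10, 0.20, 0.0), (0.15, 0.22, 0.1), (0.20, 0.25, 0.2)]
outputs = predict_per_step(sequence, params)
print(len(outputs), len(outputs[-1]))
```

Because the same weights are reused at every step, the loop works for a journey of any length, which is what makes this family of models suitable for variable-length GPS traces.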
16. Challenges
- Needs loads of compute
- Not enough real data with feedback
- Noise due to:
- Low GPS accuracy
- Bus delays
- Missing points
3XS Deep Learning G10
17. Generating (lots of) data
● +15 million journeys
● Covering the whole operator network -- at all times!
● Start from 1 bus stop and track every minute
● Random GPS accuracy based on a real distribution
● Simulation of bus delays
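A synthetic journey generator along these lines might look like the sketch below. It is not the authors' generator: the three-stop `TIMETABLE`, the linear interpolation between stops, the Gaussian GPS noise and the uniform delay are stand-in assumptions, whereas the slides describe sampling noise from distributions estimated on real data.

```python
import random

# Hypothetical timetable for one service: (stop lat, stop lon, scheduled minute).
TIMETABLE = [(50.7200, -1.8800, 0), (50.7250, -1.8700, 5), (50.7300, -1.8600, 10)]


def scheduled_position(minute):
    """Position the bus should have at `minute`, interpolating linearly between stops."""
    if minute <= TIMETABLE[0][2]:
        return TIMETABLE[0][0], TIMETABLE[0][1]
    for (lat0, lon0, t0), (lat1, lon1, t1) in zip(TIMETABLE, TIMETABLE[1:]):
        if minute <= t1:
            f = (minute - t0) / (t1 - t0)
            return lat0 + f * (lat1 - lat0), lon0 + f * (lon1 - lon0)
    return TIMETABLE[-1][0], TIMETABLE[-1][1]


def synth_journey(rng, gps_sigma_deg=0.0002, max_delay_min=3.0, label=("1", "outbound")):
    """Generate one labelled journey: start at the first stop, one point per minute."""
    # Assumed delay model: a single uniform delay for the whole trip.
    delay = rng.uniform(0.0, max_delay_min)
    points = []
    for minute in range(TIMETABLE[0][2], TIMETABLE[-1][2] + 1):
        # A delayed bus is where the schedule placed it `delay` minutes earlier.
        lat, lon = scheduled_position(minute - delay)
        # Assumed GPS noise model: independent Gaussian error on each coordinate.
        points.append((lat + rng.gauss(0.0, gps_sigma_deg),
                       lon + rng.gauss(0.0, gps_sigma_deg),
                       minute))
    return points, label


points, label = synth_journey(random.Random(42))
print(len(points), label)
```

Repeating this over every service, start stop and departure time is one plausible way to reach millions of labelled sequences without asking users for feedback.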
19. Experimental setup
● Goal: maximise classification accuracy.
● Transport network made of about 140 buses serving 23 lines.
● Number of classes: 46 (23 lines x 2 directions).
● RNN is trained over 15 million sequences of synthetic journeys.
● Sequence length between 5 and 60 minutes.
● Google’s TensorFlow 1.3 on NVIDIA GeForce 1080 and Titan X GPUs.
● RNN cell type: GRU and LSTM.
● Number of layers: between 1 and 5.
● Cell size: 256, 512 and 768.
● Real test set: 164 journeys.
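As a small illustration of how the 46 classes arise, each (line, direction) pair can be mapped to an integer class id. The line names `L1`…`L23` and the direction names below are placeholders, not the operator's actual line identifiers.

```python
# Placeholder identifiers for the 23 lines and 2 directions.
LINES = [f"L{i}" for i in range(1, 24)]
DIRECTIONS = ("outbound", "inbound")


def encode(line, direction):
    """Map a (line, direction) pair to a class id in 0..45."""
    return LINES.index(line) * len(DIRECTIONS) + DIRECTIONS.index(direction)


def decode(class_id):
    """Inverse of encode."""
    return LINES[class_id // len(DIRECTIONS)], DIRECTIONS[class_id % len(DIRECTIONS)]


n_classes = len(LINES) * len(DIRECTIONS)
print(n_classes)
```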
20. Prequential test accuracy on synthetic data
[Chart legend: Right prediction? / In top 2 predictions? / In top 3 predictions?]
Overlaps! 37% of stop-to-stop segments are served by 2 or more lines.
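The three measures (right prediction, in top 2, in top 3) correspond to top-k accuracy for k = 1, 2, 3. A minimal sketch of that metric is shown below; the probability rows and labels are made-up toy values, not results from the experiments.

```python
def top_k_accuracy(prob_rows, true_labels, k):
    """Fraction of examples whose true class is among the k most probable classes."""
    hits = 0
    for probs, truth in zip(prob_rows, true_labels):
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy example with 3 classes and 3 journeys.
probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]]
truth = [0, 2, 1]
for k in (1, 2, 3):
    print(k, top_k_accuracy(probs, truth, k))
```

When several lines share the same stop-to-stop segments, the correct line can easily rank second or third, which is why top-2 and top-3 accuracy are reported alongside the exact match.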
22. Conclusion and future work
Promising results:
● Best approach: 68% accuracy (RNN GRU 2 layers; cell size: 768; with embeddings).
● Up to 93% accuracy on best of 3 predictions.
Future work:
● Training data from real vehicle journeys instead of only timetables.
● Experiment with different sampling rates (currently 1 per minute).