Advanced Routing in Spatial
Networks Using Big Data
Christian S. Jensen
www.cs.aau.dk/~csj
Roadmap
• Big data, motivation, setting, overview
• Modeling and computation of weights
 Spatial network lifting
 EcoMark
 Time-varying, uncertain weights
 Complete weight annotation using incomplete information
 Near-future travel cost inference
• Routing
 Stochastic skyline routing
 Routing based on local-driver behavior
• Closing
 Demos, the future, challenges, acknowledgments, readings
Big Data
Roadmap
• Big data
 Hype or substance?
 Instrumentation of reality and digitization
 The digital universe
 Moore’s Law generalized
 Big data challenges
Hype or Substance?
• We have been pushing the boundaries for decades
 How much data we can handle
 How fast
 Data integration
• Examples
 VLDB: International Conference on Very Large Data Bases
 TODS: ACM Transactions on Database Systems
• So is it all hype?
 No
Instrumentation and Digitization
• Instrumentation of reality
 Notably, smartphones
• Digitization of processes
 E.g., e-commerce, public services, communications, social
interactions
The Vatican, 2005
The Vatican, 2013
2005 vs. 2013
Big Data
Every day, we create 2.5 quintillion bytes of data — so much
that 90% of the data in the world today has been created in
the last two years alone. This data comes from everywhere:
sensors used to gather climate information, posts to social
media sites, digital pictures and videos, purchase transaction
records, and cell phone GPS signals to name a few. This
data is big data.
http://www-01.ibm.com/software/data/bigdata/
The Digital Universe
• The digital universe
 Doubling every ~18-24 months
 Grew 60% in 2009, 50% in 2010
 2009: 0.8 zettabyte, 2010: 1.2 zettabyte, 2020: 35 zettabytes
 2009-20: growth by a factor of 44
http://www.emc.com/collateral/demos/microsites/idc-digital-universe/iview.htm
1 zettabyte = 1024 exabytes
1 exabyte = 1024 petabytes = 2^60 bytes = 1,152,921,504,606,846,976 bytes ≈ 10^18 bytes
Moore’s Law – The Bicycle Analogy
• Moore’s Law: computers double in speed every 24
months.
 Applies also to quality-adjusted microprocessor prices, memory
capacity, disks, networks, sensors, and the number and size of
pixels in cameras.
• How fast would a bicyclist be if Moore’s Law applied?
 50 years of doubling every 24 months
 30 km/h originally
 ~1 billion km/h now
• Three lessons
 Growth rates in computing are dramatic and difficult to imagine.
 Hardware advances are important information technology drivers.
 Humans don’t really improve – they are the constants.
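The bicycle figure can be checked with a few lines of arithmetic. This is a minimal sketch; the variable names are chosen here for illustration only.

```python
# Bicycle analogy: 50 years of doubling every 24 months, starting at 30 km/h.
years = 50
doubling_period_years = 2
initial_speed_kmh = 30

doublings = years // doubling_period_years           # 25 doublings
speed_now_kmh = initial_speed_kmh * 2 ** doublings

print(doublings)       # 25
print(speed_now_kmh)   # 1006632960, i.e., about 1 billion km/h
```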
Big Data – Synthesis
• The result is new opportunity.
• Lots of data and unprecedented computing infrastructure
combine to offer potentials for value creation from data.
• To be competitive, society and businesses must be able to
create value from data.
• Data-based decisions and data-driven processes
 Decisions based on good data beat decisions based on feelings or
opinions.
• A finer granularity of services
• Entirely new services
Big Data – Data-Driven Society, Business
Christian S. Jensen
csj@cs.aau.dk
Big Data in Transportation:
Overview
Motivation – ITS
• A safer, greener, and more efficient and cost-effective
transportation infrastructure
• Greenhouse gas emissions reductions via eco-routing
• Congestion, greater Copenhagen region
 ~10 billion DKK/year (2004)
• Bad setting of signalized intersections in Denmark
 ~9.3 billion DKK/year (2012)
• The transportation sector is the second largest
greenhouse gas (GHG) emitting sector and also causes
substantial pollution.
 By 2025, it’s predicted that 6.2 billion commutes will be made
every day, worldwide.
• The reduction of greenhouse gas (GHG) emissions from
transportation is essential to combat global climate
change.
 EU: reduce GHG emissions by 30% by 2020.
 G8: a 50% GHG reduction by 2050.
 China: a 17% GHG reduction by 2015.
• Eco-routing can reduce vehicular impact by up to 20%.
• General context: Smart City
Motivation – Eco-Routing
System of Sensors Model
• The setting may be modeled as a system of (logical)
streams, one per edge.
 Data is emitted from the stream of an edge when a vehicle
traverses the edge
 Spatial
 Spatio-temporally correlated
 Sparse
• Real, unlike early, envisioned smart dust applications!
Methodology
• The same methodology underlies the studies covered.
• Define precisely a problem of (perceived) real-world
interest.
• Develop solutions
 Concepts, data structures, algorithms
• Carry out mathematical analyses
 Correctness, complexity, storage size
• Prototype the solutions and perform empirical studies
 Often, real data is needed
 Offers detailed insight into the design properties of the solutions
 If you cannot measure it, you cannot improve it
• Iterate!
Data Foundation
• A road network model
 E.g., OpenStreetMap
 Germany: 10 million edges (rough estimate)
• Aerial laser scan data
 For lifting, optional
• Tracking data from vehicles
 E.g., GPS data
• Fuel consumption data from vehicles
 Often called CAN bus data
GPS Data – DW Statistics
• Number of data sources (and NDAs): ca. 20
 Daily data from 4 sources; data in batches from 4 sources (2 small,
2 big); data from ca. 10 finished projects
• Daily data
 4 sources
 ~3 million rows
 3500 vehicles
• Number of fact table rows: More than 4 billion
• Number of vehicles: ca. 25,000
• Rows with fuel data: 500 million (estimate; ~140,000 per
day)
• Rows from EVs: 100-200 million
Software and Hardware
• Experimental infrastructure
• Software
 Have complete software stack for handling traffic data
 Map-matching, data cleansing, multiple map support
• Hardware
 Very modern server farm
 Newest machine has 2TB main memory
Basic Eco Routing – CPH to the Train
Good 17:08 8.06 km 0.80 l
Bad 18:06 13.64 km 1.23 l
Roadmap to Achieving Eco-Routing
EcoMark 3D Road Networks
1. Understand how to quantify environmental impact
2. Assign Eco-Weights to a Road Network
3. Novel Routing Algorithms
Understand Vehicular Environmental Impact
• EcoMark and EcoMark 2.0
 An evaluation framework that compares and analyzes 11 state-of-
the-art vehicular environmental impact models using GPS data
and a 3D road network.
 What do the models need to quantify vehicular environmental
impact?
 Speeds (instantaneous or average), and accelerations: from GPS data
 Road slopes: from 3D road network
 Model categorization
 Instantaneous models: appropriate for eco-driving
 Aggregated models: appropriate for eco-routing
 It is possible to assign eco-weights to a large extent using GPS
data, 3D maps, and appropriate models.
 An extended evaluation framework, EcoMark 2.0, is developed on
top of EcoMark using actual fuel consumption data that is obtained
from CAN bus data.
Elevation matters – 3D Road Network
• Efficient methods that can build a 3D road network from
3D laser scan data and a 2D road network.
• Use 3D laser scan data to formulate a terrain that is
represented as a Triangulated Irregular Network (TIN).
• Project 2D polylines to the TIN in order to obtain 3D
polylines, and thus result in a 3D road network.
Roadmap to Achieving Eco-Routing
EcoMark 3D Road Networks
Simple vs. Advanced
Eco-Weights
Static vs. Dynamic
Eco-Weights
1. Understand how to quantify environmental impact
2. Assign Eco-Weights to a Road Network
3. Novel Routing Algorithms
Eco-Weights – Simple vs. Advanced
• Simple: time dependent, deterministic.
 Each interval has a deterministic weight.
 Ex: (0:00, 7:00]: 10 mg; (7:00, 9:00]: 18 mg; (9:00, 15:00]: 12 mg; …
• Advanced: time dependent, uncertain.
 Each interval has an uncertain weight, i.e., a random variable that
follows some distribution (e.g., a uniform/normal distribution)
 Ex: (0:00, 7:00]: 8 to 12 mg, N(9, 10); (7:00, 9:00]: 13 to 23 mg,
N(18, 10); (9:00, 15:00]: 8 to 12 mg, N(10, 20); …
Weights  | Representation | Coverage | Temporal granularity | Accuracy
Simple   | Intervals, each with a single value | High (all edges) | Coarse (e.g., pre-defined PEAK) | Good (better than speed limits)
Advanced | Intervals, each with a histogram or a probability density function | Low (only "hot" edges) | Fine (e.g., 15-min intervals) | High level of detail (captures the travel cost distributions)
Eco-Weights – Static vs. Dynamic
• Static Weights
 Obtained based on historical GPS data and remain static
 Capture typical traffic patterns
• Dynamic Weights
 Updated as the real-time GPS data streams in
 Capture dynamic and instant traffic patterns
• Our solutions
Weights  | Static | Dynamic
Simple   | Graph Weight Annotation | Spatio-Temporal Hidden Markov Model
Advanced | Time-Dependent Histograms | Spatio-Temporal Hidden Markov Model
Assigning Eco-Weights
• Static, simple weights, all edges
 Challenge: infer weights for "cold" edges, i.e., the edges that are not covered by any GPS data
 Graph weight annotation
 Quantify edge similarities based on traffic flows that are derived from the
topology of a road network
 Propagate the weights on hot edges to similar cold edges
• Static, advanced weights, hot edges
 Capturing the weights of hot edges using time dependent histograms
 Challenge: compact representation of the histograms
• Dynamic, simple and advanced weights, hot edges
 Inferring near-future weights of hot edges as GPS data streams in
 Challenge: sparseness, dependency, and heterogeneity
Time-Dependent Histograms
• We need to contend with three challenges.
• Multiple travel costs
 Consider not only GHG emissions, but also distance, travel time.
• Time-dependence
 Some travel costs are time-dependent.
 E.g., travel times on the same road may vary during peak vs. off-
peak periods.
• Uncertainty
 Some travel costs are uncertain.
 E.g., aggressive and moderate driving on the same road may
produce different GHG emissions.
Road Network Model: MTUG
• MTUG: Multi-cost, Time-dependent, Uncertain Graph
• N different costs are of interest.
 Distance (DI), travel time (TT), GHG emissions (GE).
• An MTUG is a graph G = (V, E, MM, W)
 V is the vertex set, and E is the edge set.
 MM = <MM(1), …, MM(N)>
 Function MM(i) maps an edge to the minimum and maximum i-th
cost of using the edge.
 MM(TT) (ei) = (150 seconds, 500 seconds)
 MM(GE) (ei) = (10 ml, 85 ml)
 W = <W(1), …, W(N)>
 Function W(i) maps an edge to a set of (interval, random variable)
pairs of the i-th cost type.
 W(TT) (ej) = {([0:00, 7:15), N(300, 120)), ([7:15, 8:45), N(450, 100)), …}
 W(GE) (ej) = {([0:00, 7:00), N(30, 100)), ([7:00, 9:00), N(50, 80)), …}
Instantiation of MM in an MTUG
• GPS records are map matched to edges.
• Each edge is associated with a set of edge records of the
form (e, t, C)
 An edge record indicates that a traversal on edge e at time t takes
costs C, where C is a vector of all costs of interest.
 (e1, 8:08, <55 seconds, 80 ml>)
 (e1, 9:08, < 45 seconds , 63 ml>)
 (e1, 9:10, < 43 seconds , 60 ml>)
 (e1, 10:03, < 45 seconds , 62 ml>)
• MM(TT) (e1) = (43 seconds, 55 seconds)
• MM(GE) (e1) = (60 ml, 80 ml)
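The instantiation of MM above amounts to a per-edge minimum and maximum over each cost type. A minimal sketch using the edge records from the slide; the helper name `instantiate_mm` and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

# Edge records (e, t, C) from the slide; C = (travel time in s, GHG emissions in ml).
edge_records = [
    ("e1", "8:08", (55, 80)),
    ("e1", "9:08", (45, 63)),
    ("e1", "9:10", (43, 60)),
    ("e1", "10:03", (45, 62)),
]

def instantiate_mm(records):
    """Return {edge: [(min, max) per cost type]} (hypothetical helper)."""
    costs_by_edge = defaultdict(list)
    for edge, _time, costs in records:
        costs_by_edge[edge].append(costs)
    mm = {}
    for edge, cost_vectors in costs_by_edge.items():
        # zip(*...) groups the i-th cost of every record together.
        mm[edge] = [(min(col), max(col)) for col in zip(*cost_vectors)]
    return mm

mm = instantiate_mm(edge_records)
print(mm["e1"])  # [(43, 55), (60, 80)]
```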
Instantiation of W in an MTUG
• Partition a day into 96 15-min intervals.
• For each edge and interval, we obtain a multi-set containing
the costs on the edge during the interval.
 ms = {(10 s, 3), (20 s, 10), (25 s, 20), (30 s, 10), (40 s, 5), …}
• Estimate a random variable (RV) based on the multi-set.
• Combine adjacent intervals with similar RVs and estimate a
new RV. The long intervals along with the RVs instantiate W.
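The first steps of instantiating W can be sketched as follows. This is an illustration under stated assumptions: the record layout, the helper names, and the choice of a normal distribution fitted by mean and variance are not taken from the slides, and the merging of adjacent intervals with similar RVs is left out.

```python
from collections import Counter, defaultdict
from statistics import mean, pvariance

def interval_index(hhmm, width_min=15):
    """Map 'H:MM' to the index of its 15-minute interval (0..95)."""
    hours, minutes = hhmm.split(":")
    return (int(hours) * 60 + int(minutes)) // width_min

def build_multisets(records):
    """records: iterable of (edge, 'H:MM', cost). Returns {(edge, interval): Counter}."""
    multisets = defaultdict(Counter)
    for edge, t, cost in records:
        multisets[(edge, interval_index(t))][cost] += 1
    return multisets

def estimate_normal(ms):
    """Fit N(mean, variance) to a multi-set; the normal family is an assumption."""
    values = list(ms.elements())
    return mean(values), pvariance(values)

# Hypothetical travel times (s) on edge e1.
records = [("e1", "8:01", 10), ("e1", "8:05", 20), ("e1", "8:14", 20), ("e1", "8:16", 30)]
ms = build_multisets(records)
print(ms[("e1", 32)])                    # Counter({20: 2, 10: 1})
print(estimate_normal(ms[("e1", 32)]))
```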
Route Costs in MTUG
• Given a route Ri = <r1, r2, …, rX>, where ri ∈ E is an edge.
• RC(Ri, t) indicates the costs of using route Ri at time t
 RC(Ri, t) = <RVDI, RVTT, RVGE> is a vector of RVs, where each RV
corresponds to a travel cost.
• RVDI is deterministic and equals the sum of the length of
each edge in route Ri.
• RVTT is the convolution of the travel time RV of each edge
in route Ri.
 The travel time RV of the first edge r1 is dependent on the trip start
time t.
 The travel time RV of the k-th edge rk is dependent on the travel
time of the previous k-1 edges, which are generally uncertain.
• RVGE is the convolution of the GHG emissions RV of each
edge in route Ri.
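The dependence of each edge's travel time RV on the uncertain arrival time can be illustrated with discrete distributions. The sketch below is not the authors' algorithm; it assumes each edge weight is given as a function from entry time to a small discrete distribution, and all names and numbers are hypothetical.

```python
from collections import defaultdict

def route_travel_time(edge_weights, start):
    """Distribution of the total travel time for a route entered at time `start`.

    edge_weights: one function per edge, mapping an entry time (minutes since
    midnight) to a discrete distribution {travel_time_min: probability}.
    """
    arrival = {start: 1.0}   # distribution of the time at which the next edge is entered
    for weight_at in edge_weights:
        nxt = defaultdict(float)
        for t, p in arrival.items():
            # The RV of this edge depends on the (uncertain) entry time t.
            for tt, q in weight_at(t).items():
                nxt[t + tt] += p * q
        arrival = dict(nxt)
    return {t - start: p for t, p in arrival.items()}

# Hypothetical weights: edge 2 is slower from 8:00 (minute 480) on.
def edge1(t):
    return {5: 0.5, 10: 0.5}

def edge2(t):
    return {4: 1.0} if t < 480 else {8: 1.0}

print(route_travel_time([edge1, edge2], start=472))  # {9: 0.5, 18: 0.5}
```

With a start at 7:52, half of the probability mass reaches the second edge before 8:00 and half after, so the second edge contributes two different travel times.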
Roadmap to Achieving Eco-Routing
EcoMark 3D Road Networks
Simple vs. Advanced
Eco-Weights
Static vs. Dynamic
Eco-Weights
1. Understand how to quantify environmental impact
2. Assign Eco-Weights to a Road Network
3. Novel Routing Algorithms
Skyline
Eco-Routing
Continuous
Eco-Routing
Personalized
Eco-Routing
Novel Routing Algorithms
• Stochastic skyline route planning under time-varying
uncertainty.
 Identify Pareto-optimal routes considering multiple travel costs, e.g., travel times, GHG emissions, distances.
 R1 (15 min, 500 ml, 25 km)
 R2 (13 min, 520 ml, 26 km)
 R3 (16 min, 530 ml, 30 km)
• Continuous routing that supports real-time updates on
eco-weights (i.e., dynamic eco-weights).
 Re-navigate vehicles in real time according to recently updated eco-weights and the current locations of vehicles.
• Context-aware, personalized routing
 Identify context-aware preferences for each driver (e.g., time-
efficient driving or fuel-efficient driving).
 Choose a single route that best satisfies the driver’s
preferences.
Deterministic Skyline Routes
• Route cost: cost(Ri) = <DI, TT, GE>
 A vector of deterministic values.
 Each value corresponds to a travel cost.
• Dominance relationship
 Ri dominates Rj iff all the costs of Ri are no greater than those of Rj, and at least one cost of Ri is smaller than that of Rj.
• Consider multiple routes for the same source-destination.
 R1: 3.5 km, 230 mg, 10 min;
 R2: 5.1 km, 250 mg, 11 min;
 R3: 5.1 km, 200 mg, 12 min;
• The skyline routes are the non-dominated routes.
 Since R2 is dominated by R1, R1 and R3 are the skyline routes.
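The dominance test and the skyline for the three example routes can be reproduced with a short sketch. The quadratic scan below is only an illustration, not the efficient algorithm discussed later; function names are chosen here.

```python
def dominates(a, b):
    """a dominates b: no cost of a is greater, and at least one is smaller."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(routes):
    """Return the names of the non-dominated routes (quadratic scan)."""
    return [
        name for name, cost in routes.items()
        if not any(dominates(other, cost) for o, other in routes.items() if o != name)
    ]

# Costs from the slide: (distance in km, GHG emissions in mg, travel time in min).
routes = {"R1": (3.5, 230, 10), "R2": (5.1, 250, 11), "R3": (5.1, 200, 12)}
print(skyline(routes))  # ['R1', 'R3']
```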
Stochastic Dominance
• Route cost: RC(Ri) = <RVDI, RVTT, RVGE>
 A vector of random variables (RVs), where each RV represents the
distribution of a travel cost.
• Stochastic Dominance between two RVs
 Given two RVs X and Y, if cdfX(a) >= cdfY(a) for all possible values a in R+, we say "X stochastically dominates Y".
 Cost(R1).RVTT stochastically dominates Cost(R2). RVTT.
 No stochastic dominance between Cost(R3). RVTT and Cost(R4).
RVTT.
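For discrete cost distributions, the CDF condition can be checked at the support points only. A minimal sketch under that assumption; the distributions r1, r2, and r3 are hypothetical and do not correspond to the routes in the figures.

```python
def cdf(dist, a):
    """P(X <= a) for a discrete distribution {value: probability}."""
    return sum(p for v, p in dist.items() if v <= a)

def stochastically_dominates(x, y, eps=1e-12):
    """True if cdf_X(a) >= cdf_Y(a) at every point of either support.

    Both CDFs are step functions that change only at support points, so
    checking those points covers all values a.
    """
    points = sorted(set(x) | set(y))
    return all(cdf(x, a) >= cdf(y, a) - eps for a in points)

# Hypothetical travel-time distributions in minutes.
r1 = {10: 0.6, 12: 0.4}
r2 = {11: 0.5, 14: 0.5}
r3 = {9: 0.5, 15: 0.5}
print(stochastically_dominates(r1, r2))  # True
print(stochastically_dominates(r1, r3), stochastically_dominates(r3, r1))  # False False
```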
Stochastic Skyline Routes
• Dominance between two routes Ri and Rj
 If each RV of cost(Ri) stochastically dominates the corresponding
RV of cost(Rj), then Ri dominates Rj.
• Stochastic Skyline Routes
 Given a source-destination pair, and a trip starting time.
 Stochastic skyline routes are the routes that are not dominated by
any other routes.
• Brute force stochastic skyline route planning
 Enumerate all possible routes, compute the route costs, and check
whether one route dominates another.
 Very inefficient and works only for small road networks.
• Efficient stochastic skyline route planning
 Prune early some routes that cannot be skyline routes.
 Efficient stochastic dominance checking.
Example Result
• Skyline routes R1, R2, and R3, identified by our algorithm
• R1: 94,849 m; R2: 106,216 m; R3: 91,382 m;
 DI: R3 dominates R1 and R2.
 TT: R1 dominates R2 and R3
 GE: R2 dominates R1 and R3.
Personalized Routing
• Different drivers may take different routes
because they may have quite different
preferences.
• The same drivers may take different
routes in different contexts.
 Morning: try to save time to avoid being late.
 Weekend afternoon: try to save fuel
consumption.
• Challenges
 Identify contexts for drivers and identify driving
preference in each context.
 Deal with time-dependent uncertain travel
costs, e.g., travel time and fuel consumption,
while considering individual drivers’ driving
behaviors, e.g., aggressive vs. moderate
driving.
Framework
• Offline phase: drivers' trajectories → context and preference identification → contexts & preferences
• Online phase: (driver, source, destination, time) and contexts & preferences → routing module → optimal routes
Example Results
 Dark, bold routes: actual routes used by drivers.
 Red routes: shortest routes.
 Green routes: fastest routes.
 Blue routes: predicted routes using the identified contexts and
driving preferences.
Summary
• Eco-routing can effectively reduce the environmental impact of the transportation sector.
• Eco-routing is achieved by
 Reliably and accurately quantifying vehicles’ environmental impact
using GPS data and 3D road networks;
 Assigning various types of eco-weights to edges;
 Developing novel routing algorithms that enable effective and practical eco-routing services.
Building Accurate 3D
Spatial Networks
Outline
• Motivation
• Solution
 Phase A: Filtering
 Phase B: Lifting
• Experiments
• Summary
High-Res 3D Model Usage
Eco-Routing
• USD 6 billion per annum in fuel savings with a 3D model (TomTom)
• EcoMark: benchmark shows better CO2 emissions estimation.
The Idea
2D Road Network Aerial Laser Scan
3D Road Network
Classifying 3D Models
• Need for accurate 3D spatial networks by
applications.
• Ease of availability of rich terrain datasets.
3D Road Model
• High-res model: ± 20 cm accuracy, higher storage cost
• Low-res model: ± 2 m accuracy, lower storage cost
Current 3D Map Generation Methods
• Van fitted with Differential GPS driven on
roads
Current 3D Map Generation Methods
• Crowd-sourcing from "Personal Navigation Devices" (PNDs).
Solution
GOAL: Design two methods to
generate High and Low resolution
3D maps.
Problem Definition
 Input
 A 2D spatial network, where an edge is modeled as a 2D polyline
 A laser scan point cloud containing a set of 3D points
 Output
 A 3D spatial network, where an edge is modeled as a 3D polyline
[Figure: 2D spatial network + 3D points → 3D spatial network; 3D points in the ε-neighborhood]
Framework
• Input: 2D spatial network, laser scan point cloud
• Filtering: one-pass filtering or external-memory-based filtering
• Lifting: Delaunay triangulation, interpolation
• Output: 3D spatial network
A: Filtering
One-Pass Filter (OPF)
• 3D laser scan points are "inherently sorted spatially"!
• Data guarantee: a 1 m × 1 m cell contains two 3D points.
• Assumption: the road network is much smaller and fits in main memory.
 Build a grid index Hg from the road segments, mapping road points to cell IDs.
 Scan the massive 3D point cloud in a single pass and build a 3D-point-to-cell mapping Hp "seeded" by Hg.
 If the 3D points in a cell exceed a threshold cell occupancy of α × δ × δ, where α is the user-defined threshold in [0, 1] and δ is the grid cell width, lift the cell contents and release the cell memory!
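The one-pass idea can be sketched with two hash maps. The code below is a simplified illustration, not the authors' implementation: lifting a cell is reduced to averaging the elevations of its points (an assumption), and the function and parameter names are chosen here.

```python
from collections import defaultdict

def one_pass_filter(road_points, point_stream, delta=1.0, alpha=0.5):
    """Yield (cell, elevation) as soon as a road cell has collected enough 3D points.

    road_points: iterable of (x, y); point_stream: iterable of (x, y, z).
    Lifting a cell is simplified to averaging its elevations (an assumption).
    """
    def cell_of(x, y):
        return (int(x // delta), int(y // delta))

    hg = {cell_of(x, y) for x, y in road_points}   # grid index of road cells
    hp = defaultdict(list)                         # elevations per road cell
    threshold = alpha * delta * delta              # occupancy threshold from the slide
    done = set()
    for x, y, z in point_stream:                   # single pass over the point cloud
        c = cell_of(x, y)
        if c not in hg or c in done:
            continue                               # not on a road cell, or already lifted
        hp[c].append(z)
        if len(hp[c]) > threshold:
            yield c, sum(hp[c]) / len(hp[c])       # lift the cell ...
            del hp[c]                              # ... and release its memory
            done.add(c)

road = [(0.5, 0.5), (2.5, 0.5)]
cloud = [(0.2, 0.1, 10.0), (5.0, 5.0, 99.0), (1.0, 1.5, 12.0), (3.0, 0.2, 20.0)]
print(list(one_pass_filter(road, cloud, delta=2.0, alpha=0.25)))  # [((0, 0), 11.0)]
```

Cells that never exceed the threshold stay unlifted in this sketch; a full implementation would need to decide how to handle them at the end of the scan.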
One-Pass Filter (OPF)
Advantages of OPF
• Needs no pre-processing of the massive 3D point cloud for sorting or indexing.
• Builds parts of the 3D road network as 3D points are streaming through.
• Faster runtimes, lower resolution (± 2 m accuracy).
External Memory based Filtering (EMF)
• No assumption about road network size.
Neither road points nor 3D points fit in
memory. Hence, stored on disk.
• Limited memory budget available to the
method.
 Moore Neighborhood: 3,5,8-cell
neighborhood. Ex: grey shaded
region around cell (1,1).
External Memory based Filtering (EMF)
Locality Preserving Space-Filling Curves
External Memory based Filtering (EMF)
 Read road point disk blocks Br from
disk into main memory.
 For each block Br - read 3D points
Moore neighborhood blocks from
disk into a limited LRU cache
(size given in multiples of the grid
width c).
 Continue this process in a space-
filling curve (Z-order or Row-Major)
fashion to avoid re-reading 3D point
blocks from disk.
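The interaction between processing order and the LRU cache can be explored with a small simulation. This is a sketch under simplifying assumptions: one block per grid cell, a cache size counted in blocks, and an 8 × 8 grid; none of these values come from the slides.

```python
from collections import OrderedDict

def z_order_key(x, y, bits=16):
    """Interleave the bits of x and y (Morton code)."""
    key = 0
    for i in range(bits):
        key |= (((x >> i) & 1) << (2 * i)) | (((y >> i) & 1) << (2 * i + 1))
    return key

def moore_neighborhood(x, y, n):
    """The cell and its up to 8 neighbors inside an n x n grid."""
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if 0 <= x + dx < n and 0 <= y + dy < n]

def count_disk_reads(order, n, cache_size):
    """Process road blocks in `order`; count 3D-point block reads with an LRU cache."""
    cache, reads = OrderedDict(), 0
    for x, y in order:
        for block in moore_neighborhood(x, y, n):
            if block in cache:
                cache.move_to_end(block)       # cache hit: mark as most recently used
            else:
                reads += 1                     # cache miss: read the block from disk
                cache[block] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict the least recently used block
    return reads

n = 8
row_major = [(x, y) for y in range(n) for x in range(n)]
z_order = sorted(row_major, key=lambda c: z_order_key(*c))
for size in (n, 3 * n):
    print(size, count_disk_reads(row_major, n, size), count_disk_reads(z_order, n, size))
```

In this simulation, row-major order with a cache of three grid rows reads each of the 64 blocks exactly once, because the three rows needed for the current row always fit in the cache.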
Z-Order vs. Row-Major Disk Access
Z-Order vs. Row-Major Disk Access
 Why is Row-Major doing well at 3c?
LRU Cache
Block to process
Neighbor Block needed in memory
Blocks in cache
Z-Order vs. Row-Major Disk Access
 Why is Row-Major doing well at 3c?
LRU Cache (Filled up)
Block to process
Neighbor Block needed in memory
Blocks in cache
Z-Order vs. Row-Major Disk Access
 Why is Row-Major doing well at 3c?
 Answer: Disk Blocks are read into memory exactly once !
LRU Cache (Filled up)
Block to process
Neighbor Block needed in memory
Blocks in cache
Blocks evicted
External Memory based Filtering (EMF)
Advantages of EMF
• Requires little main memory
• Slower runtimes, higher resolution (± 20 cm accuracy)
• Can be used to "zoom in" on OPF-created maps to improve resolution on mobile devices
B: Lifting
Triangulation
• Delaunay Triangulation
 After filtering, a set of 3D points is associated with a cell of road
segments.
 The set of 3D points is triangulated.
3D Points
2D Road Points
OPF vs. EMF
Interpolation
 Project the TIN triangles
to the x-y plane.
 Compute the
intersections of triangles
and road 2D polyline.
 Re-project road points +
intersections back to TIN
surface again.
 Interpolate road points
on TIN to get elevations.
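The last step, interpolating an elevation for a road point on a TIN triangle, can be sketched with barycentric coordinates. This is one standard way to interpolate linearly on a triangle and is assumed here; the slides do not state which formula is used, and the function name is chosen for illustration.

```python
def interpolate_elevation(p, a, b, c):
    """Elevation at 2D point p inside the TIN triangle with 3D corners a, b, c.

    Uses barycentric coordinates in the x-y plane; returns None if p lies
    outside the triangle or the triangle is degenerate.
    """
    (px, py), (ax, ay, az), (bx, by, bz), (cx, cy, cz) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        return None                       # degenerate triangle
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    wc = 1.0 - wa - wb
    if min(wa, wb, wc) < 0:
        return None                       # p is outside the triangle
    return wa * az + wb * bz + wc * cz

# Hypothetical triangle on the plane z = 10 + x + 2y.
tri = ((0.0, 0.0, 10.0), (4.0, 0.0, 14.0), (0.0, 4.0, 18.0))
print(interpolate_elevation((1.0, 1.0), *tri))  # 13.0
```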
Experimental Study
Dataset Statistics
• Aalborg (AA): 7 km × 5 km
• North Jutland (NJ): 185 km × 130 km
• NJ is approx. 120 times larger than AA.
Experimental Study (Accuracy)
OPF vs. EMF
Experimental Study (Scalability)
• North Jutland (NJ) covering a region of 185km × 130km
• EMF
 Accuracy ±20 cm in 21 hrs with memory use of 230 MB
• OPF
 Accuracy ±2 m in 2 hrs, which is an order of magnitude
faster than EMF, using 2.4 GB of main memory
Summary
• Augment a 2D road network with elevation values from 3D
point cloud
• Propose two methods that work with limited memory budgets
• Extensive experiments to offer insight into both methods
** 3D Spatial Network available from
UCI Machine Learning Repository
(http://archive.ics.uci.edu/ml/)
EcoMark:
Evaluating Models of Vehicular
Environmental Impact
Outline
• Motivation
• EcoMark overview
• Model analysis
• Experiments
• Context
• Summary
Motivation
• The reduction of greenhouse gas (GHG) emissions from
transportation is essential to combat global climate
change.
 EU: reduce GHG emissions by 30% by 2020.
 G8: a 50% GHG reduction by 2050.
 China: a 17% GHG reduction by 2015.
• Eco-driving and eco-routing can reduce GHG emissions.
 Eco-driving: driving behavior
 Eco-routing: routes that minimize fuel usage
• We need to be able to measure vehicular environmental
impact.
 Several impact models have been developed, but a
comprehensive comparison of these is missing.
 The utility of the models for eco-driving and eco-routing is not well
understood.
EcoMark
• EcoMark aims to enable a better understanding of how to
compute the environmental impact of vehicles.
• EcoMark is a benchmark that compares and analyzes 11
state-of-the-art vehicular environmental impact models.
 No such comprehensive benchmark exists.
• Key characteristics of EcoMark
 Uses GPS data from vehicles and a 3D spatial network
 Categorizes the 11 models into two major categories
 Compares and analyzes the models in the same categories, and
the models between the two categories
EcoMark Framework
• Input: GPS observations, 2D spatial network, laser scan points
• Map matching: GPS observations → GPS trajectories
• Spatial network lifting: 2D spatial network and laser scan points → 3D spatial network
• EcoMark: 6 instantaneous models → instantaneous impact; 5 aggregated models → aggregated impact
• Comparison and analysis → insights
3D Spatial Network
• Spatial network lifting
 2D spatial network: OpenStreetMap
 Aerial laser scan of Denmark (1+ point per m²; 2.5 TB for Denmark)
• A 3D spatial network is a directed, weighted graph
denoted as G = (V, E, F, H), where
 V and E are the vertex and edge sets.
 F maps vertices to 3D coordinates.
 H maps edges to grades.
Benchmark Experiments
• Exp1: Eco driving: Comparisons of instantaneous models
 Compare the behaviors of the 6 models.
 Identify models that behave similarly.
• Exp2: Eco routing: Comparisons of aggregated models
 Compare the behaviors of the 5 models.
 Identify models that behave similarly.
• Exp3: Cross-comparisons of instantaneous and
aggregated models
 Identify the similarity of the aggregation of instantaneous models and aggregated models.
• Exp4: Study of models that support grade information
 How important is 3D – effects of using a 3D spatial network.
Insights
• Instantaneous models
 Five of six models behaved consistently.
 They can be used to measure the impact of driving behaviors.
 They can be used to classify good and bad driving behaviors.
• Aggregated models
 All the five models behaved consistently.
 Not suitable to identify impact at operational levels.
 They can be used to assign eco-weights to road segments, and
are helpful for eco-routing.
• Importance of 3D Spatial Networks
 Road grades affect all models substantially.
 Eco-driving and eco-routing both benefit.
Summary
• EcoMark
 Evaluates empirically 11 existing models of vehicular
environmental impact using a large trajectory database and a
lifted, 3D OpenStreetMap
 Findings: Instantaneous models support eco driving, while
aggregated models support eco-routing.
• Next steps
 Accuracy comparison with ground truth CAN bus data (EcoMark
2.0)
 Requires access to corresponding GPS and CAN bus data.
 Eco driving and eco routing
Time-Dependent Uncertain Eco-
Weights for Road Networks
Outline
• Motivation
• Problem definition
• Framework
• ERN building
• GHG emissions estimation
• Empirical study
• Summary
Motivation – Advanced Eco-Weights
• Eco-routing is an effective way to reduce vehicular
environmental impact.
• The capture of the environmental costs of traversing
road network edges is key to eco-routing.
 Eco-weights are uncertain.
 Eco-weights are time-dependent.
 Once obtained, existing routing algorithms can be
applied to compute eco-routes.
Problem Definition
• Goal: Obtain an Eco Road Network G = (V, E, F)
 V: Vertex set. Each vertex indicates a road intersection.
 E: Edge set. Each edge indicates a road segment.
 Function F assigns a time-dependent, uncertain eco-weight to
each edge in E.
• Input
 A set of map-matched trajectories TR.
 An accompanying road network G' = (V, E, Null).
• Output
 The Eco Road Network G = (V, E, F).
Approach
• Time-dependent uncertain histograms.
 A vector of (period, histogram) tuples <Ti, Hi>.
 Hi is the histogram describing the distribution of cost values
observed in period Ti.
• Time-dependent uncertain histograms are used to
represent the eco-weight of each road network edge.
• Several data compression techniques are applied to
reduce the storage space while retaining accuracy.
[Figure: example histograms (probability from 0 to 1) for the periods [8 a.m., 9 a.m.), [9 a.m., 10 a.m.), and [10 a.m., 11 a.m.)]
Framework
• Input: trajectories TR and road network G = (V, E, NULL)
• Traversal record analysis → initial histogram construction → histogram merging → bucket reduction
• Output: Eco Road Network G = (V, E, F)
Traversal Record Analysis
• GPS records are map-matched to the corresponding
edges.
• Map-matched records are transformed into traversal
records.
 A traversal record r = (e, t, tt, ge) indicates that edge e is traversed
by a trajectory trj starting at time t and has travel time tt and GHG
emissions ge.
 The VT-micro environmental impact model is used to estimate the
GHG emissions of each traversal record.
Initial Histogram Building
• Each edge is associated with a set of traversal records
• Divide the time space into intervals with equal width
 The default value is 1 hour, (24 intervals in total).
• For each edge e
 Build equi-width histograms for each time interval.
 The number of buckets per time interval is configurable.
 The histograms are isomorphic.
[Figure: equi-width histograms of GHG emissions (mL) for the periods [8 a.m., 9 a.m.) and [9 a.m., 10 a.m.), with buckets [0, 20), [20, 40), [40, 60), [60, 80] and probability on the y-axis]
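Initial histogram building can be sketched as counting traversal records per (edge, interval, bucket). The record layout (edge, hour of day, GHG emissions in mL), the bucket width of 20 mL, and the four buckets mirror the figure but are otherwise assumptions for illustration.

```python
from collections import defaultdict

def build_histograms(traversals, interval_h=1, bucket_width=20, num_buckets=4):
    """traversals: iterable of (edge, hour_of_day, ge_ml).

    Returns {(edge, interval): [probability per bucket]}; values at or above
    the last bucket boundary are placed in the last bucket.
    """
    counts = defaultdict(lambda: [0] * num_buckets)
    for edge, hour, ge in traversals:
        interval = int(hour // interval_h)
        bucket = min(int(ge // bucket_width), num_buckets - 1)
        counts[(edge, interval)][bucket] += 1
    return {k: [c / sum(v) for c in v] for k, v in counts.items()}

# Hypothetical traversal records.
traversals = [("e1", 8.2, 15), ("e1", 8.5, 25), ("e1", 8.9, 35),
              ("e1", 8.1, 50), ("e1", 9.1, 80)]
hists = build_histograms(traversals)
print(hists[("e1", 8)])  # [0.25, 0.5, 0.25, 0.0]
print(hists[("e1", 9)])  # [0.0, 0.0, 0.0, 1.0]
```

Because every histogram uses the same bucket boundaries, the histograms of different intervals can be compared bucket by bucket.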
Histogram Merging
• For each edge, merge two temporally adjacent histograms
if they are sufficiently similar.
• Use cosine similarity to quantify similarity.
• We use a merge threshold Tmerge to decide when to stop
merging.
$\mathrm{sim}(H_i, H_j) = \dfrac{V(H_i) \cdot V(H_j)}{\lVert V(H_i) \rVert \, \lVert V(H_j) \rVert}$
[Figure: similar histograms for [8 a.m., 9 a.m.) and [9 a.m., 10 a.m.) over the buckets [0,20), [20,40), [40,60), [60,80] are merged into one histogram for [8 a.m., 10 a.m.); axes: GHG emissions (mL) vs. probability]
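Merging temporally adjacent histograms by cosine similarity can be sketched as below. The greedy left-to-right order and the weighting of merged probabilities by record counts are assumptions made for this illustration; the slides only fix the similarity measure and the threshold Tmerge.

```python
import math

def cosine_similarity(h1, h2):
    """Cosine similarity of two bucket-probability vectors of equal length."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm = math.sqrt(sum(a * a for a in h1)) * math.sqrt(sum(b * b for b in h2))
    return dot / norm if norm else 0.0

def merge_adjacent(histograms, t_merge=0.95):
    """histograms: list of ((start, end), probs, n), sorted by time, same buckets.

    n is the number of traversal records behind a histogram; it weights the
    merged probabilities (an assumption).
    """
    merged = [histograms[0]]
    for (s, e), probs, n in histograms[1:]:
        (ps, pe), pprobs, pn = merged[-1]
        if pe == s and cosine_similarity(pprobs, probs) >= t_merge:
            combined = [(a * pn + b * n) / (pn + n) for a, b in zip(pprobs, probs)]
            merged[-1] = ((ps, e), combined, pn + n)
        else:
            merged.append(((s, e), probs, n))
    return merged

# Hypothetical hourly histograms for one edge.
hs = [((8, 9), [0.1, 0.5, 0.3, 0.1], 100),
      ((9, 10), [0.1, 0.5, 0.3, 0.1], 100),
      ((10, 11), [0.7, 0.1, 0.1, 0.1], 50)]
result = merge_adjacent(hs)
print([period for period, _probs, _n in result])  # [(8, 10), (10, 11)]
```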
Histogram Bucket Reduction
• Further reduce the storage size of an individual histogram
by merging adjacent buckets.
 Use SSE to measure the merge cost (accuracy loss).
 Merge buckets when the cost does not exceed threshold Tred.
 Iteratively merge adjacent buckets in all the histograms of a road
segment.
[Figure: the histogram for [8 a.m., 10 a.m.) before bucket reduction, with buckets [0,20), [20,40), [40,60), [60,80], and after, with buckets [0,40), [40,60), [60,80]; axes: GHG emissions (mL) vs. probability]
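Bucket reduction for a single histogram can be sketched as repeatedly merging the cheapest adjacent pair. The SSE definition used here (deviation of each bucket's probability from a uniform spread over the merged bucket) is an assumption, and the sketch handles one histogram rather than all histograms of a road segment at once.

```python
def merge_cost(b1, b2):
    """SSE when two adjacent buckets (lo, hi, p) are replaced by one uniform bucket."""
    (lo1, hi1, p1), (lo2, hi2, p2) = b1, b2
    w1, w2 = hi1 - lo1, hi2 - lo2
    density = (p1 + p2) / (w1 + w2)
    return (p1 - density * w1) ** 2 + (p2 - density * w2) ** 2

def reduce_buckets(buckets, t_red):
    """Merge the cheapest adjacent pair until the cost would exceed t_red."""
    buckets = list(buckets)
    while len(buckets) > 1:
        costs = [merge_cost(buckets[i], buckets[i + 1]) for i in range(len(buckets) - 1)]
        i = min(range(len(costs)), key=costs.__getitem__)
        if costs[i] > t_red:
            break
        (lo, _, p1), (_, hi, p2) = buckets[i], buckets[i + 1]
        buckets[i:i + 2] = [(lo, hi, p1 + p2)]
    return buckets

# Hypothetical histogram with the bucket boundaries from the figure.
h = [(0, 20, 0.3), (20, 40, 0.3), (40, 60, 0.1), (60, 80, 0.3)]
print(reduce_buckets(h, t_red=0.001))  # [(0, 40, 0.6), (40, 60, 0.1), (60, 80, 0.3)]
```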
GHG Emissions Estimation
• For a route
 Estimate the distribution of GHG emissions as a histogram.
 Aggregate the histograms of the edges in the route.
• Given two histograms H1 and H2 for adjacent edges
 A histogram H′ is computed that represents the aggregated GHG
emissions distribution for traversing both edges.
$H' = H_1 \oplus H_2$
[Figure: histograms for edges e1 and e2 with buckets [0,20), [20,40), [40,60), and the aggregated histogram for e1 + e2 with buckets [0,40), [40,80), [80,120); axes: GHG emissions (mL) vs. probability]
Histogram Aggregation
• Given buckets bj in H1 and bk in H2
• A bucket bi in H′ is generated as
$H'[i].p = H_1[j].p \times H_2[k].p$
$\mathrm{Range}(H'[i]) = \mathrm{Range}(H_1[j]) + \mathrm{Range}(H_2[k])$
• Enumerate all bucket pairs of H1[j] and H2[k].
• Rearrange the buckets in H′ to obtain an equi-width histogram.
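The aggregation of two histograms can be sketched as follows. Multiplying probabilities assumes that the two edge costs are independent, and spreading each pair's mass uniformly over its range when rearranging into equi-width buckets is an assumption of this sketch, not a detail given on the slide.

```python
from collections import defaultdict

def aggregate(h1, h2, width):
    """Aggregate two histograms given as lists of ((lo, hi), prob).

    Every bucket pair yields a bucket with summed range and multiplied
    probability; the result is rearranged into equi-width buckets by
    spreading each pair's mass uniformly over its range.
    """
    out = defaultdict(float)
    for (lo1, hi1), p1 in h1:
        for (lo2, hi2), p2 in h2:
            lo, hi, p = lo1 + lo2, hi1 + hi2, p1 * p2
            start = lo
            while start < hi:
                # End of the target bucket that contains `start`.
                end = min(hi, (start // width + 1) * width)
                out[(start // width) * width] += p * (end - start) / (hi - lo)
                start = end
    return [((b, b + width), out[b]) for b in sorted(out)]

# Hypothetical histograms for two adjacent edges.
h1 = [((0, 20), 0.5), ((20, 40), 0.5)]
h2 = [((0, 20), 0.5), ((20, 40), 0.5)]
print(aggregate(h1, h2, width=40))  # [((0, 40), 0.5), ((40, 80), 0.5)]
```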
Experiments
• Experimental settings
 Data Set: 200 million GPS records.
 Road network of Denmark: 414K vertices and 818K edges.
 GHG emissions: VT-micro is applied to evaluate vehicular
environmental impact.
• Evaluation
 Baseline: without any compression.
 State-of-the-art: T-drive.
 Accuracy, storage, and efficiency.
Storage Study
• Histogram merging: $\mathrm{MCR}_m = \dfrac{M_{init} - M_{merge}}{M_{init}}$
[Figure: MCR vs. merge threshold (0.9 to 0.98)]
• Bucket reduction: $\mathrm{MCR}_r = \dfrac{M_{merge} - M_{redu}}{M_{merge}}$
[Figure: MCR vs. bucket limit (35 to 60) for Tmerge = 0.98 and Tmerge = 0.95]
Accuracy Study
• Histogram aggregation
• MCR vs. Err
• Histogram merging
• Bucket reduction
• Error metric (Err)
 The relative accumulative deviation of all the buckets in a histogram from the original distribution.
[Figures: HSimilarity vs. number of edges; Err vs. bucket limit (Tmerge = 0.98 and Tmerge = 0.95); MCR vs. Err (ERN and T-drive); Err vs. merge threshold (after merging vs. equi-width)]
Summary
• We propose a technique to estimate the time-dependent
environmental travel costs in a road network based on
histograms.
• An empirical study suggests that we can efficiently
compute eco-weights of road segments while retaining
accuracy.
• The proposed eco-weights can be used for eco routing
and real-time traffic monitoring.
Using Incomplete Information for
Complete Weight Annotation
of Road Networks
Overview
• Motivation
• Problem definition
• Cost modeling
• Objective functions
 Residual Sum of Squares
 Topology constraint
 Adjacency constraint
• Experiments
Motivation
• Why weight annotation?
 Vehicle routing relies on a weighted-graph representation of the
underlying road network.
 Given a graph with appropriate weights, different types of routes
can be computed.
• Why incomplete information?
 Weights that capture travel costs such as travel times can be
extracted from GPS trajectory data.
 GPS trajectory data typically does not cover all edges.
• Why complete weight annotation?
 To use a graph for routing, all edges must have weights.
Problem Definition
• Input: GPS records and road network G''(V, E, null)
• Pre-processing module: GPS records → a set of (trip, cost) pairs {(t(i), c(i))}
• Weight annotation module: (trip, cost) pairs and G''(V, E, null) → G(V, E, H)
Temporal Road Network Cost Modeling
 A road network is modeled as a directed graph.
 Temporal tags: during the same temporal tag, the traffic has similar
behavior, e.g., PEAK and OFFPEAK.
 Each edge has 2 cost values, which indicate the cost per unit
length for traversing the corresponding road segment during PEAK
and OFFPEAK, respectively.
 d(AB, PEAK) indicates the cost per unit length for traversing the road
segment AB during PEAK tag.
 d: a vector of size |E| * |TAGS| (e.g., 6 * 2 =12)
[Figure: example road network with vertices A, B, C, D, E and directed edges AB, BA, BC, CB, BD, CE; each edge carries the two cost variables d(edge, PEAK) and d(edge, OFFPEAK)]
Trip Cost Modeling
• GPS trajectory
 A sequence of records of the form (x,y,t).
• Trip
 A sequence of records of the form (edge, ts, te).
• The cost of a trip is the sum of the cost of each record.
 E.g., t1= <AB, 7:55, 8:00>, <BC, 8:00, 8:10>
 cost_model(t1) = d(AB, PEAK) * length(AB) + d(BC, OFFPEAK) *
length(BC)
• If we know all cost variables in d, we can compute the
estimated cost of any trip.
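The trip cost model above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the PEAK window, the cost values in `d`, and the edge lengths are hypothetical and chosen only so that the example trip t1 maps AB to PEAK and BC to OFFPEAK.

```python
from datetime import time

# Hypothetical tag boundary: the slides only name PEAK and OFFPEAK,
# so this window is an assumption that matches the example trip t1.
PEAK_WINDOWS = [(time(7, 0), time(8, 0))]

def tag_of(ts):
    """Return the temporal tag of a traversal that starts at time ts."""
    for start, end in PEAK_WINDOWS:
        if start <= ts < end:
            return "PEAK"
    return "OFFPEAK"

def cost_model(trip, d, length):
    """Sum cost-per-unit-length times edge length over the trip's records."""
    return sum(d[(edge, tag_of(ts))] * length[edge] for edge, ts, _te in trip)

# Illustrative numbers only; they are not taken from the slides.
d = {("AB", "PEAK"): 2.0, ("BC", "OFFPEAK"): 1.5}
length = {"AB": 10.0, "BC": 20.0}
t1 = [("AB", time(7, 55), time(8, 0)), ("BC", time(8, 0), time(8, 10))]
estimated = cost_model(t1, d, length)
```

The tag is taken from the entry time of each record, which is one possible convention; a record that spans a tag boundary would need a rule the slides do not state.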
Objective Functions: Residual Sum of Squares
• For every trip t in T’
 The squared difference between the actual cost of t and the
estimated cost under the proposed model, i.e., cost_model(t).
 E.g., t1= <AB, 7:55, 8:00>, <BC, 8:00, 8:10>, actual_cost(t1) =
100.
 $RSS(t_1) = (100 - cost\_model(t_1))^2$
 Problem: Training data T’ may not cover the whole road network.
By minimizing the RSS function, we can only get cost variables for
the road segments traversed by T’.
[Figure: the example road network (vertices A to E) with the PEAK and OFFPEAK cost variables of each edge]
Objective Function: Topological Constraint
• If two road segments have similar topological positions in a road
network, they tend to have similar costs during the same
tag.
• Given a pair of road segments ei and ej
 The weighted square loss between the cost variables of ei and ej.
 $weight\_TC(e_i, e_j) \cdot (d(e_i, PEAK) - d(e_j, PEAK))^2$
 If weight_TC(ei, ej) is large, d(ei, tag) and d(ej, tag) tend to be
similar in order to keep $(d(e_i, PEAK) - d(e_j, PEAK))^2$
small.
[Figure: the example road network (vertices A to E) with the PEAK and OFFPEAK cost variables of edges AB, BA, BC, CB, BD, and CE]
Weighted PageRank Computation
• PageRank gives a value to each vertex in a graph.
• Primal graph -> dual graph
 Vertex -> Edge
 Edge -> Vertex
[Figure: primal graph with vertices A to E and the corresponding dual graph with vertices AB, BA, BC, CB, BD, and CE]
Weighted PageRank Computation
• Original PageRank
 A vertex evenly gives its PR value to all its adjacent vertices.
 Vertex AB gives ½ * PR(AB) to BC and BD, respectively.
• Weighted PageRank
 $weight(AB, BD) = \frac{TripNum(AB, BD) + 1}{TripNum(AB, BD) + TripNum(AB, BC) + |degree(AB)|}$
 100 trips from AB to BC, 50 trips from AB to BD
 $weight(AB, BD) = \frac{50 + 1}{50 + 100 + 2} = \frac{51}{152}$
 $weight(AB, BC) = \frac{100 + 1}{50 + 100 + 2} = \frac{101}{152}$
 No trip information
 $weight(AB, BD) = weight(AB, BC) = \frac{1}{2}$
[Figure: dual graph with vertices AB, BA, BC, CB, BD, and CE]
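The smoothed transition weights can be reproduced with a short Python sketch. The function name `transition_weights` and its arguments are illustrative, not from the slides; the formula is the add-one smoothing shown above, with the out-degree of the dual-graph vertex in the denominator.

```python
from fractions import Fraction

def transition_weights(trip_counts, successors):
    """Smoothed share of trips from one dual-graph vertex to each successor.

    trip_counts maps a successor to the number of observed trips that
    continue onto it; successors lists all adjacent dual-graph vertices.
    """
    total = sum(trip_counts.get(s, 0) for s in successors) + len(successors)
    return {s: Fraction(trip_counts.get(s, 0) + 1, total) for s in successors}

# The example from the slide: 100 trips AB -> BC, 50 trips AB -> BD.
w = transition_weights({"BC": 100, "BD": 50}, ["BC", "BD"])
# Without any trip information the weights fall back to the original PageRank split.
no_data = transition_weights({}, ["BC", "BD"])
```

Exact fractions are used here only to make the correspondence with 51/152 and 101/152 easy to check.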
Weighted PageRank based Topological Constraint
• Use weighted PageRank value to quantify the topological
similarity.
 If two edges have similar weighted PageRank values, they also
tend to have similar cost.
 $weight\_TC(e_i, e_j) = \frac{\min(PageRank(e_i), PageRank(e_j))}{\max(PageRank(e_i), PageRank(e_j))}$
Objective Function: Adjacency Constraint
 If two road segments are directionally adjacent in a road network,
they tend to have similar costs during the same tag.
 d(AB, PEAK) may be similar to d(BC, PEAK)
 But it is not necessarily similar to d(CB, PEAK) or d(BC, OFFPEAK)
 Given a pair of directionally adjacent road segments ei and ej
 The weighted square loss between the cost variables of ei and ej.
 $weight(e_i, e_j) \cdot (d(e_i, PEAK) - d(e_j, PEAK))^2$
[Figure: the example road network (vertices A to E) with the PEAK and OFFPEAK cost variables of edges AB, BA, BC, CB, BD, and CE]
Objective Functions
• Residual Sum of Squares
 $RSS(d) = \sum_{t \in T'} (actual\_cost(t) - cost\_model(t))^2$
• Topological Constraint
 $PRTC(d) = \sum_{i,j} weight\_TC(e_i, e_j) \cdot (d(e_i, tag) - d(e_j, tag))^2$
• Adjacency Constraint
 $ACTC(d) = \sum_{i,j} weight(e_i, e_j) \cdot (d(e_i, tag) - d(e_j, tag))^2$
• Overall objective function
 The sum of the above three (plus a regularizer term)
• Differentiating the objective function w.r.t. vector d and
setting it to 0 yields an equation that can be solved using
standard techniques to obtain a solution for d.
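A hedged sketch of that last step, in matrix form. The notation is introduced here and is not from the slides: $A$ is the trip-by-variable matrix whose entry $A_{t,k}$ is the length that trip $t$ travels under cost variable $k$, $c$ is the vector of actual trip costs, and $L_{TC}$ and $L_{AC}$ are symmetric matrices chosen so that $d^\top L_{TC}\, d = PRTC(d)$ and $d^\top L_{AC}\, d = ACTC(d)$. The slides do not specify the regularizer; a ridge term $\gamma \lVert d \rVert^2$ is assumed purely for illustration.

```latex
\begin{aligned}
F(d) &= \lVert c - A d \rVert^2 + d^\top L_{TC}\, d + d^\top L_{AC}\, d + \gamma \lVert d \rVert^2 \\
\nabla F(d) &= -2 A^\top (c - A d) + 2 L_{TC}\, d + 2 L_{AC}\, d + 2 \gamma d = 0 \\
\left( A^\top A + L_{TC} + L_{AC} + \gamma I \right) d &= A^\top c
\end{aligned}
```

Under these assumptions the result is a sparse linear system in d. The added terms are what make it solvable for edges that no training trip traverses, since $A^\top A$ alone has zero rows and columns for those variables.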
Experimental Results
• Two weeks of GPS trajectories from SPF
• 18K trips in total; 20% used for training and 80% for testing
• Two road networks: Skagen and North Jutland
• Objective functions used
 F1: RSS; F2: RSS + PRTC; F3: RSS + ACTC
 F4: RSS + PRTC + ACTC
Travel Cost Inference from Sparse,
Spatio-Temporally Correlated Time
Series Using Markov Models
Outline
• Motivation
• Travel-cost time series
• Modeling travel-cost time series in a road network
• Learning a spatio-temporal hidden Markov model
• Empirical study
• Summary
Motivation
• Vehicle routing plays an important role in transportation.
 E.g., fastest routes, eco-routes.
• Effective vehicle routing relies on a weighted graph
representation of a road network.
 The weight of an edge should accurately capture the travel cost of
traversing the edge.
 E.g., travel times, greenhouse gas (GHG) emissions.
• Edge weights should change dynamically to reflect the
dynamics of traffic.
To support dynamic edge weights, we propose
techniques that infer near-future travel costs
based on incoming GPS data.
Travel-Cost Time Series
• The travel costs on edge ei are modeled as a
time series $TS_i = \langle C_i^{(1)}, C_i^{(2)}, \ldots, C_i^{(T)} \rangle$.
• Travel cost set $C_i^{(j)}$ contains the travel costs
observed in the j-th time interval on edge ei.
 An interval is the finest time interval of interest.
 E.g., 15 minutes, yielding 96 intervals per day.
 A travel cost is the travel time or GHG emissions of a
traversal of edge ei during the j-th interval.
 E.g., during the j-th interval [8:00, 8:15), if 3 vehicles
traversed ei taking 30, 32, 40 s, $C_i^{(j)} = \{30, 32, 40\}$.
[Figure: edges e1, e2, e3 with their travel-cost time series TS1, TS2, TS3]
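Building such a time series from raw observations is a simple grouping step. The sketch below assumes 15-minute intervals and uses 0-based interval indices (the slides count from 1); the function names and the sample observations are illustrative.

```python
from collections import defaultdict

INTERVAL_MINUTES = 15  # 96 intervals per day, as in the example above

def interval_index(hour, minute):
    """0-based index of the interval that contains the given time of day."""
    return (hour * 60 + minute) // INTERVAL_MINUTES

def build_time_series(observations):
    """Group (hour, minute, cost) observations on one edge by interval."""
    series = defaultdict(list)
    for hour, minute, cost in observations:
        series[interval_index(hour, minute)].append(cost)
    return series

# Three traversals in [8:00, 8:15) and one in the next interval.
obs = [(8, 0, 30), (8, 7, 32), (8, 14, 40), (8, 15, 35)]
ts = build_time_series(obs)
```

Intervals without observations simply have no entry, which mirrors the sparsity that the model has to contend with.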
Time Series Characteristics
• Dependent
 Travel costs on adjacent edges may be dependent.
• Heterogeneous
 The traffic on different edges is often different.
 Some edges have clear peak/off-peak periods while others do not.
• Sparse
 Some intervals do not have any travel cost observations.
Modeling Travel-Cost Time Series
• We use a Spatio-Temporal Hidden Markov Model
(STHMM) to model the interactions among multiple travel-
cost time series in a road network.
[Figure: state transitions on edge e1 over the intervals t-1 (previous), t (current), and t+1 (next): $s_1^{t-1}$ = cong, $s_1^{t}$ = cong, $s_1^{t+1}$ = clr. Consecutive states are linked by transition probabilities (TP); each state is linked by output probabilities (OP) to the travel costs $C_1^{t-1}$, $C_1^{t}$, $C_1^{t+1}$ of time series TS1.]
• An edge ei has a set of hidden states Si, where each
state indicates a particular traffic status on the edge,
e.g., S1 = {cong, clr}.
• During an interval, the traffic on an edge is in one state.
• The traffic on an edge evolves over time, transitioning
from one state to another according to transition
probabilities (TP).
• In an interval, the travel costs observed on an edge
depend on the state of the edge according to
output probabilities (OP).
Modeling Multiple Time Series using STHMM
[Figure: edges e1, e2, e3 and their states over the intervals t-1 (previous), t (current), and t+1 (next): s1 = cong, cong, clr; s2 = clr, clr, cong; s3 = cong, clr, clr]
TP for e1: $P(S_1^{t} \mid S_1^{t-1}, S_2^{t-1})$
TP for e2: $P(S_2^{t} \mid S_1^{t-1}, S_2^{t-1}, S_3^{t-1})$
TP for e3: $P(S_3^{t} \mid S_2^{t-1}, S_3^{t-1})$
The transition probabilities capture the traffic dependency among edges.
Framework Overview
[Figure: travel-cost time series derived from historical GPS data -> state formulation and parameter learning -> spatio-temporal hidden Markov model; real-time GPS data and the model -> near-future edge weights -> routing algorithms -> eco-routes and fastest routes]
State Formulation – Intuitions
• Since edges are heterogeneous, we need to identify a
unique state set for each edge.
 Edges with heavy and complicated traffic: more states
 Edges with light and simple traffic: fewer states
• Target properties of the state set of an edge
 The traffic in the same state behaves similarly.
 The current state depends on its previous state.
State Formulation – Methods
• For each edge, identify a Gaussian Mixture Model (GMM)
with K Gaussian components to capture the distribution of
all the travel costs on the edge.
 Due to heterogeneity, different edges may have different Ks.
• Transform the travel costs in each interval in the edge’s
time series into a K+1 dimensional point.
 The first K dimensions correspond to the travel cost distribution in
the interval, still using the K Gaussian components.
 The K+1st dimension captures the difference between the travel
cost distributions in the interval and its previous interval.
• Cluster these K+1 dimensional points into several clusters,
and each cluster is a state.
Parameter Learning
• Output probabilities
 Since each state is a cluster, we use the cluster center to derive a
GMM as the output probability for the state.
• Transition probabilities
 For each edge, we need the edge’s travel-cost time series, along
with its adjacent edges’ travel-cost time series.
 Based on the obtained output probabilities, transition probabilities
are then estimated using maximum likelihood estimation.
Travel Cost Inference based on an STHMM
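[Figure: travel cost inference on edges e1, e2, e3 for the intervals t (current) and t+1 (next). Real-time GPS data on each edge is combined with the output probabilities (OP) using Bayes' rule to estimate the current states $s_1^{t}$, $s_2^{t}$, $s_3^{t}$; the transition probabilities (TP) then give the estimated next state $s_2^{t+1}$, whose OP yields the estimated travel costs.]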
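The two inference steps (OP plus Bayes for the current state, then TP for the next state) can be illustrated on a single edge. This is a deliberately simplified sketch: an STHMM conditions the transition on the previous states of adjacent edges as well, whereas the code below uses one edge only, and all probabilities are made-up numbers.

```python
def bayes_update(prior, likelihood):
    """Posterior over states given a prior and per-state observation likelihoods."""
    unnorm = {s: prior[s] * likelihood[s] for s in prior}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

def predict_next(current, transition):
    """Push a state distribution one interval ahead using transition probabilities."""
    nxt = {s: 0.0 for s in current}
    for s_from, p in current.items():
        for s_to, tp in transition[s_from].items():
            nxt[s_to] += p * tp
    return nxt

# Illustrative numbers only (not from the slides).
prior = {"cong": 0.5, "clr": 0.5}
likelihood = {"cong": 0.8, "clr": 0.2}   # OP of the observed costs under each state
transition = {"cong": {"cong": 0.7, "clr": 0.3},
              "clr": {"cong": 0.1, "clr": 0.9}}
current = bayes_update(prior, likelihood)   # estimated state at interval t
nxt = predict_next(current, transition)     # estimated state at interval t+1
```

The estimated travel cost for t+1 would then be read off the output distributions of the states, weighted by `nxt`.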
Empirical Settings
• Data
 181 million GPS records from North Jutland, Denmark from April
2007 to March 2008.
 The data is split into training and testing sets.
• Periods of interest
 Weekdays during [6:00, 20:00).
• Road network
 1,916 edges in North Jutland with more than 500 records.
• Travel costs
 Travel time and GHG emissions.
• Accuracy measurements
 Average sum of squared loss (ASSL) between estimated costs and
real costs.
Experimental Study
• Aspects studied
 Effect of the interval length
 Effect of the training data size
 Effect of recent data
 Effect of seasonality
• Results
 Efficient; able to support real-time weight updates
 Accurate; outperforms a baseline and the state-of-the-art
Summary
• Described an STHMM that is able to model multiple
correlated, sparse time series.
 Dependent: STHMM couples the states from adjacent edges.
 Heterogeneous: State formulation identifies a unique state set for
each edge.
 Sparse: Smoothing techniques are used to contend with sparse
data.
• The described framework is able to support real-time
weight updates and thus provide more accurate routing
services, e.g., eco-routing.
Stochastic Skyline Route Planning
Under Time-Varying Uncertainty
Outline
• Motivation and challenges
• Road network modeling
• Stochastic skyline route planning
• Empirical studies
• Summary
Motivation
• Reduction in greenhouse gas (GHG) emissions is
important to society as a whole.
 EU: reduce GHG emissions by 30% by 2020.
 G8: a 50% GHG reduction by 2050.
 China: a 17% GHG reduction by 2015.
• Eco-routing is a simple yet effective way to achieve
reductions in the transportation sector.
 Of interest to both fleet owners and individual drivers.
 For example, FlexDanmark is interested in using the most eco-friendly
routes while considering travel times and distances.
• We aim to provide useful eco-routing services.
Challenges
• To provide practically useful eco-routing, we need to
contend with three challenges.
• Multiple travel costs
 Consider not only GHG emissions, but also distances and travel
times.
• Time dependence
 Some travel costs are time-dependent.
 For example, travel times on the same road may vary during peak
vs. off-peak periods.
• Uncertainty
 Some travel costs are uncertain.
 For example, aggressive and moderate driving on the same road
may produce different GHG emissions.
A novel road network model - MTUG
• MTUG is a Multi-cost, Time-dependent, Uncertain Graph.
• Assume N different costs are of interest in total.
 Distance (DI), travel time (TT), GHG emissions (GE).
• G=(V, E, MM, W)
 V is the vertex set, and E is the edge set.
 MM = <MM(1), … , MM(N)>
 Function MM(i) maps an edge to the minimum and maximum i-th
cost of using the edge.
 MM(TT) (ea) = (150 seconds, 500 seconds)
 MM(GE) (eb) = (10 ml, 85 ml)
 W = <W(1), … , W(N)>
 Function W(i) maps an edge to a set of (interval, random variable)
pairs of the i-th cost type.
 W(TT) (ea) = {([0:00, 7:15), N(300, 120)), ([7:15, 8:45), N(450, 100)), …}
 W(GE) (eb) = {([0:00, 7:00), N(30, 100)), ([7:00, 9:00), N(50, 80)), …}
Instantiation of MM in an MTUG
• MM and W are instantiated using GPS records.
• GPS records are map matched to edges.
• Each edge is associated with a set of edge records in the
form of (e, t, C).
 An edge record indicates that a traversal of edge e at time t incurs
costs C, where C is a vector of all costs of interest.
 (e1, 8:08, <55 seconds, 80 ml>)
 (e1, 9:18, <45 seconds, 63 ml>)
 (e1, 10:10, <43 seconds, 60 ml>)
 (e1, 21:03, <45 seconds, 62 ml>)
• Based on the edge records on an edge, functions MM on
the edge can be instantiated.
 MM(TT) (e1) = (43 seconds, 55 seconds)
 MM(GE) (e1) = (60 ml, 80 ml)
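Instantiating MM amounts to taking per-edge minima and maxima over the edge records. A minimal sketch follows, with a hypothetical function name and a plain tuple layout for the records; the data are the four edge records listed above.

```python
def instantiate_mm(edge_records):
    """Per edge, the (min, max) of each cost type over its edge records.

    edge_records: iterable of (edge, time, costs), with costs a tuple
    holding one value per cost type.
    """
    mm = {}
    for edge, _t, costs in edge_records:
        if edge not in mm:
            mm[edge] = [(c, c) for c in costs]
        else:
            mm[edge] = [(min(lo, c), max(hi, c))
                        for (lo, hi), c in zip(mm[edge], costs)]
    return mm

records = [
    ("e1", "8:08", (55, 80)),    # (travel time in s, GHG emissions in ml)
    ("e1", "9:18", (45, 63)),
    ("e1", "10:10", (43, 60)),
    ("e1", "21:03", (45, 62)),
]
mm = instantiate_mm(records)
```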
Instantiation of W in an MTUG
• Partition a day into 96 15-min intervals.
• For each (edge, interval) pair, we obtain a multi-set containing
the costs on the edge during the interval.
 ms={(10 s, 3), (20 s, 10), (25 s, 20), (30 s, 10), (40 s, 5), …}
• Estimate a random variable (RV) based on the multi-set.
 Use a Gaussian Mixture Model (GMM) to represent an RV.
 GMMs can approximate arbitrary distributions.
 A GMM is a weighted sum of K Gaussian distributions.
 $GMM(x) = \sum_{k=1}^{K} m_k \cdot N(x \mid \mu_k, \delta_k^2)$
Instantiation of W in an MTUG (cont.)
• If two RVs in two adjacent intervals are similar, we combine
the two intervals into a long interval.
 Use KL-divergence to measure the similarity between two RVs.
• Re-estimate a new RV for the long interval using the costs in
the long interval.
• The whole procedure works iteratively until no RVs from
consecutive intervals are similar enough to be combined.
• The long intervals along with their RVs instantiate W.
Route Costs in MTUG
• Given a route Ri=<r1, r2, …, rX>, where ri ∈ E is an edge.
• RC(Ri, t) indicates the costs of using route Ri at time t
 RC(Ri, t) = <RVDI, RVTT, RVGE> is a vector of RVs, and each RV
corresponds to a travel cost.
• RVDI is a deterministic value equal to the sum of
the lengths of the edges in route Ri.
• RVTT is the convolution of the travel-time
RVs of the edges in route Ri.
 The travel-time RV to use for the first edge r1 depends on the
trip start time t.
 The travel-time RV to use for the k-th edge rk depends on the
travel time of the previous k-1 edges, which may be uncertain.
• RVGE is the convolution of the GHG
emission RVs of the edges in route Ri.
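To make the convolution step concrete, here is a small sketch for discrete random variables. It is a simplification under stated assumptions: the edge costs are treated as independent, each RV is a plain value-to-probability dictionary rather than a GMM, and the time dependence described above (which RV applies to a later edge depends on the arrival time) is ignored. The numbers are illustrative.

```python
from collections import defaultdict

def convolve(rv_a, rv_b):
    """Distribution of the sum of two independent discrete random variables.

    Each RV is a dict that maps a cost value to its probability.
    """
    out = defaultdict(float)
    for a, pa in rv_a.items():
        for b, pb in rv_b.items():
            out[a + b] += pa * pb
    return dict(out)

tt_r1 = {30: 0.5, 40: 0.5}      # travel time on the first edge (s)
tt_r2 = {10: 0.25, 20: 0.75}    # travel time on the second edge (s)
tt_route = convolve(tt_r1, tt_r2)
```

Chaining `convolve` over all edges of a route gives the route-level distribution under the same assumptions.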
Stochastic Dominance
• Stochastic Dominance between two RVs
 Given two RVs X and Y, if cdfX(a) >= cdfY(a) for all possible values
a in R+, we say “X stochastically dominates Y”.
 RC(R1, t).RVTT stochastically dominates RC(R2, t).RVTT.
 No stochastic dominance between RC(R3, t).RVTT and RC(R4, t).RVTT.
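The definition can be checked directly for discrete RVs. In the sketch below each RV is a value-to-probability dictionary (an assumption made for illustration; the slides use GMMs), and the CDFs are compared only at the support points of the two RVs, which suffices because both CDFs are step functions that change only there.

```python
def cdf(rv, a):
    """P(X <= a) for a discrete RV given as {value: probability}."""
    return sum(p for v, p in rv.items() if v <= a)

def stochastically_dominates(x, y, eps=1e-12):
    """True if cdf_X(a) >= cdf_Y(a) at every support point of X and Y."""
    points = sorted(set(x) | set(y))
    return all(cdf(x, a) + eps >= cdf(y, a) for a in points)

# Illustrative travel-time distributions in seconds.
fast = {30: 0.5, 40: 0.5}
slow = {35: 0.5, 50: 0.5}
mixed = {20: 0.5, 60: 0.5}   # sometimes faster, sometimes slower than `fast`
```

Here `fast` dominates `slow`, while `fast` and `mixed` do not dominate each other because their CDFs cross.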
Stochastic Skyline Routes
• Route Ri dominates route Rj iff
 No RV of RC(Rj, t) stochastically dominates the corresponding RV
of RC(Ri, t).
 At least one RV of RC(Ri, t) stochastically dominates the
corresponding RV of RC(Rj, t).
• Stochastic skyline route
 Given a source-destination pair, and a trip starting time.
 Stochastic skyline routes are the routes that are not dominated by
any other routes.
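The two conditions and the skyline definition translate into a small filter. The sketch takes the per-cost dominance test as a parameter so that any RV representation can be plugged in; for brevity the example uses the degenerate case of deterministic costs, where "dominates" is taken to mean "strictly smaller". Function names and cost values are illustrative.

```python
def route_dominates(costs_i, costs_j, rv_dominates):
    """Route i dominates route j under the two conditions listed above."""
    pairs = list(zip(costs_i, costs_j))
    none_against = not any(rv_dominates(cj, ci) for ci, cj in pairs)
    some_for = any(rv_dominates(ci, cj) for ci, cj in pairs)
    return none_against and some_for

def skyline(routes, rv_dominates):
    """Names of the routes that no other route dominates."""
    return {
        name for name, costs in routes.items()
        if not any(route_dominates(other, costs, rv_dominates)
                   for other_name, other in routes.items()
                   if other_name != name)
    }

# Degenerate illustration: every cost is a single number.
routes = {
    "R1": (10.0, 300, 80),   # (distance in km, travel time in s, GHG in ml)
    "R2": (12.0, 250, 90),
    "R3": (12.0, 320, 95),
}
result = skyline(routes, lambda a, b: a < b)
```

R3 is dominated by both R1 and R2, while R1 and R2 trade distance against travel time and therefore both stay in the skyline.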
Framework Overview
[Figure: offline phase: trajectories and road network -> pre-processing -> cost records -> instantiating MM and W -> MTUG; online phase: (source, destination, time) and the MTUG -> routing module -> stochastic skyline routes]
Stochastic Skyline Route Planning
• A brute force method
 Enumerate all possible routes, compute the route costs, and check
whether one route dominates another
 Very inefficient, and only works for small road networks
• An efficient method
 Prune some routes that cannot become skyline routes early.
 Efficient stochastic dominance checking
Early Pruning Strategy
• Do the following for all travel cost types of interest.
 We use travel time as an example.
 We maintain a graph where each edge is associated with the
minimum travel time, which is recorded in MM.
 From the destination, run Dijkstra’s algorithm on the graph.
 As each vertex is then associated with its minimum travel time to the destination, we get the
“fastest” possible route from the source to the destination.
 Compute the route cost of the “fastest” possible route and add the
route as a candidate skyline route.
[Figure: source vs and destination vd; each vertex vx (e.g., v1, v2, v3) is annotated with the shortest distance, least travel time, and least GHG emissions to the destination vd; candidate skyline routes 1, 2, and 3 are shown]
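The destination-rooted Dijkstra run can be sketched as follows for one cost type. It searches the reversed graph so that every vertex ends up with the least possible cost to the destination. The edge list and its minimum travel times are made-up values, and the function name is hypothetical.

```python
import heapq

def min_cost_to_destination(edges, destination):
    """Least possible cost from every vertex to the destination.

    edges: iterable of (u, v, min_cost) for directed edges u -> v, where
    min_cost is the minimum recorded in MM for one cost type.
    """
    reverse = {}
    for u, v, c in edges:
        reverse.setdefault(v, []).append((u, c))
    best = {destination: 0}
    heap = [(0, destination)]
    while heap:
        cost, v = heapq.heappop(heap)
        if cost > best.get(v, float("inf")):
            continue  # stale queue entry
        for u, c in reverse.get(v, []):
            if cost + c < best.get(u, float("inf")):
                best[u] = cost + c
                heapq.heappush(heap, (cost + c, u))
    return best

# Illustrative minimum travel times in seconds (not from the slides).
edges = [("vs", "v1", 40), ("vs", "v2", 30), ("v1", "vd", 50),
         ("v2", "v3", 20), ("v3", "vd", 25), ("v2", "vd", 70)]
lower_bound = min_cost_to_destination(edges, "vd")
```

These per-vertex values serve as lower bounds: the cost of a partially explored route plus the bound at its last vertex can never exceed the cost of any completion of that route.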
Early Pruning Strategy (cont.)
• Explore routes from source, until no more routes can be
explored.
 Estimate the least possible travel costs for a partially explored
route R’.
 If the partially explored route with its estimated least possible costs
is dominated by an existing candidate skyline route, there is no
need to explore the route any further.
 Otherwise, continue exploring.
 Update candidate skyline route if necessary.
[Figure: source vs, destination vd, and vertices v1, v2, v3, with a partially explored route R’ and candidate skyline routes 1 to 4]
Stochastic Dominance Checking
• Naïve way: check according to the definition of stochastic
dominance.
 For each value a, check whether cdfX(a) >= cdfY(a).
• An efficient way
 Consider one cost type at a time.
 Compute the minimum and maximum possible travel costs of a
route.
• Distinguish among three cases based on the min and max
travel costs of two routes.
 Disjoint case: dominance.
Stochastic Dominance Checking (cont.)
• Covered case: non-dominance.
• Overlapping case (needs further checking)
 Both dominance and non-dominance may occur.
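The three cases can be expressed as a cheap pre-check on the cost ranges of two routes. This is a sketch under assumptions: the ranges are taken to be tight (both endpoints can actually occur), and the handling of ties at the boundaries is a choice made here, since the slides do not spell it out.

```python
def quick_dominance_check(x_min, x_max, y_min, y_max):
    """Classify two cost ranges before any CDF comparison.

    Returns "X dominates", "Y dominates", "non-dominance", or "check CDFs".
    """
    if x_max <= y_min:
        return "X dominates"      # disjoint: X is never costlier than Y
    if y_max <= x_min:
        return "Y dominates"      # disjoint, the other way round
    if (x_min < y_min and y_max < x_max) or (y_min < x_min and x_max < y_max):
        return "non-dominance"    # covered: one range strictly contains the other
    return "check CDFs"           # overlapping: the CDFs must be compared
```

For example, ranges (10, 20) and (25, 40) are disjoint, (10, 50) covers (20, 30), and (10, 30) and (20, 40) overlap and need the full CDF comparison.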
Empirical Settings
• GPS Data
 More than 180 million GPS records collected from Denmark in
2007 and 2008.
 1 Hz sampling rate: one GPS record per second.
• Road networks (from OpenStreetMap)
 Aalborg (AA): 4,981 vertices and 12,614 edges.
 North Jutland (NJ): 68,318 vertices and 163,546 edges.
 Jutland (JU): 667,876 vertices and 1,622,974 edges.
 AA is the largest city in NJ, and NJ is part of JU.
• Travel costs
 Distances (DI): obtained from OpenStreetMap.
 Travel time (TT): computed from the timestamps in the GPS
records.
 GHG emissions (GE): computed from the instantaneous speeds
and accelerations from the GPS records.
MTUG Generation
• Hot edges: edges covered by the GPS data set.
• Instantiate MM and W based on the GPS data.
• Cardinalities of TT and GE intervals.
 Largest cardinalities: 65 for TT and 72 for GE.
 Average cardinalities:
• Cold edges: edges not covered by the GPS data set.
 Derive corresponding travel costs using speed limits.
The traffic in AA is much more
dynamic than that in other parts.
Stochastic Skyline Route Queries
• Query settings:
• 50 random source-destination pairs with a starting time in
each setting.
• Study the cardinality of the stochastic skyline routes and
the run time.
Cardinalities of Skyline Routes
• As the number of travel cost types considered increases,
the cardinality of the skyline routes also increases.
• When distance between source and destination increases,
the cardinality of the skyline routes also increases.
Run Time
• The run-time for a query is below 5 seconds, and most
queries are computed in less than 2 seconds.
• The advanced dominance checking method is effective and
efficient.
 The benefit of advanced dominance checking is more
substantial on JU since long routes require more dominance
checks.
Summary
• Described a framework that enables stochastic skyline
route planning in road networks with multiple, time-
dependent, and uncertain travel costs.
• Enables eco-routing in a realistic setting.
Vehicle Routing with
User-Generated Trajectory Data
Motivating Example
Fastest route: Route #1
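[Figure: road map from source S to destination D with intermediate vertices A, B, E, F, G, H, I, J, K, a shopping mall, a food store, and roads labeled 50 and 70; Route #1 is highlighted]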
Motivating Example
Shortest route: Route #2
[Figure: the road map from S to D with the shopping mall and food store; Route #2 is highlighted]
Motivating Example
Route #3 avoids parking lots
[Figure: the road map from S to D with the shopping mall and food store; Route #3 is highlighted]
Motivating Example
Route #4 is attractive when stores are closed
[Figure: the road map from S to D with the shopping mall and food store; Route #4 is highlighted]
Motivating Example
Which route from source S to destination D is preferable?
[Figure: the road map from S to D with the shopping mall and food store, showing Routes #1 to #4]
The fastest – takes the least travel time?
The shortest – smallest distance?
The most “direct” – fewest turns?
The one most constrained to main roads –
better road quality?
The “safest,” avoiding bad neighborhoods?
…
The one most preferred by locals?
Introduction
• Local travel
 Knowledge of the surroundings
 Follow familiar routes
• Travel in unfamiliar surroundings to unknown destinations
 Depend on available routing services
 Expect that the provided route is the best
• Idea: Use GPS data to let those who travel in unfamiliar
surroundings benefit from the insights of local travelers
Motivation
• Can’t we just use a standard routing service?
• How often do drivers follow routes recommended by
a routing service?
V. Čeikutė and C.S. Jensen. Routing service quality—local driver behavior versus routing services. MDM 2013, pp. 97–106
[Figure: histogram of the number of routes by route length (km), with bins 0.5-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, and 80-220; 5000 routes in total]
Motivation
• How often do drivers follow routes recommended by
a routing service?
• Breakdown by distance traveled
[Figure: average match (%) by route length (km), values listed in bin order: 0.5-10: 73%, 10-20: 56%, 20-30: 44%, 30-40: 36%, 40-50: 19%, 50-60: 9%, 60-70: 18%, 70-80: 20%, 80-220: 11%]
V. Čeikutė and C.S. Jensen. Routing service quality—local driver behavior versus routing services. MDM 2013, pp. 97–106
• ~60% of routes match the
route provided by the routing
service; ~40% do not!
• Using previous users’ trips
can improve routing quality
Goal of the Study
• Propose a routing framework that
 Utilizes GPS data volunteered by local drivers
 Exploits possibly hard-to-formalize insight into local conditions
 Takes into account temporal variation in driver behavior
 Recommends routes based on popularity and temporal aspects
• Evaluate the quality of proposed routes
 Study based on trip length and pre-selected drivers
 Quality comparison with existing routing service and route
recommendation approaches
Outline
• Introduction and motivation
• Framework
• Data preparation methodology
• Scoring of routes
• Empirical study
 Data foundation
 Routing quality evaluation and comparison
• Related work
• Conclusion and research directions
Framework
[Figure: offline: GPS logs -> map-matching -> trajectory DB; online: the user's input data (source, destination, time) and trajectory data go to the routing service, which returns the top route to the user; the user's GPS trace is added to the GPS logs]
Trips used:
• Start or end at, or go through,
source and destination locations
• Trips that start during the issued
time period are preferred
When the source and
destination are not covered
by available GPS data, an
existing routing service is
used.
Data Preparation Methodology
• A trip is a sequence of
timestamped GPS points
between stay points.
• Stay point
 A location where the user stays
for a certain period of time (no
GPS data is received)
 A location reported repeatedly
for a certain period of time (the
engine remained on)
• In practice, trips are
consolidated
 Short trips are dropped and
some consecutive trips are
merged
[Figure: a trajectory plotted over x, y, and t; repeated reports of the same location mark a stay point]
Data Preparation Methodology
• Trips are map-matched and
transformed into sequences of
road segments with entry and
exit timestamps
• Example
 Trip $tr_1$ is map-matched to a
sequence of road segments:
$(s_1, t_{s_1}, t_{e_1}), (s_2, t_{s_2}, t_{e_2}), (s_3, t_{s_3}, t_{e_3})$
[Figure: trips $tr_1$, $tr_2$, $tr_3$ over road segments $s_1$, $s_2$, $s_3$]
Data Preparation Methodology
The similarity of two trips is
determined by applying
Longest Common
Subsequence (LCSS) to the
trajectories represented as
sequences of road segments.
[Figure: example trips drawn as sequences of road segments s1 to s11]
Similar trips belong to the same route.
Similarity of two trajectories:
$S_1(pls, pls') = \frac{length(LCSS(pls, pls'))}{length(pls)} \cdot 100\%$
Do trips follow the same route?
$C_1(pls, pls') = \begin{cases} true & \text{if } S_1(pls, pls') \ge getMatch(pls, pls') \\ false & \text{otherwise} \end{cases}$
The similarity threshold depends on the trip length: longer trips have higher
similarity thresholds, within the interval [90-100].
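A compact sketch of the LCSS-based similarity follows. Two points are assumptions rather than statements from the slides: length is measured as the number of road segments, and the threshold is passed in as a plain number because the definition of getMatch is not given. The example sequences are illustrative.

```python
def lcss_length(a, b):
    """Length of the longest common subsequence of two segment sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def similarity(pls, pls_other):
    """S1 in percent: share of pls covered by the common subsequence."""
    return 100.0 * lcss_length(pls, pls_other) / len(pls)

def same_route(pls, pls_other, threshold):
    """C1: true when the similarity reaches the given threshold."""
    return similarity(pls, pls_other) >= threshold

trip_a = ["s1", "s2", "s3", "s4", "s9", "s10"]
trip_b = ["s1", "s2", "s3", "s4", "s11", "s9", "s10"]
```

Note that S1 is not symmetric: `trip_a` is fully contained in `trip_b`, so its similarity is 100%, whereas the similarity of `trip_b` with respect to `trip_a` is 6/7, about 85.7%.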
Data Preparation Methodology
• Trips that follow the same sequence of road segments are
grouped into route usage objects.
user | route | # traversals (peak) | # traversals (off-peak)
u1 | r1: s_BC, s_CD, s_DF, s_FG | - | 10
u2 | r2: s_CD, s_DF, s_FG | 3 | 5
u3 | r3: s_ED, s_DF, s_FI, s_IG | 7 | 4
u4 | r2: s_CD, s_DF, s_FG | 5 | 6
[Figure: road network with vertices A, B, C, D, E, F, G, and I, showing routes r1, r2, and r3]
• Route r2 is taken by users u2 and u4 3 and 5 times, respectively, during
peak hours and 5 and 6 times during off-peak hours.
$(r_2, pl, \langle peak, \{(u_2, 3), (u_4, 5)\} \rangle, \langle off, \{(u_2, 5), (u_4, 6)\} \rangle)$
Scoring of Routes
Preferred routes
• Popular among drivers
• Taken by many distinct drivers
• Popular at the time of day
and on the day of the week of the query
[Figure: routes between a source and a destination (points A and B), distinguished by peak and off-peak hours]
Final route score:
$score(r) = \beta \cdot pref_M(r) + (1 - \beta) \cdot pref_N(r)$
Scoring of Routes
Route preference value:
$pref(r) = \alpha \cdot users(r) + (1 - \alpha) \cdot traversals(r)$
 users(r): # distinct drivers taking the route
 traversals(r): # of traversals of the route
 $pref_M(r)$ considers trips taken during the query temporal pattern
 $pref_N(r)$ considers trips taken during other temporal patterns
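The two formulas combine as in the sketch below. The values of alpha and beta are hypothetical, and raw counts are used without normalization, which is an assumption; the slides do not say how users(r) and traversals(r) are scaled. The counts come from route r2 in the route usage example (peak: 3 and 5 traversals by two users; off-peak: 5 and 6).

```python
def pref(users, traversals, alpha):
    """Preference value: weighted mix of distinct drivers and traversals."""
    return alpha * users + (1 - alpha) * traversals

def score(pref_m, pref_n, beta):
    """Final route score from the two preference values."""
    return beta * pref_m + (1 - beta) * pref_n

# Route r2, for a query issued during peak hours.
pref_peak = pref(users=2, traversals=3 + 5, alpha=0.5)   # query temporal pattern
pref_off = pref(users=2, traversals=5 + 6, alpha=0.5)    # other temporal pattern
r2_score = score(pref_peak, pref_off, beta=0.7)
```

A beta above 0.5 favors routes that are popular during the temporal pattern of the query, while alpha controls whether many distinct drivers or many traversals count for more.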
Empirical Study: Data
• Monitoring period: 2 years
• Number of drivers: 285
• Number of GPS points (raw data): ~182,700,000
• Number of trips: ~275,000
• “Pay as You Speed” project (http://www.sparpaafarten.dk)
[Figure: number of traversals by traversal length (km), with bins 2-10, 10-20, 20-30, 30-40, 40-50, and 50-350, broken down into off-peak, morning peak, and afternoon peak]
Routing Quality Evaluation: Data
For this study, we randomly selected equal numbers of trips
for the different trip length intervals.
[Figure: number of traversals by traversal length (km), with bins 2-10, 10-20, and 20-350, broken down into off-peak, morning peak, and afternoon peak]
Routing Quality: Match
A match is identified using LCSS
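[Figure: evaluation workflow. For a trajectory from the test trajectory DB, its source and destination are sent to the routing service, which uses the trajectory DB; the returned preferred route is compared with the trajectory, and each match is added to the total number of matches.]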
Routing Quality Evaluation: Results
[Figure: three bar charts of matched versus unmatched routes (%) by trip length (km; bins 2-10, 10-20, 20-350), one each for our proposal, Google Directions API (top-1), and TPMFP (SIGMOD’13); one of the charts also has an Empty category]
Routing Quality Evaluation: Data
For this study, we considered the five drivers with the most
trips.
[Figure: two bar charts of the number of traversals for drivers 1 to 5. Traversals per Length: 2-10 km, 10-30 km, 30-300 km. Traversals per Temporal Pattern: off-peak, afternoon peak, morning peak.]
Routing Quality Evaluation: Results
[Figure: results (%) for drivers 1 to 5. Our proposal: matched, unmatched, and empty results for different types of route calculation (source and dest., source, destination, subroutes). Google Directions API (top-1): matched versus unmatched.]
Related Work
[1] K.-P. Chang, L.-Y. Wei, M.-Y. Yeh, and W.-C. Peng. Discovering personalized routes from trajectories. LBSN 2011, pp. 33–40
[2] Z. Chen, H. T. Shen, and X. Zhou. Discovering popular routes from trajectories. ICDE 2011, pp. 900–911
[3] W. Luo, H. Tan, L. Chen, and L.M. Ni. Finding time period-based most frequent path in big trajectory data. SIGMOD 2013,
pp. 713–724
[4] J. Yuan, Y. Zheng, C. Zhang, W. Xie, X. Xie, G. Sun, and Y. Huang. T-Drive: Driving directions based on taxi trajectories. In GIS
2010, pp. 99–108
• Four existing routing techniques use drivers’ trajectories
[1],[2],[3],[4]
 The road network is formed from the road segments that are
covered by the trajectory data set
 A route is formed by
 Prioritizing parts of roads that are followed the most by a specific
driver (personalized routes) [1]
 Prioritizing parts of roads that are taken by other drivers [2],[4]
 Possibly using sub-routes from multiple routes [3]
 Trajectories used for scoring must contain the destination and
must start and end during the provided time interval. [3]
 Suggested routes are formed from the most popular routes or
route parts in the available data set.
Conclusion and Research Directions
Conclusions
• The proposed framework utilizes trajectory data collected from local
drivers for routing.
• A preferred route is selected using a flexible scoring function that
considers
 The number of traversals of the route
 The number of distinct drivers taking the route
 The time periods when the traversals occurred
• Use of travel histories of local drivers can increase routing quality
• More details in the paper!
Research directions
• Additional aspects of the framework can be considered
 Efficiency of route identification process (LCSS technique)
 Inclusion of personalized routes
 Better support for routes that are constructed from sub-routes
Closing
Demos and Prototype Systems
• EcoTour: http://daisy.aau.dk/its/
 Computes and compares the shortest, the fastest, and the most
eco-friendly routes for arbitrary source-destination pairs in DK.
 Best demo award at IEEE MDM 2013.
• EcoSky: http://daisy.aau.dk/its/eco/
 Supports skyline eco-routing and personalized eco-routing
• Sheafs: http://daisy.aau.dk/its/sheaf
 Trajectory based traffic sheafs
• Strict-Path Queries: http://daisy.aau.dk/its/spqdemo
 Trajectory based, Strict-Path Queries
 Trips (historical travel-time), route choice, Napoleon (road usage)
• iPark: identifying parking spaces from GPS trajectories
 On-street parking lanes vs. parking zones
Napoleon’s Russian Campaign
The Future
• Much more data
 Inductive loop detectors
 Bus data
 Rejsekortet
• Many more connected vehicles
• New services
 Routing
 Safety and warnings
 Parking, fees, insurance, road pricing
 Car sharing, multi-modality
• Driver-less vehicles
Challenges
• Detect “black spots” before they occur.
• Modeling spatio-temporal congestion from data
• Characterize the effects of events
 Accidents, malfunctioning of traffic signals, rain, a concert
• Real-time traffic management
 In response to current or predicted situation, actuate traffic signals
and drivers (via their smartphones or navigation devices) to
optimize the use of the infrastructure and driver experience
• Automated trade-off between weight level of detail and
available data.
• Stochastic routing at 20 milliseconds.
• Integrate with “point” data.
Acknowledgments
• Colleagues at Aalborg University, Aarhus University, and
beyond.
• The EU FP7 project, Reduction: http://www.reduction-project.eu/
• The Obel Family Foundation: http://www.obel.com/en
Readings
• V. Ceikute, C. S. Jensen: Vehicle Routing with User-Generated
Trajectory Data. MDM (1) 2015
• C. Guo, B. Yang, O. Andersen, C. S. Jensen, K. Torp: EcoMark 2.0:
empowering eco-routing with vehicular environmental models and
actual vehicle fuel consumption data. GeoInformatica 19(3):567-599
(2015)
• C. Guo, B. Yang, O. Andersen, C. S. Jensen, K. Torp: EcoSky:
Reducing vehicular environmental impact through eco-routing. ICDE
2015:1412-1415
• B. Yang, C. Guo, Y. Ma, C. S. Jensen: Toward personalized, context-
aware routing. VLDB J. 24(2):297-318 (2015)
• Y. Ma, B. Yang, C. S. Jensen: Enabling Time-Dependent Uncertain
Eco-Weights For Road Networks. GeoRich@SIGMOD 2014:1:1-1:6
• B. Yang, C. Guo, C. S. Jensen, M. Kaul, S. Shang: Stochastic skyline
route planning under time-varying uncertainty. ICDE 2014:136-147
• B. Yang, M. Kaul, C. S. Jensen: Using Incomplete Information for
Complete Weight Annotation of Road Networks. IEEE TKDE
26(5):1267-1279 (2014)
• C. Guo, C. S. Jensen, B. Yang: Towards Total Traffic Awareness.
SIGMOD Record 43(3):18-23 (2014)
Readings
• V. Ceikute, C. S. Jensen: Routing Service Quality - Local Driver Behavior Versus
Routing Services. MDM (1) 2013: 97-106
• M. Kaul, B. Yang, C. S. Jensen: Building Accurate 3D Spatial Networks to Enable
Next Generation Intelligent Transportation Systems. MDM 2013: 137-146, best
paper award.
• Ove Andersen, Christian S. Jensen, Kristian Torp, Bin Yang: EcoTour: Reducing
the Environmental Footprint of Vehicles Using Eco-routes. MDM 2013: 338-340,
Demo paper, best demo award.
• B. Yang, C. Guo, C. S. Jensen: Travel Cost Inference from Sparse, Spatio-
Temporally Correlated Time Series Using Markov Models. PVLDB 6(9): 769-780
(2013)
• C. Guo, Y. Ma, B. Yang, C. S. Jensen, M. Kaul: EcoMark: evaluating models
of vehicular environmental impact. SIGSPATIAL/GIS 2012: 269-278
• L. Speicys, C. S. Jensen: Enabling Location-based Services - Multi-Graph
Representation of Transportation Networks. GeoInformatica 12(2): 219-253 (2008)
• A. Brilingaite, C. S. Jensen: Enabling Routes of Road Network Constrained
Movements as Mobile Service Context. GeoInformatica 11(1): 55-102 (2007)
• C. Hage, C. S. Jensen, T. B. Pedersen, L. Speicys, I. Timko: Integrated Data
Management for Mobile Services in the Real World. VLDB 2003: 1019-1030

Christian jensen advanced routing in spatial networks using big data

  • 1. Advanced Routing in Spatial Networks Using Big Data Christian S. Jensen www.cs.aau.dk/~csj
  • 2. Roadmap • Big data, motivation, setting, overview • Modeling and computation of weights  Spatial network lifting  EcoMark  Time-varying, uncertain weights  Complete weight annotation using incomplete information  Near-future travel cost inference • Routing  Stochastic skyline routing  Routing based on local-driver behavior • Closing  Demos, the future, challenges, acknowledgments, readings
  • 4. Roadmap • Big data  Hype or substance?  Instrumentation of reality and digitization  The digital universe  Moore’s Law generalized  Big data challenges
  • 5. Hype or Substance? • We have been pushing the boundaries for decades  How much data we can handle  How fast  Data integration • Examples  VLDB: International Conference on Very Large Database  TODS: ACM Transactions on Database Systems • So is it all hype?  No
  • 6. Instrumentation and Digitization • Instrumentation of reality  Notably, smartphones • Digitization of processes  E.g., e-commerce, public services, communications, social interactions
  • 10. Big Data Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is big data. http://www-01.ibm.com/software/data/bigdata/
  • 11. The Digital Universe • The digital universe  Doubling every ~18-24 months  Grew 60% in 2009, 50% in 2010  2009: 0.8 zettabyte, 2010: 1.2 zettabyte, 2020: 35 zettabytes  2009-20: growth by a factor of 44 http://www.emc.com/collateral/demos/microsites/idc-digital-universe/iview.htm 1 zettabyte = 1024 exabytes 1 exabyte = 1024 petabytes = 260 = 1,152,921,504,606,846,976 ≈ 1018 bytes
  • 12. Moore’s Law – The Bicycle Analogy • Moore’s Law: computers double in speed every 24 months.  Applies also to quality-adjusted microprocessor prices, memory capacity, disks, networks, sensors, and the number and size of pixels in cameras. • How fast would a bicyclist be if Moore’s Law applied?  50 years of doubling every 24 months  30 km/h originally  ~1 billion km/h now • Three lessons  Growth rates in computing are dramatic and difficult to imagine.  Hardware advances are important information technology drivers.  Humans don’t really improve – they are the constants.
  • 13. Big Data – Synthesis • The result is new opportunity. • Lots of data and unprecedented computing infrastructure combine to offer potentials for value creation from data. • To be competitive, society and businesses must be able to create value from data. • Data-based decisions and data-driven processes  Decisions based on good data beat decisions based on feelings or opinions. • A finer granularity of services • Entirely new services
  • 14. Big Data – Data-Driven Society, Business
  • 15. Christian S. Jensen csj@cs.aau.dk Big Data in Transportation: Overview
  • 16. Motivation – ITS • A safer, greener, and more efficient and cost-effective transportation infrastructure • Greenhouse gas emissions reductions via eco-routing • Congestion, greater Copenhagen region  ~10 billion DKK/year (2004) • Bad setting of signalized intersections in Denmark  ~9,3 billion DKK/year (2012)
  • 17. • The transportation sector is the second largest greenhouse gas (GHG) emitting sector and also causes substantial pollution.  By 2025, it’s predicted that 6.2 billion commutes will be made every day, worldwide. • The reduction of greenhouse gas (GHG) emissions from transportation is essential to combat global climate change.  EU: reduce GHG emissions by 30% by 2020.  G8: a 50% GHG reduction by 2050.  China: a 17% GHG reduction by 2015. • Eco-routing can reduce vehicular impact by up to 20%. • General context: Smart City Motivation – Eco-Routing
  • 18. System of Sensors Model • The setting may be modeled as a system of (logical) streams, one per edge.  Data is emitted from the stream of an edge when a vehicle traverses the edge  Spatial  Spatio-temporally correlated  Sparse • Real, unlike early, envisioned smart dust applications!
  • 19. Methodology • The same methodology underlies the studies covered. • Define precisely a problem of (perceived) real-world interest. • Develop solutions  Concepts, data structures, algorithms • Carry out mathematical analyses  Correctness, complexity, storage size • Prototype the solutions and perform empirical studies  Often, real data is needed  Offers detailed insight in the design properties of the solutions  If you cannot measure it, you cannot improve it • Iterate!
  • 20. Data Foundation • A road network model  E.g., OpenStreetMap  Germany: 10 million edges (rough estimate) • Aerial laser scan data  For lifting, optional • Tracking data from vehicles  E.g., GPS data • Fuel consumption data from vehicles  Often called CAN bus data
  • 21. GPS Data – DW Statistics • Number of data sources (and NDAs): ca. 20  Daily data from 4 sources; data in batches from 4 sources (2 small, 2 big); data from ca. 10 finished projects • Daily data  4 sources  ~3 million rows  3500 vehicles • Number of fact table rows: More than 4 billion • Number of vehicles: ca. 25,000 • Rows with fuel data: 500 million (estimate; ~140,000 per day) • Rows from EVs: 100-200 million
  • 22. Software and Hardware • Experimental infrastructure • Software  Have complete software stack for handling traffic data  Map-matching, data cleansing, multiple map support • Hardware  Very modern server farm  Newest machine has 2TB main memory
  • 23. Basic Eco Routing – CPH to the Train Good 17:08 8.06 km 0.80 l Bad 18:06 13.64 km 1.23 l
  • 24. Roadmap to Achieving Eco-Routing EcoMark 3D Road Networks 1. Understand how to quantify environmental impact 2. Assign Eco-Weights to a Road Network 3. Novel Routing Algorithms
  • 25. Understand Vehicular Environmental Impact • EcoMark and EcoMark 2.0  An evaluation framework that compares and analyzes 11 state-of- the-art vehicular environmental impact models using GPS data and a 3D road network.  What do the models need to quantify vehicular environmental impact?  Speeds (instantaneous or average), and accelerations: from GPS data  Road slopes: from 3D road network  Model categorization  Instantaneous models: appropriate for eco-driving  Aggregated models: appropriate for eco-routing  It is possible to assign eco-weights to a large extent using GPS data, 3D maps, and appropriate models.  An extended evaluation framework, EcoMark 2.0, is developed on top of EcoMark using actual fuel consumption data that is obtained from CAN bus data.
  • 26. a1' a11' a1 a11 a5' a5 Elevation matters – 3D Road Network • Efficient methods that can build a 3D road network from 3D laser scan data and a 2D road network. • Use 3D laser scan data to formulate a terrain that is represented as a Triangulated Irregular Network (TIN). • Project 2D polylines to the TIN in order to obtain 3D polylines, and thus result in a 3D road network.
  • 27. Roadmap to Achieving Eco-Routing EcoMark 3D Road Networks Simple vs. Advanced Eco-Weights Static vs. Dynamic Eco-Weights 1. Understand how to quantify environmental impact 2. Assign Eco-Weights to a Road Network 3. Novel Routing Algorithms
  • 28. Eco-Weights – Simple vs. Advanced • Simple: time dependent, deterministic.  Each interval has a deterministic weight.  Ex: (0:00, 7:00]: 10 mg; (7:00, 9:00]: 18 mg; (9:00, 15:00]: 12 mg; … • Advanced: time dependent, uncertain.  Each interval has an uncertain weight, i.e., a random variable that follows some distribution (e.g., a uniform/normal distribution)  Ex: (0:00, 7:00]: 8 to 12 mg, N(9, 10); (7:00, 9:00]: 13 to 23 mg, N(18, 10); (9:00, 15:00]: 8 to 12 mg, N(10, 20); … Representation Coverage Temporal Granularity Accuracy Simple Weights Intervals, each with a single value High (all edges) Coarse (e.g., pre- defined PEAK) Good (better than speed limits) Advanced Weights Intervals, each with a histogram or a probability density function Low (only ―hot‖ edges) Fine (e.g., 15-min intervals) High level of detail (capture the travel cost distributions)
  • 29. Eco-Weights – Static vs. Dynamic • Static Weights  Obtained based on historical GPS data and remain static  Capture typical traffic patterns • Dynamic Weights  Updated as the real-time GPS data streams in  Capture dynamic and instant traffic patterns • Our solutions Static Dynamic Simple Graph Weight Annotation Spatio-Temporal Hidden Markov Model Advanced Time-Dependent Histograms Spatio-Temporal Hidden Markov Model
  • 30. Assigning Eco-Weights • Static, simple weights, all edges  Challenge: infer weights for ―cold‖ edges – the edges that are not covered by any GPS data  Graph weight annotation  Quantify edge similarities based on traffic flows that are derived from the topology of a road network  Propagate the weights on hot edges to similar cold edges • Static, advanced weights, hot edges  Capturing the weights of hot edges using time dependent histograms  Challenge: compact representation of the histograms • Dynamic, simple and advanced weights, hot edges  Inferring near-future weights of hot edges as GPS data streams in  Challenge: sparseness, dependency, and heterogenity
  • 31. Time-Dependent Histograms • We need to contend with three challenges. • Multiple travel costs  Consider not only GHG emissions, but also distance, travel time. • Time-dependence  Some travel costs are time-dependent.  E.g., travel times on the same road may vary during peak vs. off- peak periods. • Uncertainty  Some travel costs are uncertain.  E.g., aggressive and moderate driving on the same road may produce different GHG emissions.
  • 32. Road Network Model: MTUG • MTUG: Multi-cost, Time-dependent, Uncertain Graph • N different costs are of interest.  Distance (DI), travel time (TT), GHG emissions (GE). • An MTUG is a graph G = (V, E, MM, W)  V is the vertex set, and E is the edge set.  MM = <MM(1), …, MM(N)>  Function MM(i) maps an edge to the minimum and maximum i-th cost of using the edge.  MM(TT) (ei) = (150 seconds, 500 seconds)  MM(GE) (ei) = (10 ml, 85 ml)  W = <W(1), …, W(N)>  Function W(i) maps an edge to a set of (interval, random variable) pairs of the i-th cost type.  W(TT) (ej) = {([0:00, 7:15), N(300, 120)), ([7:15, 8:45), N(450, 100)), …}  W(GE) (ej) = {([0:00, 7:00), N(30, 100)), ([7:00, 9:00), N(50, 80)), …}
  • 33. Instantiation of MM in an MTUG • GPS records are map matched to edges. • Each edge is associated with a set of edge records of the form (e, t, C)  An edge record indicates that a traversal on edge e at time t takes costs C, where C is a vector of all costs of interest.  (e1, 8:08, <55 seconds, 80 ml>)  (e1, 9:08, < 45 seconds , 63 ml>)  (e1, 9:10, < 43 seconds , 60 ml>)  (e1, 10:03, < 45 seconds , 62 ml>) • MM(TT) (e1) = (43 seconds, 55 seconds) • MM(GE) (e1) = (60 ml, 80 ml)
  • 34. Instantiation of W in an MTUG • Partition a day into 96 15-min intervals. • For each edge and interval, we obtain a multi-set containing the costs on the edge during the interval.  ms = {(10 s, 3), (20 s, 10), (25 s, 20), (30 s, 10), (40 s, 5), …} • Estimate a random variable (RV) based on the multi-set. • Combine adjacent intervals with similar RVs and estimate a new RV. The long intervals along with the RVs instantiate W.
  • 35. Route Costs in MTUG • Given a route Ri = <r1, r2, …, rX>, where ri ∈ E is an edge. • RC(Ri, t) indicates the costs of using route Ri at time t  RC(Ri, t) = <RVDI, RVTT, RVGE> is a vector of RVs, where each RV corresponds to a travel cost. • RVDI is deterministic and equals the sum of the length of each edge in route Ri. • RVTT is the convolution of the travel time RV of each edge in route Ri.  The travel time RV of the first edge r1 is dependent on the trip start time t.  The travel time RV of the k-th edge rk is dependent on the travel time of the previous k-1 edges, which are generally uncertain. • RVGE is the convolution of the GHG emissions RV of each edge in route Ri.
  • 36. Roadmap to Achieving Eco-Routing • 1. Understand how to quantify environmental impact  EcoMark  3D Road Networks • 2. Assign Eco-Weights to a Road Network  Simple vs. Advanced Eco-Weights  Static vs. Dynamic Eco-Weights • 3. Novel Routing Algorithms  Skyline Eco-Routing  Continuous Eco-Routing  Personalized Eco-Routing
  • 37. Novel Routing Algorithms • Stochastic skyline route planning under time-varying uncertainty.  Identify Pareto-optimal routes considering multiple travel costs, e.g., travel times, GHG emissions, distances.  R1 (15 min, 500 ml, 25 km)  R2 (13 min, 520 ml, 26 km)  R3 (16 min, 530 ml, 30 km) • Continuous routing that supports real-time updates on eco-weights (i.e., dynamic eco-weights).  Re-navigate vehicles in real time according to recently updated eco-weights and the current locations of vehicles. • Context-aware, personalized routing  Identify context-aware preferences for each driver (e.g., time-efficient driving or fuel-efficient driving).  Choose a single route that best satisfies the driver’s preferences.
  • 38. Deterministic Skyline Routes • Route cost: cost(Ri) = <DI, TT, GE>  A vector of deterministic values.  Each value corresponds to a travel cost. • Dominance relationship  Ri dominates Rj iff all the costs of Ri are no greater than those of Rj, and at least one cost of Ri is smaller than that of Rj. • Consider multiple routes for the same source-destination pair.  R1: 3.5 km, 230 mg, 10 min;  R2: 5.1 km, 250 mg, 11 min;  R3: 5.1 km, 200 mg, 12 min; • The skyline routes are the non-dominated routes.  Since R2 is dominated by R1, R1 and R3 are the skyline routes.
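The dominance test and the resulting skyline for the three example routes can be checked with a short sketch. The function names are illustrative; the cost vectors are (distance in km, GHG in mg, travel time in min) as given on the slide.

```python
def dominates(a, b):
    """a dominates b: no cost of a is greater, and at least one is strictly smaller."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(routes):
    """Return the names of the routes that are not dominated by any other route."""
    return sorted(
        name for name, cost in routes.items()
        if not any(dominates(other, cost)
                   for other_name, other in routes.items() if other_name != name)
    )

routes = {
    "R1": (3.5, 230, 10),
    "R2": (5.1, 250, 11),
    "R3": (5.1, 200, 12),
}
print(skyline(routes))  # ['R1', 'R3']
```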
  • 39. Stochastic Dominance • Route cost: RC(Ri) = <RVDI, RVTT, RVGE>  A vector of random variables (RVs), where each RV represents the distribution of a travel cost. • Stochastic Dominance between two RVs  Given two RVs X and Y, if cdfX(a) >= cdfY(a) for all possible values a in R+, we say “X stochastically dominates Y”.  Cost(R1).RVTT stochastically dominates Cost(R2).RVTT.  No stochastic dominance between Cost(R3).RVTT and Cost(R4).RVTT.
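For discrete cost distributions, the CDF condition on this slide only needs to be checked at the support points, because the CDFs are step functions. The sketch below assumes distributions stored as `{value: probability}` dictionaries; the three example distributions are made up for illustration.

```python
def cdf(dist, a):
    """P(X <= a) for a discrete distribution {value: prob}."""
    return sum(p for v, p in dist.items() if v <= a)

def stochastically_dominates(x, y, eps=1e-12):
    """True if cdf_x(a) >= cdf_y(a) at every support point of x and y."""
    points = sorted(set(x) | set(y))
    return all(cdf(x, a) >= cdf(y, a) - eps for a in points)

# Hypothetical travel-time distributions in minutes.
tt1 = {10: 0.5, 12: 0.5}
tt2 = {11: 0.5, 14: 0.5}
tt3 = {9: 0.5, 15: 0.5}

print(stochastically_dominates(tt1, tt2))  # True: tt1 is at least as likely to be small everywhere
print(stochastically_dominates(tt1, tt3))  # False
print(stochastically_dominates(tt3, tt1))  # False: the CDFs cross, so neither dominates
```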
  • 40. Stochastic Skyline Routes • Dominance between two routes Ri and Rj  If each RV of cost(Ri) stochastically dominates the corresponding RV of cost(Rj), then Ri dominates Rj. • Stochastic Skyline Routes  Given a source-destination pair, and a trip starting time.  Stochastic skyline routes are the routes that are not dominated by any other routes. • Brute force stochastic skyline route planning  Enumerate all possible routes, compute the route costs, and check whether one route dominates another.  Very inefficient and works only for small road networks. • Efficient stochastic skyline route planning  Prune early some routes that cannot be skyline routes.  Efficient stochastic dominance checking.
  • 41. Example Result • Skyline routes R1, R2, and R3, identified by our algorithm • R1: 94,849 m; R2: 106,216 m; R3: 91,382 m;  DI: R3 dominates R1 and R2.  TT: R1 dominates R2 and R3  GE: R2 dominates R1 and R3.
  • 42. Personalized Routing • Different drivers may take different routes because they may have quite different preferences. • The same drivers may take different routes in different contexts.  Morning: try to save time to avoid being late.  Weekend afternoon: try to save fuel consumption. • Challenges  Identify contexts for drivers and identify driving preference in each context.  Deal with time-dependent uncertain travel costs, e.g., travel time and fuel consumption, while considering individual drivers’ driving behaviors, e.g., aggressive vs. moderate driving.
  • 43. Framework • Offline phase: drivers’ trajectories → context and preference identification → contexts & preferences. • Online phase: (driver, source, destination, time) + contexts & preferences → routing module → optimal routes.
  • 44. Example Results  Dark, bold routes: actual routes used by drivers.  Red routes: shortest routes.  Green routes: fastest routes.  Blue routes: predicted routes using the identified contexts and driving preferences.
  • 45. Summary • Eco-routing can effectively reduce the environmental impact of the transportation sector. • Eco-routing is achieved by  Reliably and accurately quantifying vehicles’ environmental impact using GPS data and 3D road networks;  Assigning various types of eco-weights to edges;  Developing novel routing algorithms that enable effective and practical eco-routing services.
  • 47. Outline • Motivation • Solution  Phase A: Filtering  Phase B: Lifting • Experiments • Summary
  • 48. High-Res 3D Model Usage Eco-Routing •USD 6 Billion per annum in Fuel savings with 3D model – TomTom •EcoMark: Benchmark shows better CO2 emissions estimation.
  • 49. The Idea 2D Road Network Aerial Laser Scan 3D Road Network
  • 50. Classifying 3D Models • Need for accurate 3D spatial networks by applications. • Ease of availability of rich terrain datasets. • 3D Road Model  High-Res Model: ± 20 cm accuracy, higher storage cost  Low-Res Model: ± 2 m accuracy, lower storage cost
  • 51. Current 3D Map Generation Methods • Van fitted with Differential GPS driven on roads
  • 52. Current 3D Map Generation Methods • Crowd-sourcing from “Personal Navigation Devices” (PNDs).
  • 53. Solution GOAL: Design two methods to generate High and Low resolution 3D maps.
  • 54. Problem Definition  Input  A 2D spatial network, where an edge is modeled as a 2D polyline  A laser scan point cloud containing a set of 3D points  Output  A 3D spatial network, where an edge is modeled as a 3D polyline 2D Spatial Network + 3D points 3D Spatial Network 3D Points ε-Neighborhood
  • 55. Framework • Input: 2D spatial network, laser scan point cloud. • Filtering: one-pass filtering or external-memory-based filtering. • Lifting: Delaunay triangulation, interpolation. • Output: 3D spatial network.
  • 57. One-Pass Filter (OPF) •3D laser scan points are “inherently sorted spatially”! •Data Guarantee: A 1m x 1m cell contains two 3-D points. •Assumption: The road is much smaller and fits in main memory.  Build a grid-index Hg from the road segments, mapping road points to cell IDs.  Scan the massive 3D point cloud in a single pass and build a 3D point to cell mapping Hp “seeded” by Hg.  If the 3D points in a cell exceed a threshold cell occupancy of α × δ × δ, where α is the user-defined threshold in [0,1] and δ is the grid cell width, lift the cell contents and release the cell memory.
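The streaming idea behind OPF can be sketched as follows. This is a loose illustration under several assumptions that are not in the slides: the occupancy test is reduced to a plain point count `threshold` (a stand-in for the α × δ × δ criterion), "lifting" a cell is simplified to taking the mean elevation of its buffered points, and all names are hypothetical.

```python
from collections import defaultdict

def one_pass_filter(road_points, cloud, delta=1.0, threshold=2):
    """Stream `cloud` once and return {cell: elevation} for road cells only."""
    def cell_id(x, y):
        return (int(x // delta), int(y // delta))

    hg = {cell_id(x, y) for x, y in road_points}  # Hg: grid cells touched by the road
    hp = defaultdict(list)                        # Hp: buffered elevations per road cell
    lifted = {}
    for x, y, z in cloud:                         # single pass over the point cloud
        c = cell_id(x, y)
        if c not in hg or c in lifted:
            continue                              # off-road point, or cell already lifted
        hp[c].append(z)
        if len(hp[c]) >= threshold:               # assumed occupancy test
            lifted[c] = sum(hp[c]) / len(hp[c])   # assumed lifting: mean elevation
            del hp[c]                             # release the cell's memory
    return lifted

road = [(0.5, 0.5), (1.5, 0.5)]
cloud = [(0.2, 0.3, 10.0), (5.0, 5.0, 99.0), (0.8, 0.9, 12.0), (1.1, 0.1, 20.0)]
lifted = one_pass_filter(road, cloud)
print(lifted)  # {(0, 0): 11.0}
```

Only cell (0, 0) reaches the assumed occupancy of two points; the off-road point at (5, 5) is never buffered.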
  • 58. One-Pass Filter (OPF) Advantages of OPF Needs no pre-processing of massive 3D point cloud for sorting or indexing. Builds parts of the 3D road network as 3D points are streaming through. Faster runtimes, lower resolution. (± 2m accuracy)
  • 59. External Memory based Filtering (EMF) • No assumption about road network size. Neither road points nor 3D points fit in memory. Hence, stored on disk. • Limited memory budget available to the method.  Moore Neighborhood: 3,5,8-cell neighborhood. Ex: grey shaded region around cell (1,1).
  • 60. External Memory based Filtering (EMF) Locality Preserving Space-Filling Curves
  • 61. External Memory based Filtering (EMF)  Read road point disk blocks Br from disk into main memory.  For each block Br - read 3D points Moore neighborhood blocks from disk into a limited LRU cache (size given in multiples of the grid width c).  Continue this process in a space- filling curve (Z-order or Row-Major) fashion to avoid re-reading 3D point blocks from disk.
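Two building blocks of this traversal, the Z-order (Morton) index of a grid cell and its Moore neighborhood, can be sketched as below. The function names and the bit width are assumptions for illustration; the LRU cache and disk I/O of EMF are not modeled.

```python
def z_order(col, row, bits=16):
    """Interleave the bits of (col, row) into a Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code

def moore_neighbors(col, row, cols, rows):
    """Neighboring cells of (col, row) inside a cols x rows grid (3, 5, or 8 cells)."""
    return [
        (c, r)
        for c in range(col - 1, col + 2)
        for r in range(row - 1, row + 2)
        if (c, r) != (col, row) and 0 <= c < cols and 0 <= r < rows
    ]

# Visit a 2 x 2 grid in Z-order.
cells = sorted(((c, r) for c in range(2) for r in range(2)), key=lambda cr: z_order(*cr))
print(cells)  # [(0, 0), (1, 0), (0, 1), (1, 1)]
print(len(moore_neighbors(0, 0, 3, 3)),
      len(moore_neighbors(1, 0, 3, 3)),
      len(moore_neighbors(1, 1, 3, 3)))  # 3 5 8
```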
  • 62. Z-Order vs. Row-Major Disk Access
  • 63. Z-Order vs. Row-Major Disk Access  Why is Row-Major doing well at 3c? LRU Cache Block to process Neighbor Block needed in memory Blocks in cache
  • 64. Z-Order vs. Row-Major Disk Access  Why is Row-Major doing well at 3c? LRU Cache (Filled up) Block to process Neighbor Block needed in memory Blocks in cache
  • 65. Z-Order vs. Row-Major Disk Access  Why is Row-Major doing well at 3c?  Answer: Disk Blocks are read into memory exactly once ! LRU Cache (Filled up) Block to process Neighbor Block needed in memory Blocks in cache Blocks evicted
  • 66. External Memory based Filtering (EMF) • Advantages of EMF  Requires little main memory.  Slower runtimes, higher resolution (± 20 cm accuracy).  Can “zoom in” on OPF-created maps to improve resolution on mobile devices.
  • 68. Triangulation • Delaunay Triangulation  After filtering, a set of 3D points is associated with a cell of road segments.  The set of 3D points is triangulated. 3D Points 2D Road Points OPF vs. EMF
  • 69. Interpolation  Project the TIN triangles to the x-y plane.  Compute the intersections of the triangles and the 2D road polyline.  Re-project the road points and intersections back onto the TIN surface.  Interpolate the road points on the TIN to get elevations.
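The last step, reading an elevation off a TIN triangle for a 2D road point, can be illustrated with barycentric interpolation. This is a common way to interpolate on a triangle and is used here as an assumption; the slides do not state which interpolation the authors use.

```python
def interpolate_elevation(p, tri):
    """Elevation at 2D point p = (x, y) inside triangle tri = three (x, y, z) vertices.

    Returns None if p lies outside the triangle's projection onto the x-y plane.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    w2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None
    return w1 * z1 + w2 * z2 + w3 * z3

tri = ((0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0))
print(interpolate_elevation((1.0, 0.0), tri))  # 15.0, halfway along the first triangle edge
print(interpolate_elevation((3.0, 3.0), tri))  # None, outside the triangle
```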
  • 70. Experimental Study Dataset Statistics •Aalborg (AA) : 7km x 5km •North Jutland (NJ) : 185km × 130km •NJ approx. 120 times larger than AA.
  • 72. Experimental Study (Scalability) • North Jutland (NJ) covering a region of 185km × 130km • EMF  Accuracy ±20 cm in 21 hrs with memory use of 230 MB • OPF  Accuracy ±2 m in 2 hrs, which is an order of magnitude faster than EMF, using 2.4 GB of main memory
  • 73. Summary • Augment a 2D road network with elevation values from 3D point cloud • Propose two methods that work with limited memory budgets • Extensive experiments to offer insight into both methods ** 3D Spatial Network available from UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/)
  • 74. EcoMark: Evaluating Models of Vehicular Environmental Impact
  • 75. Outline • Motivation • EcoMark overview • Model analysis • Experiments • Context • Summary
  • 76. Motivation • The reduction of greenhouse gas (GHG) emissions from transportation is essential to combat global climate change.  EU: reduce GHG emissions by 30% by 2020.  G8: a 50% GHG reduction by 2050.  China: a 17% GHG reduction by 2015. • Eco-driving and eco-routing can reduce GHG emissions.  Eco-driving: driving behavior  Eco-routing: routes that minimize fuel usage • We need to be able to measure vehicular environmental impact.  Several impact models have been developed, but a comprehensive comparison of these is missing.  The utility of the models for eco-driving and eco-routing is not well understood.
  • 77. EcoMark • EcoMark aims to enable a better understanding of how to compute the environmental impact of vehicles. • EcoMark is a benchmark that compares and analyzes 11 state-of-the-art vehicular environmental impact models.  No existing such comprehensive benchmark. • Key characteristics of EcoMark  Uses GPS data from vehicles and a 3D spatial network  Categorizes the 11 models into two major categories  Compares and analyzes the models in the same categories, and the models between the two categories
  • 78. EcoMark Framework • Input: GPS observations, 2D spatial network, laser scan points. • Map matching: GPS observations → GPS trajectories. • Spatial network lifting: 2D spatial network + laser scan points → 3D spatial network. • EcoMark: 6 instantaneous models → instantaneous impact; 5 aggregated models → aggregated impact. • Comparison and analysis → insights.
  • 79. 3D Spatial Network • Spatial network lifting  2D spatial network: OpenStreetMap  Aerial laser scan of Denmark (1+ point per m2; 2.5 TB for Denmark) • A 3D spatial network is a directed, weighted graph denoted as G = (V, E, F, H), where  V and E are the vertex and edge sets.  F maps vertices to 3D coordinates.  H maps edges to grades.
  • 80. Benchmark Experiments • Exp1: Eco-driving: Comparisons of instantaneous models  Compare the behaviors of the 6 models.  Identify models that behave similarly. • Exp2: Eco-routing: Comparisons of aggregated models  Compare the behaviors of the 5 models.  Identify models that behave similarly. • Exp3: Cross-comparisons of instantaneous and aggregated models  Identify the similarity of the aggregation of instantaneous models and aggregated models. • Exp4: Study of models that support grade information  How important is 3D – effects of using a 3D spatial network.
  • 81. Insights • Instantaneous models  Five of six models behaved consistently.  They can be used to measure the impact of driving behaviors.  They can be used to classify good and bad driving behaviors. • Aggregated models  All the five models behaved consistently.  Not suitable to identify impact at operational levels.  They can be used to assign eco-weights to road segments, and are helpful for eco-routing. • Importance of 3D Spatial Networks  Road grades affect all models substantially.  Eco-driving and eco-routing both benefit.
  • 82. Summary • EcoMark  Evaluates empirically 11 existing models of vehicular environmental impact using a large trajectory database and a lifted, 3D OpenStreetMap  Findings: Instantaneous models support eco driving, while aggregated models support eco-routing. • Next steps  Accuracy comparison with ground truth CAN bus data (EcoMark 2.0)  Requires access to corresponding GPS and CAN bus data.  Eco driving and eco routing
  • 84. Outline • Motivation • Problem definition • Framework • ERN building • GHG emissions estimation • Empirical study • Summary
  • 85. Motivation – Advanced Eco-Weights • Eco-routing is an effective way to reduce vehicular environmental impact. • The capture of the environmental costs of traversing road network edges is key to eco-routing.  Eco-weights are uncertain.  Eco-weights are time-dependent.  Once obtained, existing routing algorithms can be applied to compute eco-routes.
  • 86. Problem Definition • Goal: Obtain an Eco Road Network G = (V, E, F)  V: Vertex set. Each vertex indicates a road intersection.  E: Edge set. Each edge indicates a road segment.  Function F assigns a time-dependent, uncertain eco-weight to each edge in E. • Input  A set of map-matched trajectories TR.  An accompanying road network G' = (V, E, Null). • Output  The Eco Road Network G = (V, E, F).
  • 87. Approach • Time-dependent uncertain histograms.  A vector of (period, histogram) tuples <Ti, Hi>.  Hi is the histogram describing the distribution of cost values observed in period Ti. • Time-dependent uncertain histograms are used to represent the eco-weight of each road network edge. • Several data compression techniques are applied to reduce the storage space while retaining accuracy. • [Figure: example histograms for the periods [8 a.m., 9 a.m.), [9 a.m., 10 a.m.), and [10 a.m., 11 a.m.)]
  • 88. Framework • Input: trajectories TR and road network G = (V, E, NULL). • Steps: traversal record analysis → initial histogram construction → histogram merging → bucket reduction. • Output: Eco Road Network G = (V, E, F).
  • 89. Traversal Record Analysis • GPS records are map-matched to the corresponding edges. • Map-matched records are transformed into traversal records.  A traversal record r = (e, t, tt, ge) indicates that edge e is traversed by a trajectory trj starting at time t and has travel time tt and GHG emissions ge.  The VT-micro environmental impact model is used to estimate the GHG emissions of each traversal record.
  • 90. Initial Histogram Building • Each edge is associated with a set of traversal records. • Divide the time space into intervals with equal width.  The default value is 1 hour (24 intervals in total). • For each edge e  Build equi-width histograms for each time interval.  The number of buckets per time interval is configurable.  The histograms are isomorphic. • [Figure: equi-width histograms of GHG emissions (mL) with buckets [0, 20), [20, 40), [40, 60), [60, 80] for the intervals [8 a.m., 9 a.m.) and [9 a.m., 10 a.m.); y-axis: probability]
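The construction of one equi-width histogram per edge and one-hour interval can be sketched as follows. The record layout `(edge, seconds since midnight, GHG emissions in mL)`, the value range, and the function name are assumptions for illustration; values are assumed to lie within `[lo, hi]`.

```python
from collections import defaultdict

def build_histograms(records, lo=0.0, hi=80.0, buckets=4, interval_s=3600):
    """Return {(edge, interval_index): [bucket probabilities]}."""
    width = (hi - lo) / buckets
    counts = defaultdict(lambda: [0] * buckets)
    for edge, t, ge in records:
        b = min(int((ge - lo) // width), buckets - 1)  # the last bucket is closed: [60, 80]
        counts[(edge, t // interval_s)][b] += 1
    return {key: [c / sum(v) for c in v] for key, v in counts.items()}

# Hypothetical traversal records.
records = [
    ("e1", 8 * 3600 + 60, 10), ("e1", 8 * 3600 + 900, 30),
    ("e1", 8 * 3600 + 1800, 35), ("e1", 8 * 3600 + 2000, 80),
    ("e1", 9 * 3600, 50),
]
hists = build_histograms(records)
print(hists[("e1", 8)])  # [0.25, 0.5, 0.0, 0.25]
print(hists[("e1", 9)])  # [0.0, 0.0, 1.0, 0.0]
```

Because every histogram uses the same bucket boundaries, the histograms of different intervals can be compared bucket by bucket.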
  • 91. Histogram Merging • For each edge, merge two temporally adjacent histograms if they are sufficiently similar. • Use cosine similarity to quantify similarity, where V(H) is the vector of bucket probabilities of histogram H:  $\mathrm{sim}(H_i, H_j) = \dfrac{V(H_i) \cdot V(H_j)}{\lVert V(H_i) \rVert \, \lVert V(H_j) \rVert}$ • We use a merge threshold Tmerge to decide when to stop merging. • [Figure: two histograms of GHG emissions (mL) for [8 a.m., 9 a.m.) and [9 a.m., 10 a.m.) with buckets [0,20), [20,40), [40,60), [60,80], merged into one histogram for [8 a.m., 10 a.m.); y-axis: probability]
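A minimal sketch of similarity-based merging is shown below. It makes two assumptions that the slides do not spell out: histograms are merged greedily from left to right in a single pass, and a merged histogram is the record-count-weighted average of its parts. The tuple layout `(start_hour, end_hour, record_count, probabilities)` is also hypothetical.

```python
import math

def cosine(h1, h2):
    """Cosine similarity between two bucket-probability vectors."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm = math.sqrt(sum(a * a for a in h1)) * math.sqrt(sum(b * b for b in h2))
    return dot / norm if norm else 0.0

def merge_adjacent(histograms, t_merge=0.95):
    """Greedily merge temporally adjacent histograms with similarity >= t_merge."""
    merged = [histograms[0]]
    for start, end, n, probs in histograms[1:]:
        p_start, _p_end, p_n, p_probs = merged[-1]
        if cosine(p_probs, probs) >= t_merge:
            total = p_n + n
            combined = [(p_n * a + n * b) / total for a, b in zip(p_probs, probs)]
            merged[-1] = (p_start, end, total, combined)
        else:
            merged.append((start, end, n, probs))
    return merged

hists = [
    (8, 9, 10, [0.1, 0.3, 0.5, 0.1]),
    (9, 10, 10, [0.1, 0.3, 0.5, 0.1]),
    (10, 11, 10, [0.7, 0.1, 0.1, 0.1]),
]
result = merge_adjacent(hists)
print([(s, e) for s, e, _n, _p in result])  # [(8, 10), (10, 11)]
```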
  • 92. Histogram Bucket Reduction • Further reduce the storage size of an individual histogram by merging adjacent buckets.  Use the sum of squared errors (SSE) to measure the merge cost (accuracy loss).  Merge buckets when the cost does not exceed threshold Tred.  Iteratively merge adjacent buckets in all the histograms of a road segment. • [Figure: histogram of GHG emissions (mL) for [8 a.m., 10 a.m.) with buckets [0,20), [20,40), [40,60), [60,80], reduced to buckets [0,40), [40,60), [60,80]; y-axis: probability]
  • 93. GHG Emissions Estimation • For a route  Estimate the distribution of GHG emissions as a histogram.  Aggregate the histograms of the edges in the route. • Given two histograms H1 and H2 for adjacent edges  A histogram H′ is computed from H1 and H2 that represents the aggregated GHG emissions distribution for traversing both edges. • [Figure: histograms of GHG emissions (mL) for e1 and e2 with buckets [0,20), [20,40), [40,60), and the aggregated histogram for e1 + e2 with buckets [0,40), [40,80), [80,120); y-axis: probability]
  • 94. Histogram Aggregation • Given buckets bj in H1 and bk in H2 • A bucket bi in H′ is generated as  $H'[i].p = \sum H_1[j].p \times H_2[k].p$, summed over the bucket pairs (j, k) that yield bucket i  $\mathrm{Range}(H'[i]) = \mathrm{Range}(H_1[j]) + \mathrm{Range}(H_2[k])$ • Enumerate all bucket pairs of H1[j] and H2[k]. • Rearrange the buckets in H′ to obtain an equi-width histogram.
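The pairwise enumeration can be sketched as below, assuming histograms are lists of `((lo, hi), probability)` buckets and that the costs on the two edges are independent. The final rearrangement into an equi-width histogram is deliberately left out, since the slides do not say how probability mass is redistributed across overlapping ranges.

```python
from collections import defaultdict

def aggregate(h1, h2):
    """Combine two histograms bucket pair by bucket pair.

    The range of a combined bucket is the sum of the two ranges; its probability
    is the product of the two probabilities, accumulated over equal ranges.
    """
    out = defaultdict(float)
    for (lo1, hi1), p1 in h1:
        for (lo2, hi2), p2 in h2:
            out[(lo1 + lo2, hi1 + hi2)] += p1 * p2
    return sorted(out.items())

# Hypothetical GHG histograms (mL) of two adjacent edges.
h1 = [((0, 20), 0.5), ((20, 40), 0.5)]
h2 = [((0, 20), 0.5), ((20, 40), 0.5)]
h_sum = aggregate(h1, h2)
print(h_sum)  # [((0, 40), 0.25), ((20, 60), 0.5), ((40, 80), 0.25)]
```

The resulting ranges overlap, which is why a rearrangement step is needed before the histogram can be stored in equi-width form.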
  • 95. Experiments • Experimental settings  Data Set: 200 million GPS records.  Road network of Denmark: 414K vertices and 818K edges.  GHG emissions: VT-micro is applied to evaluate vehicular environmental impact. • Evaluation  Baseline: without any compression.  State-of-the-art: T-drive.  Accuracy, storage, and efficiency.
  • 96. Storage Study • Histogram Merging  $MCR_m = \dfrac{M_{init} - M_{merge}}{M_{init}}$  [Figure: MCR vs. merge threshold] • Bucket Reduction  $MCR_r = \dfrac{M_{merge} - M_{redu}}{M_{merge}}$  [Figure: MCR vs. bucket limit for Tmerge = 0.98 and Tmerge = 0.95]
  • 97. Accuracy Study • Error Metric – Err  The relative accumulative deviation of all the buckets in a histogram from the original distribution. • Histogram Aggregation  [Figure: HSimilarity vs. number of edges] • MCR vs. Err  [Figure: MCR vs. Err for ERN and T-drive] • Histogram Merging  [Figure: Err vs. merge threshold, after merging vs. equi-width] • Bucket Reduction  [Figure: Err vs. bucket limit for Tmerge = 0.98 and Tmerge = 0.95]
  • 98. Summary • We propose a histogram-based technique to estimate time-dependent environmental travel costs in a road network. • An empirical study suggests that we can efficiently compute eco-weights of road segments while retaining accuracy. • The proposed eco-weights can be used for eco-routing and real-time traffic monitoring.
  • 99. Using Incomplete Information for Complete Weight Annotation of Road Networks
  • 100. Overview • Motivation • Problem definition • Cost modeling • Objective functions  Residual Sum of Squares  Topology constraint  Adjacency constraint • Experiments
  • 101. Motivation • Why weight annotation?  Vehicle routing relies on a weighted-graph representation of the underlying road network.  Given a graph with appropriate weights, different types of routes can be computed. • Why incomplete information?  Weights that capture travel costs such as travel times can be extracted from GPS trajectory data.  GPS trajectory data typically does not cover all edges. • Why complete weight annotation?  To use a graph for routing, all edges must have weights.
  • 102. Problem Definition GPS Records Pre-Processing Module A set of (trip, cost) pairs {(t(i), c(i))} G''(V, E, null) Weight Annotation Module G(V, E, H)
  • 103. Temporal Road Network Cost Modeling  A road network is modeled as a directed graph.  Temporal tags: during the same temporal tag, the traffic has similar behavior, e.g., PEAK and OFFPEAK.  Each edge has 2 cost values, which indicate the cost per unit length for traversing the corresponding road segment during PEAK and OFFPEAK, respectively.  d(AB, PEAK) indicates the cost per unit length for traversing the road segment AB during the PEAK tag.  d: a vector of size |E| * |TAGS| (e.g., 6 * 2 = 12)  [Figure: example road network with vertices A–E and edges AB, BA, BC, CB, BD, CE; each edge e carries the cost variables d(e, PEAK) and d(e, OFFPEAK)]
  • 104. Trip Cost Modeling • GPS trajectory  A sequence of records of the form (x,y,t). • Trip  A sequence of records of the form (edge, ts, te). • The cost of a trip is the sum of the cost of each record.  E.g., t1= <AB, 7:55, 8:00>, <BC, 8:00, 8:10>  cost_model(t1) = d(AB, PEAK) * length(AB) + d(BC, OFFPEAK) * length(BC) • If we know all cost variables in d, we can compute the estimated cost of any trip.
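The trip cost model and the squared residual used later for training can be written down directly. In the sketch below, the PEAK window [7:00, 8:00) is an assumption chosen so that the slide's example tags come out as shown (AB at 7:55 is PEAK, BC at 8:00 is OFFPEAK); the cost values, edge lengths, and actual cost are hypothetical.

```python
PEAK_START, PEAK_END = 7 * 60, 8 * 60  # assumed PEAK window [7:00, 8:00), in minutes

def tag(minute_of_day):
    """Temporal tag of a traversal that starts at the given minute of the day."""
    return "PEAK" if PEAK_START <= minute_of_day < PEAK_END else "OFFPEAK"

def cost_model(trip, d, length):
    """Estimated trip cost; trip is a list of (edge, ts, te) with times in minutes."""
    return sum(d[(edge, tag(ts))] * length[edge] for edge, ts, _te in trip)

def rss(trips_with_costs, d, length):
    """Residual sum of squares over (trip, actual_cost) pairs."""
    return sum((actual - cost_model(trip, d, length)) ** 2
               for trip, actual in trips_with_costs)

t1 = [("AB", 7 * 60 + 55, 8 * 60), ("BC", 8 * 60, 8 * 60 + 10)]
d = {("AB", "PEAK"): 0.5, ("BC", "OFFPEAK"): 0.25}  # hypothetical cost per unit length
length = {"AB": 1000, "BC": 2000}                   # hypothetical edge lengths

print(cost_model(t1, d, length))   # 1000.0
print(rss([(t1, 1100)], d, length))  # 10000.0
```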
  • 105. Objective Functions: Residual Sum of Squares • For every trip t in T’  The square of the difference between the actual cost of t and the estimated cost using the proposed model, i.e., cost_model(t).  E.g., t1= <AB, 7:55, 8:00>, <BC, 8:00, 8:10>, actual_cost(t1) = 100.  RSS(t1) = (100 – cost_model(t1))²  Problem: Training data T’ may not cover the whole road network. By minimizing the RSS function, we can only get cost variables for the road segments traversed by T’. • [Figure: example road network with vertices A–E and the PEAK/OFFPEAK cost variables of its edges; trip t1 covers only d(AB, PEAK) and d(BC, OFFPEAK)]
  • 106. Objective Function: Topological Constraint • If two road segments have similar topological positions in a road network, they tend to have similar costs during the same tag. • Given a pair of road segments ei and ej  The weighted square loss between the cost variables of ei and ej.  weight_TC(ei, ej) * (d(ei, PEAK) – d(ej, PEAK))²  If weight_TC(ei, ej) is large, d(ei, tag) and d(ej, tag) tend to be similar in order to make (d(ei, PEAK) – d(ej, PEAK))² small. • [Figure: example road network with vertices A–E and the PEAK/OFFPEAK cost variables of its edges]
  • 107. Weighted PageRank Computation • PageRank gives a value to each vertex in a graph. • Primal graph -> dual graph  Vertex -> Edge  Edge -> Vertex • [Figure: primal graph with vertices A, B, C, D, E and its dual graph with vertices BA, CB, BD, AB, BC, CE]
  • 108. Weighted PageRank Computation • Original PageRank  A vertex evenly gives its PR value to all its adjacent vertices.  Vertex AB gives ½ * PR(AB) to BC and BD, respectively. • Weighted PageRank  $\mathrm{weight}(AB, BD) = \dfrac{\mathrm{TripNum}(AB, BD) + 1}{\mathrm{TripNum}(AB, BD) + \mathrm{TripNum}(AB, BC) + |\mathrm{degree}(AB)|}$  100 trips from AB to BC, 50 trips from AB to BD  weight(AB, BD) = (50 + 1) / (50 + 100 + 2) = 51/152  weight(AB, BC) = (100 + 1) / (50 + 100 + 2) = 101/152  No trip information  weight(AB, BD) = weight(AB, BC) = 1/2 • [Figure: dual graph with vertices BA, CB, BD, AB, BC, CE]
  • 109. Weighted PageRank based Topological Constraint • Use weighted PageRank values to quantify the topological similarity.  If two edges have similar weighted PageRank values, they also tend to have similar costs.  $\mathrm{weight\_TC}(e_i, e_j) = \dfrac{\min(\mathrm{PageRank}(e_i), \mathrm{PageRank}(e_j))}{\max(\mathrm{PageRank}(e_i), \mathrm{PageRank}(e_j))}$
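The smoothed transition weights and the PageRank-ratio similarity from the two preceding slides can be reproduced with exact fractions. The function names are illustrative; the trip counts are the ones from the slide, and the PageRank values in the last line are made up.

```python
from fractions import Fraction

def transition_weights(edge, successors, trip_num):
    """Add-one smoothed share of trips from `edge` to each successor edge."""
    counts = {s: trip_num.get((edge, s), 0) for s in successors}
    denom = sum(counts.values()) + len(successors)  # len(successors) = |degree(edge)|
    return {s: Fraction(c + 1, denom) for s, c in counts.items()}

def weight_tc(pr_i, pr_j):
    """Topological similarity of two edges from their weighted PageRank values."""
    return min(pr_i, pr_j) / max(pr_i, pr_j)

trips = {("AB", "BC"): 100, ("AB", "BD"): 50}
w = transition_weights("AB", ["BC", "BD"], trips)
print(w["BD"], w["BC"])                               # 51/152 101/152
print(transition_weights("AB", ["BC", "BD"], {}))     # both weights are 1/2 without trips
print(weight_tc(0.25, 0.5))                           # 0.5
```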
  • 110. Objective Function: Adjacency Constraint  If two road segments are directionally adjacent in a road network, they tend to have similar costs during the same tag.  d(AB, PEAK) may be similar to d(BC, PEAK),  but not necessarily to d(CB, PEAK) or d(BC, OFFPEAK).  Given a pair of directionally adjacent road segments ei and ej  The weighted square loss between the cost variables of ei and ej.  weight(ei, ej) * (d(ei, PEAK) – d(ej, PEAK))²  [Figure: example road network with vertices A–E and the PEAK/OFFPEAK cost variables of its edges]
  • 111. Objective Functions • Residual Sum of Squares  $RSS(d) = \sum_{t \in T'} (\mathrm{actual\_cost}(t) - \mathrm{cost\_model}(t))^2$ • Topological Constraint  $PRTC(d) = \sum_{i,j} \mathrm{weight\_TC}(e_i, e_j) \cdot (d(e_i, tag) - d(e_j, tag))^2$ • Adjacency Constraint  $ACTC(d) = \sum_{i,j} \mathrm{weight}(e_i, e_j) \cdot (d(e_i, tag) - d(e_j, tag))^2$ • Overall objective function  The sum of the above three (plus a regularizer term) • Differentiating the objective function w.r.t. vector d and setting it to 0 yields an equation that can be solved using standard techniques to obtain a solution for d.
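To see how the constraint terms give weights to edges that no trip covers, the combined objective can be minimized numerically on a toy instance. Note that the sketch below swaps in plain gradient descent for the closed-form solve mentioned on the slide, merges the two pairwise constraints into one generic list of weighted pairs, and uses hypothetical names and data throughout.

```python
def minimize(num_vars, trips, pairs, lam=0.0, lr=0.1, steps=500):
    """Gradient descent on RSS + weighted pairwise penalties + lam * ||d||^2.

    trips: list of ({var_index: edge_length}, actual_cost)
    pairs: list of (i, j, weight) penalizing weight * (d[i] - d[j])^2
    """
    d = [0.0] * num_vars
    for _ in range(steps):
        grad = [2 * lam * v for v in d]                  # regularizer term
        for lengths, actual in trips:
            residual = actual - sum(l * d[i] for i, l in lengths.items())
            for i, l in lengths.items():
                grad[i] -= 2 * residual * l              # RSS term
        for i, j, w in pairs:
            diff = d[i] - d[j]
            grad[i] += 2 * w * diff                      # pairwise constraint term
            grad[j] -= 2 * w * diff
        d = [v - lr * g for v, g in zip(d, grad)]
    return d

# Variable 0 is covered by one trip (unit length, actual cost 10); variable 1 is not
# covered by any trip but is tied to variable 0 by a constraint of weight 1.
d = minimize(2, trips=[({0: 1.0}, 10.0)], pairs=[(0, 1, 1.0)])
print([round(v, 6) for v in d])  # [10.0, 10.0]
```

With RSS alone, variable 1 would stay at its initial value; the pairwise term pulls it toward the cost learned for the covered edge.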
  • 112. Experimental Results • Two weeks of GPS trajectories from SPF • 18K trips in total; 20% used for training and 80% for testing • Two road networks: Skagen and North Jutland • Objective functions used  F1: RSS; F2: RSS + PRTC; F3: RSS + ACTC  F4: RSS + PRTC + ACTC
  • 113. Travel Cost Inference from Sparse, Spatio-Temporally Correlated Time Series Using Markov Models
  • 114. Outline • Motivation • Travel-cost time series • Modeling travel-cost time series in a road network • Learning a spatio-temporal hidden Markov model • Empirical study • Summary
  • 115. Motivation • Vehicle routing plays an important role in transportation.  E.g., fastest routes, eco-routes. • Effective vehicle routing relies on a weighted graph representation of a road network.  The weight of an edge should accurately capture the travel cost of traversing the edge.  E.g., travel times, greenhouse gas (GHG) emissions. • Edge weights should be dynamically changing due to the dynamics of traffic. To support dynamic edge weights, we propose techniques that are able to infer near-future travel costs based on incoming GPS data.
  • 116. Travel-Cost Time Series • The travel costs on edge ei are modeled as a time series TSi = <Ci (1), Ci (2), …, Ci (T)>. • Travel cost set Ci (j) contains travel costs observed in the j-th time interval on edge ei.  An interval is the finest time interval of interest.  E.g., 15 minutes, yielding 96 intervals per day.  A travel cost is the travel time or GHG emissions of a traversal of edge ei during the j-th interval.  E.g., during the j-th interval [8:00, 8:15), if 3 vehicles traversed ei taking 30, 32, 40 s, Ci (j) = {30, 32, 40}. e1 e2 e3 TS1 TS2 TS3
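Building the time series for one edge amounts to bucketing observed costs by interval. The sketch below assumes traversals are given as `(minute of day, cost)` pairs for a single edge and a single day; the first three observations mirror the slide's example for the interval [8:00, 8:15), and the fourth is made up.

```python
def build_time_series(traversals, interval_min=15):
    """Return a list of travel-cost sets, one per interval of the day."""
    n_intervals = 24 * 60 // interval_min
    series = [[] for _ in range(n_intervals)]
    for minute_of_day, cost in traversals:
        series[minute_of_day // interval_min].append(cost)
    return series

traversals = [(8 * 60 + 1, 30), (8 * 60 + 7, 32), (8 * 60 + 14, 40), (8 * 60 + 15, 35)]
ts = build_time_series(traversals)

print(len(ts))   # 96
print(ts[32])    # [30, 32, 40], the interval [8:00, 8:15)
print(sum(1 for c in ts if not c))  # 94 intervals without observations (sparseness)
```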
  • 117. Time Series Characteristics • Dependent  Travel costs on adjacent edges may be dependent. • Heterogeneous  The traffic on different edges is often different.  Some edges have clear peak/off-peak periods while others do not. • Sparse  Some intervals do not have any travel cost observations.
  • 118. Modeling Travel-Cost Time Series • We use a Spatio-Temporal Hidden Markov Model (STHMM) to model the interactions among multiple travel-cost time series in a road network. • An edge ei has a set of hidden states Si, where each state indicates a particular traffic status on the edge, e.g., S1 = {cong, clr}. • During an interval, the traffic of an edge is in one state. • The traffic on an edge evolves over time, transiting from one state to another according to transition probabilities (TP). • In an interval, the travel costs observed on an edge depend on the state of the edge according to output probabilities (OP). • [Figure: state transitions on edge e1 over the intervals t-1 (previous), t (current), and t+1 (next), with states cong, cong, clr linked by TP and emitting the travel-cost sets of time series TS1 via OP]
  • 119. Modeling Multiple Time Series using STHMM • Capture the traffic dependency among edges.  TP: $P(S_1^{t} \mid S_1^{t-1}, S_2^{t-1})$  TP: $P(S_2^{t} \mid S_1^{t-1}, S_2^{t-1}, S_3^{t-1})$  TP: $P(S_3^{t} \mid S_2^{t-1}, S_3^{t-1})$ • [Figure: consecutive edges e1, e2, e3 with example states over the intervals t-1 (previous), t (current), t+1 (next): s1 = cong, cong, clr; s2 = clr, clr, cong; s3 = cong, clr, clr]
  • 120. Framework Overview • Historical GPS data → travel-cost time series → state formulation & parameter learning → spatio-temporal hidden Markov model. • Real-time GPS data + spatio-temporal hidden Markov model → near-future edge weights → routing algorithms → eco-routes & fastest routes.
  • 121. State Formulation – Intuitions • Since edges are heterogeneous, we need to identify a unique state set for each edge.  Edges with heavy and complicated traffic: more states  Edges with light and simple traffic: fewer states • Target properties of the state set of an edge  The traffic in the same state behaves similarly.  The current state depends on its previous state.
  • 122. State Formulation – Methods • For each edge, identify a Gaussian Mixture Model (GMM) with K Gaussian components to capture the distribution of all the travel costs on the edge.  Due to heterogeneity, different edges may have different Ks. • Transform the travel costs in each interval in the edge’s time series into a K+1 dimensional point.  The first K dimensions correspond to the travel cost distribution in the interval, still using the K Gaussian components.  The K+1st dimension captures the difference between the travel cost distributions in the interval and its previous interval. • Cluster these K+1 dimensional points into several clusters, and each cluster is a state.
  • 123. Parameter Learning • Output probabilities  Since each state is a cluster, we use the cluster center to derive a GMM as the output probability for the state. • Transition probabilities  For each edge, we need the edge’s travel-cost time series, along with its adjacent edges’ travel-cost time series.  Based on the obtained output probabilities, transition probabilities are then estimated using maximum likelihood estimation.
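  • 124. Travel Cost Inference based on an STHMM • Current interval t: real-time GPS data on e1, e2, and e3 are combined with the output probabilities (OP) using Bayes' rule to estimate the states s1(t), s2(t), and s3(t). • Next interval t+1: the transition probabilities (TP) applied to the estimated current states yield the estimated state s2(t+1). • The output probabilities (OP) of the estimated state yield the estimated travel costs.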
  • 125. Empirical Settings • Data  181 million GPS records from North Jutland, Denmark from April 2007 to March 2008.  The data is split into training and testing sets. • Periods of interest  Weekdays during [6:00, 20:00). • Road network  1,916 edges in North Jutland with more than 500 records. • Travel costs  Travel time and GHG emissions. • Accuracy measurements  Average sum of squared loss (ASSL) between estimated costs and real costs.
  • 126. Experimental Study • Aspects studied  Effect of the length of interval.  Effect of the sizes of training data  Effect of recent data  Effect of seasonality • Results  Efficient; able to support real-time weight updates  Accurate; outperforms a baseline and the state-of-the-art
  • 127. Summary • Described an STHMM that is able to model multiple correlated, sparse time series.  Dependent: STHMM couples the states from adjacent edges.  Heterogeneous: State formulation identifies a unique state set for each edge.  Sparse: Smoothing techniques are used to contend with sparse data. • The described framework is able to support real-time weight updates and thus provide more accurate routing services, e.g., eco-routing.
  • 128. Stochastic Skyline Route Planning Under Time-Varying Uncertainty
  • 129. Outline • Motivation and challenges • Road network modeling • Stochastic skyline route planning • Empirical studies • Summary
  • 130. Motivation • Reduction in greenhouse gas (GHG) emissions is important to society as a whole.  EU: reduce GHG emissions by 30% by 2020.  G8: a 50% GHG reduction by 2050.  China: a 17% GHG reduction by 2015. • Eco-routing is a simple yet effective way to reduce emissions from the transportation sector.  Of interest to both fleet owners and individual drivers.  For example, FlexDanmark is interested in using the most eco-friendly routes while considering travel times and distances. • We aim to provide useful eco-routing services.
  • 131. Challenges • To provide practically useful eco-routing, we need to contend with three challenges. • Multiple travel costs  Consider not only GHG emissions, but also distances and travel times. • Time dependence  Some travel costs are time-dependent.  For example, travel times on the same road may vary during peak vs. off-peak periods. • Uncertainty  Some travel costs are uncertain.  For example, aggressive and moderate driving on the same road may produce different GHG emissions.
  • 132. A novel road network model - MTUG • MTUG is a Multi-cost, Time-dependent, Uncertain Graph. • Assume N different costs are of interest in total.  Distance (DI), travel time (TT), GHG emissions (GE). • G=(V, E, MM, W)  V is the vertex set, and E is the edge set.  MM = <MM(1), … , MM(N)>  Function MM(i) maps an edge to the minimum and maximum i-th cost of using the edge.  MM(TT) (ea) = (150 seconds, 500 seconds)  MM(GE) (eb) = (10 ml, 85 ml)  W = <W(1), … , W(N)>  Function W(i) maps an edge to a set of (interval, random variable) pairs of the i-th cost type.  W(TT) (ea) = {([0:00, 7:15), N(300, 120)), ([7:15, 8:45), N(450, 100)), …}  W(GE) (eb) = {([0:00, 7:00), N(30, 100)), ([7:00, 9:00), N(50, 80)), …}
  • 133. Instantiation of MM in an MTUG • MM and W are instantiated using GPS records. • GPS records are map matched to edges. • Each edge is associated with a set of edge records in the form of (e, t, C).  An edge record indicates that a traversal on edge e at time t takes costs C, where C is a vector of all costs of interest.  (e1, 8:08, <55 seconds, 80 ml>)  (e1, 9:18, < 45 seconds , 63 ml>)  (e1, 10:10, < 43 seconds , 60 ml>)  (e1, 21:03, < 45 seconds , 62 ml>) • Based on the edge records on an edge, functions MM on the edge can be instantiated.  MM(TT) (e1) = (43 seconds, 55 seconds)  MM(GE) (e1) = (60 ml, 80 ml)
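The instantiation of MM from edge records can be sketched as follows. This is a minimal illustration, assuming each edge record is a tuple `(edge, time, costs)` with costs ordered as (travel time in seconds, GHG emissions in ml); the function name `instantiate_mm` is hypothetical and not taken from the slides.

```python
from collections import defaultdict

def instantiate_mm(edge_records):
    """Return {edge: [(min, max) for each cost type]} from (edge, time, costs) records."""
    per_edge = defaultdict(list)
    for edge, _time, costs in edge_records:
        per_edge[edge].append(costs)
    mm = {}
    for edge, cost_vectors in per_edge.items():
        # zip(*...) groups the i-th cost of every record on the edge together
        mm[edge] = [(min(col), max(col)) for col in zip(*cost_vectors)]
    return mm

# The four edge records for e1 from the slide: (seconds, ml)
records = [
    ("e1", "8:08", (55, 80)),
    ("e1", "9:18", (45, 63)),
    ("e1", "10:10", (43, 60)),
    ("e1", "21:03", (45, 62)),
]
mm = instantiate_mm(records)
print(mm["e1"])  # [(43, 55), (60, 80)]
```

The result reproduces MM(TT)(e1) = (43 seconds, 55 seconds) and MM(GE)(e1) = (60 ml, 80 ml).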
  • 134. Instantiation of W in an MTUG • Partition a day into 96 15-min intervals. • For each (edge, interval) pair, we obtain a multi-set containing the costs on the edge during the interval.  ms={(10 s, 3), (20 s, 10), (25 s, 20), (30 s, 10), (40 s, 5), …} • Estimate a random variable (RV) based on the multi-set.  Use a Gaussian Mixture Model (GMM) to represent an RV.  GMMs can approximate arbitrary distributions.  A GMM is a weighted sum of K Gaussian distributions.  $GMM(x) = \sum_{k=1}^{K} m_k \cdot N(x \mid \mu_k, \delta_k^2)$
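A small sketch of the two building blocks on this slide: bucketing observations into 15-minute intervals and evaluating a GMM density. The record layout `(edge, "H:MM", cost)` and all function names are assumptions made for illustration; fitting the mixture parameters (for example with EM) is not shown.

```python
import math
from collections import Counter, defaultdict

def interval_index(hhmm, minutes_per_interval=15):
    """Map a time of day such as '7:15' to one of the 96 intervals (0..95)."""
    h, m = map(int, hhmm.split(":"))
    return (h * 60 + m) // minutes_per_interval

def cost_multisets(edge_records):
    """Map (edge, interval) -> Counter of observed costs, i.e., a multi-set."""
    multisets = defaultdict(Counter)
    for edge, hhmm, cost in edge_records:
        multisets[(edge, interval_index(hhmm))][cost] += 1
    return multisets

def gmm_pdf(x, components):
    """Evaluate sum_k m_k * N(x | mu_k, delta_k^2).

    components: list of (weight m_k, mean mu_k, standard deviation delta_k).
    """
    return sum(
        m * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        for m, mu, sd in components
    )

ms = cost_multisets([("e1", "7:16", 20), ("e1", "7:20", 20), ("e1", "7:31", 25)])
print(ms[("e1", 29)])  # Counter({20: 2})
```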
  • 135. Instantiation of W in an MTUG (cont.) • If two RVs in two adjacent intervals are similar, we combine the two intervals into a long interval.  Use KL-divergence to measure the similarity between two RVs. • Re-estimate a new RV for the long interval using the costs in the long interval. • The whole procedure works iteratively until no RVs from consecutive intervals are similar enough to be combined. • The long intervals along with their RVs instantiate W.
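The iterative merging can be sketched as below. To keep the example short, each interval's RV is approximated by a single Gaussian (a stand-in for the GMMs on the slides), similarity is a symmetrized closed-form KL-divergence, and the threshold value is hypothetical. Every interval is assumed to contain at least one observation.

```python
import math
from statistics import fmean, pvariance

def fit_gaussian(costs):
    """Return (mean, variance); the variance is floored so KL stays finite."""
    return fmean(costs), max(pvariance(costs), 1e-6)

def kl_gaussian(p, q):
    """KL(p || q) for two univariate Gaussians given as (mean, variance)."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def merge_intervals(interval_costs, threshold=0.1):
    """interval_costs: list of cost lists, one per consecutive interval.

    Returns a list of ((first_index, last_index), (mean, variance)).
    """
    segments = [((i, i), list(c)) for i, c in enumerate(interval_costs)]
    merged = True
    while merged:  # repeat until no adjacent pair is similar enough
        merged = False
        for i in range(len(segments) - 1):
            (a_span, a), (b_span, b) = segments[i], segments[i + 1]
            pa, pb = fit_gaussian(a), fit_gaussian(b)
            if kl_gaussian(pa, pb) + kl_gaussian(pb, pa) < threshold:
                # combine into one long interval; its RV is re-estimated later
                segments[i:i + 2] = [((a_span[0], b_span[1]), a + b)]
                merged = True
                break
    return [(span, fit_gaussian(costs)) for span, costs in segments]

result = merge_intervals([[10, 12, 11], [11, 10, 12], [30, 31, 29]])
print([span for span, _ in result])  # [(0, 1), (2, 2)]
```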
  • 136. Route Costs in MTUG • Given a route Ri=<r1, r2, …, rX>, where each rk ∈ E is an edge. • RC(Ri, t) indicates the costs of using route Ri at time t  RC(Ri, t) = <RVDI, RVTT, RVGE> is a vector of RVs, and each RV corresponds to a travel cost. • RVDI is a deterministic value that equals the sum of the lengths of the edges in route Ri. • RVTT is the convolution of the travel time RVs of the edges in route Ri.  The travel time RV of the first edge r1 depends on the trip start time t.  The travel time RV of the k-th edge rk depends on the travel time of the previous k-1 edges, which may be uncertain. • RVGE is the convolution of the GHG emission RVs of the edges in route Ri.
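The time-dependent convolution for RVTT can be illustrated with discrete distributions. This sketch assumes each RV is a dictionary `{travel_time: probability}` rather than a GMM, and that a hypothetical callback `weight_at(edge, t)` returns the RV that applies when the edge is entered at time `t`.

```python
from collections import defaultdict

def route_travel_time(route, start, weight_at):
    """Return {total_travel_time: probability} for traversing `route` from time `start`.

    route: list of edges; start: departure time in seconds;
    weight_at(edge, t): dict {travel_time: probability} for entering `edge` at time t.
    """
    total = {0: 1.0}
    for edge in route:
        nxt = defaultdict(float)
        for elapsed, p in total.items():
            # the arrival time at this edge depends on the uncertain time spent so far
            for tt, q in weight_at(edge, start + elapsed).items():
                nxt[elapsed + tt] += p * q
        total = dict(nxt)
    return total

def weight_at(edge, t):
    # Hypothetical weights: edge "a" is time-independent, edge "b" slows down after t = 100
    if edge == "a":
        return {60: 0.5, 120: 0.5}
    return {30: 1.0} if t < 100 else {90: 1.0}

print(route_travel_time(["a", "b"], 0, weight_at))  # {90: 0.5, 210: 0.5}
```

Because the RV chosen for edge "b" depends on when edge "a" is left, the result differs from a plain convolution of two fixed distributions.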
  • 137. Stochastic Dominance • Stochastic dominance between two RVs  Given two RVs X and Y, if cdfX(a) >= cdfY(a) for all possible values a in R+, we say "X stochastically dominates Y".  RC(R1, t).RVTT stochastically dominates RC(R2, t).RVTT.  No stochastic dominance between RC(R3, t).RVTT and RC(R4, t).RVTT.
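A direct check of this definition can be sketched for discrete RVs, assumed here to be dictionaries `{cost: probability}`. Since a discrete CDF only changes at support points, comparing the two CDFs at the union of support points suffices. The small tolerance `eps` is an assumption to absorb floating-point error.

```python
def cdf(dist, a):
    """P(X <= a) for a discrete distribution {value: probability}."""
    return sum(p for v, p in dist.items() if v <= a)

def stochastically_dominates(x, y, eps=1e-12):
    """True if cdf_X(a) >= cdf_Y(a) at every point where either CDF can change."""
    points = sorted(set(x) | set(y))
    return all(cdf(x, a) + eps >= cdf(y, a) for a in points)

fast = {10: 0.5, 20: 0.5}
slow = {15: 0.5, 30: 0.5}
risky = {10: 0.5, 40: 0.5}
steady = {20: 1.0}

print(stochastically_dominates(fast, slow))     # True
print(stochastically_dominates(slow, fast))     # False
print(stochastically_dominates(risky, steady))  # False
print(stochastically_dominates(steady, risky))  # False
```

The last two lines show a pair with crossing CDFs, where neither RV dominates the other.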
  • 138. Stochastic Skyline Routes • Route Ri dominates route Rj iff  No RV of RC(Rj, t) stochastically dominates the corresponding RV of RC(Ri, t).  At least one RV of RC(Ri, t) stochastically dominates the corresponding RV of RC(Rj, t). • Stochastic skyline route  Given a source-destination pair, and a trip starting time.  Stochastic skyline routes are the routes that are not dominated by any other routes.
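The route dominance rule and the skyline can be sketched as follows, following the wording of the slide literally. RVs are assumed to be discrete dictionaries `{cost: probability}`, each route's costs are a list of RVs in a fixed order (here distance, travel time), and all names are illustrative.

```python
def cdf(dist, a):
    return sum(p for v, p in dist.items() if v <= a)

def sd(x, y, eps=1e-12):
    """X stochastically dominates Y: cdf_X(a) >= cdf_Y(a) at all support points."""
    return all(cdf(x, a) + eps >= cdf(y, a) for a in sorted(set(x) | set(y)))

def route_dominates(rc_i, rc_j):
    """Ri dominates Rj: no RV of Rj dominates Ri's, and some RV of Ri dominates Rj's."""
    no_reverse = not any(sd(b, a) for a, b in zip(rc_i, rc_j))
    some_forward = any(sd(a, b) for a, b in zip(rc_i, rc_j))
    return no_reverse and some_forward

def skyline(route_costs):
    """Names of the routes that are not dominated by any other route."""
    return [
        name for name, rc in route_costs.items()
        if not any(route_dominates(other, rc)
                   for o, other in route_costs.items() if o != name)
    ]

routes = {
    "R1": [{5: 1.0}, {10: 0.5, 20: 0.5}],
    "R2": [{6: 1.0}, {15: 0.5, 30: 0.5}],
    "R3": [{4: 1.0}, {25: 1.0}],
}
print(skyline(routes))  # ['R1', 'R3']
```

Note that under the non-strict definition of stochastic dominance, two routes with an identical RV for some cost (for example, equal distance) never dominate each other in this literal reading; a production implementation may want to treat such ties separately.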
  • 140. Stochastic Skyline Route Planning • A brute force method  Enumerate all possible routes, compute the route costs, and check whether one route dominates another  Very inefficient, and only works for small road networks • An efficient method  Prune some routes that cannot become skyline routes early.  Efficient stochastic dominance checking
  • 141. Early Pruning Strategy • Do the following for all travel cost types of interest.  We use travel time as an example.  We maintain a graph where each edge is associated with the minimum travel time, which is recorded in MM.  From the destination, run Dijkstra's algorithm on the graph.  Each vertex is then associated with its minimum travel time to the destination, which yields the "fastest" possible route from the source to the destination.  Compute the route cost of the "fastest" possible route and add the route as a candidate skyline route. [Figure: each vertex vx is annotated with the shortest distance, least travel time, and least GHG emissions to the destination vd; candidate skyline routes 1-3 run from vs to vd via v1, v2, v3.]
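The backward search can be sketched with a standard Dijkstra run over reversed edges. The graph representation `{(u, v): minimum cost}` and the example weights are assumptions for illustration; in the MTUG setting the edge weights would be the minimum values recorded in MM for one cost type.

```python
import heapq

def min_cost_to_destination(edges, dest):
    """edges: {(u, v): minimum cost of edge u -> v}.

    Returns {vertex: least possible cost from that vertex to dest}.
    """
    incoming = {}
    for (u, v), c in edges.items():
        incoming.setdefault(v, []).append((u, c))
    best = {dest: 0}
    heap = [(0, dest)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > best.get(v, float("inf")):
            continue  # stale heap entry
        for u, c in incoming.get(v, []):
            nd = d + c
            if nd < best.get(u, float("inf")):
                best[u] = nd
                heapq.heappush(heap, (nd, u))
    return best

# Hypothetical minimum travel times in seconds
edges = {
    ("vs", "v1"): 150, ("v1", "vd"): 100,
    ("vs", "v2"): 50, ("v2", "vd"): 300, ("v2", "v1"): 60,
}
print(min_cost_to_destination(edges, "vd"))
# {'vd': 0, 'v1': 100, 'v2': 160, 'vs': 210}
```

Running this once per cost type gives the per-vertex lower bounds that the pruning step relies on.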
  • 142. Early Pruning Strategy (cont.) • Explore routes from the source, until no more routes can be explored.  Estimate the least possible travel costs for a partially explored route R'.  If the partially explored route with its estimated least possible costs is dominated by an existing candidate skyline route, there is no need to explore the route any further.  Otherwise, continue exploring.  Update the candidate skyline routes if necessary. [Figure: a partially explored route R' from vs, shown together with candidate skyline routes 1-4 between vs and vd.]
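The pruning idea can be illustrated in a simplified, deterministic setting. This sketch swaps stochastic dominance for plain Pareto dominance on fixed cost vectors, which is an assumption made only to keep the example short; the structure (optimistic estimate from lower bounds, prune if dominated, otherwise extend) mirrors the steps above. All names and weights are hypothetical.

```python
def dominates(a, b):
    """Pareto dominance: a is no worse in every cost and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline_routes(graph, lower_bound, source, dest):
    """graph: {u: [(v, cost_vector), ...]};
    lower_bound: {v: vector of least possible costs from v to dest}."""
    skyline = []  # candidate skyline routes as (route, cost_vector)
    stack = [([source], tuple(0 for _ in lower_bound[source]))]
    while stack:
        route, cost = stack.pop()
        v = route[-1]
        if v == dest:
            if not any(dominates(c, cost) or c == cost for _, c in skyline):
                skyline = [(r, c) for r, c in skyline if not dominates(cost, c)]
                skyline.append((route, cost))
            continue
        if v not in lower_bound:
            continue  # dest is unreachable from v
        optimistic = tuple(a + b for a, b in zip(cost, lower_bound[v]))
        if any(dominates(c, optimistic) for _, c in skyline):
            continue  # even the best case cannot enter the skyline
        for w, edge_cost in graph.get(v, []):
            if w not in route:  # simple routes only
                stack.append((route + [w], tuple(a + b for a, b in zip(cost, edge_cost))))
    return skyline

# Costs are (distance, travel time)
graph = {
    "vs": [("v1", (2, 5)), ("v2", (4, 1)), ("v3", (5, 6))],
    "v1": [("vd", (2, 5)), ("v2", (1, 1))],
    "v2": [("vd", (4, 1))],
    "v3": [("vd", (5, 6))],
}
lower_bound = {"vs": (4, 2), "v1": (2, 2), "v2": (4, 1), "v3": (5, 6), "vd": (0, 0)}
result = skyline_routes(graph, lower_bound, "vs", "vd")
print(sorted(c for _, c in result))  # [(4, 10), (7, 7), (8, 2)]
```

The pruning test is safe because the actual cost of any completion is at least the optimistic estimate in every component, so a candidate that dominates the estimate also dominates every completion.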
  • 143. Stochastic Dominance Checking • Naïve way: check according to the definition of stochastic dominance.  For each value a, check whether cdfX(a) >= cdfY(a). • An efficient way  Consider one cost type at a time.  Compute the minimum and maximum possible travel costs of a route. • Distinguish among three cases based on the min and max travel costs of two routes.  Disjoint case: dominance.
  • 144. Stochastic Dominance Checking (cont.) • Covered case: non-dominance. • Overlapping case (needs further checking)  Both dominance and non-dominance may occur.
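The three cases can be sketched as a cheap pre-check on the (min, max) cost ranges of two routes. The exact boundary conditions below (ties treated as disjoint, strict containment for the covered case) are assumptions, since the slide figures are not available; only the overlapping case falls back to a full CDF comparison.

```python
def compare_ranges(x_range, y_range):
    """Classify two (min, max) cost ranges for one cost type.

    Returns 'x_dominates', 'y_dominates', 'no_dominance', or 'needs_cdf_check'.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    # Disjoint case: every outcome of one route is at most every outcome of the other
    if x_max <= y_min:
        return "x_dominates"
    if y_max <= x_min:
        return "y_dominates"
    # Covered case: one range strictly contains the other, so the CDFs must cross
    if (x_min < y_min and y_max < x_max) or (y_min < x_min and x_max < y_max):
        return "no_dominance"
    # Overlapping case: either outcome is possible, compare the CDFs explicitly
    return "needs_cdf_check"

print(compare_ranges((10, 20), (25, 40)))  # x_dominates
print(compare_ranges((10, 50), (20, 30)))  # no_dominance
print(compare_ranges((10, 30), (20, 40)))  # needs_cdf_check
```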
  • 145. Empirical Settings • GPS Data  More than 180 million GPS records collected from Denmark in 2007 and 2008.  1 Hz sampling rate: one GPS record per second. • Road networks (from OpenStreetMap)  Aalborg (AA): 4,981 vertices and 12,614 edges.  North Jutland (NJ): 68,318 vertices and 163,546 edges.  Jutland (JU): 667,876 vertices and 1,622,974 edges.  AA is the largest city in NJ, and NJ is part of JU. • Travel costs  Distances (DI): obtained from OpenStreetMap.  Travel time (TT): computed from the timestamps in the GPS records.  GHG emissions (GE): computed from the instantaneous speeds and accelerations from the GPS records.
  • 146. MTUG Generation • Hot edges: edges covered by the GPS data set. • Instantiate MM and W based on the GPS data. • Cardinalities of TT and GE intervals.  Largest cardinalities: 65 for TT and 72 for GE.  Average cardinalities: • Cold edges: edges not covered by the GPS data set.  Derive corresponding travel costs using speed limits. The traffic in AA is much more dynamic than that in other parts.
  • 147. Stochastic Skyline Route Queries • Query settings: • 50 random source-destination pairs with a starting time in each setting. • Study the cardinality of the stochastic skyline routes and the run time.
  • 148. Cardinalities of Skyline Routes • As the number of travel cost types considered increases, the cardinality of the skyline routes also increases. • When distance between source and destination increases, the cardinality of the skyline routes also increases.
  • 149. Run Time • The run time for a query is below 5 seconds, and most queries are computed in less than 2 seconds. • The advanced dominance checking method is effective and efficient.  The benefits of using advanced dominance checking are more substantial on JU since long routes require more dominance checks.
  • 150. Summary • Described a framework that enables stochastic skyline route planning in road networks with multiple, time-dependent, and uncertain travel costs. • Enables eco-routing in a realistic setting.
  • 152. Motivating Example • Fastest route: Route #1 [Figure: road network from source S to destination D with intermediate vertices A, B, E, F, G, H, I, J, K, a shopping mall, a food store, and road labels 50 and 70; Route #1 highlighted.]
  • 153. Motivating Example • Shortest route: Route #2 [Figure: same road network; Route #2 highlighted.]
  • 154. Motivating Example • Route #3 avoids parking lots [Figure: same road network; Route #3 highlighted.]
  • 155. Motivating Example • Route #4 is attractive when stores are closed [Figure: same road network; Route #4 highlighted.]
  • 156. Motivating Example • Which route from source S to destination D is preferable?  The fastest: takes the least travel time?  The shortest: smallest distance?  The most "direct": fewest turns?  The one most constrained to main roads: better road quality?  The "safest," avoiding bad neighborhoods?  …  The one most preferred by locals? [Figure: same road network with Routes #1-#4.]
  • 157. Introduction • Local travel  Knowledge of the surroundings  Follow familiar routes • Travel in unfamiliar surroundings to unknown destinations  Depend on available routing services  Expect that the provided route is the best • Idea: Use GPS data to let those who travel in unfamiliar surroundings benefit from the insights of local travelers
  • 158. Motivation • Can't we just use a standard routing service? • How often do drivers follow routes recommended by a routing service? [Figure: histogram of 5000 routes, # routes by route length (km), bins 0.5-10 through 80-220.] V. Čeikutė and C.S. Jensen. Routing service quality - local driver behavior versus routing services. MDM 2013, pp. 97–106
  • 159. Motivation • How often do drivers follow routes recommended by a routing service? • Breakdown by distance traveled [Figure: average match (%) per route-length bin (km); bins 0.5-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-220; bar labels in extraction order: 73%, 56%, 44%, 36%, 19%, 9%, 18%, 20%, 11%.] V. Čeikutė and C.S. Jensen. Routing service quality - local driver behavior versus routing services. MDM 2013, pp. 97–106 • ~60% of routes match the route provided by the routing service; ~40% do not! • Use of previous users' trips can improve routing quality
  • 160. Goal of the Study • Propose a routing framework that  Utilizes GPS data volunteered by local drivers  Exploits possibly hard-to-formalize insight into local conditions  Takes into account temporal variation in driver behavior  Recommends routes based on popularity and temporal aspects • Evaluate the quality of proposed routes  Study based on trip length and pre-selected drivers  Quality comparison with existing routing service and route recommendation approaches
  • 161. Outline • Introduction and motivation • Framework • Data preparation methodology • Scoring of routes • Empirical study  Data foundation  Routing quality evaluation and comparison • Related work • Conclusion and research directions
  • 162. Framework • Offline: GPS traces from users are collected as GPS logs, map-matched, and stored in a trajectory DB. • Online: the user's input data (source, destination, time) is sent to the routing service, which uses the trajectory data to return the top route. • Trips used:  Start or end at, or go through, source and destination locations  Trips that start during the issued time period are preferred • When the source and destination are not covered by available GPS data, an existing routing service is used.
  • 163. Data Preparation Methodology • A trip is a sequence of timestamped GPS points between stay points. • Stay point  A location where the user stays for a certain period of time (no GPS data is received)  A location reported repeatedly for a certain period of time (the engine remained on) • In practice, trips are consolidated  Short trips are dropped and some consecutive trips are merged [Figure: a trajectory in (x, y, t) space with a stay at the same location.]
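A minimal sketch of trip splitting for the first kind of stay point (a gap in the GPS stream). The point layout `(t_seconds, x, y)` and the 180-second gap threshold are assumptions for illustration; detecting the second kind of stay point (the same location reported repeatedly) and the consolidation step are not shown.

```python
def split_trips(points, max_gap=180):
    """Split time-sorted GPS points (t_seconds, x, y) into trips.

    A new trip starts whenever no GPS data was received for more than max_gap seconds.
    """
    trips, current = [], []
    for p in points:
        if current and p[0] - current[-1][0] > max_gap:
            trips.append(current)
            current = []
        current.append(p)
    if current:
        trips.append(current)
    return trips

points = [(0, 1.0, 1.0), (1, 1.1, 1.0), (2, 1.2, 1.0), (500, 5.0, 5.0), (501, 5.1, 5.0)]
print([len(t) for t in split_trips(points)])  # [3, 2]
```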
  • 164. Data Preparation Methodology • Trips are map-matched and transformed into sequences of road segments with entry and exit timestamps • Example  Trip $tr_1$ is map-matched to a sequence of road segments: $(s_1, t_{s1}, t_{e1}), (s_2, t_{s2}, t_{e2}), (s_3, t_{s3}, t_{e3})$ [Figure: trips $tr_1$, $tr_2$, $tr_3$ over road segments $s_1$, $s_2$, $s_3$.]
  • 165. Data Preparation Methodology • Do trips follow the same route? • The similarity of two trips is determined by applying Longest Common Subsequence (LCSS) to the trajectories represented as sequences of road segments. • Similarity of two trajectories: $S_1(pls, pls') = \frac{length(LCSS(pls, pls'))}{length(pls)} \cdot 100\%$ • $C_1(pls, pls') = true$ if $S_1(pls, pls') \geq getMatch(pls, pls')$, and $false$ otherwise. • The similarity threshold depends on the trip length: longer trips have higher similarity thresholds, in the interval [90, 100]. • Similar trips belong to the same route. [Figure: example trips shown as sequences of road segments s1-s11.]
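The LCSS-based similarity can be sketched with the standard dynamic program. Two simplifications are assumptions here: length is measured as the number of road segments, and the length-dependent threshold `getMatch` is replaced by a fixed parameter.

```python
def lcss_length(a, b):
    """Length of the longest common subsequence of two segment sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def similarity(pls, pls_other):
    """S1: share of pls (in %) that is covered by the common subsequence."""
    return 100.0 * lcss_length(pls, pls_other) / len(pls)

def same_route(pls, pls_other, threshold=90.0):
    """C1 with a fixed threshold standing in for getMatch(pls, pls')."""
    return similarity(pls, pls_other) >= threshold

a = ["s1", "s2", "s3", "s4", "s9", "s10"]
b = ["s1", "s2", "s3", "s4", "s11", "s9", "s10"]
print(similarity(a, b))                  # 100.0
print(same_route(a, b), same_route(b, a))  # True False
```

The measure is asymmetric because it is normalized by the length of the first sequence: trip `a` is fully contained in `b`, but `b` has one segment that `a` lacks.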
  • 166. Data Preparation Methodology • Trips that follow the same sequence of road segments are grouped into route usage objects. • Example (user; route; # traversals):  $u_1$; $r_1: s_{BC}, s_{CD}, s_{DF}, s_{FG}$; - peak, 10 off  $u_2$; $r_2: s_{CD}, s_{DF}, s_{FG}$; 3 peak, 5 off  $u_3$; $r_3: s_{ED}, s_{DF}, s_{FI}, s_{IG}$; 7 peak, 4 off  $u_4$; $r_2: s_{CD}, s_{DF}, s_{FG}$; 5 peak, 6 off • Route $r_2$ is taken by users $u_2$ and $u_4$ 3 and 5 times, respectively, during peak hours and 5 and 6 times during off-peak hours: $(r_2, pl, peak, (u_2, 3), (u_4, 5), off, (u_2, 5), (u_4, 6))$ [Figure: road network with vertices A, B, C, D, E, F, G, I and routes $r_1$, $r_2$, $r_3$.]
  • 167. Scoring of Routes • Preferred routes  Popular among drivers  Taken by many distinct drivers  Popular at the time of day and day of the week of the query [Figure: routes between source A and destination B during peak hours and off-peak hours.]
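Grouping trips into route usage objects can be sketched with nested counters. The trip layout `(user, route, temporal_pattern)` and the nested-dictionary representation are assumptions for illustration; the example data follows the table on the slide.

```python
from collections import defaultdict

def route_usage(trips):
    """Group trips into {route: {pattern: {user: # traversals}}}.

    trips: iterable of (user, route, pattern), where route is a tuple of segments.
    """
    usage = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for user, route, pattern in trips:
        usage[route][pattern][user] += 1
    return usage

r2 = ("sCD", "sDF", "sFG")
trips = (
    [("u2", r2, "peak")] * 3 + [("u2", r2, "off")] * 5
    + [("u4", r2, "peak")] * 5 + [("u4", r2, "off")] * 6
)
usage = route_usage(trips)
print(dict(usage[r2]["peak"]))  # {'u2': 3, 'u4': 5}
print(dict(usage[r2]["off"]))   # {'u2': 5, 'u4': 6}
```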
  • 168. Scoring of Routes • Route preference value: $pref(r) = \alpha \cdot users(r) + (1 - \alpha) \cdot traversals(r)$  $users(r)$: # of distinct drivers taking the route  $traversals(r)$: # of traversals of the route • Final route score: $score(r) = \beta \cdot pref_M(r) + (1 - \beta) \cdot pref_N(r)$  $pref_M$ considers trips taken during the query temporal pattern  $pref_N$ considers trips taken during other temporal patterns
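The scoring function can be sketched as below. Raw counts are used for users(r) and traversals(r), and the values of α and β are hypothetical; whether and how the counts are normalized is not stated on the slide. The input is assumed to be a dictionary `{pattern: {user: # traversals}}` for one route.

```python
def pref(user_counts, alpha):
    """alpha * (# distinct drivers) + (1 - alpha) * (# traversals)."""
    users = len(user_counts)
    traversals = sum(user_counts.values())
    return alpha * users + (1 - alpha) * traversals

def score(usage_by_pattern, query_pattern, alpha=0.5, beta=0.8):
    """beta * pref over the query's temporal pattern + (1 - beta) * pref over all others."""
    matching = usage_by_pattern.get(query_pattern, {})
    other = {}
    for pattern, counts in usage_by_pattern.items():
        if pattern != query_pattern:
            for user, n in counts.items():
                other[user] = other.get(user, 0) + n
    return beta * pref(matching, alpha) + (1 - beta) * pref(other, alpha)

# Route r2 from the route usage example: u2 and u4, peak and off-peak traversals
r2_usage = {"peak": {"u2": 3, "u4": 5}, "off": {"u2": 5, "u4": 6}}
print(round(score(r2_usage, "peak"), 2))  # 5.3
```

With β close to 1, routes that are popular at the query's time of day are favored; traversals from other temporal patterns still contribute with weight 1 − β.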
  • 169. Empirical Study: Data • Monitoring period: 2 years • Number of drivers: 285 • Number of GPS points (raw data): ~182,700,000 • Number of trips: ~275,000 • "Pay as You Speed" project (http://www.sparpaafarten.dk) [Figure: # traversals by traversal length (km), bins 2-10, 10-20, 20-30, 30-40, 40-50, 50-350, split into off-peak, morning peak, and afternoon peak.]
  • 170. Routing Quality Evaluation: Data • For this study, we randomly selected equal amounts of trips for different trip length intervals [Figure: # traversals by traversal length (km), bins 2-10, 10-20, 20-350, split into off-peak, morning peak, and afternoon peak.]
  • 171. Routing Quality: Match • A match is identified using LCSS [Figure: for each test trajectory from the trajectory DB, its source and destination are passed to the routing service; the returned preferred route is compared with the trajectory, and the total number of matches is counted.]
  • 172. Routing Quality Evaluation: Results [Figure: three panels, Our Proposal, Google Directions API (top-1), and TPMFP (SIGMOD'13), each showing % matched and unmatched routes by length (km) for bins 2-10, 10-20, 20-350; one panel additionally shows empty results.]
  • 173. Routing Quality Evaluation: Data • For this study, we considered the five drivers with the most trips. [Figure: two panels for drivers 1-5: Traversals per Length (2-10 km, 10-30 km, 30-300 km) and Traversals per Temporal Pattern (off-peak, afternoon peak, morning peak).]
  • 174. Routing Quality Evaluation: Results [Figure: Our proposal: % matched, unmatched, and empty for drivers 1-5 under Different Types of Route Calculation (source and dest., source, destination, subroutes). Google Directions API (top-1): % matched and unmatched for drivers 1-5.]
  • 175. Related Work • [1] K.-P. Chang, L.-Y. Wei, M.-Y. Yeh, and W.-C. Peng. Discovering personalized routes from trajectories. LBSN 2011, pp. 33–40 • [2] Z. Chen, H. T. Shen, and X. Zhou. Discovering popular routes from trajectories. ICDE 2011, pp. 900–911 • [3] W. Luo, H. Tan, L. Chen, and L.M. Ni. Finding time period-based most frequent path in big trajectory data. SIGMOD 2013, pp. 713–724 • [4] J. Yuan, Y. Zheng, C. Zhang, W. Xie, X. Xie, G. Sun, and Y. Huang. T-Drive: Driving directions based on taxi trajectories. GIS 2010, pp. 99–108 • Four existing routing techniques use drivers' trajectories [1],[2],[3],[4]  The road network is formed from the road segments that are covered by the trajectory data set  A route is formed by  Prioritizing parts of roads that are followed the most by a specific driver (personalized routes) [1]  Prioritizing parts of roads that are taken by other drivers [2],[4]  Possibly using sub-routes from multiple routes [3]  Trajectories used for scoring must contain the destination and must start and end during the provided time interval [3]  Suggested routes are formed from the most popular routes or route parts in the available data set.
  • 176. Conclusion and Research Directions Conclusions • The proposed framework utilizes trajectory data collected from local drivers for routing. • A preferred route is selected using a flexible scoring function that considers  The number of traversals of the route  The number of distinct drivers taking the route  The time periods when the traversals occurred • Use of travel histories of local drivers can increase routing quality • More details in the paper! Research directions • Additional aspects of the framework can be considered  Efficiency of route identification process (LCSS technique)  Inclusion of personalized routes  Better support for routes that are constructed from sub-routes
  • 178. Demos and Prototype Systems • EcoTour: http://daisy.aau.dk/its/  Computes and compares the shortest, the fastest, and the most eco-friendly routes for arbitrary source-destination pairs in DK.  Best demo award at IEEE MDM 2013. • EcoSky: http://daisy.aau.dk/its/eco/  Supports skyline eco-routing and personalized eco-routing • Sheafs: http://daisy.aau.dk/its/sheaf  Trajectory based traffic sheafs • Strict-Path Queries: http://daisy.aau.dk/its/spqdemo  Trajectory based, Strict-Path Queries  Trips (historical travel-time), route choice, Napoleon (road usage) • iPark: identifying parking spaces from GPS trajectories  On-street parking lanes vs. parking zones
  • 180. The Future • Much more data  Inductive loop detectors  Bus data  Rejsekortet • Many more connected vehicles • New services  Routing  Safety and warnings  Parking, fees, insurance, road pricing  Car sharing, multi-modality • Driver-less vehicles
  • 181. Challenges • Detect "black spots" before they occur. • Model spatio-temporal congestion from data. • Characterize the effects of events  Accidents, malfunctioning of traffic signals, rain, a concert • Real-time traffic management  In response to the current or predicted situation, actuate traffic signals and drivers (via their smartphones or navigation devices) to optimize the use of the infrastructure and driver experience • Automated trade-off between weight level of detail and available data. • Stochastic routing at 20 milliseconds. • Integrate with "point" data.
  • 182. Acknowledgments • Colleagues at Aalborg University, Aarhus University, and beyond. • The EU FP7 project, Reduction: http://www.reduction-project.eu/ • The Obel Family Foundation: http://www.obel.com/en
  • 183. Readings • V. Ceikute, C. S. Jensen: Vehicle Routing with User-Generated Trajectory Data. MDM (1) 2015 • C. Guo, B. Yang, O. Andersen, C. S. Jensen, K. Torp: EcoMark 2.0: empowering eco-routing with vehicular environmental models and actual vehicle fuel consumption data. GeoInformatica 19(3):567-599 (2015) • C. Guo, B. Yang, O. Andersen, C. S. Jensen, K. Torp: EcoSky: Reducing vehicular environmental impact through eco-routing. ICDE 2015:1412-1415 • B. Yang, C. Guo, Y. Ma, C. S. Jensen: Toward personalized, context-aware routing. VLDB J. 24(2):297-318 (2015) • Y. Ma, B. Yang, C. S. Jensen: Enabling Time-Dependent Uncertain Eco-Weights For Road Networks. GeoRich@SIGMOD 2014:1:1-1:6 • B. Yang, C. Guo, C. S. Jensen, M. Kaul, S. Shang: Stochastic skyline route planning under time-varying uncertainty. ICDE 2014:136-147 • B. Yang, M. Kaul, C. S. Jensen: Using Incomplete Information for Complete Weight Annotation of Road Networks. IEEE TKDE 26(5):1267-1279 (2014) • C. Guo, C. S. Jensen, B. Yang: Towards Total Traffic Awareness. SIGMOD Record 43(3):18-23 (2014)
  • 184. Readings • V. Ceikute, C. S. Jensen: Routing Service Quality - Local Driver Behavior Versus Routing Services. MDM (1) 2013: 97-106 • M. Kaul, B. Yang, C. S. Jensen: Building Accurate 3D Spatial Networks to Enable Next Generation Intelligent Transportation Systems. MDM 2013: 137-146, best paper award. • O. Andersen, C. S. Jensen, K. Torp, B. Yang: EcoTour: Reducing the Environmental Footprint of Vehicles Using Eco-routes. MDM 2013: 338-340, demo paper, best demo award. • B. Yang, C. Guo, C. S. Jensen: Travel Cost Inference from Sparse, Spatio-Temporally Correlated Time Series Using Markov Models. PVLDB 6(9): 769-780 (2013) • C. Guo, Y. Ma, B. Yang, C. S. Jensen, M. Kaul: EcoMark: evaluating models of vehicular environmental impact. SIGSPATIAL/GIS 2012: 269-278 • L. Speicys, C. S. Jensen: Enabling Location-based Services - Multi-Graph Representation of Transportation Networks. GeoInformatica 12(2): 219-253 (2008) • A. Brilingaite, C. S. Jensen: Enabling Routes of Road Network Constrained Movements as Mobile Service Context. GeoInformatica 11(1): 55-102 (2007) • C. Hage, C. S. Jensen, T. B. Pedersen, L. Speicys, I. Timko: Integrated Data Management for Mobile Services in the Real World. VLDB 2003: 1019-1030