Scalable kNN
Joins, Fang
presented by
Alex Klibisz
Introduction
Trajectory Joins
Introduction
Motivation
MapReduce
Introduction
Problem
Statement
Trajectory
Operations
Sub-optimal
Solutions
Solution: kNN
Join
Pre-processing
Phase
Querying Phase
Extension: kNN
Load Balancing
Extension:
hkNN Join
Results
Evaluation Setup
kNN Results
hkNN Results
Summary
Conclusion
Scalable Algorithms for Nearest-Neighbor
Joins on Big Trajectory Data – Fang, Cheng,
Tang, Maniu, Yang (2016)
presented by Alex Klibisz
University of Tennessee
aklibisz@gmail.com
November 17, 2016
Contents
1 Introduction
Trajectory Joins Introduction
Motivation
MapReduce Introduction
Problem Statement
Trajectory Operations
2 Sub-optimal Solutions
3 Solution: kNN Join
Pre-processing Phase
Querying Phase
Extension: kNN Load Balancing
Extension: hkNN Join
4 Results
Evaluation Setup
kNN Results
hkNN Results
Summary
5 Conclusion
Trajectory Joins Vocabulary
• Trajectory: series of locations that depicts movement of
an entity over time.
• Trajectory Object: a snapshot of time and location; a single
trajectory consists of many trajectory objects.
• Trajectory Join: given two sets M and R of trajectories,
join(M, R) returns trajectory objects from M and R within
some proximity of space and time.
• Joining Criterion: the criterion by which objects in M and R
are joined. This paper uses k-nearest neighbors (kNN) to join
objects.
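To make the vocabulary concrete, here is a minimal Python sketch of trajectory objects and a trajectory join. The `TrajectoryObject` type, the `eps`/`delta` thresholds, and the proximity rule are illustrative assumptions. The paper itself uses kNN as its joining criterion, not a fixed distance threshold.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrajectoryObject:
    """One snapshot of an entity: its trajectory id, a timestamp, and a 2D position."""
    traj_id: str
    t: float
    x: float
    y: float


def proximity_join(M: List[TrajectoryObject], R: List[TrajectoryObject],
                   eps: float, delta: float) -> List[Tuple[TrajectoryObject, TrajectoryObject]]:
    """Hypothetical criterion: pair objects within `eps` in space and `delta` in time."""
    pairs = []
    for m in M:
        for r in R:
            close_in_time = abs(m.t - r.t) <= delta
            close_in_space = math.dist((m.x, m.y), (r.x, r.y)) <= eps
            if close_in_time and close_in_space:
                pairs.append((m, r))
    return pairs
```

Swapping the threshold test for a kNN rule gives the joining criterion the paper actually uses.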
Example Use Case
• The Hubble Space Telescope generates 140 GB of data per
week about the movements of stars and asteroids. Analyzing
the proximity among trajectory objects helps uncover the
behavior of outer-space objects, discover meteors, etc.
Trajectory joins can find objects that are in proximity to one
another.
• Given two groups A and B of asteroids, return the
identities of asteroids from B that have been close to
those in A.
MapReduce Basics
• Divide-and-conquer "big data" processing on shared-nothing
clusters.
• Master node partitions the data and assigns it to map nodes.
• Map performs analysis on local data.
• Shuffle redistributes the data after the map step.
• Reduce performs a summary operation over the output of
the map step.
• The MapReduce framework handles data partitioning,
execution over distributed nodes, and error recovery.
[1] https://goo.gl/0nbYhp
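The map, shuffle, and reduce steps above can be illustrated with a single-process simulation. This is a toy sketch, not Hadoop's API. The function `run_mapreduce` and its `mapper`/`reducer` callbacks are hypothetical names, and a real framework runs each phase in parallel across nodes.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    # Map: each record independently emits a list of (key, value) pairs.
    intermediate = []
    for record in records:
        intermediate.extend(mapper(record))
    # Shuffle: group all emitted values by key (done by the framework in Hadoop).
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce: summarise each key's values independently of other keys.
    return {key: reducer(key, values) for key, values in groups.items()}


# Usage: count observations (trajectory id, t, x, y) per trajectory id.
observations = [("a", 0, 1.0, 2.0), ("b", 0, 5.0, 5.0), ("a", 1, 1.5, 2.5)]
counts = run_mapreduce(observations,
                       mapper=lambda obs: [(obs[0], 1)],
                       reducer=lambda key, values: sum(values))
print(counts)  # {'a': 2, 'b': 1}
```

Because every key is reduced independently, the reduce phase parallelizes naturally on a shared-nothing cluster.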
Problem Statement
kNN Join
Find the k nearest neighbors from set R for each object in M
over the time interval $[t_s, t_e] \subseteq [T_s, T_e]$.
(h,k)NN Join
Find the h objects from M that minimize a function f over the
time interval $[t_s, t_e] \subseteq [T_s, T_e]$, then return the k
nearest neighbors of each of those h objects.
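Both problem statements can be made concrete with a brute-force, single-machine sketch. The sketch rests on three hypothetical assumptions:

- Trajectories are dicts that map timestamps to (x, y) positions.
- The distance between two trajectories over the window is their closest approach at shared timestamps.
- f(m) is the largest of m's k nearest-neighbor distances.

The paper's exact trajectory distance may differ.

```python
import heapq
import math
from typing import Dict, Tuple

Trajectory = Dict[float, Tuple[float, float]]  # timestamp -> (x, y)


def traj_distance(a: Trajectory, b: Trajectory, t_s: float, t_e: float) -> float:
    """Assumed metric: closest approach at timestamps shared by a and b inside [t_s, t_e]."""
    shared = [t for t in a if t in b and t_s <= t <= t_e]
    return min((math.dist(a[t], b[t]) for t in shared), default=math.inf)


def knn_join(M, R, k, t_s, t_e):
    """For every m in M, the k trajectories of R closest to it, as (distance, r_id) pairs."""
    return {
        m_id: heapq.nsmallest(k, ((traj_distance(m, r, t_s, t_e), r_id) for r_id, r in R.items()))
        for m_id, m in M.items()
    }


def hk_nn_join(M, R, h, k, t_s, t_e):
    """(h,k)NN: keep the h objects of M minimising f(m) = largest of their k NN distances."""
    knn = knn_join(M, R, k, t_s, t_e)

    def f(m_id):
        return max((d for d, _ in knn[m_id]), default=math.inf)

    return {m_id: knn[m_id] for m_id in heapq.nsmallest(h, knn, key=f)}
```

This costs one distance evaluation per (m, r) pair, which is exactly the quadratic work the distributed algorithm tries to avoid.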
kNN Example
The figure illustrates a kNN join. An (h,k)NN join with h = 1,
k = 2 might score $m_1$ with $f(m_1) = \max\{d_1, d_2\} = d_2$ and
return its k nearest neighbors, $\{r_1, r_2\}$.
Some Fundamental Operations
• Min/max distance from point to line-segment.
• Min/max distance from point to trajectory.
• Min/max distance from trajectory to trajectory.
• kNN from trajectory object to trajectory objects.
[2] Formulas omitted for brevity; they are available in Section 3
of the paper.
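The first two operations can be sketched with standard 2D geometry. The sketch treats a trajectory as a purely spatial polyline with at least two points. That is a simplifying assumption: the paper's formulas (Section 3) account for time and may differ.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def min_dist_point_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from p to segment ab: project p onto ab, clamp to the endpoints."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a single point
        return math.dist(p, a)
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (ax + t * dx, ay + t * dy))


def max_dist_point_segment(p: Point, a: Point, b: Point) -> float:
    """Distance to p is convex along the segment, so the maximum is at an endpoint."""
    return max(math.dist(p, a), math.dist(p, b))


def min_dist_point_trajectory(p: Point, polyline: List[Point]) -> float:
    """Minimum over all consecutive segments of the polyline."""
    return min(min_dist_point_segment(p, a, b) for a, b in zip(polyline, polyline[1:]))


def max_dist_point_trajectory(p: Point, polyline: List[Point]) -> float:
    """Maximum over all consecutive segments of the polyline."""
    return max(max_dist_point_segment(p, a, b) for a, b in zip(polyline, polyline[1:]))
```

Trajectory-to-trajectory bounds can be built the same way, by taking minima or maxima over pairs of segments.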
Sub-optimal Solutions
Single Machine Brute Force (BF)
A nested loop computes the Euclidean distance between every
pair of points in M and R. Worst case $O(|M||R|l)$ for l points in
the trajectory of interest tr.
Single Machine Sweep Line (SL)
Pre-sort the data by time and compute distances only for
trajectories that overlap in time. Worst case is also $O(|M||R|l)$.
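The filtering idea behind SL can be sketched as follows: sort by start time, sweep, and pair only trajectories whose time spans overlap. Representing each trajectory by an `(id, start, end)` time span is an assumption, and the distance computation on the surviving pairs is omitted.

```python
from typing import List, Set, Tuple

Interval = Tuple[str, float, float]  # (trajectory id, t_start, t_end)


def overlapping_pairs(M: List[Interval], R: List[Interval]) -> Set[Tuple[str, str]]:
    """(m_id, r_id) pairs whose time spans overlap, found by a sweep over start times."""
    events = [(s, e, "M", i) for i, s, e in M] + [(s, e, "R", i) for i, s, e in R]
    events.sort(key=lambda ev: ev[0])  # sweep in order of start time
    active = {"M": [], "R": []}  # (end, id) of spans that may still overlap the sweep line
    pairs = set()
    for start, end, side, ident in events:
        # Drop spans that ended before the current span starts; they cannot overlap it.
        for key in active:
            active[key] = [(e, i) for e, i in active[key] if e >= start]
        other = "R" if side == "M" else "M"
        for _, j in active[other]:
            pairs.add((ident, j) if side == "M" else (j, ident))
        active[side].append((end, ident))
    return pairs
```

If every span overlaps every other, all |M||R| pairs survive, which is why the worst case matches brute force.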
Naive MapReduce
Map randomly divides the objects in M and R into disjoint
subsets. Reduce joins every pair of subsets and computes the
distances. A second MapReduce job selects the k nearest neighbors.
Overview of kNN Join
Each step is implemented as its own MapReduce algorithm, for a
total of six algorithms.
Pre-processing Phase
Algorithm 1
1 Input: non-partitioned trajectories.
2 Map splits trajectories in sets M and R into T temporal
partitions. O(l + T) where l is the size of a trajectory.
3 Reduce splits each temporal partition into N spatial
partitions. O((|M| + |R|)(l + N))
4 Output: trajectories partitioned by time and space.
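Algorithm 1 can be sketched on individual observations under two assumptions: temporal partitions are equal-width buckets over $[T_s, T_e]$, and spatial partitions assign each point to its nearest pivot (Voronoi-style). The pivot approach is suggested by the central points $p_i$ used later, but it is not confirmed as the paper's exact method. In Algorithm 1 the temporal split happens in Map and the spatial split in Reduce; here both run in one loop for brevity.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Obs = Tuple[str, float, float, float]  # (trajectory id, t, x, y)


def temporal_partition(t: float, T_s: float, T_e: float, T: int) -> int:
    """Equal-width time buckets over [T_s, T_e]; t == T_e falls into the last bucket."""
    width = (T_e - T_s) / T
    return min(int((t - T_s) // width), T - 1)


def spatial_partition(x: float, y: float, pivots: List[Tuple[float, float]]) -> int:
    """Index of the nearest pivot (central point)."""
    return min(range(len(pivots)), key=lambda i: math.dist((x, y), pivots[i]))


def partition(observations: List[Obs], T_s: float, T_e: float, T: int,
              pivots: List[Tuple[float, float]]) -> Dict[Tuple[int, int], List[Obs]]:
    """Group observations by (temporal partition, spatial partition)."""
    buckets = defaultdict(list)
    for obs in observations:
        _, t, x, y = obs
        key = (temporal_partition(t, T_s, T_e, T), spatial_partition(x, y, pivots))
        buckets[key].append(obs)  # a trajectory crossing boundaries is split into pieces
    return dict(buckets)
```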
Sub-Trajectory Extraction
• An anchor trajectory must span an entire time partition.
• $Tr^L_i$ denotes partition i of the trajectories from set
$L \in \{M, R\}$.
Algorithm 2
1 Input: trajectories partitioned by time and space.
2 Map retrieves all sub-trajectories in the queried time window
$[t_s, t_e]$. $O_t(\log l)$, $O_s(l)$.
3 Reduce finds the anchor trajectories used in the next step.
$O_t(|Tr^L_i|^2 l)$, $O_s(|Tr^L_i| l)$.
4 Output: anchor trajectories
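The $O(\log l)$ map cost is consistent with binary search over a time-sorted trajectory. The sketch below assumes each trajectory is stored as parallel, time-sorted `times` and `points` lists. The anchor test (first sample at or before $t_s$, last sample at or after $t_e$) is one hypothetical reading of "spans an entire time partition".

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def extract_subtrajectory(times: List[float], points: List[Tuple[float, float]],
                          t_s: float, t_e: float):
    """Observations with t_s <= t <= t_e. Locating the bounds takes two binary
    searches (O(log l)); copying the slices is extra work proportional to the output."""
    lo = bisect_left(times, t_s)
    hi = bisect_right(times, t_e)
    return times[lo:hi], points[lo:hi]


def is_anchor(times: List[float], t_s: float, t_e: float) -> bool:
    """Assumed anchor test: the trajectory is observed across the whole window [t_s, t_e]."""
    return bool(times) and times[0] <= t_s and times[-1] >= t_e
```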
Anchor Trajectories
• An anchor trajectory must span an entire time partition, from
$t_s$ to $t_e$.
Computing Time-dependent Bound (TDB)
• The TDB is a circle c(t) that bounds the k nearest
neighbors of a set S of objects at time t.
• The TDB for a set S of objects can change over time.
Algorithm 4, containing Algorithm 3
1 Input: anchor trajectories
2 Map computes the maximum distance from each anchor
trajectory to each central point $p_i$ in each temporal
partition. $O_t(N \cdot l)$, $O_s(l)$.
3 Reduce computes the TDB of $Tr^M_i$ from these maximum
distances. $O_t(|R| \log |R|)$, $O_s(|R|)$ for the set of objects R.
4 Output: Time-dependent Bounds
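One way to obtain a circle around a central point that is guaranteed to contain the k nearest neighbors of every object in a set S is the standard triangle-inequality bound sketched below. This is a swapped-in textbook technique, not necessarily the paper's TDB formula.

It works at a single time instant; evaluating it at each time t gives a radius r(t), i.e. a time-dependent circle c(t). The names `pivot` (standing in for $p_i$) and `anchors` (anchor positions from R at time t) are hypothetical. Sorting the anchor distances is the $|R| \log |R|$ step.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def tdb_radius(S: List[Point], anchors: List[Point], pivot: Point, k: int) -> float:
    """Radius of a circle centred at `pivot` that contains the k nearest neighbours
    (in R) of every object in S, at one time instant."""
    if len(anchors) < k:
        return math.inf  # too few anchors to bound k neighbours
    spread = max((math.dist(m, pivot) for m in S), default=0.0)
    # k-th smallest anchor-to-pivot distance.
    d_k = sorted(math.dist(a, pivot) for a in anchors)[k - 1]
    # For m in S, the k anchors closest to the pivot lie within spread + d_k of m,
    # so m's true kNN in R are that close to m, hence within 2*spread + d_k of the pivot.
    return 2 * spread + d_k
```

A tighter S (small spread around the pivot) gives a smaller circle, and therefore fewer candidates in the next step.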
Time-dependent Bounds
• The TDB is a circle c(t) that bounds the k nearest
neighbors of a set S of objects at time t.
• The TDB for a set S of objects can change over time.
White dots are objects from M; black dots are objects from R. At
time t, $c(t)$ needs only a small circle to enclose k = 2 points; at
time t′, $c(t')$ needs a bigger circle to enclose k = 2 points.
Finding Candidate Trajectories
Algorithm 5
1 Input: a partition of trajectories $Tr^R_j$.
2 Map classifies each partition of trajectories $Tr^R_j$ as having
no candidates, all candidates, or some candidates.
$O_t(|Tr| N l)$, $O_s(|Tr| l)$.
3 Reduce gathers the candidates for a join into $C^R_i$. $O_t(1)$,
$O_s(|C^R_i| l)$.
4 Output: a set of candidate trajectories $C^R_i$.
Candidate Trajectories
Finding candidates for $Tr^R_j$ (red). Case 1: no overlap.
Case 2: complete overlap. Case 3: partial overlap.
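The three cases can be sketched as a circle-versus-circle test at one time instant. The sketch assumes each partition $Tr^R_j$ is summarized by a hypothetical enclosing circle (`part_centre`, `part_radius`); the slides do not specify how partitions are represented.

```python
import math
from enum import Enum
from typing import Tuple

Point = Tuple[float, float]


class Overlap(Enum):
    NONE = 1  # Case 1: the partition cannot contain any kNN candidate
    ALL = 2   # Case 2: every trajectory in the partition is a candidate
    SOME = 3  # Case 3: trajectories must be checked individually


def classify_partition(tdb_centre: Point, tdb_radius: float,
                       part_centre: Point, part_radius: float) -> Overlap:
    """Compare the TDB circle with a circle enclosing the partition."""
    d = math.dist(tdb_centre, part_centre)
    if d > tdb_radius + part_radius:
        return Overlap.NONE  # circles are disjoint
    if d + part_radius <= tdb_radius:
        return Overlap.ALL   # partition circle lies entirely inside the TDB
    return Overlap.SOME      # circles intersect partially
```

Only Case 3 partitions need per-trajectory filtering; Cases 1 and 2 are decided in constant time per partition.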
Trajectory Join
Algorithm 6
1 Input: candidate trajectories
2 Map joins each partition $Tr^M_i$ with its corresponding
candidates $C^R_i$ on a single machine. $O(|Tr||C^R_i| l)$.
3 Reduce sorts each object's neighbors and keeps only the k
nearest. $O(kN)$.
4 Output: each queried object with its k nearest neighbors.
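The reduce step can be sketched as merging the partial neighbor lists emitted by the mappers and keeping the k nearest. The record shape `(m_id, [(distance, r_id), ...])` is an assumption. So is deduplication by keeping the smallest distance when the same r appears from several partitions.

```python
import heapq
import math
from collections import defaultdict


def reduce_knn(partial_results, k):
    """partial_results: iterable of (m_id, [(dist, r_id), ...]) from the mappers."""
    best = defaultdict(dict)  # m_id -> {r_id: smallest distance seen so far}
    for m_id, neighbours in partial_results:
        for dist, r_id in neighbours:
            if dist < best[m_id].get(r_id, math.inf):
                best[m_id][r_id] = dist
    # Keep only the k nearest (distance, r_id) pairs per queried object.
    return {
        m_id: heapq.nsmallest(k, ((d, r_id) for r_id, d in nb.items()))
        for m_id, nb in best.items()
    }
```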
Extension: kNN Load Balancing
1 Hash the trajectory objects by an ID to distribute them
more uniformly among compute nodes.
2 Requires modifying the sub-trajectory extraction,
candidate-finding, and trajectory-join steps.
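Hashing by ID can be sketched as a stable digest of each object ID modulo the number of compute nodes. MD5 is an illustrative choice here (used for spreading keys, not security), and `node_for`/`node_loads` are hypothetical names. A digest is used because Python's built-in `hash()` for strings is randomized per process, so separate workers could disagree on an object's node.

```python
import hashlib
from collections import Counter
from typing import Iterable


def node_for(obj_id: str, num_nodes: int) -> int:
    """Stable mapping from an object ID to a compute-node index."""
    digest = hashlib.md5(obj_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def node_loads(obj_ids: Iterable[str], num_nodes: int) -> Counter:
    """Number of objects each node receives under ID hashing."""
    return Counter(node_for(i, num_nodes) for i in obj_ids)
```

ID hashing ignores spatial locality, which is why the steps listed above must be adapted to it.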
Extension: hkNN Join
1 Review: finds the h objects from M that minimize some
function f and returns the k nearest neighbors of each.
2 Requires computing a smaller TDB.
3 Smaller query result: h × k instead of |M| × k for the kNN join.
4 Time and space complexities remain the same.
Evaluation Setup
• Two synthetic and two real datasets.
• Non-trivial size: up to 1.2B observations and 17.2 GB.
• Hadoop cluster with 60 slave nodes, each with a multi-core
3.40 GHz CPU and 16 GB of memory.
• Sweep Line (SL) is used for the single-node parts.
• Metrics: query execution time and MapReduce shuffling cost
(the amount of data sent from mappers to reducers).
• k = 10 and N = 400 are held constant for all datasets; T and
$t_q$ are varied.
Effect of T (number of temporal partitions)
As T grows, running time decreases until it reaches an inflection
point, which is similar for both datasets. Most of the time is still
spent in the single-node SL step.
kNN Results Summary
• Increasing N (the number of spatial partitions) improves
performance up to an inflection point, which differs between
the two datasets. Fig. 15.
• Balanced Sweep-Line (BL-SL) is the more efficient
single-node algorithm. Fig. 16 (I think the paper mixed up the
figure labels).
• Adding slave nodes improves performance, but only gradually,
likely due to I/O overhead. Fig. 17.
• As k increases, running time and shuffling cost increase; using
the TDB makes a difference. Fig. 18.
• Increasing $t_q$ yields a near-linear increase in running time
and shuffling cost; the TDB and load balancing make a
difference. Fig. 19.
• Running time increases linearly with dataset size; shuffling
cost increases more sharply. Fig. 20.
hkNN Results Summary
• Running time stays constant as h grows (probably because k
is held constant).
• (h,k)NN is 2x faster than the kNN methods.
• The load-balanced variant is faster than the non-load-balanced
one.
Conclusion
Contributions
1 Leverages the shared-nothing MapReduce structure for kNN
joins, which typically rely on shared indices.
2 Introduces the TDB and load-balancing methods, which
yield tangible improvements.
Questions
1 Most of the time is still spent on single-node computation.
What is the theoretical bound on the improvement achievable
through parallelization?
2 How much time does the partitioning step take?
3 The partitioning step probably has to be re-run when new
data arrives. Does this prevent a real-time
implementation?
4 Is there any benefit to localizing data instead of using HDFS?

Research Summary: Hidden Topic Markov Models, Gruber
 
Research Summary: Unsupervised Prediction of Citation Influences, Dietz
Research Summary: Unsupervised Prediction of Citation Influences, DietzResearch Summary: Unsupervised Prediction of Citation Influences, Dietz
Research Summary: Unsupervised Prediction of Citation Influences, Dietz
 

Kürzlich hochgeladen

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Kürzlich hochgeladen (20)

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Research Summary: Scalable Algorithms for Nearest-Neighbor Joins on Big Trajectory Data

  • 1. Scalable kNN Joins, Fang presented by Alex Klibisz Introduction Trajectory Joins Introduction Motivation MapReduce Introduction Problem Statement Trajectory Operations Sub-optimal Solutions Solution: kNN Join Pre-processing Phase Querying Phase Extension: kNN Load Balancing Extension: hkNN Join Results Evaluation Setup kNN Results hkNN Results Summary Conclusion Scalable Algorithms for Nearest-Neighbor Joins on Big Trajectory Data – Fang, Cheng, Tang, Maniu, Yang (2016) presented by Alex Klibisz University of Tennessee aklibisz@gmail.com November 17, 2016
  • 2. Contents
    1 Introduction: Trajectory Joins Introduction, Motivation, MapReduce Introduction, Problem Statement, Trajectory Operations
    2 Sub-optimal Solutions
    3 Solution: kNN Join: Pre-processing Phase, Querying Phase, Extension: kNN Load Balancing, Extension: hkNN Join
    4 Results: Evaluation Setup, kNN Results, hkNN Results Summary
    5 Conclusion
  • 3. Trajectory Joins Vocabulary
    • Trajectory: a series of locations that depicts the movement of an entity over time.
    • Trajectory object: a snapshot of time and location; a single trajectory contains many trajectory objects.
    • Trajectory join: given two sets M and R of trajectories, join(M, R) returns trajectory objects from M and R that lie within some proximity of each other in space and time.
    • Joining criterion: the criterion by which objects in M and R are joined. This paper uses k-nearest neighbors (kNN) as the joining criterion.
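To make the vocabulary concrete, here is a minimal sketch of trajectory objects and a proximity join. The class `TrajectoryObject`, the function `proximity_join`, and the fixed thresholds `max_dist` and `max_dt` are hypothetical illustrations of "some proximity of space and time"; the paper itself replaces such a threshold with the kNN criterion.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class TrajectoryObject:
    """One snapshot of an entity: its location (x, y) at time t."""
    traj_id: str
    t: float
    x: float
    y: float


def proximity_join(M, R, max_dist, max_dt):
    """Return pairs (m, r) of trajectory objects from M and R that lie within
    max_dist of each other in space and max_dt in time (nested loops)."""
    pairs = []
    for m in M:
        for r in R:
            close_in_time = abs(m.t - r.t) <= max_dt
            close_in_space = hypot(m.x - r.x, m.y - r.y) <= max_dist
            if close_in_time and close_in_space:
                pairs.append((m, r))
    return pairs


# A trajectory is then simply the time-ordered list of its objects.
M = [TrajectoryObject("a", 0, 0.0, 0.0), TrajectoryObject("a", 1, 1.0, 0.0)]
R = [TrajectoryObject("b", 0, 0.5, 0.0), TrajectoryObject("b", 5, 0.0, 0.0)]
pairs = proximity_join(M, R, max_dist=1.0, max_dt=0.5)
```

Only the two objects at t = 0 qualify here: the other combinations are too far apart in time.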
  • 4. Example Use Case
    • The Hubble Space Telescope generates 140 GB/week of data about the movements of stars and asteroids. Analyzing proximity among trajectory objects helps uncover the behavior of outer-space objects, discover meteors, etc. Trajectory joins find objects that have been in some proximity to one another.
    • Example: given two groups A and B of asteroids, return the identities of the asteroids in B that have been close to those in A.
  • 5. MapReduce Basics
    • Divide-and-conquer processing of "big data" on shared-nothing clusters.
    • A master node partitions the data and assigns it to map nodes.
    • Map performs analysis on local data.
    • A shuffle step redistributes the data after the map step.
    • Reduce performs a summary operation over the data from the map step.
    • The MapReduce framework handles data partitioning, execution over distributed nodes, and error recovery. [1]
    [1] https://goo.gl/0nbYhp
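A single-process simulation can make the map, shuffle, and reduce roles concrete. This is a toy sketch, not Hadoop's API: `run_mapreduce`, `bucket_mapper`, and `count_reducer` are hypothetical names, and counting trajectory points per one-hour time bucket is just an arbitrary example job.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate map -> shuffle -> reduce in one process."""
    # Map: each record may emit any number of (key, value) pairs.
    intermediate = []
    for record in records:
        intermediate.extend(mapper(record))
    # Shuffle: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce: summarize each key's values independently of the others.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def bucket_mapper(point):
    """Emit (hour bucket, 1) for a (traj_id, timestamp in seconds) point."""
    _traj_id, t = point
    yield int(t // 3600), 1


def count_reducer(_key, values):
    return sum(values)


points = [("a", 10.0), ("a", 3700.0), ("b", 20.0), ("b", 7300.0)]
counts = run_mapreduce(points, bucket_mapper, count_reducer)
```

Because each reducer call only sees one key's values, the reduce phase parallelizes across keys, which is what the distributed framework exploits.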
  • 6. Problem Statement
    kNN join: for each object in M, find its k nearest neighbors from set R over the time interval [ts, te] ⊆ [Ts, Te].
    (h,k)NN join: find the h objects from M that minimize a function f over the time interval [ts, te] ⊆ [Ts, Te], then return the k nearest neighbors of each of these h objects.
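The kNN join definition can be written directly as a brute-force loop. This is a minimal sketch under loudly stated assumptions: trajectories are sampled at discrete timestamps (a dict from timestamp to (x, y)), and the hypothetical `traj_distance` takes the minimum Euclidean distance over timestamps both trajectories share inside [ts, te]. The slides do not give the paper's exact trajectory distance.

```python
import heapq
from math import hypot, inf


def traj_distance(a, b, ts, te):
    """Hypothetical trajectory distance: the minimum Euclidean distance
    between trajectories a and b over the timestamps they share in [ts, te].
    Each trajectory maps timestamp -> (x, y)."""
    shared = [t for t in a if ts <= t <= te and t in b]
    if not shared:
        return inf  # no common time instant inside the window
    return min(hypot(a[t][0] - b[t][0], a[t][1] - b[t][1]) for t in shared)


def knn_join(M, R, k, ts, te):
    """Brute-force kNN join: for every trajectory in M, the ids of its k
    nearest trajectories in R under traj_distance."""
    result = {}
    for m_id, m in M.items():
        scored = ((traj_distance(m, r, ts, te), r_id) for r_id, r in R.items())
        result[m_id] = [r_id for _, r_id in heapq.nsmallest(k, scored)]
    return result


M = {"m1": {0: (0, 0), 1: (1, 0)}}
R = {
    "r1": {0: (0, 1), 1: (1, 1)},
    "r2": {0: (5, 5), 1: (3, 0)},
    "r3": {0: (9, 9), 1: (9, 9)},
}
neighbours = knn_join(M, R, k=2, ts=0, te=1)
```

Narrowing the window changes the answer, since only samples inside [ts, te] count toward the distance.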
  • 7. kNN Example
    [Figure: a kNN join.] An (h,k)NN join with h = 1 and k = 2 might use f(m1) = max{d1, d2} = d2, where d1 and d2 are the distances from m1 to its two nearest neighbors, and return the k nearest neighbors of m1, namely {r1, r2}.
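The (h,k)NN semantics from the example can be sketched on top of an already computed kNN result, using f(m) = the maximum distance to m's k neighbours as on the slide. The function `hk_nn` and its input layout are hypothetical. It only illustrates what the query returns; computing the full kNN result first is exactly the cost the paper's hkNN algorithm tries to avoid.

```python
import heapq


def hk_nn(knn_dists, h):
    """(h,k)NN selection over an existing kNN result.

    knn_dists maps each object m of M to its k nearest neighbours as
    (distance, r_id) pairs. With f(m) = the largest of those distances,
    return the h objects with the smallest f, each with its neighbour ids."""
    scored = [(max(d for d, _ in nbrs), m) for m, nbrs in knn_dists.items()]
    chosen = heapq.nsmallest(h, scored)
    return [(m, [r_id for _, r_id in sorted(knn_dists[m])]) for _, m in chosen]


# Mirrors the slide: f(m1) = max{d1, d2} = d2.
knn_dists = {
    "m1": [(1.0, "r1"), (2.0, "r2")],
    "m2": [(0.5, "r3"), (4.0, "r1")],
}
top = hk_nn(knn_dists, h=1)
```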
  • 8. Some Fundamental Operations
    • Min/max distance from a point to a line segment.
    • Min/max distance from a point to a trajectory.
    • Min/max distance from a trajectory to a trajectory.
    • kNN from a trajectory object to a set of trajectory objects. [2]
    [2] Formulas are omitted for brevity; they are available in Section 3 of the paper.
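Since the formulas are omitted, here is a sketch of the first two operations in their purely spatial form, using standard geometry rather than the paper's Section 3 formulas. The function names are hypothetical, and the polyline version ignores time, which the paper's time-aware trajectory distances do not.

```python
from math import hypot


def point_segment_min_max(p, a, b):
    """(min, max) Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        d = hypot(px - ax, py - ay)
        return d, d
    # Project p onto the line through a and b, clamped to the segment.
    u = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    u = max(0.0, min(1.0, u))
    d_min = hypot(px - (ax + u * dx), py - (ay + u * dy))
    # Distance to a point moving along the segment is convex,
    # so the maximum is attained at one of the endpoints.
    d_max = max(hypot(px - ax, py - ay), hypot(px - bx, py - by))
    return d_min, d_max


def point_polyline_min_max(p, vertices):
    """Spatial-only simplification: (min, max) distance from p to a
    trajectory drawn as the polyline through `vertices` (at least two)."""
    bounds = [point_segment_min_max(p, a, b)
              for a, b in zip(vertices, vertices[1:])]
    return min(lo for lo, _ in bounds), max(hi for _, hi in bounds)
```

The trajectory-level bounds follow from the segment-level ones: the minimum over all segments of their minima, and the maximum of their maxima.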
  • 9. Sub-optimal Solutions
    Single-machine Brute Force (BF): a nested loop computes the Euclidean distance between every pair of points in M and R. Worst case O(|M||R|l) for l points in the trajectory of interest tr.
    Single-machine Sweep Line (SL): pre-sort the data by time and compute distances only for trajectories that overlap in time. Also worst case O(|M||R|l).
    Naive MapReduce: map divides the objects in M and R randomly into disjoint subsets; reduce joins all pairs of subsets to compute distances. A second MapReduce job selects the k nearest neighbors.
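The sweep-line idea (sort by time, compare only temporally overlapping trajectories) can be sketched as follows. This is an illustrative version, not the paper's SL implementation: each trajectory is reduced to a hypothetical (start, end) time span, and expired spans are retired with a simple scan where a production version would use a heap.

```python
def overlapping_pairs(M, R):
    """Sweep-line sketch. M and R map trajectory id -> (start, end) time.
    Returns the sorted (m_id, r_id) pairs whose time spans overlap; only
    these pairs need distance computations."""
    events = sorted(
        [(s, e, "M", tid) for tid, (s, e) in M.items()]
        + [(s, e, "R", tid) for tid, (s, e) in R.items()]
    )
    active = {"M": {}, "R": {}}  # side -> {id: end time} of spans still open
    pairs = []
    for start, end, side, tid in events:
        # Retire spans that ended before the current one starts.
        for open_spans in active.values():
            for done in [o for o, o_end in open_spans.items() if o_end < start]:
                del open_spans[done]
        # Every span still open on the other side overlaps this one.
        other = "R" if side == "M" else "M"
        for oid in active[other]:
            pairs.append((tid, oid) if side == "M" else (oid, tid))
        active[side][tid] = end
    return sorted(pairs)


M = {"m1": (0, 5), "m2": (10, 12)}
R = {"r1": (4, 8), "r2": (6, 11), "r3": (13, 20)}
candidates = overlapping_pairs(M, R)
```

Here only two of the six possible pairs survive, which is where SL saves work over BF on typical data, even though the worst case (all spans overlapping) is unchanged.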
  • 10. Overview of kNN Join
    Each step is implemented as its own MapReduce algorithm, for a total of 6 algorithms.
  • 11. Overview of kNN Join
  • 12. Pre-processing Phase: Algorithm 1
    1. Input: non-partitioned trajectories.
    2. Map splits the trajectories in sets M and R into T temporal partitions: O(l + T), where l is the size of a trajectory.
    3. Reduce splits each temporal partition into N spatial partitions: O((|M| + |R|)(l + N)).
    4. Output: trajectories partitioned by time and space.
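A sketch of how a point could be routed to its (temporal, spatial) partition. The partitioning schemes here are assumptions: equal-width time slices for the T temporal partitions, and nearest-pivot (Voronoi-style) cells for the N spatial partitions, suggested by the "central points p_i" mentioned in the TDB step but not specified on these slides. Boundary handling for segments that cross partitions is omitted.

```python
from math import hypot


def temporal_partition(t, t_min, t_max, T):
    """Index in [0, T) of the equal-width time slice containing t."""
    width = (t_max - t_min) / T
    return min(int((t - t_min) // width), T - 1)  # t == t_max joins the last slice


def spatial_partition(x, y, pivots):
    """Index of the nearest pivot, i.e. a Voronoi-style spatial cell."""
    return min(range(len(pivots)),
               key=lambda i: hypot(x - pivots[i][0], y - pivots[i][1]))


def partition(points, t_min, t_max, T, pivots):
    """Group (traj_id, t, x, y) points by (temporal, spatial) partition key."""
    parts = {}
    for traj_id, t, x, y in points:
        key = (temporal_partition(t, t_min, t_max, T),
               spatial_partition(x, y, pivots))
        parts.setdefault(key, []).append((traj_id, t, x, y))
    return parts


pivots = [(0, 0), (10, 10)]
points = [("a", 0, 1, 1), ("a", 50, 9, 9), ("b", 100, 0, 2)]
parts = partition(points, t_min=0, t_max=100, T=2, pivots=pivots)
```

In the MapReduce version, the (temporal, spatial) key is what mappers and reducers would emit, so all points of one partition meet at the same reducer.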
  • 13. Sub-Trajectory Extraction
    • An anchor trajectory must span an entire time partition.
    • Tr^L_i denotes object i in trajectory r of set L in time partition T.
    Algorithm 2:
    1. Input: trajectories partitioned by time and space.
    2. Map retrieves all sub-trajectories in [ts, te], the queried time window. Time O(log l), space O(l).
    3. Reduce finds the anchor trajectories that will be used in the next step. Time O(|Tr^L_i|² l), space O(|Tr^L_i| l).
    4. Output: anchor trajectories.
  • 14. Anchor Trajectories
    • An anchor trajectory must span an entire time partition, from ts to te.
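The O(log l) lookup in the map step is consistent with binary search over a time-sorted trajectory. Below is a sketch of sub-trajectory extraction with Python's `bisect`, plus an anchor test that checks whether a trajectory covers the whole partition. Both function names are hypothetical, and the sampled-points representation (parallel `times` and `points` lists) is an assumption.

```python
from bisect import bisect_left, bisect_right


def extract_subtrajectory(times, points, ts, te):
    """Samples of a trajectory whose timestamps lie in [ts, te].
    `times` must be sorted; the two binary searches locate the window in
    O(log l), and slicing then copies the matching samples."""
    lo = bisect_left(times, ts)
    hi = bisect_right(times, te)
    return times[lo:hi], points[lo:hi]


def is_anchor(times, ts, te):
    """Hypothetical anchor test: the trajectory spans the entire partition
    [ts, te], i.e. it starts at or before ts and ends at or after te."""
    return bool(times) and times[0] <= ts and times[-1] >= te


times = [0, 2, 4, 6, 8]
points = ["a", "b", "c", "d", "e"]
window = extract_subtrajectory(times, points, 3, 6)
```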
  • 15. Computing the Time-dependent Bound (TDB)
    • The TDB is a circle c(t) that bounds the k nearest neighbors of a set S of objects at time t.
    • The TDB for a set S of objects can change over time.
    Algorithm 4 (containing Algorithm 3):
    1. Input: anchor trajectories.
    2. Map computes the maximum distance from each anchor trajectory to each central point p_i in each temporal partition T. Time O(N · l), space O(l).
    3. Reduce computes the TDB of Tr^M_i from these maximum distances. Time O(|R| log |R|), space O(|R|) for the set of objects R.
    4. Output: time-dependent bounds.
  • 16. Time-dependent Bounds
    • The TDB is a circle c(t) that bounds the k nearest neighbors of a set S of objects at time t.
    • The TDB for a set S of objects can change over time.
    White dots are objects from M; black dots are objects from R. At time t, c(t) needs only a small circle to encompass k = 2 points; at a later time t′, c(t′) needs a bigger circle to encompass k = 2 points.
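The slides do not state the TDB formula. One standard way to obtain such a bound at a fixed instant is the triangle-inequality argument sketched below; it is an assumed construction, not necessarily the paper's Algorithm 3/4. `tdb_radius` is a hypothetical name, and evaluating it at each time t would trace out c(t).

```python
import heapq
from math import hypot


def tdb_radius(center, S, R, k):
    """Radius of a circle around `center` guaranteed to contain the k nearest
    neighbours in R of every query point in S, at a single time instant.

    r_s: largest distance from center to a point of S.
    d_k: distance from center to its own k-th nearest point of R.
    For m in S, the k points of R nearest to center lie within r_s + d_k of m
    (triangle inequality), so m's true neighbours lie within r_s + d_k of m
    and hence within 2 * r_s + d_k of center. Requires len(R) >= k.
    """
    cx, cy = center
    r_s = max(hypot(x - cx, y - cy) for x, y in S)
    d_k = heapq.nsmallest(k, (hypot(x - cx, y - cy) for x, y in R))[-1]
    return 2 * r_s + d_k


S = [(1, 0), (0, 1)]            # query objects from M at time t
R = [(2, 0), (0, 3), (10, 0)]   # objects from R at time t
radius = tdb_radius((0, 0), S, R, k=2)
```

Any object of R outside this circle can be pruned for every member of S at that instant, which is what makes the bound useful for candidate filtering.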
  • 17. Finding Candidate Trajectories: Algorithm 5
    1. Input: a partition of trajectories Tr^R_j.
    2. Map classifies each partition of trajectories Tr^R_j as having no candidates, all candidates, or some candidates. Time O(|Tr| N l), space O(|Tr| l).
    3. Reduce gathers the candidates for a join into C^R_i. Time O(1), space O(|C^R_i| l).
    4. Output: a set of candidate trajectories C^R_i.
  • 18. Candidate Trajectories
    Finding candidates for Tr^R_j (red): in Case 1 there is no overlap, in Case 2 there is complete overlap, and in Case 3 there is partial overlap.
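The three cases can be sketched as a snapshot test between a TDB circle and a partition's axis-aligned bounding box. Representing a partition by its bounding box at one instant is an assumption; the paper's classification works over time. `classify_partition` is a hypothetical name.

```python
from math import hypot


def classify_partition(center, radius, box):
    """Compare a TDB circle with a partition's bounding box
    box = (xmin, ymin, xmax, ymax).
    Returns "none" (Case 1), "all" (Case 2) or "some" (Case 3)."""
    cx, cy = center
    xmin, ymin, xmax, ymax = box
    # Closest point of the box to the circle's centre.
    nx = min(max(cx, xmin), xmax)
    ny = min(max(cy, ymin), ymax)
    if hypot(cx - nx, cy - ny) > radius:
        return "none"  # box entirely outside the circle: no candidates
    # Farthest corner of the box from the circle's centre.
    fx = xmin if abs(cx - xmin) > abs(cx - xmax) else xmax
    fy = ymin if abs(cy - ymin) > abs(cy - ymax) else ymax
    if hypot(cx - fx, cy - fy) <= radius:
        return "all"   # box entirely inside the circle: all are candidates
    return "some"      # partial overlap: check trajectories individually
```

Only the "some" case needs per-trajectory work; the other two are decided from the box alone.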
  • 19. Trajectory Join: Algorithm 6
    1. Input: candidate trajectories.
    2. Map joins each partition Tr^M_i with its corresponding candidates C^R_i on a single machine: O(|Tr||C^R_i| l).
    3. Reduce sorts each object's neighbors and keeps only the k nearest: O(kN).
    4. Output: each queried object with its k nearest neighbors.
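The reduce step can be sketched as merging per-mapper partial neighbour lists and keeping the k nearest per query object. The input layout (one dict per mapper, with (distance, r_id) pairs) is hypothetical, and the de-duplication step is a defensive assumption in case an R object reaches the same query object through more than one mapper.

```python
import heapq


def merge_partial_knn(partials, k):
    """Reduce-side sketch: each mapper emits, per query object, its local
    candidate neighbours as (distance, r_id). Keep the k nearest overall."""
    merged = {}
    for partial in partials:  # one dict per mapper output
        for m_id, neighbours in partial.items():
            merged.setdefault(m_id, []).extend(neighbours)
    result = {}
    for m_id, nbrs in merged.items():
        best = {}  # r_id -> smallest distance seen for that neighbour
        for d, r_id in nbrs:
            if r_id not in best or d < best[r_id]:
                best[r_id] = d
        result[m_id] = heapq.nsmallest(k, ((d, r) for r, d in best.items()))
    return result


partials = [
    {"m1": [(3.0, "r1"), (1.0, "r2")]},
    {"m1": [(2.0, "r3"), (1.0, "r2")], "m2": [(5.0, "r1")]},
]
final = merge_partial_knn(partials, k=2)
```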
  • 20. Extension: kNN Load Balancing
    1. Hash the trajectory objects by ID to distribute them more uniformly among compute nodes.
    2. This requires modifying sub-trajectory extraction, candidate finding, and the trajectory join.
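A sketch of ID-hashing for load balancing. The choice of CRC-32 modulo the node count is an assumption; the slide only says objects are hashed by an ID. A deterministic hash matters because Python's built-in `hash()` on strings is salted per process, so different workers would disagree about placement.

```python
import zlib
from collections import Counter


def node_for(traj_id, num_nodes):
    """Deterministically map a trajectory id to one of num_nodes nodes."""
    return zlib.crc32(traj_id.encode("utf-8")) % num_nodes


# How many trajectories each of 4 nodes would receive.
load = Counter(node_for(f"traj-{i}", 4) for i in range(1000))
```

Unlike spatial partitioning, the node assignment no longer depends on where objects are, so dense regions do not overload a single node; the trade-off is that neighbouring objects may land on different nodes.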
  • 21. Extension: hkNN Join
    1. Review: finds the h objects from M that minimize some function f and returns the k nearest neighbors of each.
    2. The TDB it has to compute is smaller.
    3. The query result is smaller: h × k, versus |M| × k for the kNN query.
    4. Time and space complexities remain the same.
  • 22. Evaluation Setup
    • Two synthetic and two real datasets.
    • Non-trivial size: up to 1.2B observations and 17.2 GB.
    • Hadoop cluster with 60 slave nodes, each with a multi-core 3.40 GHz CPU and 16 GB of memory.
    • Sweep Line (SL) is used for the single-node parts.
    • Metrics: query execution time and MapReduce shuffling cost. [4]
    • k = 10 and N = 400 are held constant for all datasets; T and tq are varied.
    [4] The amount of data sent from mappers to reducers.
  • 23. Effect of T (Number of Temporal Partitions)
    As T grows, the running time decreases until it reaches an inflection point, which happens to be similar for both datasets. Most of the time is still spent on the single-node SL.
  • 24. kNN Results Summary
    • Increasing N (the number of spatial partitions) improves performance up to an inflection point, which differs between the two datasets (Fig. 15).
    • Balanced Sweep Line (BL-SL) is the more efficient single-node algorithm (Fig. 16). [5]
    • Adding slave nodes improves performance, but only slowly, likely due to I/O overhead (Fig. 17).
    • As k increases, running time and shuffling cost increase; the TDB makes a difference (Fig. 18).
    • Increasing tq yields a near-linear increase in running time and shuffling cost; the TDB and load balancing make a difference (Fig. 19).
    • Running time increases linearly with dataset size, while shuffling cost increases more sharply (Fig. 20).
    [5] I think they mixed up the figure labels.
  • 25. hkNN Results Summary
    • Running time stays constant as h grows (probably because k is constant).
    • (h,k)NN is 2× faster than the kNN methods.
    • The load-balanced version is faster than the non-load-balanced one.
  • 26. Conclusion
    Contributions:
    1. Leverages the shared-nothing MapReduce structure for kNN joins, which typically rely on shared indices.
    2. Introduces the TDB and load-balancing methods, which yield tangible improvements.
    Questions:
    1. Most of the time is still spent on single-node computation. What is the theoretical bound on the improvement achievable via parallelization?
    2. How much time does the partitioning step take?
    3. The partitioning step probably has to be re-run when new data arrives. Does this prevent a real-time implementation?
    4. Is there any benefit to localizing data instead of using HDFS?