Geospatial Analytics and Spatial Capabilities on Big Data Systems
By: Ahmed Jawad (PhD)
Agenda
• Why Analyze Telematics
• Analysis of Movement Data
• Analytical Assets for Telematics
• Operational Requirements on Telematics
• Data flow on big data platforms
• Analytical Challenges and Applications Solved through
Machine Learning
• Snap to road
• Unifying trajectories to patterns of movements and routines
• Traffic event detection
Why Analyze Telematics
• We are being recorded everywhere
• Provides great insights into the customer routines and
movement
• Key players competing in the market
Analyzing Movement Data
• Trajectory: an object in motion (time-space)
• Raw trajectories: coordinate-based recording
• Symbolic trajectories: discretization into streets, locations, or events
Traditional Operational Requirements In The World Of
Geographic Information Systems (GIS)
• Traditional use cases: cartography, geo-algebra (display of statistical
events, hotspots, co-locations on the map)
• Databases used: PostgreSQL, SQL Server
• Mostly static data sources
• Relatively small data sets
• Moderate geometric accuracy
• Offline processing acceptable
• Complex geometric datatypes support
Operational Requirements and
Design Considerations for
Telematics
• Real-time ingestion and analytics on sensor data, distance queries,
snap-to-road
• Data at 100 TB to petabyte scale
• High variation in geospatial queries (range queries, etc.) and in the
throughput of CRUD operations: insertion/deletion/read
• Processing flow and map applications; the nature of the relationships in the
data has implications for the storage technology and for the indexing
techniques.
Telematics and Geospatial Data
Types
• Spatial data structures:
• Raster: geographically-referenced matrix of uniform size
• Vector: features on the earth’s surface are represented as
geographically-referenced vector objects
• Hierarchical nature of objects
• Points: different types : Entity, label, area, node
• Lines: lines, polylines, arc, link, etc.
• Polygons: area, polygon, complex polygon
• Requirements: the ability to manipulate geospatial data.
• Databases and libraries are required to manipulate these objects at
distributed scale (Spark and Scala, MongoDB, or any other NoSQL
database)
Analytical Assets for Telematics
• The analytical assets for telematics broadly relate to
• Snap-to-road
• Analysis of User Activities (Clustering)
• Traffic Event Detection (Classification)
• Real-time location search
• Set operations on geometry objects and geo-algebra (layering of
geospatial information atop each other and algebraic operations on
them)
Conceptual dataflow and geospatial processing in Telematics
[Diagram: conceptual dataflow. Labels as shown:]
• PDA: event capture
• Kafka
• Stream processing engine: event processing and delivery decision
• Persistence layer: PDA geodata and critical events (Mongo / HBase / Cassandra / Elastic, on top of Hadoop)
• Risk area
• Datafeed client: preload risk area, preload traffic info
• Tomcat app (optional raster processing with GeoTrellis)
• Client: D3 / Ajax / Leaflet
• Connections are labelled push (REST API, WebSocket), pull, and stream
Persistence layer: should be scalable and support storage and querying of spatiotemporal objects (points, polygons, lines, line strings; for reference see MongoDB's 2dsphere indexing and geospatial
querying). The following low-level queries shall be supported: (1) nearest-neighbor query: given a point (lat, long), find all the line strings within a radius of x meters; (2) containment query: find all
the points within a polygon or, given a point, find all the polygons containing it.
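The two low-level queries can be illustrated without a database. The following is a minimal sketch, not what any particular store does internally: it assumes coordinates are given as (lon, lat) pairs, uses an equirectangular approximation that is only reasonable for radii of a few kilometres, and all function names are made up for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def to_local_xy(lon, lat, lon0, lat0):
    """Project (lon, lat) to metres on a plane centred at (lon0, lat0).

    Equirectangular approximation (an assumption of this sketch).
    """
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def point_segment_distance(px, py, ax, ay, bx, by):
    """Planar distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the foot stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def linestrings_within(point, linestrings, radius_m):
    """Query (1): names of line strings within radius_m metres of point.

    Each line string needs at least two vertices.
    """
    lon0, lat0 = point
    hits = []
    for name, coords in linestrings.items():
        xy = [to_local_xy(lon, lat, lon0, lat0) for lon, lat in coords]
        d = min(point_segment_distance(0.0, 0.0, *a, *b) for a, b in zip(xy, xy[1:]))
        if d <= radius_m:
            hits.append(name)
    return hits


def point_in_polygon(point, ring):
    """Query (2): ray-casting test; ring is a closed list of (lon, lat)."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


roads = {
    "main": [(13.400, 52.520), (13.410, 52.520)],
    "far": [(13.400, 52.600), (13.410, 52.600)],
}
nearby = linestrings_within((13.405, 52.5205), roads, radius_m=100.0)
triangle = [(0, 0), (3, 6), (6, 1), (0, 0)]
```

At scale, a spatial index (geohash, R-tree) would first narrow the candidates so that these exact tests run on a handful of geometries only.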
Client browser (e.g. fleet manager): in the current scheme, all the
intelligence is deferred to the client, i.e. raster processing, displaying
the map, and the different layers along with map algebra are done on the
client side, for example with Leaflet. An alternative strategy is to use
geotrellis.io as a geoprocessing engine for the raster operations and to
use the client only to display the map.
Stream processing queries: (1) instantaneous speed / angular momentum of
the PDA; (2) distance to a traffic event pulled from Bing; (3) running
aggregates, e.g. how long the vehicle has spent at the current location.
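As a rough illustration of stream queries (1) and (3), the sketch below keeps a small per-vehicle state that is updated one GPS fix at a time. The `Fix` and `VehicleState` names, the 50 m stay radius, and the idea of anchoring a stay at its first fix are assumptions of this example, not part of the deck; a real deployment would run equivalent logic inside the stream processing engine.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass
class Fix:
    t: float    # timestamp in seconds
    lat: float  # degrees
    lon: float  # degrees


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


class VehicleState:
    """Running state for one vehicle (hypothetical helper)."""

    def __init__(self, stay_radius_m=50.0):
        self.stay_radius_m = stay_radius_m
        self.prev = None     # previous fix
        self.anchor = None   # first fix of the current stay
        self.speed_mps = 0.0
        self.dwell_s = 0.0

    def update(self, fix):
        # Query (1): speed from the last two fixes.
        if self.prev is not None:
            dt = fix.t - self.prev.t
            if dt > 0:
                dist = haversine_m(self.prev.lat, self.prev.lon, fix.lat, fix.lon)
                self.speed_mps = dist / dt
        # Query (3): restart the dwell clock once the vehicle leaves the stay radius.
        if self.anchor is None or haversine_m(
            self.anchor.lat, self.anchor.lon, fix.lat, fix.lon
        ) > self.stay_radius_m:
            self.anchor = fix
        self.dwell_s = fix.t - self.anchor.t
        self.prev = fix
        return self.speed_mps, self.dwell_s


state = VehicleState()
parked = [state.update(Fix(t, 52.52, 13.405)) for t in (0, 60, 120)]
moving = state.update(Fix(130, 52.521, 13.405))
```

Query (2), the distance to a traffic event, would reuse `haversine_m` between the latest fix and the event location.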
Analytics Cluster GIS Capabilities
[Diagram: Hadoop cluster. Labels as shown:]
• Geocoding service; OSM / real-time traffic API
• Provisioning layer
• Processing layer: Spark, Scala + RStudio Server and RMR
• Data storage: NoSQL database (MongoDB / HBase / Elastic)
Data Storage - Persistence layer
Name | Index strategy | Geometry | Query types | Ease of use/integration | Scalability/Speed | Comments
Elasticsearch | Geohash | Point | Bbox, radius | Good, 3 stars | 10s of TBs | Average writes; reads and search extremely fast
Neo4j | R-tree | Point / line / polygon | Bbox, radius | Moderately good, 2 stars | 10s of TBs | Too granular
HBase | Build your own index | - | - | Moderately good, 2 stars | Petabytes | Writes are fast, reads as well; needs specialization
Cassandra | Build your own index | - | - | Good, 3 stars | Petabytes | Same as HBase
MongoDB / Couchbase | Geohash | Point / line / polygon | 1) geo-within, 2) near, 3) intersect | Excellent, 5 stars (GeoJSON / Leaflet / OSM) | 10s of TBs, average throughput | Best integration with GeoJSON in all cases
Proposed solutions:
• Short term: MongoDB
• Long term: Elasticsearch as the indexing engine and HBase / Cassandra as the storage
technology on top of Hadoop
Analytical Services on Telematics Cluster
1) Geocoding and reverse geocoding service on the cluster
2) Weather and traffic API (real time and history) to support the use
cases related to weather and traffic analytics
3) Street maps (OpenStreetMap at the start, then better map providers
in the longer run)
• Required for the following analytics: regular trips, snap-to-road,
mode of transport, identification of risky roads, impact of
POI (e.g. school) on events; enables location based
Analytical Operations/Procedures Useful For Spatial Analysis
(RStudio Server With R Packages)
• Having an RStudio Server on the cluster would be useful.
• GitHub repository (already established)
• R packages for dealing with vector data (rgdal, rgeos, geojson_io,
SpatialTransforms)
• Point pattern analysis: dbscan, glm, gbm
• Describing and analyzing fields, statistical analysis of
fields / spatial interpolation: kriging, tps
• Network analysis, snap-to-road, frequent routes, etc.
(igraph, sna)
• Visualization of the data: leaflet, shiny
Geospatial processing layer on top of persistence
• The geospatial processing layer integrates map geometry and
algebra to display the information on the map. On a small scale,
this can be done in JavaScript (Leaflet / D3).
• The following operations are required
1) Vector operations
2) Map algebra
• On a larger scale, a software engineering layer for
distributed geospatial processing is required, for example Scala,
Spark and GeoTrellis.
• http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/allixender/5676830073
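To make "map algebra" concrete, here is a minimal sketch of a local (cell-by-cell) operation on two small rasters held as nested lists. The `risk` and `traffic` layers and the `local_op` helper are invented for illustration; a distributed engine applies the same idea tile by tile.

```python
def local_op(a, b, fn):
    """Apply fn cell by cell to two rasters of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("rasters must have the same shape")
    return [[fn(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical layers on the same grid: 1 marks a risk-area cell,
# traffic holds an event count per cell.
risk = [
    [0, 1, 1],
    [0, 0, 1],
]
traffic = [
    [2, 0, 3],
    [1, 0, 0],
]

# Overlay: traffic events that fall inside risk areas.
events_in_risk = local_op(risk, traffic, lambda r, t: r * t)

# Boolean layer: cells that are risky or have any traffic event.
flagged = local_op(risk, traffic, lambda r, t: int(r == 1 or t > 0))
```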
Analytical Challenges in Movement Data
• Basic challenges in movement data
• Matching (Snap-to-road, street network matching)
• Similarity measures
• Trajectory clustering
• Event detection (classification)
Example Applications Solved through Machine Learning
• For raw trajectories
• Snap-to-road
• For symbolic trajectories
• Analysis of user activities
• Traffic event detection
Snap-to-road
• Given a trajectory T and a street network G
• Find the path in G that best matches T, i.e. its real
(ground-truth) path
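For orientation, the simplest possible baseline is sketched below: each trajectory point is projected onto its nearest street segment, independently of its neighbours. This is not the kernelized method discussed in this talk, and it ignores connectivity of the resulting path; it assumes planar coordinates in metres, and the node and edge structures are hypothetical.

```python
import math


def project_on_segment(p, a, b):
    """Closest point to p on segment ab, and its distance to p."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    denom = dx * dx + dy * dy
    if denom == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / denom))
    q = (ax + t * dx, ay + t * dy)
    return q, math.hypot(px - q[0], py - q[1])


def snap_trajectory(trajectory, nodes, edges):
    """For each point, return (nearest edge, snapped point)."""
    snapped = []
    for p in trajectory:
        candidates = (
            (edge, *project_on_segment(p, nodes[edge[0]], nodes[edge[1]]))
            for edge in edges
        )
        edge, q, _dist = min(candidates, key=lambda item: item[2])
        snapped.append((edge, q))
    return snapped


# Street network G: an L-shaped road A-B-C.
nodes = {"A": (0.0, 0.0), "B": (100.0, 0.0), "C": (100.0, 100.0)}
edges = [("A", "B"), ("B", "C")]
# Noisy trajectory T.
trajectory = [(10.0, 4.0), (60.0, -3.0), (97.0, 40.0)]
matched = snap_trajectory(trajectory, nodes, edges)
```

Point-wise snapping like this tends to fail exactly where the deck says matching is hard (sparse samples, roundabouts, parallel roads), which is the motivation for methods that use the structure of the whole trajectory.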
Snap-to-road: Analytical Modeling
• Multiregression view:
• Task = estimate noise free function f from T
that preserves the structural information
• Preserving structural correlations in output:
• Try kernelized embedding with kernel for raw trajectories
Snap-to-road
• An important problem at organizations like HERE, IBM and
Microsoft.
• Error between 10 and 100 meters (Wi-Fi, vehicle navigation,
mobile devices)
• Degraded sampling rates and sparse GPS data
• Difficult at roundabouts and in tunnels
Solution:
• Basic steps:
  • Embed the trajectory by kernel methods but
    ignore map constraints
• Benefits:
  • Noise reduction
  • Capture multi-output, non-linear
    dependencies
• 'Round' the resulting 'relaxed
  assignment' to the street map
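The relax-then-round idea can be illustrated with a much simpler stand-in. In the sketch below, a Gaussian (Nadaraya-Watson) kernel smoother over time plays the role of the relaxation step, and nearest-node assignment plays the role of rounding. This swaps in a basic smoother for the kernelized embedding of the talk; the bandwidth, the node names and the planar coordinates are assumptions of the example.

```python
import math


def kernel_smooth(times, points, bandwidth_s):
    """Relaxation step: kernel-weighted average position at each timestamp.

    Both coordinates share the same Gaussian weights in time, so the two
    outputs are smoothed jointly. Map constraints are ignored here.
    """
    smoothed = []
    for t in times:
        weights = [math.exp(-0.5 * ((t - s) / bandwidth_s) ** 2) for s in times]
        total = sum(weights)
        x = sum(w * p[0] for w, p in zip(weights, points)) / total
        y = sum(w * p[1] for w, p in zip(weights, points)) / total
        smoothed.append((x, y))
    return smoothed


def round_to_map(points, road_nodes):
    """Rounding step: assign each relaxed point to its nearest road node."""
    return [
        min(road_nodes, key=lambda name: math.dist(p, road_nodes[name]))
        for p in points
    ]


# A vehicle driving along y = 0 with noisy GPS readings.
times = [0, 1, 2, 3, 4]
noisy = [(0, 0), (10, 4), (20, 0), (30, -4), (40, 0)]
road_nodes = {"main": (10, 0), "side": (10, 5)}

relaxed = kernel_smooth(times, noisy, bandwidth_s=1.0)
raw_choice = round_to_map([noisy[1]], road_nodes)
smoothed_choice = round_to_map([relaxed[1]], road_nodes)
```

In this toy case the raw reading at t = 1 is closer to the side street, while the smoothed estimate is pulled back toward the main road before rounding.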
Snap-to-road Algorithm
Snap-to-road: Does it Work?
• Performance over challenging real tasks
Grouping Of Trajectories/Stops In Similar Routines
• Requires similarity measures for trajectories.
• Unroll a trajectory by defining a mapping
Similarity Measures For Trajectories -- Symbolic Trajectories
• Formed by discretization of the curve through
measurement process or algorithms.
• Snap-to-road
• Stay points
• Regional division
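One of the discretizations listed above, stay points, can be sketched as follows. This is a common distance-and-duration heuristic rather than the specific algorithm behind these slides; the 100 m and 10 minute thresholds, the planar coordinates in metres and the function name are all assumptions.

```python
import math


def stay_points(track, max_dist_m=100.0, min_duration_s=600.0):
    """Detect stays in a time-ordered track of (t_seconds, x_m, y_m).

    Returns a list of (x_mean, y_mean, t_arrive, t_leave).
    """
    stays = []
    i, n = 0, len(track)
    while i < n:
        j = i + 1
        # Extend the group while points stay within max_dist_m of point i.
        while j < n and math.dist(track[i][1:], track[j][1:]) <= max_dist_m:
            j += 1
        if track[j - 1][0] - track[i][0] >= min_duration_s:
            group = track[i:j]
            x = sum(p[1] for p in group) / len(group)
            y = sum(p[2] for p in group) / len(group)
            stays.append((x, y, track[i][0], track[j - 1][0]))
            i = j          # continue after the stay
        else:
            i += 1         # not a stay: slide the window forward
    return stays


track = [
    (0, 0, 0), (300, 10, 0), (600, 0, 10), (900, 5, 5),   # first stop
    (960, 500, 0), (1020, 1000, 0),                        # driving
    (1080, 1500, 0), (1680, 1510, 0), (2280, 1500, 10),    # second stop
]
stays = stay_points(track)
```

The sequence of detected stays (optionally labelled Home, Work, and so on after clustering) is one way to obtain a symbolic trajectory.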
Clustering of Staypoints to find Homezones
Grouping Of Trajectories/Stops In Similar Routines
Applications for Symbolic Trajectories Clustering and Event
Detection
• Trajectory clustering
• User activity analysis
• Traffic event detection
• Classification of events from non-event data
• Rerouting of traffic during baseball games
• Detection of conference in auditoriums
Applications for Symbolic Trajectories
• Exploit sequence analysis (in particular biological
sequence analysis)
1. Discretize the raw trajectories with an appropriate alphabet
2. Use an alignment kernel with traffic symbol similarity in order to
translate traffic invariances to the biological domain
3. Exploit sequence analysis to find discrete sequential patterns
(Where Traffic Meets DNA, Best Poster Award, ACM GIS
2011, Ahmed Jawad)
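Steps 1 and 2 can be illustrated with a plain global alignment score (Needleman-Wunsch) over symbolic trajectories. The alphabet, the similarity values and the gap penalty below are invented for the example, and a raw alignment score is not guaranteed to be a valid (positive definite) kernel, so treat this as a sketch of the idea rather than the alignment kernel used in the cited work.

```python
def alignment_score(a, b, similarity, gap=-1.0):
    """Needleman-Wunsch global alignment score of symbol sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(a[i - 1], b[j - 1]),  # align symbols
                score[i - 1][j] + gap,                                  # gap in b
                score[i][j - 1] + gap,                                  # gap in a
            )
    return score[-1][-1]


# Hypothetical alphabet: H = home, W = work, S = sports, R = on the road.
def symbol_similarity(x, y):
    if x == y:
        return 2.0
    if {x, y} == {"W", "S"}:  # assumed: both are out-of-home activities
        return 0.5
    return -1.0


same_day = alignment_score("HRWRH", "HRWRH", symbol_similarity)
sports_day = alignment_score("HRWRH", "HRSRH", symbol_similarity)
longer_day = alignment_score("HRWRH", "HRWRSRH", symbol_similarity)
```

The substitution scores play the role that amino-acid substitution matrices play in biological sequence analysis: they encode which symbols may stand in for one another.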
Trajectory Clustering
[Figure: space-time plot of a daily trajectory; axes X, Y and Time (0:00 to 24:00); locations Home, Work and Sports. Data generator: http://iapg.jade-hs.de/personen/brinkhoff/generator/]
Trajectory Clustering: Analysis of User Activities
• Analysis of user activities
• Frequent routes in trajectories
• Clustering at map-matched level
• Frequent routines in trajectories
• Clustering at stay point level
• Visualization of variability in routines (sequence logos)
Trajectory Clustering: Map Matched Discretization
Trajectory Clustering: Comparison to State-of-the-Art
Trajectory Clustering: Routine analysis
Application for Symbolic Trajectories: Traffic Event Detection
• Using biological sequence methods to model event persistence
• Analysis of Dodgers baseball games from highway sensor
data
• Detecting the presence of a baseball game
• Visualization
• Analysis of events at the Caltech auditorium entrance
• Detecting conferences in the auditorium
Traffic Event Detection
• Normalization based classifier
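The slide does not spell out the classifier, so the following is only one plausible reading: sensor counts are normalized against a per-time-slot baseline (z-score) and flagged as an event when they deviate strongly. The baseline structure, the threshold of 3 and the function names are assumptions of this sketch.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """history: {time slot: [counts observed in that slot on normal days]}.

    Each slot needs at least two observations.
    """
    return {slot: (mean(vals), stdev(vals)) for slot, vals in history.items()}


def is_event(baseline, slot, count, threshold=3.0):
    """Flag a reading whose normalized deviation exceeds the threshold."""
    mu, sigma = baseline[slot]
    if sigma == 0:
        return count != mu
    return abs(count - mu) / sigma > threshold


# Hypothetical counts from one sensor at 18:00 on five normal days.
baseline = fit_baseline({"18:00": [40, 42, 38, 41, 39]})
normal_evening = is_event(baseline, "18:00", 41)
game_evening = is_event(baseline, "18:00", 95)
```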
[Figure: readings from a traffic sensor]
Traffic Event Detection: Sequence Analysis
Summary and Conclusions
• Structural information analysis is the connection
between machine learning and GIS
• Still, a lot of data engineering and task-specific tricks are
needed, e.g., regularization and normalization
Active Directions being pursued
• In Snap-to-road
• Fisher kernels for Sparse GPS data
• Testing KMM with real world system
• In clustering and event detection
• User profiles and diaries
• Label sequence graph kernels
• In structural information
• Can doing away with the latitude/longitude pairs and keeping only
the structural information help with privacy issues?
Q & A
References (1)
• Thomas Brinkhoff, Generating Network-Based Moving Objects, Proceedings of the 12th International Conference on Scientific and
Statistical Database Management, p.253, July 26-28, 2000
• C. Körner, M. May, S. Wrobel. Spatiotemporal Modeling and Analysis - Introduction and Overview. KI, 2012.
• Yi Guo , Junbin Gao , Paul W. Kwan, Twin Kernel Embedding, IEEE Transactions on Pattern Analysis and Machine Intelligence, v.30
n.8, p.1490-1495, August 2008  
• Julian J. McAuley, Teofilo de Campos, and Tiberio S. Caetano. Unified graph matching in euclidean spaces. In CVPR, 2010.
• Tom Mitchell. Mining our reality. Science, 326(5960):1644--1645, 2009.
• Paul Newson, John Krumm, Hidden Markov map matching through noise and sparseness, Proceedings of the 17th ACM SIGSPATIAL
International Conference on Advances in Geographic Information Systems, November 04-06, 2009, Seattle, Washington
• Novi Quadrianto, Le Song, and Alex Smola. Kernelized sorting. In NIPS 21, pages 1289--1296. 2009.
• Mohammed A. Quddus, Washington Y. Ochieng, and Robert B. Noland. Current map-matching algorithms for transport
applications: State-of-the art and future research directions. Transportation Research Part C: Emerging Technologies, 15(5):312--
328, 2007.
• A. Abbott. A primer on sequence methods. Organization Science, 1(4):375--392, 1990.
• Gennady Andrienko , Natalia Andrienko , Stefan Wrobel, Visual analytics tools for analysis of movement data, ACM SIGKDD
Explorations Newsletter, v.9 n.2, December 2007  
• Mihael Ankerst , Markus M. Breunig , Hans-Peter Kriegel , Jörg Sander, OPTICS: ordering points to identify the clustering structure,
Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.49-60, May 31-June 03, 1999,
Philadelphia, Pennsylvania, United States  
• Gerben de Vries , Maarten van Someren, Clustering vessel trajectories with alignment kernels under trajectory compression,
Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part I, September 20-
24, 2010, Barcelona, Spain
• R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge University Press, 1998.
• M. Ester, H. P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise.
In KDD, pages 226--231, 1996.
References (2)
• Alexander Ihler , Jon Hutchins , Padhraic Smyth, Adaptive event detection with time-varying poisson processes, Proceedings of
the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia,
PA, USA
• Ahmed Jawad, Kristian Kersting, Kernelized map matching, Proceedings of the 18th SIGSPATIAL International Conference on
Advances in Geographic Information Systems, November 02-05, 2010, San Jose, California
• C. Joh, T. A. Arentze, and H. J. P. Timmermans. Multidimensional sequence alignment methods for activity-travel pattern
analysis: A comparison of dynamic programming and genetic algorithms. Geographical Analysis, 33(3):247--270, 2001.
• John A. Lee , Michel Verleysen, Nonlinear Dimensionality Reduction, Springer Publishing Company, Incorporated, 2007
• Yanchi Liu , Zhongmou Li , Hui Xiong , Xuedong Gao , Junjie Wu, Understanding of Internal Clustering Validation Measures,
Proceedings of the 2010 IEEE International Conference on Data Mining, p.911-916, December 13-17, 2010
• Salvatore Rinzivillo , Dino Pedreschi , Mirco Nanni , Fosca Giannotti , Natalia Andrienko , Gennady Andrienko, Visually driven
analysis of movement data by progressive clustering, Information Visualization, v.7 n.3, p.225-239, June 2008
• Albrecht Schmidt, Marc Langheinrich, Kristian Kersting, Perception beyond the Here and Now, Computer, v.44 n.2, p.86-88,
February 2011
• S. Schonfelder and K. W. Axhausen. Urban Rhythms and Travel Behavior: Spatial and Temporal Phenomena of Daily Travel
(Transport and Society). Ashgate, 2010.
• N. Shoval and M. Isaacson. Sequence alignment as a method for human activity analysis in space and time. Annals of the
Association of American Geographers, 97(2):282--297, 2007.
• C. Wilson. Analysis of travel behavior using sequence alignment methods. Journal of the Transportation Research Board,
1645(-1):52--59, 1998.
References (3)
• T. Gärtner. Kernels for structured data. World Scientific, Hackensack, N.J., 2008.
• T. Gärtner, P. A. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In Proceedings of
Conference on Learning Theory (COLT), pages 129--143, 2003.
• T. Gärtner, T. Horvath, Q. V. Le, A. J. Smola, and S.Wrobel. Kernel methods for graphs. In Mining Graph Data, pages
253--282. John Wiley and Sons, Inc,2006.
• Intelligence (PAMI), 31(5):944--952, 2009.
• R. O. Duda, D. G. Stork, and P. E. Hart. Pattern classification. Wiley, New York; Chichester, 2nd edition, 2000.
• D. Fox, J. Hightower, L. Liao, D. Schulz, and G. Borriello. Bayesian filtering for location estimation. IEEE Pervasive
Computing, 2(3):24--33, 2003.
• S. J. Gaffney, A. W. Robertson, P. Smyth, S. J. Camargo, and M. Ghil. Probabilistic clustering of extratropical cyclones
using regression mixture models. Climate Dynamics, 29(4):423--440, 2006.
• M. Gariel, A. N. Srivastava, and E. Feron. Trajectory clustering and an application to airspace monitoring. IEEE
Transactions on Intelligent Transportation Systems (TITS), 12(4):1511--1524, 2006.
Appendix: persistence options
• Neo4j Spatial :
• Utilities for importing from ESRI Shapefile as well as Open Street Map
files
• Support for all the common geometry types
• An RTree index for fast searches on geometries
• Support for topology operations during the search (contains, within,
intersects, covers, disjoint, etc.)
• The possibility to enable spatial operations on any graph of data,
regardless of the way the spatial data is stored, as long as an adapter is
provided to map from the graph to the geometries.
• Ability to split a single layer or dataset into multiple sub-layers or views
with pre-configured filters
Appendix: persistence options
HBase/Cassandra: build your own index.
• Perform geohashing yourself or use Elasticsearch
as a hashing / search engine
• Libraries are available to connect ES with
Cassandra / HBase
• Besides, geohashing is easy to program
• http://thenewstack.io/building-streaming-data-hub-elasticsearch-kafka-cassandra/
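To back the claim that geohashing is easy to program, here is a minimal encoder following the public geohash scheme (interleaved longitude/latitude bisection bits, base-32 alphabet). The function name and default precision are choices made for this sketch.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a WGS84 latitude/longitude as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


cell = geohash_encode(42.6, -5.6, precision=5)
finer = geohash_encode(42.6, -5.6, precision=7)
```

Because a longer hash always extends the shorter one, a geohash prefix can serve as (part of) a row key in HBase or Cassandra, so that nearby points tend to share key prefixes. Cells on either side of a cell boundary do not share a prefix, so radius queries usually also check neighbouring cells.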
Appendix: persistence options
MongoDB Geospatial
• Store your location data as GeoJSON objects with this
coordinate-axis order: longitude, latitude. The
coordinate reference system for GeoJSON uses the
WGS84 datum.
MongoDB: Querying Data
db.<collection>.find( { <location field> :
    { $geoWithin :
        { $geometry :
            { type : "Polygon" ,
              coordinates : [ <coordinates> ]
            } } } } )
db.places.find( { loc :
    { $geoWithin :
        { $geometry :
            { type : "Polygon" ,
              coordinates : [ [ [ 0 , 0 ] , [ 3 , 6 ] , [ 6 , 1 ] ,
                                [ 0 , 0 ] ] ]
            } } } } )
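The same filter document can be built programmatically. The sketch below only constructs plain dictionaries, so it runs without a database; the helper names are hypothetical. It encodes the two details that are easy to get wrong: GeoJSON coordinates are longitude first, and a polygon ring must end on its first vertex.

```python
import json


def geojson_point(lat, lon):
    """GeoJSON Point; note the longitude-first coordinate order."""
    return {"type": "Point", "coordinates": [lon, lat]}


def geo_within_polygon(field, ring):
    """Build a $geoWithin filter; ring is a list of (lon, lat) pairs.

    The ring is closed automatically if its last vertex differs from the first.
    """
    coords = [list(p) for p in ring]
    if coords[0] != coords[-1]:
        coords.append(list(coords[0]))
    if len(coords) < 4:
        raise ValueError("a polygon ring needs at least three distinct vertices")
    return {
        field: {
            "$geoWithin": {
                "$geometry": {"type": "Polygon", "coordinates": [coords]}
            }
        }
    }


query = geo_within_polygon("loc", [(0, 0), (3, 6), (6, 1)])
place = {"name": "depot", "loc": geojson_point(lat=52.52, lon=13.405)}
print(json.dumps(query))
```

With a driver such as PyMongo, a dictionary like `query` could be passed as the filter argument of `Collection.find`, assuming a `places` collection whose `loc` field holds GeoJSON objects.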

Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 

GIS Capabilities on Big Data Systems

  • 1. Geospatial Analytics and Spatial Capabilities on Big Data Systems By: Ahmed Jawad (PhD)
  • 2. Agenda • Why Analyze Telematics • Analysis of Movement Data • Analytical Assets for Telematics • Operational Requirements on Telematics • Data flow on big data platforms • Analytical Challenges and Applications Solved through Machine Learning • Snap-to-road • Unifying trajectories to patterns of movements and routines • Traffic event detection
  • 3. Why Analyze Telematics • We are being recorded everywhere • Provides great insights into customer routines and movement • Key players are competing in the market
  • 4. Analyzing Movement Data • Trajectory: an object in motion (time-space) • Raw trajectories: coordinate-based recording • Symbolic trajectories: discretization into streets, locations, or events
  • 5. Traditional Operational Requirements in the World of Geographic Information Systems (GIS) • Traditional use cases: cartography, geo-algebra (display of statistical events, hotspots, co-locations on the map) • Databases used: PostgreSQL, SQL Server • Mostly static data sources • Relatively small data sets • Moderate geometric accuracy • Offline processing acceptable • Support for complex geometric data types
  • 6. Operational Requirements and Design Considerations for Telematics • Real-time ingestion and analytics on sensor data, distance queries, snap-to-road • Data at the scale of 100s of TBs to petabytes • High variation in geospatial queries (range queries, etc.) and in the throughput of CRUD operations: insertion/deletion/read • The processing flow, the map applications, and the nature of the relationships in the data determine the storage technology, the indexing techniques, and their implications.
  • 7. Telematics and Geospatial Data Types • Spatial data structures: • Raster: geographically referenced matrix of uniform size • Vector: features on the earth’s surface are represented as geographically referenced vector objects • Hierarchical nature of objects • Points: different types: entity, label, area, node • Lines: lines, polylines, arc, link, etc. • Polygons: area, polygon, complex polygon • Requirements: the ability to manipulate geospatial data • Databases and libraries required to manipulate these objects at distributed scale (Spark and Scala, MongoDB, or any other NoSQL database)
  • 8. Analytical Assets for Telematics • The analytical assets for telematics can be broadly related to • Snap-to-road • Analysis of user activities (clustering) • Traffic event detection (classification) • Real-time location search • Set operations on geometry objects and geo-algebra (layering of geospatial information atop each other and algebraic operations on the layers)
  • 9. Conceptual dataflow and geospatial processing in Telematics
    • Components shown in the diagram: PDA event capture; Kafka (event processing and delivery); stream processing engine (decision); persistence layer holding PDA geodata, critical events, and risk areas (MongoDB / HBase, Cassandra / Elasticsearch, on top of Hadoop); Tomcat app (optional raster processing with GeoTrellis); datafeed client (preloads risk areas and traffic info); geocoding service; OSM / real-time traffic API; client (D3 / Ajax / Leaflet API). Links in the diagram are labelled push (REST), push (WebSocket), push (stream), and pull.
    • Persistence layer: should be scalable and support storage and querying of spatiotemporal objects (points, polygons, lines, line strings; for reference see MongoDB's 2dsphere indexing and geospatial querying). The following low-level queries shall be supported: (1) nearest neighbor query: given a point (lat, long), find all the line strings that are within an x meter radius; (2) containment query: give all the points within a polygon, or given a point, find all the polygons containing it.
    • Client browser (e.g. fleet manager): in the current scheme, we have deferred all the intelligence to the client, i.e. the raster processing, displaying the map, and the different layers along with map algebra will be done on the client side. One such example is Leaflet. An alternate strategy is to use geotrellis.io as a geoprocessing engine for the raster operations and to use the client only for the display of the map.
    • Stream processing queries: (1) instantaneous speed / angular momentum of the PDA; (2) distance to a traffic event pulled from Bing; (3) running aggregates, e.g. how long the vehicle has spent at the current location.
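The distance and instantaneous-speed queries listed for the stream processing engine can be illustrated with a small sketch. It assumes each PDA fix is a hypothetical `(timestamp_seconds, lat, lon)` tuple and uses the haversine great-circle formula with a mean Earth radius; the function names are illustrative and are not taken from the slides.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def instantaneous_speed_mps(fix_a, fix_b):
    """Speed in m/s between two consecutive fixes (timestamp_seconds, lat, lon)."""
    dt = fix_b[0] - fix_a[0]
    if dt <= 0:
        raise ValueError("fixes must be in strictly increasing time order")
    return haversine_m(fix_a[1], fix_a[2], fix_b[1], fix_b[2]) / dt
```

The same `haversine_m` helper would also cover the "distance to a traffic event" query, given the event's coordinates.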
  • 10. Analytics Cluster GIS capabilities • Client browser (e.g. fleet manager) • Processing layer: Spark, Scala, RStudio Server and RMR • Data storage provisioning layer: NoSQL database (MongoDB / HBase / Elasticsearch) • Hadoop cluster
  • 11. Data Storage - Persistence layer
    Each entry lists: index strategy; geometry; query types; ease of use/integration; scalability/speed; comments.
    • Elasticsearch: geohash; point; bbox, radius; good (3 stars); 10s of TBs; average writes, reads and search extremely fast
    • Neo4j: R-tree; point/line/polygon; bbox, radius; moderately good (2 stars); 10s of TBs; too granular
    • HBase: build your own index; -; -; moderately good (2 stars); petabytes; writes are fast, reads as well, needs specialization
    • Cassandra: build your own index; -; -; good (3 stars); petabytes; same as HBase
    • MongoDB / Couchbase: geohash; point/line/polygon; (1) geo-within, (2) near, (3) intersect; excellent (5 stars), GeoJSON / Leaflet / OSM; 10s of TBs, average throughput; best integration with GeoJSON in all cases
    Proposed solutions: short term: MongoDB; long term: Elasticsearch as the indexing engine and HBase/Cassandra as the storage technology on top of Hadoop.
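Several of the stores above rely on geohash indexing, and the "build your own index" options would need it implemented by hand. The following is a minimal sketch of standard geohash encoding (interleaved longitude/latitude bisection, base-32 alphabet); the function name and the default precision are illustrative choices.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a (lat, lon) pair in degrees as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

Because nearby points usually share a prefix, a prefix scan on the hash column of a key-value store can serve as a coarse bounding-box filter; points close to a cell boundary still need neighbouring cells checked.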
  • 12. Analytical Services on Telematics Cluster 1) Geocoding and reverse geocoding service on the cluster 2) Weather and traffic API (real time and history) to support the use cases related to weather and traffic analytics 3) Street maps (OpenStreetMap at the start, then better map providers in the longer run) • Required for the following analytics: regular trips, snap-to-road, mode of transport, identification of risky roads, impact of POI (e.g. school) on events, enables Location based
  • 13. Analytical Operations/Procedures Useful for Spatial Analysis (RStudio Server with R Packages) • Having an RStudio Server on the cluster would be useful. • GitHub repository (already established) • R packages for dealing with vector data (rgdal, rgeos, geojson_io, SpatialTransforms) • Point pattern analysis: dbscan, glm, gbm • Describing and analyzing fields, statistical analysis of fields / spatial interpolation: kriging, tps • Network analysis, snap-to-road, frequent routes, etc. (igraph, sna) • Visualization of the data: leaflet, shiny
  • 14. Geospatial processing layer on top of persistence • The geospatial processing layer integrates map geometry and map algebra to display the information on the map. On a small scale, this can be performed in JavaScript (Leaflet / D3). • The following operations are required: 1) Vector operations 2) Map algebra • On a larger scale, a software engineering layer for distributed geospatial processing is required, for example Scala, Spark and GeoTrellis. • http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/allixender/5676830073
  • 15. Analytical Challenges in Movement Data • Basic challenges in movement data • Matching (snap-to-road, street network matching) • Similarity measures • Trajectory clustering • Event detection (classification)
  • 16. Example Applications Solved through Machine Learning • For raw trajectories • Snap-to-road • For symbolic trajectories • Analysis of user activities • Traffic event detection
  • 17. Snap-to-road • Given a trajectory T and a street network G • Find the path in G that matches T to its real (ground-truth) path
  • 18. Snap-to-road: Analytical Modeling • Multiregression view: the task is to estimate a noise-free function f from T that preserves the structural information • Preserving structural correlations in the output: try a kernelized embedding with a kernel for raw trajectories
  • 19. Snap-to-road • An important problem in organizations like HERE, IBM and Microsoft • Error between 10 and 100 meters (Wi-Fi, vehicle navigation, mobile devices) • Degraded sampling rate and sparse GPS data • Difficult at roundabouts and in tunnels
  • 20. Solution • Basic steps: • Embed the trajectory by kernel methods but ignore map constraints • Benefits: • Noise reduction • Captures multi-output, non-linear dependencies • ‘Round’ the resulting ‘relaxed assignment’ to the street map
  • 22. Snap-to-road: Does it Work? • Performance over challenging real tasks
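As a point of comparison for the "rounding to the street map" step, the simplest geometric baseline projects each point onto the nearest street segment. The sketch below is only that baseline, not the kernelized method described in the slides. It assumes planar coordinates in meters (for example after a map projection) and a hypothetical `segments` dictionary mapping a segment id to its two endpoints.

```python
def project_to_segment(p, a, b):
    """Return the point on segment a-b closest to p (all points are (x, y) tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: both endpoints coincide
        return (ax, ay)
    # Parameter of the orthogonal projection, clamped to stay on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def snap_point(p, segments):
    """Snap p to the nearest segment; returns (segment_id, snapped_point)."""
    best = None
    for seg_id, (a, b) in segments.items():
        q = project_to_segment(p, a, b)
        d2 = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
        if best is None or d2 < best[2]:
            best = (seg_id, q, d2)
    if best is None:
        raise ValueError("no segments to snap to")
    return best[0], best[1]
```

Snapping every point independently ignores path continuity, which is one reason this baseline struggles with the noisy and sparse data described above.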
  • 23. Grouping of Trajectories/Stops in Similar Routines • Basically requires similarity measures for trajectories • Unroll a trajectory by defining a mapping
  • 24. Similarity Measures for Trajectories: Symbolic Trajectories • Formed by discretization of the curve through the measurement process or algorithms • Snap-to-road • Stay points • Regional division
  • 25. Clustering of Stay Points to Find Home Zones • Grouping of trajectories/stops in similar routines
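Before stay points can be clustered into home zones they have to be extracted from the raw fixes. The slides do not say how this is done; the sketch below assumes a common distance-and-duration rule, with hypothetical thresholds `max_dist_m` and `min_duration_s`, and fixes given as `(timestamp_seconds, x_m, y_m)` in a planar projection.

```python
import math


def detect_stay_points(fixes, max_dist_m=200.0, min_duration_s=600.0):
    """Return stay points as (mean_x, mean_y, t_arrive, t_leave) tuples.

    fixes must be sorted by time. A stay is a run of consecutive fixes that all
    lie within max_dist_m of the first fix of the run and that spans at least
    min_duration_s.
    """
    stays = []
    i, n = 0, len(fixes)
    while i < n:
        j = i + 1
        while j < n and math.hypot(fixes[j][1] - fixes[i][1],
                                   fixes[j][2] - fixes[i][2]) <= max_dist_m:
            j += 1
        run = fixes[i:j]  # fixes within max_dist_m of the anchor fix i
        if run[-1][0] - run[0][0] >= min_duration_s:
            mean_x = sum(f[1] for f in run) / len(run)
            mean_y = sum(f[2] for f in run) / len(run)
            stays.append((mean_x, mean_y, run[0][0], run[-1][0]))
            i = j  # continue after the stay
        else:
            i += 1  # no stay anchored here; try the next fix
    return stays
```

The resulting stay-point centroids could then be passed to a density-based clustering step to group repeated visits into zones.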
  • 26. Applications for Symbolic Trajectories: Clustering and Event Detection • Trajectory clustering • User activity analysis • Traffic event detection • Classification of events from non-event data • Rerouting of traffic during baseball games • Detection of conferences in auditoriums
  • 27. Applications for Symbolic Trajectories • Exploit sequence analysis (in particular biological sequence analysis) 1. Discretize the raw trajectories with an appropriate alphabet 2. Use an alignment kernel with traffic symbol similarity in order to translate traffic invariances to the biological domain 3. Exploit sequence analysis to find discrete sequential patterns (Where Traffic Meets DNA, Best Poster Award, ACM GIS 2011, Ahmed Jawad)
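Once trajectories are discretized into symbols, sequence alignment gives a similarity score between them. The sketch below is a plain Needleman-Wunsch global alignment score with hypothetical match, mismatch and gap values. It illustrates the alignment idea only and is not the alignment kernel from the slides; in particular, a raw alignment score is not guaranteed to be a valid kernel.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two symbol sequences (strings or lists)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                           prev[j] + gap,      # gap in b
                           cur[j - 1] + gap))  # gap in a
        prev = cur
    return prev[-1]
```

For example, with a hypothetical alphabet H (home), W (work), S (shop), the daily routines "HWH" and "HWSH" differ by one inserted stop. A symbol-similarity table could replace the fixed match/mismatch values.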
  • 29. Trajectory Clustering: Analysis of User Activities • Analysis of user activities • Frequent routes in trajectories • Clustering at map-matched level • Frequent routines in trajectories • Clustering at stay-point level • Visualization of variability in routines (sequence logos)
  • 33. Application for Symbolic Trajectories: Traffic Event Detection • Using biological sequence methods to model event persistence • Analysis of Dodgers baseball games from highway sensor data • Detecting the presence of a baseball game • Visualization • Analysis of events at the Caltech auditorium entrance • Detecting conferences in the auditorium
  • 34. Traffic Event Detection • Normalization-based classifier • Figure: readings from a traffic sensor
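The slide names a normalization-based classifier without giving details. One plausible reading, shown here purely as an assumption, is to normalize each sensor reading against the usual level for its time slot and to flag large deviations. The names `fit_baseline`, `is_event` and the threshold of 3 standard deviations are illustrative.

```python
from statistics import mean, pstdev


def fit_baseline(history):
    """history maps a time slot (e.g. "18:00") to counts seen on ordinary days."""
    return {slot: (mean(counts), pstdev(counts)) for slot, counts in history.items()}


def is_event(baseline, slot, count, z_threshold=3.0):
    """Flag a reading whose normalized deviation from the slot baseline is large."""
    mu, sigma = baseline[slot]
    if sigma == 0:  # no variation in the history: any change counts as a deviation
        return count != mu
    return abs(count - mu) / sigma > z_threshold
```

A usage example: with a history of `[100, 110, 90, 105, 95]` vehicles for the 18:00 slot, a reading of 160 is flagged while 105 is not.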
  • 36. Summary and Conclusions • Structural information analysis is the connection between machine learning and GIS • Still, a lot of data engineering and task-specific tricks are needed, e.g. regularization and normalization
  • 37. Active Directions Being Pursued • In snap-to-road • Fisher kernels for sparse GPS data • Testing KMM with a real-world system • In clustering and event detection • User profiles and diaries • Label sequence graph kernels • In structural information • Can doing away with the latitude/longitude pairs and keeping only the structural information help with privacy issues?
  • 38. Q & A
  • 42. Appendix: persistence options • Neo4j Spatial: • Utilities for importing from ESRI Shapefile as well as OpenStreetMap files • Support for all the common geometry types • An R-tree index for fast searches on geometries • Support for topology operations during the search (contains, within, intersects, covers, disjoint, etc.) • The possibility to enable spatial operations on any graph of data, regardless of the way the spatial data is stored, as long as an adapter is provided to map from the graph to the geometries • Ability to split a single layer or dataset into multiple sub-layers or views with pre-configured filters
  • 43. Appendix: persistence options • HBase/Cassandra: build your own index • Perform geohashing yourself or use Elasticsearch as a hashing / search engine • Libraries are available to connect Elasticsearch with Cassandra / HBase • Besides, geohashing is easy to program • http://thenewstack.io/building-streaming-data-hub-elasticsearch-kafka-cassandra/
  • 44. Appendix: persistence options • MongoDB geospatial • Store your location data as GeoJSON objects with this coordinate-axis order: longitude, latitude. The coordinate reference system for GeoJSON uses the WGS84 datum.
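The longitude-first axis order is easy to get wrong, so a small helper can make it explicit. The sketch below builds a GeoJSON Point as a plain dictionary; the helper name `geojson_point` is hypothetical, and the range check only catches swapped coordinates when the latitude value falls outside [-90, 90].

```python
def geojson_point(lon, lat):
    """Build a GeoJSON Point; note the order is longitude first, then latitude."""
    if not (-180.0 <= lon <= 180.0):
        raise ValueError(f"longitude out of range: {lon}")
    if not (-90.0 <= lat <= 90.0):
        raise ValueError(f"latitude out of range: {lat}")
    return {"type": "Point", "coordinates": [lon, lat]}
```

A document to be stored could then look like `{"name": "depot", "loc": geojson_point(10.40744, 57.64911)}`, where `name` and `loc` are illustrative field names.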
  • 45. MongoDB: Querying Data
      db.<collection>.find( { <location field> : { $geoWithin : { $geometry : { type : "Polygon" , coordinates : [ <coordinates> ] } } } } )
      db.places.find( { loc : { $geoWithin : { $geometry : { type : "Polygon" , coordinates : [ [ [ 0 , 0 ] , [ 3 , 6 ] , [ 6 , 1 ] , [ 0 , 0 ] ] ] } } } } )
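To make the containment test behind a query like this concrete, the sketch below checks whether a point lies inside a closed polygon ring using ray casting, applied to the same triangle as in the example query. This is a planar approximation for illustration only: MongoDB evaluates `$geoWithin` with `$geometry` on a sphere, so results can differ for large polygons.

```python
def point_in_polygon(point, ring):
    """Ray-casting test; ring is a closed list of (x, y) with first == last vertex."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Consider only edges that straddle the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # edge is crossed by the ray going right from the point
                inside = not inside
    return inside


triangle = [(0, 0), (3, 6), (6, 1), (0, 0)]  # ring from the example query
```

Points exactly on an edge are not handled specially here; a production implementation would need an explicit boundary rule.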