Anthony Fox
Director of Data Science, Commonwealth Computer Research Inc
GeoMesa Founder and Technical Lead
anthony.fox@ccri.com
linkedin.com/in/anthony-fox-ccri
twitter.com/algoriffic
www.ccri.com
GeoMesa on Spark SQL
Extracting Location Intelligence from Data
Satellite AIS
ADS-B
Mobile Apps
Intro to Location Intelligence and GeoMesa
Spatial Data Types, Spatial SQL
Extending Spark Catalyst for Optimized Spatial SQL
Density of activity in San Francisco
Speed profile of San Francisco
Location Intelligence
Questions range from easy to hard:
• Show me all the coffee shops in this neighborhood.
• How many users commute through this intersection every day?
• Which users commute within 250 meters of a coffee shop?
• Should I place an ad for coffee on this user's mobile device?
• Where should I build my next coffee shop?
What is GeoMesa?
A suite of tools for persisting, querying, analyzing, and streaming spatio-temporal data at scale
Intro to Location Intelligence and GeoMesa
Spatial Data Types, Spatial SQL
Extending Spark Catalyst for Optimized Spatial SQL
Density of activity in San Francisco
Speed profile of San Francisco
Spatial Data Types
Points (PointUDT, MultiPointUDT): locations, events, instantaneous positions
Lines (LineUDT, MultiLineUDT): road networks, voyages, trips, trajectories
Polygons (PolygonUDT, MultiPolygonUDT): administrative regions, airspaces
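These UDTs wrap JTS geometry objects, so a geometry column carries ordinary JTS values. A minimal sketch of constructing the backing geometries, assuming the JTS Topology Suite (org.locationtech.jts in recent releases) is on the classpath:

import org.locationtech.jts.geom.{Coordinate, GeometryFactory}

// Construct the JTS values that PointUDT / LineUDT serialize.
val gf = new GeometryFactory()
val point = gf.createPoint(new Coordinate(-122.4012, 37.7839))
val line = gf.createLineString(Array(
  new Coordinate(-122.41, 37.77),
  new Coordinate(-122.40, 37.78)))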
Spatial SQL
SELECT
  activity_id, user_id, geom, dtg
FROM
  activities
WHERE
  st_contains(st_makeBBOX(-78,37,-77,38), geom) AND
  dtg > cast('2017-06-01' as timestamp) AND
  dtg < cast('2017-06-05' as timestamp)

Here st_makeBBOX(-78,37,-77,38) is a geometry constructor, geom is a spatial column in the schema, and st_contains(...) is a topological predicate.
Sample Spatial UDFs: Geometry Constructors
st_geomFromWKT: create a point, line, or polygon from WKT, e.g. st_geomFromWKT('POINT(-122.40 37.78)')
st_makeLine: create a line from a sequence of points, e.g. st_makeLine(collect_list(geom))
st_makeBBOX: create a bounding box from (left, bottom, right, top), e.g. st_makeBBOX(-123,37,-121,39)
...
Sample Spatial UDFs: Topological Predicates
st_contains: returns true if the second argument is contained within the first argument, e.g. st_contains(st_geomFromWKT('POLYGON…'), geom)
st_within: returns true if the second argument geometry is entirely within the first argument, e.g. st_within(st_geomFromWKT('POLYGON…'), geom)
st_dwithin: returns true if the geometries are within a specified distance from each other, e.g. st_dwithin(geom1, geom2, 100)
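st_dwithin maps directly onto the "which users commute within 250 meters of a coffee shop" question from earlier. A hedged sketch: the commutes and coffee_shops views and their columns are illustrative, and the distance argument is in the units of the data's coordinate system (degrees for raw lon/lat).

// Hypothetical views: commutes(user_id, geom), coffee_shops(geom).
// The join predicate keeps only commute points near some coffee shop.
val nearCoffee = spark.sql("""
  SELECT DISTINCT c.user_id
  FROM commutes c JOIN coffee_shops s
    ON st_dwithin(c.geom, s.geom, 0.0025)
""")
nearCoffee.show()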
Sample Spatial UDFs: Processing
st_bufferPoint: create a buffer around a point for distance-within type queries, e.g. st_bufferPoint(geom, 10)
st_envelope: extract the envelope of a geometry, e.g. st_envelope(geom)
st_geohash: encode the geometry using a Z-order space-filling curve; useful for grid analysis, e.g. st_geohash(geom, 35)
st_closestpoint: find the point on the target geometry that is closest to the given geometry, e.g. st_closestpoint(geom1, geom2)
st_distanceSpheroid: find the great circle distance using the WGS84 ellipsoid, e.g. st_distanceSpheroid(geom1, geom2)
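As a quick sanity check of the last function: the spheroidal distance between two downtown San Francisco points should come out around two kilometers. A hedged sketch, assuming GeoMesa's SQL functions are registered on the session:

// st_distanceSpheroid returns meters on the WGS84 ellipsoid.
spark.sql("""
  SELECT st_distanceSpheroid(
           st_geomFromWKT('POINT(-122.4194 37.7749)'),
           st_geomFromWKT('POINT(-122.4012 37.7839)')) AS meters
""").show()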
Intro to Location Intelligence and GeoMesa
Spatial Data Types, Spatial SQL
Extending Spark Catalyst for Optimized Spatial SQL
Density of activity in San Francisco
Speed profile of San Francisco
Optimizing Spatial SQL
SELECT
  activity_id, user_id, geom, dtg
FROM
  activities
WHERE
  st_contains(st_makeBBOX(-78,37,-77,38), geom) AND
  dtg > cast('2017-06-01' as timestamp) AND
  dtg < cast('2017-06-05' as timestamp)

Goal: only load partitions that have records that intersect the query geometry.
Extending Spark’s Catalyst Optimizer
https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html
Catalyst exposes hooks for inserting optimization rules at various points in the query processing logic.
Extending Spark’s Catalyst Optimizer
/**
 * :: Experimental ::
 * A collection of methods that are considered experimental, but can be used to hook into
 * the query planner for advanced functionality.
 *
 * @group basic
 * @since 1.3.0
 */
@Experimental
@transient
@InterfaceStability.Unstable
def experimental: ExperimentalMethods = sparkSession.experimental
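Concretely, a custom rule is registered through ExperimentalMethods.extraOptimizations. A minimal sketch, assuming a rule object like the STContainsRule shown below:

// Append a custom optimizer rule to Catalyst via the experimental hook.
// STContainsRule is sketched on the following slides.
spark.experimental.extraOptimizations ++= Seq(STContainsRule)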
SQL optimizations for Spatial Predicates
SELECT
  activity_id, user_id, geom, dtg
FROM
  activities
WHERE
  st_contains(st_makeBBOX(-78,37,-77,38), geom) AND
  dtg > cast('2017-06-01' as timestamp) AND
  dtg < cast('2017-06-05' as timestamp)

In this query, activities is backed by a GeoMesa relation, the SELECT list is a relational projection, st_contains(...) is a topological predicate, st_makeBBOX(-78,37,-77,38) is a geometry literal, and the dtg bounds form a date range predicate.
SQL optimizations for Spatial Predicates
object STContainsRule extends Rule[LogicalPlan] with PredicateHelper {
  override def apply(plan: LogicalPlan): LogicalPlan = {
    plan.transform {
      case filt @ Filter(f, lr @ LogicalRelation(gmRel: GeoMesaRelation, _, _)) =>
        …
        val relation = gmRel.copy(filt = ff.and(gtFilters :+ gmRel.filt))
        lr.copy(expectedOutputAttributes = Some(lr.output),
                relation = relation)
    }
  }
}

The rule intercepts a Filter on a GeoMesa logical relation, extracts the predicates that can be handled by GeoMesa, creates a new GeoMesa relation with those predicates pushed down into the scan, and returns a modified tree with the new relation and the filter removed. GeoMesa will compute the minimal ranges necessary to cover the query region.
SQL optimizations for Spatial Predicates
The rewrite proceeds in two steps over the logical plan:

Relational Projection → Filter → GeoMesa Relation

becomes

Relational Projection → GeoMesa Relation <topo predicate>

becomes

GeoMesa Relation <topo predicate> <relational projection>
SQL optimizations for Spatial Predicates
SELECT
  activity_id, user_id, geom, dtg
FROM
  activities
WHERE
  st_contains(st_makeBBOX(-78,37,-77,38), geom) AND
  dtg > cast('2017-06-01' as timestamp) AND
  dtg < cast('2017-06-05' as timestamp)

effectively becomes

SELECT
  *
FROM
  activities<pushdown filter and projection>

Reduced I/O, reduced network overhead, reduced compute load: faster Location Intelligence answers.
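One way to watch the rewrite happen is to ask Spark for the plan. A hedged sketch; the exact plan text varies by Spark and GeoMesa version:

// After the rule fires, the optimized logical plan should show the predicate
// folded into the GeoMesa relation rather than a separate Filter node.
spark.sql("""
  SELECT activity_id, user_id, geom, dtg
  FROM activities
  WHERE st_contains(st_makeBBOX(-78,37,-77,38), geom)
""").explain(true)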
Intro to Location Intelligence and GeoMesa
Spatial Data Types, Spatial SQL
Extending Spark Catalyst for Optimized Spatial SQL
Density of activity in San Francisco
Speed profile of San Francisco
Density of Activity in San Francisco
1. Constrain to San Francisco
2. Snap each location to a 35-bit geohash
3. Group by geohash and count records per geohash

SELECT
  geohash,
  count(geohash) as count
FROM (
  SELECT st_geohash(geom, 35) as geohash
  FROM sf
  WHERE
    st_contains(st_makeBBOX(-122.4194-1, 37.77-1, -122.4194+1, 37.77+1),
                geom)
)
GROUP BY geohash

Visualize using Jupyter and Bokeh:

p = figure(title="STRAVA",
           plot_width=900, plot_height=600,
           x_range=x_range, y_range=y_range)
p.add_tile(tonerlines)
p.circle(x=projecteddf['px'],
         y=projecteddf['py'],
         fill_alpha=0.5,
         size=6,
         fill_color=colors,
         line_color=colors)
show(p)
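A hedged sketch of the Spark side of this pipeline, assuming the sf view was registered from a GeoMesa DataFrame as shown under Provisioning Spatial RDDs below:

// Run the density query; each row is a 35-bit geohash cell and its count.
val density = spark.sql("""
  SELECT geohash, count(geohash) AS count
  FROM (
    SELECT st_geohash(geom, 35) AS geohash
    FROM sf
    WHERE st_contains(st_makeBBOX(-123.4194, 36.77, -121.4194, 38.77), geom)
  )
  GROUP BY geohash
""")
density.createOrReplaceTempView("densities")

// Decode each cell back to a plottable center point for the Bokeh layer.
val cells = spark.sql(
  "SELECT st_centroid(st_geomFromGeoHash(geohash, 35)) AS center, count FROM densities")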
Speed Profile of a Metro Area
Inputs: STRAVA Activities
An activity is sampled once per second
Each observation has a location and time
{
"type": "Feature",
"geometry": { "type": "Point", "coordinates": [-122.40736,37.807147] },
"properties": {
"activity_id": "**********************",
"athlete_id": "**********************",
"device_type": 5,
"activity_type": "Ride",
"frame_type": 2,
"commute": false,
"date": "2016-11-02T23:58:03",
"index": 0
},
"id": "6a9bb90497be6f64eae009e6c760389017bc31db:0"
}
Speed Profile of a Metro Area
1. Select all activities within the metro area
2. Sort each activity by dtg ascending
3. Window over each pair of consecutive samples
4. Create a temporary table

spark.sql("""
  SELECT
    activity_id,
    index,
    geom as s,
    lead(geom) OVER (PARTITION BY activity_id ORDER BY dtg asc) as e,
    dtg as start,
    lead(dtg) OVER (PARTITION BY activity_id ORDER BY dtg asc) as end
  FROM activities
  WHERE
    activity_type = 'Ride' AND
    st_contains(
      st_makeBBOX(-122.4194-1, 37.77-1, -122.4194+1, 37.77+1),
      geom)
  ORDER BY dtg ASC
""").createOrReplaceTempView("segments")
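lead() is what pairs each sample with the next one in the same activity. A hedged miniature to make the windowing concrete, using toy data rather than the STRAVA schema:

// Toy illustration of the window above: lead() pulls the next row's value,
// so each (current, next) pair spans one segment of the trajectory.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lead
import spark.implicits._

val w = Window.partitionBy("activity_id").orderBy("dtg")
val toy = Seq(("a", 1, 0.0), ("a", 2, 10.0), ("a", 3, 25.0))
  .toDF("activity_id", "dtg", "x")
// next_x holds the following sample's x; the last row per activity gets null.
toy.withColumn("next_x", lead("x", 1).over(w)).show()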
Speed Profile of a Metro Area
5. Compute the distance between consecutive points
6. Compute the time difference between consecutive points
7. Compute the speed
8. Snap the location to a grid based on a GeoHash
9. Create a temporary table

spark.sql("""
  SELECT
    st_geohash(s, 35) as gh,
    st_distanceSpheroid(s, e) /
      cast(cast(end as long) - cast(start as long) as double)
      as meters_per_second
  FROM segments
""").createOrReplaceTempView("gridspeeds")
Speed Profile of a Metro Area
10. Group the grid cells
11. For each grid cell, compute the median and standard deviation of the speed
12. Extract the location of the grid cell

SELECT
  st_centroid(st_geomFromGeoHash(gh, 35)) as p,
  percentile_approx(meters_per_second, 0.5) as med_meters_per_second,
  stddev(meters_per_second) as std_dev
FROM gridspeeds
GROUP BY gh
Visualize using Jupyter and Bokeh:

p = figure(title="STRAVA",
           plot_width=900, plot_height=600,
           x_range=x_range, y_range=y_range)
p.add_tile(tonerlines)
p.circle(x=projecteddf['px'],
         y=projecteddf['py'],
         fill_alpha=0.5,
         size=6,
         fill_color=colors,
         line_color=colors)
show(p)
Thank You.
geomesa.org
github.com/locationtech/geomesa
twitter.com/algoriffic
linkedin.com/in/anthony-fox-ccri
anthony.fox@ccri.com
www.ccri.com
Indexing Spatio-Temporal Data in Bigtable
• Bigtable clones have a single-dimension lexicographically sorted index
• What if we concatenated latitude and longitude?
  Moscone Center coordinates: 37.7839° N, 122.4012° W
  Row key: 37.7839,-122.4012
• Fukushima sorts lexicographically near Moscone Center because they have the same latitude:
  37.7839,-122.4012
  37.7839,140.4676
Space-filling Curves
2-D Z-order Curve and 2-D Hilbert Curve
Space-filling curve example
Moscone Center coordinates: 37.7839° N, 122.4012° W
Encode the coordinates to a 32-bit Z value:
1. Scale latitude and longitude to use the 16 available bits each
   scaled_x = (-122.4012 + 180)/360 * 2^16 = 10485
   scaled_y = (37.7839 + 90)/180 * 2^16 = 46524
2. Take the binary representation of the scaled coordinates
   bin_x = 0010100011110101
   bin_y = 1011010110111100
3. Interleave the bits of x and y and convert back to an integer
   bin_z = 01001101100100011110111101110010
   z = 1301409650
Z is a distance-preserving hash: space-filling curves linearize a multi-dimensional space.
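A runnable sketch of exactly this encoding. This is a simplified 32-bit toy, not GeoMesa's production Z2 index, which uses more resolution bits:

// Toy 32-bit Z encoding: scale each coordinate to 16 bits, then interleave.
object ZEncode {
  // Map v in [min, max] onto a 16-bit integer.
  def scale(v: Double, min: Double, max: Double): Long =
    ((v - min) / (max - min) * (1 << 16)).toLong

  // Dilate the low 16 bits so a zero sits between each bit.
  def spread(i: Long): Long = {
    var x = i & 0xFFFFL
    x = (x | (x << 8)) & 0x00FF00FFL
    x = (x | (x << 4)) & 0x0F0F0F0FL
    x = (x | (x << 2)) & 0x33333333L
    x = (x | (x << 1)) & 0x55555555L
    x
  }

  // x bits take the higher position in each pair, matching the slide.
  def z(lon: Double, lat: Double): Long =
    (spread(scale(lon, -180, 180)) << 1) | spread(scale(lat, -90, 90))
}

// ZEncode.z(-122.4012, 37.7839) == 1301409650L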
Bigtable Index: [0, 2^32]
The Z value 1301409650 falls between 0 and 4294967296 on the linearized index.
Regions translate to range scans:
scan 'geomesa', { STARTROW => 1301409650, ENDROW => 1301409657 }
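Because Z is monotone in each coordinate separately, z(lower-left) and z(upper-right) bound every point in a query box, so a small box can be covered by one scan (with false positives to filter out); as noted earlier, GeoMesa decomposes a region into the minimal set of tighter ranges. A hedged sketch using the toy encoder above:

// One covering range for a tiny box near Moscone Center. A real index
// (GeoMesa's Z2) would emit multiple tighter ranges and filter residuals.
val startRow = ZEncode.z(-122.402, 37.783) // lower-left corner
val endRow   = ZEncode.z(-122.401, 37.784) // upper-right corner
// scan 'geomesa', { STARTROW => startRow, ENDROW => endRow }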
Provisioning Spatial RDDs
params = {
"instanceId": "geomesa",
"zookeepers": "X.X.X.X",
"user": "user",
"password": "******",
"tableName": "geomesa.strava"
}
spark
.read
.format("geomesa")
.options(**params)
.option("geomesa.feature", "activities")
.load()
Accumulo
Provisioning Spatial RDDs
params = {
"bigtable.table.name": "geomesa.strava"
}
spark
.read
.format("geomesa")
.options(**params)
.option("geomesa.feature", "activities")
.load()
HBase and Bigtable
Provisioning Spatial RDDs
params = {
"geomesa.converter": "strava",
"geomesa.input": "s3://path/to/data/*.json.gz"
}
spark
.read
.format("geomesa")
.options(**params)
.option("geomesa.feature", "activities")
.load()
Flat files
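For completeness, a hedged Scala equivalent of the Python snippets above (same GeoMesa data source, illustrative connection values), wiring the loaded DataFrame into the spatial SQL used throughout:

// Load a GeoMesa-backed DataFrame and expose it to Spark SQL.
val activities = spark.read
  .format("geomesa")
  .option("instanceId", "geomesa")
  .option("zookeepers", "X.X.X.X")
  .option("user", "user")
  .option("password", "******")
  .option("tableName", "geomesa.strava")
  .option("geomesa.feature", "activities")
  .load()

activities.createOrReplaceTempView("activities") // enables the spatial SQL above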
Speed Profile of a Metro Area: The Dream
Inputs
Approach
● Select all activities within metro area
● Sort each activity by dtg ascending
● Window over each set of consecutive samples
● Compute summary statistics of speed
● Group by grid cell
● Visualize

Weitere ähnliche Inhalte

Was ist angesagt?

Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDatabricks
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOAltinity Ltd
 
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIData Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIAltinity Ltd
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query LanguageJulian Hyde
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performanceDataWorks Summit
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for ExperimentationGleb Kanterov
 
Data profiling in Apache Calcite
Data profiling in Apache CalciteData profiling in Apache Calcite
Data profiling in Apache CalciteDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 
Deep dive into stateful stream processing in structured streaming by Tathaga...
Deep dive into stateful stream processing in structured streaming  by Tathaga...Deep dive into stateful stream processing in structured streaming  by Tathaga...
Deep dive into stateful stream processing in structured streaming by Tathaga...Databricks
 
Geospatial Options in Apache Spark
Geospatial Options in Apache SparkGeospatial Options in Apache Spark
Geospatial Options in Apache SparkDatabricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeDatabricks
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLJim Mlodgenski
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Stamatis Zampetakis
 

Was ist angesagt? (20)

Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
 
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIData Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performance
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Data profiling in Apache Calcite
Data profiling in Apache CalciteData profiling in Apache Calcite
Data profiling in Apache Calcite
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
Deep dive into stateful stream processing in structured streaming by Tathaga...
Deep dive into stateful stream processing in structured streaming  by Tathaga...Deep dive into stateful stream processing in structured streaming  by Tathaga...
Deep dive into stateful stream processing in structured streaming by Tathaga...
 
Geospatial Options in Apache Spark
Geospatial Options in Apache SparkGeospatial Options in Apache Spark
Geospatial Options in Apache Spark
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 

Ähnlich wie Location Intelligence with GeoMesa

Introduction To PostGIS
Introduction To PostGISIntroduction To PostGIS
Introduction To PostGISmleslie
 
Intro To PostGIS
Intro To PostGISIntro To PostGIS
Intro To PostGISmleslie
 
Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Jonathan Katz
 
IBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloud
IBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloudIBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloud
IBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloudTorsten Steinbach
 
OrientDB - The 2nd generation of (multi-model) NoSQL
OrientDB - The 2nd generation of  (multi-model) NoSQLOrientDB - The 2nd generation of  (multi-model) NoSQL
OrientDB - The 2nd generation of (multi-model) NoSQLRoberto Franchini
 
2017 RM-URISA Track: Spatial SQL - The Best Kept Secret in the Geospatial World
2017 RM-URISA Track:  Spatial SQL - The Best Kept Secret in the Geospatial World2017 RM-URISA Track:  Spatial SQL - The Best Kept Secret in the Geospatial World
2017 RM-URISA Track: Spatial SQL - The Best Kept Secret in the Geospatial WorldGIS in the Rockies
 
Sql Saturday Spatial Data Ss2008 Michael Stark Copy
Sql Saturday Spatial Data Ss2008 Michael Stark   CopySql Saturday Spatial Data Ss2008 Michael Stark   Copy
Sql Saturday Spatial Data Ss2008 Michael Stark Copymws580
 
2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locatorAlberto Paro
 
2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locatorAlberto Paro
 
Going Native: Leveraging the New JSON Native Datatype in Oracle 21c
Going Native: Leveraging the New JSON Native Datatype in Oracle 21cGoing Native: Leveraging the New JSON Native Datatype in Oracle 21c
Going Native: Leveraging the New JSON Native Datatype in Oracle 21cJim Czuprynski
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solrTrey Grainger
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrlucenerevolution
 
Scaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at GrabScaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at GrabRoman
 
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandMobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandFrançois Garillot
 
Spark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed KafsiSpark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed KafsiSpark Summit
 
Interactively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalyticsInteractively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalyticsJohann de Boer
 
Relevance trilogy may dream be with you! (dec17)
Relevance trilogy  may dream be with you! (dec17)Relevance trilogy  may dream be with you! (dec17)
Relevance trilogy may dream be with you! (dec17)Woonsan Ko
 
What's New in ArcGIS 10.1 Data Interoperability Extension
What's New in ArcGIS 10.1 Data Interoperability ExtensionWhat's New in ArcGIS 10.1 Data Interoperability Extension
What's New in ArcGIS 10.1 Data Interoperability ExtensionSafe Software
 

Ähnlich wie Location Intelligence with GeoMesa (20)

GeoMesa on Spark SQL: Extracting Location Intelligence from Data
GeoMesa on Spark SQL: Extracting Location Intelligence from DataGeoMesa on Spark SQL: Extracting Location Intelligence from Data
GeoMesa on Spark SQL: Extracting Location Intelligence from Data
 
Introduction To PostGIS
Introduction To PostGISIntroduction To PostGIS
Introduction To PostGIS
 
Intro To PostGIS
Intro To PostGISIntro To PostGIS
Intro To PostGIS
 
Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!
 
IBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloud
IBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloudIBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloud
IBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloud
 
OrientDB - The 2nd generation of (multi-model) NoSQL
OrientDB - The 2nd generation of  (multi-model) NoSQLOrientDB - The 2nd generation of  (multi-model) NoSQL
OrientDB - The 2nd generation of (multi-model) NoSQL
 
2017 RM-URISA Track: Spatial SQL - The Best Kept Secret in the Geospatial World
2017 RM-URISA Track:  Spatial SQL - The Best Kept Secret in the Geospatial World2017 RM-URISA Track:  Spatial SQL - The Best Kept Secret in the Geospatial World
2017 RM-URISA Track: Spatial SQL - The Best Kept Secret in the Geospatial World
 
Sql Saturday Spatial Data Ss2008 Michael Stark Copy
Sql Saturday Spatial Data Ss2008 Michael Stark   CopySql Saturday Spatial Data Ss2008 Michael Stark   Copy
Sql Saturday Spatial Data Ss2008 Michael Stark Copy
 
2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator
 
2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator
 
Going Native: Leveraging the New JSON Native Datatype in Oracle 21c
Going Native: Leveraging the New JSON Native Datatype in Oracle 21cGoing Native: Leveraging the New JSON Native Datatype in Oracle 21c
Going Native: Leveraging the New JSON Native Datatype in Oracle 21c
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solr
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solr
 
huhu
huhuhuhu
huhu
 
Scaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at GrabScaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at Grab
 
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandMobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
 
Spark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed KafsiSpark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed Kafsi
 
Interactively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalyticsInteractively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalytics
 
Relevance trilogy may dream be with you! (dec17)
Relevance trilogy  may dream be with you! (dec17)Relevance trilogy  may dream be with you! (dec17)
Relevance trilogy may dream be with you! (dec17)
 
What's New in ArcGIS 10.1 Data Interoperability Extension
What's New in ArcGIS 10.1 Data Interoperability ExtensionWhat's New in ArcGIS 10.1 Data Interoperability Extension
What's New in ArcGIS 10.1 Data Interoperability Extension
 

Mehr von Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Mehr von Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Kürzlich hochgeladen

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
Location Intelligence with GeoMesa

  • 1. Anthony Fox, Director of Data Science, Commonwealth Computer Research Inc; GeoMesa Founder and Technical Lead. anthony.fox@ccri.com | linkedin.com/in/anthony-fox-ccri | twitter.com/algoriffic | www.ccri.com. GeoMesa on Spark SQL: Extracting Location Intelligence from Data
  • 6. Agenda: Intro to Location Intelligence and GeoMesa; Spatial Data Types, Spatial SQL; Extending Spark Catalyst for Optimized Spatial SQL; Density of activity in San Francisco; Speed profile of San Francisco
  • 11. Location Intelligence, from easy to hard:
    Show me all the coffee shops in this neighborhood.
    How many users commute through this intersection every day?
    Which users commute within 250 meters of a coffee shop?
    Should I place an ad for coffee on this user's mobile device?
    Where should I build my next coffee shop?
  • 21. What is GeoMesa? A suite of tools for persisting, querying, analyzing, and streaming spatio-temporal data at scale
  • 26. Agenda: Intro to Location Intelligence and GeoMesa; Spatial Data Types, Spatial SQL; Extending Spark Catalyst for Optimized Spatial SQL; Density of activity in San Francisco; Speed profile of San Francisco
  • 29. Spatial Data Types:
    Points (locations, events, instantaneous positions): PointUDT, MultiPointUDT
    Lines (road networks, voyages, trips, trajectories): LineUDT, MultiLineUDT
    Polygons (administrative regions, airspaces): PolygonUDT, MultiPolygonUDT
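The UDTs above come from GeoMesa's Spark integration. A minimal sketch of getting a geometry-typed column into a DataFrame, assuming the geomesa-spark-jts module is on the classpath (withJTS is that module's documented entry point for registering the UDTs and st_* UDFs; treat the details as illustrative):

    import org.apache.spark.sql.SparkSession
    import org.locationtech.geomesa.spark.jts._ // registers geometry UDTs and st_* UDFs

    val spark = SparkSession.builder().appName("spatial-types").getOrCreate().withJTS
    import spark.implicits._

    // Parse a WKT string into a geometry column; the column is typed as PointUDT
    val df = Seq("POINT(-122.4012 37.7839)").toDF("wkt")
      .selectExpr("st_geomFromWKT(wkt) AS geom")
    df.printSchema() // geom: point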
  • 32. Spatial SQL. Note the geometry constructor (st_makeBBOX), the spatial column in the schema (geom), and the topological predicate (st_contains):
    SELECT activity_id, user_id, geom, dtg
    FROM activities
    WHERE st_contains(st_makeBBOX(-78,37,-77,38), geom)
      AND dtg > cast('2017-06-01' as timestamp)
      AND dtg < cast('2017-06-05' as timestamp)
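A minimal sketch of executing this query from Scala, assuming a GeoMesa-backed DataFrame has already been loaded and registered as the activities view (see the provisioning slides near the end of the deck):

    val recent = spark.sql("""
      SELECT activity_id, user_id, geom, dtg
      FROM activities
      WHERE st_contains(st_makeBBOX(-78,37,-77,38), geom)
        AND dtg > cast('2017-06-01' AS timestamp)
        AND dtg < cast('2017-06-05' AS timestamp)
    """)
    recent.show(5)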
  • 36. Sample Spatial UDFs: Geometry Constructors.
    st_geomFromWKT: create a point, line, or polygon from WKT, e.g. st_geomFromWKT('POINT(-122.40 37.78)')
    st_makeLine: create a line from a sequence of points, e.g. st_makeLine(collect_list(geom))
    st_makeBBOX: create a bounding box from (left, bottom, right, top), e.g. st_makeBBOX(-123,37,-121,39)
    ...
  • 37. Sample Spatial UDFs: Topological Predicates.
    st_contains: returns true if the second argument geometry is contained within the first, e.g. st_contains(st_geomFromWKT('POLYGON…'), geom)
    st_within: returns true if the first argument geometry is entirely within the second, e.g. st_within(geom, st_geomFromWKT('POLYGON…'))
    st_dwithin: returns true if the geometries are within a specified distance of each other, e.g. st_dwithin(geom1, geom2, 100)
  • 38. Sample Spatial UDFs: Processing.
    st_bufferPoint: create a buffer around a point for distance-within style queries, e.g. st_bufferPoint(geom, 10)
    st_envelope: extract the envelope of a geometry, e.g. st_envelope(geom)
    st_geohash: encode the geometry using a Z-order space-filling curve; useful for grid analysis, e.g. st_geohash(geom, 35)
    st_closestpoint: find the point on the target geometry that is closest to the given geometry, e.g. st_closestpoint(geom1, geom2)
    st_distanceSpheroid: find the great-circle distance using the WGS84 ellipsoid, e.g. st_distanceSpheroid(geom1, geom2)
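To make the UDF semantics concrete, here is a small self-contained sample, assuming the st_* functions are registered on the session as above (the column aliases are illustrative):

    val sample = spark.sql("""
      SELECT
        st_geohash(st_geomFromWKT('POINT(-122.40 37.78)'), 35) AS gh,
        st_distanceSpheroid(
          st_geomFromWKT('POINT(-122.40 37.78)'),
          st_geomFromWKT('POINT(-122.41 37.79)')) AS meters,
        st_contains(st_makeBBOX(-123,37,-121,39),
          st_geomFromWKT('POINT(-122.40 37.78)')) AS in_bay_area
    """)
    sample.show(false)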
  • 39. Agenda: Intro to Location Intelligence and GeoMesa; Spatial Data Types, Spatial SQL; Extending Spark Catalyst for Optimized Spatial SQL; Density of activity in San Francisco; Speed profile of San Francisco
  • 40. Optimizing Spatial SQL. The goal: only load partitions that have records intersecting the query geometry.
    SELECT activity_id, user_id, geom, dtg
    FROM activities
    WHERE st_contains(st_makeBBOX(-78,37,-77,38), geom)
      AND dtg > cast('2017-06-01' as timestamp)
      AND dtg < cast('2017-06-05' as timestamp)
  • 42. Extending Spark's Catalyst Optimizer. Catalyst exposes hooks to insert optimization rules at various points in the query processing logic. https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html
  • 44. Extending Spark's Catalyst Optimizer.
    /**
     * :: Experimental ::
     * A collection of methods that are considered experimental, but can be used to hook into
     * the query planner for advanced functionality.
     *
     * @group basic
     * @since 1.3.0
     */
    @Experimental
    @transient
    @InterfaceStability.Unstable
    def experimental: ExperimentalMethods = sparkSession.experimental
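The hook itself is a one-liner: experimental.extraOptimizations takes a sequence of Rule[LogicalPlan]. A sketch, with a deliberately trivial rule standing in for GeoMesa's relation-rewriting rules:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.catalyst.expressions.Literal
    import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
    import org.apache.spark.sql.catalyst.rules.Rule

    // Illustrative rule: drop filters that are literally `WHERE true`
    object RemoveTrivialFilter extends Rule[LogicalPlan] {
      override def apply(plan: LogicalPlan): LogicalPlan = plan.transform {
        case Filter(Literal(true, _), child) => child
      }
    }

    val spark = SparkSession.builder().appName("catalyst-hook").getOrCreate()
    // Register the custom rule with the optimizer via the experimental hook
    spark.experimental.extraOptimizations ++= Seq(RemoveTrivialFilter)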
  • 45. SQL optimizations for Spatial Predicates. The optimizer rules recognize each part of the plan: the GeoMesa relation (activities), the relational projection (activity_id, user_id, geom, dtg), the topological predicate (st_contains), the geometry literal (st_makeBBOX(-78,37,-77,38)), and the date range predicate on dtg:
    SELECT activity_id, user_id, geom, dtg
    FROM activities
    WHERE st_contains(st_makeBBOX(-78,37,-77,38), geom)
      AND dtg > cast('2017-06-01' as timestamp)
      AND dtg < cast('2017-06-05' as timestamp)
  • 51. SQL optimizations for Spatial Predicates. Intercept a Filter on a GeoMesa logical relation; extract the predicates that can be handled by GeoMesa, create a new GeoMesa relation with the predicates pushed down into the scan, and return a modified tree with the new relation and the filter removed. GeoMesa will compute the minimal ranges necessary to cover the query region.
    object STContainsRule extends Rule[LogicalPlan] with PredicateHelper {
      override def apply(plan: LogicalPlan): LogicalPlan = {
        plan.transform {
          case filt @ Filter(f, lr @ LogicalRelation(gmRel: GeoMesaRelation, _, _)) =>
            …
            val relation = gmRel.copy(filt = ff.and(gtFilters :+ gmRel.filt))
            lr.copy(expectedOutputAttributes = Some(lr.output), relation = relation)
        }
      }
    }
  • 54. SQL optimizations for Spatial Predicates. The plan is rewritten in two steps:
    Relational Projection(Filter(GeoMesa Relation))
    → Relational Projection(GeoMesa Relation<topo predicate>)
    → GeoMesa Relation<topo predicate, relational projection>
  • 57. SQL optimizations for Spatial Predicates. The original query:
    SELECT activity_id, user_id, geom, dtg
    FROM activities
    WHERE st_contains(st_makeBBOX(-78,37,-77,38), geom)
      AND dtg > cast('2017-06-01' as timestamp)
      AND dtg < cast('2017-06-05' as timestamp)
    becomes, effectively:
    SELECT * FROM activities<pushdown filter and projection>
    Reduced I/O, reduced network overhead, reduced compute load: faster Location Intelligence answers.
  • 60. Agenda: Intro to Location Intelligence and GeoMesa; Spatial Data Types, Spatial SQL; Extending Spark Catalyst for Optimized Spatial SQL; Density of activity in San Francisco; Speed profile of San Francisco
  • 61. Density of Activity in San Francisco. 1. Constrain to San Francisco. 2. Snap each location to a 35-bit geohash. 3. Group by geohash and count records per geohash.
    SELECT geohash, count(geohash) as count
    FROM (
      SELECT st_geohash(geom, 35) as geohash
      FROM sf
      WHERE st_contains(st_makeBBOX(-122.4194-1, 37.77-1, -122.4194+1, 37.77+1), geom)
    )
    GROUP BY geohash
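A sketch of running the density query from Scala and keeping the result around for visualization (sf is the San Francisco activities view assumed by the slides):

    val density = spark.sql("""
      SELECT geohash, count(geohash) AS count
      FROM (
        SELECT st_geohash(geom, 35) AS geohash
        FROM sf
        WHERE st_contains(st_makeBBOX(-122.4194-1, 37.77-1, -122.4194+1, 37.77+1), geom)
      )
      GROUP BY geohash
    """)
    density.createOrReplaceTempView("density")
    // Densest grid cells first
    density.orderBy(org.apache.spark.sql.functions.desc("count")).show(10)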
  • 66. Visualize using Jupyter and Bokeh. Density of Activity in San Francisco:
    # imports added for completeness; x_range, y_range, tonerlines, projecteddf,
    # and colors are prepared earlier in the notebook
    from bokeh.plotting import figure, show

    p = figure(title="STRAVA", plot_width=900, plot_height=600,
               x_range=x_range, y_range=y_range)
    p.add_tile(tonerlines)
    p.circle(x=projecteddf['px'], y=projecteddf['py'],
             fill_alpha=0.5, size=6,
             fill_color=colors, line_color=colors)
    show(p)
  • 67. Visualize using Jupyter and Bokeh Density of Activity in San Francisco
  • 68. Speed Profile of a Metro Area. Inputs: STRAVA activities. An activity is sampled once per second; each observation has a location and time.
    {
      "type": "Feature",
      "geometry": { "type": "Point", "coordinates": [-122.40736, 37.807147] },
      "properties": {
        "activity_id": "**********************",
        "athlete_id": "**********************",
        "device_type": 5,
        "activity_type": "Ride",
        "frame_type": 2,
        "commute": false,
        "date": "2016-11-02T23:58:03",
        "index": 0
      },
      "id": "6a9bb90497be6f64eae009e6c760389017bc31db:0"
    }
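One illustrative way to load newline-delimited GeoJSON features of this shape into a DataFrame; the path is a placeholder, and st_point(x, y) is assumed from the GeoMesa geometry constructors:

    // spark.read.json infers the nested Feature schema; st_point turns the
    // coordinate array into a geometry column
    val raw = spark.read.json("s3://path/to/data/*.json")
    raw.createOrReplaceTempView("raw")
    val activities = spark.sql("""
      SELECT
        properties.activity_id             AS activity_id,
        properties.index                   AS index,
        properties.activity_type           AS activity_type,
        cast(properties.date AS timestamp) AS dtg,
        st_point(geometry.coordinates[0], geometry.coordinates[1]) AS geom
      FROM raw
    """)
    activities.createOrReplaceTempView("activities")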
  • 69. Speed Profile of a Metro Area. 1. Select all activities within the metro area. 2. Sort each activity by dtg ascending. 3. Window over each pair of consecutive samples. 4. Create a temporary table.
    spark.sql("""
      SELECT activity_id, index,
             geom AS s,
             lead(geom) OVER (PARTITION BY activity_id ORDER BY dtg ASC) AS e,
             dtg AS start,
             lead(dtg) OVER (PARTITION BY activity_id ORDER BY dtg ASC) AS end
      FROM activities
      WHERE activity_type = 'Ride'
        AND st_contains(st_makeBBOX(-122.4194-1, 37.77-1, -122.4194+1, 37.77+1), geom)
      ORDER BY dtg ASC
    """).createOrReplaceTempView("segments")
  • 75. Speed Profile of a Metro Area. 5. Compute the distance between consecutive points. 6. Compute the time difference between consecutive points. 7. Compute the speed. 8. Snap the location to a grid based on a GeoHash. 9. Create a temporary table.
    spark.sql("""
      SELECT st_geohash(s, 35) AS gh,
             st_distanceSpheroid(s, e) /
               cast(cast(end AS long) - cast(start AS long) AS double) AS meters_per_second
      FROM segments
    """).createOrReplaceTempView("gridspeeds")
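The same segmentation can be expressed with the DataFrame window API instead of SQL; a sketch under the same assumptions (activities view loaded, GeoMesa UDFs registered and reachable via expr):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, expr, lead}

    // One window per activity, ordered by time
    val w = Window.partitionBy("activity_id").orderBy(col("dtg").asc)
    val segments = spark.table("activities")
      .where(col("activity_type") === "Ride")
      .withColumn("e",   lead(col("geom"), 1).over(w)) // next sample's position
      .withColumn("end", lead(col("dtg"), 1).over(w))  // next sample's time
    val speeds = segments.select(
      expr("st_geohash(geom, 35)").as("gh"),
      (expr("st_distanceSpheroid(geom, e)") /
        (col("end").cast("long") - col("dtg").cast("long")).cast("double"))
        .as("meters_per_second"))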
  • 81. Speed Profile of a Metro Area. 10. Group the grid cells. 11. For each grid cell, compute the median and standard deviation of the speed. 12. Extract the location of the grid cell. (percentile_approx(…, 0.5) is an approximate median, so med_meters_per_second is the accurate alias.)
    SELECT st_centroid(st_geomFromGeoHash(gh, 35)) AS p,
           percentile_approx(meters_per_second, 0.5) AS med_meters_per_second,
           stddev(meters_per_second) AS std_dev
    FROM gridspeeds
    GROUP BY gh
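Following the pattern of the earlier steps, a sketch of materializing the profile and pulling plain lon/lat out of the centroid for plotting (assuming GeoMesa's st_x/st_y accessor UDFs):

    val profile = spark.sql("""
      SELECT st_centroid(st_geomFromGeoHash(gh, 35)) AS p,
             percentile_approx(meters_per_second, 0.5) AS med_meters_per_second,
             stddev(meters_per_second) AS std_dev
      FROM gridspeeds
      GROUP BY gh
    """)
    // Plotting libraries generally want plain doubles rather than geometries
    profile.selectExpr("st_x(p) AS lon", "st_y(p) AS lat",
                       "med_meters_per_second", "std_dev").show(5)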
  • 86. Speed Profile of a Metro Area
  • 89. Indexing Spatio-Temporal Data in Bigtable. Bigtable clones have a single-dimension lexicographically sorted index. What if we concatenated latitude and longitude? Moscone Center coordinates 37.7839° N, 122.4012° W give row key 37.7839,-122.4012. The problem: Fukushima (row key 37.7839,140.4676) sorts lexicographically near Moscone Center because they have the same latitude.
  • 94. Space-filling curve example. Moscone Center coordinates 37.7839° N, 122.4012° W. Encode the coordinates to a 32-bit Z value, a distance-preserving hash:
    1. Scale longitude and latitude to use 16 available bits each:
       scaled_x = (-122.4012 + 180)/360 * 2^16 = 10485
       scaled_y = (37.7839 + 90)/180 * 2^16 = 46524
    2. Take the binary representation of the scaled coordinates:
       bin_x = 0010100011110101
       bin_y = 1011010110111100
    3. Interleave the bits of x and y and convert back to an integer:
       bin_z = 01001101100100011110111101110010
       z = 1301409650
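A compact worked implementation of the three steps (names are illustrative); it reproduces the numbers above:

    object ZExample extends App {
      // Step 1: scale a coordinate from [min, max) onto [0, 2^16)
      def scale(v: Double, min: Double, max: Double): Int =
        ((v - min) / (max - min) * (1 << 16)).toInt

      // Steps 2 and 3: interleave the bits of two 16-bit values, x bit first,
      // into a single 32-bit Z value
      def interleave(x: Int, y: Int): Long = {
        var z = 0L
        for (i <- 15 to 0 by -1) {
          z = (z << 1) | ((x >> i) & 1)
          z = (z << 1) | ((y >> i) & 1)
        }
        z
      }

      val x = scale(-122.4012, -180, 180) // 10485
      val y = scale(37.7839, -90, 90)     // 46524
      println(interleave(x, y))           // 1301409650
    }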
  • 99. Space-filling curves linearize a multi-dimensional space: the Moscone Center cell maps to Z = 1301409650, a single position on the one-dimensional Bigtable index [0, 2^32] (between 0 and 4294967296).
  • 100. Regions translate to range scans on the Bigtable index [0, 2^32]:
    scan 'geomesa', {STARTROW => 1301409650, ENDROW => 1301409657}
  • 101. ADS-B
  • 103. Provisioning Spatial RDDs: Accumulo
    params = {
      "instanceId": "geomesa",
      "zookeepers": "X.X.X.X",
      "user": "user",
      "password": "******",
      "tableName": "geomesa.strava"
    }
    spark.read \
      .format("geomesa") \
      .options(**params) \
      .option("geomesa.feature", "activities") \
      .load()
  • 104. Provisioning Spatial RDDs: HBase and Bigtable
    params = {
      "bigtable.table.name": "geomesa.strava"
    }
    spark.read \
      .format("geomesa") \
      .options(**params) \
      .option("geomesa.feature", "activities") \
      .load()
  • 105. Provisioning Spatial RDDs: Flat files
    params = {
      "geomesa.converter": "strava",
      "geomesa.input": "s3://path/to/data/*.json.gz"
    }
    spark.read \
      .format("geomesa") \
      .options(**params) \
      .option("geomesa.feature", "activities") \
      .load()
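For Scala users, the equivalent read is a sketch like the following; the option keys mirror the Python examples above:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("geomesa-read").getOrCreate()
    val params = Map(
      "instanceId" -> "geomesa",
      "zookeepers" -> "X.X.X.X",
      "user"       -> "user",
      "password"   -> "******",
      "tableName"  -> "geomesa.strava"
    )
    val activities = spark.read
      .format("geomesa")
      .options(params)
      .option("geomesa.feature", "activities")
      .load()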
  • 106. Speed Profile of a Metro Area
  • 109. Speed Profile of a Metro Area: inputs and approach. ● Select all activities within the metro area ● Sort each activity by dtg ascending ● Window over each set of consecutive samples ● Compute summary statistics of speed ● Group by grid cell ● Visualize