GeoWave: Scaling Complex (Not Just Geo) Data

This presentation discusses the successes of GeoWave in the spatiotemporal domain and focuses on how those successes generalize to a diverse set of complex data structures. The intent is to draw parallels to the data challenges of the audience. Fast indexed access to massive datasets fundamentally involves highly optimized range scans within key-value stores. If your reaction is "that's easier said than done," then you have the prerequisite experience to attend this talk. The intent of the software is to make these use cases as seamless as possible for downstream consumers of the framework.

Briefly, a GeoWave "dimension" is simply a function that applies a sort order to real-world values. The constructs for defining these "dimensions," and many more details, will be discussed in this presentation. At the core of GeoWave is a capability to store, retrieve, and analyze multi-dimensional data structures within distributed key-value stores. Spatiotemporal data is a special case for which GeoWave provides tailored extensions.

The software is intended to be easily pluggable into any sorted key-value store, with current implementations available for Apache HBase, Apache Accumulo, Apache Cassandra, Apache Kudu, Redis, RocksDB, Google BigTable, and Amazon DynamoDB. Datastore support is provided as an extension that is discoverable at runtime. Consequently, access through any GeoWave programmatic API, command line, or service is not tied to any particular key-value store. Furthermore, there are optimized data transfer utilities across supported stores. This approach has proven to provide seamless transitions of scale, from embedded applications and external in-memory services all the way up to its primary applications within highly distributed ecosystems.

An open source framework that leverages the scalability of key-value stores for effective storage, retrieval, and analysis of massive geospatial datasets

At its core, GeoWave handles spatial and spatiotemporal indexing within distributed key-value stores with natural integrations for various popular frameworks

GeoWave bridges the gap between popular geospatial platforms and distributed processing frameworks

Use a Space Filling Curve (SFC) to impose a linear sort order on multi-dimensional data.
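As a rough illustration of the idea, the sketch below shows how a two-dimensional point could be mapped onto a Z-order (Morton) curve, one of the curves compared on the next slide. It is not GeoWave's implementation; the function names `normalize` and `z_order`, the bit depth, and the longitude/latitude bounds are all assumptions chosen for this example.

```python
def normalize(value, lo, hi, bits):
    """Map a real value in [lo, hi] to an integer cell index in [0, 2**bits - 1].

    Hypothetical helper: plays the role of a "dimension", i.e. a function
    that turns a real-world value into something sortable.
    """
    if not lo <= value <= hi:
        raise ValueError("value outside dimension bounds")
    cells = 1 << bits
    # Clamp so that value == hi falls into the last cell.
    return min(int((value - lo) / (hi - lo) * cells), cells - 1)


def z_order(x, y, bits):
    """Interleave the bits of x and y into a single Morton key.

    Bits of x land in even positions, bits of y in odd positions.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


# Usage: turn a longitude/latitude pair into one sortable row key.
lon_cell = normalize(-77.04, -180.0, 180.0, 16)
lat_cell = normalize(38.91, -90.0, 90.0, 16)
row_key = z_order(lon_cell, lat_cell, 16)
```

Points that are close in two dimensions often, though not always, receive nearby keys, which is what lets a sorted key-value store answer spatial queries with range scans.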

        Z-Order  Hilbert  H-order  Peano  AR2W2  BΩ
WL∞     ∞        6        4        8      5.40   5.00
WL2     ∞        6        4        8      6.04   5.00
WL1     ∞        9        8        10.66  12.00  9.00
WBA     ∞        2.40     3.00     2.00   3.05   2.22
ABA     2.86     1.41     1.69     1.42   1.47   1.40

WL∞, WL2, WL1: Worst Case Dilation
WBA: Worst Case Bounding Box Area Ratio
ABA: Average Total Bounding Box Area

Source: Haverkort, Walderveen. Locality and Bounding-Box Quality of Two-Dimensional Space-Filling Curves. 2008. arXiv:0806.4787v2

● What about data with extents, such as lines, polygons, or time ranges?
○ We need to represent multiple resolutions...
● What about unbounded dimensions?
○ We can define a periodicity to bound a single
SFC. We end up with an SFC per period (or
combination of periods).
● What about queries?
○ A bounding hyperrectangle maps to discontinuous ranges on the space filling curve, so a query must be decomposed into multiple range scans
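Two of the points above can be sketched in a few lines of code. The first function bins an unbounded value, such as a timestamp, into a period plus an offset within that period. The second shows why a rectangular query becomes several key ranges on a Z-order curve. This is a minimal sketch under stated assumptions, not GeoWave's algorithm: `period_bin` and `query_ranges` are hypothetical names, and the range decomposition simply enumerates every cell in the box, which is only practical for small grids (a production implementation would decompose the curve recursively instead).

```python
def period_bin(value, period):
    """Split an unbounded value into (bin id, offset within the bin).

    Each bin can then be indexed by its own bounded space filling curve.
    """
    return divmod(value, period)


def z_order(x, y, bits):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def query_ranges(x_min, x_max, y_min, y_max, bits):
    """Return the contiguous Z-order key ranges covering a cell rectangle.

    Brute force for illustration: compute the key of every cell in the
    rectangle, sort the keys, and merge runs of consecutive keys into
    inclusive (start, end) ranges.
    """
    keys = sorted(
        z_order(x, y, bits)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    )
    ranges = []
    for k in keys:
        if ranges and k == ranges[-1][1] + 1:
            ranges[-1][1] = k  # extend the current run
        else:
            ranges.append([k, k])  # start a new run
    return [tuple(r) for r in ranges]


# A 2x2 box aligned to the curve is one contiguous range...
aligned = query_ranges(0, 1, 0, 1, bits=2)    # [(0, 3)]
# ...but the same-sized box shifted by one cell splits into four.
shifted = query_ranges(1, 2, 1, 2, bits=2)    # [(3, 3), (6, 6), (9, 9), (12, 12)]
```

Each returned range corresponds to one range scan against the key-value store, so the number of ranges a query decomposes into is a direct driver of query cost.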

From Massive Scale in the Cloud to GeoWave Embedded in the Client
With a single interface, you can use both!
An example analysis tool requiring GeoWave multi-dimensional indexing for map, timeline, and graph search and visualization of massive datasets

richard.fecher@radiantsolutions.com
barry.bragg@radiantsolutions.com