SlideShare ist ein Scribd-Unternehmen logo
1 von 60
Enabling Efficient Cloud Workflows
Eugene Cheipesh
echeipesh@azavea.com
GitHub: @echeipesh
www.azavea.com
Cloud Native Workflows
“Hey, let's not download the whole file every time.”
Cloud Native
● Network is between storage and computation
● Storage is chunked and probably Key/Value
● Computation must scale
● Coordination is limited and risky
Cloud Optimized GeoTiff
● Defines an internal structure of GeoTiffs that is
optimized for streaming reading from blob
storage like S3 or Google Cloud Storage.
● That is, it’s best utilized for requests that use
HTTP GET Range Requests
https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF
http://www.cogeo.org/
“A GeoTIFF file is a TIFF 6.0 file, and
inherits the file structure as described
in the corresponding portion of the
TIFF spec. All GeoTIFF specific
information is encoded in several
additional reserved TIFF tags, and
contains no private Image File
Directories (IFD's), binary structures
or other private information invisible
to standard TIFF readers.”
http://geotiff.maptools.org/spec/geotiff2.4.html#2.4
GeoTiff
TIFF
● Tagged Image File Format
● Binary image is separated out into
“segments”
● Contains metadata in a sequence of “tags”
● Can contain metadata for multiple images in
different “image file directories” (IDFs)
GeoTiff Custom Tags
● GeoTiff adds custom tag GeoKeys
○ non-TIFF keys for CRS map space etc.
● This is how we know where the pixels are on
the surface of the earth (the geospatial bit).
http://geotiff.maptools.org/spec/geotiff6.html#6.1
GeoTiff Streaming Read
● Step 1: Fetch the header
● Step 2: Decide what segments you want
● Step 3: Read the segments
Request: BBOX Mean
GeoTIFF: All of the above
COG: IFDs First - Rule #1
TIFF Segments
● Chunks of an image are stored at different
locations. This allows for parts of the image to
be decompressed and loaded separately.
● There are two methods for segment layout:
Striped and Tiled
TIFF: Strip Segments
COG: Strip Segments - Bad
TIFF: Tile Segments
COG: Use Tiled Segments - Rule #2
Request: BBOX Mean at 60m
Request: BBOX Mean at 60m
GeoTIFF: Has Overviews
COG: Provide Overviews - Rule #3
COG Rules
● Put IFDs first
● Sort IFDs by resolution
● Use Tiled segments
● Include overviews (in file)
COG Rules Continued ?
● Provenance Metadata
● Include Summary Statistics
● File naming convention
● Catalog suggestion
COG “Profiles”: Slippy Map
● Restrict CRS as EPSG:3857
● Align GeoTiff segments with slippy tile grid
● Align GeoTiff files with tile boundaries
● Serve tile endpoints directly from COG
COG World Coverage?
“You’re going to need more than one GeoTIFF.”
s3://[bucket]/[Shard]_[GeoHash]_[ISO_TIME]_[ID].tiff
GeoTrellis & COGs
COG: Tile Request
COG Tiler
● Application specific catalog
● Reprojection on read
● Memache gives responsiveness
● Raster metadata pushed into files
GeoTrellis Layers
GeoTrellis: Avro Layers
GeoTrellis Avro Layers
● Many small files, one per tile
○ PUT requests get expensive
● Data duplication hazzard
○ Probably not authoritative
● Limited access
○ Requires GeoTrellis code to read
GeoTrellis: COG Layers
GeoTrellis Strucuted COG
● Base zoom tiles are in base GeoTiff layer
● Introduces “DATE_TIME” tag
● Includes VRT for layer files
● GeoTiff segments match tile size and layout
● Additional zoom levels stored in overviews
● Pyramid up when overviews “run out”
GeoTrellis: COG Pyramid
Unstructured COGs
Cloud Native thinking
GeoTrellis Unstructured COG
● Lots of good raster sources are already COG
● They don’t have a strict layout though
● Lets “inline” ingest process into the batch job
● Slower but reduces data duplication
COG: Tile Request (Again)
GeoTrellis Unstructured COG
● Requires mechanism to query for raster names
○ WFS, STAC, PostgreSQL, Scan
● COG provies “gradual” wins over GeoTiff
● Reproject on read to match workflow
Coregistration Workflow
● Working with multiple raster layers with differing:
○ CRS, Resolution, Extent
● Must pick the working/target layout
● We can “push-down” this process into IO
Coregisteration Workflow
COGs for GeoTrellis
● Treating GeoTiff as a Key/Value tile store
○ With built in notion level of detail
● Turns an ingest workflow into query workflow
○ Use the metadata to plan the read
● Opens up the results
○ QGIS, GeoServer, GDAL
COGs in General
● GeoJSON of raster formats
● Gradated spec
● Set of best practices
● Driven by implementations
THANKS
Q: Is this a good idea ?
Q: So is this a standard ?
Q: Why not use MRF ?

Weitere ähnliche Inhalte

Was ist angesagt?

Disco API - OpenJDK distributions as a service
Disco API - OpenJDK distributions as a serviceDisco API - OpenJDK distributions as a service
Disco API - OpenJDK distributions as a serviceGerrit Grunwald
 
Arquitetura de Memoria do PostgreSQL
Arquitetura de Memoria do PostgreSQLArquitetura de Memoria do PostgreSQL
Arquitetura de Memoria do PostgreSQLRaul Oliveira
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Getting Started with Git and GitHub
Getting Started with Git and GitHubGetting Started with Git and GitHub
Getting Started with Git and GitHubRabiraj Khadka
 
RedisConf18 - Redis on Google Cloud Platform
RedisConf18 - Redis on Google Cloud PlatformRedisConf18 - Redis on Google Cloud Platform
RedisConf18 - Redis on Google Cloud PlatformRedis Labs
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Eric Sun
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkDataWorks Summit
 
Introduction to Git and GitHub
Introduction to Git and GitHubIntroduction to Git and GitHub
Introduction to Git and GitHubVikram SV
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Icebergkbajda
 
The Fundamentals of Git
The Fundamentals of GitThe Fundamentals of Git
The Fundamentals of GitDivineOmega
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine kiran palaka
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 

Was ist angesagt? (20)

Disco API - OpenJDK distributions as a service
Disco API - OpenJDK distributions as a serviceDisco API - OpenJDK distributions as a service
Disco API - OpenJDK distributions as a service
 
Arquitetura de Memoria do PostgreSQL
Arquitetura de Memoria do PostgreSQLArquitetura de Memoria do PostgreSQL
Arquitetura de Memoria do PostgreSQL
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Airflow and supervisor
Airflow and supervisorAirflow and supervisor
Airflow and supervisor
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Getting Started with Git and GitHub
Getting Started with Git and GitHubGetting Started with Git and GitHub
Getting Started with Git and GitHub
 
RedisConf18 - Redis on Google Cloud Platform
RedisConf18 - Redis on Google Cloud PlatformRedisConf18 - Redis on Google Cloud Platform
RedisConf18 - Redis on Google Cloud Platform
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)
 
Apache kudu
Apache kuduApache kudu
Apache kudu
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
 
GitHubの使い方
GitHubの使い方 GitHubの使い方
GitHubの使い方
 
Github
GithubGithub
Github
 
Introduction to Git and GitHub
Introduction to Git and GitHubIntroduction to Git and GitHub
Introduction to Git and GitHub
 
Git & GitHub WorkShop
Git & GitHub WorkShopGit & GitHub WorkShop
Git & GitHub WorkShop
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
The Fundamentals of Git
The Fundamentals of GitThe Fundamentals of Git
The Fundamentals of Git
 
Introduction to Git
Introduction to GitIntroduction to Git
Introduction to Git
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 

Ähnlich wie Cloud Optimized GeotTIFFs: enabling efficient cloud workflows

GeoServer on Steroids
GeoServer on Steroids GeoServer on Steroids
GeoServer on Steroids GeoSolutions
 
Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Ari Jolma
 
Managing 100s of PetaBytes of data in Cloud
Managing 100s of PetaBytes of data in CloudManaging 100s of PetaBytes of data in Cloud
Managing 100s of PetaBytes of data in Cloudlohitvijayarenu
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataSafe Software
 
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupKaxil Naik
 
State of GeoServer
State of GeoServerState of GeoServer
State of GeoServerJody Garnett
 
Upcoming features in Airflow 2
Upcoming features in Airflow 2Upcoming features in Airflow 2
Upcoming features in Airflow 2Kaxil Naik
 
GeoServer on steroids
GeoServer on steroidsGeoServer on steroids
GeoServer on steroidsGeoSolutions
 
Netflix running Presto in the AWS Cloud
Netflix running Presto in the AWS CloudNetflix running Presto in the AWS Cloud
Netflix running Presto in the AWS CloudZhenxiao Luo
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech ProjectsJody Garnett
 
State of GeoServer - FOSS4G 2016
State of GeoServer - FOSS4G 2016State of GeoServer - FOSS4G 2016
State of GeoServer - FOSS4G 2016GeoSolutions
 
[FOSDEM 2020] Lazy distribution of container images
[FOSDEM 2020] Lazy distribution of container images[FOSDEM 2020] Lazy distribution of container images
[FOSDEM 2020] Lazy distribution of container imagesAkihiro Suda
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation Yahoo Developer Network
 
Encompassing Information Integration
Encompassing Information IntegrationEncompassing Information Integration
Encompassing Information Integrationnguyenfilip
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopDataWorks Summit
 
Introducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 SupercomputerIntroducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 SupercomputerAkihiro Nomura
 

Ähnlich wie Cloud Optimized GeotTIFFs: enabling efficient cloud workflows (20)

Git Presentation
Git PresentationGit Presentation
Git Presentation
 
GeoServer on Steroids
GeoServer on Steroids GeoServer on Steroids
GeoServer on Steroids
 
Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...
 
Managing 100s of PetaBytes of data in Cloud
Managing 100s of PetaBytes of data in CloudManaging 100s of PetaBytes of data in Cloud
Managing 100s of PetaBytes of data in Cloud
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
 
ZGC-SnowOne.pdf
ZGC-SnowOne.pdfZGC-SnowOne.pdf
ZGC-SnowOne.pdf
 
State of GeoServer
State of GeoServerState of GeoServer
State of GeoServer
 
Upcoming features in Airflow 2
Upcoming features in Airflow 2Upcoming features in Airflow 2
Upcoming features in Airflow 2
 
GeoServer on steroids
GeoServer on steroidsGeoServer on steroids
GeoServer on steroids
 
Netflix running Presto in the AWS Cloud
Netflix running Presto in the AWS CloudNetflix running Presto in the AWS Cloud
Netflix running Presto in the AWS Cloud
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech Projects
 
State of GeoServer - FOSS4G 2016
State of GeoServer - FOSS4G 2016State of GeoServer - FOSS4G 2016
State of GeoServer - FOSS4G 2016
 
week1slides1704202828322.pdf
week1slides1704202828322.pdfweek1slides1704202828322.pdf
week1slides1704202828322.pdf
 
Git-Basics
Git-BasicsGit-Basics
Git-Basics
 
[FOSDEM 2020] Lazy distribution of container images
[FOSDEM 2020] Lazy distribution of container images[FOSDEM 2020] Lazy distribution of container images
[FOSDEM 2020] Lazy distribution of container images
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation
 
Encompassing Information Integration
Encompassing Information IntegrationEncompassing Information Integration
Encompassing Information Integration
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
 
Introducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 SupercomputerIntroducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 Supercomputer
 

Kürzlich hochgeladen

A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 

Kürzlich hochgeladen (20)

A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 

Cloud Optimized GeotTIFFs: enabling efficient cloud workflows

Hinweis der Redaktion

  1. This is original formalization of the idea by Even Rouault, its authoritative. Includes benchmarks using GDAL showing the difference in range read speeds.
  2. This is site managed by Chris Holmes, it is better overview and critical includes links to implementations and tools relevant to COG.
  3. TIFF header. Arrows are offsets in the TIFF file No order is implied Header is minimal does not have image information, IDFs have image information
  4. All of these are valid structures for TIFF. Left, easy to read, all the headers are at the top. (IDF should be seen as headers) Center, looks good but not clear what the advantage of this organization is. Right, easy to write if you don’t know when the image stream will end.
  5. Only the left organization is good for us because we get all the relevant header information at the top.
  6. COG spec is not firm about compression, deflate is common but slow, LZW is probably the most conservative choice. In any case the choice must be something that is understood by GDAL across all installs to preserve interoperability.
  7. In reality strips can even be one or two pixels high for large files.
  8. Fetching larger AOI will require a lot of IO time and network traffic
  9. Using overviews when downsampled information is sufficient, it very often is, saves the IO.
  10. Provenance is a huge problem, temptation is to use TIFF tags to keep track, but no standard exists yet. Histograms are super useful if you ever intend to view the file File naming convention seems tempting but impossible to enforce and thus expect Catalogs are the only other option
  11. This COG profile optimizes the IO to 1 request = 1 segment No reason to require this for all COGs Conclusion is that there is room to optimize COG layout for each system or application
  12. STAC static files provide reliant way to serve some index which is described using GeoJSON files REST and WFS 3.0 Services on top of static index provide an integration point for application
  13. You can play games with the names of the GeoTIFF files to come up with a scheme of finding the file we want only from request information, ID and geospatial bbox. You should consider sharding the scheme to avoid hotspotting your Blob store like S3. (See S3 best practices). People from GeoWave/GeoMesa/GeoTrellis project will have a lot of lessons about building static index like this that they would be happy to share. Also there are reusable components available from those projects.
  14. RasterFoundry is the raster management and manipulation system built on top of GeoTrellis.
  15. Level 0 support for COGs is serving tiles. RF does this using GeoTrellis.
  16. Tile request from RasterFoundry. Catalog is application specific COG Header is hot information, must be cached first Fetching COG segments may be too slow for quick screen refreshes
  17. An alternative view of tile request. Highlight here is that we’re reprojecting the COG on the fly per tile request.
  18. GeoTrellis layers are the constructs that represent a distributed raster collection that we can work over in cluster memory.
  19. Avro layers work really great on sorted stores like Accumulo but on S3 we end up with lots of small files.
  20. COG becomes our meta tile, its great. As a bonus the layers are now accessible from non GeoTrellis applications. This has potential to drastically reduce the data duplication.
  21. Note we had to add the DATE_TIME tag to keep track of the actual time. VRT files tie them together and make them accessible.
  22. There is only so much you can resample a set of tiles before you don’t have enough information to save At that point you need to collect the tops of the pyramid and reshuffle them into higher levels GeoTrellis handles this process when it writes out pyramided COG layers.
  23. Snapshot from OSM / RasterFrames / GeoTrellis machine learning project Best part is that now every layer written by GeoTrellis is accessible through QGIS This is a workflow where we combine source data, produces GeoTrellis layer and published OSM layer
  24. Thinking about COGs as mini-image databases
  25. We can put tile reading process in front of a distributed batch job. Instead of running it per request we run it per record. This effectively inlines the ingest process into the batch job itself.
  26. Same concerns that we talked about for making viewable COG catalogs are in place for making distributed imagery piplines sourcing COGs.
  27. Often we need more than one layer to do something interesting
  28. Fetch the metadata from the catalog to find what we want Fetch the metadata from the COGs to find out what it is Filter the potential IO using the COG headers Fetch the segments as tiles we know we need in a way we want
  29. Proof is in the pudding This GeoTrellis Landsat tiler coregisters scenes from multiple CRCs Reponsivness here gives us the idea that cost in a batch process is actually minimal.