Rob Emanuele @lossyrob
ANALYZING LARGE RASTER DATA
IN A JUPYTER NOTEBOOK
WITH GEOPYSPARK
ON AWS
Connect to the WIFI
Network: Harvard University
http://getonline.harvard.edu
Click “I am a guest”
Credentials:
U: foss4g2017@gmail.com
P: 7RFQU3rm
FIRST:
Find your Jupyter Notebook URL
https://git.io/v77lh
(lowercase L)
visit the URL next to your name
Log in to the Jupyter Hub
U: hadoop
P: hadoop
OUTLINE
8:00 - 8:30 Intro and Background
8:30 - 9:10 Section 1: Land Cover data
9:10 - 10:00 Section 2: Landsat 8 data
10:00 - 10:10 BREAK
10:10 - 10:30 Deployment and Ingestion
10:30 - 11:10 Section 3: Combining data layers
11:10 - 12:00 Section 4: Making Cool Maps
NOW:
A MOTIVATING EXAMPLE
BY
rdd.map(lambda x: x + 1)
Source: http://silverpond.com.au/2016/10/06/balancing-spark.ht
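The one-liner above applies a function to every element of a distributed collection. A plain-Python stand-in (no Spark required) may make the idea concrete. It assumes the data is split into three partitions, one per node; `partitions` and `map_partitions` are hypothetical names for illustration, not part of the PySpark API.

```python
# Plain-Python stand-in for rdd.map(lambda x: x + 1); not PySpark itself.
# Assumption: the data set is split into three partitions, one per node.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]


def map_partitions(parts, func):
    """Apply func to every element, one partition at a time.

    Each partition is processed independently, so an element-wise map
    does not need to move data between nodes.
    """
    return [[func(x) for x in part] for part in parts]


result = map_partitions(partitions, lambda x: x + 1)
print(result)
```

Given a live SparkContext `sc`, the PySpark equivalent would be along the lines of `sc.parallelize(range(1, 10), 3).map(lambda x: x + 1).collect()`.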
(0, 0) (1, 0) (2, 0)
(0, 1) (1, 1) (2, 1)
(0, 2) (1, 2) (2, 2)
Node 1
Node 2
Node 3
rdd.bufferTiles(…)
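Unlike an element-wise map, buffering a tile needs pixels from its neighbours, which may sit on a different node. The sketch below illustrates that dependency on a 3 x 3 grid keyed by (col, row). It is only an illustration of the idea, not the GeoTrellis implementation; `TILE_SIZE`, `make_tile` and `buffer_tile` are hypothetical names, and filling out-of-layout pixels with `None` is an assumption.

```python
# Sketch of tile buffering on a keyed grid; an illustration of the idea,
# not the GeoTrellis implementation behind rdd.bufferTiles(...).
TILE_SIZE = 2  # assumption: tiny 2 x 2 tiles to keep things readable


def make_tile(col, row):
    """Build a tile whose pixels record their own global (x, y) position."""
    return [[(col * TILE_SIZE + x, row * TILE_SIZE + y)
             for x in range(TILE_SIZE)]
            for y in range(TILE_SIZE)]


# A 3 x 3 layout keyed by (col, row), as on the slides.
tiles = {(col, row): make_tile(col, row)
         for col in range(3) for row in range(3)}


def buffer_tile(tiles, key, buffer_size=1):
    """Return the tile at `key` grown by `buffer_size` pixels on each side.

    Border pixels are read from neighbouring tiles; positions that fall
    outside the layout are filled with None.
    """
    col, row = key
    buffered = []
    for y in range(-buffer_size, TILE_SIZE + buffer_size):
        out_row = []
        for x in range(-buffer_size, TILE_SIZE + buffer_size):
            # Global pixel position, then the tile that owns that pixel.
            gx, gy = col * TILE_SIZE + x, row * TILE_SIZE + y
            neighbor = tiles.get((gx // TILE_SIZE, gy // TILE_SIZE))
            if neighbor is None:
                out_row.append(None)
            else:
                out_row.append(neighbor[gy % TILE_SIZE][gx % TILE_SIZE])
        buffered.append(out_row)
    return buffered


center = buffer_tile(tiles, (1, 1))  # touches all eight neighbours
corner = buffer_tile(tiles, (0, 0))  # partly outside the layout
```

The centre tile reads from all eight surrounding keys, which is why an operation like this implies moving data between nodes when those keys are stored elsewhere.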
Interactive and Batch Processing
of large raster data
Web-Speed Processing
of small to medium sized raster data
GeoTrellis Ecosystem
Raster Foundry by
Spark SQL and Spark ML support
Raster Frames by
Spark SQL and Spark ML support
GeoPySpark
Python bindings
Vector Pipes
Vector Tiles on Spark
PDAL integration
Point Clouds on Spark
GeoPySpark
Started December 2016
Follows PySpark’s model of communication
between the Java Virtual Machine and Python
Access GeoTrellis functionality through Python,
and integrate with your favorite Python raster
tools (numpy + friends).
0.2 is released!
GeoPySpark
EXERCISE 1:
ANALYZING LAND COVER DATA
EXERCISE 2:
WORKING WITH LANDSAT IMAGERY
AND NDVI THROUGH TIME
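As a reminder, NDVI is computed per pixel as (NIR - Red) / (NIR + Red); on Landsat 8 the red band is band 4 and the near-infrared band is band 5. A minimal sketch on flat lists of made-up reflectance values follows. The `ndvi` helper is a hypothetical name, and returning `None` where the denominator is zero is an assumption rather than the workshop's convention.

```python
def ndvi(red, nir):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); None where undefined."""
    result = []
    for r, n in zip(red, nir):
        total = n + r
        result.append((n - r) / total if total != 0 else None)
    return result


# Made-up reflectance values for a four-pixel tile.
red_band = [0.10, 0.20, 0.30, 0.0]
nir_band = [0.50, 0.20, 0.10, 0.0]
values = ndvi(red_band, nir_band)
print(values)
```

Values near 1 suggest dense green vegetation, values near 0 or below suggest bare ground or water.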
(SpaceTimeKey, Tile)
(SpaceTimeKey, Tile)
(SpaceTimeKey, Tile)
…
SpaceTimeKey ≈  (col, row, instant)
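The key shape above can be sketched with named tuples. These are hypothetical stand-ins that mirror the slide; GeoPySpark ships its own key classes, which may differ in detail.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-ins that mirror the shape described on the slide.
SpatialKey = namedtuple("SpatialKey", ["col", "row"])
SpaceTimeKey = namedtuple("SpaceTimeKey", ["col", "row", "instant"])

# Made-up grid position and acquisition time.
key = SpaceTimeKey(col=3, row=7, instant=datetime(2017, 8, 14))

# Dropping the instant yields the purely spatial part of the key.
spatial = SpatialKey(key.col, key.row)
print(key)
print(spatial)
```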
lambda
(SpatialKey, (DateTime, Tile))
(SpatialKey, (DateTime, Tile))
(SpatialKey, (DateTime, Tile))
…
(Shuffle)
(SpatialKey, [(DateTime, Tile), (DateTime, Tile)])
(SpatialKey, [(DateTime, Tile)])
…
mosaic
(SpatialKey, Tile)
(SpatialKey, Tile)
…
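The re-keying, grouping and mosaic steps can be sketched in plain Python, without Spark. Everything here is illustrative: the records are made up, tiles are flat lists, `to_spatial`, `group_by_key` and `mosaic` are hypothetical helpers, and the mosaic rule (newest valid pixel wins) is an assumption rather than the rule used in the exercise.

```python
from collections import defaultdict
from datetime import datetime

NODATA = None  # assumption: None marks a missing pixel

# (SpaceTimeKey, Tile) records; keys are (col, row, instant) tuples and
# tiles are flat lists of pixel values. All values are made up.
records = [
    ((0, 0, datetime(2017, 6, 1)), [1, 1, NODATA, 1]),
    ((0, 0, datetime(2017, 7, 1)), [NODATA, 2, NODATA, 2]),
    ((1, 0, datetime(2017, 6, 1)), [5, 5, 5, 5]),
]


def to_spatial(record):
    """The 'lambda' step: (SpaceTimeKey, Tile) -> (SpatialKey, (DateTime, Tile))."""
    (col, row, instant), tile = record
    return (col, row), (instant, tile)


def group_by_key(pairs):
    """The '(Shuffle)' step: collect all values that share a SpatialKey."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def mosaic(dated_tiles):
    """Merge tiles newest-first; the first valid pixel at each position wins."""
    ordered = sorted(dated_tiles, key=lambda pair: pair[0], reverse=True)
    merged = list(ordered[0][1])
    for _, tile in ordered[1:]:
        merged = [newer if newer is not NODATA else older
                  for newer, older in zip(merged, tile)]
    return merged


grouped = group_by_key(to_spatial(r) for r in records)
mosaicked = {key: mosaic(tiles) for key, tiles in grouped.items()}
print(mosaicked)
```

The grouping step is where the shuffle happens: tiles that share a spatial key must end up together before they can be merged.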
BREAK!
WHERE AND HOW ARE THESE
NOTEBOOKS RUNNING?
WHERE’S THIS DATA COMING
FROM?
Supported Backends
EXERCISE 3:
COMBINING LAND COVER AND NDVI TO
DETECT CROP CYCLES
(SpaceTimeKey, Tile)
(SpaceTimeKey, Tile)
(SpaceTimeKey, Tile)
…
map_to_spatial
STK = SpaceTimeKey
ndwi_rdd
(SpatialKey, (STK, Tile))
(SpatialKey, (STK, Tile))
(SpatialKey, (STK, Tile))
…
nlcd_layer.to_numpy_rdd()
(SpatialKey, Tile)
(SpatialKey, Tile)
…
(Shuffle)
(SpatialKey, ((STK, Tile), Tile))
(SpatialKey, ((STK, Tile), Tile))
(SpatialKey, ((STK, Tile), Tile))
…
mask_ndwi
(SpaceTimeKey, Tile)
(SpaceTimeKey, Tile)
(SpaceTimeKey, Tile)
…
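The join-and-mask flow can be sketched in plain Python. The names `map_to_spatial` and `mask_ndwi` come from the slides, but their bodies here are guesses at what such functions might do. The data is made up, tiles are flat lists, and masking to NLCD class 82 (cultivated crops) is an assumption based on the exercise title.

```python
CROPLAND = 82  # NLCD class code for cultivated crops
NODATA = None  # assumption: None marks a masked-out pixel

# Made-up inputs: time-stamped index tiles and one land cover tile per
# spatial key. Keys are (col, row, instant) and (col, row) tuples.
ndwi_records = [
    ((0, 0, "2017-06-01"), [0.1, 0.4, 0.2, 0.3]),
    ((0, 0, "2017-07-01"), [0.2, 0.5, 0.1, 0.6]),
    ((1, 0, "2017-06-01"), [0.7, 0.7, 0.7, 0.7]),
]
nlcd_tiles = {
    (0, 0): [82, 11, 82, 41],
    (1, 0): [11, 11, 11, 11],
}


def map_to_spatial(record):
    """(STK, Tile) -> (SpatialKey, (STK, Tile)), keeping the full key."""
    stk, tile = record
    return (stk[0], stk[1]), (stk, tile)


def join(left_pairs, right_by_key):
    """Inner join on SpatialKey: (SpatialKey, ((STK, Tile), Tile))."""
    return [(key, (left, right_by_key[key]))
            for key, left in left_pairs if key in right_by_key]


def mask_ndwi(joined):
    """Keep index values only where the land cover tile is cropland."""
    _, ((stk, ndwi), nlcd) = joined
    masked = [value if cls == CROPLAND else NODATA
              for value, cls in zip(ndwi, nlcd)]
    return stk, masked


joined = join([map_to_spatial(r) for r in ndwi_records], nlcd_tiles)
masked = [mask_ndwi(j) for j in joined]
print(masked)
```

Carrying the full SpaceTimeKey through the join is what lets the last step restore the original (SpaceTimeKey, Tile) shape.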
EXERCISE 4:
COMBINING IMAGERY, ELEVATION AND
LAND COVER DATA
TO MAKE A COOL LOOKING MAP
TWEET YOUR SWEET MAP SCREENSHOTS WITH
#GEOPYSPARK #FOSS4G!
FINAL QUESTIONS?
Thank you!

Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FOSS4G 2017 Workshop