This document outlines a presentation on analyzing large raster data in a Jupyter notebook with GeoPySpark on AWS. The presentation covers introductory material, exercises on working with land cover and Landsat imagery data, combining data layers to detect crop cycles, and combining different data types to create maps. It discusses where the notebooks are running, data sources, and GeoPySpark capabilities like working with space-time raster data. Attendees are encouraged to tweet maps created during the exercises.
2. Connect to the Wi-Fi
Network: Harvard University
http://getonline.harvard.edu
Click “I am a guest”
Credentials:
U: foss4g2017@gmail.com
P: 7RFQU3rm
FIRST:
Find your Jupyter Notebook URL
https://git.io/v77lh
(lowercase L)
visit the URL next to your name
Log in to the Jupyter Hub
U: hadoop
P: hadoop
3. OUTLINE
8:00 - 8:30 Intro and Background
8:30 - 9:10 Section 1: Land Cover data
9:10 - 10:00 Section 2: Landsat 8 data
10:00 - 10:10 BREAK
10:10 - 10:30 Deployment and Ingestion
10:30 - 11:10 Section 3: Combining data layers
11:10 - 12:00 Section 4: Making Cool Maps
21. Interactive and Batch Processing
of large raster data
Web-Speed Processing
of small to medium-sized raster data
22. GeoTrellis Ecosystem
Raster Foundry
Raster Frames
Spark SQL and Spark ML support
GeoPySpark
Python bindings
Vector Pipes
Vector Tiles on Spark
PDAL integration
Point Clouds on Spark
24. Started December 2016
Follows PySpark’s model of communication
between the Java Virtual Machine and Python
Accesses GeoTrellis functionality through Python,
and integrates with your favorite Python raster
tools (numpy + friends).
0.2 is released!
GeoPySpark
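As a small illustration of the numpy integration mentioned above, here is a hedged sketch of the kind of computation the workshop's crop-cycle exercises rely on: computing NDVI, a standard vegetation index, over plain numpy arrays. The arrays here are made-up stand-ins for band data; in the actual exercises the values would come from tiles in a GeoPySpark layer, not be hard-coded.

```python
import numpy as np

# Hypothetical red and near-infrared band values for a tiny 2x2 tile,
# standing in for data converted from a GeoPySpark tile to numpy.
red = np.array([[0.10, 0.20],
                [0.30, 0.40]])
nir = np.array([[0.50, 0.60],
                [0.70, 0.80]])

# NDVI = (NIR - Red) / (NIR + Red); higher values indicate denser
# vegetation, which is why NDVI time series reveal crop cycles.
ndvi = (nir - red) / (nir + red)
print(ndvi.round(3))
```

Because GeoPySpark exposes tiles as numpy arrays, vectorized expressions like this apply unchanged once the data comes from a real layer.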