SlideShare ist ein Scribd-Unternehmen logo
1 von 45
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Jed Sundwall, Global Open Data Lead
Open Data on AWS
https://opendata.aws
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Agenda
Overview of Open Data on AWS
How shared data on the cloud can accelerate research
Finding data shared on AWS
Sharing data on AWS
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
No Up Front Expense
Pay for what you Use
Improve Time to
Market & Agility
Scale Up and
Down
Self-Service
Infrastructure
AWS Cloud
Equipment
Resources and
Administration
Contracts Cost
Traditional
Infrastructure
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Why does AWS care about open data?
Many AWS customers supply data
to the public to accelerate research
and product development.
Many AWS customers use data
shared on AWS to create new
products and services.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Companies want more value from their data
Complications:
Siloed approaches don’t work anymore
It’s too expensive and limiting to store data
on-premises
Data is:
Implication:
A new approach is needed to extract insights
and value
Growing
exponentially
From new
sources
Increasingly
diverse
Used by
many people
Analyzed by
many applications
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
“…data must be organized, well-documented,
consistently formatted, and error free. Cleaning the data
is often the most taxing part of data science, and is
frequently 80% of the work.”
— Data Driven by DJ Patil and Hilary Mason
Undifferentiated heavy lifting
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Flipped data flow in the cloud
Traditional approach:
Move data to computing resources.
Cloud approach:
Move computing resources to data.
Amazon S3
Amazon
EC2
Amazon
EMR
Amazon
Athena
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Cloud data lakes are the future
Customers want:
To eliminate data silos
To move to a single store, i.e. a data lake in the cloud
To store data securely in standard formats
To grow to any scale, with low costs
To analyze their data in a variety of ways
To have real-time analytics
To predict future outcomes
Data Lake
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Sharing data in the cloud lets data users
spend more time on data analysis rather
than data acquisition.
https://opendata.aws
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Advantages of sharing data in the cloud
Global community of users
Faster pace of research Lower cost of research
New services and tools
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
AWS Public Datasets
https://registry.opendata.aws
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
AWS Public Datasets
https://registry.opendata.aws/tag/satellite-imagery
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Data at work
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
“…data must be organized, well-documented,
consistently formatted, and error free. Cleaning the data
is often the most taxing part of data science, and is
frequently 80% of the work.”
— Data Driven by DJ Patil and Hilary Mason
Undifferentiated heavy lifting
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Graph by Drew Bollinger (@drewbo19) at Development Seed
Landsat on AWS
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Monitoring at-risk bodies of water from space
The Blue Dot Observatory uses
Sentinel-2 satellite data on AWS to
monitor water bodies around the world.
“The cost to process one month of data
for about 7,000 bodies of water
currently in the system is 6 EUR. It is
possible to set up world-scale systems
with a shoestring budget.”
opendata.aws/bluedot
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Opening high resolution elevation data
USGS shares over 10 trillion lidar point
cloud records from across the US on AWS.
“The democratization of elevation data
[…] promises to revolutionize approaches
to applications from flood forecasting and
geologic assessments to precision
agriculture and infrastructure
development.”
— Kevin Gallagher, USGS
opendata.aws/usgs-lidar
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Detect earthquakes in real time
Grillo shares seismic data produced by
their network of IoT-based sensors
deployed in Mexico, Chile and Costa Rica.
This data can be processed in real time to
detect earthquakes occurring off the coast
of Chile before they are felt in Santiago.
opendata.aws/iot-earthquake
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Facilitating over 31 million journeys made in London every
day
When Transport for London opened up
access to its data, application developers
and researchers used it to create more
than 600 applications that provide
services to 42 percent of Londoners,
saving an up to estimated £130 million
per year.
opendata.aws/tfl
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Accelerating cybersecurity research
“By providing better research data, we can
help fight cyberattacks across Canada and
throughout the world and help networks
be more secure.”
— Mike Davie, Canadian Communications
Security Establishment
opendata.aws/cse-cyber
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
“Recently, the National Oceanic and
Atmospheric Administration and Amazon Web
Services (AWS) Cloud made available one of
the largest datasets describing animal
movement ever compiled: …”
— Adriaan M. Dokter et al. Nature (2018)
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
“Recently, the National Oceanic and
Atmospheric Administration and Amazon Web
Services (AWS) Cloud made available one of
the largest datasets describing animal
movement ever compiled: the Next Generation
Weather Radar (NEXRAD) archive.”
— Adriaan M. Dokter et al. Nature (2018)
opendata.aws/bird-radar
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Finding data on AWS
Using the Registry of Open Data on AWS (RODA)
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Registry of Open Data on AWS
https://registry.opendata.aws/
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Registry of Open Data on AWS – Tags
https://registry.opendata.aws/tag/sustainability
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Registry of Open Data on AWS – Usage examples
https://registry.opendata.aws/usage-examples
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Registry of Open Data on AWS – How to contribute
https://github.com/awslabs/open-data-registry
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Sharing data (on AWS)
What we’ve learned
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
What makes a dataset successful?
It is treated like a product.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
What makes a dataset successful?
It is treated like a product.
It is optimized for analysis.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Highly processedRaw
Userbase
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Raw Accessible
Documented
Trustworthy
Userbase
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Traditional GeoTIFF bundle
.tar
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Cloud-optimized GeoTIFF
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Cloud-optimized GeoTIFF
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Patterns
S3 Key Index External
Index
Internal Index
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Example: Allen Brain Observatory Key Naming
https://registry.opendata.aws/allen-brain-observatory/
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Example: IRS 990 CSV as External Index
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
What makes a dataset successful?
It is treated like a product.
It is optimized for analysis.
There is a community around it.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Pangeo community platform
http://pangeo.io
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Pangeo community platform
http://pangeo.io
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Guide to sharing data on AWS
Over 40 pages of insights on data
sharing and case studies from
customers including:
- Transport for London
- Canada’s Communications
Security Establishment
- The US Geological Survey
- The Allen Institute for Brain
Science
https://opendata.aws/guide-pdf
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Thank you!
jed@amazon.com

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loader
 
Islamic banking solution by oracle
Islamic banking solution by oracleIslamic banking solution by oracle
Islamic banking solution by oracle
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Augury: Real-Time Insights for the Industrial IoT
Augury: Real-Time Insights for the Industrial IoTAugury: Real-Time Insights for the Industrial IoT
Augury: Real-Time Insights for the Industrial IoT
 
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
Building Data Warehouse in SQL Server
Building Data Warehouse in SQL ServerBuilding Data Warehouse in SQL Server
Building Data Warehouse in SQL Server
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2
 
AWSで作る分析基盤
AWSで作る分析基盤AWSで作る分析基盤
AWSで作る分析基盤
 
Partie1BI-DW2019
Partie1BI-DW2019Partie1BI-DW2019
Partie1BI-DW2019
 
Getting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksGetting Started with Delta Lake on Databricks
Getting Started with Delta Lake on Databricks
 
Snowflake_free_trial_LabGuide.pdf
Snowflake_free_trial_LabGuide.pdfSnowflake_free_trial_LabGuide.pdf
Snowflake_free_trial_LabGuide.pdf
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
 
採用スライド2022.pdf
採用スライド2022.pdf採用スライド2022.pdf
採用スライド2022.pdf
 
Stream Processing Frameworks
Stream Processing FrameworksStream Processing Frameworks
Stream Processing Frameworks
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Access
AccessAccess
Access
 

Ähnlich wie Working with Open Data on AWS

Ähnlich wie Working with Open Data on AWS (20)

Open Data on AWS
Open Data on AWSOpen Data on AWS
Open Data on AWS
 
AWS Open Data
AWS Open DataAWS Open Data
AWS Open Data
 
AWS Public Datasets: Learnings from Staging Petabytes of Data for Analysis in...
AWS Public Datasets: Learnings from Staging Petabytes of Data for Analysis in...AWS Public Datasets: Learnings from Staging Petabytes of Data for Analysis in...
AWS Public Datasets: Learnings from Staging Petabytes of Data for Analysis in...
 
Open Data on AWS
Open Data on AWSOpen Data on AWS
Open Data on AWS
 
Democratization of Big Data - GeoSummit 2018
Democratization of Big Data - GeoSummit 2018Democratization of Big Data - GeoSummit 2018
Democratization of Big Data - GeoSummit 2018
 
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
 
Machine Learning, Open Data, and the Future WarFighter
Machine Learning, Open Data, and the Future WarFighterMachine Learning, Open Data, and the Future WarFighter
Machine Learning, Open Data, and the Future WarFighter
 
How Different Large Organizations are Approaching Cloud Adoption
How Different Large Organizations are Approaching Cloud AdoptionHow Different Large Organizations are Approaching Cloud Adoption
How Different Large Organizations are Approaching Cloud Adoption
 
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
 
Emerging Trends in Big Data, Analytics, Machine Learning, and Internet-of-Thi...
Emerging Trends in Big Data, Analytics, Machine Learning, and Internet-of-Thi...Emerging Trends in Big Data, Analytics, Machine Learning, and Internet-of-Thi...
Emerging Trends in Big Data, Analytics, Machine Learning, and Internet-of-Thi...
 
BI & Analytics
BI & AnalyticsBI & Analytics
BI & Analytics
 
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWS
 
Immersion Day - Como a AWS apoia a estratégia analítica de sua empresa
Immersion Day - Como a AWS apoia a estratégia analítica de sua empresaImmersion Day - Como a AWS apoia a estratégia analítica de sua empresa
Immersion Day - Como a AWS apoia a estratégia analítica de sua empresa
 
Data Lifecycle Management
Data Lifecycle ManagementData Lifecycle Management
Data Lifecycle Management
 
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
 
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
 

Mehr von Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Mehr von Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Working with Open Data on AWS

  • 1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Jed Sundwall, Global Open Data Lead Open Data on AWS https://opendata.aws
  • 2. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Agenda Overview of Open Data on AWS How shared data on the cloud can accelerate research Finding data shared on AWS Sharing data on AWS
  • 3. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark No Up Front Expense Pay for what you Use Improve Time to Market & Agility Scale Up and Down Self-Service Infrastructure AWS Cloud Equipment Resources and Administration Contracts Cost Traditional Infrastructure
  • 4. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Why does AWS care about open data? Many AWS customers supply data to the public to accelerate research and product development. Many AWS customers use data shared on AWS to create new products and services.
  • 5. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Companies want more value from their data Complications: Siloed approaches don’t work anymore It’s too expensive and limiting to store data on-premises Data is: Implication: A new approach is needed to extract insights and value Growing exponentially From new sources Increasingly diverse Used by many people Analyzed by many applications
  • 6. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark “…data must be organized, well-documented, consistently formatted, and error free. Cleaning the data is often the most taxing part of data science, and is frequently 80% of the work.” — Data Driven by DJ Patil and Hilary Mason Undifferentiated heavy lifting
  • 7. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Flipped data flow in the cloud Traditional approach: Move data to computing resources. Cloud approach: Move computing resources to data. Amazon S3 Amazon EC2 Amazon EMR Amazon Athena
  • 8. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Cloud data lakes are the future Customers want: To eliminate data silos To move to a single store, i.e. a data lake in the cloud To store data securely in standard formats To grow to any scale, with low costs To analyze their data in a variety of ways To have real-time analytics To predict future outcomes Data Lake
  • 9. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Sharing data in the cloud lets data users spend more time on data analysis rather than data acquisition. https://opendata.aws
  • 10. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Advantages of sharing data in the cloud Global community of users Faster pace of research Lower cost of research New services and tools
  • 11. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark AWS Public Datasets https://registry.opendata.aws
  • 12. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark AWS Public Datasets https://registry.opendata.aws/tag/satellite-imagery
  • 13. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Data at work
  • 14. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark “…data must be organized, well-documented, consistently formatted, and error free. Cleaning the data is often the most taxing part of data science, and is frequently 80% of the work.” — Data Driven by DJ Patil and Hilary Mason Undifferentiated heavy lifting
  • 15. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Graph by Drew Bollinger (@drewbo19) at Development Seed Landsat on AWS
  • 16. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Monitoring at-risk bodies of water from space The Blue Dot Observatory uses Sentinel-2 satellite data on AWS to monitor water bodies around the world. “The cost to process one month of data for about 7,000 bodies of water currently in the system is 6 EUR. It is possible to set up world-scale systems with a shoestring budget.” opendata.aws/bluedot
  • 17. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Opening high resolution elevation data USGS shares over 10 trillion lidar point cloud records from across the US on AWS. “The democratization of elevation data […] promises to revolutionize approaches to applications from flood forecasting and geologic assessments to precision agriculture and infrastructure development.” — Kevin Gallagher, USGS opendata.aws/usgs-lidar
  • 18. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Detect earthquakes in real time Grillo shares seismic data produced by their network of IoT-based sensors deployed in Mexico, Chile and Costa Rica. This data can be processed in real time to detect earthquakes occurring off the coast of Chile before they are felt in Santiago. opendata.aws/iot-earthquake
  • 19. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Facilitating over 31 million journeys made in London every day When Transport for London opened up access to its data, application developers and researchers used it to create more than 600 applications that provide services to 42 percent of Londoners, saving an up to estimated £130 million per year. opendata.aws/tfl
  • 20. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Accelerating cybersecurity research “By providing better research data, we can help fight cyberattacks across Canada and throughout the world and help networks be more secure.” — Mike Davie, Canadian Communications Security Establishment opendata.aws/cse-cyber
  • 21. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark “Recently, the National Oceanic and Atmospheric Administration and Amazon Web Services (AWS) Cloud made available one of the largest datasets describing animal movement ever compiled: …” — Adriaan M. Dokter et al. Nature (2018)
  • 22. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark “Recently, the National Oceanic and Atmospheric Administration and Amazon Web Services (AWS) Cloud made available one of the largest datasets describing animal movement ever compiled: the Next Generation Weather Radar (NEXRAD) archive.” — Adriaan M. Dokter et al. Nature (2018) opendata.aws/bird-radar
  • 23. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
  • 24. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Finding data on AWS Using the Registry of Open Data on AWS (RODA)
  • 25. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Registry of Open Data on AWS https://registry.opendata.aws/
  • 26. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Registry of Open Data on AWS – Tags https://registry.opendata.aws/tag/sustainability
  • 27. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Registry of Open Data on AWS – Usage examples https://registry.opendata.aws/usage-examples
  • 28. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Registry of Open Data on AWS – How to contribute https://github.com/awslabs/open-data-registry
  • 29. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Sharing data (on AWS) What we’ve learned
  • 30. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark What makes a dataset successful? It is treated like a product.
  • 31. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
  • 32. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark What makes a dataset successful? It is treated like a product. It is optimized for analysis.
  • 33. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Highly processedRaw Userbase
  • 34. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Raw Accessible Documented Trustworthy Userbase
  • 35. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Traditional GeoTIFF bundle .tar
  • 36. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Cloud-optimized GeoTIFF
  • 37. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Cloud-optimized GeoTIFF
  • 38. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Patterns S3 Key Index External Index Internal Index
  • 39. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Example: Allen Brain Observatory Key Naming https://registry.opendata.aws/allen-brain-observatory/
  • 40. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Example: IRS 990 CSV as External Index
  • 41. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark What makes a dataset successful? It is treated like a product. It is optimized for analysis. There is a community around it.
  • 42. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Pangeo community platform http://pangeo.io
  • 43. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Pangeo community platform http://pangeo.io
  • 44. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Guide to sharing data on AWS Over 40 pages of insights on data sharing and case studies from customers including: - Transport for London - Canada’s Communications Security Establishment - The US Geological Survey - The Allen Institute for Brain Science https://opendata.aws/guide-pdf
  • 45. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark Thank you! jed@amazon.com