Weitere ähnliche Inhalte Ähnlich wie Working with Open Data on AWS (20) Mehr von Amazon Web Services (20) Working with Open Data on AWS1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Jed Sundwall, Global Open Data Lead
Open Data on AWS
https://opendata.aws
2. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Agenda
Overview of Open Data on AWS
How shared data on the cloud can accelerate research
Finding data shared on AWS
Sharing data on AWS
3. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
No Up Front Expense
Pay for what you Use
Improve Time to
Market & Agility
Scale Up and
Down
Self-Service
Infrastructure
AWS Cloud
Equipment
Resources and
Administration
Contracts Cost
Traditional
Infrastructure
4. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Why does AWS care about open data?
Many AWS customers supply data
to the public to accelerate research
and product development.
Many AWS customers use data
shared on AWS to create new
products and services.
5. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Companies want more value from their data
Complications:
Siloed approaches don’t work anymore
It’s too expensive and limiting to store data
on-premises
Data is:
Implication:
A new approach is needed to extract insights
and value
Growing
exponentially
From new
sources
Increasingly
diverse
Used by
many people
Analyzed by
many applications
6. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
“…data must be organized, well-documented,
consistently formatted, and error free. Cleaning the data
is often the most taxing part of data science, and is
frequently 80% of the work.”
— Data Driven by DJ Patil and Hilary Mason
Undifferentiated heavy lifting
7. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Flipped data flow in the cloud
Traditional approach:
Move data to computing resources.
Cloud approach:
Move computing resources to data.
Amazon S3
Amazon
EC2
Amazon
EMR
Amazon
Athena
8. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Cloud data lakes are the future
Customers want:
To eliminate data silos
To move to a single store, i.e. a data lake in the cloud
To store data securely in standard formats
To grow to any scale, with low costs
To analyze their data in a variety of ways
To have real-time analytics
To predict future outcomes
Data Lake
9. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Sharing data in the cloud lets data users
spend more time on data analysis rather
than data acquisition.
https://opendata.aws
10. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Advantages of sharing data in the cloud
Global community of users
Faster pace of research Lower cost of research
New services and tools
11. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
AWS Public Datasets
https://registry.opendata.aws
12. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
AWS Public Datasets
https://registry.opendata.aws/tag/satellite-imagery
13. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Data at work
14. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
“…data must be organized, well-documented,
consistently formatted, and error free. Cleaning the data
is often the most taxing part of data science, and is
frequently 80% of the work.”
— Data Driven by DJ Patil and Hilary Mason
Undifferentiated heavy lifting
15. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Graph by Drew Bollinger (@drewbo19) at Development Seed
Landsat on AWS
16. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Monitoring at-risk bodies of water from space
The Blue Dot Observatory uses
Sentinel-2 satellite data on AWS to
monitor water bodies around the world.
“The cost to process one month of data
for about 7,000 bodies of water
currently in the system is 6 EUR. It is
possible to set up world-scale systems
with a shoestring budget.”
opendata.aws/bluedot
17. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Opening high resolution elevation data
USGS shares over 10 trillion lidar point
cloud records from across the US on AWS.
“The democratization of elevation data
[…] promises to revolutionize approaches
to applications from flood forecasting and
geologic assessments to precision
agriculture and infrastructure
development.”
— Kevin Gallagher, USGS
opendata.aws/usgs-lidar
18. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Detect earthquakes in real time
Grillo shares seismic data produced by
their network of IoT-based sensors
deployed in Mexico, Chile and Costa Rica.
This data can be processed in real time to
detect earthquakes occurring off the coast
of Chile before they are felt in Santiago.
opendata.aws/iot-earthquake
19. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Facilitating over 31 million journeys made in London every
day
When Transport for London opened up
access to its data, application developers
and researchers used it to create more
than 600 applications that provide
services to 42 percent of Londoners,
saving an up to estimated £130 million
per year.
opendata.aws/tfl
20. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Accelerating cybersecurity research
“By providing better research data, we can
help fight cyberattacks across Canada and
throughout the world and help networks
be more secure.”
— Mike Davie, Canadian Communications
Security Establishment
opendata.aws/cse-cyber
21. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
“Recently, the National Oceanic and
Atmospheric Administration and Amazon Web
Services (AWS) Cloud made available one of
the largest datasets describing animal
movement ever compiled: …”
— Adriaan M. Dokter et al. Nature (2018)
22. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
“Recently, the National Oceanic and
Atmospheric Administration and Amazon Web
Services (AWS) Cloud made available one of
the largest datasets describing animal
movement ever compiled: the Next Generation
Weather Radar (NEXRAD) archive.”
— Adriaan M. Dokter et al. Nature (2018)
opendata.aws/bird-radar
23. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
24. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Finding data on AWS
Using the Registry of Open Data on AWS (RODA)
25. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Registry of Open Data on AWS
https://registry.opendata.aws/
26. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Registry of Open Data on AWS – Tags
https://registry.opendata.aws/tag/sustainability
27. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Registry of Open Data on AWS – Usage examples
https://registry.opendata.aws/usage-examples
28. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Registry of Open Data on AWS – How to contribute
https://github.com/awslabs/open-data-registry
29. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Sharing data (on AWS)
What we’ve learned
30. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
What makes a dataset successful?
It is treated like a product.
31. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
32. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
What makes a dataset successful?
It is treated like a product.
It is optimized for analysis.
33. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Highly processedRaw
Userbase
34. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Raw Accessible
Documented
Trustworthy
Userbase
35. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Traditional GeoTIFF bundle
.tar
36. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Cloud-optimized GeoTIFF
37. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Cloud-optimized GeoTIFF
38. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Patterns
S3 Key Index External
Index
Internal Index
39. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Example: Allen Brain Observatory Key Naming
https://registry.opendata.aws/allen-brain-observatory/
40. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Example: IRS 990 CSV as External Index
41. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
What makes a dataset successful?
It is treated like a product.
It is optimized for analysis.
There is a community around it.
42. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Pangeo community platform
http://pangeo.io
43. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Pangeo community platform
http://pangeo.io
44. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Guide to sharing data on AWS
Over 40 pages of insights on data
sharing and case studies from
customers including:
- Transport for London
- Canada’s Communications
Security Establishment
- The US Geological Survey
- The Allen Institute for Brain
Science
https://opendata.aws/guide-pdf
45. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Thank you!
jed@amazon.com