This document discusses putting data to work through community. It contrasts the traditional approach of individual science projects with a community approach. In the traditional approach, scientists independently find, access, analyze and publish data. The community approach opens this process up through shared infrastructure and standards so that data can be reused collaboratively. Examples include the air quality community, which has worked to develop interoperable standards and services. Overall, the deck argues that sharing data and standards within a community leads to more open science and greater data reuse.
Putting Data to Work: Moving science forward together beyond where we thought possible!
1. Putting Data to Work Through Community
Erin Robinson
2020 Leptoukh Lecture
Fall 2020 AGU Meeting
Source: https://svs.gsfc.nasa.gov/30701
2. Making Data Matter
[Open] Science User Barriers [to Open Data]
[Figure: Traditional Project Approach, adapted from Leptoukh, 2012. Timeline: Jan, Mar, Jun, Sept, Dec.]
Pre-Science:
• Find data
• Retrieve high volume data
• Learn formats and develop readers
• Extract parameters
• Perform spatial and other subsetting
• Identify quality and other flags and constraints
• Perform filtering/masking
• Develop analysis and visualization
• Accept/discard/get more data (sat, model, ground-based)
DO SCIENCE:
• Exploration
• Initial Analysis
• Use the best data for the final analysis
• Derive conclusions
• Write the paper
• Present @ AGU
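Several of the Pre-Science steps in the Traditional Project Approach (spatial subsetting, identifying quality flags, filtering/masking) are small amounts of code that each project tends to rewrite for itself. The following is a minimal sketch of that repeated work, not taken from the lecture. The `Observation` record, the flag convention (0 means good) and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    lat: float
    lon: float
    value: float
    quality_flag: int  # assumed convention: 0 = good, anything else = suspect


def subset_and_mask(observations, lat_range, lon_range, good_flags=(0,)):
    """Keep observations inside the bounding box whose quality flag is accepted."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    return [
        obs
        for obs in observations
        if lat_min <= obs.lat <= lat_max
        and lon_min <= obs.lon <= lon_max
        and obs.quality_flag in good_flags
    ]


def mean_value(observations):
    """Average the retained values; return None when nothing was retained."""
    if not observations:
        return None
    return sum(obs.value for obs in observations) / len(observations)


# Made-up sample records, for illustration only.
observations = [
    Observation(38.6, -90.2, 12.0, 0),
    Observation(38.7, -90.3, 95.0, 2),   # inside the box but flagged as suspect
    Observation(21.3, -157.9, 5.0, 0),   # good quality but outside the box
    Observation(38.5, -90.1, 14.0, 0),
]

selected = subset_and_mask(observations, (38.0, 39.0), (-91.0, -90.0))
print(len(selected), mean_value(selected))
```

Real datasets define their own flag meanings and coordinate conventions, which is part of why the "learn formats and develop readers" step takes so long when every project does it alone.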
4. Source: “Entryways to open data science and the power of welcome”, J. Lowndes 2020
https://youtu.be/HAh7Xy9ReJo?t=1789
10. To be a leader in promoting the collection, stewardship and (re)use of Earth science data, information and knowledge that is responsive to societal needs.
ESIP Vision
11. ESIP does not:
• Provide data
• Sustain cyberinfrastructure
• Compete with our members
• Develop standards
INFORMATION INTEROPERABILITY STACK
Generate recommendations and work products.
Have a lasting impact in the recommendation of standards.
14. Air Quality Community Experiences and Perspectives on International Interoperability Standards
IGARSS, 30 July 2010, Honolulu, HI
Presented by Erin Robinson
Erin Robinson, Stefan Falke, Rudolf Husar, David McCabe, Frank Lindsay, Chris Lynnes, Greg Leptoukh, Beate Hildenbrand, Oleg Goussev, Peter Sommer
18. [Diagram: Air Quality Community Record]
Com Client
Data User
Air Quality Community Record:
• Air Quality Specific ISO 19115 CSW Profile
• OGC CSW Queryable
• OGC CSW Returnable
• Metadata Description
• Data Binding
GEOSS Clearinghouse harvests metadata from distributed catalogs
Community/Provider Catalogs
Data Providers
Data Access Service
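To make the catalog layer in this diagram concrete, here is a rough sketch of how a client might compose an OGC CSW 2.0.2 GetRecords request using key-value parameters. The endpoint URL is a hypothetical placeholder, the search keyword is made up, and the queryable used in the constraint is an assumption; which queryables a given catalog supports depends on that server and its profile.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; not a real catalog.
CATALOG_ENDPOINT = "https://catalog.example.org/csw"


def build_getrecords_url(endpoint, keyword, max_records=10):
    """Compose a CSW 2.0.2 GetRecords URL that searches free text for a keyword."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "full",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # Assumed queryable; the keyword is inserted without escaping in this sketch.
        "constraint": f"csw:AnyText like '%{keyword}%'",
        "maxRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


url = build_getrecords_url(CATALOG_ENDPOINT, "ozone")
print(url)
```

Because every catalog answers the same kind of request, a clearinghouse can harvest records from many community catalogs without custom code for each one.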
20. Project A
Project A combines multiple data sources to generate near-real-time information for the public
Project A provides web service interfaces to some of its data and information
22. New Project
A new project uses services from projects A, B and C to meet its objectives
23. FAIR Guiding Principles
Article in Nature journal Scientific Data: Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018, doi:10.1038/sdata.2016.18 (2016).
FAIR is…
Findable
Accessible
Interoperable
Reusable
26. • 2:02 – 3:43
Source: “Christa Hasenkopf of OpenAQ Pitches at Fast Forward Demo Day”, C. Hasenkopf 2019
https://www.youtube.com/watch?v=BHSD9W-HGOg&feature=youtu.be&t=122
27. The possibility of being able to implement things that we could only think about 20 years ago because the computational capability is available now is quite exciting.
– Hampapuram Ramapriyan, NASA/SSAI
art by @allison_horst
Source: Julia Lowndes
28. Source: “Making Ocean Data Useful”, R. Abernathey 2020
https://youtu.be/He9_2C01Z0I?t=144
29. There is an urgent need to improve the infrastructure supporting the reuse of scholarly data.
- From The FAIR Guiding Principles for scientific data management and stewardship
31. There is an urgent need to improve the [Global Collaborative] infrastructure supporting the (re)use of scholarly data.
- (Modified, Erin Robinson) From The FAIR Guiding Principles for scientific data management and stewardship
37. Identifiers make the connections between researchers, repositories and publishers possible and allow sharing credit across all partners.
https://blog.datacite.org/powering-the-pid-graph/
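One way to picture how identifiers make these connections is as a small graph whose nodes are persistent identifiers. The sketch below is an illustration only and is not how DataCite implements its PID graph. The paper DOI is the FAIR article cited earlier in this deck; the dataset, researcher and repository identifiers are hypothetical placeholders, as is the `PidGraph` class itself.

```python
from collections import defaultdict


def doi_url(doi):
    """Turn a bare DOI into its resolvable https://doi.org/ link."""
    return f"https://doi.org/{doi}"


class PidGraph:
    """Undirected graph whose nodes are persistent identifier strings."""

    def __init__(self):
        self._edges = defaultdict(set)

    def link(self, a, b):
        self._edges[a].add(b)
        self._edges[b].add(a)

    def connected(self, start):
        """Return every identifier reachable from start, excluding start itself."""
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in self._edges.get(node, set()) - seen:
                seen.add(neighbour)
                stack.append(neighbour)
        seen.discard(start)
        return seen


paper = "doi:10.1038/sdata.2016.18"        # FAIR article cited in this deck
dataset = "doi:10.0000/example-dataset"    # hypothetical placeholder
researcher = "orcid:0000-0000-0000-0000"   # hypothetical placeholder
repository = "repository:example"          # hypothetical placeholder

graph = PidGraph()
graph.link(paper, dataset)
graph.link(dataset, researcher)
graph.link(dataset, repository)

# Everyone who can share credit for the paper through its linked dataset.
print(sorted(graph.connected(paper)))
print(doi_url("10.1038/sdata.2016.18"))
```

Following the links outward from a paper reaches the dataset, the person who produced it and the repository that holds it, which is the sense in which credit can be shared across partners.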
41. What is the DATA HELP DESK?
Provides researchers with opportunities to engage with informatics experts familiar with their scientific domain and learn about skills and techniques that help further research and make data and software open and FAIR.
Data FAIR is a program of ESIP, EarthCube, AGU, & their partners.
45. Openscapes empowers scientists with open data science, focusing on teams and community
We approach open science as a:
● spectrum – entryways to meet researchers where they are
● behavior change – new skillsets and mindsets
● movement – empowering leaders and champions
Openscapes Champions is a mentorship program for research teams
● Remote-by-design & cohort-based, Mozilla-style
● 13 teams mentored so far from academia and government
openscapes.org
Lowndes 2019
Lowndes et al. 2019
Biggest impact: research teams work more openly together
Reframe analysis as a collaborative effort, not an individual burden.
• students participate in research faster
• grant money goes further
• co-creating norms promoting diversity, equity & inclusion
• new collabs
Biggest lesson: power of research teams to normalize open
46. The FAIR Island project offers a real-world example to prove the capabilities of machine-actionable data management plans (maDMPs) and to analyze the downstream effects of these policies in the resulting release of data.
https://www.fairisland.org/
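"Machine-actionable" here means that software, not only a human reader, can compare what a plan promised with what was actually released. The following is a toy sketch of that idea under loud assumptions: the JSON field names (`planned_datasets`, `name`, `license`) and the dataset names are invented for illustration and do not follow the FAIR Island project's format or any published maDMP schema.

```python
import json

# Invented plan structure, for illustration only; not a real maDMP schema.
PLAN_JSON = """
{
  "title": "Example field project",
  "planned_datasets": [
    {"name": "water-samples", "license": "CC-BY-4.0"},
    {"name": "drone-imagery", "license": "CC0-1.0"}
  ]
}
"""


def unreleased_datasets(plan, released_names):
    """List the datasets the plan promised that have not been released yet."""
    return [
        entry["name"]
        for entry in plan["planned_datasets"]
        if entry["name"] not in released_names
    ]


plan = json.loads(PLAN_JSON)
released = {"water-samples"}  # made-up record of what a repository has published
missing = unreleased_datasets(plan, released)
print(missing)
```

A check like this could run automatically whenever a repository publishes a dataset, which is one way the downstream effects of a data policy might be tracked.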
47. Putting Data to Work
[Figure: Old Way compared with New Way, With Community. Timeline: Jan, Mar, Jun, Sept, Dec.]
Old Way
Pre-Science:
• Find data
• Retrieve high volume data
• Learn formats and develop readers
• Extract parameters
• Perform spatial and other subsetting
• Identify quality and other flags and constraints
• Perform filtering/masking
• Develop analysis and visualization
• Accept/discard/get more data (sat, model, ground-based)
DO SCIENCE:
• Exploration
• Initial Analysis
• Use the best data for the final analysis
• Derive conclusions
• Write the paper
• Present @ AGU
• Submit the paper
New Way, With Community
• Minutes
• Days for exploration
• Use the best data for the final analysis
• Derive conclusions
• Write the paper & cite artifacts
• Work with repository to manage data
• Publish data & code
49. Thank You! & Acknowledgements
Connect with me:
@connector_erin
erinrobinson.net
erinmr@gmail.com
Ted Habermann, Metadata Game Changer
Ryan Abernathey, Columbia University
Bruce Caron, The New Media Studio
Christa Hasenkopf, OpenAQ
Julia Lowndes, Openscapes
Rudy Husar, Greg Leptoukh, Stefan Falke and all of the AQ Collaborators
ESIP Funders: NASA, NOAA & USGS
ESIP Community & Collaborators