Data Infrastructure for the
Earth & Space Science
How Far Have We Come,
Where Are We Heading?
Kerstin Lehnert
Lamont-Doherty Earth Observatory, Columbia University
April 10, 2018
Ian McHarg Lecture 2018
1
Before I start, a short detour ...
The Kaiserstuhl, Germany
Making this lecture
My goal
“Study the past if you would define the future.”
Confucius
Learning from the past:
(1) The Big Picture
2007
2018
https://www.rd-alliance.org/sites/default/files/Common_Patterns_in_Revolutionising_Infrastructures-final.pdf
Learning from the past:
(2) The Real World
The story of IEDA
(Interdisciplinary Earth Data Alliance)
www.iedadata.org
... there was a database named PetDB
A biased perspective
I am a geoscientist who
directs a US data facility for
primarily investigator-based
data (“long tail”) funded by
the National Science
Foundation.
www.iedadata.org
Defining the Topic
Data infrastructure is a digital
infrastructure promoting data sharing and
consumption.
Its goal is to enable researchers to make the best use of the
world’s growing wealth of data for the advancement of
science and the benefit of society.
Data drive Earth science:
A new way of understanding the world
Data:
The 4th Paradigm
The 5th Dimension
We have been talking about it for a
while ...
EGU ESSI abstract titles (figure panels: 2006, 2008, 2013, 2018)
AGU Fall Meeting 2017:
Growth of Earth & Space Science Informatics
- 63 ESSI session proposals: an increase of 40%
- 729 ESSI abstracts: an increase of ~18.7%
- 35 ESSI oral sessions: an increase of ~40%
- 4 Data Fair Town Halls
- Machine Learning/Deep Learning: biggest increase in any theme
- Big increases also in FAIR, Repositories & Data Storage, and Adoption & Adaption
Credit: Lesley Wyborn, AGU FM Program Committee Member
Carnegie Institution: Unleash the Power of Data
Learning from the past: The Big Picture
Insights into the development of infrastructures
Revolutionary!
- Roman water supply system
- Railroad systems
- Global electrification
- Internet
Patterns of Infrastructure Development
Edwards et al. 2007
1. Deliberate and successful design of
‘local’ systems.
2. Technology transfer across domains
and locations
3. Infrastructures form via gateways
that allow dissimilar systems to be
linked into networks
Wittenburg & Strawn 2018
1. Inventions and development of
start-up systems
2. Technology transfer between
regions and also society
(creolization)
3. Planning for system growth where
"reverse salients" need to be
tackled
4. Substantial momentum (mass,
velocity, direction)
System Building
Growth
Consolidation
Creolization
- New components are continuously introduced, trying to solve specific challenges
- Capabilities grow unevenly (e.g. big vs small data)
- Fragmentation
Leads to
- Inefficiencies in use and costs
- Winners & losers: some solutions are more promising and get more attraction
- Better understanding of the underlying rules, principles and limitations.
(After Wittenburg & Strawn, 2018)
Attraction via “Universals”
- “Simple” principles, broadly supported
- Only directly influence a specific part of the overall infrastructure, enable efficiency at the top layers
- Form stable basis for new developments
“Universals are ... essential to create a momentum by overcoming fragmentation and achieving economies of scale.”
(After Wittenburg & Strawn, 2018)
Attraction is happening!
- Relevance of community organizations that define principles, procedures, and component specifications
- RDA: global & cross-disciplinary
- ESIP: Earth Science & US (others coming?)
- New: RDA Interest Group “ESIP/RDA Earth, Space, and Environmental Sciences”
Universal: FAIR principles
- Represent a guideline for data providers to enhance the reusability of their data holdings:
  - Data can be found on the Internet.
  - Data are accessible in a usable format with clear rights and licenses.
  - Data access is reliable & persistent.
  - Data are identified in a unique and persistent way so that they can be referred to and cited.
  - Data are documented with rich metadata.
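The guideline above can be read as a checklist against a dataset's metadata record. The following is a minimal sketch under stated assumptions: `DatasetRecord`, its field names, and the threshold of three metadata fields are all hypothetical and are not part of the FAIR principles or of any repository's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; all field names are illustrative."""
    title: str
    persistent_id: Optional[str] = None   # e.g. a DOI
    landing_page: Optional[str] = None    # URL where the data can be found
    data_format: Optional[str] = None     # e.g. "text/csv"
    license: Optional[str] = None         # rights and license statement
    metadata: Dict[str, str] = field(default_factory=dict)


def guideline_gaps(record: DatasetRecord, min_metadata_fields: int = 3) -> List[str]:
    """Return the guideline items that a record does not yet meet.

    Reliable and persistent access cannot be judged from a record alone,
    so that item is deliberately not checked here.
    """
    gaps: List[str] = []
    if not record.landing_page:
        gaps.append("findable: no landing page on the Internet")
    if not record.data_format or not record.license:
        gaps.append("accessible: format or license missing")
    if not record.persistent_id:
        gaps.append("identified: no persistent identifier")
    # The threshold is an arbitrary stand-in for "rich metadata".
    if len(record.metadata) < min_metadata_fields:
        gaps.append("documented: metadata too sparse")
    return gaps


if __name__ == "__main__":
    # A record with only a title fails every check in this sketch.
    for gap in guideline_gaps(DatasetRecord(title="Example dataset")):
        print(gap)
```

A real assessment would need community-agreed criteria for each item; the sketch only shows how the bullet points translate into testable conditions.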
Universal:
Standards for data repositories
- Cooperative effort between Data Seal of Approval (DSA) and the World Data System (WDS) under the umbrella of the Research Data Alliance (RDA)
- Harmonized requirements & procedures for certification of repositories
- Confidence for publishers and funders about which repositories to trust
- Basis for development of new repositories
“Enabling FAIR Data” project @ AGU
- Develop & implement standards that will connect researchers, publishers, and data repositories in the Earth and space sciences to enable FAIR data
- Grant from the Laura and John Arnold Foundation (LJAF) to the AGU
- FAIR-compliant data repositories (CoreTrustSeal certified, preferably domain-specific)
- FAIR-compliant Earth and space science publishers
- Align their policies for data to be deposited in certified repositories
- Gives a similar experience for researchers.
Slide after S. Stall et al., presentation at RDA P11, Berlin, March 2018
All publishers who are part of the
Coalition on Publishing Data in the Earth
and Space Sciences (COPDESS) support
the efforts of trusted repositories that
aggregate research data, software, and
physical samples for the use of the
scientific community.
“These Data Guidelines align the
Author’s instructions for the submission
of data sets in the Earth and Space
Sciences, for all affiliated publishers.”
Universal:
Persistent Identifiers
Founded 2009
Founded 2011
Founded 2012
“The intention of this cross-disciplinary report is to overcome still existing confusions about PIDs and the lack of detail knowledge in many disciplines. ...to identify agreements across documents that have been suggested to be included by experts.”
From: “Common Patterns in Revolutionary Infrastructures and Data”, P. Wittenburg & G. Strawn, February 2018
Learning from the past:
(2) The Real World
The story of IEDA
(Interdisciplinary Earth Data Alliance)
...there was a database named PetDB
Once upon a time ...
PetDB web site in 1999
Note:
PetDB is a database that allows access to
data at the level of individual data
points, not files!
Success: New data-driven science
in geochemistry
Meyzen et al. (2007): “Isotopic portrayal
of the Earth's upper mantle flow field.”
Putirka et al. (2007)
Stracke & Hofmann (2005)
Class & Goldstein (2007)
2018: 740 citations
An analysis in 2007
T. Plank, 1999: “Within about 5 minutes of logging on for the first
time, I was staring at an EXCEL file that had all the REE on
basalt glasses from the EPR from 10°N to 20°S. And the answer
to my La/Sm question. I am very impressed, we are looking at
the future of geochemistry.”
GSA 2007 talk: “My Data, Your Data, Our Data!”
Attraction -
but partners
disappeared
Another failed network attempt
- PaleoStrat not funded
- Development of interoperability with CoreWall not funded
- Too many political obstacles
“Promises, Achievements, and Challenges of
Networking Global Geoinformatics Resources”
EGU General Assembly 2008
Growth of data systems at Lamont
Consolidation
“This Cooperative Agreement converts a series of proposal/award-driven
activities into a community-based facility that serves to support, sustain,
and advance the geosciences by providing a centralized location for the
registry of and access to data essential for research in the solid-earth and
polar sciences.”
- Continue operating & maintaining existing systems
- Develop tools for investigators to comply with NSF data policies (IEDA Data
Management Plan Tool & Data Compliance Reporting Tool)
- Develop tools and modify architecture to provide integrated access to holdings
IEDA’s layered architecture
The EUDAT model:
Shared
Partners
Shared
IEDA Today: Data Holdings & Growth
- >70 terabytes of marine geophysical sensor data in the MGDS
- >20 million analytical measurements for >1 million samples in EarthChem
- >4.2 million samples registered and searchable in SESAR (System for Earth Sample Registration)
11/15/17 Presentation at NSF-EAR
IEDA Today
- Thousands of download requests per month
- >2,000 citations in the literature
- ~10,000 start-ups of GeoMapApp per month
- >2,700 GeoPass users*
- Demonstrated impact on science
*GeoPass accounts are required to submit data to EarthChem/Geochron, SESAR, & USAP-DC, and to use the DMP Tool
Figure: Citations of IEDA Systems in the Scientific Literature (number of citations per year; series: EarthChem/PetDB/SedDB and MGDS/GMRT/GMA)
IEDA is “attracting”
👍
- Certification: Member of World Data System since 2011 (CoreTrustSeal certification underway)
- Use of Persistent Identifiers
  - Publication agent of DataCite since 2011
  - DOI registration of datasets since 2009 via TIB Hannover
- The International Geo Sample Number: A PID for physical samples
- FAIR data
  - Findable/accessible: DOIs, landing pages, GUIs, APIs
  - Interoperable: CSW, DataONE member node, schema.org (EarthCube project P418)
  - Reusable: disciplinary expertise for data curation, rich provenance metadata
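To illustrate how DOIs, landing pages, and schema.org markup fit together, here is a minimal sketch that builds a schema.org `Dataset` description as JSON-LD. It is not IEDA's implementation: the helper names, the DOI `10.1234/example`, and all values are placeholders chosen for illustration; only the schema.org property names (`name`, `description`, `identifier`, `url`, `license`, `keywords`) come from the public vocabulary.

```python
import json
from typing import Dict, List


def dataset_jsonld(name: str, description: str, doi: str,
                   license_url: str, keywords: List[str]) -> Dict[str, object]:
    """Build a schema.org Dataset record keyed on a DOI (hypothetical helper)."""
    doi_url = f"https://doi.org/{doi}"
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": doi_url,   # persistent identifier
        "url": doi_url,          # resolves to the landing page
        "license": license_url,
        "keywords": keywords,
    }


def as_script_tag(record: Dict[str, object]) -> str:
    """Wrap the record for embedding in a landing page's HTML."""
    body = json.dumps(record, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    record = dataset_jsonld(
        name="Example geochemistry dataset",           # placeholder values
        description="Illustrative record, not a real dataset.",
        doi="10.1234/example",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        keywords=["geochemistry", "basalt"],
    )
    print(as_script_tag(record))
```

Embedding such a block in a landing page is one way that web-scale search tools can harvest dataset descriptions without a repository-specific API.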
Lessons Learned
Merger of EarthChem & MGDS created tensions
- Partner system needs versus overarching IEDA level needs
  - Budget
  - Staff expertise
  - Staff allocations
  - Distribution among different funding sources (3 different NSF programs)
- Scientific utility versus trustworthiness of operations
- Operation & maintenance versus innovation
Merger did not lead to the expected ‘economies of scale’
- Disciplinary data curation continues as the most relevant component.
- Additional resources/effort needed for coordination and alignment of activities and practices across partners.
- More project management required due to budget level and status as facility.
- Building useful data search and discovery across multi-disciplinary systems is a challenging problem.
Figure: cost per system
Achievements: IEDA Data Browser
- Access to all IEDA repositories in one place
- Free text, map, and facet-based search options
- ISO metadata available for other catalogs to harvest
- Major work to align concepts and vocabularies in the different repositories
- Challenge to agree on facets
  - Relevance to different data types
  - Availability of metadata
  - Granularity of datasets
Achievements:
IEDA Integrated Catalog
A changing ecosystem
“IEDA’s cross-disciplinary services for data discovery (IEDA Data Browser)
and data access (IEDA Integrated Catalog) across all IEDA systems are
increasingly superseded by tools developed with substantially larger
resources as part of EarthCube, Google (Google’s new Research Data
Search based on schema.org), or perhaps DataONE. These recent
developments aim to provide researchers with the tools to find and use
data in a highly distributed and fragmented data infrastructure based on
new approaches for interoperability, metadata registries, and hubs such
as SCHOLIX to link data and literature.”
IEDA: Future Scope and Structure
(IEDA internal report, K. Lehnert & S. Carbotte, January 2018)
We need to adapt
- Reduce complexity of operations
- Adjust to and better leverage external CI developments (e.g. EarthCube)
- Enhance opportunities to grow partnerships relevant to the disciplinary systems to target needs of the disciplinary communities
- Systems and/or services that serve broader audiences should be funded independently (SESAR, GeoMapApp, GMRT)
- Create a new management/governance structure
  - more independence for IEDA partners and funders to allow growth
  - rely on external developments for cross-disciplinary services
Where are we heading from here?
Oh no, that diagram again ...
According to Wittenburg & Strawn (2018), the implementation of data infrastructure can be guided by 4 statements:
- A Digital Object has a structured bit sequence stored in a trustworthy repository.
- A Digital Object has a PID and metadata.
- The PID is associated with all relevant kernel information that allows humans and machines to enable FAIR.
- Kernel information and Digital Object have types allowing humans and machines to associate operations with them.
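The four statements about Digital Objects can be sketched as a small data structure. This is one possible reading, not the model defined by Wittenburg & Strawn: the class names, the fields chosen as kernel information, the `hdl:0000/example` identifier, and the operation registry are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class KernelInfo:
    """Hypothetical kernel information attached to a PID."""
    pid: str            # persistent identifier
    object_type: str    # type that lets machines pick an operation
    repository: str     # where the bit sequence is stored
    checksum: str       # fixity information for the bit sequence


@dataclass(frozen=True)
class DigitalObject:
    bits: bytes                 # the structured bit sequence
    kernel: KernelInfo          # PID plus kernel information
    metadata: Dict[str, str]    # descriptive metadata


def make_object(pid: str, object_type: str, repository: str,
                bits: bytes, metadata: Dict[str, str]) -> DigitalObject:
    """Create an object and record a SHA-256 checksum as kernel information."""
    kernel = KernelInfo(pid, object_type, repository,
                        hashlib.sha256(bits).hexdigest())
    return DigitalObject(bits, kernel, metadata)


def read_csv_lines(obj: DigitalObject) -> List[str]:
    """Example operation for objects typed as CSV text."""
    return obj.bits.decode("utf-8").splitlines()


# Operations are associated with types, not with individual objects.
OPERATIONS: Dict[str, Callable[[DigitalObject], List[str]]] = {
    "text/csv": read_csv_lines,
}


def apply_operation(obj: DigitalObject) -> List[str]:
    operation = OPERATIONS.get(obj.kernel.object_type)
    if operation is None:
        raise LookupError(f"no operation registered for {obj.kernel.object_type!r}")
    return operation(obj)


if __name__ == "__main__":
    obj = make_object("hdl:0000/example", "text/csv", "example-repository",
                      b"sample,La\nS1,3.2\n", {"title": "Illustrative object"})
    print(apply_operation(obj))
```

The point of the type registry is that a machine holding only the PID and its kernel information can decide what it is able to do with the object before fetching or parsing the bits.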
My take on priorities
Reusability
Impact on Science
Sustainability
Data type specific best practices
Metadata quality
Granularity of access, data fusion
Metrics
Data Science Education
Business models
Consolidation
The impact of data
infrastructure on science
& society depends on the
reusability of data and
will ultimately justify its
continued funding.
Reusability problem: Metadata quality
- Discipline-specific and data type specific metadata not well defined and enforced
- Lack of consistent vocabularies
- Automated metadata enrichment (e.g. CINERGI) has not yet had convincing results
- Manual data curation still best, but too costly
“The Geochemical Data(base) Factory: From Heterogeneous Input to Homogeneous Output.” AGU FM 2009
Reusability problem: data wrangling
Surveys in recent years show that data scientists still spend 75-80% of their time ‘data wrangling’.
- RDA EU survey 2013 (75%)
- Brodie 2015 (80%)
- CrowdFlower 2017 (80%)
Source: CrowdFlower
Reusability solution: Data Fusion
Harmonize & integrate data so that
disparate pieces of information form a
picture that can be explored to reveal
patterns in space, time, and properties.
- Structure data so they can be accessed and understood at a more granular level
- Approaches are available and improving
  - ISO/OGC Observations & Measurements
  - Observation Data Model ODM2 (Horsburgh et al. 2017)
  - Schema.org
  - Open Core Data
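As a rough illustration of "more granular" access, the sketch below turns a spreadsheet-style table (one row per sample, one column per measured property) into one record per individual observation. It is loosely inspired by observation-centric models but does not implement ISO/OGC Observations & Measurements or ODM2; the function name, the record keys, and the sample values are invented placeholders.

```python
from typing import Dict, List, Optional


def to_observations(rows: List[Dict[str, str]],
                    id_column: str = "sample_id",
                    units: Optional[Dict[str, str]] = None) -> List[Dict[str, object]]:
    """Flatten wide rows into one record per (sample, property) value."""
    units = units or {}
    observations: List[Dict[str, object]] = []
    for row in rows:
        for column, value in row.items():
            # Skip the identifier column and empty cells.
            if column == id_column or value is None or value.strip() == "":
                continue
            observations.append({
                "sample_id": row[id_column],
                "property": column,
                "value": float(value),
                "unit": units.get(column),   # None when no unit is declared
            })
    return observations


if __name__ == "__main__":
    # Placeholder values for illustration only, not real measurements.
    table = [
        {"sample_id": "S1", "La": "3.2", "Sm": ""},
        {"sample_id": "S2", "La": "1.0", "Sm": "2.5"},
    ]
    for obs in to_observations(table, units={"La": "ppm", "Sm": "ppm"}):
        print(obs)
```

Once every value is its own record with sample, property, and unit attached, data from different files can be filtered and combined without first reconciling their column layouts.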
Reusability solution:
Data Fusion
S. Cox et al. “Mainstream web standards now
support science data too”; AGU FM 2017
Reusability problem: The Long Tail
- Small data volumes, but big potential
- Culture is not open to sharing
- Data fragmented and highly heterogeneous
- Lots of .xls files
- Many data never see the light of day
ESIP Winter Meeting, January 2016
Reusability hope: Generation change
“A new scientific truth does not triumph by
convincing its opponents and making them see
the light, but rather because its opponents
eventually die, and a new generation grows up
that is familiar with it.”
Max Planck
Credit: Jon Stelling, Lehigh University
- Steps in the data life cycle are siloed in many communities and disciplines
- Recommendation: focus on the full data life cycle
Final Report from the NSF Computer and Information Science and
Engineering Advisory Committee, Data Science Working Group
Communications of the ACM, Vol. 61 No. 4,
Pages 67-72, April 2018
A trend toward large facilities
Education in Data Science or
Data Science in Education
- Data Science as a new field in academia
  - Different organizational models emerging at academic institutions to integrate with domain sciences
I’ll leave the funding question to the
experts.
- Trust of the science community
Funding
“Funding research data management and related infrastructures”, May 2016
Authors: Knowledge Exchange Research Data Expert Group and Science Europe Working Group
on Research Data.
Did we move at all?
Did we move at all?
2007
Success!
The International Geo Sample Number
- Grew from a local, centralized system started in 2004 to an international organization founded in 2011
- Now has 24 members on 5 continents
- Currently 5 active Allocating Agents
- Adoption by researchers, collection curators, publishers, and funding agencies growing
- Adoption spreading to other disciplines
  - Biology, archeology, material sciences
# of IGSNs issued by active IGSN Allocating Agents
- IEDA: 4,261,436
- Geoscience Australia: 2,100,273
- MARUM: 100,342
- CSIRO: 30,925
- GFZ: 4,809
Organic Biomarker Data Workshop
Newest members since 2017:
USGS (USA)
BGS (UK)
CNRS (France)
IFREMER (France)
ANDS (Australia)
The final message: Let’s work together!
- It is relevant that we leverage existing capabilities and expertise.
- We do not have the luxury of duplicating effort.
- We need to break down barriers between communities and stakeholders that compete for their piece of the pie.
NSF Workshop Cyberinfrastructure for Large Facilities, Nov 2015
Back to the beginning:
“Do what excites you. Follow your passion.
Don't necessarily worry about what obstacles
might be there, because there are always ways
to overcome them. But the most exciting thing
is to be able to do what you love, and just don't
let anything stand in the way of that.”
Carol Greider 2009 Nobel Prize winner
For my parents

Weitere ähnliche Inhalte

Was ist angesagt?

European Open Science Cloud
European Open Science CloudEuropean Open Science Cloud
European Open Science CloudJisc RDM
 
SemWeb 4 Gov – opportunities and challenges
SemWeb 4 Gov – opportunities and challengesSemWeb 4 Gov – opportunities and challenges
SemWeb 4 Gov – opportunities and challengesAndrew Woolf
 
Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Vivien Bonazzi
 
US EPA Resource Conservation and Recovery Act published as Linked Open Data
US EPA Resource Conservation and Recovery Act published as Linked Open DataUS EPA Resource Conservation and Recovery Act published as Linked Open Data
US EPA Resource Conservation and Recovery Act published as Linked Open Data3 Round Stones
 
EMBL Australian Bioinformatics Resource AHM - Data Commons
EMBL Australian Bioinformatics Resource AHM   - Data CommonsEMBL Australian Bioinformatics Resource AHM   - Data Commons
EMBL Australian Bioinformatics Resource AHM - Data CommonsVivien Bonazzi
 
Collaborative Data Science In A Highly Networked World
Collaborative Data Science In A Highly Networked WorldCollaborative Data Science In A Highly Networked World
Collaborative Data Science In A Highly Networked WorldIlkay Altintas, Ph.D.
 
David Park APAN Slid..
David Park APAN Slid..David Park APAN Slid..
David Park APAN Slid..Videoguy
 
20160414 23 Research Data Things
20160414 23 Research Data Things20160414 23 Research Data Things
20160414 23 Research Data ThingsKatina Toufexis
 
Jisc Research Data Shared Service Open Repositories 2018 Paper
Jisc Research Data Shared Service Open Repositories 2018 PaperJisc Research Data Shared Service Open Repositories 2018 Paper
Jisc Research Data Shared Service Open Repositories 2018 PaperJisc RDM
 
eTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondoneTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondonPaul Agapow
 
Data safe havens: A future EOSC service?
Data safe havens: A future EOSC service?Data safe havens: A future EOSC service?
Data safe havens: A future EOSC service?EUDAT
 
Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...
Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...
Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...nabo_ghea
 
2013 DataCite Summer Meeting - California Digital Library (Joan Starr - Calif...
2013 DataCite Summer Meeting - California Digital Library (Joan Starr - Calif...2013 DataCite Summer Meeting - California Digital Library (Joan Starr - Calif...
2013 DataCite Summer Meeting - California Digital Library (Joan Starr - Calif...datacite
 
End-to-End Research Data Management for the Responsible Conduct of Research
End-to-End Research Data Management for the Responsible Conduct of ResearchEnd-to-End Research Data Management for the Responsible Conduct of Research
End-to-End Research Data Management for the Responsible Conduct of ResearchARDC
 
Talking 'bout a revolution: Framing e-Research as a computerization movement
Talking 'bout a revolution: Framing e-Research as a computerization movementTalking 'bout a revolution: Framing e-Research as a computerization movement
Talking 'bout a revolution: Framing e-Research as a computerization movementEric Meyer
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsVivien Bonazzi
 
2016 urisa track: nhd hydro linked data registery by michael tinker
2016 urisa track:  nhd hydro linked data registery by michael tinker2016 urisa track:  nhd hydro linked data registery by michael tinker
2016 urisa track: nhd hydro linked data registery by michael tinkerGIS in the Rockies
 
A modified k means algorithm for big data clustering
A modified k means algorithm for big data clusteringA modified k means algorithm for big data clustering
A modified k means algorithm for big data clusteringSK Ahammad Fahad
 

Was ist angesagt? (20)

European Open Science Cloud
European Open Science CloudEuropean Open Science Cloud
European Open Science Cloud
 
SemWeb 4 Gov – opportunities and challenges
SemWeb 4 Gov – opportunities and challengesSemWeb 4 Gov – opportunities and challenges
SemWeb 4 Gov – opportunities and challenges
 
Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2
 
US EPA Resource Conservation and Recovery Act published as Linked Open Data
US EPA Resource Conservation and Recovery Act published as Linked Open DataUS EPA Resource Conservation and Recovery Act published as Linked Open Data
US EPA Resource Conservation and Recovery Act published as Linked Open Data
 
RDA UK
RDA UKRDA UK
RDA UK
 
EMBL Australian Bioinformatics Resource AHM - Data Commons
EMBL Australian Bioinformatics Resource AHM   - Data CommonsEMBL Australian Bioinformatics Resource AHM   - Data Commons
EMBL Australian Bioinformatics Resource AHM - Data Commons
 
Collaborative Data Science In A Highly Networked World
Collaborative Data Science In A Highly Networked WorldCollaborative Data Science In A Highly Networked World
Collaborative Data Science In A Highly Networked World
 
David Park APAN Slid..
David Park APAN Slid..David Park APAN Slid..
David Park APAN Slid..
 
20160414 23 Research Data Things
20160414 23 Research Data Things20160414 23 Research Data Things
20160414 23 Research Data Things
 
Jisc Research Data Shared Service Open Repositories 2018 Paper
Jisc Research Data Shared Service Open Repositories 2018 PaperJisc Research Data Shared Service Open Repositories 2018 Paper
Jisc Research Data Shared Service Open Repositories 2018 Paper
 
eTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondoneTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, London
 
Webinar@ASIRA: A Practitioners Approach to Open Data for Agricultural Research
Webinar@ASIRA: A Practitioners Approach to Open Data for Agricultural Research Webinar@ASIRA: A Practitioners Approach to Open Data for Agricultural Research
Webinar@ASIRA: A Practitioners Approach to Open Data for Agricultural Research
 
Data safe havens: A future EOSC service?
Data safe havens: A future EOSC service?Data safe havens: A future EOSC service?
Data safe havens: A future EOSC service?
 
Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...
Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...
Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...
 
2013 DataCite Summer Meeting - California Digital Library (Joan Starr - Calif...
2013 DataCite Summer Meeting - California Digital Library (Joan Starr - Calif...2013 DataCite Summer Meeting - California Digital Library (Joan Starr - Calif...
2013 DataCite Summer Meeting - California Digital Library (Joan Starr - Calif...
 
End-to-End Research Data Management for the Responsible Conduct of Research
End-to-End Research Data Management for the Responsible Conduct of ResearchEnd-to-End Research Data Management for the Responsible Conduct of Research
End-to-End Research Data Management for the Responsible Conduct of Research
 
Talking 'bout a revolution: Framing e-Research as a computerization movement
Talking 'bout a revolution: Framing e-Research as a computerization movementTalking 'bout a revolution: Framing e-Research as a computerization movement
Talking 'bout a revolution: Framing e-Research as a computerization movement
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data Commons
 
2016 urisa track: nhd hydro linked data registery by michael tinker
2016 urisa track:  nhd hydro linked data registery by michael tinker2016 urisa track:  nhd hydro linked data registery by michael tinker
2016 urisa track: nhd hydro linked data registery by michael tinker
 
A modified k means algorithm for big data clustering
A modified k means algorithm for big data clusteringA modified k means algorithm for big data clustering
A modified k means algorithm for big data clustering
 

Ähnlich wie EGU 2018 Ian McHarg Lecture

The web of data: how are we doing so far
The web of data: how are we doing so farThe web of data: how are we doing so far
The web of data: how are we doing so farElena Simperl
 
Birgit Schmidt: RDA for Libraries from an International Perspective
Birgit Schmidt: RDA for Libraries from an International PerspectiveBirgit Schmidt: RDA for Libraries from an International Perspective
Birgit Schmidt: RDA for Libraries from an International Perspectivedri_ireland
 
Rda nitrd 2015 berman - final
Rda nitrd 2015 berman  - finalRda nitrd 2015 berman  - final
Rda nitrd 2015 berman - finalKathy Fontaine
 
Edinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repositoryEdinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repositoryRobin Rice
 
Understanding the Big Picture of e-Science
Understanding the Big Picture of e-ScienceUnderstanding the Big Picture of e-Science
Understanding the Big Picture of e-ScienceAndrew Sallans
 
Data as a research output and a research asset: the case for Open Science/Sim...
Data as a research output and a research asset: the case for Open Science/Sim...Data as a research output and a research asset: the case for Open Science/Sim...
Data as a research output and a research asset: the case for Open Science/Sim...African Open Science Platform
 
The Developing Needs for e-infrastructures
The Developing Needs for e-infrastructuresThe Developing Needs for e-infrastructures
The Developing Needs for e-infrastructuresguest0dc425
 
Open government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impactOpen government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impactElena Simperl
 
ElN - repository integration at the University of Goettingen
ElN - repository integration at the University of GoettingenElN - repository integration at the University of Goettingen
ElN - repository integration at the University of Goettingenrmacneil88
 
Repositories in an Open Data Ecosystem
Repositories in an Open Data EcosystemRepositories in an Open Data Ecosystem
Repositories in an Open Data EcosystemWolfgang Kuchinke
 
Research Data Management Initiatives at the University of Edinburgh
Research Data Management Initiatives at the University of EdinburghResearch Data Management Initiatives at the University of Edinburgh
Research Data Management Initiatives at the University of EdinburghRobin Rice
 
Digital Representation of Physical Samples in Scientific Publications
Digital Representation of Physical Samples in Scientific PublicationsDigital Representation of Physical Samples in Scientific Publications
Digital Representation of Physical Samples in Scientific PublicationsKerstin Lehnert
 
Gobinda Chowdhury
Gobinda ChowdhuryGobinda Chowdhury
Gobinda Chowdhurymaredata
 
EOSC-hub: first steps towards realising EOSC vision
EOSC-hub: first steps towards realising EOSC visionEOSC-hub: first steps towards realising EOSC vision
EOSC-hub: first steps towards realising EOSC visionEUDAT
 
WOW13_RPITWC_Web Observatories
WOW13_RPITWC_Web ObservatoriesWOW13_RPITWC_Web Observatories
WOW13_RPITWC_Web Observatoriesgloriakt
 
NFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataNFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataAnita de Waard
 
Goebel.jst.big.data.jan10 12.2017.4
Goebel.jst.big.data.jan10 12.2017.4Goebel.jst.big.data.jan10 12.2017.4
Goebel.jst.big.data.jan10 12.2017.4Randy Goebel
 
Research Data Alliance Member Statistics August 2015
Research Data Alliance Member Statistics August 2015Research Data Alliance Member Statistics August 2015
Research Data Alliance Member Statistics August 2015Research Data Alliance
 

EGU 2018 Ian McHarg Lecture

  • 1. Data Infrastructure for the Earth & Space Science How Far Have We Come, Where Are We Heading? Kerstin Lehnert Lamont-Doherty Earth Observatory, Columbia University April 10, 2018 Ian McHarg Lecture 2018 1
  • 2. Before I start, a short detour ... The Kaiserstuhl, Germany
  • 3. Making this lecture
  • 4. My goal: “Study the past if you would define the future.” (Confucius)
  • 5. Learning from the past: (1) The Big Picture. 2007, 2018. https://www.rd-alliance.org/sites/default/files/Common_Patterns_in_Revolutionising_Infrastructures-final.pdf
  • 6. Learning from the past: (2) The Real World. The story of IEDA (Interdisciplinary Earth Data Alliance), www.iedadata.org ... there was a database named PetDB
  • 7. A biased perspective: I am a geoscientist who directs a US data facility for primarily investigator-based data (“long tail”) funded by the National Science Foundation. www.iedadata.org
  • 8. Defining the Topic: Data infrastructure is a digital infrastructure promoting data sharing and consumption. Its goal is to enable researchers to make the best use of the world’s growing wealth of data for the advancement of science and the benefit of society.
  • 9. Data drive Earth science: A new way of understanding the world. Data: The 4th Paradigm, The 5th Dimension
  • 10. We have been talking about it for a while ... 2006
  • 11. EGU ESSI Abstract titles: 2008, 2013, 2018
  • 12. Growth of Earth & Space Science Informatics. AGU Fall Meeting 2017:
      • 63 ESSI session proposals – an increase of 40%
      • 729 ESSI abstracts – an increase of ~18.7%
      • 35 ESSI oral sessions – an increase of ~40%
      • 4 Data Fair Town Halls
      • Machine Learning/Deep Learning: biggest increase in any theme
      • Big increases also in FAIR, Repositories & Data Storage, and Adoption & Adaption
      Credit: Lesley Wyborn, AGU FM Program Committee Member. Carnegie Institution: Unleash the Power of Data
  • 13. April 10, 2018 Ian McHarg Lecture 2018 13
  • 14. Learning from the past: The Big Picture. Insights into the development of infrastructures
  • 15. Revolutionary!
      • Roman water supply system
      • Railroad systems
      • Global electrification
      • Internet
  • 16. Patterns of Infrastructure Development
      Edwards et al. 2007:
      1. Deliberate and successful design of ‘local’ systems
      2. Technology transfer across domains and locations
      3. Infrastructure forms via gateways that allow dissimilar systems to be linked into networks
      Wittenburg & Strawn 2018:
      1. Inventions and development of start-up systems
      2. Technology transfer between regions and also society (creolization)
      3. Planning for system growth where "reverse salients" need to be tackled
      4. Substantial momentum (mass, velocity, direction)
      System Building, Growth, Consolidation
  • 18. Creolization (after Wittenburg & Strawn, 2018)
      • New components are continuously introduced, trying to solve specific challenges
      • Capabilities grow unevenly (e.g. big vs. small data)
      • Fragmentation
      Leads to:
      • Inefficiencies in use and costs
      • Winners & losers: some solutions are more promising and get more attraction
      • Better understanding of the underlying rules, principles, and limitations
  • 19. Attraction via “Universals” (after Wittenburg & Strawn, 2018)
      • “Simple” principles, broadly supported
      • Directly influence only a specific part of the overall infrastructure; enable efficiency at the top layers
      • Form a stable basis for new developments
      “Universals are ... essential to create a momentum by overcoming fragmentation and achieving economies of scale.”
  • 20. Attraction is happening!
      • Relevance of community organizations that define principles, procedures, and component specifications
      • RDA: global & cross-disciplinary
      • ESIP: Earth Science & US (others coming?)
      • New: RDA Interest Group “ESIP/RDA Earth, Space, and Environmental Sciences”
  • 21. Universal: FAIR principles
      • Represent a guideline for data providers to enhance the reusability of their data holdings:
          • Data can be found on the Internet.
          • Data are accessible in a usable format with clear rights and licenses.
          • Data access is reliable & persistent.
          • Data are identified in a unique and persistent way so that they can be referred to and cited.
          • Data are documented with rich metadata.
  • 22. Universal: Standards for data repositories
      • Cooperative effort between Data Seal of Approval (DSA) and the World Data System (WDS) under the umbrella of the Research Data Alliance (RDA)
      • Harmonized requirements & procedures for certification of repositories
      • Confidence for publishers and funders which repositories to trust
      • Basis for development of new repositories
  • 23. “Enabling FAIR Data” project @ AGU
      • Develop & implement standards that will connect researchers, publishers, and data repositories in the Earth and space sciences to enable FAIR data
      • Grant from the Laura and John Arnold Foundation (LJAF) to the AGU
      • FAIR-compliant data repositories (CoreTrustSeal certified, preferably domain-specific)
      • FAIR-compliant Earth and space science publishers
          • Align their policies for data to be deposited in certified repositories
          • Gives researchers a similar experience
      Slide after S. Stall et al., presentation at RDA P11 Berlin, March 2018
  • 24. All publishers who are part of the Coalition on Publishing Data in the Earth and Space Sciences (COPDESS) support the efforts of trusted repositories that aggregate research data, software, and physical samples for the use of the scientific community. “These Data Guidelines align the Author’s instructions for the submission of data sets in the Earth and Space Sciences, for all affiliated publishers.”
  • 25. Universal: Persistent Identifiers. Founded 2009, founded 2011, founded 2012. “The intention of this cross-disciplinary report is to overcome still existing confusions about PIDs and the lack of detail knowledge in many disciplines. ...to identify agreements across documents that have been suggested to be included by experts.” From: “Common Patterns in Revolutionary Infrastructures and Data”, P. Wittenburg & G. Strawn, February 2018.
  • 26. Learning from the past: (2) The Real World. The story of IEDA (Interdisciplinary Earth Data Alliance) ... there was a database named PetDB
  • 27. Once upon a time ... PetDB web site in 1999
  • 28. Note: PetDB is a database that allows access to data at the level of individual data points, not files!
  • 29. Success: New data-driven science in geochemistry. Meyzen et al. (2007): “Isotopic portrayal of the Earth's upper mantle flow field.” Putirka et al. (2007); Stracke & Hofmann (2005); Class & Goldstein (2007). 2018: 740 citations
  • 30. An analysis in 2007. T. Plank, 1999: “Within about 5 minutes of logging on for the first time, I was staring at an EXCEL file that had all the REE on basalt glasses from the EPR from 10°N to 20°S. And the answer to my La/Sm question. I am very impressed, we are looking at the future of geochemistry.” GSA 2007 talk: “My Data, Your Data, Our Data!”
  • 31. Attraction - but partners disappeared
  • 32. Another failed network attempt
      • PaleoStrat not funded
      • Development of interoperability with CoreWall not funded
      • Too many political obstacles
      “Promises, Achievements, and Challenges of Networking Global Geoinformatics Resources”, EGU General Assembly 2008
  • 33. Growth of data systems at Lamont
  • 34. Consolidation. “This Cooperative Agreement converts a series of proposal/award-driven activities into a community-based facility that serves to support, sustain, and advance the geosciences by providing a centralized location for the registry of and access to data essential for research in the solid-earth and polar sciences.”
      • Continue operating & maintaining existing systems
      • Develop tools for investigators to comply with NSF data policies (IEDA Data Management Plan Tool & Data Compliance Reporting Tool)
      • Develop tools and modify architecture to provide integrated access to holdings
  • 35. IEDA’s layered architecture. The EUDAT model: Shared, Partners, Shared
  • 36. IEDA Today: Data Holdings & Growth
      • > 70 terabytes of marine geophysical sensor data in the MGDS
      • > 20 million analytical measurements for > 1 million samples in EarthChem
      • > 4.2 million samples registered and searchable in SESAR (System for Earth Sample Registration)
      Presentation at NSF-EAR, 11/15/17
  • 37. IEDA Today
      • Thousands of download requests per month
      • >2,000 citations in the literature
      • ~10,000 start-ups of GeoMapApp per month
      • >2,700 GeoPass users*
      • Demonstrated impact on science
      *GeoPass accounts are required to submit data to EarthChem/Geochron, SESAR, & USAP-DC, and to use the DMP Tool.
      [Figure: Citations of IEDA Systems in the Scientific Literature; number of citations per year for EarthChem/PetDB/SedDB and MGDS/GMRT/GMA]
  • 38. IEDA is “attracting” 👍
      • Certification: Member of World Data System since 2011 (CoreTrustSeal certification underway)
      • Use of Persistent Identifiers
          • Publication agent of DataCite since 2011
          • DOI registration of datasets since 2009 via TIB Hannover
          • The International Geo Sample Number: A PID for physical samples
      • FAIR data
          • Findable/accessible: DOIs, landing pages, GUIs, APIs
          • Interoperable: CSW, DataONE member node, schema.org (EarthCube project P418)
          • Reusable: disciplinary expertise for data curation, rich provenance metadata
  • 39. Lessons Learned
  • 40. Merger of EarthChem & MGDS created tensions
      • Partner system needs versus overarching IEDA level needs
          • Budget
          • Staff expertise
          • Staff allocations
          • Distribution among different funding sources (3 different NSF programs)
      • Scientific utility versus trustworthiness of operations
      • Operation & maintenance versus innovation
  • 41. Merger did not lead to the expected ‘economies of scale’
      - Disciplinary data curation continues as the most relevant component.
      - Additional resources/effort needed for coordination and alignment of activities and practices across partners.
      - More project management required due to budget level and status as facility.
      - Building useful data search and discovery across multi-disciplinary systems is a challenging problem.
      [Figure axis label: Cost per system]
  • 42. Achievements: IEDA Data Browser
  • 43. Achievements: IEDA Integrated Catalog
      - Access to all IEDA repositories in one place
      - Free text, map, and facet-based search options
      - ISO metadata available for other catalogs to harvest
      - Major work to align concepts and vocabularies in the different repositories
      - Challenge to agree on facets
      - Relevance to different data types
      - Availability of metadata
      - Granularity of datasets
  • 44. A changing ecosystem
      “IEDA’s cross-disciplinary services for data discovery (IEDA Data Browser) and data access (IEDA Integrated Catalog) across all IEDA systems are increasingly superseded by tools developed with substantially larger resources as part of EarthCube, Google (Google’s new Research Data Search based on schema.org), or perhaps DataONE. These recent developments aim to provide researchers with the tools to find and use data in a highly distributed and fragmented data infrastructure based on new approaches for interoperability, metadata registries, and hubs such as SCHOLIX to link data and literature.”
      Source: IEDA: Future Scope and Structure (IEDA internal report, K. Lehnert & S. Carbotte, January 2018)
  • 45. We need to adapt
      - Reduce complexity of operations
      - Adjust to and better leverage external CI developments (e.g. EarthCube)
      - Enhance opportunities to grow partnerships relevant to the disciplinary systems to target needs of the disciplinary communities
      - Systems and/or services that serve broader audiences should be funded independently (SESAR, GeoMapApp, GMRT)
      - Create a new management/governance structure
        - more independence for IEDA partners and funders to allow growth
        - rely on external developments for cross-disciplinary services
  • 46. Where are we heading from here?
  • 47. Oh no, that diagram again ...
      According to Wittenburg & Strawn (2018), the implementation of data infrastructure can be guided by 4 statements:
      - A Digital Object has a structured bit sequence stored in a trustworthy repository.
      - A Digital Object has a PID and metadata.
      - The PID is associated with all relevant kernel information that allows humans and machines to enable FAIR.
      - Kernel information and Digital Object have types allowing humans and machines to associate operations with them.
  • 48. My take on priorities: Reusability, Impact on Science, Sustainability
      [Diagram labels: Data type specific best practices; Metadata quality; Granularity of access, data fusion; Metrics; Data Science Education; Business models; Consolidation]
      The impact of data infrastructure on science & society depends on the reusability of data and will ultimately justify its continued funding.
  • 49. Reusability problem: Metadata quality
      - Discipline-specific and data type specific metadata not well defined and enforced
      - Lack of consistent vocabularies
      - Automated metadata enrichment (e.g. CINERGI) has not yet had convincing results
      - Manual data curation still best, but too costly
      Source: “The Geochemical Data(base) Factory: From Heterogeneous Input to Homogeneous Output”, AGU FM 2009
  • 50. Reusability problem: data wrangling
      Surveys in recent years show that data scientists still spend 75-80% of their time ‘data wrangling’.
      - RDA EU survey 2013 (75%)
      - Brodie 2015 (80%)
      - CrowdFlower 2017 (80%)
      Source: CrowdFlower
  • 51. Reusability solution: Data Fusion
      Harmonize & integrate data so that disparate pieces of information form a picture that can be explored to reveal patterns in space, time, and properties.
  • 52. Reusability solution: Data Fusion
      - Structure data so they can be accessed and understood at a more granular level
      - Approaches are available and improving
        - ISO/OGC Observations & Measurements
        - Observation Data Model ODM2 (Horsburgh et al. 2017)
        - Schema.org
        - Open Core Data
      Source: S. Cox et al., “Mainstream web standards now support science data too”, AGU FM 2017
  • 53. Reusability problem: The Long Tail
      - Small data volumes, but big potential
      - Culture is not open to sharing
      - Data fragmented and highly heterogeneous
      - Lots of .xls files
      - Many data never see the light of day
      Source: ESIP Winter Meeting, January 2016
  • 54. Reusability hope: Generation change
      “A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.” Max Planck
  • 55. Credit: Jon Stelling, Lehigh University
  • 56. Steps in the data life cycle are siloed in many communities and disciplines. Recommendation: focus on the full data life cycle. (Final Report from the NSF Computer and Information Science and Engineering Advisory Committee, Data Science Working Group; Communications of the ACM, Vol. 61 No. 4, Pages 67-72, April 2018)
  • 57. A trend toward large facilities
  • 58. Education in Data Science or Data Science in Education
      - Data Science as a new field in academia
      - Different organizational models emerging at academic institutions to integrate with domain sciences
  • 59. I’ll leave the funding question to the experts.
      - Trust of the science community
  • 60. Funding
      Source: “Funding research data management and related infrastructures”, May 2016. Authors: Knowledge Exchange Research Data Expert Group and Science Europe Working Group on Research Data.
  • 61. Did we move at all? [Figure label: 2007]
  • 62. Success! The International Geo Sample Number
      - Grew from a local, centralized system started in 2004 to an international organization founded in 2011
      - Now has 24 members on 5 continents
      - Currently 5 active Allocating Agents
      - Adoption by researchers, collection curators, publishers, and funding agencies growing
      - Adoption spreading to other disciplines: biology, archeology, material sciences
      - Newest members since 2017: USGS (USA), BGS (UK), CNRS (France), IFREMER (France), ANDS (Australia)
      [Figure: # of IGSNs issued by active IGSN Allocating Agents (IEDA, Geoscience Australia, MARUM, CSIRO, GFZ): 4,261,436; 2,100,273; 100,342; 30,925; 4,809]
      [Image label: Organic Biomarker Data Workshop]
      (2/15/2018)
  • 63. The final message: Let’s work together!
      - It is important that we leverage existing capabilities and expertise.
      - We do not have the luxury of duplicating effort.
      - We need to break down barriers between communities and stakeholders that compete for their piece of the pie.
      Source: NSF Workshop Cyberinfrastructure for Large Facilities, Nov 2015
  • 64. Back to the beginning:
      “Do what excites you. Follow your passion. Don't necessarily worry about what obstacles might be there, because there are always ways to overcome them. But the most exciting thing is to be able to do what you love, and just don't let anything stand in the way of that.” Carol Greider, 2009 Nobel Prize winner
  • 65. For my parents

Editor's Notes

  1. I am incredibly honored and humbled by this medal, and I really would like you to know how much this means to me. So before I start getting into the topic of RDI, I would like to take a brief detour and talk a little bit about how I got here and what the significance of this honor is in my life. In 1982 I was about ready to finish my dissertation in petrology when I got pregnant, married, and became a housewife. The scientific work that I was doing came to an end and my career seemed to be over before it had even started. Two years after my son was born, I took a half-time position as lab technician at the Max-Planck-Institute for Chemistry in town, and even though it did not pay any real money, it brought me back into the research environment. I had amazing colleagues, who encouraged me to finish my PhD, and supported me through a rough couple of years, when I tried to be a mom during the day and catch up with science at night. But it was the best thing I have done, and I am so grateful to all those colleagues. Without that PhD, I would not have been able to get the position as Staff Associate at the Lamont-Doherty Earth Observatory, when I moved to the US in 1996. In that position I had two main duties: to run a geochemistry lab and to build a database for volcanic rock geochemistry. And that was the beginning
  2. A lecture like this is a great opportunity to reflect on the past, where we started off and where we got to, and use the experiences that we collected ourselves in our work and the insights gained through broader developments – be they good or bad – to inform decisions regarding the future.
  3. I will take two different looks at the past. The first uses the work of historians, economists, social scientists, and information scientists to understand the development of infrastructures and how those insights can inform the development of data and cyberinfrastructure. In 2007, while preparing a presentation for an NSF workshop that was convened to envision the future of Geoinformatics in the US and globally, I found a report written by Paul Edwards and colleagues that was a real eye-opener and helped me, and I think many others, to put ongoing activities aimed at building cyberinfrastructure into context. Just last month, while preparing for this lecture, I ran into a paper by Peter Wittenburg and George Strawn that builds on the same classic book by Thomas Hughes to define the path of data infrastructure for the future.
  4. The other one is based on my own experiences along the path of building data infrastructure for the solid earth sciences, especially the experiences gained in the creation and operation of the Interdisciplinary Earth Data Alliance, which I direct.
  5. A word of caution first: The data universe is highly complex and diverse. I cannot possibly aspire to cover all topics and address every aspect. I am a geoscientist ...
  6. Vision: Enable an open, extensible, and evolvable digital science ecosystem. Facilitate research data, information, knowledge, and data tools discovery. Enhance problem-solving processes. Move and connect scientific data across scientific disciplines. Manage scientific workflows. Interoperation between scientific data and literature. Integrated science policy framework. Networked digital data systems & libraries that interoperate.
  7. There are a number of drivers behind building data infrastructure. There is an ever-growing, and maybe exponentially growing, volume of data acquired in the sciences in general, and specifically in the Earth sciences, where new data acquisition technologies and computing capabilities are used to gather observations from space, in the oceans, and on land, to simulate earth processes, and to generate models that predict future paths. And the data, together with the technologies to mine, analyze, and visualize them, are giving us new insights into the way the earth works and
  8. Lots of reports have come out.
  9. There is no doubt that infrastructures have a profound effect on the nature of modern human societies. The Roman water supply system opened the way to building the largest capital in ancient times. Railroad systems allowed the exchange of people & goods at previously unknown speeds and facilitated the first industrial revolution. Global electrification changed the availability of power and facilitated the second industrial revolution. The Internet with its web applications changed the availability of information and facilitated new kinds of businesses.
  10. Start with test installations, follow up with small-scale installations, then extend stepwise to interconnected systems.
  11. “Attraction and convergence are driven mainly by efficiency and economic concerns. The benefit of convergence is the belief of stakeholders that a stable fundament has been built, on top of which new investments and developments can be made to fully exploit the new technologies and infrastructures.”
  12. The FAIR principles are a major milestone that represents an ‘attractor’ in the solution space. But the FAIR principles express policy goals. They need to be translated into actions.
  13. When businesses merge, it is often to achieve economies of scale. Larger organizations are typically able to produce goods and services more efficiently and at a lower per-unit cost than smaller businesses because fixed costs are spread out over a larger number of units. This is not always the case, however. Sometimes when two firms merge, being larger will actually create diseconomies of scale, where per-unit production costs increase because of increased coordination costs.
  14. Re-usability; Domain standards; Business models; Workforce
  15. Quality: Communities need to define disciplinary and data type specific best practices (documentation of provenance, uncertainties, etc.). Readiness for data mining & analysis: improve granularity of access; data fusion (the ‘data lake’).
  16. There are more lessons to be learned from the IGSN development, but that is for another talk.