The habits of highly successful data:
How to help your dataset
achieve its full potential
University of Illinois, Urbana Champaign
May 7, 2014
Anita de Waard
VP Research Data Collaborations
a.dewaard@elsevier.com
http://researchdata.elsevier.com/
Why should we care about Research Data?
Funding bodies:
 Demonstrate impact
 Guarantee permanence,
discoverability
 Avoid fraud
 Avoid double funding
 Serve general public
Research Management/Library:
 Generate, track outputs
 Comply with mandates
 Ensure availability
Phil Bourne, (then) Associate Vice Chancellor, UCSD, 4/13:
“We need to think about the university as a digital enterprise.”
Mike Huerta, Ass. Director NLM:
“Today, the major public product of science are concepts, written
down in papers. But tomorrow, data will be the main product of
science…. We will require scientists to track and share their data at
least as well, if not better, than they are sharing their ideas today.”
Researchers:
 Derive credit
 Comply with mandates
 Discover and use
 Cite/acknowledge
Nathan Urban, PI Urban Lab, CMU, 3/13:
“If we can share our data, we can write a paper that will knock
everybody’s socks off!”
Barbara Ransom, NSF Program Director Earth Sciences:
“We’re not going to spend any more money for you to go out and get
more data! We want you first to show us how you’re going to use all
the data we paid y’all to collect in the past!”
What’s the problem? One example:
Using antibodies
and squishy bits
Grad Students experiment
and enter details into their
lab notebook.
The PI then tries to make
sense of their slides,
and writes a paper.
End of story.
Maslow’s Hierarchy of Needs for Research Data
1. Preserved (existing in some form)
2. Archived (long-term & format-independent)
3. Accessible (can be accessed by others)
4. Comprehensible (others can understand data & processes)
5. Discoverable (can be indexed by a system)
6. Reproducible (others can redo experiments)
7. Trusted (validated/checked by reviewers)
8. Citable (able to point & track citations)
9. Usable (allow tools to run on it)
1. Preserve: Data Rescue Challenge
• With IEDA/Lamont: award successful data
rescue attempts
• Awarded at AGU 2013
• 23 submissions of data that was digitized,
preserved, made available
• Winner: NIMBUS Data Rescue:
– Recovery, reprocessing and digitization of the
infrared and visible observations along with their
navigation and formatting.
– Over 4000 7-track tapes of global infrared
satellite data were read and reprocessed.
– Nearly 200,000 visible light images were
scanned, rectified and navigated.
– All the resultant data was converted to HDF-5
(NetCDF) format and freely distributed to users
from NASA and NSIDC servers.
– This data was then used to calculate monthly sea
ice extents for both the Arctic and the Antarctic.
• Conclusion: we (collectively) need to do more
of this! How can we fund it?
2. Archive: Olive Project
• CMU CS & Library: funded by a grant
from the IMLS, Elsevier is partner
• Goal: Preservation of executable content
- nowadays a large part of intellectual
output, and very fragile
• Identified a series of software packages
and prepared VM to preserve
• Does it work? Yes – see video (1:24)
3. Access: Urban Legend
• Part 1: Metadata acquisition
• Step through experimental process in series
of dropdown menus in simple web UI
• Can be tailored to workflow of individual
researcher
• Connected to shared ontologies through
lookup table, managed centrally in lab
• Connect to data input console (Igor Pro)
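A minimal sketch of the lookup-table idea described above, assuming the table is a simple mapping from local dropdown labels to shared ontology terms. The ontology names, term identifiers and the `annotate` helper are hypothetical placeholders, not taken from the Urban Legend tool itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyTerm:
    ontology: str  # name of a shared ontology (hypothetical placeholder)
    term_id: str   # identifier within that ontology (hypothetical placeholder)


# Centrally managed lookup table: local dropdown label -> shared ontology term.
# Every label and identifier here is made up for illustration.
LOOKUP = {
    "mouse": OntologyTerm("EXAMPLE_ORGANISM", "ORG:0001"),
    "whole-cell patch clamp": OntologyTerm("EXAMPLE_METHOD", "METH:0042"),
}


def annotate(selections: dict[str, str]) -> dict[str, dict]:
    """Attach an ontology term to each dropdown selection where one is known."""
    record = {}
    for field, label in selections.items():
        term = LOOKUP.get(label.strip().lower())
        record[field] = {
            "label": label,
            "ontology": term.ontology if term else None,
            "term_id": term.term_id if term else None,
        }
    return record


# One known label and one label that is not yet in the lab's table.
example = annotate({"organism": "Mouse", "method": "Sharp electrode"})
```

Keeping the table in one place means a researcher can keep local wording in the dropdowns while the stored record still carries the shared term; labels missing from the table stay visible as `None` so the lab can add them later.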
4. Comprehend: Urban Legend
• Part 2: Data Dashboard
• Access, select and manipulate data (calculate
properties, sort and plot)
• Final goal: interactive figures linked to data
• Plan to expand to more neuroscience labs
• Plan to build for geochemistry use case
5. Discover: Data Indexing proposals
• Collaborated on Data Discovery Index
proposal with UCSD/Carnegie Mellon
• Also worked with UIUC!
• Interested in developing distributed
infrastructures for making data easier to
search: what is the ‘Goldilocks Index’ where
search is scalable, yet useful?
• Looking for academic/industry partners/use
cases/platforms to address the next stage
• Discoverability is key driver for
metadata/data format structure!
6. Reproduce: Resource Identifier Initiative
Force11 Working Group to add data identifiers
to articles that are:
– 1) Machine readable;
– 2) Free to generate and access;
– 3) Consistent across publishers and journals.
• Authors publishing in participating journals
will be asked to provide RRIDs for their
resources; these are added to the keyword
field
• RRIDs will be drawn from:
– The Antibody Registry
– Model Organism Databases
– NIF Resource Registry
• So far, Springer, Wiley, Biomednet and Elsevier
have signed up with 11 journals;
more to come
• Wide community adoption!
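To illustrate what "machine readable" can mean in practice, here is a rough sketch that pulls identifiers out of free text. It assumes identifiers are written as `RRID:` followed by a registry prefix, an underscore and an alphanumeric ID; that pattern is an assumption for this example rather than the working group's specification, and the identifiers in the sample text are made up.

```python
import re

# Assumed shape: "RRID:" + registry prefix + "_" + alphanumeric identifier.
# An optional single space after the colon is tolerated.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9]+)")


def extract_rrids(text: str) -> list[str]:
    """Return each distinct identifier once, in order of first appearance."""
    seen = []
    for match in RRID_PATTERN.finditer(text):
        rrid = "RRID:" + match.group(1)
        if rrid not in seen:
            seen.append(rrid)
    return seen


# Made-up methods text with made-up identifiers.
methods = (
    "Sections were stained with an anti-example antibody "
    "(RRID:AB_0000001) and analysed with ExampleTool (RRID: SCR_0000002). "
    "The same antibody (RRID:AB_0000001) was used for controls."
)
rrids = extract_rrids(methods)
```

Because the identifier has a fixed, recognisable prefix, a plain pattern match is enough to find it in a keyword field or a methods section, which is the property that makes it consistent across publishers.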
7. Trust: Moonrocks
How can we scale up data curation?
Pilot project with IEDA:
• A database for lunar geochemistry:
leapfrog & improve curation time
• 1-year pilot, funded by Elsevier
• Main conclusion: if spreadsheet
columns/headers map to RDB
schema we can scale curation cost!
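The conclusion above can be sketched as follows: if each spreadsheet header maps to a column in the relational schema, rows can be loaded without hand curation, and only unmapped headers need a curator's attention. The header names, column names and sample row are hypothetical; the pilot's actual schema is not described here.

```python
import csv
import io

# Hypothetical mapping from spreadsheet headers to database column names.
HEADER_TO_COLUMN = {
    "sample id": "sample_id",
    "sio2 (wt%)": "sio2_wt_pct",
    "mgo (wt%)": "mgo_wt_pct",
}


def map_rows(csv_text):
    """Rename known headers to schema columns and report unknown headers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    unmapped = [h for h in headers if h.strip().lower() not in HEADER_TO_COLUMN]
    rows = []
    for raw in reader:
        rows.append({
            HEADER_TO_COLUMN[h.strip().lower()]: raw[h]
            for h in headers
            if h.strip().lower() in HEADER_TO_COLUMN
        })
    return rows, unmapped


# Made-up spreadsheet content with one header the schema does not know.
sheet = "Sample ID,SiO2 (wt%),MgO (wt%),Notes\nS-001,45.1,9.8,made-up row\n"
rows, unmapped = map_rows(sheet)
```

The curator's work shrinks to reviewing the `unmapped` list, so the cost grows with the number of new header variants rather than with the number of rows.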
8. Cite: Force11 Data Citation Principles
• Another Force11 Working group
• Defined 8 principles:
• Now seeking endorsement/working on
implementation
1. Importance: Data should be considered legitimate, citable products of
research. Data citations should be accorded the same importance in
the scholarly record as citations of other research objects, such as
publications.
2. Credit and attribution: Data citations should facilitate giving scholarly
credit and normative and legal attribution to all contributors to the
data, recognizing that a single style or mechanism of attribution may
not be applicable to all data.
3. Evidence: Where a specific claim rests upon data, the corresponding
data citation should be provided.
4. Unique Identification: A data citation should include a persistent
method for identification that is machine actionable, globally unique,
and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves
and to such associated metadata, documentation, and other materials,
as are necessary for both humans and machines to make informed use
of the referenced data.
6. Persistence: Metadata describing the data, and unique identifiers
should persist, even beyond the lifespan of the data they describe.
7. Versioning and granularity: Data citations should facilitate
identification and access to different versions and/or subsets of data.
Citations should include sufficient detail to verifiably link the citing
work to the portion and version of data cited.
8. Interoperability and flexibility: Data citation methods should be
sufficiently flexible to accommodate the variant practices among
communities but should not differ so much that they compromise
interoperability of data citation practices across communities.
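As one possible reading of principles 4 and 7, a data citation can be modelled as a record that always carries a persistent identifier and optionally a version and subset. The field names, the rendering style and the sample values (including the identifier) are assumptions for illustration; the principles themselves do not prescribe a format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    creators: tuple[str, ...]
    title: str
    year: int
    repository: str
    identifier: str                 # persistent, globally unique ID (principle 4)
    version: Optional[str] = None   # which version was used (principle 7)
    subset: Optional[str] = None    # which portion was used (principle 7)

    def render(self) -> str:
        """Format the citation as one human-readable line."""
        parts = [f"{'; '.join(self.creators)} ({self.year}). {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.repository}.")
        parts.append(self.identifier)
        if self.subset:
            parts.append(f"[subset: {self.subset}]")
        return " ".join(parts)


# All values below are made up.
citation = DataCitation(
    creators=("Example, A.", "Sample, B."),
    title="Made-up dataset",
    year=2014,
    repository="Example Repository",
    identifier="doi:10.0000/example.1234",
    version="2.1",
)
rendered = citation.render()
```

Making the identifier mandatory and the version and subset explicit keeps the human-readable line and the machine-actionable fields in one record.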
9. Use: Executable Papers
• Result of a challenge to come up with
cyberinfrastructure components to
enable executable papers
• Pilot in Computer Science journals
– See all code in the paper
– Save it, export it
– Change it and rerun it on the data set
Putting it all together:
1. Preserved (existing in some form)
2. Archived (long-term & format-independent)
3. Accessible (can be accessed by others)
4. Comprehensible (others can understand data & processes)
5. Discoverable (can be indexed by a system)
6. Reproducible (others can redo experiments)
7. Trusted (validated/checked by reviewers)
8. Citable (able to point & track citations)
9. Usable (allow tools to run on it)
Experimental Metadata:
Workflows, Samples, Settings, Reagents, Organisms, etc.
Record Metadata: DOI, Date, Author, Institute, etc.
Processed Data:
Mathematically/computationally processed
data: correlations, plots, etc.
Raw Data: Direct outputs from equipment:
images, traces, spectra, etc.
Methods and Equipment: Reagents,
settings, manufacturer’s details, etc.
Validation: Approval, Reproduction, Selection,
Quality Stamp
More curation
More usable
So how can we help research data
be more happy and productive?
• Group therapy: Force11, W3C, other fora – shared
standards help everyone (we play well with others!)
• Financial therapy: we have a lot of content & IT skills to
support data-driven processes to support grant
proposals; funders like us.
• Creative therapy: innovative collaboration projects that
expand everyone’s mind – let’s put your data through its
paces
• Relationship therapy: happy to address any issues or
concerns!
Collaborations and discussions gratefully acknowledged:
– CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Ed Hovy
– UCSD: Brian Schottlaender, David Minor, Declan Fleming, Ilya Zaslavsky
– NIF: Maryann Martone, Anita Bandrowski
– Force11: Ed Hovy, Tim Clark, Ivan Herman, Paul Groth, Maryann Martone,
Cameron Neylon, Stephanie Hagstrom
– OHSU: Melissa Haendel, Nicole Vasilevsky
– Columbia/IEDA: Kerstin Lehnert, Leslie Hsu
– MIT: Micah Altman
Thank you!
http://researchdata.elsevier.com/
Anita de Waard
a.dewaard@elsevier.com

Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Anita de Waard
 
How to persuade with data
How to persuade with dataHow to persuade with data
How to persuade with dataAnita de Waard
 

Mehr von Anita de Waard (17)

Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
NFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataNFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR Data
 
CNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsCNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data Commons
 
Enabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring GuidelinesEnabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring Guidelines
 
Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.
 
History of the future
History of the futureHistory of the future
History of the future
 
Big Data and the Future of Publishing
Big Data and the Future of PublishingBig Data and the Future of Publishing
Big Data and the Future of Publishing
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost Recovery
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data Sharing
 
Public Identifiers in Scholarly Publishing
Public Identifiers in Scholarly PublishingPublic Identifiers in Scholarly Publishing
Public Identifiers in Scholarly Publishing
 
Collaboratively creating a network of ideas, data and software
Collaboratively creating a network of ideas, data and softwareCollaboratively creating a network of ideas, data and software
Collaboratively creating a network of ideas, data and software
 
Argumentation in biology papers
Argumentation in biology papersArgumentation in biology papers
Argumentation in biology papers
 
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
 
How to persuade with data
How to persuade with dataHow to persuade with data
How to persuade with data
 

Kürzlich hochgeladen

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 

Kürzlich hochgeladen (20)

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 

The habits of highly successful data:

  • 1. The habits of highly successful data: How to help your dataset achieve its full potential University of Illinois, Urbana Champaign May 7, 2014 Anita de Waard VP Research Data Collaborations a.dewaard@elsevier.com http://researchdata.elsevier.com/
  • 2. Why should we care about Research Data?
    Funding bodies:
    – Demonstrate impact
    – Guarantee permanence, discoverability
    – Avoid fraud
    – Avoid double funding
    – Serve general public
    Research Management/Library:
    – Generate, track outputs
    – Comply with mandates
    – Ensure availability
    Researchers:
    – Derive credit
    – Comply with mandates
    – Discover and use
    – Cite/acknowledge
    Phil Bourne, (then) Associate Vice Chancellor, UCSD, 4/13: “We need to think about the university as a digital enterprise.”
    Mike Huerta, Assoc. Director NLM: “Today, the major public product of science are concepts, written down in papers. But tomorrow, data will be the main product of science…. We will require scientists to track and share their data at least as well, if not better, than they are sharing their ideas today.”
    Nathan Urban, PI Urban Lab, CMU, 3/13: “If we can share our data, we can write a paper that will knock everybody’s socks off!”
    Barbara Ransom, NSF Program Director Earth Sciences: “We’re not going to spend any more money for you to go out and get more data! We want you first to show us how you’re going to use all the data we paid y’all to collect in the past!”
  • 3. What’s the problem? One example: Using antibodies and squishy bits Grad Students experiment and enter details into their lab notebook. The PI then tries to make sense of their slides, and writes a paper. End of story.
  • 4. Maslow’s Hierarchy of Needs for Research Data
    1. Preserved (existing in some form)
    2. Archived (long-term & format-independent)
    3. Accessible (can be accessed by others)
    4. Comprehensible (others can understand data & processes)
    5. Discoverable (can be indexed by a system)
    6. Reproducible (others can redo experiments)
    7. Trusted (validated/checked by reviewers)
    8. Citable (able to point & track citations)
    9. Usable (allow tools to run on it)
  • 5. 1. Preserve: Data Rescue Challenge
    • With IEDA/Lamont: award successful data rescue attempts
    • Awarded at AGU 2013
    • 23 submissions of data that was digitized, preserved, made available
    • Winner: NIMBUS Data Rescue:
      – Recovery, reprocessing and digitization of the infrared and visible observations along with their navigation and formatting.
      – Over 4000 7-track tapes of global infrared satellite data were read and reprocessed.
      – Nearly 200,000 visible light images were scanned, rectified and navigated.
      – All the resultant data was converted to HDF-5 (NetCDF) format and freely distributed to users from NASA and NSIDC servers.
      – This data was then used to calculate monthly sea ice extents for both the Arctic and the Antarctic.
    • Conclusion: we (collectively) need to do more of this! How can we fund it?
  • 6. 2. Archive: Olive Project
    • CMU CS & Library: funded by a grant from the IMLS, Elsevier is partner
    • Goal: Preservation of executable content, nowadays a large part of intellectual output, and very fragile
    • Identified a series of software packages and prepared VM to preserve
    • Does it work? Yes, see video (1:24)
  • 7. 3. Access: Urban Legend
    • Part 1: Metadata acquisition
    • Step through experimental process in series of dropdown menus in simple web UI
    • Can be tailored to workflow of individual researcher
    • Connected to shared ontologies through lookup table, managed centrally in lab
    • Connect to data input console (Igor Pro)
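The lookup-table idea on slide 7 could be sketched roughly as follows. This is only an illustration under stated assumptions: the table contents, the `EX:` term identifiers, and the `annotate` function are hypothetical and are not taken from the Urban Legend tool itself.

```python
# Hypothetical lab-wide lookup table: dropdown label -> ontology term ID.
# The "EX:" identifiers are placeholders, not real ontology terms.
LAB_LOOKUP = {
    "mouse": "EX:0001",
    "whole-cell patch clamp": "EX:0002",
}


def annotate(selections, lookup=LAB_LOOKUP):
    """Attach an ontology term (or None) to each dropdown selection."""
    annotated = {}
    for field, label in selections.items():
        # Labels are matched case-insensitively against the shared table.
        annotated[field] = {"label": label, "term": lookup.get(label.lower())}
    return annotated


example = annotate({"organism": "Mouse", "method": "Sharp electrode"})
print(example)
```

Selections with no entry in the table come back with `term` set to `None`, which would give whoever manages the table centrally a simple list of labels still to be mapped.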
  • 8. 4. Comprehend: Urban Legend
    • Part 2: Data Dashboard
    • Access, select and manipulate data (calculate properties, sort and plot)
    • Final goal: interactive figures linked to data
    • Plan to expand to more neuroscience labs
    • Plan to build for geochemistry use case
  • 9. 5. Discover: Data Indexing proposals
    • Collaborated on Data Discovery Index proposal with UCSD/Carnegie Mellon
    • Also worked with UIUC!
    • Interested in developing distributed infrastructures for making data easier to search: what is the ‘Goldilocks Index’ where search is scalable, yet useful?
    • Looking for academic/industry partners/use cases/platforms to address the next stage
    • Discoverability is key driver for metadata/data format structure!
  • 10. 6. Reproduce: Resource Identifier Initiative
    Force11 Working Group to add data identifiers to articles that are:
    – 1) Machine readable;
    – 2) Free to generate and access;
    – 3) Consistent across publishers and journals.
    • Authors publishing in participating journals will be asked to provide RRIDs for their resources; these are added to the keyword field
    • RRIDs will be drawn from:
      – The Antibody Registry
      – Model Organism Databases
      – NIF Resource Registry
    • So far, Springer, Wiley, Biomednet and Elsevier have signed up with 11 journals; more to come
    • Wide community adoption!
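To illustrate what "machine readable" identifiers in a keyword field could mean in practice, here is a minimal sketch that pulls RRID-style strings out of a list of keywords. The regular expression is an assumption about the general shape of such identifiers (`RRID:` followed by a registry prefix, an underscore, and an ID); it is not the initiative's official syntax, and the sample keywords are invented.

```python
import re

# Assumed shape: "RRID:" + registry prefix + "_" + local ID (hypothetical).
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9_-]+)")


def extract_rrids(keywords):
    """Return the sorted, de-duplicated RRIDs found in keyword strings."""
    found = set()
    for keyword in keywords:
        for match in RRID_PATTERN.finditer(keyword):
            found.add("RRID:" + match.group(1))
    return sorted(found)


# Invented keywords for illustration only.
keywords = ["synaptic plasticity", "RRID:AB_0000001", "mouse; RRID:EXAMPLE_42"]
print(extract_rrids(keywords))
```

Because the identifiers follow one consistent pattern across publishers, a single expression like this is enough to index them, which is the point of requirement 3 on the slide.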
  • 11. 7. Trust: Moonrocks
    How can we scale up data curation? Pilot project with IEDA:
    • A database for lunar geochemistry: leapfrog & improve curation time
    • 1-year pilot, funded by Elsevier
    • Main conclusion: if spreadsheet columns/headers map to RDB schema we can scale curation cost!
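The main conclusion of the pilot, that curation scales when spreadsheet headers map onto a database schema, might look something like the sketch below. The header names, column names, and sample row are all hypothetical and are not drawn from the IEDA lunar geochemistry database.

```python
import csv
import io

# Hypothetical mapping from spreadsheet headers to schema column names.
HEADER_TO_COLUMN = {
    "sample id": "sample_id",
    "sio2 (wt%)": "sio2_wt_pct",
    "mgo (wt%)": "mgo_wt_pct",
}


def map_rows(csv_text):
    """Rename known headers to schema columns; report headers left unmapped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    unmapped = [h for h in headers if h.strip().lower() not in HEADER_TO_COLUMN]
    rows = []
    for record in reader:
        rows.append({
            HEADER_TO_COLUMN[h.strip().lower()]: (v or "").strip()
            for h, v in record.items()
            if h is not None and h.strip().lower() in HEADER_TO_COLUMN
        })
    return rows, unmapped


# Invented sample sheet for illustration only.
sheet = "Sample ID,SiO2 (wt%),Notes\nS-1,45.2,fresh surface\n"
rows, unmapped = map_rows(sheet)
print(rows)
print(unmapped)
```

Only the headers reported as unmapped need a curator's attention; everything else can be loaded without manual work, which is where the cost saving would come from.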
  • 12. 8. Cite: Force11 Data Citation Principles
    • Another Force11 Working group
    • Defined 8 principles:
      1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
      2. Credit and attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
      3. Evidence: Where a specific claim rests upon data, the corresponding data citation should be provided.
      4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
      5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.
      6. Persistence: Metadata describing the data, and unique identifiers should persist, even beyond the lifespan of the data they describe.
      7. Versioning and granularity: Data citations should facilitate identification and access to different versions and/or subsets of data. Citations should include sufficient detail to verifiably link the citing work to the portion and version of data cited.
      8. Interoperability and flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities but should not differ so much that they compromise interoperability of data citation practices across communities.
    • Now seeking endorsement/working on implementation
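As a rough illustration of principles 2, 4 and 7, a data citation could be modelled as a small record carrying contributors, a persistent identifier and a version. The field names, the citation layout, and the sample values below are assumptions made for this sketch; the principles themselves do not prescribe a format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    authors: tuple      # all contributors (principle 2)
    year: int
    title: str
    repository: str
    identifier: str     # persistent, globally unique ID (principle 4)
    version: str = ""   # version or subset cited (principle 7)

    def render(self):
        """Format the citation as one line; the layout is hypothetical."""
        names = "; ".join(self.authors)
        version = f" Version {self.version}." if self.version else ""
        return (f"{names} ({self.year}). {self.title}.{version} "
                f"{self.repository}. {self.identifier}")


# Invented example values; the DOI is a placeholder, not a real dataset.
citation = DataCitation(("Doe, J.",), 2014, "Example dataset",
                        "Example Repository", "doi:10.0000/example", "2")
print(citation.render())
```

Keeping the identifier and version as separate fields, rather than burying them in free text, is what would let a machine resolve and track the citation.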
  • 13. 9. Use: Executable Papers
    • Result of a challenge to come up with cyberinfrastructure components to enable executable papers
    • Pilot in Computer Science journals
      – See all code in the paper
      – Save it, export it
      – Change it and rerun on data set
  • 14. Putting it all together:
    – Experimental Metadata: Workflows, Samples, Settings, Reagents, Organisms, etc.
    – Record Metadata: DOI, Date, Author, Institute, etc.
    – Processed Data: Mathematically/computationally processed data: correlations, plots, etc.
    – Raw Data: Direct outputs from equipment: images, traces, spectra, etc.
    – Methods and Equipment: Reagents, settings, manufacturer’s details, etc.
    – Validation: Approval, Reproduction, Selection, Quality Stamp
    More curation, more usable
  • 15. So how can we help research data be more happy and productive?
    • Group therapy: Force11, W3C, other fora – shared standards help everyone (we play well with others!)
    • Financial therapy: we have a lot of content & IT skills to support data-driven processes to support grant proposals; funders like us.
    • Creative therapy: innovative collaboration projects that expand everyone’s mind – let’s put your data through its paces
    • Relationship therapy: happy to address any issues or concerns!
  • 16. Collaborations and discussions gratefully acknowledged:
    – CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Ed Hovy
    – UCSD: Brian Schottlaender, David Minor, Declan Fleming, Ilya Zaslavsky
    – NIF: Maryann Martone, Anita Bandrowski
    – Force11: Ed Hovy, Tim Clark, Ivan Herman, Paul Groth, Maryann Martone, Cameron Neylon, Stephanie Hagstrom
    – OHSU: Melissa Haendel, Nicole Vasilevsky
    – Columbia/IEDA: Kerstin Lehnert, Leslie Hsu
    – MIT: Micah Altman
    Thank you! http://researchdata.elsevier.com/ Anita de Waard a.dewaard@elsevier.com