Looking for Data: Finding New Science
1. Looking for Data: Finding New Science
Anita de Waard
VP Research Data Collaborations
a.dewaard@elsevier.com
http://researchdata.elsevier.com/
2. Why should science publishers care about Research Data?
Funding bodies:
Demonstrate impact
Guarantee permanence, discoverability
Avoid fraud
Avoid double funding
Serve general public
Research Management/Library:
Generate, track outputs
Comply with mandates
Ensure availability
Phil Bourne, (then) Associate Vice Chancellor, UCSD, 4/13:
“We need to think about the university as a digital enterprise.”
Mike Huerta, Associate Director, NLM:
“Today, the major public product of science are concepts, written
down in papers. But tomorrow, data will be the main product of
science…. We will require scientists to track and share their data at
least as well, if not better, than they are sharing their ideas today.”
Researchers:
Derive credit
Comply with mandates
Discover and use
Cite/acknowledge
Nathan Urban, PI Urban Lab, CMU, 3/13:
“If we can share our data, we can write a paper that will knock
everybody’s socks off!”
Barbara Ransom, NSF Program Director Earth Sciences:
“We’re not going to spend any more money for you to go out and get
more data! We want you first to show us how you’re going to use all
the data we paid y’all to collect in the past!”
3. Research data management today:
Using antibodies
and squishy bits
Grad Students experiment
and enter details into their
lab notebook.
The PI then tries to make
sense of their slides,
and writes a paper.
End of story.
5. But it is also VERY complicated:
http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg
• Interspecies variability: A specimen is not a species
• Gene expression variability: Knowing genes is not
knowing how they are expressed
• Microbiome: An animal is an ecosystem
• Systems biology: A whole is more than the sum of its parts
• Male researchers stress out rodents!
Reductionist science
does not work
for living systems!
Statistics to the rescue!
6. What if the research data was connected?
[Diagram: repeated Prepare / Analyze / Communicate cycles and their Observations]
Across labs, experiments: track reagents and how they are used
9. Maslow Hierarchy of Research Data Needs:
Useful
Trusted
Reproducible
Discoverable
Comprehensible
Archived
Accessible
Preserved in digital format
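One way to read the pyramid is that a dataset only reaches a level once every level beneath it is met. The sketch below illustrates that reading. The base-to-top order is simply the slide's list reversed, and the "no skipping" rule, the function name and the example dataset are assumptions made for illustration, not something the slides state.

```python
# Levels from the slide, reordered from base to top (assumed reading order).
LEVELS = [
    "Preserved in digital format",
    "Accessible",
    "Archived",
    "Comprehensible",
    "Discoverable",
    "Reproducible",
    "Trusted",
    "Useful",
]


def highest_level_met(satisfied):
    """Return the highest level reached without skipping a lower one, or None."""
    reached = None
    for level in LEVELS:
        if level not in satisfied:
            break  # a missing lower level blocks everything above it
        reached = level
    return reached


# Hypothetical dataset: discoverable, but nobody documented what it means.
example = {
    "Preserved in digital format",
    "Accessible",
    "Archived",
    "Discoverable",
}
print(highest_level_met(example))
```

Under this reading the example stops at "Archived": being discoverable does not count until the data is also comprehensible.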
10. 1: Urban Legend
How can we make a standard
neuroscience wet lab store and
share their data?
• Incorporate structured workflows into
the daily practice of a typical
electrophysiology lab (the Urban Lab at
CMU)
– What does it take?
– Where are points of conflict?
• 1-year pilot, funded by Elsevier RDS:
– CMU: Shreejoy Tripathy, manage/user test
– Elsevier: development, UI, project management
• Next steps: NIH grant to scale up to 4 labs
11. de Waard, A., Burton, S. et al., 2013
Urban Legend Components
14. 2: Moonrocks
How can we scale up data curation?
Pilot project with IEDA:
• Build a database for lunar geochemistry
• Leapfrog & improve curation time
• Write joint report on processes, costs
and challenges
• 1-year pilot, funded by Elsevier
• Next step: NSF grant on schemas > spreadsheets
16. 3: How do we improve how data (and
software) are published?
• E.g., with the Virtual Microscope
• Or Interactive Plots
• Or Executable Papers
17. Let’s support the needs of research data!
Experimental Metadata: Workflows, Samples, Settings, Reagents, Organisms, etc.
Record Metadata: DOI, Date, Author, Institute, etc.
Processed Data: Mathematically/computationally processed data: correlations, plots, etc.
Raw Data: Direct outputs from equipment: images, traces, spectra, etc.
Methods and Equipment: Reagents, settings, manufacturer’s details, etc.
Validation: Approval, Reproduction, Selection, Quality Stamp
Useful
Trusted
Reproducible
Discoverable
Comprehensible
Archived
Accessible
Preserved in digital format
More curation
More usable
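The data layers named on this slide could be captured in a single record structure, so that a lab can see at a glance which layers are still empty. The following is a minimal sketch under stated assumptions: the class name, field names, method and example values (including the placeholder DOI and file name) are hypothetical and do not come from any Elsevier system.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ResearchDataRecord:
    # One field per layer on the slide; all names here are illustrative.
    record_metadata: dict = field(default_factory=dict)        # DOI, date, author, institute
    experimental_metadata: dict = field(default_factory=dict)  # workflows, samples, settings
    methods_and_equipment: dict = field(default_factory=dict)  # reagents, manufacturer details
    raw_data: list = field(default_factory=list)                # direct outputs from equipment
    processed_data: list = field(default_factory=list)          # correlations, plots
    validation: list = field(default_factory=list)              # approval, reproduction

    def missing_layers(self):
        """List the layers that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical record with only two layers supplied.
record = ResearchDataRecord(
    record_metadata={"doi": "10.0000/placeholder", "author": "A. Researcher"},
    raw_data=["trace_001.dat"],
)
print(record.missing_layers())
```

Listing the empty layers makes the curation gap explicit: the more layers are filled, the further up the pyramid the data can move.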
18. Anita de Waard
a.dewaard@elsevier.com
Collaborations and discussions gratefully acknowledged:
• CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Ed Hovy
• UCSD: Brian Schottlaender, David Minor, Declan Fleming, Ilya Zaslavsky
• NIF: Maryann Martone, Anita Bandrowski
• OHSU: Melissa Haendel, Nicole Vasilevsky
• Columbia/IEDA: Kerstin Lehnert, Leslie Hsu
• MIT: Micah Altman
Thank you!
http://researchdata.elsevier.com/