The Economics of Data Sharing

| 1
Anita de Waard 0000-0002-9034-4119
VP Research Data Collaborations
Elsevier RDM Services
a.dewaard@elsevier.com
CMMI Workshop
February 6, 2016

| 2
How do we get scientists to share their data?
How do we make data repositories sustainable?
• The economics of science
• Cost recovery models of data repositories
• Some examples that work
• Some thoughts on the future.
How do we create effective and sustainable ecosystems for storing, sharing and reusing data, and get people to use them?

| 3
Two Economies of Science [1]:
Debit Economy (like a pie)
• A single pile of ‘stuff’ gets divided:
- A thing can belong to only one person at a time
- “If you get more, I get less”
• Examples:
- Money
- Jobs
- Samples, equipment, space, etc.
• Behaviors:
- Hoarding, secrecy
- (Cut-throat) competition
- Winning by owning (and not sharing)
Credit Economy (like a song)
• Credit comes from visibility:
- The more you give away, the more you benefit
- “Only if I share do I really own” (“You need me to do you!” JW)
• Examples:
- Papers, citations
- Good ideas (if credited)
- Skills
• Behaviors:
- Open access, citation game
- Collaboration with top-X
- Winning by sharing (to enable priority & visibility)
[1] Paula Stephan: “How Economics Shapes Science”, Harvard University Press, 2012: http://www.jstor.org/stable/j.ctt2jbqd1
Where does DATA fit?

| 4
RDA IG Repository Cost Recovery
• Interviewed 22 repositories, globally
• Different income streams:
1. Structurally funded
2. Mostly data access charges
3. Mostly data deposit fees
4. Membership fees (for deposits and/or access)
5. Serial project funding
6. Supported by host institution
• New models under consideration:
- Sponsorships/services for the commercial sector
- Contracts for specific services offered (hosting, archiving, curation)
- Expanding the number of affiliated institutions
- Deposit fees
- More services for “national memory institutes”
• Some comments:
- Some countries structurally fund repositories (not the US!)
- Some repositories are embedded in scholarly practice
- Hard to come up with new models: no time, no skill sets!

| 5
Four Types of Repositories:
[Diagram: data lifecycle, with the labels Research Question, Object of Study, Raw Data, Processed Data, Data With Paper, Curated Record, and Publication; steps labeled Method, Analysis, Tables/Figures, and Curate; plus Methods and Software]
• Local Storage/Instrument Repositories
- NOAA: 20 TB/
- NASA streaming: > 24 PB/day
- NASA Reverb: 12 PB data
- NSSD: > 230 TB of digital data
- NSIDC: 1 PB data, : 1 PB total
- ALMA Telescope: 40 TB/day
- Size: PB
- Nr of files: Trillions
• Institutional/Local Repositories
- Deep Blue (UMich): 80 k
- MIT DSpace: 75 k
- HAL (France): 60 k
- D-Space Cambr: 1.5 k
- Of which data: hundreds
- Size: GB
- Nr of files: Billions
• Non-Domain Repositories
- Figshare: 1.2 M
- DataDryad: 3 k
- Dataverse: 58 k
- Size: MB
- Nr of files: Millions
• Domain Repositories
- PetDB: 6 k
- PDB: 100 k
- NIST ASD: 170 k
- Size: kB
- Nr of files: 100ks

| 6
Where is data sharing happening?
YES:
• Astronomy: telescopes
• High-energy physics: accelerators
• Earth science: satellites
• Social science: censuses
• Medicine (sometimes): patient data in large studies
• Life science: sequence data
Common traits:
• Big equipment that no single lab/person can run
• Can’t do science without it
• Tools in place to be effective
NO:
• Low-temperature physics: cryostats
• Earth science: samples
• Materials science: catalysts, microscopes, etc.
• Social science: interviews
• Medicine: individual patient data
• Neuroscience: microscope
Common traits:
• Small equipment that a single lab/person can run
• Can do science without sharing
• No effective tools in place
[Diagram: research cycle with the labels Prepare, Observe, Analyze, Ponder, and Communicate]

| 7
Connecting small science
[Diagram: Prepare, Analyze, and Communicate steps of separate experiments, with their Observations]
Identify entities from the start

| 8
[Diagram: Prepare, Analyze, and Communicate steps of separate experiments, with their Observations]
Compare outcome of interactions with these entities

| 9
[Diagram: Prepare, Analyze, and Communicate steps of separate experiments, with their Observations, leading to Think]
Build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments
Reason collectively!

| 10
A small change for small science: Urban Legend [2]
• Encourage data sharing of raw data files + experimental metadata
• Add metadata to your experiment while you’re performing it
• Improved data practices made the lab more productive and more creative, and enabled effective and novel collaborations
• Lesson: split the data storage and curation from data sharing!
- Provide direct reward to storage: now we can find our own data!
- Enable simple upload to an embargoed data set when the owner is ready.
[2] Tripathy et al, 2014: http://www.frontiersin.org/10.3389/conf.fninf.2014.18.00077/event_abstract

| 11
Addressing the fear of scooping with embargoes:
[Diagram: Researcher, Funding Agency, Institution, Data Repository, Dataset, Journal, and Paper, connected by the numbered steps below]
1. Researcher creates datasets
2. Researcher writes paper & publishes in journal
3. (Sometimes) dataset gets posted to repository
4. Researcher reports (post hoc) to Institution and Funder

| 12
[Diagram: Researcher, Funding Agency, Institution, Data Repository, Dataset, Journal, and Paper, connected by numbered steps 1 to 4 and annotated with the problems below]
i. Too much work for researchers
ii. Data posting not mandatory
iii. No links between data and paper
iv. Funders/Institutions informed as an afterthought

| 13
[Diagram: Researcher, Funding Agency, Institution, Data Repository, Dataset, Journal, and Paper, connected by the numbered steps below]
1. Researcher creates datasets and posts them to a repository (under embargo, not publicly viewable)
2. Funder is automatically notified of dataset posting
3. Researcher writes paper & publishes in journal; embargo is lifted and data linked
- NB: this also allows release of unused data for negative results and reproducibility
4. Funder and institution get a report on publication and embargo lifting
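The four-step embargo workflow on this slide can be sketched as a small state model. This is only an illustration under stated assumptions: the class names, field names, notification strings, and the placeholder identifier "paper-001" are all hypothetical and do not describe any existing repository system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    """A deposited dataset; all field names here are hypothetical."""
    title: str
    owner: str
    embargoed: bool = True  # step 1: deposits start under embargo
    linked_paper: Optional[str] = None


@dataclass
class Repository:
    """Toy repository that records who gets notified at each step."""
    datasets: List[Dataset] = field(default_factory=list)
    notifications: List[str] = field(default_factory=list)

    def deposit(self, title: str, owner: str) -> Dataset:
        # Step 1: the researcher posts the dataset; it is not public yet.
        dataset = Dataset(title=title, owner=owner)
        self.datasets.append(dataset)
        # Step 2: the funder is notified automatically on deposit.
        self.notifications.append(f"funder: '{title}' deposited under embargo")
        return dataset

    def publish_paper(self, dataset: Dataset, paper_id: str) -> None:
        # Step 3: publication lifts the embargo and links data to paper.
        dataset.embargoed = False
        dataset.linked_paper = paper_id
        # Step 4: funder and institution both receive a report.
        for party in ("funder", "institution"):
            self.notifications.append(
                f"{party}: '{dataset.title}' released, linked to {paper_id}"
            )

    def public_datasets(self) -> List[Dataset]:
        # Only datasets whose embargo has been lifted are publicly viewable.
        return [d for d in self.datasets if not d.embargoed]
```

In this sketch the researcher never reports by hand: both notifications are side effects of depositing and publishing, which is the point of moving the deposit to the start of the workflow.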

| 14
A System for Linking Data Links: Scholix
• ICSU-WDS/RDA Publishing Data Service Working Group, merged with National Data Service pilot
• Cross-stakeholder, with input from CrossRef, DataCite, OpenAIRE, Europe PubMed Central, ANDS, PANGAEA, Thomson Reuters, Elsevier, and others
• Proposed long-term architecture and interoperability framework: www.scholix.org
• Operational prototype at http://dliservice.research-infrastructures.eu/#/api (including 1.4 million links from various sources)
• Making links between datasets and articles available could/should encourage data citation and deposition
• Together with Force11 Data Citation Principles, encourage Research Object citation/credit metrics.
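As a rough illustration of why pooled dataset-article links could feed credit metrics, the sketch below merges links reported by several providers and counts the articles linking to each dataset. The record layout is a simplified assumption made for this example, not the Scholix schema, and all names and identifiers are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class DataLiteratureLink:
    """One dataset-article link (hypothetical, simplified record)."""
    article_id: str
    dataset_id: str
    provider: str  # who asserted the link, e.g. a publisher or a repository


def merge_links(*sources: Iterable[DataLiteratureLink]) -> Set[DataLiteratureLink]:
    """Pool links from several providers, keeping one record per
    article-dataset pair regardless of how many providers reported it."""
    seen: Dict[Tuple[str, str], DataLiteratureLink] = {}
    for source in sources:
        for link in source:
            seen.setdefault((link.article_id, link.dataset_id), link)
    return set(seen.values())


def articles_per_dataset(links: Iterable[DataLiteratureLink]) -> Counter:
    """Count how many deduplicated links point at each dataset."""
    return Counter(link.dataset_id for link in links)
```

Deduplicating on the article-dataset pair before counting matters here: without it, a dataset reported by both a publisher and a repository would be credited twice for the same article.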

| 15
A System for A New Data Economics: NIH Data Commons
Phil Bourne, Dec15
[Diagram labels: NIH; Option: Direct Funding; NIH BD2K; Provides credits; Investigator; Uses credits in the Commons; The Commons; Cloud Provider A; Cloud Provider B; Discovery Index; Indexes; Enables Search; Search Engines; User]
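One reading of the credit labels on this slide is that a funder grants credits to an investigator, who then chooses where in the Commons to spend them. The sketch below models only that reading; the class, the method names, and the provider identifiers are hypothetical, and nothing here reflects how the actual Commons was implemented.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CreditAccount:
    """Credits held by one investigator (hypothetical model)."""
    balance: int = 0
    spent_by_provider: Dict[str, int] = field(default_factory=dict)

    def grant(self, credits: int) -> None:
        # The funder provides credits rather than paying a provider directly.
        if credits <= 0:
            raise ValueError("credits must be positive")
        self.balance += credits

    def spend(self, provider: str, credits: int) -> None:
        # The investigator decides which provider receives the credits.
        if credits <= 0:
            raise ValueError("credits must be positive")
        if credits > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= credits
        previous = self.spent_by_provider.get(provider, 0)
        self.spent_by_provider[provider] = previous + credits
```

Tracking spending per provider is a design choice of this sketch: it makes visible which providers investigators actually pick, which is the market signal a credit model would add over direct funding.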

| 16
Drivers for Data Sharing: A Study in Behavioral Economics
• Study scholarly reward systems from the point of view of economics
• Develop an economic model for the entire scholarly rewards ecosystem: career, prestige, tenure, finances, etc.
• Two intended outcomes:
- Understanding current behavior with respect to data sharing: can we explain what we see, and the differences between different domains?
- Theoretical foundation for recommendations for policies and practices to stakeholders such as funders, publishers and standards bodies
• Small group working on it, planning first meeting:
- Mike Huerta (NLM), Micah Altman (MIT), Fran Berman (RPI), Carol Tenopir (TN), Carole Palmer (UW), Greg Gordon (SSRN).
• Thoughts? Want to join?

| 17
In summary:
• The Economy of Science: pies vs. songs
- RDA Data Repositories Cost Recovery IG
- Different types of repositories, different types of science
- Need to move from ‘small’ to ‘big’ science thinking
• Some examples of successful data sharing:
- Online electronic lab notebooks: making it too easy not to use
- RDA Scholix: linking systems of links using existing technology
- The NIH Data Commons: enabling a data economy in practice
• Some things we can do:
- Embargo pilots: circumvent the fear of scooping
- Drivers for data sharing report: science is a human endeavor
cyberinfrastructure

| 18
Thank you!
Links:
• https://www.hivebench.com
• https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
• http://www.journals.elsevier.com/softwarex/
• https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
• https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
• https://rd-alliance.org/bof-data-search.html
• https://data.mendeley.com/
• https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
• https://www.force11.org/
• http://www.nationaldataservice.org/
• https://rd-alliance.org/
• https://www.elsevier.com/about/open-science/research-data
Anita de Waard, a.dewaard@elsevier.com
