The Economics of Data Sharing
1. | 1
Anita de Waard 0000-0002-9034-4119
VP Research Data Collaborations
Elsevier RDM Services
a.dewaard@elsevier.com
CMMI Workshop
February 6, 2016
2. | 2
How do we get scientists to share their data?
How do we make data repositories sustainable?
• The economics of science
• Cost recovery models of data repositories
• Some examples that work
• Some thoughts on the future.
How do we create effective and sustainable
ecosystems for storing, sharing and reusing data,
and get people to use them?
3. | 3
Debit Economy (like a pie)
• Single pile of ‘stuff’ gets divided:
- Thing can only be for one person
at one time
- “If you get more, I get less”
• Examples:
- Money
- Jobs
- Samples, equipment, space, etc.
• Behaviors:
- Hoarding, secrecy
- (Cut-throat) competition
- Winning by owning
(and not sharing)
Credit Economy (like a song)
• Credit comes from visibility:
- The more you give away,
the more you benefit
- “Only if I share do I really own”
(“You need me to do you!” JW)
• Examples:
- Papers, citations
- Good ideas (if credited)
- Skills
• Behaviors:
- Open access, citation game
- Collaboration with top-X
- Winning by sharing
(to enable priority & visibility)
Two Economies of Science [1]:
[1] Paula Stephan: “How Economics Shapes Science”, Harvard University Press, 2012: http://www.jstor.org/stable/j.ctt2jbqd1
Where does data fit?
4. | 4
RDA IG Repository Cost Recovery
• Interviewed 22 repositories, globally
• Different income streams:
1. Structurally funded
2. Mostly data access charges
3. Mostly data deposit fees
4. Membership fees (for deposits and/or access)
5. Serial project funding
6. Supported by host institution
• Different new models under consideration:
• Sponsorships/services for the commercial sector
• Contracts for specific services offered (hosting, archiving, curation)
• Expanding the number of affiliated institutions
• Deposit fees
• More services for “national memory institutes”
• Some comments:
• Some countries structurally fund repositories (not US!)
• Some repositories embedded in scholarly practice
• Hard to come up with new models: no time, no skill sets!
5. | 5
Four Types of Repositories:
[Figure: research workflow diagram. Labels: Research Question, Object of Study, Raw Data, Processed Data, Data With Paper, Curated Record, Publication; Method, Analysis, Tables/Figures, Curate; Methods, Software]
• Local Storage/Instrument Repositories (Size: PB; Nr of files: Trillions)
- NOAA: 20 TB/
- NASA streaming > 24 PB/day
- NASA Reverb: 12 PB Data
- NSSD: > 230 TB of digital data
- NSIDC: 1 PB data, : 1 PB total
- ALMA Telescope: 40 TB/day
• Institutional/Local Repositories (Size: GB; Nr of files: Billions)
- Deep Blue (UMich): 80 k
- MIT DSpace: 75 k
- HAL (France): 60 k
- D-Space Cambr: 1.5 k
- Of which data: hundreds
• Non-Domain Repositories (Size: MB; Nr of files: Millions)
- Figshare: 1.2 M
- DataDryad: 3 k
- Dataverse: 58 k
• Domain Repositories (Size: kB; Nr of files: 100ks)
- PetDB: 6 k
- PDB: 100 k
- NIST ASD: 170 k
6. | 6
Where is data sharing happening?
YES:
• Astronomy: telescopes
• High-energy physics: accelerators
• Earth science: satellites
• Social science: censuses
• Medicine (sometimes): patient data in large studies
• Life science: sequence data
What these have in common:
• Big equipment that no single lab/person can run
• Can’t do science without it
• Tools in place to be effective
NO:
• Low-temperature physics: cryostats
• Earth science: samples
• Materials science: catalysts, microscopes, etc.
• Social science: interviews
• Medicine: individual patient data
• Neuroscience: microscope
What these have in common:
• Small equipment that a single lab/person can run
• Can do science without sharing
• No effective tools in place
[Figure: research cycle. Labels: Communicate, Prepare, Observe, Analyze, Ponder]
10. | 10
A small change for small science: Urban Legend [2]
• Encourage sharing of raw data files + experimental metadata
• Add metadata to your experiment while you’re performing it
• Improved data practices made the lab more productive and more creative, and
enabled effective and novel collaborations
• Lesson: split the data storage and curation from data sharing!
- Provide direct reward for storage: now we can find our own data!
- Enable simple upload to an embargoed data set when the owner is ready.
[2] Tripathy et al, 2014: http://www.frontiersin.org/10.3389/conf.fninf.2014.18.00077/event_abstract
13. | 13
Addressing the fear of scooping with embargoes:
[Figure: numbered arrows (1-4) connecting Researcher, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper]
1. Researcher creates datasets and posts to repository
(under embargo – not publicly viewable)
2. Funder is automatically notified of dataset posting
3. Researcher writes paper & publishes in journal; embargo is lifted and data linked
- NB this also allows release of unused data for negative results and reproducibility
4. Funder and institution get report on publication and embargo lifting
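The four-step embargo workflow on this slide can be sketched as a small state model. This is only an illustration under stated assumptions: the class names, field names and the in-memory "notifications" list are hypothetical and do not describe any existing repository system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    """A deposited dataset; all field names are illustrative."""
    dataset_id: str
    owner: str
    embargoed: bool = True            # step 1: not publicly viewable on deposit
    linked_paper: Optional[str] = None


@dataclass
class Repository:
    """Toy in-memory repository that records notifications instead of sending them."""
    datasets: Dict[str, Dataset] = field(default_factory=dict)
    notifications: List[str] = field(default_factory=list)

    def deposit(self, dataset_id: str, owner: str) -> Dataset:
        # Step 1: the researcher posts the dataset under embargo.
        ds = Dataset(dataset_id, owner)
        self.datasets[dataset_id] = ds
        # Step 2: the funder is notified automatically.
        self.notifications.append(f"funder: dataset {dataset_id} deposited")
        return ds

    def publish(self, dataset_id: str, paper_id: str) -> Dataset:
        # Step 3: the paper is published; lift the embargo and link data to paper.
        ds = self.datasets[dataset_id]
        ds.embargoed = False
        ds.linked_paper = paper_id
        # Step 4: funder and institution both get a report.
        for party in ("funder", "institution"):
            self.notifications.append(
                f"{party}: {dataset_id} released with paper {paper_id}"
            )
        return ds

    def is_public(self, dataset_id: str) -> bool:
        return not self.datasets[dataset_id].embargoed
```

The point of the design is that depositing and releasing are separate events: the researcher gets storage and a funder-visible record at deposit time, while public visibility waits until the paper is out.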
14. | 14
A System for Linking Data Links: Scholix
• ICSU-WDS/RDA Publishing Data Service Working group,
merged with National Data Service pilot
• Cross-stakeholder – with input from CrossRef, DataCite, OpenAIRE, Europe
PubMed Central, ANDS, PANGAEA, Thomson Reuters, Elsevier, and others
• Proposed long-term architecture and interoperability framework: www.scholix.org
• Operational prototype at http://dliservice.research-infrastructures.eu/#/api
(including 1.4 Million links from various sources)
• Making links between datasets and articles available could/should encourage
data citation and deposition
• Together with Force11 Data Citation Principles, encourage Research Object
citation/credit metrics.
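To make the idea of aggregating dataset-article links from several sources concrete, here is a minimal sketch. The `Link` record and its fields are assumptions made for illustration; they are not the Scholix schema or the prototype's API, and the identifiers used with it would be placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Link:
    """One article-dataset link; fields are illustrative, not the Scholix schema."""
    article_id: str   # e.g. the DOI of the paper
    dataset_id: str   # e.g. a DOI or accession number of the dataset
    provider: str     # who contributed the link (publisher, repository, ...)


def index_by_dataset(links: List[Link]) -> Dict[str, List[str]]:
    """Group citing articles by dataset, skipping links already reported by another provider."""
    seen = set()
    index: Dict[str, List[str]] = defaultdict(list)
    for link in links:
        key = (link.article_id, link.dataset_id)
        if key in seen:
            continue  # same link contributed by a second source
        seen.add(key)
        index[link.dataset_id].append(link.article_id)
    return dict(index)
```

An index like this is what would let a repository show which papers use a dataset, which is the kind of visibility that could encourage data citation and deposition.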
15. | 15
A System for A New Data Economics: NIH Data Commons
[Figure (Phil Bourne, Dec15). Labels: NIH; BD2K; Investigator; The Commons; Cloud Provider A; Cloud Provider B; Discovery Index; Search Engines; User. Arrows: "Provides credits"; "Uses credits in the Commons"; "Indexes"; "Enables Search"; "Option: Direct Funding"]
16. | 16
Drivers for Data Sharing: A Study in Behavioral Economics
• Study scholarly reward systems from point of view of economics
• Develop economic model for entire scholarly rewards ecosystem:
career, prestige, tenure, finances, etc
• Two intended outcomes:
- Understanding current behavior with respect to data sharing: can we
explain what we see, and the differences between different domains?
- Theoretical foundation for recommendations for policies and practices to
stakeholders such as funders, publishers and standards bodies
• Small group working on it, planning first meeting:
- Mike Huerta (NLM), Micah Altman (MIT), Fran Berman (RPI), Carol
Tenopir (TN), Carole Palmer (UW), Greg Gordon (SSRN).
• Thoughts, join?
17. | 17
In summary:
• The Economy of Science: pies vs. songs
- RDA Data Repositories Cost Recovery IG:
- Different types of repositories, different types of science
- Need to move from ‘small’ to ‘big’ science thinking
• Some examples of successful data sharing:
- Online electronic lab notebooks: making it too easy not to use
- RDA Scholix: linking systems of links using existing technology
- The NIH Data Commons: enabling a data economy in practice
• Some things we can do:
- Embargo pilots: circumvent the fear of scooping
- Drivers for data sharing report: science is a human endeavor
IUPAC recommends which term to use for a given property, but the vocabulary is not very accessible or usable, so it is not universally implemented. Each site decides for itself how to label a given property, which hinders indexing and reuse of data across silos. Structured capture of information in an ELN such as Hivebench lets the researcher report data using a consistent vocabulary without extra effort.
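As a rough illustration of what a consistent vocabulary buys, the sketch below maps site-specific property labels onto one preferred term at capture time. The mapping table and the function name are hypothetical; the terms are placeholders and are not taken from an actual IUPAC vocabulary or from Hivebench.

```python
from typing import Dict

# Hypothetical mapping from site-specific labels to one preferred term.
# These entries are placeholders, not an actual IUPAC vocabulary.
PREFERRED_TERM: Dict[str, str] = {
    "mp": "melting point",
    "melting pt": "melting point",
    "melting point": "melting point",
    "bp": "boiling point",
    "boiling pt": "boiling point",
    "boiling point": "boiling point",
}


def normalize_record(record: Dict[str, float]) -> Dict[str, float]:
    """Rename known property labels to the preferred term; keep unknown labels unchanged."""
    normalized: Dict[str, float] = {}
    for label, value in record.items():
        key = label.strip().lower()   # ignore case and surrounding whitespace
        normalized[PREFERRED_TERM.get(key, label)] = value
    return normalized
```

Once every record uses the same label for the same property, records from different sites can be indexed and compared without per-site special cases.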