Scott Edmunds: Using FAIR principles for more Open & Democratic Science
1. Using FAIR principles for more Open & Democratic Science
'If I have seen further it is by standing on the shoulders of giants.'
Scott Edmunds
Dobzhansky Center, 24th August 2017
3. Buckheit & Donoho: Scholarly articles are merely advertisements of scholarship. The actual scholarly artifacts, i.e., the data and computational methods that support the scholarship, remain largely inaccessible.
5. Scientists: what we are doing instead
Focusing on unscientific, unreproducible metrics
Incentivising short-term citations
6. JIFBAIT Network
JIFBAIT NEWS
Arsenic life forms: will they take over the planet?
By Melba Ketchum, PhD
Which Overhyped, Unreproducible Experiment Are You?
Want rapid citations for 2 years only? Take this quiz.
You got: STAP Cells
Of course dipping cells in coffee will make them pluripotent. Even if the research gets discredited, it'll still get 100s of citations in two years.
7. Publish or impoverish: An investigation of the monetary reward system of science in China (1999-2016)
https://arxiv.org/abs/1707.01162
8. Attempts to “game the peer-review system on an industrial scale”
Companies offering authorship of papers made to order by “paper mills”¹. Ghostwriting of medical papers by pharma is common².
Guaranteed publication in JIF journals, often using fake referees, ID theft, etc.
1. http://www.scientificamerican.com/article/for-sale-your-name-here-in-a-prestigious-science-journal/
2. http://www.grassley.senate.gov/sites/default/files/about/upload/Senator-Grassley-Report.pdf
14. The Solution: Open Access
“By ‘open access’ to [peer-reviewed research literature], we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.”
Budapest Open Access Initiative:
• Maximizes reuse and access
• Gives authors control over the integrity of their work and the right
to be properly acknowledged and cited.
• “Real” OA imposes no restrictions or limitations beyond attribution, i.e. CC-BY
16. • Review
• Data
• Software
• Models
• Pipelines
• Re-use…
All of the above = Credit
Credit where credit is overdue:
“One option would be to provide researchers who release data to public repositories with a means of accreditation.”
“An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.”
Nature Biotechnology 27, 579 (2009)
New incentives/credit
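The attribution idea in the second quote amounts to a reverse index: from "paper cites these data sets" to "data set is cited by these papers". Below is a minimal Python sketch of that inversion. The function names and paper IDs are hypothetical, and it assumes citation records already list dataset DOIs; real citation services are far more involved.

```python
from collections import defaultdict
from typing import Dict, List


def build_dataset_index(papers: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a paper -> cited-dataset-DOIs mapping into dataset DOI -> citing papers."""
    index: Dict[str, List[str]] = defaultdict(list)
    for paper_id, dataset_dois in papers.items():
        # DOIs are case-insensitive, so normalise to lower case; the set
        # also drops duplicate citations of the same data set within one paper.
        for doi in {d.strip().lower() for d in dataset_dois}:
            index[doi].append(paper_id)
    # Sorted lists give stable, reproducible output.
    return {doi: sorted(ids) for doi, ids in index.items()}


def papers_using(index: Dict[str, List[str]], doi: str) -> List[str]:
    """Every indexed paper that cites the given data set (empty list if none)."""
    return index.get(doi.strip().lower(), [])


if __name__ == "__main__":
    # Hypothetical citation records; the paper IDs are made up for illustration.
    citations = {
        "paper-A": ["10.5524/100044"],
        "paper-B": ["10.5524/100044", "10.5524/100038"],
        "paper-C": ["10.5524/100038"],
    }
    idx = build_dataset_index(citations)
    print(papers_using(idx, "10.5524/100044"))  # ['paper-A', 'paper-B']
```

Once such an index exists, crediting the data producer is a single lookup per data set.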
27. Research Objects: a concept & model
http://www.researchobject.org/
• Supporting publication of more than just PDFs, making data, code & other resources first-class citizens of scholarship.
• Recognizing that there is often a need to publish collections of these resources together as one shareable, citable resource.
• Enriching these resources and collections with any & all additional information required to make research reusable & reproducible!
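To make the three bullets concrete, here is a minimal Python sketch of a research object as one citable bundle of resources, with a check for resources that still lack the metadata needed for reuse. This is a hypothetical toy model, not the researchobject.org specification; the class names, fields and example identifiers are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    """One item aggregated by a research object: a paper, data set, code, workflow..."""
    path: str
    kind: str                          # e.g. "data", "software", "workflow", "paper"
    identifier: Optional[str] = None   # e.g. a DOI, if one has been minted
    license: Optional[str] = None      # e.g. "CC0", "CC-BY", "GPL-3.0"


@dataclass
class ResearchObject:
    """Hypothetical minimal bundle of resources shared and cited as one unit."""
    identifier: str
    title: str
    resources: List[Resource] = field(default_factory=list)

    def add(self, resource: Resource) -> "ResearchObject":
        self.resources.append(resource)
        return self  # allow chaining

    def missing_metadata(self) -> List[str]:
        """Paths of resources still lacking an identifier or a licence."""
        return [r.path for r in self.resources if not (r.identifier and r.license)]


if __name__ == "__main__":
    # All identifiers below are placeholders, not real DOIs.
    ro = ResearchObject("hypothetical-ro-001", "Assembly study: paper, data & code")
    ro.add(Resource("paper.pdf", "paper", "doi:10.0000/example-paper", "CC-BY"))
    ro.add(Resource("reads/", "data", "doi:10.0000/example-data", "CC0"))
    ro.add(Resource("assembler/", "software", None, "GPL-3.0"))
    print(ro.missing_metadata())  # ['assembler/']
```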
30. First journal with deep integration with protocols.io
Launched 2nd June 2016
Reward better handling of “wet” protocols…
• Create, share & modify forkable protocols in a repository.
• Download & run them on the smartphone app.
• Get discoverability, credit & DOIs for sharing methods.
• Create your own, or let us set them up & you claim them.
http://protocols.io/
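"Forkable" means anyone can copy a protocol, modify it and republish it while keeping a link back to the original for credit. The Python sketch below illustrates that idea under stated assumptions: the `Protocol` class and its `fork`/`revise` methods are hypothetical and do not represent the protocols.io data model or API. Versions are immutable, so every published version stays citable.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Protocol:
    """Hypothetical forkable, versioned protocol (not the protocols.io data model)."""
    title: str
    author: str
    steps: Tuple[str, ...]
    version: int = 1
    forked_from: Optional[str] = None  # reference to the parent, kept for credit

    def ref(self) -> str:
        return f"{self.author}/{self.title}@v{self.version}"

    def fork(self, new_author: str) -> "Protocol":
        """Copy the steps under a new author, recording where they came from."""
        return Protocol(self.title, new_author, self.steps, forked_from=self.ref())

    def revise(self, steps: List[str]) -> "Protocol":
        """Publish a new version instead of editing in place, so old versions stay citable."""
        return Protocol(self.title, self.author, tuple(steps),
                        version=self.version + 1, forked_from=self.forked_from)


if __name__ == "__main__":
    base = Protocol("DNA extraction", "lab-a", ("Lyse cells", "Bind to column", "Elute"))
    mine = base.fork("lab-b").revise(["Lyse cells", "Bind to column", "Wash twice", "Elute"])
    print(mine.ref(), "<-", mine.forked_from)
    # lab-b/DNA extraction@v2 <- lab-a/DNA extraction@v1
```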
31. https://codeocean.com/
New Integration: Code Ocean
Cloud-based executable research platform
Browse, share & run code on AWS
Creates a compute capsule: an encapsulation of the data, code, and computation environment
Integrated into the paper, shared via DOIs
First examples just published in GigaScience
Plugin integrated into GigaDB
Share your code this way!
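A compute capsule only helps if it pins down exactly which data, code and environment produced a result. The Python sketch below records that information as a manifest: SHA-256 checksums for input files, plus the Python version, platform and installed package versions. It is an illustrative assumption about what such a record could contain. It is not Code Ocean's capsule format, and the function names are hypothetical.

```python
import hashlib
import json
import platform
import tempfile
from importlib import metadata
from pathlib import Path
from typing import Dict, List, Optional


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large data sets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capsule_manifest(files: List[Path], packages: List[str]) -> dict:
    """Record data/code checksums plus the software environment used."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # make a missing dependency explicit
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "files": {str(p): sha256_of(p) for p in files},
        "packages": versions,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "reads.fa"
        data.write_text(">r1\nACGT\n")
        print(json.dumps(capsule_manifest([data], ["numpy"]), indent=2))
```

A reader can then re-hash the shared files and compare package versions before trusting that a rerun is like-for-like.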
33. A mnemonic to remember: FAIR (Findable, Accessible, Interoperable, Reusable)
http://www.nature.com/articles/sdata201618
http://www.datafairport.org/
Lots of models/standards/guidelines
Where does that leave us?
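One practical response to "lots of models/standards/guidelines" is to start with crude, automatable checks for each letter. The Python sketch below scores a dataset record with one simple proxy per letter: a DOI and title for Findable, a download URL for Accessible, an open file format for Interoperable, and a licence for Reusable. These proxies, the format list and the record fields are hypothetical simplifications. They are not the official FAIR principles or any published FAIR metric.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
OPEN_FORMATS = {"csv", "tsv", "fasta", "fastq", "json", "vcf"}  # illustrative, not exhaustive


@dataclass
class DatasetRecord:
    doi: Optional[str]
    title: Optional[str]
    license: Optional[str]
    file_format: Optional[str]
    download_url: Optional[str]


def fair_report(rec: DatasetRecord) -> Dict[str, bool]:
    """Crude, hypothetical proxy per FAIR letter (not an official FAIR metric)."""
    url = rec.download_url or ""
    return {
        "Findable": bool(rec.doi and DOI_PATTERN.match(rec.doi) and rec.title),
        "Accessible": url.startswith(("http://", "https://", "ftp://")),
        "Interoperable": (rec.file_format or "").lower() in OPEN_FORMATS,
        "Reusable": bool(rec.license),
    }


if __name__ == "__main__":
    # Placeholder record; the URL is an example, not a real download location.
    rec = DatasetRecord("10.5524/100044", "Example data set", "CC0", "FASTQ",
                        "ftp://example.org/data/")
    print(fair_report(rec))
```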
39. How FAIR can we get?
Data sets & analyses:
• Open-Paper: DOI:10.1186/2047-217X-1-18, >50,000 accesses & 885 citations
• Open-Review: 7 reviewers tested the data on the FTP server & the named reports were published
• Open-Code: code on SourceForge under GPLv3 (http://soapdenovo2.sourceforge.net/), >40,000 downloads; this enabled bloggers to pick the code apart in a wiki (http://homolog.us/wiki/index.php?title=SOAPdenovo2)
• Open-Pipelines & Open-Workflows
• Open-Data: 78GB of CC0 data
• Dataset DOIs: DOI:10.5524/100044, DOI:10.5524/100038
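Each object on this slide is findable through a DOI, and any DOI can be turned into a link via the doi.org resolver. The small Python helper below normalises the "DOI:..." notation used here into resolver URLs. The function name is hypothetical. The helper does not percent-encode unusual characters such as '#', which some DOIs may need.

```python
def doi_to_url(ref: str) -> str:
    """Turn 'DOI:10.5524/100044'-style strings into a https://doi.org resolver link."""
    doi = ref.strip()
    # Strip any common prefix; the comparison ignores case ("DOI:" vs "doi:").
    for prefix in ("https://doi.org/", "http://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {ref!r}")
    # Note: characters such as '#' would need percent-encoding; not handled here.
    return "https://doi.org/" + doi


if __name__ == "__main__":
    for ref in ("DOI:10.1186/2047-217X-1-18", "DOI:10.5524/100044", "DOI:10.5524/100038"):
        print(doi_to_url(ref))
```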
41. The SOAPdenovo2 case study
Subjected to and tested with 3 models.
Types of resources in an RO, and the models used to describe each resource type:
• Data: ISA-TAB/ISA2OWL
• Method/Experimental protocol: Wfdesc/ISA-TAB/ISA2OWL
• Findings: Nanopublication
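Of the three models, the nanopublication is the one aimed at findings. It packages a single assertion together with its provenance and publication information. Real nanopublications are RDF named graphs. The Python sketch below only mirrors their three-part structure with plain triples. The `ex:` vocabulary and URIs are hypothetical. The 45-fold figure is the corrected scaffold N50 result from the next slide, used purely as an example claim.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Nanopublication:
    """Conceptual sketch of a nanopublication's parts; a real one is RDF named graphs."""
    uri: str
    assertion: List[Triple] = field(default_factory=list)   # the finding itself
    provenance: List[Triple] = field(default_factory=list)  # where the finding came from
    pubinfo: List[Triple] = field(default_factory=list)     # who published this nanopub, when

    def is_complete(self) -> bool:
        # Here we treat all three parts as required to be non-empty.
        return bool(self.assertion and self.provenance and self.pubinfo)


if __name__ == "__main__":
    np1 = Nanopublication(
        "ex:np1",
        assertion=[("ex:SOAPdenovo2_scaffold_N50", "ex:foldLongerThanSOAPdenovo1", "45")],
        provenance=[("ex:np1#assertion", "prov:wasDerivedFrom",
                     "https://doi.org/10.1186/2047-217X-1-18")],
        pubinfo=[("ex:np1", "dcterms:creator", "ex:some-curator")],
    )
    print(np1.is_complete())  # True
```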
43.
1. While there are huge improvements to the quality of the resulting assemblies, other than in the tables it was not stressed in the text that SOAPdenovo2 can be slightly slower than SOAPdenovo v1.
2. In the testing and assessment section (page 3), based on the correct results in Table 2, where we say the scaffold N50 metric is an order of magnitude longer from SOAPdenovo2 versus SOAPdenovo1, this was actually 45 times longer.
3. Also in the testing and assessment section, based on the correct results in Table 2, where we say SOAPdenovo2 produced a contig N50 1.53 times longer than ALL-PATHS, this should be 2.18 times longer.
4. Finally, in this section, where we say the correct assembly length produced by SOAPdenovo2 was 3-80 fold longer than SOAPdenovo1, this should be 3-64 fold longer.
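Most of these corrections concern N50, the assembly statistic defined as the length L such that contigs (or scaffolds) of length at least L together cover at least half of the total assembly length. The minimal Python sketch below computes N50 from a list of sequence lengths. The contig lengths in the usage example are made up for illustration and are not SOAPdenovo results.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("no sequences")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: running total always reaches half")


if __name__ == "__main__":
    # Hypothetical contig lengths for two assemblies of the same 1,500 bp.
    old_assembly = [500, 400, 300, 200, 100]
    new_assembly = [1200, 300]
    print(n50(old_assembly), n50(new_assembly))                  # 400 1200
    print(f"{n50(new_assembly) / n50(old_assembly):.1f}x longer")  # 3.0x longer
```

A fold difference such as "45 times longer" is simply the ratio of two such N50 values. Because 45 is well above 10, it is more than an order of magnitude, which is why the original wording understated the result.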
44. Lessons Learned
• Most published research findings are false, or at least have errors
• With enough effort, it is possible to push button(s) & recreate a result from a paper with current tools
• Being FAIR can be COSTLY. How much are you willing to spend? Who will build FAIR infrastructure?
• It is much easier to make things FAIR before publication than after. BYODs (Bring Your Own Data workshops) are a useful intermediate here
50. The Bauhinia Mystery?
"The mysterious origin of the tree & its magnificent flowers at once arrest the interest. So far, all efforts to identify them with any foreign species have failed."
HK Botanical & Afforestation Dept., 1903
64. Student power (MSc @ CUHK)
Education: teaching people with the data
Transcriptomes assembled & annotated by students
Looked at GO/KEGG & TCM compounds
Looked at parental links (diversity, maternal/paternal)
67. www.gigasciencejournal.com
Give us your data, papers & pipelines
Help GigaPanda make it happen!
Contact us:
scott@gigasciencejournal.com
editorial@gigasciencejournal.com
database@gigasciencejournal.com
68. Thanks to:
Laurie Goodman, Editor in Chief
Nicole Nogoy, Editor
Hans Zauner, Assistant Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Xiao (Jesse) Si Zhe, Database Developer
Chen Qi, Shenzhen Office.
All of BGI
Follow us:
@GigaScience
facebook.com/GigaScience
blogs.biomedcentral.com/gigablog/
www.gigasciencejournal.com
www.gigadb.org
+ Weibo & WeChat