Scott Edmunds talk at IARC, Lyon. How can we make science more trustworthy and FAIR? Principled publishing for more evidence based research. 8th July 2019
1. How can we make science more trustworthy and FAIR? Principled publishing for more evidence-based research
The perspective of transparency…
Scott Edmunds
IARC, 8th July 2019
4. Scientists: need to convince public + politicians
https://www.newsweek.com/pruitt-trump-asbestos-chemicals-trump-962703
#FakeNews
#MakeCarcinogensGreatAgain
6. Forensic Bioinformatics: where raw data and reported results
are used to reconstruct what the methods must have been.
https://retractionwatch.com/2011/05/04/the-importance-of-being-reproducible-keith-baggerly-tells-the-anil-potti-story/
How not to regain trust?
8. Data Sharing with Chinese Characteristics
“The 2016 draft HGR Regulation further declares “safeguarding
national security” as one of its legislative purposes, with biosecurity as
a core element of national security.”
“Echoing the repeated warning against illegal seizures of genetic
resources by foreign entities, the drafters identify in particular
cross-border data transfer as a new and covert means of seizure. This
position is in a distinctive contrast with international consensus on
the imperative of genomic data sharing, as recognized under the
Bermuda Principles, the Fort Lauderdale Agreement, and initiatives of
building interoperable rules of sharing, such as the Framework for
Responsible Sharing of Genomic and Health-Related Data of the
GA4GH.”
China: concurring regulation of cross-border genomic data sharing for statist control and individual protection
https://doi.org/10.1007/s00439-018-1903-2
How not to regain trust?
9. https://doi.org/10.1007/s00439-018-1903-2
https://doi.org/10.1038/nature14659
“Based on the same rationale, the MOST launched nationwide audit campaigns in
2011 and 2013 to identify sino-overseas projects that are unauthorized or
uncompliant with state policies.
It is noteworthy that in February 2018, the CAHGR revoked the licenses granted to
two high-profile collaborative projects, which concern the Comparative Genetic Study
of Psychosis in Han Chinese (between UCLA & SJTU) and the Genetic Foundation of
Depression in Chinese Women (between Oxford Uni and PKU), respectively, and
confiscated the exported genomic data (CAHGR 2018). The revocation was made
pursuant to the Administrative License Law, but no specific reasons were disclosed in
the formal decision.”
Data Sharing with Chinese Characteristics
How not to regain trust?
10. https://www.nature.com/articles/d41586-018-07222-2
The ministry says genomics giant BGI in Shenzhen and Shanghai’s Huashan Hospital were
caught breaking the rules after they put genetic information online without approval. The
data was part of a large international study on the genetics of depression, which was
published in Nature in 2015. The paper was based on anonymized sequence data from more
than 10,000 Chinese women, which BGI acknowledges it did not have permission to publish
in the paper's supplementary material.
A spokesperson from BGI says the company has destroyed the data, as requested by the
ministry. They say the company has also requested Nature remove the article from its
website.
11. Open Data saves lives, but kills candidate gene studies
https://www.theatlantic.com/science/archive/2019/05/waste-1000-studies/589684/
12. Open Data saves lives, and size matters
+ Open Data (inc. Chinese CONVERGE data) =
https://doi.org/10.1176/appi.ajp.2018.18070881
13. Focusing on unscientific unreproducible metrics
Incentivising short-term citations
How not to regain trust?
14. JIFBAIT Network
more
GWAS
GWAS
JIFBAIT NEWS
Arsenic Life forms, will
they take over the planet?
By Melba Ketchum, PhD
Which Overhyped, Unreproducible
Experiment Are You?
Want rapid citations for 2 years only? Carry out this quiz.
You got: STAP Cells
Of course dipping cells in
coffee will make them
pluripotent. Even if the
research gets discredited, it’ll
still get 100’s of citations in
two years.
15. Publish or impoverish: An investigation of the monetary
reward system of science in China (1999-2016)
https://doi.org/10.1108/AJIM-01-2017-0014
http://www.szhrss.gov.cn/xxgk/zcfgjjd/gcjzyrcgl/201708/t20170831_8317284.htm
Scientists: what we are doing instead
Shenzhen "Peacock" "National leading talent scheme":
Science/Nature = ¥3M RMB, JCI Q1 = ¥1.6M RMB (1st & corresponding authors only)
16. Attempts to “game the peer-review system on an industrial
scale”
http://www.scientificamerican.com/article/for-sale-your-name-here-in-a-prestigious-science-journal/
Companies offering authorship of papers made to order by “paper
mills”.
Guaranteed publication in JIF journal, often using fake referees, ID
theft, etc. JIF 1-2 papers = ~$10,000 USD
In China publication + JIF = money = fraud
17. Do you want to be an author of an IF 5.168 paper (Oncotarget)?
Title: “…meta-analysis to evaluate the long-term efficacy of different ****
drugs in the treatment of pancreatic cancer…”
Scientists: what we are doing instead
http://www.518sci.com/index.php?catid=17&ydzt=0-9999&zdprice=0-9999
20. How to regain trust?
Areas we need to tackle to allow citizens to trust us
Open Access - Change incentive systems away from dead tree advertising to FAIR data & reproducibility
Open Science - Increase transparency & fill the data gaps
Citizen Science - Involve the public in the scientific process
21. Provide evidence not advertising
Transparency or bust
Show me the peer reviews
Give me the data/code/protocols
Let me publish replication studies
Buckheit & Donoho: Scholarly articles are merely advertisement of
scholarship. The actual scholarly artifacts, i.e. the data and
computational methods, which support the scholarship, remain largely
inaccessible.
How to regain trust?
22. GigaScience Ethos/Policies: ‘Impact’ is subjective. Data is quantitative.
Reward evidence (data), not advertising
• Data
• Software
• Models
• Pipelines
• Reviews
• Re-use…
= Credit
23. Rewarding open data & code
http://gigasciencejournal.com/
Since July 2012. Publishes “Data Notes” for CC0 data, “Tech Notes” for OSI software.
24. Integrated GigaDB repository. DataCite DOIs. No size limits, APC covers storage.
http://gigadb.org/
Rewarding open data & code
27. Rewarding & enabling interaction
Building tools (inc. JBrowse for genomes, Sketchfab for 3D images) on top of datasets…
Code Ocean widgets for code, “compute capsule” (data+code+environment) run on AWS
[Insert Widget Here]
28. Aiding reproducibility of imaging studies
OMERO: providing access to imaging data
Already used by JCB.
View, filter, measure raw images with direct links from journal article.
See all image data, not just cherry-picked examples.
Download and reprocess.
29. The zoom viewer allows whole-slide images to be explored at cellular resolution in the
context of a web browser, and without need for data download.
This example shows a lymph node section from a breast cancer patient.
These data are available at: http://dx.doi.org/10.5524/100439
31. First journal with deep integration with
Launched 2nd June 2016
Reward better handling of “wet” protocols…
• Create, share, modify forkable protocols in repo.
• Download & run on smartphone app.
• Widgets embedded in GigaDB
• Get discoverability, credit, DOIs for sharing methods.
• Create your own, or let us set up & you claim.
https://www.protocols.io/groups/gigascience-journal
34. To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public domain
under a CC0 license. Until the publication of research papers on the
assembly and whole-genome analysis of this isolate we would ask you to
cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang,
J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J;
Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X;
Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the
Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium
(2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI
Shenzhen. doi:10.5524/100001
http://dx.doi.org/10.5524/100001
Our first DOI:
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
Open Data to the rescue…
38. Downstream consequences:
“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli
strain that infected roughly 4,000 people in Germany between May and July. But he knew it might take days
for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could
use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that
allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and
publish their work without wasting time on legal wrangling.”
(1) Many Citations (2) Therapeutics (primers, antimicrobials) (3) Platform Comparisons (4) Example for faster & more open science
39. 1.3 The power of intelligently open data
The benefits of intelligently open data were powerfully
illustrated by events following an outbreak of a severe gastro-intestinal
infection in Hamburg in Germany in May 2011. This
spread through several European countries and the US,
affecting about 4000 people and resulting in over 50 deaths. All
tested positive for an unusual and little-known Shiga-toxin–producing
E. coli bacterium. The strain was initially analysed by
scientists at BGI-Shenzhen in China, working together with
those in Hamburg, and three days later a draft genome was
released under an open data licence. This generated interest
from bioinformaticians on four continents. 24 hours after the
release of the genome it had been assembled. Within a week
two dozen reports had been filed on an open-source site
dedicated to the analysis of the strain. These analyses
provided crucial information about the strain’s virulence and
resistance genes – how it spreads and which antibiotics are
effective against it. They produced results in time to help
contain the outbreak. By July 2011, scientists published papers
based on this work. By opening up their early sequencing
results to international collaboration, researchers in Hamburg
produced results that were quickly tested by a wide range of
experts, used to produce new knowledge and ultimately to
control a public health emergency.
41. Oxford Nanopore in the spotlight, Sept 2014. Does it work?
https://doi.org/10.1111/1755-0998.12324
http://omicsomics.blogspot.com/2014/09/oxford-takes-some-flak-fires-back.html
42. Nanopore MinION E. coli genome released via GigaDB 10-Sep-2014
Curated & converted to ISA-tab, & worked with EBI to get raw data there
Data Note submitted & preprint version out 26-Sept-2014
Peer reviewed & published 20-Oct-2014
http://dx.doi.org/10.5524/100102
44. Try before you buy: inspect ALL the data yourselves
https://doi.org/10.1093/gigascience/gix024
• Comparisons with Illumina for PE50, 100 & 150
• Raw sequencing data in NCBI SRA
• FASTQ files in GigaDB
• Raw image files & protocols shared
Would you trust a Chinese sequencer?
45. Open, transparent and peer reviewed benchmarking
https://doi.org/10.1093/gigascience/gix024
http://dx.doi.org/10.5524/review.100698
http://dx.doi.org/10.5524/review.100699
Open Review
Would you trust a Chinese sequencer?
50. A mnemonic to remember: FAIR
http://www.nature.com/articles/sdata201618
http://www.datafairport.org/
Require stewardship on top of access
52. Beyond a mnemonic: FAIR ecosystems
FAIR metrics
https://www.go-fair.org/go-fair-initiative/
53. Beyond a mnemonic: FAIR Evaluation
Evaluating FAIR-Compliance Through an Objective, Automated, Community-Governed
Framework https://www.biorxiv.org/content/early/2018/09/16/418376
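As a rough illustration of what an automated FAIR check might look like, the sketch below runs four simplified tests on a dataset metadata record, one per letter of FAIR. The record fields, the function names and the pass criteria are all assumptions made for illustration; they are not the metrics defined by the framework cited above. The DOI and CC0 licence in the example come from the E. coli dataset shown earlier in the talk, and the empty metadata format is a deliberate gap to show a failing check.

```python
# Toy FAIR self-check. Field names and pass criteria are hypothetical,
# not the metrics of any real FAIR evaluation framework.

def fair_checks(record):
    """Return a dict mapping each FAIR letter to True/False for one record."""
    identifier = record.get("identifier", "")
    return {
        # Findable: the record carries a DOI-style persistent identifier.
        "F": identifier.startswith(("doi:", "https://doi.org/")),
        # Accessible: an http(s) landing page is given.
        "A": record.get("url", "").startswith(("http://", "https://")),
        # Interoperable: the metadata declares a named format.
        "I": bool(record.get("metadata_format")),
        # Reusable: an explicit licence is stated.
        "R": bool(record.get("license")),
    }


def fair_score(record):
    """Fraction of the four toy checks that pass (0.0 to 1.0)."""
    results = fair_checks(record)
    return sum(results.values()) / len(results)


example = {
    "identifier": "doi:10.5524/100001",
    "url": "http://dx.doi.org/10.5524/100001",
    "metadata_format": "",  # deliberately left empty in this example
    "license": "CC0",
}
print(fair_checks(example))
print(fair_score(example))
```

A real assessment would test far more than field presence (for example whether the identifier actually resolves and whether the metadata is machine-readable), so a score from a sketch like this is only a prompt for where to look.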
54. More FAIR mnemonics: “BYODs”
DTL/ELIXIR-NL “Bring Your Own Data Party”
GigaScience/BGI HK Metabolomics ISA-TAB-athon
56. Open Science, the final frontier:
democratising research for citizens
The Hong Kong example…
57. The Bauhinia Mystery?
HK Botanical & Afforestation Dept., 1903:
"The mysterious origin of the tree & its magnificent flowers at once arrest the interest. So far, all efforts to identify them with any foreign species have failed"
64. Education: sharing FAIR data
http://dx.doi.org/10.5524/100245
http://dx.doi.org/10.5524/100345
65. Student power (MSc @ CUHK)
Education: teaching people with the data
Transcriptomes assembled & annotated by MSc students
Looked at GO/KEGG & TCM compounds
Looked at parental links (diversity, maternal/paternal)
B. purpurea = mother, B. variegata = father
66. Open Science = Science
• Science needed more than ever to tackle grave public health challenges
• Need to escape from our ivory towers & increase transparency to regain stakeholder trust
• Take science back to standing on the shoulders of giants, rather than unFAIR practices
• Choose evidence not branding
• Once we have Open Data, we then need FAIR data stewardship
• New EU funder rules on open science/OA coming – preempt FAIR assessment
67. Help GigaScience make it happen
www.gigasciencejournal.com
Give us your data, pipelines & papers
Contact us:
scott@gigasciencejournal.com
editorial@gigasciencejournal.com
database@gigasciencejournal.com
68. Thanks to:
Laurie Goodman, Editor in Chief
Nicole Nogoy, Editor
Hans Zauner, Assistant Editor
Hongling Zhao, Assistant Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Chris Armit, Data Scientist
Mary Ann Tulli, Data Editor
Xiao (Jesse) Si Zhe, Database Developer
Chen Qi, Shenzhen Office.
Follow us:
@GigaScience
facebook.com/GigaScience
http://gigasciencejournal.com/blog/
+ Weibo & WeChat
www.gigasciencejournal.com
www.gigadb.org