Scott Edmunds presentation on: Reproducible method and benchmarking publishing for the data (and evidence) driven era. The Silk Road Forensics Conference, Yantai, 18th September 2018
1. Reproducible method and benchmarking publishing for the data (and evidence) driven era.
Scott Edmunds
SRFC Conference, Yantai
18th September 2018
Reproducible methods and publishing standards in the data (evidence)-driven era
3. Forensic Bioinformatics: where raw data and reported results
are used to reconstruct what the methods must have been.
https://retractionwatch.com/2011/05/04/the-importance-of-being-reproducible-keith-baggerly-tells-the-anil-potti-story/
How not to regain trust?
The abyss of lost trust?
4. Buckheit & Donoho: Scholarly articles are merely advertisement of
scholarship. The actual scholarly artifacts, i.e. the data and
computational methods, which support the scholarship, remain largely
inaccessible.
5. Provide evidence not advertising
Transparency or bust
Show me the peer reviews
Give me the data/ code/protocols
Let me publish replication studies
How to regain trust?
Let the evidence speak
6. GigaScience Ethos/Policies: ‘Impact’ is subjective. Data is quantitative.
Reward evidence (data), not advertising
• Data
• Software
• Models
• Pipelines
• Reviews
• Re-use…
= Credit
7. Rewarding open data & code
http://gigasciencejournal.com/
Since July 2012. Publishes “Data Notes” for CC0 data and “Tech Notes” for OSI-licensed open-source software.
8. Integrated GigaDB repository. DataCite DOIs. No size limits; the article processing charge (APC) covers storage.
http://gigadb.org/
11. Rewarding & enabling interaction
Building tools on top of datasets (including JBrowse for genomes and Sketchfab for 3D images)…
Code Ocean widgets for code: a “compute capsule” (data + code + environment) that runs on AWS
13. First journal with deep integration with protocols.io
Launched 2nd June 2016
Reward better handling of “wet” protocols…
• Create, share and modify forkable protocols in a repository.
• Download & run them on the smartphone app.
• Widgets embedded in GigaDB
• Gain discoverability, credit and DOIs for shared methods.
• Create your own, or let us set them up for you to claim.
https://www.protocols.io/groups/gigascience-journal
16. To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public domain
under a CC0 license. Until the publication of research papers on the
assembly and whole-genome analysis of this isolate we would ask you to
cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang,
J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J;
Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X;
Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the
Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium
(2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI
Shenzhen. doi:10.5524/100001
Our first DOI: http://dx.doi.org/10.5524/100001
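The dataset citation gives the same DataCite DOI in two forms, `doi:10.5524/100001` and `http://dx.doi.org/10.5524/100001`, and later slides use the `https://doi.org/...` form. As a minimal sketch, assuming only these three spellings need handling, the forms can be normalized to one resolver URL. `doi_to_url` is a hypothetical helper written for illustration, not part of GigaDB or DataCite tooling, and it only rewrites strings; it does not contact the resolver.

```python
import re

# Assumption: only the DOI spellings that appear in these slides are handled
# ("doi:" prefix, http(s)://dx.doi.org/ and http(s)://doi.org/).
_DOI_PREFIX = re.compile(r"^(?:doi:|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def doi_to_url(identifier: str) -> str:
    """Hypothetical helper: normalize a DOI string to an https://doi.org/ URL."""
    doi = _DOI_PREFIX.sub("", identifier.strip())
    # Every DOI name starts with the "10." directory indicator.
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {identifier!r}")
    return "https://doi.org/" + doi


if __name__ == "__main__":
    for form in ("doi:10.5524/100001", "http://dx.doi.org/10.5524/100001"):
        print(doi_to_url(form))
```

Both inputs print `https://doi.org/10.5524/100001`, which is why a dataset DOI can be cited in plain text and still resolve to a single landing page.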
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
Open Data to the rescue…
20. Downstream consequences:
“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli
strain that infected roughly 4,000 people in Germany between May and July. But he knew that it might take days
for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could
use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that
allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and
publish their work without wasting time on legal wrangling.”
1. Many citations
2. Therapeutics (primers, antimicrobials)
3. Platform comparisons
4. An example of faster & more open science
21. 1.3 The power of intelligently open data
The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastrointestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin–producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.
23. Oxford Nanopore in the spotlight, Sept 2014. Does it work?
https://doi.org/10.1111/1755-0998.12324
http://omicsomics.blogspot.com/2014/09/oxford-takes-some-flak-fires-back.html
Oxford Nanopore, launched in September 2014: is it any good?
24. Nanopore MinION E. coli genome
released via GigaDB 10-Sep-2014
Curated & converted to ISA-Tab; worked with EBI to deposit the raw data there
Data Note submitted & preprint version
out 26-Sept-2014
Peer reviewed & published 20-Oct-2014
http://dx.doi.org/10.5524/100102
27. Try before you buy: inspect ALL the data yourselves
https://doi.org/10.1093/gigascience/gix024
• Comparisons with Illumina at PE50, PE100 & PE150
• Raw sequencing data in NCBI SRA
• FASTQ files in GigaDB
• Raw image files also shared
Would you trust a BGI sequencer?
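Because the FASTQ files are shared in GigaDB, anyone can run a first sanity check before trusting the published comparisons. The sketch below is one minimal way to do that in Python; it is not the benchmarking method used in the cited paper. It assumes plain 4-line FASTQ records (no wrapped sequence lines) and Phred+33 quality encoding, and `read_fastq` and `summarize` are hypothetical names chosen for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line-per-record FASTQ text."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        # A truncated or malformed record fails one of these checks.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()}")
        yield header[1:].strip(), seq, qual


def summarize(handle: TextIO) -> dict:
    """Count reads and bases; report mean read length and mean base quality."""
    n_reads = n_bases = qual_sum = 0
    for _name, seq, qual in read_fastq(handle):
        n_reads += 1
        n_bases += len(seq)
        qual_sum += sum(ord(c) - PHRED_OFFSET for c in qual)
    return {
        "reads": n_reads,
        "bases": n_bases,
        "mean_length": n_bases / n_reads if n_reads else 0.0,
        "mean_quality": qual_sum / n_bases if n_bases else 0.0,
    }


if __name__ == "__main__":
    # Tiny in-memory example; for a real file use open(path) instead.
    example = "@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n##\n"
    print(summarize(io.StringIO(example)))
```

Streaming one record at a time keeps memory flat, so the same check works on multi-gigabyte FASTQ downloads; established QC tools give far richer reports, but even this level of inspection requires that the raw data be available at all.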
28. Open, transparent and peer-reviewed benchmarking
https://doi.org/10.1093/gigascience/gix024
http://dx.doi.org/10.5524/review.100698
http://dx.doi.org/10.5524/review.100699
Open Review
31. Transparency saves wildlife
User-friendly pipeline for the rapid identification of CITES-listed species in forensic samples using Illumina data.
• International validation trial by 16 laboratories.
• All input sequence data + results available in GigaDB.
• SOPs available in protocols.io.
https://doi.org/10.1093/gigascience/gix080
32. Open Science = Science
• Science is needed more than ever to tackle grave
environmental challenges and fight crime
• Stand on the shoulders of giants, and allow others
to stand on yours
• Choose evidence not branding
• Being closed provokes distrust, prevents
downstream use, and ultimately harms science
• Being open helps science, your immediate
community, and ultimately your career
• Preempt new EU Open Science rules and Chinese Ministry of Science and Technology (MOST) rules on
“strengthening research integrity”…
http://most.gov.cn/mostinfo/xinxifenlei/fgzc/gfxwj/gfxwj2018/201805/t20180531_139731.htm
33. Help GigaScience make it happen
www.gigasciencejournal.com
Give us your data,
pipelines & papers
Contact us:
scott@gigasciencejournal.com
editorial@gigasciencejournal.com
database@gigasciencejournal.com
Spare people from having to do forensic
bioinformatics
Help GigaScience make the entire research process open
34. Thanks to:
Laurie Goodman, Editor in Chief
Nicole Nogoy, Editor
Hans Zauner, Assistant Editor
Hongling Zhao, Assistant Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Chris Armit, Data Scientist
Mary Ann Tulli, Data Editor
Xiao (Jesse) Si Zhe, Database Developer
Chen Qi, Shenzhen Office.
Follow us:
@GigaScience
facebook.com/GigaScience
http://gigasciencejournal.com/blog/
Weibo & WeChat
www.gigasciencejournal.com
www.gigadb.org