Laurie Goodman at the BMC Roadshow: Transparency in Publishing and Being an Open Scientist
1. Transparency in Publishing and
Being an Open Scientist
Laurie Goodman, PhD
Editor-in-Chief, GigaScience
Laurie@gigasciencejournal.com
ORCID ID: 0000-0001-9724-5976
2. Open Access is Only the Beginning
• Open Access serves as a foundation for a
specific way of thinking about scientific
communication and the role of journals and
publishers
• It is a step toward changing how we
communicate science — from start to finish
• But it is only one step toward complete
transparency in publishing— and in science.
3. Open Access is One Component of
Open Science
What is Open Science?
• Open Access
• Open Data
• Open Source
• Open Notebook
4. Open Access is One Component of
Open Science
What does that mean?
Dan Gezelter explained this quite nicely in a blog post
(http://www.openscience.org/blog/?p=269)
It means:
• Having transparency in experimental methodology,
observation, and collection of data.
• Providing public availability and reusability of scientific
data.
• Providing public accessibility and transparency of scientific
communication.
• Using web-based tools to facilitate scientific collaboration.
6. A Tale of Two Bacteria
1. On May 2, 2011, German doctors reported the first case of
an E. coli infection accompanied by hemolytic-uremic
syndrome.
2. On May 21, 2011, the first death occurred from this
bacterium (denoted E. coli O104:H4).
3. On May 26, 2011, cucumbers from Spain were declared the
source of the infection, resulting in a revenue loss of 200
million euros per week.
4. On June 3, 2011, BGI completed a draft sequence of E. coli
O104:H4 from a sample provided by doctors at the
University Medical Centre Hamburg-Eppendorf.
5. On the evening of June 3, 2011, the leaders at BGI and
Hamburg-Eppendorf held a discussion about whether to release
the sequence data immediately: what were the potential
repercussions of doing so?
7. A Tale of Two Bacteria
A main question in this discussion
If the data were released now —
will it affect our ability to publish later?
Will Journal Editors say: “Not enough of an
advance over the information already out there.”
Will Journal Editors say, “You have broken an
embargo by making this information available to
the public and the press.”
Will we be scooped?!?
8. A Tale of Two Bacteria
In World # 1
The researchers — who were concerned, rightly so,
about their ability to publish (remember this is the
way to obtain recognition and obtain grants, which
are essential for them to work) — waited.
The Result
The first publication appeared on July 29th
(~ 2 months after the first death.)
9. A Tale of Two Bacteria
In World # 2
The researchers decided public health was more
important than obtaining a publication — released
the data immediately.
The Result with regard to Publication
The first publication appeared on July 29th — but it
was not from the group who released the data (even
though the released data was included in that
study).
We live in World 2; however, although the data
producers were not the first to publish, what
followed was exciting and had broad repercussions.
10. To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public domain
under a CC0 license. Until the publication of research papers on the
assembly and whole-genome analysis of this isolate we would ask you to
cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang,
J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J;
Peng, Y; Pu, F; Sun, Y; Chen,Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X;
Chen, F; Yin, X; Song,Y ; Rohde, H; Li, Y; Wang, J; Wang, J and the
Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium
(2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI
Shenzhen. doi:10.5524/100001
http://dx.doi.org/10.5524/100001
These data were put on an FTP
server under a CC0 waiver and also
given a DOI to make access
‘permanent’
To the extent possible under law, BGI Shenzhen has waived all copyright and related or
neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published
from: China.
14. 1.3 The power of intelligently open data
The benefits of intelligently open data were powerfully
illustrated by events following an outbreak of a severe gastro-
intestinal infection in Hamburg in Germany in May 2011. This
spread through several European countries and the US,
affecting about 4000 people and resulting in over 50 deaths. All
tested positive for an unusual and little-known
Shiga-toxin–producing E. coli bacterium. The strain was initially analysed by
scientists at BGI-Shenzhen in China, working together with
those in Hamburg, and three days later a draft genome was
released under an open data licence. This generated interest
from bioinformaticians on four continents. 24 hours after the
release of the genome it had been assembled. Within a week
two dozen reports had been filed on an open-source site
dedicated to the analysis of the strain. These analyses
provided crucial information about the strain’s virulence and
resistance genes – how it spreads and which antibiotics are
effective against it. They produced results in time to help
contain the outbreak. By July 2011, scientists published papers
based on this work. By opening up their early sequencing
results to international collaboration, researchers in Hamburg
produced results that were quickly tested by a wide range of
experts, used to produce new knowledge and ultimately to
control a public health emergency.
15. So:
Can we all agree that releasing the E. coli data
ahead of publication was ‘good’?
(At least from a public health perspective)
If so, I want to put this case in perspective.
Here are the numbers for the E. coli 2011 outbreak:
In total, ~4000 people were infected and 53 died.
16. Then… For all research:
From a Public Health perspective…
(Figures are deaths worldwide.)
Infectious Disease
Measles: 122,000 per year
Hepatitis C-related liver disease: 350,000-500,000 per year
Malaria: 627,000 per year
HIV/AIDS: 1.4-1.7 million per year
Non-communicable, with genetic predisposition
Prostate cancer: 307,000 per year
Breast cancer: 522,000 per year
Suicide: 800,000 per year
Diabetes: 1.5 million per year
Cancer: 8.2 million per year
Cardiovascular Disease: 17.5 million per year
Non-genetic/Non-infectious
Pesticide Poisoning: 250,000 per year
Malnutrition: 2.8 million children (under 5) per year
Data from World Health Organization Fact Sheets http://www.who.int/en/
18. Beyond Altruism:
What Researchers Need
• Recognition:
• Obtained through Publication and Citation
• Money:
• Grants
• Promotions
Both are based on how much and
where you publish
20. What we’re doing at GigaScience
1. Requiring all data supporting work to be freely available in a
publicly available repository
– How we’re promoting this:
• Journal-dedicated data and software repository GigaDB
that hosts ALL data types.
• Have a Biocurator(s) to aid in handling Metadata
• All Datasets are provided a Digital Object Identifier
(DOI) making them citable and countable (reward for
making data available; GigaDB is tracked by the Thomson Reuters Data Citation Index)
• All Material in GigaDB is available under a CC0 Waiver
• Data with a publicly approved database must be
submitted there as well
• Provide Direct links to all associated information
21. For data citation to work, it needs:
• Acceptance by journals.
• Data+Citation: inclusion in the references.
• Tracking by citation indexes.
• Usage of the metrics by the community…
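The "inclusion in the references" requirement can be illustrated with a minimal Python sketch that turns a dataset record into a reference-list entry and a resolvable link. The class name `DatasetCitation`, its field names, and the abbreviated author string are hypothetical and chosen only for illustration; the resolver prefix is the one shown on the E. coli dataset slide, and the layout of the reference string follows that slide's citation rather than any formal standard.

```python
from dataclasses import dataclass

# Resolver prefix as it appears on the E. coli dataset slide (assumption:
# the same prefix is appropriate for any DOI being cited).
DOI_RESOLVER = "http://dx.doi.org/"


@dataclass
class DatasetCitation:
    """Hypothetical container for the fields of a dataset citation."""
    authors: str
    year: int
    title: str
    publisher: str
    doi: str

    def reference(self) -> str:
        # Layout mirrors the citation on the slide:
        # Authors (Year) Title. Publisher. doi:DOI
        return (f"{self.authors} ({self.year}) {self.title}. "
                f"{self.publisher}. doi:{self.doi}")

    def url(self) -> str:
        # A DOI becomes a stable link by prepending the resolver prefix.
        return DOI_RESOLVER + self.doi


if __name__ == "__main__":
    ecoli = DatasetCitation(
        authors="Li, D et al.",  # author list abbreviated for the example
        year=2011,
        title="Genomic data from Escherichia coli O104:H4 isolate TY-2482",
        publisher="BGI Shenzhen",
        doi="10.5524/100001",
    )
    print(ecoli.reference())
    print(ecoli.url())
```

Because the dataset is cited through its DOI in the reference list, citation indexes can count it in the same way they count papers.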
28. The polar bear data were released prepublication in 2011.
They were used and cited in the following studies before the main paper on the
sequencing was published:
Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old and distinct
bear lineage. Science. 2012 Apr 20;336(6079):344-7. doi:10.1126/science.1216424.
Cahill, JA et al., Genomic evidence for island population conversion resolves conflicting
theories of polar bear evolution. PLoS Genet. 2013;9(3):e1003345.
doi:10.1371/journal.pgen.1003345.
Morgan, CC et al., Heterogeneous models place the root of the placental mammal
phylogeny. Mol Biol Evol. 2013 Sep;30(9):2145-56. doi:10.1093/molbev/mst117.
Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears (Ursus
maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from
Genome Sequences. J Hered. 2014; 105(3):312-23. doi:10.1093/jhered/est133.
Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive Male-Biased Gene
Flow within Brother Lineages. Mol Biol Evol. 2014 Apr 4. doi:10.1093/molbev/msu109
http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/
29. Even though the data had
been released 2 years earlier
and cited in other papers, the
main analysis paper was
published in Cell
30. Cell Press journals had indicated that
publishing a dataset prior to publication
could be considered prior publication
34. Analysis of Data Citation Metrics
With the advent of easily available and collated information on Data
Citation, people are beginning to assess the levels of metrics obtained
from different servers and the uses of these data.
http://arxiv.org/abs/1501.03342
35. Funding
• Funding Agencies are now promoting open data release
• Explicitly including a data release plan in your grants can
improve your chances to obtain funding.
• Example: Recent release of information from the Bill and
Melinda Gates Foundation http://www.gatesfoundation.org/how-we-work/general-information/open-access-policy
The NIH, the Wellcome Trust, and other funding agencies are also doing this (as
well as specifically ignoring the impact factor of publications)
37. What we’re doing at GigaScience
Requiring all software and work to be freely available in a
publicly available repository
– How we’re promoting this:
• Journal-Dedicated repository GigaDB that hosts
software so it can be downloaded.
• Software and Workflows are provided a DOI making
them citable and countable (reward)
• Journal-dedicated Galaxy Platform to run tools
• Have a Data Manager and Data Scientist to wrap and
deploy software tools
• All software created by authors must be open-source
39. USE preprint servers
• If you are not aware of preprint servers, these are places
where you can post your paper prior to submission to a
journal
• Many journals (more than you think) allow researchers to
do this (At GigaScience we recommend it.)
• Two preprint servers that are widely used by the
community are bioRxiv (http://biorxiv.org/) for biology
papers and arXiv (http://arxiv.org) for more
mathematics- and physics-based papers (but any research
paper can go here as well)
• Editors are now looking at preprint servers and contacting
authors of papers they are interested in!
41. Don’t be an Anonymous Peer Reviewer!
• There is an increasing trend toward open peer
review, which includes publishing the reviewer’s name
and giving access to the reviews after publication
• Open peer review greatly expands the
transparency of the publication process
• We have found that open peer review is more
constructive and less antagonistic than
anonymous peer review.
42. What we’re doing at GigaScience
Peer Review
• Reviews are signed (Other Journals are doing this)
– Currently Opt-Out. Planning to make it mandatory
• All Reviews (and all pre-publication history) are
available upon publication.
43. Take the Reviewer’s Oath
• Dr. Mick Watson published a blog post putting forth the idea of
reviewers taking an oath on how they will carry out peer
review (https://biomickwatson.wordpress.com/2013/02/11/the-reviewers-oath)
• The Open Science Peer Review Oath (Aleksic et al) F1000 Research
http://f1000research.com/articles/3-271/v1
• Review by Jonathan Eisen http://icis.ucdavis.edu/?p=505
44. Giving Reviewers Credit
• We and other journals are starting to give DOIs for
reviews that are open and named
– We are doing this because we were asked by researchers
and teachers how they could cite a peer review they found
of value.
• We are using a company called Publons,
https://publons.com, which hosts reviews under the
reviewers’ names, where they can be tracked, read,
and given credit.
• For every review we post in Publons, we mint a DOI so
that these can be cited and tracked.
• Publons currently has ~35,000 registered reviewers and
~86,000 reviews from ~5,700 journals
45. Open Peer Review as a New
Reviewer’s Tool
• Since many of you are beginning to be peer-
reviewers: Open Reviews are an excellent
learning tool!
• Go to any journal with open peer review and read
early versions of papers and reviews.
• Learn the names of reviewers whose reviews you
respect, and follow their reviews on Publons.
• Register in Publons for when you start to openly
review. You can post your own reviews (or some
journals, like GigaScience, post them for you.)
46. Being an Open Scientist and Tenure
• It is possible! (And more and more probable)
• Example: Dr. C. Titus Brown
• From his blog: “On Gaining Tenure as an Open Scientist”
written after being awarded tenure at UC Davis.
– I blog and tweet about our research.
– All my senior-author papers are open access and were posted
as preprints.
– I post all of my single-author grants openly, as soon as I submit
them.
– All of our source code is openly available on github and most of
our papers are written in public on github.
– I sign almost all of my paper reviews and post many of them
(the ones that I remember to post ;) on my blog.
• http://ivory.idyll.org/blog/2014-open-and-tenured.html
Titus’ Take Home Message: “It is possible to achieve some measure
of traditional success while being open. Grants; publications;
tenure. 'nuff said.”
48. Scientific Publication as part of a
Continuum of Scientific Communication
[Diagram: Publication shown as one point among other forms of scientific communication: Conferences, Preprints, Press Interviews, Public Discourse, Twitter, Interactive web tools, Collaboration, Blogs, Sharing Data, Sharing Tools, Education]
49. The Continuum of Scientific
Communication
Publication should be just a stage
of research where one
component of the process has
been formally vetted and is
available within an easily
accessible and condensed format
Publication
Promote Real-Time Science
50. Finally, consider…
For every DAY you wait to release information:
(Figures are deaths worldwide.)
Infectious Disease
Measles: 334 per DAY
Hepatitis C-related liver disease: 959-1,370 per DAY
Malaria: 1,718 per DAY
HIV/AIDS: 3,836-4,658 per DAY
Non-communicable, with genetic predisposition
Prostate cancer: 841 per DAY
Breast cancer: 1,430 per DAY
Suicide: 2,192 per DAY
Diabetes: 4,110 per DAY
Cancer: 22,466 per DAY
Cardiovascular Disease: 47,945 per DAY
Non-genetic/Non-infectious
Pesticide Poisoning: 685 per DAY
Malnutrition: 7,671 children (under 5) per DAY
Data from World Health Organization Fact Sheets http://www.who.int/en/
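The per-day figures on this slide follow from dividing the annual totals quoted earlier in the talk by the number of days in a year. The short Python sketch below reproduces that arithmetic for a few of the causes; the function name `deaths_per_day` and the choice of a 365-day year with rounding to the nearest whole number are assumptions made for illustration, not a method stated in the talk.

```python
# Annual death estimates as quoted on the earlier slide (WHO fact sheets).
ANNUAL_DEATHS = {
    "Measles": 122_000,
    "Malaria": 627_000,
    "Breast cancer": 522_000,
    "Cancer": 8_200_000,
    "Cardiovascular disease": 17_500_000,
}

DAYS_PER_YEAR = 365  # assumption: a non-leap year


def deaths_per_day(annual_deaths: int) -> int:
    """Convert an annual total to a daily average, rounded to a whole number."""
    return round(annual_deaths / DAYS_PER_YEAR)


if __name__ == "__main__":
    for cause, annual in ANNUAL_DEATHS.items():
        print(f"{cause}: {deaths_per_day(annual):,} per day")
```

The same function can be applied to the ends of a range, for example 1.4 million and 1.7 million per year for HIV/AIDS, to get the corresponding daily range.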
51. Several Blogs Worth Reading
• The Future of Science by Michael Nielsen
– http://michaelnielsen.org/blog/the-future-of-science-2
• The Reviewers Oath by Mick Watson
– https://biomickwatson.wordpress.com/2013/02/11/the-reviewers-oath/
• How to Peer Review by Arjun Raj
– http://rajlaboratory.blogspot.com/2014/04/how-to-review-paper.html
• On Gaining Tenure as an Open Scientist by C. Titus
Brown
– http://ivory.idyll.org/blog/2014-open-and-tenured.html
52. Thanks to:
Scott Edmunds, Executive Editor
Nicole Nogoy, Commissioning Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Rob Davidson, Data Scientist
Xiao (Jesse) Si Zhe, Database Developer
Amye Kenall, Journal Development Manager
Contact us:
editorial@gigasciencejournal.com
database@gigasciencejournal.com
Follow us:
@GigaScience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog
www.gigasciencejournal.com
www.gigadb.org