2. www.gigasciencejournal.com
Journal, data platform and
database for large-scale data
Editor-in-Chief: Laurie Goodman
Executive Editor: Scott Edmunds
Commissioning Editor: Nicole Nogoy
Lead Curator: Chris Hunter
Data Platform: Peter Li
in conjunction with
3. GigaProject: deconstructing the paper
www.gigadb.org
www.gigasciencejournal.com
World's largest genomics organisation, with:
20 PB storage, 20.5K cores, 212 TFLOPS,
>1000 bioinformaticians
Utilizes big-data infrastructure and expertise from:
Combining and integrating:
Open-access journal
Data Publishing Platform
Data Analysis Platform
4. Lessons Learned from Genomics:
Sharing is important…
Bermuda Accords 1996/1997/1998
Fort Lauderdale Agreement, 2003
5. Sharing aids individuals…
Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
Every 10 datasets collected contribute to at least 4 papers in the
following 3 years.
Piwowar HA, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473(7347): 285. doi:10.1038/473285a
8. To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public domain
under a CC0 license. Until the publication of research papers on the
assembly and whole-genome analysis of this isolate we would ask you to
cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G;
Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S;
Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z;
Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and
the Escherichia coli O104:H4 TY-2482 isolate genome sequencing
consortium (2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI
Shenzhen. doi:10.5524/100001
http://dx.doi.org/10.5524/100001
Our first DOI:
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
11. “The way that the genetic data of the 2011 E. coli strain were disseminated
globally suggests a more effective approach for tackling public health
problems. Both groups put their sequencing data on the Internet, so scientists
the world over could immediately begin their own analysis of the bug's
makeup. BGI scientists also are using Twitter to communicate their latest
findings.”
“German scientists and their colleagues at the Beijing Genomics Institute in China have
been working on uncovering secrets of the outbreak. BGI scientists revised their draft
genetic sequence of the E. coli strain and have been sharing their data with dozens of
scientists around the world as a way to "crowdsource" this data. By publishing their data
publicly and freely, these other scientists can have a look at the genetic structure, and try
to sort it out for themselves.”
13. Downstream consequences:
“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli
strain that infected roughly 4,000 people in Germany between May and July. But he knew that it might take days
for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could
use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that
allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and
publish their work without wasting time on legal wrangling.”
1. Citations (~160)
2. Therapeutics (primers, antimicrobials)
3. Platform comparisons
4. Example for faster & more open science
14. 1.3 The power of intelligently open data
The benefits of intelligently open data were powerfully illustrated by
events following an outbreak of a severe gastro-intestinal infection in
Hamburg in Germany in May 2011. This spread through several
European countries and the US, affecting about 4000 people and
resulting in over 50 deaths. All tested positive for an unusual and
little-known Shiga-toxin–producing E. coli bacterium. The strain was
initially analysed by scientists at BGI-Shenzhen in China, working
together with those in Hamburg, and three days later a draft
genome was released under an open data licence. This generated
interest from bioinformaticians on four continents. Twenty-four hours
after its release, the genome had been assembled. Within a week
two dozen reports had been filed on an open-source site dedicated
to the analysis of the strain. These analyses provided crucial
information about the strain’s virulence and resistance genes – how
it spreads and which antibiotics are effective against it. They
produced results in time to help contain the outbreak. By July
2011, scientists had published papers based on this work. By opening up
their early sequencing results to international
collaboration, researchers in Hamburg produced results that were
quickly tested by a wide range of experts, used to produce new
knowledge and ultimately to control a public health emergency.
21. The People's Parrot: Amazona vittata
Puerto Rican Parrot Genome Project
Rarest parrot, national bird of Puerto Rico
Community-funded through artworks, fashion shows, beer, crowdfunding…
Genome annotated by community-college students as part of their bioinformatics education
Paper and Data published in GigaScience and GigaDB
Oleksyk TK, et al. (2012) A Locally Funded Puerto Rican Parrot (Amazona vittata) Genome Sequencing Project Increases Avian Data and Advances Young Researcher Education. GigaScience 1:14.
O'Brien SJ (2012) Genome empowerment for the Puerto Rican parrot – Amazona vittata. GigaScience 1:13.
Oleksyk TK, et al. (2012) Genomic data of the Puerto Rican Parrot (Amazona vittata) from a locally funded project. GigaScience.
http://dx.doi.org/10.5524/100039
22. Ruibang Luo (BGI/HKU)
Shaoguang Liang (BGI-SZ)
Tin-Lap Lee (CUHK)
Huayen Gao (CUHK)
Qiong Luo (HKUST)
Senghong Wang (HKUST)
Yan Zhou (HKUST)
Thanks to:
@gigascience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog/
Peter Li
Chris Hunter
Jesse Si Zhe
Nicole Nogoy
Tam Sneddon
Alexandra Basford
Laurie Goodman
Follow us:
www.gigadb.org
galaxy.cbiit.cuhk.edu.hk
www.gigasciencejournal.com
CBIIT
Funding from:
Our collaborators:
Team:
Editor's notes
That just leaves me to thank the GigaScience team: Laurie, Scott, Alexandra, Peter and Jesse; BGI for their support, specifically Shaoguang for IT and bioinformatics support; our collaborators on the database, website and tools: Tin-Lap, Qiong, Senhong and Yan; the Cogini web design team; DataCite for providing the DOI service; and the ISA Commons team for their support and advocacy of best-practice metadata reporting and sharing. Thank you for listening.