Presented by Fiona Nielsen at the International Conference on Genomics (ICG-11), China National GeneBank, Shenzhen: http://www.icg-11.org
I present the "WHY" of what I am doing, and how I got here. A personal story of frustration, science and family.
Session chaired by Laurie Goodman, Gigascience
3. The long road to results
[Figure: variant-analysis pipeline. Stages shown: raw reads, read QC, mapping, analysis-ready reads, variant calling, raw SNPs, raw indels, raw SVs, refinement, genotype, variant annotation (with external data), analysis-ready variants.]
11. You are not the only one
T. A. van Schaik et al., "The need to redefine genomic data sharing: a focus on data accessibility", Applied & Translational Genomics, 2014. doi:10.1016/j.atg.2014.09.013
Researchers spend months finding and accessing genomic data, and often choose not to access data at all.
12. [Chart: genomes sequenced per year, 2006 to 2015]
~400k genomes produced
And how do you keep up?
29. Making your contribution should be easy
Supporting best practices:
• get credit for making data available
• give credit to data producers
• initiate new collaborations
40. Read more at http://repositive.io and http://DNAdigest.org
Thanks for listening!
41. repositive [re-poz-i-tiv], noun:
1. a positive experience of accessing genomic data repositories
Read more at http://repositive.io and tweet us @repositiveio
Speaker notes
People are dying from genetic diseases even though the data needed for their diagnosis and cure is 'out there', but today that data is not accessible.
I am Fiona Nielsen, we are Repositive, and this is our vision for redefining genomic data sharing.
How did this happen?
I need data for my analysis
You go to GEO, but this is only expression data
You go to dbGaP
You ask a colleague for help
You go to pubmed to find out what others did
You ask your supervisor for permission to apply for access to a dataset in dbGaP
You write an application for access
You ask your supervisor and possibly your sysadmin to sign the application
You submit your access application
You wait for a response… (can take up to 6 months!!!)
How did this happen?
And what if I want to register my data sets for sharing?
Illustration: another complicated, time-consuming process in which your data has to fit into predefined boxes
And how do you keep up with the latest data?
400k genomes have been produced, and a large proportion of them are private
Searches can be made simple
What if you could search for data in the context of all available data?
What if you could see comments and reactions from your peers on the data they have used?
What if you could rank data by quality?
What if you could query data through an API directly in your scripts and workflows?
What if searching for relevant data were as easy as finding the right hotel in a city you have never visited?
Search results can be easy to navigate
FIGSHARE
Collaboration is easy
What if you could discover new resources you did not know existed?
And you are invited
Sign up for BETA TESTING
What if your daily work routine gave you visibility to potential collaborators?