Genome sharing projects
around the world
– and how you find data for
your research
Fiona Nielsen, October 2015
Find me on twitter: @glyn_dk
• In case my talk is boring…
First, the take-home messages…
Do not forget:
By 2025 genome research will produce as much data
as Twitter/YouTube.
You do not have
enough statistical
power to interpret
your data
But
You can
improve your
study design
And
You can access
more data from
public genome
data repositories
As you all know…
Data output is going up
[Chart: human genomes sequenced per year, 2006 to 2015]
The output of human genome
sequencing data is growing at
exponential rates
Estimated number of human
genomes sequenced in 2015: 400K
Population scale genome sequencing projects
Population scale genome
sequencing projects have
been launched all over
the world
Soon every research lab
and every genetic clinic
will have a DNA
sequencer
How much data do you need to publish a paper?
2001: 1 human genome
2012: 1000 Genomes (1092 genomes, since increased to ~2500)
2015:
UK10K, Icelandic population (2,636 + 100k imputed),
The Cancer Genome Atlas ~11,000 genomes,
ExAC consortium 65,000 exomes
?
Statistically speaking, you still need tens of thousands of samples for
validation
The more severe the phenotype and the more complete the penetrance, the
easier it will be for you to find your variant, but
“As the genetic complexity of the disease increases (for example,
reduced penetrance and increased locus heterogeneity), issues of
statistical power quickly become paramount.”
http://www.nature.com/nrg/journal/v15/n5/full/nrg3706.html
But I am just looking at this one disease…
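To make the "tens of thousands of samples" point concrete, here is a minimal sketch of a sample-size estimate. It is not from the talk: it assumes a simple case-control comparison of variant frequencies using the standard normal-approximation formula for a two-sided two-proportion test, and it assumes the conventional genome-wide significance threshold of 5e-8. The function name and the example frequencies are illustrative only.

```python
from math import ceil, sqrt
from statistics import NormalDist


def samples_per_group(p_controls, p_cases, alpha=5e-8, power=0.8):
    """Approximate number of cases (and equally many controls) needed to
    detect a difference in variant frequency with a two-sided
    two-proportion z-test (normal approximation)."""
    if p_controls == p_cases:
        raise ValueError("the two frequencies must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance quantile
    z_power = NormalDist().inv_cdf(power)          # power quantile
    p_bar = (p_controls + p_cases) / 2             # pooled frequency
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p_controls * (1 - p_controls) + p_cases * (1 - p_cases))
    ) ** 2
    return ceil(numerator / (p_cases - p_controls) ** 2)


# Assumed example: a variant carried by 1% of controls and 1.5% of cases.
print(samples_per_group(0.01, 0.015))
```

With these assumed frequencies the estimate lands in the tens of thousands per group. A statistician would refine this for the actual study design (multiple testing, locus heterogeneity, unequal group sizes).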
What can I do?
PRO TIP: involve a statistician early on in your study design!
How can I determine significance?
“One potentially powerful approach is to assess conservation across and within
multiple species as whole-genome sequence data become more abundant.”
Look at extreme phenotypes: “Sampling cases or controls from the extremes of an
appropriate quantitative distribution can often increase power”
Look at non-SNP variants; they are more likely to have functional effects
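The extreme-phenotype tip can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the talk: a single variant with an additive effect on a quantitative trait, and made-up values for cohort size, allele frequency, effect size and sequencing budget. It compares the allele-frequency gap between two sequenced groups when they are drawn from the tails of the trait distribution versus drawn at random from the upper and lower halves.

```python
import random


def allele_frequency(genotypes):
    """Alternate-allele frequency among diploid genotypes (0, 1 or 2 copies)."""
    return sum(genotypes) / (2 * len(genotypes))


def simulate_cohort(n, maf, effect, rng):
    """Return (trait, genotype) pairs for n individuals under an additive model."""
    cohort = []
    for _ in range(n):
        genotype = (rng.random() < maf) + (rng.random() < maf)  # 0, 1 or 2
        trait = effect * genotype + rng.gauss(0.0, 1.0)
        cohort.append((trait, genotype))
    return cohort


def frequency_gap(high, low):
    """Difference in allele frequency between a high-trait and a low-trait group."""
    return (allele_frequency([g for _, g in high])
            - allele_frequency([g for _, g in low]))


rng = random.Random(2015)  # fixed seed so the run is repeatable
cohort = sorted(simulate_cohort(n=20_000, maf=0.2, effect=0.5, rng=rng))
budget = 2_000  # assumed number of individuals we can sequence per group

# Design A: sequence the two tails of the trait distribution.
extreme_gap = frequency_gap(cohort[-budget:], cohort[:budget])

# Design B: sequence random individuals from the upper and lower halves.
half = len(cohort) // 2
random_gap = frequency_gap(rng.sample(cohort[half:], budget),
                           rng.sample(cohort[:half], budget))

print(f"extreme tails: {extreme_gap:.3f}  random halves: {random_gap:.3f}")
```

For the same sequencing budget, the tails show a larger frequency gap, which is what translates into more power in this simplified setting.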
- “how to account for the technical features of sequencing, such as incomplete
sequencing and biased coverage over the genome?”
Think of how you can provide evidence that your result is not just a local
technical variation or sampling bias
e.g. data from same cell type, same seq technology, same alignment…
How to account for bias?
PRO TIP: include more reference data in your analysis
• Know what data is available in your lab,
your dept, your org
• A survey from Qiagen showed that one of
the main reasons researchers collaborate
is to get access to data!
How can I access more data for my research?
How can I find collaborators?
PRO TIP: Search for collaborators who have the data you need
PRO TIP: Tell your colleagues and peers what type of data you
have in your lab
Where can I access data?
public repositories
• some you apply for access,
especially if data contains
clinical info or whole genome
PID
• some are open access: GEO,
SRA, PGP, OpenSNP, GigaDB, …
• some are consented for
general research use, some
have specific consent
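For the open access repositories hosted at NCBI (SRA and GEO), searches can also be scripted. The sketch below is an illustration rather than part of the talk: it builds a query URL for the NCBI E-utilities `esearch` endpoint and shows how the matching record IDs could be read from the JSON response. The helper names and the example search term are assumptions, and no request is sent unless `fetch_ids` is called.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(database, term, max_results=20):
    """Build an esearch URL, e.g. database="sra" (SRA) or "gds" (GEO DataSets)."""
    params = {
        "db": database,
        "term": term,
        "retmax": max_results,
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def fetch_ids(url):
    """Send the query and return the list of matching record IDs."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)["esearchresult"]["idlist"]


# Hypothetical query: human records in SRA.
url = build_search_url("sra", "Homo sapiens[Organism]", max_results=5)
print(url)
```

Calling `fetch_ids(url)` would perform the actual search. Controlled-access datasets still require an approved application before the underlying data can be downloaded.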
It may be confusing
And it takes time
Bottlenecks:
• Finding relevant and usable
data
• Getting authorisation to
access data
• Formatting data
• Storing and moving data
We studied the problem by
qualitative interviews followed
by a survey of researchers in
human genetics
T. A. van Schaik et al., The need to redefine genomic data sharing: a focus on data
accessibility, Applied & Translational Genomics, 2014, doi:10.1016/j.atg.2014.09.013
Researchers spend months
finding and accessing genomic data,
and often choose not to access
data at all
Barriers to access
dbGaP Application Process
[Flowchart. Checks: NIH / eRA Commons login (Yes/No); Organisation registered with eRA (Yes/No); Organisation has DUNS number (Yes/No). Steps: Write research proposal → Submit proposal → Access granted → Find/Download/Decrypt data → Science… Delays marked on the chart: + 2-3 days, + 1-2 weeks, + 1 week, + days to weeks, + 1-2 days. Variable: from weeks to months.]
Why the barrier?
• Benefits: strict governance, review of consent, applicant signs for full
responsibility for governance
• Disadvantages: No control of data once access is given, high barrier for
access – too high?
• Start planning your data needs early in your project
• When you find the data you need, start application
• Use Open Access data
How can I save time?
PRO TIP: If you use human genomic data, apply for the General Research Use (GRU)
datasets in dbGaP: one application gives access to all the GRU
datasets
• Some data is Open Access → requires specific consent
• OpenSNP.org (Bastian)
• Personal Genome Projects
• Individuals who put their genomes online, e.g. Manuel Corpas
and his family “the Corpasome”
• http://manuelcorpas.com/about/
Not all data is restricted
Personal Genome Project

| | PGP Harvard | PGP Canada | PGP UK | Genom Austria |
|---|---|---|---|---|
| Host institution | Harvard Medical School, Boston | SickKids, Toronto | University College London | CeMM Research Center for Molecular Medicine |
| Principal Investigator | George Church | Steven Scherer | Stephan Beck | Christoph Bock & Giulio Superti-Furga |
| Launch year | 2005 | 2012 | 2013 | 2014 |
| Geographic scope | USA, mainly Boston | Canada | United Kingdom | Mainly Austria |
| Data generated | Whole genome sequencing, upload of additional data possible | Mainly whole genome sequencing | Whole genome sequencing, DNA methylome sequencing, RNA transcriptome sequencing | Mainly whole genome sequencing |
| Number of genomes | 100s | 10s | 10s | 10s |
| Project funding | Discretional funds and corporate sponsoring | Institutional startup funds | Discretional funds and corporate sponsoring | Institutional startup funds |
| Website | http://personalgenomes.org/harvard/ | http://personalgenomes.org/canada/ | http://personalgenomes.org/uk/ | http://genomaustria.at/ |

Enrollment eligibility: At least 18 years old, able to make an informed decision, perfect score in the PGP enrollment exam, certain vulnerable groups excluded
Data access: http://personalgenomes.org/harvard/data ; http://genomaustria.at/unser-genom/#genome-der-pionierinnen
Areas of emphasis: Integration with phenotypic data, collaboration with other personal omics initiatives; Genome donations, synergy with massive-scale clinical genome sequencing projects; Genomes and society, genetic literacy, school projects, education
Summary of data access barriers
Data is uploaded
to repository
Data is discovered
by potential user
Data is accessed
by potential user
Where is the data?
[Chart: 2006 to 2015; 400K genomes sequenced vs. ≈ 5K genomes available]
Only a fraction of the data is
findable or available through
public repositories
• “even when researchers are authorised to share data they
report reluctance to do so because of the amount of effort
required” http://www.sciencedirect.com/science/article/pii/S2212066114000386
• “Clinical geneticists cited a lack of time because their main priority is
diagnosing patients. Industrial researchers cited a lack of time because of
the pressure to meet the deadlines in their job. Researchers in academia
cited both a concern about the potential loss of future publications once
unpublished data is shared, and the lack of time and incentive to share
data as this does not contribute to their publication record. Researchers
from all categories felt that they lacked sufficient resources to make their
data available.”
The barrier of making data available
But I do not want to share my data
• If you expect data to be available to you
– you have to make your data available too!
• Encourage collaborations: power by numbers
1. Get credit – publish and make your data available
2. Give credit – cite data sources
3. Understand consent – for all uses of clinical data
Best practices
• Use all available tools to make your life easier:
• Data publications → visibility and citations for your data, e.g.
GigaScience
• Figshare, Zenodo, Dryad for sharing open access data
• PhenomeCentral, Matchmaker Exchange for rare disease research
• Repositive for finding data across repositories and making your own data
discoverable
Best practices: use the tools
Does #OpenScience
matter at
proposal evaluation?
Based on: Winning Horizon 2020 with Open Science,
http://dx.doi.org/10.5281/zenodo.12247
“Weakness: Involvement of non-
academic beneficiaries is limited”
“Weakness: highly focused on academic activities, and
lacks an advanced communication strategy”
“Weakness: limited exposure to
non-academic partners & infrastructures”
Excellence
Impact
Implementation
“data accessibility is unclear!”
“data storage & access not considered”
“Strengths: extensive dissemination of data to the
scientific community (open access, databases)”
“outreach activities to a broad audience”
“research software is freely available”
Impact:
Make the (research) world a better place by sharing in return 
Best practices
• Digital consent: towards automatic processing of applications
• Dynamic consent and power to the patient, e.g.
PatientsKnowBest
• Privacy-preserving access to datasets: preserving control and
governance with data custodian, lower barrier for access
What the future holds
In the meantime: It is a jungle out there!
What if finding data was as easy as finding a book on
Amazon or booking a hotel on Expedia?
The Repositive vision
Enabling
efficient data
access
Incentivising
best practices
Trusted broker
for data
exchange
Repositive is a web platform
Discover new data sources
We are indexing all the public sources of
data, so users have an easy portal for
searching through data descriptions.
EASY
SEARCH
Repositive is a web platform
Make your data visible
As a two-sided marketplace, the users
can also make their own data findable.
SHARE
KNOWLEDGE
Active Repositive users increase benefits
Build a data community
BUILD
TRUST
Users can interact to find relevant
collaborators for their research either to
analyse their data or to combine data
sources.
Active Repositive users increase benefits
Find data collaborators
SAVE TIME
Feedback from other users through ratings
and comments helps users evaluate data
quality
Benefit for both sides
Data consumers
• Find relevant data faster
• Feedback from other users through ratings and comments to evaluate data quality
• Find collaborators with data
Data producers
• Make your data visible
• Build credibility as a trusted provider of quality data
• Find collaborators to analyse your data
Live demo
Sign up as beta tester: http://repositive.io
Best practices - recap
• Get credit – publish data
• Give credit – cite data
• Understand consent
Thank you!

Genome sharing projects around the world (Nijmegen, 29 October 2015)

  • 1. Genome sharing projects around the world – and how you find data for your research Fiona Nielsen, October 2015 Find me on twitter: @glyn_dk
  • 2. • In case my talk will be boring… First the take home messages…
  • 3. Do not forget: By 2025 genome research will produce as much data as Twitter /YouTube. You do not have enough statistical power to interpret your data But You can improve your study design And You can access more data from public genome data repositories
  • 4. As you all know…
  • 5. Data output is going up. The output of human genome sequencing data is growing at exponential rates. [Chart: genomes sequenced per year, 2006 to 2015; estimated number of human genomes sequenced in 2015: 400K]
  • 6. Population scale genome sequencing projects Population scale genome sequencing projects have been launched all over the world Soon every research lab and every genetic clinic will have a DNA sequencer
  • 7. How much data do you need to publish a paper? 2001: 1 human genome. 2012: 1000 Genomes (1,092 genomes, since increased to ~2,500). 2015: UK10K; Icelandic population (2,636 + 100k imputed); The Cancer Genome Atlas (~11,000 genomes); ExAC consortium (65,000 exomes). Next: ?
  • 8. But I am just looking at this one disease… Statistically speaking, you still need 10s of thousands of samples for validation. The more severe the phenotype and the more complete the penetrance, the easier it will be for you to find your variant, but “As the genetic complexity of the disease increases (for example, reduced penetrance and increased locus heterogeneity), issues of statistical power quickly become paramount.” http://www.nature.com/nrg/journal/v15/n5/full/nrg3706.html
  • 9. What can I do? PRO TIP: involve a statistician early on in your study design!
  • 10. How can I determine significance? • “One potentially powerful approach is to assess conservation across and within multiple species as whole-genome sequence data become more abundant.” • Look at extreme phenotypes: “Sampling cases or controls from the extremes of an appropriate quantitative distribution can often increase power” • Look at non-SNP variants, they are more likely to have functional effects • “how to account for the technical features of sequencing, such as incomplete sequencing and biased coverage over the genome?”
  • 11. How to account for bias? Think of how you can provide evidence that your result is not just a local technical variation or sampling bias, e.g. data from the same cell type, same sequencing technology, same alignment… PRO TIP: include more reference data in your analysis
  • 12. How can I access more data for my research? • Know what data is available in your lab, your department, your organisation • A survey from Qiagen showed that one of the main reasons researchers collaborate is to get access to data!
  • 13. How can I find collaborators? PRO TIP: Search for collaborators who have the data you need PRO TIP: Tell your colleagues and peers what type of data you have in your lab
  • 14. Where can I access data? Public repositories: • for some you must apply for access, especially if the data contains clinical info or whole genome PID • some are open access: GEO, SRA, PGP, OpenSNP, GigaDB, … • some are consented for general research use, some have specific consent
  • 15. It may be confusing
  • 16. And it takes time Bottlenecks: • Finding relevant and usable data • Getting authorisation to access data • Formatting data • Storing and moving data We studied the problem by qualitative interviews followed by a survey of researchers in human genetics
  • 17. And it takes time. Researchers spend months finding and accessing genomic data, and often choose not to access data at all. Source: T. A. van Schaik et al., “The need to redefine genomic data sharing: a focus on data accessibility”, Applied & Translational Genomics, 2014, doi:10.1016/j.atg.2014.09.013
  • 19. Barriers to access: the dbGaP application process [flowchart]. Steps, in order: NIH / eRA Commons login (yes/no); organisation registered with eRA (yes/no); organisation has DUNS number (yes/no); write research proposal; submit proposal; access granted; find/download/decrypt data; science… Delays annotated on the chart: + 2-3 days, + 1-2 weeks, + 1 week, + days to weeks, + 1-2 days. Overall: variable, from weeks to months
  • 20. Why the barrier? • Benefits: strict governance, review of consent, applicant signs for full responsibility for governance • Disadvantages: No control of data once access is given, high barrier for access – too high?
  • 21. How can I save time? • Start planning your data needs early in your project • When you find the data you need, start the application • Use Open Access data PRO TIP: If you use human genomic data, apply for the GRU datasets in dbGaP: one application gives access to all the GRU datasets
  • 22. Not all data is restricted • Some data is Open Access: requires specific consent • OpenSNP.org (Bastian) • Personal Genome Projects • Individuals who put their genomes online, e.g. Manuel Corpas and his family, “the Corpasome” • http://manuelcorpas.com/about/
  • 24. Personal Genome Project (columns, in order: PGP Harvard | PGP Canada | PGP UK | Genom Austria)
      Host institution: Harvard Medical School, Boston | SickKids, Toronto | University College London | CeMM Research Center for Molecular Medicine
      Principal investigator: George Church | Stephen Scherer | Stephan Beck | Christoph Bock & Giulio Superti-Furga
      Launch year: 2005 | 2012 | 2013 | 2014
      Geographic scope: USA, mainly Boston | Canada | United Kingdom | Mainly Austria
      Enrollment eligibility: At least 18 years old, able to make an informed decision, perfect score in the PGP enrollment exam, certain vulnerable groups excluded
      Data generated: Whole genome sequencing, upload of additional data possible | Mainly whole genome sequencing | Whole genome sequencing, DNA methylome sequencing, RNA transcriptome sequencing | Mainly whole genome sequencing
      Number of genomes: 100s | 10s | 10s | 10s
      Data access: http://personalgenomes.org/harvard/data ; http://genomaustria.at/unser-genom/#genome-der-pionierinnen
      Project funding: Discretional funds and corporate sponsoring | Institutional startup funds | Discretional funds and corporate sponsoring | Institutional startup funds
      Areas of emphasis: Integration with phenotypic data, collaboration with other personal omics initiatives ; Genome donations, synergy with massive-scale clinical genome sequencing projects ; Genomes and society, genetic literacy, school projects, education
      Website: http://personalgenomes.org/harvard/ | http://personalgenomes.org/canada/ | http://personalgenomes.org/uk/ | http://genomaustria.at/
  • 25. Summary of data access barriers Data is uploaded to repository Data is discovered by potential user Data is accessed by potential user
  • 26. Where is the data? Only a fraction of the data is findable or available through public repositories. [Chart: 2006 to 2015; 400K genomes sequenced, ≈ 5K genomes available]
  • 27. The barrier of making data available: “But I do not want to share my data” • “even when researchers are authorised to share data they report reluctance to do so because of the amount of effort required” http://www.sciencedirect.com/science/article/pii/S2212066114000386 • “Clinical geneticists cited a lack of time because their main priority is diagnosing patients. Industrial researchers cited a lack of time because of the pressure to meet the deadlines in their job. Researchers in academia cited both a concern about the potential loss of future publications once unpublished data is shared, and the lack of time and incentive to share data as this does not contribute to their publication record. Researchers from all categories felt that they lacked sufficient resources to make their data available.”
  • 28. Best practices • If you expect data to be available to you, you have to make your data available too! • Encourage collaborations: power by numbers 1. Get credit: publish and make your data available 2. Give credit: cite data sources 3. Understand consent: for all uses of clinical data
  • 29. Best practices: use the tools • Use all available tools to make your life easier: • Data publications: visibility and citations for your data, e.g. GigaScience • Figshare, Zenodo, Dryad for sharing open access data • PhenomeCentral, Matchmaker Exchange for rare disease research • Repositive for finding data across repositories and making your own data discoverable
  • 30. Does #OpenScience matter at proposal evaluation? Based on: Winning Horizon 2020 with Open Science, http://dx.doi.org/10.5281/zenodo.12247
  • 31. “Weakness: Involvement of non- academic beneficiaries is limited” “Weakness: highly focused on academic activities, and lacks an advanced communication strategy” “Weakness: limited exposure to non-academic partners & infrastructures” Excellence Impact Implementation “data accessibility is unclear!” “data storage & access not considered”
  • 32. “Strengths: extensive dissemination of data to the scientific community (open access, databases)” “outreach activities to a broad audience” “research software is freely available” Impact:
  • 33.
  • 34. Make the (research) world a better place by sharing in return  Best practices
  • 35. • Digital consent: towards automatic processing of applications • Dynamic consent and power to the patient, e.g. PatientsKnowBest • Privacy-preserving access to datasets: preserving control and governance with data custodian, lower barrier for access What the future holds
  • 36. In the meantime: It is a jungle out there! What if finding data were as easy as finding a book on Amazon or booking a hotel on Expedia?
  • 37. The Repositive vision Enabling efficient data access Incentivising best practices Trusted broker for data exchange
  • 38. Repositive is a web platform Discover new data sources We are indexing all the public sources of data, so users have an easy portal for searching through data descriptions. EASY SEARCH
  • 39. Repositive is a web platform Make your data visible As a two-sided marketplace, the users can also make their own data findable. SHARE KNOWLEDGE
  • 40. Active Repositive users increase benefits Build a data community BUILD TRUST Users can interact to find relevant collaborators for their research either to analyse their data or to combine data sources.
  • 41. Active Repositive users increase benefits Find data collaborators SAVE TIME Feedback from other users through ratings and comments helps users evaluate data quality
  • 42. Benefit for both sides Data consumers Data producers Find relevant data faster Feedback from other users through ratings and comments to evaluate data quality Find collaborators with data Make your data visible Build credibility as a trusted provider of quality data Find collaborators to analyse your data
  • 43. Live demo Sign up as beta tester: http://repositive.io
  • 44. Best practices - recap • Get credit – publish data • Give credit – cite data • Understand consent
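
Slide 8's point about statistical power can be made concrete with a rough calculation. The sketch below is not from the talk: it uses the textbook sample-size approximation for a two-sided comparison of two proportions, treating the question as "how many cases and how many controls do I need to detect a difference in variant carrier frequency?". The function name `samples_per_group` and the example frequencies are illustrative assumptions, and real study designs (another reason to involve a statistician early) would also need to account for effects this ignores, such as population structure and sequencing bias.

```python
from math import ceil, sqrt
from statistics import NormalDist


def samples_per_group(p_cases, p_controls, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion z-test.

    p_cases / p_controls: assumed carrier frequencies (hypothetical inputs).
    alpha: significance threshold; power: desired probability of detection.
    """
    if p_cases == p_controls:
        raise ValueError("the two frequencies must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    p_bar = (p_cases + p_controls) / 2             # pooled frequency under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p_cases * (1 - p_cases) + p_controls * (1 - p_controls))
    ) ** 2
    return ceil(numerator / (p_cases - p_controls) ** 2)


# A large difference in a common variant needs few samples ...
common = samples_per_group(0.60, 0.50)
# ... a small difference in a rare variant, tested at a genome-wide
# threshold (5e-8 is the conventional GWAS cut-off), needs far more.
rare = samples_per_group(0.012, 0.010, alpha=5e-8)
print(common, rare)
```

Comparing the two printed numbers shows why rare variants with modest effects push studies towards very large cohorts, and why pooling with public reference data matters.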

Editor's notes

  1. It has been shown that the combination of summary single-variant statistics from multiple data sets, rather than the joint analysis of a combined data set, does not result in an appreciable loss of information, and that taking into account heterogeneity in effect size across studies can improve statistical power.
  2. “Although they are harder to call and annotate, insertion or deletions, multinucleotide variants and structural variants (including copy-number variants, translocations and inversions) constitute a smaller set of variation (in terms of the number of discrete events an individual is expected to carry) relative to all SNVs and are more likely to have functional effects.”
  4. Because interpretation requires LOTS of data. And although data exists around the world, it is siloed, and even if available, it is not accessible. This is Jenn, a genetic researcher (our target customer) seeking to interpret data from genetic diseases and cancer. She needs data from other patients to compare and interpret Mabel's DNA. She also has data available in her own lab, but she cannot share it because of concerns about how to deal with secure access to sensitive data and data governance, e.g. vetting of users.
  5. Public repositories: default is apply for access -> full access. Benefits: strict governance, review of consent, applicant signs for full responsibility for governance. Disadvantages: no control of data once access is given, high barrier for access, too high? (Researchers giving up, even patients can't get access to their own data.)
  6. Cost of data is going down. Data production is going up. Growing problem. Market opportunity for solutions!
  7. ODP trained, EURO-BASIN manager: a boring title, for a diverse job, in an exciting research domain. DIP into EACH step of the research cycle, from proposal formulation to providing the best return-on-investment to the funders. So I'd like to share with you some experiences from the last few years of OS advocacy in the Marine Science Community.
  8. Excellence at your Research Subject is … excellent, but is it ENOUGH? To be successful, a candidate will be judged on being complete. MESSAGE: a FOCUS only on IF (impact factor) could expose you to risk.
  10. So, if the IMPACT FACTOR is no good, how will it evolve in future? Here is an example from the UK, on how Research Institutes are evaluated … The key message here is that funders will place even more emphasis on “Societal Impact” in future, but more pertinent for you right now is that it is already affecting your chances for Post-Doc funding.
  11. For Jenn, the inaccessibility of data means it takes her up to 6 months to find and up to 6 months to access the data she needs for analysis. But for clinical cases like Mabel, she only has days to finish her analysis! THIS IS RIDICULOUS BECAUSE: Today one can: - Find any hotel on Trivago - Find any book on Amazon, with feedback from other users - But researchers have nowhere to find and access (human) genomics data!
  12. The Repositive platform and technology will remove barriers to data sharing and will incentivise users to explore, contribute and collaborate in alignment with best practices. When Jenn needs data for a specific disease she makes a search on Repositive to find the data directly, understand the value of the data based on feedback from the Repositive user community, and access the data securely, because she knows that … Repositive is the trusted broker for secure and efficient data exchange … No more hassles of finding data or of exchanging data with collaborators. [show dbGaP screenshots with a cross over it]
  13. We are changing the landscape of genomics research through the Repositive platform, which we have just launched in private beta. Providing the search facility for rapid data discovery of existing data sources and making your own data visible to the community. We are indexing all the public sources of data, so users have one easy portal for searching through data descriptions. The platform UI is described by our users as “slick”, “easy” and “refreshing” compared to other bioinformatics tools.
  15. Providing the community for peer feedback to help you determine what data is relevant. Providing the technology to get data insights for secure and efficient data access, e.g. privacy-preserving technology, to remove the barriers for making data available and accessible.
  18. Our mission is to speed up research and diagnostics for genetic diseases by enabling efficient and ethical access to genomic research data
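
Note 1 above says that combining per-study summary statistics loses little information compared with pooling the raw data. One common way to do that combination is inverse-variance weighted fixed-effect meta-analysis, sketched below. This is an illustration, not the method of the cited review: the function name `fixed_effect_meta` and the example numbers are made up, each study is assumed to report an effect estimate and its standard error for the same variant, and Cochran's Q is included only as a simple heterogeneity check rather than a full random-effects treatment.

```python
from math import sqrt
from statistics import NormalDist


def fixed_effect_meta(studies):
    """Combine (effect, standard_error) pairs by inverse-variance weighting.

    Returns the pooled effect, its standard error, a two-sided p-value,
    and Cochran's Q (large Q suggests effect sizes differ across studies).
    """
    if not studies:
        raise ValueError("need at least one study")
    weights = [1.0 / se ** 2 for _, se in studies]  # precise studies count more
    total = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, studies)) / total
    pooled_se = sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    q = sum(w * (beta - pooled) ** 2 for w, (beta, _) in zip(weights, studies))
    return pooled, pooled_se, p_value, q


# Hypothetical summary statistics for one variant from three cohorts.
cohorts = [(0.20, 0.10), (0.15, 0.08), (0.25, 0.12)]
pooled, pooled_se, p_value, q = fixed_effect_meta(cohorts)
print(pooled, pooled_se, p_value, q)
```

The pooled standard error is always smaller than that of any single cohort, which is the "power by numbers" argument in statistical form: no individual-level data has to leave the contributing labs.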