Acknowledgments
Genomic Medicine & Translational Pathology, University of Melbourne:
Arthur Lian Chi Hsu, Renate Marquis-Nicholson, Sebastian Lunke, Clare Love, Kym Pham,
Olga Kondrashova, Matt Wakefield, Tiffany Cowie, Barney Rudzki and Paul Waring
Human Variome Project
Tim Smith, Alan Lo, Melvyn Leong, David Perkins, Heather Howard, Rania Horaitis
Dick Cotton
BioGrid
Maureen Turner, Leon Heffer
Royal College of Pathologists of Australasia
Vanessa Tyrrell
Peter MacCallum Cancer Centre
Ken Doig, Andrew Fellowes
Victorian Clinical Genetics Service
John-Paul Plazzer, Desiree Du Sart
Human Variome Project (Australasia)
• The bigger picture
• Infrastructure and search interface
• Linkage to other datasets
• Panel, exome and genome testing
• Database accreditation
• Next steps
The big picture
• Rediscovery at the genomics community level
that data sharing is win-win
• The Genomic Alliance, HGVS, HUGO
– Data standards
– Nomenclature
– Infrastructure
Nature (Perspective) 508 469-475 2014
Guidelines for investigating causality of sequence variants in human disease
D. G. MacArthur, T. A. Manolio, D. P. Dimmock, H. L. Rehm, J. Shendure, G. R. Abecasis, D. R. Adams, R. B. Altman, S. E. Antonarakis, E.
A. Ashley, J. C. Barrett, L. G. Biesecker, D. F. Conrad, G. M. Cooper, N. J. Cox, M. J. Daly, M. B. Gerstein, D. B. Goldstein, J. N. Hirschhorn,
S. M. Leal, L. A. Pennacchio, J. A. Stamatoyannopoulos, S. R. Sunyaev, D. Valle, B. F. Voight, W. Winckler & C. Gunter.
Priorities for research and infrastructure development
1. Improved public databases of human genetic variants incorporating explicit, up-to-date supporting
evidence for variant implication in disease and audit trails recording changes in interpretation.
2. Improved incentives, and ethical and logistical solutions, for sharing of genetic and phenotypic data from
both research and clinical diagnostic laboratories.
3. Public databases of variant and allele frequency data from large sets of population reference samples
from a wide range of ancestries.
4. Large-scale genotyping of reported human disease-causing variants in large, well-phenotyped
population cohorts, reducing biases in the assessment of the associated penetrance and phenotypic
heterogeneity.
5. Development and benchmarking of standardized, quantitative statistical approaches for objectively
assigning probability of causation to new candidate disease genes and variants.
Déjà vu all over again?
Nature Genetics 46, 107–115 (2014)
Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch
repair gene variants in the InSiGHT locus-specific database
Bryony A Thompson, Amanda B Spurdle, John-Paul Plazzer, Marc S Greenblatt, Kiwamu Akagi, Fahd Al-Mulla, Bharati Bapat, Inge
Bernstein, Gabriel Capellá, Johan T den Dunnen, Desiree du Sart, Aurelie Fabre, Michael P Farrell, Susan M Farrington, Ian M
Frayling, Thierry Frebourg, David E Goldgar, Christopher D Heinen, Elke Holinski-Feder, Maija Kohonen-Corish, Kristina Lagerstedt
Robinson, Suet Yi Leung, Alexandra Martins, Pal Moller, Monika Morak, Minna Nystrom, Paivi Peltomaki, Marta Pineda, Ming Qi,
Rajkumar Ramesar, Lene Juel Rasmussen, Brigitte Royer-Pokora, Rodney J Scott, Rolf Sijmons, Sean V Tavtigian, Carli M Tops,
Thomas Weber, Juul Wijnen, Michael O Woods, Finlay Macrae & Maurizio Genuardi, on behalf of InSiGHT.
Nature Genetics 46, 107–115 (2014)
1. Leiden Open Variation Database (LOVD)
2. Micro-attribution using Open Researcher & Contributor Identification (ORCID)
3. Variant Interpretation Committee (VIC) applies the 5-tiered classification scheme developed by the
International Agency for Research on Cancer (IARC)
4. Endorsed by the Human Variome Project (HVP)
Not everything in the Nature portfolio is gold
It is good to supplement your pocket money
Early nomenclature papers
• Beaudet
• Tsui
• Antonarakis
Translation into diagnostic practice
• 15 years ago Cotton predicted that the
majority of human genetic variants will be
detected in a diagnostic context
• As NGS moves into a service setting this
transition will become even clearer
• Genetic variants will become part of a
patient’s medical record
HVPA database
• Primarily for and of diagnostics
• Diagnostic services are busy
• And cash and time limited
• We have to make it easy for them
• And secure
• And useful
• Maybe even essential
HVPA Objective
A national data sharing facility for improving
clinical genetic testing services and supporting
medical research
Constitutional, not somatic, mutations
NECTAR project grant UoM FE31082
“Clinical and Molecular Data Linkage Tools”,
completion date 30th June 2014
Infrastructure and search interface
• Data repository (“the database”)
• Data handling tools that support data upload
from laboratories
• Portal through which the database can be
browsed
• Website for news and notifications
Human Variome Project Australian
Node
What We’ve Done
• NeAT Funding (2010-2011)
– Pilot Phase
– 4 labs, 3 diseases
• Breast Cancer
• Colon Cancer
• Huntington’s
– Portal Launched April 2011
– Molecular Data Only
– Collaboration with Mawson
• NeCTAR Funding (2012-2014)
– 12 more labs + all genes they test for
– Configuration Tool
– Clinical Data/Phenotype Linkage
– Transfer data internationally
What We Built
• Collection Tool
• Portal
• Data Model
• Ethics Processes
• Access & Usage Policy
• Data Sharing Agreements
How it works
• Software to interface with existing LIMS (or lack thereof)
• Collection occurs after report has been issued
• Data types:
– All classified variants reported by a lab
– Benign variants
– NGS/Incidental findings
– Not collecting negative results
• Secure data link between lab and Node
• (Semi)-automatic transfer of data
• Portal to allow interrogation of all Australian data
– http://www.hvpaustralia.org.au
• Linkage key generator
• Submission to BioGrid Platform
Open-Source Solutions
• HVP Portal (v1.0, r512) - A web application which features the basic
interface for browsing and querying a HVP node.
– Open source – MIT License
– Python/django
• HVP Exporter (v1.0, r512) - Basic HVP exporting tool for
laboratories. Features simple GUI and error checking interface,
plug-in architecture for customisation between sites and common
libraries for working with MS Access and MS Excel data sources
– Open source – MIT License
– .NET C#, python/ironpython
• HVP Importer (v1.0, r512) - A series of tools and web services that
receive, decrypt and process information submitted by laboratories
using the standard transaction XML format
– Open source – MIT License
– python
Access to HVPA
• Controlled Access
– Diagnostic Lab Staff
– Registered Medical Practitioners
– Board Certified Genetic Counsellors
• Online application
HVPA Status at November 2013
Strengths
1. Database available on demand
for diagnostic labs
2. Tools for data sharing
3. Community engagement with
RCPA (QUPP), SA/Mawson,
BioGrid, VCGS
4. National reach with
international connections via
HVPI, WHO & UNESCO
Weaknesses
1. Performance of the existing
HVPA database is limited
2. Laboratory buy-in to the
database across Australia is
limited
3. The database itself has been
hard to access because of low
server bandwidth
4. The project has not anticipated
the likely impact of next
generation sequencing and risks
missing inclusion in genomic-
scale initiatives now underway.
HVPA 24th March 2014
• 5 laboratories submitting
• 295 Unique Variants
• 27410 Instances
• 25 Registered users
Developments proposed in November
ID | Area | Idea | Priority
1 | B. Presentation | Statistics of number of variants for that gene as table or bar graph (# unique, # instances, top 5 qty submitted) | 1
15 | D. Feedback | Raise a concern about an instance's interpretation | 1
2 | A. Search | Search by range | 2
3 | A. Search | Search by genomic position | 2
4 | A. Search | Filter by pathogenicity | 2
5 | B. Presentation | Sort by ... (pathogenicity, other fields) | 2
6 | C. Relevant Info | Display links to related database for gene by referencing genenames.org | 2
7 | A. Search | Wildcard search of variants | 2
9 | A. Search | Search by disease which shows multiple genes and variant results | 2
10 | E. NGS | VCF data imports into HVP Australia | 2
13 | B. Presentation | VarVis - visualisation of gene and variants reported | 2
11 | B. Presentation | VCF data export from HVP Australia of a set of results | 3
12 | B. Presentation | At instance level - see other variants from this test/patient | 3
14 | C. Relevant Info | Capture & display SIFT score | 3
16 | D. Feedback | Notify labs when the general consensus on the pathogenicity of something they submitted has changed or been updated, i.e. they submitted benign and it is now likely pathogenic, or submitted unknown and it is now something else | 3
17 | B. Presentation | Integration with EBI/NCBI tools for queries and displays | 3
19 | B. Presentation | Display last date uploaded for this variant (or last 10 dates) | 3
Accessing the test database
http://115.146.85.61/
Username:
lab_tester
Password:
hvpaustralia2013
Search Interface
• The search interface has to provide useful tools for
clinicians and lab scientists so that the HVPA project offers
them direct benefits and incentivises them to participate.
Following a request for feedback from users, a series of
improvements were implemented, initially on a
demonstration server and then on the live server following
review by the Steering Committee. The highest priorities
were for more information about numbers of times
particular variants were recorded, the ability to search by
range and to filter by pathogenicity. There was also interest
in enabling direct uploading of VCF files and the automated
calculation of pathogenicity scores. Many of these features
are now implemented and examples will be presented.
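The two highest-priority search requests above (search by range, filter by pathogenicity) can be sketched as follows. This is a minimal illustration over in-memory records, not the HVP Portal's code: `VariantInstance`, its field names and `search` are hypothetical, and the pathogenicity codes assume the 1 (Pathogenic) to 5 (Certainly benign) scale used in the variant fields.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class VariantInstance:
    """Hypothetical record; field names are illustrative only."""
    gene: str
    chrom: str
    position: int
    pathogenicity: int  # assumed scale: 1=Pathogenic ... 5=Certainly benign


def search(records: Iterable[VariantInstance],
           chrom: Optional[str] = None,
           start: Optional[int] = None,
           end: Optional[int] = None,
           max_pathogenicity: Optional[int] = None) -> List[VariantInstance]:
    """Return records inside an inclusive genomic range and at or below a pathogenicity code."""
    hits = []
    for record in records:
        if chrom is not None and record.chrom != chrom:
            continue
        if start is not None and record.position < start:
            continue
        if end is not None and record.position > end:
            continue
        if max_pathogenicity is not None and record.pathogenicity > max_pathogenicity:
            continue
        hits.append(record)
    return hits
```

Calling `search(records, chrom="17", start=41_000_000, end=42_000_000, max_pathogenicity=2)` would return only pathogenic or possibly pathogenic instances in that window; the coordinates here are placeholders.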
Purpose of the HVPA Database
• Working database
– Record and share diagnostic-quality genetic variation data
– Integrate with clinical phenotype data
– Integrate with international efforts
– Heads up for NGS gene panel data sets
• Test database
– Showcase enhancements
– Real world testing and feedback
– Uses data edited from actual database
– Not accurate or reliable: some parameters edited for test
purposes
Major improvements to search facility
Searching by expression match
BRCA BR
Instances of a variant
Pathogenic Variants
Direct Import from Results Lists
• Can recover historical data sets
• Reformat on the fly
• Useful as low-overhead catch up to enable labs to
transition to using uploading tools as their IT
permits
– PathWest (John Bielby)
– Institute of Health and Biomedical Innovation,
Queensland (Lyn Griffiths)
– kConFab (Heather Thorne)
– Peter MacCallum Cancer Centre (Ken Doig)
Variant Fields (Mandatory)
Field | Description | Type | Requirement
GeneName | Official HGNC symbol | VARCHAR(20) | Mandatory
RefSeqName | Name of reference sequence (NCBI's RefSeq project) | VARCHAR(20) | Mandatory
RefSeqVer | Version of reference sequence (RefSeq) | VARCHAR(20) | Mandatory
cDNA | HGVS variant name (c.) | VARCHAR(255) | At least one required
mRNA | HGVS variant name (m.) | VARCHAR(255) | At least one required
Genomic | HGVS variant name (g.) | VARCHAR(255) | At least one required
Protein | HGVS variant name (p.) | VARCHAR(255) | At least one required
Location | Exon or intron number | VARCHAR(255) | At least one required
Field | Description | Requirement
Pathogenicity | Level of pathogenicity (1=Pathogenic, 2=Possibly pathogenic, 3=Unknown, 4=Possibly benign, 5=Certainly benign) | Mandatory
PatientID | Internal ID for the patient used within the lab | Mandatory
TestID | Internal ID for the test used within the lab | Mandatory
InstanceDate | Date instance was tested | Mandatory
GenomicRefSeq | Genomic reference sequence | Mandatory
GenomicRefSeqVer | Genomic reference sequence version | Mandatory
Types listed for these fields: VARCHAR(20), DateTime, VARCHAR(255), VARCHAR(255)
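As a rough illustration of how the mandatory-field rules could be checked before upload, here is a minimal sketch. It is not the HVP Exporter's error checking: `validate` and the dict-based record are hypothetical, and reading "At least one required" as covering cDNA, mRNA, Genomic, Protein and Location follows the slide layout literally and is an assumption.

```python
from typing import Dict, List

# Field names are taken from the slide; how they are grouped is an assumption.
MANDATORY = (
    "GeneName", "RefSeqName", "RefSeqVer",
    "Pathogenicity", "PatientID", "TestID",
    "InstanceDate", "GenomicRefSeq", "GenomicRefSeqVer",
)
AT_LEAST_ONE = ("cDNA", "mRNA", "Genomic", "Protein", "Location")


def validate(record: Dict[str, object]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    errors = []
    for field in MANDATORY:
        if not record.get(field):
            errors.append(f"missing mandatory field: {field}")
    if not any(record.get(field) for field in AT_LEAST_ONE):
        errors.append("at least one of " + ", ".join(AT_LEAST_ONE) + " is required")
    level = record.get("Pathogenicity")
    # Pathogenicity is assumed to be stored as an integer code from 1 to 5.
    if level is not None and level not in range(1, 6):
        errors.append("Pathogenicity must be an integer from 1 to 5")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a laboratory see every issue with a record in one pass.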
Variant Fields (Optional)
Field | Description | Type | Requirement
PatientAge | Age of patient when test was taken | INT32 | Optional
TestMethod | The name of the test method used | VARCHAR(20) | Optional
SampleTissue | Type of sample taken | VARCHAR(20) | Optional
SampleSource | The source of the sample, e.g. DNA, gDNA, RNA... | VARCHAR(20) | Optional
Justification | Justification by medical scientist | VARCHAR(65535) | Optional
PubMed | PubMed Identifier/Data Object Identifier | VARCHAR(255) | Optional
RecordedInDatabase | Whether it is recorded in a disease-specific or gene-specific database | Boolean | Optional
SampleStored | Whether lab still has sample left | Boolean | Optional
VariantSegregatesWithDisease | Whether pedigree was considered during diagnosis of pathogenicity | Boolean | Optional
HistologyStored | Whether histology is stored | Boolean | Optional
PedigreeAvailable | Whether organisation has pedigree data | Boolean | Optional
SIFTScore | Calculated SIFT score | INT32 | Optional
Linkage to other datasets
• HVPA have implemented the hash key
algorithm and work is in progress with BioGrid
to link variation data to clinical data sets.
• More details from Maureen Turner, BioGrid
CEO who is speaking at this meeting
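The slides do not describe the hash key algorithm itself, so the following is only a generic sketch of the idea behind a privacy-preserving linkage key: both sites derive the same opaque key from normalised identifiers without exchanging the identifiers. The choice of fields, the normalisation, the use of HMAC-SHA256 and the shared `secret` are all assumptions, not the HVPA or BioGrid method.

```python
import hashlib
import hmac


def linkage_key(family_name: str, given_name: str,
                date_of_birth: str, sex: str, secret: bytes) -> str:
    """Derive a hex linkage key from normalised identifiers (illustrative scheme only)."""
    parts = [
        family_name.strip().upper(),
        given_name.strip().upper(),
        date_of_birth.strip(),   # assumed to be in one agreed format, e.g. YYYY-MM-DD
        sex.strip().upper(),
    ]
    message = "|".join(parts).encode("utf-8")
    # A keyed hash means the key cannot be recomputed without the shared secret.
    return hmac.new(secret, message, hashlib.sha256).hexdigest()
```

Because the normalisation is applied before hashing, trivial differences in case or surrounding whitespace between the laboratory record and the clinical record do not break the link.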
Cost and performance will force
diagnostic labs to adopt NGS as front-line approach
cost per base Illumina share price
Hype cycle
HVPA LOVD3 database pilot
• Established an HVPA LOVD3 database and
working with the Human Genetics Society of
Australasia on a pilot study to sequence the
exomes of two trios and review the data using
this database.
• Includes exome-scale data
• Open access to Coriell cases with no “consent”
issues
• Explore staging of variant “credibility
classification” and access
Relationship to Gene Panel Databases?
e.g. http://genomics.bio21.unimelb.edu.au/lovd/
Melbourne Genomics Health Alliance
• Clinically led, rather than technology driven
• Fostering ‘end use’ of genomic data
• Common clinical repository
• Prospective : first tier test
• Evaluation to inform implementation
• Engineering collaboration
• Fostering system change
• A/Prof Clara Gaff: Program Leader
PARADIGM FOR IMPLEMENTING GENOMIC MEDICINE
Melbourne Genomics Health Alliance
Connected nationally
and internationally
How many variants per exome?
SNP count Study
20,000 Choi et al. PNAS 2009
142,000 Mullikin NIH, unpublished 2010
50,000 Clark et al. Nature biotechnology 2011
125,000 Smith et al. Genome Biology 2011
100,000 Johnston & Biesecker Human Molecular Genetics 2013
200,000 to 400,000 Yang et al. N Engl J Med 2013
• 20-fold range
• Exome designs vary
• Likely to be higher variant count in African populations as the
reference sequence is non-African
Low concordance of multiple
variant-calling pipelines
O’Rawe et al. Genome Medicine 2013
• 15 exomes
• 4 families
• HiSeq 2000
• Agilent SureSelect v.2
• ~120X mean coverage
• SOAP, BWA-GATK, BWA-SNVer,
GNUMAP, and BWA-SAMtools
• SNV concordance between five Illumina
pipelines across all 15 exomes was 57.4%
• 0.5-5.1% variants were called as unique to
each pipeline
• Indel concordance was only 26.8% between
three indel calling pipelines
• 11% of CG variants that fall within targeted
regions in exome sequencing were not called
by any of the Illumina-based exome analysis
pipelines
• 97.1%, 60.2% and 99.1% of the GATK-only,
SOAP-only and shared SNVs can be validated
• 54.0%, 44.6% and 78.1% of the GATK-only,
SOAP-only and shared indels can be validated
• Additional accuracy gained in variant
discovery by having access to genetic data
from a multi-generational family
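To make the concordance figures concrete, here is a minimal sketch of one way to compare call sets from several pipelines. It assumes concordance means "variants called by every pipeline divided by variants called by any pipeline"; the cited paper's exact definition may differ. The function names and the `(chrom, pos, ref, alt)` tuple representation are illustrative.

```python
from typing import Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt); an assumed representation


def concordance(call_sets: Iterable[Iterable[Variant]]) -> float:
    """Fraction of all called variants that every pipeline agrees on."""
    sets: List[Set[Variant]] = [set(calls) for calls in call_sets]
    if not sets:
        return 0.0
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared) / len(union) if union else 0.0


def unique_to(call_sets: List[Iterable[Variant]], index: int) -> Set[Variant]:
    """Variants called only by the pipeline at `index`."""
    sets = [set(calls) for calls in call_sets]
    others = set().union(*(s for i, s in enumerate(sets) if i != index))
    return sets[index] - others
```

With three toy call sets that share two of six distinct variants, `concordance` returns one third, which shows how quickly agreement drops as pipelines are added.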
Low concordance of multiple variant-calling pipelines
O’Rawe et al. Genome Medicine 2013, 5:28
SNV concordance: 57.4% Indel concordance 26.8%
Venn diagrams of selected CNV detection
methods in real data processing
Duan J, Zhang J-G, Deng H-W, Wang Y-P (2013) Comparative Studies of Copy Number Variation Detection Methods for Next-Generation Sequencing
Technologies. PLoS ONE 8(3): e59128. doi:10.1371/journal.pone.0059128
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0059128
Sequence errors
Post processing errors
Remove errors before processing
K-mer selection
Merging forward and reverse reads
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTTTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGTATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGTA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATTAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
TAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATCTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGGAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAGAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
Discard rare reads
Use a HiFi polymerase
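The "discard rare reads" step can be sketched as counting identical read sequences and keeping only those seen often enough, on the reasoning that a sequence seen once among many near-identical reads is more likely a polymerase or sequencing error than a real allele. The function name and the `min_count` threshold are assumptions for illustration, not values from the slides.

```python
from collections import Counter
from typing import Dict, Iterable


def discard_rare_reads(reads: Iterable[str], min_count: int = 2) -> Dict[str, int]:
    """Count identical reads and keep only sequences seen at least `min_count` times."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_count}
```

Applied to a pile-up like the one shown above, the two dominant sequences survive and the singleton variants are dropped; a suitable threshold would in practice depend on depth and error rate.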
Four capture panels at SOD1
• Known SNV concordance 100%, all assays
• Known indel <6bp concordance 100%, all assays
• Not able to detect C9orf72 hexanucleotide expansion or PRNP
octapeptide region repeat with standard pipeline
• Diagnostic yield within appropriate clinical context (based on
very limited sample size)
- NimbleGen SeqCap EZ Neuro: 33% (2/6)
- Nextera Neuro: 23% (6/26)
Results – detection of variants
Filtering Variants
All variants None Qual Not in Blood
Blood 9828 8551 NA
Frozen 9920 8736 126
FFPE 9709 8163 199
Variants in Gene List None Qual Not in Blood
Blood 27 18 NA
Frozen 27 23 2 (EGFR)
FFPE 25 19 3 (EGFR, ROS)
EGFR p.L858R
EGFR p.T790M
Confirmation by PCR
[Figure: EGFR normalised. Assays (EGFR, NM_005228.3): T790 WT; c.2350T>C, p.S784P; c.2351C>T, p.S784F; c.2354C>T, p.T785I; c.2356G>A, p.V786M; c.2368A>G, p.T790A; c.2369C>T, p.T790M; 828 & 861 WT; c.2572C>A, p.L858M; c.2573_2574delinsGT; c.2573T>A, p.L858Q; c.2573T>G, p.L858R; c.2579A>T, p.K860I; c.2582T>A, p.L861Q; c.2582T>G, p.L861R]
[Figure: KRAS normalised. Assays (KRAS, NM_033360.2): c.34G>A, p.G12S; c.34G>C, p.G12R; c.34G>T, p.G12C; c.35G>A, p.G12D; c.35G>C, p.G12A; c.35G>T, p.G12V; c.37G>A, p.G13S; c.37G>C, p.G13R; c.37G>T, p.G13C; c.38G>A, p.G13D; c.38G>C, p.G13A; c.38G>T, p.G13V]
Auto Upload Database of Results in LOVD
Local LOVD instances sharable via HVPA
• Coriell pedigree comparison
• Subset of 19 genes – targeted by all four assays
• Variant allele frequency cut-off of 35% (interested in germline
variants)
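The allele-frequency cut-off can be sketched as a simple filter on read counts. The function names are hypothetical, and treating the 35% cut-off as inclusive is an assumption.

```python
def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        return 0.0
    return alt_reads / total_reads


def passes_germline_cutoff(alt_reads: int, total_reads: int, cutoff: float = 0.35) -> bool:
    """Keep variants whose allele frequency is at or above the cut-off (35% on the slide)."""
    return variant_allele_frequency(alt_reads, total_reads) >= cutoff
```

A heterozygous germline variant is expected near 50% and a homozygous one near 100%, so a 35% floor removes most low-level artefacts while tolerating some allelic imbalance.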
Results – detection of variants
Family Y077 (values are Mother / Father / Child)
Assay | Total number of variants detected | Non-synonymous variants detected | # variants with GAF <5% | # variants with African AF 5%
NimbleGen SeqCap EZ Neuro | 194 / 241 / 196 | 16 / 22 / 20 | 4 / 5 / 7 | 2 / 3 / 4
Nextera Neuro | 250 / 296 / 283 | 17 / 23 / 22 | 4 / 6 / 7 | 2 / 3 / 4
TruSight One | 121 / 137 / 119 | 16 / 23 / 20 | 3 / 6 / 6 | 1 / 3 / 3
Nextera Exome | 101 / 118 / 114 | 16 / 22 / 22 | 4 / 5 / 7 | 2 / 2 / 4
Family Y117 (values are Mother / Father / Child)
Assay | Total number of variants detected | Non-synonymous variants detected | # variants with GAF <5% | # variants with African AF 5%
NimbleGen SeqCap EZ Neuro | 279 / 245 / 263 | 20 / 20 / 20 | 4 / 5 / 6 | 3 / 2 / 4
Nextera Neuro | 382 / 371 / 342 | 20 / 21 / 21 | 5 / 5 / 6 | 3 / 2 / 4
TruSight One | 148 / 154 / 148 | 18 / 18 / 17 | 4 / 4 / 5 | 3 / 2 / 3
Nextera Exome | 121 / 67 / 66 | 19 / 15 / 16 | 5 / 3 / 4 | 3 / 1 / 3
Example case showing concordance
Gene Variant Chr Coordinate zyg Gene Variant Chr Coordinate zyg KEY
APOE T>T/C 19 45411941 het NPC1 T>T/C 18 21120444 het exome
APOE T>T/C 19 45411941 het NPC1 T>T/C 18 21120444 het nimble neuro
APOE T>T/C 19 45411941 het NPC1 T>T/C 18 21120444 het next neuro
APOE T>T/C 19 45411941 het NPC1 T>T/C 18 21120444 het trusight 1
APOE C>C/T 19 45412040 het NPC1 TA>TA/T 18 21123536 het
APOE C>C/T 19 45412040 het NPC1 TA>TA/T 18 21123536 het
APOE C>C/T 19 45412040 het NPC1 TA>TA/T 18 21123536 het
APOE C>C/T 19 45412040 het NPC1 TAA>TAA/T 18 21123536 het
ATP7B G>G/A 13 52511606 het NPC1 C>G/G 18 21124945 hom
ATP7B G>G/A 13 52511606 het NPC1 C>G/G 18 21124945 hom
ATP7B G>G/A 13 52511606 het NPC1 C>G/G 18 21124945 hom
ATP7B G>G/A 13 52511606 het NPC1 C>G/G 18 21124945 hom
ATP7B A>A/G 13 52515354 het PARK2 G>G/C 6 162622239 het
ATP7B A>A/G 13 52515354 het PARK2 G>G/C 6 162622239 het
ATP7B A>A/G 13 52515354 het PARK2 G>G/C 6 162622239 het
ATP7B A>A/G 13 52515354 het PARK2 G>G/C 6 162622239 het
ATP7B C>C/T 13 52523808 het PINK1 A>A/G 1 20964328 het
ATP7B C>C/T 13 52523808 het PINK1 A>A/G 1 20964328 het
ATP7B C>C/T 13 52523808 het PINK1 A>A/G 1 20964328 het
ATP7B C>C/T 13 52523808 het PINK1 A>A/G 1 20964328 het
ATP7B T>T/C 13 52524488 het PINK1 G>G/A 1 20972048 het
ATP7B T>T/C 13 52524488 het PINK1 G>G/A 1 20972048 het
ATP7B T>T/C 13 52524488 het PINK1 G>G/A 1 20972048 het
ATP7B T>T/C 13 52524488 het PINK1 G>G/A 1 20975727 het
LRRK2 G>A/A 12 40619082 hom PINK1 G>G/A 1 20975727 het
LRRK2 G>A/A 12 40619082 hom PINK1 G>G/A 1 20975727 het
LRRK2 G>A/A 12 40619082 hom PINK1 A>A/C 1 20977000 het
LRRK2 G>A/A 12 40619082 hom PINK1 A>A/C 1 20977000 het
LRRK2 C>C/G 12 40657700 het PINK1 A>A/C 1 20977000 het
LRRK2 C>C/G 12 40657700 het PINK1 A>A/C 1 20977000 het
LRRK2 C>C/G 12 40657700 het PSEN2 G>G/A 1 227071449 het
LRRK2 T>T/A 12 40713901 het PSEN2 G>G/A 1 227071449 het
LRRK2 T>T/A 12 40713901 het PSEN2 G>G/A 1 227071449 het
LRRK2 T>T/A 12 40713901 het PSEN2 G>G/A 1 227071449 het
LRRK2 T>T/C 12 40758652 het VCP C>T/T 9 35062972 hom
LRRK2 T>T/C 12 40758652 het VCP C>T/T 9 35062972 hom
LRRK2 T>T/C 12 40758652 het VCP C>T/T 9 35062972 hom
LRRK2 T>T/C 12 40758652 het VCP C>T/T 9 35062972 hom
NPC1 G>G/A 18 21119777 het VCP A>A/G 9 35068364 het
NPC1 G>G/A 18 21119777 het VCP A>A/G 9 35068364 het
NPC1 G>G/A 18 21119777 het VCP A>A/G 9 35068364 het
NPC1 G>G/A 18 21119777 het VCP A>A/G 9 35068364 het
Describing Coverage
Metric | MiSeq (12plex) | HiSeq (48plex)
% target region with non-zero depth | 99.54% | 99.90%
% target regions >= 5x | 98.25% | 99.71%
% target regions >= 15x | 94.80% | 99.34%
% target regions >= 30x | 89.10% | 98.85%
% target regions >= 50x | 81.56% | 98.17%
Average depth of coverage | 180.76 | 920.84
5th-centile | 19.08 | 126.75
20th-centile | 67.42 | 408.83
50th-centile | 160.42 | 871.17
95th-centile | 414.00 | 1879.92
Mapping quality >= 15
Base quality score >= 15
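The coverage metrics reported here can be reproduced from a list of per-position depths with a few lines of code. This sketch assumes the depths have already been computed after the mapping-quality and base-quality filters; the function names are illustrative and the nearest-rank percentile is one of several common definitions, so values may differ slightly from the tool used for the slide.

```python
import math
from typing import Dict, Sequence


def percentile(sorted_depths: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of an ascending, non-empty sequence."""
    rank = max(1, math.ceil(pct * len(sorted_depths) / 100))
    return sorted_depths[rank - 1]


def coverage_summary(depths: Sequence[float],
                     thresholds: Sequence[int] = (5, 15, 30, 50)) -> Dict[str, float]:
    """Summarise depth over target positions: breadth at thresholds, mean and centiles."""
    n = len(depths)
    if n == 0:
        raise ValueError("no target positions supplied")
    ordered = sorted(depths)
    summary = {
        "pct_nonzero": 100 * sum(d > 0 for d in depths) / n,
        "mean_depth": sum(depths) / n,
    }
    for t in thresholds:
        summary[f"pct_ge_{t}x"] = 100 * sum(d >= t for d in depths) / n
    for pct in (5, 20, 50, 95):
        summary[f"centile_{pct}"] = percentile(ordered, pct)
    return summary
```

Reporting the low centiles alongside the mean is the useful part: a high average depth can hide targets that are too thinly covered to call reliably.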
Coverage reproducibility
Coverage Coefficient of variation
Higher coverage greater reproducibility
Coverage Coefficient of variation
Can capture coverage report dosage to
diagnostic standards?
[Figure: coverage by sample and target; panels for autosomal targets and chrX targets]
Inter-sample variation is low, but low coverage prevents dosage estimation
Chr X is a good first pass test for dosage
XX vs. XY
8 Female cases and 16 Male cases showing reproducibility of coverage of X loci
within each group. Loci with higher SDs were associated with reduced coverage.
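The XX versus XY comparison can be sketched as follows: for each X locus, compute the mean and coefficient of variation of normalised coverage within a group, then the XY to XX ratio, which should sit near 0.5 if coverage tracks copy number. The sketch assumes each sample's coverage has already been normalised to its own autosomal coverage; the function names and data layout (one list of per-locus values per sample) are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def coefficient_of_variation(group: Sequence[Sequence[float]]) -> List[float]:
    """Per-locus SD divided by mean across the samples in one group (needs 2+ samples)."""
    return [stdev(values) / mean(values) for values in zip(*group)]


def xy_to_xx_ratio(xy_group: Sequence[Sequence[float]],
                   xx_group: Sequence[Sequence[float]]) -> List[float]:
    """Per-locus ratio of mean normalised coverage in XY samples to that in XX samples."""
    return [mean(xy) / mean(xx) for xy, xx in zip(zip(*xy_group), zip(*xx_group))]
```

Loci where the ratio departs from 0.5, or where the coefficient of variation is high, are the ones where coverage would be a poor proxy for dosage.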
[Figure: two plots comparing Average XX and Average XY; x-axis 0 to 80]
870
160
Report
Sharing Experience with TruSight One
• In partnership with Illumina, RCPA and the HGSA,
Kim Flintoff (Wellington Regional Genetics
Laboratory) is leading an evaluation of exon
sequencing using Illumina’s TruSight One
panel. Two Coriell family trios will be sequenced
by New Zealand Genomics Limited and the data
will be shared on an HVPA database
• The VCF file will be available on the HVPA LOVD
database and performance stats will also be
made available.
Next Steps
• Robust standards for genomic medicine
• Databases and data content
– Access to identified and de-identified data (consent
and confidentiality)
– Database accreditation process in prep with RCPA
– Defining the performance of various aligners, variant
callers and annotation programs
– Clinical grade Variant Call Format (VCF)
– Metafile covering data trail: what was tested, what
was not tested
Standards for Accreditation of DNA
Sequence Variation Databases
Under the Quality Use of Pathology Program (QUPP), a national project for the Development of Standards for
Accreditation of DNA Sequence Variation Databases has been jointly initiated by the Royal College of
Pathologists of Australasia (RCPA) and the Human Variome Project (HVP).
Background
• There is a rapidly increasing volume, spectrum, and complexity of genetic tests emerging within
diagnostic pathology laboratories. In particular, high throughput sequencing methods such as
targeted panel, exome (WES), and whole genome sequencing (WGS), are producing an increasing
quantity of genetic data requiring analysis and interpretation, forming a substantial proportion of
the workload.
• Currently, there is a plethora of online mutation databases to refer to; however, there is a distinct
lack of databases that meet the stringent accuracy and reproducibility that the clinical
diagnostic environment demands. Additionally, the current databases are “fractured”, with varied
access to and sharing of the data within, and variable quality due to errors and inaccurate data posting,
all of which pose a clear risk to the quality of patient care. With more widespread, secure sharing of
variants and associated phenotypes, the value of cumulative variant information will accelerate the
delivery of accurate, actionable, and efficient clinical reports.
• There are currently no standards or equivalent mechanisms for accreditation of databases to
ensure the accuracy and quality of uploaded data into any central repository to meet the needs of
the clinical diagnostics environment.
Data quality classes
Differentiate between three classes of data:
The Clinically Reported data label would denote the class of data that the HVP
Australian Node was originally designed to collect and share: data that has been
generated in a NATA accredited Australian diagnostic laboratory and is able to be
included in a clinical report.
Unreported Clinical quality data would denote data that has been generated in a
NATA accredited diagnostic laboratory, but is not capable of being included in a
clinical report. This class would consist primarily of next-generation
sequencing (NGS) type data.
Unaccredited data would be used to denote data that has been generated by an
Australian laboratory that has not been NATA accredited.
A new filtering option would be made available to allow users to view only data
of a certain class
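The three data classes and the proposed filter can be sketched as a small enumeration plus a classification rule. The two boolean inputs and the record layout are assumptions made for illustration; the slides define the classes but not how they would be stored.

```python
from enum import Enum
from typing import Dict, Iterable, List


class DataClass(Enum):
    CLINICALLY_REPORTED = "Clinically Reported"
    UNREPORTED_CLINICAL = "Unreported Clinical quality"
    UNACCREDITED = "Unaccredited"


def classify(nata_accredited: bool, reportable: bool) -> DataClass:
    """Assign a data class from lab accreditation and whether the data can go in a clinical report."""
    if not nata_accredited:
        return DataClass.UNACCREDITED
    if reportable:
        return DataClass.CLINICALLY_REPORTED
    return DataClass.UNREPORTED_CLINICAL


def filter_by_class(records: Iterable[Dict[str, bool]], wanted: DataClass) -> List[Dict[str, bool]]:
    """Keep only records of one class; each record carries the two hypothetical flags."""
    return [
        r for r in records
        if classify(r["nata_accredited"], r["reportable"]) is wanted
    ]
```

Deriving the class from two explicit flags, rather than storing a free-text label, keeps the filter consistent if a laboratory's accreditation status changes.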
Beyond the NeCTAR funding
• Academic or charitable funding required
• Integrate NGS data resource into the HVPA
portfolio
• Move database development into a medical
academic centre of excellence
• Seek active partnerships with current and
future collaborators with investment and risk
sharing

Weitere ähnliche Inhalte

Was ist angesagt?

CINECA webinar slides: Open science through fair health data networks dream o...
CINECA webinar slides: Open science through fair health data networks dream o...CINECA webinar slides: Open science through fair health data networks dream o...
CINECA webinar slides: Open science through fair health data networks dream o...
CINECAProject
 
2014 agbt giab_progress update
2014 agbt giab_progress update2014 agbt giab_progress update
2014 agbt giab_progress update
GenomeInABottle
 
CINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIR
CINECAProject
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009
Ian Foster
 

Was ist angesagt? (20)

Dalton presentation
Dalton presentationDalton presentation
Dalton presentation
 
Dalton
DaltonDalton
Dalton
 
Next Generation Companion Diagnostics; Adoption, Drivers, and Moderators of N...
Next Generation Companion Diagnostics; Adoption, Drivers, and Moderators of N...Next Generation Companion Diagnostics; Adoption, Drivers, and Moderators of N...
Next Generation Companion Diagnostics; Adoption, Drivers, and Moderators of N...
 
The Future of Personalized Medicine
The Future of Personalized MedicineThe Future of Personalized Medicine
The Future of Personalized Medicine
 
NCI HTAN, cancer trajectories, precision oncology
NCI HTAN, cancer trajectories, precision oncologyNCI HTAN, cancer trajectories, precision oncology
NCI HTAN, cancer trajectories, precision oncology
 
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
 
CINECA webinar slides: Open science through fair health data networks dream o...
CINECA webinar slides: Open science through fair health data networks dream o...CINECA webinar slides: Open science through fair health data networks dream o...
CINECA webinar slides: Open science through fair health data networks dream o...
 
2014 agbt giab_progress update
The Human Variome Database in Australia in 2014 - Graham Taylor

  • 1.
  • 2. Acknowledgments. Genomic Medicine & Translational Pathology, University of Melbourne: Arthur Lian Chi Hsu, Renate Marquis-Nicholson, Sebastian Lunke, Clare Love, Kym Pham, Olga Kondrashova, Matt Wakefield, Tiffany Cowie, Barney Rudzki and Paul Waring. Human Variome Project: Tim Smith, Alan Lo, Melvyn Leong, David Perkins, Heather Howard, Rania Horaitis, Dick Cotton. BioGrid: Maureen Turner, Leon Heffer. Royal College of Pathologists of Australasia: Vanessa Tyrrell. Peter MacCallum Cancer Centre: Ken Doig, Andrew Fellowes. Victorian Clinical Genetics Service: John-Paul Plazzer, Desiree Du Sart.
  • 3. Human Variome Project (Australasia) • The bigger picture • Infrastructure and search interface • Linkage to other datasets • Panel, exome and genome testing • Database accreditation • Next steps
  • 4. The big picture • Rediscovery at the genomics community level that data sharing is win-win • The Genomic Alliance, HGVS, HUGO – Data standards – Nomenclature – Infrastructure
  • 5. Nature (Perspective) 508 469-475 2014 Guidelines for investigating causality of sequence variants in human disease D. G. MacArthur, T. A. Manolio, D. P. Dimmock, H. L. Rehm, J. Shendure, G. R. Abecasis, D. R. Adams, R. B. Altman, S. E. Antonarakis, E. A. Ashley, J. C. Barrett, L. G. Biesecker, D. F. Conrad, G. M. Cooper, N. J. Cox, M. J. Daly, M. B. Gerstein, D. B. Goldstein, J. N. Hirschhorn, S. M. Leal, L. A. Pennacchio, J. A. Stamatoyannopoulos, S. R. Sunyaev, D. Valle, B. F. Voight, W. Winckler & C. Gunter. Priorities for research and infrastructure development 1. Improved public databases of human genetic variants incorporating explicit, up-to-date supporting evidence for variant implication in disease and audit trails recording changes in interpretation. 2. Improved incentives, and ethical and logistical solutions, for sharing of genetic and phenotypic data from both research and clinical diagnostic laboratories. 3. Public databases of variant and allele frequency data from large sets of population reference samples from a wide range of ancestries. 4. Large-scale genotyping of reported human disease-causing variants in large, well-phenotyped population cohorts, reducing biases in the assessment of the associated penetrance and phenotypic heterogeneity. 5. Development and benchmarking of standardized, quantitative statistical approaches for objectively assigning probability of causation to new candidate disease genes and variants. Déjà vu all over again?
  • 6. Nature Genetics 46, 107–115 (2014) Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants in the InSiGHT locus-specific database Bryony A Thompson, Amanda B Spurdle, John-Paul Plazzer, Marc S Greenblatt, Kiwamu Akagi, Fahd Al-Mulla, Bharati Bapat, Inge Bernstein, Gabriel Capellá, Johan T den Dunnen, Desiree du Sart, Aurelie Fabre, Michael P Farrell, Susan M Farrington, Ian M Frayling, Thierry Frebourg, David E Goldgar, Christopher D Heinen, Elke Holinski-Feder, Maija Kohonen-Corish, Kristina Lagerstedt Robinson, Suet Yi Leung, Alexandra Martins, Pal Moller, Monika Morak, Minna Nystrom, Paivi Peltomaki, Marta Pineda, Ming Qi, Rajkumar Ramesar, Lene Juel Rasmussen, Brigitte Royer-Pokora, Rodney J Scott, Rolf Sijmons, Sean V Tavtigian, Carli M Tops, Thomas Weber, Juul Wijnen, Michael O Woods, Finlay Macrae & Maurizio Genuardi, on behalf of InSiGHT. 1. Leiden Open Variation Database (LOVD) 2. Micro-attribution using Open Researcher and Contributor ID (ORCID) 3. Variant Interpretation Committee (VIC) applies the 5-tiered classification scheme developed by the International Agency for Research on Cancer (IARC) 4. Endorsed by the Human Variome Project (HVP)
  • 7. Not everything in the Nature portfolio is gold It is good to supplement your pocket money
  • 8. Early nomenclature papers • Beaudet • Tsui • Antonarakis
  • 9. Translation into diagnostic practice • 15 years ago Cotton predicted that the majority of human genetic variants will be detected in a diagnostic context • As NGS moves into a service setting this transition will become even clearer • Genetic variants will become part of a patient’s medical record
  • 10. HVPA database • Primarily for and of diagnostics • Diagnostic services are busy • And cash and time limited • We have to make it easy for them • And secure • And useful • Maybe even essential
  • 11. HVPA Objective A national data sharing facility for improving clinical genetic testing services and supporting medical research Constitutional, not somatic, mutations NECTAR project grant UoM FE31082 “Clinical and Molecular Data Linkage Tools”, completion date 30th June 2014
  • 12. Infrastructure and search interface • Data repository (“the database”) • Data handling tools that support data upload from laboratories • Portal through which the database can be browsed • Website for news and notifications
  • 13. Human Variome Project Australian Node What We’ve Done • NeAT Funding (2010-2011) – Pilot Phase – 4 labs, 3 diseases • Breast Cancer • Colon Cancer • Huntington’s – Portal Launched April 2011 – Molecular Data Only – Collaboration with Mawson • NeCTAR Funding (2012-2014) – 12 more labs + all genes they test for – Configuration Tool – Clinical Data/Phenotype Linkage – Transfer data internationally What We Built • Collection Tool • Portal • Data Model • Ethics Processes • Access & Usage Policy • Data Sharing Agreements
  • 14. How it works • Software to interface with existing LIMS (or lack thereof) • Collection occurs after report has been issued • Data types: – All classified variants reported by a lab – Benign variants – NGS/Incidental findings – Not collecting negative results • Secure data link between lab and Node • (Semi)-automatic transfer of data • Portal to allow interrogation of all Australian data – http://www.hvpaustralia.org.au • Linkage key generator • Submission to BioGrid Platform
  • 15. Open-Source Solutions • HVP Portal (v1.0, r512) - A web application which features the basic interface for browsing and querying a HVP node. – Open source – MIT License – Python/django • HVP Exporter (v1.0, r512) - Basic HVP exporting tool for laboratories. Features simple GUI and error checking interface, plug-in architecture for customisation between sites and common libraries for working with MS Access and MS Excel data sources – Open source – MIT License – .NET C#, python/ironpython • HVP Importer (v1.0, r512) - A series of tools and web services that receive, decrypt and process information by submitting laboratories using the standard transaction XML format – Open source – MIT License – python
  • 16. Access to HVPA • Controlled Access – Diagnostic Lab Staff – Registered Medical Practitioners – Board Certified Genetic Counsellors • Online application
  • 17. HVPA Status at November 2013 Strengths 1. Database available on demand for diagnostic labs 2. Tools for data sharing 3. Community engagement with RCPA (QUUP), SA/Mawson, BioGrid, VCGS 4. National reach with international connections via HVPI, WHO & UNESCO Weaknesses 1. Performance of the existing HVPA database is limited 2. Laboratory buy-in to the database across Australia is limited 3. The database itself has been hard to access because of low server bandwidth 4. The project has not anticipated the likely impact of next generation sequencing and risks missing inclusion in genomic-scale initiatives now underway.
  • 18. HVPA 24th March 2014 • 5 laboratories submitting • 295 Unique Variants • 27410 Instances • 25 Registered users
  • 19. Developments proposed in November
      ID | Area | Idea | Priority
      1 | B. Presentation | Statistics of number of variants for that gene as table or bar graph (# unique, # instances, top 5 qty submitted) | 1
      15 | D. Feedback | Raise a concern about an instance's interpretation | 1
      2 | A. Search | Search by range | 2
      3 | A. Search | Search by genomic position | 2
      4 | A. Search | Filter by pathogenicity | 2
      5 | B. Presentation | Sort by ... (pathogenicity, other fields) | 2
      6 | C. Relevant Info | Display links to related database for gene by referencing genenames.org | 2
      7 | A. Search | Wildcard search of variants | 2
      9 | A. Search | Search by disease which shows multiple genes and variant results | 2
      10 | E. NGS | VCF data imports into HVP Australia | 2
      13 | B. Presentation | VarVis - visualisation of gene and variants reported | 2
      11 | B. Presentation | VCF data export from HVP Australia of a set of results | 3
      12 | B. Presentation | At instance level - see other variants from this test/patient | 3
      14 | C. Relevant Info | Capture & display SIFT score | 3
      16 | D. Feedback | Notify labs when the general consensus on the pathogenicity of something they submitted has changed or been updated, i.e. they submitted benign and it is now likely pathogenic, or submitted unknown and it is now something else | 3
      17 | B. Presentation | Integration with EBI/NCBI tools for queries and displays | 3
      19 | B. Presentation | Display last date uploaded for this variant (or last 10 dates) | 3
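The top-priority presentation item (per-gene counts of unique variants, instances, and the five most-submitted variants) can be sketched roughly as below. This is only an illustration, not the HVP Portal's code: the record layout, the `gene_statistics` helper and the sample rows are all assumptions made for the example.

```python
from collections import Counter

# Illustrative instance records: (gene, HGVS cDNA name, submitting lab).
# The layout and the lab names are assumptions, not the HVPA schema.
instances = [
    ("BRCA1", "c.68_69del", "LabA"),
    ("BRCA1", "c.68_69del", "LabB"),
    ("BRCA1", "c.5266dup", "LabA"),
    ("BRCA2", "c.5946del", "LabA"),
]


def gene_statistics(records, gene, top_n=5):
    """Summarise one gene: unique variants, total instances, most-submitted variants."""
    counts = Counter(variant for g, variant, _lab in records if g == gene)
    return {
        "unique": len(counts),                # number of distinct variants
        "instances": sum(counts.values()),    # number of submitted observations
        "top": counts.most_common(top_n),     # (variant, count) pairs, highest first
    }


stats = gene_statistics(instances, "BRCA1")
print(stats)
```

The same dictionary could feed either a table or a bar graph, which are the two displays the proposal mentions.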
  • 20. Accessing the test database http://115.146.85.61/ Username: lab_tester Password: hvpaustralia2013
  • 21. Search Interface • The search interface has to provide useful tools for clinicians and lab scientists so that the HVPA project offers them direct benefits and gives them an incentive to participate. Following a request for feedback from users, a series of improvements was implemented, initially on a demonstration server and then, after review by the Steering Committee, on the live server. The highest priorities were more information on how many times a particular variant had been recorded, the ability to search by range, and the ability to filter by pathogenicity. There was also interest in enabling direct uploading of VCF files and the automated calculation of pathogenicity scores. Many of these features are now implemented and examples will be presented.
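    The search features named on this slide (expression match, search by range, filter by pathogenicity) can be sketched as below. This is only an illustration under stated assumptions: the `VariantInstance` fields, the `search` function and the demo coordinates are hypothetical and are not taken from the HVPA code base; the 1-to-5 pathogenicity scale is the one listed on the Variant Fields slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantInstance:
    gene: str           # HGNC symbol, e.g. "BRCA1"
    position: int       # genomic coordinate (hypothetical demo value)
    pathogenicity: int  # 1 = pathogenic ... 5 = certainly benign


def search(instances, gene_prefix="", start=None, end=None, max_class=None):
    """Return instances matching a gene-name prefix, a coordinate range
    and an upper bound on the pathogenicity class."""
    hits = []
    for inst in instances:
        if not inst.gene.upper().startswith(gene_prefix.upper()):
            continue
        if start is not None and inst.position < start:
            continue
        if end is not None and inst.position > end:
            continue
        if max_class is not None and inst.pathogenicity > max_class:
            continue
        hits.append(inst)
    return hits


# Made-up demo records, not real database content.
demo = [
    VariantInstance("BRCA1", 100, 1),
    VariantInstance("BRCA2", 250, 3),
    VariantInstance("TP53", 120, 2),
]
```

    Typing "BR" would match both BRCA genes in the demo data, while a range of 110 to 260 combined with `max_class=2` keeps only the TP53 record.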
  • 22. Purpose of the HVPA Database • Working database – Record and share diagnostic-quality genetic variation data – Integrate with clinical phenotype data – Integrate with international efforts – Heads up for NGS gene panel data sets • Test database – Showcase enhancements – Real world testing and feedback – Uses data edited from actual database – Not accurate or reliable: some parameters edited for test purposes
  • 23. Major improvements to search facility
  • 24. Searching by expression match BRCA BR
  • 25. Instances of a variant
  • 27. Direct Import from Results Lists • Can recover historical data sets • Reformat on the fly • Useful as a low-overhead catch-up that lets labs transition to the uploading tools as their IT permits – PathWest (John Bielby) – Institute of Health and Biomedical Innovation, Queensland (Lyn Griffiths) – kConFab (Heather Thorne) – Peter MacCallum Cancer Centre (Ken Doig)
  • 28. Variant Fields (Mandatory)
    – GeneName: Official HGNC Symbol; VARCHAR(20); Mandatory
    – RefSeqName: Name of reference sequence (NCBI's RefSeq project); VARCHAR(20); Mandatory
    – RefSeqVer: Version of reference sequence (RefSeq); VARCHAR(20); Mandatory
    – cDNA: HGVS variant name (c.); VARCHAR(255)
    – mRNA: HGVS variant name (r.); VARCHAR(255)
    – Genomic: HGVS variant name (g.); VARCHAR(255)
    – Protein: HGVS variant name (p.); VARCHAR(255)
    – Location: Exon or intron number; VARCHAR(255)
    – Of the HGVS variant names: At least one required
    – Pathogenicity: Level of pathogenicity (1=Pathogenic, 2=Possibly Pathogenic, 3=Unknown, 4=Possibly Benign, 5=Certainly Benign); Mandatory
    – PatientID: Internal ID for the patient used within the lab; Mandatory
    – TestID: Internal ID for the test used within the lab; Mandatory
    – InstanceDate: Date instance was tested; Mandatory
    – GenomicRefSeq: Genomic reference sequence; Mandatory
    – GenomicRefSeqVer: Genomic reference sequence version; Mandatory
    – Types listed for this second group: VARCHAR(20), DateTime, VARCHAR(255), VARCHAR(255)
  • 29. Variant Fields (Optional)
    – PatientAge: Age of patient when test was taken; INT32; Optional
    – TestMethod: The name of the test method used; VARCHAR(20); Optional
    – SampleTissue: Type of sample taken; VARCHAR(20); Optional
    – SampleSource: The source of the sample, e.g. DNA, gDNA, RNA...; VARCHAR(20); Optional
    – Justification: Justification by medical scientist; VARCHAR(65535); Optional
    – PubMed: PubMed Identifier/Digital Object Identifier; VARCHAR(255); Optional
    – RecordedInDatabase: Whether it is recorded in a disease-specific or gene-specific database; Boolean; Optional
    – SampleStored: Whether lab still has sample left; Boolean; Optional
    – VariantSegregatesWithDisease: Whether pedigree was considered during diagnosis of pathogenicity; Boolean; Optional
    – HistologyStored: Whether histology is stored; Boolean; Optional
    – PedigreeAvailable: Whether organisation has pedigree data; Boolean; Optional
    – SIFTScore: Calculated SIFT Score; INT32; Optional
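    As an illustration of how a submission could be checked against the field lists on the two Variant Fields slides, here is a minimal sketch. It is not the HVPA implementation: the `VariantRecord` class and `validate` function are hypothetical, the "at least one required" rule is assumed to apply to the four HGVS name fields, and the patient, test and reference-sequence values in the example are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "at least one required" covers these four HGVS name fields.
HGVS_FIELDS = ("cDNA", "mRNA", "Genomic", "Protein")


@dataclass
class VariantRecord:
    GeneName: str
    RefSeqName: str
    RefSeqVer: str
    Pathogenicity: int
    PatientID: str
    TestID: str
    InstanceDate: str
    GenomicRefSeq: str
    GenomicRefSeqVer: str
    cDNA: Optional[str] = None
    mRNA: Optional[str] = None
    Genomic: Optional[str] = None
    Protein: Optional[str] = None
    Location: Optional[str] = None


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not any(getattr(record, name) for name in HGVS_FIELDS):
        problems.append("at least one HGVS variant name is required")
    if record.Pathogenicity not in range(1, 6):
        problems.append("Pathogenicity must be between 1 and 5")
    if len(record.GeneName) > 20:
        problems.append("GeneName exceeds VARCHAR(20)")
    return problems


# Placeholder identifiers; the EGFR variant name appears elsewhere in this deck.
example = VariantRecord(
    GeneName="EGFR", RefSeqName="NM_005228", RefSeqVer="3",
    Pathogenicity=1, PatientID="P-demo", TestID="T-demo",
    InstanceDate="2014-03-24", GenomicRefSeq="NC_000007",
    GenomicRefSeqVer="13", cDNA="c.2573T>G", Protein="p.L858R",
)
```

    Returning a list of problems rather than raising on the first one lets a lab see every issue with a batch upload in a single pass.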
  • 30. Linkage to other datasets • HVPA has implemented the hash key algorithm, and work is in progress with BioGrid to link variation data to clinical data sets. • More details from Maureen Turner, BioGrid CEO, who is speaking at this meeting
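    The slide does not describe the hash key algorithm itself. The sketch below only illustrates the general idea of hash-based record linkage, under stated assumptions: a keyed SHA-256 hash (HMAC) over normalised identifiers, with hypothetical function names, field choices and secret. It should not be read as the HVPA or BioGrid method.

```python
import hashlib
import hmac


def linkage_key(secret, surname, given_name, date_of_birth):
    """Derive a hash key from normalised identifiers so that two sites can
    match records without exchanging the identifiers themselves."""
    parts = (surname, given_name, date_of_birth)
    normalised = "|".join(part.strip().upper() for part in parts)
    return hmac.new(secret, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def link(variant_rows, clinical_rows):
    """Pair variant rows with clinical rows that carry the same hash key."""
    clinical_by_key = {row["key"]: row for row in clinical_rows}
    return [
        (row, clinical_by_key[row["key"]])
        for row in variant_rows
        if row["key"] in clinical_by_key
    ]


# Demo with made-up identifiers and a made-up shared secret.
SECRET = b"demo-secret"
lab_key = linkage_key(SECRET, " smith ", "Jane", "1970-01-31")
clinic_key = linkage_key(SECRET, "SMITH", "jane", "1970-01-31")
pairs = link(
    [{"key": lab_key, "variant": "c.34G>A"}],
    [{"key": clinic_key, "record": "demo clinical record"}],
)
```

    Normalising case and whitespace before hashing matters because a hash changes completely on any difference in its input, so small formatting differences between sites would otherwise prevent a match.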
  • 31. Cost and performance will force diagnostic labs to adopt NGS as front-line approach (figure panels: cost per base; Illumina share price; hype cycle)
  • 32. HVPA LOVD3 database pilot • Established an HVPA LOVD3 database and working with the Human Genetics Society of Australasia on a pilot study to sequence the exomes of two trios and review the data using this database. • Includes exome-scale data • Open access to Coriell cases with no “consent” issues • Explore staging of variant “credibility classification” and access
  • 33. Relationship to Gene Panel Databases? e.g. http://genomics.bio21.unimelb.edu.au/lovd/
  • 35. Melbourne Genomics Health Alliance: Paradigm for Implementing Genomic Medicine • Clinically led, rather than technology driven • Fostering ‘end use’ of genomic data • Common clinical repository • Prospective: first tier test • Evaluation to inform implementation • Engineering collaboration • Fostering system change • A/Prof Clara Gaff: Program Leader
  • 37. How many variants per exome? (SNP count: study)
    – 20,000: Choi et al. PNAS 2009
    – 142,000: Mullikin NIH, unpublished 2010
    – 50,000: Clark et al. Nature Biotechnology 2011
    – 125,000: Smith et al. Genome Biology 2011
    – 100,000: Johnston & Biesecker Human Molecular Genetics 2013
    – 200,000 to 400,000: Yang et al. N Engl J Med 2013
    • 20-fold range • Exome designs vary • Likely to be higher variant count in African populations as the reference sequence is non-African
  • 38. Low concordance of multiple variant-calling pipelines O’Rawe et al. Genome Medicine 2013 • 15 exomes • 4 families • HiSeq 2000 • Agilent SureSelect v.2 • ~120X mean coverage • SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMTools • SNV concordance between five Illumina pipelines across all 15 exomes was 57.4% • 0.5-5.1% of variants were called as unique to each pipeline • Indel concordance was only 26.8% between three indel calling pipelines • 11% of CG variants that fall within targeted regions in exome sequencing were not called by any of the Illumina-based exome analysis pipelines • 97.1%, 60.2% and 99.1% of the GATK-only, SOAP-only and shared SNVs can be validated • 54.0%, 44.6% and 78.1% of the GATK-only, SOAP-only and shared indels can be validated • Additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family
  • 39. Low concordance of multiple variant-calling pipelines O’Rawe et al. Genome Medicine 2013, 5:28 SNV concordance: 57.4% Indel concordance 26.8%
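    To make the concordance figures concrete, here is a minimal sketch of one common way to compute concordance between call sets: the share of all distinct calls that every pipeline reports. Whether the cited study used exactly this definition is an assumption, and the function names and toy call sets are hypothetical.

```python
def concordance(call_sets):
    """Fraction of all distinct calls that every pipeline reports
    (size of the intersection divided by size of the union)."""
    sets = [set(calls) for calls in call_sets]
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def unique_to_each(call_sets):
    """For each pipeline, the calls that no other pipeline reports."""
    sets = [set(calls) for calls in call_sets]
    result = []
    for i, own in enumerate(sets):
        others = set().union(*(s for j, s in enumerate(sets) if j != i))
        result.append(own - others)
    return result


# Toy call sets keyed by (chromosome, position, ref, alt); not real data.
pipeline_a = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")}
pipeline_b = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("3", 10, "T", "C")}
pipeline_c = {("1", 100, "A", "G"), ("1", 200, "C", "T")}
toy = [pipeline_a, pipeline_b, pipeline_c]
```

    In the toy data two of the four distinct calls are shared by all three pipelines, giving a concordance of 0.5. Note that calls must be normalised to the same representation before comparison, otherwise identical indels written differently count as discordant.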
  • 40. Venn diagrams of selected CNV detection methods in real data processing Duan J, Zhang J-G, Deng H-W, Wang Y-P (2013) Comparative Studies of Copy Number Variation Detection Methods for Next-Generation Sequencing Technologies. PLoS ONE 8(3): e59128. doi:10.1371/journal.pone.0059128 http://www.plosone.org/article/info:doi/10.1371/journal.pone.0059128
  • 43. Remove errors before processing • K-mer selection • Merging forward and reverse reads • Discard rare reads • Use a HiFi polymerase (figure: pile-up of near-identical amplicon reads that differ at isolated single bases; axis 0 to 1600)
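    The "discard rare reads" step can be sketched as follows. This is an illustration only: the `discard_rare_reads` function and its 5% threshold are hypothetical, and the demo counts are made up, although the three sequences are taken from the slide's figure.

```python
from collections import Counter


def discard_rare_reads(reads, min_fraction=0.05):
    """Count identical reads and keep only sequences whose share of all
    reads reaches min_fraction; rarer sequences are treated as errors."""
    total = len(reads)
    if total == 0:
        return {}
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n / total >= min_fraction}


# Two frequent sequences and one singleton, as seen in the slide's figure.
MAJOR_C = "CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA"
MAJOR_T = "CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA"
SINGLETON = "TAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA"

# Made-up counts for the demo.
demo_reads = [MAJOR_C] * 12 + [MAJOR_T] * 8 + [SINGLETON]
kept = discard_rare_reads(demo_reads)
```

    With 21 demo reads the singleton makes up under 5% of the total and is dropped, while both frequent sequences survive, which is the behaviour wanted when a true heterozygous site sits among sporadic polymerase or sequencing errors.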
  • 45. Results – detection of variants • Known SNV concordance 100%, all assays • Known indel <6bp concordance 100%, all assays • Not able to detect the C9orf72 hexanucleotide expansion or the PRNP octapeptide region repeat with the standard pipeline • Diagnostic yield within appropriate clinical context (based on very limited sample size) – NimbleGen SeqCap EZ Neuro: 33% (2/6) – Nextera Neuro: 23% (6/26)
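  • 46. Filtering Variants (filters applied: None / Qual / Not in Blood)
    – All variants, Blood: 9828 / 8551 / NA
    – All variants, Frozen: 9920 / 8736 / 126
    – All variants, FFPE: 9709 / 8163 / 199
    – Variants in Gene List, Blood: 27 / 18 / NA
    – Variants in Gene List, Frozen: 27 / 23 / 2 (EGFR)
    – Variants in Gene List, FFPE: 25 / 19 / 3 (EGFR, ROS)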
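    The three filtering stages on this slide (no filter, quality filter, removal of variants also present in blood) can be sketched as set operations. This is a hedged illustration: the `filter_variants` function, the quality threshold of 30 and the demo calls are hypothetical, and the slide does not say how its "Qual" filter was defined.

```python
def filter_variants(tumour_calls, blood_calls, min_qual=30.0, gene_list=None):
    """Keep tumour calls that pass a quality threshold, are absent from the
    matched blood sample and, optionally, fall in a gene list.

    tumour_calls maps (gene, change) to a quality score; blood_calls is any
    collection of (gene, change) keys called in blood."""
    passed = {key for key, qual in tumour_calls.items() if qual >= min_qual}
    not_in_blood = passed - set(blood_calls)
    if gene_list is not None:
        not_in_blood = {key for key in not_in_blood if key[0] in gene_list}
    return not_in_blood


# Made-up demo calls; only the variant names are borrowed from this deck.
tumour = {
    ("EGFR", "c.2573T>G"): 60.0,   # passes quality, absent from blood
    ("KRAS", "c.35G>A"): 12.0,     # fails the quality threshold
    ("GENE_X", "germline"): 55.0,  # also present in blood, so removed
}
blood = {("GENE_X", "germline")}
candidates = filter_variants(tumour, blood, gene_list={"EGFR", "KRAS"})
```

    Subtracting the blood calls removes inherited variants, which is why the "Not in Blood" column on the slide is so much smaller than the quality-filtered column.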
  • 49. Confirmation by PCR (two bar charts of normalised signal per assay)
    – EGFR normalised (NM_005228.3): T790 WT; 784 c.2350T>C, p.S784P; 784 c.2351C>T, p.S784F; 785 c.2354C>T, p.T785I; 786 c.2356G>A, p.V786M; 790 c.2368A>G, p.T790A; 790 c.2369C>T, p.T790M; 828 & 861 wt; 858 c.2572C>A, p.L858M; 858 c.2573_2574delinsGT; 858 c.2573T>A, p.L858Q; 858 c.2573T>G, p.L858R; 860 c.2579A>T, p.K860I; 861 c.2582T>A, p.L861Q; 861 c.2582T>G, p.L861R
    – KRAS normalised (NM_033360.2): 12 c.34G>A, p.G12S; 12 c.34G>C, p.G12R; 12 c.34G>T, p.G12C; 12 c.35G>A, p.G12D; 12 c.35G>C, p.G12A; 12 c.35G>T, p.G12V; 13 c.37G>A, p.G13S; 13 c.37G>C, p.G13R; 13 c.37G>T, p.G13C; 13 c.38G>A, p.G13D; 13 c.38G>C, p.G13A; 13 c.38G>T, p.G13V
  • 50. Auto Upload Database of Results in LOVD Local LOVD instances sharable via HVPA
  • 51. Results – detection of variants • Coriell pedigree comparison • Subset of 19 genes – targeted by all four assays • Variant allele frequency cut-off of 35% (interested in germline variants)
    Values are Mother / Father / Child for: total number of variants detected; non-synonymous variants detected; # variants with GAF <5%; # variants with African AF 5%
    Family Y077
    – NimbleGen SeqCap EZ Neuro: 194 / 241 / 196; 16 / 22 / 20; 4 / 5 / 7; 2 / 3 / 4
    – Nextera Neuro: 250 / 296 / 283; 17 / 23 / 22; 4 / 6 / 7; 2 / 3 / 4
    – TruSight One: 121 / 137 / 119; 16 / 23 / 20; 3 / 6 / 6; 1 / 3 / 3
    – Nextera Exome: 101 / 118 / 114; 16 / 22 / 22; 4 / 5 / 7; 2 / 2 / 4
    Family Y117
    – NimbleGen SeqCap EZ Neuro: 279 / 245 / 263; 20 / 20 / 20; 4 / 5 / 6; 3 / 2 / 4
    – Nextera Neuro: 382 / 371 / 342; 20 / 21 / 21; 5 / 5 / 6; 3 / 2 / 4
    – TruSight One: 148 / 154 / 148; 18 / 18 / 17; 4 / 4 / 5; 3 / 2 / 3
    – Nextera Exome: 121 / 67 / 66; 19 / 15 / 16; 5 / 3 / 4; 3 / 1 / 3
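    The 35% variant allele frequency cut-off mentioned on this slide can be illustrated with a short sketch. The function names and the read counts are hypothetical; only the 0.35 threshold comes from the slide.

```python
def variant_allele_fraction(ref_reads, alt_reads):
    """Share of reads at a site that support the alternate allele."""
    depth = ref_reads + alt_reads
    return alt_reads / depth if depth else 0.0


def germline_candidates(calls, min_vaf=0.35):
    """Keep calls whose allele fraction is consistent with a germline
    heterozygous (about 0.5) or homozygous (about 1.0) variant."""
    return [
        name
        for name, ref_reads, alt_reads in calls
        if variant_allele_fraction(ref_reads, alt_reads) >= min_vaf
    ]


# Made-up read counts: (label, reference reads, alternate reads).
demo_calls = [
    ("het-like", 52, 48),
    ("hom-like", 1, 99),
    ("low-fraction", 90, 10),
]
```

    A call supported by 10 of 100 reads falls below the cut-off and is set aside, since such low fractions are more typical of artefacts or mosaic and somatic events than of inherited variants.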
  • 52. Example case showing concordance (columns: Gene, Variant, Chr, Coordinate, zyg; each variant is listed once per assay that called it; KEY to row order: exome, nimble neuro, next neuro, trusight 1)
    – APOE T>T/C 19 45411941 het (4 rows)
    – APOE C>C/T 19 45412040 het (4 rows)
    – ATP7B G>G/A 13 52511606 het (4 rows)
    – ATP7B A>A/G 13 52515354 het (4 rows)
    – ATP7B C>C/T 13 52523808 het (4 rows)
    – ATP7B T>T/C 13 52524488 het (4 rows)
    – LRRK2 G>A/A 12 40619082 hom (4 rows)
    – LRRK2 C>C/G 12 40657700 het (3 rows)
    – LRRK2 T>T/A 12 40713901 het (3 rows)
    – LRRK2 T>T/C 12 40758652 het (4 rows)
    – NPC1 G>G/A 18 21119777 het (4 rows)
    – NPC1 T>T/C 18 21120444 het (4 rows)
    – NPC1 18 21123536 het (4 rows: TA>TA/T in 3, TAA>TAA/T in 1)
    – NPC1 C>G/G 18 21124945 hom (4 rows)
    – PARK2 G>G/C 6 162622239 het (4 rows)
    – PINK1 A>A/G 1 20964328 het (4 rows)
    – PINK1 G>G/A 1 20972048 het (3 rows)
    – PINK1 G>G/A 1 20975727 het (3 rows)
    – PINK1 A>A/C 1 20977000 het (4 rows)
    – PSEN2 G>G/A 1 227071449 het (4 rows)
    – VCP C>T/T 9 35062972 hom (4 rows)
    – VCP A>A/G 9 35068364 het (4 rows)
  • 53. Describing Coverage (filters: mapping quality >= 15, base quality score >= 15)
    – MiSeq (12plex): 99.54% of target region with non-zero depth; 98.25% >= 5x; 94.80% >= 15x; 89.10% >= 30x; 81.56% >= 50x; average depth of coverage 180.76; 5th centile 19.08; 20th centile 67.42; 50th centile 160.42; 95th centile 414.00
    – HiSeq (48plex): 99.90% of target region with non-zero depth; 99.71% >= 5x; 99.34% >= 15x; 98.85% >= 30x; 98.17% >= 50x; average depth of coverage 920.84; 5th centile 126.75; 20th centile 408.83; 50th centile 871.17; 95th centile 1879.92
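    The coverage metrics on this slide can be computed from a list of per-target depths roughly as follows. This is a sketch under assumptions: the function names are hypothetical, percentiles use the simple nearest-rank method (the slide does not say which method was used), and the demo depths are made up.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def describe_coverage(depths, thresholds=(1, 5, 15, 30, 50)):
    """Summarise per-target depths: percentage of targets at or above each
    threshold, mean depth, and the 5th, 20th, 50th and 95th centiles."""
    ordered = sorted(depths)
    n = len(ordered)
    report = {
        f"pct_ge_{t}x": 100 * sum(d >= t for d in ordered) / n
        for t in thresholds
    }
    report["mean_depth"] = sum(ordered) / n
    for pct in (5, 20, 50, 95):
        report[f"p{pct}"] = percentile(ordered, pct)
    return report


# Made-up integer depths for ten targets.
demo_depths = [0, 4, 10, 20, 40, 60, 80, 100, 120, 200]
summary = describe_coverage(demo_depths)
```

    Reporting low centiles alongside the mean is the useful part: a high average depth can hide a tail of poorly covered targets, which is what the 5th-centile column on the slide exposes.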
  • 55. Higher coverage, greater reproducibility (plot axes: coverage; coefficient of variation)
  • 56. Can capture coverage report dosage to diagnostic standards? • Inter-sample variation is low, but low coverage prevents dosage estimation • Chr X is a good first pass test for dosage (figure axes: samples; targets; autosomal targets; chrX targets)
  • 57. XX vs. XY • 8 female cases and 16 male cases showing reproducibility of coverage of X loci within each group. Loci with higher SDs were associated with reduced coverage. (two plots of average XX and average XY across X loci; panel labels 870 and 160)
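    The idea of using chromosome X as a first-pass dosage test can be sketched by comparing mean coverage of X targets with mean coverage of autosomal targets in the same sample. The function names, the 0.75 decision threshold and the demo depths are all hypothetical; this is not the analysis behind the plots.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def call_x_copies(x_depths, autosomal_depths, threshold=0.75):
    """Estimate X copy number from the ratio of X to autosomal coverage.

    With two autosomal copies as the baseline, a ratio near 1.0 suggests
    two X copies and a ratio near 0.5 suggests one."""
    ratio = mean(x_depths) / mean(autosomal_depths)
    return 2 if ratio > threshold else 1


# Made-up per-target depths for one XX-like and one XY-like sample.
autosomal = [100, 100, 100]
xx_like = [98, 102, 100]
xy_like = [49, 51, 50]
```

    The coefficient of variation across samples is the natural companion metric: dosage calls are only trustworthy at loci where normalised coverage is reproducible, which the slides suggest requires high depth.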
  • 59. Sharing Experience with TruSight One • In partnership with Illumina, RCPA and the HGSA, Kim Flintoff (Wellington Regional Genetics Laboratory) is leading an evaluation of exon sequencing using Illumina’s TruSight One panel. Two Coriell family trios will be sequenced by New Zealand Genomics Limited and the data will be shared on an HVPA database • The VCF file will be available on the HVPA LOVD database and performance stats will also be made available.
  • 60. Next Steps • Robust standards for genomic medicine • Databases and data content – Access to identified and de-identified data (consent and confidentiality) – Database accreditation process in prep with RCPA – Defining the performance of various aligners, variant callers and annotation programs – Clinical grade Variant Call Format (VCF) – Metafile covering data trail: what was tested, what was not tested
  • 61. Standards for Accreditation of DNA Sequence Variation Databases The Development of Standards for Accreditation of DNA Sequence Variation Databases is a national Quality Use of Pathology Program (QUPP) project, jointly initiated by the Royal College of Pathologists of Australasia (RCPA) and the Human Variome Project (HVP). Background • The volume, spectrum and complexity of genetic tests emerging within diagnostic pathology laboratories are increasing rapidly. In particular, high-throughput sequencing methods such as targeted panel, whole exome (WES) and whole genome sequencing (WGS) are producing an increasing quantity of genetic data that requires analysis and interpretation and forms a substantial proportion of the workload. • There is currently a plethora of online mutation databases to refer to, but few meet the stringent accuracy and reproducibility that the clinical diagnostic environment demands. The current databases are also “fractured”, with varied access to and sharing of the data within them, and variable quality due to errors or inaccurate data posting, all of which pose a clear risk to the quality of patient care. With more widespread, secure sharing of variants and associated phenotypes, the value of cumulative variant information will accelerate the delivery of accurate, actionable and efficient clinical reports. • There are currently no standards or equivalent mechanisms for accrediting databases to ensure that data uploaded to any central repository are accurate and of sufficient quality to meet the needs of the clinical diagnostics environment.
  • 62. Data quality classes • Differentiate between three classes of data:
    – Clinically Reported: the class of data that the HVP Australian Node was originally designed to collect and share, i.e. data that has been generated in a NATA accredited Australian diagnostic laboratory and is able to be included in a clinical report.
    – Unreported Clinical quality: data that has been generated in a NATA accredited diagnostic laboratory but is not capable of being included in a clinical report. This class would consist primarily of next-generation sequencing (NGS) type data.
    – Unaccredited: data that has been generated by an Australian laboratory that has not been NATA accredited.
    • A new filtering option would be made available to allow users to view only data of a certain class
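    The three data classes and the proposed filtering option can be sketched as a small classification rule. The names `DataClass`, `classify` and `filter_by_class`, and the two boolean record fields, are hypothetical; the class definitions themselves follow the slide.

```python
from enum import Enum


class DataClass(Enum):
    CLINICALLY_REPORTED = "Clinically Reported"
    UNREPORTED_CLINICAL = "Unreported Clinical quality"
    UNACCREDITED = "Unaccredited"


def classify(nata_accredited, reportable):
    """Assign a data class from the lab's NATA accreditation status and
    whether the result can be included in a clinical report."""
    if not nata_accredited:
        return DataClass.UNACCREDITED
    if reportable:
        return DataClass.CLINICALLY_REPORTED
    return DataClass.UNREPORTED_CLINICAL


def filter_by_class(records, wanted):
    """Return only the records that fall in the requested class."""
    return [
        record
        for record in records
        if classify(record["nata_accredited"], record["reportable"]) is wanted
    ]


# Made-up records for the demo.
demo_records = [
    {"id": "a", "nata_accredited": True, "reportable": True},
    {"id": "b", "nata_accredited": True, "reportable": False},
    {"id": "c", "nata_accredited": False, "reportable": False},
]
```

    Deriving the class from two stored facts, rather than storing the label itself, means a record is reclassified automatically if a laboratory's accreditation status is later corrected.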
  • 63. Beyond the NeCTAR funding • Academic or charitable funding required • Integrate NGS data resource into the HVPA portfolio • Move database development into a medical academic centre of excellence • Seek active partnerships with current and future collaborators with investment and risk sharing