The Human Variome Database in Australia in 2014 - Graham Taylor

Acknowledgments
Genomic Medicine & Translational Pathology, University of Melbourne:
Arthur Lian Chi Hsu, Renate Marquis-Nicholson, Sebastian Lunke, Clare Love, Kym Pham,
Olga Kondrashova, Matt Wakefield, Tiffany Cowie, Barney Rudzki and Paul Waring
Human Variome Project
Tim Smith, Alan Lo, Melvyn Leong, David Perkins, Heather Howard, Rania Horaitis
Dick Cotton
BioGrid
Maureen Turner, Leon Heffer
Royal College of Pathologists of Australasia
Vanessa Tyrrell
Peter MacCallum Cancer Centre
Ken Doig, Andrew Fellowes
Victorian Clinical Genetics Service
John-Paul Plazzer, Desiree Du Sart

Human Variome Project (Australasia)
• The bigger picture
• Infrastructure and search interface
• Linkage to other datasets
• Panel, exome and genome testing
• Database accreditation
• Next steps

The big picture
• Rediscovery at the genomics community level
that data sharing is win-win
• The Genomic Alliance, HGVS, HUGO
– Data standards
– Nomenclature
– Infrastructure

Nature (Perspective) 508 469-475 2014
Guidelines for investigating causality of sequence variants in human disease
D. G. MacArthur, T. A. Manolio, D. P. Dimmock, H. L. Rehm, J. Shendure, G. R. Abecasis, D. R. Adams, R. B. Altman, S. E. Antonarakis, E.
A. Ashley, J. C. Barrett, L. G. Biesecker, D. F. Conrad, G. M. Cooper, N. J. Cox, M. J. Daly, M. B. Gerstein, D. B. Goldstein, J. N. Hirschhorn,
S. M. Leal, L. A. Pennacchio, J. A. Stamatoyannopoulos, S. R. Sunyaev, D. Valle, B. F. Voight, W. Winckler & C. Gunter.
Priorities for research and infrastructure development
1. Improved public databases of human genetic variants incorporating explicit, up-to-date supporting
evidence for variant implication in disease and audit trails recording changes in interpretation.
2. Improved incentives, and ethical and logistical solutions, for sharing of genetic and phenotypic data from
both research and clinical diagnostic laboratories.
3. Public databases of variant and allele frequency data from large sets of population reference samples
from a wide range of ancestries.
4. Large-scale genotyping of reported human disease-causing variants in large, well-phenotyped
population cohorts, reducing biases in the assessment of the associated penetrance and phenotypic
heterogeneity.
5. Development and benchmarking of standardized, quantitative statistical approaches for objectively
assigning probability of causation to new candidate disease genes and variants.
Déjà vu all over again?

Nature Genetics 46, 107–115 (2014)
Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch
repair gene variants in the InSiGHT locus-specific database
Bryony A Thompson, Amanda B Spurdle, John-Paul Plazzer, Marc S Greenblatt, Kiwamu Akagi, Fahd Al-Mulla, Bharati Bapat, Inge
Bernstein, Gabriel Capellá, Johan T den Dunnen, Desiree du Sart, Aurelie Fabre, Michael P Farrell, Susan M Farrington, Ian M
Frayling, Thierry Frebourg, David E Goldgar, Christopher D Heinen, Elke Holinski-Feder, Maija Kohonen-Corish, Kristina Lagerstedt
Robinson, Suet Yi Leung, Alexandra Martins, Pal Moller, Monika Morak, Minna Nystrom, Paivi Peltomaki, Marta Pineda, Ming Qi,
Rajkumar Ramesar, Lene Juel Rasmussen, Brigitte Royer-Pokora, Rodney J Scott, Rolf Sijmons, Sean V Tavtigian, Carli M Tops,
Thomas Weber, Juul Wijnen, Michael O Woods, Finlay Macrae & Maurizio Genuardi, on behalf of InSiGHT.
Nature Genetics 46, 107–115 (2014)
1. Leiden Open Variation Database (LOVD)
2. Micro-attribution using Open Researcher and Contributor ID (ORCID)
3. Variant Interpretation Committee (VIC) applies the 5-tiered classification scheme
developed by the International Agency for Research on Cancer (IARC)
4. Endorsed by the Human Variome Project (HVP)

Not everything in the Nature portfolio is gold
It is good to supplement your pocket money

Early nomenclature papers
• Beaudet
• Tsui
• Antonarakis

Translation into diagnostic practice
• 15 years ago Cotton predicted that the
majority of human genetic variants will be
detected in a diagnostic context
• As NGS moves into a service setting this
transition will become even clearer
• Genetic variants will become part of a
patient’s medical record

HVPA database
• Primarily for and of diagnostics
• Diagnostic services are busy
• And cash and time limited
• We have to make it easy for them
• And secure
• And useful
• Maybe even essential

HVPA Objective
A national data sharing facility for improving
clinical genetic testing services and supporting
medical research
Constitutional, not somatic, mutations
NeCTAR project grant UoM FE31082
“Clinical and Molecular Data Linkage Tools”,
completion date 30th June 2014

Infrastructure and search interface
• Data repository (“the database”)
• Data handling tools that support data upload
from laboratories
• Portal through which the database can be
browsed
• Website for news and notifications

Human Variome Project Australian Node
What We’ve Done
• NeAT Funding (2010-2011)
– Pilot Phase
– 4 labs, 3 diseases
• Breast Cancer
• Colon Cancer
• Huntington’s
– Portal Launched April 2011
– Molecular Data Only
– Collaboration with Mawson
• NeCTAR Funding (2012-2014)
– 12 more labs + all genes they test
for
– Configuration Tool
– Clinical Data/Phenotype Linkage
– Transfer data internationally
What We Built
• Collection Tool
• Portal
• Data Model
• Ethics Processes
• Access & Usage Policy
• Data Sharing Agreements

How it works
• Software to interface with existing LIMS (or lack thereof)
• Collection occurs after report has been issued
• Data types:
– All classified variants reported by a lab
– Benign variants
– NGS/Incidental findings
– Not collecting negative results
• Secure data link between lab and Node
• (Semi)-automatic transfer of data
• Portal to allow interrogation of all Australian data
– http://www.hvpaustralia.org.au
• Linkage key generator
• Submission to BioGrid Platform

Open-Source Solutions
• HVP Portal (v1.0, r512) - A web application which features the basic
interface for browsing and querying an HVP node.
– Open source – MIT License
– Python/Django
• HVP Exporter (v1.0, r512) - Basic HVP exporting tool for
laboratories. Features a simple GUI and error checking interface,
plug-in architecture for customisation between sites and common
libraries for working with MS Access and MS Excel data sources
– .NET C#, Python/IronPython
• HVP Importer (v1.0, r512) - A series of tools and web services that
receive, decrypt and process information sent by submitting laboratories
using the standard transaction XML format
– Python

Access to HVPA
• Controlled Access
– Diagnostic Lab Staff
– Registered Medical Practitioners
– Board Certified Genetic Counsellors
• Online application

HVPA Status at November 2013
Strengths
1. Database available on demand
for diagnostic labs
2. Tools for data sharing
3. Community engagement with
RCPA (QUPP), SA/Mawson,
BioGrid, VCGS
4. National reach with
international connections via
HVPI, WHO & UNESCO
Weaknesses
1. Performance of the existing
HVPA database is limited
2. Laboratory buy-in to the
database across Australia is
limited
3. The database itself has been
hard to access because of low
server bandwidth
4. The project has not anticipated
the likely impact of next
generation sequencing and risks
missing inclusion in genomic-scale
initiatives now underway.

HVPA 24th March 2014
• 5 laboratories submitting
• 295 Unique Variants
• 27410 Instances
• 25 Registered users

Developments proposed in November
ID | Area | Idea | Priority
1 | B. Presentation | Statistics of number of variants for that gene as table or bar graph (# unique, # instances, top 5 qty submitted) | 1
15 | D. Feedback | Raise a concern about an instance's interpretation | 1
2 | A. Search | Search by range | 2
3 | A. Search | Search by genomic position | 2
4 | A. Search | Filter by pathogenicity | 2
5 | B. Presentation | Sort by ... (pathogenicity, other fields) | 2
6 | C. Relevant Info | Display links to related database for gene by referencing genenames.org | 2
7 | A. Search | Wildcard search of variants | 2
9 | A. Search | Search by disease which shows multiple genes and variant results | 2
10 | E. NGS | VCF data imports into HVP Australia | 2
13 | B. Presentation | VarVis - visualisation of gene and variants reported | 2
11 | B. Presentation | VCF data export from HVP Australia of a set of results | 3
12 | B. Presentation | At instance level - see other variants from this test/patient | 3
14 | C. Relevant Info | Capture & display SIFT score | 3
16 | D. Feedback | Notify labs when the general consensus on the pathogenicity of something they submitted has changed or been updated, e.g. they submitted benign and it is now likely pathogenic, or submitted unknown and it is now something else | 3
17 | B. Presentation | Integration with EBI/NCBI tools for queries and displays | 3
19 | B. Presentation | Display last date uploaded for this variant (or last 10 dates) | 3

Accessing the test database
http://115.146.85.61/
Username:
lab_tester
Password:
hvpaustralia2013

Search Interface
• The search interface has to provide useful tools for
clinicians and lab scientists so that the HVPA project offers
them direct benefits and incentivises them to participate.
Following a request for feedback from users, a series of
improvements were implemented, initially on a
demonstration server and then on the live server following
review by the Steering Committee. The highest priorities
were for more information about numbers of times
particular variants were recorded, the ability to search by
range and to filter by pathogenicity. There was also interest
in enabling direct uploading of VCF files and the automated
calculation of pathogenicity scores. Many of these features
are now implemented and examples will be presented.
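The three highest-priority requests (instance counts per variant, search by range, filter by pathogenicity) can be sketched in a few lines of Python. This is an illustration only, not the HVP Portal code: the `Instance` record, its field names and the `search` and `count_by_variant` helpers are hypothetical, and the 1 to 5 pathogenicity scale is the one used in the HVPA variant fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One reported observation of a variant (hypothetical record layout)."""
    gene: str
    chrom: str
    position: int        # genomic coordinate, assumed 1-based
    hgvs_c: str
    pathogenicity: int   # 1=Pathogenic ... 5=Certainly Benign


def search(instances, chrom=None, start=None, end=None, max_class=None):
    """Return instances on `chrom` within [start, end] (inclusive),
    optionally keeping only pathogenicity classes <= max_class."""
    hits = []
    for inst in instances:
        if chrom is not None and inst.chrom != chrom:
            continue
        if start is not None and inst.position < start:
            continue
        if end is not None and inst.position > end:
            continue
        if max_class is not None and inst.pathogenicity > max_class:
            continue
        hits.append(inst)
    return hits


def count_by_variant(instances):
    """Number of recorded instances per (gene, variant name)."""
    counts = {}
    for inst in instances:
        key = (inst.gene, inst.hgvs_c)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

For example, `search(data, chrom="17", start=90, end=160, max_class=2)` would return only the pathogenic and possibly pathogenic instances in that window; a production portal would push the same filters into the database query instead of scanning a list.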

Purpose of the HVPA Database
• Working database
– Record and share diagnostic-quality genetic variation
data
– Integrate with clinical phenotype data
– Integrate with international efforts
– Heads up for NGS gene panel data sets
• Test database
– Showcase enhancements
– Real world testing and feedback
– Uses data edited from actual database
– Not accurate or reliable: some parameters edited for test
purposes

Major improvements to search facility

Searching by expression match
BRCA BR

Direct Import from Results Lists
• Can recover historical data sets
• Reformat on the fly
• Useful as low-overhead catch up to enable labs to
transition to using uploading tools as their IT
permits
– PathWest (John Bielby)
– Institute of Health and Biomedical Innovation,
Queensland (Lyn Griffiths)
– kConFab (Heather Thorne)
– Peter MacCallum Cancer Centre (Ken Doig)

Variant Fields (Mandatory)
Field | Description | Type
GeneName | Official HGNC Symbol | VARCHAR(20)
RefSeqName | Name of reference sequence (NCBI's RefSeq project) | VARCHAR(20)
RefSeqVer | Version of reference sequence (RefSeq) | VARCHAR(20)
cDNA | HGVS variant name (c.) | VARCHAR(255)
mRNA | HGVS variant name (m.) | VARCHAR(255)
Genomic | HGVS variant name (g.) | VARCHAR(255)
Protein | HGVS variant name (p.) | VARCHAR(255)
Location | Exon or intron number | VARCHAR(255)
GeneName, RefSeqName and RefSeqVer are mandatory; at least one of the HGVS variant names is required.
Field | Description | Type
Pathogenicity | Level of pathogenicity (1=Pathogenic, 2=Possibly Pathogenic, 3=Unknown, 4=Possible benign, 5=Certainly Benign) |
PatientID | Internal ID for the patient used within the lab |
TestID | Internal ID for the test used within the lab |
InstanceDate | Date instance was tested | DateTime
GenomicRefSeq | Genomic reference sequence | VARCHAR(255)
GenomicRefSeqVer | Genomic reference sequence version | VARCHAR(255)
A single VARCHAR(20) type is listed across Pathogenicity, PatientID and TestID. All six fields are mandatory.
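A submission check built from the mandatory fields above might look like the following sketch. It is not the HVP Exporter's actual validation: `validate` and its messages are hypothetical, "at least one required" is assumed to refer to the four HGVS variant names, and length limits are applied only where the listed type is unambiguous.

```python
# Field names are taken from the mandatory-field listing; everything else
# (function name, messages, rule interpretation) is an assumption.
MANDATORY = ("GeneName", "RefSeqName", "RefSeqVer", "Pathogenicity",
             "PatientID", "TestID", "InstanceDate",
             "GenomicRefSeq", "GenomicRefSeqVer")
VARIANT_NAMES = ("cDNA", "mRNA", "Genomic", "Protein")
MAX_LENGTH = {"GeneName": 20, "RefSeqName": 20, "RefSeqVer": 20,
              "cDNA": 255, "mRNA": 255, "Genomic": 255, "Protein": 255,
              "Location": 255, "GenomicRefSeq": 255, "GenomicRefSeqVer": 255}


def _text(record, field):
    """Field value as stripped text; missing or None becomes ''."""
    value = record.get(field)
    return "" if value is None else str(value).strip()


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in MANDATORY:
        if not _text(record, field):
            problems.append(f"missing mandatory field: {field}")
    if not any(_text(record, f) for f in VARIANT_NAMES):
        problems.append("at least one of cDNA, mRNA, Genomic, Protein is required")
    for field, limit in MAX_LENGTH.items():
        if len(_text(record, field)) > limit:
            problems.append(f"{field} longer than {limit} characters")
    pathogenicity = _text(record, "Pathogenicity")
    if pathogenicity and pathogenicity not in {"1", "2", "3", "4", "5"}:
        problems.append("Pathogenicity must be 1-5")
    return problems
```

Returning every problem at once, instead of raising on the first, suits a laboratory export tool where a scientist corrects a whole spreadsheet row in one pass.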

Variant Fields (Optional)
Field | Description | Type
PatientAge | Age of patient when test was taken | INT32
TestMethod | The name of the test method used | VARCHAR(20)
SampleTissue | Type of sample taken | VARCHAR(20)
SampleSource | The source of the sample e.g.: DNA, gDNA, RNA... | VARCHAR(20)
Justification | Justification by medical scientist | VARCHAR(65535)
PubMed | PubMed Identifier/Digital Object Identifier | VARCHAR(255)
RecordedInDatabase | Whether it is recorded in a disease-specific or gene-specific database | Boolean
SampleStored | Whether lab still has sample left | Boolean
VariantSegregatesWithDisease | Whether pedigree was considered during diagnosis of pathogenicity | Boolean
HistologyStored | Whether histology is stored | Boolean
PedigreeAvailable | Whether organisation has pedigree data | Boolean
SIFTScore | Calculated SIFT Score | INT32
All fields in this table are optional.

Linkage to other datasets
• HVPA have implemented the hash key
algorithm and work is in progress with BioGrid
to link variation data to clinical data sets.
• More details from Maureen Turner, BioGrid
CEO who is speaking at this meeting
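The hash key algorithm itself is not described in these slides. As a rough illustration of how a linkage key can let two datasets be joined without exchanging names, the sketch below uses a keyed hash (HMAC-SHA-256) over normalised identifying fields. The choice of fields, the normalisation rules and the shared `secret` are all assumptions, not the HVPA or BioGrid method.

```python
import hashlib
import hmac
import unicodedata


def normalise(value: str) -> str:
    """Upper-case, strip accents and drop non-alphanumeric characters."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.upper() if ch.isalnum())


def linkage_key(surname: str, given_name: str, dob_iso: str, sex: str,
                secret: bytes) -> str:
    """Return a hex linkage key for one person (hypothetical field set)."""
    message = "|".join([
        normalise(surname),
        normalise(given_name),
        normalise(dob_iso),
        normalise(sex),
    ])
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()
```

Two sites that share the secret and apply the same normalisation derive the same key for the same person, so records can be matched on the key alone. A keyed hash is used in this sketch because an unkeyed hash of name and date of birth can be reversed by enumeration.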

Cost and performance will force
diagnostic labs to adopt NGS as front-line approach
[Figures: cost per base; Illumina share price; hype cycle]

HVPA LOVD3 database pilot
• Established an HVPA LOVD3 database and
working with the Human Genetics Society of
Australasia on a pilot study to sequence the
exomes of two trios and review the data using
this database.
• Includes exome-scale data
• Open access to Coriell cases with no “consent”
issues
• Explore staging of variant “credibility
classification” and access

Relationship to Gene Panel Databases?
e.g. http://genomics.bio21.unimelb.edu.au/lovd/

Melbourne Genomics Health Alliance

• Clinically led, rather than technology driven
• Fostering ‘end use’ of genomic data
• Common clinical repository
• Prospective : first tier test
• Evaluation to inform implementation
• Engineering collaboration
• Fostering system change
• A/Prof Clara Gaff: Program Leader
PARADIGM FOR IMPLEMENTING GENOMIC MEDICINE
Melbourne Genomics Health Alliance

Connected nationally
and internationally

How many variants per exome?
SNP count | Study
20,000 | Choi et al. PNAS 2009
142,000 | Mullikin NIH, unpublished 2010
50,000 | Clark et al. Nature Biotechnology 2011
125,000 | Smith et al. Genome Biology 2011
100,000 | Johnston & Biesecker Human Molecular Genetics 2013
200,000 to 400,000 | Yang et al. N Engl J Med 2013
• 20-fold range
• Exome designs vary
• Likely to be higher variant count in African populations as the
reference sequence is non-African

Low concordance of multiple
variant-calling pipelines
O’Rawe et al. Genome Medicine 2013
• 15 exomes
• 4 families
• HiSeq 2000
• Agilent SureSelect v.2
• ~120X mean coverage
• SOAP, BWA-GATK, BWA-SNVer,
GNUMAP, and BWA-SAMtools
• SNV concordance between five Illumina
pipelines across all 15 exomes was 57.4%
• 0.5-5.1% variants were called as unique to
each pipeline
• Indel concordance was only 26.8% between
three indel calling pipelines
• 11% of Complete Genomics (CG) variants that fall within targeted
regions in exome sequencing were not called
by any of the Illumina-based exome analysis
pipelines
• 97.1%, 60.2% and 99.1% of the GATK-only,
SOAP-only and shared SNVs can be validated
• 54.0%, 44.6% and 78.1% of the GATK-only,
SOAP-only and shared indels can be validated
• Additional accuracy gained in variant
discovery by having access to genetic data
from a multi-generational family

Low concordance of multiple variant-calling pipelines
O’Rawe et al. Genome Medicine 2013, 5:28
SNV concordance: 57.4% Indel concordance 26.8%
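A concordance figure of this kind can be computed from the call sets of each pipeline. The sketch below assumes concordance means "variants called by every pipeline, divided by variants called by at least one", with each variant keyed by (chromosome, position, ref, alt); the paper's exact definition and matching rules may differ, and the function names are hypothetical.

```python
def concordance(call_sets):
    """Fraction of variants called by every pipeline, out of all variants
    called by at least one pipeline. `call_sets` maps pipeline name to an
    iterable of variant keys."""
    sets = [set(calls) for calls in call_sets.values()]
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def unique_fraction(call_sets, name):
    """Fraction of one pipeline's calls that no other pipeline made."""
    own = set(call_sets[name])
    others = set().union(*(set(c) for n, c in call_sets.items() if n != name))
    return len(own - others) / len(own) if own else 0.0
```

Indels usually need normalisation (left-alignment and trimming) before their keys are compared, otherwise the same event written two ways is counted as discordant.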

Venn diagrams of selected CNV detection
methods in real data processing
Duan J, Zhang J-G, Deng H-W, Wang Y-P (2013) Comparative Studies of Copy Number Variation Detection Methods for Next-Generation Sequencing
Technologies. PLoS ONE 8(3): e59128. doi:10.1371/journal.pone.0059128
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0059128

Remove errors before processing
K-mer selection
Merging forward and reverse reads
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTTTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGTATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGTA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATTAAGAAATCGATAGCATTTGCA
TAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
CAGAAAAAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATCTGCA
CAGAAAAAGTAGGAAATGGAAGTCTATGTGATCAAGAAATTGATAGCATTTGCA
CAGAAAGAGTAGAAAATGGAAGTCTATGTGATCAAGAAATCGATAGCATTTGCA
Discard rare reads
Use a HiFi polymerase
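The idea of discarding rare reads can be illustrated with simple counting: sequences (or k-mers) seen only once or twice among many copies of a dominant sequence are more likely to be errors than real variants. This is a minimal sketch under that assumption; the threshold and function names are hypothetical and the slides do not describe the actual implementation.

```python
from collections import Counter


def discard_rare_reads(reads, min_count=2):
    """Keep only read sequences observed at least `min_count` times.
    Returns a dict mapping each retained sequence to its count."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_count}


def kmer_counts(reads, k):
    """Count every k-mer across all reads (reads shorter than k add nothing)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts
```

Applied to the ten reads listed above, each differing from the others at a single base, counting exact sequences after merging forward and reverse reads would separate abundant sequences from singletons.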

Results – detection of variants
• Known SNV concordance 100%, all assays
• Known indel <6bp concordance 100%, all assays
• Not able to detect C9orf72 hexanucleotide expansion or PRNP
octapeptide region repeat with standard pipeline
• Diagnostic yield within appropriate clinical context (based on
very limited sample size)
– NimbleGen SeqCap EZ Neuro: 33% (2/6)
– Nextera Neuro: 23% (6/26)

Filtering Variants
All variants | None | Qual | Not in Blood
Blood | 9828 | 8551 | NA
Frozen | 9920 | 8736 | 126
FFPE | 9709 | 8163 | 199
Variants in Gene List | None | Qual | Not in Blood
Blood | 27 | 18 | NA
Frozen | 27 | 23 | 2 (EGFR)
FFPE | 25 | 19 | 3 (EGFR, ROS)
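The "Qual" and "Not in Blood" columns correspond to two successive filters: a quality threshold, then subtraction of variants also seen in the matched blood sample. A minimal sketch follows, assuming each call is a dict with a `key` of (chromosome, position, ref, alt) and a numeric `qual`. The threshold and the decision to subtract all blood calls regardless of quality are assumptions, not the pipeline used for the table.

```python
def somatic_candidates(tumour_calls, blood_calls, min_qual=None):
    """Variants that pass the quality filter in the tumour sample and are
    absent from the matched blood sample.

    Each call is a dict with keys 'key' (chrom, pos, ref, alt) and 'qual'.
    """
    passing = {
        call["key"] for call in tumour_calls
        if min_qual is None or call["qual"] >= min_qual
    }
    # Subtract every blood call, even low-quality ones, so that a weakly
    # supported germline variant is not reported as tumour-specific.
    in_blood = {call["key"] for call in blood_calls}
    return passing - in_blood
```

Subtracting unfiltered blood calls is the conservative choice: it may hide a few true somatic variants but reduces false tumour-only calls caused by uneven coverage in the blood sample.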

Confirmation by PCR
[Figure: two bar charts, one bar per assay, titled "EGFR normalised" and "KRAS normalised"]
EGFR normalised (EGFR NM_005228.3) assays: T790 WT; c.2350T>C, p.S784P; c.2351C>T, p.S784F; c.2354C>T, p.T785I; c.2356G>A, p.V786M; c.2368A>G, p.T790A; c.2369C>T, p.T790M; 828 & 861 wt; c.2572C>A, p.L858M; c.2573_2574delinsGT; c.2573T>A, p.L858Q; c.2573T>G, p.L858R; c.2579A>T, p.K860I; c.2582T>A, p.L861Q; c.2582T>G, p.L861R
KRAS normalised (KRAS NM_033360.2) assays: codon 12: c.34G>A, p.G12S; c.34G>C, p.G12R; c.34G>T, p.G12C; c.35G>A, p.G12D; c.35G>C, p.G12A; c.35G>T, p.G12V; codon 13: c.37G>A, p.G13S; c.37G>C, p.G13R; c.37G>T, p.G13C; c.38G>A, p.G13D; c.38G>C, p.G13A; c.38G>T, p.G13V

Auto Upload Database of Results in LOVD
Local LOVD instances sharable via HVPA

Results – detection of variants
• Coriell pedigree comparison
• Subset of 19 genes – targeted by all four assays
• Variant allele frequency cut-off of 35% (interested in germline
variants)
Family Y077 (each cell: Mother / Father / Child)
Assay | Total number of variants detected | Non-synonymous variants detected | # variants with GAF <5% | # variants with African AF 5%
NimbleGen SeqCap EZ Neuro | 194 / 241 / 196 | 16 / 22 / 20 | 4 / 5 / 7 | 2 / 3 / 4
Nextera Neuro | 250 / 296 / 283 | 17 / 23 / 22 | 4 / 6 / 7 | 2 / 3 / 4
TruSight One | 121 / 137 / 119 | 16 / 23 / 20 | 3 / 6 / 6 | 1 / 3 / 3
Nextera Exome | 101 / 118 / 114 | 16 / 22 / 22 | 4 / 5 / 7 | 2 / 2 / 4
Family Y117 (each cell: Mother / Father / Child)
Assay | Total number of variants detected | Non-synonymous variants detected | # variants with GAF <5% | # variants with African AF 5%
NimbleGen SeqCap EZ Neuro | 279 / 245 / 263 | 20 / 20 / 20 | 4 / 5 / 6 | 3 / 2 / 4
Nextera Neuro | 382 / 371 / 342 | 20 / 21 / 21 | 5 / 5 / 6 | 3 / 2 / 4
TruSight One | 148 / 154 / 148 | 18 / 18 / 17 | 4 / 4 / 5 | 3 / 2 / 3
Nextera Exome | 121 / 67 / 66 | 19 / 15 / 16 | 5 / 3 / 4 | 3 / 1 / 3
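The 35% variant allele frequency cut-off can be expressed as a simple filter on read counts. The sketch below assumes each variant carries `ref_reads` and `alt_reads` counts and that the cut-off is inclusive; both the record layout and the function name are hypothetical.

```python
def germline_like(variants, min_vaf=0.35):
    """Keep variants whose variant allele frequency is at least `min_vaf`.

    VAF is taken as alt_reads / (ref_reads + alt_reads); positions with no
    reads are dropped.
    """
    kept = []
    for variant in variants:
        depth = variant["ref_reads"] + variant["alt_reads"]
        if depth == 0:
            continue
        if variant["alt_reads"] / depth >= min_vaf:
            kept.append(variant)
    return kept
```

A heterozygous germline variant is expected near 50% and a homozygous one near 100%, so a 35% floor removes most low-fraction artefacts while tolerating some allelic imbalance from capture and sequencing.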

Example case showing concordance
Gene Variant Chr Coordinate zyg Gene Variant Chr Coordinate zyg KEY
APOE T>T/C 19 45411941 het NPC1 T>T/C 18 21120444 het exome
APOE T>T/C 19 45411941 het NPC1 T>T/C 18 21120444 het nimble neuro
APOE T>T/C 19 45411941 het NPC1 T>T/C 18 21120444 het next neuro
APOE T>T/C 19 45411941 het NPC1 T>T/C 18 21120444 het trusight 1
APOE C>C/T 19 45412040 het NPC1 TA>TA/T 18 21123536 het
APOE C>C/T 19 45412040 het NPC1 TAA>TAA/T 18 21123536 het
ATP7B G>G/A 13 52511606 het NPC1 C>G/G 18 21124945 hom
ATP7B A>A/G 13 52515354 het PARK2 G>G/C 6 162622239 het
ATP7B C>C/T 13 52523808 het PINK1 A>A/G 1 20964328 het
ATP7B T>T/C 13 52524488 het PINK1 G>G/A 1 20972048 het
LRRK2 G>A/A 12 40619082 hom PINK1 G>G/A 1 20975727 het
LRRK2 G>A/A 12 40619082 hom PINK1 G>G/A 1 20975727 het
LRRK2 G>A/A 12 40619082 hom PINK1 A>A/C 1 20977000 het
LRRK2 G>A/A 12 40619082 hom PINK1 A>A/C 1 20977000 het
LRRK2 C>C/G 12 40657700 het PINK1 A>A/C 1 20977000 het
LRRK2 C>C/G 12 40657700 het PINK1 A>A/C 1 20977000 het
LRRK2 C>C/G 12 40657700 het PSEN2 G>G/A 1 227071449 het
LRRK2 T>T/A 12 40713901 het PSEN2 G>G/A 1 227071449 het
LRRK2 T>T/C 12 40758652 het VCP C>T/T 9 35062972 hom
NPC1 G>G/A 18 21119777 het VCP A>A/G 9 35068364 het

Describing Coverage
Metric | MiSeq (12plex) | HiSeq (48plex)
% target region with non-zero depth | 99.54% | 99.90%
% target regions >= 5x | 98.25% | 99.71%
% target regions >= 15x | 94.80% | 99.34%
% target regions >= 30x | 89.10% | 98.85%
% target regions >= 50x | 81.56% | 98.17%
Average depth of coverage | 180.76 | 920.84
5th centile | 19.08 | 126.75
20th centile | 67.42 | 408.83
50th centile | 160.42 | 871.17
95th centile | 414.00 | 1879.92
Mapping quality >= 15
Base quality score >= 15
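Coverage descriptors like those above can be derived from a list of per-base depths over the target region. The following sketch assumes the depths have already been computed with the mapping-quality and base-quality filters applied, and that the percentages are per base; the function and key names are hypothetical.

```python
import statistics


def coverage_summary(depths, thresholds=(5, 15, 30, 50)):
    """Summarise per-base depths over the target region.

    Needs at least two positions, because statistics.quantiles does.
    """
    n = len(depths)
    if n < 2:
        raise ValueError("need at least two target positions")
    summary = {
        "pct_nonzero": 100.0 * sum(d > 0 for d in depths) / n,
        "mean_depth": statistics.fmean(depths),
    }
    for t in thresholds:
        summary[f"pct_ge_{t}x"] = 100.0 * sum(d >= t for d in depths) / n
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(depths, n=100, method="inclusive")
    for p in (5, 20, 50, 95):
        summary[f"p{p}"] = cuts[p - 1]
    return summary
```

Reporting low centiles alongside the mean matters for diagnostics: a high average depth can hide targets that fall below the depth needed for a confident call.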

Coverage reproducibility
Coverage Coefficient of variation

Higher coverage greater reproducibility
Coverage Coefficient of variation
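The coefficient of variation plotted against coverage is the standard deviation of a target's coverage across samples divided by its mean. A minimal sketch, assuming coverage has already been normalised per sample (for example to each sample's total reads) and using the sample standard deviation; the function names are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (needs >= 2 values)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean coverage is zero")
    return statistics.stdev(values) / mean


def cv_per_target(coverage_by_target):
    """Map each target to the CV of its coverage across samples."""
    return {target: coefficient_of_variation(cov)
            for target, cov in coverage_by_target.items()}
```

A low CV across samples is what makes read depth usable as a dosage signal, since a copy-number change then stands out from ordinary sample-to-sample noise.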

Can capture coverage report dosage to diagnostic standards?
[Figure: plots of samples against targets, with panels for autosomal targets and chrX targets]
Inter-sample variation is low, but low coverage prevents dosage estimation
Chr X is a good first pass test for dosage

XX vs. XY
8 Female cases and 16 Male cases showing reproducibility of coverage of X loci
within each group. Loci with higher SDs were associated with reduced coverage.
[Figure: two plots across X loci with series Average XX and Average XY, annotated 870 and 160]

Sharing Experience with TruSight One
• In partnership with Illumina, RCPA and the HGSA,
Kim Flintoff (Wellington Regional Genetics
Laboratory) is leading an evaluation of exon
sequencing using Illumina’s TruSight One
panel. Two Coriell family trios will be sequenced
by New Zealand Genomics Limited and the data
will be shared on an HVPA database
• The VCF file will be available on the HVPA LOVD
database and performance stats will also be
made available.

Next Steps
• Robust standards for genomic medicine
• Databases and data content
– Access to identified and de-identified data (consent
and confidentiality)
– Database accreditation process in prep with RCPA
– Defining the performance of various aligners, variant
callers and annotation programs
– Clinical grade Variant Call Format (VCF)
– Metafile covering data trail: what was tested, what
was not tested

Standards for Accreditation of DNA
Sequence Variation Databases
Quality Use of Pathology Program (QUPP): a national project for the Development of Standards for
Accreditation of DNA Sequence Variation Databases has been jointly initiated by the Royal College of
Pathologists of Australasia (RCPA) and the Human Variome Project (HVP).
Background
• There is a rapidly increasing volume, spectrum, and complexity of genetic tests emerging within
diagnostic pathology laboratories. In particular, high throughput sequencing methods such as
targeted panel, exome (WES), and whole genome sequencing (WGS), are producing an increasing
quantity of genetic data requiring analysis and interpretation, forming a substantial proportion of
the workload.
• Currently, there is a plethora of online mutation databases to refer to; however, there is a distinct
lack of databases that meet the stringent standards of accuracy and reproducibility that the clinical
diagnostic environment demands. Additionally, the current databases are “fractured”, with varied
access to and sharing of the data within them, and variable quality due to errors and inaccurate data posting,
all of which pose a clear risk to the quality of patient care. With more widespread, secure sharing of
variants and associated phenotypes, the value of cumulative variant information will accelerate the
delivery of accurate, actionable, and efficient clinical reports.
• There are currently no standards or equivalent mechanisms for accreditation of databases to
ensure the accuracy and quality of uploaded data into any central repository to meet the needs of
the clinical diagnostics environment.

Data quality classes
Differentiate between three classes of data:
The Clinically Reported data label would denote the class of data that the HVP
Australian Node was originally designed to collect and share: data that has been
generated in a NATA accredited Australian diagnostic laboratory and is able to be
included in a clinical report.
Unreported Clinical quality data would denote data that has been generated in a
NATA accredited diagnostic laboratory, but is not capable of being included in a
clinical report. This class would consist primarily of next-generation
sequencing (NGS) type data.
Unaccredited data would be used to denote data that has been generated by an
Australian laboratory that has not been NATA accredited.
A new filtering option would be made available to allow users to view only data
of a certain class
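The three classes and the proposed filtering option could be modelled as below. This is only a sketch of the rule as stated: the `classify` inputs (whether the laboratory is NATA accredited and whether the data can be included in a clinical report), the `data_class` record key and the function names are assumptions.

```python
from enum import Enum


class DataClass(Enum):
    CLINICALLY_REPORTED = "Clinically Reported"
    UNREPORTED_CLINICAL_QUALITY = "Unreported Clinical quality"
    UNACCREDITED = "Unaccredited"


def classify(nata_accredited: bool, reportable: bool) -> DataClass:
    """Assign a data quality class from lab accreditation and reportability."""
    if not nata_accredited:
        return DataClass.UNACCREDITED
    if reportable:
        return DataClass.CLINICALLY_REPORTED
    return DataClass.UNREPORTED_CLINICAL_QUALITY


def filter_by_class(records, wanted):
    """Keep records whose 'data_class' is one of the `wanted` classes."""
    wanted = set(wanted)
    return [record for record in records if record["data_class"] in wanted]
```

Storing the class on every record at submission time keeps the filter a simple lookup and leaves an audit trail if a laboratory's accreditation status later changes.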

Beyond the NeCTAR funding
• Academic or charitable funding required
• Integrate NGS data resource into the HVPA
portfolio
• Move database development into a medical
academic centre of excellence
• Seek active partnerships with current and
future collaborators with investment and risk
sharing
