A Retrospective Analysis of Exome Sequencing Cases Using the GenePool™ Genomics Platform
Abstract
Materials and Methods
Conclusions
References
For further information
Antoaneta Vladimirova², Tod Klingler², Richard Goold², Erik G. Puffenberger¹
¹The Clinic for Special Children, Strasburg, PA; ²Station X, 185 Berry Street, Suite 2001, San Francisco, CA
Figure 4. Summary table of all affected probands and their phenotypes, along
with the results of the GenePool genomic analysis and the putative candidate
genes identified
The candidate genes are split into categories (homozygous recessive, de novo,
autosomal dominant, and compound heterozygous) based on the potential mode of
inheritance in each case. The gene symbols in bold represent the most likely candidates
based on a combination of genomic analysis, family medical history assessment, and
diseases and HPOs known to be associated with the candidate genes in CGD or the
literature. The underlined gene symbols represent full concordance between the CSC
results and GenePool gene candidates.
Figure 1. Analysis and results of Family #14
Proband and family members were analyzed in GenePool. Figure 1A represents the clinical data
gathered for family #14. Other than the proband and an affected brother, who present with
skeletal dysplasia, scoliosis, ASD, cleft palate, etc., the family members are unaffected.
Figure 1B represents the Analysis Designer setup in GenePool, where the project, analysis type,
and sample groups are selected along with the desired parameters for analysis. Figure 1C
represents the two top candidate variants, associated with two CGD diseases. Figure 1D
shows a variation distribution analysis of the two top candidates across all family members.
Both variants, in the SH3TC2 and SLC26A2 genes, are homozygous in the affected individuals,
while the unaffected family members carry them in heterozygous form.
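The segregation pattern described for Figure 1D can be expressed as a simple check. The following is a minimal sketch, not GenePool's implementation: the function name, the sample IDs, and the genotype encoding ("hom", "het", "ref") are all assumptions made for illustration.

```python
def fits_recessive(genotypes, affected):
    """Check a variant against a simple autosomal recessive pattern.

    genotypes: dict mapping sample ID -> "hom", "het", or "ref" (hypothetical encoding)
    affected:  set of sample IDs that show the phenotype

    Every affected member must be homozygous for the variant, and no
    unaffected member may be homozygous (carriers and non-carriers are fine).
    """
    for sample, genotype in genotypes.items():
        if sample in affected and genotype != "hom":
            return False
        if sample not in affected and genotype == "hom":
            return False
    return True


# Hypothetical family laid out like the one in the caption.
example_genotypes = {
    "proband": "hom",
    "brother": "hom",
    "mother": "het",
    "father": "het",
    "sister": "het",
}
print(fits_recessive(example_genotypes, affected={"proband", "brother"}))
```

A variant that is homozygous in an unaffected relative, or only heterozygous in an affected one, fails the check and would drop out of the recessive candidate list.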
Subjects of Amish or Mennonite descent from multiple families were chosen for
next-generation DNA testing if they presented to CSC with clinical signs of an
underlying genetic lesion and remained without a diagnosis following standard
biochemical and genetic investigations (e.g. metabolic testing, targeted gene
sequencing, cytogenetic or lower-density molecular karyotyping, etc.). The study was
approved by the Lancaster General Hospital institutional review board. All probands
(or their parents) had pre-test counseling to explain the goals, process, timing, and
limitations of microarray and exome testing. All subjects consented in writing to
participate on behalf of themselves or their children. Prior to molecular testing, every
proband underwent detailed phenotyping by one of three CSC clinicians. The process
included a pre- and perinatal history, a record of illnesses and hospitalizations, and
an annotated medical problem list with HPO¹ classifications. Probands and their family
members were exome-sequenced through the Regeneron Genetics Center
(RGC). Briefly, 1 µg of high-quality genomic DNA was exome-captured using the
NimbleGen VCRome SeqCap 2.1 reagent; captured libraries were sequenced on the
Illumina HiSeq 2500 platform using v4 chemistry. Exome sequencing was performed
such that >85% of the bases were covered at 20x or greater. Raw sequence reads
were mapped and aligned to the GRCh37/hg19 human genome reference assembly
using standard bioinformatics algorithms (BWA/GATK). Called variants were
filtered based on standard quality metrics: minimum read depth (>10), genotype
quality (>30), and allelic balance (>20%). The generated VCF files were uploaded to
the GenePool² cloud-based genomics platform for subsequent analysis via DNAnexus³
integration. Clinical data such as age, gender, family relation, and associated HPO terms
were uploaded to GenePool along with the molecular data and used in integrative
analyses. Only variants passing standard quality criteria were further analyzed in
GenePool (coverage>=10, quality>=30, variant frequency>=10%). All variants
generated through the standard trio analysis, variant profile, variation
comparison, or variant distribution workflows were analyzed. Variants were
filtered to exclude previously determined CSC “common” variants. Subsequently,
SnpEff⁴ annotations were used so that variants of high and moderate impact were
prioritized. Additionally, allele frequencies from the 1000 Genomes Project⁵, the Exome
Sequencing Project⁶ (ESP), and the Exome Aggregation Consortium⁷ (ExAC),
specifically those for populations of European descent (AF>1% for homozygous recessive
and AF>=0 for de novo variants), were applied to prioritize variants further. The Clinical
Genomics Database⁸ (CGD) and ClinVar⁹ disease annotations were also used to
identify the most likely candidates related to the proband phenotypes and HPOs. Allele
frequencies of all unaffected or likely unaffected individuals were also calculated and
used for variant prioritization. Disease Ontology¹⁰ and genes associated with each
disease term curated in GenePool were also applied to identify relevant variant
candidates. In addition to trio analysis, variant profile and variant distribution
workflows, in cases where families were very large and we could identify groups of
affected and unaffected individuals, we also applied variation comparison analysis.
Variants were also analyzed with the “gene pivot” functionality in GenePool to
rapidly identify compound heterozygous scenarios.
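The quality criteria, impact prioritization, and per-gene grouping described above can be sketched in a few lines of Python. This is an illustration only and does not reflect how GenePool is implemented: the `Variant` record, its field names, the genotype encoding, and the gene and variant identifiers are hypothetical. The thresholds (coverage>=10, quality>=30, variant frequency>=10%) and the high/moderate impact classes come from the text. The compound heterozygous rule used here (one variant inherited from each parent within the same gene) is a common simplification and is stated as an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass

# Thresholds quoted in the text for variants analyzed in GenePool.
MIN_COVERAGE = 10
MIN_QUALITY = 30
MIN_VARIANT_FREQUENCY = 0.10
PRIORITIZED_IMPACTS = {"HIGH", "MODERATE"}  # SnpEff impact classes


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record (not a GenePool type)."""
    gene: str
    variant_id: str
    coverage: int
    quality: float
    variant_frequency: float  # fraction of reads supporting the alternate allele
    impact: str               # SnpEff impact: HIGH, MODERATE, LOW, or MODIFIER
    proband_gt: str           # "hom", "het", or "ref"
    mother_gt: str
    father_gt: str


def passes_quality(v):
    """Apply the three quality criteria from the text."""
    return (v.coverage >= MIN_COVERAGE
            and v.quality >= MIN_QUALITY
            and v.variant_frequency >= MIN_VARIANT_FREQUENCY)


def prioritized(variants):
    """Keep variants that pass quality and have high or moderate impact."""
    return [v for v in variants
            if passes_quality(v) and v.impact in PRIORITIZED_IMPACTS]


def compound_het_genes(variants):
    """Group the proband's heterozygous variants by gene and report genes
    with at least one maternally and one paternally inherited variant."""
    by_gene = defaultdict(list)
    for v in prioritized(variants):
        if v.proband_gt == "het":
            by_gene[v.gene].append(v)
    candidates = {}
    for gene, gene_variants in by_gene.items():
        maternal = [v for v in gene_variants
                    if v.mother_gt == "het" and v.father_gt == "ref"]
        paternal = [v for v in gene_variants
                    if v.father_gt == "het" and v.mother_gt == "ref"]
        if maternal and paternal:
            candidates[gene] = sorted(v.variant_id for v in maternal + paternal)
    return candidates


# Hypothetical example data; gene and variant names are placeholders.
example_variants = [
    Variant("GENE_A", "a1", 40, 60.0, 0.48, "HIGH", "het", "het", "ref"),
    Variant("GENE_A", "a2", 35, 55.0, 0.51, "MODERATE", "het", "ref", "het"),
    Variant("GENE_B", "b1", 8, 50.0, 0.50, "HIGH", "het", "het", "ref"),  # coverage too low
    Variant("GENE_B", "b2", 30, 50.0, 0.50, "MODERATE", "het", "ref", "het"),
]
print(compound_het_genes(example_variants))
```

In this example only GENE_A is reported: GENE_B loses its maternal variant at the coverage filter, so no pair inherited from both parents remains.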
Figure 2. Analysis and results of Family #75
Proband and family members were analyzed in GenePool. Figure 2A represents the clinical data
gathered for family #75. Other than the proband, who presents with anxiety, aggression, OCD,
autism, intellectual disability, epilepsy, etc., the family members are unaffected. Figure
2B represents the prioritized variant results in GenePool after removing the “common” variants
and filtering for de novo variants, allele frequencies, and the Disease Ontology “epilepsy syndrome” term,
resulting in two top candidates. Interactive pie chart widgets make it possible to filter results
quickly, dynamically, and visually. Figure 2C shows the CDH2 variant, present in heterozygous
form in the affected proband but not in any of the unaffected family members.
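The de novo pattern in the caption (heterozygous in the proband, absent from every unaffected relative) reduces to a short check. This is a hedged sketch under assumed names: the function, the sample IDs, and the "het"/"ref" genotype encoding are illustrative and are not taken from GenePool.

```python
def is_de_novo_candidate(genotypes, proband):
    """Return True when the proband is heterozygous for the variant and
    every other sequenced family member is homozygous reference.

    genotypes: dict mapping sample ID -> "hom", "het", or "ref" (hypothetical encoding)
    proband:   sample ID of the affected individual
    """
    if genotypes[proband] != "het":
        return False
    return all(genotype == "ref"
               for sample, genotype in genotypes.items()
               if sample != proband)


# Hypothetical family; sample IDs are placeholders.
example_genotypes = {
    "proband": "het",
    "mother": "ref",
    "father": "ref",
    "sibling": "ref",
}
print(is_de_novo_candidate(example_genotypes, "proband"))
```

A variant that also appears in a parent is inherited rather than de novo, so it fails this check even when the proband is heterozygous.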
• The GenePool cloud-based genomics software platform was successfully applied in a
retrospective CSC exome analysis project to store, manage, and analyze genomic data
from over two dozen probands and their families to address a variety of undiagnosed
medical conditions and their genetic underpinnings.
• For each proband, one or more causative variants and candidate genes were identified
that are consistent with the mode of inheritance of the condition within the family and
with the clinical data collected for both the proband and the family members.
• CSC and GenePool genomic results demonstrate a high level of concordance.
• A variety of integrated workflows in GenePool, such as trio analysis, variation profile,
variation cohort comparison, and variation distribution, provided a quick, intuitive, and
efficient process to analyze each family and identify a short list of candidate variants
and genes spanning homozygous, de novo, autosomal dominant, and compound
heterozygous modes of inheritance.
• The ability to integrate clinical information along with the molecular data in
GenePool was critical for segmenting the family members based on their phenotypes
and conditions, and for efficiency of the analysis
• Multiple annotations for variants, genes and diseases in GenePool were instrumental
in streamlining the process of variant prioritization and interpretation
• The GenePool platform served as an efficient tool in the analysis and identification of
putative causative variants to facilitate diagnosis and optimize patient management.
• More information on Station X and the GenePool platform can be obtained at
http://www.stationxinc.com.
• For more information on this poster please contact antoaneta@stationxinc.com
• Follow us on Twitter @StationXInc
1. Human Phenotype Ontology: http://human-phenotype-ontology.github.io
2. GenePool by Station X, Inc.: http://www.stationxinc.com/
3. DNAnexus: https://www.dnanexus.com
4. SnpEff: http://snpeff.sourceforge.net
5. 1000 Genomes Project: http://www.internationalgenome.org
6. Exome Sequencing Project: https://esp.gs.washington.edu/drupal/
7. Exome Aggregation Consortium: http://exac.broadinstitute.org
8. Clinical Genomics Database: https://research.nhgri.nih.gov/CGD/
9. ClinVar Database: http://www.ncbi.nlm.nih.gov/clinvar/
10. Human Disease Ontology: http://www.obofoundry.org/ontology/doid.html
The Clinic for Special Children (CSC) is a rural pediatric
non-profit medical practice serving uninsured Amish and
Mennonite (Plain) children with genetic disorders. The
clinic strives to identify genetic causes of childhood
disability and disease and uses modern genetic
technologies to diagnose and treat patients. Whole exome
sequencing (WES) and data analysis in conjunction with
deep phenotyping has enabled the scientific community to
achieve great success in identifying the molecular bases
of disease. The CSC has also used these technologies
successfully over the past several years. The CSC
employs a diagnostic pipeline for new patients that
involves detailed phenotyping, targeted mutation
detection, chromosomal microarray analysis, and exome
sequencing in order to generate a molecular diagnosis for
the patient. Due to a deep knowledge of segregating
mutations in the Plain populations, nearly 50% of all new
patients receive a diagnosis through targeted mutation
detection while roughly 3% have diagnostic copy number
changes. Of the remaining patients, our diagnostic yield
for clinical exomes is approximately 49%.
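As a rough worked example, the three figures above can be combined into an approximate overall diagnostic rate. This sketch assumes that every patient left undiagnosed after targeted mutation detection and microarray analysis goes on to exome sequencing, which the text does not state; the variable names are illustrative, and the inputs are the rounded values quoted above.

```python
# Approximate rates quoted in the abstract.
targeted_rate = 0.50        # "nearly 50%" diagnosed by targeted mutation detection
copy_number_rate = 0.03     # "roughly 3%" with diagnostic copy number changes
exome_yield = 0.49          # "approximately 49%" of the remaining patients

# Assumption: all remaining patients proceed to exome sequencing.
remaining = 1.0 - targeted_rate - copy_number_rate
exome_rate = remaining * exome_yield
overall_rate = targeted_rate + copy_number_rate + exome_rate

print(f"remaining after first-line tests: {remaining:.2f}")
print(f"diagnosed by exome (of all new patients): {exome_rate:.2f}")
print(f"approximate overall diagnostic rate: {overall_rate:.2f}")
```

Under that assumption, exomes account for about 23% of all new patients (0.47 × 0.49), and the combined rate comes to roughly 76%. Because the inputs are approximate, the result is an estimate and not a reported figure.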
We present a validation study of solved WES cases from
the CSC where we demonstrate the ability to efficiently
identify putative causative variants in GenePool, a cloud-
based genomics platform for analysis of genomics data.
We utilized the built-in analytical workflows for trio
analysis and the pipelines designed for population-size
cohort analyses. The latter analyses compared groups of
affected and unaffected individuals. We used GenePool’s
interactive visualization filters with the comprehensive
library of annotations to quickly prioritize the list of
potential causative variants to a small highly-relevant set
and validated our results. GenePool allowed us to
efficiently screen for pathogenic variants associated with
autosomal recessive and de novo dominant phenotypes,
as well as with more complex genetic diseases. Rapid
diagnosis is crucial to optimal patient outcomes, and
GenePool solves a critical part of this process by enabling
the analysis and identification of a small set of putative
pathogenic variants in a short time frame. In this study, we
found high concordance between GenePool variant
prioritization and the prior ad hoc manual prioritization.
The study we present was conducted in specific regional
founder populations, but it provides important lessons for
WES studies in non-founder populations.
Results
Figure 3. Analysis and results of Family #76
Proband and family members were analyzed in GenePool. Figure 3A represents the clinical
data gathered for family #76. The proband presents with decreased fetal
movement, neonatal hypotonia, global delays, ADD, relative macrocephaly, triangular facies,
narrow forehead, flat profile, and small mouth, and the proband's brothers show similar
symptoms. In contrast, the sister of
the proband is not affected, suggesting an X-linked condition. Figure 3B represents prioritized
variant results in GenePool after removing the “common” variants and filtering for allele
frequencies and associated CGD diseases. Figure 3C shows a variant distribution
analysis of the whole family, in which the HUWE1 variant is in a hemizygous form in the
affected proband and brothers, and in a heterozygous form in the unaffected sister and mother.
The unaffected father does not have the variant in the HUWE1 gene.
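The X-linked pattern described in this caption (hemizygous affected males, heterozygous unaffected females, and a father without the variant) can be checked with a small function. This is a simplified sketch and not GenePool's logic: the function name, the sample IDs, the number of brothers, and the genotype encoding ("hemi", "het", "ref") are assumptions for illustration.

```python
def fits_x_linked_recessive(family):
    """Check a variant against a simple X-linked recessive pattern.

    family: dict mapping sample ID -> (sex, affected, genotype), where sex is
            "M" or "F", affected is a bool, and genotype is "hemi", "het",
            or "ref" (hypothetical encoding).
    """
    for sex, affected, genotype in family.values():
        if sex == "M":
            # Males carry one X: affected males have the variant, unaffected do not.
            expected = {"hemi"} if affected else {"ref"}
        else:
            # Simplification: unaffected females may be carriers or non-carriers;
            # an affected female is treated as inconsistent with this model.
            expected = set() if affected else {"het", "ref"}
        if genotype not in expected:
            return False
    return True


# Hypothetical family laid out like the one in the caption.
example_family = {
    "proband": ("M", True, "hemi"),
    "brother_1": ("M", True, "hemi"),
    "brother_2": ("M", True, "hemi"),
    "sister": ("F", False, "het"),
    "mother": ("F", False, "het"),
    "father": ("M", False, "ref"),
}
print(fits_x_linked_recessive(example_family))
```

A variant present in the unaffected father would fail the check, which is why his genotype is informative for this mode of inheritance.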