VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Barry Moore, M.S.
Research Scientist
Department of Human Genetics
Department of Biomedical Informatics

Outline

• The VAAST Analysis Pipeline
• Ogden Syndrome: Application of VAAST to a Genetic Disease of Unknown Cause
• The Future of VAAST Development

Next Generation Sequencing

• Venter genome: $10,000,000
• Watson genome: $1,000,000
• You?: $5,000

[Figure: disease vs. healthy genomes compared across geneA, geneB, geneX, geneY and geneZ]

• VAT: Variant Annotation Tool
• VST: Variant Selection Tool
• VAAST: Variant Annotation, Analysis and Search Tool

VAAST Pipeline

[Pipeline diagram: 3.5 million variants in GVF format, together with the reference genome (FASTA) and the reference genes (GFF3), are passed to VAT (Variant Annotation Tool), which writes annotated variants as GVF. VST (Variant Selection Tool) then combines the annotated variant sets into a CDR file of merged variant sets.]

VAAST Pipeline: Variant Type and Variant Effect

[Pipeline diagram repeated, with the variant annotation vocabulary overlaid.]

Variant Type
• sequence_alteration
• deletion
• insertion
• duplication
• inversion
• substitution
• SNV
• MNP
• complex substitution
• translocation

Variant Effect
• sequence_variant
• gene_variant
• five_prime_UTR_variant
• three_prime_UTR_variant
• exon_variant
• splice_region_variant
• splice_donor_variant
• splice_acceptor_variant
• intron_variant
• coding_sequence_variant
• stop_retained
• stop_lost
• stop_gained
• synonymous_codon
• non_synonymous_codon
• amino_acid_substitution
• frameshift_variant
• inframe_variant
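
The annotated variants in this pipeline are exchanged as GVF, a tab-delimited, nine-column format built on GFF3 whose type column and attributes carry ontology terms like those listed above. A minimal Python sketch of reading one record follows. The sample line, the `parse_gvf_line` helper and the layout of the dictionary it returns are illustrative assumptions, not part of the VAAST tools.

```python
def parse_gvf_line(line):
    """Split one GVF feature line into its nine columns and its attributes.

    Hypothetical helper for illustration; real GVF files also contain
    '##' pragma lines and '#' comments, which this sketch does not handle.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, vtype, start, end, score, strand, phase, attr_text = cols

    attributes = {}
    for pair in attr_text.strip(";").split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Multi-valued attributes are comma separated
        attributes[key] = value.split(",")

    return {
        "seqid": seqid,
        "source": source,
        "type": vtype,          # a variant type term such as SNV
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


# Made-up record, used only to exercise the parser
example = "chrX\texample\tSNV\t1000\t1000\t.\t-\t.\tID=example_1;Variant_seq=G;Reference_seq=A;"
record = parse_gvf_line(example)
print(record["type"], record["attributes"]["Variant_seq"])
```

Keeping the attributes as lists makes heterozygous calls (two values in `Variant_seq`) and homozygous calls look the same to downstream code.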

[Diagram: one CDR file for the background genomes and one for the target genomes are the inputs to VAAST, which outputs a VAAST report of prioritized candidate genes.]

Key Features of VAAST

• Probabilistic
• Feature Based
• Both Allele and AAS Frequencies
• Considers Inheritance Model
• Fast
• Standardized Ontology Based Format
• Modular and Flexible in Design

VAAST Uses Variant Frequencies in a Probabilistic Fashion

Likelihood Ratio Test

ratio = (maximum likelihood of the null model: no difference) / (maximum likelihood of the alternate model: there is a difference)
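
As an illustration of the ratio above, the sketch below computes a likelihood ratio statistic for a single variant site from allele counts in the target and background genomes. It assumes a simple binomial model of allele frequency and reports the conventional -2 ln(ratio) form. The function names are hypothetical, and the model is a simplification for teaching purposes rather than the scoring function VAAST itself uses.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood of k variant alleles among n, up to a constant.

    Terms with a zero count are skipped so that p == 0 or p == 1 is safe.
    """
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def lrt_statistic(k_target, n_target, k_background, n_background):
    """-2 ln(L_null / L_alt) for one site.

    Null model: target and background share one allele frequency.
    Alternate model: each group has its own allele frequency.
    """
    p_pooled = (k_target + k_background) / (n_target + n_background)
    ll_null = (binom_loglik(k_target, n_target, p_pooled)
               + binom_loglik(k_background, n_background, p_pooled))

    p_t = k_target / n_target
    p_b = k_background / n_background
    ll_alt = (binom_loglik(k_target, n_target, p_t)
              + binom_loglik(k_background, n_background, p_b))

    return -2.0 * (ll_null - ll_alt)


# Same frequency in both groups: no evidence of a difference
print(lrt_statistic(2, 4, 20, 40))
# Variant on all 4 target chromosomes, absent from 886 background chromosomes
print(lrt_statistic(4, 4, 0, 886))
```

The statistic is zero when the two groups have identical allele frequencies and grows as the frequencies diverge.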

• VAAST gives us the likelihood of the composite genotype
at GENE X in the target given the background.

• Do allele frequencies differ between Background and
Target genomes within a given gene or feature?

• Composite likelihood calculation assumes independence
across sites. To control for LD, statistical significance is
estimated by permutation test.

• Multiple test correction for number of features (~20,000)
is two orders of magnitude better than for the number of
variants (~3,500,000).
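
The points above can be sketched in code: sum a per-site likelihood ratio statistic across a gene (the independence assumption), estimate significance by permuting target and background labels across whole individuals so that linkage between sites is preserved, and compare Bonferroni thresholds for about 20,000 features and about 3,500,000 variants. This is a minimal sketch under stated assumptions: diploid genotypes coded as 0, 1 or 2 variant alleles, a binomial allele-frequency model, and hypothetical function names. VAAST also uses amino acid substitution frequencies, which are omitted here.

```python
import math
import random


def _loglik(k, n, p):
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def site_lrt(kt, nt, kb, nb):
    """-2 ln(L_null / L_alt) for one site under a binomial model."""
    pooled = (kt + kb) / (nt + nb)
    null = _loglik(kt, nt, pooled) + _loglik(kb, nb, pooled)
    alt = _loglik(kt, nt, kt / nt) + _loglik(kb, nb, kb / nb)
    return -2.0 * (null - alt)


def gene_score(genotypes, is_target):
    """Composite score: sum of per-site statistics (sites treated as independent).

    genotypes[i][j] is the variant allele count (0, 1 or 2) of individual i
    at site j of the gene; is_target[i] marks target individuals.
    """
    n_target = sum(is_target)
    nt = 2 * n_target
    nb = 2 * (len(is_target) - n_target)
    score = 0.0
    for j in range(len(genotypes[0])):
        kt = sum(g[j] for g, t in zip(genotypes, is_target) if t)
        kb = sum(g[j] for g, t in zip(genotypes, is_target) if not t)
        score += site_lrt(kt, nt, kb, nb)
    return score


def permutation_p_value(genotypes, is_target, n_perm=1000, seed=0):
    """Shuffle labels across individuals; each individual's sites stay together."""
    rng = random.Random(seed)
    observed = gene_score(genotypes, is_target)
    labels = list(is_target)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if gene_score(genotypes, labels) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def bonferroni_threshold(alpha, n_tests):
    return alpha / n_tests


per_feature = bonferroni_threshold(0.05, 20_000)      # 2.5e-06
per_variant = bonferroni_threshold(0.05, 3_500_000)
print(per_feature / per_variant)  # ratio of the two thresholds, about 175
```

With a significance level of 0.05, the per-feature threshold is 175 times less stringent than the per-variant one, which is the "two orders of magnitude" on the slide.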

Noise Decreases Dramatically with Increasing Number of Genomes

[Figure, four panels:]
• 1 genome target, 1 genome background
• 1 genome target, 10 genome background
• 1 genome target
• 1 genome target, trio data

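Alleles Responsible for Miller Syndrome in Utah Kindred

[Pedigree figure: Mom, Dad, Son and Daughter, shown once for DHODH (chr 16) and once for DNAH5 (chr 5).]
• DHODH alleles: G:R and G:A. Son and Daughter each carry both.
• DNAH5 alleles: R:Q and R:*. Son and Daughter each carry both.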

• Ng et al., Nature Genetics 42, 30–35 (2010), doi:10.1038/ng.499
• Roach et al., Science 328, 636–639 (2010)

Schematic of VAAST Analysis of Utah Miller Kindred Using a Single Quartet

[Schematic figure with DHODH and DNAH5 labeled]

Average Rank for 100 Dominant and Recessive Diseases

[Bar chart. Y axis: average rank genome-wide. X axis: dominant vs. recessive. Bars grouped by size of case cohort: 2, 4 and 6 allele copies. Bar labels: 156, 132, 21, 9, 8, 3. Background: 443 genomes.]

Impact of Missing Data

[Bar chart. Y axis: average rank genome-wide. X axis: dominant vs. recessive. Legend: 2 of 6 allele copies. Bar labels: 639, 373, 61, 21, 9, 3. Background: 443 genomes.]

Outline

• The VAAST Analysis Pipeline
• Ogden Syndrome: Application of VAAST to a Genetic Disease of Unknown Cause
• The Future of VAAST Development

A Rare X-Linked Mendelian Disorder

• A Utah family that has been coming to the University Hospital for 20+ years
• About half of the male offspring die around 1 year of age
• Aged appearance
• Craniofacial anomalies
• Hypotonia
• Global developmental delays
• Cardiac arrhythmias

Four Affected Boys over Two Generations

[Pedigree figure: generations I, II and III]

Exome Sequencing
• Agilent SureSelect In-Solution X Chromosome Capture
• Covaris S series Sonication (150-200 bp)
• 76 bp single-end reads on one lane each of the Illumina GAIIx

Variant Calling
• Sequence alignment with BWA
• Remove duplicate reads with Picard
• Realign indel regions with GATK
• Variant calling with SAMtools and GATK

Identifying Candidate Genes

VAAST Identifies NAA10 as Candidate Gene
• About 20 min. run time
• Proband only: 3 candidate genes (NAA10 ranked second)
• With pedigree: 1 candidate gene (NAA10)
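
To illustrate why adding pedigree information narrows the candidate list, the sketch below keeps only variants that segregate with an X-linked disease: present in every affected male and absent from every unaffected male. It is a simple set filter written for illustration, not the way VAAST incorporates pedigree data. The variant names, individual identifiers and the `segregating_variants` helper are all hypothetical.

```python
def segregating_variants(carriers_by_variant, affected_males, unaffected_males):
    """Return variants carried by all affected males and by no unaffected male.

    carriers_by_variant maps a variant name to the set of individuals
    carrying it. Carrier females are deliberately not used as a filter,
    since they can carry an X-linked variant without being affected.
    """
    kept = []
    for variant, carriers in carriers_by_variant.items():
        in_all_affected = affected_males <= carriers
        in_any_unaffected = bool(unaffected_males & carriers)
        if in_all_affected and not in_any_unaffected:
            kept.append(variant)
    return sorted(kept)


# Made-up data: three candidate variants seen in the proband
carriers = {
    "varA": {"proband", "affected_uncle", "mother"},
    "varB": {"proband", "unaffected_brother"},
    "varC": {"proband"},
}
print(segregating_variants(carriers,
                           affected_males={"proband", "affected_uncle"},
                           unaffected_males={"unaffected_brother"}))
```

In this toy data only `varA` survives: `varB` is carried by an unaffected male and `varC` is missing from one affected male.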

Additional Analyses

• Microarray-based CNV analysis
  - No likely causal variants found
• Sanger sequencing confirmation
  - Variant segregates perfectly with disease in 13 family members
• Haplotype sharing (STR genotyping)
  - ~11 Mb shared between two affected boys
• A second family discovered with the same mutation
  - IBD relatedness analysis: independent mutational events

N(alpha)-acetyltransferase
• N-alpha-acetylation is one of the most common protein
modifications that occurs during protein synthesis.
• NatA catalytic subunit: NAA10 (hARD1)
• Eight exons, Crick strand, highly conserved
• A:G transition causes p.Ser37Pro
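
A short sketch of how the last two points fit together: because the gene lies on the Crick strand, an A to G change on the reference strand reads as T to C in the coding sequence, and T to C at the first position of a TCN serine codon gives a CCN proline codon. The code assumes that "Crick strand" means the reverse (minus) strand and that the change falls on the first codon position, which is the only single-base route from a serine codon to a proline codon in the standard genetic code. The helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}

# Relevant slice of the standard genetic code
SER_CODONS = {"TCA", "TCC", "TCG", "TCT", "AGC", "AGT"}
PRO_CODONS = {"CCA", "CCC", "CCG", "CCT"}


def is_transition(ref, alt):
    """Transitions stay within purines (A, G) or within pyrimidines (C, T)."""
    return ref != alt and ((ref in PURINES) == (alt in PURINES))


def coding_strand_change(ref, alt, gene_strand):
    """Express a reference-strand base change on the gene's coding strand."""
    if gene_strand == "-":
        return COMPLEMENT[ref], COMPLEMENT[alt]
    return ref, alt


coding_ref, coding_alt = coding_strand_change("A", "G", "-")
print(coding_ref, coding_alt)  # T C

# The third codon base is not given in the source, so check all four
for third in "ACGT":
    ref_codon = coding_ref + "C" + third
    alt_codon = coding_alt + "C" + third
    assert ref_codon in SER_CODONS and alt_codon in PRO_CODONS
```

The loop over the third base avoids guessing the exact codon: every TCN codon encodes serine and every CCN codon encodes proline.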

Functional Analyses
• Quantitative in vitro N-terminal acetylation assay (RP-HPLC).
• Four peptide substrates previously shown to be
acetylated by NatA (NAA10)
• Assays indicate loss-of-function allele.

VAAST in Summary

• Probabilistic Disease Gene Finder
• Feature Based not Variant Based
• Both Allele and AAS Frequencies
• Considers Inheritance Model
• As few as two target genomes can be sufficient to
identify causative gene.
• Background Genomes are “Reusable”
• Not Limited to Human Analyses

VAAST: Future Directions

• Indel support
• Splice-site
• No-call support
• Pedigree support
• Phylogenetic conservation

Acknowledgements

VAAST Development
• Chad Huff
• Hao Hu
• Lynn Jorde
• Barry Moore
• Martin Reese
• Marc Singleton
• Jinchuan Xing
• Mark Yandell

Yandell Lab
• Michael Campbell
• Daniel Ence
• Steven Flygare
• Hao Hu
• Zev Kronenberg
• Barry Moore
• Marc Singleton
• Robert Ross
• Mark Yandell

Ogden Syndrome
• Thomas Arnesen
• John Carey
• Rune Evjenth
• Steven Chin
• Johan R. Lillehaug
• Heidi Deborah Fain
• Gholson Lyon
• Leslie G. Biesecker
• John Opitz
• Jennifer J. Johnston
• Theodore J. Pysher
• Alan Rope
• Cathy A. Stevens
• Reid Robison
• Sarah T. South
• Brian Dalley
• Tao Jiang
• Jeffrey Swensen
• Chad Huff
• Guozhen Fan
• Evan Johnson
• Hakon Hakonarson
• Barry Moore
• Lynn B. Jorde
• Christa Schank
• Mark Yandell
• Kai Wang
• Jinchuan Xing
