WTAC NGS Course, Hinxton, 12th April 2014
Lecture 2: Identification of SNPs, Indels, and structural variants
Thomas Keane
Sequence Variation Infrastructure Group
WTSI
Today's slides: ftp://ftp-mouse.sanger.ac.uk/other/tk2/WTAC-2014/Lecture2.pdf
➢ VCF Format
➢ SNP/indel Identification
➢ Structural Variation
VCF: Variant Call Format
VCF is a standardised format for storing DNA polymorphism data
● SNPs, insertions, deletions and structural variants
● With rich annotations (e.g. context, predicted function, sequence data support)
Indexed for fast data retrieval of variants from a range of positions
Store variant information across many samples
Record meta-data about the site
● dbSNP accession, filter status, validation status
Very flexible format
● Arbitrary tags can be introduced to describe new types of variants
● No two VCF files are necessarily the same
● User extensible annotation fields supported
● The same event can be expressed in multiple ways by including different numbers of padding reference bases
● Recommendations on the VCF format website help ensure consistency
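The point about one event having several valid representations can be illustrated with a minimal Python sketch (an illustration, not a tool from these slides). It strips reference bases that every allele shares, trimming the end first and then the start while shifting POS. Full normalization would also left-align indels within repeats, which needs the reference sequence and is not attempted here; the positions and sequence are made up.

```python
from __future__ import annotations


def trim_alleles(pos: int, ref: str, alts: list[str]) -> tuple[int, str, list[str]]:
    """Remove reference padding shared by all alleles of one VCF site.

    Trims common trailing bases, then common leading bases (shifting POS),
    always keeping at least one base per allele as VCF requires. It does NOT
    left-align indels inside repeats; that step needs the reference sequence.
    """
    alleles = [ref, *alts]
    # Drop a shared last base while every allele still has more than one base.
    while all(len(a) > 1 for a in alleles) and len({a[-1] for a in alleles}) == 1:
        alleles = [a[:-1] for a in alleles]
    # Drop a shared first base, moving the start position one base to the right.
    while all(len(a) > 1 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles[0], alleles[1:]


# Three ways of writing the same deletion of "CT" from a made-up reference
# ...T(99) A(100) C(101) T(102) G(103)...; all collapse to one record.
for record in [(100, "ACT", ["A"]), (100, "ACTG", ["AG"]), (99, "TACT", ["TA"])]:
    print(record, "->", trim_alleles(*record))
```

Comparing records only after this kind of trimming (plus left-alignment) avoids treating one deletion as three different variants.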
VCF Format
A header section and a data section
Header
● Arbitrary number of meta-data information lines
● Starting with characters ‘##’
● Column definition line starts with single ‘#’
Mandatory columns
● Chromosome (CHROM)
● Position of the start of the variant (POS)
● Unique identifiers of the variant (ID)
● Reference allele (REF)
● Comma-separated list of alternate non-reference alleles (ALT)
● Phred-scaled quality score (QUAL)
● Site filtering information (FILTER)
● User extensible annotation (INFO)
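As a sketch of how the eight mandatory columns map onto a record, the Python below splits one tab-separated data line and converts the Phred-scaled QUAL back to an error probability (QUAL = -10 log10 P(ALT call is wrong)). The class and function names are hypothetical, FORMAT and per-sample columns are ignored, and the example line is only illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VcfSite:
    """The eight mandatory VCF columns of one data line."""
    chrom: str
    pos: int                 # 1-based position of the first REF base
    ids: list[str]           # e.g. dbSNP rs IDs; empty when ID is "."
    ref: str
    alts: list[str]          # ALT column split on commas
    qual: float | None       # Phred-scaled; None when QUAL is "."
    filters: list[str]       # "PASS" or the names of failed filters
    info: dict = field(default_factory=dict)


def parse_info(column: str) -> dict:
    """Split INFO into key/value pairs; flag tags such as DB map to True."""
    if column == ".":
        return {}
    info = {}
    for entry in column.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def parse_site(line: str) -> VcfSite:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 tab-separated columns, got {len(cols)}")
    chrom, pos, vid, ref, alt, qual, filt, info = cols[:8]
    return VcfSite(
        chrom=chrom,
        pos=int(pos),
        ids=[] if vid == "." else vid.split(";"),
        ref=ref,
        alts=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        filters=[] if filt == "." else filt.split(";"),
        info=parse_info(info),
    )


def phred_to_error_prob(q: float) -> float:
    """Invert QUAL = -10 * log10(P): the probability that the call is wrong."""
    return 10 ** (-q / 10)


# Illustrative data line (only the eight mandatory columns).
site = parse_site("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2")
print(site.alts, site.info["DP"], site.info["DB"], phred_to_error_prob(site.qual))
```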
Example VCF (SNPs/indels)
VCF Trivia 1
What version of the human reference genome was used?
What does the DB INFO tag stand for?
What does the ALT column contain?
At position 17330, what is the total depth? What is the depth for sample NA00002?
At position 17330, what is the genotype of NA00002?
Which position is a tri-allelic SNP site?
What sort of variant is at position 1234567? What is the genotype of NA00002?
Functional Annotation
VCF can store arbitrary
● INFO tags per site
● Genotype FORMAT tags
Use tags to describe
● Genomic context of the variant (e.g. coding, intronic, non-coding, UTR, intergenic)
● Predicted functional consequence of the variant (e.g. synonymous/non-synonymous, protein structure change)
● Presence of the variant in other large resequencing studies
Several tools for annotating a VCF
● SnpEff: http://snpeff.sourceforge.net/
● Ensembl VEP: http://www.ensembl.org/info/docs/tools/vep/script/index.html
● FunSeq: http://funseq.gersteinlab.org/
Ensembl - VEP
"VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants)
on genes, transcripts, and protein sequence, as well as regulatory regions."
Species must be included in either Ensembl or Ensembl Genomes
Sequence Ontology (SO) terms to describe genomic context
PubMed IDs for cited variants
Option to output only the most severe consequence per variant
Online or offline mode
● Offline mode recommended for large numbers of variants (download the relevant cache)
Human-specific annotations
● SIFT - predicts whether an amino acid substitution affects protein function
● PolyPhen - predicts the impact of an amino acid substitution on the structure and function of human proteins
● 1000 Genomes frequencies - global or per population
VEP VCF
VEP INFO tag:
● ##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. Format: Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|AA_MAF|EA_MAF|DISTANCE|STRAND|CLIN_SIG|SYMBOL|SYMBOL_SOURCE|SIFT|PolyPhen|AFR_MAF|AMR_MAF|ASN_MAF|EUR_MAF">
Example
● CSQ=T|ENSG00000238962|ENST00000458792|Transcript|upstream_gene_variant||||||rs72779452|||3789|-1||RNU7-176P|HGNC|||0.02|0.10|0.07|0.17,
T|ENSG00000143870|ENST00000404824|Transcript|synonymous_variant|474|102|34|A|gcC/gcA|rs72779452||||-1||PDIA6|HGNC|||0.02|0.10|0.07|0.17,
T|ENSG00000143870|ENST00000381611|Transcript|5_prime_UTR_variant|264|||||rs72779452||||-1||PDIA6|HGNC|||0.02|0.10|0.07|0.17
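Because the CSQ value packs several pipe-delimited blocks (one per allele and feature) into a single INFO field, a small parser helps. This Python sketch (helper names hypothetical) reads the field order from the ##INFO header instead of hard-coding it, since the list depends on the VEP version and options used, and applies it to the example above.

```python
def csq_fields(header_line: str) -> list:
    """Read the pipe-separated field names after 'Format: ' in the CSQ header."""
    description = header_line.split("Format: ", 1)[1]
    return description.split('"', 1)[0].split("|")


def parse_csq(value: str, fields: list) -> list:
    """Return one dict per comma-separated consequence block."""
    blocks = []
    for block in value.split(","):
        parts = block.split("|")
        if len(parts) != len(fields):
            raise ValueError(f"expected {len(fields)} fields, got {len(parts)}")
        blocks.append(dict(zip(fields, parts)))
    return blocks


HEADER = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. '
          'Format: Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|'
          'Protein_position|Amino_acids|Codons|Existing_variation|AA_MAF|EA_MAF|DISTANCE|STRAND|'
          'CLIN_SIG|SYMBOL|SYMBOL_SOURCE|SIFT|PolyPhen|AFR_MAF|AMR_MAF|ASN_MAF|EUR_MAF">')

CSQ = ("T|ENSG00000238962|ENST00000458792|Transcript|upstream_gene_variant||||||rs72779452|||3789|-1||RNU7-176P|HGNC|||0.02|0.10|0.07|0.17,"
       "T|ENSG00000143870|ENST00000404824|Transcript|synonymous_variant|474|102|34|A|gcC/gcA|rs72779452||||-1||PDIA6|HGNC|||0.02|0.10|0.07|0.17,"
       "T|ENSG00000143870|ENST00000381611|Transcript|5_prime_UTR_variant|264|||||rs72779452||||-1||PDIA6|HGNC|||0.02|0.10|0.07|0.17")

# Print the transcript, consequence and gene symbol of each block.
for csq in parse_csq(CSQ, csq_fields(HEADER)):
    print(csq["Feature"], csq["Consequence"], csq["SYMBOL"])
```

Keeping the field list in the header is what lets one parser handle files annotated with different VEP settings.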
More Information
VCF
● http://bioinformatics.oxfordjournals.org/content/27/15/2156.full
● http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
VCFTools
● http://vcftools.sourceforge.net
GATK
● http://www.broadinstitute.org/gatk/
● http://www.broadinstitute.org/gatk/guide/article?id=1268
VCF Annotation
● Ensembl VEP: http://www.ensembl.org/info/docs/tools/vep/index.html
● SnpEff: http://snpeff.sourceforge.net/
● Anntools: http://anntools.sourceforge.net/
SNP Identification
SNP - single nucleotide polymorphisms
● Examine the bases aligned to each position and look for differences from the reference
SNP discovery vs. genotyping
● Discovery: finding new variant sites
● Genotyping: determining the genotype at a set of already known sites
Factors to consider when calling SNPs
● Base call qualities of each supporting base
● Proximity to
○ Small indels
○ Homopolymer runs (>4-5bp for 454 and >10bp for Illumina)
● Mapping qualities of the reads supporting the SNP
○ Low mapping quality indicates repetitive sequence
● Read length
○ Longer reads can be aligned with high confidence to a larger portion of the genome
● Paired reads
● Sequencing depth
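A few of these factors can be applied at a single position with a toy Python sketch. It assumes the pileup is already available as (base, base quality, mapping quality) tuples; the quality cut-offs of 20 and the reference context are hypothetical, and a real caller would weigh the evidence probabilistically rather than apply hard thresholds.

```python
from collections import Counter


def supporting_counts(pileup, ref_base, min_base_qual=20, min_map_qual=20):
    """Count bases passing both quality cut-offs at one position.

    pileup: iterable of (base, base_quality, mapping_quality), one per read.
    Returns (depth, {alt_base: count}) over the bases that pass.
    """
    counts = Counter(
        base.upper()
        for base, bq, mq in pileup
        if bq >= min_base_qual and mq >= min_map_qual
    )
    depth = sum(counts.values())
    alts = {b: n for b, n in counts.items() if b != ref_base.upper()}
    return depth, alts


def homopolymer_length(seq, i):
    """Length of the run of identical bases in seq that contains index i."""
    left = right = i
    while left > 0 and seq[left - 1] == seq[i]:
        left -= 1
    while right < len(seq) - 1 and seq[right + 1] == seq[i]:
        right += 1
    return right - left + 1


# Hypothetical reads: one low base quality and one MAPQ 0 G are discarded.
pileup = [("A", 30, 60), ("G", 35, 60), ("G", 10, 60), ("G", 30, 0), ("G", 32, 50)]
print(supporting_counts(pileup, "A"))  # (3, {'G': 2})

# Hypothetical reference context: position 5 sits inside an 11 bp T run.
context = "ACG" + "T" * 11 + "AC"
print(homopolymer_length(context, 5) > 10)  # True
```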
Mouse SNP
Read Length vs. Uniqueness
Inaccessible Genome
Is this a real SNP?
Evaluating SNPs
Specificity vs. sensitivity
● False positives vs. false negatives
Desirable to have both high sensitivity and high specificity
Sensitivity
● Compare against external sources of validation
Specificity
● Test a random selection of SNPs with another technology
● e.g. Sequenom, Sanger sequencing…
Receiver operating characteristic (ROC) curves to investigate the effects of varying parameters
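These quantities can be computed with a short Python sketch, assuming sites are keyed as (chromosome, position, alternate allele) tuples and that a set of externally validated sites is available. Re-testing a random subset of calls estimates the false discovery rate (the fraction of calls that are wrong) rather than specificity in the strict statistical sense. Sweeping a QUAL cut-off gives the calls-versus-sensitivity trade-off used in ROC-style comparisons; a full ROC curve would also need a false-positive estimate at each cut-off. All data below are made up.

```python
def sensitivity(calls: set, truth: set) -> float:
    """Fraction of known true sites that were called (1 - false-negative rate)."""
    return len(calls & truth) / len(truth)


def validation_fdr(n_tested: int, n_confirmed: int) -> float:
    """False discovery rate from a random subset re-tested by another technology."""
    return 1 - n_confirmed / n_tested


def sweep_quality(calls_with_qual: dict, truth: set, thresholds) -> list:
    """For each QUAL cut-off, report (cut-off, calls kept, sensitivity)."""
    rows = []
    for t in thresholds:
        kept = {site for site, q in calls_with_qual.items() if q >= t}
        rows.append((t, len(kept), sensitivity(kept, truth)))
    return rows


truth = {("1", 100, "A"), ("1", 200, "G"), ("1", 300, "T"), ("1", 400, "C")}
calls = {("1", 100, "A"): 50.0, ("1", 200, "G"): 15.0, ("1", 500, "T"): 8.0}
print(sweep_quality(calls, truth, [0, 20]))
print(validation_fdr(n_tested=100, n_confirmed=95))
```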
Known Systematic Biases
Many biases can be introduced during sample preparation, sequencing, computational alignment, etc.
● Can generate false-positive SNPs/indels
Potential biases
● Strand bias
● End distance bias
● Inconsistency across replicates/libraries
● Variant distance bias
VCFtools
● Soft-filter the variants file for these biases
● Variants are kept in the file, just annotated with the potential bias affecting each variant
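Soft filtering for strand bias can be sketched in Python with Fisher's exact test on a 2x2 table of reference/alternate by forward/reverse read counts. This is not VCFtools' implementation: it assumes each record carries a DP4 INFO tag (as written by samtools/bcftools: high-quality ref-forward, ref-reverse, alt-forward and alt-reverse counts), and the p-value cut-off and the StrandBias filter name are hypothetical. The record is never removed; only its FILTER column changes.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def soft_filter_strand_bias(fields: list, max_p: float = 1e-3) -> list:
    """Annotate (never drop) a VCF data line whose reads are strand-skewed."""
    info = {}
    for entry in fields[7].split(";"):
        key, _, value = entry.partition("=")
        info[key] = value
    if not info.get("DP4"):
        return fields  # no per-strand counts: leave the record untouched
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))
    if fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev) >= max_p:
        return fields
    out = list(fields)
    # FILTER holds PASS/"." or a semicolon-separated list of failed filters.
    out[6] = "StrandBias" if out[6] in ("PASS", ".") else out[6] + ";StrandBias"
    return out


# Hypothetical record: all 20 alt reads are on the forward strand.
record = "1\t100\t.\tA\tG\t50\tPASS\tDP=40;DP4=10,10,20,0".split("\t")
print(soft_filter_strand_bias(record)[6])  # StrandBias
```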
Strand Bias
End Distance Bias
Variant Distance Bias
Reproducibility
Future of Variant Calling?
Current approaches
● Rely heavily on the supplied alignment
● Largely site-based; don't examine the local haplotype
Local de novo assembly-based variant callers
● Call SNPs, indels, MNPs and small SVs simultaneously
● Can remove mapping artifacts
● e.g. GATK HaplotypeCaller
Haplotype Based Calling - GATK
Genomic Structural Variation
Large DNA rearrangements (>100bp)
Frequent causes of disease
● Referred to as genomic disorders
● Mendelian diseases or complex traits such as behaviors
● e.g. increased gene dosage due to increased copy number
● Prevalent in cancer genomes
Many types of genomic structural variation (SV)
● Insertions, deletions, copy number changes, inversions, translocations & complex events
Comparative genomic hybridization (CGH) traditionally used for copy number discovery
● CNVs of 1-50 kb in size have been under-ascertained
Next-gen sequencing revolutionised the field of SV discovery
● Parallel sequencing of the ends of large numbers of DNA fragments
● Examine the alignment distance between read pairs to discover genomic rearrangements
● Resolution down to ~100bp
Human Disease
Stankiewicz and Lupski (2010) Ann. Rev. Med.
Structural Variation
Several types of structural variations (SVs)
● Large insertions/deletions
● Inversions
● Translocations
Read pair information used to detect these events
● Paired-end sequencing of both ends of each DNA fragment
● Observe deviations from the expected fragment size
● Presence/absence of mate pairs
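The read pair signals can be sketched in Python as a classifier for a single pair. It assumes an Illumina-style forward-reverse paired-end library whose fragment-size mean and standard deviation were estimated beforehand from concordant pairs; the 100 bp read length and the 3-standard-deviation cut-off are hypothetical, and the signature-to-SV mapping gives only the most common interpretation, not a call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadPair:
    chrom1: str
    pos1: int                 # leftmost aligned coordinate of read 1
    reverse1: bool            # True if read 1 aligned to the reverse strand
    chrom2: Optional[str]     # None when the mate is unmapped
    pos2: Optional[int]
    reverse2: Optional[bool]
    read_len: int = 100       # hypothetical read length


# The rearrangement each discordant signature most commonly points to.
SV_SIGNAL = {
    "too_far": "deletion",
    "too_close": "insertion",
    "same_strand": "inversion",
    "interchromosomal": "translocation",
    "everted": "tandem duplication",
    "mate_unmapped": "insertion of non-reference sequence",
}


def classify_pair(p: ReadPair, mean: float, sd: float, k: float = 3.0) -> str:
    """Label a pair from an FR library as concordant or by its discordant signature."""
    if p.chrom2 is None:
        return "mate_unmapped"
    if p.chrom1 != p.chrom2:
        return "interchromosomal"
    if p.reverse1 == p.reverse2:
        return "same_strand"
    (left_pos, left_rev), (right_pos, _) = sorted([(p.pos1, p.reverse1), (p.pos2, p.reverse2)])
    if left_rev:
        return "everted"  # reads point away from each other (RF instead of FR)
    fragment = right_pos + p.read_len - left_pos
    if fragment > mean + k * sd:
        return "too_far"
    if fragment < mean - k * sd:
        return "too_close"
    return "concordant"


label = classify_pair(ReadPair("1", 1000, False, "1", 3000, True), mean=500, sd=50)
print(label, "->", SV_SIGNAL[label])  # too_far -> deletion
```

A single discordant pair is weak evidence on its own; callers cluster many pairs with the same signature before reporting an SV.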
Structural Variation Types
Fragment Size QC
What is this?
What is this?
What is this?
Mobile Element Insertions
Transposons are segments of DNA that can move within the genome
● A minimal ‘genome’ - ability to replicate and change location
● Relics of ancient viral infections
Dominate the landscape of mammalian genomes
● 38-45% of rodent and primate genomes
● Genome size proportional to number of TEs
Class 1 (RNA intermediate) and Class 2 (DNA intermediate)
Potent genetic mutagens
● Disrupt expression of genes
● Genome reorganisation and evolution
● Transduction of flanking sequence
Species-specific families
● Human: Alu, L1, SVA
● Mouse: SINE, LINE, ERV
Many other families in other species
Human Mobile Elements
Mobile Element Insertions
Mouse Example - LookSeq
Human Alu - IGV
Detecting Mobile Element Insertions
Most algorithms for locating non-reference mobile elements operate in a similar manner
Goal: detect all read pairs where one end flanks the insertion point and the mate lies in the inserted sequence
Pseudo-algorithm
● Read through the BAM file and make a list of all discordant read pairs
● Keep the pairs where one end is similar to your library of mobile elements
● Remove anchor reads with low mapping quality
● Cluster the anchor reads and examine the breakpoint
● Filter out any clusters close to annotated elements of the same type
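The pseudo-algorithm above can be sketched in Python under simplifying assumptions: discordant pairs have already been extracted from the BAM, and each mate has already been compared with a mobile element library (`mate_family` holds the best-matching family, or None). Thresholds and names are hypothetical, and breakpoint refinement (for example from split reads) is not attempted; each call just reports the span of its anchor reads. This is not how RetroSeq or any other tool listed here is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiscordantPair:
    chrom: str
    anchor_pos: int              # position of the uniquely mapped (anchor) read
    anchor_mapq: int
    mate_family: Optional[str]   # ME family the mate resembles (e.g. "Alu"), or None


def call_mobile_element_insertions(pairs, reference_elements, min_mapq=20,
                                   window=500, min_support=3):
    """reference_elements: iterable of (chrom, family, start, end) annotated copies."""
    # Keep pairs whose mate hits the ME library and whose anchor maps uniquely.
    anchors = sorted(
        (p.chrom, p.mate_family, p.anchor_pos)
        for p in pairs
        if p.mate_family is not None and p.anchor_mapq >= min_mapq
    )
    # Single-linkage clustering of anchors per chromosome and ME family.
    clusters = []
    for chrom, family, pos in anchors:
        last = clusters[-1] if clusters else None
        if last and (last["chrom"], last["family"]) == (chrom, family) and pos - last["end"] <= window:
            last["end"] = pos
            last["support"] += 1
        else:
            clusters.append({"chrom": chrom, "family": family,
                             "start": pos, "end": pos, "support": 1})
    # Drop weak clusters and those near an annotated element of the same family.
    calls = []
    for c in clusters:
        near_known = any(
            r_chrom == c["chrom"] and r_family == c["family"]
            and r_start - window <= c["end"] and c["start"] <= r_end + window
            for r_chrom, r_family, r_start, r_end in reference_elements
        )
        if c["support"] >= min_support and not near_known:
            calls.append(c)
    return calls


# Hypothetical input: three well-mapped Alu anchors on chr1 form one novel call.
pairs = [DiscordantPair("chr1", p, 60, "Alu") for p in (10000, 10100, 10200)]
print(call_mobile_element_insertions(pairs, reference_elements=[]))
```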
1000 Genomes CEU Trio
A typical human sample has ~900-1000 non-reference mobile elements
● ~800 Alu elements, ~100 L1
Why are there 44 calls private to the child?
Mobile Element Software
RetroSeq: https://github.com/tk2/RetroSeq
VariationHunter: http://compbio.cs.sfu.ca/strvar.htm
T-LEX: http://petrov.stanford.edu/cgi-bin/Tlex.html
Tea: http://compbio.med.harvard.edu/Tea/

1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
1000G/UK10K: Bioinformatics, storage, and compute challenges of large scale r...
 
Mouse Genomes Poster - Genetics 2010
Mouse Genomes Poster - Genetics 2010Mouse Genomes Poster - Genetics 2010
Mouse Genomes Poster - Genetics 2010
 
Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010
 
ECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing TutorialECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing Tutorial
 

Kürzlich hochgeladen

Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomyDrAnita Sharma
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...navyadasi1992
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalMAESTRELLAMesa2
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxRitchAndruAgustin
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuinethapagita
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 

Kürzlich hochgeladen (20)

Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomy
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 

2014 Wellcome Trust Advanced Course: NGS Course - Lecture 2

  • 1. WTAC NGS Course, Hinxton 12th April 2014
Lecture 2: Identification of SNPs, Indels, and structural variants
Thomas Keane
Sequence Variation Infrastructure Group
WTSI
Today's slides: ftp://ftp-mouse.sanger.ac.uk/other/tk2/WTAC-2014/Lecture2.pdf
  • 2. Lecture 2: Identification of SNPs, Indels, and structural variants
➢ VCF Format
➢ SNP/indel Identification
➢ Structural Variation
  • 3. VCF: Variant Call Format
VCF is a standardised format for storing DNA polymorphism data
● SNPs, insertions, deletions and structural variants
● With rich annotations (e.g. context, predicted function, sequence data support)
Indexed for fast retrieval of variants from a range of positions
Stores variant information across many samples
Records meta-data about the site
● dbSNP accession, filter status, validation status
Very flexible format
● Arbitrary tags can be introduced to describe new types of variants
● No two VCF files are necessarily the same
● User-extensible annotation fields supported
● The same event can be expressed in multiple ways by including different numbers of flanking reference bases
● Recommendations on the VCF format website help ensure consistency
  • 4. VCF Format
Header section and a data section
Header
● Arbitrary number of meta-data information lines
● Starting with characters ‘##’
● Column definition line starts with a single ‘#’
Mandatory columns
● Chromosome (CHROM)
● Position of the start of the variant (POS)
● Unique identifiers of the variant (ID)
● Reference allele (REF)
● Comma-separated list of alternate non-reference alleles (ALT)
● Phred-scaled quality score (QUAL)
● Site filtering information (FILTER)
● User-extensible annotation (INFO)
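To make the column layout concrete, here is a minimal sketch that splits the eight mandatory columns of one VCF data line and converts the Phred-scaled QUAL back to an error probability (Q = -10 log10 P). It is an illustration only: it ignores header lines and the per-sample FORMAT/genotype columns, and the record used in the usage note is hypothetical. For real work a dedicated VCF library is the safer choice.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class VcfRecord:
    chrom: str
    pos: int                           # 1-based position of the first REF base
    ids: List[str]                     # empty when ID is '.'
    ref: str
    alts: List[str]                    # ALT alleles, split on commas
    qual: Optional[float]              # None when QUAL is '.'
    filters: List[str]                 # e.g. ['PASS'] or names of failed filters
    info: Dict[str, Union[str, bool]]  # flag tags (e.g. DB) map to True


def parse_info(text: str) -> Dict[str, Union[str, bool]]:
    """Split an INFO string such as 'DP=14;DB' into a dict."""
    if text == ".":
        return {}
    info: Dict[str, Union[str, bool]] = {}
    for entry in text.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True  # no '=' means a flag tag
    return info


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight mandatory columns of a VCF data line (header lines are not handled)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, qual, flt, info = cols[:8]
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        ids=[] if vid == "." else vid.split(";"),
        ref=ref,
        alts=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        filters=[] if flt == "." else flt.split(";"),
        info=parse_info(info),
    )


def phred_to_error_prob(q: float) -> float:
    """QUAL is Phred-scaled: Q = -10 * log10(P), so P = 10^(-Q/10)."""
    return 10 ** (-q / 10)
```

For example, `parse_vcf_line("1\t1000\t.\tA\tG,T\t30\tPASS\tDP=14;DB")` yields a record with two ALT alleles (a tri-allelic site), a flag tag `DB`, and a QUAL of 30, which corresponds to roughly a 1 in 1,000 chance that the ALT call is wrong.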
  • 5. Example VCF (SNPs/indels)
  • 6. VCF Trivia 1
What version of the human reference genome was used?
What does the DB INFO tag stand for?
What does the ALT column contain?
At position 17330, what is the total depth?
What is the depth for sample NA00002?
At position 17330, what is the genotype of NA00002?
Which position is a tri-allelic SNP site?
What sort of variant is at position 1234567?
What is the genotype of NA00002?
  • 7. Functional Annotation
VCF can store arbitrary
● INFO tags per site
● Genotype FORMAT tags
Use tags to describe
● Genomic context of the variant (e.g. coding, intronic, non-coding, UTR, intergenic)
● Predicted functional consequence of the variant (e.g. synonymous/non-synonymous, protein structure change)
● Presence of the variant in other large resequencing studies
Several tools for annotating a VCF
● SnpEff: http://snpeff.sourceforge.net/
● Ensembl VEP: http://www.ensembl.org/info/docs/tools/vep/script/index.html
● FunSeq: http://funseq.gersteinlab.org/
  • 8. Ensembl - VEP
"VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions."
Species must be included in either Ensembl or Ensembl Genomes
Uses Sequence Ontology (SO) terms to describe genomic context
Reports PubMed IDs for variants that have been cited
Option to output only the most severe consequence per variant
Online or off-line mode
● Off-line recommended for large numbers of variants (download the relevant cache)
Human-specific annotations
● SIFT - predicts whether an amino acid substitution affects protein function
● PolyPhen - predicts the impact of an amino acid substitution on the structure of human proteins
● 1000 Genomes frequencies - global or per population
  • 9. VEP VCF
VEP INFO tag:
● ##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. Format: Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|AA_MAF|EA_MAF|DISTANCE|STRAND|CLIN_SIG|SYMBOL|SYMBOL_SOURCE|SIFT|PolyPhen|AFR_MAF|AMR_MAF|ASN_MAF|EUR_MAF">
Example
● CSQ=T|ENSG00000238962|ENST00000458792|Transcript|upstream_gene_variant||||||rs72779452|||3789|-1||RNU7-176P|HGNC|||0.02|0.10|0.07|0.17,T|ENSG00000143870|ENST00000404824|Transcript|synonymous_variant|474|102|34|A|gcC/gcA|rs72779452||||-1||PDIA6|HGNC|||0.02|0.10|0.07|0.17,T|ENSG00000143870|ENST00000381611|Transcript|5_prime_UTR_variant|264|||||rs72779452||||-1||PDIA6|HGNC|||0.02|0.10|0.07|0.17
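Because the CSQ value is only meaningful together with the field list in its header line, a small sketch of how the two fit together may help. It assumes the header layout shown on this slide (field names after `Format: `, terminated by the closing quote), with consequence blocks separated by commas and fields by pipes. The helper names are hypothetical and this is not part of VEP itself.

```python
from typing import Dict, List


def csq_field_names(header_line: str) -> List[str]:
    """Extract the pipe-separated field names after 'Format: ' in the CSQ header line."""
    marker = "Format: "
    start = header_line.index(marker) + len(marker)
    end = header_line.index('"', start)  # the Description string ends at the closing quote
    return header_line[start:end].split("|")


def parse_csq(csq_value: str, names: List[str]) -> List[Dict[str, str]]:
    """Split a CSQ value into one dict per allele/feature consequence block."""
    blocks = []
    for block in csq_value.split(","):   # one block per allele/transcript combination
        fields = block.split("|")        # empty fields are kept as ''
        if len(fields) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(fields)}")
        blocks.append(dict(zip(names, fields)))
    return blocks
```

Applied to the example above, the first block maps `DISTANCE` to `3789` and `SYMBOL` to `RNU7-176P`, and the second block maps `Codons` to `gcC/gcA`. Checking the field count per block is a cheap guard against splitting on the wrong delimiter.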
  • 10. More Information
VCF
● http://bioinformatics.oxfordjournals.org/content/27/15/2156.full
● http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
VCFtools
● http://vcftools.sourceforge.net
GATK
● http://www.broadinstitute.org/gatk/
● http://www.broadinstitute.org/gatk/guide/article?id=1268
VCF Annotation
● Ensembl VEP: http://www.ensembl.org/info/docs/tools/vep/index.html
● SnpEff: http://snpeff.sourceforge.net/
● Anntools: http://anntools.sourceforge.net/
  • 11. Lecture 2: Identification of SNPs, Indels, and structural variants
➢ VCF Format
➢ SNP/indel Identification
➢ Structural Variation
  • 12. SNP Identification
SNP - single nucleotide polymorphism
● Examine the bases aligned to each position and look for differences from the reference
SNP discovery vs genotyping
● Discovery: finding new variant sites
● Genotyping: determining the genotype at a set of already known sites
Factors to consider when calling SNPs
● Base call qualities of each supporting base
● Proximity to
○ Small indels
○ Homopolymer runs (>4-5bp for 454 and >10bp for Illumina)
● Mapping qualities of the reads supporting the SNP
○ Low mapping qualities indicate repetitive sequence
● Read length
○ Longer reads can be aligned with high confidence to a larger portion of the genome
● Paired reads
● Sequencing depth
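Several of the factors above (base quality, mapping quality, depth) can be seen at work in a deliberately naive single-site caller. The sketch below filters the pileup, counts the most common non-reference base and assigns a genotype from fixed allele-fraction cutoffs. All thresholds and names are hypothetical; production callers such as samtools/bcftools or GATK instead compute genotype likelihoods from the base qualities, and this sketch ignores indel proximity and homopolymers entirely.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlignedBase:
    base: str       # base call in the read at this reference position
    base_qual: int  # Phred-scaled base call quality
    map_qual: int   # Phred-scaled mapping quality of the read


def call_site(ref_base: str, pileup: List[AlignedBase],
              min_base_qual: int = 20, min_map_qual: int = 20,
              min_depth: int = 5, min_alt_frac: float = 0.2,
              hom_alt_frac: float = 0.8) -> Optional[dict]:
    """Naive SNP call at one site; returns None when there is too little evidence."""
    # Drop low-quality base calls and reads likely to come from repetitive sequence.
    kept = [b.base for b in pileup
            if b.base_qual >= min_base_qual and b.map_qual >= min_map_qual]
    depth = len(kept)
    if depth < min_depth:
        return None  # insufficient depth: no call
    alt_counts = Counter(b for b in kept if b != ref_base)
    if not alt_counts:
        return {"alt": None, "genotype": "0/0", "depth": depth, "alt_count": 0}
    alt, n_alt = alt_counts.most_common(1)[0]
    frac = n_alt / depth
    if frac < min_alt_frac:
        return {"alt": None, "genotype": "0/0", "depth": depth, "alt_count": n_alt}
    genotype = "1/1" if frac >= hom_alt_frac else "0/1"
    return {"alt": alt, "genotype": genotype, "depth": depth, "alt_count": n_alt}
```

Note how two low-quality `G` calls are simply discarded rather than counted as weak evidence; a likelihood-based caller would weight them instead, which is one reason the fixed-cutoff approach is only a teaching aid.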
  • 13. Mouse SNP
  • 14. Read Length vs. Uniqueness
  • 15. Inaccessible Genome
  • 16. Is this a real SNP?
  • 17. Evaluating SNPs
Specificity vs sensitivity
● False positives vs. false negatives
Desirable to have both high sensitivity and high specificity
Sensitivity
● Measured against external sources of validation
Specificity
● Test a random selection of SNPs by another technology
● e.g. Sequenom, Sanger sequencing…
Receiver operating characteristic (ROC) curves to investigate the effects of varying parameters
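The evaluation quantities on this slide reduce to simple set arithmetic, sketched below under a few assumptions: variants are keyed by hypothetical string IDs (e.g. "chrom:pos:ref:alt"), a truth set from an external resource is available, and calls carry a QUAL score. Note that validating a random sample of calls really estimates the fraction of calls that are true (precision), and that in variant calling the number of true negatives is ill-defined, so the sweep reports sensitivity against the false-positive count rather than a textbook ROC curve.

```python
from typing import Dict, Iterable, List, Set, Tuple


def sensitivity(calls: Set[str], truth: Set[str]) -> float:
    """Fraction of known true variants recovered: TP / (TP + FN)."""
    return len(calls & truth) / len(truth)


def validation_rate(n_tested: int, n_confirmed: int) -> float:
    """Fraction of a random sample of calls confirmed by an orthogonal technology."""
    return n_confirmed / n_tested


def threshold_sweep(scored_calls: Dict[str, float], truth: Set[str],
                    thresholds: Iterable[float]) -> List[Tuple[float, float, int]]:
    """For each QUAL cutoff, report (cutoff, sensitivity, false positive count)."""
    points = []
    for t in thresholds:
        kept = {v for v, q in scored_calls.items() if q >= t}
        points.append((t, sensitivity(kept, truth), len(kept - truth)))
    return points
```

Plotting the sweep output shows the usual trade-off: raising the cutoff removes false positives but also loses true variants.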
  • 18. Known Systematic Biases
Many biases can be introduced during sample preparation, the sequencing process, computational alignment, etc.
● Can generate false positive SNPs/indels
Potential biases
● Strand bias
● End distance bias
● Consistency across replicates/libraries
● Variant distance bias
VCFtools
● Soft-filter the variants file for these biases
● Variants are kept in the file, just annotated with the potential bias affecting the variant
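Strand bias is commonly assessed with a Fisher's exact test on a 2x2 table of reference/alternate read counts by forward/reverse strand; GATK's FisherStrand (FS) annotation, for example, reports a Phred-scaled p-value of this kind. The sketch below implements the two-sided test from the hypergeometric distribution using only the standard library, so it is illustrative rather than the exact computation used by any particular tool (`scipy.stats.fisher_exact` is a tested alternative). The function names are hypothetical.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    total = math.comb(n, row1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum over all tables no more likely than the observed one (tolerance for float error).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def strand_bias_phred(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Phred-scaled strand-bias p-value; larger values mean stronger evidence of bias."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return -10 * math.log10(p)
```

A site where reference reads are split evenly across strands but every alternate read is on the forward strand, e.g. `strand_bias_phred(20, 20, 15, 0)`, scores far higher than a balanced site such as `(10, 10, 10, 10)`, which scores about 0. Following the slide's soft-filter approach, such a score would be written as an annotation rather than used to delete the record.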
  • 19. Strand Bias
  • 20. End Distance Bias
  • 21. Variant Distance Bias
  • 22. Reproducibility
  • 23. Future of Variant Calling?
Current approaches
● Rely heavily on the supplied alignment
● Largely site-based; don't examine the local haplotype
Local de novo assembly-based variant callers
● Call SNPs, indels, MNPs and small SVs simultaneously
● Can remove mapping artifacts
● e.g. GATK HaplotypeCaller
  • 24. Haplotype Based Calling - GATK
  • 25. Lecture 2: Identification of SNPs, Indels, and structural variants
➢ VCF Format
➢ SNP/indel Identification
➢ Structural Variation
  • 26. Genomic Structural Variation
Large DNA rearrangements (>100bp)
Frequent causes of disease
● Referred to as genomic disorders
● Mendelian diseases or complex traits such as behaviors
● E.g. increase in gene dosage due to increase in copy number
● Prevalent in cancer genomes
Many types of genomic structural variation (SV)
● Insertions, deletions, copy number changes, inversions, translocations & complex events
Comparative genomic hybridization (CGH) traditionally used for copy number discovery
● CNVs of 1-50 kb in size have been under-ascertained
Next-gen sequencing revolutionised the field of SV discovery
● Parallel sequencing of the ends of large numbers of DNA fragments
● Examine the alignment distance of read pairs to discover genomic rearrangements
● Resolution down to ~100bp
  • 27. Human Disease
Stankiewicz and Lupski (2010) Ann. Rev. Med.
  • 28. Structural Variation
Several types of structural variations (SVs)
● Large insertions/deletions
● Inversions
● Translocations
Read pair information used to detect these events
● Paired-end sequencing of both ends of a DNA fragment
● Observe deviations from the expected fragment size
● Presence/absence of mate pairs
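The read-pair signals listed above can be expressed as a small classifier. This sketch assumes an Illumina-style paired-end library in which a normal pair has the leftmost read on the forward strand and its mate on the reverse strand, and it approximates the fragment size from the two alignment start positions plus one read length. The class, field names and cutoff (`k` standard deviations around the library mean) are hypothetical; mate-pair libraries with the opposite orientation would need different rules, and real SV callers cluster many such pairs before calling anything.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadPair:
    chrom_a: str
    pos_a: int              # 1-based leftmost alignment position of read A
    rev_a: bool             # True if read A aligned to the reverse strand
    chrom_b: Optional[str]  # None if mate B is unmapped
    pos_b: int
    rev_b: bool
    read_len: int


def classify_pair(p: ReadPair, mean: float, sd: float, k: float = 3.0) -> str:
    """Label one read pair with the SV signal it suggests (FR paired-end library assumed)."""
    if p.chrom_b is None:
        return "one_end_anchored"   # mate missing: possible novel insertion
    if p.chrom_a != p.chrom_b:
        return "interchromosomal"   # translocation candidate
    # Order the reads so that 'left' is the one aligned further to the left.
    if p.pos_a <= p.pos_b:
        left_rev, right_rev, left_pos, right_pos = p.rev_a, p.rev_b, p.pos_a, p.pos_b
    else:
        left_rev, right_rev, left_pos, right_pos = p.rev_b, p.rev_a, p.pos_b, p.pos_a
    if left_rev == right_rev:
        return "same_strand"        # inversion candidate
    if left_rev and not right_rev:
        return "everted"            # outward-facing: often read as a tandem duplication
    fragment = right_pos + p.read_len - left_pos
    if fragment > mean + k * sd:
        return "too_long"           # deletion candidate in the sample
    if fragment < mean - k * sd:
        return "too_short"          # insertion candidate in the sample
    return "concordant"
```

The direction of the size deviation is the key point: a deletion in the sample makes the pair span more reference sequence than expected, while an insertion makes it span less.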
  • 29. Structural Variation Types
  • 30. Fragment Size QC
  • 31. What is this?
  • 32. What is this?
  • 33. What is this?
  • 34. Mobile Element Insertions
Transposons are segments of DNA that can move within the genome
● A minimal ‘genome’ - ability to replicate and change location
● Relics of ancient viral infections
Dominate the landscape of mammalian genomes
● 38-45% of rodent and primate genomes
● Genome size proportional to number of TEs
Class 1 (RNA intermediate) and Class 2 (DNA intermediate)
Potent genetic mutagens
● Disrupt expression of genes
● Genome reorganisation and evolution
● Transduction of flanking sequence
Species-specific families
● Human: Alu, L1, SVA
● Mouse: SINE, LINE, ERV
Many other families in other species
  • 35. Human Mobile Elements
  • 36. Mobile Element Insertions
  • 37. Mouse Example - LookSeq
  • 38. Human Alu - IGV
  • 39. Detecting Mobile Element Insertions
Most algorithms for locating non-reference mobile elements operate in a similar manner
Goal: detect all read pairs where one end flanks the insertion point and the mate lies in the inserted sequence
Pseudo-algorithm
● Read through the BAM file and make a list of all discordant read pairs
● Keep the pairs where one end is similar to your library of mobile elements
● Remove anchor reads with low mapping quality
● Cluster the anchor reads and examine the breakpoint
● Filter out any clusters close to annotated elements of the same type
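The pseudo-algorithm can be sketched end to end on in-memory records. This is a simplified illustration, not the method of RetroSeq or any other tool: it assumes the discordant pairs have already been extracted from the BAM (that step is not shown) and that each mate has already been compared to the mobile element library, recorded as a hypothetical `mate_family` field. Clustering uses a simple gap rule (a new cluster starts when consecutive anchors are more than `window` bp apart), and every threshold is a placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DiscordantPair:
    anchor_chrom: str
    anchor_pos: int
    anchor_mapq: int
    mate_family: Optional[str]  # ME family the mate matched; None if no match (precomputed)


@dataclass
class Candidate:
    chrom: str
    start: int
    end: int
    family: str
    support: int


def call_mei(pairs: List[DiscordantPair],
             ref_elements: List[Tuple[str, int, int, str]],  # (chrom, start, end, family)
             min_mapq: int = 20, window: int = 500,
             min_support: int = 3, exclude_dist: int = 1000) -> List[Candidate]:
    # Keep pairs whose mate hits an ME family and whose anchor maps confidently.
    anchors = [p for p in pairs if p.mate_family is not None and p.anchor_mapq >= min_mapq]
    # Cluster anchors per chromosome and family, splitting where the gap exceeds `window`.
    anchors.sort(key=lambda p: (p.anchor_chrom, p.mate_family, p.anchor_pos))
    clusters: List[List[DiscordantPair]] = []
    current: List[DiscordantPair] = []
    for p in anchors:
        if current and (p.anchor_chrom != current[-1].anchor_chrom
                        or p.mate_family != current[-1].mate_family
                        or p.anchor_pos - current[-1].anchor_pos > window):
            clusters.append(current)
            current = []
        current.append(p)
    if current:
        clusters.append(current)

    calls = []
    for c in clusters:
        if len(c) < min_support:
            continue  # too few supporting pairs
        chrom, fam = c[0].anchor_chrom, c[0].mate_family
        start, end = c[0].anchor_pos, c[-1].anchor_pos
        # Drop clusters near an annotated reference element of the same family.
        near_ref = any(rc == chrom and rf == fam
                       and rs - exclude_dist <= end and start <= r_end + exclude_dist
                       for rc, rs, r_end, rf in ref_elements)
        if not near_ref:
            calls.append(Candidate(chrom, start, end, fam, len(c)))
    return calls
```

The final filter matters because reads from a reference copy of the same element family can look like support for a new insertion nearby.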
  • 40. 1000 Genomes CEU Trio
A typical human sample has ~900-1000 non-reference mobile elements
● ~800 Alu elements, ~100 L1
Why are there 44 calls private to the child?
  • 41. Mobile Element Software
RetroSeq: https://github.com/tk2/RetroSeq
VariationHunter: http://compbio.cs.sfu.ca/strvar.htm
T-LEX: http://petrov.stanford.edu/cgi-bin/Tlex.html
Tea: http://compbio.med.harvard.edu/Tea/