Whole genome sequencing has become popular as its cost and turnaround time have fallen. It reads the complete DNA sequence, covering about 80% of the human genome, compared with about 1% for exome sequencing. Sequencing platforms include Illumina (Solexa), ABI SOLiD, and Ion Torrent (Ion Proton). Bioinformatics analysis of whole genome data identifies single nucleotide polymorphisms (SNPs), insertions/deletions (indels), copy number variations (CNVs), and structural variations (SVs), which can provide insights into disease. Public databases and manual annotation are used to assess the impact of these variations.
5. Whole Genome Analysis
Microarray | Exome | Whole Genome
Only known SNPs (~ 900 000) | Only the coding regions of the genome | The complete DNA sequences
Up to ~ 0.03 % of the human genome | ~ 1 % of the human genome | ~ 80 % of the human genome
6. Whole Genome Analysis
Which technologies are involved?
• Illumina (Solexa)
• ABI SOLiD
• Ion Proton (2013)
7. Whole Genome Analysis
Illumina (Solexa):
• cluster generation by bridge amplification
• sequencing by synthesis
18. Whole Genome Analysis
SNPs: easy to detect
1. Map the sequenced reads to the reference sequence
2. See how the 'consensus sequence' differs from the reference
3. A 1-base difference between the consensus sequence and the reference = SNP
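The three steps above can be sketched in a few lines of Python. This is a toy illustration, not a production variant caller: it assumes the reads are already mapped without gaps, and the names `call_snps`, `aligned_reads`, and `min_depth` are hypothetical.

```python
from collections import Counter

def call_snps(reference, aligned_reads, min_depth=2):
    """Return (position, ref_base, consensus_base, depth) for each SNP.

    aligned_reads is a list of (start, sequence) pairs, where start is the
    0-based reference position of the first base (ungapped mapping assumed).
    """
    # Step 1: pile up the bases observed at each reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to trust a consensus
        # Step 2: the consensus base is the most frequently observed base.
        consensus_base, _ = counts.most_common(1)[0]
        # Step 3: a 1-base difference from the reference is reported as a SNP.
        if consensus_base != reference[pos]:
            snps.append((pos, reference[pos], consensus_base, depth))
    return snps

# Made-up example data: three of four reads carry a T at position 5.
reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTATGT"), (3, "TATGTA"), (4, "ATGTAC")]
print(call_snps(reference, reads))
```

Real callers also weigh base and mapping quality and model heterozygous genotypes; the majority vote here only illustrates the consensus idea.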
19. Whole Genome Analysis
SNPs: impact?
• Public databases:
  • dbSNP (NCBI)
  • OMIM (Johns Hopkins University School of Medicine)
  • SIFT (J. Craig Venter Institute)
  • …
• 'Manual' annotation
21. Whole Genome Analysis
Indels: insertion or deletion of a DNA sequence of arbitrary length
22. Whole Genome Analysis
Indel detection
Small indels:
• Detection of small gaps in the alignment
• Combination of the gapped alignments based on proximity
• Filtering (read position, coverage, quality)
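A minimal sketch of the small-indel steps, assuming each alignment is given as a 0-based reference start plus a SAM-style CIGAR string. The function names, `max_distance`, and `min_support` are hypothetical, and the filter is reduced to a simple count of supporting reads (read position and quality are not modelled).

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def gaps_in_alignment(ref_start, cigar):
    """Yield (ref_pos, kind, length) for each insertion (I) or deletion (D)."""
    ref_pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "ID":
            yield ref_pos, op, length
        if op in "MDN=X":
            ref_pos += length  # these operations consume reference bases

def call_small_indels(alignments, max_distance=2, min_support=2):
    # Detect gaps and group them by kind and length.
    by_type = defaultdict(list)
    for ref_start, cigar in alignments:
        for pos, kind, length in gaps_in_alignment(ref_start, cigar):
            by_type[(kind, length)].append(pos)

    calls = []
    for (kind, length), positions in by_type.items():
        positions.sort()
        # Combine gaps that lie within max_distance of each other.
        cluster = [positions[0]]
        for pos in positions[1:] + [None]:
            if pos is not None and pos - cluster[-1] <= max_distance:
                cluster.append(pos)
                continue
            # Filter: keep only clusters supported by enough reads.
            if len(cluster) >= min_support:
                calls.append((cluster[0], kind, length, len(cluster)))
            cluster = [pos]
    return sorted(calls)

# Made-up alignments: three reads support a 2-base deletion near position 120,
# and a single read carries a 1-base insertion that is filtered out.
alignments = [(100, "20M2D30M"), (105, "15M2D35M"),
              (110, "11M2D39M"), (200, "25M1I24M")]
print(call_small_indels(alignments))
```

Grouping by kind and length before clustering keeps unrelated gaps apart; the proximity tolerance absorbs the small position shifts that aligners introduce around repeats.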
Large indels:
• Use of read-pairing information (Illumina and SOLiD only)
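One common way to use pairing information is to compare the distance at which two mates map on the reference with the insert size expected from the library. The sketch below assumes that approach; `classify_pair`, the threshold `n_sd`, and the example numbers are all hypothetical.

```python
def classify_pair(observed_span, expected_mean, expected_sd, n_sd=3.0):
    """Classify one read pair by how far apart its mates map on the reference."""
    deviation = observed_span - expected_mean
    if deviation > n_sd * expected_sd:
        # Mates map too far apart: the sample lacks sequence the reference has.
        return "deletion"
    if deviation < -n_sd * expected_sd:
        # Mates map too close: the sample holds sequence the reference lacks.
        return "insertion"
    return "concordant"

# Made-up library with a 300 bp mean insert size and 30 bp standard deviation.
spans = [310, 295, 1250, 120]
labels = [classify_pair(span, expected_mean=300, expected_sd=30) for span in spans]
print(labels)
```

A single discordant pair is weak evidence; in practice several pairs pointing at the same region would be required before reporting a large indel.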
23. Whole Genome Analysis
Indels: impact?
• Public databases:
  • dbSNP for small indels (NCBI)
  • dbVar for large indels (NCBI)
• 'Manual' annotation