2. Aim
• The authors want to develop a SNP-calling method for the Illumina platform.
• The method accounts for the data quality, alignment, and experimental errors common to this platform.
3. Applications of NGS
• Use whole-genome sequences to identify genetic variation between individuals.
• Disease
• Drug
• Environment
4. Workflow
• Sequencing reads
• Map reads onto the reference genome
• Prior probability of each genotype
• Recalibrate sequencing quality scores
• Calculate the likelihood of each genotype
• Infer the genotype via Bayes' theorem
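The final workflow step combines a prior and a likelihood for each genotype. The sketch below shows that combination in its simplest form; it is an illustration, not the authors' implementation. The function name `genotype_posterior`, the three-genotype example, and the log-likelihood values are all hypothetical.

```python
import math

def genotype_posterior(prior, log_likelihood):
    """Bayes' theorem: posterior(g) is proportional to prior(g) * likelihood(g).

    Works in log space and subtracts the maximum before exponentiating,
    so very small likelihoods do not underflow to zero.
    """
    log_post = {g: math.log(p) + log_likelihood[g]
                for g, p in prior.items() if p > 0}
    peak = max(log_post.values())
    norm = sum(math.exp(v - peak) for v in log_post.values())
    return {g: math.exp(v - peak) / norm for g, v in log_post.items()}

# Hypothetical site with reference base G and three candidate genotypes.
prior = {"GG": 0.9985, "AG": 0.001, "AA": 0.0005}
log_lik = {"GG": -40.0, "AG": -10.0, "AA": -35.0}  # made-up values

post = genotype_posterior(prior, log_lik)
best = max(post, key=post.get)  # genotype with the highest posterior
```

Even with a small prior, the heterozygous genotype is selected here because the made-up likelihood supports it far more strongly than the alternatives.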
5. Traditional Method
• The Phred score is a universal standard.
• Compare the sample sequence with the reference genome and filter out low-scoring mismatches.
• A method to detect heterozygous
polymorphisms.
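A Phred score Q encodes an error probability P as Q = -10 log10(P). The sketch below illustrates the traditional approach of keeping only mismatches with a high enough score. The threshold `min_q=20` and the function names are assumptions chosen for illustration.

```python
import math

def phred_to_error(q):
    """Error probability implied by Phred score q."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Phred score for error probability p."""
    return -10 * math.log10(p)

def confident_mismatches(ref, read, quals, min_q=20):
    """Return (offset, ref_base, read_base, quality) for each mismatch
    whose quality is at least min_q (a hypothetical cutoff)."""
    return [(i, r, b, q)
            for i, (r, b, q) in enumerate(zip(ref, read, quals))
            if r != b and q >= min_q]

# Two mismatches; only the high-quality one survives the filter.
kept = confident_mismatches("ACGT", "AGGA", [30, 35, 30, 10])
```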
6. Prior Probability
• Based on existing research:
• The estimated SNP rate between two human haploid chromosomes is about 0.001 (Sachidanandam et al. 2001).
• The human reference genome sequence has an error rate of 0.00001 (Collins et al. 2004).
Set the prior homozygous SNP rate to 0.0005 and the heterozygous SNP rate to 0.001.
7. Prior Probability
• According to a previous study on dbSNP,
transitions are four times more frequent
than transversions among the substitution
mutations. (Zhao and Boerwinkle 2002)
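One way to turn these rates into a per-site prior is sketched below. It makes two labeled assumptions: "four times more frequent" is read as the transition being four times as likely as each of the two transversions, and genotypes with two non-reference alleles that differ from each other get a prior of zero. The names `genotype_prior` and `TS_WEIGHT` are hypothetical.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
HOM_SNP_RATE = 0.0005  # prior homozygous SNP rate from the slides
HET_SNP_RATE = 0.001   # prior heterozygous SNP rate from the slides
TS_WEIGHT = 4.0        # assumption: transition vs. each transversion

def genotype_prior(ref):
    """Prior over the 10 unordered diploid genotypes given reference base ref."""
    weights = {b: (TS_WEIGHT if TRANSITION[ref] == b else 1.0)
               for b in BASES if b != ref}
    total = sum(weights.values())
    # Start at zero; non-reference heterozygotes stay zero (simplification).
    prior = {"".join(g): 0.0 for g in combinations_with_replacement(BASES, 2)}
    prior[ref + ref] = 1.0 - HOM_SNP_RATE - HET_SNP_RATE
    for alt, w in weights.items():
        prior["".join(sorted(ref + alt))] = HET_SNP_RATE * w / total
        prior[alt + alt] = HOM_SNP_RATE * w / total
    return prior

p = genotype_prior("G")  # A is the transition partner of G
```

Under these assumptions, the heterozygous prior for reference G is split 4:1:1 among AG, CG, and GT.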
9. Recalibration
• The 3' ends of reads have a much higher error rate than earlier cycles.
• The original quality scores do not represent the true error rate.
• Check mismatches against dbSNP.
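A generic empirical recalibration can be sketched as follows: group aligned bases by reported quality score and sequencing cycle, count mismatches in each group, and convert the observed mismatch rate back to a Phred score. This is a minimal sketch, not the authors' procedure. It assumes that mismatches at known dbSNP sites are skipped so that real variants are not counted as errors, and the add-one smoothing and the input tuple layout are hypothetical choices.

```python
import math
from collections import defaultdict

def recalibrate(observations, known_snp_sites):
    """Build a table (reported_q, cycle) -> empirical Phred score.

    observations: iterable of (site, cycle, reported_q, is_mismatch).
    known_snp_sites: set of sites to skip (assumed to come from dbSNP).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [mismatches, total]
    for site, cycle, q, is_mismatch in observations:
        if site in known_snp_sites:
            continue  # a mismatch here may be a real variant
        cell = counts[(q, cycle)]
        cell[0] += int(is_mismatch)
        cell[1] += 1
    table = {}
    for key, (mismatches, total) in counts.items():
        rate = (mismatches + 1) / (total + 2)  # add-one smoothing (assumption)
        table[key] = -10 * math.log10(rate)
    return table

# Toy data: reported Q30 at cycle 35, with 9 mismatches in 98 bases.
obs = [(i, 35, 30, i < 9) for i in range(98)]
obs.append((1000, 35, 30, True))  # known SNP site, ignored
table = recalibrate(obs, known_snp_sites={1000})
```

In this toy data the reported score of 30 is recalibrated to about 10, which reflects the much higher mismatch rate actually observed at that late cycle.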
10. Recalibration
• Illumina uses two lasers.
• A and C use the same laser, G and T use
another.
• A-C and G-T substitutions were overestimated by 58%-72%.
• Duplicate reads: apply a penalty to these reads.
11. Likelihood Calculation
• Observed allele type
• Quality score
• Sequencing cycle
• Observation of the same allele from
reads with the same mapping location.
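The factors above can be combined in a simplified likelihood, sketched below under strong assumptions. It is not the authors' model. The sequencing cycle is assumed to have been folded into the recalibrated quality score already, an error is assumed equally likely to produce any of the three other bases, and repeated observations of the same allele from the same mapping location are down-weighted by a hypothetical factor `DUP_PENALTY`.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement

BASES = "ACGT"
DUP_PENALTY = 0.5  # hypothetical weight decay for repeated observations

def p_obs_given_allele(obs, allele, q):
    """P(observed base | true allele) from a Phred quality q."""
    e = 10 ** (-q / 10)
    return 1 - e if obs == allele else e / 3

def genotype_log_likelihoods(observations):
    """observations: list of (base, recalibrated_q, read_start).

    Returns the log-likelihood of each of the 10 diploid genotypes.
    """
    loglik = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        seen = Counter()
        total = 0.0
        for base, q, start in observations:
            # The k-th repeat of (base, start) contributes DUP_PENALTY**k.
            weight = DUP_PENALTY ** seen[(base, start)]
            seen[(base, start)] += 1
            p = 0.5 * p_obs_given_allele(base, a1, q) \
                + 0.5 * p_obs_given_allele(base, a2, q)
            total += weight * math.log(p)
        loglik[a1 + a2] = total
    return loglik

# Two reads support A and two support G, all at distinct start positions.
reads = [("A", 30, 100), ("A", 30, 105), ("G", 30, 110), ("G", 30, 112)]
ll = genotype_log_likelihoods(reads)
best = max(ll, key=ll.get)
```

With balanced support for A and G, the heterozygous genotype AG has the highest likelihood in this example.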
12. Evaluation
• The consensus sequence was compared with Illumina Human 1M BeadChip genotypes from the same DNA sample.
• Covered genotyped alleles on the X chromosome and autosomes were 99.97% and 99.84% consistent, respectively.
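The two quantities in this evaluation, coverage and consistency, can be computed as in the sketch below. The function name, the site keys, and the toy genotypes are hypothetical and do not reproduce the reported figures.

```python
def coverage_and_consistency(array_genotypes, called_genotypes):
    """Compare sequencing-based calls against array genotypes.

    coverage: fraction of array sites that also have a sequencing call.
    consistency: fraction of those covered sites where both genotypes agree.
    """
    covered = [s for s in array_genotypes if s in called_genotypes]
    if not covered:
        return 0.0, 0.0
    agree = sum(array_genotypes[s] == called_genotypes[s] for s in covered)
    return len(covered) / len(array_genotypes), agree / len(covered)

# Toy example with four array sites, three of which were called.
array = {"chrX:100": "AG", "chrX:200": "CC", "chr1:50": "TT", "chr1:60": "GG"}
called = {"chrX:100": "AG", "chrX:200": "CT", "chr1:50": "TT"}
coverage, consistency = coverage_and_consistency(array, called)
```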