2. The Problem
• NGS-based assays can detect a nearly unlimited number of
different variants, which makes them hard to validate
• At the same time, the ability to identify a subset of variants may be
compromised in a given sample or run, which is hard to detect
• Monitoring more targets in a reference material should improve the ability to
monitor overall performance
• However, monitoring more targets makes the traditional
Westgard rules difficult to apply
(e.g., false positive outliers when there are 100 targets)
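To illustrate the last point, the sketch below estimates how often at least one target would fall outside a k-SD control limit by chance alone. It assumes the targets are independent and normally distributed, which the slide does not state, and the function name is illustrative.

```python
import math


def false_alarm_probability(k_sd, n_targets):
    """Chance that at least one of n_targets exceeds +/- k_sd by chance.

    Assumes independent, normally distributed targets (an assumption).
    """
    # Two-sided tail probability for a single target.
    alpha = 1.0 - math.erf(k_sd / math.sqrt(2.0))
    # Probability that at least one of the targets is flagged.
    return 1.0 - (1.0 - alpha) ** n_targets


for k in (2, 3):
    p = false_alarm_probability(k, 100)
    print(f"{k} SD limit, 100 targets: {p:.1%} of runs flagged by chance")
```

Under these assumptions, a 2 SD limit flags about 99% of good runs and a 3 SD limit about 24%, which is why single-target rules do not scale to 100 targets.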
4. The General Approach
• Use many runs in order to determine the
expected variant frequencies, then…
• …use many runs in order to assess the
performance of each variant
• …use many variants in order to assess the
performance of each run
• You know the amount of input material
(complexity)
• You know the depth of coverage
• You can derive the expected performance
(beta-binomial)
• A typical PGM run with 10 ng of input and
capped at 2,000 reads is expected to
perform as if there are ~1,200 reads
• The method uses deviations from the
expected variant frequencies in order to
assess performance vs. expected
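The slide does not show how the ~1,200 figure is derived. One reading that reproduces it, offered here as an assumption, treats the run as two-stage binomial sampling: input molecules are sampled first, then reads are sampled from those molecules. The sketch also assumes about 3.3 pg of DNA per haploid human genome. The function and parameter names are illustrative.

```python
def expected_effective_depth(input_ng, reads, pg_per_haploid_genome=3.3):
    """Effective depth under two-stage binomial sampling (assumed model).

    The variance of the observed frequency is p(1-p) * (1/N + 1/D - 1/(N*D)),
    where N is the number of input genome copies and D is the read depth,
    so the effective depth is N*D / (N + D - 1).
    """
    copies = input_ng * 1000.0 / pg_per_haploid_genome  # ng -> pg -> copies
    return copies * reads / (copies + reads - 1.0)


d_exp = expected_effective_depth(input_ng=10, reads=2000)
print(f"expected effective depth: {d_exp:.0f} reads")
```

With 10 ng (roughly 3,000 copies) and 2,000 reads this gives a value close to 1,200, which matches the slide; other models could give a similar number.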
5. The Basic Equations
• The upper equation uses established
average observed variant frequencies
• The lower equation can be used with
heterozygous SNPs, provided you are sure that
they are heterozygous (not triploid, etc.) and
that they average 50%
• d_exp,i is the expected depth of the variant
based on the amount of input material and
the sequencing depth around that variant
• x_i is the observed variant frequency
• Invert d_exp/d_eff in order to obtain d_eff/d_exp,
which is expected to be ≤ 1
• The more independent variants there are,
the tighter the confidence intervals
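\[
\frac{d_{exp}}{d_{eff}} = \frac{1}{n-1} \times \sum_{i=1}^{n} \frac{d_{exp,i} \times (x_i - \bar{x})^2}{\bar{x} \times (1 - \bar{x})}
\]

\[
\frac{d_{exp}}{d_{eff}} = \frac{4}{n} \times \sum_{i=1}^{n} d_{exp,i} \, (x_i - 0.5)^2
\]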
[Figure: Variants vs. VCF Variant Frequency (axis 0% to 100%)]
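The two equations on this slide can be sketched in code as follows. The first function sums d_exp,i × (x_i − x̄)² / (x̄ × (1 − x̄)) and divides by n − 1; the second sums d_exp,i × (x_i − 0.5)² and multiplies by 4/n. Passing x̄ per variant is my reading of "established average observed variant frequencies", and all function names are illustrative.

```python
def dispersion_known_mean(d_exp, x, x_bar):
    """Upper equation: d_exp/d_eff from established mean frequencies.

    d_exp, x and x_bar are equal-length sequences (one entry per variant).
    """
    n = len(x)
    total = sum(
        d * (xi - m) ** 2 / (m * (1.0 - m))
        for d, xi, m in zip(d_exp, x, x_bar)
    )
    return total / (n - 1)


def dispersion_heterozygous(d_exp, x):
    """Lower equation: d_exp/d_eff assuming every variant averages 50%."""
    n = len(x)
    return 4.0 / n * sum(d * (xi - 0.5) ** 2 for d, xi in zip(d_exp, x))


def effective_over_expected(dispersion):
    """Invert d_exp/d_eff to obtain d_eff/d_exp."""
    return 1.0 / dispersion


depths = [1000.0] * 4              # made-up expected depths
freqs = [0.50, 0.52, 0.48, 0.50]   # made-up observed frequencies
ratio = effective_over_expected(dispersion_heterozygous(depths, freqs))
print(f"d_eff/d_exp = {ratio:.2f}")
```

The example data are invented for illustration. A ratio above 1, as here, can occur by chance with few variants, which is the reason the slide stresses confidence intervals.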
6. Variant Callers can add Variance
compare a search of FASTQ files for known variants to those reported by a pipeline
error bars are 95% confidence intervals (relatively wide due to n=14)
the variant caller(s) add variance (FASTQ effective depth is generally > VCF effective depth)
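The slide does not describe how the FASTQ search was done. A minimal sketch of one possible approach counts reads containing a variant-specific sequence versus the matching reference sequence, on either strand. It assumes unwrapped four-line FASTQ records; the function names and the example sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def fastq_sequences(lines):
    """Yield the sequence line of each four-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def variant_frequency(lines, ref_seq, alt_seq):
    """Return (alt reads, informative reads, alt frequency)."""
    ref_forms = (ref_seq, reverse_complement(ref_seq))
    alt_forms = (alt_seq, reverse_complement(alt_seq))
    ref = alt = 0
    for seq in fastq_sequences(lines):
        if any(k in seq for k in alt_forms):
            alt += 1
        elif any(k in seq for k in ref_forms):
            ref += 1
    total = ref + alt
    return alt, total, (alt / total if total else float("nan"))


# Tiny in-memory example; real use would pass an open FASTQ file handle.
example = [
    "@r1", "TTAAACCCGGGTAA", "+", "IIIIIIIIIIIIII",
    "@r2", "CCAAACCCGGGTCC", "+", "IIIIIIIIIIIIII",
    "@r3", "GGAAACCCGGGTTT", "+", "IIIIIIIIIIIIII",
    "@r4", "GGACCCAGGTTTCC", "+", "IIIIIIIIIIIIII",  # alt, reverse strand
]
print(variant_frequency(example, "AAACCCGGGT", "AAACCTGGGT"))
```

Exact substring matching ignores sequencing errors near the variant, so frequencies from a search like this are a comparison point for the pipeline output and not a replacement for it.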
[Figure, two log-log panels with series AF10 and AF20. Panel 1: VCF Effective Depth vs. FASTQ Effective Depth (10 to 10,000). Panel 2: VCF Effective Depth / Expected Depth vs. FASTQ Effective Depth / Expected Depth (0.1 to 10).]
7. Variable Targets and a New Operator
analysis of many runs shows that 2 targets (1 and 8) appear to vary more than expected
removing those targets reveals two pairs of samples that appear to vary more than expected
one pair of samples was run by a new operator
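The workflow described above (score each target across runs, drop the variable targets, then score each run across the remaining targets) can be sketched as follows. This is an illustration with made-up data, not the author's implementation. It uses the mean of the same runs as the expected frequency, which is a simplification of the "established" averages, and all names are hypothetical.

```python
def depth_ratio(d_exp, x, x_bar):
    """d_eff/d_exp from expected depths, observed and expected frequencies."""
    n = len(x)  # needs at least two observations
    dispersion = sum(
        d * (xi - m) ** 2 / (m * (1.0 - m))
        for d, xi, m in zip(d_exp, x, x_bar)
    ) / (n - 1)
    return float("inf") if dispersion == 0 else 1.0 / dispersion


def per_target_and_per_run(freqs, depths, exclude=()):
    """freqs[r][t] and depths[r][t]: rows are runs, columns are targets."""
    n_runs = len(freqs)
    targets = [t for t in range(len(freqs[0])) if t not in exclude]
    means = {t: sum(row[t] for row in freqs) / n_runs for t in targets}
    # Many runs are used to assess each target.
    per_target = {
        t: depth_ratio([row[t] for row in depths],
                       [row[t] for row in freqs],
                       [means[t]] * n_runs)
        for t in targets
    }
    # Many targets are used to assess each run.
    per_run = [
        depth_ratio([depths[r][t] for t in targets],
                    [freqs[r][t] for t in targets],
                    [means[t] for t in targets])
        for r in range(n_runs)
    ]
    return per_target, per_run


# Made-up data: 4 runs x 3 targets; target 2 varies far more than expected.
freqs = [[0.10, 0.20, 0.30], [0.11, 0.21, 0.40],
         [0.09, 0.19, 0.20], [0.10, 0.20, 0.30]]
depths = [[1000.0] * 3 for _ in range(4)]

per_target, per_run_all = per_target_and_per_run(freqs, depths)
worst = min(per_target, key=per_target.get)
_, per_run_clean = per_target_and_per_run(freqs, depths, exclude={worst})
print("most variable target:", worst)
print("run 1 before/after exclusion:", per_run_all[1], per_run_clean[1])
```

Removing the variable target raises the per-run ratios, so the remaining low ratios point at run-level effects such as an operator change.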
[Figure, four panels: Effective Depth / Expected Depth (log scale, 0.01 to 10); two panels are plotted by Sample (0 to 35)]
8. Next Steps
• Develop performance acceptance rules for highly multiplexed NGS assays
• If you have data where many variants are monitored over time, it would be great
to collaborate
• If you have data (e.g., WGS/WES/related), try grouping heterozygous variants
into different categories and see how those categories perform
• It should be possible to assess the complexity of a sample using heterozygous
SNPs; however, you have to be sure that the chromosomes are diploid and not
triploid (e.g., 33.3% or 66.7% expected VF), etc.
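The ploidy caveat can be illustrated with the heterozygous-SNP form of the equation, 4/n × Σ d_exp,i × (x_i − 0.5)². The sketch below feeds it noise-free triploid frequencies (1/3 and 2/3) at an assumed expected depth of 1,200; the data and function name are made up for illustration.

```python
def dispersion_heterozygous(d_exp, x):
    """d_exp/d_eff assuming every variant has an expected frequency of 50%."""
    n = len(x)
    return 4.0 / n * sum(d * (xi - 0.5) ** 2 for d, xi in zip(d_exp, x))


depths = [1200.0] * 5                      # assumed expected depth
triploid = [1 / 3, 2 / 3, 1 / 3, 2 / 3, 1 / 3]  # noise-free triploid VFs

dispersion = dispersion_heterozygous(depths, triploid)
print(f"d_exp/d_eff = {dispersion:.1f}")
print(f"d_eff/d_exp = {1.0 / dispersion:.4f}")
```

Even without any sampling noise, the mismatch between the assumed 50% and the true 33.3% or 66.7% makes the sample look more than a hundred times less complex than expected, so the ploidy check has to come first.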
ykonigshofer@seracare.com