Mar2013 Reference Material Selection Working Group
1. Genome in a Bottle Working Group
Reference Material (RM) Selection and Design
… to tell the truth and nothing but …
XGEN Congress, March 21, 2013
Andrew Grupe, PhD
2. Scope of Reference Material Discussion
• Human Genome & Tumor Sequencing
• Variant Types
  – SNP
  – InDel / Substitution
  – CNV
  – Structural variant
3. Reference Material Needed For
• Clinical platform validation
  – Sequencing System
  – Bioinformatics/Analysis Pipeline
• Clinical test development and validation
  – Whole genome
  – Targeted
  – Germline vs. tumor
• Research
  – Process development and QC
• Product development
  – Sequencing Systems
  – Software development
5. NY State Guidelines – Oncology NGS
Minimum Data Requirement - Validation
• Accuracy: Sequence a well-characterized reference sample (e.g. HapMap DNA GM12878) to determine error rate across all amplicons.
• Analytical sensitivity: Establish the analytical sensitivity of the assay by interrogating all variants in the 3 amplicons with the consistently poorest coverage, and all variants in 3 amplicons with consistently good coverage. This can initially be established with defined mixtures of cell line DNAs (not plasmids), but needs to be verified with 3-5 patient samples.
• Analytical specificity: Establish the analytical specificity of the assay by interrogating all variants in the 3 amplicons with the consistently poorest coverage, and all variants in 3 amplicons with consistently good coverage. This can initially be established with defined mixtures of cell line DNAs (not plasmids), but needs to be verified with 3-5 patient samples.
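The amplicon selection and the sensitivity/specificity calculation described in the guideline can be sketched roughly as follows. This is a minimal illustration, not the guideline's prescribed method: the function names, the amplicon IDs, the depth values, and the variant positions are all hypothetical, and "consistently" poor or good coverage is approximated here by the mean depth across runs, which is an assumption.

```python
from statistics import mean


def select_amplicons(coverage_by_run, n=3):
    """Return (n poorest, n best) amplicon IDs ranked by mean depth across runs.

    Assumption: "consistently" poor or good coverage is approximated by the
    mean depth over several runs; the guideline does not define a metric.
    """
    ranked = sorted(coverage_by_run, key=lambda amp: mean(coverage_by_run[amp]))
    return ranked[:n], ranked[-n:]


def sensitivity_specificity(expected, called, interrogated):
    """Compute analytical sensitivity and specificity from position sets.

    expected:     positions known to carry a variant in the reference sample
    called:       positions where the assay reported a variant
    interrogated: every position assessed in the selected amplicons
    """
    tp = len(expected & called)                  # variants found
    fn = len(expected - called)                  # variants missed
    fp = len(called - expected)                  # calls with no known variant
    tn = len(interrogated - expected - called)   # correctly reported as reference
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical per-run read depths for seven amplicons (three runs each).
coverage = {
    "amp01": [120, 110, 130],
    "amp02": [900, 950, 870],
    "amp03": [45, 60, 50],
    "amp04": [500, 520, 480],
    "amp05": [30, 25, 40],
    "amp06": [1500, 1400, 1600],
    "amp07": [300, 310, 290],
}
poorest, best = select_amplicons(coverage)

# Hypothetical truth set and assay calls over 400 interrogated positions.
expected_variants = {101, 205, 350, 412}
called_variants = {101, 205, 350, 499}
positions = set(range(100, 500))
sens, spec = sensitivity_specificity(expected_variants, called_variants, positions)

print("poorest:", poorest, "best:", best)
print(f"sensitivity={sens:.3f} specificity={spec:.4f}")
```

With fewer than six amplicons the two selections would overlap, so a real panel would need a check for that. The same calculation would be repeated on the 3-5 patient samples that the guideline requires for verification.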
6. Accreditation - College of American Pathologists (CAP)
NGS Requirements
• Validations must include information on the analytical target (examples, exons, genes, exomes, genomes, and transcriptomes). The ability of the analytical process to sequence the target (e.g. percentage of target adequately sequenced) must be described.
• Validations must determine and document analytical sensitivity, specificity, reproducibility, repeatability and precision for the types of variants assayed (e.g. single nucleotide variants, insertions and deletions, homopolymer or repetitive sequences).
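One way to express "percentage of target adequately sequenced" is the share of target bases at or above a minimum read depth. The sketch below assumes that definition; the function name, the 20x threshold, and the depth values are hypothetical and are not taken from the CAP requirements.

```python
def percent_target_adequately_sequenced(depths, min_depth=20):
    """Percentage of target bases with read depth >= min_depth.

    depths:    per-base read depths over the analytical target
    min_depth: hypothetical adequacy threshold; each laboratory would
               set and justify its own value during validation
    """
    if not depths:
        raise ValueError("no target bases supplied")
    adequate = sum(1 for d in depths if d >= min_depth)
    return 100.0 * adequate / len(depths)


# Hypothetical per-base depths for a ten-base target region.
example_depths = [0, 5, 19, 20, 35, 100, 250, 18, 22, 40]
pct = percent_target_adequately_sequenced(example_depths)
print(f"{pct:.1f}% of target bases at or above 20x")
```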
7. Association for Molecular Pathology
Comments to FDA UHT-Sequencing Meeting, June 2011
• … Performance of and coverage needs for a given platform are likely to differ depending on the nucleic acid and DNA regions analyzed, the variants interrogated, the relative allele proportions of particular variants, … Evaluation should consider the effects of relative GC content, homopolymeric and other regions of repetitive sequence, homologous gene regions and DNA structural variants, … This necessitates flexibility and individualization in the development of validation protocols, guidelines, and controls on a (clinical) application-by-application basis. …
• Assay controls should include a range of variants, … Process controls like NA12876 [sic] … and the synthetic ERCC RNA transcripts from NIST are examples of potential standard reference materials. …
8. Main Meetings – Reference Materials (RMs)
• April 13, 2012 (NIST)
  – Genome in a Bottle consortium initiation
• August 16, 2012 (NIST)
  – Intended uses of RMs
  – RM selection strategies
• November 7, 2012 (ASHG)
  – Status updates
• December 6, 2012
  – Selection of initial RMs
• March 21, 2013 (XGEN Congress)
10. Discussion Topics
For Human Genome Sequencing:
• What sources of RMs to consider
  – Primary sample / cell line
• Consent
  – Available for research and for profit use
• What extent of prior characterization
• Which ethnicities, genders
• Which mutations need to be present
  – Is medical relevance necessary
• Initially to have
  – ONE characterized genome RM - or
  – Multiple genomes, lower level of characterization
• Source of commercial development and distribution
  – Manufactured under quality system for diagnostic applications
11. Reference Material – Intended Uses
• Characterize Platforms & Methods
  – DNA sequencing
  – Existing & upcoming NGS technologies
  – Research applications
  – Clinical diagnostics applications
• Not intended as reference material for
  – Validation of specific mutations in a panel
12. Desired RM Sample Characteristics
• General Considerations
  – Sample characteristics are more important than selection of specific sample IDs
  – More reference samples preferred over fewer samples
    • E.g. prefer 8 fully characterized samples at high depth and corresponding trios at lower depth over 4 fully characterized samples plus trios
13. Desired RM Sample Characteristics (cont.)
• High Priority
  – Multiple ethnicities
    • Diversity in structural variation to stress systems
    • However, no requirement for representatives from every ethnic group
  – Balanced female to male ratio
  – Cell lines, low passage
    • Replenish supply
Targeted Ethnic Distribution
  2  European-ancestry: northern/western & southern/eastern
  2  African-American: AA & African, or two AA from different parts of the US
  2  Latino: different ancestral places, US or South/Central America
  1  East Asian
  1  South Asian
14. Desired RM Sample Characteristics (cont.)
• Nice to have
  – Interracial marriage samples
    • Controlled admixture
    • Haplotypes
• Less critical
  – Phenotypic characterization
    • Reference material not for discovery
  – Access to RNA or tissues
    • No limitless supply of material with identical characteristics
15. Other RM Considerations
• DNA from low passage cell lines
  – Understand propagation of variants through cell line passaging
• Modify DNA purification in future to keep step with new NGS technologies
  – Current purified DNA fragment sizes are 80-100 kb
    • OK for existing technologies
  – New nanopore technologies may need Mbp fragments
    • Agarose embedding is proven extraction technology
• Consider footprint analysis of all batches prior to distribution
  – Identify genetic drift, mix-ups, …, develop benchmarks
• Reference material that mimics tumor sample characteristics
  – FFPE embedded cells?
• Blood or saliva as primary (not cell line) DNA sources
16. RM Sample Source Suggestions
Most support
• NA12878
  – Large HapMap family, well characterized
  – NIST contracted Coriell for DNA batch
• Personal Genome Project Samples
  – Includes trios
  – Use sequence data to derive admixture
  – http://www.personalgenomes.org
  – Consent includes research use, commercial use and re-identification
17. RM Sample Source Suggestions (cont.)
Some support (if consent sufficient)
• HS1011
  – Charcot Marie Tooth cell line
    • Lupski et al, NEJM 2010
• MCF10A
  – Normal breast
    • Used by Horizon Dx to produce isogenic cell lines with cancer relevant mutations
Other
• African American sample with 70% Sanger sequence
  – No cell line available
  – Subject still alive => re-consent & generate cell line?
• huRef sample
18. HapMap NA12878
An Obvious Choice?
• Multitude of public and proprietary datasets
• Cell line and DNA available from Coriell
• Listed in guidelines as potential reference sample for clinical tests
19. HapMap NA12878 Consent
• Consent available for
  – Research use
HOWEVER …
• Consent does not include
  – Some commercial uses
    • Incl. alterations, re-distribution
  – Re-identification through sequence data
    • Option to withdraw data and materials
http://hapmap.ncbi.nlm.nih.gov/downloads/elsi/CEPH_Reconsent_Form.pdf
http://genomeinabottle.org/forum-topic/what-appropriate-informed-consent-reference-materials-genome-bottle-consortium
20. HapMap NA12878 Status as RM
• NIST expects first batch of DNA from Coriell in mid April
• Legal and IRB review at NIST for NA12878 release
• Start to develop bioinformatics methods based on NA12878 data
  – Have bioinformatics tools when other samples are available
8,000 aliquots of 10 µg each on order by NIST from Coriell
21. Personal Genome Project (PGP) Samples
• Consent
  – Research and commercial use
  – Possibility of re-identification, including through sequence
  – Option to withdraw at any point
    • Data removal and destruction of material
  www.personalgenomes.org/consent/PGP_Consent_Approved_02212012.pdf
• Sample availability
  – Ongoing enrollment
  – Limited collection of ethnically diverse trios
  http://blog.personalgenomes.org/2012/11/29/seeking-diversity/
22. RM: Selected 3 PGP Trios
Available at Coriell
• Ashkenazim Jewish trio, East European ancestry
  – Parents, Son
  – huAA53EO / hu8E87A9 / hu6E4515
Not yet available at Coriell
• East Asian trio
  – Parents, Son
  – hu91BD69 / hu38168C / huCA017E
• Caucasian quartet
  – Parents, 2 monozygotic twin daughters
  – huCDC3B8 / huFE01E1 / hu1E8957 / hu961968
23. PGP Info - hu8E87A9 (abbreviated)
https://my.personalgenomes.org/profile/hu8E87A9
24. Coriell Info - hu8E87A9 (abbreviated)
http://ccr.coriell.org/Sections/Search/Search.aspx?PgId=165&q=hu8E87A9
25. Summary
• Defined required RM characteristics
• Initial set of RM samples selected
  – NA12878
    • Many existing public and proprietary datasets
    • Listed in clinical guidelines to establish validation parameters
    • Consent limitations
      – Commercial use, re-identification through sequence
      • Under legal and IRB review by NIST
  – Three PGP trios
    • One trio already available at Coriell
    • Consent without withdrawal option may not meet ethical review standards
26. Contact Information
Genome in a Bottle: http://genomeinabottle.org
Justin Zook: justin.zook@nist.gov
Marc Salit: salit@nist.gov
Andrew Grupe: andrew.grupe@celera.com
28. HapMap Re-Consent
What will happen if I don’t agree to let my sample be used?
You will not lose any benefits if you choose not to let your sample be used. If
you don’t agree to let your sample be used, it will not be used for the HapMap.
However, it will continue to be used for other IRB approved research studies,
just as it has been in the past, unless you specifically tell us that you don’t want
it used for such studies anymore.
Can I change my mind after I agree to let my sample be used?
Deciding whether to let your sample be used for the HapMap is completely up
to you. You will not lose any benefits if you choose not to let your sample be
used. However, once your sample has been studied and your genetic
information has been put in the database, you will not be able to take that
information back.
29. HapMap Re-Consent
The Repository does not let anyone sell material from samples or cell lines.
However, information from genetics research sometimes helps companies
make products to diagnose or treat diseases. If information from your family’s
cell lines leads to making a product, it would probably contribute only in a very
small way. Also, because the cell lines will not have names on them, neither the
researchers nor anyone at the Repository would know if your samples were
even used. So you will not get any additional payment for having your sample
used in this project.
30. HapMap Re-Consent
… The database will not include any medical information about anyone whose sample is
used. It also will not include any information that could identify who the individual people
or families are. …
Because the database will be public, people who do identity testing, such as for paternity
testing or law enforcement, may also use the samples, the database, and the HapMap, to
do general research. However, it will be very hard for anyone to learn anything about you
personally from any of this research because none of the samples, the database, or the
HapMap will include your name or any other information that could identify you or your
family.
What are the risks of having my sample used for this project?
If your family’s samples are used, lots of genetic information from your samples will be put
in the database, and lots of people will be able to look at it for any purpose. However,
there are only a couple of ways anybody could trace the information back to you. One is if
they thought your information might be in the database, got another sample from you, did many
tests on that sample, and then compared the genetic information from those tests with the
information in the database.
The other is if somebody compared the information in the database with genetic
information known to be from you that was in another database and figured out who you
were. The risk of either of these things happening is very small, but it may grow in the
future.
We cannot always predict the results of research, so new risks to you may come up in the
future that we can’t predict now.