Genomic Data Annotation: Making Sense of the Deluge

One million monkeys with typewriters
Annotations of the Genomic Data Deluge

Genome Informatics Alliance
Portland, 28/29 March 2012

Dr. Frank Schacherer, CTO, BIOBASE GmbH
frank.schacherer@biobase-international.com

Disclaimer: no actual monkeys involved
In 2003 the Arts Council for England paid £2,000 for a real-life test of the theorem involving six Sulawesi crested macaques, but the trial was abandoned after a month.

The monkeys produced five pages of text, mainly composed of the letter S, but failed to type anything close to a word of English, broke the computer and used the keyboard as a lavatory.
http://www.telegraph.co.uk/technology/news/8789894/Monkeys-at-typewriters-close-to-reproducing-Shakespeare.html

Agenda

• What annotation do we need?
• How can we get it?

A deluge of data
• deluge (plural deluges)
– A great flood or rain.
The deluge continued for hours, drenching the land and slowing traffic to a halt.
– An overwhelming amount of something.
The rock concert was a deluge of sound.

Media perception
• Science, 2011
• Health Affairs, 2009
• "The Power Of Digitizing Human Beings", 17 Feb 2012
• "Soon, $1,000 Will Map Your Genes", 10 Jan 2012
• "Cost of Gene Sequencing Falls, Raising Hopes for Medical Advances", 7 March 2012
• "'Personalized Medicine' Hits a Bump", March 2012

Life cycle of data annotation

[Cycle diagram: Understand, Derive, Map, Analyze, Annotate, Publish, Rank, Curate]

How to predict mutation effects
• Overlap with other data
– dbSNP, 1000 genomes
– Relatives and Controls
• Algorithmically
– Frameshift, Nonsense, Stop gain/loss, Non-synonymous changes (SIFT, PolyPhen, ...)
• Based on annotation
– known functional regions (active sites, binding sites, ...)
• Directly known effects
– HGMD

Bioinformatics, Vol. 26, No. 16, 2010, p. 2069; doi:10.1093/bioinformatics/btq330
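The algorithmic route can be illustrated with a minimal Python sketch that labels a variant in a spliced coding sequence as frameshift, in-frame indel, stop gained (nonsense), stop lost, missense (non-synonymous) or synonymous, using the standard genetic code. It assumes the CDS starts at its ATG and ends at its stop codon and that positions are 0-based CDS offsets; the function names are hypothetical. It does not score missense changes the way SIFT or PolyPhen do, and it ignores splice sites, UTRs and transcript choice.

```python
"""Minimal sketch: coding consequence of a variant (hypothetical helper names)."""

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate every complete codon; '*' marks a stop codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def classify(cds: str, pos: int, ref: str, alt: str) -> str:
    """Classify a variant at 0-based CDS offset `pos` (assumed CDS: ATG ... stop)."""
    cds, ref, alt = cds.upper(), ref.upper(), alt.upper()
    if cds[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the CDS")
    if len(ref) != len(alt):
        # A length change that is not a multiple of 3 shifts the reading frame.
        if (len(alt) - len(ref)) % 3:
            return "frameshift"
        return "inframe_indel"  # not subclassified further in this sketch
    ref_protein = translate(cds)
    alt_protein = translate(cds[:pos] + alt + cds[pos + len(ref):])
    if alt_protein == ref_protein:
        return "synonymous"
    ref_stop, alt_stop = ref_protein.find("*"), alt_protein.find("*")
    if alt_stop != -1 and (ref_stop == -1 or alt_stop < ref_stop):
        return "stop_gained"  # nonsense
    if ref_stop != -1 and alt_stop != ref_stop:
        return "stop_lost"
    return "missense"  # non-synonymous: candidates for SIFT/PolyPhen-style scoring


if __name__ == "__main__":
    cds = "ATGGCCTGGTAA"               # toy CDS: Met-Ala-Trp-stop
    print(classify(cds, 8, "G", "A"))  # TGG -> TGA, prints "stop_gained"
```

Working on same-length substitutions at the protein level keeps the logic short; in-frame indels are deliberately left as one bucket because deciding whether they create or remove a stop needs more careful alignment than this sketch attempts.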

Associating Genotype with Phenotype

http://www.gen2phen.org/

What data do we need for clinical application?

ACCE takes its name from the four main criteria for evaluating a genetic test: analytic validity, clinical validity, clinical utility and associated ethical, legal and social implications.
Centers for Disease Control and Prevention, Office of Public Health Genomics (OPHG)

Ideal Annotation for clinical use?
• Variants
– Pathogenic, Uncertain, Benign
– Severities, if known
– Ethnicities/Frequencies
– Number of cases
– Symptoms in conjunction with other mutations
• Evidence
– Not weighted equally
– Risks of incorrect classification not equal between genes
Data from: Howard P. Levy, MD, PhD, Johns Hopkins University

N=12
4 Testing (Clinical Validity, Who/When, Methods, Interpretation, Cost)
4 Management, Clinical Significance, Implications
3 Actionability, Clinical Utility
3 Clinical manifestations (Pathophysiology, Phenotype, Prognosis, Severity, Penetrance, Pleiotropy)
2 Frequency (especially indicate most common variants)
2 Inheritance and de novo mutation rate
2 Evidence-based
1 Clinical Decision Support in EHR
Data from: Elaine Lyon, PhD, FACMG, University of Utah & ARUP Laboratories
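The variant-level items in the first list map naturally onto a record type. The Python sketch below is one hypothetical way to hold them: the class and field names, the free-text `severity`, and the numeric `weight` on each evidence item are illustrative assumptions rather than a standard schema, and `support()` is only a naive per-class sum of weights, not a validated classification rule.

```python
"""Hypothetical record for a clinically annotated variant (illustrative only)."""
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Classification(Enum):
    PATHOGENIC = "Pathogenic"
    UNCERTAIN = "Uncertain"
    BENIGN = "Benign"


@dataclass
class Evidence:
    source: str                  # free-text citation or lab report ID
    supports: Classification     # which classification this item argues for
    weight: float = 1.0          # assumption: evidence items are not weighted equally


@dataclass
class VariantAnnotation:
    variant: str                                   # e.g. an HGVS-style description
    classification: Classification
    severity: Optional[str] = None                 # only "if known"
    frequencies: Dict[str, float] = field(default_factory=dict)  # population -> frequency
    n_cases: int = 0                               # number of observed cases
    symptoms: List[str] = field(default_factory=list)
    co_occurring: List[str] = field(default_factory=list)  # other mutations seen with it
    evidence: List[Evidence] = field(default_factory=list)

    def support(self) -> Dict[Classification, float]:
        """Sum evidence weights per classification (a naive tally)."""
        totals = {c: 0.0 for c in Classification}
        for item in self.evidence:
            totals[item.supports] += item.weight
        return totals


if __name__ == "__main__":
    record = VariantAnnotation(
        variant="GENE_X:c.100A>G",  # hypothetical variant
        classification=Classification.UNCERTAIN,
        frequencies={"population_1": 0.002},
        n_cases=3,
        evidence=[Evidence("report 1", Classification.PATHOGENIC, 2.0),
                  Evidence("report 2", Classification.BENIGN, 0.5)],
    )
    print(record.support()[Classification.PATHOGENIC])
```

Keeping evidence as separate weighted items, instead of a single score, leaves room for the point on the slide that misclassification risks differ between genes: a gene-specific policy could interpret the same tallies differently.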

Who provides annotation?

Payor, Test Lab, Curator, Researcher, Patient, MD/Geneticist, Anybody, Computer

Surveys & Patient Self-annotation

Knaus, William A.: Building a Genome Enabled Electronic Medical Record

Nature Biotechnology, Volume 29, Number 5, May 2011:
"Patients with serious diseases may experiment with drugs that have not received regulatory approval. Online patient communities structured around quantitative outcome data have the potential to provide an observational environment to monitor such drug usage and its consequences. Here we describe an analysis of data reported on the website PatientsLikeMe by patients with amyotrophic lateral sclerosis (ALS) who experimented with lithium carbonate […]"

DNA Variant Databases

Data, except for HGMD and DMuDB, courtesy of P. Willems, MutaBase

Testing Lab data

A safe and secure route for sharing variant data
The Diagnostic Mutation Database (DMuDB) is a unique repository of high
quality variant data collected from accredited clinical genetic testing
laboratories in the UK National Health Service (NHS).
It provides a safe and secure way for variant data to be shared within and
between laboratories in order to support safer, more consistent
diagnoses. The database was established in order to address the lack of
data-sharing or publication in the genetic testing community.
DMuDB is used regularly by genetic scientists:
• to check a new variant against existing reported variants from
other laboratories
• to check for co-reported variants
• as a part of regular re-assessment of unclassified variants
• via the Universal Browser as part of complex searches
covering multiple databases
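The first three uses amount to simple queries over shared variant reports. A minimal Python sketch, assuming a hypothetical in-memory list of reports keyed by a normalized variant string; the `Report` fields, lab names and function names are illustrative and are not DMuDB's actual schema or interface:

```python
"""Sketch of variant-sharing queries over hypothetical lab reports."""
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Report:
    lab: str
    case_id: str          # case identifier, unique only within its lab
    variant: str          # normalized variant key, e.g. "GENE_A:c.100A>G"
    classification: str   # "Pathogenic", "Uncertain" or "Benign"


def prior_reports(reports: List[Report], variant: str, own_lab: str) -> List[Report]:
    """Check a new variant against reports of it from other laboratories."""
    return [r for r in reports if r.variant == variant and r.lab != own_lab]


def co_reported(reports: List[Report], variant: str) -> Counter:
    """Count other variants reported in the same cases as `variant`."""
    cases = {(r.lab, r.case_id) for r in reports if r.variant == variant}
    return Counter(r.variant for r in reports
                   if (r.lab, r.case_id) in cases and r.variant != variant)


def reassess_candidates(reports: List[Report], own_lab: str) -> List[str]:
    """Own 'Uncertain' variants that another lab has classified differently."""
    own_uncertain = {r.variant for r in reports
                     if r.lab == own_lab and r.classification == "Uncertain"}
    return sorted(v for v in own_uncertain
                  if any(r.classification != "Uncertain"
                         for r in prior_reports(reports, v, own_lab)))
```

Scoping cases by `(lab, case_id)` avoids treating two labs' identically numbered cases as the same patient; a real shared database would also need agreed variant normalization before keys can be compared at all.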

www.ngrl.org.uk/Manchester

LSDBs (Locus Specific Databases)

http://www.hgvs.org/dblist/glsdb.html

Crowdsourcing genome annotation

Crowdsourcing reality

"The future of biocuration. To thrive, the field that links biologists and their data urgently needs structure, recognition and support."
NATURE | Vol 455 | 2008

"…biological databases can be curated by a diffuse network of volunteers? This is certainly not the case and at the core of every successful wiki database are a group of dedicated experts who do the bulk of the data curation."

Data Annotation Professionals
• Clear incentives
• Background in life sciences (MSc/PhD)
• Curation is sole focus of work
• Knowledge of standards, databases, formats, specialized tools

Huge volumes of primary data are currently
archived in numerous open-access databases, and
with new generation technologies becoming more
common in laboratories, large datasets will become
even more prevalent than today. The lasting
archiving, accurate curation, efficient analysis and
precise interpretation of all of these data are a
challenge. Collectively, database development and
biocuration are at the forefront of the endeavor to
make sense of this mounting deluge of data.

HGMD - comprehensive disease-causing germline

Cleaning up the literature

Charts from: Jonathan S. Berg, U North Carolina, Chapel Hill

Conclusions on annotation

• Clinical-grade annotation may be the most important task ahead
• NGS itself helps generate evidence
• Many different sources and ways of annotation exist
• Human, specialist annotation remains essential (monkeys notwithstanding)

Thank you!
• BIOBASE employees all around the world
• David Cooper, Cardiff University
• Andrew Deveraux, NGRL
• Patrick Willems, MutaBase
• Johan den Dunnen, HVP & Leiden University Medical Center
• Anthony J. Brookes, GEN2PHEN & University of Leicester
• Samir K. Brahmachari, OSDD

Gene Regulation Analysis
Human Mutation & Functional Analysis
Variant Analysis

sales@biobase-international.com
www.biobase-international.com
