1. Biomedical Annotation
Kevin Livingston, Ph.D.
Postdoctoral Fellow
Pharmacology Department, School of Medicine
University of Colorado Anschutz Medical Campus
Kevin.Livingston@ucdenver.edu
http://compbio.ucdenver.edu/Hunter_lab/Livingston
2. Biomedical researchers are interested in
understanding their data in the context of
all known background knowledge:
curated databases & literature.
3. PubMed Growth Rate
[Figure: PubMed new entries per year (thousands, scale 0 to 1,100) and total entries (millions, scale 0 to 25), plotted for 1987 to 2011. Exponential fits shown on the chart: y = ~e^(0.0405x) with R² = 0.99, and y = ~e^(0.0402x) with R² = 0.94.]
973,499 PubMed entries in 2011 (>2,600 per day)
2 journal articles per minute!
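The per-day and per-minute figures on this slide can be checked with a few lines of arithmetic. The sketch below also derives a doubling time from the two exponents shown on the chart; that step assumes the fitted curves are plain exponentials and that x is measured in years, neither of which the slide states explicitly. The function name `doubling_time` is made up for this illustration.

```python
import math

# Figure quoted on the slide: PubMed entries added in 2011.
entries_2011 = 973_499

per_day = entries_2011 / 365           # about 2,667
per_minute = per_day / (24 * 60)       # about 1.85, i.e. roughly 2


def doubling_time(rate):
    """Doubling time of y = e^(rate * x); assumes x is in years."""
    return math.log(2) / rate


print(f"{per_day:.0f} entries per day")
print(f"{per_minute:.2f} entries per minute")
for rate in (0.0405, 0.0402):
    print(f"rate {rate}: doubles every {doubling_time(rate):.1f} years")
```

Under those assumptions, both fitted rates correspond to a doubling time of a little over 17 years.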
4. Biomedical Data Sources
• 1,380 Databases in 2012
• Total GO Annotations: 132,425,702
• Total Manual GO Annotations: 1,116,848
• PubMed Articles Referenced: 94,518
5. Annotation Consumers?
• The linguistic community typically uses
annotation as training data or for specific tasks
– An abundance of tools that can produce annotations
in the specific format of those resources
– Tools for computational linguistics
• Biomedical annotation is typically used for
curating, indexing, or enrichment analysis
• But what about re-using annotations and tools in
other contexts and for other purposes?
11. CRAFT:
Colorado Richly Annotated Full Text corpus
http://bionlp-corpora.sourceforge.net/CRAFT/
• 67 full text articles (+30 more reserved for future testing)
• >560,000 Tokens
• >21,000 Sentences
• ~100,000 concept annotations to
7 different biomedical ontologies/terminologies
• Penn Treebank markup for each sentence
• Multiple output formats available
12. CRAFT Annotation
Hematopoiesis is precisely orchestrated by lineage-specific DNA-binding proteins that regulate transcription in concert with coactivators and corepressors.
[Figure: the sentence above with its words linked to ontology concepts and to relations between them. Labels visible in the diagram include hemopoiesis, DNA, protein, binding, transcription, regulation, coactivator activity, and corepressor activity; relation labels include has agent, results in, regulates, and entity that has function. Legend: GO BP, GO MF, CHEBI, SO, relation.]
14. Compositional Annotation & Knowledge
[Figure: graph of three text annotations over a CRAFT document (PMID:14737183). Node labels: text annotation 1, text annotation 2, text annotation 3, TAXON:7742 (Vertebrata), GO:0043474 (pigmentation), vertebrate pigmentation. Edge labels: hasTarget, hasBody, basedOn, denotes, subClassOf, occurs_in.]
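One way to read the diagram on this slide is as a small set of triples in which a composed annotation points back at the simpler annotations it was built from. The sketch below is only an illustration of that reading: the `ex:` identifiers are hypothetical, the exact wiring of the edges is an assumption drawn from the labels on the slide, and plain Python tuples stand in for a real RDF store. Only TAXON:7742, GO:0043474, and the predicate names come from the slide.

```python
# Hypothetical identifiers (ex:...) are invented for this sketch.
# The edge wiring is an assumed reading of the slide's diagram.
triples = [
    ("ex:annotation1", "hasTarget", "ex:document"),
    ("ex:annotation1", "hasBody", "TAXON:7742"),      # Vertebrata
    ("ex:annotation2", "hasTarget", "ex:document"),
    ("ex:annotation2", "hasBody", "GO:0043474"),      # pigmentation
    # The composed annotation records which annotations it builds on.
    ("ex:annotation3", "hasBody", "ex:vertebrate_pigmentation"),
    ("ex:annotation3", "basedOn", "ex:annotation1"),
    ("ex:annotation3", "basedOn", "ex:annotation2"),
    # Simplified description of the composed concept.
    ("ex:vertebrate_pigmentation", "subClassOf", "GO:0043474"),
    ("ex:vertebrate_pigmentation", "occurs_in", "TAXON:7742"),
]


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def supporting_concepts(annotation):
    """Bodies of the annotations that the given annotation is based on."""
    return [
        body
        for base in objects(annotation, "basedOn")
        for body in objects(base, "hasBody")
    ]


print(supporting_concepts("ex:annotation3"))
```

Following the basedOn links from the composed annotation recovers the two ontology terms that the simpler annotations contributed.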
15. Summary
• Model that covers syntactic and semantic
annotation
– Linguistic annotation
– Semantic annotation
– Entity-based annotation
• Capture complex content that is not necessarily
best represented via a single URI
– Created a GraphAnnotation that denotes an RDF named graph
• Add kiao:basedOn to enable annotation
compositions and provenance tracking
– Annotation-level
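The two ideas in this summary, an annotation that denotes a whole named graph and basedOn links used for provenance, can be sketched together. Everything below is a hypothetical illustration rather than the authors' model: the `ex:` identifiers, the dictionary layout, and the helper names `denoted_statements` and `provenance` are assumptions made for this example, and quads are modelled as plain tuples instead of a real RDF store.

```python
# Quads: (subject, predicate, object, graph name). Hypothetical data.
quads = [
    ("ex:vertebrate_pigmentation", "subClassOf", "GO:0043474", "ex:graph1"),
    ("ex:vertebrate_pigmentation", "occurs_in", "TAXON:7742", "ex:graph1"),
]

# Each annotation denotes either a single URI or, for a GraphAnnotation,
# the name of a graph holding several statements.
annotations = {
    "ex:a1": {"type": "Annotation", "denotes": "TAXON:7742", "basedOn": []},
    "ex:a2": {"type": "Annotation", "denotes": "GO:0043474", "basedOn": []},
    "ex:a3": {"type": "GraphAnnotation", "denotes": "ex:graph1",
              "basedOn": ["ex:a1", "ex:a2"]},
    "ex:a4": {"type": "Annotation", "denotes": "ex:other_concept",
              "basedOn": ["ex:a3"]},
}


def denoted_statements(annotation_id):
    """Statements in the named graph that a GraphAnnotation denotes."""
    annotation = annotations[annotation_id]
    if annotation["type"] != "GraphAnnotation":
        return []
    return [(s, p, o) for s, p, o, g in quads if g == annotation["denotes"]]


def provenance(annotation_id, seen=None):
    """All annotations reachable through basedOn, nearest first."""
    seen = [] if seen is None else seen
    for base in annotations[annotation_id]["basedOn"]:
        if base not in seen:
            seen.append(base)
            provenance(base, seen)
    return seen


print(denoted_statements("ex:a3"))
print(provenance("ex:a4"))
```

Keeping the composed content in its own named graph lets a single annotation stand for several statements, while the basedOn chain still shows which earlier annotations it depends on.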
16. Acknowledgements
University of Colorado:
• Hunter Lab
– Larry Hunter
– Mike Bada
– Bill Baumgartner
– Chris Roeder
– Kevin Cohen
– Carsten Goerg
• National ICT Australia
– Karin Verspoor
• Funding:
– NIH/NLM training grant
– Andrew W. Mellon Foundation
Editor's notes
Entity Centric
Document Centric
Rectangles are concepts we create, rounded rectangles are current ontological concepts. Orange objects are information content entities, blue objects are biomedical concepts.