Ted Pedersen
Department of Computer Science
University of Minnesota, Duluth
tpederse@d.umn.edu
http://www.d.umn.edu/~tpederse
UMLS::Similarity is freely
available open source
software that allows a user to
measure the semantic
similarity or relatedness of
biomedical terms found in the
Unified Medical Language
System (UMLS). It is written
in Perl and can be used via a
command line interface, an
API, or a Web interface.
UMLS::Similarity has been modeled after and
inspired by WordNet::Similarity (and yes, we've
even used some code). But, it has evolved to a
point where it is certainly more than a clone and
has its own very distinctive identity.
The development of UMLS::Similarity was
supported in part by an R01 grant from the
National Institutes of Health (USA), National
Library of Medicine (#1R01LM009623-01A2).
What are we measuring, and why?
Similarity Depends on IS-A hierarchy
Acknowledgments
Using UMLS::Similarity
Abstract
Contact
UMLS::Similarity : Measuring the Relatedness and Similarity of Biomedical Concepts
Bridget T. McInnes & Ying Liu : Minnesota Supercomputing Institute
Ted Pedersen, Genevieve B. Melton & Serguei Pakhomov : University of Minnesota
http://umls-similarity.sourceforge.net
Unified Medical Language System
To be similar is to be alike: how much is X
like Y? Similar concepts share ancestors in
an is-a hierarchy; the deeper the shared
ancestor, the more similar the concepts
• LCS : least common subsumer, the deepest
shared ancestor of two concepts
● Tetanus and strep throat are similar, since
both are kinds of bacterial infections
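The idea of shared ancestors can be sketched on a toy hierarchy. This is a minimal illustration, not the UMLS::Similarity implementation: the hierarchy, the concept names, and the function names are all assumptions made up for the example, and every concept is assumed to have a single parent.

```python
# Toy is-a hierarchy (child -> parent). Illustrative only, not UMLS data.
PARENT = {
    "tetanus": "bacterial infection",
    "strep throat": "bacterial infection",
    "bacterial infection": "infection",
    "influenza": "viral infection",
    "viral infection": "infection",
    "infection": "disease",
    "disease": None,  # root of the toy hierarchy
}


def path_to_root(concept):
    """Return the concept followed by its ancestors, ending at the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def least_common_subsumer(a, b):
    """Deepest ancestor shared by a and b (a concept subsumes itself)."""
    ancestors_of_b = set(path_to_root(b))
    for node in path_to_root(a):  # walks upward, so the first hit is the deepest
        if node in ancestors_of_b:
            return node
    return None


def path_similarity(a, b):
    """1 / (number of nodes on the is-a path from a to b through their LCS)."""
    lcs = least_common_subsumer(a, b)
    nodes = path_to_root(a).index(lcs) + path_to_root(b).index(lcs) + 1
    return 1.0 / nodes


print(least_common_subsumer("tetanus", "strep throat"))  # bacterial infection
print(path_similarity("tetanus", "strep throat"))        # 1/3
print(path_similarity("tetanus", "influenza"))           # 0.2
```

In this toy data, tetanus and strep throat meet at a deeper ancestor than tetanus and influenza do, so they score as more similar.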
The ability to organize concepts by their
similarity or relatedness to each other is a
fundamental operation in the human mind,
and to many problems in Natural Language
Processing and Artificial Intelligence
Relatedness Relies on Definitions
Assign a numeric value that quantifies how
similar or related two concepts or senses
are, rather than two words, since a word
can be ambiguous: cold may be a temperature
or an illness
To be related is much more general, since
there are many ways to be related: is-a,
part-of, treats, symptom-of, ...
● Tetanus and deep cuts are related, but
they aren't really similar (deep cuts can
cause tetanus, though)
● Related words are often defined using the
same or similar words, so look for overlaps
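Looking for overlaps between definitions can be sketched as counting shared content words. This is a simplified word-level illustration under stated assumptions: the definitions below are made up rather than taken from the UMLS, the stopword list is arbitrary, and the function names are hypothetical. Lesk-style measures typically also reward longer shared phrases, which this sketch does not do.

```python
import re

# Small, arbitrary stopword list chosen for this example.
STOPWORDS = {"a", "an", "and", "by", "in", "of", "or", "that", "the", "to"}


def content_words(definition):
    """Lowercase the definition, split it into words, and drop stopwords."""
    return set(re.findall(r"[a-z]+", definition.lower())) - STOPWORDS


def overlap_score(def_a, def_b):
    """Number of distinct content words the two definitions share."""
    return len(content_words(def_a) & content_words(def_b))


# Made-up definitions for illustration; they are not UMLS definitions.
tetanus = "An acute disease caused by bacteria that enter the body through a wound"
deep_cut = "A wound that breaks the skin and can let bacteria enter the body"
headache = "Pain located in the head or upper neck"

print(overlap_score(tetanus, deep_cut))  # 4: wound, bacteria, enter, body
print(overlap_score(tetanus, headache))  # 0
```

With these toy definitions, tetanus and deep cut share several words even though neither is a kind of the other, which is the sense in which they are related but not similar.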
Web Interface
• Allows for all measures to be computed
using a subset of possible sources
• http://atlas.ahc.umn.edu
• http://maraca.d.umn.edu
Command Line
• Supports all measures and all UMLS sources,
plus many additional functions (many from
UMLS::Interface). Examples include:
• GetChildren
• GetParents
• GetRelated
• GetSemanticGroup
• FindCuiDepth
• FindPathtoRoot
• findLeastCommonSubsumer
Semantic Similarity Measures
Path based
Shortest Path (path, cdist)
Depth based
Leacock & Chodorow (lch)
Zhong et al. (zhong)
Nguyen & Al-Mubaid (nam)
Information Content
Resnik (res)
Lin (lin)
Jiang & Conrath (jcn)
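The three information content measures can be sketched from their standard textbook formulas. This is a minimal sketch, not the UMLS::Similarity code: the probabilities are hypothetical, the function names are assumptions, and the least common subsumer (LCS) is passed in by hand rather than looked up in a hierarchy.

```python
import math

# Hypothetical concept probabilities. A concept's probability is assumed to
# include its descendants, so more general concepts are more probable.
PROB = {
    "disease": 1.0,
    "infection": 0.4,
    "bacterial infection": 0.1,
    "tetanus": 0.01,
    "strep throat": 0.04,
}


def ic(concept):
    """Information content: negative log of the concept's probability."""
    return -math.log(PROB[concept])


def resnik(lcs):
    """Resnik: information content of the least common subsumer."""
    return ic(lcs)


def lin(a, b, lcs):
    """Lin: twice the IC of the LCS, scaled by the ICs of both concepts."""
    return 2 * ic(lcs) / (ic(a) + ic(b))


def jiang_conrath(a, b, lcs):
    """Jiang & Conrath: inverse of the distance IC(a) + IC(b) - 2 * IC(lcs)."""
    distance = ic(a) + ic(b) - 2 * ic(lcs)
    return math.inf if distance == 0 else 1 / distance


# The LCS is supplied by hand here; a real system would find it in the hierarchy.
lcs = "bacterial infection"
print(resnik(lcs))
print(lin("tetanus", "strep throat", lcs))
print(jiang_conrath("tetanus", "strep throat", lcs))
```

All three depend on how specific the shared ancestor is: the rarer (more informative) the LCS, the higher the score.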
Relatedness Measures
Path based
Hirst & St-Onge (hso)
Definition based
Lesk (lesk)
Adapted Lesk (lesk)
Definition + Corpus
Gloss Vector (vector)
The UMLS is a data warehouse distributed by
the National Library of Medicine (twice a year).
It includes more than 100 terminologies, code
sets, and ontologies encompassing many
different areas of medical knowledge. A user
can access individual sources (examples
below) or view them as one large combined
resource via the Metathesaurus.
MeSH – medical subject headings, used for
indexing articles in PubMed
FMA – Foundational Model of Anatomy, a
very fine grained ontology of human anatomy
OMIM – Online Mendelian Inheritance in
Man, catalog of genes and gene disorders
SNOMEDCT – Systematized Nomenclature
of Medicine – Clinical Terms
Word Sense Disambiguation
with UMLS::SenseRelate
We can measure senses, or we can use the
measures to identify senses!
http://search.cpan.org/dist/UMLS-SenseRelate
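Using a measure to identify senses can be sketched as picking the candidate sense that is most related to the surrounding concepts. This is an illustration only and is not claimed to be the algorithm in UMLS::SenseRelate: the scores are made up, and the table and function names are hypothetical.

```python
# Hypothetical relatedness scores between candidate senses and context concepts.
RELATEDNESS = {
    ("cold (temperature)", "fever"): 0.2,
    ("cold (temperature)", "cough"): 0.1,
    ("cold (illness)", "fever"): 0.7,
    ("cold (illness)", "cough"): 0.8,
}


def disambiguate(senses, context):
    """Return the sense whose summed relatedness to the context is highest."""
    def total(sense):
        # Pairs missing from the table are treated as unrelated (score 0).
        return sum(RELATEDNESS.get((sense, c), 0.0) for c in context)
    return max(senses, key=total)


senses = ["cold (temperature)", "cold (illness)"]
print(disambiguate(senses, ["fever", "cough"]))  # cold (illness)
```

Any of the similarity or relatedness measures could in principle supply the scores; the table here simply stands in for such a measure.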