Exploiting Semantic Structure for Mapping User-specified Form Terms to SNOMED CT Concepts
1. Exploiting Semantic Structure for Mapping User-specified Form Terms to SNOMED CT Concepts
Ritu Khare (1,2), Yuan An (1), Jiexun Li (1), Il-Yeol Song (1), Xiaohua Hu (1)
(1) The iSchool at Drexel
(2) College of Medicine
Drexel University, Philadelphia, PA, USA
2. Presentation Order
1. Motivation
2. Problems
3. Solutions
4. Evaluation
5. Final Remarks
3. General Motivation
Database Integration and Interoperability
Semantic Heterogeneity across clinical data sources
(Halevy, 2005, Henry et al. 1993, Hernandez et al. 2005, Wright et al., 1999)
[Figure: different terms used for the same concept across clinical data sources]
MRN / Med Rec # / Medical Record Number
Blood Pressure / Diastolic, Systolic / BP
Physical Status / Constitutional / Vital Signs
Recommendation: Controlled Medical Vocabularies should
be involved in the design artifacts of the healthcare systems.
(Jean et al., 2007, Sugumaran and Storey, 2002)
4. Specific Motivation
Clinical Encounter Form Electronic Health Records (EHR)
The terms on the clinical forms are mapped to, or annotated
by, a standard terminology.
Domain experts may perform the annotation manually, but this is costly and tedious.
Research Objective: Design an automatic tool for mapping form terms to standard terminologies.
5. 1. Motivation
2. Problem
3. Solutions
4. Evaluation
5. Final Remarks
6. The Mapping Problem
Clinical Encounter Form -> SNOMED CT
SNOMED CT: The Systematized Nomenclature of Medicine - Clinical Terms (Intl. Health Terminology Stds. Dev. Org)
Most comprehensive clinical vocabulary (SNOMED CT User Guide, 2009).
>360,000 logically-defined clinical concepts (Hina et al., 2010, Stenzhorn et al., 2009).
Form Term | SNOMED CT Concept
Patient | 11615400: Patient (person)
MRN | 398225001: Medical record number (observable entity)
8. Existing Mapping Services
SNOMED CT Browsers (Rogers and Bodenreider, 2008)
General Mapping
Category Specific Mapping
9. Challenges: Mapping Form Terms to SNOMED CT Concepts
Diversity Challenge: Different clinicians - different terms.
MRN, Med. Rec.#
Vital signs, Constitutional, Physical status
Context Challenge: Same Form Term - Different Concepts.
10. 1. Motivation
2. Problem
3. Solution
4. Evaluation
5. Final Remarks
11. Premises
The key is to identify the SNOMED CT semantic category appropriate for a given term.
The first, i.e., the most string-similar, result retrieved by the category-specific mapping is usually the desired concept.
How to automatically determine the SNOMED CT Semantic Category appropriate for a given form term?
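The second premise (take the most string-similar result within one semantic category) can be sketched as follows. This is only an illustration under stated assumptions: the `CONCEPTS` table, the `demo-` placeholder ids, and the use of `difflib` similarity are hypothetical stand-ins, not the SNOMED CT service or the string-matching method used in the work. The first two rows are the examples shown on the slides.

```python
from difflib import SequenceMatcher

# Toy concept table: (concept id, name, semantic category).
# The first two rows come from the slides; the "demo-" rows are
# made-up placeholders, not real SNOMED CT identifiers.
CONCEPTS = [
    ("11615400", "Patient", "person"),
    ("398225001", "Medical record number", "observable entity"),
    ("demo-1", "Medical record", "record artifact"),
    ("demo-2", "Patient status", "observable entity"),
]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def category_specific_mapping(term, category, concepts=CONCEPTS):
    """Return the most string-similar concept within one category, or None."""
    candidates = [c for c in concepts if c[2] == category]
    if not candidates:
        return None
    return max(candidates, key=lambda c: similarity(term, c[1]))


patient_hit = category_specific_mapping("Patient", "person")
mrn_hit = category_specific_mapping("medical record number", "observable entity")
no_hit = category_specific_mapping("Patient", "disorder")
```

Restricting the candidates to one category before ranking is what lets the same form term resolve to different concepts depending on its context.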
12. Premise 1: The term context can be derived from the SEMANTIC STRUCTURE of the form.
The FORM TREE accurately captures the semantic intentions of the designer.
Inspired by hierarchical modeling of forms (Dragut et al. 2009, Wu et al. 2009)
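A form tree and the context it gives each term could be represented roughly as below. The `FormNode` class and its `context` method are hypothetical names chosen for this sketch, not the authors' data model; the node labels are borrowed from the example form shown in this deck.

```python
class FormNode:
    """One term on a form, linked to its parent and children."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []

    def add(self, label):
        """Create a child node under this node and return it."""
        child = FormNode(label, parent=self)
        self.children.append(child)
        return child

    def context(self):
        """Labels of the ancestors, nearest first (the node itself excluded)."""
        labels, node = [], self.parent
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return labels


# A fragment of the example form: root -> Patient -> Gender -> M
root = FormNode("root")
patient = root.add("Patient")
gender = patient.add("Gender")
male = gender.add("M")

male_context = male.context()
```

Here the context of "M" is its chain of ancestors, which is the kind of structural information a classifier can use to tell apart terms that look identical as strings.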
13. Premise 2: The implicit relationship between the term context (i.e., the semantic structure) and the desired semantic category can be formally captured into a STATISTICAL MODEL.
Naïve Bayes Classifier: based on the Bayes theorem (Han and Kamber 2006).
Class Labels (SNOMED CT semantic categories): attribute, body structure, disorder, …
Data Attributes (local structure): Node type, Parent node type, Child node type, Parent Semantic Category, Grandparent Semantic Category
[Figure: example form tree (root; Patient, Examination; Name, Gender, Respiratory; M, F, nl perc.) with nodes labeled by semantic category (Person, Procedure, Observable Entity, Qualifier Value, Finding)]
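A Naïve Bayes classifier over categorical node attributes can be sketched in a few lines. This is a minimal illustration, not the authors' Java implementation: the class name, the Laplace smoothing, the attribute values ("field", "value"), and the four training rows are all assumptions made for the example. It also assumes every training row carries every attribute.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (attr, label) -> value counts
        self.values = defaultdict(set)            # attr -> values seen in training
        for row, label in zip(rows, labels):
            for attr, value in row.items():
                self.value_counts[(attr, label)][value] += 1
                self.values[attr].add(value)
        return self

    def predict_proba(self, row):
        """Return {class label: posterior probability} for one attribute dict."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # class prior
            for attr, value in row.items():
                count = self.value_counts[(attr, label)][value]
                k = len(self.values[attr] | {value})  # distinct values incl. unseen
                score += math.log((count + self.alpha) / (n + self.alpha * k))
            log_scores[label] = score
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


# Hypothetical training data: two attributes per node, category as the label.
train_rows = [
    {"node_type": "field", "parent_category": "person"},
    {"node_type": "field", "parent_category": "person"},
    {"node_type": "value", "parent_category": "observable entity"},
    {"node_type": "value", "parent_category": "observable entity"},
]
train_labels = ["observable entity", "observable entity",
                "qualifier value", "qualifier value"]

model = CategoricalNaiveBayes().fit(train_rows, train_labels)
probs = model.predict_proba(
    {"node_type": "value", "parent_category": "observable entity"})
best = max(probs, key=probs.get)
```

The output is a probability per semantic category rather than a single label, which matches the "category membership probabilities" stage of the approach.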
14. Overall Mapping Approach
Form Term -> Form Structure Analyzer (Form Tree) -> Node Attributes -> Category Classification Model (Training Data) -> Category Membership Probabilities -> Semantic Category Picker -> SNOMED CT Category -> Category Specific Mapping -> SNOMED CT Concept
[Figure: example form tree (root; Patient, Examination; Name, Gender, Respiratory) with nodes labeled by semantic category (Person, Procedure, Observable Entity)]
Novelty: Hybrid Approach (leverages semantic structure as well as term linguistics)
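How the stages might compose is sketched below. The function names, the argmax picking rule, and the two stub callables are assumptions for illustration only; the slides do not specify how the Semantic Category Picker chooses among the probabilities.

```python
def pick_category(probabilities):
    """Pick the category with the highest membership probability (assumed rule)."""
    return max(probabilities, key=probabilities.get)


def map_term(term, attributes, classify, lookup):
    """Classify the node, pick a category, then run a category-specific lookup.

    classify: callable(attributes) -> {category: probability}
    lookup:   callable(term, category) -> concept (or None)
    """
    probabilities = classify(attributes)
    category = pick_category(probabilities)
    return category, lookup(term, category)


# Stand-in stages so the sketch runs on its own; real ones would be the
# trained classifier and the category-specific mapping service.
def stub_classify(attributes):
    return {"person": 0.8, "procedure": 0.2}


def stub_lookup(term, category):
    return f"{category}:{term}"


result = map_term("Patient", {"node_type": "field"}, stub_classify, stub_lookup)
```

Passing the classifier and the lookup in as callables keeps the structural step and the linguistic step separable, which is the point of the hybrid design.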
15. 1. Motivation
2. Problem
3. Solution
4. Evaluation
5. Final Remarks
16. Data
Dataset | Forms | Total Terms
1 | Walk in clinic encounter forms (3 forms) | 161
2 | Nursing patient admission forms (6 forms) | 261
3 | Labor & delivery DB data-entry forms (7 forms) | 294
4 | Adult visit encounter forms (5 forms) | 388
5 | Child visit encounter forms (5 forms) | 397
Total | 26 Forms | 1501
Manual (Gold) Annotations: 954 (63.55%) terms
Term | Concept ID
Patient | 11615400: Patient (person)
MRN | 398225001: Medical record number (observable entity)
… | ……………….
Some Unmapped Terms
no scleral icterus
chronic back pain
Follow up with PCP
Sent to ER
17. Implementation (JAVA) and Settings
Form Design Interface -> Form Tree
Gold Annotations -> Training Data
Form Term -> Form Structure Analyzer -> Node Attributes -> Category Classification Model -> Category Membership Probabilities -> Semantic Category Picker -> SNOMED CT Category -> Category Specific Mapping -> SNOMED CT Concept
API, provided by the Dataline Software Limited
Cross Validation (leave 1 out) for each dataset
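Leave-one-out cross validation within a dataset can be sketched as a simple split generator. The slide does not say what the held-out unit is; treating it as one form is an assumption here, and the form names are placeholders.

```python
def leave_one_out(items):
    """Yield (training items, held-out item) once for every item."""
    for i, held_out in enumerate(items):
        yield items[:i] + items[i + 1:], held_out


# Hypothetical dataset of three forms.
forms = ["form_a", "form_b", "form_c"]
splits = list(leave_one_out(forms))
```

Each form is used for testing exactly once, with the classifier trained on the remaining forms of the same dataset.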
18. Experiment Design
Goal: To study whether semantic structure can improve mapping performance.
Baseline (linguistics only): Form Term -> SNOMED CT General Mapping -> SNOMED CT Concept
Hybrid (linguistics + semantic structure): Form Term -> Form Structure Analyzer -> Node Attributes -> Category Classification Model -> Category Membership Probabilities -> Semantic Category Picker -> SNOMED CT Category -> Category Specific Mapping -> SNOMED CT Concept
Hybrid++: Hybrid pipeline + candidate set expansion
Measures
Precision = # correct annotations / # annotations
Recall = # correct annotations / # gold annotations
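The two measures can be computed as in the sketch below. The dictionary representation (form term to concept id, with unmapped terms simply absent) and the `demo-` ids are assumptions made for the example.

```python
def precision_recall(predicted, gold):
    """Compute (precision, recall) for term -> concept id annotations.

    precision = correct annotations / annotations produced
    recall    = correct annotations / gold annotations
    """
    correct = sum(1 for term, cid in predicted.items() if gold.get(term) == cid)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical run: two annotations produced, one of them correct,
# against three gold annotations.
gold = {"Patient": "11615400", "MRN": "398225001", "Gender": "demo-3"}
predicted = {"Patient": "11615400", "MRN": "demo-1"}
p, r = precision_recall(predicted, gold)
```

A tool that leaves hard terms unannotated can score high precision with low recall, which is why both numbers are reported.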
19. Results
Mapping duration per form: 1-11 s
Baseline: Precision: 0.63, Recall: 0.45
Baseline recall is low: the SNOMED CT API uses exact string matching, so it couldn't handle the variation of terms, i.e., the diversity challenge.
Baseline to Hybrid: Precision up by 18%.
Hybrid to Hybrid++: Precision up by 16%, Recall up by 23%.
Hybrid++: Precision: 0.86, Recall: 0.55
20. More Results
Term processing component
remove special characters: -, #, /, etc.
acronym expansion dictionary: T (Temperature), BTL (Bilateral Tubal Ligation), VTE (Venous Thromboembolism)
Precision only slightly improved: 3-5%
Recall improved substantially: 25%
Final Precision = 0.89, Recall = 0.76
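The term processing step could look roughly like this. It is a sketch under assumptions: only the three special characters named on the slide are removed (the slide's "etc." is left open), the dictionary holds only the three acronyms listed, with BTL expanded as Bilateral Tubal Ligation, and acronyms are matched as whole, case-sensitive tokens.

```python
import re

# Acronym dictionary limited to the three examples from the slide.
ACRONYMS = {
    "T": "Temperature",
    "BTL": "Bilateral Tubal Ligation",
    "VTE": "Venous Thromboembolism",
}


def process_term(term, acronyms=ACRONYMS):
    """Replace -, # and / with spaces, tidy whitespace, expand known acronyms."""
    cleaned = re.sub(r"[-#/]", " ", term)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return " ".join(acronyms.get(token, token) for token in cleaned.split(" "))


expanded = process_term("BTL")
cleaned_mrn = process_term("Med. Rec.#")
combined = process_term("VTE/BTL")
```

Normalizing the term before lookup is what lets an exact-match service reach terms it would otherwise miss, which fits the recall gain reported here.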
21. Implications
Impact of Semantic Structure
Overall mapping performance
More correct predictions (context challenge)
Impact of Linguistics
Mainly on recall
Reaches more relevant terms (diversity challenge)
Overall
Promising performance, even with limited training data
Recall is low because of the simplicity of the linguistic techniques; it can be further improved using more sophisticated techniques.
22. 1. Motivation
2. Problem
3. Solution
4. Evaluation
5. Final Remarks
23. Contributions
PROBLEM: NEW problem of standardizing the terms on clinical
encounter forms using SNOMED CT.
Existing works (Henry et al., 1993, Barrows Jr. et al. 1994,
Patrick et al. 2007)
standardization of clinical notes: diagnosis, medication
information, patient complaints, etc.
SOLUTION: Context-based method that leverages SEMANTIC
STRUCTURE of forms along with term linguistics.
Existing works
linguistic techniques (synonyms, morphemes, lexical
variants)
24. Contributions
EVALUATION: 26 healthcare forms containing 950+ mappable
terms specified by multiple clinicians.
Improvement over existing services
23% precision, 38% recall
Promising Performance
precision: 0.89, recall: 0.76
FINDINGS:
Linguistics helps overcome diversity challenge and improve
recall
Semantic structure helps overcome context challenge and
improves precision and recall.
Design synergistic hybrid approaches to address all mapping challenges and achieve superior performance.
25. Limitations
TECHNIQUE
Post coordinated mapping
Handle Missing and Inapplicable Values in Training data
Classification Attributes
STUDY
Domain Expert Annotator
TECHNICAL EVALUATION
Compare with other models: Bayesian networks, k Neural Networks, Classification Association Rules
Test the validity of assumptions
Class conditional independence
Correctness of most linguistic matching concept
Compare/Combine with other UMLS terminology
26. Future Directions
Fully explore SNOMED CT
Defining relationships
Customize for Form Categories
Encounter, Regular Visit,…
Larger Knowledge Base for Training Datasets
In larger frameworks, does annotation help improve
Data/Database Integration?
Data Quality?
Patient Diagnosis?
User Interventions?
Work In Progress:
Integrate with flexible Electronic Health Record system (IHI 2010)
Integration of new forms in EHR
improve database integration process
(In other words, we could say that existing systems are certainly not designed with future integration in mind.)
Who designed the forms? Why not other domains – which other domains? Possible. Have some idea. Mark the concepts – post coordinated or partial mapping.
Why does recall decrease when the number of correct predictions decreases on applying the hybrid method? Sometimes the linguistic approach returns a more accurate result. More improvement in recall and precision means the forms had terms whose multiple senses exist in SNOMED CT.
Our experience of tagging 52 data-entry forms suggests that the training samples can be constructed quickly and easily, as compared to the construction of an exhaustive set of rules or heuristics. To further test the performance of the mapping framework in a heterogeneous environment,