Using Named Entity Recognition technologies contained in IBM Watson products, we are building the ontology of concepts contained in the Cochrane Database of Systematic Reviews: inclusion criteria, participants, interventions, comparisons, and outcomes (PICO), risk of bias assessments, conclusions, and dates. In the future, medical researchers and practitioners will be able to navigate the resulting network using simple queries and natural language queries and access the resulting information using tablets or similar convenient systems of engagement.
PARADIGMA presented a first prototype of the ontology, which was extracted automatically from the Cochrane corpus, shared its experiences in developing the ontology, and demonstrated its use in the context of search and discovery.
2. Motivation & Approach
Make Cochrane content (the evidence) more accessible
Cross-referencing information
Finding relevant passages in documents of 200+ pages
Supporting discovery & search in Cochrane review documents
Build a foundation for apps to be used in “point of care” situations
Extract an ontology based on information contained in the Cochrane Library
“Discover” Cochrane content relevant to a given patient
Using semantic models (IBM’s System T) to extract entities & relations
diseases, diagnoses, treatments, interventions, medication, drugs, symptoms, complications
“… prolonged treatment with vitamin K antagonists reduces the risk of recurrent venous thromboembolism …”
L. Chiticariu, R. Krishnamurthy, Y. Li, F. Reiss, and S. Vaithyanathan, “Domain adaptation of rule-based annotators for named-entity recognition tasks,” in Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 2010, pp. 1002–1012.
A. Nagesh, G. Ramakrishnan, L. Chiticariu, R. Krishnamurthy, A. Dharkar, and P. Bhattacharyya, “Towards efficient named-entity rule induction for customizability,” in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012, pp. 128–138.
3. Semantic models using System T / AQL
Dictionary Learning
• Extract candidate features
• Apply filters
• Post processing
Named Entity Recognition
• Extract basic features
• Combine features
• Annotate and canonical form
Relation Identification
• Combine modifiers with entities
• Combine Relation Hints with extended spans
• Normalize
8. Building Blocks for Relation Extraction
Named Entities
(“calcium channel blockers”, “Parkinson’s disease”)
Relation Hints
(A was caused by B; A has positive effects on B …)
tag with relation type: CAUSE, PREVENT, INCREASE, …
Modifier Hints
(use of A; risk of A; developing A …)
tag with modifier type: RISK, USE, INFECTION, GROWTH, …
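The three building blocks can be pictured as tagged text spans. The following is only a minimal Python sketch, not the System T / AQL implementation: the `Hint` class, the `tag_hints` function, and the tiny dictionaries are hypothetical, and the mapping of “reduction in” to a REDUCE tag is an assumption inferred from the REDUCE.RISK.GROWTH example in the deck.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hint:
    kind: str   # "ENTITY", "RELATION" or "MODIFIER"
    tag: str    # entity name, relation type or modifier type
    start: int  # character offset where the span begins
    end: int    # character offset where the span ends


# Toy dictionaries (assumptions for illustration only).
ENTITIES = ["calcium channel blockers", "parkinson's disease"]
RELATION_HINTS = {r"was caused by": "CAUSE", r"was associated with": "CAUSE"}
MODIFIER_HINTS = {
    r"use of": "USE",
    r"reduction in": "REDUCE",
    r"risk of": "RISK",
    r"developing": "GROWTH",
}


def tag_hints(text):
    """Return all entity, relation and modifier spans in reading order."""
    lowered = text.lower()
    hints = []
    for name in ENTITIES:
        for m in re.finditer(re.escape(name), lowered):
            hints.append(Hint("ENTITY", name, m.start(), m.end()))
    for pattern, tag in RELATION_HINTS.items():
        for m in re.finditer(pattern, lowered):
            hints.append(Hint("RELATION", tag, m.start(), m.end()))
    for pattern, tag in MODIFIER_HINTS.items():
        for m in re.finditer(pattern, lowered):
            hints.append(Hint("MODIFIER", tag, m.start(), m.end()))
    return sorted(hints, key=lambda h: h.start)
```

Keeping entities, relation hints, and modifier hints as separate tagged spans means a later step can combine them by position without re-reading the text.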
9. Illustrated cases
List and Bracket Processing
“Other drugs include carbamazepine and newer antiepileptics (lamotrigine, topiramate and zonisamide) and the atypical antipsychotics (clozapine, aripiprazole and ziprasidone)”
Special Cases (for “is a”)
“atypical antipsychotics (clozapine, aripiprazole and ziprasidone)”
“infections, such as malaria and hookworm”
“selenium, vitamin C and other antioxidants”
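The three “is a” special cases can be approximated with plain regular expressions. This is a hedged Python sketch rather than the AQL rules used in the project: `extract_is_a` is a hypothetical helper, and it assumes the snippet has already been cut down to the list phrase itself, as in the three examples.

```python
import re

# Separators inside an enumeration: "a, b and c" or "a, b, and c".
_ITEM_SEP = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+")

# One pattern per special case; each names the parent class and the item list.
_PATTERNS = [
    re.compile(r"(?P<parent>.+?),?\s+such as\s+(?P<items>.+)"),
    re.compile(r"(?P<items>.+?),?\s+and other\s+(?P<parent>.+)"),
    re.compile(r"(?P<parent>[^()]+?)\s*\((?P<items>[^()]+)\)"),
]


def extract_is_a(snippet):
    """Return (item, parent) pairs, read as 'item is a parent'."""
    snippet = snippet.strip()
    for pattern in _PATTERNS:
        m = pattern.fullmatch(snippet)
        if m:
            parent = m.group("parent").strip()
            items = [i.strip() for i in _ITEM_SEP.split(m.group("items"))]
            return [(item, parent) for item in items if item]
    return []
```

For example, `extract_is_a("infections, such as malaria and hookworm")` yields the pairs `("malaria", "infections")` and `("hookworm", "infections")`.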
Simple direct relations
@PREVENT :entity /(protect|help)s against/ :entity
@PREVENT :entity <can> <adverb>? prevent :entity
@CAUSE :entity <is> <adverb>? followed by :entity
@CAUSE :entity <can> <adverb>? result in :entity
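The four rules above can be hand-translated into ordinary regular expressions. The sketch below is an illustration in Python, not the rule engine used in the project. It assumes that `<can>` stands for a small set of modal verbs, `<is>` for forms of “to be”, and `<adverb>` for any word ending in “ly”; these expansions, the `find_relations` helper, and the placeholder entities are all hypothetical.

```python
import re

# Assumed expansions of the word classes used in the rules.
CAN = r"(?:can|may|might|could)"
IS = r"(?:is|are|was|were)"
ADVERB = r"(?:\w+ly)"

# (relation type, pattern for the text between the two entities)
RULES = [
    ("PREVENT", r"(?:protect|help)s against"),
    ("PREVENT", rf"{CAN}(?: {ADVERB})? prevent"),
    ("CAUSE", rf"{IS}(?: {ADVERB})? followed by"),
    ("CAUSE", rf"{CAN}(?: {ADVERB})? result in"),
]


def find_relations(sentence, entities):
    """Return (entity, relation type, entity) triples found in the sentence."""
    # Longest names first so that longer entity names win over their prefixes.
    names = sorted(entities, key=len, reverse=True)
    ent = "|".join(re.escape(name) for name in names)
    found = []
    for relation, middle in RULES:
        pattern = re.compile(rf"\b({ent}) {middle} ({ent})\b", re.IGNORECASE)
        for m in pattern.finditer(sentence):
            found.append((m.group(1), relation, m.group(2)))
    return found
```

With the placeholder entities `["drug A", "condition B"]`, the sentence “Drug A can occasionally result in condition B.” produces a single CAUSE triple.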
10. Relation Postprocessing
Combine consecutive modifiers and Named Entities:
“a reduction in the risk of developing A”
REDUCE.RISK.GROWTH
Combine Relation Hints and Extended Entity Spans:
“use of calcium channel blockers was associated with a reduction in the risk of developing Parkinson’s disease”
USE A CAUSE REDUCE.RISK.GROWTH B
Simplify (“translate”) to create the final Semantic Relation
A REDUCE RISK B
calcium channel blockers REDUCE RISK Parkinson’s disease
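The postprocessing steps can be traced on the example above with a small Python sketch. It is an illustration under stated assumptions, not the project's implementation: the token format, the function names, and the one-entry `SIMPLIFY` table (derived only from the single example on this slide) are hypothetical.

```python
def combine_modifiers(tokens):
    """Merge runs of consecutive MODIFIER tokens into one dotted tag."""
    merged = []
    for kind, value in tokens:
        if kind == "MODIFIER" and merged and merged[-1][0] == "MODIFIER":
            merged[-1] = ("MODIFIER", merged[-1][1] + "." + value)
        else:
            merged.append((kind, value))
    return merged


# Assumed translation table: (modifier of A, relation hint, modifier of B)
# mapped to the simplified relation. Only the slide's example is covered.
SIMPLIFY = {("USE", "CAUSE", "REDUCE.RISK.GROWTH"): "REDUCE RISK"}


def to_relation(tokens):
    """Return (A, simplified relation, B), or None if no rule applies."""
    merged = combine_modifiers(tokens)
    kinds = [kind for kind, _ in merged]
    if kinds != ["MODIFIER", "ENTITY", "RELATION", "MODIFIER", "ENTITY"]:
        return None
    (_, mod_a), (_, a), (_, rel), (_, mod_b), (_, b) = merged
    simple = SIMPLIFY.get((mod_a, rel, mod_b))
    return (a, simple, b) if simple else None


tokens = [
    ("MODIFIER", "USE"),
    ("ENTITY", "calcium channel blockers"),
    ("RELATION", "CAUSE"),
    ("MODIFIER", "REDUCE"),
    ("MODIFIER", "RISK"),
    ("MODIFIER", "GROWTH"),
    ("ENTITY", "parkinson's disease"),
]
relation = to_relation(tokens)
```

A lookup table keeps the “translate” step explicit and easy to extend, at the cost of needing one entry per combination of modifiers and relation hint.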
19. Observations and insights gained …
Rule-based system adequate for medical reports
Statistical approaches require larger corpora
Grammatical parsers alone are not sufficiently specific
Domain-specific language aids semantic modelling
Problems encountered, responses (POS)
Adjective contamination
Some antiepileptic drugs are marketed specifically for migraine prophylaxis.
Delimiting entities and relations
Drug therapy for migraine falls into two categories.
Patients were likely to reduce the number of their migraine headaches by 50%.
Efforts commensurate with the text corpus
Continuous improvement process inherent in our approach
Building on top of the existing dictionaries and patterns
20. What’s next?
Improve AQL extraction results
Improve entity normalization and types
(e.g., make better use of entity components: “endothelin receptor antagonist”)
Identify most relevant relations
Extraction of Structured Context
Use ontology for point of care situations
Introduce deep learning technology
User interface (mobile systems of engagement) for “point of care” situations
Combine with patient data to guide the discovery process in Cochrane reviews