2. 2
Motivation
● Complex nature of obesity.
● Wide range of biomedical data sources available.
– implementation of biomedical text/data mining.
● Possible to reveal hidden links between obesity and other
diseases.
● Partially completed knowledge representation models of obesity.
● A systematic approach is required for:
– analysis and interpretation of clinical knowledge.
3. 3
Concept Maps
● Knowledge representation models.
● Consist of:
– nodes (concepts).
– links (relationships between the nodes).
● Aim: gather, understand, explore knowledge.
● Variety of users.
● No explicit detail.
● Implemented primarily in education.
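A concept map as described above (nodes for concepts, labelled links for relationships) can be held in a small graph structure. The following is a minimal sketch; the class name `ConceptMap`, its methods, and the two example links are illustrative assumptions, not part of the presented framework.

```python
from collections import defaultdict


class ConceptMap:
    """A concept map as a labelled, directed graph (illustrative sketch)."""

    def __init__(self):
        self.nodes = set()
        # source concept -> list of (relationship label, target concept)
        self.links = defaultdict(list)

    def add_link(self, source, relation, target):
        """Register both concepts as nodes and connect them with a labelled link."""
        self.nodes.update((source, target))
        self.links[source].append((relation, target))

    def neighbours(self, concept):
        """Return the (relation, target) pairs that leave a concept."""
        return list(self.links.get(concept, []))


# Toy content, chosen only to show the structure.
cmap = ConceptMap()
cmap.add_link("obesity", "increases risk of", "type 2 diabetes")
cmap.add_link("physical inactivity", "contributes to", "obesity")
print(cmap.neighbours("obesity"))
```

Storing the relationship label on each link keeps the map readable by non-technical users while still allowing automated comparison later.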
5. 5
Aim
● To design a framework to build/enhance medical concept maps.
● To improve the understanding of health care concept
complexity.
● To assist medical professionals in the representation, exploration
and validation of their expert knowledge.
● To improve clinical health care.
6. 6
Objectives
● Design and implement methods for health care concept
detection.
● Concept organisation in a concept map form.
● Method generation for concept map updates.
● Build a framework for the design/enhancement/validation of
medical concept maps.
● Methodology evaluation through the health problem of obesity:
– validation of obesity related concepts with current structured obesity
information available.
– identify gaps in clinical knowledge.
7. 7
Research Hypothesis &
Questions
● The analysis required to extract health care concepts.
● The approach to build and enhance a concept map.
● The contribution of the concept map to the representation/validation of knowledge.
● The text mining results help to understand/explore clinical problems.
[Figure: framework diagram linking scientific literature, biomedical text mining, the concept map and the improvement of health care.]
8. 8
Obesity
● Worldwide problem.
● Epidemic proportions:
– WHO estimates (2005): approximately 1.6 billion adults overweight, at least 400 million obese.
● Associations to various diseases.
● Complex risk factors and complications.
● Various aspects.
● Lots of research.
10. 10
Biomedical Text Mining
● Extraction of information from unstructured data of biomedical
nature.
● Discovery of new, previously unknown knowledge.
● Performed on documents with complex/specific terminology and
expressions.
● Challenges:
– language ambiguity.
– variation of language expression.
● Various tools and applications (Termine, Whatizit, GATE).
● Adaptation to user's tasks and requirements.
11. 11
What are we looking for?
● Risk Factors
● Causal Factors
● Confounding Factors
● Outcomes
● Complications
● Interventions
● ...
12. 12
Methodology Overview
1. Document retrieval.
2. Term/concept extraction.
3. Feature engineering and Information extraction:
- application of classification/clustering techniques.
4. Concept map design.
13. 13
Evaluation-Obesity Case Study
● Comparison:
– What?
● biomedical text mining results.
● concept map information.
– How?
● concepts and relationships.
● new ones.
● Examination/manipulation/validation of new knowledge by experts.
● Enhancement of the concept map.
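The comparison step above can be thought of as set operations over concepts and relationships. The sketch below assumes concepts are plain strings and relationships are `(source, relation, target)` tuples; the function name `compare_with_concept_map` and the toy data are hypothetical.

```python
def compare_with_concept_map(mined_concepts, mined_relations,
                             map_concepts, map_relations):
    """Split text mining output into confirmed and new knowledge."""
    return {
        # Concepts found by text mining that the map already contains.
        "confirmed_concepts": mined_concepts & map_concepts,
        # Candidates for expert examination and possible addition to the map.
        "new_concepts": mined_concepts - map_concepts,
        "new_relations": mined_relations - map_relations,
        # Map concepts not supported by the mined literature.
        "unmatched_map_concepts": map_concepts - mined_concepts,
    }


# Toy data, for illustration only.
result = compare_with_concept_map(
    mined_concepts={"obesity", "diet", "psychiatric disorders"},
    mined_relations={("diet", "influences", "obesity")},
    map_concepts={"obesity", "diet", "physical activity"},
    map_relations={("diet", "influences", "obesity")},
)
print(result["new_concepts"])
```

Exact string matching is the simplest possible comparison; in practice term variants would probably need normalisation before the sets are compared.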
14. 14
Progress so far (1)
● Corpus collection.
● Application of Automated Term Recognition (ATR).
● C-value method.
● Single word ATR:
– terminological head identification.
– word of a multi-word term that defines the term class.
– example:
● “Childhood diabetes type II”.
● Terminological head: “diabetes”.
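For reference, the C-value measure scores a candidate term by its frequency, its length in words, and how often it appears nested inside longer candidates. The sketch below follows the commonly published form of the formula and is not necessarily the exact implementation used here; the function names and the toy frequencies are assumptions.

```python
import math


def _contains(longer, shorter):
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(candidates):
    """candidates: dict mapping a term (tuple of words) to its corpus frequency."""
    scores = {}
    for term, freq in candidates.items():
        # Frequencies of longer candidate terms that contain this term.
        nesting = [f for other, f in candidates.items()
                   if len(other) > len(term) and _contains(other, term)]
        # Nested terms are penalised by the mean frequency of their containers.
        adjusted = freq - sum(nesting) / len(nesting) if nesting else freq
        scores[term] = math.log2(len(term)) * adjusted
    return scores


# Made-up counts, for illustration only.
scores = c_value({
    ("body", "mass", "index"): 8,
    ("mass", "index"): 10,
    ("obesity",): 20,
})
print(scores)
```

Because the length factor is log2 of the number of words, every single-word term scores zero under this form of the measure, which is one reason single-word ATR needs separate handling such as terminological head identification.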
15. 15
Progress so far (2)
● Ranking head measures:
– total head frequency,
– single head frequency,
– maximum and average C-value,
– abstract frequency,
– ratio of single head frequency/total head frequency,
– tf*idf (term frequency*inverse document frequency).
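Several of the measures above can be computed from a list of extracted terms per abstract. The slides do not define the measures precisely, so the sketch below assumes the following: total head frequency counts every term occurrence with a given head, single head frequency counts occurrences where the term is the head word alone, and tf*idf multiplies total frequency by the log of the inverse abstract frequency. The function name and toy input are hypothetical.

```python
import math
from collections import Counter


def head_measures(abstracts):
    """abstracts: list of abstracts, each a list of (term, head) pairs."""
    total = Counter()          # occurrences of any term with this head
    single = Counter()         # occurrences where the term is the head itself
    abstract_freq = Counter()  # number of abstracts in which the head occurs
    for terms in abstracts:
        heads_here = set()
        for term, head in terms:
            total[head] += 1
            if term == head:
                single[head] += 1
            heads_here.add(head)
        abstract_freq.update(heads_here)
    n = len(abstracts)
    return {
        head: {
            "total_freq": total[head],
            "single_freq": single[head],
            "abstract_freq": abstract_freq[head],
            "ratio": single[head] / total[head],
            "tf_idf": total[head] * math.log(n / abstract_freq[head]),
        }
        for head in total
    }


# Toy input: three abstracts with already extracted (term, head) pairs.
measures = head_measures([
    [("childhood diabetes type II", "diabetes"), ("diabetes", "diabetes")],
    [("obesity", "obesity")],
    [("diabetes", "diabetes"), ("obesity", "obesity")],
])
print(measures["diabetes"])
```

Heads can then be ranked by any one of these values, which is how a per-measure comparison of retrieved keywords could be produced.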
16. 16
Results
[Figure: bar chart of the number of keywords per statistical measure (tf*idf, total freq, single freq, abstract freq, word freq, max_c, aver_c, ratio).]
17. 17
Progress so far (3)
● Pattern extraction from abstracts for:
– risk, confounding and causal factors,
– interventions,
– complications,
– outcomes.
● Example: “Obesity risk is increased among women with psychiatric disorders” (potential risk factor).
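The example sentence suggests a surface pattern of the form "<condition> risk is increased among <group>". A minimal sketch with Python's `re` module follows, assuming this single hand-written pattern; the regular expression and the function name `extract_risk_factor` are illustrative and are not the rules used in the project.

```python
import re

# One hypothetical pattern: "<condition> risk is increased among/in <group>".
RISK_PATTERN = re.compile(
    r"(?P<condition>[\w\s-]+?)\s+risk is increased (?:among|in)\s+(?P<group>[\w\s-]+)",
    re.IGNORECASE,
)


def extract_risk_factor(sentence):
    """Return the condition and the candidate risk factor, or None if no match."""
    match = RISK_PATTERN.search(sentence)
    if match is None:
        return None
    return {
        "condition": match.group("condition").strip(),
        "potential_risk_factor": match.group("group").strip(),
    }


found = extract_risk_factor(
    "Obesity risk is increased among women with psychiatric disorders"
)
print(found)
```

A single regular expression only covers one phrasing; a realistic rule set would need many such patterns, or a learned extractor, to handle the variation of language expression noted earlier.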
19. 19
Future plan
● Species identification in obesity corpus (LINNAEUS)
● Exploration of single-word term ATR
● Calculation of z-score
● Integration of single and multi-word terms
● Lexical/semantic analysis of the existing concept map
● Paper preparation for the extraction of single terms in text
● Pattern extraction from manual analysis
● Pattern rule design with MinorThird
● Feature engineering
● Clustering
● Classification
● Paper preparation for the classification of disease descriptors
● Paper preparation for the clustering of health care concepts
● Integration of the results
● Preparation of the second year interview/report
● Design of concept map relationships (exploration)
● Application of visual mapping tools
● Update of the new concept map
● Comparison and validation of knowledge
● Exploration of concept complexity in obesity
● Paper preparation for the automatic design of clinical concept maps
● Production of a generic framework of the methodology
● Writing the thesis
Timeline (Year 2 to Year 3): October 2010, April 2011, November 2011, May 2012
Year 2 (1/2): Concept extraction
20. 20
Future plan
Year 2 (2/2): Concept structuring
21. 21
Future plan
Year 3: Design of the medical concept map
22. 22
Summary
● Framework creation for clinical concept map building and
enhancement.
● Improved understanding of health care concept complexity.
● So far:
– comprehensive literature review.
– methodology design.
– single-word ATR.
– pattern design.