Crowdsourced annotation data offers cognitive computing systems insight into lay semantics. This is especially important in health care, where medical terminology is often not aligned with patients' lay language. However, the general crowd often has limited medical knowledge. Therefore, this research investigated the opportunities that social health websites offer for obtaining ground truth annotation data for cognitive computing systems, including clinical decision support systems. By identifying these websites and analyzing their data, this research offers a starting point for the future use of user-generated health content in cognitive systems. However, the opportunities of social health data are currently limited by various legal regulations. Therefore, this paper also discusses the legal aspects of using social health data in cognitive computing systems.
Utilizing Social Health Websites for Cognitive Computing and Clinical Decision Support Systems
1. Utilizing Social Health Websites for Cognitive Computing
Exploring the Potential of User-Generated Health Content for Clinical Decision Support Systems
Harriëtte Smook
h.smook@vu.nl
28 October 2014
2. Cognitive Computing Systems
‘Prostheses’ for human cognition
• Interact naturally: machines and users should be brought closer together by enabling machines to understand human natural language.
• Expand human cognition: ease processes, especially those involving large data sets or data that require human interpretation.
• Learn by being used: humans can often easily detect machine errors. System usage can be arranged in such a way that humans understand the system and the problems it solves.
Introduce a new generation of Clinical Decision Support Systems
Apple Siri
Google Glass
IBM Watson
Why Cognitive Systems? IBM Research. Retrieved from http://www.research.ibm.com/cognitive-computing/why-cognitive-systems.shtml, accessed 16 July 2014.
Lora Aroyo. CrowdTruth: The 7 Myths of Human Annotation. Cognitive Computing Forum 2014. Retrieved from http://www.slideshare.net/laroyo/truth-is-a-lie-7-myths-about-human-annotation-cogcomputing-forum-2014, accessed 28 October 2014.
3. Clinical Decision Support Systems
IBM Watson: transformational technologies combined
1. Understands human natural language & human communication
2. Generates & evaluates evidence-based hypotheses
3. Adapts & learns from user selections & responses
Lora Aroyo. CrowdTruth: The 7 Myths of Human Annotation. Cognitive Computing Forum 2014. Retrieved from http://www.slideshare.net/laroyo/truth-is-a-lie-7-myths-about-human-annotation-cogcomputing-forum-2014, accessed 28 October 2014
4. How can Health 2.0 help cognitive computing systems?
Health tracking tools + social health websites (e.g. HealthUnlocked, PatientsLikeMe, …):
• Collaboration of patients, medical experts and researchers
• Collective aggregation of information, experiences and data
Tools for collecting, tracking and sharing health information:
• Monitoring new treatments
• Collecting real-world experiences
• Patients have more explicit control over their own data
5. How can Health 2.0 help cognitive computing systems?
Doctor: “My patient has acute coryza!”
Patient: “Well, I only have a cold.”
Experts (doctors) provide formal knowledge
+ The crowd (patients, health-aware citizens) provides human perspectives: crowdsourcing human semantics
= A new generation of Clinical Decision Support Systems
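The doctor/patient exchange on this slide comes down to linking a lay phrase ("a cold") to a professional term ("acute coryza"). A minimal sketch of such a lookup follows, assuming a small hand-written dictionary; `LAY_TO_MEDICAL` and `annotate` are hypothetical names, and in the approach described here the pairs would be crowdsourced rather than typed in by hand.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lay-to-professional vocabulary. In the crowdsourcing setting
# described on this slide, such pairs would come from annotators, not a fixed table.
LAY_TO_MEDICAL: Dict[str, str] = {
    "a cold": "acute coryza",
    "common cold": "acute coryza",
}


def annotate(utterance: str, vocab: Dict[str, str] = LAY_TO_MEDICAL) -> List[Tuple[str, str]]:
    """Return (lay phrase, medical term) pairs whose lay phrase occurs in the utterance."""
    text = utterance.lower()
    found = []
    for lay, medical in vocab.items():
        # Word boundaries keep a phrase from matching inside a longer word.
        if re.search(r"\b" + re.escape(lay) + r"\b", text):
            found.append((lay, medical))
    return found


print(annotate("Well, I only have a cold."))  # [('a cold', 'acute coryza')]
```

A plain dictionary keeps the idea visible; a real system would need fuzzy matching and context, which this sketch deliberately leaves out.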
6. How to utilize user-generated health content as training data for cognitive computing systems?
1. Gather the data: publicly available PatientsLikeMe pages
2. Data analysis: representativeness, validity, consistency
3. Create ground truth data: compare with existing Watson data
7. Data Analysis
Important aspects for obtaining widespread health data, applied to PatientsLikeMe (PLM):
• Coverage of different medical conditions: > 500 conditions
• Availability of different kinds of data: diverse health tracking tools
• Consistency in the vocabulary used: 43% of the symptoms covered by UMLS
• Cultural and geographical dispersion of users: > 260,000 users; website in English
Catherine Arnott Smith and Paul J. Wicks. PatientsLikeMe: Consumer health vocabulary as a folksonomy. In AMIA Annual Symposium Proceedings, vol. 2008, p. 682. American Medical Informatics Association, 2008.
8. PLM Data Analysis
Demographic analysis:
• Data analysis in terms of demographics & population
• Countries of residence, gender & age
Analysis of top-reported conditions:
• Prevalence on PLM vs. prevalence in the U.S.
• Demographics per top-reported condition vs. official health statistics:
  • Gender, peak age & onset age
Analysis of top-reported treatments:
• Top-reported treatments vs. official drug prescription statistics
• PLM treatments per top-reported condition vs. officially listed treatments in the U.S.
Lexical analysis:
• PLM conditions and treatments compared with official medical terminology (UMLS)
9. PLM Data Characteristics
• 373,600 Patients: Age, Gender, Gender per age category
• 233,153 Unique members; 99,274 U.S. members
• 697 Conditions: Current age, Onset age
• 432 Conditions: Reported treatments, Perceived effectiveness of treatments
• 1,617 Treatments: Current patients, Stopped patients, Adherence, Burden, Costs, Current duration, Past duration, Severity of side effects
• 1,257 Treatments: Reported purpose, Perceived effectiveness per purpose
• 1,172 Treatments: Top reported dosages
• 1,032 Treatments: Top reasons why people stopped
• 663 Treatments: Top reported side effects
• 663 Conditions: Current patients, Gender, Primary condition, Condition status, Top reported symptoms
10. Demographic Analysis
Countries of residence, gender and age
37% of PatientsLikeMe’s members live in the United States
Country of residence   Share of members
United States          37.2%
United Kingdom         4.2%
Canada                 2.7%
Australia              1.1%
India                  0.8%
South Africa           0.3%
Ireland                0.3%
New Zealand            0.2%
Other                  51.7%
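The country breakdown on this slide is a share-of-members computation with a pooled "Other" category. A minimal sketch, assuming one country-of-residence string per member; `country_shares` and the member list are hypothetical, and the sample records are not PatientsLikeMe data.

```python
from collections import Counter
from typing import Dict, Iterable


def country_shares(countries: Iterable[str], top_n: int = 8) -> Dict[str, float]:
    """Percentage of members per country of residence.

    The top_n most frequent countries are reported individually; the rest are
    pooled under "Other", mirroring the "Other" category on this slide.
    """
    counts = Counter(countries)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    shares = {country: 100 * n / total for country, n in top}
    rest = total - sum(n for _, n in top)
    if rest:
        shares["Other"] = 100 * rest / total
    return shares


# Hypothetical member records, for illustration only.
members = ["United States", "United States", "Canada", "India", "United Kingdom"]
print(country_shares(members, top_n=2))
# {'United States': 40.0, 'Canada': 20.0, 'Other': 40.0}
```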
13. Top-reported conditions
Most are more prevalent on PatientsLikeMe than in the United States
Rank  Condition                        Prevalence on PLM  Prevalence in U.S.
1     Fibromyalgia                     21.4%              2%
2     Multiple Sclerosis               19.3%              0.1%
3     Major Depressive Disorder        8.7%               6.7%
4     Generalized Anxiety Disorder     7%                 3.1%
5     Chronic Fatigue Syndrome         6.6%               0.3%
6     Parkinson’s Disease              6.6%               0.3%
7     Epilepsy                         4.5%               0.2%
8     Rheumatoid Arthritis             2.4%               0.6%
9     Amyotrophic Lateral Sclerosis    3.3%               0.01%
10    Post-Traumatic Stress Disorder   3.4%               3.6%
The most prevalent conditions in the U.S. are mainly related to heart disease and overweight
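One simple way to express how much more prevalent each condition is on PLM is the ratio of the two prevalence figures. This ratio is an illustrative choice, not necessarily the measure used in this analysis; the numbers are the ones on this slide, and `PREVALENCE` and `overrepresentation` are hypothetical names.

```python
from typing import Dict, List, Tuple

# (prevalence on PLM %, prevalence in U.S. %), taken from the table on this slide.
PREVALENCE: Dict[str, Tuple[float, float]] = {
    "Fibromyalgia": (21.4, 2.0),
    "Multiple Sclerosis": (19.3, 0.1),
    "Major Depressive Disorder": (8.7, 6.7),
    "Generalized Anxiety Disorder": (7.0, 3.1),
    "Chronic Fatigue Syndrome": (6.6, 0.3),
    "Parkinson's Disease": (6.6, 0.3),
    "Epilepsy": (4.5, 0.2),
    "Rheumatoid Arthritis": (2.4, 0.6),
    "Amyotrophic Lateral Sclerosis": (3.3, 0.01),
    "Post-Traumatic Stress Disorder": (3.4, 3.6),
}


def overrepresentation(prevalence: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Ratio of PLM prevalence to U.S. prevalence per condition, highest first."""
    ratios = {c: plm / us for c, (plm, us) in prevalence.items()}
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)


for condition, ratio in overrepresentation(PREVALENCE):
    label = "over" if ratio > 1 else "under"
    print(f"{condition:32s} {ratio:7.1f}x  {label}-represented on PLM")
```

With these figures, ALS tops the list (roughly 330 times its U.S. prevalence), while PTSD is the only condition with a ratio below 1.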
14. Demographics per condition
Gender: women are overrepresented in all top conditions on PatientsLikeMe
Peak age:
• PLM patients suffering from mental health conditions are remarkably older than the peak age of these conditions
• PLM patients suffering from conditions common among the elderly are remarkably younger
Onset age: PLM patients suffering from mental health conditions often report experiencing them already in childhood
15. Top-reported treatments
Are less frequently prescribed drugs in the U.S.
Top-reported PLM treatments versus official U.S. rankings
PLM rank  Treatment       U.S. rank
1         Gabapentin      20
2         Duloxetine      n.a.
3         Pregabalin      n.a.
4         Baclofen        n.a.
5         Clonazepam      n.a.
6         Copaxone        n.a.
7         Levothyroxine   2
8         Tramadol        21
9         Lamotrigine     n.a.
10        Bupropion       n.a.
Official U.S. rankings versus top-reported PLM treatments
U.S. rank  Treatment                 PLM rank
1          Hydrocodone/Paracetamol   13
2          Levothyroxine Sodium      7
3          Lisinopril                37
4          Simvastatin               42
5          Metoprolol                53
6          Amlodipine                57
7          Omeprazole                9
8          Metformin                 22
9          Salbutamol                28
10         Atorvastatin              n.a.
Frequently prescribed drugs in the U.S. are less popular on PLM
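Cross-ranking the two lists requires matching drug names that are written differently, such as "Levothyroxine" on PLM versus "Levothyroxine Sodium" in the U.S. ranking. The sketch below assumes a crude, hypothetical normalization rule (lower-case, drop a trailing salt word); `normalize` and `cross_rank` are illustrative names, and a real comparison would rely on a drug terminology such as RxTerms instead.

```python
from typing import Dict, List, Optional

SALT_WORDS = {"sodium", "potassium", "hydrochloride"}  # hypothetical, incomplete list


def normalize(drug: str) -> str:
    """Lower-case a drug name and drop a trailing salt word (crude heuristic)."""
    words = drug.lower().split()
    if len(words) > 1 and words[-1] in SALT_WORDS:
        words = words[:-1]
    return " ".join(words)


def cross_rank(ranking: List[str], other: List[str]) -> Dict[str, Optional[int]]:
    """For each drug in `ranking`, its 1-based rank in `other`, or None if absent (n.a.)."""
    positions = {normalize(d): i + 1 for i, d in enumerate(other)}
    return {d: positions.get(normalize(d)) for d in ranking}


PLM_TOP10 = ["Gabapentin", "Duloxetine", "Pregabalin", "Baclofen", "Clonazepam",
             "Copaxone", "Levothyroxine", "Tramadol", "Lamotrigine", "Bupropion"]
US_TOP10 = ["Hydrocodone/Paracetamol", "Levothyroxine Sodium", "Lisinopril", "Simvastatin",
            "Metoprolol", "Amlodipine", "Omeprazole", "Metformin", "Salbutamol", "Atorvastatin"]

print(cross_rank(PLM_TOP10, US_TOP10))
```

Because only the two top-10 lists are used here, drugs such as Gabapentin (U.S. rank 20 in the table) come out as None; the ranks on the slide come from the full official ranking.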
16. Lexical analysis
Nearly all of the treatments and conditions are covered by UMLS
Lexical tools:
• BeCAS [1]
• UMLS Metathesaurus Browser [2]
• NCBO BioPortal Annotator [3]
• RxTerms [4]
All treatments and conditions from the data set were compared with UMLS:
• Only 2 out of 1025 unique treatments & 9 out of 663 unique conditions are not covered. Reasons:
  • Term is too general (e.g. accidental fall)
  • Term has been proposed but is not yet included in UMLS, or is under discussion
  • Term has been removed from UMLS
  • Term is not evidence-based and is used by alternative healers
1. http://bioinformatics.ua.pt/becas/#!/about
2. http://uts.nlm.nih.gov/home.html
3. http://bioportal.bioontology.org/annotator
4. http://wwwcf.nlm.nih.gov/umlslicense/rxtermApp/rxTerm.cfm
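The comparison with UMLS can be sketched as a set lookup. This is a simplification: the tools listed on this slide perform concept recognition against UMLS, whereas `coverage` (a hypothetical helper) only does exact, case-insensitive matching against a local vocabulary set. The last two lines reproduce the coverage implied by the counts on this slide.

```python
from typing import Iterable, List, Set, Tuple


def coverage(terms: Iterable[str], vocabulary: Set[str]) -> Tuple[float, List[str]]:
    """Fraction of unique terms found in `vocabulary` (exact, case-insensitive),
    plus the sorted list of terms that were not found."""
    vocab = {v.casefold() for v in vocabulary}
    unique = {t.strip().casefold() for t in terms}
    missing = sorted(t for t in unique if t not in vocab)
    return 1 - len(missing) / len(unique), missing


# Coverage implied by the counts on this slide.
print(f"treatments: {1 - 2 / 1025:.1%}")  # treatments: 99.8%
print(f"conditions: {1 - 9 / 663:.1%}")   # conditions: 98.6%
```

For example, `coverage(["Gabapentin", "gabapentin ", "accidental fall"], {"Gabapentin"})` with this toy vocabulary yields a coverage of 0.5 and reports "accidental fall" as missing.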
17. Issues in utilizing user-generated health content as training data for cognitive computing systems
• Bias & limitations: each data source comes with bias and limitations that need to be considered
• Accessibility: data is not easily accessible
• Privacy issues: how can they be avoided?
18. Opportunities in utilizing user-generated health content as training data for cognitive computing systems
• Access to high coverage of (rare) medical conditions
• Access to patients and health-aware citizens as an intermediate between the general crowd and experts
• Knowledge from the patients’ perspective
19. In the future
• Perform analysis on data from alternative geographical contexts
• Perform analysis on data with different characteristics
• Generate better ground truth data