1. Modeling Social Behavior for Health care Utilization in
Depression
Monitoring Clinical Depressive Symptoms in Social Media
Project funded by: NIH R01 Grant #: MH105384-01A1
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Amir Hossein Yazdavar
amir@knoesis.org
Advisors:
Prof. Amit Sheth, Krishnaprasad Thirunarayan (Kno.e.sis Center),
Jyotishman Pathak (Division of Health Informatics Cornell University)
Project Link:
rebrand.ly/depressionProject
@knoesis_mdd
@halolimat
2. The importance of Studying Clinical Depression
November 2011: “Teen Tweets Before Committing Suicide: The Importance of ‘Cyber-helping’”
http://www.adweek.com/digital/teen-tweets-before-comitting-suicide-the-importance-of-cyber-helping/
September 2015: “Jim Carrey’s Girlfriend: Her Last Tweet Before Committing Suicide ‘Signing Off’”
http://hollywoodlife.com/2015/09/29/jim-carrey-girlfriend-suicide-note-twitter-break-up/
Clinical depression is one of the most common mental illnesses.
350 million: people affected
40 million: adults in the USA age 18 and older who suffered from depression
Only 1/3: of those suffering receive treatment
$42 billion: spent on depression treatment in a year
Over 90% of people who commit suicide have been diagnosed with clinical depression.
3. The importance of Studying Clinical Depression
In 2015, there were more than 44K reported suicide deaths.
“Suicide claims more lives than war, murder, and
natural disasters combined.”
American Foundation for Suicide Prevention:
Every 12 minutes a person dies by suicide in the US.
Every day, 121 Americans take their own life.
500K people visited a hospital for injuries due to self-harm.
Suicide was the second leading cause of death for people between the ages of 10 and 34 years.
http://myefiko.com/wp-content/uploads/2016/05/girls-suicide.jpg
http://www.udayavani.com/sites/default/files/images/english_articles/2017/04/15/Suicide.jpg
4. Previous efforts to address Clinical Depression
The global effort to manage depression relies on detecting it through survey-based methods, via phone or online questionnaires. These methods suffer from:
Under-representation
Sampling bias
Incomplete information
Large temporal gaps between data collection and dissemination of findings
Cognitive bias
http://www.thechicagobridge.org/wp-content/uploads/2015/12/online-survey.jpg
5. Unique opportunity to study Clinical Depression
Social Media Platforms:
A valuable resource for learning about users’ feelings, emotions,
behaviors, and decisions that reflect their mental health as they
are experiencing the ups and downs.
How well can textual content in social media be harnessed to
reliably capture clinical depression symptoms of a user over time?
Are there any underlying common themes among depressed
users?
https://makeawebsitehub.com/wp-content/uploads/2016/04/social_media.jpg
http://www.wamitab.org.uk/useruploads/images/istock_question_man_small.jpg
http://news.mit.edu/sites/mit.edu.newsoffice/files/styles/news_article_image_top_slideshow/public/images/2015/big-data-medicine-model_0.jpg?itok=9gUD6D48
6. Previous Efforts to Study Depression
Supervised approach
Features: language style, emotion, ego-network, user engagement
Drawback: labor-intensive annotations
Lexicon-based approach
Drawbacks: low recall, high dependency on the quality of the lexicon
7. Previous Efforts to Study Depression
Simply using the keyword “depression” to find depression-indicative tweets runs into three problems:
Ambiguity: “economic depression”, “great depression”, “depression era”, “tropical depression”
Transient sadness: “I am depressed, my paper got rejected again”
Context sensitivity: “...sleep forever...”
2.bp.blogspot.com/_VMAt17gvKp8/TEUAD3ebxQI/AAAAAAAAA1A/riVGzdSTavM/s1600/confuse.gif
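The weakness of plain keyword matching can be illustrated with a minimal sketch. The function name and the example tweets below are hypothetical and chosen only for illustration; this is not code from the project.

```python
import re


def naive_keyword_match(tweet: str, keyword: str = "depression") -> bool:
    """Flag a tweet if it contains the keyword as a whole word (case-insensitive)."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return re.search(pattern, tweet.lower()) is not None


# Hypothetical examples: a false positive and a miss.
weather_tweet = "Tropical depression forming off the coast this weekend"
fatigue_tweet = "So tired, so drained, so done"

flagged_weather = naive_keyword_match(weather_tweet)   # matched, but not about mental health
flagged_fatigue = naive_keyword_match(fatigue_tweet)   # not matched, although it describes a symptom
```

The weather tweet is flagged even though it is unrelated to mental health, while the fatigue tweet is missed because it never uses the keyword.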
8. Can we develop more specific evaluations
rooted in current clinical protocols? The role of PHQ-9
Clinical Definition of Depressive Behavior:
According to DSM*, clinical depression can be diagnosed through
the presence of a set of symptoms over a period of time.
* Diagnostic and Statistical Manual of Mental Disorders (DSM)
Patient Health Questionnaire 9 items (PHQ-9)
A nine-item depression scale that incorporates DSM-V.
Standard test used by clinicians to screen, diagnose, and measure the severity of depression.
PHQ-9 rates the frequency of each symptom, which yields a severity index.
PHQ-9 is completed by the patient and scored by the clinician.
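As a rough illustration of how frequency ratings lead to a severity index, the sketch below follows the standard PHQ-9 scoring scheme: each of the nine items is rated 0 to 3 (from "not at all" to "nearly every day"), the ratings are summed to a total between 0 and 27, and the total is mapped to a severity band. The function name is an assumption made for this example.

```python
from typing import List, Tuple


def phq9_severity(item_scores: List[int]) -> Tuple[int, str]:
    """Sum nine item ratings (each 0-3) and map the total to a severity band."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("PHQ-9 needs exactly nine ratings, each between 0 and 3")
    total = sum(item_scores)
    if total <= 4:
        label = "minimal"
    elif total <= 9:
        label = "mild"
    elif total <= 14:
        label = "moderate"
    elif total <= 19:
        label = "moderately severe"
    else:
        label = "severe"
    return total, label


example = phq9_severity([1, 1, 2, 0, 1, 0, 1, 0, 0])
```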
http://cdn.xl.thumbs.canstockphoto.com/canstock10070068.jpg
9. Top-down definition of depressive disorder
pinkgirlq8.com/wp-content/uploads/twitter.jpg
10. PHQ-9 symptoms in tweets
Symptoms: Sleep disorder (S3), Lassitude (S4), Obsessed with weight (S5)
Feel so fat
So tired..always so tired.
havent slept in 8 days..
Must not.eat must.be.thin
0 energy to do anything
we will never sleep, sleep is for the weak we will never rest, until we're ***** dead.
94lbs, urgh I disgust myself.
wish I could just sleep for 10 days straight
just burst out in tears because I'm that tired 😭😭
why can't I sleep
so depressed with my weight
can I just want be skinny
cba with work, I just want to snuggle up all day in bed
I just wana sleep
11. PHQ-9 symptoms in tweets
Symptoms: Feeling bad about yourself (S6), Self-harm (S9), Suicidal thoughts (S9)
I've never been so sure about suicide
Kill me now
why have I become such a chubby ***** for
Its amazing how much blood can bleed just from a small cut into a vain
I'm so done. **** recovery. **** life.
Just want to die. Couldn't care less what happens to me anymore.
I feel like a failure
all my blades are so blunt
Cut again tonight, couldnt do it as bad as I have therepy tomorrow
It is always my fault
I swear every night I tell myself "one more cut and that's it" It's never just one more.
Thinking about hanging myself ... I just don't want to wake up tomorrow morning.
What a sick life I lead..
12. How to Study Clinical Depression in Twitter?
We built a model that emulates the PHQ-9 questionnaire to detect clinical depression symptoms in Twitter profiles.
By analyzing a user’s topic preferences and word usage, we can monitor depression symptoms.
What are they talking about?
How do they express themselves?
13. How to Study Clinical Depression in Twitter?
Bottom-up processing: LDA
A document is a mixture of latent topics, where a topic is a distribution of co-occurring words.
Limitation: the learned topics were not granular and specific enough to correspond to depressive symptoms.
Top-down processing: Lexicon-based
Simply counts the number of depression-indicative terms, using a dictionary of terms for each depressive symptom.
Limitation: ambiguity and context sensitivity problems.
bocawatch.org/wp-content/uploads/2016/10/careerconfusion1-e1417093044460.png
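The top-down, lexicon-based idea can be sketched as a simple per-symptom term count. The mini-lexicon, symptom keys, and function name below are hypothetical stand-ins; the project's actual lexicon is not reproduced here.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Set

# Hypothetical mini-lexicon: a few terms for two PHQ-9 symptoms.
LEXICON: Dict[str, Set[str]] = {
    "S3_sleep": {"sleep", "insomnia", "awake"},
    "S4_energy": {"tired", "exhausted", "drained"},
}


def count_symptom_terms(tweets: Iterable[str],
                        lexicon: Dict[str, Set[str]] = LEXICON) -> Counter:
    """Count, per symptom, how many tokens in the tweets appear in that symptom's term list."""
    counts: Counter = Counter()
    for tweet in tweets:
        tokens = re.findall(r"[a-z']+", tweet.lower())
        for symptom, terms in lexicon.items():
            counts[symptom] += sum(1 for tok in tokens if tok in terms)
    return counts


counts = count_symptom_terms(["So tired, so drained, so done", "why can't I sleep"])
```

Because the count looks only at surface terms, it inherits the ambiguity and context sensitivity problems noted above.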
14. How to Study Clinical Depression in Twitter?
Hybrid Processing
We add supervision to LDA by using terms that are strongly related to the 9 depression symptoms as seeds of the topical clusters, guiding the model to aggregate semantically related terms into the same cluster.
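One common way to add this kind of supervision to LDA is to give each seed term a larger topic-word prior weight in the topic of its symptom. The sketch below only builds such a prior; it is an illustrative assumption, not necessarily how the model described here is implemented, and the names `base` and `boost` are made up for the example.

```python
from typing import Dict, List, Set


def seeded_topic_word_prior(vocab: List[str],
                            seeds: Dict[str, Set[str]],
                            base: float = 0.01,
                            boost: float = 1.0) -> Dict[str, List[float]]:
    """Return one prior vector per symptom topic, aligned with `vocab`.

    Seed terms of a symptom get `base + boost` in that symptom's vector;
    every other word gets `base`.
    """
    prior: Dict[str, List[float]] = {}
    for symptom, terms in seeds.items():
        prior[symptom] = [base + boost if word in terms else base for word in vocab]
    return prior


vocab = ["sleep", "tired", "cake"]
prior = seeded_topic_word_prior(vocab, {"S3": {"sleep"}, "S4": {"tired"}})
```

A topic model that accepts a per-topic word prior could then start each symptom topic biased toward its seeds while still learning the remaining words from the data.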
16. Generating Seed Terms: Challenges
Social media users use creative, descriptive, and metaphorical phrases and explanations for symptoms.
Lack of Energy: “I’m so exhausted all time”, “So tired, so drained, so done”
The model generates a personalized set of seed terms per user (a subset of the available terms in the lexicon).
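A personalized seed set can be sketched as the intersection of the lexicon with the words a user actually uses. The tokenizer and the one-symptom lexicon below are simplifying assumptions for illustration.

```python
import re
from typing import Dict, Iterable, Set


def personalized_seeds(user_tweets: Iterable[str],
                       lexicon: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep, for each symptom, only the lexicon terms that occur in this user's tweets."""
    user_tokens: Set[str] = set()
    for tweet in user_tweets:
        user_tokens.update(re.findall(r"[a-z']+", tweet.lower()))
    return {symptom: terms & user_tokens for symptom, terms in lexicon.items()}


# Hypothetical lexicon entry for the "lack of energy" symptom.
lexicon = {"S4": {"tired", "exhausted", "drained"}}
seeds = personalized_seeds(["So tired, so drained, so done"], lexicon)
```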
17. Generating Seed Terms: Challenges
The language of social media contains polysemous words.
“Cut my finger opening a can of fruit” vs. “scars don’t heal when you keep cutting”
Word Sense Disambiguation (WSD)
We disambiguate a polysemous word based on the sentiment polarity of its enclosing sentence.
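The polarity-based disambiguation can be sketched with a toy word-list scorer. The two word lists are hypothetical and far too small for real use; a real system would presumably rely on a proper sentiment analyzer, which is not specified here.

```python
import re

# Toy polarity word lists (hypothetical, for illustration only).
NEGATIVE = {"scars", "hate", "pain", "worthless", "bleed"}
POSITIVE = {"love", "happy", "great", "fun"}


def sentence_polarity(sentence: str) -> int:
    """Positive-word count minus negative-word count for the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def is_symptom_sense(sentence: str) -> bool:
    """Treat a polysemous word as symptom-related only if its sentence is negative overall."""
    return sentence_polarity(sentence) < 0


literal = is_symptom_sense("Cut my finger opening a can of fruit")
symptom = is_symptom_sense("scars don't heal when you keep cutting")
```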
18. Framework
We divide each user’s collection of preprocessed tweets into a set of tweet buckets using a specific time interval of d days.
- Φ is the distribution of words per symptom.
- θ is the distribution of symptoms over buckets.
We restrict each user-specific seed term to its single corresponding topic si.
Each term wi is then assigned to the symptom that has the largest probability for it in Φ.
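Two steps of the framework lend themselves to a short sketch: bucketing tweets into d-day intervals and assigning each term to its most probable symptom in Φ. The data layouts (a list of `(timestamp, text)` pairs, and `phi[symptom][word]` as p(word | symptom)) are assumptions made for this example.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def bucket_tweets(tweets: List[Tuple[datetime, str]], d: int) -> List[List[str]]:
    """Group tweets into consecutive d-day buckets, counted from the earliest tweet.

    Intervals that contain no tweets are skipped.
    """
    if not tweets:
        return []
    ordered = sorted(tweets, key=lambda t: t[0])
    start = ordered[0][0]
    buckets: Dict[int, List[str]] = {}
    for timestamp, text in ordered:
        index = (timestamp - start).days // d
        buckets.setdefault(index, []).append(text)
    return [buckets[i] for i in sorted(buckets)]


def assign_terms(phi: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each word to the symptom under which it has the largest probability."""
    best: Dict[str, Tuple[str, float]] = {}
    for symptom, word_probs in phi.items():
        for word, prob in word_probs.items():
            if word not in best or prob > best[word][1]:
                best[word] = (symptom, prob)
    return {word: symptom for word, (symptom, _) in best.items()}


tweets = [
    (datetime(2017, 1, 1), "a"),
    (datetime(2017, 1, 5), "b"),
    (datetime(2017, 1, 20), "c"),
]
buckets = bucket_tweets(tweets, d=14)
assignment = assign_terms({
    "S3": {"sleep": 0.40, "tired": 0.10},
    "S4": {"tired": 0.30, "sleep": 0.05},
})
```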
19. Quantitative Analysis
● Topic coherence measures score a single topic by measuring the degree of semantic similarity between the high-scoring words in that topic.
UMass: scores word pairs using document counts, where D(wi, wj) is the number of documents containing both words wi and wj, and D(wi) is the number of documents containing wi.
UCI: scores word pairs using word probabilities, which are calculated by counting word co-occurrences in a sliding window over an external dataset such as Wikipedia.
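For a concrete feel of the UMass measure, the sketch below uses a commonly cited form of the pairwise score, log((D(wi, wj) + 1) / D(wi)), summed over all pairs of a topic's top words with wi ranked above wj. The smoothing constant of 1 and the pair ordering are assumptions taken from the usual formulation, and the toy documents are invented.

```python
import math
from itertools import combinations
from typing import List, Set


def umass_coherence(top_words: List[str], documents: List[Set[str]]) -> float:
    """Sum log((D(wi, wj) + 1) / D(wi)) over ordered pairs of a topic's top words.

    Assumes every top word occurs in at least one document, so D(wi) > 0.
    """
    def doc_count(*words: str) -> int:
        return sum(1 for doc in documents if all(w in doc for w in words))

    score = 0.0
    for w_i, w_j in combinations(top_words, 2):  # w_i is ranked above w_j
        score += math.log((doc_count(w_i, w_j) + 1) / doc_count(w_i))
    return score


docs = [{"sleep", "tired"}, {"sleep"}, {"sleep", "bed"}, {"tired"}]
coherence = umass_coherence(["sleep", "tired", "bed"], docs)
```

Scores closer to zero indicate that a topic's top words tend to appear in the same documents.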
20. Dataset
● We created a dataset containing 45,000 Twitter users who self-declared their depression and another 2,000 “undeclared” users collected at random.
● What is self-declared depression?
○ Sample profiles: 1, 2, 3, 4, 5
21. Dataset
● Data preparation
○ After removing profiles with fewer than 100 tweets, we obtained 7,046 users with 21 million timestamped tweets (at most 3,200 per user).
○ Next, we randomly sampled 2,000 self-reported profiles and 2,000 random users.
● Text preprocessing
Our model discovers depressive symptoms as latent topics from a sliding window over buckets of timestamped tweets posted by users.
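The profile filtering and sampling steps can be sketched as follows. The function names, the in-memory layout (user id mapped to a list of tweets), and the fixed random seed are assumptions for illustration; treating the 3,200 limit as a simple truncation is also an assumption.

```python
import random
from typing import Dict, Iterable, List


def prepare_profiles(profiles: Dict[str, List[str]],
                     min_tweets: int = 100,
                     max_tweets: int = 3200) -> Dict[str, List[str]]:
    """Drop users with fewer than `min_tweets` tweets and cap the rest at `max_tweets`."""
    return {
        user: tweets[:max_tweets]
        for user, tweets in profiles.items()
        if len(tweets) >= min_tweets
    }


def sample_users(user_ids: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Draw k distinct users at random; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(sorted(user_ids), k)


profiles = {"a": ["t"] * 150, "b": ["t"] * 20, "c": ["t"] * 4000}
kept = prepare_profiles(profiles)
chosen = sample_users(kept.keys(), k=1)
```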
22. Qualitative Analysis
The following table shows a sample of the topics learned by the ssToT and LDA models, with word probabilities p(w|s).
23. Qualitative Analysis
● Interesting observations:
○ Captures acronyms related to each symptom
Symptom 5: “Mfp” ↣ “More Food Please”; “Ugw” ↣ “Ultimate Goal Weight”
Symptom 2: “idec” ↣ “I Don’t Even Care”
○ Has excessive usage of expressive interjections
“aw” ↣ indicative of disappointment
“feh” ↣ indicative of feeling underwhelmed
“ew” ↣ denoting disgust
“argh” ↣ showing frustration
○ Has family and friends-related topics
family, hugs, attention, parents, competition, daddy, mums, sigh, grandma, losing
24. Visualizing Depressive Symptoms
● We keep only the topics whose dominant words include at least a threshold number of seed terms.
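The thresholding step can be sketched as a filter over each topic's ranked word list. The parameter names `top_n` and `threshold`, their default values, and the toy topics are hypothetical; the actual values used are not given here.

```python
from typing import Dict, List, Set


def keep_seeded_topics(topics: Dict[str, List[str]],
                       seeds: Set[str],
                       top_n: int = 10,
                       threshold: int = 2) -> Dict[str, List[str]]:
    """Keep topics whose top_n ranked words contain at least `threshold` seed terms."""
    kept: Dict[str, List[str]] = {}
    for name, ranked_words in topics.items():
        hits = sum(1 for word in ranked_words[:top_n] if word in seeds)
        if hits >= threshold:
            kept[name] = ranked_words
    return kept


topics = {
    "t1": ["sleep", "insomnia", "cake"],
    "t2": ["cake", "tea", "sleep"],
}
kept_topics = keep_seeded_topics(topics, seeds={"sleep", "insomnia"})
```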
25. Quantitative Analysis
● Average coherence of different models vs. our model.
● The higher the coherence score, the more interpretable the topics.
26. Symptom Prediction (Multi-label Classification)
● We try to predict the correct set of labels (depressive symptoms) for each
bucket of tweets.
● We build a ground truth dataset of 10,400 tweets in 192 buckets. Each bucket contains tweets posted by the user within a span of 14 days (in compliance with PHQ-9).
● Tweets are selected from a randomly sampled subset of both self-reported depressed users and random users.
● Three human judges manually annotated each tweet using the nine PHQ-9 categories as labels.
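Multi-label predictions of this kind are often scored with example-based measures, where each bucket's predicted label set is compared to its true label set. The sketch below uses the common set-overlap definitions of accuracy and precision; whether these exact definitions were used in this work is an assumption, and the label sets are invented.

```python
from typing import List, Set, Tuple


def example_based_scores(true_sets: List[Set[str]],
                         pred_sets: List[Set[str]]) -> Tuple[float, float]:
    """Return (accuracy, precision) averaged over buckets.

    accuracy  = |true & pred| / |true | pred|   (1.0 if both sets are empty)
    precision = |true & pred| / |pred|          (for an empty prediction:
                                                 1.0 if true is also empty, else 0.0)
    """
    accuracies, precisions = [], []
    for true, pred in zip(true_sets, pred_sets):
        overlap = len(true & pred)
        union = true | pred
        accuracies.append(overlap / len(union) if union else 1.0)
        if pred:
            precisions.append(overlap / len(pred))
        else:
            precisions.append(1.0 if not true else 0.0)
    n = len(accuracies)
    return sum(accuracies) / n, sum(precisions) / n


accuracy, precision = example_based_scores(
    [{"S3", "S4"}, {"S9"}],
    [{"S3"}, {"S9", "S6"}],
)
```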
27. Symptom Prediction (Multi-label Classification)
We compare our model’s results to common supervised approaches for multi-label classification, namely the binary relevance (BR) and classifier chains (CC) methods.
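The difference between the two baselines can be sketched with a toy base learner. Binary relevance trains one independent classifier per label, while a classifier chain also feeds the earlier labels to each later classifier. The 1-nearest-neighbour learner and the tiny binary feature vectors below are stand-ins chosen to keep the example self-contained; they are not the classifiers or features used in the comparison.

```python
from typing import Callable, List

Vector = List[int]


def one_nn(train_x: List[Vector], train_y: List[int]) -> Callable[[Vector], int]:
    """Toy base learner: 1-nearest-neighbour under Hamming distance."""
    def predict(x: Vector) -> int:
        distances = [sum(a != b for a, b in zip(x, t)) for t in train_x]
        return train_y[distances.index(min(distances))]
    return predict


def fit_binary_relevance(X: List[Vector], Y: List[Vector]) -> Callable[[Vector], Vector]:
    """One independent classifier per label, each seeing only the features."""
    models = [one_nn(X, [y[k] for y in Y]) for k in range(len(Y[0]))]
    return lambda x: [model(x) for model in models]


def fit_classifier_chain(X: List[Vector], Y: List[Vector]) -> Callable[[Vector], Vector]:
    """Classifier k sees the features plus labels 0..k-1 (true labels while training)."""
    models = []
    for k in range(len(Y[0])):
        augmented = [x + y[:k] for x, y in zip(X, Y)]
        models.append(one_nn(augmented, [y[k] for y in Y]))

    def predict(x: Vector) -> Vector:
        labels: Vector = []
        for model in models:
            labels.append(model(x + labels))  # earlier predictions become extra features
        return labels
    return predict


# Hypothetical data: 3 binary features, 2 labels per example.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
Y = [[1, 0], [0, 1], [0, 0], [1, 1]]
br_predict = fit_binary_relevance(X, Y)
cc_predict = fit_classifier_chain(X, Y)
```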
28. Limitations
Our model works well with less descriptive symptoms.
Expressions of other symptoms, such as concentration problems, are descriptive and metaphoric and do not contain any depression-indicative term:
Over thinking always destroy my mood
this essay is dragging so much, can’t deal with essays and revisions any more :(
my head is such a mess right now
I need a break from my thoughts
photos.gograph.com/thumbs/CSP/CSP993/k14300461.jpg
29. Conclusion and Future work
● We showed the potential of social media data for extracting depression symptoms.
● We developed a semi-supervised statistical model using a hybrid approach that
combines lexicon and topic modeling.
● Our approach complements the current questionnaire-driven diagnostic tools by
gleaning depression symptoms in a continuous and unobtrusive manner.
● Our model yields promising results that are competitive with a fully supervised approach: an accuracy of 68% and a precision of 72% for capturing depression symptoms.
● In the future, we plan to apply our approach to capture depressive signals from various data sources, including longitudinal electronic health record (EHR) systems and private insurance reimbursement, to develop a robust “big data” platform for detecting clinical depressive behavior at the community level.
33. Research Questions
1) How well does the content of posted images (colors, aesthetics, and facial presentation) reflect depressive behavior?
2) Does the choice of profile picture reveal any psychological traits of a depressed online persona?
3) Are they reliable enough to represent demographics such as age and gender?
4) Are there any underlying common themes in the visual and textual content generated by depressed individuals?