1. Mapping the Meaning of Life
Using Open-Ended Surveys and Computational Methods to Extract the
Structure of Subjective Well-Being
Patrick van Kessel, Pew Research Center
4. “Meaning” means a lot of different things
• There are some quality indicators for capturing general life
satisfaction and meaning
• Satisfaction with Life Scale (SWLS, Diener)
• Meaning in Life Questionnaire (MLQ, Steger)
• But what about sources of meaning?
• We seek and find meaning in many ways:
• Family/Friends
• Job/Career
• Faith/Religion
• What questions do we even ask?
5. Measuring Sources of Meaning
Large-N Closed-Format Surveys
Good
• Lots of breadth
• Representative
Bad
• Assume categories beforehand
• Limited depth of insight
Small-N Qualitative Interviews and Panels
Good
• Require fewer assumptions
• Lots of depth
Bad
• Limited breadth of insight
• Not representative
Large-N Open-Ended Surveys
Good
• Require fewer assumptions
• Lots of depth
• Lots of breadth
• Representative
10. Large online survey + open-ends
• Designed a website to be engaging
• Made open-ends the focus
• Recruitment strategies
• Fliers and handouts
• Google/Facebook ads
• Social media / word of mouth
• Mechanical Turk
• Convenience sample, but diverse
• 63 of 75 quotas (85%) with n>=50
11. A Big Book of Meaning
• 5 open-ended questions + demographic and personality covariates
• 1904 respondents
• 1090 complete cases
• Average of 178 words per respondent
• 447,338 words total
• Respondent characteristics
• 33 states with at least 10 responses
• 63 of 75 quota categories (85%) filled with at least 50 responses, 74 with at
least 20 (97%)
12. Cleaning
• Expanded common contractions and abbreviations
• Removed punctuation and extra spaces, lowercased words, etc.
• Removed “stopwords” (e.g. and, but, it, that)
• Automatic and manual spelling correction for every unknown word
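The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the study: the lookup tables `CONTRACTIONS`, `STOPWORDS`, and `SPELLING_FIXES` and the function `clean_response` are hypothetical names with deliberately tiny contents, and the manual spelling review is reduced to a fixed dictionary.

```python
import re

# Hypothetical, deliberately tiny lookup tables; a real project would use fuller lists.
CONTRACTIONS = {"don't": "do not", "i'm": "i am", "it's": "it is", "can't": "cannot"}
STOPWORDS = {"and", "but", "it", "that", "the", "a", "is", "i", "am", "do", "to", "my"}
SPELLING_FIXES = {"freinds": "friends", "familly": "family"}


def clean_response(text):
    """Return a list of cleaned tokens for one open-ended response."""
    # Lowercase and normalize curly apostrophes so the contraction table matches.
    text = text.lower().replace("\u2019", "'")
    # Expand contractions before punctuation is stripped, while apostrophes still exist.
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()  # split() also collapses runs of whitespace
    # Apply known spelling corrections, then drop stopwords.
    tokens = [SPELLING_FIXES.get(tok, tok) for tok in tokens]
    return [tok for tok in tokens if tok not in STOPWORDS]


print(clean_response("I don't know, but my freinds and familly matter!"))
```

Expanding contractions first matters here: once punctuation is removed, "don't" would become "don t" and no longer match the table.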
14. Normalized pointwise mutual information
• What words best distinguish different sets of respondents?
• Break responses into words
• Partition respondents into two categories
• Find words that distinguish the categories from each other
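As a rough sketch of the steps above, the standard definition of normalized pointwise mutual information can be applied at the respondent level: NPMI(w, c) = log(p(w, c) / (p(w) p(c))) / -log p(w, c), where p(w) is the share of respondents who mention word w and p(c) is the share in category c. The slides do not give the exact formulation used, so the function name `npmi_by_word`, the respondent-level counting, and the handling of edge cases below are all assumptions.

```python
import math
from collections import Counter


def npmi_by_word(responses, labels, target):
    """Score each word by its NPMI with membership in the `target` category.

    responses: list of token lists, one per respondent.
    labels: parallel list of category labels.
    Returns {word: NPMI}, ranging from -1 (never co-occur) to 1 (always co-occur).
    """
    n = len(responses)
    p_c = sum(1 for lab in labels if lab == target) / n
    word_counts = Counter()   # respondents mentioning the word
    joint_counts = Counter()  # respondents mentioning it AND in the target category
    for tokens, lab in zip(responses, labels):
        for word in set(tokens):  # count each respondent once per word
            word_counts[word] += 1
            if lab == target:
                joint_counts[word] += 1
    scores = {}
    for word, count in word_counts.items():
        p_w = count / n
        p_wc = joint_counts[word] / n
        if p_wc == 0:
            scores[word] = -1.0  # limit value when word and category never co-occur
        elif p_wc == 1:
            scores[word] = 0.0   # degenerate case: avoids dividing by log(1) = 0
        else:
            scores[word] = math.log(p_wc / (p_w * p_c)) / -math.log(p_wc)
    return scores


# Made-up toy data, for illustration only.
toy_responses = [["boyfriend", "love"], ["girlfriend", "love"],
                 ["new", "friends"], ["new", "travel"]]
toy_labels = ["relationship", "relationship", "single", "single"]
toy_scores = npmi_by_word(toy_responses, toy_labels, "relationship")
```

Sorting `toy_scores` in descending order gives the words most distinctive of the target category; running the function again with the other label as `target` gives the other side of the comparison.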
38. Topic modeling
• Structural Topic Models
• Latent Dirichlet Allocation (LDA)
• Iteratively learns how words are related to each other; divides responses up
into topical clusters
• Can automatically identify common themes
• Cut up responses into N topics
• Run regressions on overall meaning and life satisfaction
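To make the idea of "iteratively learning" topics concrete, here is a toy collapsed Gibbs sampler for plain LDA in pure Python. It is only a sketch of the general technique: the study names Structural Topic Models, which this does not implement, and the function name `fit_lda` and the hyperparameter defaults (`alpha`, `beta`, `n_iter`) are assumptions chosen for illustration.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs: list of token lists. Returns (doc_topic, topic_word, vocab), where
    doc_topic[d] and topic_word[k] are probability distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic_counts = [[0] * n_topics for _ in docs]
    topic_word_counts = [[0] * n_words for _ in range(n_topics)]
    topic_totals = [0] * n_topics
    assignments = []
    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        doc_assign = []
        for w in doc:
            k = rng.randrange(n_topics)
            doc_assign.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][word_id[w]] += 1
            topic_totals[k] += 1
        assignments.append(doc_assign)
    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][v] -= 1
                topic_totals[k] -= 1
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][v] + beta)
                    / (topic_totals[t] + beta * n_words)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][v] += 1
                topic_totals[k] += 1
    # Convert smoothed counts into probability distributions.
    doc_topic = [
        [(c + alpha) / (len(doc) + alpha * n_topics) for c in counts]
        for counts, doc in zip(doc_topic_counts, docs)
    ]
    topic_word = [
        [(c + beta) / (topic_totals[t] + beta * n_words) for c in topic_word_counts[t]]
        for t in range(n_topics)
    ]
    return doc_topic, topic_word, vocab


# Made-up toy corpus, for illustration only.
toy_docs = [["god", "faith", "purpose"], ["job", "family", "great"],
            ["faith", "god", "loved"], ["great", "job", "house"]]
doc_topic, topic_word, vocab = fit_lda(toy_docs, n_topics=2)
```

Each row of `doc_topic` gives one respondent's topic proportions, which is the kind of quantity that could then be entered as predictors in a regression on overall meaning or life satisfaction.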
39. Most Meaningful Topics
Topic 14 (Having Faith in God) Top Words:
Highest Prob: god, faith, loving, wonderful, purpose, loved, existence
FREX: god, faith, existence, jesus, loved, loving, wonderful
Lift: existence, faith_god, jesus, jesus_christ, faith, god, loved
Score: existence, god, faith, jesus, loving, faith_god, loved
Topic 15 (Having a Great Job and Family) Top Words:
Highest Prob: job, great, get, relationship, wonderful, close, live
FREX: great, job, great_family, bring, great_friend, house, great_job
Lift: great_job, great_family, close_friend, great_friend, family_great, great, pay_bill
Score: great_job, great, job, great_family, family_great, great_friend, job_love
Topic 19 (Raising a Family) Top Words:
Highest Prob: kid, get, day, always, child, girl/boy, married
FREX: kid, girl/boy, married, morning, bed, today, reason
Lift: think_would, kid, bed, girl/boy, morning, married, make_sure
Score: think_would, kid, girl/boy, married, reason, morning, child
41. Conclusion
When done right, open-ends can give you rich responses and unlock
new insights into how respondents think about an issue
Today we’re going to talk about open-ends and the meaning of life
So, five years ago I was at the University of Chicago and I decided that, for my Master’s thesis, I would study the meaning of life
More specifically, I wanted to study how different people go about seeking and finding meaning and happiness
What are the different strategies, and what works and what doesn’t? Are certain approaches better for certain types of people?
Well, there is no shortage of proposed answers to that question - but I wanted to study the issue scientifically
So, naturally, I turned to the field of survey research
Of course, in order to do a survey, you typically first need to know what questions to ask
I found a few well-validated survey inventories that measured overall sense of meaning, but I also wanted to tap into the different sources of meaning
The problem is, meaning can come from a lot of different places, and it’s highly subjective
In reviewing the literature, I found that most research on the subject has taken one of two approaches
The first approach is to run a large-N closed-format survey
The advantages are that you can cover a lot of ground and, if you do it right, you can get a representative sample and say something about the broader population
However, if you want to cover everything this way, you first have to come up with a list of categories or constructs
One of the most comprehensive surveys I found was based on a huge review of philosophical works. It came up with a list of a dozen different categories and asked people to rate the importance of each in giving their life a sense of meaning. Some of the categories were “family”, “legacy”, and “financial security” - but one of them was literally called “hedonistic activities” – and I was skeptical that asking my respondents “how important are hedonistic activities to you?” was going to give me the answers I wanted
See, inevitably, when you develop a closed-format survey, you wind up making assumptions. They may be firmly grounded in existing literature and theory, of course, but you’re making assumptions nonetheless.
And for something as subjective as meaning, the very process of defining categories and labels concerned me
Closed-format questions also don’t give you much insight – you may find that “hedonistic activities” are important to certain kinds of people, but without asking an exhausting battery of follow-up questions, you can’t really learn anything about WHY that is
Now, often when a researcher has a less-than-perfect understanding of what they're studying, they start off doing more exploratory qualitative research in the form of interviews and panels
And there's been plenty of work in that area too
Interviews and panels can give you a much deeper look into a particular subject, and you can probe and learn more about how potential respondents think about the topic
But, by design, they're small-N, and you can't really say much about the population - ultimately they can point you in the right direction, but they won't help you definitively answer your question if you're interested in representativeness
However, with the computational tools we now have, I’m not convinced that we have to compromise anymore
So I started looking into open-ends
Traditionally, surveys have put open-ends in the back seat - after a long battery of closed-format questions, they’ll ask for some clarification, but respondents don’t bother much with them, because they’re tired and the open-ends seem supplementary
Usually respondents write a few words, and we might then go through and code these responses into different categories, and occasionally we actually analyze those variables - but in my experience, the vast majority of open-ends go unanalyzed
The problem is, respondents don’t take the open-ends seriously if it’s clear that they’re just an afterthought
But what if we were to flip it around, and actually make the open-ends the focal point of the survey?
If we do it right, we might be able to get all of the breadth of a large-scale survey, with all of the depth of an interview or panel
Depending on how we design our sample, it could be representative of the population
And it wouldn’t require any major assumptions about the structure of what we want to study
In order to get at a wide range of strategies for finding happiness, I needed each respondent to give me more than just a few words - I needed a LOT of content
I branded the project as a website, to get people invested - and explained how I was going to use text analytics to mine their responses
I emphasized up-front to my respondents that the point of the survey was to capture in their own words how they thought about meaning
I recruited in a variety of ways, and while it was a convenience sample, it wound up being very diverse
So, I'm going to show you a grab-bag of results now - keep in mind that this is a proof-of-concept more than anything. My sample isn't representative, and these results are just intended to illustrate some of the different ways you can use these data to gain insight into your research topic - nothing here is definitive
Also keep in mind that all of this is from 2012
So let’s start with love
One of the covariates I collected was respondents’ relationship statuses – whether they were single, in a relationship, engaged, married, or divorced/separated/widowed
On the left we have single respondents, and on the right we have respondents in a relationship – but not engaged or married
On the y-axis we have mutual information – a measure of how distinctive a word is to its category – how much information it contributes
And on the x-axis we have the relative difference in the proportion of respondents in each category that mention that word
So, if 20% of respondents in a relationship mention the word “relationship”, but only 5% of single respondents do, that gives us a difference of 15 percentage points
And the words we’ll see are sized by overall frequency
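The x-axis quantity described above, the gap between two groups in the share of respondents who mention a word, could be computed along these lines. The function names `mention_rate` and `proportion_difference` and the toy groups are hypothetical, built only to reproduce the 20% versus 5% example.

```python
def mention_rate(responses, word):
    """Share of respondents (token lists) who mention `word` at least once."""
    return sum(1 for tokens in responses if word in tokens) / len(responses)


def proportion_difference(group_a, group_b, word):
    """Difference in mention rates between two groups, as a proportion."""
    return mention_rate(group_a, word) - mention_rate(group_b, word)


# Made-up groups of 20 respondents each: 4 of 20 (20%) versus 1 of 20 (5%).
in_relationship = [["relationship"]] * 4 + [["work"]] * 16
single = [["relationship"]] * 1 + [["new"]] * 19
diff = proportion_difference(in_relationship, single, "relationship")
```

Here `diff` comes out at about 0.15, the 15-percentage-point gap from the example.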
And, as we might have expected, the most discriminating words for being in a relationship – as opposed to being single – are “boyfriend” and “girlfriend”
But this is across all respondents
We can also filter down to respondents who feel like their life is much more satisfying than average – at least one standard deviation above the mean
And when we do that, we can start to more clearly see how different types of respondents may find satisfaction in different aspects of life, based on their circumstances
Happy single people, for example, use the word “new” a lot – perhaps indicating that they are out looking for new and novel experiences
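The filter to highly satisfied respondents described above could be sketched like this. The record layout (a dict with a `"satisfaction"` score), the function name `high_satisfaction`, and the use of the sample standard deviation are assumptions made for illustration.

```python
import statistics


def high_satisfaction(respondents, key="satisfaction"):
    """Keep respondents scoring at least one standard deviation above the mean."""
    scores = [r[key] for r in respondents]
    cutoff = statistics.mean(scores) + statistics.stdev(scores)
    return [r for r in respondents if r[key] >= cutoff]


# Made-up scores, for illustration only.
toy_respondents = [{"id": i, "satisfaction": s} for i, s in enumerate([1, 2, 3, 4, 5, 6, 7])]
happy = high_satisfaction(toy_respondents)
```

The word-level comparison between relationship categories would then be rerun on this subset only.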
What about getting engaged?
When we subset to just respondents with high life satisfaction, we see that those in a relationship no longer use the terms “boyfriend” and “girlfriend” but instead are more characterized by the word “relationship”