The Pursuit of Happiness and Statistics

1: George and McCulloch (1993) "Variable Selection via Gibbs Sampling" JASA, 88, 881-889
Kimberly Nguyen MA 576: Project Due: Friday 5/29/16
The Pursuit of Happiness and Statistics
Introduction
Everything we do in life is for the pursuit of happiness. We spend countless nights
cramming for Professor Carvalho’s exams, recopying his notes, and attending every one of his
office hours, meanwhile dreaming about graduating college with a high grade point average and
having a high paying job. But is this how we achieve happiness? Is the isolation we feel from
friends and family while we are studying in Mugar actually hindering our happiness? If we
dedicate some of that time to having more sex, could we elevate our emotional state? And once
we are done with college, how do we choose between a career that we actually enjoy and a
career that pays well? Which of these factors truly determines one’s state of happiness? Based on
a survey of 39 employed students in an MBA class at the University of Chicago, these are the
questions we will attempt to answer.
From these students, a total of five variables were collected. The response we are
considering is level of happiness which was measured on a 10 point scale with 1 representing a
suicidal state, 5 representing a state of just “surviving life”, and 10 representing a euphoric state.
Four potential predictors were collected: money, sex, love, and work. Money is a continuous variable
measuring family income in thousands of dollars. Sex refers not to gender, but a 0, 1 dummy
variable where 1 represents a satisfactory level of sexual activity. Love is a factor variable
ranging from 1-3 measuring a student’s feeling of belonging in the context of family, friends,
and community. 1 indicates a feeling of loneliness and isolation, 2 indicates the student has a few
secure relationships, and 3 indicates a deep feeling of belonging. And lastly, work is a factor
variable ranging from 1-5, with 1 indicating the student is seeking other employment, 3 being
indifferent about their job, and 5 indicating that the student enjoys their job.1
In real life, I believe all four of these predictors are important contributors to one’s
emotional state, thus my preliminary inference is that these will all be statistically significant in
our model. Especially in the context of college students, these factors are even more of a priority.
Students are still young enough to be financially dependent on their families, and thus the money
variable, which measures their family’s income could still be a significant factor in these
students’ lives. Love measures the social aspect of one’s life which is especially important in
college years where one is developing the friendships they’ll have for the rest of their lives. And
work is where these students spend a significant amount of time, and obtaining a good job is the
reason students are even attending college, so whether or not they enjoy their job should have a
significant effect on their state of happiness. If I were to throw any variable out, it would be sex
because having satisfactory sex is not necessarily a priority shared amongst all college students.
We will allow the data to verify or disprove these assumptions.
Data Clean Up
Due to the limited sample size, we had to do a bit of house cleaning with our data. Since
we had so many variables/levels in our predictors, and only 39 observations, we had to collapse
the levels of a majority of our predictors and our response to avoid sparsity. The response
happiness went from having levels 1 through 10, to having 4 levels: 1-4, 5-6, 7-8, and 9-10.
Money and sex were not changed. Love was transformed from having 3 levels to 2 levels: 1-2,
and 3. Lastly, work went from having a scale of 1-5 to 1-2, 3, and 4-5. The levels still represent
the same aspect of their respective variables. For example, in work, 1-2 means the student is
either looking for a new job or currently dislikes their job, 3 means the student is indifferent,
and 4-5 means the student enjoys their job.
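The collapsing rules described above can be written as a small lookup. This is a minimal sketch in Python; the function names are hypothetical, and the original analysis may well have been carried out differently (the paper mentions R).

```python
def collapse_happiness(score: int) -> str:
    """Map the original 1-10 happiness score to the four collapsed levels."""
    if not 1 <= score <= 10:
        raise ValueError("happiness must be between 1 and 10")
    if score <= 4:
        return "1-4"
    if score <= 6:
        return "5-6"
    if score <= 8:
        return "7-8"
    return "9-10"


def collapse_love(level: int) -> str:
    """Map the original 1-3 love level to two levels: 1-2 and 3."""
    if level not in (1, 2, 3):
        raise ValueError("love must be 1, 2, or 3")
    return "1-2" if level <= 2 else "3"


def collapse_work(level: int) -> str:
    """Map the original 1-5 work level to three levels: 1-2, 3, and 4-5."""
    if not 1 <= level <= 5:
        raise ValueError("work must be between 1 and 5")
    if level <= 2:
        return "1-2"
    if level == 3:
        return "3"
    return "4-5"
```

Money and sex are left as they were collected, so they need no mapping.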
Preliminary Data Analysis
Looking at the pairwise graph between all the variables, we can gain some idea about the
possible correlation between the predictors. Simply from observing the graph, it does not seem
that there exists correlation between any of the predictors. This logically makes sense because
sex, family income, love, and work are all very different components of life that generally would
not have an effect on the other.
We will start by looking at the mosaic plots between the factor variables and our
response. It seems that there is no difference in happiness level whether or not a student is
having satisfactory sex. If anything, the pattern is counterintuitive:
there are more people who have satisfactory sex in the lower
levels of happiness than people who don’t have satisfactory sex.
Observing the mosaic plot between happiness and love, it is apparent that those who feel
most loved, experience a higher level of happiness than those who feel less loved. Most of the
people who are in love level 3 fall into levels 7-8 and 9-10 of happiness.
There also seems to be a significant difference in level of happiness between the people
who enjoy their jobs and those who don’t. A larger proportion of the people who don’t enjoy
their jobs fall into the 1-4 level of happiness.
And a larger proportion of people who enjoy their job fall into the 7-8 and 9-10 levels of
happiness.
Disqualification of Linear Model
Running a linear model here would not make sense for multiple reasons. When running a
linear model, we assume that our response is a continuous and normally distributed random
variable. However, here our response is categorical, not continuous. And not only is it not
continuous, but the response is ordinal, i.e. a categorical variable that has an order and this order
is important information that should not be ignored. R does not even allow you to run a linear
model on ordinal data.
However, if we ignore all this and revert back to the response being on a scale from 1 to
10 and treat this as a continuous response, we can run a linear model. But even then, the
assumption of a null plot of residuals is violated. The plot fans out left and right from the middle
of the plot.
Fitting the Model
Fortunately, through the power of generalized linear models, we can correctly model
ordinal data using a proportional odds model. After fitting a proportional odds model with
happy as the response and the four predictors (money, sex, love, and work), we put that model
through AIC-based model selection and ended up with a model that predicts the probability of being
in each level of happiness based on love and work.
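AIC-based selection prefers the candidate model with the smallest value of AIC = 2k - 2 log L. The sketch below only illustrates the arithmetic. It assumes that the deviance reported in the goodness of fit section (57.36982) equals -2 log L for the selected model and that the model has k = 6 parameters (three cutpoints plus three coefficients); neither assumption is stated in the paper, and the AIC values of the rejected candidates are not reported, so none are shown.

```python
def aic(neg2_loglik: float, n_params: int) -> float:
    """Akaike information criterion: -2 log L + 2k (smaller is better)."""
    return neg2_loglik + 2 * n_params


# Assumed inputs for the selected love + work model (see the caveats above).
selected_model_aic = aic(neg2_loglik=57.36982, n_params=6)
print(f"AIC of selected model (under the stated assumptions): {selected_model_aic:.5f}")
```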
As I expected, sex was not a valuable variable in predicting the probability of being in a
certain level of happiness. This agrees with our previous discussion where we claimed that
having satisfactory sex is not a priority in many people’s lives, especially considering these are
MBA students at a top-ranking university who have a lot more things to worry about.
Surprisingly, the amount of income a student’s family earns also does not have a significant hand
in determining happiness level. Looking back, this makes sense as these are MBA students, not
undergraduates and thus they have probably already financially separated themselves from their
families. And they are also employed MBA students, further distancing their connection to their
families’ incomes.
We did not attempt to fit any interactions because qualitatively, it would not make sense
for any of the predictors to interact with each other. As previously mentioned, in the preliminary
data analysis, logically, satisfactory sex, family income, job satisfaction, and feeling of
belonging do not generally have an effect on each other. In addition, we had no reason to
believe we needed to transform any variables and thus no transformations were attempted. Any
transformations would have complicated the interpretation of the model.
Coefficient Interpretation
In a proportional odds model, the probability of being at most in category j is:
$$\gamma_j = P(y \le j) = \operatorname{logit}^{-1}(\theta_j - X^T \beta)$$
And thus, the log odds of being at most in category j relative to being above category j is:
$$\operatorname{logit}(\gamma_j) = \log\left(\frac{\gamma_j}{1 - \gamma_j}\right) = \theta_j - X^T \beta$$
From this we can derive that the odds ratio is:
$$OR_j(X_1, X_2) = \frac{\text{odds}_j(X_1)}{\text{odds}_j(X_2)} = e^{(X_2 - X_1)^T \beta} \quad \text{for all categories } j$$
and thus
$$\text{odds}_j(X_1) = \text{odds}_j(X_2) \cdot e^{(X_2 - X_1)^T \beta}$$
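The step from the log odds to the odds ratio can be spelled out as follows. This is a short worked derivation using only the symbols already defined; it shows why the cutpoint drops out, which is what makes the odds "proportional" across categories.

```latex
\begin{align*}
\text{odds}_j(X) &= \frac{\gamma_j}{1 - \gamma_j} = e^{\theta_j - X^T \beta} \\
\frac{\text{odds}_j(X_1)}{\text{odds}_j(X_2)}
  &= \frac{e^{\theta_j - X_1^T \beta}}{e^{\theta_j - X_2^T \beta}}
   = e^{(\theta_j - X_1^T \beta) - (\theta_j - X_2^T \beta)}
   = e^{(X_2 - X_1)^T \beta}
\end{align*}
```

Because the cutpoint \theta_j cancels, the same ratio applies at every category boundary j.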
We will interpret the coefficients in terms of log odds and odds ratios:
• Intercepts
o 1-4 | 5-6: An individual who feels the lowest level of love and belonging (love=1-2)
and dislikes/is looking for a new job (work=1-2) has log odds of 0.9121 of being at
most at level 1-4 of happiness.
o 5-6 | 7-8: An individual who feels the lowest level of love and belonging (love=1-2)
and dislikes/is looking for a new job (work=1-2) has log odds of 2.9392 of being at
most at level 5-6 of happiness.
o 7-8 | 9-10: An individual who feels the lowest level of love and belonging
(love=1-2) and dislikes/is looking for a new job (work=1-2) has log odds of 8.5585 of
being at most at level 7-8 of happiness.
• Love3: Holding work constant, for a specific level of happiness, the log odds of being
at most that happy is 4.033 lower for people who have a higher sense of love and
belonging (love=3) than for those who have a lower sense of love and belonging (love=1-2).
In other words, for a specific level of happiness, holding work constant, the odds of a
person in love level 1-2 being at most in that specific level of happiness is $e^{4.033}$
times the odds for a person in love level 3. In simpler terms, this means that a person who
has a higher sense of love and belonging has a higher probability of being in a higher
level of happiness.
• Work3: Holding love constant, for a specific level of happiness, the log odds of being
at most that happy is 1.871 lower for those who are indifferent about their current job
(work=3) than for those who either dislike it or are looking for a new job (work=1-2). In
other words, for a specific level of happiness, holding love constant, the odds of a person
in work level 1-2 being at most in that specific level of happiness is $e^{1.871}$ times the odds
for a person in work level 3. In simpler terms, this means that a person who is
indifferent about their current job has a higher probability of being in a higher level of
happiness than a person who is looking for a new job/dislikes their job.
• Work4-5: Holding love constant, for a specific level of happiness, the log odds of being
at most that happy is 3.448 lower for those who enjoy their job (work=4-5) than for those who
dislike their job or are looking for new employment (work=1-2). In other words, for a
specific level of happiness, holding love constant, the odds of a person in work level 1-2
being at most in that specific level of happiness is $e^{3.448}$ times the odds for a person in
work level 4-5. In simpler terms, a person who enjoys their job has a higher probability of
being in a higher happiness level than a person who dislikes or is looking for a new job.
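To make the interpretation above concrete, the sketch below turns the reported cutpoints and coefficients into odds ratios and into predicted probabilities for each happiness level. It is an illustration in Python rather than the code used for the analysis; the dictionary and function names are hypothetical, and it assumes the parameterization logit P(y ≤ j) = θ_j − Xᵀβ given earlier with love=1-2 and work=1-2 as the baseline.

```python
import math

# Estimates reported in this section (baseline: love=1-2, work=1-2).
CUTPOINTS = [0.9121, 2.9392, 8.5585]          # theta for 1-4|5-6, 5-6|7-8, 7-8|9-10
COEFS = {
    "love": {"1-2": 0.0, "3": 4.033},
    "work": {"1-2": 0.0, "3": 1.871, "4-5": 3.448},
}
HAPPY_LEVELS = ["1-4", "5-6", "7-8", "9-10"]


def logistic(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(love: str, work: str) -> dict:
    """Predicted probability of each happiness level for one love/work combination."""
    eta = COEFS["love"][love] + COEFS["work"][work]
    # Cumulative probabilities P(y <= j); the last category closes at 1.
    cumulative = [logistic(theta - eta) for theta in CUTPOINTS] + [1.0]
    probs, previous = {}, 0.0
    for level, cum in zip(HAPPY_LEVELS, cumulative):
        probs[level] = cum - previous
        previous = cum
    return probs


# Odds of being "at most that happy" for the baseline relative to the named level.
odds_ratios = {
    "love 1-2 vs 3": math.exp(COEFS["love"]["3"]),
    "work 1-2 vs 3": math.exp(COEFS["work"]["3"]),
    "work 1-2 vs 4-5": math.exp(COEFS["work"]["4-5"]),
}

for label, ratio in odds_ratios.items():
    print(f"{label}: odds ratio = {ratio:.1f}")
for love in COEFS["love"]:
    for work in COEFS["work"]:
        rounded = {k: round(v, 3) for k, v in category_probs(love, work).items()}
        print(f"love={love}, work={work}: {rounded}")
```

The odds ratios come out at roughly 56, 6.5, and 31, which is why a coefficient of about 4 on the log odds scale corresponds to such a large shift toward the higher happiness levels.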
Significance of Coefficients
We will now test the statistical significance of the coefficients. All tests in this paper will
be tested at a significance level of .05.
• Love3
$H_0: \beta_{\text{Love3}} = 0$
$H_1: \beta_{\text{Love3}} \neq 0$
Test statistic: t = 3.204245
P-value: 0.00135417
The conclusion from this test is that we reject the null and say that this coefficient is
statistically significant. The coefficient being statistically different from zero means that, holding
work constant, there is a significant difference in the log odds of being at most in a specific
happiness level between individuals who feel more loved and less loved.
• Work3
$H_0: \beta_{\text{Work3}} = 0$
$H_1: \beta_{\text{Work3}} \neq 0$
Test statistic: t = 1.688251
P-value: 0.09136302
The conclusion from this test is that we fail to reject the null and conclude that this coefficient
is not statistically significant. This means that there is not a significant difference in the log odds
of being at most in a specific happiness level between individuals who are looking for a new
job/dislike their job and individuals who are indifferent about their job.
• Work4-5
$H_0: \beta_{\text{Work4-5}} = 0$
$H_1: \beta_{\text{Work4-5}} \neq 0$
Test statistic: t = 2.945382
P-value: 0.003225561
In this test, we reject the null and conclude that this coefficient is statistically significant.
This means that there is a significant difference in the log odds of being at most in a specific
happiness level between individuals who dislike/are looking for a new job versus those who
actually enjoy their jobs.
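The three p-values can be checked from the test statistics alone. The sketch below is a minimal Python illustration, assuming each statistic is referred to a standard normal distribution (the usual large-sample Wald approach; the reference distribution is not stated in this section). The function name is hypothetical.

```python
import math

ALPHA = 0.05  # significance level used throughout the paper


def two_sided_normal_p(z: float) -> float:
    """Two-sided tail probability of a standard normal at |z|."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Test statistics as reported for each coefficient.
wald_stats = {"Love3": 3.204245, "Work3": 1.688251, "Work4-5": 2.945382}

p_values = {name: two_sided_normal_p(z) for name, z in wald_stats.items()}
for name, p in p_values.items():
    decision = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"{name}: p = {p:.6f} -> {decision}")
```

Under that assumption the computed values agree closely with the reported ones, and only Work3 falls above the .05 threshold.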
Goodness of Fit
We will now conduct a goodness of fit test of our model. Asymptotically the deviance
has a chi-square distribution. Although the size of our dataset is not ideal, we will continue with
the test anyway:
$H_0$: Current model has an adequate fit
$H_1$: Saturated model
Test statistic: $\chi^2 = \text{Deviance}_{\text{Current Model}} - \text{Deviance}_{\text{Saturated Model}} = 57.36982 - 0 = 57.36982$
Degrees of freedom: $DF_{\text{Current Model}} - DF_{\text{Saturated Model}} = (n-p)-(n-n) = (39-6)-(39-39) = 33$
P-value: 0.005355549
From the goodness of fit test, we strongly reject the null. Unfortunately this means that
the model we fit is not sufficient in predicting the happiness level of a student based on love and
work. This is highly likely due to the small sample size of the dataset.
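The p-value of the deviance test is the upper tail of a chi-square distribution with 33 degrees of freedom at 57.36982. The sketch below computes that tail using only the Python standard library, via the recurrence Q(a+1, x) = Q(a, x) + x^a e^{-x} / Γ(a+1) for the regularized upper incomplete gamma function starting from Q(1/2, x) = erfc(√x). The function name is hypothetical, and the shortcut only applies to odd degrees of freedom.

```python
import math


def chi2_sf_odd_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square with odd df and x > 0."""
    if df < 1 or df % 2 != 1:
        raise ValueError("this shortcut only handles odd degrees of freedom")
    if x <= 0:
        raise ValueError("x must be positive")
    half_x = x / 2.0
    tail = math.erfc(math.sqrt(half_x))  # Q(1/2, x/2), i.e. the df = 1 case
    for i in range((df - 1) // 2):
        a = i + 0.5
        # Add x^a e^{-x} / Gamma(a + 1), computed on the log scale for stability.
        tail += math.exp(a * math.log(half_x) - half_x - math.lgamma(a + 1.0))
    return tail


deviance = 57.36982
degrees_of_freedom = 39 - 6  # n - p
p_value = chi2_sf_odd_df(deviance, degrees_of_freedom)
print(f"P(chi-square_{degrees_of_freedom} > {deviance}) = {p_value:.6f}")
```

Keep in mind that the chi-square approximation to the deviance is asymptotic, so with 39 observations spread over sparse cells this p-value is only a rough guide.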
Prediction versus Observations
In the following chart, we plotted the predictions for the probability of being in each level
of happiness for each combination of love and work level, and plotted the actual observed
proportions from our data. Consistent with our goodness of fit test, a majority of our predictions
were far from the observed proportions.
Conclusion
From the beginning there were quite a few issues with this dataset. The small size of the
dataset, coupled with the relatively large quantity of groups that each person could be placed in
created by the various combinations of the levels of predictors and response made our dataset
extremely sparse. Even after combining levels of the predictors and response, we still ended up
with groups that contained either few or no observations. As seen in the chart below under
“Freq”, there were 24 total possible categories but 10 of them contained no observations.
Unfortunately, I would hold the sparsity/small sample size of our data set responsible for the lack
of fit we discovered in our model. To improve the model we need to collect more observations
and perhaps more relevant predictors such as grade point average or state of health. Thus I would
not rely on the specific numbers we obtained in our model to make any real predictions.
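The sparsity is easy to quantify. The short sketch below enumerates the cells formed by the collapsed levels of happiness, love, and work and compares the count with the 39 observations; the variable names are hypothetical, and the figure of 10 empty cells is taken from the text above rather than recomputed, since the raw data are not reproduced here.

```python
from itertools import product

HAPPY_LEVELS = ["1-4", "5-6", "7-8", "9-10"]
LOVE_LEVELS = ["1-2", "3"]
WORK_LEVELS = ["1-2", "3", "4-5"]

N_OBSERVATIONS = 39
N_EMPTY_CELLS = 10  # as reported in the text, not recomputed from data

# Every combination of response level and predictor levels is one cell.
cells = list(product(HAPPY_LEVELS, LOVE_LEVELS, WORK_LEVELS))
average_per_cell = N_OBSERVATIONS / len(cells)
share_empty = N_EMPTY_CELLS / len(cells)

print(f"cells: {len(cells)}")
print(f"average observations per cell: {average_per_cell:.3f}")
print(f"share of empty cells: {share_empty:.1%}")
```

With fewer than two observations per cell on average, observed proportions within a cell can only take a handful of values, which limits how closely any fitted model can match them.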
Nonetheless, the qualitative conclusions that we can draw from this model can still be
valuable. Making a lot of money is the end goal in a lot of people’s lives; however, our model
confirms that the cliché “money doesn’t buy you happiness” is true. Money is not a significant
factor in determining happiness, in both our model and in our lives. In addition, whether or not
one has satisfactory sex is not important in determining happiness either. An individual could
have good sex, but still lack the emotional connection necessary to fully enjoy the sex. Thus it
makes sense that the variable love was significant in our model. The variable love represented an
individual’s feeling of love and belonging with their friends, family, and community. Knowing
you have a group of people that you can trust and rely on is invaluable. Money cannot buy you
that type of security. In addition, it makes sense that the variable work was significant in our
model. An individual generally spends 8 hours a day at their job. That’s half of the time we are
awake each day. Thus whether or not one enjoys one’s job has a substantial effect on a person’s mood.
An individual could have a very high paying job, but the euphoria of depositing a paycheck does
not last long enough to dull the pain of an unstimulating, monotonous, and unfulfilling job.
Receiving high marks in school, making enough money to live comfortably, and having
satisfying sex are all important parts of life. However, we should not let the pursuit of these
things hinder us from the things that truly make us happy: friends, family, and community.
Especially for those of us about to graduate college, we should keep this in mind when we are
choosing between the career we want to pursue, and the career that pays a lot. If these two things
do not coincide, then it may be more valuable to us to take the job that we would enjoy more.
But we should bear in mind the most crucial factor in determining emotional state:
sample size. Having an adequate sample size has a 100% chance of elevating a statistician’s
level of happiness.

The Pursuit of Happiness and Statistics

  • 1. 1: George and McCulloch (1993) "Variable Selection via Gibbs Sampling" JASA, 88, 881-889 Kimberly Nguyen MA 576: Project Due: Friday 5/29/16 The Pursuit of Happiness and Statistics Introduction Everything we do in life is for the pursuit of happiness. We spend countless nights cramming for Professor Carvalho’s exams, recopying his notes, and attending every one of his office hours, meanwhile dreaming about graduating college with a high grade point average and having a high paying job. But is this how we achieve happiness? Is the isolation we feel from friends and family while we are studying in Mugar actually hindering our happiness? If we dedicate some of that time to having more sex, could we elevate our emotional state? And once we are done with college, how do we choose between a careers that we actually enjoy versus careers that pay well? Which of these factors truly determine one’s state of happiness? Based on a survey of 39 employed students in an MBA class at the University of Chicago, these are the questions we will attempt to answer. From these students, a total of five variables were collected. The response we are considering is level of happiness which was measured on a 10 point scale with 1 representing a suicidal state, 5 representing a state of just “surviving life”, and 10 representing a euphoric state. 4 potential predictors were collected: money, sex, love, and work. Money is a continuous variable measuring family income in thousands of dollars. Sex refers not to gender, but a 0, 1 dummy variable where 1 represents a satisfactory level of sexual activity. Love is a factor variable ranging from 1-3 measuring a student’s feeling of belonging in the context of family, friends, and community. 1 indicates a feeling of loneliness and isolation, 2 indicates the student has a few secure relationships, and 3 indicates a deep feeling of belonging. 
And lastly, work is a factor variable ranging from 1-5, with 1 indicating the student is seeking other employment, 3 being indifferent about their job, and 5 indicating that the student enjoys their job.1 In real life, I believe all four of these predictors are important contributors to one’s emotional state, thus my preliminary inference is that these will all be statistically significant in our model. Especially in the context of college students, these factors are even more of a priority. Students are still young enough to be financially dependent on their families, and thus the money variable, which measures their family’s income could still be a significant factor in these
  • 2. 1: George and McCulloch (1993) "Variable Selection via Gibbs Sampling" JASA, 88, 881-889 students’ lives. Love measures the social aspect of one’s life which is especially important in college years where one is developing the friendships they’ll have for the rest of their lives. And work is where these students spend a significant amount of time, and obtaining a good job is the reason students are event attending college, thus whether or not they enjoy their job should have significant effect on their state of happiness. If I was to throw any variable out, it would be sex because having satisfactory sex is not necessarily a priority shared amongst all college students. We will allow the data to verify or disprove these assumptions. Data Clean Up Due to the limited sample size, we had to do a bit of house cleaning with our data. Since we had so many variables/levels in our predictors, and only 39 observations, we had to collapse the levels of a majority of our predictors and our response to avoid sparsity. The response happiness went from having levels 1 through 10, to having 4 levels: 1-4, 5-6, 7-8, and 9-10. Money and sex were not changed. Love was transformed from having 3 levels to 2 levels: 1-2, and 3. Lastly, work went from having a scale of 1-5 to 1-2, 3, and 4-5. The levels still represent the same aspect of their respective variables. For example, in work 1-2 means the student is either looking for a new job or currently dislike your job, and 3 means the student is indifferent, and 4-5 means the student enjoys their job. Preliminary Data Analysis Looking at the pairwise graph between all the variables, we can gain some idea about the possible correlation between the predictors. Simply from observing the graph, it does not seem that there exists correlation between any of the predictors. This logically make sense because sex, family income, love, and work are all very different components of life that generally would not have an effect on the other.
  • 3. 1: George and McCulloch (1993) "Variable Selection via Gibbs Sampling" JASA, 88, 881-889 We will start by looking at the mosaic plots between the factor variables and our response. It seems that there is not difference in happiness level whether or not a student is having satisfactory sex. In fact, the effect of satisfactory sex on happiness seems to be counter intuitive to what one would think. There are more people who have satisfactory sex in the lower levels of happiness than people who don’t have satisfactory sex. Observing the mosaic plot between happiness and love, it is apparent that those who feel most loved, experience a higher level of happiness than those who feel less loved. Most of the people who are in love level 3 fall into levels 7-8 and 9-10 of happiness. There also seems to be a significant difference in level of happiness between the people who enjoy their jobs and those who don’t. A larger proportion of the people who don’t enjoy their jobs fall into the 1-4 level of happiness.
  • 4. 1: George and McCulloch (1993) "Variable Selection via Gibbs Sampling" JASA, 88, 881-889 And a larger proportion of people who enjoy their job fall into the 7-8 and 9-10 levels of happiness. Disqualification of Linear Model Running a linear model here would not make sense for multiple reasons. When running a linear model, we assume that our response is a continuous and normally distributed random variable. However, here our response is categorical, not continuous. And not only is it not continuous, but the response is ordinal, i.e. a categorical variable that has an order and this order is important information that should not be ignored. R does not even allow you to run a linear model on ordinal data. However, if we ignore all this and revert back to the response being on a scale from 1 to 10 and treat this as a continuous response, we can run a linear model. But even then, the assumption of a null plot of residuals is violated. The plot fans out left and right from the middle of the plot.
  • 5. 1: George and McCulloch (1993) "Variable Selection via Gibbs Sampling" JASA, 88, 881-889 Fitting the Model Fortunately, through the power of generalized linear models, we can correctly model ordinal data using a proportional odds model. After running a proportional odds model with happy as the response, and the 4 predictors- money, sex, love, and work we put that model through an AIC model selection, and end up with a model that predicts the probability of being in each level of happiness based on love and work. As I expected, sex was not a valuable variable in predicting the probability of being in a certain level of happiness. This agrees with our previous discussion where we claimed that having satisfactory sex is not a priority in many people’s lives, especially considering these are MBA student at a top ranking university who have a lot more things to worry about. Surprisingly, the amount of income a student’s family earns also does not have a significant hand in determining happiness level. Looking back, this makes sense as these are MBA students, not undergraduates and thus they have probably already financially separated themselves from their families. And they are also employed MBA students, further distancing their connection to their families’ incomes. We did not attempt to fit any interactions because qualitatively, it would not make sense for any of the predictors to interact with each other. As previously mentioned, in the preliminary
data analysis, satisfactory sex, family income, job satisfaction, and feeling of belonging logically do not, in general, have an effect on each other. In addition, we had no reason to believe we needed to transform any variables, and thus no transformations were attempted. Any transformation would also have complicated the interpretation of the model.

Coefficient Interpretation
In a proportional odds model, the probability of being at most in category j is:

$\gamma_j = P(y \le j) = \operatorname{logit}^{-1}(\theta_j - X^T \beta)$

Thus, the log odds of being at most in category j, relative to being in a higher category, is:

$\operatorname{logit}(\gamma_j) = \log\left(\frac{\gamma_j}{1 - \gamma_j}\right) = \theta_j - X^T \beta$

From this we can derive that the odds ratio is:

$OR_j(X_1, X_2) = \frac{odds_j(X_1)}{odds_j(X_2)} = e^{(X_2 - X_1)^T \beta}$

for all categories j, and thus

$odds_j(X_1) = odds_j(X_2) \cdot e^{(X_2 - X_1)^T \beta}$

We will interpret the coefficients in terms of log odds and odds ratios:
• Intercepts
  o 1-4 | 5-6: An individual who feels the lowest level of love and belonging (love=1-2) and dislikes their job or is looking for a new one (work=1-2) has a log odds of being at most at level 1-4 of happiness of 0.9121.
  o 5-6 | 7-8: An individual who feels the lowest level of love and belonging (love=1-2) and dislikes their job or is looking for a new one (work=1-2) has a log odds of being at most at level 5-6 of happiness of 2.9392.
  o 7-8 | 9-10: An individual who feels the lowest level of love and belonging (love=1-2) and dislikes their job or is looking for a new one (work=1-2) has a log odds of being at most at level 7-8 of happiness of 8.5585.
• Love3: Holding work constant, for a specific level of happiness, the log odds of being at most that happy is 4.033 lower for people who have a higher sense of love and belonging (love=3) than for those who have a lower sense of love and belonging (love=1-2). In other words, for a specific level of happiness, holding work constant, the odds of a person in love level 1-2 being at most in that level of happiness are $e^{4.033}$ times the odds for a person in love level 3. In simpler terms, a person who has a higher sense of love and belonging has a higher probability of being in a higher level of happiness.
• Work3: Holding love constant, for a specific level of happiness, the log odds of being at most that happy is 1.871 lower for those who are indifferent about their current job (work=3) than for those who either dislike their job or are looking for a new one (work=1-2). In other words, for a specific level of happiness, holding love constant, the odds of a person in work level 1-2 being at most in that level of happiness are $e^{1.871}$ times the odds for a person in work level 3. In simpler terms, a person who is indifferent about their current job has a higher probability of being in a higher level of happiness than a person who dislikes their job or is looking for a new one.
• Work4-5: Holding love constant, for a specific level of happiness, the log odds of being at most that happy is 3.448 lower for those who enjoy their job (work=4-5) than for those who dislike their job or are looking for new employment (work=1-2).
In other words, for a specific level of happiness, holding love constant, the odds of a person in work level 1-2 being at most in that level of happiness are $e^{3.448}$ times the odds for a person in work level 4-5. In simpler terms, a person who enjoys their job has a higher probability of being in a higher level of happiness than a person who dislikes their job or is looking for a new one.

Significance of Coefficients
We will now test the statistical significance of the coefficients. All tests in this paper are conducted at a significance level of 0.05.
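As a numerical companion to the coefficient interpretations above, the following minimal Python sketch converts the reported intercepts and coefficients into odds ratios and into predicted probabilities for each happiness level. The numbers are copied from the text; the names THETA, BETA, expit, odds_ratio, and category_probs are illustrative assumptions and are not part of the original analysis, which appears to have been carried out in R.

```python
import math

# Cutpoints (intercepts) theta_j and coefficients beta, copied from the text.
THETA = [0.9121, 2.9392, 8.5585]           # 1-4|5-6, 5-6|7-8, 7-8|9-10
BETA = {"love3": 4.033, "work3": 1.871, "work4-5": 3.448}
LEVELS = ["1-4", "5-6", "7-8", "9-10"]      # combined happiness levels
WORK_GROUPS = ("work1-2", "work3", "work4-5")  # "work1-2" is the baseline


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(term):
    """Odds of being at most level j: baseline group relative to `term` group."""
    return math.exp(BETA[term])


def category_probs(love3, work):
    """Predicted probability of each happiness level.

    love3: 1 if love = 3, 0 if love = 1-2 (baseline).
    work:  one of WORK_GROUPS.
    """
    if work not in WORK_GROUPS:
        raise ValueError(f"unknown work group: {work!r}")
    eta = BETA["love3"] * love3 + BETA.get(work, 0.0)
    # Cumulative probabilities P(y <= j) = expit(theta_j - eta); the last is 1.
    cum = [expit(t - eta) for t in THETA] + [1.0]
    probs = [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]
    return dict(zip(LEVELS, probs))


for term in BETA:
    print(f"odds ratio for {term}: {odds_ratio(term):.2f}")

for love3 in (0, 1):
    for work in WORK_GROUPS:
        probs = category_probs(love3, work)
        row = ", ".join(f"{lvl}: {p:.3f}" for lvl, p in probs.items())
        print(f"love3={love3}, {work} -> {row}")
```

Under these numbers the three odds ratios come out to roughly 56, 6.5, and 31, and a baseline individual (love=1-2, work=1-2) has a probability of about 0.71 of being at happiness level 1-4. Because the cutpoints are shared across groups, each coefficient shifts all three cumulative log odds by the same amount, which is the proportional odds assumption.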
• Love3
  $H_0$: $\beta_{Love3} = 0$
  $H_1$: $\beta_{Love3} \neq 0$
  Test statistic: t = 3.204245
  P-value: 0.00135417
  We reject the null and conclude that this coefficient is statistically significant. The coefficient being statistically different from zero means that, holding work constant, there is a significant difference in the log odds of being at most in a specific happiness level between individuals who feel more loved and those who feel less loved.
• Work3
  $H_0$: $\beta_{Work3} = 0$
  $H_1$: $\beta_{Work3} \neq 0$
  Test statistic: t = 1.688251
  P-value: 0.09136302
  We fail to reject the null and conclude that this coefficient is not statistically significant. This means that, holding love constant, there is not a significant difference in the log odds of being at most in a specific happiness level between individuals who dislike their job or are looking for a new one and individuals who are indifferent about their job.
• Work4-5
  $H_0$: $\beta_{Work4-5} = 0$
  $H_1$: $\beta_{Work4-5} \neq 0$
  Test statistic: t = 2.945382
  P-value: 0.003225561
  We reject the null and conclude that this coefficient is statistically significant. This means that, holding love constant, there is a significant difference in the log odds of being at most in a specific happiness level between individuals who dislike their job or are looking for a new one and those who actually enjoy their jobs.
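The p-values reported for the three coefficient tests can be checked with a short Python sketch. It assumes that each test statistic is referred to a standard normal distribution in a two-sided test, which is an assumption on our part (the text labels the statistics "t") but is consistent with the reported p-values. The function name two_sided_normal_p and the REPORTED dictionary are illustrative.

```python
import math

ALPHA = 0.05  # significance level used throughout the paper

# (test statistic, reported p-value), copied from the text.
REPORTED = {
    "Love3": (3.204245, 0.00135417),
    "Work3": (1.688251, 0.09136302),
    "Work4-5": (2.945382, 0.003225561),
}


def two_sided_normal_p(z):
    """Two-sided p-value for a statistic referred to a standard normal.

    Uses P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


for name, (stat, p_reported) in REPORTED.items():
    p = two_sided_normal_p(stat)
    decision = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"{name}: computed p = {p:.6f}, reported p = {p_reported:.6f}, {decision}")
```

The decisions printed by the loop follow the same rule as in the text: a coefficient is declared significant only when its p-value falls below 0.05.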
Goodness of Fit
We will now conduct a goodness of fit test of our model. Asymptotically, the deviance has a chi-square distribution. Although the size of our dataset is not ideal for this approximation, we will continue with the test anyway:

$H_0$: Current model has an adequate fit
$H_1$: Saturated model

Test statistic: $\chi^2 = Deviance_{Current\ Model} - Deviance_{Saturated\ Model} = 57.36982 - 0 = 57.36982$
Degrees of freedom: $DF_{Current\ Model} - DF_{Saturated\ Model} = (n - p) - (n - n) = (39 - 6) - (39 - 39) = 33$
P-value: 0.005355549

From the goodness of fit test, we reject the null. Unfortunately, this means that the model we fit is not adequate for predicting the happiness level of a student based on love and work. This is most likely due to the small sample size of the dataset.

Prediction versus Observations
In the following chart, we plotted the predicted probability of being in each level of happiness for each combination of love and work level, along with the actual observed proportions from our data. In agreement with our goodness of fit test, a majority of our predictions were far from the observed proportions.
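The goodness of fit p-value above can be reproduced without a statistics library. The sketch below is one possible way to evaluate the chi-square upper tail probability for integer degrees of freedom using only Python's standard library; the function name chi2_sf is our own, and in practice a library routine such as R's pchisq or SciPy's scipy.stats.chi2.sf would be the usual choice.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(X > x) for a chi-square with integer df.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with h = x / 2,
    starting from Q(1/2, h) = erfc(sqrt(h)) for odd df
    or Q(1, h) = exp(-h) for even df.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Step a up to df / 2, adding one term of the recurrence each time.
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Values copied from the text: deviance difference and degrees of freedom.
deviance = 57.36982
n, p = 39, 6                   # observations and estimated parameters
df = (n - p) - (n - n)         # 33
p_value = chi2_sf(deviance, df)
print(f"chi-square = {deviance}, df = {df}, p-value = {p_value:.9f}")
```

Running the sketch with the deviance of 57.36982 on 33 degrees of freedom should give a p-value of about 0.0054, in line with the value reported in the test.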
Conclusion
From the beginning there were quite a few issues with this dataset. The small size of the dataset, coupled with the relatively large number of groups created by the various combinations of the levels of the predictors and the response, made our dataset extremely sparse. Even after combining levels of the predictors and response, we still ended up with groups that contained either few or no observations. As seen in the chart below under “Freq”, there were 24 total possible categories, but 10 of them contained no observations.
Unfortunately, I would hold the sparsity and small sample size of our dataset responsible for the lack of fit we discovered in our model. To improve the model, we need to collect more observations and perhaps more relevant predictors, such as grade point average or state of health. Thus I would not rely on the specific numbers we obtained in our model to make any real predictions.

Nonetheless, the qualitative conclusions that we can draw from this model can still be valuable. Making a lot of money is the end goal in a lot of people’s lives; however, our model is consistent with the cliché “money doesn’t buy you happiness”. Money is not a significant factor in determining happiness in our model, and perhaps not in our lives either. In addition, whether or not one has satisfactory sex is not important in determining happiness either. An individual could have good sex but still lack the emotional connection necessary to fully enjoy it. Thus it makes sense that the variable love was significant in our model. The variable love represented an individual’s feeling of love and belonging with their friends, family, and community. Knowing you have a group of people that you can trust and rely on is invaluable. Money cannot buy you that type of security.

In addition, it makes sense that the variable work was significant in our model. An individual generally spends 8 hours a day at their job. That’s half of the time we are awake each day. Thus whether or not one enjoys one’s job has a strong effect on a person’s mood. An individual could have a very high paying job, but the euphoria of depositing a paycheck does not last long enough to dull the pain of an unstimulating, monotonous, and unfulfilling job.

Receiving high marks in school, making enough money to live comfortably, and having satisfying sex are all important parts of life.
However, we should not let the pursuit of these things keep us from the things that truly make us happy: friends, family, and community. Especially for those of us about to graduate college, we should keep this in mind when we are choosing between the career we want to pursue and the career that pays a lot. If these two do not coincide, then it may be more valuable to us to take the job that we would enjoy more. But we should bear in mind the most crucial factor in determining emotional state: sample size. Having an adequate sample size has a 100% chance of elevating a statistician’s level of happiness.