2. The use of methods or techniques
that employ numerical data for the
purpose of investigating and
understanding patterns of behavior
or natural phenomena.
3. Objective, i.e. results tend to be
independent from researcher bias.
Provides numerical representations of
behavior or phenomena being studied
Emphasis on consistency (and efficiency)
of results and not its validity
Statistical probabilities are at the core of
most(if not all) formal analyses
4. Examining central tendency and
general patterns in data
Moments (e.g. mean, variance)
Quantiles (e.g. 50 percentile or median)
Correlations (degree of association)
Histograms (distribution of variable)
Plots (e.g. scatter, connected lines, time
plots)
5. Examining differences between 2 or
more groups
Student t test (difference in mean values)
Mann-Whitney U test (difference in median values)
Chi-Squared test (difference in proportions)
ANOVA (>2 groups, 1 variable)
ANCOVA (ANOVA with covariates)
MANOVA (>2 groups, 2 or more variables)
MANCOVA (MANOVA with covariates)
Bayesian approaches (the first three above are
classical or “frequentist” approaches. The remaining
four can be used in both Bayesian and classical
contexts)
6. Exploring relationships
Cluster Analysis (forms groups using a set of
attributes)
Principal Component Analysis (identify patterns
in data with many dimensions, e.g. facial
recognition)
Factor Analysis (models the observed variables
using its relation to unobserved/latent factors,
e.g. intelligence, health)
Canonical Correlation (uses correlations
between variables to derive linear combos, e.g.
responses from two different personality tests)
Finite Mixture Models (regression based cluster
analysis, e.g. identify and explain the behavior of
the “sickly” vs. “healthy” amongst health care
users)
7. Testing a priori relationships and
making predictions
Method of Least Squares (minimize
the sum of squared residuals)
Maximum Likelihood Estimation
(maximize the likelihood function)
The above are common techniques for
conducting regression analyses
8. Types of data
Nominal (e.g.
gender, occupation, ethnicity/race)
Dichotomous (e.g. Yes/No responses, gender)
Ordinal (e.g. income range, education, level
of satisfaction)
Integer/count (e.g. visitation to park, class
absences, number of natural
disasters, accidents, and other
events/outcome data)
Continuous (e.g. age, height, weight)
9. Types of data sets
Cross-sectional (many individuals or
events in a single time period)
Time series (one individual or event
across many time periods)
Panel (many individuals or events
across many time periods)
10. Formulate a specific question that you want
answered (e.g. “What are the monetary
benefits from advanced degrees?”)
Identify the variables you need to conduct an
investigation
(wage, education, experience, other factors)
Formulate a model with the variables that can
be used to help answer your question, e.g.
wage = f(education, experience, other factors)
11. Formulate a hypothesis
Hypothesis: wage and education share a
positive relationship, i.e. benefits are
positive.
Use QA methods to test hypothesis
In classical inference, the statistical or “null”
hypothesis being tested is: ‘no
relationship’, against the alternative
hypothesis: ‘positive relationship’ using p-
values as a reference
In Bayesian inference, the ‘positive
relationship’ hypothesis is tested directly
against other hypotheses by evaluating its
relative likelihood.
12. Descriptive Statistics of Hourly Wage by Education
Highest Level of Education OBS Mean Std. Dev. Median Min Max
NO HIGH SCHOOL DEGREE 3752 11.18 6.05 9.75 0.13 76.04
HIGH SCHOOL DIPLOMA 12846 15.23 8.79 13.02 0.00 90.00
SOME COLLEGE BUT NO DEGREE 8518 16.20 10.14 13.50 0.00 89.00
ASSOCIATES DEGREE(OCCUP/VOC) 2305 18.43 9.61 16.34 0.59 76.04
ASSOCIATES DEGREE(ACADEMIC) 2224 18.92 10.49 16.42 0.03 95.00
BACHELOR'S DEGREE 8876 26.28 15.71 22.20 0.00 99.00
MASTER'S DEGREE 3343 31.60 16.94 27.87 0.00 104.39
PROFFESSIONAL DEGREE 657 41.43 22.17 36.99 2.20 95.96
DOCTORATES DEGREE 615 39.79 19.11 36.73 0.06 89.00
TOTAL 43136 19.72 13.80 15.46 0.00 104.39
Compiled using data from the October Current Population Survey (2005 to 2007)
Not everyone in this sample
were employed
13. Graphs help you
visualize relationships
25
20
Hourly Wage
15
10
Note: Experience = age — yrs. of schooling
—6
5
0 5 10 15 20 25 30 35 40 45 50 55 60
Experience
95% CI Fitted values
14. Regression Results: Wage = f(experience, education)
from Least Squares regression
Source SS df MS Number of obs = 43136
F( 10, 43125)= 1866.93
Model 2481252.4 10 248125.24 Prob > F= 0.000
Residual 5731550.5 43125 132.905519 R-squared= 0.3021
Adj R-squared= 0.302
Total 8212802.9 43135 190.397656 Root MSE= 11.528
WAGE Coef. Std. Error t-Statistic [95% Conf. Interval] Shows all
Experience EXPRNCE 0.900 0.0208 43.26 0.859 0.941
variables (age-
EXPRNCE_SQ -0.013 0.0004 -34.13 -0.014 -0.012
coefficients
y.o.s.-6)
HS DIPLOMA 3.772 0.2140 17.63 3.352 4.191 are significant
SOME COLLEGE 5.825 0.2264 25.73 5.381 6.269 (p-value<0.01)
ASS DEG(OCCUP) 6.785 0.3054 22.21 6.186 7.384
Education
ASS DEG(ACADEMIC) 7.228 0.3088 23.41 6.622 7.833
variables (NO
HS DEGREE is BACH DEGREE 15.168 0.2252 67.34 14.727 15.610
reference) MASTER DEGREE 19.908 0.2749 72.42 19.369 20.446
PROF DEGREE 30.475 0.4881 62.43 29.518 31.432
DOCT DEGREE 28.246 0.5019 56.27 27.262 29.229
Constant -1.509 0.3059 -4.93 -2.109 -0.910
These can be viewed as the partial
effects on wage from a unit change
in an independent variable.
15. Based on the results, an individual would
receive (on average) an additional:
• $19.91/hr for a master’s degree
• $28.25/hr for a doctoral degree
• $30.48/hr for a professional degree
compared to someone who does not
have a high school degree, i.e. benefits are
positive.
We should be cautious in taking the results
too seriously since we ignored “other factors”
such as occupation and gender, which have
been shown in previous studies to also affect
wages, i.e. results may be sensitive to model
specification
16. When transforming variables in your
data, be aware of the mathematical
consequences of:
Taking logarithm of zero or negative
values
Dividing by zero values
Taking the square root of negative values
The above appear as missing values when
attempted with most statistical packages.
Try rescaling variables when possible
17. Check the validity of your results using
common sense or findings from similar
studies. This may be harder to do when
conducting exploratory analysis.
Using Qualitative Analysis in tandem can help
in choosing the best approach and translating
the results into information that can be
applied in the real word, e.g. public policy.
Unfortunately, formal mixing of the two is
not yet common at advanced levels perhaps
because both methods require a certain
degree of training.
18.
19. One approach to help us decide is to evaluate what kind of
return we could expect from the $2 investment given the
overall odds of winning one of the nine possible prizes (1 in
31.8 or about 3%). For simplicity, let’s assume that there
are no other participants which would effect our expected
return. In general the expected return from participating
can be expressed as:
Expected Return = Pr(win)*(prize minus cost)
+ (1 − Pr(win))*(zero minus cost)
= Gain(win) + Loss(not win)
If Expected Return greater than zero (>0), then play; if
less than or equal to zero (≤0), then don’t play.
20. Gain(win) can be found by taking the weighted sum of all
prizes less the $2 cost, i.e. each prize minus $2 times the
probability of winning the particular prize all added
together.
Gain(win) = (1/175,223,510)*($40 mil − $2) +
(1/5,153,633)*($1 mil − $2) +
(1/648,976)*($10K − $2) +
(1/19,088)*($100 − $2) + (1/12,245)*($100 − $2) +
(1/360)*($7 − $2) + (1/706)*($7 − $2) +
(1/111)*($4 − $2) + (1/55)*($4 − $2)
= $0.53
21. Loss(not win) is simply .97*($0 − $2) = − $1.94, i.e. the
97% chance you will not win times the loss of $2.
Your expected return is therefore:
Expected Return = $0.53 − $1.94 = − $1.41 (<0)
So in the long run, you can expect to lose $1.41 from
playing this lottery and therefore, you should not play.
If the secondary prizes are fixed, how large does the jackpot need
to be in order for this lottery to be worth playing?
Answer: more than $287,728,678
Classical approach revolve around examining the probability of data, given a specific hypothesis. Bayesian approach revolve around examining the probability of various hypotheses, given the data.
Well, I hope this has been informative and has helped you decide whether you want to consider quantitative analysis in your research projects. I’ll take any final questions or comments. Thank you.