Introduction to SPSS – Part 1
1. Introduction to SPSS-Part 1
Vignes Gopal Krishna
Fast track PhD student, SLAI fellow, and Research
Assistant
University of Malaya
2. SPSS
• Statistical Package/Product for Social Sciences
(Economics, Sociology, Population Studies, etc.) –
Subjects: People/Society
• Statistical Package/Product for Sciences (SPS)
(Health Sciences, Neurosciences, Medical Sciences,
Economics, Sociology, etc.) –
Subjects: People/Society/Patients/Animals/Neurons
3. • SPSS – Rows × Columns × Cells (RCC)
Rows – Subjects, Columns – Variables, Cells –
Values/Statements
SPSS = Main Inputs (Data View and Variable View) × Outputs (Results)
Additional inputs (Scripts & Syntax)
Advantages
• Deals with the process of quantifying qualitative data
• Numerical presentation of qualitative data (Descriptive and
Inferential Statistics)
• Deals with both parametric and non-parametric approaches
• Deals with Cross Sectional Data, Time Series Data, and Panel
Data
4. SPSS Layout
[Screenshot of the SPSS window, labelled: Rows, Cells, Columns, Icons, Menus]
SPSS –Multi-dimensional Matrix
Will you be able to find the number of rows
and columns?
Data View
Variable View
5. Disadvantages
• Advanced modelling and quantitative techniques are
not available through the menus
• Advanced techniques for handling data types are
not available through the menus
Common measurement
(a) Categorical variable (CAV) – Nominal & Ordinal
(b) Continuous variable (COV) – Scale (Ratio & Interval)
(c) String – Qualitative statements (not important in
SPSS; handled by NVivo, QDA Miner, Dedoose, ATLAS.ti, etc.)
6. A classification variable is a partial element of a
categorical variable.
Classification variable – a variable used to classify
qualitative arguments/statements: variable by
categories (categorical variable) + variable by
statements (non-categorical variable)
Categorical variable
(a)Dichotomous variable (Binomial) – 2 values – NO /
OR – Independent & Dependent samples
(b)Polychotomous variables (Multinomial)- >2 values
– NO/OR –Independent & Dependent samples
7. Categorical variable
(a) Constant and fixed
(b) Separated by categories
(c) Gradual change = 0, static
(d) Nominal (no order) and Ordinal (order/rank)
Continuous variables
(a) Not constant or fixed
(b) Separated by ratios and intervals
(c) Gradual change != 0, dynamic
8. Types of Variables
(a) Binary (bi + nary) variable = a variable with 2 groups coded 0 and 1. Examples: Gender (0=Male, 1=Female), Case and
Control (0=Healthy, 1=Disease), Fluctuations (0=Increase, 1=Decrease).
(b) Dichotomous variable = a variable with 2 groups (can be any 2 values). Examples: Gender (2=Male, 3=Female), Case
and Control (0=Before Treatment, 1=Present Treatment)
(c) Independent variable = stand-alone variable – Cor(x1, x2, x3) = 0 – Predictor/Regressor/Indicator
(d) Dependent variable = relies on other factors – Cor(y, x1, x2) != 0 – Predictand/Regressand/Outcome
(e) Confounding variable = distorts the effect of one variable on another. Matching reduces the
effects of confounding.
(f) Control variable – held constant or adjusted for, so that the effect of the IV on the DV can be isolated.
(g) Controlled variable – another term for Dependent Variable
(h) Instrumental variable – a variable that has zero correlation with the residuals/error terms but is correlated with the
endogenous explanatory variable (and so with the dependent variable only through it)
(i) Criterion variable – a variable with a presumed effect – non-experimental research
(j) Discrete variable – a variable that takes distinct values
(k) Dummy variable – same as a binary variable – classification variable
(l) Endogenous variable – inside the system – influenced by variables entering the system.
(m) Exogenous variable – outside the system – enters the system and influences the endogenous variable
(n) Interval variable – a form of scale variable
(o) Ratio variable – a form of scale variable
(p) Intervening variable – intervenes in the association between the main variables – moderating and mediating
variables
(q) Mediating variable – carries an indirect effect in the association between the main variables
(r) Moderating variable – affects the association between the main variables through interaction effects with related variables
9. (s) Polychotomous variable – takes more than 2
values/groups
(t) Manifest variable – an indicator variable that signals the
presence of a latent variable
(u) Latent variable – a variable that cannot be measured directly;
it has to be inferred from manifest variables.
(v) Manipulated variable – same as IV
(w) Outcome variable – same as DV – presumed effect
(x) Predictor variable – same as IV – presumed cause
(y) Nominal variable – takes any value – does not follow
orders/ranks
(z) Ordinal variable – takes values based on orders/ranks.
* Treatment variable – same as IV
10. Types of Quantitative Data
(a) Time Series Data – data follow a sequence in time – single
country/industry/activity/firm/organization/stock
market/society, etc. – multiple sampling periods
(b) Cross Sectional Data – data compare various
subjects (countries/industries/activities/firms) at a single point in
time
(c) Panel Data – Time Series Data + Cross Sectional Data – with
different characteristics
(d) Pooled Data – combined data – with similar
characteristics
(e) Longitudinal Data – wider scope of data – variation over
time
11. Types of Qualitative Data
(a) Factual Data – Demographic data (marital status, level of
education, age, position, etc.) – (experimental and
non-experimental data) – Yes/No versus Yes/No/Don’t know;
True or False. Which one is preferable?
(b) Positive and Normative Data – actual versus predicted,
agreement to disagreement, likes to dislikes
(c) Logical Arguments – True or False
(d) Boolean Statements – AND, OR, NOT
12. Likert Scale(LS) and Scale(S)
LS != S
For example:-
5 Levels of Likert Scale
1=Strongly Agree
2=Agree
3=Neither Agree nor Disagree
4=Disagree
5=Strongly Disagree
In a normal case, Scale refers
to ratio or interval?
13. Sample and Population
The relationship between sample and population can
be pictured as a donut
Which one is good?
“RVRCNB” Approach
14. Parameter and Statistics
Parameter = Population(Actual)
Statistics = Sample(Prediction)
Y=β0 + β1X1 + β2X2 + ε (Parameter)
PY=Pβ0 + Pβ1X1 + Pβ2X2 + Pε (Statistics)
Statistics ~ Parameter (the actual population is
unknown, so it is estimated from the sample)
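To make the parameter/statistic distinction concrete, here is a minimal sketch in Python. It is an illustration only, not part of SPSS: the "population" is simulated, the model is simplified to one predictor, and the names `B0`, `B1`, and `ols_fit` are hypothetical. It shows how coefficients estimated from a sample come close to the population values without matching them exactly.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: Y = B0 + B1*X + error, with known parameters.
B0, B1 = 2.0, 0.5
population = []
for _ in range(20_000):
    x = random.uniform(0, 10)
    y = B0 + B1 * x + random.gauss(0, 1)
    population.append((x, y))

def ols_fit(pairs):
    """Return (intercept, slope) of the least-squares line through (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

sample = random.sample(population, 500)  # simple random sample, n = 500
b0_hat, b1_hat = ols_fit(sample)         # statistics estimated from the sample

print(f"parameters: B0={B0}, B1={B1}")
print(f"statistics: b0={b0_hat:.3f}, b1={b1_hat:.3f}")
```

In real research the population values are unknown, so only the second line of output would be available.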
15. Descriptive and Inferential Statistics
*For single-purpose or multi-purpose quantitative analysis
*Descriptive = Describe + Narrative (describing subjects) – Single Purpose (SP)
*Inferential = Investigation + Narrative (investigating subjects) – Multi Purposes (MP)
Descriptive Analysis – Quantitative research
(a) Descriptive Statistics (Continuous variables) – [Mean, Median, Variance, Standard
deviation, Max, Min, Range, Skewness, Kurtosis, Standard error of mean, Histogram
with normal curve, Normal Q-Q plot, Normal P-P plot] – Uni-variate
(b) Frequency Distribution (Categorical variables) – [Mode (based on frequency), Median,
Variance and Standard Deviation, Max, Min, Range] – Uni-variate
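The descriptive measures listed above can be sketched outside SPSS with Python's standard library. The data values below are hypothetical and chosen only for illustration. The first part treats a continuous (scale) variable, the second a categorical one.

```python
import statistics as st
from collections import Counter

# Hypothetical continuous (scale) variable
scores = [12, 15, 15, 18, 20, 22, 22, 22, 25, 30]

n = len(scores)
mean = st.mean(scores)
median = st.median(scores)
mode = st.mode(scores)
variance = st.variance(scores)        # sample variance (n - 1 denominator)
std_dev = st.stdev(scores)            # sample standard deviation
std_error = std_dev / n ** 0.5        # standard error of the mean
data_range = max(scores) - min(scores)

print(mean, median, mode, data_range)
print(round(variance, 3), round(std_dev, 3), round(std_error, 3))

# Hypothetical categorical variable: a frequency distribution
marital_status = ["Single", "Married", "Married", "Single", "Divorced", "Married"]
frequencies = Counter(marital_status)
print(frequencies.most_common())      # categories ordered by frequency
```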
Inferential Analysis – Quantitative research
(a) Normality tests – hypothesis testing – SPSS (Shapiro-Wilk and Kolmogorov-Smirnov)
(b) Non-normality tests – hypothesis testing – SPSS (One-Sample Kolmogorov-Smirnov
tests for uniform, Poisson, and exponential distributions); others are possible through
Scripts and Syntax
(c) Mean differences – Single mean test, One sample t-test, Two samples (Independent
and Dependent sample tests)
(d) Association – Linear and Non-Linear modes of regression
(e) Correlation – Linear and Non-Linear modes of correlation
16. Types of Sampling
All research starts with a single purpose or multiple
purposes … Purposive Sampling
Additional types of sampling
(a) Simple random sampling – samples are selected
randomly, each with an equal probability of selection – unbiased sampling
(b) Systematic sampling – samples are selected at regular
intervals from an ordered sampling frame
(c) Stratified sampling – the population is divided into
homogeneous subgroups (strata) and samples are drawn from each
(d) Cluster sampling – the population is divided into groups
(clusters) with similar characteristics, and whole groups are selected.
(e) Convenience sampling – easy sampling – choose the groups of
interest that are easiest to reach.
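The probability-based sampling types above can be sketched with Python's `random` module. This is an illustration under stated assumptions: the sampling frame of 20 subjects, the two strata, and the clusters of five are all hypothetical. Convenience sampling is left out because it involves no random selection.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible
frame = [f"S{i:02d}" for i in range(1, 21)]  # hypothetical frame of 20 subjects

# (a) Simple random sampling: every subject has the same chance of selection.
simple = random.sample(frame, 5)

# (b) Systematic sampling: random start, then every k-th subject in the ordered frame.
k = len(frame) // 5
start = random.randrange(k)
systematic = frame[start::k]

# (c) Stratified sampling: draw separately within each homogeneous subgroup.
strata = {"group_1": frame[:10], "group_2": frame[10:]}  # hypothetical strata
stratified = [s for group in strata.values() for s in random.sample(group, 2)]

# (d) Cluster sampling: pick whole groups at random and keep all their members.
clusters = [frame[i:i + 5] for i in range(0, len(frame), 5)]
cluster_sample = [s for c in random.sample(clusters, 2) for s in c]

print(simple, systematic, stratified, cluster_sample, sep="\n")
```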
What type of research?
17. Sampling with and without
replacement
*Both are tied to the probability of sample selection.
*For example:
Let’s say that we have some letters (A, B, C, D, E) …
(a) Sampling with replacement – Select one letter first and put it back into the sample space. Two letters are
chosen. The sample space can be presented as below:
AA, AB, AC, AD, AE
BA, BB, BC, BD, BE
CA, CB, CC, CD, CE
DA, DB, DC, DD, DE
EA, EB, EC, ED, EE
The probability of choosing at least one letter “A” [AA, AB, AC, AD, AE, BA, CA, DA, EA]: Probability = 9/25 = 0.36
(b) Sampling without replacement – Select one letter first and do not put it back into the sample space, so the same
letter cannot be selected twice. Using the previous example, in which two letters are chosen, the sample
space is:
AB, AC, AD, AE
BA, BC, BD, BE
CA, CB, CD, CE
DA, DB, DC, DE
EA, EB, EC, ED
The probability of choosing at least one letter “A” [AB, AC, AD, AE, BA, CA, DA, EA]: Probability = 8/20 = 0.4
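The two sample spaces can be checked by enumeration. The sketch below uses Python's `itertools`. `product` lists ordered pairs with replacement, and `permutations` lists ordered pairs without replacement, so pairs such as AA or BB do not occur. The helper name `prob_at_least_one` is only for illustration.

```python
from itertools import permutations, product

letters = "ABCDE"

# Ordered pairs when the first letter is put back before the second draw
with_replacement = ["".join(p) for p in product(letters, repeat=2)]

# Ordered pairs when the first letter is NOT put back (no AA, BB, ...)
without_replacement = ["".join(p) for p in permutations(letters, 2)]

def prob_at_least_one(space, letter="A"):
    """Share of equally likely outcomes that contain the given letter."""
    return sum(letter in outcome for outcome in space) / len(space)

print(len(with_replacement), prob_at_least_one(with_replacement))        # 25 0.36
print(len(without_replacement), prob_at_least_one(without_replacement))  # 20 0.4
```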
18. Dependent and Independent Samples
Dependent Samples – Same subjects at different
levels (Very Highly Correlated)
Independent Samples – Different subjects at same
and different levels.(Low and Moderate
Correlations)
[Diagram: Population 1 with Sample 1 and Sample 2; Population 1 with Sample 3 and Sample 4 – independent and dependent samples]
19. Sample Size
• Should be representative of population size(N)
• In a general/normal case, n >= pN(p=0.5 and above)
• Manual computations of sample size(n)
Margin of errors/Standard errors in percentage (when
population size is unknown)
$ME = z\sqrt{PP(1-PP)/n}$
$n = z^{2}\,PP(1-PP)/ME^{2}$
Computation of sample size with finite population correction factor
$n_{\text{adjusted}} = nN / (n + (N-1))$
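The manual computations can be sketched in Python as follows. The sketch reads the slide's formulas as ME = z·sqrt(PP(1−PP)/n), n = z²·PP(1−PP)/ME², and adjusted n = nN/(n + (N−1)), with PP taken to be the estimated proportion. The inputs (z = 1.96, PP = 0.5, ME = 0.05, N = 2000) are illustrative assumptions, not values from the slides.

```python
import math

def margin_of_error(z, p, n):
    """Margin of error for a proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

def sample_size(z, p, me):
    """Sample size needed for a target margin of error (population size unknown)."""
    return z ** 2 * p * (1 - p) / me ** 2

def finite_population_correction(n, N):
    """Adjust the sample size n for a finite population of size N."""
    return n * N / (n + (N - 1))

# Assumed inputs: 95% confidence (z = 1.96), p = 0.5, 5% margin of error
n0 = sample_size(z=1.96, p=0.5, me=0.05)
n_adj = finite_population_correction(n0, N=2000)

# Round up, because a fraction of a respondent cannot be sampled
print(math.ceil(n0), math.ceil(n_adj))
```

Plugging `n0` back into `margin_of_error` returns the 5% target, which is a quick consistency check on the two formulas.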
22. Useful software for selecting
the sample size
(a) G*Power (http://www.gpower.hhu.de/)
(b) Power sample size
(http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize)
(c) Power Analysis & Sample Size
(http://www.ncss.com/software/pass/)
24. Introduction
The terms “parametric” and “non-parametric”
were coined by Jacob Wolfowitz in 1942.
Parametric – (distribution is known)
Non-parametric – (distribution is unknown)
In my view, this is just a general idea in
statistics, and it should serve as a benchmark
or baseline for developing different lines of
thinking about statistical tests.
25. Characteristics of parametric approach
(a) Data – follow a probability distribution
(b) Tied to probability sampling (simple random sampling,
stratified random sampling, systematic random sampling, random cluster,
stratified random cluster, complex multi-stage random, random mode of
purposive sampling)
(c) Deals with statistical inference on the distributions of parameters
(d) Always linked with linearity of data (variables and
errors/residuals (uncertainty))
(e) Patterns of data (variables and errors/residuals) follow
homogeneity
(f) Follows strict assumptions (robust = if the assumptions are
fulfilled)
I would classify this approach as the classical approach because it
does not follow the evolutionary direction of momentum.
26. Assumptions of parametric approach
(a) Linearity of parameters
(b) Homogeneity of the existing variables and the
omitted variables (error terms/residuals) – symmetrical
distribution.
(c) Dependent variables/residuals should be normally distributed.
(d) Randomness among the selected samples should be maintained
(only if random sampling is used)
(e) Extensive use of non-categorical (continuous) variables
in the statistical tests.
(f) Minimization of outliers
(g) Mean, Mode, and Median of the variables are approximately the
same (for the case of normal distribution) – Bell-Shaped Normal
Curve.
(h) Doesn’t deal with re-sampling (bootstrapping)
27. Identifying the statistical approach is an
important step to take before moving on to
the statistical tests themselves.
Distributional tests are needed to determine the
nature of the data (variables and residuals).
In simple terms,
Parametric – follows normal distribution
Non-parametric – distribution-free
28. Distribution tests of normality
Graphical approach
(a) Histogram with normal curve
(b) Box plot
(c) Normal Q-Q plot
(d)Normal P-P plot
(e) Leverage Plot
29. Numerical approach
Uni-variate tests
(a) Jarque-Bera test
(b) Coefficient of variation
(c) Coefficients of Skewness and Kurtosis
(d) Kolmogorov-Smirnov test
(e) Shapiro-Wilk test
(f) Shapiro-Francia test
(g) Anderson-Darling test
Multi-variate tests
(a) Multivariate tests of normality
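As an illustration of the numerical approach, the sketch below computes the coefficients of skewness and kurtosis and combines them into the Jarque-Bera statistic, JB = n/6 · (S² + (K − 3)²/4). It uses only the Python standard library. The p-value relies on the asymptotic chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(−JB/2), so it is rough for small samples. The two data sets are simulated and purely hypothetical.

```python
import math
import random

def jarque_bera(values):
    """Return (skewness, kurtosis, JB statistic, asymptotic p-value)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                             # equals 3 for a normal curve
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)                     # chi-square (2 df) upper tail
    return skew, kurt, jb, p_value

random.seed(0)
normal_like = [random.gauss(50, 10) for _ in range(500)]   # simulated, bell-shaped
skewed = [random.expovariate(1.0) for _ in range(500)]     # simulated, right-skewed

for name, data in [("normal-like", normal_like), ("skewed", skewed)]:
    s, k, jb, p = jarque_bera(data)
    print(f"{name}: skew={s:.2f}, kurtosis={k:.2f}, JB={jb:.2f}, p={p:.4f}")
```

A small p-value is evidence against normality and would point toward the non-parametric tests.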
30. Parametric tests of correlation
(a) Pearson product-moment correlation coefficient (Bivariate analysis)
(b) Stepwise mode of linear regression (Multivariate analysis)
(c) Auxiliary mode of linear regression (Multivariate analysis)
(d) Scatter plot/Scatterplot matrix with fitted line (linear form) (Bivariate
analysis)
Non-parametric tests of correlation
(a) Spearman rank correlation (Bivariate analysis)
(b) Kendall’s tau rank correlation (Bivariate analysis)
(c) Stepwise mode of Non-linear regression (Multivariate analysis)
(d) Auxiliary mode of Non-Linear regression (Multivariate analysis)
(e) Scatter plot/Scatterplot matrix with fitted line (non-linear form)
(Bivariate analysis)
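The difference between the parametric and non-parametric correlation coefficients can be seen in a small sketch. Pearson's coefficient works on the raw values, while Spearman's is Pearson's coefficient applied to ranks, with tied values given their average rank. The functions are written from scratch for illustration and the data are hypothetical.

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]                   # monotonic but non-linear relationship

print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Because `y` rises with `x` without being a straight line, the rank-based coefficient reaches 1 while Pearson's stays below 1.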
31. Parametric tests of associations
(a) Linear regression (Bivariate and Multivariate)
(b) Stepwise mode of Linear regression(Bivariate and Multivariate)
(c) Auxiliary mode of Linear regression(Bivariate and Multivariate)
(d) Linear mode of co-integration tests
(e) Linear mode of causality tests
Non-parametric tests of associations
(a) Non-Linear regression (Bivariate and Multivariate)
(b) Logistic regression (LR) – DV (categorical variable)
* Ordered LR (Ordinal variable)
* Un-ordered LR (Nominal variable)
(c) Correspondence Analysis –
independent samples (Pearson Chi-Square, Contingency Coefficient
(Nominal), Phi and Cramer’s V (Nominal), Lambda (Nominal))
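For the nominal measures of association named above, the following sketch computes the Pearson Chi-Square statistic, Cramer's V, and the Contingency Coefficient for a cross-tabulation. The 2 × 2 table of counts is hypothetical, and the functions are written from scratch for illustration rather than taken from SPSS. For a 2 × 2 table, Cramer's V equals the absolute value of Phi.

```python
def chi_square(table):
    """Pearson chi-square statistic and total count for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """Cramer's V: chi-square scaled by n and the smaller table dimension."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return (chi2 / (n * (k - 1))) ** 0.5

def contingency_coefficient(table):
    """Pearson's contingency coefficient C."""
    chi2, n = chi_square(table)
    return (chi2 / (chi2 + n)) ** 0.5

# Hypothetical counts: rows = two groups, columns = two outcome categories
observed = [[30, 10],
            [20, 40]]

chi2, n = chi_square(observed)
print(round(chi2, 3), round(cramers_v(observed), 3),
      round(contingency_coefficient(observed), 3))
```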
32. Main features of SPSS – Inferential Statistics
Regression
Parametric
• Linear Regression
• Linear Curve Estimation
• Linear Weight Estimation & different types of estimation
• Probit Regression
• Tobit Regression
• Linear mode of Scatter plot
• Linear mode of Leverage plot and Residual plot
• Simultaneous regression
Non-Parametric
• Non-Linear Regression
• Non-Linear Curve Estimation
• Non-Linear Weight Estimation & different types of estimation
33. Non-Parametric Regression
• Logit Regression
• Non-Linear mode of Scatter plot
• Non-Linear mode of Leverage plot and Residual plot
• Non-Linear mode of Simultaneous equation
Parametric correlation
• Pearson correlation
• Linear mode of Stepwise regression
• Linear mode of Auxiliary regression
• VIF & Tolerance value
• Linear mode of Scatter plot
35. Parametric mode (PM) of testing on differences
Single test of mean
One sample t-test
Two sample t-test
Dependent Samples
*Paired sample t-test
*ANOVA repeated measures
Independent Samples
*Independent Sample t-test
*ANOVA – one way/two way/multiple factors
*MANOVA, GANOVA, SPANOVA, ANCOVA, MANCOVA, SPANCOVA
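To show how the dependent and independent two-sample tests differ, here is a minimal sketch of the two t statistics in Python. The paired test works on the differences within each subject. The independent test shown is the pooled-variance (equal variances assumed) version. Only the test statistic and degrees of freedom are computed, since the p-value needs the t distribution, which is not in the standard library. All data are hypothetical.

```python
import math
import statistics as st

def paired_t(before, after):
    """Paired-sample t statistic and degrees of freedom (same subjects, two levels)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = st.mean(diffs) / (st.stdev(diffs) / math.sqrt(n))
    return t, n - 1

def independent_t(group1, group2):
    """Independent-sample t statistic with pooled variance, and degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * st.variance(group1)
                  + (n2 - 1) * st.variance(group2)) / (n1 + n2 - 2)
    t = (st.mean(group1) - st.mean(group2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical scores for the same four subjects before and after a treatment
before = [10, 12, 14, 16]
after = [11, 14, 15, 18]
print(paired_t(before, after))

# Hypothetical scores for two different groups of subjects
print(independent_t([1, 2, 3], [4, 5, 6]))
```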
36. Non-Parametric mode of testing on differences
Chi-Square test
Binomial test
2 sample tests
Dependent samples
*Wilcoxon test
*Sign test
*McNemar test
*Marginal Homogeneity test
*Friedman test
*Kendall’s W test
*Cochran’s Q test
Independent samples
*Mann Whitney U test
*Moses extreme reactions
*Kolmogorov-Smirnov Z
*Wald-Wolfowitz runs test
*Kruskal-Wallis H test
*Median test
*Jonckheere-Terpstra test
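As one example from the independent-samples list, the Mann Whitney U statistic can be sketched by direct pair counting: each pair of one value from each group scores 1 if the first group's value is larger, 0.5 if the two are tied, and 0 otherwise. This is an illustrative from-scratch version with hypothetical data. It returns only the statistic, not the p-value that SPSS reports.

```python
def mann_whitney_u(group1, group2):
    """Return (U, U1, U2), where U is the smaller of the two U values."""
    u1 = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group1
        for b in group2
    )
    u2 = len(group1) * len(group2) - u1   # U1 + U2 always equals n1 * n2
    return min(u1, u2), u1, u2

# Hypothetical scores from two different groups of subjects
separated = mann_whitney_u([1, 2, 3], [4, 5, 6])     # no overlap between groups
overlapping = mann_whitney_u([3, 5, 7], [4, 5, 6])   # groups overlap, one tie

print(separated)
print(overlapping)
```

A U of 0 means the two groups do not overlap at all, while a U near n1·n2/2 means the groups are thoroughly mixed.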