UNIT 5
Topics to be covered
• Unit-V:
Data Analysis: Editing, Coding, Tabular representation of data, Graphical
Representation of Data.
• Questionnaire Construction, Content Analysis, Validity and Reliability Test.
• Descriptive Statistics and Probability: Measures of Central Tendency,
Dispersion, Skewness & Kurtosis, Probability and Laws, Random Variable,
Expectation.
• Probability Distribution and Sampling: Discrete, Binomial, Poisson,
Continuous, Normal Sampling Distribution, Statistical Estimation.
• Multivariate Data Analysis: Factor Analysis, Cluster Analysis, Discriminant
Analysis, Multidimensional Scaling, Conjoint Analysis.
Fieldwork/Data Collection Process
Selecting Field Workers
Training Field Workers
Supervising Field Workers
Validating Fieldwork
Evaluating Field Workers
Selection of Field Workers
The researcher should:
• Develop job specifications for the project, taking into
account the mode of data collection.
• Decide what characteristics the field workers should
have.
• Recruit appropriate individuals.
General Qualifications of Field Workers
• Healthy. Field workers must have the stamina required
to do the job.
• Outgoing. The interviewers should be able to establish
rapport with the respondents.
• Communicative. Effective speaking and listening skills
are a great asset.
• Pleasant appearance. If the field worker's physical
appearance is unpleasant or unusual, the data collected
may be biased.
• Educated. Interviewers must have good reading and
writing skills.
• Experienced. Experienced interviewers are likely to do
a better job.
Training of Field Workers
• Making the Initial Contact – Interviewers should be trained to make
opening remarks that will convince potential respondents that their
participation is important.
• Asking the Questions
1. Be thoroughly familiar with the questionnaire.
2. Ask the questions in the order in which they appear in the
questionnaire.
3. Use the exact wording given in the questionnaire.
4. Read each question slowly.
5. Repeat questions that are not understood.
6. Ask every applicable question.
7. Follow instructions, skip patterns, probe carefully.
Training of Field Workers
• Probing – Some commonly used probing
techniques:
1. Repeating the question.
2. Repeating the respondent's reply.
3. Using a pause or silent probe.
4. Boosting or reassuring the respondent.
5. Eliciting clarification.
6. Using objective/neutral questions or comments.
Training of Field Workers
• Recording the Answers – Guidelines for recording answers to
unstructured questions:
1. Record responses during the interview.
2. Use the respondent's own words.
3. Do not summarize or paraphrase the respondent's answers.
4. Include everything that pertains to the question objectives.
5. Include all probes and comments.
6. Repeat the response as it is written down.
• Terminating the Interview – The respondent should be left with a positive
feeling about the interview.
Guidelines on Interviewer Training: The Council of American Survey Research Organizations
Training should be conducted under the direction of supervisory personnel and should cover the following:
1) The research process: how a study is developed, implemented & reported.
2) Importance of interviewers; need for honesty, objectivity & professionalism.
3) Confidentiality of the respondent & client.
4) Familiarity with market research terminology.
5) Importance of following the exact wording & recording responses verbatim.
6) Purpose & use of probing & clarifying techniques.
7) The reason for & use of classification & respondent information questions.
8) A review of samples of instructions & questionnaires.
9) Importance of the respondent’s positive feelings about survey research.
An interviewer must be trained in the interviewing techniques outlined above.
Guidelines on Supervision: The Council of American Survey Research Organizations
All research projects should be properly supervised. It is the data collection agency’s responsibility to:
1) Properly supervise interviews.
2) See that an agreed-upon proportion of interviewers’ telephone calls are monitored.
3) Be available to report on the status of the project daily to the project director, unless otherwise instructed.
4) Keep all studies, materials, and findings confidential.
5) Notify concerned parties if the anticipated schedule is not met.
6) Attend all interviewer briefings.
7) Keep current & accurate records of the interviewing progress.
8) Make sure all interviewers have all materials in time.
9) Edit each questionnaire.
10) Provide consistent & positive feedback to the interviewers.
11) Not falsify any work.
Supervision of Field Workers
Supervision of field workers means making sure that they are following the procedures and techniques in which
they were trained. Supervision involves quality control and editing, sampling control, control of cheating, and central
office control.
• Quality Control and Editing – This requires checking to see if the field procedures are being
properly implemented.
• Sampling Control – The supervisor attempts to ensure that the interviewers are strictly
following the sampling plan.
• Control of Cheating – Cheating can be minimized through proper training, supervision, and
validation.
• Central Office Control – Supervisors provide quality and cost-control information to the central
office.
Validation of Fieldwork
Validation:
• The supervisors call 10 - 25% of the respondents to inquire whether the field
workers actually conducted the interviews.
• The supervisors ask about the length and quality of the interview, reaction to
the interviewer, and basic demographic data.
• The demographic information is cross-checked against the information
reported by the interviewers on the questionnaires.
Evaluation of Field Workers
• Cost and Time. The interviewers can be compared in terms of the total cost (salary and expenses) per
completed interview.
• Response Rates. It is important to monitor response rates on a timely basis so that corrective action can be
taken if these rates are too low.
• Quality of Interviewing. To evaluate interviewers on the quality of interviewing, the supervisor must directly
observe the interviewing process.
• Quality of Data. The completed questionnaires of each interviewer should be evaluated for the quality of data.
Data Preparation Process
Select Data Analysis Strategy
Prepare Preliminary Plan of Data Analysis
Check Questionnaire
Edit
Code
Transcribe
Clean Data
Statistically Adjust the Data
Questionnaire Checking
A questionnaire returned from the field may be unacceptable for several reasons.
• Parts of the questionnaire may be incomplete.
• The pattern of responses may indicate that the respondent did not understand or follow the instructions.
• The responses show little variance.
• One or more pages are missing.
• The questionnaire is received after the preestablished cutoff date.
• The questionnaire is answered by someone who does not qualify for participation.
Editing
Treatment of Unsatisfactory Results
• Returning to the Field – The questionnaires with unsatisfactory responses may be
returned to the field, where the interviewers recontact the respondents.
• Assigning Missing Values – If returning the questionnaires to the field is not feasible,
the editor may assign missing values to unsatisfactory responses.
• Discarding Unsatisfactory Respondents – In this approach, the respondents with
unsatisfactory responses are simply discarded.
Coding
Coding means assigning a code, usually a number, to each possible response to each question. The code includes
an indication of the column position (field) and data record it will occupy.
Coding Questions
• Fixed field codes, which mean that the number of records for each respondent is the same and the same data appear
in the same column(s) for all respondents, are highly desirable.
• If possible, standard codes should be used for missing data. Coding of structured questions is relatively simple,
since the response options are predetermined.
• In questions that permit a large number of responses, each possible response option should be assigned a separate
column.
Coding
Guidelines for Coding Unstructured Questions:
• Category codes should be mutually exclusive and collectively exhaustive.
• Only a few (10% or less) of the responses should fall into the “other” category.
• Category codes should be assigned for critical issues even if no one has
mentioned them.
• Data should be coded to retain as much detail as possible.
Codebook
A codebook contains coding instructions and the necessary information about
variables in the data set. A codebook generally contains the following
information:
• column number
• record number
• variable number
• variable name
• question number
• instructions for coding
Coding Questionnaires
• The respondent code and the record number appear on each record in the data.
• The first record contains the additional codes: project code, interviewer code,
date and time codes, and validation code.
• It is a good practice to insert blanks between parts.
ID PREFER. QUALITY QUANTITY VALUE SERVICE INCOME
1 2 2 3 1 3 6
2 6 5 6 5 7 2
3 4 4 3 4 5 3
4 1 2 1 1 2 5
5 7 6 6 5 4 1
6 5 4 4 5 4 3
7 2 2 3 2 3 5
8 3 3 4 2 3 4
9 7 6 7 6 5 2
10 2 3 2 2 2 5
11 2 3 2 1 3 6
12 6 6 6 6 7 2
13 4 4 3 3 4 3
14 1 1 3 1 2 4
15 7 7 5 5 4 2
16 5 5 4 5 5 3
17 2 3 1 2 3 4
18 4 4 3 3 3 3
19 7 5 5 7 5 5
20 3 2 2 3 3 3
Restaurant Preference
SPSS Variable View of the Data
Codebook Excerpt
Column  Variable  Variable    Question  Coding
Number  Number    Name        Number    Instructions
1       1         ID                    1 to 20 as coded
2       2         Preference  1         Input the number circled. 1 = Weak Preference, 7 = Strong Preference
3       3         Quality     2         Input the number circled. 1 = Poor, 7 = Excellent
4       4         Quantity    3         Input the number circled. 1 = Poor, 7 = Excellent
5       5         Value       4         Input the number circled. 1 = Poor, 7 = Excellent
6       6         Service     5         Input the number circled. 1 = Poor, 7 = Excellent
7       7         Income      6         Input the number circled.
                                        1 = Less than $20,000
                                        2 = $20,000 to 34,999
                                        3 = $35,000 to 49,999
                                        4 = $50,000 to 74,999
                                        5 = $75,000 to 99,999
                                        6 = $100,000 or more
Example of Questionnaire Coding
Finally, in this part of the questionnaire we would like to ask you some background information for
classification purposes.
PART D Record #7
1. This questionnaire was answered by (29)
1. _____ Primarily the male head of household
2. _____ Primarily the female head of household
3. _____ Jointly by the male and female heads of household
2. Marital Status (30)
1. _____ Married
2. _____ Never Married
3. _____ Divorced/Separated/Widowed
3. What is the total number of family members living at home? _____ (31 - 32)
4. Number of children living at home:
a. Under six years _____ (33)
b. Over six years _____ (34)
5. Number of children not living at home _____ (35)
6. Number of years of formal education which you (and your spouse, if
applicable) have completed. (please circle)
College
High School Undergraduate Graduate
a. You 8 or less 9 10 11 12 13 14 15 16 17 18 19 20 21 22 or more (36-37)
b. Spouse 8 or less 9 10 11 12 13 14 15 16 17 18 19 20 21 22 or more (38-39)
7. a. Your age: (40-41)
b. Age of spouse (if applicable) (42-43)
8. If employed please indicate your household's occupations by checking the
appropriate category.
44 45
Male Head Female Head
1. Professional and technical
2. Managers and administrators
3. Sales workers
4. Clerical and kindred workers
5. Craftsman/operative /laborers
6. Homemakers
7. Others (please specify)
8. Not applicable
9. Is your place of residence presently owned by household? (46)
1. Owned _____
2. Rented _____
10. How many years have you been residing in the greater Atlanta area?
years. (47-48)
Data Transcription
[Flowchart] Raw Data is transcribed by one of:
• CATI/CAPI
• Keypunching via CRT Terminal (followed by Verification: Correct Keypunching Errors)
• Digital Technology
• Optical Recognition
• Bar Code & Other Technologies
The result is stored in Computer Memory, on Disks, or in Other Storage as Transcribed Data.
Data Cleaning Consistency Checks
Consistency checks identify data that are out of range, logically inconsistent,
or have extreme values.
• Computer packages like SPSS, SAS, EXCEL and MINITAB can be programmed to
identify out-of-range values for each variable and print out the respondent code, variable
code, variable name, record number, column number, and out-of-range value.
• Extreme values should be closely examined.
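A rough illustration of an out-of-range check, written in Python rather than one of the packages named above. The `VALID_RANGES` mapping and the `find_out_of_range` helper are hypothetical names; the ranges follow the restaurant-preference codebook (1 to 7 for the ratings, 1 to 6 for income), and the third record contains a deliberately invented keying error that is not part of the original data.

```python
# Allowed code range per variable, following the codebook excerpt.
VALID_RANGES = {
    "PREFER": (1, 7), "QUALITY": (1, 7), "QUANTITY": (1, 7),
    "VALUE": (1, 7), "SERVICE": (1, 7), "INCOME": (1, 6),
}

def find_out_of_range(records):
    """Return (respondent ID, variable name, value) for every out-of-range code."""
    problems = []
    for rec in records:
        for var, (low, high) in VALID_RANGES.items():
            value = rec[var]
            if not low <= value <= high:
                problems.append((rec["ID"], var, value))
    return problems

records = [
    {"ID": 1, "PREFER": 2, "QUALITY": 2, "QUANTITY": 3, "VALUE": 1, "SERVICE": 3, "INCOME": 6},
    {"ID": 2, "PREFER": 6, "QUALITY": 5, "QUANTITY": 6, "VALUE": 5, "SERVICE": 7, "INCOME": 2},
    # Hypothetical keying error: 9 is outside the 1-7 scale.
    {"ID": 3, "PREFER": 4, "QUALITY": 9, "QUANTITY": 3, "VALUE": 4, "SERVICE": 5, "INCOME": 3},
]

print(find_out_of_range(records))  # [(3, 'QUALITY', 9)]
```

A flagged value would then be checked against the original questionnaire before deciding how to treat it.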
Data Cleaning Treatment of Missing Responses
• Substitute a Neutral Value – A neutral value, typically the mean response to the variable, is substituted
for the missing responses.
• Substitute an Imputed Response – The respondent's pattern of responses to other questions is used to
impute or calculate a suitable response to the missing questions.
• Casewise Deletion – Cases, or respondents, with any missing responses are discarded from the analysis.
• Pairwise Deletion – Instead of discarding all cases with any missing values, the researcher uses, for each
calculation, only the cases or respondents with complete responses on the variables involved in that calculation.
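The sketch below contrasts three of these treatments on a small, made-up data set (the two rating lists are hypothetical, with `None` standing for a missing response). The helper names are illustrative only and do not correspond to any particular statistics package.

```python
from statistics import mean

# Hypothetical ratings for two variables; None marks a missing response.
quality = [2, 5, None, 2, 6]
value = [1, None, 4, 1, 5]

def substitute_mean(values):
    """Neutral-value substitution: replace each missing response with the variable mean."""
    observed_mean = mean(v for v in values if v is not None)
    return [observed_mean if v is None else v for v in values]

def casewise_delete(*columns):
    """Casewise deletion: keep only respondents with no missing value in any column."""
    return [row for row in zip(*columns) if None not in row]

def pairwise_mean(values):
    """Pairwise deletion for a one-variable statistic: use every observed value of that variable."""
    return mean(v for v in values if v is not None)

print(substitute_mean(quality))         # [2, 5, 3.75, 2, 6]
print(casewise_delete(quality, value))  # [(2, 1), (2, 1), (6, 5)]
print(pairwise_mean(quality), pairwise_mean(value))  # 3.75 2.75
```

Casewise deletion keeps three of the five respondents here, while the pairwise approach still uses four observations for each variable's mean.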
Selecting a Data Analysis Strategy
Known Characteristics of the Data
Data Analysis Strategy
Properties of Statistical Techniques
Background and Philosophy of the Researcher
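Measures of Center and Location: Overview
• Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ (sample), $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ (population)
• Median
• Mode
• Weighted Mean: $\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i}$ (sample), $\mu_W = \frac{\sum w_i x_i}{\sum w_i}$ (population)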
Mean (Arithmetic Average)
• The Mean is the arithmetic average of data values
• Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$, where n = Sample Size
• Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$, where N = Population Size
Mean (Arithmetic Average)
• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)
[Figure: two dot plots on a 0 to 10 axis, one with Mean = 3 and one with Mean = 4]
$\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$
$\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$
Median
• In an ordered array, the median is the “middle”
number, i.e., the number that splits the distribution
in half
• The median is not affected by extreme values
[Figure: two dot plots on a 0 to 10 axis, each with Median = 3]
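A small check of how an extreme value moves the mean but not the median, using the two five-value data sets from the mean slide (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10). This is only a sketch using Python's standard `statistics` module.

```python
from statistics import mean, median

without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

# The mean shifts from 3 to 4 when 5 is replaced by 10; the median stays at 3.
print(mean(without_outlier), median(without_outlier))  # 3 3
print(mean(with_outlier), median(with_outlier))        # 4 3
```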
Median
• To find the median, sort the n data values from
low to high (sorted data is called a data array)
• Find the value in the i = (1/2)n position
• The ith position is called the Median Index Point
• If i is not an integer, round up to the next highest
integer
• If i is an integer, the median is the average of the
values in positions i and i + 1
Median Example
Data array:
4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24
• Note that n = 13
• Find the i = (1/2)n position:
i = (1/2)(13) = 6.5
• Since 6.5 is not an integer, round up to 7
• The median is the value in the 7th position:
Md = 12
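The index-point procedure can be sketched as a short function. The function name is hypothetical, and the branch for an integer i (averaging positions i and i + 1) is the usual convention for an even number of values, included here as an assumption because the example only exercises the non-integer case.

```python
import math

def median_by_index_point(values):
    """Median via the index-point rule i = n / 2 on the sorted data array."""
    data = sorted(values)
    i = len(data) / 2
    if i != int(i):
        # Not an integer: round up and take the value in that (1-based) position.
        return data[math.ceil(i) - 1]
    # Integer (assumed convention): average the values in positions i and i + 1.
    i = int(i)
    return (data[i - 1] + data[i]) / 2

data_array = [4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24]
print(median_by_index_point(data_array))  # 12
```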
Mode
• A measure of location
• The value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes
[Figure: dot plot on a 0 to 14 axis with Mode = 5; dot plot on a 0 to 6 axis with No Mode]
Shape of a Distribution
• Describes how data is distributed
• Symmetric or skewed
• Left-Skewed: Mean < Median (longer tail extends to left)
• Symmetric: Mean = Median
• Right-Skewed: Median < Mean (longer tail extends to right)
Weighted Mean
• Used when values are grouped by frequency or
relative importance
Example: Sample of 26 Repair Projects
Days to Complete   Frequency
5                  4
6                  12
7                  8
8                  2
Weighted Mean Days to Complete:
$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31$ days
Which measure of location is the “best”?
• Mean is generally used, unless extreme
values (outliers) exist
• Then Median is often used, since the
median is not sensitive to extreme
values.
• Example: Median home prices may be
reported for a region – less sensitive to
outliers
Measures of Central Tendency: Ungrouped
Data
• Measures of central tendency yield information about “particular
places or locations in a group of numbers.”
• Common Measures of Location
• Mode
• Median
• Mean
Mean of Grouped Data
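• Weighted average of class midpoints
• Class frequencies are the weights
$\mu = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \cdots + f_i M_i}{f_1 + f_2 + f_3 + \cdots + f_i}$
where f is the class frequency and M is the class midpoint.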
Calculation of Grouped Mean
Class Interval Frequency Class Midpoint fM
20-under 30 6 25 150
30-under 40 18 35 630
40-under 50 11 45 495
50-under 60 11 55 605
60-under 70 3 65 195
70-under 80 1 75 75
Total 50 2150
$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$
Median of Grouped Data
Mode of Grouped Data
• Midpoint of the modal class
• Modal class has the greatest frequency
Class Interval Frequency
20-under 30 6
30-under 40 18
40-under 50 11
50-under 60 11
60-under 70 3
70-under 80 1
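As a sketch, the three grouped-data measures can be computed for the class-interval table above. The mean and mode follow the definitions on these slides. The grouped-median formula is not shown in this transcript, so the version used here (linear interpolation within the class containing the N/2-th value) is an assumption based on the common textbook form; the function names are hypothetical.

```python
# (lower bound, upper bound, frequency) for each class in the table above.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11), (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_mean(classes):
    """Sum of f * M divided by the sum of f, where M is the class midpoint."""
    total_f = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / total_f

def grouped_median(classes):
    """Assumed formula: interpolate linearly inside the class holding the N/2-th value."""
    half = sum(f for _, _, f in classes) / 2
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= half:
            return lo + (half - cumulative) / f * (hi - lo)
        cumulative += f

def grouped_mode(classes):
    """Midpoint of the modal class (the class with the greatest frequency)."""
    lo, hi, _ = max(classes, key=lambda c: c[2])
    return (lo + hi) / 2

print(grouped_mean(classes))              # 43.0
print(round(grouped_median(classes), 2))  # 40.91
print(grouped_mode(classes))              # 35.0
```

The same functions could be applied to the exercise tables once their classes are written as (lower, upper, frequency) triples.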
Mean, Median and Mode
• Q. The frequency distribution below represents the weights in pounds
of a sample of packages carried last month by a small airfreight
company.
Class 10-10.9 11-11.9 12-12.9 13-13.9 14-14.9 15-15.9 16-16.9 17-17.9 18-18.9 19-19.9
Frequency 1 4 6 8 12 11 8 7 6 2
Compute sample mean, median and mode.
• Q. The frequency distribution below represents the salaries (in hundreds of rupees) of an
MNC's employees for last year.
Class (Rupees in hundreds)   Frequency
0–49.99                      78
50.00–99.99                  123
100.00–149.99                187
150.00–199.99                82
200.00–249.99                51
250.00–299.99                47
300.00–349.99                13
350.00–399.99                9
400.00–449.99                6
450.00–499.99                4
Compute the mean, median and mode salary.
Measures of Variation
Measures of variation covered:
• Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
Measures of variation give information on
the spread or variability of the data values.
[Figure: two distributions with the same center but different variation]
Range
• Simplest measure of variation
• Difference between the largest and the smallest observations:
$\text{Range} = x_{\text{maximum}} - x_{\text{minimum}}$
Example: for data plotted on a 0 to 14 axis with smallest value 1 and largest value 14,
Range = 14 - 1 = 13
Disadvantages of the Range
• Ignores the way in which data are distributed
[Figure: two differently distributed data sets on a 7 to 12 axis, both with Range = 12 - 7 = 5]
• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
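To illustrate the sensitivity to outliers, the sketch below computes the range for the two 25-value data sets above and, for comparison, the sample variance and sample standard deviation listed among this unit's measures of variation. It relies on Python's standard `statistics` module; `value_range` is a hypothetical helper name.

```python
from statistics import stdev, variance

# The two data sets from the slide; they differ only in the last value.
base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

for data in (base, with_outlier):
    # variance() and stdev() use the sample (n - 1) denominator.
    print(value_range(data), round(variance(data), 2), round(stdev(data), 2))
```

A single changed observation moves the range from 4 to 119 and inflates the variance and standard deviation as well, since all three depend on the extreme value.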
Variance
• Average of squared deviations of values from the mean
• Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
• Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data
• Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
• Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Introduction to Probability
Distributions
• Random Variable
• Represents a possible numerical value from a random
event
• Takes on different values based on chance
• Random variables are either discrete or continuous
Discrete Random Variable
• A discrete random variable is a variable that can
assume only a countable number of values
Many possible outcomes:
• number of complaints per day
• number of TV’s in a household
• number of rings before the phone is answered
Only two possible outcomes:
• gender: male or female
• defective: yes or no
• spreads peanut butter first vs. spreads jelly first
Continuous Random Variable
• A continuous random variable is a variable that can
assume any value on a continuum (can assume an
uncountable number of values)
• thickness of an item
• time required to complete a task
• temperature of a solution
• height, in inches
• These can potentially take on any value, depending
only on the ability to measure accurately.
Discrete Random Variables
• Can only assume a countable number of values
Examples:
• Roll a die twice
Let x be the number of times 4 comes up
(then x could be 0, 1, or 2 times)
• Toss a coin 5 times.
Let x be the number of heads
(then x = 0, 1, 2, 3, 4, or 5)
Discrete Probability Distribution
Experiment: Toss 2 Coins. Let x = # heads.
4 possible outcomes: TT, TH, HT, HH
Probability Distribution
x Value   Probability
0         1/4 = .25
1         2/4 = .50
2         1/4 = .25
[Figure: bar chart of Probability against x, with bars of height .25, .50 and .25 at x = 0, 1, 2]
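The two-coin distribution can be reproduced by enumerating the equally likely outcomes, as in the sketch below. The function name is hypothetical, and the expectation line uses the standard definition E(x) = Σ x·P(x), which is an addition here since the formula itself does not appear on these slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_tosses):
    """Probability distribution of x = number of heads in n fair coin tosses."""
    outcomes = list(product("HT", repeat=n_tosses))  # equally likely outcomes
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

dist = heads_distribution(2)
for x, p in dist.items():
    print(x, p, float(p))  # 0 1/4 0.25, then 1 1/2 0.5, then 2 1/4 0.25

# Expectation E(x) = sum of x * P(x)
expected_heads = sum(x * p for x, p in dist.items())
print(expected_heads)  # 1
```

Calling `heads_distribution(5)` gives the distribution for the five-toss example, with x ranging from 0 to 5.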
Probability Distributions
• Discrete Probability Distributions: Binomial, Poisson
• Continuous Probability Distributions: Normal
Continuous Probability Distributions
• A continuous random variable is a variable that can
assume any value on a continuum (can assume an
uncountable number of values)
• thickness of an item
• time required to complete a task
• temperature of a solution
• height, in inches
• These can potentially take on any value, depending
only on the ability to measure accurately.
Factor Analysis
• Factor analysis is a general name denoting a class of procedures primarily used for
data reduction and summarization.
• Factor analysis is an interdependence technique in that an entire set of
interdependent relationships is examined without making the distinction between
dependent and independent variables.
• Factor analysis is used in the following circumstances:
• To identify underlying dimensions, or factors, that explain the
correlations among a set of variables.
• To identify a new, smaller, set of uncorrelated variables to
replace the original set of correlated variables in subsequent
multivariate analysis (regression or discriminant analysis).
• To identify a smaller set of salient variables from a larger set for
use in subsequent multivariate analysis.
Factor Analysis Model
Mathematically, each variable is expressed as a linear combination
of underlying factors. The covariation among the variables is
described in terms of a small number of common factors plus a
unique factor for each variable. If the variables are standardized,
the factor analysis model may be represented as:
$X_i = A_{i1}F_1 + A_{i2}F_2 + A_{i3}F_3 + \cdots + A_{im}F_m + V_iU_i$
where
$X_i$ = i th standardized variable
$A_{ij}$ = standardized multiple regression coefficient of
variable i on common factor j
$F_j$ = common factor j
$V_i$ = standardized regression coefficient of variable i on
unique factor i
$U_i$ = the unique factor for variable i
m = number of common factors
Factor Analysis Model
The unique factors are uncorrelated with each other and with the
common factors. The common factors themselves can be expressed as
linear combinations of the observed variables.
$F_i = W_{i1}X_1 + W_{i2}X_2 + W_{i3}X_3 + \cdots + W_{ik}X_k$
where
$F_i$ = estimate of i th factor
$W_{ij}$ = weight or factor score coefficient
k = number of variables
Factor Analysis Model
• It is possible to select weights or factor score
coefficients so that the first factor explains the largest
portion of the total variance.
• Then a second set of weights can be selected, so that
the second factor accounts for most of the residual
variance, subject to being uncorrelated with the first
factor.
• This same principle could be applied to selecting
additional weights for the additional factors.
Statistics Associated with Factor Analysis
• Bartlett's test of sphericity. Bartlett's test of sphericity is a test statistic used
to examine the hypothesis that the variables are uncorrelated in the
population. In other words, the population correlation matrix is an identity
matrix; each variable correlates perfectly with itself (r = 1) but has no
correlation with the other variables (r = 0).
• Correlation matrix. A correlation matrix is a lower triangle matrix showing the
simple correlations, r, between all possible pairs of variables included in the
analysis. The diagonal elements, which are all 1, are usually omitted.
Statistics Associated with Factor Analysis
• Communality. Communality is the amount of variance a variable shares with all
the other variables being considered. This is also the proportion of variance
explained by the common factors. (Rule of thumb: at least 0.5.)
• Eigenvalue. The eigenvalue represents the total variance explained by each
factor. (Rule of thumb: retain factors with an eigenvalue > 1.)
• Factor loadings. Factor loadings are simple correlations between the variables
and the factors. (Rule of thumb: > 0.5.)
• Factor loading plot. A factor loading plot is a plot of the original variables using
the factor loadings as coordinates.
• Factor matrix. A factor matrix contains the factor loadings of all the variables
on all the factors extracted.
Statistics Associated with Factor Analysis
• Factor scores. Factor scores are composite scores estimated for each
respondent on the derived factors.
• Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. The Kaiser-Meyer-
Olkin (KMO) measure of sampling adequacy is an index used to examine the
appropriateness of factor analysis. High values (between 0.5 and 1.0) indicate
factor analysis is appropriate. Values below 0.5 imply that factor analysis may
not be appropriate.
• Percentage of variance. The percentage of the total variance attributed to each
factor. (Rule of thumb: the retained factors should together account for > 60%.)
• Scree plot. A scree plot is a plot of the eigenvalues against the number of
factors in order of extraction. (Rule of thumb: retain factors with an eigenvalue >= 1.)
Conducting Factor Analysis
1. Problem Formulation
2. Construction of the Correlation Matrix
3. Method of Factor Analysis
4. Determination of Number of Factors
5. Rotation of Factors
6. Interpretation of Factors
7. Calculation of Factor Scores, or Selection of Surrogate Variables
8. Determination of Model Fit
Conducting Factor Analysis: Formulate
the Problem
• The objectives of factor analysis should be identified.
• The variables to be included in the factor analysis should
be specified based on past research, theory, and judgment
of the researcher. It is important that the variables be
appropriately measured on an interval or ratio scale.
• An appropriate sample size should be used. As a rough
guideline, there should be at least four or five times as
many observations (sample size) as there are variables.
Correlation Matrix
Variables V1 V2 V3 V4 V5 V6
V1 1.000
V2 -0.530 1.000
V3 0.873 -0.155 1.000
V4 -0.086 0.572 -0.248 1.000
V5 -0.858 0.020 -0.778 -0.007 1.000
V6 0.004 0.640 -0.018 0.640 -0.136 1.000
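As a sketch of how the eigenvalue rule of thumb can be applied, the code below mirrors the lower-triangle correlation matrix above into a full symmetric matrix and extracts its eigenvalues. It assumes NumPy is available and uses the correlations exactly as printed on the slide; this is a principal-components-style extraction on the correlation matrix, not a full factor analysis with rotation.

```python
import numpy as np

# Lower triangle of the correlation matrix above (V1..V6), as printed.
lower = [
    [1.000],
    [-0.530, 1.000],
    [0.873, -0.155, 1.000],
    [-0.086, 0.572, -0.248, 1.000],
    [-0.858, 0.020, -0.778, -0.007, 1.000],
    [0.004, 0.640, -0.018, 0.640, -0.136, 1.000],
]

# Mirror the lower triangle into a full symmetric matrix R.
k = len(lower)
R = np.eye(k)
for i, row in enumerate(lower):
    for j, r in enumerate(row):
        R[i, j] = R[j, i] = r

# eigvalsh is for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(R)[::-1]
share = eigenvalues / eigenvalues.sum()  # proportion of total variance per component

print(np.round(eigenvalues, 3))
print(np.round(share.cumsum(), 3))
print("components with eigenvalue > 1:", int((eigenvalues > 1).sum()))
```

Because the variables are standardized, the eigenvalues sum to the number of variables (6), so each eigenvalue divided by 6 is that component's share of total variance.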
Conducting Factor Analysis:
Construct the Correlation Matrix
• The analytical process is based on a matrix of correlations between the
variables.
• Bartlett's test of sphericity can be used to test the null hypothesis that the
variables are uncorrelated in the population: in other words, the population
correlation matrix is an identity matrix. If this hypothesis cannot be rejected,
then the appropriateness of factor analysis should be questioned.
• Another useful statistic is the Kaiser-Meyer-Olkin (KMO) measure of sampling
adequacy. Small values of the KMO statistic indicate that the correlations
between pairs of variables cannot be explained by other variables and that
factor analysis may not be appropriate.
Determine the Method of Factor
Analysis
• In principal components analysis, the total variance in the data is considered.
The diagonal of the correlation matrix consists of unities, and full variance is
brought into the factor matrix. Principal components analysis is recommended
when the primary concern is to determine the minimum number of factors
that will account for maximum variance in the data for use in subsequent
multivariate analysis. The factors are called principal components.
• In common factor analysis, the factors are estimated based only on the
common variance. Communalities are inserted in the diagonal of the
correlation matrix. This method is appropriate when the primary concern is to
identify the underlying dimensions and the common variance is of interest.
This method is also known as principal axis factoring.
Scree Plot
[Figure: scree plot of Eigenvalue (0.0 to 3.0) against Component Number (1 to 6)]
A Classification of Univariate Techniques
Univariate Techniques
• Metric Data
  • One Sample: t test, Z test
  • Two or More Samples
    • Independent: Two-Group t test, Z test, One-Way ANOVA
    • Related: Paired t test
• Non-numeric Data
  • One Sample: Frequency, Chi-Square, K-S, Runs, Binomial
  • Two or More Samples
    • Independent: Chi-Square, Mann-Whitney, Median, K-S, K-W ANOVA
    • Related: Sign, Wilcoxon, McNemar, Chi-Square
A Classification of Multivariate Techniques
Multivariate Techniques
• Dependence Technique
  • One Dependent Variable: Cross-Tabulation, Analysis of Variance and Covariance,
    Multiple Regression, 2-Group Discriminant/Logit, Conjoint Analysis
  • More Than One Dependent Variable: Multivariate Analysis of Variance,
    Canonical Correlation, Multiple Discriminant Analysis,
    Structural Equation Modeling and Path Analysis
• Interdependence Technique
  • Variable Interdependence: Factor Analysis, Confirmatory Factor Analysis
  • Interobject Similarity: Cluster Analysis, Multidimensional Scaling
Correlation
• The correlation, r, summarizes the strength of association between two
metric (interval or ratio scaled) variables, say X and Y.
• It is an index used to determine whether a linear or straight-line relationship
exists between X and Y.
• As it was originally proposed by Karl Pearson, it is also known as the Pearson
correlation coefficient.
It is also referred to as simple correlation, bivariate correlation, or merely the
correlation coefficient.
Factors influencing correlation
• Chance coincidence
• Influence of third variable
• Mutual influence
Types of correlations
• Positive/Negative correlation
• Linear/Non-linear correlation
• Simple/partial/multiple correlation
• Simple correlation: between two variables, x and y
• Partial correlation: between x and y with a third variable z held constant
• Multiple correlation: among three or more variables
Methods of correlation analysis
• Scatter plot
• Karl-Pearson correlation
• Rank Correlation
• Method of least squares
Correlation
• r varies between -1.0 and +1.0.
• The correlation coefficient between two
variables will be the same regardless of their
underlying units of measurement.
Karl Pearson Coefficient of Correlation
• Formula: $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
Calculate correlation coefficient (Karl Pearson
coefficient of correlation)
• Find the correlation between the index of production and the number unemployed.
• Ans: r=
Year Index of production Number unemployed (in lakhs)
1991 100 15
1992 102 12
1993 104 13
1994 107 11
1995 105 12
1996 112 12
1997 103 19
1998 99 26
Calculate correlation coefficient (Karl Pearson
coefficient of correlation)
• Find the correlation between age and number of sick days.
• Ans: r=
Employee Age sick days
1 30 1
2 32 0
3 35 2
4 40 5
5 48 2
6 50 4
7 52 6
8 55 5
9 57 7
10 61 8
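One way to work both exercises is sketched below. It assumes the usual deviation-score form of the Karl Pearson coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² Σ(y − ȳ)²); the function and variable names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r: sum of cross-deviations over the root of the product of squared deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Data from the two exercise tables.
index_of_production = [100, 102, 104, 107, 105, 112, 103, 99]
unemployed_lakhs = [15, 12, 13, 11, 12, 12, 19, 26]
age = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]
sick_days = [1, 0, 2, 5, 2, 4, 6, 5, 7, 8]

print(round(pearson_r(index_of_production, unemployed_lakhs), 3))  # -0.619
print(round(pearson_r(age, sick_days), 3))                         # 0.87
```

The first pair is negatively correlated (unemployment tends to be lower when the production index is higher), while age and sick days are strongly positively correlated.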
Spearman's Rank Correlation
$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$
Where:
$\rho$ = rank correlation coefficient
$d_i$ = difference between the two ranks of each observation
n = number of observations
Rank correlation of the following data
Year
Index of production
(x)
Number unemployed (in lakhs)
(y)
1991 100 15
1992 102 12
1993 104 13
1994 107 11
1995 105 12
1996 112 12
1997 103 19
1998 99 26
Employee Age (x) sick days (y)
1 30 1
2 32 0
3 35 2
4 40 5
5 48 2
6 50 4
7 52 6
8 55 5
9 57 7
10 61 8
ρ= ρ=
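A sketch of the rank-correlation calculation for both tables, assuming the common form ρ = 1 − 6Σd_i² / (n(n² − 1)). Both data sets contain tied values, so tied observations are given the average of their ranks; with ties the simple formula is only an approximation. The helper names are hypothetical.

```python
def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), with d_i the rank differences."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

index_of_production = [100, 102, 104, 107, 105, 112, 103, 99]
unemployed_lakhs = [15, 12, 13, 11, 12, 12, 19, 26]
age = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]
sick_days = [1, 0, 2, 5, 2, 4, 6, 5, 7, 8]

print(round(spearman_rho(index_of_production, unemployed_lakhs), 3))  # -0.714
print(round(spearman_rho(age, sick_days), 3))                         # 0.909
```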
Regression Analysis
Regression analysis examines associative relationships
between a metric dependent variable and one or more
independent variables in the following ways:
• Determine whether the independent variables explain a significant variation in
the dependent variable: whether a relationship exists.
• Determine how much of the variation in the dependent variable can be
explained by the independent variables: strength of the relationship.
• Determine the structure or form of the relationship: the mathematical
equation relating the independent and dependent variables.
• Predict the values of the dependent variable.
• Control for other independent variables when evaluating the contributions of a
specific variable or set of variables.
• Regression analysis is concerned with the nature and degree of association
between variables and does not imply or assume any causality.
Formulas
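• y = mx + b
• y = dependent variable
• x = independent variable
• b = intercept
• m = slope of line
Slope of line (least-squares estimate): $m = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$
Line intercept: $b = \bar{y} - m\bar{x}$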
Linear regression of the following data
Year
Index of production
(x)
Number unemployed (in lakhs)
(y)
1991 100 15
1992 102 12
1993 104 13
1994 107 11
1995 105 12
1996 112 12
1997 103 19
1998 99 26
Employee Age (x) sick days (y)
1 30 1
2 32 0
3 35 2
4 40 5
5 48 2
6 50 4
7 52 6
8 55 5
9 57 7
10 61 8
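The line y = mx + b can be fitted to both tables with the sketch below, which assumes the ordinary least-squares estimates m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and b = ȳ − m·x̄. The function name is hypothetical.

```python
def least_squares_line(x, y):
    """Return (slope m, intercept b) of the least-squares line y = m*x + b."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n  # same as mean(y) - m * mean(x)
    return m, b

index_of_production = [100, 102, 104, 107, 105, 112, 103, 99]
unemployed_lakhs = [15, 12, 13, 11, 12, 12, 19, 26]
age = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]
sick_days = [1, 0, 2, 5, 2, 4, 6, 5, 7, 8]

m1, b1 = least_squares_line(index_of_production, unemployed_lakhs)
m2, b2 = least_squares_line(age, sick_days)
print(round(m1, 3), round(b1, 3))  # -0.767 94.733
print(round(m2, 3), round(b2, 3))  # 0.211 -5.689
```

Read this way, each additional year of age is associated with roughly 0.21 more sick days in this sample; as noted above, the fitted line describes association and does not by itself establish causality.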

Unit 5.pptx

  • 2. Topics to be covered • Unit-V: Data Analysis: Editing, Coding, Tabular representation of data, Graphical Representation of Data. • Questionnaire Construction, Content Analysis, Validity and Reliability Test. • Descriptive Statistics and Probability: Measures of Central Tendency, Dispersion, Skewness & Kurtosis, Probability and Laws, Random Variable, Expectation. • Probability Distribution and Sampling: Discrete, Binomial, Poisson, Continuous, Normal Sampling Distribution, Statistical Estimation. • Multivariate Data Analysis: Factor Analysis, Cluster Analysis, Discriminant Analysis, Multi- Dimensional Scaling, Conjoint Analysis.
  • 3. Fieldwork/Data Collection Process Selecting Field Workers Training Field Workers Supervising Field Workers Validating Fieldwork Evaluating Field Workers
  • 4. Selection of Field Workers The researcher should: • Develop job specifications for the project, taking into account the mode of data collection. • Decide what characteristics the field workers should have. • Recruit appropriate individuals.
  • 5. General Qualifications of Field Workers • Healthy. Field workers must have the stamina required to do the job. • Outgoing. The interviewers should be able to establish rapport with the respondents. • Communicative. Effective speaking and listening skills are a great asset. • Pleasant appearance. If the field worker's physical appearance is unpleasant or unusual, the data collected may be biased. • Educated. Interviewers must have good reading and writing skills. • Experienced. Experienced interviewers are likely to do a better job.
  • 6. Training of Field Workers • Making the Initial Contact – Interviewers should be trained to make opening remarks that will convince potential respondents that their participation is important. • Asking the Questions 1. Be thoroughly familiar with the questionnaire. 2. Ask the questions in the order in which they appear in the questionnaire. 3. Use the exact wording given in the questionnaire. 4. Read each question slowly. 5. Repeat questions that are not understood. 6. Ask every applicable question. 7. Follow instructions, skip patterns, probe carefully.
  • 7. Training of Field Workers • Probing – Some commonly used probing techniques: 1. Repeating the question. 2. Repeating the respondent's reply. 3. Using a pause or silent probe. 4. Boosting or reassuring the respondent. 5. Eliciting clarification. 6. Using objective/neutral questions or comments.
  • 8. Training of Field Workers • Recording the Answers – Guidelines for recording answers to unstructured questions: 1. Record responses during the interview. 2. Use the respondent's own words. 3. Do not summarize or paraphrase the respondent's answers. 4. Include everything that pertains to the question objectives. 5. Include all probes and comments. 6. Repeat the response as it is written down. • Terminating the Interview – The respondent should be left with a positive feeling about the interview.
  • 9. Guidelines on Interviewer Training: The Council of American Survey Research Organizations Training should be conducted under the direction of supervisory personnel and should cover the following: 1) The research process: how a study is developed, implemented & reported. 2) Importance of interviewers; need for honesty, objectivity & professionalism. 3) Confidentiality of the respondent & client. 4) Familiarity with market research terminology. 5) Importance of following the exact wording & recording responses verbatim. 6) Purpose & use of probing & clarifying techniques. 7) The reason for & use of classification & respondent information questions. 8) A review of samples of instructions & questionnaires. 9) Importance of the respondent’s positive feelings about survey research. An interviewer must be trained in the interviewing techniques outlined above.
  • 10. Guidelines on Supervision: The Council of American Survey Research Organizations All research projects should be properly supervised. It is the data collection agency’s responsibility to: 1) Properly supervise interviews. 2) See that an agreed-upon proportion of interviewers’ telephone calls are monitored. 3) Be available to report on the status of the project daily to the project director, unless otherwise instructed. 4) Keep all studies, materials, and findings confidential. 5) Notify concerned parties if the anticipated schedule is not met. 6) Attend all interviewer briefings. 7) Keep current & accurate records of the interviewing progress. 8) Make sure all interviewers have all materials in time. 9) Edit each questionnaire. 10) Provide consistent & positive feedback to the interviewers. 11) Not falsify any work.
  • 11. Supervision of Field Workers Supervision of field workers means making sure that they are following the procedures and techniques in which they were trained. Supervision involves quality control and editing, sampling control, control of cheating, and central office control. • Quality Control and Editing – This requires checking to see if the field procedures are being properly implemented. • Sampling Control – The supervisor attempts to ensure that the interviewers are strictly following the sampling plan. • Control of Cheating – Cheating can be minimized through proper training, supervision, and validation. • Central Office Control – Supervisors provide quality and cost-control information to the central office.
  • 12. Validation of Fieldwork Validation: • The supervisors call 10 - 25% of the respondents to inquire whether the field workers actually conducted the interviews. • The supervisors ask about the length and quality of the interview, reaction to the interviewer, and basic demographic data. • The demographic information is cross-checked against the information reported by the interviewers on the questionnaires.
  • 13. Evaluation of Field Workers • Cost and Time. The interviewers can be compared in terms of the total cost (salary and expenses) per completed interview. • Response Rates. It is important to monitor response rates on a timely basis so that corrective action can be taken if these rates are too low. • Quality of Interviewing. To evaluate interviewers on the quality of interviewing, the supervisor must directly observe the interviewing process. • Quality of Data. The completed questionnaires of each interviewer should be evaluated for the quality of data.
  • 15. Data Preparation Process: Prepare Preliminary Plan of Data Analysis → Check Questionnaire → Edit → Code → Transcribe → Clean Data → Statistically Adjust the Data → Select Data Analysis Strategy
  • 16. Questionnaire Checking A questionnaire returned from the field may be unacceptable for several reasons. • Parts of the questionnaire may be incomplete. • The pattern of responses may indicate that the respondent did not understand or follow the instructions. • The responses show little variance. • One or more pages are missing. • The questionnaire is received after the preestablished cutoff date. • The questionnaire is answered by someone who does not qualify for participation.
  • 17. Editing Treatment of Unsatisfactory Results • Returning to the Field – The questionnaires with unsatisfactory responses may be returned to the field, where the interviewers recontact the respondents. • Assigning Missing Values – If returning the questionnaires to the field is not feasible, the editor may assign missing values to unsatisfactory responses. • Discarding Unsatisfactory Respondents – In this approach, the respondents with unsatisfactory responses are simply discarded.
  • 18. Coding Coding means assigning a code, usually a number, to each possible response to each question. The code includes an indication of the column position (field) and data record it will occupy. Coding Questions • Fixed field codes, which mean that the number of records for each respondent is the same and the same data appear in the same column(s) for all respondents, are highly desirable. • If possible, standard codes should be used for missing data. Coding of structured questions is relatively simple, since the response options are predetermined. • In questions that permit a large number of responses, each possible response option should be assigned a separate column.
  • 19. Coding Guidelines for Coding Unstructured Questions: • Category codes should be mutually exclusive and collectively exhaustive. • Only a few (10% or less) of the responses should fall into the “other” category. • Category codes should be assigned for critical issues even if no one has mentioned them. • Data should be coded to retain as much detail as possible.
  • 20. Codebook A codebook contains coding instructions and the necessary information about variables in the data set. A codebook generally contains the following information: • column number • record number • variable number • variable name • question number • instructions for coding
  • 21. Coding Questionnaires • The respondent code and the record number appear on each record in the data. • The first record contains the additional codes: project code, interviewer code, date and time codes, and validation code. • It is a good practice to insert blanks between parts.
  • 22. Restaurant Preference
    ID   PREFER.  QUALITY  QUANTITY  VALUE  SERVICE  INCOME
    1    2        2        3         1      3        6
    2    6        5        6         5      7        2
    3    4        4        3         4      5        3
    4    1        2        1         1      2        5
    5    7        6        6         5      4        1
    6    5        4        4         5      4        3
    7    2        2        3         2      3        5
    8    3        3        4         2      3        4
    9    7        6        7         6      5        2
    10   2        3        2         2      2        5
    11   2        3        2         1      3        6
    12   6        6        6         6      7        2
    13   4        4        3         3      4        3
    14   1        1        3         1      2        4
    15   7        7        5         5      4        2
    16   5        5        4         5      5        3
    17   2        3        1         2      3        4
    18   4        4        3         3      3        3
    19   7        5        5         7      5        5
    20   3        2        2         3      3        3
  • 23. SPSS Variable View of the Data
  • 24. Codebook Excerpt
    Column  Variable  Variable    Question  Coding
    Number  Number    Name        Number    Instructions
    1       1         ID          -         1 to 20 as coded
    2       2         Preference  1         Input the number circled. 1 = Weak Preference, 7 = Strong Preference
    3       3         Quality     2         Input the number circled. 1 = Poor, 7 = Excellent
    4       4         Quantity    3         Input the number circled. 1 = Poor, 7 = Excellent
    5       5         Value       4         Input the number circled. 1 = Poor, 7 = Excellent
    6       6         Service     5         Input the number circled. 1 = Poor, 7 = Excellent
  • 25. Codebook Excerpt (Cont.)
    Column  Variable  Variable    Question  Coding
    Number  Number    Name        Number    Instructions
    7       7         Income      6         Input the number circled.
                                            1 = Less than $20,000
                                            2 = $20,000 to 34,999
                                            3 = $35,000 to 49,999
                                            4 = $50,000 to 74,999
                                            5 = $75,000 to 99,999
                                            6 = $100,000 or more
  • 26. Example of Questionnaire Coding
    Finally, in this part of the questionnaire we would like to ask you some background information for classification purposes.
    PART D   Record #7
    1. This questionnaire was answered by (29)
       1. _____ Primarily the male head of household
       2. _____ Primarily the female head of household
       3. _____ Jointly by the male and female heads of household
    2. Marital Status (30)
       1. _____ Married
       2. _____ Never Married
       3. _____ Divorced/Separated/Widowed
    3. What is the total number of family members living at home? _____ (31-32)
    4. Number of children living at home:
       a. Under six years _____ (33)
       b. Over six years _____ (34)
    5. Number of children not living at home _____ (35)
    6. Number of years of formal education which you (and your spouse, if applicable) have completed (please circle). Scale headings: High School, College Undergraduate, College Graduate.
       a. You     8 or less 9 10 11 12 13 14 15 16 17 18 19 20 21 22 or more (36-37)
       b. Spouse  8 or less 9 10 11 12 13 14 15 16 17 18 19 20 21 22 or more (38-39)
    7. a. Your age: _____ (40-41)
       b. Age of spouse (if applicable): _____ (42-43)
    8. If employed, please indicate your household's occupations by checking the appropriate category. Male Head (44), Female Head (45)
       1. Professional and technical
       2. Managers and administrators
       3. Sales workers
       4. Clerical and kindred workers
       5. Craftsman/operative/laborers
       6. Homemakers
       7. Others (please specify)
       8. Not applicable
    9. Is your place of residence presently owned by household? (46)
       1. Owned _____
       2. Rented _____
    10. How many years have you been residing in the greater Atlanta area? _____ years. (47-48)
  • 27. Data Transcription: Raw Data → (CATI/CAPI; Keypunching via CRT Terminal; Digital Tech.; Optical Recognition; Bar Code & Other Technologies) → Verification: Correct Keypunching Errors → (Computer Memory; Disks; Other Storage) → Transcribed Data
  • 28. Data Cleaning Consistency Checks Consistency checks identify data that are out of range, logically inconsistent, or have extreme values. • Computer packages like SPSS, SAS, EXCEL and MINITAB can be programmed to identify out-of-range values for each variable and print out the respondent code, variable code, variable name, record number, column number, and out-of-range value. • Extreme values should be closely examined.
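An out-of-range check of the kind described above can be sketched in a few lines of Python. The valid ranges follow the codebook excerpt (1 to 7 for the rating scales, 1 to 6 for income). The function name, the record layout and respondent 99 are illustrative assumptions, not output from SPSS, SAS, Excel or Minitab.

```python
# Valid range for each variable, as given in the codebook excerpt.
VALID_RANGES = {
    "PREFER": (1, 7),
    "QUALITY": (1, 7),
    "QUANTITY": (1, 7),
    "VALUE": (1, 7),
    "SERVICE": (1, 7),
    "INCOME": (1, 6),
}


def find_out_of_range(records):
    """Return (respondent ID, variable name, value) for every out-of-range value."""
    problems = []
    for record in records:
        for name, (low, high) in VALID_RANGES.items():
            value = record[name]
            if not low <= value <= high:
                problems.append((record["ID"], name, value))
    return problems


# Respondent 1 comes from the restaurant preference data;
# respondent 99 is hypothetical and contains two keying errors.
records = [
    {"ID": 1, "PREFER": 2, "QUALITY": 2, "QUANTITY": 3,
     "VALUE": 1, "SERVICE": 3, "INCOME": 6},
    {"ID": 99, "PREFER": 9, "QUALITY": 2, "QUANTITY": 3,
     "VALUE": 1, "SERVICE": 3, "INCOME": 0},
]

for respondent, name, value in find_out_of_range(records):
    print(f"respondent {respondent}: {name} = {value} is out of range")
```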
  • 29. Data Cleaning Treatment of Missing Responses • Substitute a Neutral Value – A neutral value, typically the mean response to the variable, is substituted for the missing responses. • Substitute an Imputed Response – The respondents' pattern of responses to other questions are used to impute or calculate a suitable response to the missing questions. • In casewise deletion, cases, or respondents, with any missing responses are discarded from the analysis. • In pairwise deletion, instead of discarding all cases with any missing values, the researcher uses only the cases or respondents with complete responses for each calculation.
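Three of the treatments above can be illustrated with a small Python sketch. Missing responses are represented by None, and the function names and sample cases are hypothetical. Imputation from a respondent's other answers is left out because the slides do not specify a method.

```python
def mean_substitution(values):
    """Replace missing responses (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def casewise_deletion(cases):
    """Keep only the cases with no missing response on any variable."""
    return [case for case in cases if None not in case]


def pairwise_complete(cases, i, j):
    """Keep the value pairs for variables i and j where both are present."""
    return [(c[i], c[j]) for c in cases
            if c[i] is not None and c[j] is not None]


# Hypothetical ratings on three variables; None marks a missing response.
cases = [(2, 3, 1), (6, None, 5), (4, 3, None), (1, 1, 2)]

print(mean_substitution([c[1] for c in cases]))  # fills variable 1 with its mean
print(casewise_deletion(cases))                  # only fully complete cases remain
print(pairwise_complete(cases, 0, 1))            # pairs usable for variables 0 and 1
```

Casewise deletion keeps two of the four cases here, while pairwise deletion keeps three for the pair of variables shown, which is why pairwise deletion usually preserves more data.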
  • 30. Selecting a Data Analysis Strategy: Known Characteristics of the Data → Properties of Statistical Techniques → Background and Philosophy of the Researcher → Data Analysis Strategy
  • 31. Measures of Center and Location: Overview • Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ (sample), $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ (population) • Median • Mode • Weighted Mean: $\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i}$ (sample), $\mu_W = \frac{\sum w_i x_i}{\sum w_i}$ (population)
  • 32. Mean (Arithmetic Average) • The Mean is the arithmetic average of data values • Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$, where n = Sample Size • Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$, where N = Population Size
  • 33. Mean (Arithmetic Average) (continued) • The most common measure of central tendency • Mean = sum of values divided by the number of values • Affected by extreme values (outliers) • Example: $\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$, so Mean = 3 • With an outlier: $\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$, so Mean = 4
  • 34. Median • In an ordered array, the median is the “middle” number, i.e., the number that splits the distribution in half • The median is not affected by extreme values • Figure: two dot plots on a 0 to 10 axis, each with Median = 3
  • 35. Median (continued) • To find the median, sort the n data values from low to high (sorted data is called a data array) • Find the value in the i = (1/2)n position • The ith position is called the Median Index Point • If i is not an integer, round up to the next highest integer • If i is an integer, the median is the average of the values in positions i and i + 1
  • 36. Median Example (continued) • Data array: 4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24 • Note that n = 13 • Find the i = (1/2)n position: i = (1/2)(13) = 6.5 • Since 6.5 is not an integer, round up to 7 • The median is the value in the 7th position: Md = 12
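The Median Index Point procedure above can be sketched in Python as follows. The function name is illustrative. The branch for an integer i (even n) averages the values in positions i and i + 1, which is the usual convention and an assumption here, since the example above only covers odd n.

```python
import math


def median_by_index_point(data):
    """Median via the index point i = n/2 on the sorted data array."""
    array = sorted(data)
    n = len(array)
    i = n / 2
    if i != int(i):
        # i is not an integer: round up and take the value in that position.
        position = math.ceil(i)
        return array[position - 1]  # positions are 1-based
    # i is an integer: average the values in positions i and i + 1.
    position = int(i)
    return (array[position - 1] + array[position]) / 2


data_array = [4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24]
print(median_by_index_point(data_array))  # 12, the value in the 7th position
```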
  • 37. Mode • A measure of location • The value that occurs most often • Not affected by extreme values • Used for either numerical or categorical data • There may be no mode • There may be several modes • Figures: a dot plot on a 0 to 14 axis with Mode = 5; a dot plot on a 0 to 6 axis with No Mode
  • 38. Shape of a Distribution • Describes how data is distributed • Symmetric or skewed • Left-Skewed: Mean < Median (longer tail extends to left) • Symmetric: Mean = Median • Right-Skewed: Median < Mean (longer tail extends to right)
  • 39. Weighted Mean • Used when values are grouped by frequency or relative importance • Example: Sample of 26 Repair Projects
    Days to Complete  Frequency
    5                 4
    6                 12
    7                 8
    8                 2
    Weighted Mean Days to Complete: $\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31$ days
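The weighted mean of the repair-project example can be reproduced with a few lines of Python; the function name is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


days_to_complete = [5, 6, 7, 8]
frequency = [4, 12, 8, 2]  # 26 repair projects in total

print(round(weighted_mean(days_to_complete, frequency), 2))  # 6.31
```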
  • 40. Which measure of location is the “best”? • Mean is generally used, unless extreme values (outliers) exist • Then Median is often used, since the median is not sensitive to extreme values. • Example: Median home prices may be reported for a region, since they are less sensitive to outliers
  • 41. Measures of Central Tendency: Ungrouped Data • Measures of central tendency yield information about “particular places or locations in a group of numbers.” • Common Measures of Location • Mode • Median • Mean
  • 42. Mean of Grouped Data • Weighted average of class midpoints • Class frequencies are the weights • $\mu_{\text{grouped}} = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \cdots + f_i M_i}{f_1 + f_2 + f_3 + \cdots + f_i}$
  • 45. Calculation of Grouped Mean
    Class Interval  Frequency (f)  Class Midpoint (M)  fM
    20-under 30     6              25                  150
    30-under 40     18             35                  630
    40-under 50     11             45                  495
    50-under 60     11             55                  605
    60-under 70     3              65                  195
    70-under 80     1              75                  75
    Total           50                                 2150
    $\mu_{\text{grouped}} = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$
  • 55. Mode of Grouped Data • Midpoint of the modal class • Modal class has the greatest frequency
    Class Interval  Frequency
    20-under 30     6
    30-under 40     18
    40-under 50     11
    50-under 60     11
    60-under 70     3
    70-under 80     1
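The grouped-data rules above (mean as the frequency-weighted average of class midpoints, mode as the midpoint of the class with the greatest frequency) can be sketched in Python with the same class intervals. The tuple layout and function names are illustrative assumptions.

```python
# (lower limit, upper limit, frequency) for each class interval above.
classes = [
    (20, 30, 6),
    (30, 40, 18),
    (40, 50, 11),
    (50, 60, 11),
    (60, 70, 3),
    (70, 80, 1),
]


def grouped_mean(classes):
    """Sum of f * M over sum of f, where M is the class midpoint."""
    total_f = sum(f for _, _, f in classes)
    total_fm = sum(f * (low + high) / 2 for low, high, f in classes)
    return total_fm / total_f


def grouped_mode(classes):
    """Midpoint of the modal class (the class with the greatest frequency)."""
    low, high, _ = max(classes, key=lambda c: c[2])
    return (low + high) / 2


print(grouped_mean(classes))  # 43.0
print(grouped_mode(classes))  # 35.0
```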
  • 58. Mean, Median and Mode • Q. The frequency distribution below represents the weights in pounds of a sample of packages carried last month by a small airfreight company. Compute the sample mean, median and mode.
    Class      Frequency
    10-10.9    1
    11-11.9    4
    12-12.9    6
    13-13.9    8
    14-14.9    12
    15-15.9    11
    16-16.9    8
    17-17.9    7
    18-18.9    6
    19-19.9    2
  • 59. Mean, Median and Mode • The frequency distribution below represents the salaries (in rupees) of an MNC's employees for last year. Compute the mean, median and mode salary.
    Class (rupees in hundreds)  Frequency
    0-49.99                     78
    50.00-99.99                 123
    100.00-149.99               187
    150.00-199.99               82
    200.00-249.99               51
    250.00-299.99               47
    300.00-349.99               13
    350.00-399.99               9
    400.00-449.99               6
    450.00-499.99               4
  • 60. Measures of Variation • Range • Variance: Population Variance, Sample Variance • Standard Deviation: Population Standard Deviation, Sample Standard Deviation
  • 61. Variation • Measures of variation give information on the spread or variability of the data values. • Figure: Same center, different variation
  • 62. Range • Simplest measure of variation • Difference between the largest and the smallest observations: Range = $x_{\text{maximum}} - x_{\text{minimum}}$ • Example (data on a 0 to 14 axis): Range = 14 - 1 = 13
  • 63. Disadvantages of the Range • Ignores the way in which data are distributed: two differently distributed data sets on a 7 to 12 axis both have Range = 12 - 7 = 5 • Sensitive to outliers: 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 has Range = 5 - 1 = 4, while 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 has Range = 120 - 1 = 119
  • 64. Variance • Average of squared deviations of values from the mean • Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ • Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
  • 65. Standard Deviation • Most commonly used measure of variation • Shows variation about the mean • Has the same units as the original data • Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$ • Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
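The variance and standard deviation definitions above translate directly into Python. This is a minimal sketch with illustrative function names; the data set 1, 2, 3, 4, 5 is borrowed from the mean example and used here only for demonstration.

```python
import math


def population_variance(data):
    """Mean squared deviation from the mean, dividing by N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


def sample_variance(data):
    """Squared deviations from the sample mean, dividing by n - 1."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / (len(data) - 1)


data = [1, 2, 3, 4, 5]
print(population_variance(data), math.sqrt(population_variance(data)))
print(sample_variance(data), math.sqrt(sample_variance(data)))
```

The sample versions divide by n - 1 rather than N, so they are always slightly larger than the population versions for the same data.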
  • 66. Introduction to Probability Distributions • Random Variable • Represents a possible numerical value from a random event • Takes on different values based on chance Random Variables Discrete Random Variable Continuous Random Variable
  • 67. Discrete Random Variable • A discrete random variable is a variable that can assume only a countable number of values • Many possible outcomes: number of complaints per day; number of TVs in a household; number of rings before the phone is answered • Only two possible outcomes: gender: male or female; defective: yes or no; spreads peanut butter first vs. spreads jelly first
  • 68. Continuous Random Variable • A continuous random variable is a variable that can assume any value on a continuum (can assume an uncountable number of values) • thickness of an item • time required to complete a task • temperature of a solution • height, in inches • These can potentially take on any value, depending only on the ability to measure accurately.
  • 69. Discrete Random Variables • Can only assume a countable number of values Examples: • Roll a die twice Let x be the number of times 4 comes up (then x could be 0, 1, or 2 times) • Toss a coin 5 times. Let x be the number of heads (then x = 0, 1, 2, 3, 4, or 5)
  • 70. Discrete Probability Distribution • Experiment: Toss 2 Coins. Let x = # heads. • 4 possible outcomes: TT, TH, HT, HH • Probability Distribution:
    x Value  Probability
    0        1/4 = .25
    1        2/4 = .50
    2        1/4 = .25
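The two-coin distribution above can be derived by enumerating the equally likely outcomes. The sketch below uses only the Python standard library; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Count how many outcomes give each number of heads x.
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(x) = (outcomes with x heads) / (total outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p, float(p))
```

The probabilities sum to 1, as they must for any discrete probability distribution.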
  • 72. Continuous Probability Distributions • A continuous random variable is a variable that can assume any value on a continuum (can assume an uncountable number of values) • thickness of an item • time required to complete a task • temperature of a solution • height, in inches • These can potentially take on any value, depending only on the ability to measure accurately.
  • 73. Factor Analysis • Factor analysis is a general name denoting a class of procedures primarily used for data reduction and summarization. • Factor analysis is an interdependence technique in that an entire set of interdependent relationships is examined without making the distinction between dependent and independent variables. • Factor analysis is used in the following circumstances: • To identify underlying dimensions, or factors, that explain the correlations among a set of variables. • To identify a new, smaller, set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis). • To identify a smaller set of salient variables from a larger set for use in subsequent multivariate analysis.
  • 74. Factor Analysis Model • Mathematically, each variable is expressed as a linear combination of underlying factors. The covariation among the variables is described in terms of a small number of common factors plus a unique factor for each variable. If the variables are standardized, the factor analysis model may be represented as: $X_i = A_{i1}F_1 + A_{i2}F_2 + A_{i3}F_3 + \dots + A_{im}F_m + V_iU_i$ where • $X_i$ = i-th standardized variable • $A_{ij}$ = standardized multiple regression coefficient of variable i on common factor j • $F$ = common factor • $V_i$ = standardized regression coefficient of variable i on unique factor i • $U_i$ = the unique factor for variable i • $m$ = number of common factors
  • 75. Factor Analysis Model • The unique factors are uncorrelated with each other and with the common factors. The common factors themselves can be expressed as linear combinations of the observed variables: $F_i = W_{i1}X_1 + W_{i2}X_2 + W_{i3}X_3 + \dots + W_{ik}X_k$ where • $F_i$ = estimate of the i-th factor • $W_{ij}$ = weight or factor score coefficient • $k$ = number of variables
  • 76. Factor Analysis Model • It is possible to select weights or factor score coefficients so that the first factor explains the largest portion of the total variance. • Then a second set of weights can be selected, so that the second factor accounts for most of the residual variance, subject to being uncorrelated with the first factor. • This same principle could be applied to selecting additional weights for the additional factors.
  • 77. Statistics Associated with Factor Analysis • Bartlett's test of sphericity. Bartlett's test of sphericity is a test statistic used to examine the hypothesis that the variables are uncorrelated in the population. In other words, the population correlation matrix is an identity matrix; each variable correlates perfectly with itself (r = 1) but has no correlation with the other variables (r = 0). • Correlation matrix. A correlation matrix is a lower triangle matrix showing the simple correlations, r, between all possible pairs of variables included in the analysis. The diagonal elements, which are all 1, are usually omitted.
  • 78. Statistics Associated with Factor Analysis • Communality. Communality is the amount of variance a variable shares with all the other variables being considered. This is also the proportion of variance explained by the common factors (rule of thumb: at least 0.5). • Eigenvalue. The eigenvalue represents the total variance explained by each factor (rule of thumb: greater than 1). • Factor loadings. Factor loadings are simple correlations between the variables and the factors (rule of thumb: greater than 0.5). • Factor loading plot. A factor loading plot is a plot of the original variables using the factor loadings as coordinates. • Factor matrix. A factor matrix contains the factor loadings of all the variables on all the factors extracted.
  • 79. Statistics Associated with Factor Analysis • Factor scores. Factor scores are composite scores estimated for each respondent on the derived factors. • Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is an index used to examine the appropriateness of factor analysis. High values (between 0.5 and 1.0) indicate factor analysis is appropriate. Values below 0.5 imply that factor analysis may not be appropriate. • Percentage of variance. The percentage of the total variance attributed to each factor (rule of thumb: more than 60%). • Scree plot. A scree plot is a plot of the eigenvalues against the number of factors in order of extraction. • Rule of thumb for retaining a factor: eigenvalue >= 1.
  • 80. Conducting Factor Analysis • Steps: 1. Problem Formulation 2. Construction of the Correlation Matrix 3. Method of Factor Analysis 4. Determination of Number of Factors 5. Rotation of Factors 6. Interpretation of Factors 7. Calculation of Factor Scores or Selection of Surrogate Variables 8. Determination of Model Fit
  • 81. Conducting Factor Analysis: Formulate the Problem • The objectives of factor analysis should be identified. • The variables to be included in the factor analysis should be specified based on past research, theory, and judgment of the researcher. It is important that the variables be appropriately measured on an interval or ratio scale. • An appropriate sample size should be used. As a rough guideline, there should be at least four or five times as many observations (sample size) as there are variables.
  • 82. Correlation Matrix
        Variables   V1       V2       V3       V4       V5       V6
        V1           1.000
        V2          -0.530    1.000
        V3           0.873   -0.155    1.000
        V4          -0.086    0.572   -0.248    1.000
        V5          -0.858    0.020   -0.778   -0.007    1.000
        V6           0.004    0.640   -0.018    0.640   -0.136    1.000
  • 83. Conducting Factor Analysis: Construct the Correlation Matrix • The analytical process is based on a matrix of correlations between the variables. • Bartlett's test of sphericity can be used to test the null hypothesis that the variables are uncorrelated in the population: in other words, the population correlation matrix is an identity matrix. If this hypothesis cannot be rejected, then the appropriateness of factor analysis should be questioned. • Another useful statistic is the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. Small values of the KMO statistic indicate that the correlations between pairs of variables cannot be explained by other variables and that factor analysis may not be appropriate.
  • 84. Determine the Method of Factor Analysis • In principal components analysis, the total variance in the data is considered. The diagonal of the correlation matrix consists of unities, and full variance is brought into the factor matrix. Principal components analysis is recommended when the primary concern is to determine the minimum number of factors that will account for maximum variance in the data for use in subsequent multivariate analysis. The factors are called principal components. • In common factor analysis, the factors are estimated based only on the common variance. Communalities are inserted in the diagonal of the correlation matrix. This method is appropriate when the primary concern is to identify the underlying dimensions and the common variance is of interest. This method is also known as principal axis factoring.
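To make the principal components idea concrete, the sketch below estimates the first principal component of the slide-82 correlation matrix by power iteration. It uses only the standard library. The function names are illustrative, and the matrix values are copied from the slide as printed. Because the diagonal of a correlation matrix is all ones, the total variance equals the number of variables, so an eigenvalue divided by that number is the share of total variance the component explains.

```python
import math

# Lower triangle of the correlation matrix, copied from slide 82 as printed.
LOWER = [
    [1.000],
    [-0.530, 1.000],
    [0.873, -0.155, 1.000],
    [-0.086, 0.572, -0.248, 1.000],
    [-0.858, 0.020, -0.778, -0.007, 1.000],
    [0.004, 0.640, -0.018, 0.640, -0.136, 1.000],
]
P = len(LOWER)  # number of variables
# Fill in the symmetric upper triangle.
R = [[LOWER[max(i, j)][min(i, j)] for j in range(P)] for i in range(P)]


def mat_vec(matrix, vec):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def first_component(matrix, iterations=2000):
    """Power iteration: return (largest eigenvalue, unit eigenvector)."""
    v = [1.0] + [0.0] * (len(matrix) - 1)
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Rv gives the eigenvalue for a unit vector v.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


eigenvalue, weights = first_component(R)
share = eigenvalue / P  # proportion of total variance explained

print("first eigenvalue:", round(eigenvalue, 3))
print("share of total variance:", round(share, 3))
print("weights:", [round(w, 3) for w in weights])
```

Statistical packages extract all components at once with a full eigendecomposition. Power iteration is used here only because it fits in a few lines without third-party libraries.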
  • 85. Scree Plot • [Figure: scree plot of eigenvalue (vertical axis, 0.0 to 3.0) against component number (horizontal axis, 1 to 6).]
  • 86. A Classification of Univariate Techniques
        Univariate Techniques
          Metric Data
            One Sample: t test, Z test
            Two or More Samples
              Independent: Two-group t test, Z test, One-Way ANOVA
              Related: Paired t test
          Non-numeric Data
            One Sample: Frequency, Chi-Square, K-S, Runs, Binomial
            Two or More Samples
              Independent: Chi-Square, Mann-Whitney, Median, K-S, K-W ANOVA
              Related: Sign, Wilcoxon, McNemar, Chi-Square
  • 87. A Classification of Multivariate Techniques
        Multivariate Techniques
          Dependence Technique
            One Dependent Variable: Cross-Tabulation, Analysis of Variance and Covariance, Multiple Regression, 2-Group Discriminant/Logit, Conjoint Analysis
            More Than One Dependent Variable: Multivariate Analysis of Variance, Canonical Correlation, Multiple Discriminant Analysis, Structural Equation Modeling and Path Analysis
          Interdependence Technique
            Variable Interdependence: Factor Analysis, Confirmatory Factor Analysis
            Interobject Similarity: Cluster Analysis, Multidimensional Scaling
  • 88. Correlation • The correlation, r, summarizes the strength of association between two metric (interval or ratio scaled) variables, say X and Y. • It is an index used to determine whether a linear or straight-line relationship exists between X and Y. • As it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient. It is also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient.
  • 89. Factors influencing correlation • Chance coincidence • Influence of a third variable • Mutual influence
  • 90. Types of correlations • Positive/Negative correlation • Linear/Non-linear correlation • Simple/partial/multiple correlation • Simple correlation: between x and y • Partial correlation: between x and y with z held constant • Multiple correlation: three or more variables
  • 91. Methods of correlation analysis • Scatter plot • Karl Pearson correlation • Rank correlation • Method of least squares
  • 92. Correlation • r varies between -1.0 and +1.0. • The correlation coefficient between two variables will be the same regardless of their underlying units of measurement.
  • 93. Karl Pearson Coefficient of Correlation • Formula: $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$
  • 94. Calculate correlation coefficient (Karl Pearson coefficient of correlation) • Find the correlation between the index of production and the number unemployed.
        Year   Index of production   Number unemployed (in lakhs)
        1991   100                   15
        1992   102                   12
        1993   104                   13
        1994   107                   11
        1995   105                   12
        1996   112                   12
        1997   103                   19
        1998    99                   26
    • Ans: r =
  • 95. Calculate correlation coefficient (Karl Pearson coefficient of correlation) • Find the correlation between age and number of sick days.
        Employee   Age   Sick days
        1          30    1
        2          32    0
        3          35    2
        4          40    5
        5          48    2
        6          50    4
        7          52    6
        8          55    5
        9          57    7
        10         61    8
    • Ans: r =
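The two exercises can be checked with a few lines of Python. This is a minimal sketch using the deviation-from-the-mean form of the Pearson coefficient. The function name `pearson_r` and the variable names are chosen for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: sum of cross-deviations over the root of the
    product of the summed squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Slide 94: index of production vs. number unemployed (in lakhs), 1991-1998.
index_of_production = [100, 102, 104, 107, 105, 112, 103, 99]
unemployed = [15, 12, 13, 11, 12, 12, 19, 26]

# Slide 95: employee age vs. number of sick days.
age = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]
sick_days = [1, 0, 2, 5, 2, 4, 6, 5, 7, 8]

r_production = pearson_r(index_of_production, unemployed)
r_age = pearson_r(age, sick_days)

print("production vs unemployed: r =", round(r_production, 3))
print("age vs sick days: r =", round(r_age, 3))
```

The first coefficient comes out negative (unemployment tends to fall as production rises) and the second positive (sick days tend to rise with age).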
  • 96. Spearman's Rank Correlation • $\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$ • Where: • $\rho$ = rank correlation coefficient • $d_i$ = difference between the two ranks of each observation • $n$ = number of observations
  • 97. Rank correlation of the following
        Year   Index of production (x)   Number unemployed (in lakhs) (y)
        1991   100                       15
        1992   102                       12
        1993   104                       13
        1994   107                       11
        1995   105                       12
        1996   112                       12
        1997   103                       19
        1998    99                       26
    • ρ =
        Employee   Age (x)   Sick days (y)
        1          30        1
        2          32        0
        3          35        2
        4          40        5
        5          48        2
        6          50        4
        7          52        6
        8          55        5
        9          57        7
        10         61        8
    • ρ =
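A possible way to work these two exercises is sketched below. It assumes the usual shortcut formula for Spearman's coefficient, 1 - 6 Σd² / (n(n² - 1)), and gives tied values the average of the ranks they occupy. Both data sets contain ties, so the shortcut is only approximate here; many textbooks add a tie correction, which this sketch leaves out. The function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Shortcut formula 1 - 6*sum(d^2) / (n*(n^2 - 1)), no tie correction."""
    n = len(x)
    d_squared = sum(
        (rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y))
    )
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


index_of_production = [100, 102, 104, 107, 105, 112, 103, 99]
unemployed = [15, 12, 13, 11, 12, 12, 19, 26]
age = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]
sick_days = [1, 0, 2, 5, 2, 4, 6, 5, 7, 8]

rho_production = spearman_rho(index_of_production, unemployed)
rho_age = spearman_rho(age, sick_days)

print("production vs unemployed: rho =", round(rho_production, 3))
print("age vs sick days: rho =", round(rho_age, 3))
```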
  • 98. Regression Analysis Regression analysis examines associative relationships between a metric dependent variable and one or more independent variables in the following ways: • Determine whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists. • Determine how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship. • Determine the structure or form of the relationship: the mathematical equation relating the independent and dependent variables. • Predict the values of the dependent variable. • Control for other independent variables when evaluating the contributions of a specific variable or set of variables. • Regression analysis is concerned with the nature and degree of association between variables and does not imply or assume any causality.
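  • 99. Formulas • $y = mx + b$ • y = dependent variable • x = independent variable • b = intercept • m = slope of line • Slope of line: $m = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$ • Line intercept: $b = \frac{\sum y - m \sum x}{n}$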
  • 100. Linear regression of the following
        Year   Index of production (x)   Number unemployed (in lakhs) (y)
        1991   100                       15
        1992   102                       12
        1993   104                       13
        1994   107                       11
        1995   105                       12
        1996   112                       12
        1997   103                       19
        1998    99                       26

        Employee   Age (x)   Sick days (y)
        1          30        1
        2          32        0
        3          35        2
        4          40        5
        5          48        2
        6          50        4
        7          52        6
        8          55        5
        9          57        7
        10         61        8
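The regression lines for both tables can be fitted with the standard least-squares sums for y = mx + b. The sketch below is illustrative only; `least_squares_line` is a made-up helper name.

```python
def least_squares_line(x, y):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)  # slope
    b = (sum_y - m * sum_x) / n                                   # intercept
    return m, b


index_of_production = [100, 102, 104, 107, 105, 112, 103, 99]
unemployed = [15, 12, 13, 11, 12, 12, 19, 26]
age = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]
sick_days = [1, 0, 2, 5, 2, 4, 6, 5, 7, 8]

m_prod, b_prod = least_squares_line(index_of_production, unemployed)
m_age, b_age = least_squares_line(age, sick_days)

print("unemployed = %.4f * index + %.4f" % (m_prod, b_prod))
print("sick days  = %.4f * age + %.4f" % (m_age, b_age))

# Predicted sick days for a hypothetical 45-year-old employee.
print("prediction at age 45:", round(m_age * 45 + b_age, 2))
```

The slope has the same sign as the correlation coefficient for each data set, since both share the same numerator (the sum of cross-deviations).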