DATA ANALYSIS – TESTING FOR
ASSOCIATION
Relationship:
• A consistent and systematic link between two or more variables
• When interpreting the relationship between variables, the following aspects are taken into account:
1. Whether two or more variables are related at all, i.e. whether a relationship is present, judged by statistical significance
2. If a relationship is present, its direction, which can be either positive or negative
3. The strength of the association
4. The type of relationship
Difference between Univariate and Bivariate Data

Univariate Data
• involving a single variable
• does not deal with causes or relationships
• the major purpose of univariate analysis is to describe
• central tendency: mean, mode, median
• dispersion: range, variance, max, min, quartiles, standard deviation
• frequency distributions
• bar graph, histogram, pie chart, line graph, box-and-whisker plot
Sample question: How many of the students in the freshman class are female?

Bivariate Data
• involving two variables
• deals with causes or relationships
• the major purpose of bivariate analysis is to explain
• analysis of two variables simultaneously
• correlations
• comparisons, relationships, causes, explanations
• tables where one variable is contingent on the values of the other variable
• independent and dependent variables
Sample question: Is there a relationship between the number of females in Computer Programming and their scores in Mathematics?
1) Whether a relationship is present (statistical significance)
• The first question is whether a relationship exists between two or more variables
• If we test for statistical significance and find it, we say that a relationship is present
• Stated another way, knowledge about the behavior of one variable allows us to make a useful prediction about the behavior of another
• For example:
If we found a statistically significant relationship between perceptions of the quality of Santa Fe Grill food and satisfaction, we would say a relationship is present and that perceptions of food quality tell us what satisfaction is likely to be
2) Direction of the relationship: positive or negative
• The presence of a relationship must be established before its direction
• The direction of a relationship can be either positive or negative
• For example:
Using the Santa Fe Grill example, a positive relationship exists if respondents who rate the quality of food high are also highly satisfied. Similarly, a negative relationship exists if a low rating on one variable goes with a high rating on the other, for instance if respondents say the speed of service is slow (low rating) but they are still satisfied (high rating)
3) Understanding strength of association
• In general, the strength of association is categorized as:
a. Nonexistent
b. Weak
c. Moderate
d. Strong
• If a consistent and systematic relationship is not present, the strength of association is nonexistent
• A weak association means there is a low probability that the variables have a consistent and systematic relationship
• A strong association means there is a high probability that a consistent and systematic relationship exists
4) Type of relationship
• If two variables can be described as related, the next question is "What is the nature of the relationship?", that is, how can the link between variables Y and X best be described?
• There are a number of different ways in which two variables (X and Y) can share a relationship
• To answer the questions above, the following statistical methods are applied:
a. Covariation
b. Chi-square test
c. Correlation coefficient
   1. Pearson correlation coefficient
   2. Coefficient of determination
   3. Spearman rank order correlation coefficient
d. Regression analysis
COVARIATION:
• Covariation is the amount of change in one variable that is consistently related to the change in another variable of interest, i.e. the degree of association between two items or variables
• For example:
If we know DVD purchases are related to age, then we want to know the extent to which younger persons purchase more DVDs and, ultimately, which types of DVDs
• If two variables are found to change together on a reliable or consistent basis, we can use that information to make predictions as well as decisions on advertising and marketing strategies
• For example:
Attitude towards a Starbucks coffee advertising campaign as it varies between light, medium and heavy consumers of Starbucks coffee
SCATTER PLOTS AND CORRELATION
• A scatter plot (or scatter diagram) is used to show the relationship between two variables

SCATTER PLOT EXAMPLES
[Figure: scatter plots of y against x showing linear relationships and curvilinear relationships]
[Figure (continued): scatter plots of y against x showing strong relationships and weak relationships]
[Figure (continued): scatter plots of y against x showing no relationship]
Smoking and Lung Capacity
• One easy way to describe covariation between two variables visually is a scatter diagram: a graphic plot of the relative position of two variables, using a horizontal and a vertical axis to represent the values of the respective variables.
• We can see easily from the graph that as smoking goes up, lung capacity tends to go down.
• The two variables covary in opposite directions.
• We now examine two statistics, covariance and correlation, for quantifying how variables covary.

Cigarettes (X)    Lung Capacity (Y)
0                 45
5                 42
10                33
15                31
20                29

[Figure: scatter plot of Lung Capacity (vertical axis) against Smoking (horizontal axis)]
• The formula for calculating the covariance of sample data is as follows:

$$\mathrm{COV}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

x = the independent variable
y = the dependent variable
n = number of data points in the sample
x̄ = the mean of the independent variable x
ȳ = the mean of the dependent variable y
• Example: To understand how covariance is used, consider four paired observations of the rate of economic growth (xi) and the rate of return on the S&P 500 (yi).
• Using the covariance formula, you can determine whether economic growth and S&P 500 returns have a positive or inverse relationship.
• Before you compute the covariance, calculate the means of x and y.
A) Identify the variables for the covariance formula:
x = 2.1, 2.5, 4.0, and 3.6 (economic growth)
y = 8, 12, 14, and 10 (S&P 500 returns)
x̄ = 3.05 (shown rounded as 3.1)
ȳ = 11
n = 4
B) Substitute these values into the covariance formula to determine the relationship between economic growth and S&P 500 returns:

$$\mathrm{COV}(x, y) = \frac{(-0.95)(-3) + (-0.55)(1) + (0.95)(3) + (0.55)(-1)}{4 - 1} = \frac{4.6}{3} \approx 1.53$$

Interpretation:
• The covariance between the returns of the S&P 500 and economic growth is 1.53.
• Since the covariance is positive, the variables are positively related: they move together in the same direction.
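The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `sample_covariance` is an assumption chosen for the example, and it divides by n - 1 because that is the convention that reproduces the 1.53 quoted above.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Products of paired deviations from the two means
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross / (n - 1)


growth = [2.1, 2.5, 4.0, 3.6]   # economic growth (x), from the example
sp500 = [8, 12, 14, 10]         # S&P 500 returns (y), from the example

cov_xy = sample_covariance(growth, sp500)
print(round(cov_xy, 2))  # 1.53
```

A positive result means the two series tend to sit on the same side of their means at the same time; the size of the number depends on the units, which is why correlation is introduced next.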
Correlation:
• Correlation is another way to determine how two variables are related.
• In addition to telling you whether variables are positively or inversely related, correlation also tells you the degree to which the variables tend to move together.
• Correlation standardizes the measure of interdependence between two variables and, consequently, tells you how closely the two variables move together.
• The correlation measurement, called the Pearson correlation coefficient, always takes a value between -1 and 1.
A) If the correlation coefficient is 1
• The variables have a perfect positive correlation.
• This means that if one variable moves a given amount, the second moves proportionally in the same direction.
• A positive correlation coefficient less than 1 indicates a less than perfect positive correlation, with the strength of the correlation growing as the number approaches 1.
B) If the correlation coefficient is zero
• No linear relationship exists between the variables.
• If one variable moves, you can make no linear prediction about the movement of the other variable; they are uncorrelated.
C) If the correlation coefficient is -1
• The variables are perfectly negatively correlated (or inversely correlated) and move in opposition to each other.
• If one variable increases, the other variable decreases proportionally.
• A negative correlation coefficient greater than -1 indicates a less than perfect negative correlation, with the strength of the correlation growing as the number approaches -1.
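The three cases can be illustrated with a small sketch. The data below are made up for illustration, and `pearson_r` is a hypothetical helper name. The third series is a curve (y = x²) chosen to show that a coefficient of zero rules out a linear relationship only; a curvilinear one, as in the scatter plot examples, can still be present.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from sums of squared and cross deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]                 # illustrative values only
rising = [2 * v + 1 for v in x]       # exact straight line, upward
falling = [-3 * v for v in x]         # exact straight line, downward
curved = [v * v for v in x]           # exact curve, symmetric about 0

r_rising = pearson_r(x, rising)       # perfect positive correlation
r_falling = pearson_r(x, falling)     # perfect negative correlation
r_curved = pearson_r(x, curved)       # zero, although y depends fully on x
print(r_rising, r_falling, r_curved)
```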
• To calculate the correlation coefficient for two variables, you would use the correlation formula:

$$r(x, y) = \frac{\mathrm{COV}(x, y)}{s_x \, s_y}$$

r(x, y) = correlation of the variables x and y
COV(x, y) = covariance of the variables x and y
sx = sample standard deviation of the random variable x
sy = sample standard deviation of the random variable y
• To calculate correlation, you must know the covariance of the two variables and the standard deviation of each variable.
• From the earlier example, you know that the covariance of S&P 500 returns and economic growth is 1.53.
• Now you need to determine the standard deviation of each variable, that is, of the S&P 500 returns and of economic growth.
• Using the information from above, you know that
COV(x, y) = 1.53
sx = 0.90
sy = 2.58
Now calculate the correlation coefficient by substituting these numbers into the correlation formula:

$$r(x, y) = \frac{1.53}{0.90 \times 2.58} \approx 0.66$$
A correlation coefficient of 0.66 tells you two important things:
• Because the correlation coefficient is a positive number, returns on the S&P 500 and economic growth are positively related.
• Because 0.66 is relatively far from 0 (no correlation), the correlation between returns on the S&P 500 and economic growth is fairly strong.
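As a cross-check, the standard deviations and the correlation coefficient can be recomputed from the raw example data. This is a minimal sketch; the helper names are assumptions for illustration, and both the covariance and the standard deviations use the n - 1 (sample) divisor so that they are consistent with each other.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def sample_covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


growth = [2.1, 2.5, 4.0, 3.6]   # economic growth (x)
sp500 = [8, 12, 14, 10]         # S&P 500 returns (y)

s_x = sample_std(growth)
s_y = sample_std(sp500)
r = sample_covariance(growth, sp500) / (s_x * s_y)

print(round(s_x, 2), round(s_y, 2), round(r, 2))  # 0.9 2.58 0.66
```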
The coefficient of determination is the proportion of variability in one measure that is explained by the other measure.
It is the square of the Pearson correlation coefficient, written r².
For example, if the correlation coefficient between two variables is r = 0.90, the coefficient of determination is (0.90)² = 0.81.
This number ranges from 0.00 to 1.00 and shows the proportion of variation in one variable that is explained or accounted for by the other.
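A short sketch of the same arithmetic, with a hypothetical helper name. It also shows that the sign of r is lost when squaring, so r² measures the strength of the linear relationship but not its direction.

```python
def coefficient_of_determination(r):
    """Square of a Pearson correlation coefficient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return r * r


r2_example = coefficient_of_determination(0.90)    # the slide's example
r2_negative = coefficient_of_determination(-0.90)  # same strength, opposite sign
r2_sp500 = coefficient_of_determination(0.66)      # S&P 500 example, rounded r

print(round(r2_example, 2), round(r2_negative, 2), round(r2_sp500, 2))
# 0.81 0.81 0.44
```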
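Spearman rank order correlation coefficient:
A statistical measure of association between two variables where both have been measured using ordinal (rank order) scales. It is the Pearson correlation computed on the ranks, so it captures monotonic rather than strictly linear association.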
Example :
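A minimal sketch of the calculation, assuming made-up rankings (the slide's own example did not survive extraction). The names `average_ranks` and `spearman_rho` are illustrative. The approach shown converts each variable to ranks, giving tied values the average of their positions, and then applies the Pearson formula to the ranks.

```python
from math import sqrt


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: five brands ranked on two attributes
rank_quality = [1, 2, 3, 4, 5]
rank_service = [2, 1, 4, 3, 5]
rho = spearman_rho(rank_quality, rank_service)
print(round(rho, 2))  # 0.8
```

With no ties this agrees with the shortcut 1 - 6∑d²/(n(n² - 1)), where d is the difference between paired ranks: here ∑d² = 4 and n = 5, giving 0.8.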
INTRODUCTION TO REGRESSION ANALYSIS
Regression analysis is used to:
• Predict the value of a dependent variable based on the value of at least one independent variable
• Explain the impact of changes in an independent variable on the dependent variable

Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the dependent variable
SIMPLE LINEAR REGRESSION MODEL
• Only one independent variable, x
• Relationship between x and y is described by a linear function
• Changes in y are assumed to be caused by changes in x
TYPES OF REGRESSION MODELS
[Figure: four scatter plot panels: positive linear relationship, negative linear relationship, relationship NOT linear, no relationship]
POPULATION LINEAR REGRESSION
The population regression model:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

y = dependent variable
β0 = population y intercept
β1 = population slope coefficient
x = independent variable
ε = random error term, or residual
β0 + β1x is the linear component; ε is the random error component
LINEAR REGRESSION ASSUMPTIONS
• Error values (ε) are statistically independent
• Error values are normally distributed for any given value of x
• The probability distribution of the errors is normal
• The probability distribution of the errors has constant variance
• The underlying relationship between the x variable and the y variable is linear
POPULATION LINEAR REGRESSION (continued)
[Figure: the population regression line y = β0 + β1x + ε, with intercept β0 and slope β1. At a given xi, the observed value of y differs from the predicted value of y on the line by εi, the random error for this x value.]
ESTIMATED REGRESSION MODEL
The sample regression line provides an estimate of the population regression line:

$$\hat{y}_i = b_0 + b_1 x$$

ŷi = estimated (or predicted) y value
b0 = estimate of the regression intercept
b1 = estimate of the regression slope
x = independent variable

The individual random error terms ei have a mean of zero
LEAST SQUARES CRITERION
• b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals:

$$\sum e^2 = \sum (y - \hat{y})^2 = \sum \left(y - (b_0 + b_1 x)\right)^2$$
THE LEAST SQUARES EQUATION
• The formulas for b1 and b0 are:

$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

algebraic equivalent:

$$b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}$$

and

$$b_0 = \bar{y} - b_1 \bar{x}$$
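The two forms of the slope formula can be compared in code. This is a sketch on made-up data; the function names are assumptions for illustration. It also checks the least squares criterion directly: nudging either coefficient away from the fitted value increases the sum of squared residuals.

```python
def least_squares(x, y):
    """Slope and intercept from the deviation form of the formulas."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def least_squares_shortcut(x, y):
    """Same coefficients from the algebraically equivalent sum form."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))


# Illustrative data only (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

b0, b1 = least_squares(xs, ys)
b0_alt, b1_alt = least_squares_shortcut(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
print(round(b0, 2), round(b1, 2))  # 0.22 1.96
```

The deviation form is usually the safer one numerically; the sum form is convenient for hand calculation from column totals.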
INTERPRETATION OF THE SLOPE AND THE INTERCEPT
• b0 is the estimated average value of y when the value of x is zero
• b1 is the estimated change in the average value of y as a result of a one-unit change in x
FINDING THE LEAST SQUARES EQUATION
• The coefficients b0 and b1 will usually be found using computer software, such as Excel or Minitab
• Other regression measures will also be computed as part of computer-based regression analysis
SIMPLE LINEAR REGRESSION EXAMPLE
• A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
• A random sample of 10 houses is selected
• Dependent variable (y) = house price in $1000s
• Independent variable (x) = square feet
SAMPLE DATA FOR HOUSE PRICE MODEL

House Price in $1000s (y)    Square Feet (x)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
REGRESSION USING EXCEL
• Tools / Data Analysis / Regression

EXCEL OUTPUT
The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

               Coefficients    P-value    Upper 95%
Intercept      98.24833        0.1289     232.0738
Square Feet    0.10977         0.01039
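The main figures in the Excel output can be reproduced from the ten sample observations with the least squares formulas. This is a sketch in plain Python rather than the Excel procedure the slides use; the variable names are illustrative.

```python
from math import sqrt

# Sample data from the house price example
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]            # y, $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]   # x

n = len(sqft)
mean_x = sum(sqft) / n
mean_y = sum(price) / n
sxx = sum((x - mean_x) ** 2 for x in sqft)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sqft, price))

b1 = sxy / sxx                 # slope
b0 = mean_y - b1 * mean_x      # intercept

fitted = [b0 + b1 * x for x in sqft]
sst = sum((y - mean_y) ** 2 for y in price)              # total variation
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))   # unexplained
ssr = sst - sse                                          # explained

r_square = ssr / sst
multiple_r = sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)
standard_error = sqrt(sse / (n - 2))     # residual df = n - 2
f_stat = (ssr / 1) / (sse / (n - 2))     # regression df = 1

print(f"house price = {b0:.5f} + {b1:.5f} (square feet)")
print(f"R Square = {r_square:.5f}, Standard Error = {standard_error:.5f}")
```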
GRAPHICAL PRESENTATION
House price model: scatter plot and regression line
[Figure: scatter plot of House Price ($1000s) against Square Feet with the fitted regression line; intercept = 98.248, slope = 0.10977]
house price = 98.24833 + 0.10977 (square feet)
INTERPRETATION OF THE INTERCEPT, b0
house price = 98.24833 + 0.10977 (square feet)
• b0 is the estimated average value of y when the value of x is zero (if x = 0 is in the range of observed x values)
• Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet
INTERPRETATION OF THE SLOPE COEFFICIENT, b1
house price = 98.24833 + 0.10977 (square feet)
• b1 measures the estimated change in the average value of y as a result of a one-unit change in x
• Here, b1 = 0.10977 tells us that the average value of a house increases by 0.10977 × $1000 = $109.77 for each additional square foot of size
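As a usage sketch, the fitted equation can be applied to a house size inside the observed range. The 2,000 square foot house is a hypothetical input, and the helper names are illustrative. The range check reflects the point made for the intercept: the equation is only supported for sizes between the smallest (1100) and largest (2450) houses in the sample.

```python
B0 = 98.24833                    # intercept, in $1000s
B1 = 0.10977                     # slope, in $1000s per square foot
OBSERVED_RANGE = (1100, 2450)    # smallest and largest house in the sample


def predict_price(square_feet):
    """Estimated average price in $1000s for a given size."""
    return B0 + B1 * square_feet


def within_observed_range(square_feet):
    low, high = OBSERVED_RANGE
    return low <= square_feet <= high


size = 2000                      # hypothetical house, not in the sample
estimate = predict_price(size)
print(f"${estimate * 1000:,.2f}", within_observed_range(size))
# $317,788.33 True

# One extra square foot adds the slope, about $109.77
step = predict_price(size + 1) - predict_price(size)
```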
LEAST SQUARES REGRESSION PROPERTIES
• The sum of the residuals from the least squares regression line is 0: $\sum (y - \hat{y}) = 0$
• The sum of the squared residuals, $\sum (y - \hat{y})^2$, is a minimum
• The simple regression line always passes through the mean of the y variable and the mean of the x variable
• The least squares coefficients are unbiased estimates of β0 and β1
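The first and third properties can be checked numerically on the house price sample. This is a minimal sketch with illustrative names; the comparisons use a small tolerance because floating-point sums are not exact.

```python
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]            # y, $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]   # x

n = len(sqft)
mean_x = sum(sqft) / n
mean_y = sum(price) / n
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sqft, price))
      / sum((x - mean_x) ** 2 for x in sqft))
b0 = mean_y - b1 * mean_x

# Property: residuals from the least squares line sum to zero
residuals = [y - (b0 + b1 * x) for x, y in zip(sqft, price)]
residual_sum = sum(residuals)

# Property: the line passes through the point (mean of x, mean of y)
value_at_mean_x = b0 + b1 * mean_x

print(abs(residual_sum) < 1e-8, abs(value_at_mean_x - mean_y) < 1e-8)
```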
EXPLAINED AND UNEXPLAINED VARIATION
• Total variation is made up of two parts:

$$SST = SSE + SSR$$

SST = total sum of squares: $SST = \sum (y - \bar{y})^2$
SSE = sum of squares error: $SSE = \sum (y - \hat{y})^2$
SSR = sum of squares regression: $SSR = \sum (\hat{y} - \bar{y})^2$

where:
ȳ = average value of the dependent variable
y = observed values of the dependent variable
ŷ = estimated value of y for the given x value
EXPLAINED AND UNEXPLAINED VARIATION (continued)
• SST = total sum of squares
  • Measures the variation of the yi values around their mean ȳ
• SSE = error sum of squares
  • Variation attributable to factors other than the relationship between x and y
• SSR = regression sum of squares
  • Explained variation attributable to the relationship between x and y
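EXPLAINED AND UNEXPLAINED VARIATION (continued)
[Figure: at a given Xi, the vertical distance from the observed yi to the mean ȳ (SST = ∑(yi − ȳ)²) splits into the distance from yi to the fitted value ŷi (SSE = ∑(yi − ŷi)²) and the distance from ŷi to ȳ (SSR = ∑(ŷi − ȳ)²)]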
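The decomposition can be verified on the house price sample by computing each sum of squares from its own definition and checking that the two parts add up to the total. This is a sketch with illustrative variable names, not the slides' Excel workflow.

```python
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]            # y, $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]   # x

n = len(sqft)
mean_x = sum(sqft) / n
mean_y = sum(price) / n
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sqft, price))
      / sum((x - mean_x) ** 2 for x in sqft))
b0 = mean_y - b1 * mean_x
fitted = [b0 + b1 * x for x in sqft]

sst = sum((y - mean_y) ** 2 for y in price)              # total variation
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))   # unexplained variation
ssr = sum((f - mean_y) ** 2 for f in fitted)             # explained variation

print(round(sst, 4), round(sse, 4), round(ssr, 4))
print(round(ssr / sst, 5))   # share of variation explained (R Square)
```

The ratio SSR / SST is the R Square figure in the Excel output, which ties this decomposition back to the coefficient of determination.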
THANKS……

 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Data analysis test for association BY Prof Sachin Udepurkar

  • 1.
  • 2. DATA ANALYSIS – TESTING FOR ASSOCIATION
      Relationship: a consistent and systematic link between two or more variables.
      When interpreting the relationship between variables, the following aspects are taken into account:
      1. Whether two or more variables are related at all, i.e. whether a relationship is present, judged via the concept of statistical significance
      2. If a relationship is present, its direction, which can be either positive or negative
      3. The strength of association
      4. The type of relationship
  • 3. Difference between Univariate and Bivariate
      Univariate data:
      - involving a single variable
      - does not deal with causes or relationships
      - the major purpose of univariate analysis is to describe
      - central tendency: mean, mode, median
      - dispersion: range, variance, max, min, quartiles, standard deviation
      - frequency distributions
      - bar graph, histogram, pie chart, line graph, box-and-whisker plot
      - Sample question: How many of the students in the freshman class are female?
      Bivariate data:
      - involving two variables
      - deals with causes or relationships
      - the major purpose of bivariate analysis is to explain
      - analysis of two variables simultaneously
      - correlations
      - comparisons, relationships, causes, explanations
      - tables where one variable is contingent on the values of the other variable
      - independent and dependent variables
      - Sample question: Is there a relationship between the number of females in Computer Programming and their scores in Mathematics?
  • 4. 1) Measuring whether a relationship is present via the concept of statistical significance. The first question is whether a relation exists between two or more variables. If we test for statistical significance and find that it exists, we say that a relationship is present. Stated another way, knowledge about the behavior of one variable allows us to make a useful prediction about the behavior of another. For example: if we found a statistically significant relationship between perceptions of the quality of Santa Fe Grill food and satisfaction, we would say a relationship is present and that perceptions of the quality of the food tell us what the perceptions of satisfaction are likely to be.
  • 5. 2) If a relationship is present, it is important to know its direction, which can be either positive or negative. Establishing that a relationship is present comes before determining its direction. For example: in the Santa Fe Grill example, a positive relationship exists if respondents who rate the quality of the food high are also highly satisfied. Similarly, a negative relationship exists if respondents say the speed of service is slow (low rating) but they are still satisfied (high rating).
  • 6. 3) Understanding strength of association. In general, the strength of association is categorized as: a. Nonexistent b. Weak c. Moderate d. Strong. If a consistent and systematic relationship is not present, the strength of association is nonexistent. A weak association means there is a low probability that the variables have a relationship. A strong association means there is a high probability that a consistent and systematic relationship exists.
  • 7. 4) Type of relationship. If two variables can be described as related, the next question is "What is the nature of the relationship?", that is, how can the link between variables Y and X best be described? There are a number of different ways in which two variables (X and Y) can share a relationship.
  • 8. To answer the questions above, the following statistical methods are applied:
      a. Covariation
      b. Chi-square test
      c. Correlation coefficient
         1. Pearson correlation coefficient
         2. Coefficient of determination
         3. Spearman rank order correlation coefficient
      d. Regression analysis
  • 9. COVARIATION: The amount of change in one variable that is consistently related to the change in another variable of interest, or the degree of association between two items/variables. For example: if we know DVD purchases are related to age, then we want to know the extent to which younger persons purchase more DVDs and, ultimately, which types of DVDs. If two variables are found to change together on a reliable or consistent basis, we can use that information to make predictions as well as decisions on advertising and marketing strategies. For example: the change in attitude towards a Starbucks coffee advertising campaign as it varies between light, medium and heavy consumers of Starbucks coffee.
  • 10. SCATTER PLOTS AND CORRELATION: A scatter plot (or scatter diagram) is used to show the relationship between two variables.
  • 14. Smoking and Lung Capacity
      One easy way to visually describe covariation between two variables is a scatter diagram: a graphic plot of the relative position of two variables, using a horizontal and a vertical axis to represent the values of the respective variables.
      Cigarettes (X) | Lung Capacity (Y)
      0  | 45
      5  | 42
      10 | 33
      15 | 31
      20 | 29
      [Figure: scatter plot of lung capacity (vertical axis) against smoking (horizontal axis).]
      • We can easily see from the graph that as smoking goes up, lung capacity tends to go down.
      • The two variables covary in opposite directions.
      • We now examine two statistics, covariance and correlation, for quantifying how variables covary.
  • 15. The formula for calculating the covariance of sample data is:
      $\mathrm{COV}(x, y) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
      where x = the independent variable, y = the dependent variable, n = number of data points in the sample, $\bar{x}$ = the mean of the independent variable x, and $\bar{y}$ = the mean of the dependent variable y.
      Example: to understand how covariance is used, consider the rate of economic growth ($x_i$) and the rate of return on the S&P 500 ($y_i$). Using the covariance formula, you can determine whether economic growth and S&P 500 returns have a positive or inverse relationship.
  • 16. Before you compute the covariance, calculate the means of x and y. A) You can then identify the variables for the covariance formula as follows: x = 2.1, 2.5, 4.0, and 3.6 (economic growth); y = 8, 12, 14, and 10 (S&P 500 returns); $\bar{x}$ = 3.1 (3.05 before rounding); $\bar{y}$ = 11. B) Substitute these values into the covariance formula to determine the relationship between economic growth and S&P 500 returns.
  • 17. Interpretation: The covariance between the returns of the S&P 500 and economic growth is 1.53. Since the covariance is positive, the variables are positively related; they move together in the same direction.
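The 1.53 figure can be reproduced from the four data points on slide 16. The following is a minimal sketch, assuming the sample covariance uses the n − 1 divisor (which is what yields 1.53 here); the function name `sample_covariance` is illustrative and not from the slides.

```python
# Data from the slides: economic growth (x) and S&P 500 returns (y).
x = [2.1, 2.5, 4.0, 3.6]
y = [8, 12, 14, 10]


def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    return cross / (n - 1)


cov_xy = sample_covariance(x, y)
print(round(cov_xy, 2))  # 1.53
```

A positive result means the two series tend to sit on the same side of their means at the same time; the magnitude depends on the units of x and y, which is why the slides move on to correlation.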
  • 19. Correlation: Correlation is another way to determine how two variables are related. In addition to telling you whether variables are positively or inversely related, correlation also tells you the degree to which the variables tend to move together. Correlation standardizes the measure of interdependence between two variables and, consequently, tells you how closely the two variables move. The correlation measurement, called the Pearson correlation coefficient, always takes a value between –1 and 1. A) If the correlation coefficient is 1, the variables have a perfect positive correlation: if one variable moves a given amount, the second moves proportionally in the same direction. A positive correlation coefficient less than 1 indicates a less than perfect positive correlation, with the strength of the correlation growing as the number approaches 1.
  • 20. B) If the correlation coefficient is zero, no linear relationship exists between the variables. If one variable moves, you can make no predictions about the movement of the other variable; they are uncorrelated. C) If the correlation coefficient is –1, the variables are perfectly negatively correlated (or inversely correlated) and move in opposition to each other. If one variable increases, the other variable decreases proportionally. A negative correlation coefficient greater than –1 indicates a less than perfect negative correlation, with the strength of the correlation growing as the number approaches –1.
  • 21. To calculate the correlation coefficient for two variables, use the correlation formula:
      $r_{x,y} = \dfrac{\mathrm{COV}(x, y)}{s_x \, s_y}$
      where $r_{x,y}$ = correlation of the variables x and y, COV(x, y) = covariance of the variables x and y, $s_x$ = sample standard deviation of the random variable x, and $s_y$ = sample standard deviation of the random variable y.
      To calculate the correlation, you must know the covariance of the two variables and the standard deviation of each variable. From the earlier example, you know that the covariance of S&P 500 returns and economic growth is 1.53.
  • 22. Now you need to determine the standard deviation of each of the variables: the S&P 500 returns and the economic growth. Using the information from above, you know that COV(x, y) = 1.53, $s_x$ = 0.90, and $s_y$ = 2.58.
  • 23. Now calculate the correlation coefficient by substituting the numbers above into the correlation formula: $r_{x,y} = \dfrac{1.53}{0.90 \times 2.58} \approx 0.66$. A correlation coefficient of 0.66 tells you two important things: • Because the correlation coefficient is a positive number, returns on the S&P 500 and economic growth are positively related. • Because 0.66 is relatively far from zero (which would indicate no correlation), the correlation between returns on the S&P 500 and economic growth is strong.
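The standard deviations and the correlation coefficient quoted on slides 22 and 23 can be checked from the raw data. This is a small sketch using only the standard library; the helper names are illustrative, and the n − 1 divisor is assumed throughout because it matches the slides' figures.

```python
import math

# Data from the slides: economic growth (x) and S&P 500 returns (y).
x = [2.1, 2.5, 4.0, 3.6]
y = [8, 12, 14, 10]


def mean(values):
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (n - 1 divisor)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def sample_cov(xs, ys):
    """Sample covariance (n - 1 divisor)."""
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)


s_x = sample_std(x)
s_y = sample_std(y)
r = sample_cov(x, y) / (s_x * s_y)
print(round(s_x, 2), round(s_y, 2), round(r, 2))  # 0.9 2.58 0.66
```

Because the covariance is divided by both standard deviations, r carries no units and is bounded by –1 and 1, which is what makes values from different data sets comparable.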
  • 24. The coefficient of determination is the amount of variability in one measure that is explained by the other measure. It is the square of the Pearson correlation coefficient, $r^2$. For example, if the correlation coefficient between two variables is r = 0.90, the coefficient of determination is $(0.90)^2 = 0.81$. This number ranges from 0.00 to 1.0 and shows the proportion of variation in one variable that is explained or accounted for by the other.
  • 25. Spearman rank order correlation coefficient: A statistical measure of association between two variables where both have been measured using ordinal (rank order) scales; it measures linear association between the ranks. Example:
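The slide's own example is not present in this transcript. As a stand-in (an assumption, not the author's example), the sketch below applies the standard no-ties formula $r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)}$ to the smoking data from slide 14, where $d_i$ is the difference between the two ranks of observation i. The function names are illustrative.

```python
# Smoking data from slide 14, used here only as a stand-in example.
cigarettes = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]


def ranks(values):
    """Rank each value from 1 (smallest) to n. Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman rank correlation via the no-ties shortcut formula."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


rho = spearman(cigarettes, lung_capacity)
print(rho)  # -1.0
```

The result is exactly –1 because lung capacity falls at every step up in smoking, so the two rankings are perfect mirror images even though the raw values do not lie on a straight line.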
  • 26. INTRODUCTION TO REGRESSION ANALYSIS
      Regression analysis is used to:
      - predict the value of a dependent variable based on the value of at least one independent variable
      - explain the impact of changes in an independent variable on the dependent variable
      Dependent variable: the variable we wish to explain.
      Independent variable: the variable used to explain the dependent variable.
  • 27. SIMPLE LINEAR REGRESSION MODEL: Only one independent variable, x. The relationship between x and y is described by a linear function. Changes in y are assumed to be caused by changes in x.
  • 28. TYPES OF REGRESSION MODELS: positive linear relationship; negative linear relationship; relationship not linear; no relationship.
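  • 29. POPULATION LINEAR REGRESSION
      The population regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
      where y = dependent variable, $\beta_0$ = population y-intercept, $\beta_1$ = population slope coefficient, x = independent variable, and $\varepsilon$ = random error term, or residual. $\beta_0 + \beta_1 x$ is the linear component and $\varepsilon$ is the random error component.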
  • 30. LINEAR REGRESSION ASSUMPTIONS
      - Error values (ε) are statistically independent
      - Error values are normally distributed for any given value of x
      - The probability distribution of the errors is normal
      - The probability distribution of the errors has constant variance
      - The underlying relationship between the x variable and the y variable is linear
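  • 31. POPULATION LINEAR REGRESSION (continued) [Figure: the population regression line $y = \beta_0 + \beta_1 x + \varepsilon$ with intercept $\beta_0$ and slope $\beta_1$; for a given $x_i$ it marks the observed value of y, the predicted value of y, and the random error $\varepsilon_i$ between them.]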
  • 32. ESTIMATED REGRESSION MODEL
      The sample regression line provides an estimate of the population regression line: $\hat{y}_i = b_0 + b_1 x$
      where $\hat{y}_i$ = estimated (or predicted) y value, $b_0$ = estimate of the regression intercept, $b_1$ = estimate of the regression slope, and x = independent variable. The individual random error terms $e_i$ have a mean of zero.
  • 33. LEAST SQUARES CRITERION: $b_0$ and $b_1$ are obtained by finding the values that minimize the sum of the squared residuals: $\sum e^2 = \sum (y - \hat{y})^2 = \sum \big(y - (b_0 + b_1 x)\big)^2$
  • 34. THE LEAST SQUARES EQUATION
      The formulas for $b_1$ and $b_0$ are:
      $b_1 = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
      algebraic equivalent: $b_1 = \dfrac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$
      and $b_0 = \bar{y} - b_1 \bar{x}$
  • 35. INTERPRETATION OF THE SLOPE AND THE INTERCEPT: $b_0$ is the estimated average value of y when the value of x is zero. $b_1$ is the estimated change in the average value of y as a result of a one-unit change in x.
  • 36. FINDING THE LEAST SQUARES EQUATION: The coefficients $b_0$ and $b_1$ will usually be found using computer software, such as Excel or Minitab. Other regression measures will also be computed as part of computer-based regression analysis.
  • 37. SIMPLE LINEAR REGRESSION EXAMPLE: A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet). A random sample of 10 houses is selected. Dependent variable (y) = house price in $1000s. Independent variable (x) = square feet.
  • 38. SAMPLE DATA FOR HOUSE PRICE MODEL
      House Price in $1000s (y) | Square Feet (x)
      245 | 1400
      312 | 1600
      279 | 1700
      308 | 1875
      199 | 1100
      219 | 1550
      405 | 2350
      324 | 2450
      319 | 1425
      255 | 1700
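The slides obtain the coefficients from Excel, but the least squares formulas from slide 34 can be applied to this sample directly. The following sketch is one way to do that in plain Python; the function name `least_squares` is illustrative.

```python
# Sample data from slide 38.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # in $1000s


def least_squares(xs, ys):
    """Return (b0, b1) for the line y = b0 + b1 * x that minimizes squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b1 = s_xy / s_xx            # slope
    b0 = mean_y - b1 * mean_x   # intercept
    return b0, b1


b0, b1 = least_squares(square_feet, price)
print(round(b0, 5), round(b1, 5))  # 98.24833 0.10977
```

These values agree with the regression equation reported in the Excel output on slide 40.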
  • 39. REGRESSION USING EXCEL: Tools / Data Analysis / Regression
  • 40. EXCEL OUTPUT
      The regression equation is: house price = 98.24833 + 0.10977 (square feet)
      Regression Statistics
      Multiple R | 0.76211
      R Square | 0.58082
      Adjusted R Square | 0.52842
      Standard Error | 41.33032
      Observations | 10
      ANOVA
      Source | df | SS | MS | F | Significance F
      Regression | 1 | 18934.9348 | 18934.9348 | 11.0848 | 0.01039
      Residual | 8 | 13665.5652 | 1708.1957 | |
      Total | 9 | 32600.5000 | | |
      Coefficients table (columns: Coefficients, Standard Error, t Stat, P-value, Lower 95%, Upper 95%): only some entries survive in this transcript, namely P-values 0.1289 and 0.01039 and an Upper 95% value of 232.0738.
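Most of the summary figures in the Excel output follow from three sums of squares. The sketch below recomputes them from the slide 38 data, assuming the usual simple-regression definitions (n − 2 residual degrees of freedom, one regression degree of freedom); the variable names are illustrative and this is not Excel's own implementation.

```python
import math

# Sample data from slide 38.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # in $1000s

n = len(price)
mean_x = sum(square_feet) / n
mean_y = sum(price) / n

# Least squares coefficients.
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(square_feet, price))
      / sum((x - mean_x) ** 2 for x in square_feet))
b0 = mean_y - b1 * mean_x

fitted = [b0 + b1 * x for x in square_feet]
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))   # residual sum of squares
sst = sum((y - mean_y) ** 2 for y in price)              # total sum of squares
ssr = sst - sse                                          # regression sum of squares

r_square = ssr / sst
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)
standard_error = math.sqrt(sse / (n - 2))
f_stat = (ssr / 1) / (sse / (n - 2))

for name, value in [("Multiple R", multiple_r), ("R Square", r_square),
                    ("Adjusted R Square", adjusted_r_square),
                    ("Standard Error", standard_error), ("F", f_stat)]:
    print(f"{name}: {value:.5f}")
```

The printed values should match the Excel figures up to rounding; the P-values and confidence limits are not reproduced here because they require the t and F distributions.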
  • 41. GRAPHICAL PRESENTATION: House price model, scatter plot and regression line. [Figure: house price ($1000s) on the vertical axis against square feet on the horizontal axis, with the fitted line; intercept = 98.248, slope = 0.10977.] house price = 98.24833 + 0.10977 (square feet)
  • 42. INTERPRETATION OF THE INTERCEPT, $b_0$: house price = 98.24833 + 0.10977 (square feet). $b_0$ is the estimated average value of Y when the value of X is zero (if x = 0 is in the range of observed x values). Here, no houses had 0 square feet, so $b_0$ = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet.
  • 43. INTERPRETATION OF THE SLOPE COEFFICIENT, $b_1$: house price = 98.24833 + 0.10977 (square feet). $b_1$ measures the estimated change in the average value of Y as a result of a one-unit change in X. Here, $b_1$ = 0.10977 tells us that the average value of a house increases by 0.10977 × $1000 = $109.77 for each additional square foot of size.
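To make the slope interpretation concrete, the fitted equation can be evaluated at two sizes one square foot apart. This is a small sketch using the rounded coefficients from the slides; the 2000 square foot house is a hypothetical input chosen because it lies inside the observed range of sizes.

```python
# Rounded coefficients from the slides' regression equation.
B0 = 98.24833   # intercept, in $1000s
B1 = 0.10977    # slope, in $1000s per square foot


def predicted_price(square_feet):
    """Estimated average house price in $1000s for a given size."""
    return B0 + B1 * square_feet


# Hypothetical 2000 sq ft house (within the sampled range of 1100 to 2450).
price_2000 = predicted_price(2000)
extra_per_sq_ft = (predicted_price(2001) - price_2000) * 1000  # in dollars

print(round(price_2000, 2))       # 317.79
print(round(extra_per_sq_ft, 2))  # 109.77
```

The second figure is the same $109.77 quoted on the slide: the difference between two predictions one unit apart is always the slope, whatever starting size is chosen.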
  • 44. LEAST SQUARES REGRESSION PROPERTIES
      - The sum of the residuals from the least squares regression line is 0: $\sum (y - \hat{y}) = 0$
      - The sum of the squared residuals, $\sum (y - \hat{y})^2$, is a minimum
      - The simple regression line always passes through the mean of the y variable and the mean of the x variable
      - The least squares coefficients are unbiased estimates of $\beta_0$ and $\beta_1$
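The first and third properties can be checked numerically on the house price sample. The sketch below refits the line from the slide 38 data and tests both; equality is only up to floating-point rounding, so the comparisons use a small tolerance (the 1e-9 threshold is an arbitrary choice).

```python
# Sample data from slide 38.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # in $1000s

n = len(price)
mean_x = sum(square_feet) / n
mean_y = sum(price) / n

b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(square_feet, price))
      / sum((x - mean_x) ** 2 for x in square_feet))
b0 = mean_y - b1 * mean_x

# Property 1: the residuals sum to zero.
residuals = [y - (b0 + b1 * x) for x, y in zip(square_feet, price)]
residual_sum = sum(residuals)

# Property 3: the fitted line passes through the point of means.
fitted_at_mean_x = b0 + b1 * mean_x

print(abs(residual_sum) < 1e-9)               # True
print(abs(fitted_at_mean_x - mean_y) < 1e-9)  # True
```

Both follow from the formula $b_0 = \bar{y} - b_1 \bar{x}$: rearranging it gives $\bar{y} = b_0 + b_1 \bar{x}$, and summing the residuals gives $n(\bar{y} - b_0 - b_1 \bar{x}) = 0$.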
  • 45. EXPLAINED AND UNEXPLAINED VARIATION
      Total variation is made up of two parts: SST = SSE + SSR
      Total sum of squares: $SST = \sum (y - \bar{y})^2$
      Sum of squares error: $SSE = \sum (y - \hat{y})^2$
      Sum of squares regression: $SSR = \sum (\hat{y} - \bar{y})^2$
      where $\bar{y}$ = average value of the dependent variable, y = observed values of the dependent variable, and $\hat{y}$ = estimated value of y for the given x value.
  • 46. EXPLAINED AND UNEXPLAINED VARIATION (continued)
      - SST = total sum of squares: measures the variation of the $y_i$ values around their mean $\bar{y}$
      - SSE = error sum of squares: variation attributable to factors other than the relationship between x and y
      - SSR = regression sum of squares: explained variation attributable to the relationship between x and y
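The decomposition SST = SSE + SSR can be verified on the house price sample by computing each sum of squares from its own definition rather than by subtraction. This is a minimal sketch; the variable names are illustrative.

```python
# Sample data from slide 38.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # in $1000s

n = len(price)
mean_x = sum(square_feet) / n
mean_y = sum(price) / n

b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(square_feet, price))
      / sum((x - mean_x) ** 2 for x in square_feet))
b0 = mean_y - b1 * mean_x
fitted = [b0 + b1 * x for x in square_feet]

sst = sum((y - mean_y) ** 2 for y in price)              # total variation
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))   # unexplained variation
ssr = sum((f - mean_y) ** 2 for f in fitted)             # explained variation

print(round(sst, 4), round(sse, 4), round(ssr, 4))
print(abs(sst - (sse + ssr)) < 1e-6)  # True
```

The three values correspond to the Total, Residual and Regression rows of the SS column in the Excel ANOVA table on slide 40.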
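  • 47. EXPLAINED AND UNEXPLAINED VARIATION (continued) [Figure: for an observation $(x_i, y_i)$, the vertical distances between $y_i$, the fitted value $\hat{y}_i$ and the mean $\bar{y}$, labelled $SST = \sum (y_i - \bar{y})^2$, $SSE = \sum (y_i - \hat{y}_i)^2$ and $SSR = \sum (\hat{y}_i - \bar{y})^2$.]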
