2. Bivariate Population
• Generally we study one variable at a time for each member of a
population. Alternatively, we may have paired
variates, where every individual exhibits two values,
one for each variable. For example:
• In dairy cattle: Milk yield and fat yield,
Lactation days and milk yield,
Body weight and body length
• In Sheep: Birth weight and weaning weight,
Fleece yield and staple length,
Fiber density and wool yield
• In Poultry: Egg weight and egg number,
Egg weight and hatch weight
3. • A population wherein each individual has two values for
two different variables is known as a bivariate
population.
• In a bivariate population, one of the paired variates
may be a cause and the other an effect.
• The cause variable is called the independent variable and is
generally denoted by X, while the effect variable is
referred to as the dependent variable and is denoted by
Y.
• When two variables are interdependent (without
specifying which one is dependent and which one is
independent), we say that the variables are related.
• Correlation: It is the study of the association or degree of
relationship between two variables under study.
4. • Types of correlation
• i) Positive or negative: The sign of the correlation
depends upon the direction in which the variables vary.
• If both variables vary in the same direction,
i.e. if one increases the other also increases, or if one
decreases the other also decreases, the correlation is said
to be positive.
• On the other hand, if the variables vary in
opposite directions, i.e. if one increases the other
decreases, the correlation is said to be negative.
5. • Following is an example of positive correlation:
both the variables move in the same direction

X   10   12   15   18   20
Y   15   20   22   25   37

X   80   70   60   40   30
Y   50   44   30   20   10

• Following is an example of negative
correlation: the variables move in
opposite directions

X   20   30   40   60   80
Y   40   30   22   15   10

X   100  90   60   40   30
Y   10   20   30   40   50
6. ii) Simple, partial and multiple:
• This classification is based on the number of variables studied. When only
two variables are involved in the study of correlation, it is
called simple correlation,
• e.g. feed intake and growth of animals,
birth weight and number of piglets.
• When more than two variables are involved in the study, it
is either multiple or partial correlation.
• In multiple correlation, we study the correlation between one
dependent variable and all the other independent
variables, e.g. milk yield vs. first lactation period, food
supplied, age, etc.
• In partial correlation, we study the relationship between
two variables, assuming that the other variables are
constant, e.g. the correlation between the weight of broilers
and feed intake, assuming other factors like area
provided, labour used, medicinal cost, etc. are constant.
7. iii) Linear and non-linear:
• If the amount of change in one variable tends to bear a
constant ratio to the amount of change in the other
variable, the correlation is said to be linear.
• The graphical representation of a linear correlation is a
straight line.
• If the amount of change in one variable does not bear a
constant ratio to the amount of change in the other
variable, the correlation is said to be non-linear or
curvilinear.
• Its graphical representation will not form a straight
line.
8. Methods of studying correlation
• Scatter diagram
• Correlation Graph
• Karl Pearson's coefficient of correlation
• Concurrent deviation method and
• Rank Correlation
9. Scatter diagram
• The scatter diagram is a graphical presentation of a
bivariate population in which the values of the independent
variable are represented on the X-axis and those of the
dependent variable on the Y-axis, on a
natural scale. For each pair of values of the
variables a dot (.) is plotted on the graph paper. These
dots give an indication of the direction of the relationship.
• If the slope appears to be upward from the left bottom to
the right top, it indicates the sign of a positive correlation
between two variables.
• If the slope appears to be downward from the left top to
the right bottom, it indicates the sign of negative
correlation.
• However, if the dots show no definite direction, either upward or
downward, it indicates the absence of correlation
between the two variables under study.
11. Merits of scatter diagram
• It is simple to draw the diagram and observe
correlation.
• From the shape of the scattered points we can
easily understand the pattern (linear or curvilinear),
direction (positive or negative) and perfection ( +1
or –1) of correlation.
• It can be used to detect unusual or abnormal
variations in the data.
Demerits of scatter diagram
• It does not provide the exact degree or extent of
correlation (numerical value) but only indicates the
relationship between two variables.
• It is not applicable for studying partial or multiple
correlations.
12. CORRELATION GRAPH
• In this method, curves are plotted for the data on two
variables. By examining the direction and closeness of
the two curves so drawn, we can infer whether or not
the variables are related.
• This method is normally used for time series data.
However, like scatter diagram, this method also does not
offer any numerical value for coefficient of correlation.
13. KARL PEARSON'S COEFFICIENT OF CORRELATION
• Karl Pearson’s method is popularly known as the
Pearsonian coefficient of correlation.
• The term correlation refers to relationship
between two variables, e.g., between height and
weight, feed consumption and weight gain,
rainfall and crop yield, etc.
• Simple correlation coefficient can be defined as
the measure of degree and direction of linear
relationship between two variables.
• It is a measure of interdependence between two
variables. It measures how the two variables co-
vary (vary together).
14. Formula
• The correlation coefficient between two variables X and Y is
obtained as the ratio of the covariance between X and Y to
the product of the standard deviations of X and Y. Thus:

    r_xy = Cov(X, Y) / √(Var(X) · Var(Y)) = Cov(X, Y) / (S_X · S_Y)
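This ratio can be implemented directly from its definition; a minimal sketch with hypothetical feed-intake and weight-gain figures:

```python
from math import sqrt

def pearson_r(x, y):
    """r = Cov(X, Y) / (S_X * S_Y); the same n is used in the
    numerator and denominator, so the 1/n factor cancels."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    sx = sqrt(sum((xi - mx) ** 2 for xi in x) / n)
    sy = sqrt(sum((yi - my) ** 2 for yi in y) / n)
    return cov / (sx * sy)

# Hypothetical figures: feed intake vs. weight gain
feed = [2, 4, 6, 8, 10]
gain = [1, 3, 4, 7, 9]
print(round(pearson_r(feed, gain), 4))  # → 0.9901
```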
15. Properties of correlation coefficient
• It ranges from –1 to +1.
• It is not affected either by the change of origin or change of scale.
• It is also not affected by the interchange of X and Y, i.e., rxy = ryx.
• It is the geometric mean of the two regression coefficients, i.e., r = ±√(b_yx · b_xy).
• It is independent of unit of measurement, i.e. it has no unit (because
numerator and denominator both are in the same unit i.e. they are
standardized).
Limitations of Correlation Coefficient
• Sometimes correlation between two variables may occur due to the common
effect of a third factor.
• We may find a fair degree of correlation between two variables when in fact
there is no relationship between them in practice; this may happen due
to a small sample.
• The correlation coefficient measures only the degree of linear relationship.
A low value of r rules out a linear relationship, but a strong non-linear
relationship may still exist between the two variables.
16. Coefficient of Determination
• It is square of the simple correlation coefficient (r2)
• It measures the amount of variation in one variable that is
explained by another variable.
• In other words, it is defined as the ratio of the explained variation
to the total variation.
• Thus, if r = 0.8, r2 = 0.64, which means that 64 % of the variation in
the dependent variable has been explained by the independent
variable and the remaining 36 % of the variation is due to other
factors.
• The quantity r2 is always positive regardless of whether r is positive
or negative. Hence it does not show the direction of correlation
between two variables.
• Since |r| ≤ 1, the value of r2 never exceeds |r|; the two are equal
only under perfect correlation (|r| = 1) or when r = 0.
17. Coefficient of non-determination
• It is derived from the coefficient of determination and is one minus
the coefficient of determination. Symbolically represented by k2, it
is k2 = 1 – r2
Coefficient of alienation
• Symbolically represented by k, it is the square root of the
coefficient of non-determination, i.e. k = √(1 – r2).
18. Test of Significance of the Correlation Coefficient (t-test)
• Under H0: ρ = 0, we apply a t-test to know whether the given
sample from which the coefficient of correlation has been
obtained belongs to a population with ρ = 0. The test statistic is

    t = r √(n − 2) / √(1 − r²)

• We compare the calculated value of t with the table value at (n – 2)
d.f. at the desired level of significance.
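A sketch of this calculation, using the standard t statistic for a correlation coefficient; the sample values r = 0.75 and n = 10 are hypothetical, and 2.306 is the two-tailed 5% critical value for 8 d.f.:

```python
from math import sqrt

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with (n - 2) d.f."""
    return r * sqrt(n - 2) / sqrt(1 - r**2)

# Hypothetical sample: r = 0.75 from n = 10 pairs
t = t_statistic(0.75, 10)
print(round(t, 3))  # → 3.207

# Two-tailed 5% critical value for 8 d.f. is about 2.306;
# since |t| > 2.306, H0: rho = 0 would be rejected here
print(abs(t) > 2.306)  # True
```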
19. Rank Correlation (Spearman’s Coefficient of Correlation)
• This correlation applies to data in the form of ranks.
• The data may be collected as ranks or may be ranked after
observations on some other scale.
• It measures correspondence between ranks and therefore is not
necessarily a measure of linear correlation.
The procedure consists of:
• Ranking the observations for each variable.
• Obtaining the differences in ranks of paired observations, d = Ri - Rj ,
where Ri and Rj refer to the ranks of first and second variables
respectively.
• Estimation of the correlation by: r_s = 1 − 6 Σd² / (n(n² − 1)), where n is the number of pairs.
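The steps above can be sketched as follows, assuming no tied observations (the two judges' scores below are hypothetical):

```python
def spearman_rho(x, y):
    """Spearman's rank correlation for data without ties:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    def ranks(v):
        # Rank 1 for the smallest value, rank n for the largest
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # squared rank differences
    return 1 - 6 * d2 / (n * (n**2 - 1))

# Hypothetical scores given by two judges to the same six animals
judge1 = [48, 33, 40, 9, 16, 65]
judge2 = [13, 13.3, 24, 6, 15, 20]

print(round(spearman_rho(judge1, judge2), 3))  # → 0.486
```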