Unit-I, BP801T. BIOSTATISITCS AND RESEARCH METHODOLOGY (Theory)
Correlation: Definition, Karl Pearson’s coefficient of correlation, Multiple correlations -
Pharmaceuticals examples.
Correlation: is there a relationship between 2
variables.
3. Relationships between Variables
• Common aim of research is to try to relate a
variable of interest to one or more other
variables demonstrate an association
between variables (not necessarily causal)
• Ex: dose of drug and resulting systolic blood
pressure, advertising expenditure of a
company and end of year sales figures, etc.
• Establish a theoretical model to predict the
value of one variable from a number of
known factors
4. Dependent and Independent Variables
• The independent variable is the variable
which is under the investigator’s control
(denoted x)
• the dependent variable is the one which
the investigator is trying to estimate or
predict (denoted y)
• Can a relationship be used to predict what
happens to y as x changes (ie what
happens to the dependent variable as the
independent variable changes)?
7. The relationship between x and y
• Correlation: is there a relationship between 2
variables?
• Regression: how well a certain independent variable
predict dependent variable?
• Regression: a measure of the relation between
the mean value of one variable (e.g. output) and
corresponding values of other variables (e.g.
time and cost).
• CORRELATION CAUSATION
– In order to infer causality: manipulate independent variable
and observe effect on dependent variable
8. Correlation
• Correlation is the study of relationship between two or
variables
• Correlation methods are used to measure the association of
two or more variables
• Two values are related means, that one variable may be
predicted from the knowledge of the other.
• Ex: One predict the dissolution of tablets based on the
hardness
• Correlation is applied to the relationship of continuous
variables.
9. Correlation
A correlation is a relationship between two variables.
The data can be represented by the ordered pairs (x, y)
where x is the independent (or explanatory) variable,
and y is the dependent (or response) variable.
x
2 4
–2
– 4
y
2
6
x 1 2 3 4 5
y – 4 – 2 – 1 0 2
Example:
10.
11. Correlation key concepts:
Types of correlation Methods of studying
correlation
a) Scatter diagram
b) Karl pearson’s coefficient of correlation
c) Spearman’s Rank correlation coefficient
d) Method of least square
12. Scatter Plot
• Visual representation of scores.
• Each individual score is represented by a single
point on the graph.
• Allows you to see any patterns of trends that exist
in the data.
• A scatter plot can be used to determine whether
a linear (straight line) correlation exists between
two variables.
18. Linear Correlation
x
y
Negative Linear Correlation
x
y
No Correlation
x
y
Positive Linear Correlation
x
y
Nonlinear Correlation
As x increases, y
tends to decrease.
As x increases, y
tends to increase.
19.
20.
21. Correlation Coefficient
Statistic showing the degree of relation between two
variables
The correlation coefficient is a measure of the strength and the
direction of a linear relationship between two variables.
The symbol r represents the sample correlation coefficient.
The formula for r is
( )( )
( ) ( )
2 2
2 2
.
n xy x y
r
n x x n y y
−
=
− −
The range of the correlation coefficient is −1 to 1. If x and y have a
strong positive linear correlation, r is close to 1. If x and y have a
strong negative linear correlation, r is close to −1. If there is no linear
correlation or a weak linear correlation, r is close to 0.
22. Linear Correlation
x
y
Strong negative correlation
x
y
Weak positive correlation
x
y
Strong positive correlation
x
y
Nonlinear Correlation
r = −0.91 r = 0.88
r = 0.42
r = 0.07
23. Calculating a Correlation Coefficient
1. Find the sum of the x-values.
2. Find the sum of the y-values.
3. Multiply each x-value by its corresponding
y-value and find the sum.
4. Square each x-value and find the sum.
5. Square each y-value and find the sum.
6. Use these five sums to calculate the
correlation coefficient.
Continued.
Calculating a Correlation Coefficient
In Words In Symbols
x
y
xy
2
x
2
y
( )( )
( ) ( )
2 2
2 2
.
n xy x y
r
n x x n y y
−
=
− −
24. Correlation Coefficient
Example:
Calculate the correlation coefficient r for the following data.
x y xy x2 y2
1 – 3 – 3 1 9
2 – 1 – 2 4 1
3 0 0 9 0
4 1 4 16 1
5 2 10 25 4
15
x
= 1
y
= − 9
xy
= 2
55
x
= 2
15
y
=
( )( )
( ) ( )
2 2
2 2
n xy x y
r
n x x n y y
−
=
− −
( )( )
( )2
2
5(9) 15 1
5(55) 15 5(15) 1
− −
=
− − −
60
50 74
= 0.986
There is a strong positive
linear correlation between x
and y.
25. Correlation Coefficient
Hours, x 0 1 2 3 3 5 5 5 6 7 7 10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50
Example:
The following data represents the number of hours 12
different students watched television during the
weekend and the scores of each student who took a test
the following Monday.
a.) Display the scatter plot.
b.) Calculate the correlation coefficient r.
Continued.
26. Correlation Coefficient
Hours, x 0 1 2 3 3 5 5 5 6 7 7 10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50
Example continued:
100
x
y
Hours watching TV
Test
score
80
60
40
20
2 4 6 8 10
Continued.
27. Correlation Coefficient
Hours, x 0 1 2 3 3 5 5 5 6 7 7 10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50
xy 0 85 164 222 285 340 380 420 348 455 525 500
x2 0 1 4 9 9 25 25 25 36 49 49 100
y2 9216 7225 6724 5476 9025 4624 5776 7056 3364 4225 5625 2500
Example continued:
( )( )
( ) ( )
2 2
2 2
n xy x y
r
n x x n y y
−
=
− −
( )( )
( )2
2
12(3724) 54 908
12(332) 54 12(70836) 908
−
=
− −
0.831
−
There is a strong negative linear correlation.
As the number of hours spent watching TV increases, the test
scores tend to decrease.
54
x
= 908
y
= 3724
xy
= 2
332
x
= 2
70836
y
=
28. Simple Correlation coefficient (r)
➢It is also called Pearson's correlation or
product moment correlation coefficient.
➢It measures the nature and strength between
two variables of the quantitative type.
The sign of r denotes the nature of
association
while the value of r denotes the strength of
association.
29. ➢If the sign is +ve this means the relation
is direct (an increase in one variable is
associated with an increase in the
other variable and a decrease in one
variable is associated with a
decrease in the other variable).
➢While if the sign is -ve this means an
inverse or indirect relationship (which
means an increase in one variable is
associated with a decrease in the other).
30. ➢ The value of r ranges between ( -1) and ( +1)
➢ The value of r denotes the strength of the
association as illustrated by the following
diagram.
-1 1
0
-0.25
-0.75 0.75
0.25
strong strong
intermediate intermediate
weak weak
no relation
perfect
correlation
perfect
correlation
Direct
indirect
31. If r = Zero this means no association or
correlation between the two variables.
If 0 < r < 0.25 = weak correlation.
If 0.25 ≤ r < 0.75 = intermediate correlation.
If 0.75 ≤ r < 1 = strong correlation.
If r = l = perfect correlation.
32.
33. Karl Pearson's correlation coefficient
• Karl Pearson's correlation coefficient measures
degree of linear relationship between two variables
• If the relationship between two variables X and Y is
to be ascertained through Karl Pearson method ,
then the following formula is used:
• The value of the coefficient of correlation always
lies between ±1
34.
35. Multiple correlation
• Coefficient of multiple correlation is a
measure of how well a given variable can be
predicted using a linear function of a set of
other variables.
• It is the correlation between the variable's
values and the best predictions that can be
computed linearly from the predictive
variables.
• The coefficient of multiple correlation is known
as the square root of the coefficient of
determination,
36. Multiple correlation
• The coefficient of multiple correlation takes
values between 0 and 1.
• Higher values indicate higher predictability of
the dependent variable from the independent
variables, with a value of 1 indicating that the
predictions are exactly correct and a value of
0 indicating that no linear combination of the
independent variables is a better predictor
than is the fixed mean of the dependent
variable