CORRELATION
Avjinder Singh Kaler and Kristi Mai
• The linear correlation coefficient, r, is a number that measures how well
paired sample data fit a straight-line pattern when graphed.
• Using paired sample data (sometimes called bivariate data), we find
the value of r (usually using technology), then we use that value to
conclude that there is (or is not) a linear correlation between the two
variables.
• In this section we will consider only linear relationships, which means
that when graphed, the points approximate a straight-line pattern.
• We will discuss methods of hypothesis testing for correlation.
Correlation – a correlation exists between two variables when the
values of one variable are somehow associated with the values of the
other variable.
• Can be positive, negative, non-existent, or non-linear
• A linear correlation exists between two variables when there is a
correlation and the plotted points of paired data result in a pattern that
can be approximated by a straight line.
We can often see a relationship between two variables by
constructing a scatterplot.
Scatter plots of paired data
1. The sample of paired data is a Simple Random Sample of
quantitative data
2. The pairs of data ( 𝑥,𝑦) have a bivariate normal distribution, meaning
the following:
• Visual examination of the scatter plot(s) confirms that the sample points
follow an approximately straight line(s)
• Because results can be strongly affected by the presence of outliers, any
outliers should be removed if they are known to be errors (Note: Use caution
when removing data points)
Note: These are the same as the Requirements for Simple Linear
Regression.
• Linear Correlation Coefficient (𝑟) – measures the strength of the linear
correlation between the paired quantitative 𝑥 and 𝑦 values in a sample
• Also known as the Pearson Product Moment Correlation Coefficient in honor of
Karl Pearson
• This is a sample statistic that estimates the linear correlation between 𝑥 and 𝑦
• If this value is squared, the result is the Coefficient of Determination (𝑟²)
• Notation:
• 𝑟 : linear correlation coefficient for sample data
• 𝜌 : linear correlation coefficient for a population of paired data
• Formula for calculating 𝑟:
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$
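The formula for r can be evaluated directly from the five sums it uses (Σx, Σy, Σxy, Σx², Σy²). The following is a minimal sketch using only the Python standard library; the function name `pearson_r` and the small data set are made up for illustration and do not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r, computed from the sums formula."""
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of values")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired data, invented for this sketch only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

In practice the value of r is usually read from technology, as the slides note; the sketch only makes the arithmetic behind that output visible.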
1. −1 ≤ r ≤ 1
2. If all values of either variable are converted to a different scale, the
value of r does not change.
3. The value of r is not affected by the choice of x and y.
4. r measures strength of a linear relationship.
5. r is very sensitive to outliers
• A single outlier can dramatically affect the value of r
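Properties 2, 3, and 5 above can be checked numerically. This is an illustrative sketch only: the data, the unit conversion, and the added outlier point are invented, and `pearson_r` is a helper written for this example.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data, invented for this sketch only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy = pearson_r(x, y)

# Property 2: converting x to a different scale (here, multiplying by 2.54)
# leaves r unchanged.
r_rescaled = pearson_r([2.54 * v for v in x], y)

# Property 3: swapping the roles of x and y leaves r unchanged.
r_yx = pearson_r(y, x)

# Property 5: one extreme point can change r dramatically
# (for this made-up data it even flips the sign).
r_outlier = pearson_r(x + [20], y + [0])

print(round(r_xy, 3), round(r_rescaled, 3), round(r_yx, 3), round(r_outlier, 3))
```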
• The value of 𝑟² is the proportion of the variation in 𝑦 that is explained by
the linear relationship that exists between 𝑥 and 𝑦
• Thus, 𝑟² is also the proportion of the variation in 𝑦 that is explained by the
regression line itself
• We may use 𝑟² to describe the predictive power of the regression
equation
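The interpretation of r² as explained variation can be verified on a small example: fit the least-squares line, measure the variation left unexplained by it, and compare with the total variation in y. The data are hypothetical and the variable names are chosen for this sketch.

```python
import math

# Hypothetical paired data, invented for this sketch only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)  # total variation in y

r = sxy / math.sqrt(sxx * syy)

# Least-squares regression line: y-hat = intercept + slope * x
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in x]

# Variation in y that the line does NOT explain (sum of squared residuals)
unexplained = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
explained_share = 1 - unexplained / syy

print(round(r ** 2, 3), round(explained_share, 3))  # both 0.6
```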
• To conclude that correlation implies causality
• Using data based on averages
• This type of data causes an inflated correlation coefficient
• To conclude that if there is no linear correlation, there is no correlation
at all
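The last error above is easy to demonstrate: data that follow a perfect curve can still have r = 0. In this sketch the points lie exactly on the parabola y = x², an example chosen for illustration rather than taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# y is completely determined by x, but the relationship is not linear.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(r)  # 0.0: no linear correlation, yet a perfect nonlinear relationship
```

A scatterplot of these points would show the pattern immediately, which is one reason to plot the data before relying on r.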
Hypotheses:
𝐻0: 𝜌 = 0 (no linear correlation)
𝐻1: 𝜌 ≠ 0 (linear correlation exists)
These hypotheses can be equivalently tested with the following
hypotheses:
𝐻0: 𝛽1 = 0 (no linear correlation)
𝐻1: 𝛽1 ≠ 0 (linear correlation exists)
Note: This equivalence will be important for the interpretation of the
technological output.
• Use Critical Value from Table A-6 (this is a simpler approach) and think
of the Linear Correlation Coefficient (𝑟) as a ‘test statistic’
• OR, use the following t-score test statistic with 𝑑𝑓 = 𝑛 − 2
$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$
• This 𝑡 test statistic can be viewed in most technological output
corresponding to the test of significance for the slope in the regression
line (i.e. the second set of equivalent hypotheses listed above)
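The t test statistic is r divided by the square root of (1 − r²)/(n − 2). A minimal sketch of that computation follows; the function name `t_statistic` is an assumption of this example, and the inputs r = 0.591 and n = 5 are the values from the shoe print example worked later in this deck.

```python
import math

def t_statistic(r, n):
    """t test statistic for H0: rho = 0, with df = n - 2."""
    if n <= 2:
        raise ValueError("at least 3 pairs are needed (df = n - 2 must be positive)")
    if abs(r) >= 1:
        raise ValueError("the statistic is undefined when |r| = 1")
    return r / math.sqrt((1 - r ** 2) / (n - 2))

t = t_statistic(0.591, 5)
df = 5 - 2
print(round(t, 3), df)  # 1.269 3
```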
• If using critical values from Table A-6:
If |𝑟| > critical value, reject 𝐻0
If |𝑟| ≤ critical value, fail to reject 𝐻0
This test is obviously a two-tailed hypothesis test based on the alternative
hypothesis. Visualize this by plotting the possible values for 𝑟 on a number line
with a labeled critical region.
• If using the t-score test statistic:
• Use statistical software to calculate the correct p-value that corresponds
with the test statistic. Then, base the conclusion on comparison between
the p-value and 𝛼
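Statistical software normally supplies the p-value. To show what such software computes, the sketch below integrates the density of the t distribution numerically with Simpson's rule, using only the standard library. It is an illustration under stated assumptions, not a replacement for a statistics package: the function names and the step count are choices made for this example.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p_value(t, df, steps=2000):
    """Two-tailed p-value 2 * P(T > |t|), using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    if steps % 2:          # Simpson's rule needs an even number of intervals
        steps += 1
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 2 * (0.5 - area_0_to_t)

# Values from the shoe print example in this deck: t = 1.269 with df = 3.
p_value = two_tailed_p_value(1.269, 3)
alpha = 0.05
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(round(p_value, 3), decision)  # 0.294 fail to reject H0
```

The result is close to the 0.2937 that StatCrunch reports for that example; the small difference comes from using the rounded test statistic 1.269.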
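One-tailed tests can occur with a claim of a positive linear correlation or a
claim of a negative linear correlation. In such cases, the hypotheses are:
Claim of negative correlation (left-tailed test): 𝐻0: 𝜌 = 0, 𝐻1: 𝜌 < 0
Claim of positive correlation (right-tailed test): 𝐻0: 𝜌 = 0, 𝐻1: 𝜌 > 0
For these one-tailed tests, the P-value method can be used as well.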
• Construct a scatter plot and verify that the pattern of the points is
approximately a straight line pattern without outliers
• Assess the linear correlation between two variables of interest and
create a regression equation
• Consider any effects of a pattern over time
• Perform a Residual Analysis:
• Construct a residual plot and verify that there is no pattern (other than a
straight line pattern) and also verify that the residual plot does not become
thicker or thinner
• Use a histogram, normal quantile plot, or Shapiro Wilk test of normality to
confirm that the values of the residuals have a distribution that is
approximately normal
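The residuals used in this analysis are the observed y values minus the values predicted by the regression line. A minimal sketch of how they are obtained is shown here, with hypothetical data and helper names chosen for the example; plotting the residuals against x and testing them for normality would then be done with the software of your choice.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals(xs, ys):
    """Observed y minus predicted y for each pair."""
    intercept, slope = least_squares_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical paired data, invented for this sketch only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

res = residuals(x, y)
print([round(e, 2) for e in res])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

Residuals from a least-squares line always sum to zero, so the checks in the list above concern their pattern and spread, not their average.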
• Measurement Error – could be described as ‘explainable’ outliers
• Nonlinear Associations – ignoring possible nonlinear relationships
• Extrapolation – predicting far beyond the scope of our available
data
The paired shoe / height data from five males are listed below.
Using StatCrunch, find the value of the correlation coefficient r.
Requirement Check:
The data are a simple random sample of quantitative data, the plotted
points appear to roughly approximate a straight-line pattern, and there
are no outliers.
A few technologies are displayed below, used to calculate the
value of r.
We found previously for the shoe and height example that r = 0.591.
With r = 0.591, we get r² = 0.349.
We conclude that about 34.9% of the variation in height can be
explained by the linear relationship between lengths of shoe prints and
heights.
Conduct a formal hypothesis test of the claim that there is a linear
correlation between the two variables.
Use a 0.05 significance level.
We test the claim:
𝐻0: 𝛽1 = 0 (no linear correlation)
𝐻1: 𝛽1 ≠ 0 (linear correlation exists)
We calculate the test statistic:
$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{0.591}{\sqrt{\frac{1 - 0.591^2}{5 - 2}}} = 1.269$
Table A-3 shows this test statistic yields a p-value that is greater
than 0.20.
StatCrunch provides a P-value of 0.2937.
Because the p-value of 0.2937 is greater than the significance level of 0.05,
we fail to reject the null hypothesis.
We conclude there is not sufficient evidence to support the claim that there
is a linear correlation between shoe print length and heights of males.
Alternatively, using the critical value method, the test statistic is r = 0.591.
The critical values of r = ± 0.878 are found in Table A-6 with n = 5 and α = 0.05.
Because |r| = 0.591 does not exceed 0.878, we fail to reject the null and
conclude there is not sufficient evidence to support the claim that there is a
linear correlation between shoe print length and heights of males.
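The critical value of r and the t test are two views of the same test: solving the t statistic for r gives r = t / √(t² + df). The sketch below uses that relation to recover the critical value of r from the two-tailed t critical value for df = 3 and α = 0.05, which is 3.182 in standard t tables. The function name `critical_r` is an assumption of this example.

```python
import math

def critical_r(t_critical, n):
    """Critical value of r implied by a t critical value with df = n - 2."""
    df = n - 2
    return t_critical / math.sqrt(t_critical ** 2 + df)

# Two-tailed t critical value for df = 3, alpha = 0.05 (standard t table).
r_crit = critical_r(3.182, 5)

r = 0.591  # sample correlation from the shoe print example
reject_h0 = abs(r) > r_crit

print(round(r_crit, 3), reject_h0)  # 0.878 False
```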
Use the 5 pairs of shoe print lengths and heights to predict the height of
a person with a shoe print length of 29 cm.
The regression line does not fit the points well. With r = 0.591 and a p-value
of 0.2937, there is not sufficient evidence of a linear correlation.
Because a linear correlation is not supported, the best predicted height is
simply the mean of the sample heights (from StatCrunch):
$\bar{y} = 177.3$ cm
Use the 40 pairs of shoe print
lengths from Data Set 2 in
Appendix B to predict the height
of a person with a shoe print
length of 29 cm.
Now, the regression line does fit
the points well, and the
correlation of r = 0.813 suggests
that there is a linear correlation
since the p-value is < 0.0001.
The regression equation and scatterplot are shown below:
The given shoe length of 29 cm is not beyond the scope of the
available data, so substitute 29 cm into the regression model from StatCrunch:
$\hat{y} = 80.9 + 3.22x = 80.9 + 3.22(29) = 174.3$ cm
A person with a shoe length of 29 cm is predicted to be 174.3 cm tall.
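The two prediction examples follow one rule: use the regression equation only when the linear correlation is supported, and otherwise fall back to the mean of the sample y values. The following sketch expresses that rule; the function name and its parameters are assumptions of this example. It does not check for extrapolation, because the range of the shoe print data is not listed in these slides.

```python
def best_prediction(x, significant, line=None, mean_y=None):
    """Best predicted y for a given x.

    significant: True if the test supports a linear correlation.
    line: (intercept, slope) of the regression equation, needed when significant.
    mean_y: mean of the sample y values, needed when not significant.
    """
    if significant:
        if line is None:
            raise ValueError("a regression line is required when the correlation is significant")
        intercept, slope = line
        return intercept + slope * x
    if mean_y is None:
        raise ValueError("the sample mean of y is required when the correlation is not significant")
    return mean_y

# 5 pairs: p-value 0.2937 > 0.05, so predict with the mean height.
small_sample = best_prediction(29, significant=False, mean_y=177.3)

# 40 pairs: p-value < 0.0001, so use y-hat = 80.9 + 3.22x.
large_sample = best_prediction(29, significant=True, line=(80.9, 3.22))

print(small_sample, round(large_sample, 1))  # 177.3 174.3
```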
What if we have two or more explanatory variables?
Do we have a method for this? YES!! Of course, we do.
We may want to predict a sea turtle’s lifespan by more variables than
simply length of shell! I want to use variables that account for the diet,
exercise, mental health, and captivity status of the turtle! What about
variables that also account for the water quality in the turtle’s
surrounding environment? I want that too!!!
To do this, we can use something known as Multiple Linear Regression!

 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 

Correlation in Statistics

  • 2. • Linear correlation coefficient, r, is a number that measures how well paired sample data fit a straight-line pattern when graphed. • Using paired sample data (sometimes called bivariate data), we find the value of r (usually using technology), then we use that value to conclude that there is (or is not) a linear correlation between the two variables. • In this section we will consider only linear relationships, which means that when graphed, the points approximate a straight-line pattern. • We will discuss methods of hypothesis testing for correlation.
  • 3. Correlation – a correlation exists between two variables when the values of one variable are somehow associated with the values of the other variable. • Can be positive, negative, non-existent, or non-linear • A linear correlation exists between two variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line.
  • 4. We can often see a relationship between two variables by constructing a scatterplot. [Figure: scatter plots of paired data]
  • 6. Requirements: 1. The sample of paired data is a simple random sample of quantitative data. 2. The pairs of data (𝑥, 𝑦) have a bivariate normal distribution, which in practice means checking the following: • Visual examination of the scatter plot confirms that the sample points follow an approximately straight-line pattern. • Because results can be strongly affected by the presence of outliers, any outliers should be removed if they are known to be errors (use caution when removing data points). Note: These are the same as the requirements for Simple Linear Regression.
  • 7. • Linear Correlation Coefficient (𝑟) – measures the strength of the linear correlation between the paired quantitative 𝑥 and 𝑦 values in a sample • Also known as the Pearson Product Moment Correlation Coefficient in honor of Karl Pearson • This is a sample statistic that describes the linear correlation between 𝑥 and 𝑦 • If this value is squared, the result is the Coefficient of Determination (𝑟²) • Notation: • 𝑟 : linear correlation coefficient for sample data • 𝜌 : linear correlation coefficient for a population of paired data • Formula for calculating 𝑟: $r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$
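The computational formula on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the five data pairs are made up for the example and do not come from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation coefficient r via the computational formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero if either variable is constant; r is undefined then.
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Hypothetical paired data, invented purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 4))       # sample correlation coefficient
print(round(r ** 2, 4))  # coefficient of determination
```

Swapping `xs` and `ys`, or rescaling either list by a positive constant, should leave the result unchanged, which is an easy way to check the implementation.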
  • 8. 1. −1 ≤ r ≤ 1 2. If all values of either variable are converted to a different scale, the value of r does not change. 3. The value of r is not affected by the choice of x and y. 4. r measures the strength of a linear relationship. 5. r is very sensitive to outliers • A single outlier can dramatically affect the value of r
  • 9. • The value of 𝑟² is the proportion of the variation in 𝑦 that is explained by the linear relationship between 𝑥 and 𝑦 • Thus, 𝑟² is also the proportion of the variation in 𝑦 that is explained by the regression line itself • We may use 𝑟² to describe the predictive power of the regression equation
  • 10. Common errors involving correlation: • Concluding that correlation implies causality • Using data based on averages (this type of data causes an inflated correlation coefficient) • Concluding that if there is no linear correlation, there is no correlation at all
  • 11. Hypotheses: $H_0: \rho = 0$ (no linear correlation) versus $H_1: \rho \neq 0$ (linear correlation exists). These hypotheses can be equivalently tested with the following hypotheses: $H_0: \beta_1 = 0$ (no linear correlation) versus $H_1: \beta_1 \neq 0$ (linear correlation exists). Note: This equivalence will be important for the interpretation of the technological output.
  • 12. • Use the critical value from Table A-6 (this is a simpler approach) and think of the Linear Correlation Coefficient (𝑟) as a ‘test statistic’ • OR, use the following t-score test statistic with 𝑑𝑓 = 𝑛 − 2: $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$ • This 𝑡 test statistic can be viewed in most technological output corresponding to the test of significance for the slope in the regression line (i.e. the second set of equivalent hypotheses listed above)
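As a quick check of the t-score formula on this slide, here is a minimal Python sketch. The function name `t_statistic` is hypothetical; the inputs r = 0.591 and n = 5 are the values used in the shoe print example elsewhere in these slides.

```python
from math import sqrt


def t_statistic(r, n):
    """t-score for testing rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 pairs so that df = n - 2 > 0")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r / sqrt((1 - r ** 2) / (n - 2))


# Values from the shoe print example in these slides: r = 0.591, n = 5.
t = t_statistic(0.591, 5)
print(round(t, 3))  # the slides report 1.269
```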
  • 13. • If using critical values from Table A-6: if |𝑟| > critical value, reject 𝐻0; if |𝑟| ≤ critical value, fail to reject 𝐻0. This is a two-tailed hypothesis test, as the alternative hypothesis shows. Visualize this by plotting the possible values for 𝑟 on a number line with a labeled critical region. • If using the t-score test statistic: • Use statistical software to calculate the p-value that corresponds to the test statistic. Then, base the conclusion on a comparison between the p-value and 𝛼
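The two decision rules can be written as small Python helpers. This is a sketch under stated assumptions: the function names are hypothetical, the critical value must still be looked up in a table such as Table A-6, and rejecting when the p-value is less than or equal to α is the usual convention rather than something the slides spell out.

```python
def reject_h0_by_critical_value(r, critical_value):
    """Two-tailed rule: reject H0 when |r| exceeds the table's critical value."""
    return abs(r) > critical_value


def reject_h0_by_p_value(p_value, alpha=0.05):
    """P-value rule (assumed convention): reject H0 when p-value <= alpha."""
    return p_value <= alpha


# Numbers from the shoe print example in these slides:
# r = 0.591, critical value 0.878 (n = 5, alpha = 0.05), p-value 0.2937.
print(reject_h0_by_critical_value(0.591, 0.878))  # False: fail to reject H0
print(reject_h0_by_p_value(0.2937, alpha=0.05))   # False: fail to reject H0
```

Both rules lead to the same conclusion for this example, as expected for equivalent tests.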
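  • 14. One-tailed tests can occur with a claim of a positive linear correlation or a claim of a negative linear correlation. In such cases the null hypothesis is still $H_0: \rho = 0$, and the alternative hypothesis is $H_1: \rho > 0$ for a claim of positive correlation or $H_1: \rho < 0$ for a claim of negative correlation. For these one-tailed tests, the P-value method can be used as well.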
  • 15. • Construct a scatter plot and verify that the pattern of the points is approximately a straight-line pattern without outliers • Assess the linear correlation between two variables of interest and create a regression equation • Consider any effects of a pattern over time • Perform a Residual Analysis: • Construct a residual plot and verify that there is no pattern (other than a straight-line pattern) and also verify that the residual plot does not become thicker or thinner • Use a histogram, normal quantile plot, or Shapiro-Wilk test of normality to confirm that the values of the residuals have a distribution that is approximately normal
  • 16. • Measurement Error – could be described as ‘explainable’ outliers • Nonlinear Associations – ignoring possible nonlinear relationships • Extrapolation – predicting far beyond the scope of our available data
  • 17. The paired shoe / height data from five males are listed below. Using StatCrunch, find the value of the correlation coefficient r.
  • 18. Requirement Check: The data are a simple random sample of quantitative data, the plotted points appear to roughly approximate a straight-line pattern, and there are no outliers.
  • 19. A few technologies are displayed below, used to calculate the value of r.
  • 20. We found previously for the shoe and height example that r = 0.591. With r = 0.591, we get r² = 0.349. We conclude that about 34.9% of the variation in height can be explained by the linear relationship between lengths of shoe prints and heights.
  • 21. Conduct a formal hypothesis test of the claim that there is a linear correlation between the two variables. Use a 0.05 significance level.
  • 22. We test the claim: $H_0: \beta_1 = 0$ (no linear correlation) versus $H_1: \beta_1 \neq 0$ (linear correlation exists)
  • 23. We calculate the test statistic: $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \dfrac{0.591}{\sqrt{\dfrac{1 - 0.591^2}{5 - 2}}} = 1.269$. Table A-3 shows this test statistic yields a p-value that is greater than 0.20.
  • 24. StatCrunch provides a P-value of 0.2937. Because the p-value of 0.2937 is greater than the significance level of 0.05, we fail to reject the null hypothesis. We conclude there is not sufficient evidence to support the claim that there is a linear correlation between shoe print length and heights of males.
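The software p-value can be roughly reproduced without a statistics package, because the Student t distribution with exactly 3 degrees of freedom (n = 5 here) has a closed-form CDF. The sketch below relies on that special case only; the function name `two_tailed_p_df3` is hypothetical, and for any other sample size a statistics library or table would be needed.

```python
from math import atan, pi, sqrt


def two_tailed_p_df3(t):
    """Two-tailed p-value for a t statistic with exactly 3 degrees of freedom.

    Uses the closed-form CDF of Student's t for df = 3:
    F(t) = 1/2 + (1/pi) * (u / (1 + u^2) + atan(u)), where u = t / sqrt(3).
    """
    u = abs(t) / sqrt(3)
    cdf = 0.5 + (u / (1 + u * u) + atan(u)) / pi
    return 2 * (1 - cdf)


# t = 1.269 is the rounded test statistic from the slides (n = 5, so df = 3).
p = two_tailed_p_df3(1.269)
print(round(p, 3))
```

The result should land close to the 0.2937 reported by StatCrunch; a small difference is expected because the slides round r and t before this step.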
  • 25. Alternatively, use r = 0.591 as the test statistic. The critical values of r = ±0.878 are found in Table A-6 with n = 5 and α = 0.05. Because |r| = 0.591 does not exceed 0.878, we fail to reject the null hypothesis and conclude there is not sufficient evidence to support the claim that there is a linear correlation between shoe print length and heights of males.
  • 26. Use the 5 pairs of shoe print lengths and heights to predict the height of a person with a shoe print length of 29 cm. The regression line does not fit the points well. The correlation is r = 0.591, which suggests there is not a linear correlation (the p-value was 0.2937). The best predicted height is therefore simply the mean of the sample heights (from StatCrunch): $\bar{y} = 177.3$ cm.
  • 27. Use the 40 pairs of shoe print lengths from Data Set 2 in Appendix B to predict the height of a person with a shoe print length of 29 cm. Now, the regression line does fit the points well, and the correlation of r = 0.813 suggests that there is a linear correlation since the p-value is < 0.0001.
  • 28. The regression equation and scatterplot are shown below:
  • 29. The given shoe length of 29 cm is not beyond the scope of the available data, so substitute 29 cm into the regression model (using StatCrunch): $\hat{y} = 80.9 + 3.22x = 80.9 + 3.22(29) = 174.3$ cm. A person with a shoe length of 29 cm is predicted to be 174.3 cm tall.
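The prediction rule used in the two shoe print examples (use the regression equation when the linear correlation is significant, otherwise fall back to the sample mean of y) can be sketched as follows. The function name and argument layout are hypothetical. The numbers 80.9, 3.22, 177.3 and 0.2937 come from the slides, while 0.0001 merely stands in for the reported "p-value < 0.0001". The sketch does not check whether x lies within the range of the sample data, which the slides also require.

```python
def best_predicted_y(x, p_value, alpha=0.05, mean_y=None, intercept=None, slope=None):
    """Return the regression prediction if the correlation is significant,
    otherwise the sample mean of y."""
    if p_value <= alpha:
        if intercept is None or slope is None:
            raise ValueError("intercept and slope are required for a regression prediction")
        return intercept + slope * x
    if mean_y is None:
        raise ValueError("mean_y is required when the correlation is not significant")
    return mean_y


# 5 pairs: p-value 0.2937, so the best prediction is the mean height.
print(best_predicted_y(29, p_value=0.2937, mean_y=177.3))

# 40 pairs: p-value < 0.0001 (0.0001 used as a stand-in), so use y-hat = 80.9 + 3.22x.
print(round(best_predicted_y(29, p_value=0.0001, intercept=80.9, slope=3.22), 1))
```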
  • 30. What if we have two or more explanatory variables? Do we have a method for this? YES!! Of course, we do. We may want to predict a sea turtle’s lifespan by more variables than simply length of shell! I want to use variables that account for the diet, exercise, mental health, and captivity status of the turtle! What about variables that also account for the water quality in the turtle’s surrounding environment? I want that too!!! To do this, we can use something known as Multiple Linear Regression!