Exploring Bivariate Data
Bivariate Data
• Analyzing patterns in scatterplots
• Correlation and linearity
• Least-squares regression line
• Residual plots, outliers, and influential points
• Transformations to achieve linearity: logarithmic and power transformations
Scatterplots
• The most effective way to display the relationship between two quantitative variables.
• The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.
Scatterplot Variables
• Response variable: measures the outcome of a study (the dependent variable, plotted on the y-axis).
• Explanatory or predictor variable: helps explain or predict changes in a response variable (the independent variable, plotted on the x-axis).
Example:
• If you think that alcohol causes body temperature to drop, you might do a study giving certain amounts of alcohol to mice and measuring the temperature drops.
• In this case the explanatory variable is the amount of alcohol and the response variable is the measured temperature drop.
There are two ways of determining whether two variables are related:
1) By looking at a scatterplot (graphical approach)
2) By calculating a “correlation coefficient” (mathematical approach)
How to Make a Scatterplot
1. Decide which variable should go on each axis.
2. Label and scale your axes.
3. Plot individual data values.
Interpreting a Scatterplot
• In any graph of data, look for the overall pattern and for striking deviations from that pattern.
• You can describe the overall pattern of a scatterplot by the form, direction, and strength of the relationship.
• An important kind of deviation is an outlier, an individual value that falls outside the overall pattern of the relationship.
Positive Linear Association
No Association
Clusters
Clusters of points within the plot can indicate the presence of another variable. The scatterplot on the right shows two clear clusters: one near 2 minutes, the other between 4 and 5 minutes.
Gaps
Gaps are regions (values) of the explanatory variable that have no associated response measurements. The scatterplot on the right shows a gap between 60,000 and 80,000 white blood cells (and probably another between 80,000 and 100,000).
Correlation Coefficient (r)
• The correlation coefficient (r) measures the strength of the linear relationship between two quantitative variables.
• It gives a numerical description of the strength and direction of the linear association between two variables.
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$
Properties of r
• r is always a number between -1 and 1.
• r > 0 indicates a positive association.
• r < 0 indicates a negative association.
• Values of r near 0 indicate a very weak linear relationship.
• The strength of the linear relationship increases as r moves away from 0 towards -1 or 1.
• The extreme values r = -1 and r = 1 occur only in the case of a perfect linear relationship.
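The correlation formula can be checked directly in code. The sketch below is a minimal illustration, assuming sample standard deviations (divisor n - 1) as in the formula; the function name `correlation` and the data values are hypothetical. It computes r as the average product of standardized x and y values and confirms the perfect-line cases r = 1 and r = -1.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r: the sum of products of standardized x and y values, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3, 4, 5]  # hypothetical illustrative data
print(correlation(xs, [2, 4, 6, 8, 10]))  # perfect positive line: r = 1 (up to rounding)
print(correlation(xs, [10, 8, 6, 4, 2]))  # perfect negative line: r = -1 (up to rounding)
print(correlation(xs, [3, 1, 4, 1, 5]))   # weak positive association
```

Working with standardized values is what makes r unit-free: rescaling x or y changes the raw deviations but not the z-scores.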
Correlation ≠ Causation
• Whenever we have a strong correlation, it is tempting to explain it by imagining that the explanatory variable has caused the response to change.
• A variable that is not explicitly part of a study but affects the way the variables in the study appear to be related is called a lurking variable.
• Because we can never be certain that observational data are not hiding a lurking variable, it is never safe to conclude that a scatterplot demonstrates a cause-and-effect relationship, no matter how strong the correlation.
• Scatterplots and correlation coefficients never prove causation.
Least-Squares Regression Line (LSRL)
Least-squares regression (linear regression) allows you to fit a line to a scatter diagram in order to predict the value of one variable based on the value of another variable.
$$\hat{y} = a + bx$$
a: y-intercept
b: slope of the line
Regression Line
• A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.
• We often use the regression line for predicting the value of y for a given value of x.
Interpreting a Regression Line
• The line is fitted to the data through a process called the method of least squares. The main idea behind this method is that the sum of the squared vertical distances between the data points and the line is made as small as possible.
• The least-squares regression line is a mathematical model for the data that helps us predict values of the response (dependent) variable from the explanatory (independent) variable. Therefore, with regression, unlike with correlation, we must specify which is the response and which is the explanatory variable.
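To make "least squares" concrete, the sketch below (a minimal illustration, with hypothetical function names and made-up data) computes the sum of squared vertical distances (SSE) for a candidate line and shows that nudging the least-squares intercept or slope in any direction makes the SSE larger. The closed-form fit used here, b = Sxy / Sxx, is the standard textbook solution and is algebraically equal to the r-based slope formula.

```python
def sse(a, b, xs, ys):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Closed-form least-squares fit: b = Sxy / Sxx, a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data
a, b = least_squares_line(xs, ys)
best = sse(a, b, xs, ys)

# Any small change to the intercept or slope increases the SSE.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    print(f"a={a + da:.2f}, b={b + db:.2f}: SSE={sse(a + da, b + db, xs, ys):.3f} (LSRL: {best:.3f})")
```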
Formulas for finding the slope and y-intercept of the least-squares regression line:
Slope: $b = r\,\dfrac{s_y}{s_x}$
y-intercept: $a = \bar{y} - b\bar{x}$
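The two formulas translate directly into code. This is a sketch under the assumption that s_x and s_y are sample standard deviations; the function name `lsrl` and the data are hypothetical. It computes r, then the slope b = r·s_y/s_x, then the intercept a = ȳ − b·x̄.

```python
from statistics import mean, stdev


def lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x      # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data
a, b = lsrl(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # prints y-hat = 0.05 + 1.99x
```

Because a = ȳ − b·x̄, the least-squares line always passes through the point (x̄, ȳ).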
When will we ever need this?
• We use regression lines to make predictions.
• Interpolation: making predictions within the range of known data values.
• Extrapolation: making predictions beyond the range of known data values.
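The distinction between interpolation and extrapolation depends only on whether the new x lies inside the range of observed x values. The helper below is a hypothetical sketch, assuming a line y-hat = a + bx has already been fitted (the values of a and b are illustrative). Extrapolated predictions deserve extra caution, since nothing in the data guarantees the linear pattern continues outside the observed range.

```python
def predict(a, b, x, xs):
    """Return (y_hat, kind): the prediction a + b*x and whether it interpolates or extrapolates."""
    kind = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    return a + b * x, kind


xs = [1, 2, 3, 4, 5]  # observed x values (hypothetical)
a, b = 0.05, 1.99     # hypothetical fitted line
for x in (3.5, 12):
    y_hat, kind = predict(a, b, x, xs)
    print(f"x = {x}: y-hat = {y_hat:.2f} ({kind})")
```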
How good is our prediction?
The strength of a prediction which uses the LSRL
depends on how close the data points are to the
regression line. The mathematical approach to
describing this strength is via the coefficient of
determination. The coefficient of determination
gives us the proportion of variation in the values
of y that is explained by least-squares regression
of y on x. The coefficient of determination turns
out to be the correlation coefficient squared (r²).
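A quick numerical check of that claim: the proportion of variation explained can be computed as 1 − SSE/SST (SST being the total variation of y about its mean, SSE the variation left in the residuals), and it matches r². This is a minimal sketch with a hypothetical function name and made-up data.

```python
def r_squared_two_ways(xs, ys):
    """Return (1 - SSE/SST, r**2) for the least-squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)  # SST: total variation in y
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    r = s_xy / (s_xx * s_yy) ** 0.5
    return 1 - sse / s_yy, r ** 2


xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 1, 5]  # hypothetical data
explained, r_sq = r_squared_two_ways(xs, ys)
print(f"1 - SSE/SST = {explained:.4f}, r^2 = {r_sq:.4f}")  # prints 0.1250 for both
```

Here r ≈ 0.354, so only about 12.5% of the variation in y is explained by the regression on x.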
Residuals
• Since the LSRL minimizes the sum of the squared vertical distances between the data values and the line, we have a special name for these vertical distances: they are called residuals.
• A residual is simply the difference between the observed y and the predicted y: residual = y - ŷ.
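Computing residuals is a one-line loop once the line is known. The sketch below assumes the least-squares line ŷ = 0.05 + 1.99x for a small made-up data set; the function name is hypothetical. For a least-squares line with an intercept, the residuals always sum to zero (up to rounding), which is a handy arithmetic check.

```python
def residuals(xs, ys, a, b):
    """Residual = observed y - predicted y, with predicted y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data
a, b = 0.05, 1.99                 # least-squares line for these data
res = residuals(xs, ys, a, b)
print([round(e, 2) for e in res])
print(f"sum of residuals: {sum(res):.6f}")
```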
Residual Plots
• Residuals help us determine how well our data can be modeled by a straight line, by enabling us to construct a residual plot.
• A residual plot is a scatter diagram that plots the residuals on the y-axis and their corresponding x values on the x-axis.
INTERPRETING RESIDUAL PLOTS:
The following residual plot is in a curved
pattern and shows that the relationship is not
linear. A straight line is not a good summary
for such data.
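A curved residual pattern is easy to reproduce. The sketch below fits a straight line to made-up data that follow y = x² exactly, then prints a crude text residual plot (x versus residual) instead of relying on a plotting library; `fit_line` is a hypothetical helper. The residuals come out positive at both ends and negative in the middle, the U-shaped pattern that signals a non-linear relationship.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope: b = Sxy / Sxx, a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


xs = list(range(7))
ys = [x ** 2 for x in xs]  # a clearly curved (quadratic) relationship
a, b = fit_line(xs, ys)
res = [y - (a + b * x) for x, y in zip(xs, ys)]

# Crude text residual plot: one row per x, bar length proportional to the residual.
for x, e in zip(xs, res):
    bar = "+" * round(e) if e > 0 else "-" * round(-e)
    print(f"x={x:2d} residual={e:5.1f} {bar}")
```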
INTERPRETING RESIDUAL PLOTS:
Spread about the line that increases as x increases indicates that prediction of y will be less accurate for larger x, as shown in this residual plot. (If the spread decreases as x increases, predictions are less accurate for smaller x.)
INTERPRETING RESIDUAL PLOTS:
The following residual plot shows a uniform scatter of points about the horizontal line at zero (which represents the fitted line), with no unusual observations. This tells us that our linear model (regression line) is a good summary of the data and should give good predictions.
Unusual and Influential Data
Outliers
• Outlier: a value in a set of data that does not fit with the rest of the data.
Leverage
• An observation with an extreme value on a predictor variable.
• Leverage is a measure of how far an observation's value of the independent variable deviates from the mean of that variable.
• These high-leverage points can affect the estimates of the regression coefficients.
Influence
• Influence can be thought of as the product of leverage and outlierness.
• Removing an influential observation substantially changes the estimates of the coefficients.
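For a regression with a single explanatory variable, leverage has a simple closed form: h_i = 1/n + (x_i − x̄)² / Σ(x_j − x̄)², which grows as x_i moves away from the mean. The sketch below uses that formula with made-up data containing one extreme x value; the function name is hypothetical. The leverages always sum to 2 (the number of coefficients, intercept and slope), which the test checks.

```python
def leverages(xs):
    """Leverage of each point in simple linear regression:
    h_i = 1/n + (x_i - x_bar)^2 / sum((x_j - x_bar)^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / s_xx for x in xs]


xs = [1, 2, 3, 4, 5, 20]  # hypothetical: one extreme x value
h = leverages(xs)
for x, lev in zip(xs, h):
    print(f"x = {x:2d}  leverage = {lev:.3f}")
```

Note that leverage depends only on the x values; whether a high-leverage point is actually influential also depends on its y value.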
Outliers
• Data points more than 2 standard deviations away from the mean of the data set
• Data points that do not fit the pattern governed by the rest of the data
• In regression, any data point that has an unusually large residual
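The first criterion, "more than 2 standard deviations from the mean", can be applied mechanically. This is a minimal sketch assuming the sample standard deviation; the function name and data are hypothetical. One caveat: an extreme value inflates the standard deviation itself, so in small data sets this rule can fail to flag a genuine outlier.

```python
from statistics import mean, stdev


def beyond_two_sd(values):
    """Return the values lying more than 2 sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > 2 * s]


data = [12, 14, 15, 15, 16, 17, 18, 19, 45]  # hypothetical data
print(beyond_two_sd(data))
```

Here the mean is 19 and the standard deviation is about 9.97, so only 45 (26 units above the mean) is flagged.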
How can I tell if a point in my data set is an outlier?
• Take the IQR (interquartile range) of your data set and multiply it by 1.5. Subtract that number from Quartile 1 (Q1) and add it to Quartile 3 (Q3). Any value lying below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR can be considered an outlier.
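The 1.5 × IQR rule is sketched below. Quartile conventions differ between textbooks and software, so the quartiles can vary slightly; this sketch assumes the median-of-halves method (the median is excluded from both halves when n is odd). The function names and data are hypothetical.

```python
from statistics import median


def iqr_fences(values):
    """Lower and upper fences Q1 - 1.5*IQR and Q3 + 1.5*IQR (median-of-halves quartiles)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    q1 = median(data[:half])   # lower half (excludes the overall median when n is odd)
    q3 = median(data[-half:])  # upper half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def iqr_outliers(values):
    """Values falling below the lower fence or above the upper fence."""
    low, high = iqr_fences(values)
    return [v for v in values if v < low or v > high]


data = [12, 14, 15, 15, 16, 17, 18, 19, 45]  # hypothetical data
print(iqr_fences(data), iqr_outliers(data))
```

For these data Q1 = 14.5 and Q3 = 18.5, so IQR = 4 and the fences are 8.5 and 24.5; only 45 lies outside them.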
Influential Points
• Influential points are often outliers in the x direction, but they are not always outliers in terms of regression (they need not have large residuals).
• A point is influential if removing it would markedly change the LSRL.
• A point with high leverage has the potential to be influential; it actually is influential only if removing it substantially changes the line.
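The "remove it and see" definition of influence can be applied literally: refit the line once with each point left out and compare slopes. The sketch below does exactly that with made-up data in which the last point has an extreme x value and breaks the otherwise perfect y = 2x pattern; `fit_line` is a hypothetical helper. This is a plain drop-one comparison, not a formal influence statistic such as Cook's distance.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope: b = Sxy / Sxx, a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 10]  # hypothetical: the last point has high leverage and breaks the pattern
_, b_all = fit_line(xs, ys)

# Refit without each point in turn and record how much the slope moves.
changes = []
for i in range(len(xs)):
    _, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    changes.append(abs(b_i - b_all))
    print(f"drop point {i}: slope {b_all:.3f} -> {b_i:.3f}")
```

With every point included the slope is about 0.31; dropping the point at x = 20 restores the slope of 2, while dropping any other point barely moves it.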

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

correlation and regression
correlation and regressioncorrelation and regression
correlation and regression
 
quartiles,deciles,percentiles.ppt
quartiles,deciles,percentiles.pptquartiles,deciles,percentiles.ppt
quartiles,deciles,percentiles.ppt
 
Confidence Intervals
Confidence IntervalsConfidence Intervals
Confidence Intervals
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Normal distribution
Normal distributionNormal distribution
Normal distribution
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Quartile
QuartileQuartile
Quartile
 
discrete and continuous data
discrete and continuous datadiscrete and continuous data
discrete and continuous data
 
Sampling distribution
Sampling distributionSampling distribution
Sampling distribution
 
SCATTER PLOTS
SCATTER PLOTSSCATTER PLOTS
SCATTER PLOTS
 
Statistics-Regression analysis
Statistics-Regression analysisStatistics-Regression analysis
Statistics-Regression analysis
 
Probability and statistics
Probability and statisticsProbability and statistics
Probability and statistics
 
Correlations using SPSS
Correlations using SPSSCorrelations using SPSS
Correlations using SPSS
 
Topic 15 correlation spss
Topic 15 correlation spssTopic 15 correlation spss
Topic 15 correlation spss
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Probability
ProbabilityProbability
Probability
 
Linear regression and correlation analysis ppt @ bec doms
Linear regression and correlation analysis ppt @ bec domsLinear regression and correlation analysis ppt @ bec doms
Linear regression and correlation analysis ppt @ bec doms
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
 
Quantitative analysis
Quantitative analysisQuantitative analysis
Quantitative analysis
 
Survival analysis 1
Survival analysis 1Survival analysis 1
Survival analysis 1
 

Andere mochten auch

Standard deviation and variation
Standard deviation and variationStandard deviation and variation
Standard deviation and variationAditya Singh
 
Intro probability 3
Intro probability 3Intro probability 3
Intro probability 3Phong Vo
 
Intro probability 1
Intro probability 1Intro probability 1
Intro probability 1Phong Vo
 
Lecture slides stats1.13.l07.air
Lecture slides stats1.13.l07.airLecture slides stats1.13.l07.air
Lecture slides stats1.13.l07.airatutor_te
 
Attractive ppt on Hypothesis by ammara aftab
Attractive ppt on Hypothesis by ammara aftabAttractive ppt on Hypothesis by ammara aftab
Attractive ppt on Hypothesis by ammara aftabUniversity of Karachi
 
Intro probability 4
Intro probability 4Intro probability 4
Intro probability 4Phong Vo
 
Intro probability 2
Intro probability 2Intro probability 2
Intro probability 2Phong Vo
 
Standard deviation
Standard deviationStandard deviation
Standard deviationRahul Sharma
 
Statistics Vocabulary Chapter 1
Statistics Vocabulary Chapter 1Statistics Vocabulary Chapter 1
Statistics Vocabulary Chapter 1Debra Wallace
 
Sampling and Sampling Distributions
Sampling and Sampling DistributionsSampling and Sampling Distributions
Sampling and Sampling DistributionsBk Islam Mumitul
 
Sampling distribution concepts
Sampling distribution conceptsSampling distribution concepts
Sampling distribution conceptsumar sheikh
 
Introduction to hypothesis testing ppt @ bec doms
Introduction to hypothesis testing ppt @ bec domsIntroduction to hypothesis testing ppt @ bec doms
Introduction to hypothesis testing ppt @ bec domsBabasab Patil
 
Discrete Probability Distributions
Discrete Probability DistributionsDiscrete Probability Distributions
Discrete Probability Distributionsmandalina landy
 
Discrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec domsDiscrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec domsBabasab Patil
 
scatter diagram
 scatter diagram scatter diagram
scatter diagramshrey8916
 

Andere mochten auch (20)

Transversals
TransversalsTransversals
Transversals
 
Standard deviation and variation
Standard deviation and variationStandard deviation and variation
Standard deviation and variation
 
Intro probability 3
Intro probability 3Intro probability 3
Intro probability 3
 
Intro probability 1
Intro probability 1Intro probability 1
Intro probability 1
 
Lecture slides stats1.13.l07.air
Lecture slides stats1.13.l07.airLecture slides stats1.13.l07.air
Lecture slides stats1.13.l07.air
 
Probability And Random Variable Lecture(Lec8)
Probability And Random Variable Lecture(Lec8)Probability And Random Variable Lecture(Lec8)
Probability And Random Variable Lecture(Lec8)
 
Attractive ppt on Hypothesis by ammara aftab
Attractive ppt on Hypothesis by ammara aftabAttractive ppt on Hypothesis by ammara aftab
Attractive ppt on Hypothesis by ammara aftab
 
Intro probability 4
Intro probability 4Intro probability 4
Intro probability 4
 
Intro probability 2
Intro probability 2Intro probability 2
Intro probability 2
 
Standard deviation
Standard deviationStandard deviation
Standard deviation
 
Statistics Vocabulary Chapter 1
Statistics Vocabulary Chapter 1Statistics Vocabulary Chapter 1
Statistics Vocabulary Chapter 1
 
Histogram
HistogramHistogram
Histogram
 
Sampling and Sampling Distributions
Sampling and Sampling DistributionsSampling and Sampling Distributions
Sampling and Sampling Distributions
 
Sampling distribution concepts
Sampling distribution conceptsSampling distribution concepts
Sampling distribution concepts
 
Introduction to hypothesis testing ppt @ bec doms
Introduction to hypothesis testing ppt @ bec domsIntroduction to hypothesis testing ppt @ bec doms
Introduction to hypothesis testing ppt @ bec doms
 
Discrete Probability Distributions
Discrete Probability DistributionsDiscrete Probability Distributions
Discrete Probability Distributions
 
Discrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec domsDiscrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec doms
 
scatter diagram
 scatter diagram scatter diagram
scatter diagram
 
Attribution theory
Attribution theoryAttribution theory
Attribution theory
 
ANOVA II
ANOVA IIANOVA II
ANOVA II
 

Ähnlich wie Exploring bivariate data

Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regressionAntony Raj
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regressionAntony Raj
 
Correlation analysis notes
Correlation analysis notesCorrelation analysis notes
Correlation analysis notesJapheth Muthama
 
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docxFSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docxbudbarber38650
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxAnusuya123
 
Linear regression
Linear regressionLinear regression
Linear regressionDepEd
 
Stats 3000 Week 2 - Winter 2011
Stats 3000 Week 2 - Winter 2011Stats 3000 Week 2 - Winter 2011
Stats 3000 Week 2 - Winter 2011Lauren Crosby
 
Chapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares RegressionChapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares Regressionnszakir
 
MSC III_Research Methodology and Statistics_Inferrential ststistics.pdf
MSC III_Research Methodology and Statistics_Inferrential ststistics.pdfMSC III_Research Methodology and Statistics_Inferrential ststistics.pdf
MSC III_Research Methodology and Statistics_Inferrential ststistics.pdfSuchita Rawat
 

Ähnlich wie Exploring bivariate data (20)

IDS.pdf
IDS.pdfIDS.pdf
IDS.pdf
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Regression
RegressionRegression
Regression
 
2-20-04.ppt
2-20-04.ppt2-20-04.ppt
2-20-04.ppt
 
Correlation analysis notes
Correlation analysis notesCorrelation analysis notes
Correlation analysis notes
 
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docxFSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
 
Notes Ch8
Notes Ch8Notes Ch8
Notes Ch8
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptx
 
Correlation
CorrelationCorrelation
Correlation
 
Math n Statistic
Math n StatisticMath n Statistic
Math n Statistic
 
Chap04 01
Chap04 01Chap04 01
Chap04 01
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Stats 3000 Week 2 - Winter 2011
Stats 3000 Week 2 - Winter 2011Stats 3000 Week 2 - Winter 2011
Stats 3000 Week 2 - Winter 2011
 
Chapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares RegressionChapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares Regression
 
regression.pptx
regression.pptxregression.pptx
regression.pptx
 
Quantitative Methods - Level II - CFA Program
Quantitative Methods - Level II - CFA ProgramQuantitative Methods - Level II - CFA Program
Quantitative Methods - Level II - CFA Program
 
Regression -Linear.pptx
Regression -Linear.pptxRegression -Linear.pptx
Regression -Linear.pptx
 
MSC III_Research Methodology and Statistics_Inferrential ststistics.pdf
MSC III_Research Methodology and Statistics_Inferrential ststistics.pdfMSC III_Research Methodology and Statistics_Inferrential ststistics.pdf
MSC III_Research Methodology and Statistics_Inferrential ststistics.pdf
 
Chapter05
Chapter05Chapter05
Chapter05
 

Mehr von Ulster BOCES

Sampling distributions
Sampling distributionsSampling distributions
Sampling distributionsUlster BOCES
 
Geometric distributions
Geometric distributionsGeometric distributions
Geometric distributionsUlster BOCES
 
Binomial distributions
Binomial distributionsBinomial distributions
Binomial distributionsUlster BOCES
 
Means and variances of random variables
Means and variances of random variablesMeans and variances of random variables
Means and variances of random variablesUlster BOCES
 
General probability rules
General probability rulesGeneral probability rules
General probability rulesUlster BOCES
 
Planning and conducting surveys
Planning and conducting surveysPlanning and conducting surveys
Planning and conducting surveysUlster BOCES
 
Overview of data collection methods
Overview of data collection methodsOverview of data collection methods
Overview of data collection methodsUlster BOCES
 
Normal probability plot
Normal probability plotNormal probability plot
Normal probability plotUlster BOCES
 
Exploring data stemplot
Exploring data   stemplotExploring data   stemplot
Exploring data stemplotUlster BOCES
 
Exploring data other plots
Exploring data   other plotsExploring data   other plots
Exploring data other plotsUlster BOCES
 
Exploring data histograms
Exploring data   histogramsExploring data   histograms
Exploring data histogramsUlster BOCES
 
Calculating percentages from z scores
Calculating percentages from z scoresCalculating percentages from z scores
Calculating percentages from z scoresUlster BOCES
 
Standardizing scores
Standardizing scoresStandardizing scores
Standardizing scoresUlster BOCES
 
Intro to statistics
Intro to statisticsIntro to statistics
Intro to statisticsUlster BOCES
 
Describing quantitative data with numbers
Describing quantitative data with numbersDescribing quantitative data with numbers
Describing quantitative data with numbersUlster BOCES
 
Displaying quantitative data
Displaying quantitative dataDisplaying quantitative data
Displaying quantitative dataUlster BOCES
 

Mehr von Ulster BOCES (20)

Sampling means
Sampling meansSampling means
Sampling means
 
Sampling distributions
Sampling distributionsSampling distributions
Sampling distributions
 
Geometric distributions
Geometric distributionsGeometric distributions
Geometric distributions
 
Binomial distributions
Binomial distributionsBinomial distributions
Binomial distributions
 
Means and variances of random variables
Means and variances of random variablesMeans and variances of random variables
Means and variances of random variables
 
Simulation
SimulationSimulation
Simulation
 
General probability rules
General probability rulesGeneral probability rules
General probability rules
 
Planning and conducting surveys
Planning and conducting surveysPlanning and conducting surveys
Planning and conducting surveys
 
Overview of data collection methods
Overview of data collection methodsOverview of data collection methods
Overview of data collection methods
 
Normal probability plot
Normal probability plotNormal probability plot
Normal probability plot
 
Exploring data stemplot
Exploring data   stemplotExploring data   stemplot
Exploring data stemplot
 
Exploring data other plots
Exploring data   other plotsExploring data   other plots
Exploring data other plots
 
Exploring data histograms
Exploring data   histogramsExploring data   histograms
Exploring data histograms
 
Calculating percentages from z scores
Calculating percentages from z scoresCalculating percentages from z scores
Calculating percentages from z scores
 
Density curve
Density curveDensity curve
Density curve
 
Standardizing scores
Standardizing scoresStandardizing scores
Standardizing scores
 
Intro to statistics
Intro to statisticsIntro to statistics
Intro to statistics
 
Describing quantitative data with numbers
Describing quantitative data with numbersDescribing quantitative data with numbers
Describing quantitative data with numbers
 
Displaying quantitative data
Displaying quantitative dataDisplaying quantitative data
Displaying quantitative data
 
A.2 se and sd
A.2 se  and sdA.2 se  and sd
A.2 se and sd
 

Kürzlich hochgeladen

Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 

Kürzlich hochgeladen (20)

Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf

Exploring bivariate data

  • 2. Bivariate Data
    - Analyzing patterns in scatterplots
    - Correlation and linearity
    - Least-squares regression line
    - Residual plots, outliers, and influential points
    - Transformations to achieve linearity: logarithmic and power transformations
  • 3. Scatterplots
    - The most effective way to display the relationship between two quantitative variables.
    - The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as a point in the plot, located by that individual's values of both variables.
  • 4. Scatterplot Variables
    - Response variable: measures the outcome of a study (the dependent variable, plotted on the y-axis).
    - Explanatory or predictor variable: helps explain or predict changes in a response variable (the independent variable, plotted on the x-axis).
  • 5. Example:
    - If you think that alcohol causes body temperature to drop, you might do a study giving certain amounts of alcohol to mice and measuring the resulting drop in body temperature.
    - In this case the explanatory variable is the amount of alcohol and the response variable is the measured temperature drop.
  • 6. There are two ways of determining whether two variables are related:
    1) By looking at a scatterplot (graphical approach)
    2) By calculating a “correlation coefficient” (mathematical approach)
  • 7. How to Make a Scatterplot
    1. Decide which variable should go on each axis.
    2. Label and scale your axes.
    3. Plot individual data values.
  • 8. Interpreting a Scatterplot
    - In any graph of data, look for the overall pattern and for striking deviations from that pattern.
    - You can describe the overall pattern of a scatterplot by the form, direction, and strength of the relationship.
    - An important kind of deviation is an outlier, an individual value that falls outside the overall pattern of the relationship.
  • 11. Clusters
    Clusters of points within the plot can indicate the presence of another variable. The scatterplot on the right shows two clear clusters: one near 2 minutes, the other between 4 and 5 minutes.
  • 12. Gaps
    Gaps are regions (values) of the explanatory variable that have no associated response measurements. The scatterplot on the right shows a gap between 60,000 and 80,000 white blood cells (and probably another between 80,000 and 100,000).
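  • 13. Correlation Coefficient (r)
    - The correlation coefficient r measures the strength of the linear relationship between two quantitative variables.
    - It gives a numerical description of the strength and direction of the linear association between two variables:
      $$ r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) $$
      where $\bar{x}$ and $\bar{y}$ are the means and $s_x$ and $s_y$ are the standard deviations of the two variables.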
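As a minimal sketch (the data and the helper name `correlation` are hypothetical, chosen only for illustration), the formula can be computed directly as the sum of products of z-scores divided by n − 1, using sample standard deviations:

```python
from statistics import mean, stdev

def correlation(x, y):
    """Correlation coefficient r: sum of products of z-scores divided by n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 4))  # 0.7746
```

A perfectly linear data set such as y = 2x gives r = 1 (up to floating-point rounding), and a perfectly decreasing one gives r = -1.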
  • 14. Properties of r
    - r is always a number between -1 and 1.
    - r > 0 indicates a positive association.
    - r < 0 indicates a negative association.
    - Values of r near 0 indicate a very weak linear relationship.
    - The strength of the linear relationship increases as r moves away from 0 towards -1 or 1.
    - The extreme values r = -1 and r = 1 occur only in the case of a perfect linear relationship.
  • 15. Correlation ≠ Causation
    - Whenever we have a strong correlation, it is tempting to explain it by imagining that the explanatory variable has caused the change in the response.
    - A variable that is not explicitly part of a study but affects the way the variables in the study appear to be related is called a lurking variable.
    - Because we can never be certain that observational data are not hiding a lurking variable, it is never safe to conclude that a scatterplot demonstrates a cause-and-effect relationship, no matter how strong the correlation.
    - Scatterplots and correlation coefficients never prove causation.
  • 16. Least-Squares Regression Line (LSRL)
    Least-squares regression (linear regression) fits a line to a scatterplot so that we can predict the value of one variable from the value of another:
      $$ \hat{y} = a + bx $$
    a: y-intercept
    b: slope of the line
  • 17. Regression Line
    - A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.
    - We often use the regression line to predict the value of y for a given value of x.
  • 18. Interpreting a Regression Line
    - The line is fitted to the data by a process called the method of least squares. The main idea behind this method is that the sum of the squared vertical distances between the data points and the line is minimized.
    - The least-squares regression line is a mathematical model for the data that helps us predict values of the response (dependent) variable from the explanatory (independent) variable. Therefore, with regression, unlike with correlation, we must specify which is the response and which is the explanatory variable.
  • 19. Formulas for finding the slope and y-intercept of a least-squares regression line:
      slope: $$ b = r\,\frac{s_y}{s_x} $$
      y-intercept: $$ a = \bar{y} - b\bar{x} $$
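The two formulas can be sketched in a few lines of Python. The helper names `correlation` and `lsrl` and the data are hypothetical; the code simply applies b = r·s_y/s_x and then a = ȳ − b·x̄:

```python
from statistics import mean, stdev

def correlation(x, y):
    # r computed from z-scores, dividing by n - 1 (sample standard deviations)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    return sum((xi - x_bar) / s_x * (yi - y_bar) / s_y for xi, yi in zip(x, y)) / (len(x) - 1)

def lsrl(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    b = correlation(x, y) * stdev(y) / stdev(x)  # slope: b = r * s_y / s_x
    a = mean(y) - b * mean(x)                     # intercept: a = y-bar - b * x-bar
    return a, b

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = lsrl(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # y-hat = 2.20 + 0.60x
```

Computing the slope first matters: the intercept formula uses b, and it guarantees that the line passes through the point (x̄, ȳ).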
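  • 20. When will we ever need this?
    - We use regression lines to make predictions.
    - Interpolation: making predictions within the range of known data values.
    - Extrapolation: making predictions beyond the range of known data values. Extrapolation is risky because the linear pattern may not continue outside the observed range.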
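A small sketch of the distinction, assuming a hypothetical fitted line ŷ = 2.2 + 0.6x built from data with x between 1 and 5. The helper `predict` is illustrative only; it labels a prediction as interpolation when the new x lies inside the observed range:

```python
def predict(a, b, x_new, x_observed):
    """Return (y-hat, kind), where kind says whether x_new is inside the observed x range."""
    inside = min(x_observed) <= x_new <= max(x_observed)
    kind = "interpolation" if inside else "extrapolation"
    return a + b * x_new, kind

# Hypothetical fitted line y-hat = 2.2 + 0.6x, fitted to x-values from 1 to 5
x_observed = [1, 2, 3, 4, 5]
print(predict(2.2, 0.6, 3.5, x_observed))  # within the data: interpolation
print(predict(2.2, 0.6, 20, x_observed))   # far beyond the data: extrapolation
```

The arithmetic is identical in both cases; only the trustworthiness of the result differs, which is why it helps to flag extrapolated values explicitly.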
  • 21. How good is our prediction? The strength of a prediction which uses the LSRL depends on how close the data points are to the regression line. The mathematical approach to describing this strength is via the coefficient of determination. The coefficient of determination gives us the proportion of variation in the values of y that is explained by least-squares regression of y on x. The coefficient of determination turns out to be the correlation coefficient squared (r²).
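To see that r² really is the proportion of explained variation, a minimal sketch (hypothetical data and helper names) can compute 1 − SSE/SST from the residuals of the least-squares line. Here the slope is computed as Sxy/Sxx, which is algebraically equal to r·s_y/s_x:

```python
from statistics import mean

def fit_line(x, y):
    # Least-squares intercept a and slope b, with b = Sxy / Sxx
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def r_squared(x, y):
    """Proportion of the variation in y explained by the least-squares line."""
    a, b = fit_line(x, y)
    y_bar = mean(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                      # total variation in y
    return 1 - sse / sst

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_squared(x, y), 4))  # 0.6
```

For this data r ≈ 0.7746, and 0.7746² ≈ 0.6, so about 60% of the variation in y is explained by the regression on x.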
  • 22. Residuals
    - Since the LSRL minimizes the sum of the squared vertical distances between the data values and the line, we have a special name for these vertical distances: they are called residuals.
    - A residual is simply the difference between the observed y and the predicted y: residual = y − ŷ.
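A minimal sketch of residuals for hypothetical data whose least-squares line is ŷ = 2.2 + 0.6x. Each residual is observed y minus predicted y, and for a least-squares line with an intercept the residuals always sum to zero:

```python
# Hypothetical data; its least-squares line is y-hat = 2.2 + 0.6x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

# residual = observed y - predicted y
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(residuals)) < 1e-9)        # True: least-squares residuals sum to zero
```

Positive residuals mark points above the line and negative residuals mark points below it; plotting these values against x gives the residual plot described next.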
  • 23. Residual Plots
    - Residuals help us determine how well our data can be modeled by a straight line, by enabling us to construct a residual plot.
    - A residual plot is a scatterplot with the residuals on the y-axis and their corresponding x values on the x-axis.
  • 24. INTERPRETING RESIDUAL PLOTS: The following residual plot shows a curved pattern, which indicates that the relationship is not linear. A straight line is not a good summary for such data.
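A curved residual pattern is easy to reproduce. This sketch fits a least-squares line to hypothetical data that follow y = x² exactly and prints the residuals plus a crude text "residual plot" of their signs; the helper `fit_line` is illustrative:

```python
from statistics import mean

def fit_line(x, y):
    # Least-squares intercept a and slope b
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical curved data: y = x**2, which a straight line cannot follow
x = [0, 1, 2, 3, 4]
y = [xi ** 2 for xi in x]
a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(residuals)                                        # [2.0, -1.0, -2.0, -1.0, 2.0]
print("".join("+" if e > 0 else "-" for e in residuals))  # +---+
```

The residuals are positive at both ends and negative in the middle, the same U-shape a residual plot would show for this data.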
  • 25. INTERPRETING RESIDUAL PLOTS: Spread about the line that changes as x increases indicates that predictions of y will be less accurate where the spread is larger. In the residual plot shown, the spread increases with x, so predictions of y are less accurate for larger x.
  • 26. INTERPRETING RESIDUAL PLOTS: The following residual plot shows a uniform scatter of points about the horizontal line at zero, with no unusual observations. This tells us that a linear model (regression line) is appropriate for the data and should give good predictions.
  • 27. Unusual and Influential Data
    - Outlier: a value in a set of data that does not fit with the rest of the data.
    - Leverage: an observation with an extreme value on a predictor variable has high leverage.
      • Leverage is a measure of how far an observation's value of the independent variable deviates from the mean of that variable.
      • High-leverage points can have a large effect on the estimates of the regression coefficients.
    - Influence: influence can be thought of as the product of leverage and outlierness.
      • An observation is influential if removing it substantially changes the estimates of the coefficients.
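For simple linear regression with one predictor, leverage is usually measured by the hat value h_i = 1/n + (x_i − x̄)² / Σ(x_j − x̄)². The sketch below (hypothetical data and helper name `leverages`) computes it; the values always sum to 2, the number of coefficients (intercept and slope):

```python
from statistics import mean

def leverages(x):
    """Hat value of each point in simple linear regression:
    h_i = 1/n + (x_i - x_bar)**2 / sum((x_j - x_bar)**2)."""
    n, x_bar = len(x), mean(x)
    sxx = sum((xj - x_bar) ** 2 for xj in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Hypothetical x-values; the last one is far from the others
x = [1, 2, 3, 4, 20]
for xi, h in zip(x, leverages(x)):
    print(xi, round(h, 3))
```

The point at x = 20 gets a leverage of 0.984, close to the maximum of 1, while the other points stay between 0.2 and 0.3. Leverage depends only on x, so it says nothing yet about whether the point is influential.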
  • 28. Outliers
    - Data points more than 2 standard deviations away from the mean of the data set (a common rule of thumb)
    - Data points that do not fit the pattern governed by the rest of the data
    - In regression, any data point that has an unusually large residual
    How can I tell if a point in my data set is an outlier?
    • Compute the IQR (interquartile range) of your data set and multiply it by 1.5. Subtract that number from Quartile 1 and add it to Quartile 3. Any value lying outside these two fences can be considered an outlier.
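The 1.5 × IQR rule can be sketched as follows. The quartiles here are computed as medians of the lower and upper halves of the sorted data (the median itself excluded when n is odd); this is one common classroom convention, and other software may compute quartiles slightly differently. Data and helper names are hypothetical:

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded when n is odd)."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(upper)

def iqr_outliers(data):
    q1, q3 = quartiles(data)
    step = 1.5 * (q3 - q1)
    low, high = q1 - step, q3 + step  # subtract from Q1, add to Q3
    return [v for v in data if v < low or v > high]

# Hypothetical data with one suspicious value
data = [2, 4, 5, 4, 5, 6, 7, 5, 30]
print(iqr_outliers(data))  # [30]
```

Here Q1 = 4 and Q3 = 6.5, so the fences are 0.25 and 10.25, and only 30 falls outside them.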
  • 29. Influential Points
    - Influential points are often outliers in the x direction, but they do not necessarily have large residuals.
    - A point is influential if removing it substantially changes the LSRL.
    - A point with high leverage is potentially influential, but it is influential only if it actually changes the line; a high-leverage point that lies along the pattern of the other data has little influence.
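A quick way to check influence is to refit the line with and without the suspect point. In this sketch (hypothetical data and helper name `slope`), both data sets contain the same high-leverage x-value of 20. Only the point that breaks the pattern actually changes the slope:

```python
from statistics import mean

def slope(x, y):
    # Least-squares slope b = Sxy / Sxx
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data: four points on y = x plus one high-leverage point at x = 20
x = [1, 2, 3, 4, 20]
y_on_line = [1, 2, 3, 4, 20]   # extreme x, but it follows the pattern
y_off_line = [1, 2, 3, 4, 0]   # extreme x, and it breaks the pattern

print(slope(x, y_on_line), slope(x[:-1], y_on_line[:-1]))    # 1.0 1.0
print(slope(x, y_off_line), slope(x[:-1], y_off_line[:-1]))  # -0.12 1.0
```

Removing the on-line point leaves the slope at 1.0, so it has high leverage but no influence; removing the off-line point flips the slope from -0.12 to 1.0, so that point is highly influential.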