SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Multiple regression in R on Automobile data to predict
Gasoline Mileage
Rachana T. Bhatia - Rutgers University
 Basics of Regression Analysis
 Addressing Model Deviation of regression models
 Model selection criterion
 Types of regression Model
 Introduction to R
 Multiple regression (including polynomial regression) on Car Data
Rachana T. Bhatia - Rutgers University
 First step to learn predictive modelling
 Statistical technique for investigating and modeling the relationship between
variables
 Equation of straight line 𝒚 = 𝜷 𝟎 + 𝜷 𝟏 𝒙 + 𝜺
 𝜺 is a random variable that accounts for the failure of the model to fit the data
 𝒙 explanatory variable & 𝑦 response variable
 Regression does not necessarily imply causality
Rachana T. Bhatia - Rutgers University
Linear Regression Analysis 5th edition Montgomery, Peck & ViningRachana T. Bhatia - Rutgers University
 Least squares estimation- minimize the sum of squares of the
differences between the observed response, yi, and the straight line
 Fitted values & Residuals
 Hypothesis Testing for the value of slope and intercept - T-tests
 Significant relation between the variables- reject Null Hypothesis
 Alternative approach : p-value
 Confidence Interval – associated with randomness of the data
 Prediction Interval – associated with the random variable yet to be
observed.
Rachana T. Bhatia - Rutgers University
 Linearity
 Homoscedasticity
 Errors normally distributed (for inferential purposes)
 Independent
 Constant variance
 There is a probability distribution for y at each value of x
with mean: E 𝑌 𝑥 = β0 + β1 𝑥
Variance: Var 𝑌 𝑥 = σ2
Rachana T. Bhatia - Rutgers University
 Looking at the scatter plot
 Q-Q plot – Quantiles of the residuals vs normal distribution
 Residual plot – Residuals Vs Explanatory variable
Rachana T. Bhatia - Rutgers University
 Correctable non-linearity (simple and monotone )
 Non-Correctable linearity
Rachana T. Bhatia - Rutgers University
 Define a new variable u as 𝑢 = 𝑒 𝑥
Rachana T. Bhatia - Rutgers University
Some common transformations are:
v = ln(y)
v = p √y where p > 1 v = 1/y p where p > 0
Rachana T. Bhatia - Rutgers University
 How well a statistical model fits observed data
 How much of the total variation in Y is described by the variation in the
explanatory variables
 square of the sample correlation of the response variable and the explanatory
variable
 Lies between -∞ to 1
 Adjusted R-squared- adjusted for the number of coefficients in the model relative
to the sample size in order to correct it for bias
Rachana T. Bhatia - Rutgers University
 Mean Square Error
 Coefficient of Determination - R2
 Adjusted R2
 AIC (Akaike’s Information Criterion) - smaller values are better
 BIC (Bayesian Information Criterion) - smaller values are better
Rachana T. Bhatia - Rutgers University
 LEVERAGE – ‘standardized’ measure the distance of the ith observation abscissa from
the mean of the explanatory variables
 DFBETAS - standardized measures how much estimation of βj is influenced by the ith
observation.
 DFFITS - standardized measures how much estimation of ith fitted value is influenced
by the ith observation
 COOK’S Distance -standardized measure of the distance between the fitted values
obtained using the whole sample and the fitted values obtained after removing the jth
observation
Rachana T. Bhatia - Rutgers University
 Simple linear Model
 Polynomial regression – relationship is not linear
 Multiple linear Model – more than one explanatory variables- Categorical Data
 Robust regression (Least Absolute Deviations, Huber/ Bisquare function) - Data
contaminated with outliers
 Logistic regression – Response variable Binary (Logit and Probit link function)
 Ridge Regression – High multicollinearity
 Step wise regression – High dimensions (Forward selection/ backward elimination)
Rachana T. Bhatia - Rutgers University
 A power tool for statistics and data modeling
 R is free
 R is a language
 Graphics and data visualization
 A flexible statistical analysis toolkit
 R Studio - an Integrated Development Environment (IDE) for the R programming
language.
Rachana T. Bhatia - Rutgers University
 Setting the working directory
 Installing packages, updating and loading the packages
 Importing and Converting Data
 Creating vectors, data frames
 Connection to the outside world(file, gzfile,bzfile, url)
 Atomic classes of vectors : integer • numeric • character • complex • logical
Rachana T. Bhatia - Rutgers University
 Data Frames (tabular data)-stores different class of
objects{read.table/read.csv/data.frame)
 Analogous code for writing the data
 Foreign package (read.xport, read.spss )
 Reading larger data sets (Specifying the column classes)
 Inspect objects/dataframes
 Missing Values (Na / NaN)
Rachana T. Bhatia - Rutgers University
 Exploratory Analysis
 Subsetting (using [], [[]],$ )
 which.max/ which.min
 Handling missing values (complete.cases(), is.na…, na.rm = T)
 Splitting
 Apply , sapply, tapply, mapply
 Descriptive Analysis
 Summary()
 Str()
 Sd(), var(), median() , quantile(), hist()
 By(), table()
 Statistical test- t test , chi square test
Rachana T. Bhatia - Rutgers University
 Variation in gasoline mileage among makes and models of automobiles is
influenced substantially by the size of the vehicle and its engine.
 Downloaded from http://lib.stat.cmu.edu/DASL/Datafiles/carmpgdat.html
 Variable Names:
 VOL: Cubic feet of cab space
 HP: Engine horsepower
 MPG: Average miles per gallon (Response Variable)
 SP: Top speed (mph)
 WT: Vehicle weight (100 lb)
Rachana T. Bhatia - Rutgers University
 Prof. Andrew Magyar - Stat 563 - Introduction to Linear Regression_Course
Material
 Linear Regression Analysis 5th edition Montgomery, Peck & Vining
 http://www.ats.ucla.edu/stat/stata/dae/rreg.htm
 https://www.coursera.org/learn/r-programming/home/welcome
Rachana T. Bhatia - Rutgers University

Weitere ähnliche Inhalte

Was ist angesagt?

Statistics-Regression analysis
Statistics-Regression analysisStatistics-Regression analysis
Statistics-Regression analysisRabin BK
 
Regression analysis algorithm
Regression analysis algorithm Regression analysis algorithm
Regression analysis algorithm Sammer Qader
 
Functional Forms of Regression Models | Eonomics
Functional Forms of Regression Models | EonomicsFunctional Forms of Regression Models | Eonomics
Functional Forms of Regression Models | EonomicsTransweb Global Inc
 
Econometrics chapter 5-two-variable-regression-interval-estimation-
Econometrics chapter 5-two-variable-regression-interval-estimation-Econometrics chapter 5-two-variable-regression-interval-estimation-
Econometrics chapter 5-two-variable-regression-interval-estimation-Alamin Milton
 
Simple linear regression analysis
Simple linear  regression analysisSimple linear  regression analysis
Simple linear regression analysisNorma Mingo
 
Introduction to correlation and regression analysis
Introduction to correlation and regression analysisIntroduction to correlation and regression analysis
Introduction to correlation and regression analysisFarzad Javidanrad
 
Regression analysis made easy
Regression analysis made easyRegression analysis made easy
Regression analysis made easyWeam Banjar
 
Correlation and Regression
Correlation and RegressionCorrelation and Regression
Correlation and Regressionjasondroesch
 
Presentation on Regression Analysis
Presentation on Regression AnalysisPresentation on Regression Analysis
Presentation on Regression AnalysisJ P Verma
 
Applications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationshipApplications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationshipRithish Kumar
 
Regression analysis by akanksha Bali
Regression analysis by akanksha BaliRegression analysis by akanksha Bali
Regression analysis by akanksha BaliAkanksha Bali
 

Was ist angesagt? (19)

Statistics-Regression analysis
Statistics-Regression analysisStatistics-Regression analysis
Statistics-Regression analysis
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 
Regression analysis algorithm
Regression analysis algorithm Regression analysis algorithm
Regression analysis algorithm
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Functional Forms of Regression Models | Eonomics
Functional Forms of Regression Models | EonomicsFunctional Forms of Regression Models | Eonomics
Functional Forms of Regression Models | Eonomics
 
Econometrics chapter 5-two-variable-regression-interval-estimation-
Econometrics chapter 5-two-variable-regression-interval-estimation-Econometrics chapter 5-two-variable-regression-interval-estimation-
Econometrics chapter 5-two-variable-regression-interval-estimation-
 
Simple linear regression analysis
Simple linear  regression analysisSimple linear  regression analysis
Simple linear regression analysis
 
Regression
RegressionRegression
Regression
 
Introduction to correlation and regression analysis
Introduction to correlation and regression analysisIntroduction to correlation and regression analysis
Introduction to correlation and regression analysis
 
Regression analysis made easy
Regression analysis made easyRegression analysis made easy
Regression analysis made easy
 
Correlation and Regression
Correlation and RegressionCorrelation and Regression
Correlation and Regression
 
Presentation on Regression Analysis
Presentation on Regression AnalysisPresentation on Regression Analysis
Presentation on Regression Analysis
 
Regression
Regression Regression
Regression
 
Applications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationshipApplications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationship
 
Regression ppt
Regression pptRegression ppt
Regression ppt
 
Regression analysis by akanksha Bali
Regression analysis by akanksha BaliRegression analysis by akanksha Bali
Regression analysis by akanksha Bali
 
Regression
RegressionRegression
Regression
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 

Andere mochten auch

Fitting polynomial data
Fitting polynomial dataFitting polynomial data
Fitting polynomial dataBart Lauwers
 
Introduction to Regression Analysis
Introduction to Regression AnalysisIntroduction to Regression Analysis
Introduction to Regression AnalysisSzilveszter Molnár
 
Übersicht Glm Workshop 2009
Übersicht Glm Workshop 2009Übersicht Glm Workshop 2009
Übersicht Glm Workshop 2009Mark Heckmann
 
Neural networks1
Neural networks1Neural networks1
Neural networks1Mohan Raj
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programmingVictor Ordu
 
The Aviation Insurance Industry Presentation (1)
The Aviation Insurance Industry Presentation (1)The Aviation Insurance Industry Presentation (1)
The Aviation Insurance Industry Presentation (1)Oliver Culley
 
Getting Functional with Scala
Getting Functional with ScalaGetting Functional with Scala
Getting Functional with ScalaJorge Paez
 
R programming language
R programming languageR programming language
R programming languageKeerti Verma
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statisticsArchana M
 
An Introduction to Neural Networks and Machine Learning
An Introduction to Neural Networks and Machine LearningAn Introduction to Neural Networks and Machine Learning
An Introduction to Neural Networks and Machine LearningChris Nicholls
 
Lecture 30 introduction to logic
Lecture 30 introduction to logicLecture 30 introduction to logic
Lecture 30 introduction to logicHema Kashyap
 
Presentation on forecasting
Presentation on forecasting Presentation on forecasting
Presentation on forecasting Muhammad Sharjeel
 
Discrete mathematics by Seerat Abbas khan
Discrete mathematics by Seerat Abbas khanDiscrete mathematics by Seerat Abbas khan
Discrete mathematics by Seerat Abbas khanSeerat Abbas Khan
 
Coefficient of correlation...ppt
Coefficient of correlation...pptCoefficient of correlation...ppt
Coefficient of correlation...pptRahul Dhaker
 
Machine Learning using Support Vector Machine
Machine Learning using Support Vector MachineMachine Learning using Support Vector Machine
Machine Learning using Support Vector MachineMohsin Ul Haq
 

Andere mochten auch (20)

Fitting polynomial data
Fitting polynomial dataFitting polynomial data
Fitting polynomial data
 
Introduction to Regression Analysis
Introduction to Regression AnalysisIntroduction to Regression Analysis
Introduction to Regression Analysis
 
Übersicht Glm Workshop 2009
Übersicht Glm Workshop 2009Übersicht Glm Workshop 2009
Übersicht Glm Workshop 2009
 
Neural networks1
Neural networks1Neural networks1
Neural networks1
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programming
 
Simulation presentation
Simulation presentationSimulation presentation
Simulation presentation
 
An Intoduction to R
An Intoduction to RAn Intoduction to R
An Intoduction to R
 
The Aviation Insurance Industry Presentation (1)
The Aviation Insurance Industry Presentation (1)The Aviation Insurance Industry Presentation (1)
The Aviation Insurance Industry Presentation (1)
 
Getting Functional with Scala
Getting Functional with ScalaGetting Functional with Scala
Getting Functional with Scala
 
R programming language
R programming languageR programming language
R programming language
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statistics
 
An Introduction to Neural Networks and Machine Learning
An Introduction to Neural Networks and Machine LearningAn Introduction to Neural Networks and Machine Learning
An Introduction to Neural Networks and Machine Learning
 
Lecture 30 introduction to logic
Lecture 30 introduction to logicLecture 30 introduction to logic
Lecture 30 introduction to logic
 
Presentation on forecasting
Presentation on forecasting Presentation on forecasting
Presentation on forecasting
 
Discrete math Truth Table
Discrete math Truth TableDiscrete math Truth Table
Discrete math Truth Table
 
Discrete mathematics by Seerat Abbas khan
Discrete mathematics by Seerat Abbas khanDiscrete mathematics by Seerat Abbas khan
Discrete mathematics by Seerat Abbas khan
 
Coefficient of correlation...ppt
Coefficient of correlation...pptCoefficient of correlation...ppt
Coefficient of correlation...ppt
 
Machine Learning using Support Vector Machine
Machine Learning using Support Vector MachineMachine Learning using Support Vector Machine
Machine Learning using Support Vector Machine
 
Portales de actividades
Portales de actividadesPortales de actividades
Portales de actividades
 
Cv Denice
Cv DeniceCv Denice
Cv Denice
 

Ähnlich wie Introduction to Regression Analysis and R

SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSScsula its training
 
High Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationHigh Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationDmitry Grapov
 
Recommendation system
Recommendation systemRecommendation system
Recommendation systemDing Li
 
Accounting serx
Accounting serxAccounting serx
Accounting serxzeer1234
 
Accounting serx
Accounting serxAccounting serx
Accounting serxzeer1234
 
Tensor decompositions for medical analytics
Tensor decompositions for medical analyticsTensor decompositions for medical analytics
Tensor decompositions for medical analyticsColleen Farrelly
 
Review of "Survey Research Methods & Design in Psychology"
Review of "Survey Research Methods & Design in Psychology"Review of "Survey Research Methods & Design in Psychology"
Review of "Survey Research Methods & Design in Psychology"James Neill
 
KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfDr. Radhey Shyam
 
Intelligent Methods in Models of Text Information Retrieval: Implications for...
Intelligent Methods in Models of Text Information Retrieval: Implications for...Intelligent Methods in Models of Text Information Retrieval: Implications for...
Intelligent Methods in Models of Text Information Retrieval: Implications for...inscit2006
 
BasicStatistics.pdf
BasicStatistics.pdfBasicStatistics.pdf
BasicStatistics.pdfsweetAI1
 
SOC2002 Lecture 11
SOC2002 Lecture 11SOC2002 Lecture 11
SOC2002 Lecture 11Bonnie Green
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdfBeyaNasr1
 
Multivariate data analysis and visualization tools for biological data
Multivariate data analysis and visualization tools for biological dataMultivariate data analysis and visualization tools for biological data
Multivariate data analysis and visualization tools for biological dataDmitry Grapov
 
cannonicalpresentation-110505114327-phpapp01.pdf
cannonicalpresentation-110505114327-phpapp01.pdfcannonicalpresentation-110505114327-phpapp01.pdf
cannonicalpresentation-110505114327-phpapp01.pdfJermaeDizon2
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationDmitry Grapov
 
Statistical Analysis with R -I
Statistical Analysis with R -IStatistical Analysis with R -I
Statistical Analysis with R -IAkhila Prabhakaran
 

Ähnlich wie Introduction to Regression Analysis and R (20)

SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSS
 
High Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationHigh Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and Visualization
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Accounting serx
Accounting serxAccounting serx
Accounting serx
 
Accounting serx
Accounting serxAccounting serx
Accounting serx
 
Tensor decompositions for medical analytics
Tensor decompositions for medical analyticsTensor decompositions for medical analytics
Tensor decompositions for medical analytics
 
Review of "Survey Research Methods & Design in Psychology"
Review of "Survey Research Methods & Design in Psychology"Review of "Survey Research Methods & Design in Psychology"
Review of "Survey Research Methods & Design in Psychology"
 
Quantitative Data analysis
Quantitative Data analysisQuantitative Data analysis
Quantitative Data analysis
 
KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdf
 
Intelligent Methods in Models of Text Information Retrieval: Implications for...
Intelligent Methods in Models of Text Information Retrieval: Implications for...Intelligent Methods in Models of Text Information Retrieval: Implications for...
Intelligent Methods in Models of Text Information Retrieval: Implications for...
 
BasicStatistics.pdf
BasicStatistics.pdfBasicStatistics.pdf
BasicStatistics.pdf
 
SOC2002 Lecture 11
SOC2002 Lecture 11SOC2002 Lecture 11
SOC2002 Lecture 11
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 
Multivariate data analysis and visualization tools for biological data
Multivariate data analysis and visualization tools for biological dataMultivariate data analysis and visualization tools for biological data
Multivariate data analysis and visualization tools for biological data
 
Measure of Association
Measure of AssociationMeasure of Association
Measure of Association
 
TamingStatistics
TamingStatisticsTamingStatistics
TamingStatistics
 
cannonicalpresentation-110505114327-phpapp01.pdf
cannonicalpresentation-110505114327-phpapp01.pdfcannonicalpresentation-110505114327-phpapp01.pdf
cannonicalpresentation-110505114327-phpapp01.pdf
 
0 introduction
0  introduction0  introduction
0 introduction
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and Visualization
 
Statistical Analysis with R -I
Statistical Analysis with R -IStatistical Analysis with R -I
Statistical Analysis with R -I
 

Kürzlich hochgeladen

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Kürzlich hochgeladen (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Introduction to Regression Analysis and R

  • 1. Multiple regression in R on Automobile data to predict Gasoline Mileage Rachana T. Bhatia - Rutgers University
  • 2.  Basics of Regression Analysis  Addressing Model Deviation of regression models  Model selection criterion  Types of regression Model  Introduction to R  Multiple regression (including polynomial regression) on Car Data Rachana T. Bhatia - Rutgers University
  • 3.  First step to learn predictive modelling  Statistical technique for investigating and modeling the relationship between variables  Equation of straight line 𝒚 = 𝜷 𝟎 + 𝜷 𝟏 𝒙 + 𝜺  𝜺 is a random variable that accounts for the failure of the model to fit the data  𝒙 explanatory variable & 𝑦 response variable  Regression does not necessarily imply causality Rachana T. Bhatia - Rutgers University
  • 4. Linear Regression Analysis 5th edition Montgomery, Peck & ViningRachana T. Bhatia - Rutgers University
  • 5.  Least squares estimation- minimize the sum of squares of the differences between the observed response, yi, and the straight line  Fitted values & Residuals  Hypothesis Testing for the value of slope and intercept - T-tests  Significant relation between the variables- reject Null Hypothesis  Alternative approach : p-value  Confidence Interval – associated with randomness of the data  Prediction Interval – associated with the random variable yet to be observed. Rachana T. Bhatia - Rutgers University
  • 6.  Linearity  Homoscedasticity  Errors normally distributed (for inferential purposes)  Independent  Constant variance  There is a probability distribution for y at each value of x with mean: E 𝑌 𝑥 = β0 + β1 𝑥 Variance: Var 𝑌 𝑥 = σ2 Rachana T. Bhatia - Rutgers University
  • 7.  Looking at the scatter plot  Q-Q plot – Quantiles of the residuals vs normal distribution  Residual plot – Residuals Vs Explanatory variable Rachana T. Bhatia - Rutgers University
  • 8.  Correctable non-linearity (simple and monotone )  Non-Correctable linearity Rachana T. Bhatia - Rutgers University
  • 9.  Define a new variable u as 𝑢 = 𝑒 𝑥 Rachana T. Bhatia - Rutgers University
  • 10. Some common transformations are: v = ln(y) v = p √y where p > 1 v = 1/y p where p > 0 Rachana T. Bhatia - Rutgers University
  • 11.  How well a statistical model fits observed data  How much of the total variation in Y is described by the variation in the explanatory variables  square of the sample correlation of the response variable and the explanatory variable  Lies between -∞ to 1  Adjusted R-squared- adjusted for the number of coefficients in the model relative to the sample size in order to correct it for bias Rachana T. Bhatia - Rutgers University
  • 12.  Mean Square Error  Coefficient of Determination - R2  Adjusted R2  AIC (Akaike’s Information Criterion) - smaller values are better  BIC (Bayesian Information Criterion) - smaller values are better Rachana T. Bhatia - Rutgers University
  • 13.  LEVERAGE – ‘standardized’ measure the distance of the ith observation abscissa from the mean of the explanatory variables  DFBETAS - standardized measures how much estimation of βj is influenced by the ith observation.  DFFITS - standardized measures how much estimation of ith fitted value is influenced by the ith observation  COOK’S Distance -standardized measure of the distance between the fitted values obtained using the whole sample and the fitted values obtained after removing the jth observation Rachana T. Bhatia - Rutgers University
  • 14.  Simple linear Model  Polynomial regression – relationship is not linear  Multiple linear Model – more than one explanatory variables- Categorical Data  Robust regression (Least Absolute Deviations, Huber/ Bisquare function) - Data contaminated with outliers  Logistic regression – Response variable Binary (Logit and Probit link function)  Ridge Regression – High multicollinearity  Step wise regression – High dimensions (Forward selection/ backward elimination) Rachana T. Bhatia - Rutgers University
  • 15.  A power tool for statistics and data modeling  R is free  R is a language  Graphics and data visualization  A flexible statistical analysis toolkit  R Studio - an Integrated Development Environment (IDE) for the R programming language. Rachana T. Bhatia - Rutgers University
  • 16.  Setting the working directory  Installing packages, updating and loading the packages  Importing and Converting Data  Creating vectors, data frames  Connection to the outside world(file, gzfile,bzfile, url)  Atomic classes of vectors : integer • numeric • character • complex • logical Rachana T. Bhatia - Rutgers University
  • 17.  Data Frames (tabular data)-stores different class of objects{read.table/read.csv/data.frame)  Analogous code for writing the data  Foreign package (read.xport, read.spss )  Reading larger data sets (Specifying the column classes)  Inspect objects/dataframes  Missing Values (Na / NaN) Rachana T. Bhatia - Rutgers University
  • 18.  Exploratory Analysis  Subsetting (using [], [[]],$ )  which.max/ which.min  Handling missing values (complete.cases(), is.na…, na.rm = T)  Splitting  Apply , sapply, tapply, mapply  Descriptive Analysis  Summary()  Str()  Sd(), var(), median() , quantile(), hist()  By(), table()  Statistical test- t test , chi square test Rachana T. Bhatia - Rutgers University
  • 19.  Variation in gasoline mileage among makes and models of automobiles is influenced substantially by the size of the vehicle and its engine.  Downloaded from http://lib.stat.cmu.edu/DASL/Datafiles/carmpgdat.html  Variable Names:  VOL: Cubic feet of cab space  HP: Engine horsepower  MPG: Average miles per gallon (Response Variable)  SP: Top speed (mph)  WT: Vehicle weight (100 lb) Rachana T. Bhatia - Rutgers University
  • 20.  Prof. Andrew Magyar - Stat 563 - Introduction to Linear Regression_Course Material  Linear Regression Analysis 5th edition Montgomery, Peck & Vining  http://www.ats.ucla.edu/stat/stata/dae/rreg.htm  https://www.coursera.org/learn/r-programming/home/welcome Rachana T. Bhatia - Rutgers University