SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Downloaden Sie, um offline zu lesen
Refresher on Regression
Analysis
By:
Weam Banjar. DDS., MS in clinical research
Overview:
• Regression Analysis is a technique to find out the relationship between
different variables.
• Regression looks closely into how a dependent variable is affected upon
varying an independent variable while keeping the other independent
variables constant
• Regression analysis is used for prediction and forecasting.
• Regression analysis is a form of predictive modelling technique which
investigates the relationship between a dependent (target) and independent
variable (s) (predictor).
Advantages:
• Can be used to predict the future: By using the relevant model to a data set,
Regression Analysis can accurately predict a lot of useful information like
Stock Prices, Medical Conditions and even Sentiments of the public
Advantages:
• Can be used to predict the future: By using the relevant model to a data set,
Regression Analysis can accurately predict a lot of useful information like
Stock Prices, Medical Conditions and even Sentiments of the public
• Can be used to back major decisions and policies: Results from
regression analysis adds a scientific backing to a decision or policy and
makes it even more reliable as it likelihood of success is then high.
Advantages:
• Can be used to predict the future: By using the relevant model to a data set,
Regression Analysis can accurately predict a lot of useful information like
Stock Prices, Medical Conditions and even Sentiments of the public
• Can be used to back major decisions and policies: Results from
regression analysis adds a scientific backing to a decision or policy and
makes it even more reliable as it likelihood of success is then high.
• Can correct an error in thinking or disabuse: Sometimes, an anomaly
between the prediction of regression analysis and a decision/thinking can help
correct the fallacy of the decision.
Advantages:
• Can be used to predict the future: By using the relevant model to a data set,
Regression Analysis can accurately predict a lot of useful information like Stock
Prices, Medical Conditions and even Sentiments of the public
• Can be used to back major decisions and policies: Results from regression
analysis adds a scientific backing to a decision or policy and makes it even more
reliable as it likelihood of success is then high.
• Can correct an error in thinking or disabuse: Sometimes, an anomaly between
the prediction of regression analysis and a decision/thinking can help correct the
fallacy of the decision.
• Provides a new perspective: Large data sets realise their potential to provide new
dimensions to a study through the application of Regression Analysis.
How to select the right regression model:
• Data exploration is an inevitable part of building predictive model. It should be
you first step before selecting the right model like identify the relationship and
impact of variables
• To compare the goodness of fit for different models, we can analyse different
metrics like statistical significance of parameters, R-square, Adjusted r-
square, AIC, BIC and error term. Another one is the Mallow’s Cp criterion. This
essentially checks for possible bias in your model, by comparing the model
with all possible submodels (or a careful selection of them).
How to select the right regression model:
• Cross-validation is the best way to evaluate models used for prediction. Here you
divide your data set into two group (train and validate). A simple mean squared
difference between the observed and predicted values give you a measure for the
prediction accuracy.
• If your data set has multiple confounding variables, you should not
choose automatic model selection method because you do not want to put these in
a model at the same time.
• It’ll also depend on your objective. It can occur that a less powerful model is easy to
implement as compared to a highly statistically significant model.
• Regression regularization methods(Lasso, Ridge and ElasticNet) works well in case
of high dimensionality and multicollinearity among the variables in the data set.
How to select the right regression model:
Linear Regression
Overview:
• Linear regression is the most simple regression analysis technique. It is the
most commonly regression analysis mechanism in predictive analysis
• In this technique, the dependent variable is continuous, independent
variable(s) can be continuous or discrete, and nature of regression line is
linear.
• Linear Regression establishes a relationship between dependent variable
(Y) and one or more independent variables (X) using a best fit straight
line (also known as regression line).
Y=a+b*X + e
a is intercept, b is slope of the line and e is error term
Important notes:
• There must be linear relationship between independent and dependent variables
• Multiple regression suffers from multicollinearity, autocorrelation,
heteroskedasticity.
• Linear Regression is very sensitive to Outliers. It can terribly affect the regression
line and eventually the forecasted values.
• Multicollinearity can increase the variance of the coefficient estimates and make the
estimates very sensitive to minor changes in the model. The result is that the
coefficient estimates are unstable
• In case of multiple independent variables, we can go with forward
selection, backward elimination and step wise approach for selection of most
significant independent variables.
• The difference between simple linear regression and multiple linear regression is
that, multiple linear regression has (>1) independent variables, whereas simple
linear regression has only 1 independent variable
Logistic Regression
Overview:
• In Logistic Regression, the dependent variable is binary that is it has two
values. It can have values like True/False or 0/1 or Yes/No
• This model is used to determine the chance whether a dichotomous outcome
depends on one or more free (independent) variables
• Logistic regression is used to find the probability of event=Success and
event=Failure. We should use logistic regression when the dependent variable
is binary (0/ 1, True/ False, Yes/ No) in nature. Here the value of Y ranges
from 0 to 1 and it can represented by following equation.
Differences between linear and logistic
regressions:
• In Logistic Regression, Conditional Distribution y|x is not a Gaussian
distribution but a Bernoulli distribution.
• In Logistic Regression, the predicted outcomes are probabilities determined
through logistic function and they are circumscribed between 0 and 1.
Understanding the concept
p is the probability of presence of the characteristic of interest.
Understanding the concept
p is the probability of presence of the characteristic of interest.
• Since we are working here with a binomial distribution (dependent variable),
we need to choose a link function which is best suited for this distribution.
And, it is logit function. In the equation above, the parameters are chosen to
maximize the likelihood of observing the sample values rather than
minimizing the sum of squared errors (like in ordinary regression).
Important notes:
• It is widely used for classification problems
• Logistic regression doesn’t require linear relationship between dependent and
independent variables. It can handle various types of relationships because it
applies a non-linear log transformation to the predicted odds ratio
• To avoid over fitting and under fitting, we should include all significant
variables. A good approach to ensure this practice is to use a step wise
method to estimate the logistic regression
Important notes:
• It requires large sample sizes because maximum likelihood estimates are
less powerful at low sample sizes than ordinary least square
• The independent variables should not be correlated with each other i.e. no
multi collinearity. However, we have the options to include interaction
effects of categorical variables in the analysis and in the model.
• If the values of dependent variable is ordinal, then it is called as Ordinal
logistic regression
• If dependent variable is multi class then it is known as Multinomial Logistic
regression.
Regression analysis made easy
Regression analysis made easy

Weitere ähnliche Inhalte

Was ist angesagt?

Confidence interval
Confidence intervalConfidence interval
Confidence interval
Dr Renju Ravi
 
The chi square test of indep of categorical variables
The chi square test of indep of categorical variablesThe chi square test of indep of categorical variables
The chi square test of indep of categorical variables
Regent University
 

Was ist angesagt? (20)

Properties of coefficient of correlation
Properties of coefficient of correlationProperties of coefficient of correlation
Properties of coefficient of correlation
 
Multiple Linear Regression
Multiple Linear Regression Multiple Linear Regression
Multiple Linear Regression
 
Correlation Analysis
Correlation AnalysisCorrelation Analysis
Correlation Analysis
 
Confidence interval
Confidence intervalConfidence interval
Confidence interval
 
Factor analysis ppt
Factor analysis pptFactor analysis ppt
Factor analysis ppt
 
Regression
RegressionRegression
Regression
 
Logistic Regression Analysis
Logistic Regression AnalysisLogistic Regression Analysis
Logistic Regression Analysis
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Multiple regression
Multiple regressionMultiple regression
Multiple regression
 
Discrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec domsDiscrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec doms
 
Correlation and Regression
Correlation and Regression Correlation and Regression
Correlation and Regression
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
 
Logistic Regression.pptx
Logistic Regression.pptxLogistic Regression.pptx
Logistic Regression.pptx
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Simple Linear Regression
Simple Linear RegressionSimple Linear Regression
Simple Linear Regression
 
The chi square test of indep of categorical variables
The chi square test of indep of categorical variablesThe chi square test of indep of categorical variables
The chi square test of indep of categorical variables
 
Regression analysis.
Regression analysis.Regression analysis.
Regression analysis.
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 

Ähnlich wie Regression analysis made easy

Discriminant Analysis.pptx
Discriminant Analysis.pptxDiscriminant Analysis.pptx
Discriminant Analysis.pptx
GedaSheko
 
Multiple regression_statistics lesson.pptx
Multiple regression_statistics lesson.pptxMultiple regression_statistics lesson.pptx
Multiple regression_statistics lesson.pptx
linmaetonares2
 

Ähnlich wie Regression analysis made easy (20)

what is Correlations
what is Correlationswhat is Correlations
what is Correlations
 
Discriminant Analysis.pptx
Discriminant Analysis.pptxDiscriminant Analysis.pptx
Discriminant Analysis.pptx
 
Multiple regression_statistics lesson.pptx
Multiple regression_statistics lesson.pptxMultiple regression_statistics lesson.pptx
Multiple regression_statistics lesson.pptx
 
classification of various Multivariate techniques
classification of various Multivariate techniquesclassification of various Multivariate techniques
classification of various Multivariate techniques
 
Chapter 13 Data Analysis Inferential Methods and Analysis of Time Series
Chapter 13 Data Analysis Inferential Methods and Analysis of Time SeriesChapter 13 Data Analysis Inferential Methods and Analysis of Time Series
Chapter 13 Data Analysis Inferential Methods and Analysis of Time Series
 
Dependence Techniques
Dependence Techniques Dependence Techniques
Dependence Techniques
 
Propensity Score Matching Methods
Propensity Score Matching MethodsPropensity Score Matching Methods
Propensity Score Matching Methods
 
logisticregression-120102011227-phpapp01.pptx
logisticregression-120102011227-phpapp01.pptxlogisticregression-120102011227-phpapp01.pptx
logisticregression-120102011227-phpapp01.pptx
 
Generalized Linear Model and it Challenges
Generalized Linear Model and it ChallengesGeneralized Linear Model and it Challenges
Generalized Linear Model and it Challenges
 
Multivariate Variate Techniques
Multivariate Variate TechniquesMultivariate Variate Techniques
Multivariate Variate Techniques
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Research 101: Inferential Quantitative Analysis
Research 101: Inferential Quantitative AnalysisResearch 101: Inferential Quantitative Analysis
Research 101: Inferential Quantitative Analysis
 
validation and verification part 2.pptx
validation and verification part 2.pptxvalidation and verification part 2.pptx
validation and verification part 2.pptx
 
Data-Analysis.pptx
Data-Analysis.pptxData-Analysis.pptx
Data-Analysis.pptx
 
Discriminant analysis-Business Intelligence tool
Discriminant analysis-Business Intelligence toolDiscriminant analysis-Business Intelligence tool
Discriminant analysis-Business Intelligence tool
 
RM MLM PPT March_22nd 2023.pptx
RM MLM PPT March_22nd 2023.pptxRM MLM PPT March_22nd 2023.pptx
RM MLM PPT March_22nd 2023.pptx
 
Datascience
DatascienceDatascience
Datascience
 
datascience.docx
datascience.docxdatascience.docx
datascience.docx
 
computer application in pharmaceutical research
computer application in pharmaceutical researchcomputer application in pharmaceutical research
computer application in pharmaceutical research
 
Log reg pdf.pdf
Log reg pdf.pdfLog reg pdf.pdf
Log reg pdf.pdf
 

Mehr von Weam Banjar

Mehr von Weam Banjar (20)

Research ethics-WMB-July312022.pdf
Research ethics-WMB-July312022.pdfResearch ethics-WMB-July312022.pdf
Research ethics-WMB-July312022.pdf
 
Systematic Review at Glance-WMB-July282022.pdf
Systematic Review at Glance-WMB-July282022.pdfSystematic Review at Glance-WMB-July282022.pdf
Systematic Review at Glance-WMB-July282022.pdf
 
Designing Research Grant Budget
Designing Research Grant Budget Designing Research Grant Budget
Designing Research Grant Budget
 
Research Misconduct
Research MisconductResearch Misconduct
Research Misconduct
 
Levels of irb review july 16 2019
Levels of irb review july 16 2019Levels of irb review july 16 2019
Levels of irb review july 16 2019
 
Writing for publication
Writing for publicationWriting for publication
Writing for publication
 
Sampling techniques
Sampling techniquesSampling techniques
Sampling techniques
 
Risk Comparison
Risk ComparisonRisk Comparison
Risk Comparison
 
Designing a poster for scientific conference
Designing a poster for scientific conferenceDesigning a poster for scientific conference
Designing a poster for scientific conference
 
Clinical care vs Clinical Research
Clinical care vs Clinical ResearchClinical care vs Clinical Research
Clinical care vs Clinical Research
 
Research support services at pnu
Research support services at pnuResearch support services at pnu
Research support services at pnu
 
Contingency table
Contingency tableContingency table
Contingency table
 
Black chapter of biomedical research
Black chapter of biomedical researchBlack chapter of biomedical research
Black chapter of biomedical research
 
Data presentation
Data presentationData presentation
Data presentation
 
Research poster
Research posterResearch poster
Research poster
 
Levels of irb review
Levels of irb reviewLevels of irb review
Levels of irb review
 
Introduction to biostatistics
Introduction to biostatisticsIntroduction to biostatistics
Introduction to biostatistics
 
Research grant 101
Research grant 101Research grant 101
Research grant 101
 
Research grant understanding the concept
Research grant  understanding the conceptResearch grant  understanding the concept
Research grant understanding the concept
 
Health science research grant program orientation session
Health science research grant program orientation sessionHealth science research grant program orientation session
Health science research grant program orientation session
 

KĂźrzlich hochgeladen

Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls DelhiRussian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
AlinaDevecerski
 

KĂźrzlich hochgeladen (20)

Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
 
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
 
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
 
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
 
Top Rated Bangalore Call Girls Mg Road ⟟ 9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Mg Road ⟟   9332606886 ⟟ Call Me For Genuine S...Top Rated Bangalore Call Girls Mg Road ⟟   9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Mg Road ⟟ 9332606886 ⟟ Call Me For Genuine S...
 
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
 
(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...
(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...
(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...
 
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls DelhiRussian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
 
Call Girls Varanasi Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Varanasi Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Varanasi Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Varanasi Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Bangalore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Bangalore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
 
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort ServicePremium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
 
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any TimeTop Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
 
Call Girls Haridwar Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Haridwar Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Haridwar Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Haridwar Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Kochi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service Available
 
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
 
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...
 
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...
 
Call Girls Ooty Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Ooty Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Ooty Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Ooty Just Call 8250077686 Top Class Call Girl Service Available
 

Regression analysis made easy

  • 1. Refresher on Regression Analysis By: Weam Banjar. DDS., MS in clinical research
  • 2. Overview: • Regression Analysis is a technique to find out the relationship between different variables. • Regression looks closely into how a dependent variable is affected upon varying an independent variable while keeping the other independent variables constant • Regression analysis is used for prediction and forecasting. • Regression analysis is a form of predictive modelling technique which investigates the relationship between a dependent (target) and independent variable (s) (predictor).
  • 3. Advantages: • Can be used to predict the future: By using the relevant model to a data set, Regression Analysis can accurately predict a lot of useful information like Stock Prices, Medical Conditions and even Sentiments of the public
  • 4. Advantages: • Can be used to predict the future: By using the relevant model to a data set, Regression Analysis can accurately predict a lot of useful information like Stock Prices, Medical Conditions and even Sentiments of the public • Can be used to back major decisions and policies: Results from regression analysis adds a scientific backing to a decision or policy and makes it even more reliable as it likelihood of success is then high.
  • 5. Advantages: • Can be used to predict the future: By using the relevant model to a data set, Regression Analysis can accurately predict a lot of useful information like Stock Prices, Medical Conditions and even Sentiments of the public • Can be used to back major decisions and policies: Results from regression analysis adds a scientific backing to a decision or policy and makes it even more reliable as it likelihood of success is then high. • Can correct an error in thinking or disabuse: Sometimes, an anomaly between the prediction of regression analysis and a decision/thinking can help correct the fallacy of the decision.
  • 6. Advantages: • Can be used to predict the future: By using the relevant model to a data set, Regression Analysis can accurately predict a lot of useful information like Stock Prices, Medical Conditions and even Sentiments of the public • Can be used to back major decisions and policies: Results from regression analysis adds a scientific backing to a decision or policy and makes it even more reliable as it likelihood of success is then high. • Can correct an error in thinking or disabuse: Sometimes, an anomaly between the prediction of regression analysis and a decision/thinking can help correct the fallacy of the decision. • Provides a new perspective: Large data sets realise their potential to provide new dimensions to a study through the application of Regression Analysis.
  • 7. How to select the right regression model: • Data exploration is an inevitable part of building predictive model. It should be you first step before selecting the right model like identify the relationship and impact of variables • To compare the goodness of fit for different models, we can analyse different metrics like statistical significance of parameters, R-square, Adjusted r- square, AIC, BIC and error term. Another one is the Mallow’s Cp criterion. This essentially checks for possible bias in your model, by comparing the model with all possible submodels (or a careful selection of them).
  • 8. How to select the right regression model: • Cross-validation is the best way to evaluate models used for prediction. Here you divide your data set into two group (train and validate). A simple mean squared difference between the observed and predicted values give you a measure for the prediction accuracy. • If your data set has multiple confounding variables, you should not choose automatic model selection method because you do not want to put these in a model at the same time. • It’ll also depend on your objective. It can occur that a less powerful model is easy to implement as compared to a highly statistically significant model. • Regression regularization methods(Lasso, Ridge and ElasticNet) works well in case of high dimensionality and multicollinearity among the variables in the data set.
  • 9. How to select the right regression model:
  • 11. Overview: • Linear regression is the most simple regression analysis technique. It is the most commonly regression analysis mechanism in predictive analysis • In this technique, the dependent variable is continuous, independent variable(s) can be continuous or discrete, and nature of regression line is linear. • Linear Regression establishes a relationship between dependent variable (Y) and one or more independent variables (X) using a best fit straight line (also known as regression line).
  • 12. Y=a+b*X + e a is intercept, b is slope of the line and e is error term
  • 13. Important notes: • There must be linear relationship between independent and dependent variables • Multiple regression suffers from multicollinearity, autocorrelation, heteroskedasticity. • Linear Regression is very sensitive to Outliers. It can terribly affect the regression line and eventually the forecasted values. • Multicollinearity can increase the variance of the coefficient estimates and make the estimates very sensitive to minor changes in the model. The result is that the coefficient estimates are unstable • In case of multiple independent variables, we can go with forward selection, backward elimination and step wise approach for selection of most significant independent variables. • The difference between simple linear regression and multiple linear regression is that, multiple linear regression has (>1) independent variables, whereas simple linear regression has only 1 independent variable
  • 15. Overview: • In Logistic Regression, the dependent variable is binary that is it has two values. It can have values like True/False or 0/1 or Yes/No • This model is used to determine the chance whether a dichotomous outcome depends on one or more free (independent) variables • Logistic regression is used to find the probability of event=Success and event=Failure. We should use logistic regression when the dependent variable is binary (0/ 1, True/ False, Yes/ No) in nature. Here the value of Y ranges from 0 to 1 and it can represented by following equation.
  • 16. Differences between linear and logistic regressions: • In Logistic Regression, Conditional Distribution y|x is not a Gaussian distribution but a Bernoulli distribution. • In Logistic Regression, the predicted outcomes are probabilities determined through logistic function and they are circumscribed between 0 and 1.
  • 17. Understanding the concept p is the probability of presence of the characteristic of interest.
  • 18. Understanding the concept p is the probability of presence of the characteristic of interest. • Since we are working here with a binomial distribution (dependent variable), we need to choose a link function which is best suited for this distribution. And, it is logit function. In the equation above, the parameters are chosen to maximize the likelihood of observing the sample values rather than minimizing the sum of squared errors (like in ordinary regression).
  • 19. Important notes: • It is widely used for classification problems • Logistic regression doesn’t require linear relationship between dependent and independent variables. It can handle various types of relationships because it applies a non-linear log transformation to the predicted odds ratio • To avoid over fitting and under fitting, we should include all significant variables. A good approach to ensure this practice is to use a step wise method to estimate the logistic regression
  • 20. Important notes: • It requires large sample sizes because maximum likelihood estimates are less powerful at low sample sizes than ordinary least square • The independent variables should not be correlated with each other i.e. no multi collinearity. However, we have the options to include interaction effects of categorical variables in the analysis and in the model. • If the values of dependent variable is ordinal, then it is called as Ordinal logistic regression • If dependent variable is multi class then it is known as Multinomial Logistic regression.