Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
Multiple Linear Regression
What is covered
Terminologies
Introduction & Example
Standard input/tuning parameters & Sample UI
Sample output UI
Interpretation of Output
Limitations
Business use cases
Terminologies
• Predictors and target variable:
• The target variable, usually denoted by Y, is the variable being predicted. It is also
called the dependent, output, response or outcome variable
• A predictor, usually denoted by X and sometimes called an independent or explanatory
variable, is a variable used to predict the target variable
• Correlation:
• Correlation is a statistical measure that indicates the extent to which two variables
fluctuate together
• Upper & lower N% confidence intervals:
• A confidence interval is a statistical way of saying, "I am pretty sure the true
value of the number I am estimating lies within this range", with N% confidence
INTRODUCTION
• OBJECTIVE:
• Multiple linear regression is a statistical technique that explores
the relationship between two or more predictors (Xi) and a target
variable (Y)
• BENEFIT:
• The regression model output helps identify the important
factors (Xi) impacting the dependent variable (Y),
as well as the nature of the relationship between each
of these factors and the dependent variable
• MODEL:
• The linear regression model equation takes the form
Y = β0 + β1X1 + β2X2 + ... + βkXk + ε, as shown in the image on the right,
where β0 is the intercept, βi is the coefficient of predictor Xi and ε is the error term
Example: Multiple linear regression
Let's conduct a multiple linear regression analysis with Temperature & Humidity as independent variables (Xi) and
Yield as the target variable (Y), as shown below:
Input data
Temperature (X1)   Humidity (X2)   Yield (Y)
50                 57              112
53                 54              118
54                 54              128
55                 60              121
56                 66              125
59                 59              136
62                 61              144
65                 58              142
67                 59              149
71                 64              161
72                 56              167
74                 66              168
75                 52              162
76                 68              171
79                 52              175
80                 62              182
Output
Regression Statistics
R Square   0.98
              Coefficients   P-value   Lower 95%   Upper 95%
Intercept     -5.14          0.68      -31.49      21.21
Temperature    2.19          0.00        1.99       2.40
Humidity       0.15          0.44       -0.26       0.57
• The model is a good fit, as R square > 0.7
• The p-value for Temperature is < 0.05; hence Temperature is an important factor for predicting Yield
• The p-value for Humidity is > 0.05, which means Humidity does not have a statistically significant impact on Yield
• With a one unit increase in Temperature, Yield increases by about 2.19 units (with Humidity held constant)
• The coefficient of Temperature lies between 1.99 and 2.40 with 95% confidence (5% chance of error)
Note: The intercept is not an important statistic for checking the relation between X & Y
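The example can be reproduced with a few lines of NumPy. This is a minimal sketch, assuming ordinary least squares is the fitting method (the slides do not name the solver); the variable names are illustrative. Run on the 16 rows of input data, it should give the same coefficients and R square as the output table, to two decimals.

```python
import numpy as np

# Input data copied from the example table.
temperature = np.array([50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75, 76, 79, 80], dtype=float)
humidity = np.array([57, 54, 54, 60, 66, 59, 61, 58, 59, 64, 56, 66, 52, 68, 52, 62], dtype=float)
yield_ = np.array([112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162, 171, 175, 182], dtype=float)

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones_like(temperature), temperature, humidity])

# Ordinary least squares: coefficients that minimise the sum of squared errors.
coefficients, *_ = np.linalg.lstsq(X, yield_, rcond=None)
intercept, b_temperature, b_humidity = coefficients

# R square = 1 - (residual sum of squares / total sum of squares).
fitted = X @ coefficients
ss_residual = np.sum((yield_ - fitted) ** 2)
ss_total = np.sum((yield_ - yield_.mean()) ** 2)
r_square = 1 - ss_residual / ss_total

print(f"Intercept   {intercept:.2f}")
print(f"Temperature {b_temperature:.2f}")
print(f"Humidity    {b_humidity:.2f}")
print(f"R Square    {r_square:.2f}")
```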
Standard input/tuning parameters & Sample UI
Step 1: Select the predictors (more than one predictor can be selected)
o Temperature
o Humidity
o Yield
o Pressure range
Step 2: Select the target variable
o Temperature
o Humidity
o Yield
o Pressure range
Step 3: Set the tuning parameters (by default these parameters should be set to the values mentioned)
o Step size = 1
o Number of iterations = 100
Step 4: Display the output window containing the following:
o Model summary
o Line fit plot
o Normal probability plot
o Residual versus Fit plot
Note:
• Categorical predictors should be auto-detected and converted to dummy/binary variables before applying regression
• The selection of predictors depends on business knowledge and on the correlation between the target
variable and each predictor; those with a significant positive or negative correlation with Y should be included in the model
• As a rule of thumb, the number of predictors should be at most (total number of observations / 20)
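The three notes on this slide can be sketched in code. The helpers below are illustrative only: `dummy_encode` and `max_predictors` are hypothetical names, the `season` values are made up, and dropping the first category as a baseline is an assumption about how the dummy variables are built. The correlation check uses the Temperature, Humidity and Yield data from the example.

```python
import numpy as np

def dummy_encode(values):
    """Convert a categorical column into 0/1 columns, one per level except the first (baseline)."""
    levels = sorted(set(values))
    kept = levels[1:]
    matrix = np.array([[1 if v == level else 0 for level in kept] for v in values])
    return kept, matrix

def max_predictors(n_observations):
    """Rule of thumb from the note: at most (observations / 20) predictors."""
    return n_observations // 20

# Hypothetical categorical predictor.
season = ["festive", "normal", "festive"]
kept_levels, season_dummies = dummy_encode(season)
print(kept_levels, season_dummies.tolist())

# Correlation of each candidate predictor with the target, using the example data.
temperature = np.array([50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75, 76, 79, 80], dtype=float)
humidity = np.array([57, 54, 54, 60, 66, 59, 61, 58, 59, 64, 56, 66, 52, 68, 52, 62], dtype=float)
yield_ = np.array([112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162, 171, 175, 182], dtype=float)

corr_temperature = np.corrcoef(temperature, yield_)[0, 1]
corr_humidity = np.corrcoef(humidity, yield_)[0, 1]
print(f"corr(Temperature, Yield) = {corr_temperature:.2f}")
print(f"corr(Humidity, Yield)    = {corr_humidity:.2f}")

print("Max predictors for 100 observations:", max_predictors(100))
```

Temperature is strongly correlated with Yield while Humidity is only weakly correlated, which is consistent with the p-values in the sample output.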
Sample output : 1. Model Summary
Regression Statistics
R Square   0.98
P value for ANOVA test : 0.02
              Coefficients   P-value   Lower 95%   Upper 95%
Intercept     -5.14          0.68      -31.49      21.21
Temperature    2.19          0.00        1.99       2.40
Humidity       0.15          0.44       -0.26       0.57
• R square: It shows the goodness of fit of the model. It lies between 0 and 1, and the closer this value is to 1,
the better the model
• ANOVA p-value: It indicates whether at least one of the coefficients is significant in the model; further
model interpretation should be made only if this p-value is < 0.05
P-value:
o It is used to evaluate whether the corresponding predictor X has any significant impact on the target
variable Y
o As the p-value for Temperature here is < 0.05 (highlighted in red font in the table above), Temperature has a
significant relation with Yield
o In contrast, the p-value for Humidity is > 0.05, which makes it insignificant for predicting Yield
Coefficient:
o It shows the magnitude as well as the direction of the impact of the predictors (Temperature and Humidity in this
case) on the target variable Y (Yield)
o For example, in this case, with a one unit increase in Temperature there is a 2.19 unit increase in Yield
(with Humidity held constant)
o The value of the Temperature coefficient lies between 1.99 and 2.40 with 95% confidence
Check Interpretation section for more details
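The 95% intervals in the summary table can be recomputed from the example data. This is a sketch under the usual ordinary least squares assumptions, not necessarily how the product computes them. The critical value 2.160 is the two-sided 95% value of Student's t with 13 degrees of freedom (16 observations minus 3 coefficients), hard-coded here to avoid a dependency beyond NumPy.

```python
import numpy as np

temperature = np.array([50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75, 76, 79, 80], dtype=float)
humidity = np.array([57, 54, 54, 60, 66, 59, 61, 58, 59, 64, 56, 66, 52, 68, 52, 62], dtype=float)
yield_ = np.array([112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162, 171, 175, 182], dtype=float)

X = np.column_stack([np.ones_like(temperature), temperature, humidity])
beta, *_ = np.linalg.lstsq(X, yield_, rcond=None)

# Residual variance estimate: residual sum of squares / (observations - coefficients).
n_obs, n_coef = X.shape
residuals = yield_ - X @ beta
mse = residuals @ residuals / (n_obs - n_coef)

# Standard error of each coefficient from the diagonal of mse * (X'X)^-1.
std_err = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))

T_CRIT = 2.160  # two-sided 95% critical value, Student's t, 13 degrees of freedom
lower = beta - T_CRIT * std_err
upper = beta + T_CRIT * std_err

# |t| > T_CRIT is equivalent to a p-value below 0.05.
t_stat = beta / std_err
significant = np.abs(t_stat) > T_CRIT

for name, b, lo, hi, sig in zip(["Intercept", "Temperature", "Humidity"], beta, lower, upper, significant):
    print(f"{name:<12} {b:7.2f} {lo:8.2f} {hi:8.2f}  significant={bool(sig)}")
```

The Humidity interval contains 0, which is another way of seeing that its coefficient is not significant at the 95% level.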
Sample Output : 2. Plots
• Line fit plots are used to check the assumption of linearity between each Xi & Y
• Normal probability plot is used to check the assumption of normality & to detect outliers
• Residual plot is used to check the assumption of equal error variances & to detect outliers
• In case of non-linearity between any Xi and Y, a transformation can be applied to Xi to make it linearly
correlated with Y; otherwise that particular variable has to be dropped from the inputs to model building
Check Interpretation section for more details
Interpretation of Important Model Summary Statistics
Multiple R :
• R > 0.7 represents a strong positive correlation between X and Y
• 0.4 <= R < 0.7 represents a weak positive correlation between X and Y
• 0 <= R < 0.4 represents a negligible/no correlation between X and Y
• -0.7 < R <= -0.4 represents a weak negative correlation between X and Y
• R < -0.7 represents a strong negative correlation between X and Y
R Square :
• R square > 0.7 represents a very good model, i.e. the model is able to explain more than 70% of the
variability in Y
• R square between 0 and 0.7 represents a model that does not fit well; the assumptions of
normality and linearity should be checked to improve the fit of the model
P value :
• At the 95% confidence threshold, if the p-value for a predictor X is < 0.05 then X
is a significant/important predictor
• At the 95% confidence threshold, if the p-value for a predictor X is > 0.05 then X
is an insignificant/unimportant predictor, i.e. it doesn't have a significant relation
with the target variable Y
Coefficients :
• A coefficient indicates by how much the output variable will change with a
one unit change in X
• For example, if the coefficient of X is 2 then Y will increase by 2 units with a one
unit increase in X
• If the coefficient of X is -2 then Y will decrease by 2 units with a one unit
increase in X
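The meaning of a coefficient can be checked with a short calculation. The sketch below plugs the rounded coefficients from the sample output (-5.14, 2.19, 0.15) into the model equation; `predict_yield` is an illustrative helper name, and the input values of 60 and 61 are arbitrary.

```python
# Rounded coefficients taken from the sample output table.
INTERCEPT = -5.14
B_TEMPERATURE = 2.19
B_HUMIDITY = 0.15

def predict_yield(temperature, humidity):
    """Predicted Yield from the fitted regression equation."""
    return INTERCEPT + B_TEMPERATURE * temperature + B_HUMIDITY * humidity

base = predict_yield(60, 60)
warmer = predict_yield(61, 60)  # same Humidity, Temperature one unit higher

print(f"Predicted Yield at Temperature 60, Humidity 60: {base:.2f}")
print(f"Change in Yield for +1 Temperature: {warmer - base:.2f}")
```

The change equals the Temperature coefficient: a one unit increase adds 2.19 units to the predicted Yield, rather than multiplying it.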
Interpretation of plots: Line Fit plot
This plot shows the relationship between each Xi (predictor) & Y (target variable), with Y
on the y axis and each Xi on the x axis
As shown in figure 1 on the right, as Temperature (X) increases, so does Yield (Y);
hence there is a linear relationship between X and Y, and linear regression is applicable to this
data
If the plot doesn't display linearity, as in figures 2 & 3 on the right, then a transformation can be
applied to that particular variable before proceeding with model building
If data transformation doesn't help, then either that variable (Xi) can be dropped from the
analysis or a non-linear model should be chosen, depending on the distribution pattern of the scatter
plot
Figures 1, 2 and 3: line fit plots
Interpretation of plots: Normal Probability plot
This plots the percentiles against the distribution of a variable (Xi or Y)
It is used to check the assumption of normality and to detect outliers in the data
It can be helpful to add a trend line to see whether the variable fits a straight line
The plot in figure 1 shows that the pattern of dots lies close to a straight line;
therefore, the variable is approximately normally distributed and there are no outliers
Examples of non-normal data are shown in figures 2 & 3 on the right, and an example of outliers is
shown in figure 4
Figures 1, 2, 3 and 4: normal probability plots
Interpretation of plots: Residual versus Fit plot
It is the scatter plot of standardized residuals on the Y axis and predicted (fitted) values on the X axis
It is used to detect unequal residual variances and outliers in the data
Here are the characteristics of a well-behaved residual vs. fits plot:
The residuals should "bounce randomly" around the 0 line and should roughly form a "horizontal
band" around the 0 line, as shown in figure 1. This suggests that the variances of the error
terms are equal
No single residual should "stand out" from the basic random pattern of residuals. This suggests
that there are no outliers
For example, the red data point in figure 1 is an outlier; such outliers should be removed from the
data before proceeding with model interpretation
The plots in figures 2 & 3 show errors concentrated in particular ranges of the predicted
target, i.e. unequally distributed errors between predicted and actual target, which is not
desirable for linear regression analysis
Figures 1, 2 and 3: residual versus fit plots
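The values behind a residual versus fit plot can be computed as follows, using the Temperature, Humidity and Yield example. This is a sketch: residuals are standardized here by dividing by the residual standard deviation, which is one common convention and an assumption about what the slides mean by "standardized". The cut-off of 2 for flagging a point is also an assumed, illustrative threshold.

```python
import numpy as np

temperature = np.array([50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75, 76, 79, 80], dtype=float)
humidity = np.array([57, 54, 54, 60, 66, 59, 61, 58, 59, 64, 56, 66, 52, 68, 52, 62], dtype=float)
yield_ = np.array([112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162, 171, 175, 182], dtype=float)

X = np.column_stack([np.ones_like(temperature), temperature, humidity])
beta, *_ = np.linalg.lstsq(X, yield_, rcond=None)

# X axis of the plot: fitted values. Y axis: standardized residuals.
fitted = X @ beta
residuals = yield_ - fitted
residual_std = np.sqrt(residuals @ residuals / (len(yield_) - X.shape[1]))
standardized = residuals / residual_std

# Candidate outliers: points far from the 0 line (assumed cut-off of 2).
CUTOFF = 2.0
flagged = np.flatnonzero(np.abs(standardized) > CUTOFF)

for f, z in zip(fitted, standardized):
    print(f"fitted={f:7.2f}  standardized residual={z:6.2f}")
print("Rows flagged as possible outliers:", flagged.tolist())
```

Plotting `fitted` against `standardized` with any charting tool gives the residual versus fit plot described above.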
Limitations
Linear regression is limited to predicting numeric output i.e. dependent variable
has to be numeric in nature
The minimum sample size should be at least 20 cases for each predictor.
When two or more predictors are highly correlated with each other, all but one of them should be
removed before running the regression model. For example, if sales data
contains price in INR and price in USD as predictors, one of
these columns should be removed from the predictors.
This method is applicable only when linearity between each Xi and Y is reflected
in the data, and this linearity can be checked through the Line fit plot, which is a
scatter plot between each Xi and Y as described in the Interpretation section.
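Highly correlated predictors such as the INR and USD prices can be found by scanning the pairwise correlations. The sketch below uses made-up sales data, a made-up fixed exchange rate of 74, and an assumed threshold of 0.9 for "highly correlated"; none of these come from the slides.

```python
import numpy as np

# Hypothetical predictor columns for illustration only.
price_inr = np.array([7400, 8150, 6900, 9200, 7750, 8800, 7100, 9500], dtype=float)
predictors = {
    "price_inr": price_inr,
    "price_usd": price_inr / 74.0,  # same information in another currency
    "discount_pct": np.array([5, 20, 10, 0, 15, 5, 25, 10], dtype=float),
}

THRESHOLD = 0.9  # assumed cut-off for "highly correlated"

names = list(predictors)
corr = np.corrcoef(np.vstack([predictors[name] for name in names]))

# Collect every pair of predictors whose absolute correlation exceeds the threshold.
flagged = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > THRESHOLD
]
print("Highly correlated pairs:", flagged)
```

For each flagged pair, one of the two columns would be dropped before fitting the model.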
Limitations
Target/independent variables should be normally distributed
Note: A normal distribution is an arrangement of a dataset in which most values are in the middle of
the range and the rest taper off symmetrically toward either extreme. It looks like a bell curve,
as shown in figure 1 on the right
Outliers in the data (target as well as independent variables) can affect the analysis, hence outliers
need to be removed
Note: Outliers are observations lying outside the overall pattern of the distribution, as shown in figure 2
on the right
These extreme values/outliers can be replaced with the 1st or 99th percentile values to improve the model
accuracy
Figure 1: normal distribution (bell curve)
Figure 2: distribution with outliers
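Replacing extreme values with the 1st or 99th percentile can be sketched with NumPy. The function name `cap_outliers` and the sample data are illustrative, not taken from the slides.

```python
import numpy as np

def cap_outliers(values, lower_pct=1, upper_pct=99):
    """Replace values below the lower percentile or above the upper percentile with those percentiles."""
    values = np.asarray(values, dtype=float)
    low, high = np.percentile(values, [lower_pct, upper_pct])
    return np.clip(values, low, high)

# Hypothetical data: the values 1 to 100 plus one extreme observation.
data = list(range(1, 101)) + [1000]
capped = cap_outliers(data)

print("Original max:", max(data))
print("Capped max:  ", capped.max())
print("Capped min:  ", capped.min())
```

Capping keeps every row in the dataset, whereas removing outliers reduces the sample size; which is preferable depends on how many observations are available.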
Business use case 1
• Business problem :
• An ecommerce company wants to measure the impact of product price, product promotions, presence of
festive season etc. on product sales
• Input data:
• Predictor/independent variables:
• Product price data
• Product promotions data such as discounts
• Flag representing presence/absence of festive season
• Dependent variable : Product sales data
• Business benefit:
• Product sales manager will get to know which among the predictors included in the analysis have significant
impact on product sales
• For the impactful predictors , important strategic decisions can be made to meet the targeted product sales
• For instance, if promotions and festive seasons turn out to be significant factors, each with positive coefficient
then these factors should be given more focus while devising a marketing strategy to improve sales as they
are directly affecting the sales in a positive way
Business use case 2
• Business problem :
• An agriculture production firm wants to predict the impact of amount of rainfall , humidity ,
temperature etc. on the yield of particular crop
• Input data:
• Predictor/independent variables :
• Amount of rainfall during monsoon months
• Humidity levels/measurements
• Temperature measurements
• Dependent variable : Crop production
• Business benefit:
• An agriculture firm can understand the impact of each of these predictors on target variable
• For instance, if temperature and rainfall have a significant positive impact but humidity levels
have a significant negative impact on crop yield, then the crop can be grown under high
temperature and rainfall levels together with low humidity levels in order to produce
the desired crop yield
Want to Learn More?
Get in touch with us @ support@Smarten.com
And do check out the Learning section on Smarten.com
December 2020

 
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
 
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
 
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
 
What is Karl Pearson Correlation Analysis and How Can it be Used for Enterpri...
What is Karl Pearson Correlation Analysis and How Can it be Used for Enterpri...What is Karl Pearson Correlation Analysis and How Can it be Used for Enterpri...
What is Karl Pearson Correlation Analysis and How Can it be Used for Enterpri...
 
What is SVM Classification Analysis and How Can It Benefit Business Analytics?
What is SVM Classification Analysis and How Can It Benefit Business Analytics?What is SVM Classification Analysis and How Can It Benefit Business Analytics?
What is SVM Classification Analysis and How Can It Benefit Business Analytics?
 
What is Outlier Analysis and How Can It Improve Analysis?
What is Outlier Analysis and How Can It Improve Analysis?What is Outlier Analysis and How Can It Improve Analysis?
What is Outlier Analysis and How Can It Improve Analysis?
 
What is the Decision Tree Analysis and How Does it Help a Business to Analyze...
What is the Decision Tree Analysis and How Does it Help a Business to Analyze...What is the Decision Tree Analysis and How Does it Help a Business to Analyze...
What is the Decision Tree Analysis and How Does it Help a Business to Analyze...
 

Kürzlich hochgeladen

How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 

Kürzlich hochgeladen (20)

How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 

What is Multiple Linear Regression and How Can it be Helpful for Business Analysis?

  • 1. Master the Art of Analytics: A Simplistic Explainer Series For Citizen Data Scientists. Journey Towards Augmented Analytics
  • 3. What is covered: Terminologies; Introduction & Example; Standard input/tuning parameters & Sample UI; Sample output UI; Interpretation of Output; Limitations; Business use cases
  • 4. Terminologies
      • Predictors and target variable:
          o The target variable, usually denoted by Y, is the variable being predicted. It is also called the dependent, output, response or outcome variable.
          o A predictor, usually denoted by X and sometimes called an independent or explanatory variable, is a variable used to predict the target variable.
      • Correlation:
          o Correlation is a statistical measure of the extent to which two variables fluctuate together.
      • Upper & lower N% confidence intervals:
          o A confidence interval is a statistical way of saying, "I am pretty sure the true value of the number I am approximating is within this range, with N% confidence."
  • 5. Introduction
      • Objective: Multiple linear regression is a statistical technique that explores the relationship between a target variable (Y) and two or more predictor variables (Xi).
      • Benefit: The regression model output helps identify the important factors (Xi) impacting the dependent variable (Y), and also the nature of the relationship between each of these factors and the dependent variable.
      • Model: The linear regression model equation takes the form Y = β0 + β1·X1 + β2·X2 + ... + βk·Xk + ε, where β0 is the intercept, βi is the coefficient of predictor Xi and ε is the error term (illustrated in the image on the right of the original slide).
  • 6. Example: Multiple linear regression
      Let's conduct the multiple linear regression analysis on the independent variables (Xi) Temperature and Humidity and the target variable (Y) Yield, as shown below.

      Input data:

      Temperature  Humidity  Yield
      50           57        112
      53           54        118
      54           54        128
      55           60        121
      56           66        125
      59           59        136
      62           61        144
      65           58        142
      67           59        149
      71           64        161
      72           56        167
      74           66        168
      75           52        162
      76           68        171
      79           52        175
      80           62        182

      Output:

      Regression statistics: R Square = 0.98

                   Coefficient  P-value  Lower 95%  Upper 95%
      Intercept    -5.14        0.68     -31.49     21.21
      Temperature   2.19        0.00       1.99      2.40
      Humidity      0.15        0.44      -0.26      0.57

      • The model is a good fit, as R square > 0.7.
      • The p-value for Temperature is < 0.05; hence Temperature is an important factor for predicting Yield.
      • The p-value for Humidity is > 0.05, which means Humidity does not impact Yield significantly.
      • With a one-unit increase in Temperature, Yield increases by about 2.19 units (with Humidity held constant).
      • The coefficient of Temperature lies between 1.99 and 2.40 with 95% confidence (5% chance of error).
      Note: The intercept is not an important statistic for checking the relation between X and Y.
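The example can be reproduced with a few lines of code. The following is a minimal sketch, not the tool's actual implementation: it fits ordinary least squares on the slide's 16 rows through the normal equations, using only the Python standard library. The helper names `solve` and `fit_ols` are illustrative assumptions. On this data the coefficients and R square should come out close to the values in the slide's output table.

```python
# Ordinary least squares on the slide's Temperature/Humidity/Yield data,
# using only the standard library. Helper names are illustrative.

temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75, 76, 79, 80]
humidity = [57, 54, 54, 60, 66, 59, 61, 58, 59, 64, 56, 66, 52, 68, 52, 62]
yield_ = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162, 171, 175, 182]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(predictors, target):
    """Return ([intercept, b1, ..., bk], r_squared) from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + [float(v) for v in xs] for xs in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(target) / len(target)
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return beta, 1.0 - ss_res / ss_tot


beta, r_squared = fit_ols([temperature, humidity], yield_)
print("intercept, temperature, humidity:", [round(b, 2) for b in beta])
print("R square:", round(r_squared, 2))
```

Solving the normal equations directly is adequate for a small, well-conditioned example like this one; a production tool would more likely rely on a numerical library.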
  • 7. Standard input/tuning parameters & Sample UI
      • Step 1: Select the predictors from the list of columns (Temperature, Humidity, Yield, Pressure range). More than one predictor can be selected.
      • Step 2: Select the target variable from the same list of columns.
      • Step 3: Set the tuning parameters: Step size = 1, Number of iterations = 100. By default these parameters should be set to the values mentioned.
      • Step 4: Display the output window containing the following:
          o Model summary
          o Line fit plot
          o Normal probability plot
          o Residual versus fit plot
      Note:
      • Categorical predictors should be auto-detected and converted to dummy/binary variables before applying regression.
      • The selection of predictors depends on business knowledge and on the correlation between the target variable and each predictor; predictors with a significant positive or negative correlation with Y should be included in the model.
      • As a rule of thumb, the number of predictors should be at most (total number of observations / 20).
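The conversion of a categorical predictor into dummy/binary variables mentioned in the note can be sketched as follows. This is an illustrative helper, not the tool's implementation: the function name `dummy_encode`, the `season` column and the choice to drop the first level (so the dummies are not perfectly collinear with the intercept) are all assumptions.

```python
def dummy_encode(values, prefix, drop_first=True):
    """Turn a categorical column into 0/1 indicator columns.

    Returns (column_names, rows). With drop_first=True the first level
    (in sorted order) becomes the baseline and gets no column of its own.
    """
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]
    names = [f"{prefix}_{level}" for level in levels]
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return names, rows


# Hypothetical categorical predictor with three levels.
season = ["monsoon", "winter", "summer", "winter"]
names, rows = dummy_encode(season, "season")
print(names)
for value, row in zip(season, rows):
    print(value, row)
```

A row of all zeros then stands for the baseline level, and each dummy coefficient is read as the difference from that baseline.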
  • 8. Sample output: 1. Model Summary
      Regression statistics: R Square = 0.98
      P-value for ANOVA test: 0.02

                   Coefficient  P-value  Lower 95%  Upper 95%
      Intercept    -5.14        0.68     -31.49     21.21
      Temperature   2.19        0.00       1.99      2.40
      Humidity      0.15        0.44      -0.26      0.57

      • ANOVA p-value: It indicates whether at least one of the coefficients is significant in the model. Further model interpretation should be made only if this p-value is < 0.05.
      • R square: It shows the goodness of fit of the model. It lies between 0 and 1; the closer the value is to 1, the better the model.
      • P-value:
          o It is used to evaluate whether the corresponding predictor X has any significant impact on the target variable Y.
          o As the p-value for Temperature here is < 0.05 (highlighted in red on the original slide), Temperature has a significant relation with Yield.
          o In contrast, the p-value for Humidity is > 0.05, which makes it insignificant for predicting Yield.
          o The value of the Temperature coefficient lies between 1.99 and 2.40 with 95% confidence.
      • Coefficient:
          o It shows the magnitude as well as the direction of the impact of each predictor (Temperature and Humidity in this case) on the target variable Y (Yield).
          o For example, in this case, a one-unit increase in Temperature is associated with a 2.19-unit increase in Yield.
      Check the Interpretation section for more details.
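The confidence limits in the model summary can be checked by hand. The sketch below is one way to do it under stated assumptions: it uses the closed-form two-predictor solution on mean-centered sums, and it hard-codes the two-sided 95% critical value of Student's t for 13 degrees of freedom (16 observations minus 3 estimated parameters) rather than computing p-values. Comparing |t| with that critical value is equivalent to checking whether the p-value is below 0.05. The names `cross` and `summary` are illustrative.

```python
import math

temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75, 76, 79, 80]
humidity = [57, 54, 54, 60, 66, 59, 61, 58, 59, 64, 56, 66, 52, 68, 52, 62]
yield_ = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162, 171, 175, 182]

T_CRIT_95_DF13 = 2.160  # two-sided 95% critical value of Student's t, 13 degrees of freedom


def cross(a, b):
    """Sum of products of deviations from the means of a and b."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))


n = len(yield_)
s11, s22, s12 = cross(temperature, temperature), cross(humidity, humidity), cross(temperature, humidity)
s1y, s2y, syy = cross(temperature, yield_), cross(humidity, yield_), cross(yield_, yield_)

# Closed-form least-squares slopes for two predictors.
det = s11 * s22 - s12 ** 2
b_temp = (s22 * s1y - s12 * s2y) / det
b_hum = (s11 * s2y - s12 * s1y) / det

# Residual variance and standard errors of the two slopes.
sse = syy - b_temp * s1y - b_hum * s2y
sigma2 = sse / (n - 3)
se_temp = math.sqrt(sigma2 * s22 / det)
se_hum = math.sqrt(sigma2 * s11 / det)

summary = {}
for name, coef, se in [("Temperature", b_temp, se_temp), ("Humidity", b_hum, se_hum)]:
    t_stat = coef / se
    summary[name] = {
        "coef": coef,
        "lower95": coef - T_CRIT_95_DF13 * se,
        "upper95": coef + T_CRIT_95_DF13 * se,
        "significant": abs(t_stat) > T_CRIT_95_DF13,
    }
    print(name, {k: (round(v, 2) if isinstance(v, float) else v) for k, v in summary[name].items()})
```

An interval that contains zero, as for Humidity, goes together with an insignificant predictor; an interval that excludes zero, as for Temperature, goes together with a significant one.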
  • 9. Sample output: 2. Plots
      • Line fit plots are used to check the assumption of linearity between each Xi and Y.
      • The normal probability plot is used to check the assumption of normality and to detect outliers.
      • The residual plot is used to check the assumption of equal error variances and to detect outliers.
      • In case of non-linearity between any Xi and Y, a transformation can be applied to Xi to make it linearly related to Y; otherwise that variable has to be dropped from the inputs to model building.
      Check the Interpretation section for more details.
  • 10. Interpretation of Important Model Summary Statistics
      • Multiple R:
          o R > 0.7 represents a strong positive correlation between X and Y
          o 0.4 <= R <= 0.7 represents a weak positive correlation between X and Y
          o -0.4 < R < 0.4 represents a negligible/no correlation between X and Y
          o -0.7 <= R <= -0.4 represents a weak negative correlation between X and Y
          o R < -0.7 represents a strong negative correlation between X and Y
      • R Square:
          o R square > 0.7 represents a very good model, i.e. the model is able to explain more than 70% of the variability in Y
          o R square between 0 and 0.7 represents a model that does not fit well; the assumptions of normality and linearity should be checked to improve the fit
      • P-value:
          o At the 95% confidence threshold, if the p-value for a predictor X is < 0.05 then X is a significant/important predictor
          o At the 95% confidence threshold, if the p-value for a predictor X is > 0.05 then X is an insignificant/unimportant predictor, i.e. it doesn't have a significant relation with the target variable Y
      • Coefficients:
          o A coefficient indicates by how much the output variable changes with a one-unit change in X
          o For example, if the coefficient of X is 2 then Y increases by 2 units with a one-unit increase in X
          o If the coefficient of X is -2 then Y decreases by 2 units with a one-unit increase in X
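The reading of a coefficient as "units of change in Y per one-unit change in X" can be made concrete with the fitted equation from the example. This small sketch plugs the rounded coefficients from the slide's output table into the model equation; the function name `predict_yield` and the input values 60 and 61 are illustrative assumptions.

```python
# Rounded coefficients taken from the example's output table.
COEFFICIENTS = {"Intercept": -5.14, "Temperature": 2.19, "Humidity": 0.15}


def predict_yield(temperature, humidity):
    """Evaluate Yield = intercept + b_temp * Temperature + b_hum * Humidity."""
    return (
        COEFFICIENTS["Intercept"]
        + COEFFICIENTS["Temperature"] * temperature
        + COEFFICIENTS["Humidity"] * humidity
    )


base = predict_yield(temperature=60, humidity=60)
warmer = predict_yield(temperature=61, humidity=60)  # one more unit of Temperature
print("predicted yield at (60, 60):", round(base, 2))
print("change for +1 Temperature:", round(warmer - base, 2))
```

The change equals the Temperature coefficient no matter which starting values are used, which is exactly what linearity means: the effect is additive (a fixed number of units), not multiplicative.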
  • 11. Interpretation of plots: Line fit plot
      • This plot shows the relationship between each Xi (predictor) and Y (target variable), with Y on the y axis and Xi on the x axis.
      • As shown in figure 1, as Temperature (X) increases, so does Yield (Y); hence there is a linear relationship between X and Y, and linear regression is applicable to this data.
      • If the plot doesn't display linearity, as in figures 2 and 3, a transformation can be applied to that variable before proceeding with model building.
      • If data transformation doesn't help, then either that variable (Xi) can be dropped from the analysis or a non-linear model should be chosen, depending on the pattern of the scatter plot.
      (Figures 1, 2 and 3 appear on the right of the original slide.)
  • 12. Interpretation of plots: Normal probability plot
      • This plots the percentile against the distribution of a variable (Xi or Y).
      • It is used to check the assumption of normality and to detect outliers in the data.
      • It can be helpful to add a trend line to see whether the variable fits a straight line.
      • The plot in figure 1 shows that the dots lie close to a straight line; therefore, the variable is normally distributed and there are no outliers.
      • Examples of non-normal data are shown in figures 2 and 3, and an example of outliers is shown in figure 4.
      (Figures 1 to 4 appear on the right of the original slide.)
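Without a chart, the "close to a straight line" check can be approximated numerically. The sketch below pairs the sorted values with theoretical standard-normal quantiles and reports their correlation; values near 1 suggest the dots would lie near a straight line. The plotting position (i + 0.5) / n is one common convention and an assumption here, as are the function names and the added value 400 used to imitate an outlier.

```python
import math
from statistics import NormalDist


def normal_probability_points(values):
    """Pair each sorted value with the standard-normal quantile of its plotting position."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(quantiles, ordered))


def straightness(points):
    """Pearson correlation between theoretical quantiles and observed values."""
    xs, ys = zip(*points)
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


yield_ = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162, 171, 175, 182]
r_clean = straightness(normal_probability_points(yield_))
r_outlier = straightness(normal_probability_points(yield_ + [400]))  # hypothetical extreme value
print("Yield:", round(r_clean, 3))
print("Yield plus one extreme value:", round(r_outlier, 3))
```

A single extreme value pulls the last point far off the line and lowers the correlation, which is the numeric counterpart of the outlier pattern described for figure 4.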
  • 13. Interpretation of plots: Residual versus fit plot
      • It is the scatter plot of standardized residuals on the y axis against predicted (fitted) values on the x axis.
      • It is used to detect unequal residual variances and outliers in the data.
      • Characteristics of a well-behaved residual versus fit plot:
          o The residuals should "bounce randomly" around the 0 line and should roughly form a "horizontal band" around it, as shown in figure 1. This suggests that the variances of the error terms are equal.
          o No single residual should "stand out" from the basic random pattern of residuals. This suggests that there are no outliers.
      • For example, the red data point in figure 1 is an outlier; such outliers should be removed from the data before proceeding with model interpretation.
      • The plots in figures 2 and 3 show errors concentrated in a particular range of the predicted target, i.e. the error between predicted and actual target is unequally distributed, which is not desirable for linear regression analysis.
      (Figures 1, 2 and 3 appear on the original slide.)
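The quantities behind a residual versus fit plot can be computed directly. The following sketch uses the rounded coefficients from the example's output table and a simple standardization (each residual divided by the residual standard error, with no leverage adjustment). The cut-off of 3 for flagging a point is an assumed convention; the slides do not state one.

```python
import math

temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75, 76, 79, 80]
humidity = [57, 54, 54, 60, 66, 59, 61, 58, 59, 64, 56, 66, 52, 68, 52, 62]
yield_ = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162, 171, 175, 182]

# Rounded coefficients as printed in the example's output table.
INTERCEPT, B_TEMPERATURE, B_HUMIDITY = -5.14, 2.19, 0.15
OUTLIER_THRESHOLD = 3.0  # assumed cut-off, not taken from the slides

fitted = [INTERCEPT + B_TEMPERATURE * t + B_HUMIDITY * h for t, h in zip(temperature, humidity)]
residuals = [y - f for y, f in zip(yield_, fitted)]

# Residual standard error: n observations minus 3 estimated parameters.
dof = len(residuals) - 3
residual_se = math.sqrt(sum(r ** 2 for r in residuals) / dof)
standardized = [r / residual_se for r in residuals]

# Points to plot: (fitted value, standardized residual).
points = list(zip(fitted, standardized))
flagged = [(round(f, 1), round(z, 2)) for f, z in points if abs(z) > OUTLIER_THRESHOLD]

print("largest |standardized residual|:", round(max(abs(z) for z in standardized), 2))
print("flagged points:", flagged)
```

On the example data no point exceeds the assumed cut-off, so nothing is flagged; a residual far outside the band would show up in `flagged`.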
  • 14. Limitations
      • Linear regression is limited to predicting numeric output, i.e. the dependent variable has to be numeric in nature.
      • The sample size should be at least 20 cases for each predictor.
      • When two or more predictors are highly correlated with each other, only one of them should be kept in the regression model. For example, if sales data contains price in INR and price in USD as predictors, one of these columns should be removed from the predictors.
      • This method is applicable only when the data shows linearity between each Xi and Y. This linearity can be checked through the line fit plot, which is a scatter plot between each Xi and Y, as described in the Interpretation section.
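Two of these limitations lend themselves to a quick automated check. The sketch below is illustrative only: the function names, the correlation threshold of 0.9, the fixed exchange rate of 83 and the sample price and discount values are all assumptions. It applies the 20-cases-per-predictor rule and lists pairs of predictors that are highly correlated with each other.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def max_predictors(n_observations, cases_per_predictor=20):
    """Largest number of predictors allowed by the cases-per-predictor rule of thumb."""
    return n_observations // cases_per_predictor


def collinear_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for predictor pairs with |r| at or above the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


price_usd = [10.0, 12.5, 9.0, 15.0, 11.0, 14.0]  # hypothetical prices
columns = {
    "price_usd": price_usd,
    "price_inr": [p * 83.0 for p in price_usd],  # hypothetical fixed exchange rate
    "discount_pct": [5, 0, 10, 0, 5, 20],
}

print("max predictors for 100 rows:", max_predictors(100))
print("highly correlated pairs:", collinear_pairs(columns))
```

Note that the rule gives `max_predictors(16) == 0`, so the 16-row Temperature/Humidity example is best read as an illustration rather than as a model that meets the sample-size guideline.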
  • 15. Limitations
      • Target/independent variables should be normally distributed.
          o Note: A normal distribution is an arrangement of a dataset in which most values are in the middle of the range and the rest taper off symmetrically toward either extreme. It looks like a bell curve, as shown in figure 1.
      • Outliers in the data (in the target as well as the independent variables) can affect the analysis; hence outliers need to be removed.
          o Note: Outliers are observations lying outside the overall pattern of the distribution, as shown in figure 2.
          o These extreme values can be replaced with the 1st or 99th percentile values to improve model accuracy.
      (Figures 1 and 2 appear on the right of the original slide.)
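Replacing extreme values with the 1st or 99th percentile (often called winsorizing) can be sketched as below. The percentile here uses linear interpolation between neighbouring sorted values, which is one of several common definitions and an assumption of this example; the function names and the sample data are illustrative.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def winsorize(values, low_pct=1.0, high_pct=99.0):
    """Cap values below the low percentile and above the high percentile."""
    low = percentile(values, low_pct)
    high = percentile(values, high_pct)
    return [min(max(v, low), high) for v in values]


# Hypothetical column: the values 1..99 plus one extreme observation.
data = list(range(1, 100)) + [1000]
capped = winsorize(data)
print("max before:", max(data), "max after:", round(max(capped), 2))
print("min before:", min(data), "min after:", round(min(capped), 2))
```

Capping keeps every row in the dataset, unlike deleting outliers, while limiting how strongly a single extreme value can pull the fitted line.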
  • 16. Business use case 1
      • Business problem:
          o An ecommerce company wants to measure the impact of product price, product promotions, presence of a festive season etc. on product sales.
      • Input data:
          o Predictor/independent variables: product price data; product promotions data such as discounts; a flag representing the presence/absence of a festive season
          o Dependent variable: product sales data
      • Business benefit:
          o The product sales manager will get to know which of the predictors included in the analysis have a significant impact on product sales.
          o For the impactful predictors, important strategic decisions can be made to meet the targeted product sales.
          o For instance, if promotions and festive seasons turn out to be significant factors, each with a positive coefficient, then these factors should be given more focus while devising a marketing strategy to improve sales, as they directly affect sales in a positive way.
  • 17. Business use case 2
      • Business problem:
          o An agriculture production firm wants to predict the impact of the amount of rainfall, humidity, temperature etc. on the yield of a particular crop.
      • Input data:
          o Predictor/independent variables: amount of rainfall during the monsoon months; humidity levels/measurements; temperature measurements
          o Dependent variable: crop production
      • Business benefit:
          o The agriculture firm can understand the impact of each of these predictors on the target variable.
          o For instance, if temperature and rainfall have a significant positive impact but humidity levels have a significant negative impact on crop yield, then the crop can be grown under high temperature and rainfall levels in conjunction with low humidity levels in order to produce the desired crop yield.
  • 18. Want to learn more? Get in touch with us at support@Smarten.com, and do check out the Learning section on Smarten.com. December 2020