Intermediate Regression Topics
Daniel Gerlanc, Director
Enplus Advisors Inc
Topics


Abalone Data

Variable Transformation

Simulation for Predictive Inference
http://archive.ics.uci.edu/ml/datasets/Abalone




                   Abalone
Loading the data
>   abalone.path = "~/data/abalone.csv"
>   abalone.cols = c("sex", "length", "diameter", "height", "whole.wt",
+                    "shucked.wt", "viscera.wt", "shell.wt", "rings")
>
>   abalone <- read.csv(abalone.path, sep=",", row.names=NULL,
+                       col.names=abalone.cols)
>   str(abalone)

'data.frame':   4177 obs. of 9 variables:
 $ sex       : chr "M" "M" "F" "M" ...
 $ length    : num 0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...
 $ diameter : num 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
 $ height    : num 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
 $ whole.wt : num 0.514 0.226 0.677 0.516 0.205 ...
 $ shucked.wt: num 0.2245 0.0995 0.2565 0.2155 0.0895 ...
 $ viscera.wt: num 0.101 0.0485 0.1415 0.114 0.0395 ...
 $ shell.wt : num 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
 $ rings     : int 15 7 9 10 7 8 20 16 9 19 ...
Uses lattice graphics




             Draw pictures
Lattice Plots
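> library(lattice)
> # NOTE: `volume` is not a column of the raw data; it is assumed to have
> # been derived in a step not shown on the slides.
> xyplot(jitter(rings) ~ shell.wt | sex, abalone, grid=TRUE, pch=".",
+        subset=volume < 0.2,
+        panel=function(x, y, ...) {
+           panel.lmline(x, y, ...)   # least-squares line per panel
+           panel.xyplot(x, y, ...)   # the points themselves
+        },
+        ylab="rings")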


ggplot2 is a newer package that can be used to create similar plots.
Combine groups
[Figure: lattice plot with two panels, Infant and Adult]
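The models that follow treat sex as a 0/1 number (sex == 0 for infants, sex == 1 for adults), while the loaded data stores it as "M", "F" and "I". The recoding step is not shown on the slides. A minimal sketch, assuming males and females are combined into a single adult group and using a hypothetical toy data frame in place of the real file:

```r
# Hypothetical mini data set with the same sex codes as the abalone file
toy <- data.frame(sex   = c("M", "F", "I", "M", "I"),
                  rings = c(15L, 9L, 7L, 10L, 8L),
                  stringsAsFactors = FALSE)

# Assumed recoding: 0 = infant ("I"), 1 = adult ("M" or "F")
toy$sex <- as.integer(toy$sex != "I")

# Count of infants (0) and adults (1)
table(toy$sex)
```

Applied to the real data, the same one-line recoding would turn the three-level sex column into the binary predictor the later models appear to use.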
Why Transform?


Interpretability

Additive vs. Multiplicative Form

Prediction
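To make the additive vs. multiplicative point concrete: a linear model fitted to log(y) is additive on the log scale, so on the original scale a one-unit change in a predictor multiplies the predicted outcome by exp(coefficient). A minimal sketch on synthetic data (the data, variable names and true values are assumptions for illustration, not taken from the abalone set):

```r
set.seed(123)
n <- 500
x <- runif(n, 0, 2)

# Multiplicative data-generating process: the errors multiply rather than add
y <- exp(1 + 0.5 * x) * exp(rnorm(n, sd = 0.1))

# Taking logs turns the multiplicative model into an additive one
fit.log <- lm(log(y) ~ x)

slope <- coef(fit.log)[["x"]]  # additive effect on log(y)
ratio <- exp(slope)            # multiplicative effect on y per unit of x
```

Here slope should land near the true value 0.5 and ratio near exp(0.5), i.e. roughly a 65% increase in y per unit of x.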
Simple Model
> fit.1 <- lm(rings ~ sex + shell.wt, abalone)

> summary(fit.1)

Call:
lm(formula = rings ~ sex + shell.wt, data = abalone)

Residuals:
   Min     1Q Median      3Q    Max
-5.750 -1.592 -0.535   0.886 15.736

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    6.2423    0.0799   78.08   <2e-16 ***
sex            0.9142    0.0984    9.29   <2e-16 ***
shell.wt      12.8581    0.3300   38.96   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.5 on 4174 degrees of freedom
Centering with z-scores


 Subtract the mean from each input and
 divide by 1 or 2 standard deviations

 Dummy/Proxy variables may be centered as
 well
Center Values
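> # Assumed definitions (not shown on the slide), matching the model below
> outcome    <- "rings"
> predictors <- c("sex", "shell.wt")
>
> abalone.adj <- abalone[, c(outcome, predictors)]
> for (i in predictors) {
+   abalone.adj[[i]] <-
+     (abalone.adj[[i]] - mean(abalone.adj[[i]])) / (2 * sd(abalone.adj[[i]]))
+ }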

Also look into the ‘scale’ function
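As a sketch of what using scale() could look like: it centers on the column means by default and accepts a per-column divisor, so passing 2 * sd reproduces the loop-style transformation. The toy data frame below is a hypothetical stand-in for the abalone predictors.

```r
# Hypothetical toy data standing in for the abalone predictors
set.seed(42)
toy <- data.frame(shell.wt = runif(50), sex = rbinom(50, 1, 0.5))

# Loop-style centering, dividing by 2 standard deviations
by.loop <- toy
for (i in names(by.loop)) {
  by.loop[[i]] <- (by.loop[[i]] - mean(by.loop[[i]])) / (2 * sd(by.loop[[i]]))
}

# Same transformation with scale(): center on the column means and
# supply 2 * sd as the per-column scaling factor
sds      <- sapply(toy, sd)
by.scale <- as.data.frame(scale(toy, center = TRUE, scale = 2 * sds))

# The two approaches agree up to floating-point error
all.equal(by.loop$shell.wt, by.scale$shell.wt)
```

Note that scale() returns a matrix, hence the as.data.frame() call before the result is used in a model formula.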
Why center?


Interpret coefficients in terms of standard
deviations

Gives a sense of variable importance
Interpretability
> fit.1a <- lm(rings ~ sex + shell.wt, abalone.adj)

> summary(fit.1a)

Call:
lm(formula = rings ~ sex + shell.wt, data = abalone.adj)

Residuals:
   Min     1Q Median      3Q    Max
-5.750 -1.592 -0.535   0.886 15.736

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.9337     0.0385 258.33    <2e-16 ***
sex           0.8539     0.0919    9.29   <2e-16 ***
shell.wt      3.5798     0.0919   38.96   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.5 on 4174 degrees of freedom
Multiple R-squared: 0.406,     Adjusted R-squared: 0.406
F-statistic: 1.43e+03 on 2 and 4174 DF, p-value: <2e-16
Two Models
    lm(formula = rings ~ sex + shell.wt, data = abalone)
                coef.est coef.se
    (Intercept) 6.24      0.08
    sex          0.91     0.10
    shell.wt    12.86     0.33
    ---
    n = 4177, k = 3
    residual sd = 2.49, R-Squared = 0.41



    lm(formula = rings ~ sex + shell.wt, data = abalone.adj)
                coef.est coef.se
    (Intercept) 9.93     0.04
    sex         0.85     0.09
    shell.wt    3.58     0.09
    ---
    n = 4177, k = 3
    residual sd = 2.49, R-Squared = 0.41



Smaller difference in SD terms
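The two fits are the same model in different units: residuals and R-squared are unchanged, each rescaled slope equals the raw slope times 2 * sd of its predictor, and the intercept becomes the mean of the outcome because every predictor now has mean zero. For shell.wt, the ratio 3.58 / 12.86 implies 2 * sd(shell.wt) of about 0.28. A short check of these identities on synthetic data (the data and names are illustrative assumptions):

```r
set.seed(7)
toy   <- data.frame(x = runif(100))
toy$y <- 2 + 3 * toy$x + rnorm(100)

# Center and divide by 2 standard deviations
toy$x.adj <- (toy$x - mean(toy$x)) / (2 * sd(toy$x))

fit.raw <- lm(y ~ x, toy)
fit.adj <- lm(y ~ x.adj, toy)

b.raw <- coef(fit.raw)[["x"]]
b.adj <- coef(fit.adj)[["x.adj"]]

# Rescaled slope = raw slope * 2 * sd(x)
all.equal(b.adj, b.raw * 2 * sd(toy$x))

# With a centered predictor, the intercept is the mean of the outcome
all.equal(coef(fit.adj)[["(Intercept)"]], mean(toy$y))
```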
Why divide by 2 SDs
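So that coefficients of binary variables may be interpreted
similarly to those of continuous variables

e.g., a binary variable taking the values 0 and 1 with equal
frequency has a mean of 0.5 and an sd of 0.5:
sqrt(0.5 * (1 - 0.5)) = 0.5

Divide by 2 SDs                Divide by 1 SD
(1 - 0.5) / (2 * 0.5) = +0.5   (1 - 0.5) / (1 * 0.5) = +1

(0 - 0.5) / (2 * 0.5) = -0.5   (0 - 0.5) / (1 * 0.5) = -1

-0.5 --> +0.5                  -1 --> +1
Diff of 1                      Diff of 2

Dividing by 2 SDs keeps the scaled binary variable spanning one unit,
as in the original 0/1 coding.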
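The arithmetic for a balanced binary variable can be checked directly. This sketch uses the population formula sqrt(p * (1 - p)) for the standard deviation, as on the slide; R's sd() uses the n - 1 denominator and would differ slightly on a finite sample.

```r
p         <- 0.5                  # share of ones
sd.binary <- sqrt(p * (1 - p))    # population sd of a 0/1 variable
x         <- c(0, 1)

by.2sd <- (x - p) / (2 * sd.binary)  # centered, divided by 2 SDs
by.1sd <- (x - p) / sd.binary        # centered, divided by 1 SD

diff(by.2sd)  # gap between the two coded values after 2-SD scaling
diff(by.1sd)  # gap after 1-SD scaling
```

With 2-SD scaling the gap between the two groups stays at one unit, so the coefficient of the scaled binary variable remains the difference between the groups.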
Prediction
Simulation
Allow for more general inferences

Propagation of uncertainty
Prediction Errors
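90th-Percentile Adult vs. Median (50th-Percentile) Infant
    library(arm)  # sigma.hat() is assumed to come from the arm package

    fit.4   <- lm(log(rings) ~ sex + log(shell.wt), abalone)

    # log shell weights: 90th-percentile adult (sex == 1), median infant (sex == 0)
    large.abalone <- log(quantile(subset(abalone, sex == 1)$shell.wt, 0.90))
    small.infant <- log(median(abalone$shell.wt[abalone$sex == 0]))
    # linear predictors on the log(rings) scale
    x.a <- sum(c(1, 1, large.abalone) * coef(fit.4))
    x.i <- sum(c(1, 0, small.infant) * coef(fit.4))

    set.seed(1)
    n.sims <- 1000
    # simulate on the log scale, then back-transform to rings
    pred.a <- exp(rnorm(n.sims, x.a, sigma.hat(fit.4)))
    pred.i <- exp(rnorm(n.sims, x.i, sigma.hat(fit.4)))
    pred.diff <- pred.a - pred.i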

    > mean(pred.diff)
    4.5

    > quantile(pred.diff, c(0.025, 0.975))

    2.5% 98%
    -1.9 11.3
Simulation for Inferential Uncertainty

 Simulate residual standard deviation

 Simulate coefficients
Inferential Uncertainty
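The simulation code in this deck draws the residual standard deviation first and then the coefficients conditional on that draw. Written out, the two steps it implements are as follows. The notation is assumed here rather than taken from the slides: \hat{\sigma} is the estimated residual standard deviation, \hat{\beta} the estimated coefficients, V_\beta the unscaled covariance matrix, X the design matrix, n the number of observations, k the number of coefficients, and s indexes the simulation draw.

```latex
\begin{align*}
Q^{(s)} &\sim \chi^2_{n-k}, &
\sigma^{(s)} &= \hat{\sigma}\,\sqrt{\frac{n-k}{Q^{(s)}}}, \\[4pt]
\beta^{(s)} \mid \sigma^{(s)} &\sim
  \mathrm{N}\!\left(\hat{\beta},\ \bigl(\sigma^{(s)}\bigr)^{2}\, V_\beta\right), &
V_\beta &= (X^\top X)^{-1}.
\end{align*}
```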
## Create 1000 simulations of the residual standard error and coefficients

fit.5 <- lm(log(rings) ~ sex + shell.wt + sex:shell.wt, abalone)

n.sims      <-   1000
obj         <-   summary(fit.5) # save off the summary object
sigma.hat   <-   obj$sigma # estimated residual standard deviation
b.hat       <-   obj$coefficients[, 'Estimate', drop=TRUE]
cov.beta    <-   obj$cov.unscaled # unscaled covariance matrix, (X'X)^-1
k           <-   obj$df[1] # number of coefficients, including the intercept
n           <-   obj$df[1] + obj$df[2] # number of observations

set.seed(1)
# draw sigma: sigma^2 follows a scaled inverse chi-square distribution
sigma.sim <- sigma.hat * sqrt((n-k) / rchisq(n.sims, n-k))

# draw one coefficient vector per simulated sigma
beta.sim <- matrix(NA_real_, n.sims, k, dimnames=list(NULL, names(b.hat)))
for (i in seq_len(n.sims)) {
  beta.sim[i, ] <- MASS::mvrnorm(1, b.hat, sigma.sim[i]^2 * cov.beta)
}
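Each column of beta.sim holds the simulated draws for one coefficient, so inferential summaries such as interval estimates are column quantiles. A self-contained sketch of that last step on synthetic data, following the same simulation recipe (the data and the names fit, V.beta and sim.ci are illustrative assumptions, not from the slides):

```r
# Hypothetical synthetic data; the true slope is 0.5
set.seed(1)
n.obs <- 200
x <- runif(n.obs)
y <- 1 + 0.5 * x + rnorm(n.obs, sd = 0.2)
fit <- lm(y ~ x)

obj    <- summary(fit)
b.hat  <- coef(fit)
V.beta <- obj$cov.unscaled          # (X'X)^-1
k      <- obj$df[1]                 # number of coefficients
n      <- obj$df[1] + obj$df[2]     # number of observations
n.sims <- 1000

# Step 1: simulate the residual standard deviation
sigma.sim <- obj$sigma * sqrt((n - k) / rchisq(n.sims, n - k))

# Step 2: simulate one coefficient vector per simulated sigma
beta.sim <- t(sapply(sigma.sim, function(s) {
  MASS::mvrnorm(1, b.hat, s^2 * V.beta)
}))

# 95% simulation interval for each coefficient (rows: lower, upper)
sim.ci <- apply(beta.sim, 2, quantile, probs = c(0.025, 0.975))
sim.ci
```

For a plain linear model these intervals should sit close to those from confint(fit). The simulation approach becomes more useful for derived quantities, such as differences or ratios of predictions, where no closed-form interval is readily available.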
Inferential Uncertainty

Boston Predictive Analytics: Linear and Logistic Regression Using R - Intermediate Topics
