Intermediate Regression Topics
Daniel Gerlanc, Director
Enplus Advisors Inc
Topics


Abalone Data

Variable Transformation

Simulation for Predictive Inference
http://archive.ics.uci.edu/ml/datasets/Abalone




                   Abalone
Loading the data
>   abalone.path = "~/data/abalone.csv"
>   abalone.cols = c("sex", "length", "diameter", "height", "whole.wt",
+                    "shucked.wt", "viscera.wt", "shell.wt", "rings")
>
>   abalone <- read.csv(abalone.path, sep=",", row.names=NULL,
+                       col.names=abalone.cols)
>   str(abalone)

'data.frame':   4177 obs. of 9 variables:
 $ sex       : chr "M" "M" "F" "M" ...
 $ length    : num 0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...
 $ diameter : num 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
 $ height    : num 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
 $ whole.wt : num 0.514 0.226 0.677 0.516 0.205 ...
 $ shucked.wt: num 0.2245 0.0995 0.2565 0.2155 0.0895 ...
 $ viscera.wt: num 0.101 0.0485 0.1415 0.114 0.0395 ...
 $ shell.wt : num 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
 $ rings     : int 15 7 9 10 7 8 20 16 9 19 ...
Uses lattice graphics




             Draw pictures
Lattice Plots
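> library(lattice)
> # 'volume' is a derived column whose definition is not shown on the slides
> xyplot(jitter(rings) ~ shell.wt | sex, abalone, grid=TRUE, pch=".",
+        subset=volume < 0.2,
+        panel=function(x, y, ...) {
+           panel.lmline(x, y, ...)   # least-squares line for each panel
+           panel.xyplot(x, y, ...)   # the points themselves
+        },
+        ylab="rings")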


ggplot2 is a newer package that can be used to create similar plots.
Combine groups: Infant vs. Adult
(the models that follow treat sex as a binary variable: 0 = Infant, 1 = Adult)
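
The recoding itself is not shown on the slides. A minimal sketch, assuming the raw column holds the UCI codes "M", "F", and "I" and that adults are coded 1 (consistent with the `sex == 1` and `sex == 0` subsets used later in the deck); the toy data frame is hypothetical:

```r
# Toy stand-in for the raw data (hypothetical values)
toy <- data.frame(sex = c("M", "F", "I", "M", "I"),
                  rings = c(15L, 9L, 7L, 10L, 8L),
                  stringsAsFactors = FALSE)

# Combine male and female into a single adult group:
# 1 = adult ("M" or "F"), 0 = infant ("I")
toy$sex <- as.integer(toy$sex != "I")

table(toy$sex)
```

With a single 0/1 column, `lm` reports one `sex` coefficient, which is the adult-minus-infant difference at a given shell weight.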
Why Transform?


Interpretability

Additive vs. Multiplicative Form

Prediction
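
As a reminder of what the additive vs. multiplicative distinction means (a standard identity, not taken from the slides): a linear model fitted to the log of the outcome is additive in log(y) and multiplicative in y.

```latex
\log y_i = \beta_0 + \beta_1 x_i + \epsilon_i
\quad\Longleftrightarrow\quad
y_i = e^{\beta_0}\, e^{\beta_1 x_i}\, e^{\epsilon_i}
```

Under this form, a one-unit increase in x multiplies the predicted y by e^{\beta_1} rather than adding a fixed amount.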
Simple Model
> fit.1 <- lm(rings ~ sex + shell.wt, abalone)

> summary(fit.1)

Call:
lm(formula = rings ~ sex + shell.wt, data = abalone)

Residuals:
   Min     1Q Median      3Q    Max
-5.750 -1.592 -0.535   0.886 15.736

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    6.2423    0.0799   78.08   <2e-16 ***
sex            0.9142    0.0984    9.29   <2e-16 ***
shell.wt      12.8581    0.3300   38.96   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.5 on 4174 degrees of freedom
Centering with z-scores


 Subtract the mean from each input and
 divide by 1 or 2 standard deviations

 Dummy/Proxy variables may be centered as
 well
Center Values
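> outcome    <- "rings"                # assumed: not defined on the slide
> predictors <- c("sex", "shell.wt")   # assumed: the inputs used by the fits
> abalone.adj <- abalone[, c(outcome, predictors)]
> for (i in predictors) {
+   x <- abalone.adj[[i]]
+   abalone.adj[[i]] <- (x - mean(x)) / (2 * sd(x))   # center, then divide by 2 SDs
+ }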

Also look into the ‘scale’ function
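
A short sketch of how `scale` can do the same job, assuming a hypothetical numeric input `x`. The base R function takes explicit `center` and `scale` values, so dividing by two standard deviations is a matter of passing `2 * sd(x)`:

```r
set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)   # hypothetical input

# Manual version: subtract the mean, divide by 2 SDs
manual <- (x - mean(x)) / (2 * sd(x))

# scale() accepts explicit center and scale values; it returns a
# one-column matrix, so as.numeric() drops the matrix attributes
scaled <- as.numeric(scale(x, center = mean(x), scale = 2 * sd(x)))

all.equal(manual, scaled)
```

After this transformation the input has mean 0 and standard deviation 0.5.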
Why center?


Interpret coefficients in terms of standard
deviations

Gives a sense of variable importance
Interpretability
> fit.1a <- lm(rings ~ sex + shell.wt, abalone.adj)

> summary(fit.1a)

Call:
lm(formula = rings ~ sex + shell.wt, data = abalone.adj)

Residuals:
   Min     1Q Median      3Q    Max
-5.750 -1.592 -0.535   0.886 15.736

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.9337     0.0385 258.33    <2e-16 ***
sex           0.8539     0.0919    9.29   <2e-16 ***
shell.wt      3.5798     0.0919   38.96   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.5 on 4174 degrees of freedom
Multiple R-squared: 0.406, Adjusted R-squared: 0.406
F-statistic: 1.43e+03 on 2 and 4174 DF, p-value: <2e-16
Two Models
    lm(formula = rings ~ sex + shell.wt, data = abalone)
                coef.est coef.se
    (Intercept) 6.24      0.08
    sex          0.91     0.10
    shell.wt    12.86     0.33
    ---
    n = 4177, k = 3
    residual sd = 2.49, R-Squared = 0.41



    lm(formula = rings ~ sex + shell.wt, data = abalone.adj)
                coef.est coef.se
    (Intercept) 9.93     0.04
    sex         0.85     0.09
    shell.wt    3.58     0.09
    ---
    n = 4177, k = 3
    residual sd = 2.49, R-Squared = 0.41



Smaller difference in SD terms
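
The two fits are the same model expressed in different units. A sketch on simulated data (all variable names and values here are hypothetical) showing how the coefficients relate:

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 1)                  # hypothetical continuous input
y <- 6 + 13 * x + rnorm(n, sd = 2.5) # hypothetical outcome

x.adj <- (x - mean(x)) / (2 * sd(x)) # center and divide by 2 SDs

fit.raw <- lm(y ~ x)
fit.adj <- lm(y ~ x.adj)

# The slope is multiplied by 2 * sd(x) ...
c(raw.times.2sd = unname(coef(fit.raw)["x"]) * 2 * sd(x),
  adj           = unname(coef(fit.adj)["x.adj"]))

# ... and the intercept becomes the mean of the outcome
c(intercept.adj = unname(coef(fit.adj)[1]), mean.y = mean(y))
```

In the slides, 3.58 / 12.86 is about 0.28, which is therefore twice the standard deviation of shell.wt. The residuals, t values, and R-squared are unchanged by the rescaling.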
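Why divide by 2 SDs?
So that coefficients of binary variables may be interpreted similarly to
those of continuous variables.

e.g., a binary variable taking the values 0 and 1 with equal frequency
has an SD of 0.5:
sqrt(0.5 * (1 - 0.5)) = 0.5

Divide by 2 SDs                    Divide by 1 SD
(1 - 0.5) / (2 * 0.5) = +0.5       (1 - 0.5) / (1 * 0.5) = +1
(0 - 0.5) / (2 * 0.5) = -0.5       (0 - 0.5) / (1 * 0.5) = -1
-0.5 --> +0.5: diff of 1           -1 --> +1: diff of 2

Dividing by 2 SDs keeps the gap between the two groups at 1, so the
coefficient still compares one group with the other.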
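
The arithmetic can be checked directly. A minimal sketch, assuming a hypothetical balanced 0/1 variable and using the population SD formula from the slide (R's `sd` divides by n - 1 and gives a slightly larger value):

```r
x <- rep(c(0, 1), times = 50)        # 0 and 1 with equal frequency

p <- mean(x)
sd.pop <- sqrt(p * (1 - p))          # population SD of a binary variable

by.2sd <- (x - mean(x)) / (2 * sd.pop)
by.1sd <- (x - mean(x)) / sd.pop

diff(range(by.2sd))   # gap between the two groups after dividing by 2 SDs
diff(range(by.1sd))   # gap after dividing by 1 SD
```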
Prediction
Simulation
Allow for more general inferences

Propagation of uncertainty
Prediction Errors
90th-Percentile Adult vs. Median (50th-Percentile) Infant
    library(arm)   # provides sigma.hat()
    fit.4   <- lm(log(rings) ~ sex + log(shell.wt), abalone)

    large.abalone <- log(quantile(subset(abalone, sex == 1)$shell.wt, 0.90))
    small.infant <- log(median(abalone$shell.wt[abalone$sex == 0]))
    x.a <- sum(c(1, 1, large.abalone) * coef(fit.4))
    x.i <- sum(c(1, 0, small.infant) * coef(fit.4))

    set.seed(1)
    n.sims <- 1000
    pred.a <- exp(rnorm(n.sims, x.a, sigma.hat(fit.4)))
    pred.i <- exp(rnorm(n.sims, x.i, sigma.hat(fit.4)))
    pred.diff <- pred.a - pred.i

    > mean(pred.diff)
    4.5

    > quantile(pred.diff, c(0.025, 0.975))

    2.5% 98%
    -1.9 11.3
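
The same idea can be tried without the abalone file or the arm package. The sketch below is an illustration on simulated data, not the author's analysis: the data frame `toy`, its columns, and all numeric values are hypothetical, and `summary(fit)$sigma` stands in for `sigma.hat`.

```r
set.seed(1)
n <- 500
adult <- rbinom(n, 1, 0.7)                         # hypothetical 0/1 group
wt    <- exp(rnorm(n, mean = -1.6 + 0.6 * adult, sd = 0.5))
rings <- exp(2.5 + 0.1 * adult + 0.3 * log(wt) + rnorm(n, sd = 0.25))
toy   <- data.frame(rings, adult, wt)

fit <- lm(log(rings) ~ adult + log(wt), toy)

# Two hypothetical animals: a heavy adult and a median-weight infant
new <- data.frame(
  adult = c(1, 0),
  wt    = c(unname(quantile(toy$wt[toy$adult == 1], 0.90)),
            median(toy$wt[toy$adult == 0])))
mu <- predict(fit, newdata = new)        # point predictions on the log scale
s  <- summary(fit)$sigma                 # residual SD on the log scale

n.sims <- 1000
pred.a <- exp(rnorm(n.sims, mu[1], s))   # simulated ring counts, adult
pred.i <- exp(rnorm(n.sims, mu[2], s))   # simulated ring counts, infant
pred.diff <- pred.a - pred.i

mean(pred.diff)
quantile(pred.diff, c(0.025, 0.975))
```

This propagates only the residual (prediction) error; the coefficients are held at their estimates.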
Simulation for Inferential Uncertainty
 Simulate the residual standard deviation
 Simulate the coefficients, given the simulated residual standard deviation
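
Written out, the two steps correspond to the following draws, in the notation of the code on the next slide. This is the usual result for the classical linear model; the symbol V_\beta is introduced here for the unscaled covariance matrix (X^T X)^{-1}, which R returns as `cov.unscaled`.

```latex
\sigma^{\mathrm{sim}} = \hat{\sigma}\,\sqrt{\frac{n-k}{X}}, \qquad X \sim \chi^2_{n-k}

\beta^{\mathrm{sim}} \mid \sigma^{\mathrm{sim}} \sim
\mathrm{N}\!\left(\hat{\beta},\; (\sigma^{\mathrm{sim}})^2\, V_\beta\right)
```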
Inferential Uncertainty
## Create 1000 simulations of the residual standard error and coefficients

fit.5 <- lm(log(rings) ~ sex + shell.wt + sex:shell.wt, abalone)

n.sims      <-   1000
obj         <-   summary(fit.5) # save off the summary object
sigma.hat   <-   obj$sigma      # estimated residual standard deviation
b.hat       <-   coef(obj)[, 'Estimate', drop=TRUE]
cov.beta    <-   obj$cov.unscaled # unscaled covariance matrix, (X'X)^-1
k           <-   obj$df[1] # number of estimated coefficients
n           <-   obj$df[1] + obj$df[2] # number of observations

set.seed(1)
sigma.sim <- sigma.hat * sqrt((n-k) / rchisq(n.sims, n-k))

beta.sim <- matrix(NA_real_, n.sims, k, dimnames=list(NULL, names(b.hat)))
for (i in seq_len(n.sims)) {
  beta.sim[i, ] <- MASS::mvrnorm(1, b.hat, sigma.sim[i]^2 * cov.beta)
}
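
Once the simulations exist, they can be summarised with quantiles. The sketch below repeats the procedure on simulated data (the model, names, and values are hypothetical) and puts the simulation-based 95% intervals next to the classical ones from `confint`; the two should be close.

```r
set.seed(1)
n <- 200
x <- rnorm(n)                          # hypothetical input
y <- 1 + 0.5 * x + rnorm(n, sd = 0.3)  # hypothetical outcome
fit <- lm(y ~ x)

obj   <- summary(fit)
b.hat <- coef(fit)
k     <- length(b.hat)
df    <- fit$df.residual               # n - k

n.sims    <- 1000
sigma.sim <- obj$sigma * sqrt(df / rchisq(n.sims, df))
# One coefficient draw per simulated sigma; rows are simulations
beta.sim  <- t(sapply(sigma.sim, function(s)
  MASS::mvrnorm(1, b.hat, s^2 * obj$cov.unscaled)))

# Simulation-based 95% intervals next to the classical ones
sim.int <- t(apply(beta.sim, 2, quantile, probs = c(0.025, 0.975)))
sim.int
confint(fit)
```

The advantage of keeping the full matrix of draws is that any function of the coefficients can be summarised the same way, by applying it to each row.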
Inferential Uncertainty

Boston Predictive Analytics: Linear and Logistic Regression Using R - Intermediate Topics

  • 1. Intermediate Regression Topics Daniel Gerlanc, Director Enplus Advisors Inc
  • 4. Loading the data > abalone.path = "~/data/abalone.csv" > abalone.cols = c("sex", "length", "diameter", "height", "whole.wt", + "shucked.wt", "viscera.wt", "shell.wt", "rings") > > abalone <- read.csv(abalone.path, sep=",", row.names=NULL, + col.names=abalone.cols) > str(abalone) 'data.frame':! 4177 obs. of 9 variables: $ sex : chr "M" "M" "F" "M" ... $ length : num 0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ... $ diameter : num 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ... $ height : num 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ... $ whole.wt : num 0.514 0.226 0.677 0.516 0.205 ... $ shucked.wt: num 0.2245 0.0995 0.2565 0.2155 0.0895 ... $ viscera.wt: num 0.101 0.0485 0.1415 0.114 0.0395 ... $ shell.wt : num 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ... $ rings : int 15 7 9 10 7 8 20 16 9 19 ...
  • 5. Uses lattice graphics Draw pictures
  • 6. Lattice Plots > xyplot(jitter(rings) ~ shell.wt | sex, abalone, grid=T, pch=".", subset=volume < 0.2, panel=function(x, y, ...) { panel.lmline(x, y, ...) panel.xyplot(x, y, ...) }, ylab="rings") ggplot2 is a newer package that can be used to create similar plots.
  • 7. Combine groups (plot panels: Infant, Adult)
  • 8. Why Transform? Interpretability; Additive vs. Multiplicative Form; Prediction
  • 9. Simple Model
      > fit.1 <- lm(rings ~ sex + shell.wt, abalone)
      > summary(fit.1)

      Call:
      lm(formula = rings ~ sex + shell.wt, data = abalone)

      Residuals:
         Min     1Q Median     3Q    Max
      -5.750 -1.592 -0.535  0.886 15.736

      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   6.2423     0.0799   78.08   <2e-16 ***
      sex           0.9142     0.0984    9.29   <2e-16 ***
      shell.wt     12.8581     0.3300   38.96   <2e-16 ***
      ---
      Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Residual standard error: 2.5 on 4174 degrees of freedom
  • 10. Centering with z-scores: Subtract the mean from each input and divide by 1 or 2 standard deviations. Dummy/proxy variables may be centered as well.
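  • 11. Center Values
      # 'outcome' and 'predictors' are character vectors of column names;
      # they are not defined on the slide.
      > abalone.adj <- abalone[, c(outcome, predictors)]
      > for (i in predictors) {
      +     abalone.adj[[i]] <- (abalone.adj[[i]] - mean(abalone.adj[[i]])) /
      +         (2 * sd(abalone.adj[[i]]))
      + }
      Also look into the 'scale' function.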
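The slide points to `scale` without showing it. As a minimal sketch, the same 2-SD standardization can be written with `scale()` and checked against the manual formula. The data frame `df` and its columns `x1` and `x2` are made up for illustration and are not the abalone data.

```r
# Illustrative data only; not the abalone data set.
set.seed(42)
df <- data.frame(x1 = rnorm(100, mean = 10, sd = 3),
                 x2 = runif(100))

# Manual version, matching the loop on the slide.
manual <- as.data.frame(lapply(df, function(v) (v - mean(v)) / (2 * sd(v))))

# Same transformation with scale(): center on the column means and
# divide each column by twice its standard deviation.
sds    <- vapply(df, sd, numeric(1))
scaled <- as.data.frame(scale(df, center = TRUE, scale = 2 * sds))

# Compare the two versions, ignoring attributes.
same <- isTRUE(all.equal(manual, scaled, check.attributes = FALSE))
```

After this transformation every column has mean 0 and standard deviation 0.5, so a one-unit change in a rescaled input corresponds to a two-standard-deviation change in the original input.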
  • 12. Why center? Interpret coefficients in terms of standard deviations. Gives a sense of variable importance.
  • 13. Interpretability
      > fit.1a <- lm(rings ~ sex + shell.wt, abalone.adj)
      > summary(fit.1a)

      Call:
      lm(formula = rings ~ sex + shell.wt, data = abalone.adj)

      Residuals:
         Min     1Q Median     3Q    Max
      -5.750 -1.592 -0.535  0.886 15.736

      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   9.9337     0.0385  258.33   <2e-16 ***
      sex           0.8539     0.0919    9.29   <2e-16 ***
      shell.wt      3.5798     0.0919   38.96   <2e-16 ***
      ---
      Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Residual standard error: 2.5 on 4174 degrees of freedom
      Multiple R-squared: 0.406,   Adjusted R-squared: 0.406
      F-statistic: 1.43e+03 on 2 and 4174 DF, p-value: <2e-16
  • 14. Two Models
      lm(formula = rings ~ sex + shell.wt, data = abalone)
                  coef.est coef.se
      (Intercept)  6.24     0.08
      sex          0.91     0.10
      shell.wt    12.86     0.33
      ---
      n = 4177, k = 3
      residual sd = 2.49, R-Squared = 0.41

      lm(formula = rings ~ sex + shell.wt, data = abalone.adj)
                  coef.est coef.se
      (Intercept)  9.93     0.04
      sex          0.85     0.09
      shell.wt     3.58     0.09
      ---
      n = 4177, k = 3
      residual sd = 2.49, R-Squared = 0.41

      Smaller difference in SD terms
  • 15. Why divide by 2 SDs
      So binary variables may be interpreted similarly to continuous variables.
      e.g., a binary variable taking the values 0 and 1 with equal frequency has an sd of 0.5:
        sqrt(0.5 * (1 - 0.5)) = 0.5

        Divide by 2 SDs                     Divide by 1 SD
        (1 - 0.5) / (2 * 0.5) = +0.5        (1 - 0.5) / (1 * 0.5) = +1
        (0 - 0.5) / (2 * 0.5) = -0.5        (0 - 0.5) / (1 * 0.5) = -1
        -0.5 --> +0.5: diff of 1            -1 --> +1: diff of 2
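The arithmetic on this slide can be checked directly in R. This is a small sketch; the names `p`, `sd.binary`, `two.sd` and `one.sd` are introduced here for illustration and do not appear in the slides.

```r
# A 0/1 indicator with both values equally frequent.
p <- 0.5
sd.binary <- sqrt(p * (1 - p))   # population SD of a Bernoulli(p) variable

# Rescaled values of 1 and 0 when dividing by 2 SDs versus 1 SD.
two.sd <- (c(1, 0) - p) / (2 * sd.binary)
one.sd <- (c(1, 0) - p) / sd.binary

# Gap between the two groups on each scale.
diff.two.sd <- two.sd[1] - two.sd[2]
diff.one.sd <- one.sd[1] - one.sd[2]
```

Dividing by 2 SDs keeps the gap between the two groups at 1, so the coefficient on the rescaled indicator still compares one group with the other, while a rescaled continuous input moves by 1 for a two-standard-deviation change.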
  • 17. Simulation: Allow for more general inferences. Propagation of uncertainty.
  • 18. Prediction Errors: 90th percentile adult vs. median infant
      library(arm)  # provides sigma.hat()
      fit.4 <- lm(log(rings) ~ sex + log(shell.wt), abalone)
      # log shell weight of a large adult (90th percentile) and a median infant
      large.abalone <- log(quantile(subset(abalone, sex == 1)$shell.wt, 0.90))
      small.infant <- log(median(abalone$shell.wt[abalone$sex == 0]))
      # point predictions on the log scale
      x.a <- sum(c(1, 1, large.abalone) * coef(fit.4))
      x.i <- sum(c(1, 0, small.infant) * coef(fit.4))
      set.seed(1)
      n.sims <- 1000
      # simulate predictions and transform back to the ring scale
      pred.a <- exp(rnorm(n.sims, x.a, sigma.hat(fit.4)))
      pred.i <- exp(rnorm(n.sims, x.i, sigma.hat(fit.4)))
      pred.diff <- pred.a - pred.i
      > mean(pred.diff)
      4.5
      > quantile(pred.diff, c(0.025, 0.975))
      2.5%  98%
      -1.9 11.3
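`sigma.hat()` in the code above comes from the arm package. As a hedged sketch of the same predictive simulation using only base R, the residual standard deviation can be taken from `summary(fit)$sigma`. The built-in `mtcars` data stands in for the abalone data here, and the model, the two hypothetical cars in `new.cars`, and all variable names are illustrative assumptions rather than part of the talk.

```r
# Built-in mtcars data, used only as a stand-in for the abalone data.
fit <- lm(log(mpg) ~ am + log(wt), data = mtcars)

# Residual standard deviation; base-R stand-in for arm's sigma.hat().
sigma.fit <- summary(fit)$sigma

# Two hypothetical cars: a manual at wt = 2.5 and an automatic at wt = 3.5.
new.cars <- data.frame(am = c(1, 0), wt = c(2.5, 3.5))
mu <- predict(fit, newdata = new.cars)   # point predictions on the log scale

set.seed(1)
n.sims <- 1000
# Simulate on the log scale, then transform back to the outcome scale.
pred.manual <- exp(rnorm(n.sims, mu[1], sigma.fit))
pred.auto   <- exp(rnorm(n.sims, mu[2], sigma.fit))
pred.diff   <- pred.manual - pred.auto

mean(pred.diff)
quantile(pred.diff, c(0.025, 0.975))
```

Like the slide, this sketch propagates only the residual variation and treats the fitted coefficients as known.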
  • 19. Simulation for Inferential Uncertainty Simulate residual standard deviation Simulate
  • 20. Inferential Uncertainty
      ## Create 1000 simulations of the residual standard error and coefficients
      fit.5 <- lm(log(rings) ~ sex + shell.wt + sex:shell.wt, abalone)
      n.sims <- 1000
      obj <- summary(fit.5)          # save off the summary object
      sigma.hat <- obj$sigma
      b.hat <- obj$coef[, 'Estimate', drop=TRUE]
      cov.beta <- obj$cov.unscaled   # extract the unscaled covariance matrix
      k <- obj$df[1]                 # number of coefficients (including the intercept)
      n <- obj$df[1] + obj$df[2]     # number of observations
      set.seed(1)
      sigma.sim <- sigma.hat * sqrt((n-k) / rchisq(n.sims, n-k))
      beta.sim <- matrix(NA_real_, n.sims, k, dimnames=list(NULL, names(b.hat)))
      for (i in seq_len(n.sims)) {
          beta.sim[i, ] <- MASS::mvrnorm(1, b.hat, sigma.sim[i]^2 * cov.beta)
      }
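One way to use simulations like these is to summarize them as interval estimates for each coefficient. The sketch below repeats the simulation on the built-in `mtcars` data as a stand-in, since the abalone file is not bundled with R, and compares the simulated intervals with the classical ones from `confint()`. The model and the name `sim.int` are assumptions made for illustration.

```r
# Built-in mtcars data as a stand-in; the model is illustrative only.
fit <- lm(log(mpg) ~ am + wt + am:wt, data = mtcars)
obj <- summary(fit)

b.hat    <- obj$coefficients[, "Estimate"]
cov.beta <- obj$cov.unscaled
k <- obj$df[1]               # number of estimated coefficients
n <- obj$df[1] + obj$df[2]   # number of observations

set.seed(1)
n.sims <- 1000
# Draw the residual SD first, then the coefficients given each draw.
sigma.sim <- obj$sigma * sqrt((n - k) / rchisq(n.sims, n - k))
beta.sim <- t(vapply(sigma.sim,
                     function(s) MASS::mvrnorm(1, b.hat, s^2 * cov.beta),
                     numeric(k)))
colnames(beta.sim) <- names(b.hat)

# 95% simulation intervals for each coefficient, next to the classical ones.
sim.int <- t(apply(beta.sim, 2, quantile, probs = c(0.025, 0.975)))
cbind(sim.int, confint(fit))
```

With enough simulations the two sets of intervals should be close. The simulated draws become more useful when the quantity of interest is a nonlinear function of the coefficients, where no closed-form interval is available.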
