Project: CO Prediction, Regression Analysis | MTH 426, IITK
1. Multiple Regression Analysis
Forecasting the Daily Maximum Carbon Monoxide (CO) Level
Team Member Roll No
Bhanu Yadav 13198
Nakul Surana 13418
Instructor: Dr. Sharmishtha Mitra
2. Objective
◦ Rising pollution levels in urban areas are harmful
◦ In this study we aim to predict CO levels one week in advance
◦ This allows outdoor activities in the upcoming week to be planned
◦ A CO level between 3 ppm and 6 ppm is considered safe
3. Data
◦ Use hourly data from March 2004 to February 2005 to forecast the daily maximum CO level for 5 April 2005 to 11 April 2005
◦ The dataset contains 9358 instances of hourly averaged responses of several pollutants in an Italian city
◦ Source: UCI Machine Learning Repository, Air Quality Data Set
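Since the objective is stated in terms of daily maxima while the data are hourly, a minimal sketch of the aggregation step may help. It assumes the hourly records have already been parsed into (timestamp, value) pairs; the function name `daily_max` is hypothetical. The UCI description of this dataset tags missing values as -200, so the sketch skips both that sentinel and `None`.

```python
from collections import defaultdict
from datetime import datetime

MISSING = -200  # sentinel used for missing values in the UCI Air Quality file

def daily_max(records):
    """Collapse hourly (timestamp, value) pairs into {date: daily maximum}.

    Readings that are None or equal to the -200 sentinel are skipped,
    and a day with no valid readings is omitted from the result.
    """
    by_day = defaultdict(list)
    for ts, value in records:
        if value is None or value == MISSING:
            continue
        by_day[ts.date()].append(value)
    return {day: max(values) for day, values in sorted(by_day.items())}

# Illustrative readings, not values from the study.
hourly = [
    (datetime(2004, 3, 10, 18), 2.6),
    (datetime(2004, 3, 10, 19), 2.0),
    (datetime(2004, 3, 11, 8), MISSING),
    (datetime(2004, 3, 11, 9), 3.1),
]
print(daily_max(hourly))  # one entry per day: 2.6 on 10 March, 3.1 on 11 March
```

The same function would be applied to each variable (CO, C6H6, T, AH) to obtain the daily maximum series used by the model.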
4. Transformation of Data
Response variable (Y): CO
Candidate X variables:
PT08.S1(CO), NMHC(GT), C6H6(GT), PT08.S2(NMHC), NOx(GT), PT08.S3(NOx), NO2(GT), PT08.S4(NO2), PT08.S5(O3), T, RH and AH
The X variable NMHC(GT) had more than 90% missing values, so it was excluded from the set of candidate X variables
All other variables had less than 10% missing values
Missing values were replaced by the previous hour's value; runs of consecutive missing values were replaced by the value at the same hour one week earlier
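The screening and imputation rules on this slide can be sketched as follows. This is a minimal illustration, assuming each variable is a Python list of hourly values with `None` marking a missing reading; the function names are hypothetical, and treating a "run" as two or more consecutive missing hours is one reading of the rule, not necessarily the authors' exact procedure.

```python
def missing_fraction(series):
    """Fraction of entries that are None (missing)."""
    return sum(v is None for v in series) / len(series)

def select_candidates(columns, max_missing=0.9):
    """Keep variables whose missing fraction does not exceed max_missing."""
    return [name for name, series in columns.items()
            if missing_fraction(series) <= max_missing]

def impute_hourly(series, week=168):
    """Fill gaps in an hourly series (one reading of the rule above).

    An isolated missing hour takes the previous hour's value; each hour in a
    run of two or more consecutive missing values takes the (already filled)
    value from the same hour one week (168 hours) earlier. Hours that cannot
    be filled either way stay None.
    """
    filled = list(series)
    n = len(series)
    i = 0
    while i < n:
        if series[i] is not None:
            i += 1
            continue
        # Find the end of this run of missing values.
        j = i
        while j < n and series[j] is None:
            j += 1
        run_length = j - i
        for k in range(i, j):
            if run_length == 1 and k > 0:
                filled[k] = filled[k - 1]
            elif k >= week:
                filled[k] = filled[k - week]
        i = j
    return filled

print(impute_hourly([1.0, None, 3.0]))  # [1.0, 1.0, 3.0]
```

Filling left to right means a week-earlier value that was itself imputed is reused, which keeps long gaps from staying empty.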
7. Dummy Variables
This suggests a seasonality of CO with respect to the day of the year; to compensate for it we introduce the dummy variable
X4 = 1 if the day of the year is between 200 and 300
= 0 otherwise
There is also a seasonality of CO with respect to the day of the week, captured by
X5 = 1 if the day is a Monday, Tuesday, Saturday or Sunday
= 0 otherwise
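A minimal sketch of how X4 and X5 could be computed from a calendar date, using Python's standard `datetime` module. Whether the day-of-year bounds 200 and 300 are inclusive is not stated on the slide; the sketch assumes both ends are included. Function names are hypothetical.

```python
from datetime import date

def seasonal_dummy(day):
    """X4: 1 if the day of the year lies in 200..300 (both ends included, an assumption), else 0."""
    day_of_year = day.timetuple().tm_yday
    return 1 if 200 <= day_of_year <= 300 else 0

def weekly_dummy(day):
    """X5: 1 on Monday, Tuesday, Saturday or Sunday, else 0."""
    # date.weekday() numbers Monday as 0 through Sunday as 6.
    return 1 if day.weekday() in (0, 1, 5, 6) else 0

# Dummy values for the forecast week, 5 to 11 April 2005.
week = [date(2005, 4, d) for d in range(5, 12)]
print([seasonal_dummy(d) for d in week])  # [0, 0, 0, 0, 0, 0, 0]
print([weekly_dummy(d) for d in week])    # [1, 0, 0, 0, 1, 1, 1]
```

Since 5 April 2005 fell on a Tuesday, X5 is 1 on the first day and on the last three days (Saturday to Monday) of the forecast week, while X4 is 0 throughout.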
8. Best Model
Input Variables
• Daily maximum C6H6 (lag 8 days)
• Daily maximum T (lag 7 days)
• Daily maximum AH (lag 7 days)
• Monthly dummy variable (X4)
• Weekly dummy variable (X5)
Output Variable
• Daily maximum CO concentration
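To make the lag structure concrete, here is a plain-Python sketch that builds the design matrix for the inputs listed above and fits it by ordinary least squares through the normal equations. It assumes aligned daily lists (one entry per day) with `None` for unavailable values; all function names are hypothetical, and this is not necessarily how the authors fitted their model. In practice a library routine such as `numpy.linalg.lstsq` would be numerically safer than solving the normal equations directly.

```python
def lagged(series, lag):
    """Shift a daily series: entry t holds series[t - lag], or None if t < lag."""
    return [series[t - lag] if t >= lag else None for t in range(len(series))]

def design_rows(c6h6, temp, ah, x4, x5, co):
    """Build (X, y) for the Best Model inputs; days with any missing input are dropped."""
    c6h6_l8 = lagged(c6h6, 8)
    t_l7 = lagged(temp, 7)
    ah_l7 = lagged(ah, 7)
    X, y = [], []
    for t in range(len(co)):
        row = [1.0, c6h6_l8[t], t_l7[t], ah_l7[t], x4[t], x5[t]]  # 1.0 = intercept
        if None in row or co[t] is None:
            continue
        X.append(row)
        y.append(co[t])
    return X, y

def ols(X, y):
    """Least-squares coefficients by solving the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * w for u, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

# Sanity check on an exact line y = 1 + 2x.
print(ols([[1.0, float(x)] for x in range(5)], [1.0 + 2 * x for x in range(5)]))
```

Because the shortest lag is 7 days, every input for a day in the forecast week is already observed one week earlier, which is what makes a week-ahead forecast possible.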
11. Residual Analysis
[Figure: residuals plotted against the fitted values ŷᵢ]
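The two diagnostic plots used here can be reproduced from the fitted values. The sketch below computes the residuals eᵢ = yᵢ - ŷᵢ and the coordinates of a normal probability plot, using Python's `statistics.NormalDist` (Python 3.8+). The plotting positions (i - 0.5)/n are one common choice among several and are an assumption, not taken from the slides; function names are hypothetical.

```python
from statistics import NormalDist

def residuals(y, fitted):
    """Residuals e_i = y_i - yhat_i."""
    return [obs - fit for obs, fit in zip(y, fitted)]

def normal_scores(resid):
    """Pairs (theoretical normal quantile, ordered residual) for a normal probability plot.

    Uses plotting positions (i - 0.5) / n; an approximately straight line
    suggests approximately normal residuals.
    """
    n = len(resid)
    ordered = sorted(resid)
    return [(NormalDist().inv_cdf((i - 0.5) / n), e)
            for i, e in enumerate(ordered, start=1)]

e = residuals([3.0, 5.0, 4.0], [2.7, 5.1, 4.0])
# For the residuals-versus-fitted plot, pair each fitted value with its residual.
pairs = list(zip([2.7, 5.1, 4.0], e))
print(normal_scores(e))
```

In the residuals-versus-fitted plot, a horizontal band with no trend or funnel shape is what "no apparent pattern" means in practice.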
12. Conclusions
Fitted model:
Ŷ = 2.2 + 0.15 (Max C6H6) - 0.05 (Max T) - 0.02 (Max AH) + 0.31 (Monthly dummy) + 0.16 (Weekly dummy)
Adjusted R² = 0.656, i.e., after adjusting for the number of predictors the model explains roughly 65.6% of the variability in daily maximum CO
The normal probability plot of the residuals shows no marked departure from normality
The plot of residuals against the fitted values ŷᵢ shows no apparent pattern
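As a worked illustration, the fitted equation can be turned into a forecasting function, together with the standard adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1) for n observations and p predictors. The coefficients are the rounded values reported above, so forecasts are approximate; the input values in the example are hypothetical and not data from the study.

```python
# Coefficients copied from the fitted equation (rounded as reported).
COEFFS = {
    "intercept": 2.2,
    "max_c6h6_lag8": 0.15,
    "max_t_lag7": -0.05,
    "max_ah_lag7": -0.02,
    "monthly_dummy": 0.31,
    "weekly_dummy": 0.16,
}

def predict_co(max_c6h6, max_t, max_ah, monthly_dummy, weekly_dummy):
    """Forecast daily maximum CO from the reported (rounded) coefficients."""
    return (COEFFS["intercept"]
            + COEFFS["max_c6h6_lag8"] * max_c6h6
            + COEFFS["max_t_lag7"] * max_t
            + COEFFS["max_ah_lag7"] * max_ah
            + COEFFS["monthly_dummy"] * monthly_dummy
            + COEFFS["weekly_dummy"] * weekly_dummy)

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical inputs: C6H6 = 10, T = 20, AH = 1.0, X4 = 0, X5 = 1.
print(round(predict_co(10, 20, 1.0, 0, 1), 2))  # 2.84
```

Term by term: 2.2 + 1.5 - 1.0 - 0.02 + 0 + 0.16 = 2.84.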