• Built CART, bagging, and random forest models to evaluate the quality of Portuguese "Vinho Verde" red wine;
• Found that using random forest instead of CART decreased the test error rate by 7%, and that red wine quality was determined mainly by the physicochemical factors alcohol and sulphates.
2. Data Description
• Source:
Paulo Cortez, University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez
A. Cerdeira, F. Almeida, T. Matos and J. Reis, Viticulture Commission of the Vinho Verde Region (CVRVV), Porto, Portugal
@2009
3. Data Description
• The dataset is related to the red variant of
the Portuguese "Vinho Verde" wine.
• Due to privacy and logistical issues, only
physicochemical (input) and sensory
(output) variables are available.
7. R code for training set and test set
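B <- 20                          # number of random train/test splits
for (i in 1:B) {
  set.seed(i)                    # a different, reproducible seed for each split
  # draw 1000 row indices without replacement for the training set
  indexes <- sample(seq_len(nrow(data)), size = 1000, replace = FALSE)
  train <- data[indexes, ]       # 1000 training observations
  test  <- data[-indexes, ]      # all remaining observations form the test set
  # train and test are overwritten on each pass, so model fitting belongs inside this loop
}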
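The split loop only becomes useful once a model is fitted and scored inside it. The following is a minimal sketch of that step, not the original analysis code: it fits a CART classifier with `rpart` on each training set, records the misclassification rate on the matching test set, and averages over the B splits. The data frame `wine`, its 1500 rows, its three columns, and the two-level `quality` label are placeholders made up so the example runs on its own; the real data would replace them.

```r
library(rpart)  # CART implementation; a recommended package in standard R installs

# Placeholder data (assumption): stands in for the real wine data frame.
# It needs more than 1000 rows so that the test set is not empty.
set.seed(1)
n <- 1500
wine <- data.frame(alcohol   = runif(n, 8, 15),
                   sulphates = runif(n, 0.3, 2),
                   pH        = runif(n, 2.7, 4))
# Hypothetical two-level quality label driven by two inputs plus noise
score <- 0.8 * wine$alcohol + 2 * wine$sulphates + rnorm(n)
wine$quality <- factor(ifelse(score > median(score), "good", "bad"))

B <- 20
err <- numeric(B)                      # one test error rate per split
for (i in 1:B) {
  set.seed(i)
  idx   <- sample(seq_len(nrow(wine)), size = 1000, replace = FALSE)
  train <- wine[idx, ]
  test  <- wine[-idx, ]
  # fit a classification tree on the training rows only
  fit  <- rpart(quality ~ ., data = train, method = "class")
  pred <- predict(fit, newdata = test, type = "class")
  err[i] <- mean(pred != test$quality) # misclassification rate on held-out rows
}
mean_err <- mean(err)                  # test error averaged over the B splits
```

Averaging over several seeds reduces the dependence of the reported error on any single random split. Another classifier can be compared on identical splits by replacing the `rpart` call and keeping the same seeds.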