This analysis uses a wine data set provided by the UC Irvine Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php

Wine Quality Data Set:

##   fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1           7.4             0.70        0.00            1.9     0.076
## 2           7.8             0.88        0.00            2.6     0.098
## 3           7.8             0.76        0.04            2.3     0.092
## 4          11.2             0.28        0.56            1.9     0.075
## 5           7.4             0.70        0.00            1.9     0.076
##   free.sulfur.dioxide total.sulfur.dioxide density   pH sulphates alcohol
## 1                  11                   34  0.9978 3.51      0.56     9.4
## 2                  25                   67  0.9968 3.20      0.68     9.8
## 3                  15                   54  0.9970 3.26      0.65     9.8
## 4                  17                   60  0.9980 3.16      0.58     9.8
## 5                  11                   34  0.9978 3.51      0.56     9.4
##   color quality
## 1   Red       5
## 2   Red       5
## 3   Red       5
## 4   Red       6
## 5   Red       5

The data set ships as separate red-wine and white-wine files; for this analysis they were combined into a single table, with a color column noting each wine's origin.
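The combined table can be built along these lines (a sketch; the base URL, file names, and semicolon separator are assumptions based on the CSVs the UCI repository publishes):

# Sketch: load the two UCI wine-quality CSVs and combine them.
base  = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/"
red   = read.csv(paste0(base, "winequality-red.csv"), sep = ";")
white = read.csv(paste0(base, "winequality-white.csv"), sep = ";")
red$color   = "Red"
white$color = "White"
wine = rbind(red, white)
wine$color   = factor(wine$color)
wine$quality = factor(wine$quality)   # treat quality as a class label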

Histogram of the quality levels:
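A minimal sketch that produces such a chart (since quality is a factor, a bar chart of level counts stands in for the histogram):

# Sketch: counts of each quality level in the combined data.
barplot(table(wine$quality),
        xlab = "Quality level", ylab = "Count",
        main = "Wine quality levels")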

Column types and factor levels:

## $fixed.acidity
## numeric(0)
## 
## $volatile.acidity
## numeric(0)
## 
## $citric.acid
## numeric(0)
## 
## $residual.sugar
## numeric(0)
## 
## $chlorides
## numeric(0)
## 
## $free.sulfur.dioxide
## numeric(0)
## 
## $total.sulfur.dioxide
## numeric(0)
## 
## $density
## numeric(0)
## 
## $pH
## numeric(0)
## 
## $sulphates
## numeric(0)
## 
## $alcohol
## numeric(0)
## 
## $color
## factor(0)
## Levels: Red White
## 
## $quality
## factor(0)
## Levels: 3 4 5 6 7 8 9
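One way to reproduce a listing of this shape (zero-length vectors that still report each column's class and factor levels) is to iterate over a zero-row slice of the data frame:

# Sketch: show each column's type and factor levels without printing data.
lapply(wine[0, ], identity)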

This analysis aims to determine which model best predicts wine quality from the factors above.

The methods are: Neural Net, Random Forest, Naive Bayes, and Support Vector Machine.
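The model code below references a train set, and the confusion matrices further down imply a held-out test set of roughly 2,000 wines. The split itself is not shown; a minimal sketch, assuming a random split of about 70/30:

# Sketch: random ~70/30 train/test split (the actual split used is not shown).
set.seed(1)
idx   = sample(nrow(wine), size = floor(0.7 * nrow(wine)))
train = wine[idx, ]
test  = wine[-idx, ]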

The code for each model is:

library(nnet)          # nnet()
library(randomForest)  # randomForest()
library(e1071)         # naiveBayes() and svm()

NNWine    = nnet(quality ~ ., data = train, size = 20, rang = 0.1, decay = 0.01, maxit = 10000, trace = FALSE)
RFWine    = randomForest(quality ~ ., data = train, mtry = 3, nodesize = 20, ntree = 10000)
BayesWine = naiveBayes(quality ~ ., data = train)
SVMWine   = svm(quality ~ ., data = train, kernel = "radial")

Each model is scored on the held-out test set in two ways: exact accuracy, how often the predicted quality level matches the true level, and +/- one accuracy, how often the prediction falls within one level of the truth. Under each confusion matrix below, the first number is the exact accuracy and the second is the +/- one accuracy.
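A minimal sketch of the evaluation for one model (shown for the random forest; for a classification forest, predict() returns predicted class labels):

# Sketch: confusion matrix, exact accuracy, and +/- one accuracy on the test set.
pred = predict(RFWine, newdata = test)   # for nnet, use predict(NNWine, test, type = "class")
table(test$quality, pred)                # rows = actual, columns = predicted
mean(pred == test$quality)               # exact accuracy
mean(abs(as.numeric(as.character(pred)) -
         as.numeric(as.character(test$quality))) <= 1)   # +/- one accuracy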

Neural Net accuracies:

##    
##       3   4   5   6   7   8
##   3   0   2   0   5   1   0
##   4   0   3  36  23   0   0
##   5   1   5 399 244   9   0
##   6   1   0 192 622  67   1
##   7   0   0  10 188 129   2
##   8   0   0   0  38  19   0
##   9   0   0   0   0   2   0
## [1] 0.5767884
## [1] 0.9544772

Random Forest accuracies:

##    
##       3   4   5   6   7   8   9
##   3   0   0   2   6   0   0   0
##   4   0   0  40  22   0   0   0
##   5   0   0 447 206   5   0   0
##   6   0   0 174 669  40   0   0
##   7   0   0   9 197 123   0   0
##   8   0   0   0  36  18   3   0
##   9   0   0   0   1   1   0   0
## [1] 0.6213107
## [1] 0.9589795

Naive Bayes accuracies:

##    
##       3   4   5   6   7   8   9
##   3   1   1   0   3   3   0   0
##   4   0   6  26  17  13   0   0
##   5   7  12 358 219  62   0   0
##   6   7  15 233 377 246   5   0
##   7   1   4  39 101 178   5   1
##   8   0   0   4  18  34   1   0
##   9   0   0   0   0   2   0   0
## [1] 0.4607304
## [1] 0.8994497

Support Vector Machine accuracies:

##    
##       3   4   5   6   7   8   9
##   3   0   0   2   6   0   0   0
##   4   0   0  37  25   0   0   0
##   5   0   0 404 252   2   0   0
##   6   0   0 197 656  30   0   0
##   7   0   0   8 252  69   0   0
##   8   0   0   0  46  11   0   0
##   9   0   0   0   1   1   0   0
## [1] 0.5647824
## [1] 0.9544772

Random Forest performs best of the four, with the highest exact accuracy (0.62) and the highest +/- one accuracy (0.96). We’ll try out NCAA predictions soon.