EXERCISE 4.2: Wine class prediction with Neural Networks (multinomial classification)
In this problem, we are given a dataset containing the quality class of some Italian wines together with their chemical characteristics (alcohol content, flavonoids, colour intensity...). Our task is to build a neural network model and train it to predict the wine quality class. This is an example of multinomial classification.
In detail, the attributes of this dataset are:
- Alcohol
- Malic acid
- Ash
- Alcalinity of ash
- Magnesium
- Total phenols
- Flavanoids
- Nonflavanoid phenols
- Proanthocyanins
- Color intensity
- Hue
- OD280/OD315 of diluted wines
- Proline
Further information concerning this dataset can be found online on the dedicated page of the UCI Machine Learning Repository or, in particular, in this file.
Our prediction concerns the quality class of the wine (1, 2 or 3), which is given in the first column of the data.
Skills employed:
- download and import data from the internet
- design and train a neural network for multinomial classification using BetaML
- use the additional BetaML functions partition and accuracy, and the models OneHotEncoder, Scaler and ConfusionMatrix
Instructions
If you have already cloned or downloaded the whole course repository, the folder with the exercise is in [REPOSITORY_ROOT]/lessonsMaterial/04_NN/wineClass. Otherwise, download a zip of just that folder here.
In the folder you will find the file WineClass.jl, the Julia script that you will have to complete by implementing the missing parts and then run (follow the instructions in that file). In that folder you will also find the Manifest.toml file: the proposed solution below has been tested with the environment defined by that file. If you are stuck and you don't want to look up the solution below, you can also ask for help in the forum at the bottom of this page. Good luck!
Resolution
Click "ONE POSSIBLE SOLUTION" to get access to (one possible) solution for each part of the code that you are asked to implement.
1) Setting up the environment...
Start by setting the working directory to the directory of this file and activate it as a project environment. If you have the provided Manifest.toml file in the directory, just run Pkg.instantiate(); otherwise manually add the packages Pipe, HTTP, Plots and BetaML.
ONE POSSIBLE SOLUTION
cd(@__DIR__)
using Pkg
Pkg.activate(".")
# If using a Julia version different than 1.10, please uncomment and run the following line (the reproducibility guarantee will however be lost)
# Pkg.resolve()
Pkg.instantiate()
using Random
Random.seed!(123)
2) Load the packages
Load the packages DelimitedFiles, Pipe, HTTP, Plots and BetaML.
ONE POSSIBLE SOLUTION
using DelimitedFiles, Pipe, HTTP, Plots, BetaML
3) Load the data
Load the input data as a Matrix, either from the internet or from a local file. You can use readdlm with the comma as the field separator.
dataURL="https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"
ONE POSSIBLE SOLUTION
data = @pipe HTTP.get(dataURL).body |> readdlm(_,',')
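If you prefer to work offline, a minimal alternative is to save the file once and then load the local copy (the filename wine.data below is just an assumption for where you saved it):
data = readdlm("wine.data", ',')  # local copy of the same comma-separated file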
4) Write the feature matrix and the label vector
Now create the X matrix of features using the second to last columns of the data you loaded above, and the Y vector by taking the 1st column. Transform the Y vector to a vector of integers using the Int() function (broadcasted). Make sure you obtain a 178×13 matrix and a 178-element vector.
ONE POSSIBLE SOLUTION
X = data[:,2:end]
Y = Int.(data[:,1] )
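As a quick sanity check that the dimensions match the expectations above, you can add a couple of assertions (a minimal sketch):
@assert size(X) == (178,13)   # 178 records, 13 chemical features
@assert length(Y) == 178      # one class label per record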
5) Partition the data
Partition the data in (xtrain, xtest) and (ytrain, ytest), keeping 80% of the data for training and reserving 20% for testing. Keep the default option to shuffle the data, as the input data isn't shuffled.
ONE POSSIBLE SOLUTION
((xtrain,xtest),(ytrain,ytest)) = partition([X,Y],[0.8,0.2])
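If you want the split itself to be reproducible independently of the global random seed, note that partition also accepts an rng keyword argument; a sketch using the FIXEDRNG generator exported by BetaML (assuming a recent BetaML version):
((xtrain,xtest),(ytrain,ytest)) = partition([X,Y],[0.8,0.2],rng=copy(FIXEDRNG))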
6) Implement one-hot encoding of categorical variables
As the output is multinomial, we need to encode ytrain. We use the OneHotEncoder() model to create ytrain_oh.
ONE POSSIBLE SOLUTION
ytrain_oh = fit!(OneHotEncoder(),ytrain)
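If you keep the fitted encoder in a variable, you can also map the one-hot matrix back to the original labels with inverse_predict. A sketch of the same step in that style:
encoder   = OneHotEncoder()
ytrain_oh = fit!(encoder,ytrain)      # N×3 matrix of 0/1 indicators, one column per class
# inverse_predict(encoder,ytrain_oh) # returns the original integer labels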
7) Define the neural network architecture
Define a NeuralNetworkEstimator model with the following characteristics:
- 3 dense layers with respectively 13, 20 and 3 nodes and activation function relu
- a VectorFunctionLayer with 3 nodes and softmax as activation function
- crossentropy as the neural network cost function
- training options: 100 epochs and 6 records to be used on each batch
ONE POSSIBLE SOLUTION
l1 = DenseLayer(13,20,f=relu)          # 13 inputs (the features), 20 output nodes
l2 = DenseLayer(20,20,f=relu)          # hidden layer, 20 → 20 nodes
l3 = DenseLayer(20,3,f=relu)           # 20 → 3 nodes, one per class
l4 = VectorFunctionLayer(3,f=softmax)  # turns the 3 outputs into class probabilities
mynn = NeuralNetworkEstimator(layers=[l1,l2,l3,l4],loss=crossentropy,batch_size=6,epochs=100)
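Note that the final VectorFunctionLayer applies softmax across the three output nodes, so the network outputs class probabilities that sum to one, which is what the crossentropy loss expects.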
8) Train the model
Train the model using the encoded labels ytrain_oh and a scaled version of xtrain (where all columns have zero mean and a standard deviation of 1).
ONE POSSIBLE SOLUTION
scaler = Scaler()                        # fit the scaler on the training data only
fit!(mynn,fit!(scaler,xtrain),ytrain_oh) # fit! on a Scaler returns the scaled matrix
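Keeping the fitted scaler in a variable lets you reuse it in the next step to transform the test features, so that the test set does not influence the scaling parameters.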
9) Predict the labels
Predict the training labels ŷtrain and the test labels ŷtest. Recall that you did the training on the scaled features!
ONE POSSIBLE SOLUTION
ŷtrain = predict(mynn, predict(scaler,xtrain)) # apply the scaler fitted on the training data
ŷtest  = predict(mynn, predict(scaler,xtest))  # same scaler: don't fit a new one on the test set
10) Evaluate the model
Compute the train and test accuracies using the function accuracy.
ONE POSSIBLE SOLUTION
trainAccuracy = accuracy(ytrain,ŷtrain)
testAccuracy = accuracy(ytest,ŷtest)
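Note that accuracy can work directly on the probability matrix returned by the network: for each record the prediction is counted as correct when the class with the highest predicted probability matches the true one.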
11) Evaluate the model more in detail
Compute and print a confusion matrix of the test data, true vs. predicted.
ONE POSSIBLE SOLUTION
cm = ConfusionMatrix()
fit!(cm,ytest,ŷtest)
println(cm)
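Besides printing it, you can also retrieve the computed measures programmatically through BetaML's generic info accessor, which returns them as a dictionary (a minimal sketch; inspect its keys to see the available scores):
cm_measures = info(cm)   # e.g. keys(cm_measures)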
12) Plot the errors
Run the following command to plot the average loss per epoch:
plot(info(mynn)["loss_per_epoch"])
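If the curve is still clearly decreasing at the last epoch, increasing the epochs training option may further improve the fit.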
13) (Optional) Use unscaled data
Run the same workflow without scaling the data, or using squared_cost as the cost function. How does this affect the quality of your predictions?
ONE POSSIBLE SOLUTION
# Run 1: same workflow, but without scaling the data
Random.seed!(123)
((xtrain,xtest),(ytrain,ytest)) = partition([X,Y],[0.8,0.2])
ytrain_oh = fit!(OneHotEncoder(),ytrain)
l1 = DenseLayer(13,20,f=relu)
l2 = DenseLayer(20,20,f=relu)
l3 = DenseLayer(20,3,f=relu)
l4 = VectorFunctionLayer(3,f=softmax)
mynn= NeuralNetworkEstimator(layers=[l1,l2,l3,l4],loss=crossentropy,batch_size=6,epochs=100)
fit!(mynn,xtrain,ytrain_oh)
ŷtrain = predict(mynn, xtrain)
ŷtest = predict(mynn, xtest)
trainAccuracy = accuracy(ytrain,ŷtrain)
testAccuracy = accuracy(ytest,ŷtest)
plot(info(mynn)["loss_per_epoch"])
# Run 2: scaled data, but with squared_cost instead of crossentropy as the loss
Random.seed!(123)
((xtrain,xtest),(ytrain,ytest)) = partition([X,Y],[0.8,0.2])
ytrain_oh = fit!(OneHotEncoder(),ytrain)
l1 = DenseLayer(13,20,f=relu)
l2 = DenseLayer(20,20,f=relu)
l3 = DenseLayer(20,3,f=relu)
l4 = VectorFunctionLayer(3,f=softmax)
mynn= NeuralNetworkEstimator(layers=[l1,l2,l3,l4],loss=squared_cost,batch_size=6,epochs=100)
scaler = Scaler()
fit!(mynn,fit!(scaler,xtrain),ytrain_oh)
ŷtrain = predict(mynn, predict(scaler,xtrain))
ŷtest  = predict(mynn, predict(scaler,xtest))
trainAccuracy = accuracy(ytrain,ŷtrain)
testAccuracy = accuracy(ytest,ŷtest)
plot(info(mynn)["loss_per_epoch"])
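In general, expect worse and less stable results without scaling: features measured on very different scales (e.g. proline in the hundreds vs. hue around one) make gradient-based training considerably harder. With squared_cost on the one-hot encoded output the network typically still trains, but crossentropy is the natural loss for probability outputs and usually yields better classifications.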