Deep learning is a subset of machine learning (itself a subset of artificial intelligence) whose algorithms are based on the layers of artificial neural networks. It has a variety of applications, among which image recognition, the topic of this article.
To show how to build, train, and predict with your neural network, I will use TensorFlow, which you can easily run in a Jupyter Notebook.
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np
import pydot_ng as pydot
Now let’s download and have a first sight of our database (available in Keras libraries):
from tensorflow.keras.datasets import cifar10

# Dividing the data into training and test set
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train.shape, x_test.shape, y_train.shape, y_test.shape
As you can see, our x training set contains 50000 images, each of 32×32 pixels with 3 channels (same for the x test set, but with only 10000 observations). On the other hand, our y sets are arrays of numbers ranging from 0 to 9, corresponding to our classes. So we can start by creating a vector of class names to assign to our predictions later on. Furthermore, we can also set two more variables:
- epochs: the number of complete passes through the training set during training
- batch_size: the number of samples processed before the model's weights are updated
batch_size = 32
epochs = 3
class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]
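As a quick sanity check (assuming the 50000-image training set we just loaded), these two values determine how many weight updates each epoch performs:

```python
import math

# With 50000 training images and a batch size of 32, each epoch
# runs ceil(50000 / 32) weight updates (the last batch is smaller)
steps_per_epoch = math.ceil(50000 / 32)
print(steps_per_epoch)  # 1563
```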
Since I first want to show the process in a very intuitive way, I will resize my images from 3-channel to 1-channel (that is, from color to black and white). By doing so, we will be able to visualize the whole process of convolution, pooling and full connection (I’m going to explain these three steps later on).
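A small numpy sketch (on a hypothetical 4×4 image, not our dataset) shows what this kind of resizing does: np.resize simply reuses the flat pixel buffer rather than averaging the channels, which is enough for our visualization purposes but is not a true grayscale conversion:

```python
import numpy as np

# Hypothetical 4x4 RGB image with random pixel values
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 4, 3)).astype(np.float32)

# np.resize keeps the first 4*4*1 values of the flat buffer:
# it changes the shape, not the colors
gray_resized = np.resize(img, (4, 4, 1))

# For comparison, a perceptually weighted grayscale conversion
weights = np.array([0.299, 0.587, 0.114])
gray_weighted = (img @ weights)[..., np.newaxis]

print(gray_resized.shape, gray_weighted.shape)  # (4, 4, 1) (4, 4, 1)
```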
The algorithm we are going to use for this task is a Convolutional Neural Network. I’m not going to dwell a lot on the math behind it, but I want to provide an intuitive idea of how it works.
So, let’s have a look at this picture together with the script below (which I’m going to explain step by step). In the picture, I’ve examined a simpler task: we have an alphabet of four letters – A, B, C and D – and our algorithm is asked to recognize the input letter (in our case, a ‘C’). The algorithm built in the script, on the other hand, refers to our dataset, so the output vector will have not four but ten entries. However, the underlying process is the same.
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 1)))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1024, activation='relu'))
model.add(tf.keras.layers.Dense(10, activation='softmax'))
Let’s explain each passage:
- Convolution: a filter slides over the input image, segmenting it into smaller pieces and returning a so-called feature map (more precisely, it returns as many feature maps as the number of filters used). The output of this filtering procedure then passes through a so-called activation function, which maps the resulting values into a given range, e.g. between 0 and 1 or between -1 and 1, depending on the function. In our case, the ReLU function simply sets any negative value to zero;
- Pooling: the main goal of this step is reducing the size of our feature maps through a function (in this case, a ‘max’ function: it returns the highest pixel value among those examined);
- Full connection: the purpose of the fully connected layer is to use those features to classify the input image into the various classes of the training dataset. In this step, after converting our matrix-shaped feature maps into arrays of numbers, we apply an activation function once more and obtain, as the final output, a probability vector as long as the vector of classes. Indeed, the activation function we used, called ‘softmax’, converts its inputs into probabilities: each value lies between 0 and 1, and the sum of all the probabilities is, of course, equal to one. Since this is a multi-class task, the function returns a probability for each class, and the predicted class is the one with the highest probability.
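To make that last step concrete, here is a minimal numpy sketch of the softmax function, applied to a hypothetical vector of raw scores (one per class; these numbers are made up, not taken from our model):

```python
import numpy as np

def softmax(z):
    # Subtracting the max for numerical stability; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw scores from the last Dense layer, one per class
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5, 0.0, -0.5, 1.5, -2.0, 0.3])
probs = softmax(logits)

print(round(probs.sum(), 6))  # 1.0 — probabilities always sum to one
print(np.argmax(probs))       # 0 — the class with the highest score wins
```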
The main goal of our CNN is to produce predictions as close as possible to the real values. Furthermore, the algorithm, once evaluated, is able to learn from its mistakes by re-weighting its parameters to minimize the error term. This operation is called backpropagation.
In order to be implemented, backpropagation needs two further elements:
- Loss function: it measures how well the model fits the data, returning larger values when fitted values are far from actual values. A typical loss function, widely used in linear regression models, is Mean Squared Error (MSE). In our case, we will use the categorical crossentropy function.
- Optimizer: to minimize errors, weights need to be modified, and we can do so by using a class of functions called optimization functions. Optimization functions usually calculate the partial derivatives of the loss function with respect to the weights (the gradient), and the weights are modified in the opposite direction of the calculated gradient. This cycle is repeated until we reach a minimum of the loss function. In our example, we will use the Adam optimizer.
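As an illustration with made-up numbers (not values from our model), categorical crossentropy and a single plain gradient-descent step look like this in numpy; Adam follows the same principle but adapts the step size per weight:

```python
import numpy as np

# One-hot true label (class 3 out of 10) and hypothetical predicted probabilities
y_true = np.zeros(10)
y_true[3] = 1.0
y_pred = np.full(10, 0.05)
y_pred[3] = 0.55

# Categorical crossentropy: -sum(y_true * log(y_pred));
# only the true class contributes, so the loss is -log(0.55)
loss = -np.sum(y_true * np.log(y_pred))
print(round(loss, 4))  # 0.5978

# One optimization step: move the weights against the gradient
w = np.array([0.20, -0.40])
grad = np.array([0.10, -0.30])  # hypothetical dLoss/dw
learning_rate = 0.001
w = w - learning_rate * grad    # each weight moves slightly downhill
```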
model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6),
              metrics=['accuracy'])
Note that the ‘metrics’ argument, like the loss function, is a way to measure the model’s performance. However, unlike the loss function, it does not affect the training process.
Now we can train our model and validate it on our test set (always reshaping it into a 1-channel dataset):
# Dividing the train and test sets by 255 to normalize each pixel value
# (which ranges from 0 to 255)
model.fit(np.resize(x_train, (50000, 32, 32, 1)) / 255.0,
          tf.keras.utils.to_categorical(y_train),
          batch_size=batch_size,
          shuffle=True,
          epochs=epochs,
          validation_data=(np.resize(x_test, (10000, 32, 32, 1)) / 255.0,
                           tf.keras.utils.to_categorical(y_test)))
Please forgive this model for being so poor (the accuracy reached an embarrassing 10.74%), but I asked for only 3 iterations and reduced the channels. Indeed, the aim of this first approach is just to visualize the process.
First, we apply our 3×3 filters to the input images. This operation returns 32 feature maps, each of 30×30 pixels. Then we reduce our images’ dimensions, from 30×30 to 15×15, using a pool filter of size 2×2 (that is, our pool filter examines four pixels at a time, returning only one pixel equal to the maximum value it finds).
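These shapes can be checked with a line of arithmetic: a ‘valid’ convolution with a k×k kernel and stride 1 shrinks each spatial dimension by k−1, and 2×2 max pooling halves it:

```python
# Output size of a 'valid' convolution: (input - kernel) // stride + 1
conv_out = (32 - 3) // 1 + 1
# 2x2 max pooling halves each spatial dimension
pool_out = conv_out // 2
# After Flatten, each image becomes a vector of pool_out * pool_out * 32 values
flat = pool_out * pool_out * 32
print(conv_out, pool_out, flat)  # 30 15 7200
```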
Now let’s move on to the 3-channel images. First, we can have a look at the kind of images we are about to analyze:
Now let’s train our neural network again, this time on the 3-channel images (adding some modifications):

# Re-instantiating the model before stacking the new layers
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
# Adding two Dropout layers to prevent overfitting
model.add(tf.keras.layers.Dropout(0.25))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1024, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(10, activation='softmax'))
Let’s compile it, keeping in mind all the considerations made above about loss function and optimizer:
model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6),
              metrics=['accuracy'])

model.fit(x_train / 255.0,
          tf.keras.utils.to_categorical(y_train),
          batch_size=batch_size,
          shuffle=True,
          epochs=epochs,
          validation_data=(x_test / 255.0, tf.keras.utils.to_categorical(y_test)))
We can now evaluate the model on our test set (and make predictions on it):
# Normalizing the test set here as well, consistently with training
predictions = model.predict(x_test / 255.0)
scores = model.evaluate(x_test / 255.0, tf.keras.utils.to_categorical(y_test))
Nice, the accuracy rose to 63.87%. Let’s compare some predictions:
# Defining a function that plots the predicted image, with the true label as title
def plot_pred(i, predictions_array, true_label, img):
    predictions_array, true_label, img = predictions_array[i], int(true_label[i]), img[i]
    plt.grid(False)
    plt.title(class_names[true_label])
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img)

# Defining a function that plots the prediction vector, showing whether the
# predicted value is correct (in blue) or incorrect (in red)
def plot_bar(i, predictions_array, true_label):
    predictions_array, true_label = predictions_array[i], int(true_label[i])
    plt.grid(False)
    plt.yticks([])
    plt.xticks(np.arange(10), class_names, rotation=40)
    thisplot = plt.bar(range(10), predictions_array, color='grey')
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'
    thisplot[predicted_label].set_color(color)

# Plotting both figures side by side
plt.figure(figsize=(15, 6))
plt.subplot(1, 2, 1)
plot_pred(10, predictions, y_test, x_test)
plt.subplot(1, 2, 2)
plot_bar(10, predictions, y_test)
plt.show()
Now let’s see what happens when the prediction is wrong:
plt.figure(figsize=(15, 6))
plt.subplot(1, 2, 1)
plot_pred(20, predictions, y_test, x_test)
plt.subplot(1, 2, 2)
plot_bar(20, predictions, y_test)
plt.show()
The nice thing about TensorFlow and Keras is that you can build your ‘homemade’ CNN, changing the number and type of layers depending on the data available. Then, your model will proceed on its own with backpropagation to tune its parameters and minimize the error term.
Well, the CNN we built here is simple and the amount of data is modest; nevertheless, we were able to build a decent classifier, and we might want to improve it with new layers.
I’d like to conclude this article by sharing a very intuitive way to understand how a CNN actually works: feeding it yourself and visualizing the whole learning process.