Tutorial: Learning a digit classifier with the MNIST dataset

Introduction

The goal of this tutorial is to learn basic machine learning skills by building the best digit classifier you can. The dataset we will work on is old, but it remains a reference benchmark for evaluating new algorithms and for illustrating early concepts.

How well do our classification algorithms perform?

The MNIST handwritten digit database is a collection of 70,000 images of handwritten digits and their corresponding labels (from 0 to 9). The dataset is split into a Training set (60,000 images) and a Test set (10,000 images). You will train your model on the Training set and evaluate it on the Test set.
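One common way to answer "how well does it perform?" is accuracy, the fraction of test images whose predicted digit matches the true label (the error rate is one minus that). The sketch below assumes accuracy is the metric of interest. The `accuracy` helper and the demo labels are hypothetical and are not part of the original notebook.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true == y_pred))


# Hypothetical true labels and predictions for six test digits
# (made-up values, used only to illustrate the computation).
y_true_demo = np.array([7, 2, 1, 0, 4, 1])
y_pred_demo = np.array([7, 2, 1, 0, 9, 1])

demo_accuracy = accuracy(y_true_demo, y_pred_demo)
demo_error_rate = 1.0 - demo_accuracy
print(f"accuracy = {demo_accuracy:.3f}, error rate = {demo_error_rate:.3f}")
```

With a trained model, the same helper could be called with `y_test` and the model's predictions on `x_test`.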

`Who is the best at MNIST? <https://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#4d4e495354>`__

Requirements

We will need Scikit-Learn and Keras with TensorFlow. The MNIST dataset is downloaded through TensorFlow/Keras.
DO read the Scikit-Learn documentation, which is exhaustive and completely awesome.
Scikit-Learn comes from INRIA; Keras and TensorFlow come from Google.

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

Using TensorFlow backend.
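Before training anything, it can help to check what was loaded. The sketch below summarizes one split: array shape, dtype, pixel range, and the number of examples per digit. The `summarize_split` helper is hypothetical, and the block uses random stand-in arrays with the same layout as MNIST (28x28 `uint8` images, labels 0 to 9) so that it runs without downloading anything.

```python
import numpy as np


def summarize_split(images, labels):
    """Return basic facts about one dataset split."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    return {
        "image_shape": images.shape,
        "dtype": str(images.dtype),
        "pixel_min": int(images.min()),
        "pixel_max": int(images.max()),
        # Number of examples for each digit 0..9
        "class_counts": np.bincount(labels, minlength=10).tolist(),
    }


# Stand-in data (an assumption for illustration, not real MNIST digits).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=100)

summary = summarize_split(fake_images, fake_labels)
print(summary)
```

In the notebook, calling `summarize_split(x_train, y_train)` and `summarize_split(x_test, y_test)` should report 60,000 and 10,000 images respectively.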

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
# %matplotlib nbagg
# %matplotlib ipympl
# %matplotlib notebook

images_and_labels = list(zip(x_train, y_train))
plt.subplots_adjust(bottom=0, left=.01, right=.99, top=1.6, hspace=.35)
for i, (image, label) in enumerate(images_and_labels[:25]):
    plt.subplot(5, 5, i + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray, interpolation='nearest')
    plt.title('N°%i Label: %i' % (i, label))

[Figure: the first 25 training digits in a 5 x 5 grid, each titled with its index and label]

%%timeit -n 1 -r 1
# Explore the first 5 digits in the training dataset
for i, image in enumerate(x_train[0:5]):
    print('Image n°', i, 'Label:', y_train[i])
    plt.imshow(image, cmap='gray')
    plt.show()

Let’s evaluate three classical Machine Learning methods

Support Vector Machine (SVM) -> 1_ML_Tutorial_SVM.ipynb
Neural Networks -> 2_ML_Tutorial_NN.ipynb
Convolutional Neural Networks (CNN) -> 3_ML_Tutorial_CNN.ipynb

Some Machine Learning resources