Note

This notebook can be downloaded here: 0_ML_Tutorial_MNIST.ipynb

Tutorial: Learning a digit classifier with the MNIST dataset

Introduction

This tutorial teaches basic machine learning skills through one concrete task: building the best digit classifier you can. The dataset we will work on is old, but it remains a reference benchmark for evaluating new algorithms (and early-stage ideas).
How well do our classification algorithms perform?

The MNIST handwritten digit database is a collection of 70,000 images of handwritten digits and their corresponding labels (from 0 to 9). The dataset is split into a training set (60,000 images) and a test set (10,000 images). You will train your model on the training set and evaluate it on the test set.

`Who is the best at MNIST? <https://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#4d4e495354>`__

Requirements

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
Using TensorFlow backend.
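Once the data is loaded, it is worth checking that the split matches the description above. The following is a minimal sketch, not part of the original notebook: `summarize_split` is a hypothetical helper, and the synthetic arrays are stand-ins so the snippet runs without downloading anything.

```python
import numpy as np


def summarize_split(images, labels, n_classes=10):
    """Collect basic facts about one split of an image dataset (hypothetical helper)."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    if images.shape[0] != labels.shape[0]:
        raise ValueError("images and labels must have the same length")
    return {
        "n_samples": images.shape[0],
        "image_shape": images.shape[1:],
        "dtype": str(images.dtype),
        "pixel_range": (int(images.min()), int(images.max())),
        # Number of examples per digit; list index = digit
        "class_counts": np.bincount(labels, minlength=n_classes).tolist(),
    }


# Synthetic stand-in data (an assumption for illustration only);
# in the notebook, pass (x_train, y_train) or (x_test, y_test) instead.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
fake_labels = np.arange(100) % 10  # 10 examples of each digit

summary = summarize_split(fake_images, fake_labels)
print(summary["n_samples"], summary["image_shape"], summary["dtype"])
print(summary["class_counts"])
```

On the real data, `summarize_split(x_train, y_train)` should report 60,000 samples of shape (28, 28) and `summarize_split(x_test, y_test)` 10,000. The per-digit counts show whether the classes are roughly balanced.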
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
# %matplotlib nbagg
# %matplotlib ipympl
# %matplotlib notebook

# Pair each training image with its label
images_and_labels = list(zip(x_train, y_train))
plt.subplots_adjust(bottom=0, left=.01, right=.99, top=1.6, hspace=.35)
# Show the first 25 digits in a 5 x 5 grid, each titled with its index and label
for i, (image, label) in enumerate(images_and_labels[:25]):
    plt.subplot(5, 5, i + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray, interpolation='nearest')
    plt.title('N°%i Label: %i' % (i, label))
[Figure: the first 25 training digits in a 5 × 5 grid, each titled with its index and label]
%%timeit -n 1 -r 1
# Explore the first 5 digits in the training dataset
for i, image in enumerate(x_train[0:5]):
    print('Image n°', i, 'Label:', y_train[i])
    plt.imshow(image, cmap='gray')
    plt.show()
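Classical classifiers generally take each sample as a flat feature vector, not as a 2-D image. The sketch below shows one common way to prepare the images, assuming that flattening each 28 × 28 image into 784 values and scaling pixels to [0, 1] suits the methods that follow. `flatten_and_scale` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np


def flatten_and_scale(images):
    """Turn (n, height, width) uint8 images into an (n, height * width) float32 matrix in [0, 1]."""
    images = np.asarray(images)
    n_samples = images.shape[0]
    # One row per image, one column per pixel
    flat = images.reshape(n_samples, -1).astype(np.float32)
    # Pixel intensities are stored as 0..255, so dividing by 255 maps them to 0..1
    return flat / 255.0


# Stand-in data (an assumption for illustration only): three blank images, the first set to white.
# In the notebook, call flatten_and_scale(x_train) and flatten_and_scale(x_test) instead.
stand_in = np.zeros((3, 28, 28), dtype=np.uint8)
stand_in[0] = 255

X = flatten_and_scale(stand_in)
print(X.shape, X.dtype)
```

Scaling is optional for some methods, but distance-based and gradient-based learners usually behave better when features share a small, common range.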

Let’s evaluate three classical Machine Learning methods