Convolutional Neural Networks with TensorFlow

ReadMe.md
# This is a "from scratch" tutorial on building a convolutional neural network with TensorFlow and training and testing it on a popular introductory dataset.
# Every line of code, from start to finish, is explained, and the code provided with the tutorial can be run directly, which makes it a very suitable tutorial for getting started with the TensorFlow framework.
# I originally planned to translate it, but it is fairly long and would take too much time, so the English original is posted here; the language is plain and it is actually not hard to follow.
# Some text/code that confused readers has been modified.

Tutorial link
Author: Aditya Sharma
Published: March 10th, 2018

In this tutorial, you’ll learn how to construct and implement Convolutional Neural Networks (CNNs) in Python with the TensorFlow framework.
TensorFlow is a famous deep learning framework. In this blog post, you will learn the basics of this extremely popular Python library and understand how to implement these deep, feed-forward artificial neural networks with it.

To be precise, you'll be introduced to the following topics in today's tutorial:

  • You'll first be introduced to tensors and how they differ from matrices. Once you understand what tensors are, you'll be introduced to the TensorFlow framework, within which you will also see how even a single line of code is implemented via a computational graph in TensorFlow. Then you will learn about some of the package's concepts that play a major role in doing deep learning, such as constants, variables and placeholders;
  • Then you'll head to the most interesting part of this tutorial: the implementation of a Convolutional Neural Network. First, you will try to understand the data. You'll use Python and its libraries to load, explore and analyze your data. You'll also preprocess your data: you'll learn how to visualize your images as a matrix, reshape your data and rescale the images between 0 and 1 if required;
  • With all of this done, you are ready to construct the deep neural network model: you'll start off by defining the network parameters, then learn how to create wrappers to increase the simplicity of your code, define weights and biases, model the network, and define the loss and optimizer nodes. Once you have all this in place, you are ready to train and test your model;
  • After your model's evaluation, you'll learn more about overfitting and how you can overcome it by adding a dropout layer. You will then train the model again with dropout layers inserted in the network, evaluate it on the test set and compare the results of both models. Next, you'll make predictions on the test data, convert the probabilities into class labels and plot a few test samples that your model classified correctly and incorrectly. Finally, you will visualize the classification report, which contains the precision, recall and F1 score of every class present in the test dataset.

Tensors

In layman’s terms, a tensor is a way of representing the data in deep learning. A tensor can be a 1-dimensional, a 2-dimensional, a 3-dimensional array, etc. You can think of a tensor as a multidimensional array. In machine learning and deep learning you have datasets which are high dimensional, in which each dimension represents a different feature of that dataset.

Consider the following example of a dog versus cat classification problem, where the dataset you're working with contains images of many varieties of both cats and dogs. Now, in order to correctly classify a dog or a cat when given an image, the network has to learn discriminative features like color, face structure, ears, eyes, the shape of the tail, etc.

These features are represented by tensors.

[Figure]
Tip: if you want to get to know more about tensors, check out DataCamp’s TensorFlow Tutorial for Beginners.

But how are tensors then any different from matrices? You’ll find out in the next section!

Tensors versus Matrices: Differences
A matrix is a two-dimensional grid of size n×m that contains numbers: you can add and subtract matrices of the same size, multiply one matrix by another as long as the sizes are compatible (an n×m matrix times an m×p matrix gives an n×p matrix), and multiply an entire matrix by a constant.

A vector is a matrix with just one row or column (but see below).

A tensor is often thought of as a generalized matrix. That is, it could be

  • a 1-D matrix (like a vector, which is in fact a rank-1 tensor),
  • a 3-D matrix (something like a cube of numbers),
  • a 0-D matrix (a single number), or
  • a higher dimensional structure that is harder to visualize.
    The number of dimensions of a tensor is called its rank.

Any rank-2 tensor can be represented as a matrix, but not every matrix is really a rank-2 tensor. The numerical values of a tensor’s matrix representation depend on what transformation rules have been applied to the entire system.
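To make the idea of rank concrete, here is a minimal sketch (the shapes are made up purely for illustration) that builds tensors of rank 0 through 3 as TensorFlow constants and asks TensorFlow for their ranks:

import tensorflow as tf

scalar = tf.constant(3.0)                       # rank 0: a single number
vector = tf.constant([1.0, 2.0, 3.0])           # rank 1: shape (3,)
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # rank 2: shape (2, 2)
cube = tf.constant([[[1.0], [2.0]],
                    [[3.0], [4.0]]])            # rank 3: shape (2, 2, 1)

with tf.Session() as sess:
    # tf.rank() returns the number of dimensions of each tensor
    print(sess.run([tf.rank(scalar), tf.rank(vector), tf.rank(matrix), tf.rank(cube)]))
# [0, 1, 2, 3]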

TensorFlow: Constants, Variables and Placeholders

TensorFlow is a framework developed by Google and first released on 9th November 2015. It is written in Python, C++ and CUDA, and it supports platforms like Linux, Microsoft Windows, macOS, and Android. TensorFlow provides APIs in Python, C++, Java, etc. The most widely used API is the Python one, and it is the one you will use to implement a convolutional neural network in this tutorial.

The name TensorFlow is derived from the operations, such as adding or multiplying, that artificial neural networks perform on multidimensional data arrays. These arrays are called tensors in this framework, which is slightly different from what you saw earlier.

So why is there a mention of a flow when you’re talking about operations?

Let's consider a simple equation and its diagram, represented as a computational graph. Note: don't worry if you don't get this equation straight away; it is just here to help you understand how the flow takes place when using the TensorFlow framework.

prediction = tf.nn.softmax(tf.matmul(W,x) + b)

[Figure: computational graph of the equation above: W and x feed a matmul node, b is added, then softmax produces the prediction]
In TensorFlow, every line of code that you write has to go through a computational graph. As you can see in the above figure, W and x are multiplied first, and then b is added to the product of W and x. After that, a softmax function is applied to the sum and the final output is generated.
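To make the flow concrete, here is a small self-contained sketch of that graph (the shapes and values of W, x and b are made up for illustration and are not the ones used later in this tutorial):

import tensorflow as tf

x = tf.constant([[1.0], [2.0], [3.0]])   # 3 input values, shape (3, 1)
W = tf.constant([[0.1, 0.2, 0.3],
                 [0.4, 0.5, 0.6]])       # maps 3 inputs to 2 outputs, shape (2, 3)
b = tf.constant([[0.5], [0.5]])          # bias for the 2 outputs, shape (2, 1)

# The same flow as in the figure: matmul -> add -> softmax.
logits = tf.matmul(W, x) + b                              # shape (2, 1)
prediction = tf.nn.softmax(tf.reshape(logits, [1, -1]))   # softmax over the 2 class scores

with tf.Session() as sess:
    print(sess.run(prediction))   # two probabilities that sum to 1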

You'll find that, when you're working with TensorFlow, constants, variables and placeholders come in handy to define the input data, class labels, weights and biases.

  • Constants take no input; you use them to store constant values. They produce the constant output that they store.
import tensorflow as tf
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b

Here, nodes a and b are constants that store the values 2.0 and 3.0. Node c stores the operation that multiplies nodes a and b. When you initialize a session and run c, you'll see that the output you get back is 6.0:

sess = tf.Session()
sess.run(c)
6.0
  • Placeholders allow you to feed input at run time. Because of this flexibility, placeholders let your computational graph take inputs as parameters. Defining a node as a placeholder tells TensorFlow that the node is expected to receive its value later, at runtime. Here, "runtime" means that the input is fed to the placeholder when you run your computational graph.
# Creating placeholders
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)

# Assigning addition operation w.r.t. a and b to node add
add = a + b

# Create session object
sess = tf.Session()

# Executing add by passing the values [1, 3] [2, 4] for a and b respectively
output = sess.run(add, {a: [1,3], b: [2, 4]})
print('Adding a and b:', output)
('Adding a and b:', array([ 3.,  7.], dtype=float32))

In this case, you have explicitly provided the data type with tf.float32. Note that this data type is therefore single precision, stored in 32-bit form. However, in cases where you do not do this, just like in the first example, TensorFlow will infer the type of the constant/variable from the initialized value.


  • Variables allow you to modify the graph such that it can produce new outputs with respect to the same inputs. A variable lets you add trainable parameters or nodes to the graph, that is, their values can be modified over time.
#Variables are defined by providing their initial value and type
variable = tf.Variable([0.9,0.7], dtype = tf.float32)

# Variables must be initialized before the graph is run for the first time.
init = tf.global_variables_initializer()
sess.run(init)

Constants are initialized when you call tf.constant, and their value can never change. Variables, however, are not initialized when you call tf.Variable. To initialize all the variables in TensorFlow, you need to explicitly call the global variable initializer global_variables_initializer(), which initializes all the existing variables in your TensorFlow code, as you can see in the above code chunk.

Variables survive across multiple executions of a graph unlike normal tensors that are only instantiated when a graph is run and are immediately deleted afterwards.
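Here is a minimal sketch of that behaviour (the counter variable is purely illustrative): the variable keeps its value between sess.run() calls, and tf.assign updates it, which is essentially what an optimizer does to your weights during training.

import tensorflow as tf

counter = tf.Variable(0, dtype=tf.int32)
increment = tf.assign(counter, counter + 1)   # an op that modifies the variable

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        print(sess.run(increment))            # prints 1, 2, 3 -- the value persists between runs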

In this section, you have seen that placeholders are used for holding the input data and class labels, whereas variables are used for the weights and biases. Don't worry if you have not yet developed a proper intuition about how a computational graph works, or about what placeholders and variables are usually used for in deep learning. All of these topics will be addressed later on in this tutorial.

Convolutional Neural Network (CNN) in TensorFlow

Fashion-MNIST Dataset

Before you go ahead and load in the data, it's good to take a look at what exactly you'll be working with! The Fashion-MNIST dataset contains Zalando's article images: 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The TensorFlow loader you will use below keeps 5,000 of the training images aside as a validation set, so the training set you work with has 55,000 images and the test set has 10,000 images. You can double check this later when you have loaded in your data! 😉

Fashion-MNIST is similar to the MNIST dataset that you might already know, which you use to classify handwritten digits. That means that the image dimensions, training and test splits are similar.

Tip: if you want to learn how to implement a Multi-Layer Perceptron (MLP) for classification tasks with this latter dataset, go to this tutorial, or if you want to learn about convolutional neural networks and their implementation in the Keras framework, check out this tutorial.

You can find the Fashion-MNIST dataset here. Unlike the Keras or Scikit-Learn packages, TensorFlow has no predefined module to load the Fashion-MNIST dataset, although it does ship with a loader for the original MNIST. To load the data, you first need to download it from the above link and then place the files in a particular folder layout, as shown below, to be able to work with it. Otherwise, TensorFlow will download and use the original MNIST instead.
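The loader used below expects the four downloaded archives in a data/fashion folder relative to your script. A small optional check like the following (a sketch, assuming you keep the original file names from the download) confirms the files are in place before you load them:

import os

data_dir = 'data/fashion'
expected_files = [
    'train-images-idx3-ubyte.gz',
    'train-labels-idx1-ubyte.gz',
    't10k-images-idx3-ubyte.gz',
    't10k-labels-idx1-ubyte.gz',
]

for fname in expected_files:
    path = os.path.join(data_dir, fname)
    # If any file is missing, read_data_sets() will fall back to downloading the original MNIST.
    print(path, 'found' if os.path.exists(path) else 'MISSING')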

Load the data

You start by importing all the required modules like numpy, matplotlib and, most importantly, TensorFlow.

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# %matplotlib inline  (uncomment this line when running in a Jupyter notebook)
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0" #for training on gpu

After importing all the modules you will now learn how you can load data in TensorFlow, which should be pretty straightforward. The only thing that you should take into account is the one_hot=True argument, which you’ll also find in the line of code below: it converts the categorical class labels to binary vectors.

In one-hot encoding, you convert the categorical data into a vector of numbers. You do this because machine learning algorithms can’t work with categorical data directly. Instead, you generate one boolean column for each category or class. Only one of these columns could take on the value 1 for each sample. That explains the term “one-hot encoding”.

But what does such a one-hot encoded data column look like?

For your problem statement, the one-hot encoding will be a row vector, and for each image it will have a dimension of 1 x 10. It's important to note here that the vector consists of all zeros except for the class that it represents. There, you'll find a 1. For example, an ankle boot image has a label of 9 (you will plot one further below), so for all the ankle boot images, the one-hot encoded vector would be [0 0 0 0 0 0 0 0 0 1].
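As a quick illustration of what one_hot=True produces, here is a tiny sketch that builds such a vector by hand with NumPy (np.eye is just one convenient way to do it):

import numpy as np

label = 9                      # e.g. the "Ankle boot" class
one_hot = np.eye(10)[label]    # row 9 of a 10 x 10 identity matrix
print(one_hot)                 # [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]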

Now that all of this is clear, it’s time to import the data!

data = input_data.read_data_sets('data/fashion',one_hot=True)
Extracting data/fashion/train-images-idx3-ubyte.gz
Extracting data/fashion/train-labels-idx1-ubyte.gz
Extracting data/fashion/t10k-images-idx3-ubyte.gz
Extracting data/fashion/t10k-labels-idx1-ubyte.gz

Once you have the training and testing data loaded, you’re all set to analyze the data in order to get some intuition about the dataset that you are going to work with for this tutorial!

Analyze the Data

Before you start any heavy lifting, it's always a good idea to check out what the images in the dataset look like. First, you can take a programmatic approach and check out their dimensions. Also, take into account that if you want to explore your images, they have already been rescaled between 0 and 1. That means that you won't need to rescale the image pixels again!

# Shapes of training set
print("Training set (images) shape: {shape}".format(shape=data.train.images.shape))
print("Training set (labels) shape: {shape}".format(shape=data.train.labels.shape))

# Shapes of test set
print("Test set (images) shape: {shape}".format(shape=data.test.images.shape))
print("Test set (labels) shape: {shape}".format(shape=data.test.labels.shape))
Training set (images) shape: (55000, 784)
Training set (labels) shape: (55000, 10)
Test set (images) shape: (10000, 784)
Test set (labels) shape: (10000, 10)

From the above output, you can see that the training data has a shape of 55000 x 784: there are 55,000 training samples, each a 784-dimensional vector. Similarly, the test data has a shape of 10000 x 784, since there are 10,000 testing samples.

The 784-dimensional vector is nothing but a 28 x 28 matrix. That's why you will be reshaping each training and testing sample from a 784-dimensional vector to a 28 x 28 x 1 matrix in order to feed the samples into the CNN model.

For simplicity, let’s create a dictionary that will have class names with their corresponding categorical class labels.

# Create dictionary of target classes
label_dict = {
 0: 'T-shirt/top',
 1: 'Trouser',
 2: 'Pullover',
 3: 'Dress',
 4: 'Coat',
 5: 'Sandal',
 6: 'Shirt',
 7: 'Sneaker',
 8: 'Bag',
 9: 'Ankle boot',
}

Also, let’s take a look at the images in your dataset:

plt.figure(figsize=[5,5])

# Display the first image in training data
plt.subplot(121)
curr_img = np.reshape(data.train.images[0], (28,28))
curr_lbl = np.argmax(data.train.labels[0,:])
plt.imshow(curr_img, cmap='gray')
plt.title("(Label: " + str(label_dict[curr_lbl]) + ")")

# Display the first image in testing data
plt.subplot(122)
curr_img = np.reshape(data.test.images[0], (28,28))
curr_lbl = np.argmax(data.test.labels[0,:])
plt.imshow(curr_img, cmap='gray')
plt.title("(Label: " + str(label_dict[curr_lbl]) + ")")
plt.show()
<matplotlib.text.Text at 0x7f3d17e38cd0>

[Figure: the first training image (Coat) and the first test image (Ankle boot)]

The two plots above show one sample image from the training data and one from the test data; these images have the class labels 4 (Coat) and 9 (Ankle boot) respectively. Similarly, other fashion products will have different labels, but similar products will have the same label. This means that all the ankle boot images will have a class label of 9.

Data Preprocessing

The images are of size 28 x 28 (or a 784-dimensional vector).

The images are already rescaled between 0 and 1, so you don't need to rescale them again, but to be sure let's visualize an image from the training dataset as a matrix. Along with that, let's also print the maximum and minimum values of the matrix.

print (data.train.images[0])
array([0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.00784314, 0.0509804 ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.00784314, 0.00392157, 0.        , 0.        ,
       0.5137255 , 0.92549026, 0.909804  , 0.87843144, 0.2901961 ,
       0.        , 0.        , 0.00392157, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.00392157, 0.        ,
       0.        , 0.        , 0.41960788, 0.9176471 , 0.87843144,
       0.8470589 , 0.8980393 , 0.8980393 , 0.21568629, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.00784314, 0.        , 0.        , 0.36078432, 0.8000001 ,
       0.8352942 , 0.8431373 , 0.882353  , 0.8470589 , 0.9215687 ,
       0.80392164, 0.8941177 , 0.7019608 , 0.2509804 , 0.        ,
       0.        , 0.00784314, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.00392157, 0.        , 0.        ,
       0.75294125, 0.8980393 , 0.854902  , 0.8470589 , 0.78823537,
       0.90196085, 1.        , 0.882353  , 0.8196079 , 0.8352942 ,
       0.8431373 , 0.89019614, 0.4901961 , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.01568628, 0.        , 0.09411766, 0.909804  , 0.8078432 ,
       0.8313726 , 0.8941177 , 0.8235295 , 0.8000001 , 0.86666673,
       0.76470596, 0.85098046, 0.8470589 , 0.8078432 , 0.8470589 ,
       0.8000001 , 0.        , 0.        , 0.00784314, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.01176471, 0.        ,
       0.3921569 , 0.93725497, 0.85098046, 0.8117648 , 0.86274517,
       0.87843144, 0.83921576, 0.8431373 , 0.8313726 , 0.8588236 ,
       0.8196079 , 0.8352942 , 0.8313726 , 0.90196085, 0.15686275,
       0.        , 0.01176471, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.6745098 , 0.9333334 ,
       0.86666673, 0.882353  , 0.854902  , 0.86274517, 0.86666673,
       0.91372555, 0.87843144, 0.8235295 , 0.8431373 , 0.86666673,
       0.83921576, 0.92549026, 0.40784317, 0.        , 0.00784314,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.86274517, 0.9215687 , 0.87843144, 0.882353  ,
       0.8705883 , 0.854902  , 0.85098046, 0.7843138 , 0.8745099 ,
       0.8431373 , 0.8588236 , 0.8705883 , 0.85098046, 0.91372555,
       0.6       , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.82745105,
       0.90196085, 0.8941177 , 0.8862746 , 0.882353  , 0.86666673,
       0.8705883 , 0.85098046, 0.83921576, 0.86274517, 0.8588236 ,
       0.8470589 , 0.8588236 , 0.8980393 , 0.7843138 , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.01568628, 0.8941177 , 0.8862746 , 0.90196085,
       0.882353  , 0.87843144, 0.882353  , 0.8745099 , 0.8352942 ,
       0.8588236 , 0.86666673, 0.8588236 , 0.854902  , 0.8705883 ,
       0.8862746 , 0.9176471 , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.227451  ,
       0.93725497, 0.87843144, 0.91372555, 0.882353  , 0.8745099 ,
       0.8745099 , 0.86666673, 0.83921576, 0.8745099 , 0.8588236 ,
       0.85098046, 0.854902  , 0.86274517, 0.86666673, 0.8431373 ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.37254903, 0.9568628 , 0.8705883 ,
       0.9058824 , 0.8862746 , 0.8745099 , 0.87843144, 0.87843144,
       0.85098046, 0.86274517, 0.854902  , 0.8588236 , 0.86666673,
       0.8588236 , 0.85098046, 0.89019614, 0.14901961, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.52156866, 0.9490197 , 0.8705883 , 0.9490197 , 0.89019614,
       0.87843144, 0.8862746 , 0.89019614, 0.83921576, 0.86666673,
       0.86274517, 0.8588236 , 0.8705883 , 0.909804  , 0.83921576,
       0.9215687 , 0.27450982, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.70980394, 0.9294118 ,
       0.87843144, 0.8745099 , 0.909804  , 0.8745099 , 0.882353  ,
       0.89019614, 0.85098046, 0.8745099 , 0.8588236 , 0.8588236 ,
       0.86666673, 0.8431373 , 0.8431373 , 0.92549026, 0.42352945,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.854902  , 0.91372555, 0.90196085, 0.6431373 ,
       0.94117653, 0.8745099 , 0.882353  , 0.8862746 , 0.854902  ,
       0.8745099 , 0.8470589 , 0.86666673, 0.86274517, 0.61960787,
       0.86274517, 0.8980393 , 0.62352943, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.95294124,
       0.909804  , 0.8941177 , 0.49411768, 0.9843138 , 0.87843144,
       0.882353  , 0.90196085, 0.8862746 , 0.8745099 , 0.854902  ,
       0.86274517, 0.8980393 , 0.47450984, 0.91372555, 0.8941177 ,
       0.7607844 , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.8588236 , 0.9058824 , 0.8000001 ,
       0.427451  , 1.        , 0.8588236 , 0.89019614, 0.8862746 ,
       0.7803922 , 0.882353  , 0.8745099 , 0.8431373 , 0.9450981 ,
       0.36078432, 0.8980393 , 0.882353  , 0.8352942 , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.01960784,
       0.8941177 , 0.90196085, 0.7372549 , 0.4901961 , 1.        ,
       0.85098046, 0.8862746 , 0.90196085, 0.8352942 , 0.882353  ,
       0.8705883 , 0.83921576, 0.9921569 , 0.36862746, 0.8588236 ,
       0.87843144, 0.9294118 , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.12156864, 0.91372555, 0.91372555,
       0.68235296, 0.5647059 , 1.        , 0.8470589 , 0.87843144,
       0.91372555, 0.8705883 , 0.882353  , 0.87843144, 0.8431373 ,
       0.9960785 , 0.41960788, 0.8196079 , 0.8705883 , 0.8431373 ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.3529412 , 0.9058824 , 0.909804  , 0.63529414, 0.59607846,
       1.        , 0.854902  , 0.882353  , 0.91372555, 0.854902  ,
       0.8745099 , 0.87843144, 0.8352942 , 1.        , 0.43529415,
       0.7568628 , 0.87843144, 0.86666673, 0.19607845, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.6784314 , 0.9450981 ,
       0.93725497, 0.6431373 , 0.61960787, 0.9960785 , 0.86274517,
       0.882353  , 0.9176471 , 0.85098046, 0.8705883 , 0.8705883 ,
       0.8352942 , 0.9960785 , 0.45882356, 0.7843138 , 0.8941177 ,
       0.91372555, 0.65882355, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.4431373 , 0.82745105, 1.        , 0.6313726 ,
       0.6862745 , 0.9960785 , 0.8588236 , 0.8941177 , 0.9176471 ,
       0.86666673, 0.8745099 , 0.87843144, 0.8352942 , 0.9960785 ,
       0.5137255 , 0.7960785 , 0.82745105, 0.8000001 , 0.18431373,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.8196079 , 0.9215687 ,
       0.8588236 , 0.8941177 , 0.9176471 , 0.86666673, 0.87843144,
       0.8745099 , 0.8470589 , 0.9960785 , 0.5882353 , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.87843144, 0.9176471 , 0.86666673, 0.8941177 ,
       0.9176471 , 0.86666673, 0.8705883 , 0.8745099 , 0.86274517,
       0.9333334 , 0.6862745 , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.00784314, 0.        , 0.        , 0.91372555,
       0.90196085, 0.8745099 , 0.882353  , 0.909804  , 0.86274517,
       0.8705883 , 0.87843144, 0.86274517, 0.9215687 , 0.72156864,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.00392157,
       0.        , 0.        , 1.        , 0.9450981 , 0.8980393 ,
       0.9333334 , 0.93725497, 0.882353  , 0.9058824 , 0.92549026,
       0.8941177 , 0.9725491 , 0.86666673, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.37647063, 0.6745098 , 0.7686275 , 0.81568635, 0.8705883 ,
       0.85098046, 0.8196079 , 0.7843138 , 0.75294125, 0.64705884,
       0.26666668, 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        ], dtype=float32)
print (np.max(data.train.images[0]))
1.0
print (np.min(data.train.images[0]))
0.0

Let's reshape the images so that each is of size 28 x 28 x 1, ready to be fed as input to the network.

# Reshape training and testing image
train_X = data.train.images.reshape(-1, 28, 28, 1)
test_X = data.test.images.reshape(-1,28,28,1)
print (train_X.shape, test_X.shape)
((55000, 28, 28, 1), (10000, 28, 28, 1))

You need not reshape the labels since they already have the correct dimensions, but let's put the training and testing labels in separate variables and also print their respective shapes, just to be on the safe side.

train_y = data.train.labels
test_y = data.test.labels
print (train_y.shape, test_y.shape)
((55000, 10), (10000, 10))

The Deep Neural Network

You’ll use three convolutional layers:

  • The first layer will have 32 filters of size 3 x 3,
  • The second layer will have 64 filters of size 3 x 3, and
  • The third layer will have 128 filters of size 3 x 3.
    In addition, there are three max-pooling layers each of size 2 x 2.

[Figure: the network architecture: three convolution + max-pooling blocks followed by a fully connected layer and the 10-class output layer]

You start off with defining the training iterations training_iters, the learning rate learning_rate and the batch size batch_size. Keep in mind that all these are hyperparameters and that these don’t have fixed values, as these differ for every problem statement.

Nevertheless, here’s what you usually can expect:

  • The number of training iterations indicates how many times you pass over the entire training set (i.e. the number of epochs),
  • It is good practice to use a learning rate of 1e-3. The learning rate is a factor that scales the updates applied to the weights; it helps in reducing the cost/loss/cross entropy and ultimately in converging to a local optimum. The learning rate should be neither too high nor too low, but a balanced value, and
  • The batch size means that your training images are divided into batches of a fixed size, and at every step the network trains on that fixed number of images. It's recommended to use a batch size that is a power of 2, since the number of physical processors is often a power of 2 and using a batch size that is not can lead to poor performance. Also, a very large batch size can lead to memory errors, so make sure that the machine you run your code on has sufficient RAM to handle the specified batch size.
training_iters = 200
learning_rate = 0.001 
batch_size = 128

Network Parameters

Next, you need to define the network parameters. Firstly, you define n_input, the height and width of the input image. Each image is initially loaded as a 784-dimensional vector, which you have already reshaped to a 28 x 28 x 1 matrix, so n_input is 28. Secondly, you define the number of classes, which is nothing else than the number of class labels.

# Fashion-MNIST data input (img shape: 28*28)
n_input = 28

# Fashion-MNIST total classes (10 fashion categories, labels 0-9)
n_classes = 10

Now is the time to use those placeholders, about which you read previously in this tutorial. You will define an input placeholder x, which will have a dimension of None x 28 x 28 x 1. To reiterate, placeholders allow you to perform operations and build your computation graph without feeding in data.

Similarly, y will hold the labels of the training images in the form of a None x 10 matrix.

The row dimension is None because that dimension is only determined when you feed data into the placeholders. Since you set the batch size to 128, 128 will be the row dimension of the placeholders during training.

#both placeholders are of type float
x = tf.placeholder("float", [None, 28,28,1])
y = tf.placeholder("float", [None, n_classes])

Creating wrappers for simplicity

In your network architecture model, you will have multiple convolution and max-pooling layers. In such cases, it's a good idea to define convolution and max-pooling functions so that you can call them as many times as you want in your network.

  • In the conv2d() function you pass 4 arguments: input x, weights W, bias b and strides. This last argument is by default set to 1, but you can always play with it to see how the network performs. The first and last stride must always be 1, because the first is for the image-number and the last is for the input-channel (since the image is a gray-scale image which has only one channel). After applying the convolution, you will add bias and apply an activation function that is called Rectified Linear Unit (ReLU).
  • The max-pooling function is simple: it has the input x and a kernel size k, which is set to be 2. This means that the max-pooling filter will be a square matrix with dimensions 2 x 2 and the stride by which the filter will move in is also 2.

You will use padding equal to 'SAME', which ensures that while performing the convolution operations the boundary pixels of the image are not left out: padding equal to 'SAME' basically adds zeros at the boundaries of the input and allows the convolution filter to access the boundary pixels as well.

Similarly, in the max-pooling operation, padding equal to 'SAME' will add zeros. Later, when you define the weights and the biases, you will notice that an input of size 28 x 28 is downsampled to 4 x 4 after applying three max-pooling layers.
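You can verify that 28 x 28 ends up as 4 x 4 with a couple of lines of plain arithmetic: with 'SAME' padding and a stride of 2, each max-pooling layer roughly halves the spatial size, rounding up.

import math

size = 28
for layer in range(1, 4):
    size = int(math.ceil(size / 2.0))   # 'SAME' padding with stride 2: ceil(n / 2)
    print('after max-pooling layer', layer, ':', size, 'x', size)
# after max-pooling layer 1 : 14 x 14
# after max-pooling layer 2 : 7 x 7
# after max-pooling layer 3 : 4 x 4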

def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x) 

def maxpool2d(x, k=2):
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],padding='SAME')

After you have defined the conv2d and maxpool2d wrappers, you can now define your weights and biases variables. So, let's get started!

But first, let’s understand each weight and bias parameter step by step. You will create two dictionaries, one for weight and the second for the bias parameter.

  • As you may recall from the figure above, the first convolution layer has 32 filters of size 3x3, so the first key (wc1) in the weight dictionary has a shape argument that takes a tuple with 4 values: the first and second are the filter size, the third is the number of channels in the input image, and the last is the number of convolution filters you want in the first convolution layer. The first key in the biases dictionary, bc1, will have 32 bias parameters.

  • Similarly, the second key (wc2) of the weight dictionary has a shape parameter that takes a tuple with 4 values: the first and second again refer to the filter size, and the third represents the number of channels from the previous output. Since you pass 32 convolution filters over the input image, you will have 32 channels as the output of the first convolution layer. The last value is the number of filters you want in the second convolution layer. Note that the second key in the biases dictionary, bc2, will have 64 parameters.

You will do the same for the third convolution layer.

  • Now, it's important to understand the fourth key (wd1). After applying 3 convolution and max-pooling operations, you have downsampled the input image spatially from 28 x 28 to 4 x 4, and the third convolution layer outputs 128 channels, so its output has shape 4 x 4 x 128. You now need to flatten this downsampled output to feed it as input to the fully connected layer. That's why the first element of the shape tuple is the multiplication 4*4*128: the 4 x 4 spatial size times the 128 channels produced by convolution layer 3. The second element of the tuple is the number of neurons you want in the fully connected layer. Similarly, in the biases dictionary, the fourth key bd1 has 128 parameters.
    You will follow the same logic for the last fully connected layer, in which the number of neurons will be equivalent to the number of classes.
weights = {
    'wc1': tf.get_variable('W0', shape=(3,3,1,32), initializer=tf.contrib.layers.xavier_initializer()), 
    'wc2': tf.get_variable('W1', shape=(3,3,32,64), initializer=tf.contrib.layers.xavier_initializer()), 
    'wc3': tf.get_variable('W2', shape=(3,3,64,128), initializer=tf.contrib.layers.xavier_initializer()), 
    'wd1': tf.get_variable('W3', shape=(4*4*128,128), initializer=tf.contrib.layers.xavier_initializer()), 
    'out': tf.get_variable('W6', shape=(128,n_classes), initializer=tf.contrib.layers.xavier_initializer()), 
}
biases = {
    'bc1': tf.get_variable('B0', shape=(32), initializer=tf.contrib.layers.xavier_initializer()),
    'bc2': tf.get_variable('B1', shape=(64), initializer=tf.contrib.layers.xavier_initializer()),
    'bc3': tf.get_variable('B2', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'bd1': tf.get_variable('B3', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('B4', shape=(10), initializer=tf.contrib.layers.xavier_initializer()),
}
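If you want a quick sanity check of these shapes, the short sketch below (using the same shapes as the dictionaries above, with made-up key names) counts the trainable parameters: each convolution layer has filter_height * filter_width * input_channels * output_channels weights plus one bias per output channel.

# Rough parameter count for the shapes defined above (sketch only).
layer_shapes = {
    'wc1': (3, 3, 1, 32),       'bc1': (32,),
    'wc2': (3, 3, 32, 64),      'bc2': (64,),
    'wc3': (3, 3, 64, 128),     'bc3': (128,),
    'wd1': (4 * 4 * 128, 128),  'bd1': (128,),
    'w_out': (128, 10),         'b_out': (10,),
}

total = 0
for name in sorted(layer_shapes):
    count = 1
    for dim in layer_shapes[name]:
        count *= dim
    total += count
    print(name, layer_shapes[name], '->', count, 'parameters')
print('total trainable parameters:', total)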

Now, it's time to define the network architecture! Unfortunately, this is not as simple as it is in the Keras framework!

The conv_net() function takes 3 arguments as an input: the input x and the weights and biases dictionaries. Again, let’s go through the construction of the network step by step:

  • Firstly, recall that you already reshaped the 784-dimensional input vectors to 28 x 28 x 1 matrices (train_X and test_X above), so the input x is fed directly to the first convolution layer. The -1 in that reshape() call means that the first (batch) dimension is inferred automatically, while the remaining dimensions are fixed at 28 x 28 x 1.
  • Next, as shown in the architecture figure, you define conv1, which takes the input image together with the weights wc1 and biases bc1. You then apply max-pooling to the output of conv1, and you repeat this convolution-plus-pooling pattern up to conv3.
  • Since your task is to decide which class label a given image belongs to, after passing through all the convolution and max-pooling layers you flatten the output of conv3. Next, you connect the flattened conv3 neurons with each and every neuron in the next layer. Then you apply an activation function to the output of the fully connected layer fc1.
  • Finally, in the last layer, you will have 10 neurons since you have to classify 10 labels. That means you will connect all the neurons of fc1 with the 10 neurons of the output layer.
def conv_net(x, weights, biases):  

    # here we call the conv2d function we had defined above and pass the input image x, weights wc1 and bias bc1.
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Max Pooling (down-sampling), this chooses the max value from a 2*2 matrix window and outputs a 14*14 matrix.
    conv1 = maxpool2d(conv1, k=2)

    # Convolution Layer
    # here we call the conv2d function we had defined above and pass the input image x, weights wc2 and bias bc2.
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Max Pooling (down-sampling), this chooses the max value from a 2*2 matrix window and outputs a 7*7 matrix.
    conv2 = maxpool2d(conv2, k=2)

    conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
    # Max Pooling (down-sampling), this chooses the max value from a 2*2 matrix window and outputs a 4*4.
    conv3 = maxpool2d(conv3, k=2)


    # Fully connected layer
    # Reshape conv3 output to fit the fully connected layer input
    fc1 = tf.reshape(conv3, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Output, class prediction
    # finally we multiply the fully connected layer with the weights and add a bias term. 
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out

Loss and Optimizer Nodes

You will start with constructing a model and call the conv_net() function by passing in input x, weights and biases. Since this is a multi-class classification problem, you will use softmax activation on the output layer. This will give you probabilities for each class label. The loss function you use is cross entropy.

The reason you use cross entropy as a loss function is that the cross-entropy function's value is always positive and tends toward zero as the neuron gets better at computing the desired output y for all training inputs x. These are both properties you would intuitively expect of a cost function. It also avoids the problem of learning slowing down: even if the weights and biases are initialized badly, it helps the network recover faster and does not hamper the training phase much.
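A tiny numeric illustration of this property (with made-up probabilities): the cross entropy -sum(y * log(p)) is small when the predicted probability of the true class is high, and large when it is low.

import numpy as np

y_true = np.array([0., 0., 1.])            # one-hot ground truth: the sample belongs to class 2

good_pred = np.array([0.05, 0.05, 0.90])   # confident and correct
bad_pred = np.array([0.60, 0.30, 0.10])    # confident and wrong

def cross_entropy(y, p):
    return -np.sum(y * np.log(p))

print(cross_entropy(y_true, good_pred))    # ~0.105 (small)
print(cross_entropy(y_true, bad_pred))     # ~2.303 (large)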

In TensorFlow, you define both the activation and the cross entropy loss function in one line. You pass two parameters, which are the predicted output and the ground truth labels y. You then take the mean (reduce_mean) over all the examples in the batch to get a single loss/cost value.

Next, you define one of the most popular optimization algorithms: the Adam optimizer. You can read more about the optimizer here. You specify the learning rate and explicitly tell it to minimize the cost that you calculated in the previous step.

pred = conv_net(x, weights, biases)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

Evaluate Model Node

To test your model, let's define two more nodes: correct_prediction and accuracy. They will evaluate your model after every training iteration, which will help you keep track of its performance. After every iteration the model is tested on the 10,000 test images, which it never sees during the training phase.

You can always save the graph and run the testing part later as well. But for now, you will test within the session.

# Here you check whether the index of the maximum value of the prediction equals the index of the actual label; both argmax results are vectors with one entry per image.
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))

#calculate accuracy across all the given images and average them out. 
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

Remember that your weights and biases are variables and that you have to initialize them before you can make use of them. So let’s do that with the following line of code:

# Initializing the variables
init = tf.global_variables_initializer()

Training and Testing the Model

When you train and test your model in TensorFlow, you go through the following steps:

  • You start off by launching the graph in a session, the class that runs all the TensorFlow operations. All the operations have to be inside the indented with block.
  • Then you run the init op in the session, which initializes the variables that you defined in the previous step.
  • Next, you define a for loop that runs for the number of training iterations you had specified in the beginning. Right after that, you’ll initiate a second for loop, which is for the number of batches that you will have based on the batch size you chose, so you divide the total number of images by the batch size.
  • You will then input the images based on the batch size you pass in batch_x and their respective labels in batch_y.
  • Now is the most important step. Just like you ran the initializer after creating the graph, now you feed the placeholders x and y the actual data in a dictionary and run the session by passing the cost and the accuracy that you had defined earlier. It returns the loss (cost) and accuracy.
  • You can print the loss and training accuracy after each epoch (training iteration) is completed.

After each training iteration is completed, you run only the accuracy by passing all the 10000 test images and labels. This will give you an idea of how accurately your model is performing while it is training.

It's usually recommended to do the testing only once your model has finished training, and to use a separate validation set after each epoch while it is in the training phase. However, let's stick with this approach for now.

with tf.Session() as sess:
    sess.run(init) 
    train_loss = []
    test_loss = []
    train_accuracy = []
    test_accuracy = []
    summary_writer = tf.summary.FileWriter('./Output', sess.graph)
    for i in range(training_iters):
        for batch in range(len(train_X)//batch_size):
            batch_x = train_X[batch*batch_size:min((batch+1)*batch_size,len(train_X))]
            batch_y = train_y[batch*batch_size:min((batch+1)*batch_size,len(train_y))]    
            # Run optimization op (backprop)
            opt = sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss and accuracy on the current batch
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x, y: batch_y})
        print("Iter " + str(i) + ", Loss= " + \
                      "{:.6f}".format(loss) + ", Training Accuracy= " + \
                      "{:.5f}".format(acc))
        print("Optimization Finished!")

        # Calculate accuracy for all 10000 mnist test images
        test_acc,valid_loss = sess.run([accuracy,cost], feed_dict={x: test_X, y: test_y})
        train_loss.append(loss)
        test_loss.append(valid_loss)
        train_accuracy.append(acc)
        test_accuracy.append(test_acc)
        print("Testing Accuracy:","{:.5f}".format(test_acc))
    summary_writer.close()
Iter 0, Loss= 0.338081, Training Accuracy= 0.87500
Optimization Finished!
('Testing Accuracy:', '0.83890')
Iter 1, Loss= 0.210727, Training Accuracy= 0.91406
Optimization Finished!
('Testing Accuracy:', '0.87810')
Iter 2, Loss= 0.169724, Training Accuracy= 0.95312
Optimization Finished!
('Testing Accuracy:', '0.89260')
Iter 3, Loss= 0.154453, Training Accuracy= 0.93750
Optimization Finished!
('Testing Accuracy:', '0.89600')
Iter 4, Loss= 0.143760, Training Accuracy= 0.93750
Optimization Finished!
('Testing Accuracy:', '0.89610')
Iter 5, Loss= 0.142700, Training Accuracy= 0.93750
Optimization Finished!
('Testing Accuracy:', '0.89680')
Iter 6, Loss= 0.114542, Training Accuracy= 0.94531
Optimization Finished!
('Testing Accuracy:', '0.90190')
Iter 7, Loss= 0.104471, Training Accuracy= 0.94531
Optimization Finished!
('Testing Accuracy:', '0.90100')
Iter 8, Loss= 0.089115, Training Accuracy= 0.96094
Optimization Finished!
('Testing Accuracy:', '0.90360')
Iter 9, Loss= 0.090392, Training Accuracy= 0.96094
Optimization Finished!
('Testing Accuracy:', '0.90420')
Iter 10, Loss= 0.066802, Training Accuracy= 0.98438
Optimization Finished!
('Testing Accuracy:', '0.89960')
Iter 11, Loss= 0.062734, Training Accuracy= 0.98438
Optimization Finished!
('Testing Accuracy:', '0.89870')

......

Iter 196, Loss= 0.000044, Training Accuracy= 1.00000
Optimization Finished!
('Testing Accuracy:', '0.91750')
Iter 197, Loss= 0.000633, Training Accuracy= 1.00000
Optimization Finished!
('Testing Accuracy:', '0.91110')
Iter 198, Loss= 0.000028, Training Accuracy= 1.00000
Optimization Finished!
('Testing Accuracy:', '0.91830')
Iter 199, Loss= 0.000206, Training Accuracy= 1.00000
Optimization Finished!
('Testing Accuracy:', '0.91870')

The test accuracy looks impressive. It turns out that your classifier does better than the benchmark that was reported here, which is an SVM classifier with a mean accuracy of 0.897. Also, the model does well compared to some of the deep learning models mentioned on the GitHub profile of the creators of the Fashion-MNIST dataset.

However, you saw that the model looked like it was overfitting since the training accuracy is more than the testing accuracy. Are these results really all that good?

Let’s put your model evaluation into perspective and plot the accuracy and loss plots between training and validation data:

plt.plot(range(len(train_loss)), train_loss, 'b', label='Training loss')
plt.plot(range(len(train_loss)), test_loss, 'r', label='Test loss')
plt.title('Training and Test loss')
plt.xlabel('Epochs ',fontsize=16)
plt.ylabel('Loss',fontsize=16)
plt.legend()
plt.figure()
plt.show()
<matplotlib.figure.Figure at 0x7feac8194250>

[Figure: training and test loss over the epochs]

plt.plot(range(len(train_loss)), train_accuracy, 'b', label='Training Accuracy')
plt.plot(range(len(train_loss)), test_accuracy, 'r', label='Test Accuracy')
plt.title('Training and Test Accuracy')
plt.xlabel('Epochs ',fontsize=16)
plt.ylabel('Accuracy',fontsize=16)
plt.legend()
plt.figure()
plt.show()
<matplotlib.figure.Figure at 0x7feac80419d0>

[Figure: training and test accuracy over the epochs]

From the above two plots, you can see that the test accuracy almost stagnated after 50-60 epochs and only rarely increased after that. In the beginning, the test accuracy rose steadily along with the decreasing loss, but afterwards it did not improve much.

The test loss shows the telltale sign of overfitting: like the training loss it decreased at first, but after 25-30 epochs it started to increase. This means that the model tried to memorize the training data and succeeded.

This was it for this tutorial, but there is a task for you all:

  • Your task is to reduce the overfitting of the above model by introducing the dropout technique. For simplicity, you may like to follow along with the tutorial Convolutional Neural Networks in Python with Keras; even though it uses Keras, the accuracy and loss heuristics are pretty much the same, and both tutorials use a very similar architecture, so following along with it will help you add dropout layers to your current model.
  • Secondly, try to improve the test accuracy, maybe by deepening the network a bit, adding learning rate decay for faster convergence, or playing with the optimizer, and so on!

Go Further and Master Deep Learning with TensorFlow!

This tutorial was a good start to understanding how TensorFlow works under the hood, along with an implementation of convolutional neural networks in Python. If you were able to follow along easily, or even with a little more effort, well done! Try doing some experiments, maybe with the same model architecture but using different public datasets. You could also play with different weight initializers, perhaps deepen the network architecture, change the learning rate, etc., and see how your network performs as you change these parameters. But change them one at a time; only then will you get a proper intuition about each of them.

There is still a lot to cover, so why not take DataCamp’s Deep Learning in Python course? In the meantime, also make sure to check out the TensorFlow documentation, if you haven’t done so already. You will find more examples and information on all functions, arguments, more layers, etc. It will undoubtedly be an indispensable resource when you’re learning how to work with neural networks in Python!
