TensorFlow Study Notes (3): Mnist For Experts

(First published: 2018-01-01 23:20:30; updated: 2018-01-20 06:51:33)


#### English version: TensorFlow MNIST For ML Beginner

Chinese version: TensorFlow MNIST for beginner


TensorFlow Study Notes (2): Mnist For Beginner

Link to this article on the CSDN blog:

1. Initialize the libraries and import the dataset (training and test data)

"""A deep MNIST classifier using convolutional layers.

See extensive documentation at
"""
# Disable linter warnings to maintain consistency with tutorial.
# pylint: disable=invalid-name
# pylint: disable=g-bad-import-order

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import sys
import tempfile

from tensorflow.examples.tutorials.mnist import input_data
#from tensorflow.python import debug as tfdbg
import tensorflow as tf

FLAGS = None

parser = argparse.ArgumentParser()
parser.add_argument('--data_dir', type=str, default='MNIST_data/', help='Directory for storing input data')
FLAGS, unparsed = parser.parse_known_args()
# Import data
mnist = input_data.read_data_sets(FLAGS.data_dir)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

2. The deep neural network function


1. Neural Networks and Deep Learning
2. Convolutional neural networks (CNN): basic theory




2.1 Convolutional layer 1 ("conv1")

  • reshape: converts $x_{[-1,784]}$ into $x_{image[-1,28,28,1]}$, i.e. each flat array [784] becomes a three-dimensional array [28,28,1], which is a 28*28 image with a colour depth of 1. The four dimensions of $x_{image}$ mean [batch, in_height, in_width, in_channels].
  • Parameter generation: the "parameters" here are the weights $W_{conv1[5,5,1,32]}$ and the biases $b_{conv1[32]}$. Coming from image feature extraction in pattern recognition, I had assumed the weights were hand-picked "templates" with particular characteristics, i.e. the detection operators people usually talk about. Reading the code shows that this is not the case: the weight matrix is generated entirely at random. See the source of weight_variable():
def weight_variable(shape):
  """weight_variable generates a weight variable of a given shape."""
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

Querying the function "tf.truncated_normal" with "?" gives the description: "Outputs random values from a truncated normal distribution"
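To make the random initialization concrete, here is a minimal pure-Python sketch of what a truncated normal sampler does: draw from a normal distribution and redraw any value that lies more than two standard deviations from the mean, which is the behaviour documented for tf.truncated_normal. The helper name `truncated_normal_sample`, the `seed` parameter, and the use of Python's `random` module are assumptions for illustration; this is not TensorFlow's implementation.

```python
import random


def truncated_normal_sample(count, mean=0.0, stddev=0.1, seed=0):
    """Draw `count` values from N(mean, stddev), redrawing outliers.

    Hypothetical helper: it mimics the documented behaviour of
    tf.truncated_normal (values beyond 2 standard deviations are re-picked).
    """
    rng = random.Random(seed)
    samples = []
    while len(samples) < count:
        value = rng.gauss(mean, stddev)
        # Keep only values within two standard deviations of the mean.
        if abs(value - mean) <= 2.0 * stddev:
            samples.append(value)
    return samples


# 5 * 5 * 1 * 32 = 800 values, the same count as the entries of W_conv1.
weights = truncated_normal_sample(5 * 5 * 1 * 32)
print(len(weights), min(weights), max(weights))
```

With stddev=0.1, every initial weight therefore lies in [-0.2, 0.2]; no hand-designed edge or corner template is involved.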


  • Convolution: let's look closely at what the convolution function actually does. Running ?tf.nn.conv2d shows the following summary:

    • conv2d convolves the input with the filter. Both must be 4-D tensors, with the following shapes:
      • Input: [batch, in_height, in_width, in_channels]

      • Filter (kernel): [filter_height, filter_width, in_channels, out_channels]


    • Reshaping of the filter and the input:
      • Filter tensor: flattened from [filter_height, filter_width, in_channels, out_channels] to [filter_height * filter_width * in_channels, output_channels], i.e. from [5,5,1,32] to [5*5*1,32];
      • Input tensor: image patches are extracted from [batch, in_height, in_width, in_channels] to form a virtual tensor of shape [batch, out_height, out_width,
        filter_height * filter_width * in_channels], i.e. from [N,28,28,1] to [N,28,28,5*5*1].
        Multiplying the two tensors ($x \cdot W$) then gives the shape $[N,28,28,5*5*1]\cdot [5*5*1,32]=[N,28,28,32]$
    • Image convolution operation:
    • Parameters of the convolution function in detail:
      • input: 4-D, of type half or float32; its layout is given by the last parameter, "data_format"
      • filter (kernel): 4-D, same type as the input, shape [filter_height, filter_width, in_channels, out_channels];
      • strides: the step length of the sliding window along each dimension (I don't know why it isn't called step); the meaning of each dimension is also given by the last parameter, "data_format"
      • padding: A string from: "SAME", "VALID". The type of padding algorithm to use.
      • use_cudnn_on_gpu: An optional bool. Defaults to True. Enables GPU support.
      • data_format: the default format is "NHWC", where the data is stored in the order [batch, height, width, channels].
        Alternatively, the format could be "NCHW", with the data storage order [batch, channels, height, width].
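The flatten-and-multiply view of conv2d described above can be sketched in pure Python for a single image with stride 1 and "SAME" padding. The function name `conv2d_same_im2col` and the nested-list layout are assumptions made for this illustration; the real tf.nn.conv2d runs an optimized kernel and is not implemented this way in Python.

```python
def conv2d_same_im2col(image, filters):
    """Stride-1, SAME-padded 2-D convolution of one image via patch flattening.

    image:   nested list [height][width][in_channels]
    filters: nested list [fh][fw][in_channels][out_channels]
    Returns: nested list [height][width][out_channels]
    """
    height, width, in_ch = len(image), len(image[0]), len(image[0][0])
    fh, fw = len(filters), len(filters[0])
    out_ch = len(filters[0][0][0])
    pad_h, pad_w = (fh - 1) // 2, (fw - 1) // 2

    # Step 1: flatten the filter to [fh * fw * in_ch][out_ch].
    flat_filter = [filters[i][j][ch]
                   for i in range(fh) for j in range(fw) for ch in range(in_ch)]

    output = []
    for row in range(height):
        out_row = []
        for col in range(width):
            # Step 2: flatten the patch around (row, col) to fh * fw * in_ch
            # values, using zeros outside the image (SAME padding).
            patch = []
            for i in range(fh):
                for j in range(fw):
                    src_row, src_col = row + i - pad_h, col + j - pad_w
                    inside = 0 <= src_row < height and 0 <= src_col < width
                    for ch in range(in_ch):
                        patch.append(image[src_row][src_col][ch] if inside else 0.0)
            # Step 3: [fh*fw*in_ch] . [fh*fw*in_ch, out_ch] -> [out_ch]
            out_row.append([sum(p * w[k] for p, w in zip(patch, flat_filter))
                            for k in range(out_ch)])
        output.append(out_row)
    return output


# One 28x28x1 image of ones and 32 all-ones 5x5x1 filters.
image = [[[1.0] for _ in range(28)] for _ in range(28)]
filters = [[[[1.0] * 32] for _ in range(5)] for _ in range(5)]
result = conv2d_same_im2col(image, filters)
print(len(result), len(result[0]), len(result[0][0]))  # 28 28 32
print(result[14][14][0], result[0][0][0])              # 25.0 9.0
```

The centre pixel sees all 5*5 = 25 inputs, while the corner pixel overlaps only a 3*3 region of the image because the rest of its window falls on zero padding.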



For $y=x \cdot w+b$, where $x=[x_1,x_2,...,x_n]$, $y=[y_1,y_2,...,y_m]$, $b=[b_1,b_2,...,b_m]$:

$$
\left( \begin{array}{c}
y_1,y_2,\dots,y_m\end{array} \right) =
\left( \begin{array}{c}
x_1,x_2,\dots,x_n\end{array} \right)\cdot
\left( \begin{array}{cccc}
w_{1,1} & w_{1,2} & \dots & w_{1,m}\\
w_{2,1} & w_{2,2} & \dots & w_{2,m}\\
\vdots & \vdots & \ddots & \vdots\\
w_{n,1} & w_{n,2} & \dots & w_{n,m}\end{array} \right) +
\left( \begin{array}{c}
b_1,b_2,\dots,b_m\end{array} \right)
$$

Appending a constant 1 to $x$ folds the bias into the weight matrix as an extra row:

$$
\left( \begin{array}{c}
y_1,y_2,\dots,y_m\end{array} \right) =
\left( \begin{array}{c}
x_1,x_2,\dots,x_n,1\end{array} \right)\cdot
\left( \begin{array}{cccc}
w_{1,1} & w_{1,2} & \dots & w_{1,m}\\
w_{2,1} & w_{2,2} & \dots & w_{2,m}\\
\vdots & \vdots & \ddots & \vdots\\
w_{n,1} & w_{n,2} & \dots & w_{n,m}\\
b_{1} & b_{2} & \dots & b_{m}\end{array} \right)
$$

that is, $\vec Y =\vec X \cdot \vec {W_B}$.

The squared loss is $E^2=(\vec Y - \vec {Y_r})^2$. For which value of the parameter matrix $\vec {W_B}$ is $E^2$ smallest (where $\vec {Y_r}$ is the true value that $\vec Y$ estimates)? Clearly $E^2$ reaches its minimum when $\vec Y = \vec {Y_r}$, that is, when

$$\vec {Y_r} = \vec X \cdot \vec {W_B} \qquad \text{(Equation 1)}$$

Since $\vec {Y_r}$ and $\vec X$ are known, the problem becomes how to make $\vec {W_B}$ satisfy Equation 1. Gradient descent! Find the descent direction of $E^2$ with respect to $\vec {W_B}$. The gradient is

$$dT=\frac{dE^2}{dW_B}=\frac{d(\vec {Y_r} - \vec X \cdot \vec {W_B})^2}{dW_B}=-2\,\vec X^T(\vec {Y_r} - \vec X \cdot \vec {W_B})$$

so

$$\vec {W_B} = \vec {W_B} - r \cdot dT=\vec {W_B}+2r\,\vec X^T(\vec {Y_r} - \vec X \cdot \vec {W_B}) \qquad \text{(Equation 2)}$$

where r is the descent rate, which can be adjusted by hand as needed; for now we set r=0.01. In Equation 2, $r$, $\vec X$, $\vec {Y_r}$ and the previous value of $\vec {W_B}$ are all known, so the next $\vec {W_B}$ can be computed.

Audience: "But you still haven't explained what relu does!"

Me: "The reference articles above don't explain it clearly either!" But as I see it, for relu, $\vec {Y_r} = \vec X \cdot \vec {W_B}$ (Equation 1) in effect weights the basis x by $W_B$ to fit any point in the space spanned by that basis. Simplified to two dimensions: with a basis $\vec x_1 =(1,0),\vec x_2=(0,1)$, a set of weights $W_B$ can represent any vector $(x,y)$ in that two-dimensional space:

$$
\left( \begin{array}{c}
x,y\end{array} \right) =
\left( \begin{array}{c}
\vec x_1 , \vec x_2
\end{array} \right)\cdot
\left( \begin{array}{c}
w_{1} \\
w_{2} \end{array} \right)=
\left( \begin{array}{c}
\vec x_1 , \vec x_2
\end{array} \right)\cdot
\left( \begin{array}{c}
x \\
y \end{array} \right)=
x*\vec x_1+y*\vec x_2
$$
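As a small numerical check of the gradient-descent idea, the sketch below applies the update $\vec {W_B} \leftarrow \vec {W_B} + 2r\,\vec X^T(\vec {Y_r} - \vec X \cdot \vec {W_B})$, which is the descent step for $E^2=(\vec X \cdot \vec {W_B} - \vec {Y_r})^2$, to a single sample with r=0.01. The function name `fit_linear`, the zero initialization, and the sample values are assumptions chosen for illustration only.

```python
def fit_linear(x_aug, y_true, rate=0.01, steps=500):
    """Gradient descent on E^2 = (X . W_B - Y_r)^2 for a single sample.

    x_aug:  input row vector with a trailing 1 appended (length n + 1)
    y_true: target row vector Y_r (length m)
    Returns W_B as an (n + 1) x m nested list.
    """
    n_aug, m = len(x_aug), len(y_true)
    w_b = [[0.0] * m for _ in range(n_aug)]
    for _ in range(steps):
        # Forward pass: Y = X . W_B
        y_pred = [sum(x_aug[i] * w_b[i][j] for i in range(n_aug))
                  for j in range(m)]
        # Residual Y_r - X . W_B
        residual = [y_true[j] - y_pred[j] for j in range(m)]
        # Update: W_B <- W_B + 2 r X^T (Y_r - X . W_B)
        for i in range(n_aug):
            for j in range(m):
                w_b[i][j] += 2.0 * rate * x_aug[i] * residual[j]
    return w_b


x_aug = [1.0, 2.0, 1.0]   # x = (1, 2) plus the constant 1 for the bias row
y_true = [3.0, -1.0]      # Y_r
w_b = fit_linear(x_aug, y_true)
y_fit = [sum(x_aug[i] * w_b[i][j] for i in range(3)) for j in range(2)]
print(y_fit)              # approaches Y_r = [3.0, -1.0]
```

For this sample each step shrinks the residual by the factor 1 - 2 * r * |X|^2 = 0.88, so the fit converges quickly; a larger r would converge faster until that factor drops below -1, at which point the iteration diverges.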

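The code uses two convolutions. The first convolves the input sample images (shape [-1,28,28,1] after the reshape) with 32 kernels of shape [5,5,1]. The second convolves the pooled output of the first convolutional layer (shape [-1,14,14,32]) with 64 kernels of shape [5,5,32], giving a convolution output of shape [-1,14,14,64], which becomes [-1,7,7,64] after pooling.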
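The shapes quoted above follow from the "SAME" padding rule, under which the spatial output size is ceil(input_size / stride). The following sketch traces one image through the four layers; the helper name `same_output_size` and the `layers` table are illustrative assumptions, not part of the tutorial code.

```python
import math


def same_output_size(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return math.ceil(size / stride)


shape = (28, 28, 1)            # one reshaped MNIST image
layers = [
    ("conv1", 1, 32),          # (name, stride, out_channels)
    ("pool1", 2, None),        # pooling keeps the channel count
    ("conv2", 1, 64),
    ("pool2", 2, None),
]
shapes = {}
for name, stride, out_channels in layers:
    height = same_output_size(shape[0], stride)
    width = same_output_size(shape[1], stride)
    channels = shape[2] if out_channels is None else out_channels
    shape = (height, width, channels)
    shapes[name] = shape
    print(name, shape)
```

The final (7, 7, 64) shape is why the first fully connected layer takes 7 * 7 * 64 inputs.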


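"Pooling" is really just compressed sampling: the information in one unit stands in for the information in several units. The pooling function is tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None), where:
- value: a tensor of shape [batch, height, width, channels] and type tf.float32, e.g. [50,28,28,1]
- ksize: the size of the (sliding) window along each dimension of the input tensor, e.g. a window of [1,2,2,1]
- strides: the step of the sliding window along each dimension, e.g. [1,2,2,1]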
- padding: string ,‘VALID’ or ‘SAME’
- data_format:A string. ‘NHWC’ and ‘NCHW’ are supported.
- name:…
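To illustrate what ksize=[1,2,2,1] with strides=[1,2,2,1] does to a single feature map, here is a minimal pure-Python sketch of 2x2 max pooling. The function name `max_pool_2x2_lists` and the sample values are assumptions for illustration; it handles one channel of one image and assumes even height and width, so no padding is needed.

```python
def max_pool_2x2_lists(feature_map):
    """2x2 max pooling with stride 2 on a [height][width] nested list.

    Assumes even height and width, so every window lies inside the map.
    """
    pooled = []
    for row in range(0, len(feature_map), 2):
        pooled_row = []
        for col in range(0, len(feature_map[0]), 2):
            window = [feature_map[row][col], feature_map[row][col + 1],
                      feature_map[row + 1][col], feature_map[row + 1][col + 1]]
            # One value (the maximum) represents the four values in the window.
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2x2_lists(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

Each output value summarizes a 2x2 block, which is how a 28x28 map becomes 14x14 and a 14x14 map becomes 7x7.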


def deepnn(x):
    """deepnn builds the graph for a deep net for classifying digits.

    Args:
      x: an input tensor with the dimensions (N_examples, 784), where 784 is the
      number of pixels in a standard MNIST image.

    Returns:
      A tuple (y, keep_prob). y is a tensor of shape (N_examples, 10), with values
      equal to the logits of classifying the digit into one of 10 classes (the
      digits 0-9). keep_prob is a scalar placeholder for the probability of
      dropout.
    """
    # Reshape to use within a convolutional neural net.
    # Last dimension is for "features" - there is only one here, since images are
    # grayscale -- it would be 3 for an RGB image, 4 for RGBA, etc.
    with tf.name_scope('reshape'):
        x_image = tf.reshape(x, [-1, 28, 28, 1])

    # First convolutional layer - maps one grayscale image to 32 feature maps.
    with tf.name_scope('conv1'):
        # weight_variable generates a weight variable of a given shape;
        # W_conv1 is drawn at random from a truncated normal distribution.
        W_conv1 = weight_variable([5, 5, 1, 32])
        b_conv1 = bias_variable([32])
        h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
        # print(W_conv1.name)

    # Pooling layer - downsamples by 2X.
    with tf.name_scope('pool1'):
        h_pool1 = max_pool_2x2(h_conv1)

    # Second convolutional layer -- maps 32 feature maps to 64.
    with tf.name_scope('conv2'):
        W_conv2 = weight_variable([5, 5, 32, 64])
        b_conv2 = bias_variable([64])
        h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

    # Second pooling layer.
    with tf.name_scope('pool2'):
        h_pool2 = max_pool_2x2(h_conv2)

    # Fully connected layer 1 -- after 2 round of downsampling, our 28x28 image
    # is down to 7x7x64 feature maps -- maps this to 1024 features.
    with tf.name_scope('fc1'):
        W_fc1 = weight_variable([7 * 7 * 64, 1024])
        b_fc1 = bias_variable([1024])

        h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
        h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    # Dropout - controls the complexity of the model, prevents co-adaptation of
    # features.
    with tf.name_scope('dropout'):
        keep_prob = tf.placeholder(tf.float32)
        h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    # Map the 1024 features to 10 classes, one for each digit
    with tf.name_scope('fc2'):
        W_fc2 = weight_variable([1024, 10])
        b_fc2 = bias_variable([10])
        y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
    return y_conv, keep_prob

def conv2d(x, W):
  """conv2d returns a 2d convolution layer with full stride."""
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  """max_pool_2x2 downsamples a feature map by 2X."""
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

def weight_variable(shape):
  """weight_variable generates a weight variable of a given shape."""
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  """bias_variable generates a bias variable of a given shape."""
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

# Create the model
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.int64, [None])  # note the shape: integer labels, not one-hot
y_conv, keep_prob = deepnn(x)
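The keep_prob placeholder returned by deepnn controls the dropout layer. As documented for tf.nn.dropout, each element is kept with probability keep_prob and scaled by 1 / keep_prob, otherwise set to 0. A minimal pure-Python sketch of that rule follows; the function name `dropout`, the `seed` parameter, and the all-ones input are assumptions for illustration.

```python
import random


def dropout(values, keep_prob, seed=0):
    """Inverted dropout: zero each value with probability 1 - keep_prob and
    scale the survivors by 1 / keep_prob so the expected sum is unchanged."""
    rng = random.Random(seed)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


activations = [1.0] * 1024                   # stands in for one row of h_fc1
train_out = dropout(activations, keep_prob=0.5)
eval_out = dropout(activations, keep_prob=1.0)

print(sorted(set(train_out)))                # values are either 0.0 or 2.0
print(sum(train_out) / len(train_out))       # close to 1.0 on average
print(eval_out == activations)               # keep_prob = 1.0 changes nothing
```

This is why training feeds keep_prob: 0.5 while accuracy evaluation feeds keep_prob: 1.0.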


Flowchart of the whole procedure (N=50):

1. Input: x{N,784}, y_{N}
2. reshape: x_img{N,28,28,1}
3. Random set: W_c1{5,5,1,32}, b_c1{32}
4. r_c1{N,28,28,32} = conv1(x_img, W_c1) + b_c1
5. h_c1{N,28,28,32} = Relu(r_c1)
6. h_p1{N,14,14,32} = pool()
7. Random set: W_c2{5,5,32,64}, b_c2{64}
8. r_c2{N,14,14,64} = conv1(h_p1, W_c2) + b_c2
9. h_c2{N,14,14,64} = Relu(r_c2)
10. h_p2{N,7,7,64} = pool()
11. reshape: h_p2f{N,7×7×64}
12. Random set: W_fc1{7×7×64,1024}, b_fc1{1024}
13. r_fc1{N,1024} = h_p2f * W_fc1 + b_fc1
14. h_fc1{N,1024} = Relu(r_fc1)
15. Random set: W_fc2[1024,10], b_fc2[10]
16. y{N,10} = h_fc1 * W_fc2 + b_fc2
17. losst = tf.losses.sparse_softmax_cross_entropy(y_, y)
18. cross_entropy = mean()
19. optimizer.minimize(cross_entropy)
20. Below loss_th? yes: End; no: update all parameters W, b



The flowchart fails to render here, but it renders correctly in the online blog post: TensorFlow Study Notes (3): Mnist For Experts


with tf.Session() as sess:
    batch = mnist.train.next_batch(50)
    print("x= ",sess.run(x, feed_dict={x: batch[0], y_: batch[1]}))
    print("x_shape= ",sess.run(x, feed_dict={x: batch[0], y_: batch[1],keep_prob: 0.5}).shape)
    #x.eval(feed_dict={x: batch[0], y_: batch[1]}).shape

    print("y_= ",sess.run(y_, feed_dict={x: batch[0], y_: batch[1]}))
    print("Y_shape= ",sess.run(y_, feed_dict={x: batch[0], y_: batch[1],keep_prob:0.5}).shape)
(50, 784)


TensorShape([Dimension(None), Dimension(784)])

x=  [[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]
x_shape=  (50, 784)


y_=  [4 4 1 5 2 6 3 0 9 5 4 5 3 3 1 4 4 5 6 5 8 2 9 6 7 5 6 7 5 9 4 6 8 9 1 1 0
 9 8 9 6 2 7 8 5 1 3 9 1 9]
Y_shape=  (50,)

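mysession = tf.Session()
#mysession = tfdbg.LocalCLIDebugWrapperSession(mysession)
# Variables must be initialized before y_conv can be evaluated.
mysession.run(tf.global_variables_initializer())
batch = mnist.train.next_batch(50)
print("y_conv= ",mysession.run(y_conv[0], feed_dict={x: batch[0], y_: batch[1],keep_prob:0.5}))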
y_conv=  [-7.92335463 -4.57033014  7.43492842 -2.24800444  3.5418272   1.91789472
  0.16387397  9.29159927 -3.02358389  4.79558516]
with tf.Session() as sess:
    # A new session starts with uninitialized variables.
    sess.run(tf.global_variables_initializer())
    batch = mnist.train.next_batch(50)
    print("y_conv= ",sess.run(y_conv[0], feed_dict={x: batch[0], y_: batch[1],keep_prob:0.5}))
    #print("y_conv_shape= ",sess.run(y_, feed_dict={x: batch[0], y_: batch[1]}).shape)
    print("keep_prob= ",sess.run(keep_prob, feed_dict={x: batch[0], y_: batch[1],keep_prob:0.5}))

TensorShape([Dimension(None), Dimension(10)])

y_conv=  [ -4.11408997   0.14374971  -0.55855268   1.44443858  -1.96026313
  -3.75602436   5.98966217   0.26240706 -15.8025341   -7.37206697]


keep_prob=  0.5



For how name_scope is used, see the study note on name and variable scope.

# Define loss and optimizer

# Build the graph for the deep net

with tf.name_scope('loss'):
    cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y_conv)
cross_entropy = tf.reduce_mean(cross_entropy)

with tf.name_scope('adam_optimizer'):
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

with tf.name_scope('accuracy'):
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), y_)
    correct_prediction = tf.cast(correct_prediction, tf.float32)
accuracy = tf.reduce_mean(correct_prediction)
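What the loss and accuracy nodes compute can be sketched in pure Python: the sparse softmax cross entropy of one example is -log(softmax(logits)[label]), and accuracy is the fraction of rows whose arg-max logit equals the integer label. The function names and the three-class logits below are assumptions for illustration; the TensorFlow ops work on whole tensors and are not implemented like this.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """Cross entropy -log(softmax(logits)[label]) for one example."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[label]


def batch_accuracy(batch_logits, labels):
    """Fraction of rows whose arg-max logit equals the integer label."""
    hits = sum(1 for row, label in zip(batch_logits, labels)
               if row.index(max(row)) == label)
    return hits / len(labels)


batch_logits = [
    [2.0, 0.5, -1.0],   # predicts class 0
    [0.1, 0.2, 3.0],    # predicts class 2
    [1.0, 1.5, 0.0],    # predicts class 1
]
labels = [0, 2, 0]      # the last example is misclassified

losses = [sparse_softmax_cross_entropy(row, label)
          for row, label in zip(batch_logits, labels)]
mean_loss = sum(losses) / len(losses)
acc = batch_accuracy(batch_logits, labels)
print(mean_loss, acc)
```

Because the labels are plain class indices rather than one-hot vectors, y_ has shape [None] and can be compared directly with tf.argmax(y_conv, 1).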

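graph_location = tempfile.mkdtemp()
print('Saving graph to: %s' % graph_location)
train_writer = tf.summary.FileWriter(graph_location)
# Write the graph definition so that it can be viewed in TensorBoard.
train_writer.add_graph(tf.get_default_graph())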
Saving graph to: /tmp/tmpxvy7zlf5
with tf.Session() as sess:
    batch = mnist.train.next_batch(50)
    print("x= ",sess.run(x, feed_dict={x: batch[0], y_: batch[1]}))
    print("x_shape= ",sess.run(x, feed_dict={x: batch[0], y_: batch[1]}).shape)
    #x.eval(feed_dict={x: batch[0], y_: batch[1]}).shape

    print("y_= ",sess.run(y_, feed_dict={x: batch[0], y_: batch[1]}))
    print("Y_shape= ",sess.run(y_, feed_dict={x: batch[0], y_: batch[1]}).shape)
    #print("y_conv= ",sess.run(y_conv, feed_dict={x: batch[0], y_: batch[1]}))
    #print("y_conv_shape= ",sess.run(y_, feed_dict={x: batch[0], y_: batch[1]}).shape)
   # print("keep_prob= ",sess.run(keep_prob, feed_dict={x: batch[0], y_: batch[1]}))

(50, 784)


TensorShape([Dimension(None), Dimension(784)])

x=  [[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]
x_shape=  (50, 784)


y_=  [8 3 0 6 2 5 3 0 3 6 0 6 3 2 5 1 0 9 0 6 0 7 9 2 4 8 1 2 9 3 8 7 2 2 4 6 2
 1 9 1 6 2 8 4 5 7 8 7 1 7]
Y_shape=  (50,)

TensorShape([Dimension(None), Dimension(10)])

with tf.Session() as sess:
    # Initialize all variables before training.
    sess.run(tf.global_variables_initializer())
    for i in range(20000):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
            # Evaluate on the current batch with dropout switched off.
            train_accuracy = accuracy.eval(feed_dict={
                x: batch[0], y_: batch[1], keep_prob: 1.0})
            print('step %d, training accuracy %g' % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    print('test accuracy %g' % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
step 0, training accuracy 0.1
step 100, training accuracy 0.84
step 200, training accuracy 0.94
step 300, training accuracy 0.86
step 400, training accuracy 0.9
step 500, training accuracy 0.98
step 600, training accuracy 0.92
step 700, training accuracy 0.98
step 800, training accuracy 0.98
step 900, training accuracy 0.98
step 1000, training accuracy 1
step 1100, training accuracy 1
step 1200, training accuracy 0.98
step 1300, training accuracy 0.96
step 1400, training accuracy 1
step 1500, training accuracy 0.94
step 1600, training accuracy 0.98
step 1700, training accuracy 0.98
step 1800, training accuracy 0.9
step 1900, training accuracy 0.98
step 2000, training accuracy 1
step 2100, training accuracy 0.98
step 2200, training accuracy 0.96
step 2300, training accuracy 1
step 2400, training accuracy 0.94
step 2500, training accuracy 0.96
step 2600, training accuracy 0.98
step 2700, training accuracy 0.98
step 2800, training accuracy 0.98
step 2900, training accuracy 1
step 3000, training accuracy 1
step 3100, training accuracy 0.96
step 3200, training accuracy 1
step 3300, training accuracy 1
step 3400, training accuracy 0.96
step 3500, training accuracy 1
step 3600, training accuracy 0.96
step 3700, training accuracy 0.98
step 3800, training accuracy 1


