TensorFlow Study Notes (1): Understanding the Multilayer Convolutional Network for MNIST (the Advanced MNIST Tutorial)

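I have been learning TensorFlow over the past few days, starting with MNIST. Quite a few things confused me, so I am writing down my understanding first.
The complete code is attached here.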

1. Weight Initialization

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

Here, truncated_normal is defined as:

def truncated_normal(shape,
                     mean=0.0,
                     stddev=1.0,
                     dtype=dtypes.float32,
                     seed=None,
                     name=None):
  """Outputs random values from a truncated normal distribution.

  The generated values follow a normal distribution with specified mean and  standard deviation, except that values whose magnitude is more than 2 standard  deviations from the mean are dropped and re-picked.

  Args:
    shape: A 1-D integer Tensor or Python array. The shape of the output tensor.
    mean: A 0-D Tensor or Python value of type `dtype`. The mean of the truncated normal distribution.
    stddev: A 0-D Tensor or Python value of type `dtype`. The standard deviation of the truncated normal distribution.
    dtype: The type of the output.
    seed: A Python integer. Used to create a random seed for the distribution.
      See `tf.set_random_seed` for behavior.
    name: A name for the operation (optional).

  Returns:
    A tensor of the specified shape filled with random truncated normal values.
  """

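In other words, this function generates values that follow a normal distribution with the given mean and standard deviation; any value that lies more than 2 standard deviations away from the mean (in either direction) is dropped and re-picked. With that, the role of each of the six parameters is clear.
So the initial W is random, while b starts out as a constant (0.1).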
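To make the "drop and re-pick" rule concrete, here is a minimal sketch in plain Python. It is not TensorFlow's implementation; the function name and the seed argument are only for illustration, and it assumes simple rejection sampling is an acceptable stand-in.

```python
import random

def truncated_normal(n, mean=0.0, stddev=1.0, seed=None):
    """Draw n samples, re-picking any that fall more than 2 stddev from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        value = rng.gauss(mean, stddev)
        # Keep the value only if it lies within 2 standard deviations of the mean.
        if abs(value - mean) <= 2.0 * stddev:
            samples.append(value)
    return samples

# Mirrors weight_variable (random W) and bias_variable (constant b) from the notes.
weights = truncated_normal(1000, stddev=0.1, seed=0)
biases = [0.1] * 32
print(min(weights), max(weights))
```

With stddev=0.1, every initial weight ends up inside [-0.2, 0.2], which is why the weights start small but not all equal.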

2. Convolution and Pooling

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

These two helpers perform convolution and pooling.
The conv2d method:

def conv2d(input, filter, strides, padding, use_cudnn_on_gpu=True, data_format="NHWC", name=None):

"""
  Args:
    input: A `Tensor`. Must be one of the following types: `half`, `float32`.
      A 4-D tensor. The dimension order is interpreted according to the value
      of `data_format`, see below for details.
    filter: A `Tensor`. Must have the same type as `input`.
      A 4-D tensor of shape
      `[filter_height, filter_width, in_channels, out_channels]`
    strides: A list of `ints`. 1-D tensor of length 4. The stride of the
      sliding window for each dimension of `input`. The dimension order is
      determined by the value of `data_format`, see below for details.
    padding: A `string` from: `"SAME", "VALID"`.
      The type of padding algorithm to use.
    use_cudnn_on_gpu: An optional `bool`. Defaults to `True`.
    data_format: An optional `string` from: `"NHWC", "NCHW"`.
      Defaults to `"NHWC"`.
      Specify the data format of the input and output data. With the
      default format "NHWC", the data is stored in the order of:
          [batch, height, width, channels].
      Alternatively, the format could be "NCHW", the data storage order of:
          [batch, channels, height, width].
    name: A name for the operation (optional).

  Returns:
    A `Tensor`. Has the same type as `input`.
    A 4-D tensor. The dimension order is determined by the value of
    `data_format`, see below for details.
"""

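The documentation says conv2d computes a convolution from two 4-D tensors, whose layouts are:
x = [batch, in_height, in_width, in_channels] # input images
W = [filter_height, filter_width, in_channels, out_channels] # convolution kernel
batch: number of images fed in at once
in/out_channels: number of input/output channels
Mathematically, image convolution flips the kernel by 180°, multiplies it element-wise with the image patch underneath and sums the products. Note, however, that tf.nn.conv2d does not flip the kernel; strictly speaking it computes a cross-correlation. Since the kernel is learned, this makes no practical difference. The step by which the kernel slides along each dimension is controlled by strides. The documentation says that for images, where only the two spatial dimensions are strided, strides is usually [1, stride, stride, 1].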
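The sliding-window computation can be sketched in plain Python for one image and one channel. This is a simplified illustration, not TensorFlow's code: it assumes stride 1, an odd-sized kernel and zero padding that keeps the output the same size as the input (the SAME case), and it applies the kernel without flipping it, which is how tf.nn.conv2d is documented to behave.

```python
def conv2d_same(image, kernel):
    """Stride-1 sliding window with zero padding; output size equals input size."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    pad_top, pad_left = (kh - 1) // 2, (kw - 1) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    r, c = i + di - pad_top, j + dj - pad_left
                    # Positions outside the image count as zero padding.
                    if 0 <= r < h and 0 <= c < w:
                        total += image[r][c] * kernel[di][dj]
            out[i][j] = total
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
# A kernel that is 1 only at "centre row, right column".
kernel = [[0, 0, 0],
          [0, 0, 1],
          [0, 0, 0]]
result = conv2d_same(image, kernel)
for row in result:
    print(row)
```

Because the kernel is not flipped, each output pixel picks up its right-hand neighbour (and 0 at the right edge). A true convolution with the same kernel would pick up the left-hand neighbour instead.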
The max_pool method:

def max_pool(value, ksize, strides, padding, data_format="NHWC", name=None):
  """Performs the max pooling on the input.

  Args:
    value: A 4-D `Tensor` of the format specified by `data_format`.
    ksize: A 1-D int Tensor of 4 elements.  The size of the window for
      each dimension of the input tensor.
    strides: A 1-D int Tensor of 4 elements.  The stride of the sliding
      window for each dimension of the input tensor.
    padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
      See the comment in `tf.nn.convolution`.
    data_format: A string. 'NHWC', 'NCHW' and 'NCHW_VECT_C' are supported.
    name: Optional name for the operation.

  Returns:
    A `Tensor` of format specified by `data_format`.
    The max pooled output tensor.
  """

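Pooling works much like convolution, so it is easy to follow: a window slides over the input and only the maximum inside each window is kept.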
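A small single-channel sketch of 2×2 max pooling with stride 2, in plain Python. The helper names are made up for illustration; the output-size rule ceil(size / stride) is the one used for SAME padding, and windows that hang over the edge simply ignore the missing positions.

```python
import math

def same_output_size(size, stride):
    """Output length along one dimension under SAME padding."""
    return math.ceil(size / stride)

def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a single-channel image."""
    h, w = len(image), len(image[0])
    out_h, out_w = same_output_size(h, 2), same_output_size(w, 2)
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values under the window, clipped at the image border.
            window = [image[r][c]
                      for r in range(2 * i, min(2 * i + 2, h))
                      for c in range(2 * j, min(2 * j + 2, w))]
            row.append(max(window))
        out.append(row)
    return out

image = [[1, 3, 2, 4],
         [5, 6, 1, 0],
         [7, 2, 9, 8],
         [3, 4, 6, 5]]
pooled = max_pool_2x2(image)
print(pooled)
print(same_output_size(28, 2), same_output_size(14, 2))
```

The same rule explains the sizes later in the network: 28 becomes 14 after one pooling step and 14 becomes 7 after the next.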

3. First Convolutional Layer

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

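The W defined here has shape [5, 5, 1, 32]. Going by section 2, that means a 5×5 kernel with 1 input channel and 32 output channels.
b is a vector of shape [32], one bias per output channel.
x_image is x reshaped into a 4-D tensor of shape [-1, 28, 28, 1]: the image is 28×28 and 1 is the number of channels. The -1 is inferred automatically. Since each row of x holds 28*28 = 784 pixels, -1 resolves to the number of images in the batch (it is 1 only when a single image is fed in).
??? Why 32? (It is the number of output feature maps, a hyperparameter chosen by the tutorial rather than something derived from the data.)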
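How the -1 in the reshape gets resolved can be sketched as simple arithmetic. The helper infer_shape is hypothetical (it is not a TensorFlow function); it only mimics the rule that the unknown dimension is the total element count divided by the product of the known dimensions.

```python
def infer_shape(total_elements, shape):
    """Replace a single -1 in shape with the size implied by total_elements."""
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if total_elements % known != 0:
        raise ValueError("shape does not fit the number of elements")
    return [total_elements // known if dim == -1 else dim for dim in shape]

# A training batch of 50 flattened images: 50 * 784 values in total.
batch_shape = infer_shape(50 * 784, [-1, 28, 28, 1])
# A single flattened image: 784 values.
single_shape = infer_shape(784, [-1, 28, 28, 1])
print(batch_shape, single_shape)
```

So with the batches of 50 used in training, x_image has shape [50, 28, 28, 1].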

4. Second Convolutional Layer

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

Same as above: why 64?

5. Densely Connected Layer

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

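Same as above: why 1024?
Because padding is set to SAME, the convolutions keep the spatial size and only the pooling steps shrink it: the input is 28×28×1, it becomes 14×14×32 after the first pooling and 7×7×64 after the second. The reshape then flattens each image into a 1×3136 vector (7*7*64 = 3136), and the fully connected layer maps that vector to 1024 values.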
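The shape bookkeeping can be checked with a few lines of plain Python. This is only an arithmetic sketch under the SAME-padding rule (output size = ceil(input size / stride)); the helper names are made up, while the variable names mirror the tensors in the code above.

```python
import math

def conv_same(shape, out_channels, stride=1):
    """(height, width, channels) after a SAME-padded convolution."""
    height, width, _ = shape
    return (math.ceil(height / stride), math.ceil(width / stride), out_channels)

def pool_same(shape, stride=2):
    """(height, width, channels) after SAME-padded pooling with the given stride."""
    height, width, channels = shape
    return (math.ceil(height / stride), math.ceil(width / stride), channels)

x_image = (28, 28, 1)
h_conv1 = conv_same(x_image, 32)   # stride 1 keeps 28x28
h_pool1 = pool_same(h_conv1)       # 2x2 pooling halves it
h_conv2 = conv_same(h_pool1, 64)
h_pool2 = pool_same(h_conv2)
flat_size = h_pool2[0] * h_pool2[1] * h_pool2[2]
fc1_weights = flat_size * 1024     # entries in W_fc1

for name, shape in [("h_conv1", h_conv1), ("h_pool1", h_pool1),
                    ("h_conv2", h_conv2), ("h_pool2", h_pool2)]:
    print(name, shape)
print("flat_size", flat_size, "fc1_weights", fc1_weights)
```

The last number shows that W_fc1 holds most of the model's parameters.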

6. Dropout

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

The dropout function is used to prevent overfitting.

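The official Chinese documentation says: we use a placeholder for the probability that a neuron's output is kept during dropout. This lets us turn dropout on during training and off during testing. Besides masking neuron outputs, TensorFlow's tf.nn.dropout op also automatically handles the scaling of the outputs (kept outputs are scaled up by 1/keep_prob), so you do not need to worry about scaling when using dropout.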
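What "handles the scale automatically" means can be sketched in plain Python. This is an illustration of the idea, not TensorFlow's implementation: each value is kept with probability keep_prob and scaled by 1/keep_prob, otherwise it is set to 0, so the expected sum stays the same. The seed argument is only there to make the sketch repeatable.

```python
import random

def dropout(values, keep_prob, seed=None):
    """Keep each value with probability keep_prob and scale it by 1/keep_prob."""
    rng = random.Random(seed)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

activations = [1.0] * 10000

# Training: keep_prob = 0.5, about half the outputs become 0, the rest become 2.0.
dropped = dropout(activations, 0.5, seed=0)
mean_after = sum(dropped) / len(dropped)

# Testing: keep_prob = 1.0 leaves the outputs unchanged.
unchanged = dropout(activations, 1.0, seed=0)
print(mean_after, unchanged == activations)
```

Because the scaling already happens during training, nothing has to be halved at test time, which is why the training loop feeds keep_prob: 0.5 and the evaluation feeds keep_prob: 1.0.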

Dropout was introduced in this well-known paper by Geoffrey Hinton, which is worth reading for the details. In short:
Dropout is a radically different technique for regularization. Unlike L1 and L2 regularization, dropout doesn’t rely on modifying the cost function. Instead, in dropout we modify the network itself. Let me describe the basic mechanics of how dropout works, before getting into why it works, and what the results are.
Suppose we’re trying to train a network. In particular, suppose we have a training input x and corresponding desired output y. Ordinarily, we’d train by forward-propagating x through the network, and then backpropagating to determine the contribution to the gradient. With dropout, this process is modified. We start by randomly (and temporarily) deleting half the hidden neurons in the network, while leaving the input and output neurons untouched. After doing this, we’ll end up with a network along the following lines.
We forward-propagate the input x through the modified network, and then backpropagate the result, also through the modified network. After doing this over a mini-batch of examples, we update the appropriate weights and biases. We then repeat the process, first restoring the dropout neurons, then choosing a new random subset of hidden neurons to delete, estimating the gradient for a different mini-batch, and updating the weights and biases in the network.
By repeating this process over and over, our network will learn a set of weights and biases. Of course, those weights and biases will have been learnt under conditions in which half the hidden neurons were dropped out. When we actually run the full network that means that twice as many hidden neurons will be active. To compensate for that, we halve the weights outgoing from the hidden neurons.
This dropout procedure may seem strange and ad hoc. Why would we expect it to help with regularization? To explain what’s going on, I’d like you to briefly stop thinking about dropout, and instead imagine training neural networks in the standard way (no dropout). In particular, imagine we train several different neural networks, all using the same training data. Of course, the networks may not start out identical, and as a result after training they may sometimes give different results. When that happens we could use some kind of averaging or voting scheme to decide which output to accept. For instance, if we have trained five networks, and three of them are classifying a digit as a “3”, then it probably really is a “3”. The other two networks are probably just making a mistake. This kind of averaging scheme is often found to be a powerful (though expensive) way of reducing overfitting. The reason is that the different networks may overfit in different ways, and averaging may help eliminate that kind of overfitting.
What’s this got to do with dropout? Heuristically, when we dropout different sets of neurons, it’s rather like we’re training different neural networks. And so the dropout procedure is like averaging the effects of a very large number of different networks. The different networks will overfit in different ways, and so, hopefully, the net effect of dropout will be to reduce overfitting.

7. Readout Layer

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

This layer produces the final output: 10 scores (logits) per image, one for each digit.
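y_conv holds raw scores, not probabilities. Here is a rough sketch, in plain Python and for a single image, of how such scores turn into probabilities, a predicted digit and a cross-entropy loss. The logits below are made-up numbers, and the functions only illustrate the math rather than TensorFlow's code.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, one_hot_label):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))

# Hypothetical readout for one image whose true digit is 2.
logits = [0.1, 0.2, 3.0, 0.0, -1.0, 0.5, 0.3, 0.0, 0.2, 0.1]
label = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

probs = softmax(logits)
predicted = probs.index(max(probs))  # same result as taking argmax of the logits
loss = cross_entropy(logits, label)
print(predicted, loss)
```

Softmax does not change which entry is largest, so taking the argmax of y_conv directly gives the same prediction.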

8. Train and Evaluate the Model

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
      train_accuracy = accuracy.eval(feed_dict={
          x: batch[0], y_: batch[1], keep_prob: 1.0})
      print('step %d, training accuracy %g' % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

  print('test accuracy %g' % accuracy.eval(feed_dict={
      x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

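The loop runs 20000 iterations with 50 images each. Every 100th iteration prints the accuracy on the current training batch, and once training finishes the accuracy on the test set is printed.
Is there any particular reason for using 50 images per batch?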
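Some quick arithmetic on the training schedule, as a sketch. The training-set size of 55000 is an assumption (it is the size of mnist.train when loaded through the tutorial's input_data helper, which is not shown in these notes). The batch size itself is a hyperparameter, so 50 is a choice and not a requirement.

```python
steps = 20000
batch_size = 50
train_size = 55000  # assumption: size of mnist.train in the tutorial's data loader

images_seen = steps * batch_size          # training images processed in total
epochs = images_seen / train_size         # passes over the training set
reports = len(range(0, steps, 100))       # steps where i % 100 == 0

print(images_seen, round(epochs, 2), reports)
```

So the loop goes through the training set roughly 18 times and prints 200 training-accuracy lines before the final test accuracy.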
