TensorFlow neural networks on MNIST, running on both CPU and GPU: 96% after one round, 99% after 20 rounds


飞凡可期


Categories: python, 2019, neural networks. Tags: GPU, neural networks, MNIST, 99%

Article link: https://blog.csdn.net/PerfeyCui/article/details/102848299


Two networks

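Name	Algorithm	Learning rate	Objective	Layers	Notes	Accuracy	CPU time	GPU time
Simple NN	Gradient descent	0.01	Cross-entropy	1	Simplest and fastest	96%	32 min	9 s
Two-layer CNN	Adam	1e-4	Cross-entropy	2 conv layers	Pooling	99.28%	16 min 20 s	1 min 27 s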

CPU used: Intel Core i7-8700 @ 3.2 GHz
GPU used: NVIDIA RTX 6000, 24 GB
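
To put the timings in perspective, the short sketch below converts the CPU and GPU times from the table to seconds and computes the speedup for each network. It takes the table's figures at face value; the dictionary layout and the `speedup` helper are illustrative names, not part of the original script.

```python
# Timings copied from the table above, converted to seconds.
timings = {
    "Simple NN": {"cpu_s": 32 * 60, "gpu_s": 9},
    "Two-layer CNN": {"cpu_s": 16 * 60 + 20, "gpu_s": 1 * 60 + 27},
}


def speedup(cpu_s, gpu_s):
    """Return how many times faster the GPU run was than the CPU run."""
    return cpu_s / gpu_s


speedups = {name: speedup(t["cpu_s"], t["gpu_s"]) for name, t in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f}x faster on GPU")
```

With these numbers the simple network comes out at roughly 213x and the two-layer CNN at roughly 11x.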

GPU changes and pitfalls

Switching from CPU to GPU takes a single line:

os.environ['CUDA_VISIBLE_DEVICES'] = '0'

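Making GPU 0 visible to the environment is all it takes. This is very convenient, but be careful not to assign a result back to the name of its own input, or the GPU run will fail for no obvious reason.
For example,

x = sess.run(x)

is a serious mistake: after this line, x is a NumPy array rather than the graph tensor, so later code that uses x no longer refers to the placeholder.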
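
As a minimal sketch of the one-line switch, the snippet below sets the variable before TensorFlow is imported and checks for misspelled look-alike keys. `os.environ` accepts any key silently, and CUDA ignores names it does not recognize, so a typo fails without any error. The `select_gpu` helper is a hypothetical name used only for illustration; it is not part of TensorFlow or CUDA.

```python
import os


def select_gpu(index="0"):
    """Expose only the given GPU index to CUDA (hypothetical helper).

    Returns any environment keys that look like misspellings of
    CUDA_VISIBLE_DEVICES, since those are silently ignored by CUDA.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return [
        key for key in os.environ
        if key.startswith("CUDA_VIS") and key != "CUDA_VISIBLE_DEVICES"
    ]


suspicious_keys = select_gpu("0")
print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
print("possible typos:", suspicious_keys)

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```

Passing "-1" instead of "0" hides every GPU, which is a simple way to time the CPU-only run with the same script.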

The code

CPU version:

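import os
import time

# Expose only GPU 0 to CUDA. Set this before TensorFlow initializes CUDA;
# use '-1' instead to hide every GPU and force a CPU-only run.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# import input_data

ISOTIMEFORMAT = '%Y-%m-%d-%H-%M-%S'
localtime = time.strftime(ISOTIMEFORMAT, time.localtime())  # start time of the run

# tf.compat.v1.disable_eager_execution()  # TF 2.0 needs eager mode turned off, and every Session call needs the compat.v1. prefix
# Input data: the MNIST dataset, 55,000 training images
mnist = input_data.read_data_sets(r'C:\Users\c00525771\Documents\Python Scripts\CNN\MNIST_data', one_hot=True)  # arg 1: data directory; arg 2: whether labels are one-hot vectors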

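# Interactive session
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)  # tf.compat.v1.ConfigProto under TF 2.x
config.gpu_options.allow_growth = True  # allocate GPU memory on demand

# with tf.device('/gpu:0'):
# InteractiveSession registers itself as the default session, which the
# train_step.run() and accuracy.eval() calls below rely on. A plain
# tf.Session would need sess.run(...) or "with sess.as_default():" instead.
sess = tf.InteractiveSession(config=config)

"""Computation graph
To do efficient numerical computing in Python, we typically use libraries like NumPy that run expensive operations such as matrix multiplication outside Python, using more efficient code written in another language.

Unfortunately, there is still considerable overhead in switching back to Python after every operation. This overhead is especially bad when computing on GPUs or in a distributed setting, where much of the cost goes into transferring data.

TensorFlow also does its main work outside Python, but it goes a step further to avoid this overhead. Instead of running a single expensive operation independently outside Python, it first lets us describe a graph of interacting operations and then runs the whole graph outside Python. This is similar to the approach used in Theano or Torch.

The role of the Python code is therefore to build this externally executed computation graph and to decide which parts of it should be run. See the computation graph section of Basic Usage for details.

Note: the point of the computation graph is that expensive operations such as matrix multiplication should be moved outside Python and implemented in a more efficient language (such as C), but the switching itself is costly. So the operations are laid out first and the whole graph is run externally; this is why everything is built up under a session.
Not every step goes outside individually: the mechanism delegates the whole graph to the outside, while results are collected and inspected inside Python.
"""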

# Build the softmax regression model
# Inputs and outputs
in_size = 28*28
out_size = 10
x = tf.placeholder("float", shape=[None, in_size])  # input: one length-784 tensor per image
# None is easy to misread: it means "unspecified", not "nothing", so a batch of 10, 1000 or 10000 length-784 tensors is accepted
y_ = tf.placeholder("float", shape=[None, out_size])  # true labels: 10 classes (a 10-element one-hot vector)

# Model parameters
W = tf.Variable(tf.zeros([in_size, out_size]))
b = tf.Variable(tf.zeros([out_size]))

# Initialization
sess.run(tf.global_variables_initializer())  # a function call that returns the init op, not a collection of variables

# Class prediction and loss
y = tf.nn.softmax(tf.matmul(x, W) + b)  # the prediction: y = softmax(xW + b)
cross_entropy = -tf.reduce_sum(y_*tf.log(y))  # -sum(y_true * log(y_pred))

# Train the model
train_step = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cross_entropy)
# Gradient descent update: x_new = x - s * grad f(x), where s is the learning rate and f is the
# objective (cross-entropy here; least squares or MMSE are alternatives). Related methods: steepest descent, Newton's method.
# The optimizer is iterative: each run of train_step updates the weights once.
for i in range(1000):
    batch = mnist.train.next_batch(50)
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})  # feed dict: batch images for x, batch labels for y_
    # Note: feed_dict can replace any tensor in the graph, not only placeholders.

# Evaluate the model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))  # correct when the indices of the maxima match
# y_ holds a batch: one row per sample (None rows) and out_size columns, so argmax runs along axis 1
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))  # cast bool to float
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))


# Build a multilayer convolutional network
# Weight initialization for xW + b: weights from a truncated normal (stddev 0.1), biases constant 0.1
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)  # truncated normal with mean 0 and stddev 0.1; values more than 2 stddev from the mean are redrawn, which keeps the initial weights bounded
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# Convolution and pooling. Hyperparameters (1): stride, padding, pooling kernel size
# Simplest setup: convolution with stride 1 and 'SAME' zero padding, 2x2 max pooling with stride 2
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

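# First and second convolutional layers, then the densely connected layer
# Hyperparameters (2): number of layers, whether to pool, kernel size (5x5?), depth (32?)
W_conv1 = weight_variable([5, 5, 1, 32])  # kernel height x kernel width x input channels x output channels
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1, 28, 28, 1])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Layer 2
W_conv2 = weight_variable([5, 5, 32, 64])  # 32 input channels -> 64 output channels; each of the 64 filters spans all 32 inputs (it is not 2 outputs per input channel)
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
# Follow the data flow and how the shape changes at each stage (each 2x2 pooling halves height and width):
# x -(reshape)-> x_image 28x28x1 -(conv1 5x5x1x32 + pool1)-> h_pool1 14x14x32 -(conv2 5x5x32x64 + pool2)-> h_pool2 7x7x64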

# Fully connected layer. Hyperparameters (3): size of the fully connected layer
w_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)

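# Dropout
# Dropout reduces overfitting; the placeholder holds the probability that a neuron's output is kept
# Dropout is on during training and off during testing; tf.nn.dropout also scales the kept outputs by 1/keep_prob so the expected sum is unchanged
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)  # drops activations coming out of h_fc1 (the layer outputs during training, not hyperparameters)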


# Output layer
w_fc2 = weight_variable([1024, out_size])
b_fc2 = bias_variable([out_size])

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, w_fc2) + b_fc2)

# Train and evaluate the model
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float32'))

# Start the training loop; every 100 steps print the step number and the accuracy on the current training batch
sess.run(tf.global_variables_initializer())
for i in range(1000):  # 20
    batch = mnist.train.next_batch(50)  # batch size: hyperparameters (4)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
print("Start time:", localtime)
print("End time:", time.strftime(ISOTIMEFORMAT, time.localtime()))
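
The script hard-codes 7*7*64 as the input size of the fully connected layer. The sketch below shows where that number comes from by tracing the tensor shape through the two convolution and pooling stages, and it also counts the trainable parameters. It is plain Python with no TensorFlow dependency; the `LAYERS` table and the `trace_shapes` helper are illustrative names, with the hyperparameters copied from the script above, and it assumes the 'SAME' padding rule that the output size is ceil(input / stride).

```python
import math

# Layer hyperparameters copied from the script above:
# (name, kind, kernel size, output channels, stride)
LAYERS = [
    ("h_conv1", "conv", 5, 32, 1),
    ("h_pool1", "pool", 2, 32, 2),
    ("h_conv2", "conv", 5, 64, 1),
    ("h_pool2", "pool", 2, 64, 2),
]


def trace_shapes(height=28, width=28, channels=1):
    """Return the per-layer shapes and the number of conv parameters."""
    shapes = [("x_image", (height, width, channels))]
    params = 0
    for name, kind, kernel, out_channels, stride in LAYERS:
        if kind == "conv":
            # Weights: kernel x kernel x in_channels x out_channels,
            # plus one bias per output channel.
            params += kernel * kernel * channels * out_channels + out_channels
        # With 'SAME' padding the spatial size is ceil(input / stride).
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        channels = out_channels
        shapes.append((name, (height, width, channels)))
    return shapes, params


shapes, conv_params = trace_shapes()
last_h, last_w, last_c = shapes[-1][1]
flat = last_h * last_w * last_c

# Fully connected layers: flat -> 1024 -> 10, each with a bias vector.
fc_params = (flat * 1024 + 1024) + (1024 * 10 + 10)
total_params = conv_params + fc_params

for name, shape in shapes:
    print(name, shape)
print("flattened size:", flat)
print("trainable parameters:", total_params)
```

Almost all of the parameters sit in the first fully connected layer (3136 x 1024 weights), which is why the size of that layer is worth treating as a hyperparameter.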

Summary

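This was a real eye-opener: the speed feels like flying, and extraordinary results look within reach. I used to think the neural network era was about to go quiet for a while because it was so hard to put into production. Now I see that its tools have not been fully unleashed and are well worth digging into.
Going one step further, I tried a 3-layer CNN: 2 days (48+ hours) on the CPU versus 10 minutes on the GPU. Shocking.
All the more so now that companies everywhere, including telecom vendors such as Huawei, have started developing and commercializing AI chips like Ascend. Time to keep my head down and keep pulling the cart.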
