TensorFlow-GPU Setup
I. Local machine configuration
Windows 10
NVIDIA GeForce GTX 1050 Ti
Intel® Core™ i5-7300HQ CPU
II. Download preparation
1. Update the graphics driver
Search for the latest driver for your card on NVIDIA's website, then download and install it.
2. GPU environment downloads
- Download Visual Studio 2015 Community Edition.
- Download CUDA 9.0.176; choose the local (offline) installer, and also download the Patch 1, Patch 2, and Patch 3 updates.
- Download cuDNN; choose "Download cuDNN v7.4.1 (Nov 8, 2018), for CUDA 9.0".
- Download Anaconda3 (from the Tsinghua mirror or the official website).
III. Installation
1. Install VS2015 Community Edition and select the C++ components.
If anything goes wrong, the installation can be repaired from the setup screen.
2. Install cuda_9.0.176_win10.exe
Make sure VS2015 is installed before installing CUDA.
Do not click Next right away: open the temporary extraction directory C:\Users\YY\AppData\Local\Temp\CUDA, find the folder CUDAVisualStudioIntegration, and copy it to the desktop for safekeeping.
Copy everything under "CUDAVisualStudioIntegration\extras\visual_studio_integration\MSBuildExtensions" into "C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\BuildCustomizations".
Finally, install cuda_9.0.176.1_windows.exe, cuda_9.0.176.2_windows.exe, and cuda_9.0.176.3_windows.exe in that order.
3. Install cudnn-9.0-windows10-x64-v7.4.1.5.zip
Copy the three extracted folders into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0 (a copy sketch follows).
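If you prefer to script this copy step, here is a minimal sketch. The source path D:\cudnn is an assumption about where you extracted the zip (the archive normally unpacks into a cuda folder containing bin, include, and lib), and it must be run as Administrator because the target is under Program Files.

import os
import shutil

src_root = r"D:\cudnn\cuda"  # assumed extraction directory; adjust to where you unzipped cuDNN
dst_root = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0"

# Copy the contents of bin, include, and lib\x64 into the CUDA install directory
for sub in ("bin", "include", r"lib\x64"):
    src_dir = os.path.join(src_root, sub)
    dst_dir = os.path.join(dst_root, sub)
    for name in os.listdir(src_dir):
        shutil.copy2(os.path.join(src_dir, name), dst_dir)
        print("copied", name, "->", dst_dir)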
IV. Verify that CUDA and cuDNN were installed successfully
1. Check that the system environment variables (PATH) contain the following entries; add them if they are missing (a quick check script follows the list):
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\extras\CUPTI\libx64
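As a quick sanity check, this small Python sketch (assuming the default install paths listed above) reports whether each directory exists and whether it is on PATH:

import os

cuda_dirs = [
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\extras\CUPTI\libx64",
]

# Entries currently on PATH, normalized for comparison
path_entries = {p.rstrip("\\").lower() for p in os.environ["PATH"].split(";") if p}

for d in cuda_dirs:
    print(d)
    print("  exists:", os.path.isdir(d), " on PATH:", d.rstrip("\\").lower() in path_entries)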
2. Open a Command Prompt (cmd) and run: nvcc -V
The CUDA version number should be displayed.
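If the installation succeeded, the last line of the output should report the 9.0 release, something like "Cuda compilation tools, release 9.0, V9.0.176" (the exact build string may differ).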
3. Build the test samples with VS2015
Open C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0 and find the solution matching your VS version, here Samples_vs2015.sln. Double-click to open it, then select Release and x64.
Right-click 1_Utilities and click Build. The output should report: 5 succeeded, 0 failed, 0 up-to-date, 0 skipped.
After this, the deviceQuery and bandwidthTest executables we need will appear in "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\bin\win64\Release".
4. Run deviceQuery and bandwidthTest
Open cmd, change to the C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\bin\win64\Release directory, and run deviceQuery and bandwidthTest. If both finish with Result = PASS, CUDA is installed correctly.
5. Install Anaconda3
During installation, check the option that adds Anaconda to the system environment variables (PATH).
6. Install tensorflow-gpu
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade tensorflow-gpu==1.8
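A quick way to confirm the GPU build imports correctly (a minimal sketch using TensorFlow 1.x's tf.test API):

import tensorflow as tf

print(tf.__version__)              # expect 1.8.0
print(tf.test.is_gpu_available())  # expect True once CUDA and cuDNN are found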
V. Testing
demo.py
# coding=gbk
from datetime import datetime
import math
import time
import tensorflow as tf
import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

batch_size = 32
num_batches = 100


# Prints the structure of each layer in the network: the tensor's name and shape
def print_activations(t):
    print(t.op.name, ' ', t.get_shape().as_list())


# with tf.name_scope('conv1') as scope  # names every variable inside the scope as conv1/xxx, which makes the components easy to tell apart
def inference(images):
    parameters = []
    # First convolutional layer
    with tf.name_scope('conv1') as scope:
        # Convolution kernel, initialized from a truncated normal distribution
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 64],
                                                 dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
        # Trainable biases
        biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)
        print_activations(conv1)
        parameters += [kernel, biases]

    # Add LRN and a max-pooling layer; apart from AlexNet, LRN has largely been dropped,
    # since the benefit is small and it slows training down
    lrn1 = tf.nn.lrn(conv1, 4, bias=1.0, alpha=0.001 / 9, beta=0.75, name='lrn1')
    pool1 = tf.nn.max_pool(lrn1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool1')
    print_activations(pool1)

    # Second convolutional layer; only some parameters differ
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.truncated_normal([5, 5, 64, 192], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv2)

    # LRN and max pooling again
    lrn2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9, beta=0.75, name='lrn2')
    pool2 = tf.nn.max_pool(lrn2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool2')
    print_activations(pool2)

    # Third convolutional layer
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 192, 384], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv3)

    # Fourth convolutional layer
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv4)

    # Fifth convolutional layer
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv5)

    # A final max-pooling layer
    pool5 = tf.nn.max_pool(conv5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool5')
    print_activations(pool5)
    return pool5, parameters


# The fully connected layers are omitted in this benchmark.
# Measures the time per iteration; the first argument is the TensorFlow Session,
# the second is the op to run, and the third is a label for the test.
# The first few iterations are affected by memory loading, cache warm-up, etc.,
# so only the iterations after a burn-in period are timed.
def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    # Run num_batches + num_steps_burn_in iterations;
    # time each one with time.time() and start recording after the warm-up
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' % (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    # Compute the mean time per iteration and its standard deviation
    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' % (datetime.now(), info_string, num_batches, mn, sd))


def run_benchmark():
    # Start from the default Graph
    with tf.Graph().as_default():
        # No real ImageNet training here; random inputs are enough for timing
        image_size = 224
        images = tf.Variable(tf.random_normal([batch_size, image_size, image_size, 3], dtype=tf.float32, stddev=1e-1))
        pool5, parameters = inference(images)
        init = tf.global_variables_initializer()
        sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False))
        sess.run(init)
        # Time the forward pass on pool5 directly (there are no fully connected layers)
        time_tensorflow_run(sess, pool5, "Forward")
        # A dummy objective, just so there is something to differentiate
        objective = tf.nn.l2_loss(pool5)
        grad = tf.gradients(objective, parameters)
        time_tensorflow_run(sess, grad, "Forward-backward")


run_benchmark()
Open the Anaconda Prompt, change to the directory containing demo.py, and run: python demo.py
Comment out the following two lines to train on the GPU; as written, they hide the GPU and the script runs on the CPU:
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
This article is reposted from:
https://blog.csdn.net/Petrichoryi/article/details/107772945?spm=1001.2014.3001.5501
If there is any copyright infringement, please contact me and it will be removed.