Understanding GPU acceleration in TensorFlow


hh_2018


Article link: https://blog.csdn.net/hh_2018/article/details/80808515


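Recently I have been working on speeding up a model with the GPU, but sometimes the GPU turned out to be even slower than the CPU. After reading up on it, I found that this can have two causes: 1. the GPU is not powerful enough (a laptop's built-in GPU is not necessarily faster than its CPU); 2. communication between the CPU and the GPU: the data is read on the CPU, and everything that has been read must then be copied to the GPU for processing.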
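A back-of-the-envelope model makes the two causes concrete. The sketch below is plain Python; the 8 GB/s effective host-to-device bandwidth and all timings are illustrative assumptions, not measurements, and `transfer_seconds` and `gpu_beats_cpu` are hypothetical helpers. Without any overlap, the GPU path pays for the copy plus its own compute, so a GPU whose compute is only slightly faster than the CPU can still lose overall.

```python
def transfer_seconds(height, width, channels=3, bytes_per_value=4,
                     bandwidth_bytes_per_s=8e9):
    """Time to copy one float32 image from host (CPU) memory to the GPU.

    bandwidth_bytes_per_s is an ASSUMED effective host-to-device bandwidth.
    """
    return height * width * channels * bytes_per_value / bandwidth_bytes_per_s


def gpu_beats_cpu(cpu_compute_s, gpu_compute_s, transfer_s):
    """Without overlap, the GPU path costs the copy plus the GPU compute."""
    return transfer_s + gpu_compute_s < cpu_compute_s


# Illustrative numbers only: a 1000x1000 RGB float32 image
copy_s = transfer_seconds(1000, 1000)       # 12 MB / 8 GB/s = 1.5 ms
print(gpu_beats_cpu(0.010, 0.004, copy_s))  # True: GPU compute is much faster
print(gpu_beats_cpu(0.005, 0.004, copy_s))  # False: the copy erases a small compute advantage
```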

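For the second problem, one option is to feed the data through a queue. The reasoning is that data can enter the queue in parallel with the computation: while the graph is processing the first image, the second image has already been loaded, so when the second image is processed the program does not have to wait for the data to be passed from the CPU to the GPU. Strictly speaking, the TensorFlow queue lives in host memory, so what it overlaps with the computation is the CPU-side reading and decoding; that is still enough to take the wait off the critical path.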
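The queue idea can be illustrated without TensorFlow using Python's standard `queue` and `threading` modules: a background thread loads items into a bounded queue while the main thread computes on items that are already loaded. This is an analogy for the mechanism, not TensorFlow's queue-runner implementation; `load` and `compute` are hypothetical stand-ins that only sleep for 20 ms each.

```python
import queue
import threading
import time


def load(x):
    """Hypothetical stand-in for reading/decoding an image on the CPU."""
    time.sleep(0.02)
    return x


def compute(x):
    """Hypothetical stand-in for the model step on the GPU."""
    time.sleep(0.02)
    return x * 2


def run_serial(items):
    # Each item waits for its own load before compute starts: nothing overlaps.
    return [compute(load(x)) for x in items]


def run_pipelined(items, capacity=4):
    # A background thread keeps a bounded queue filled while the main thread
    # computes, so loading item i+1 overlaps with computing item i.
    q = queue.Queue(maxsize=capacity)
    done = object()  # sentinel marking the end of the input

    def producer():
        for x in items:
            q.put(load(x))
        q.put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    results = []
    while True:
        x = q.get()
        if x is done:
            break
        results.append(compute(x))
    worker.join()
    return results


def timed(fn, items):
    start = time.perf_counter()
    out = fn(items)
    return out, time.perf_counter() - start


items = list(range(10))
serial_out, serial_s = timed(run_serial, items)
piped_out, piped_s = timed(run_pipelined, items)
print(f"serial {serial_s:.2f}s, pipelined {piped_s:.2f}s")
```

With equal load and compute times, the serial version takes about 10 × 40 ms, while the pipelined one should approach 10 × 20 ms plus one initial load, because only the first load is not hidden behind computation.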

The code is as follows:

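```python
import os
import time

import tensorflow as tf  # TensorFlow 1.x API: queue runners do not exist in TF 2.x
import detect_face  # MTCNN module, assumed to be align/detect_face.py from the facenet project

# args (parsed command-line arguments), height, width, minsize, threshold and factor
# are defined elsewhere in the original script.

# Sorting makes shuffle=False mean a predictable order (os.listdir order is arbitrary)
filename = sorted(os.listdir(args.input))
filelist = [os.path.join(args.input, file) for file in filename]
# Build the filename queue
file_q = tf.train.string_input_producer(filelist, shuffle=False)
# Build the reader
reader = tf.WholeFileReader()
# Read one file: key is the file path, value is the raw file contents
key, value = reader.read(file_q)
# Build the decoder; channels=3 forces RGB output even for grayscale JPEGs
image = tf.image.decode_jpeg(value, channels=3)
# Resize all images to the same height and width (method=1 is nearest neighbor)
resize_image = tf.image.resize_images(image, [height, width], method=1)
# Standardize each image to zero mean and unit variance.
# Note: facenet's detect_face() appears to rescale raw 0-255 pixels itself,
# so this step may not be needed (or may hurt) for MTCNN input.
float_image = tf.image.per_image_standardization(resize_image)
# Fix the static shape so that tf.train.batch knows the channel count
float_image.set_shape([height, width, 3])
# Build the batching pipeline; its queue-runner threads fill the queue in the background
image_batch, key_batch = tf.train.batch([float_image, key], batch_size=1, num_threads=1,
                                        capacity=100, enqueue_many=False)

with tf.Session() as sess:
    pnet, rnet, onet = detect_face.create_mtcnn(sess, None)
    # Build the thread coordinator
    coord = tf.train.Coordinator()
    # Start the queue-runner threads
    threads = tf.train.start_queue_runners(sess, coord=coord)

    # Fetch images and keys in ONE sess.run call so that they stay paired
    image_batch, key_batch = sess.run([image_batch, key_batch])

    # Debug prints from the original, disabled: every sess.run on `image` or `value`
    # reads ANOTHER file from the queue, so these would make the batch skip inputs.
    # print(sess.run(image).shape)
    # print(sess.run(tf.image.extract_jpeg_shape(value)))

    # With batch_size=1 this loop handles a single image; repeat the sess.run above for more
    for i in range(image_batch.shape[0]):
        img = image_batch[i, :, :, :]  # renamed so the `image` tensor is not shadowed
        print(key_batch[i], img.shape, img.dtype)
        start = time.time()
        bounding_boxes = detect_face.detect_face(img, minsize, pnet, rnet, onet, threshold, factor)
        end = time.time()
        nrof_faces = bounding_boxes.shape[0]
        print(end - start)
        print('Total %d face(s) detected' % nrof_faces)

    # Stop the queue-runner threads and wait for them before the session closes
    coord.request_stop()
    coord.join(threads)
```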
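The standardization step in the code calls tf.image.per_image_standardization, which TensorFlow documents as computing (x - mean) / adjusted_stddev, with adjusted_stddev = max(stddev, 1/sqrt(N)) and N the number of elements in the image. The plain-Python sketch below applies that formula to a flat list of pixel values; it illustrates the formula and is not TensorFlow's code.

```python
import math
import statistics


def per_image_standardization(pixels):
    """Sketch of the documented formula: (x - mean) / max(stddev, 1/sqrt(N)).

    pixels: flat list of all values of ONE image (every row, column and channel).
    """
    n = len(pixels)
    mean = statistics.fmean(pixels)
    stddev = statistics.pstdev(pixels)                 # population standard deviation
    adjusted_stddev = max(stddev, 1.0 / math.sqrt(n))  # avoids dividing by 0 on flat images
    return [(p - mean) / adjusted_stddev for p in pixels]


print(per_image_standardization([0, 255, 0, 255]))  # [-1.0, 1.0, -1.0, 1.0]
print(per_image_standardization([7, 7, 7, 7]))      # [0.0, 0.0, 0.0, 0.0]
```

The 1/sqrt(N) floor is what keeps a completely uniform image from producing a division by zero.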

The code uses the following functions:

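1. tf.train.string_input_producer: takes a list of strings (here, file paths) and produces a queue from it. By default the strings are shuffled; with shuffle=False they are produced in their original order. Because num_epochs is not set, the list is cycled through indefinitely.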

2. tf.image.decode_jpeg: decodes a JPEG image into a uint8 tensor of shape [height, width, channels]. Its value can be obtained with sess.run.

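3. tf.train.batch: groups data into batches. The first argument is the tensor(s) to batch; it can be a list of tensors, in which case every tensor in the list is batched and one batched tensor is returned for each of them.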

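4. image_batch, key_batch = sess.run([image_batch, key_batch]): this statement is especially important. The batched data are still only symbolic tensors and cannot be used until sess.run evaluates them, so this statement turns them into concrete arrays that the rest of the program can work with. Reading key along with value makes it possible to know which image is being processed; because both are fetched in the same sess.run call, each key corresponds one to one with its value.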
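The ordering described in item 1 and the grouping described in item 3 can be mimicked in plain Python. `filename_producer` and `batch` below are hypothetical helpers, not TensorFlow APIs, and they ignore threading and queue capacity. They only reproduce the documented behavior: string_input_producer shuffles within each epoch unless shuffle=False and cycles forever when num_epochs is None; tf.train.batch returns one batched tensor per input tensor and, with its default allow_smaller_final_batch=False, does not emit a final batch smaller than batch_size.

```python
import random
from itertools import islice


def filename_producer(filelist, shuffle=True, num_epochs=None, seed=None):
    """Yield every name once per epoch, reshuffled within each epoch when
    shuffle=True; cycle forever when num_epochs is None."""
    if not filelist:
        return
    rng = random.Random(seed)
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        order = list(filelist)
        if shuffle:
            rng.shuffle(order)
        yield from order
        epoch += 1


def batch(examples, batch_size, allow_smaller_final_batch=False):
    """Group tuples such as (image, key) into one list per input.
    Position j of every list comes from the same example. A final partial
    batch is dropped unless allow_smaller_final_batch is True."""
    it = iter(examples)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk or (len(chunk) < batch_size and not allow_smaller_final_batch):
            return
        # Transpose [(img0, key0), (img1, key1)] -> [[img0, img1], [key0, key1]]
        yield [list(column) for column in zip(*chunk)]


files = ["a.jpg", "b.jpg", "c.jpg"]
names = list(filename_producer(files, shuffle=False, num_epochs=1))
print(names)  # ['a.jpg', 'b.jpg', 'c.jpg']

# Pair each name with a fake "decoded image" and batch the pairs two at a time
examples = [("pixels of " + n, n) for n in names]
print(list(batch(examples, 2)))
# [[['pixels of a.jpg', 'pixels of b.jpg'], ['a.jpg', 'b.jpg']]]
```

Because the transpose keeps position j of every output list tied to example j, the images and keys inside one batch stay paired, which is why fetching image_batch and key_batch together gives matching results.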

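Note: if image_batch and key_batch are fetched in separate sess.run calls, the result is different, for example:

image_batch = sess.run(image_batch)

key_batch = sess.run(key_batch)

In this case the keys and values obtained do not correspond. The reason is that the first sess.run call already dequeues one batch from the queue; when sess.run is executed again, the queue continues from where the first call left off and dequeues the next batch, so the two no longer match (both key and value are produced by the same queue).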
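The misalignment can be reproduced with a toy queue. `ToyBatchQueue` is hypothetical and stands in for the batch queue: each `fetch` call dequeues one new example, just as each sess.run that evaluates image_batch or key_batch runs the dequeue once.

```python
from collections import deque


class ToyBatchQueue:
    """Hypothetical stand-in for the batch queue: every fetch() call
    dequeues ONE new example, whatever fields are requested."""

    def __init__(self, examples):
        self._queue = deque(examples)  # items are (image, key) pairs

    def fetch(self, *names):
        image, key = self._queue.popleft()  # one dequeue per call
        example = {"image": image, "key": key}
        return [example[name] for name in names]


examples = [("IMG_A", "a.jpg"), ("IMG_B", "b.jpg"), ("IMG_C", "c.jpg")]

# One call fetching both: image and key come from the same example
q = ToyBatchQueue(examples)
image, key = q.fetch("image", "key")
print(image, key)  # IMG_A a.jpg

# Two separate calls: each dequeues a different example, so they no longer match
q = ToyBatchQueue(examples)
(image,) = q.fetch("image")
(key,) = q.fetch("key")
print(image, key)  # IMG_A b.jpg
```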