0. Introduction
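CPUs have become faster, and SSDs have sped up data reads. Even so, image loading is still a significant cost in very large-scale training such as ImageNet. This article therefore compares how common libraries and frameworks read images.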
Test image shape: (427, 640, 3), i.e. 427 pixels high, 640 pixels wide, 3 color channels.
The article I learned from: https://zhuanlan.zhihu.com/p/30383580
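Every measurement below follows the same pattern: call a reader MAXN times, then divide MAXN by the elapsed wall-clock time. The helper below is a minimal reusable sketch of that pattern. The name `throughput` and the warm-up step are my own assumptions and were not part of the original measurements. It also uses `time.perf_counter()`, Python's high-resolution clock for short intervals, in place of `time.time()`.

```python
import time
from typing import Callable


def throughput(fn: Callable[[], object], n: int = 1000, warmup: int = 10) -> float:
    """Call fn() n times and return calls per second.

    Hypothetical helper: a few warm-up calls run first so one-off costs
    (library initialisation, first file-system access) are not timed.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()  # monotonic, high-resolution timer
    for _ in range(n):
        fn()
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very coarse clocks.
    return n / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in workload; swap in e.g. lambda: cv2.imread('1.jpg')
    print(f"{throughput(lambda: sum(range(1000))):.1f} calls / second")
```

For example, `throughput(lambda: cv2.imread('1.jpg'))` should roughly correspond to the first OpenCV measurement, apart from the extra warm-up calls.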
1. Comparison
1.1 OpenCV
import cv2
import time
image = '1.jpg'
MAXN = 1000
time1 = time.time()
for i in range(MAXN):
    img = cv2.imread(image)
    # img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
print(MAXN / (time.time() - time1))
449.38982663317097 imgs / second
import cv2
import time
image = '1.jpg'
MAXN = 1000
time1 = time.time()
for i in range(MAXN):
    img = cv2.imread(image)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
print(MAXN / (time.time() - time1))
346.7856687525765 imgs / second
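`cv2.imread` returns pixels in BGR order. The two runs above suggest that the `cv2.cvtColor` call takes a noticeable share of the time. For a 3-channel uint8 image, the same B/R swap can be written as a NumPy slice that reverses the last axis, which should give the same pixel values as `cv2.COLOR_BGR2RGB`. This alternative was not benchmarked in the original comparison. The sketch below uses a synthetic array as an assumed stand-in for a real `cv2.imread` result, and only shows what the slice does.

```python
import numpy as np

# Synthetic stand-in for cv2.imread output: shape (H, W, 3), uint8, BGR order.
bgr = np.zeros((427, 640, 3), dtype=np.uint8)
bgr[..., 0] = 255  # blue channel at full intensity

# Reversing the channel axis swaps B and R. This is a view (no pixel copy)
# with a negative stride along the last axis.
rgb_view = bgr[..., ::-1]

# Consumers that require a C-contiguous buffer need a real copy.
rgb = np.ascontiguousarray(rgb_view)

print(rgb_view.base is bgr)            # True: shares memory with bgr
print(rgb_view.flags["C_CONTIGUOUS"])  # False
print(rgb[0, 0].tolist())              # [0, 0, 255]
```

The trade-off is that the slice avoids copying pixels, but the resulting array is not C-contiguous. Code that needs a contiguous buffer will pay for the copy anyway, as it does here via `np.ascontiguousarray`.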
1.2 scipy
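# scipy.misc.imread was deprecated in SciPy 1.0 and removed in 1.2;
# this needs an older SciPy with Pillow installed.
from scipy.misc import imread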
import time
image = '1.jpg'
MAXN = 1000
time1 = time.time()
for i in range(MAXN):
    img = imread(image)
print(MAXN / (time.time() - time1))
322.0913625719489 imgs / second
1.3 skimage
from skimage import io
import time
image = '1.jpg'
MAXN = 1000
time1 = time.time()
for i in range(MAXN):
    img = io.imread(image)
print(MAXN / (time.time() - time1))
328.2046402456301 imgs / second
1.4 PIL
from PIL import Image
import time
import numpy as np
image = '1.jpg'
MAXN = 1000
time1 = time.time()
for i in range(MAXN):
    img = Image.open(image)
    img = np.array(img)
print(MAXN / (time.time() - time1))
281.86613847589075 imgs / second
from PIL import Image
import time
import numpy as np
image = '1.jpg'
MAXN = 1000
time1 = time.time()
for i in range(MAXN):
    img = Image.open(image)
    # img = np.array(img)
print(MAXN / (time.time() - time1))
17242.91258304282 imgs / second
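Note: almost all of the time appears to go into the numpy conversion. The reason is that Image.open() is lazy: it only identifies the file and reads its header. The JPEG is actually decoded when np.array(img) requests the pixel data. The second figure therefore measures header parsing only, not image decoding.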
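To see Pillow's lazy loading directly, the sketch below times `Image.open()` alone against `Image.open()` followed by `np.asarray()`. It makes these assumptions:

- It writes a hypothetical synthetic JPEG with the same size as the test image. Because the image is a solid colour, absolute speeds will differ from a real photo.
- It uses `with` blocks so each file handle is closed.
- The helper names `open_only` and `open_and_decode` are illustrative.

```python
import os
import tempfile
import time

import numpy as np
from PIL import Image

# Hypothetical test file: a synthetic 640x427 RGB JPEG in a temp directory.
path = os.path.join(tempfile.mkdtemp(), "synthetic.jpg")
Image.new("RGB", (640, 427), color=(200, 120, 40)).save(path, quality=90)


def open_only(p):
    # Image.open() reads only the header; pixel data is not decoded yet.
    with Image.open(p) as img:
        return img.size


def open_and_decode(p):
    # np.asarray() requests the pixels, which forces the JPEG decode.
    with Image.open(p) as img:
        return np.asarray(img)


for fn in (open_only, open_and_decode):
    start = time.perf_counter()
    for _ in range(200):
        fn(path)
    rate = 200 / (time.perf_counter() - start)
    print(f"{fn.__name__}: {rate:.0f} imgs / second")

print(open_and_decode(path).shape)  # (427, 640, 3)
```

Calling `img.load()` inside the `with` block would also force the decode without producing a NumPy array.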
1.5 MXNet (result questionable!)
import mxnet as mx
import time
image = '1.jpg'
MAXN = 1000
time1 = time.time()
for i in range(MAXN):
    img = mx.image.imdecode(open(image, 'rb').read())
    mx.nd.waitall()
print(MAXN / (time.time() - time1))
318.6462058057628 imgs / second
Note: I don't know why, but this speed looks wrong to me.
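`mx.image.imdecode` decodes from an in-memory byte buffer, so the loop above times both file reading and decoding. One way to sanity-check a number like this is to time the two stages separately. The sketch below swaps in Pillow with `io.BytesIO` in place of MXNet, an assumption made so that it runs without MXNet installed. It illustrates the read/decode split and does not reproduce the MXNet measurement. The test file is a hypothetical synthetic JPEG, and the helper names are illustrative.

```python
import io
import os
import tempfile
import time

import numpy as np
from PIL import Image

# Hypothetical input: a synthetic JPEG so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "synthetic.jpg")
Image.new("RGB", (640, 427), color=(30, 90, 160)).save(path)

N = 200


def rate(fn, n=N):
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return n / (time.perf_counter() - start)


def read_bytes():
    # File I/O only; 'with' closes the handle deterministically.
    with open(path, "rb") as f:
        return f.read()


buf = read_bytes()  # read once, reuse for the decode-only measurement


def decode_only():
    # Decode from memory: no file-system access inside the timed loop.
    return np.asarray(Image.open(io.BytesIO(buf)))


print(f"read only:   {rate(read_bytes):.0f} / second")
print(f"decode only: {rate(decode_only):.0f} / second")
print(decode_only().shape)  # (427, 640, 3)
```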
1.6 TensorFlow
import tensorflow as tf
import time
image = '1.jpg'
MAXN = 1000
time1 = time.time()
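for i in range(MAXN):
    # tf.gfile.FastGFile is a TensorFlow 1.x API; 'rb' returns raw bytes.
    img = tf.gfile.FastGFile(image, 'rb').read()
    # In TF 1.x graph mode this only adds a decode op to the graph;
    # no decoding runs unless the tensor is evaluated (e.g. sess.run).
    img = tf.image.decode_jpeg(img)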
print(MAXN / (time.time() - time1))
1634.8531350343824 imgs / second
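Note: this method looks very fast. At first I called tf.gfile.FastGFile(image) directly and got an encoding error, because the default 'r' mode reads the file as text. Opening it in 'rb' (binary) mode fixed the problem. One caveat applies: in TensorFlow 1.x graph mode, which is the default unless eager execution is enabled, tf.image.decode_jpeg() only builds a graph node and returns a symbolic tensor. Nothing is decoded until that tensor is evaluated, e.g. with sess.run(). As written, the loop therefore likely measures file reading plus graph construction rather than JPEG decoding, so this figure is probably not directly comparable to the others.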
Here is an excellent explanation by an expert on how TensorFlow reads data:
https://zhuanlan.zhihu.com/p/27238630