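SRCNN, the Pioneering Deep Learning Work on Image Super-Resolution: Principle Analysis and Code (results come close to those reported in the paper)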

Allen吖

Please cite the original address when reposting.

Article link: https://blog.csdn.net/weixin_43723423/article/details/108368746

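Super-resolution image reconstruction with Python + TensorFlow (results come close to those reported in the paper)

Paper link: see the hyperlink in the original post.

I hit many pitfalls while reproducing this paper. My results are better than most of the code available online, which typically falls 5-6 dB short of the reported numbers. After working through those pitfalls, my final results come very close to the paper's.

After reading the paper carefully, I split the reproduction into the following parts:

Step 1: Dataset processing. The training set contains 91 images, and only the luminance channel is used. Each image is downscaled with bicubic interpolation and then upscaled with bicubic interpolation back to its original size. The images are then cropped with a fixed stride, so the 91 images yield more than 20,000 sub-images of equal size (presumably to enlarge the training set and get better results). The sub-images are fed to the network for training, and the data that has not gone through bicubic interpolation serves as the label.

Step 2: Build the convolutional neural network. The model has three layers; the first two use an activation function and the last does not. The sizes of each layer's filter weights and biases are given, so the only parameters learned during training are these six tensors. The loss function is the MSE. Note that, to avoid border effects, the convolution layers use no padding during training, so the output is smaller than the input. The labels must therefore be cropped to the matching size during preprocessing so that the loss can be computed.

Step 3: After training, save the filter weights and biases. At test time the whole image is read at once; to keep the output the same size as the input, the convolution layers use padding="SAME". Then compute the PSNR.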


The Python code for the three parts follows. These are excerpts meant to show the structure, not a complete script; a link to the full code is given at the end.

Code of Step 1:

Take the luminance channel: image = scipy.misc.imread(data[i], flatten=True, mode='YCbCr').astype(np.float)
Crop the image to a size divisible by scale:

if len(image.shape) == 3:
    h, w, _ = image.shape
    h = h - np.mod(h, scale)
    w = w - np.mod(w, scale)
    label_ = image[0:h, 0:w, :]
else:
    h, w = image.shape
    h = h - np.mod(h, scale)
    w = w - np.mod(w, scale)
    label_ = image[0:h, 0:w]

Apply bicubic interpolation twice to obtain a low-resolution image the same size as the original:

label_ = label_ / 255.  # label_ is now the ground-truth image, compared with the prediction when computing the MSE
input_ = scipy.ndimage.interpolation.zoom(label_, (1./scale), prefilter=False)
input_ = scipy.ndimage.interpolation.zoom(input_, (scale/1.), prefilter=False)

Cut out the sub-images. Input crops are 33×33, while label crops must be 21×21, because the three unpadded convolution layers (9×9, 1×1, 5×5) shrink the output by 8 + 0 + 4 = 12 pixels in each dimension.

for x in range(0, h-image_size+1, stride):  # step through the image with the given stride
    for y in range(0, w-image_size+1, stride):
        sub_input = input_[x:x+image_size, y:y+image_size]  # [33 x 33]
        sub_label = label_[x+int(padding):x+int(padding)+label_size,
                           y+int(padding):y+int(padding)+label_size]  # [21 x 21]
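
The 33 to 21 relationship follows directly from the filter sizes: an unpadded, stride-1 convolution with a k×k filter removes k - 1 pixels per dimension. Below is a minimal sketch of that arithmetic and of how many crops the double loop produces. The helper names, the 255×255 image size, and the stride of 14 are assumptions for illustration; this post does not state the stride value.

```python
def valid_output_size(input_size, kernel_sizes):
    """Spatial size after a chain of unpadded (VALID), stride-1 convolutions."""
    size = input_size
    for k in kernel_sizes:
        size -= k - 1  # a k x k filter without padding trims k - 1 pixels
    return size


def count_sub_images(h, w, image_size, stride):
    """Number of crops produced by the double loop over x and y."""
    rows = len(range(0, h - image_size + 1, stride))
    cols = len(range(0, w - image_size + 1, stride))
    return rows * cols


kernel_sizes = [9, 1, 5]                      # the three SRCNN layers
label_size = valid_output_size(33, kernel_sizes)
padding = (33 - label_size) // 2              # offset of the label crop inside the input crop
print(label_size, padding)                    # 21 6
print(count_sub_images(255, 255, 33, 14))     # 256 (image size and stride are assumed values)
```

The offset of 6 is what the `padding` variable in the cropping loop has to be for the label crop to sit centered inside the input crop.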

Finally, store the data in an .h5 file:

with h5py.File(savepath, 'w') as hf:  # build the dataset
    hf.create_dataset('data', data=arrdata)
    hf.create_dataset('label', data=arrlabel)

Code of Step 2:

Build the convolutional neural network:

images = tf.placeholder(tf.float32, [None, None, None, c_dim], name='images')
labels = tf.placeholder(tf.float32, [None, None, None, c_dim], name='labels')
# filters
weights = {
    'w1': tf.Variable(tf.random_normal([9, 9, 1, 64], stddev=1e-3), trainable=trainable, name='w1'),
    'w2': tf.Variable(tf.random_normal([1, 1, 64, 32], stddev=1e-3), trainable=trainable, name='w2'),
    'w3': tf.Variable(tf.random_normal([5, 5, 32, 1], stddev=1e-3), trainable=trainable, name='w3')
}
# biases
biases = {
    'b1': tf.Variable(tf.zeros([64]), trainable=trainable, name='b1'),
    'b2': tf.Variable(tf.zeros([32]), trainable=trainable, name='b2'),
    'b3': tf.Variable(tf.zeros([1]), trainable=trainable, name='b3')
}
# three convolution layers; only the first two are followed by ReLU
conv1 = tf.nn.relu(tf.nn.conv2d(images, weights['w1'], strides=[1,1,1,1], padding=padding) + biases['b1'])
conv2 = tf.nn.relu(tf.nn.conv2d(conv1, weights['w2'], strides=[1,1,1,1], padding=padding) + biases['b2'])
conv3 = tf.nn.conv2d(conv2, weights['w3'], strides=[1,1,1,1], padding=padding) + biases['b3']
# prediction, loss function and optimizer
pred = conv3
loss = tf.reduce_mean(tf.square(labels - pred))
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
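
As a quick sanity check on the shapes above, the number of trainable values can be counted straight from the filter dimensions. This is a small sketch in plain Python; the `layers` list and `count_parameters` helper are illustrative names, not part of the original code.

```python
# Filter shapes as declared above: [height, width, in_channels, out_channels]
layers = [
    ("w1", (9, 9, 1, 64)),
    ("w2", (1, 1, 64, 32)),
    ("w3", (5, 5, 32, 1)),
]


def count_parameters(layers):
    """Return (number of filter weights, number of bias values)."""
    weights = 0
    biases = 0
    for _, (kh, kw, c_in, c_out) in layers:
        weights += kh * kw * c_in * c_out
        biases += c_out  # one bias per output channel
    return weights, biases


n_weights, n_biases = count_parameters(layers)
print(n_weights, n_biases, n_weights + n_biases)  # 8032 97 8129
```

The network is very small, which is why the preprocessing details discussed later have such a visible effect on the final PSNR.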

Train and save the model:

saver = tf.train.Saver(max_to_keep=5)
with tf.Session() as sess:
    if is_train:
        print("Training...")
        sess.run(tf.global_variables_initializer())  # initialize variables
        counter = 0               # global step counter
        start_time = time.time()  # used to report elapsed time
        for ep in range(epoch):
            batch_idxs = len(train_data) // batch_size
            for idx in range(0, batch_idxs):
                batch_images = train_data[idx*batch_size : (idx+1)*batch_size]
                batch_labels = train_label[idx*batch_size : (idx+1)*batch_size]
                counter += 1
                _, err = sess.run([train_op, loss],
                                  feed_dict={images: batch_images, labels: batch_labels})
                if counter % 10 == 0:
                    print("Epoch: [%2d], step: [%2d], time: [%4.4f], loss: [%.8f]"
                          % ((ep+1), counter, time.time()-start_time, err))
                if counter % 500 == 0:
                    saver.save(sess, os.path.join('checkpoint', 'SRCNN'),
                               global_step=counter, write_meta_graph=False)

Code of Step 3:

Change the convolution padding to "SAME".
Run the prediction and compute the PSNR:

def pnsr(img1, img2):
    diff = np.abs(img1*255.0 - img2*255.0)
    mse = np.square(diff).mean()  # mean squared error between the predicted image and the original
    psnr = 20 * np.log10(255 / np.sqrt(mse))  # peak signal-to-noise ratio, the evaluation metric
    return psnr

with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state("checkpoint")
    if ckpt and ckpt.model_checkpoint_path:  # load the saved model
        saver.restore(sess, ckpt.model_checkpoint_path)
    result = pred.eval({images: train_data, labels: train_label})
    result2 = result.squeeze()
    image_path = os.path.join(os.getcwd(), 'sample')
    image_path = os.path.join(image_path, "test_image.png")
    imsave(result2, image_path)
    label = train_label.squeeze()
    print(pnsr(label, result2))  # compute the PSNR

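At this point the SRCNN framework is essentially complete; what remains is to train it and look at the results. Training ran for 40 hours on my machine, and the final result was:
[Figure: training output]

[Figure: training output]
For prediction, I loaded the parameters from checkpoint\SRCNN-16300000 and ran SRCNN on the woman image from Set5.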

Input:

[Figure: input image]

Label:

[Figure: label image]

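The PSNR is about 25.38 dB, far from the 30.92 dB reported in the paper. When I saved the generated image, its brightness also looked lower than the input's, which made me suspect the problem was there, although I could not pin it down at first. I then compared my data against the original authors' MATLAB code and found several problems in the Step 1 preprocessing.
The main ones:

Problem 1: MATLAB's rgb2ycbcr function computes its result differently from the Python function I was using. Fix: I wrote my own rgb2ycbcr in Python that follows the rules of MATLAB's rgb2ycbcr.
Problem 2: MATLAB's bicubic function follows different rules from the bicubic function I first chose in Python. Fix: I chose a different bicubic function so that the processing matches the MATLAB source code.
Problem 3: The authors' source code keeps four decimal places for all inputs and outputs, whereas I initially kept more digits, which caused some differences. Fix: round everything to four decimal places.
Problem 4: The authors' PSNR computation crops the image borders. Fix: I cropped the corresponding borders from the final output image.

Code for Problem 1: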

def rgb2ycbcr(img, only_y=True):  # rewritten rgb2ycbcr that mirrors MATLAB's rgb2ycbcr
    '''same as matlab rgb2ycbcr
    only_y: only return Y channel
    Input:
        uint8, [0, 255]
        float, [0, 1]
    '''
    in_img_type = img.dtype
    img = img.astype(np.float32)  # astype returns a copy; assign it so the caller's array is not modified
    if in_img_type != np.uint8:
        img *= 255.
    # convert
    if only_y:
        rlt = np.dot(img, [65.481, 128.553, 24.966]) / 255.0 + 16.0
    else:
        rlt = np.matmul(img, [[65.481, -37.797, 112.0], [128.553, -74.203, -93.786],
                              [24.966, 112.0, -18.214]]) / 255.0 + [16, 128, 128]
    if in_img_type == np.uint8:
        rlt = rlt.round()
    else:
        rlt /= 255.
    return rlt.astype(in_img_type)
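
One visible consequence of these coefficients is that the Y channel is limited-range: for uint8 input, black maps to 16 and white to 235 rather than 0 and 255. The sketch below checks this with NumPy. The helper name `y_channel_matlab_style` is hypothetical and re-implements only the `only_y` branch for uint8 input.

```python
import numpy as np


def y_channel_matlab_style(rgb_uint8):
    """Y channel for uint8 RGB input, using the same coefficients as rgb2ycbcr above."""
    rgb = rgb_uint8.astype(np.float64)
    y = np.dot(rgb, [65.481, 128.553, 24.966]) / 255.0 + 16.0
    return y.round().astype(np.uint8)


# Three pixels: black, white, pure red
pixels = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0]], dtype=np.uint8)
print(y_channel_matlab_style(pixels).tolist())  # [16, 235, 81]
```

A conversion that maps white to 255 instead would give different luminance values for the same image, which is the kind of mismatch Problem 1 describes.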

Fix for Problems 2 and 3:

          # interpolate twice to build the low-resolution image
          label_1=Image.fromarray(label_)
          input_= label_1.resize(( w // scale,h // scale),Image.BICUBIC)
          input_= input_.resize((w,h), Image.BICUBIC)
          input_=np.float64(input_)
          
          # keep four decimal places
          label_=np.around(label_, decimals=4)
          input_=np.around(input_,decimals=4)
          
          # the interpolation below does not match MATLAB's, so it is no longer used
          #input_ = scipy.ndimage.interpolation.zoom(label_, (1./scale), prefilter=False)  # bicubic interpolation down to low resolution
          #input_ = scipy.ndimage.interpolation.zoom(input_, (scale/1.), prefilter=False)  # bicubic interpolation back up to the high-resolution size

Problem 4 is handled by cropping the borders at the end, before the PSNR is computed.
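
The cropping code itself is not shown in this post. A minimal sketch of a PSNR that ignores the image borders might look like the following. The function name and the `border` parameter are assumptions for illustration, and the border width used by the original MATLAB code is not given here.

```python
import numpy as np


def psnr_with_border_crop(reference, prediction, border):
    """PSNR between two images in [0, 1] after removing `border` pixels on each side."""
    if border > 0:
        reference = reference[border:-border, border:-border]
        prediction = prediction[border:-border, border:-border]
    diff = reference.astype(np.float64) * 255.0 - prediction.astype(np.float64) * 255.0
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 20.0 * np.log10(255.0 / np.sqrt(mse))


reference = np.full((8, 8), 0.5)
prediction = reference.copy()
prediction[0, :] = 0.0  # put an error only on the top border row

print(psnr_with_border_crop(reference, prediction, border=0))  # finite: the border error counts
print(psnr_with_border_crop(reference, prediction, border=1))  # inf: the error lies in the cropped border
```

The toy example shows why the two conventions cannot be compared directly: errors concentrated near the border lower the uncropped PSNR but do not affect the cropped one.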
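After working through these pitfalls, the trained model is now clearly better than bicubic and similar methods, but it is still a little over 1 dB short of the ideal result. The output image is also still on the dark side. I later found that the main problem is at test time: after zero-padding, the paper has one more step that normalizes over the non-zero region of the receptive field. I left this step out, which is why the super-resolved image comes out darker overall. (I did not go on to implement this normalization. It is simple to add yourself, and even without it the result still beats bicubic and similar methods.)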
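
Since the normalization is not implemented in this post, the following is only one possible reading of it, sketched with NumPy: after zero-padding, each output value is rescaled by the ratio of the full filter area to the number of filter taps that landed on real pixels. The function name, the loop-based convolution, and the averaging kernel are illustrative assumptions, not the paper's or the author's code.

```python
import numpy as np


def conv2d_same_zero_pad(img, kernel, normalize=False):
    """Stride-1 'SAME' cross-correlation with zero padding (single channel, odd kernel size).

    With normalize=True, each output is scaled by kernel_area / valid_taps, where
    valid_taps is the number of kernel positions that overlap real image pixels.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="constant")
    mask = np.pad(np.ones_like(img), ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
            if normalize:
                valid_taps = mask[i:i + kh, j:j + kw].sum()
                out[i, j] *= (kh * kw) / valid_taps
    return out


img = np.full((6, 6), 0.5)           # constant gray image
kernel = np.full((3, 3), 1.0 / 9.0)  # averaging filter, so the ideal output is 0.5 everywhere

plain = conv2d_same_zero_pad(img, kernel)
fixed = conv2d_same_zero_pad(img, kernel, normalize=True)
print(plain[0, 0], plain[2, 2])  # the corner is darker than the interior
print(fixed[0, 0], fixed[2, 2])  # the corner is restored to the interior level
```

In this toy case the darkening caused by the padded zeros is confined to the pixels near the border, and the rescaling removes it. Whether this matches the step described in the paper would need to be checked against the original code.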

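Finally, I split the complete Python code into three .py files: run pre first for preprocessing, then train for training, and finally test for prediction. The link to the complete code is posted below for anyone who wants to discuss it.

Link to the complete code archive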
