TensorFlow Peking University Open Course Study Notes 8: Reproducing VGG16 and Implementing Image Recognition

快乐成长吧

Article link: https://blog.csdn.net/qq_37791134/article/details/83789479

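This post records the process of reproducing the VGG16 model while following the Peking University TensorFlow open course, including a reshape error caused by a four-channel image. The author shares how to solve it, covering an image conversion trick and how to store and load data with numpy. It also notes the importance of understanding a tensor's attributes such as shape and dtype.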

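Reference: https://www.cs.toronto.edu/~frossard/post/vgg16/

The first error came from this line:

img_ready = re_img.reshape((1, 224, 224, 3))
ValueError: cannot reshape array of size 200704 into shape (1,224,224,3)

Why? Because the image has four channels instead of three: 224*224*4 = 200704, while the target shape only holds 224*224*3 = 150528 elements.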
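The element counts can be checked with a few lines of NumPy. This is only an illustrative sketch: the zero-filled arrays stand in for a resized image, and the variable names are made up for the example.

```python
import numpy as np

# Stand-in for a resized 224x224 image with four channels (RGBA).
rgba = np.zeros((224, 224, 4))
print(rgba.size)  # number of elements: 224 * 224 * 4

# The target shape holds 1 * 224 * 224 * 3 elements, so reshape fails.
try:
    rgba.reshape((1, 224, 224, 3))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(err)

# With three channels the element counts match and reshape succeeds.
rgb = np.zeros((224, 224, 3))
batch = rgb.reshape((1, 224, 224, 3))
print(batch.shape)
```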

#coding:utf-8
# Convert a 4-channel image to 3 channels
import matplotlib.pyplot as plt
from PIL import Image
img_path = input('Input the path and image name:')
img = Image.open(img_path)
# convert("RGB") drops the alpha channel
lena_RGBA_rgb = img.convert("RGB")
# Save the 3-channel result so it can be read again later
lena_RGBA_rgb.save('dog.jpg')
plt.figure("dog")
plt.imshow(lena_RGBA_rgb)
# plt.savefig('1.jpg')
plt.show()

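You may be wondering why I did not simply make this change inside the code below.

I wanted to, but the image returned by imread has no convert method. I then switched to Image.open, but img/255.0 cannot be applied to its result, because that object is not a matrix and does not support this arithmetic. Next I looked for example code that converts an image to a matrix, and ran into "not evenly divisible" and "cannot reshape" errors. So I wrote the separate script above. It could be turned into a function: when an image comes in, call the function, convert the image, save the converted image, and then read it with imread. I am not going to dig into this any further; feel free to try it if you are interested.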
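One possible alternative to the save-and-reload round trip, sketched here as an assumption rather than something tested against the course code, is to drop the alpha channel from the array itself. The helper name to_rgb is hypothetical, and slicing simply discards transparency instead of blending it onto a background.

```python
import numpy as np

def to_rgb(img):
    """Return an H x W x 3 array from a grayscale, RGB, or RGBA image array.

    Hypothetical helper, not part of the course code.
    """
    if img.ndim == 2:
        # Grayscale: repeat the single channel three times.
        return np.stack([img, img, img], axis=-1)
    if img.ndim == 3 and img.shape[2] == 4:
        # RGBA: keep the first three channels and discard alpha.
        return img[:, :, :3]
    if img.ndim == 3 and img.shape[2] == 3:
        return img
    raise ValueError("unsupported image shape: %s" % (img.shape,))

# Stand-in for an RGBA image array as it might come back from an image reader.
rgba = np.random.randint(0, 256, size=(224, 224, 4), dtype=np.uint8)
rgb = to_rgb(rgba)

# The same normalization and reshape that failed on four channels now works.
img_ready = (rgb / 255.0).reshape((1, 224, 224, 3))
print(img_ready.shape)
```

In load_image, such a call would sit right after the image is read and before the division by 255.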

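A second error came from tf.split:

ValueError: Input must be scalar but has rank 2 for 'split' (op: 'Split') with input shapes: [2,3], [] and with computed input tensors: input[0] = <[1 2 3][4 5 6]>.

The Split op received a rank-2 tensor of shape [2,3] where it expected the scalar split axis, which suggests that the tensor and the axis were passed to tf.split in the wrong order for the installed TensorFlow version.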
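The call used in vgg16.py, tf.split(rgb_scaled,3,3), passes the tensor first, then the number of pieces, then the axis. As a rough analogy only, the same channel split and RGB-to-BGR reordering can be sketched with NumPy, whose np.split takes its arguments in a similar order. The tiny 1x2x2x3 array is made-up data.

```python
import numpy as np

VGG_MEAN = [103.939, 116.779, 123.68]  # same values as in vgg16.py

# Made-up batch of one 2x2 RGB image with values in [0, 1]:
# every pixel is (R, G, B) = (1.0, 0.5, 0.0).
images = np.ones((1, 2, 2, 3)) * np.array([1.0, 0.5, 0.0])
rgb_scaled = images * 255.0

# Array first, then the number of pieces, then the axis to split along.
red, green, blue = np.split(rgb_scaled, 3, axis=3)

# Reassemble in B, G, R order and subtract the per-channel means.
bgr = np.concatenate([
    blue - VGG_MEAN[0],
    green - VGG_MEAN[1],
    red - VGG_MEAN[2]], axis=3)

print(red.shape)  # each piece keeps a channel axis of length 1
print(bgr.shape)
```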

Reading the official documentation at https://tensorflow.google.cn/ is better than anything else.

You may need to pass the encoding= option to numpy.load

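# <current directory>/vgg16.npy points to the vgg16.npy file
np.save: writes an array to a file in uncompressed binary form; the default file extension is .npy.
np.save("name.npy", some_array): writes some_array to the file "name.npy".
some_variable = np.load("name.npy", encoding=" ").item(): reads the contents of "name.npy" into some_variable. The encoding argument is optional; it accepts 'latin1', 'ASCII', or 'bytes', and defaults to 'ASCII'. Calling .item() applies when the file holds a single Python object such as a dict, and NumPy 1.16.3 and later also need allow_pickle=True to load such a file.
Example:
>>> import numpy as np
>>> A = np.arange(15).reshape(3,5)
>>> A
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> np.save("A.npy",A) # if the file path does not end with .npy, the extension is appended automatically
>>> B=np.load("A.npy")
>>> B
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])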
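The example above stores a plain array, while the weights file is read with .item(), which implies it holds a dict. A minimal sketch of that round trip follows; the dict contents, the layer name, and the temporary file path are all made up for illustration and say nothing about the real layout of vgg16.npy.

```python
import os
import tempfile

import numpy as np

# Made-up parameter dict: layer name -> list of arrays. Shapes are tiny on purpose.
params = {
    "conv1_1": [np.zeros((3, 3, 3, 4), dtype=np.float32),
                np.zeros(4, dtype=np.float32)],
}

path = os.path.join(tempfile.mkdtemp(), "params.npy")
# np.save wraps the dict in a 0-d object array and pickles it.
np.save(path, params)

# allow_pickle=True is needed on NumPy >= 1.16.3; .item() unwraps the dict.
loaded = np.load(path, allow_pickle=True, encoding="latin1").item()
print(sorted(loaded.keys()))
print(loaded["conv1_1"][0].shape)
```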

On understanding a tensor's three attributes (name, shape, dtype), and shape in particular
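As a loose analogy (NumPy arrays carry a shape and a dtype but, unlike TensorFlow tensors, no name), the shape of the input batch used in the script below can be read axis by axis. The variable names are only illustrative.

```python
import numpy as np

# Same shape and element type as the placeholder
# tf.placeholder(tf.float32, [1, 224, 224, 3]) in the script below.
batch = np.zeros((1, 224, 224, 3), dtype=np.float32)

n, height, width, channels = batch.shape
print(batch.ndim)   # rank: the number of axes
print(batch.shape)  # (batch size, height, width, channels)
print(batch.dtype)  # element type
```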

#coding:utf-8
import numpy as np
import tensorflow as tf
# Import the plotting module
import matplotlib.pyplot as plt
# Import the custom modules
import vgg16
import utils
from Nclasses import labels


img_path = input('Input the path and image name:')
# Preprocess the image to be tested

img_ready = utils.load_image(img_path)
# Create the figure window and set its name
fig = plt.figure(u"Top-5 prediction results")


with tf.Session() as sess:
    # Define a placeholder with shape [1, 224, 224, 3]
    images = tf.placeholder(tf.float32, [1, 224, 224, 3])
    # Instantiate the VGG model
    vgg = vgg16.Vgg16()
    # Build the forward pass by calling the member function on the input placeholder
    vgg.forward(images)
    # Feed one batch into the network and get its predicted output
    probability = sess.run(vgg.prob, feed_dict={images:img_ready})
    # Indices of the five highest predicted probabilities, in descending order
    top5 = np.argsort(probability[0])[-1:-6:-1]
    print("top5:", top5)
    # Two lists: the probability values and the corresponding labels
    values = []
    bar_label = []
    # Enumerate the five indices selected above
    for n, i in enumerate(top5):
        print ("n:",n)
        print ("i:",i)
        # Append the predicted probability for this index to values
        values.append(probability[0][i])
        # Append the label for this index to bar_label
        bar_label.append(labels[i])
        print (i, ":", labels[i], "----", utils.percent(probability[0][i]))

    # Use a one-row, one-column grid and place the plot below in it
    ax = fig.add_subplot(111)
    # Draw the bar chart
    ax.bar(range(len(values)), values, tick_label=bar_label, width=0.5, fc='g')
    # Set the y-axis label
    ax.set_ylabel(u'probability')
    # Add the title
    ax.set_title(u'Top-5')
    for a,b in zip(range(len(values)), values):
        # Show the predicted probability above each bar
        ax.text(a, b+0.0005, utils.percent(b), ha='center', va = 'bottom', fontsize=7)
    # Save the figure, then display it
    plt.savefig('./result.jpg')
    plt.show()
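The slice [-1:-6:-1] in the script above is easy to misread, so here is a small sketch of what it selects. The eight probabilities are made-up numbers; the real network output has one entry per class label.

```python
import numpy as np

# Made-up probabilities for 8 classes, shaped like one row of network output.
probability = np.array([[0.02, 0.30, 0.05, 0.01, 0.25, 0.10, 0.20, 0.07]])

# argsort returns indices in ascending order of value; [-1:-6:-1] walks
# backwards from the last index, giving the five largest in descending order.
top5 = np.argsort(probability[0])[-1:-6:-1]
values = [probability[0][i] for i in top5]

print(top5)
print(values)
```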

#!/usr/bin/python
#coding:utf-8
from skimage import io, transform

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from pylab import mpl

mpl.rcParams['font.sans-serif'] = ['SimHei'] # Display Chinese labels correctly
mpl.rcParams['axes.unicode_minus'] = False # Display the minus sign correctly

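def load_image(path):
    fig = plt.figure("Centre and Resize")
    # Read the image from the given path
    img = io.imread(path)
    # Normalize the pixel values to [0, 1]
    img = img / 255.0

    # Split the figure into one row and three columns; put the next image in the first slot
    ax0 = fig.add_subplot(131)
    # Add the subplot label
    ax0.set_xlabel(u'Original Picture')
    # Show the image
    ax0.imshow(img)

    # Find the shortest edge of the image
    short_edge = min(img.shape[:2])
    # Subtract the shortest edge from the height and the width and halve the result,
    # which gives the top-left corner of the centered crop
    y = (img.shape[0] - short_edge) // 2
    x = (img.shape[1] - short_edge) // 2
    # Take the centered square crop
    crop_img = img[y:y+short_edge, x:x+short_edge]

    # Put the next image in the second slot
    ax1 = fig.add_subplot(132)
    # Add the subplot label
    ax1.set_xlabel(u"Centre Picture")
    # Show the image
    ax1.imshow(crop_img)

    # Resize to the fixed input size
    re_img = transform.resize(crop_img, (224, 224))

    # Put the next image in the third slot
    ax2 = fig.add_subplot(133)
    ax2.set_xlabel(u"Resize Picture")
    ax2.imshow(re_img)
    # Reshape to the input shape the network expects (fails if the image is not 3-channel)
    img_ready = re_img.reshape((1, 224, 224, 3))

    return img_ready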

# Format a probability as a percentage string
def percent(value):
    return '%.2f%%' % (value * 100)
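The center-crop arithmetic in load_image above can be checked on a small stand-in array. The 300 x 400 size is an arbitrary choice for illustration.

```python
import numpy as np

# Made-up 300 x 400 RGB image (height x width x channels).
img = np.zeros((300, 400, 3))

short_edge = min(img.shape[:2])
# Offsets that center a short_edge x short_edge window inside the image.
y = (img.shape[0] - short_edge) // 2
x = (img.shape[1] - short_edge) // 2
crop_img = img[y:y + short_edge, x:x + short_edge]

print(short_edge, y, x)
print(crop_img.shape)
```

The height is already the shortest edge here, so only the width is trimmed, by the same amount on each side.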

#!/usr/bin/python
#coding:utf-8

import inspect
import os
import numpy as np
import tensorflow as tf
import time
import matplotlib.pyplot as plt

# Per-channel mean pixel values of the training samples, in B, G, R order
VGG_MEAN = [103.939, 116.779, 123.68]

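class Vgg16():
    def __init__(self, vgg16_path=None):
        if vgg16_path is None:
            # Default to vgg16.npy in the current working directory
            vgg16_path = os.path.join(os.getcwd(), "vgg16.npy")
        # Load the parameter dict. This runs whether or not a path was passed in.
        # allow_pickle=True is needed on NumPy >= 1.16.3 to load a pickled dict.
        self.data_dict = np.load(vgg16_path, allow_pickle=True, encoding='latin1').item()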

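    def forward(self, images):

        print("build model started")
        # Record the start time of the forward pass
        start_time = time.time()
        # Scale every pixel by 255
        rgb_scaled = images * 255.0
        # Convert the color channels from RGB to BGR
        red, green, blue = tf.split(rgb_scaled,3,3)
        # Subtract the mean pixel value of each channel, which removes the average brightness of the image
        # This technique is commonly used on grayscale images
        bgr = tf.concat([
            blue - VGG_MEAN[0],
            green - VGG_MEAN[1],
            red - VGG_MEAN[2]],3)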

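        # Build the 16-layer VGG network (5 convolution blocks and 3 fully connected layers),
        # reading each layer's parameters by its name scope

        # Block 1: two convolution layers followed by a max pooling layer that shrinks the image
        self.conv1_1 = self.conv_layer(bgr, "conv1_1")
        # The name is used to look up the layer's kernel and bias; the layer applies the convolution
        # and returns the value after the activation function
        self.conv1_2 = self.conv_layer(self.conv1_1, "conv1_2")
        # Apply the pooling operation for the layer with the given pooling name
        self.pool1 = self.max_pool_2x2(self.conv1_2, "pool1")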

        # Block 2: two convolution layers and one max pooling layer
        self.conv2_1 = self.conv_layer(self.pool1, "conv2_1")
        self.conv2_2 = self.conv_layer(self.conv2_1, "conv2_2")
        self.pool2 = self.max_pool_2x2(self.conv2_2, "pool2")


        # Block 3: three convolution layers and one max pooling layer
        self.conv3_1 = self.conv_layer(self.pool2, "conv3_1")
        self.conv3_2 = self.conv_layer(self.conv3_1, "conv3_2")
        self.conv3_3 = self.conv_layer(self.conv3_2, "conv3_3")
        self.pool3 = self.max_pool_2x2(self.conv3_3, "pool3")

        # Block 4: three convolution layers and one max pooling layer
        self.conv4_1 = self.conv_layer(self.pool3, "conv4_1")
        self.conv4_2 = self.conv_layer(self.conv4_1, "conv4_2")
        self.conv4_3 = self.conv_layer(self.conv4_2, "conv4_3")
        self.pool4 = self.max_pool_2x2(self.conv4_3, "pool4")

        # Block 5: three convolution layers and one max pooling layer
        self.conv5_1 = self.conv_layer(self.pool4, "conv5_1")
        self.conv5_2 = self.conv_layer(self.conv5_1, "conv5_2")
        self.conv5_3 = self.conv_layer(self.conv5_2, "conv5_3")
        self.pool5 = self.max_pool_2x2(self.conv5_3, "pool5")

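        # Layer 6: fully connected
        # Compute the weighted sum using the parameters stored under the given name
        self.fc6 = self.fc_layer(self.pool5, "fc6")
        # Apply the ReLU activation function
        self.relu6 = tf.nn.relu(self.fc6)

        # Layer 7: fully connected
        self.fc7 = self.fc_layer(self.relu6, "fc7")
        self.relu7 = tf.nn.relu(self.fc7)

        # Layer 8: fully connected
        self.fc8 = self.fc_layer(self.relu7, "fc8")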
        self.prob = tf.nn.softmax(self.