Study Notes: Image Classification with Inception-V3

是板栗啊

Last modified: 2023-08-10 09:18:55


First published: 2023-08-09 12:05:48

Article link: https://blog.csdn.net/qq_48383456/article/details/132184934


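Introduction to Inception-V3

Inception-V3 is an image classification model that Google trained on the large-scale ImageNet image database; it classifies images into 1000 categories. The network stacks Inception blocks repeatedly, which involves a large number of convolution and pooling operations, so training Inception-V3 on ImageNet from scratch takes a great deal of time.

Fortunately, the development team provides a pretrained model, which we can load directly to perform image classification tasks.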

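The basic idea of the Inception module

Process the same input in parallel with convolution kernels of different sizes, then stack (concatenate) the resulting feature maps together.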
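As a rough illustration of the "stack together" step, the sketch below computes the output shape of a module whose branches are concatenated along the channel axis. The function name and the branch channel counts are hypothetical and chosen only for illustration; they are not the actual Inception-V3 configuration.

```python
def inception_output_shape(height, width, branch_channels):
    """Shape (H, W, C) after concatenating branch outputs along the channel axis.

    Assumes every branch uses stride 1 with 'same' padding, so each branch
    keeps the input's height and width and only the channel counts differ.
    """
    return (height, width, sum(branch_channels))


# Hypothetical branch widths: 1x1 conv, 3x3 conv, 5x5 conv, pooling branch.
branches = [64, 128, 32, 32]
print(inception_output_shape(28, 28, branches))  # (28, 28, 256)
```

Because the branches only need to agree on height and width, each one is free to use a different kernel size.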

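How Inception V3 improves on the original Inception module

(1) The naive Inception module (a) is computationally expensive, so version (b) optimizes it: 1x1 convolutions first reduce the channel dimension, which cuts the amount of computation.

(2) Two stacked 3x3 convolutions replace one 5x5 convolution. The receptive field is unchanged (each output value still depends on a 5x5 input region), but the number of parameters drops substantially: 2 x 9 = 18 weights instead of 25 per input/output channel pair.

(3) An asymmetric pair of convolutions, 1xN followed by Nx1, replaces an NxN convolution (for example, 1x3 and 3x1 in place of 3x3), which reduces the computation further.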
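The savings from the three optimizations can be checked with simple arithmetic. The sketch below counts convolution weights only (biases ignored). The channel counts (256, 64, 16) are assumptions picked for illustration, not values taken from the Inception-V3 architecture.

```python
def conv_weights(kh, kw, c_in, c_out):
    """Number of weights in a kh x kw convolution layer (biases ignored)."""
    return kh * kw * c_in * c_out


# (1) 1x1 reduction before a 5x5 convolution (hypothetical: 256 -> 16 -> 64 channels)
direct_5x5 = conv_weights(5, 5, 256, 64)
reduced_5x5 = conv_weights(1, 1, 256, 16) + conv_weights(5, 5, 16, 64)

# (2) one 5x5 convolution vs. two stacked 3x3 convolutions (64 channels throughout)
one_5x5 = conv_weights(5, 5, 64, 64)
two_3x3 = 2 * conv_weights(3, 3, 64, 64)

# (3) one 3x3 convolution vs. a 1x3 followed by a 3x1 (64 channels throughout)
one_3x3 = conv_weights(3, 3, 64, 64)
asymmetric = conv_weights(1, 3, 64, 64) + conv_weights(3, 1, 64, 64)

print(direct_5x5, reduced_5x5)  # 409600 29696
print(one_5x5, two_3x3)         # 102400 73728
print(one_3x3, asymmetric)      # 36864 24576
```

With equal channel counts, the ratios depend only on kernel sizes: 18/25 for case (2) and 2N/N^2 = 2/N for case (3).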

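Data preparation

The pretrained model consists of three files:

classify_image_graph_def.pb (stores the structure and parameters of the Inception-V3 model)

imagenet_2012_challenge_label_map_proto.pbtxt (maps each integer class ID to a UID string)

imagenet_synset_to_human_label_map.txt (maps each UID string to a human-readable class name)

Before making predictions, we need to merge the last two mappings into a single mapping from integer class ID to class name, which makes the later steps easier.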

Internal structure of imagenet_2012_challenge_label_map_proto.pbtxt:

Internal structure of imagenet_synset_to_human_label_map.txt:
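As a minimal sketch of the two layouts and of the merge step, the example below parses small in-memory excerpts instead of the real files. The formats are inferred from the parsing code in this post (indented `target_class:` / `target_class_string:` lines, and tab-separated UID/label lines). The sample entries and the helper names `parse_node_id_to_uid` and `parse_uid_to_human` are illustrative assumptions and may not match the real files exactly.

```python
# Illustrative excerpts only; the real files hold one entry per class.
label_map_pbtxt = '''entry {
  target_class: 449
  target_class_string: "n01440764"
}
entry {
  target_class: 450
  target_class_string: "n01443537"
}
'''
synset_txt = "n01440764\ttench, Tinca tinca\nn01443537\tgoldfish, Carassius auratus\n"


def parse_node_id_to_uid(text):
    """Map integer class IDs to UID strings from pbtxt-style text."""
    mapping = {}
    target_class = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith('target_class:'):
            target_class = int(stripped.split(': ')[1])
        elif stripped.startswith('target_class_string:'):
            mapping[target_class] = stripped.split(': ')[1].strip('"')
    return mapping


def parse_uid_to_human(text):
    """Map UID strings to human-readable labels from tab-separated text."""
    mapping = {}
    for line in text.splitlines():
        if line.strip():
            uid, label = line.strip().split('\t', 1)
            mapping[uid] = label
    return mapping


node_id_to_uid = parse_node_id_to_uid(label_map_pbtxt)
uid_to_human = parse_uid_to_human(synset_txt)
# Chain the two mappings: class ID -> UID -> label.
node_id_to_name = {nid: uid_to_human[uid] for nid, uid in node_id_to_uid.items()}
print(node_id_to_name)
```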

Implementation

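# -*- coding: utf-8 -*-
# Written against the TensorFlow 1.x API (tf.gfile, tf.GraphDef, tf.Session);
# in TensorFlow 2.x the same names live under tf.compat.v1.

import tensorflow as tf
import numpy as np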

# Map each UID to its human-readable label, read from imagenet_synset_to_human_label_map.txt
uid_to_human = {}
for line in tf.gfile.GFile('imagenet_synset_to_human_label_map.txt').readlines():  # read the file line by line
	items = line.strip().split('\t')  # each line is "<uid>\t<label>"
	uid_to_human[items[0]] = items[1]

# Map each integer class ID to its UID, read from imagenet_2012_challenge_label_map_proto.pbtxt
node_id_to_uid = {}
for line in tf.gfile.GFile('imagenet_2012_challenge_label_map_proto.pbtxt').readlines():
	if line.startswith('  target_class:'):
		# A target_class line holds the integer class ID; keep it until the matching UID line arrives
		target_class = int(line.split(': ')[1])
	if line.startswith('  target_class_string:'):
		# A target_class_string line holds the quoted UID
		target_class_string = line.split(': ')[1].strip('\n').strip('\"')
		node_id_to_uid[target_class] = target_class_string  # store the class ID -> UID pair

# Merge the two mappings above into one: integer class ID -> human-readable label
node_id_to_name = {}
for key, value in node_id_to_uid.items():
	node_id_to_name[key] = uid_to_human[value]

def create_graph():  # load the pretrained model
	with tf.gfile.GFile('classify_image_graph_def.pb', 'rb') as f:  # open classify_image_graph_def.pb in binary mode
		graph_def = tf.GraphDef()  # an empty GraphDef protocol buffer
		graph_def.ParseFromString(f.read())  # deserialize the file into graph_def, recovering the network structure and weights
		_ = tf.import_graph_def(graph_def, name='')  # import graph_def into the current default graph

def classify_image(image, top_k=1):  # image: path of the image to classify; top_k: how many of the highest-scoring classes to print
	with tf.gfile.GFile(image, 'rb') as f:
		image_data = f.read()  # raw encoded image bytes; the graph decodes them itself

	create_graph()

	with tf.Session() as sess:
		# 'softmax:0': A tensor containing the normalized prediction across 1000 labels (the final layer)
		# 'pool_3:0': A tensor containing the next-to-last layer containing 2048 float description of the image
		# 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG encoding of the image
		softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')  # output tensor of the final layer
		predictions = sess.run(softmax_tensor, feed_dict={'DecodeJpeg/contents:0': image_data})  # run a forward pass to get the class scores
		predictions = np.squeeze(predictions)  # drop the batch dimension, leaving a 1-D vector of scores

		top_ids = predictions.argsort()[-top_k:][::-1]  # indices of the top_k largest scores, highest first
		for node_id in top_ids:  # print each selected class with its score
			human_string = node_id_to_name[node_id]
			score = predictions[node_id]
			print('%s (score = %.5f)' % (human_string, score))

classify_image('test1.png')  # classify the test image (the input node is documented as taking JPEG-encoded bytes)
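The ranking step relies on two NumPy calls that are easy to misread, so here is a small standalone sketch using made-up scores rather than real model output. `np.squeeze` only removes the length-1 batch axis; `argsort` does the sorting, in ascending order, so the last k indices are the top k with the best one last unless the slice is reversed.

```python
import numpy as np

scores = np.array([[0.05, 0.70, 0.10, 0.15]])  # made-up scores with a leading batch axis
flat = np.squeeze(scores)                      # shape (1, 4) -> (4,)
ascending = flat.argsort()[-2:]                # two largest scores, smaller of the two first
descending = ascending[::-1]                   # reversed: highest score first
print(flat.shape, ascending.tolist(), descending.tolist())  # (4,) [3, 1] [1, 3]
```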

Test image test1.png: