Image Retrieval Based on BOW

4oo

Article link: https://blog.csdn.net/weixin_45583603/article/details/106162671

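I. Image Retrieval

1.1 Overview of Image Retrieval Principles

Image retrieval, put simply, means finding the images in an image database that satisfy a given condition. Depending on how image content is described, research on image retrieval falls into two categories: text-based image retrieval and content-based image retrieval. The Bag of Words (BOW) model was first used for retrieving text documents: a text is treated as an unordered collection of words, ignoring grammar and context. A dictionary is built and the number of occurrences of each word is counted, which yields a representation that can be used to classify the text. Computer vision researchers took inspiration from this and applied it to image retrieval, giving rise to Bag of Features (BOF).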
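As a minimal illustration of the bag-of-words idea on text, the sketch below discards word order and keeps only the counts over a fixed dictionary. The toy sentences and the function name `bag_of_words` are made up for this example.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in text, ignoring word order."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical toy documents
docs = ["the cat sat on the mat", "the dog chased the cat"]

# The dictionary is simply the sorted set of all words seen in the documents
vocabulary = sorted({w for d in docs for w in d.split()})

# One count vector (histogram) per document
histograms = [bag_of_words(d, vocabulary) for d in docs]
print(vocabulary)
print(histograms)
```

Two documents can then be compared through their count vectors alone, which is the same idea BOF applies to images with visual words in place of text words.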

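1.2 Steps for Implementing Image Retrieval

1. Extract features
2. Learn a "visual vocabulary" with the K-means algorithm
3. Quantize the input feature set according to the visual vocabulary
4. Convert each input image into a TF-IDF-weighted frequency histogram of visual words
5. Build an inverted index from features to images, and use it to look up related images quickly
6. Match histograms against the candidates returned by the index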
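Step 5 in the list above can be sketched with a plain dictionary. This is only an illustration of the inverted-index idea, not the data structure used by any particular library; the file names, word ids and function names are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(image_words):
    """Map each visual word id to the set of images that contain it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index

def candidates(index, query_words):
    """Return images sharing at least one word with the query, most shared words first."""
    votes = defaultdict(int)
    for w in set(query_words):
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    # Sort by number of shared words (descending), then by name for a stable order
    return sorted(votes, key=lambda i: (-votes[i], i))

# Hypothetical visual word ids per image
image_words = {"a.jpg": [0, 3, 3, 7], "b.jpg": [1, 3, 5], "c.jpg": [2, 4]}
index = build_inverted_index(image_words)
result = candidates(index, [3, 7, 9])
print(result)
```

Only the images returned as candidates need a full histogram comparison in step 6, which is what makes the lookup fast on large collections.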

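Feature extraction
Several feature extraction methods were covered earlier, such as SIFT and Harris corners. Here SIFT is used to extract the feature points of each image. Analogous to BOW, an image is treated as a collection of image patches, and feature extraction yields the key local features of the image.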

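Learning the visual vocabulary
Step 1 yields the feature points of many images. These features are not yet grouped, and some of them are extremely similar to each other, so the K-means clustering algorithm is used to cluster the extracted features.
Basic K-means procedure:
1. Randomly initialize K cluster centers
2. Repeat the following until the algorithm converges:
   (a) Assign each feature to a center (cluster) according to its distance to the centers
   (b) For each cluster, recompute the center from the features assigned to it

Clustering is the key operation in learning the visual vocabulary. Each resulting cluster center is called a visual word (codevector), and the set of visual words is called the visual vocabulary, or codebook.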
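The K-means procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the clustering routine PCV uses internally; the function names, the iteration cap `n_iter` and the toy 2-D data (standing in for 128-dimensional SIFT descriptors) are assumptions made for the example.

```python
import numpy as np

def assign(features, centers):
    """Assignment step: index of the nearest center for every feature."""
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(features, k, n_iter=20, seed=0):
    """Plain k-means: returns (centers, labels) for an (n, d) feature array."""
    rng = np.random.default_rng(seed)
    # Random initialisation: pick k distinct data points as the starting centers
    start = rng.choice(len(features), size=k, replace=False)
    centers = features[start].astype(float)
    for _ in range(n_iter):
        labels = assign(features, centers)
        # Update step: move each center to the mean of its members
        new_centers = centers.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, assign(features, centers)

# Toy data: two well-separated blobs
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(50, 2))
features = np.vstack([blob_a, blob_b])

codebook, labels = kmeans(features, k=2)
print(codebook)
```

The rows of `codebook` play the role of visual words, and `labels` records which word each feature was assigned to.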
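Note on the size of the codebook:

If the codebook is too small, the visual words cannot cover all the cases that occur.
Conversely, if the codebook is too large, the computational cost increases and overfitting appears.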
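Once a codebook exists, quantization (step 3 of the pipeline) maps every descriptor to its nearest visual word, and counting the words gives the raw histogram of an image. The sketch below assumes a tiny hand-written 2-D codebook; the function names and the numbers are illustrative only.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Return the index of the nearest codebook entry for each descriptor."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def word_histogram(descriptors, codebook):
    """Count how many descriptors fall on each visual word."""
    words = quantize(descriptors, codebook)
    return np.bincount(words, minlength=len(codebook))

# Hypothetical codebook with three visual words
codebook = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
# Hypothetical descriptors extracted from one image
descriptors = np.array([[0.5, 0.2], [9.0, 11.0], [9.5, 9.5], [1.0, 8.0]])

hist = word_histogram(descriptors, codebook)
print(hist)
```

The length of the histogram equals the codebook size, which is why a larger codebook raises both the storage and the comparison cost per image.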

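Converting to a frequency histogram with TF-IDF
The term frequency (tf) of visual word $w$ in image $d$ is

$$\mathrm{tf}_{w,d} = \frac{n_w}{\sum_j n_j}$$

where the numerator $n_w$ is the number of times word $w$ occurs in image $d$ and the denominator is the total number of words in that image, so tf is the frequency with which the feature occurs.

The inverse document frequency (idf) of word $w$ is

$$\mathrm{idf}_w = \log\frac{N}{N_w}$$

where the numerator $N$ is the total number of images and the denominator $N_w$ is the number of images in which word $w$ occurs.

When converting to a frequency histogram, TF-IDF (the product of term frequency and inverse document frequency) is used as the weight. The purpose of this weight is to reduce the influence of features that occur everywhere. In BOW for text, common words such as "the", "it" and "do" say little about the content of a text but occur very frequently; tf-idf reduces the influence of such words. Likewise, in BOF image search, uninformative features occur across many images, so their weights need to be reduced.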
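A small numeric sketch of this weighting, assuming the standard definitions tf = (count of a word in the image) / (total words in the image) and idf = log(N / number of images containing the word). The count matrix and the function name `tfidf_weight` are made up for illustration.

```python
import numpy as np

def tfidf_weight(histograms):
    """Weight an (n_images, n_words) array of raw visual-word counts by tf-idf."""
    histograms = np.asarray(histograms, dtype=float)
    n_images = histograms.shape[0]
    # tf: count of each word divided by the total number of words in that image
    totals = histograms.sum(axis=1, keepdims=True)
    tf = histograms / np.maximum(totals, 1)
    # idf: log(N / number of images that contain the word)
    doc_freq = (histograms > 0).sum(axis=0)
    idf = np.log(n_images / np.maximum(doc_freq, 1))
    return tf * idf

# Hypothetical counts: 3 images, 3 visual words
counts = np.array([[2, 1, 0],
                   [1, 0, 3],
                   [4, 0, 0]])
weights = tfidf_weight(counts)
print(weights)
```

Word 0 occurs in every image, so its idf is log(1) = 0 and it contributes nothing to the comparison, which is exactly the down-weighting of uninformative features described above.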

II. Implementation

2.1 Dataset

The dataset contains 150 images in total; part of it is shown below:
[Image: sample images from the dataset]

2.2 Implementation Code

2.2.1 Generating the model files required by the code

Generating the visual vocabulary:

# -*- coding: utf-8 -*-
import pickle
from PCV.imagesearch import vocabulary
from PCV.tools.imtools import get_imlist
from PCV.localdescriptors import sift

# Get the list of images
imlist = get_imlist('C:/meitu/')
nbr_images = len(imlist)
# Build the matching list of feature files (same name, .sift extension)
featlist = [imlist[i][:-3] + 'sift' for i in range(nbr_images)]

# Extract the SIFT features of every image in the folder
for i in range(nbr_images):
    sift.process_image(imlist[i], featlist[i])

# Train the vocabulary (k = 30 visual words; the third argument is the subsampling step)
voc = vocabulary.Vocabulary('ukbenchtest')
voc.train(featlist, 30, 10)
# Save the vocabulary
with open('C:/meitu/vocabulary.pkl', 'wb') as f:
    pickle.dump(voc, f)
print('vocabulary is:', voc.name, voc.nbr_words)

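Running this produced 151 .sift files, one for each of the 151 images, together with the visual vocabulary, which is stored as the .pkl file obtained from k-means clustering.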

2.2.2 Importing the model data into the database

# -*- coding: utf-8 -*-
import pickle
import sqlite3

from PCV.imagesearch import imagesearch
from PCV.localdescriptors import sift
from PCV.tools.imtools import get_imlist

# Get the list of images
imlist = get_imlist('C:/meitu/')
nbr_images = len(imlist)
# Build the matching list of feature files
featlist = [imlist[i][:-3] + 'sift' for i in range(nbr_images)]

# Load the vocabulary (the file name must match the vocabulary file saved earlier)
with open('C:/meitu/vocabulary4.pkl', 'rb') as f:
    voc = pickle.load(f, encoding='iso-8859-1')

# Create the indexer and the database tables
indx = imagesearch.Indexer('testImaAdd4.db', voc)
indx.create_tables()

# Go through all images (at most the first 1000), project their features
# onto the vocabulary and insert them into the index
for i in range(nbr_images)[:1000]:
    locs, descr = sift.read_features_from_file(featlist[i])
    indx.add_to_index(imlist[i], descr)

# Commit to the database
indx.db_commit()

# Sanity check: number of indexed images and the first entry
con = sqlite3.connect('testImaAdd4.db')
print(con.execute('select count (filename) from imlist').fetchone())
print(con.execute('select * from imlist').fetchone())

After running, a database file is generated in the code folder.

2.2.3 Testing

# -*- coding: utf-8 -*-
import pickle
from PCV.imagesearch import imagesearch
from PCV.geometry import homography
from PCV.tools.imtools import get_imlist
from PCV.localdescriptors import sift

# Load the image list
imlist = get_imlist('C:/meitu/')
nbr_images = len(imlist)
# Build the feature file list
featlist = [imlist[i][:-3] + 'sift' for i in range(nbr_images)]

# Load the vocabulary (must be the one the database below was indexed with)
with open('C:/meitu/vocabulary1.pkl', 'rb') as f:
    voc = pickle.load(f, encoding='iso-8859-1')

src = imagesearch.Searcher('testImaAdd1.db', voc)

# Index of the query image and number of results to return
q_ind = 0
nbr_results = 20

# Regular query (results sorted by Euclidean distance)
res_reg = [w[1] for w in src.query(imlist[q_ind])[:nbr_results]]
print('top matches (regular):', str(res_reg))

# Load the features of the query image
q_locs, q_descr = sift.read_features_from_file(featlist[q_ind])
fp = homography.make_homog(q_locs[:, :2].T)

# RANSAC model for homography fitting
model = homography.RansacModel()
rank = {}

# Load the features of each candidate image
for ndx in res_reg[1:]:
    # 'ndx' is a rowid of the DB, which starts at 1, so subtract 1 to index featlist
    locs, descr = sift.read_features_from_file(featlist[ndx - 1])
    # Get the matches between query and candidate
    matches = sift.match(q_descr, descr)
    ind = matches.nonzero()[0]
    ind2 = matches[ind]
    tp = homography.make_homog(locs[:, :2].T)
    # Compute the homography and count inliers; with too few matches, use an empty list.
    # The keyword is spelled 'match_theshold' here; keep it if that matches your PCV version.
    try:
        H, inliers = homography.H_from_ransac(fp[:, ind], tp[:, ind2], model, match_theshold=4)
    except Exception:
        inliers = []
    # Store the inlier count
    rank[ndx] = len(inliers)

# Sort the dictionary to get the candidates with the most inliers first
sorted_rank = sorted(rank.items(), key=lambda t: t[1], reverse=True)
res_geom = [res_reg[0]] + [s[0] for s in sorted_rank]
print('top matches (homography):', res_geom)

# Show the query results
imagesearch.plot_results(src, res_reg[:5])  # regular query
imagesearch.plot_results(src, res_geom[:5])  # reranked results

The main point is to compare two query methods:

Regular query
Query reranked by fitting a homography with a RANSAC model
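The reranking step scores each candidate by how many of its feature matches agree with a single homography. The sketch below shows only that scoring idea, assuming a homography `H` is already known; it does not perform the RANSAC estimation itself and is not PCV's implementation. The function names, the 4-pixel threshold and the toy points are assumptions.

```python
import numpy as np

def count_inliers(H, src_pts, dst_pts, threshold=4.0):
    """Count matches (src -> dst) whose reprojection error under H is below threshold."""
    # Convert the source points to homogeneous coordinates
    src_h = np.hstack([src_pts, np.ones((len(src_pts), 1))])
    proj = src_h @ H.T
    # Back to Cartesian coordinates
    proj = proj[:, :2] / proj[:, 2:3]
    errors = np.linalg.norm(proj - dst_pts, axis=1)
    return int((errors < threshold).sum())

def rerank(candidates, inlier_counts):
    """Order candidates by inlier count, highest first."""
    return sorted(candidates, key=lambda c: inlier_counts[c], reverse=True)

# Hypothetical homography: a pure translation by (10, 5)
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 5.0],
              [0.0, 0.0, 1.0]])
src = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0], [4.0, 4.0]])
dst = src + np.array([10.0, 5.0])
dst[3] += 50.0  # simulate one wrong match

n_inliers = count_inliers(H, src, dst)
order = rerank([7, 2, 9], {7: 3, 2: 12, 9: 0})
print(n_inliers, order)
```

A candidate with few matches, or with matches that do not fit one consistent homography, ends up with a low count and drops in the reranked list even if its histogram was close to the query.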

Test image:
[Image: query image]
The console prints the image indices returned by the query.

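2.3 Results for different k-means vocabulary sizes

A separate .pkl file and database file were generated for each k-means vocabulary size:
[Image: generated .pkl and database files]

Choosing a different number of clusters k for k-means gives different results.
Test image:

2.3.1 k = 30

Regular query:
[Image: regular query results, k = 30]
RANSAC query:

2.3.2 k = 50

Regular query:
[Image: regular query results, k = 50]
RANSAC query:

2.3.3 k = 1000

Regular query:
[Image: regular query results, k = 1000]
RANSAC query: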

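Summary:

Comparing the matches retrieved with the different k-means vocabulary sizes shows that the smaller vocabularies produce more wrong matches than the larger one. In these tests (k = 30, 50 and 1000), retrieval accuracy increased with the vocabulary size.
In these tests the regular query also matched better than the reranked query; reranking lowered the accuracy. A possible cause is that reranking relies on the RANSAC algorithm: RANSAC effectively rejects wrong matches, but it may also discard correct ones, which makes the retrieval results less accurate.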
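The comparison above is made by inspecting the result images. One way to put a number on it, assuming ground-truth labels for which images are relevant to the query are available, is precision at k. The sketch below uses made-up result lists and a made-up relevant set; they are not the results of this experiment.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved image ids that are in the relevant set."""
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for r in top if r in relevant) / len(top)

# Hypothetical ground truth and hypothetical rankings, for illustration only
relevant = {1, 2, 3, 4}
regular = [1, 2, 9, 3, 8]
reranked = [1, 9, 8, 2, 3]

p_regular = precision_at_k(regular, relevant, 3)
p_rerank = precision_at_k(reranked, relevant, 3)
print(p_regular, p_rerank)
```

Averaging this score over several query images for each vocabulary size would make the comparison between regular and reranked queries less dependent on a single test image.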