Computer Vision: Image Retrieval and Recognition

Latest recommended article published on 2024-07-22 11:37:15

hhc68

Article tags: computer vision, image processing, machine learning

Article link: https://blog.csdn.net/sjdkljlkfc/article/details/125350965


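I Principles

1 Image classification methods

The bag-of-features model (also called the bag of visual words) is one of the more commonly used image representations in computer vision.
It derives from the bag-of-words model, which was first used in text classification to represent a document as a feature vector. The basic idea is to ignore word order, grammar, and syntax and to treat a text simply as a collection of words, each of which is assumed to occur independently. Put simply, every document is viewed as a bag (it holds nothing but words, hence the name "bag of words"); we look at which words the bag contains and classify the document accordingly.
If a document contains many words such as pig, horse, cow, sheep, valley, land, and tractor, and few such as bank, building, car, and park, we would tend to judge it a document about the countryside rather than about the city.
Bag of features borrows this idea, except that what we extract from an image is no longer individual words but key image features, which is why researchers renamed it bag of features. The bag-of-features pipeline for retrieval is almost identical to the one for classification. The only difference is that the raw BoF feature, that is, the histogram vector, is additionally weighted with TF-IDF.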
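
As a rough illustration of the TF-IDF weighting mentioned above, the sketch below reweights a small matrix of word-count histograms. The function name `tfidf_weight` and the toy counts are made up for this example, and the IDF uses the plain log(N / n_i) form, which is one common variant and not necessarily the exact formula used elsewhere in this article.

```python
import numpy as np

def tfidf_weight(histograms):
    """Reweight word-count histograms (one row per image) with TF-IDF."""
    histograms = np.asarray(histograms, dtype=float)
    nbr_images = histograms.shape[0]
    # Term frequency: counts normalised by the number of words in each image.
    totals = histograms.sum(axis=1, keepdims=True)
    tf = histograms / np.maximum(totals, 1.0)
    # Document frequency: in how many images each visual word occurs.
    df = (histograms > 0).sum(axis=0)
    # Inverse document frequency; words that never occur get weight 0.
    idf = np.log(nbr_images / np.maximum(df, 1))
    idf[df == 0] = 0.0
    return tf * idf, idf

# Hypothetical counts: 3 images, vocabulary of 3 visual words.
counts = [[2, 0, 1],
          [1, 1, 0],
          [3, 0, 0]]
weighted, idf = tfidf_weight(counts)
```

A word that appears in every image (the first column here) gets an IDF of zero, so it no longer contributes when two images are compared.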

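2 Visual words

A visual word is a basic unit of an image. Visual words can be extracted from sub-blocks, from feature points, or from objects. In every case they are generated from the visual features of the image (sub-block extraction, too, ultimately comes down to visual features).
Obtaining the visual vocabulary: suppose there are N images, and a set of features (such as SIFT features) is detected in each one. These SIFT features can be regarded as the words of the images. We then look for representatives of these words (typically with a clustering algorithm), and these representatives make up the visual words extracted from the N images.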

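3 The bag-of-features algorithm and procedure

Algorithm steps:
1) Extract image features
2) Cluster the features to obtain a visual vocabulary
3) Represent each image as a vector (histogram) according to the vocabulary
4) Convert the input image into a frequency histogram of visual words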

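1) Extract image features
Feature extraction and description consists of extracting representative, highly discriminative global or local features from an image and then describing them.
These features are usually ones that differ clearly between categories, so that one category can be told apart from the others. They must also be stable: they should change as little as possible under variations in illumination, viewpoint, scale, noise, and other external factors. That way, even in very complex conditions, the computer can still detect and recognise the object reliably from these stable features.
The simplest effective extraction method is the regular grid method, which divides the image with a uniform grid to obtain local region features.
Interest point detection is another effective extraction method. Its basic idea is as follows:
when a person judges the category of an image, they first take in the overall outline of the object, then focus on the places where it differs markedly from other objects, and finally decide on the category. In other words, the category is determined from the salient features that distinguish the object from others.
Once the features have been extracted, the next step is to describe them with a feature descriptor. The feature vectors produced by the descriptor are generally the input to later processing algorithms, so a descriptor that is discriminative and distinctive plays a large role in the subsequent image processing.
Among descriptors, SIFT is a classic and widely used choice.
SIFT extracts many feature points from an image, each described by a 128-dimensional vector, so with enough images we obtain a huge library of feature vectors.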
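
The regular grid method described above can be sketched in a few lines. This is only an illustration: `grid_patches`, the patch size, and the step are hypothetical choices, and flattened raw pixel patches stand in for a real descriptor such as SIFT.

```python
import numpy as np

def grid_patches(image, patch_size=16, step=16):
    """Cut a grayscale image into patches on a regular grid.

    Returns an array of shape (n_patches, patch_size * patch_size),
    one flattened patch per row.
    """
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    patches = []
    for r in range(0, rows - patch_size + 1, step):
        for c in range(0, cols - patch_size + 1, step):
            patch = image[r:r + patch_size, c:c + patch_size]
            patches.append(patch.ravel())
    return np.array(patches)

# A synthetic 64x48 image stands in for real data.
image = np.random.default_rng(0).random((64, 48))
patches = grid_patches(image)  # 4 x 3 grid cells, 256 values each
```

Choosing a step smaller than the patch size gives overlapping patches, which trades more descriptors for denser coverage of the image.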

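2) Train the dictionary (visual vocabulary)

After the SIFT features have been extracted in the previous step, the K-means clustering algorithm is used to cluster them into a visual vocabulary.
K-means is a clustering method based on the similarity between samples. Given the parameter K, it partitions N objects into K clusters so that similarity is high within a cluster and low between clusters. There are K cluster centres, so the visual vocabulary has size K. The process of constructing the visual words is shown in the figure.

Clustering is therefore the step that follows feature extraction, and k-means is the most commonly used algorithm for it.
The value of k has to be chosen case by case. Because the number of features can be enormous, the clustering can also take a very long time. When it finishes, we obtain a dictionary made up of k vectors, and each of these k vectors is commonly called a visual word.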
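
To make the clustering step concrete, here is a minimal sketch of plain Lloyd's k-means written with NumPy only. The function name `kmeans_vocabulary`, the iteration count, and the random vectors standing in for SIFT descriptors are all assumptions made for this example; a library routine would normally be used instead.

```python
import numpy as np

def kmeans_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Cluster descriptors with plain Lloyd's k-means; return the k centroids."""
    descriptors = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct descriptors chosen at random.
    start = rng.choice(len(descriptors), size=k, replace=False)
    centroids = descriptors[start].copy()
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every descriptor.
        dists = np.linalg.norm(
            descriptors[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

# Random 128-dimensional vectors stand in for SIFT descriptors.
descriptors = np.random.default_rng(1).random((500, 128))
vocabulary = kmeans_vocabulary(descriptors, k=8)
```

Each row of `vocabulary` plays the role of one visual word. The full distance matrix is built in memory here, which is fine for a toy example but is one reason subsampling the descriptors helps on real data.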

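3) Histogram representation of images
The words in the visual vocabulary are used to represent the image to be classified. For each SIFT feature in an image, compute its distance to the K visual words;
the nearest visual word is the one assigned to that SIFT feature.
Counting how many times each word occurs in the image turns the image into a K-dimensional numeric vector.
In the figure, K=4 and each image is described by a histogram.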
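
The quantisation and counting just described can be sketched as follows, using K=4 as in the text. The function name `word_histogram` and the two-dimensional toy vectors are assumptions made for readability; real SIFT descriptors would have 128 dimensions.

```python
import numpy as np

def word_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word and count occurrences."""
    descriptors = np.asarray(descriptors, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # Distance from every descriptor to every visual word.
    dists = np.linalg.norm(
        descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    # One bin per visual word, including words that never occur.
    return np.bincount(words, minlength=len(vocabulary))

# Toy vocabulary with K=4 words and five descriptors from one image.
vocabulary = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 1.0], [9.0, 1.0], [0.5, 0.2], [9.5, 9.0], [1.0, 0.0]])
hist = word_histogram(descriptors, vocabulary)  # array([3, 1, 0, 1])
```

Three descriptors fall closest to the first word, one each to the second and fourth, and none to the third, so the image is represented by the vector (3, 1, 0, 1).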

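4) Train the classifier
Once we have a histogram vector for every image, the remaining step is the same as in any standard classification pipeline.
We simply train a classifier on the database images' vectors and their labels. For an image to be predicted, we follow the same procedure: extract SIFT features, quantise them into a histogram vector using the dictionary, and classify that vector with the trained model. Alternatively, the histogram vectors can be compared for similarity directly with the KNN algorithm.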
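
The KNN alternative mentioned above might look like the sketch below. The function name `knn_classify`, the choice of Euclidean distance, and the toy histograms with their "countryside" and "city" labels are all invented for illustration.

```python
import numpy as np
from collections import Counter

def knn_classify(query, train_hists, train_labels, k=3):
    """Label a query histogram by majority vote among its k nearest neighbours."""
    train_hists = np.asarray(train_hists, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training histogram.
    dists = np.linalg.norm(train_hists - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical K=4 histograms for six labelled training images.
train_hists = [[5, 0, 1, 0], [4, 1, 0, 0], [6, 0, 0, 1],
               [0, 5, 0, 4], [1, 4, 0, 5], [0, 6, 1, 3]]
train_labels = ["countryside"] * 3 + ["city"] * 3
label = knn_classify([5, 1, 0, 0], train_hists, train_labels)
```

Here the three nearest training histograms all carry the "countryside" label, so that is the predicted class. In practice the histograms would usually be normalised or TF-IDF weighted before distances are computed.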

II Code Implementation

1 Code

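# Excerpt from the Vocabulary class in PCV.imagesearch.vocabulary.
from numpy import arange, log, sum, vstack, zeros
from scipy.cluster.vq import kmeans, vq
from PCV.localdescriptors import sift


class Vocabulary(object):

    def __init__(self,name):
        # Constructor inferred from how demo1.py uses the class
        # (Vocabulary('test77_test'), voc.name, voc.nbr_words).
        self.name = name
        self.voc = []
        self.idf = []
        self.trainingdata = []
        self.nbr_words = 0

    def train(self,featurefiles,k=100,subsampling=10):
        """ Train a vocabulary of k words with K-means from the feature
            files listed in featurefiles. Subsampling the training data
            speeds up training. """
        
        nbr_images = len(featurefiles)
        # read the features from file
        descr = []
        descr.append(sift.read_features_from_file(featurefiles[0])[1])
        # stack all features together for the K-means clustering
        descriptors = descr[0]
        for i in arange(1,nbr_images):
            descr.append(sift.read_features_from_file(featurefiles[i])[1])
            descriptors = vstack((descriptors,descr[i]))
            
        # K-means: the last argument sets the number of runs
        self.voc,distortion = kmeans(descriptors[::subsampling,:],k,1)
        self.nbr_words = self.voc.shape[0]
        
        # go through all training images and project them on the vocabulary
        imwords = zeros((nbr_images,self.nbr_words))
        for i in range( nbr_images ):
            imwords[i] = self.project(descr[i])
        
        # number of images in which each word occurs
        nbr_occurences = sum( (imwords > 0)*1 ,axis=0)
        
        # inverse document frequency (the +1 avoids division by zero)
        self.idf = log( (1.0*nbr_images) / (1.0*nbr_occurences+1) )
        self.trainingdata = featurefiles
    
    def project(self,descriptors):
        """ Project descriptors on the vocabulary to create a histogram of words. """
        
        # histogram of image words
        imhist = zeros((self.nbr_words))
        words,distance = vq(descriptors,self.voc)
        for w in words:
            imhist[w] += 1
        
        return imhist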

# -*- coding: utf-8 -*-
# @Time : 2021/6/1 14:25
# @Author : ArLin
# @File : demo1.py
# @Software: PyCharm
import pickle
from PCV.imagesearch import vocabulary
from PCV.tools.imtools import get_imlist
from PCV.localdescriptors import sift


# get the list of images
imlist = get_imlist('datasets/')
nbr_images = len(imlist)
# build the matching list of feature files (three-letter extension replaced by 'sift')
featlist = [imlist[i][:-3] + 'sift' for i in range(nbr_images)]


# extract SIFT features for every image in the folder
for i in range(nbr_images):
    sift.process_image(imlist[i], featlist[i])


# build the vocabulary (k=37 words, subsampling factor 10)
voc = vocabulary.Vocabulary('test77_test')
voc.train(featlist, 37, 10)


# save the vocabulary
with open('BOW\\vocabulary.pkl', 'wb') as f:
    pickle.dump(voc, f)
print('vocabulary is:', voc.name, voc.nbr_words)

# -*- coding: utf-8 -*-
# @Time : 2021/6/1 14:52
# @Author : ArLin
# @File : demo2.py
# @Software: PyCharm
import pickle
from PCV.imagesearch import imagesearch
from PCV.localdescriptors import sift
import sqlite3
from PCV.tools.imtools import get_imlist


# get the list of images
imlist = get_imlist('datasets/')
nbr_images = len(imlist)
# build the matching list of feature files
featlist = [imlist[i][:-3] + 'sift' for i in range(nbr_images)]


# load vocabulary
with open('BOW\\vocabulary.pkl', 'rb') as f:
    voc = pickle.load(f)
# create the index
indx = imagesearch.Indexer('testImaAdd.db', voc)
indx.create_tables()


# go through the images (only the first 36 here), project features on vocabulary and insert
for i in range(nbr_images)[:36]:
    locs, descr = sift.read_features_from_file(featlist[i])
    indx.add_to_index(imlist[i], descr)
# commit to database
indx.db_commit()


# check the contents of the database
con = sqlite3.connect('testImaAdd.db')
print(con.execute('select count (filename) from imlist').fetchone())
print(con.execute('select * from imlist').fetchone())

The Searcher class:
import pickle
import sqlite3
from numpy import array, sqrt, sum, zeros


class Searcher(object):
    
    def __init__(self,db,voc):
        """ Initialize with the name of the database. """
        self.con = sqlite3.connect(db)
        self.voc = voc
    
    def __del__(self):
        self.con.close()
    
    def get_imhistogram(self,imname):
        """ Return the word histogram for an image. """
        
        im_id = self.con.execute(
            "select rowid from imlist where filename='%s'" % imname).fetchone()
        s = self.con.execute(
            "select histogram from imhistograms where rowid='%d'" % im_id).fetchone()
        
        # use pickle to decode NumPy arrays from string
        return pickle.loads(s[0])
    
    def candidates_from_word(self,imword):
        """ Get list of images containing imword. """
        
        im_ids = self.con.execute(
            "select distinct imid from imwords where wordid=%d" % imword).fetchall()
        return [i[0] for i in im_ids]
    
    def candidates_from_histogram(self,imwords):
        """ Get list of images with similar words. """
        
        # get the word ids
        words = imwords.nonzero()[0]
        
        # find candidates
        candidates = []
        for word in words:
            c = self.candidates_from_word(word)
            candidates+=c
        
        # take all unique image ids and sort by number of occurrences, most frequent first
        # (a boolean comparison is not a valid cmp function, so sort on the count directly)
        tmp = [(w,candidates.count(w)) for w in set(candidates)]
        tmp.sort(key=lambda t: t[1], reverse=True)
        
        # return sorted list, best matches first    
        return [w[0] for w in tmp]
    
    def query(self,imname):
        """ Find a list of matching images for imname. """
        
        h = self.get_imhistogram(imname)
        candidates = self.candidates_from_histogram(h)
        
        matchscores = []
        for imid in candidates:
            # get the name
            cand_name = self.con.execute(
                "select filename from imlist where rowid=%d" % imid).fetchone()
            cand_h = self.get_imhistogram(cand_name)
            cand_dist = sqrt( sum( self.voc.idf*(h-cand_h)**2 ) )
            matchscores.append( (cand_dist,imid) )
        
        # return a sorted list of distances and database ids
        matchscores.sort()
        return matchscores
    
    def get_filename(self,imid):
        """ Return the filename for an image id. """
        
        s = self.con.execute(
            "select filename from imlist where rowid='%d'" % imid).fetchone()
        return s[0]
 
 
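def tf_idf_dist(voc,v1,v2):
    """ TF-IDF weighted distance between two word histograms. """
    
    # normalise copies so the caller's arrays are not modified in place
    v1 = v1 / sum(v1)
    v2 = v2 / sum(v2)
    
    return sqrt( sum( voc.idf*(v1-v2)**2 ) )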
 
 
def compute_ukbench_score(src,imlist):
    """ Returns the average number of correct
        images on the top four results of queries. """
        
    nbr_images = len(imlist)
    pos = zeros((nbr_images,4))
    # get first four results for each image
    for i in range(nbr_images):
        pos[i] = [w[1]-1 for w in src.query(imlist[i])[:4]]
    
    # compute score and return average
    score = array([ (pos[i]//4)==(i//4) for i in range(nbr_images)])*1.0
    return sum(score) / (nbr_images)

# -*- coding: utf-8 -*-
# @Time : 2021/6/1 15:29
# @Author : ArLin
# @File : demo3.py
# @Software: PyCharm
 
 
import pickle
from PCV.localdescriptors import sift
from PCV.imagesearch import imagesearch
from PCV.geometry import homography
from PCV.tools.imtools import get_imlist
 
 
# load image list and vocabulary
imlist = get_imlist('datasets/')  # path to the dataset
nbr_images = len(imlist)
# build the matching list of feature files
featlist = [imlist[i][:-3] + 'sift' for i in range(nbr_images)]
 
# load the vocabulary
with open('BOW\\vocabulary.pkl', 'rb') as f:  # path to the saved vocabulary
    voc = pickle.load(f)
src = imagesearch.Searcher('testImaAdd.db', voc)
 
 
# index of query image and number of results to return
q_ind = 18
nbr_results = 5
 
 
# regular query (results sorted by Euclidean distance)
res_reg = [w[1] for w in src.query(imlist[q_ind])[:nbr_results]]
print('top matches (regular):', res_reg)
 
 
# load image features for query image
q_locs, q_descr = sift.read_features_from_file(featlist[q_ind])
fp = homography.make_homog(q_locs[:, :2].T)
 
 
# RANSAC model for homography fitting
model = homography.RansacModel()
rank = {}
 
 
# load image features for each candidate result
for ndx in res_reg[1:]:
    # 'ndx' is a rowid of the DB, which starts at 1, while featlist is 0-indexed
    locs, descr = sift.read_features_from_file(featlist[ndx - 1])
    # get matches
    matches = sift.match(q_descr, descr)
    ind = matches.nonzero()[0]
    ind2 = matches[ind]
    tp = homography.make_homog(locs[:, :2].T)
    # compute homography, count inliers. if not enough matches return empty list
    try:
        H, inliers = homography.H_from_ransac(fp[:, ind], tp[:, ind2], model, match_theshold=4)
    except Exception:
        inliers = []
    # store inlier count
    rank[ndx] = len(inliers)
 
 
# sort dictionary to get the most inliers first
sorted_rank = sorted(rank.items(), key=lambda t: t[1], reverse=True)
res_geom = [res_reg[0]] + [s[0] for s in sorted_rank]
print('top matches (homography):', res_geom)
 
 
# plot the query results
imagesearch.plot_results(src, res_reg[:8])  # regular query
imagesearch.plot_results(src, res_geom[:8])  # results after geometric re-ranking

2 Results

[Image: screenshot of the run results]

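III Summary

Bag of features needs no labels when extracting features, so it is usually regarded as a weakly supervised method: the vocabulary itself is learned without labels, which are only needed later when a classifier is trained. No method is perfect, of course, and bag of features has one obvious shortcoming: it completely ignores the spatial relationships between features, even though position information clearly matters when a person interprets an image. A number of researchers have proposed improvements that address this weakness; they are not covered here.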