opencv-python实现图像拼接和图像识别（机器视觉）

最新推荐文章于 2024-06-13 20:10:26 发布

llzmgjzhy

最新推荐文章于 2024-06-13 20:10:26 发布

阅读量4.9k

点赞数 6

分类专栏：机器视觉 opencv

本文链接：https://blog.csdn.net/wgz52157/article/details/117607646

版权

机器视觉同时被 2 个专栏收录

1 篇文章 3 订阅

订阅专栏

opencv

1 篇文章 0 订阅

订阅专栏

三.算法实现

3.1.图像拼接

3.1.1思路

提取待拼接图片的特征点、特征描述符，找到待拼接图片的对应的位置点，进行匹配
对图片进行柱面投影，产生图像的扭曲效果
设置阈值，当匹配点个数达到阈值后，进行图像拼接
在拼接前，先对第二张图片进行透视变化，利用已经找到的关键点，使得第二张图片透视旋转到与第一张图片可以进行拼接的角度
加权处理，使拼接图接缝处平滑过渡

3.1.2 实现方法

提取特征点、描述符：使用opencv创建SURF对象，Hessian算法检测关键点。调节SURF对象的参数，在可接受范围内减少关键点、减少获取的向量的维度、不检测关键点的方向，以便加快提取速度

cv2.xfeatures2d.SURF_create ([hessianThreshold[, nOctaves[, nOctaveLayers[, extended[, upright]]]]])
#该函数可生成SURF对象，改变hessian Threshold来控制关键点的数量
cv2.SURF.detectAndCompute(image, mask[, descriptors[, useProvidedKeypoints]])
#用于计算图片的关键点和描述符

柱面投影：在全景图的拼接中，为提高视觉可读性，对图片进行适当的柱面投影，使得拼接更平滑

def cylindrical_projection(img , f) :
   rows = img.shape[0]
   cols = img.shape[1]

   blank = np.zeros_like(img)
   center_x = int(cols / 2)
   center_y = int(rows / 2)
   
   for  y in range(rows):
       for x in range(cols):
           theta = math.atan((x- center_x )/ f)
           point_x = int(f * math.tan( (x-center_x) / f) + center_x)
           point_y = int( (y-center_y) / math.cos(theta) + center_y)
           
           if point_x >= cols or point_x < 0 or point_y >= rows or point_y < 0:
               pass
           else:
               blank[y , x, :] = img[point_y , point_x ,:]
   return blank

关键点匹配：利用已经提取好的关键点和特征向量进行匹配，为加快匹配速度，使用FLANN的单应性匹配

flann=cv2.FlannBasedMatcher(indexParams,searchParams)
match=flann.knnMatch(descrip1,descrip2,k=2)
#快速匹配器，返回值包括两张图的描述符距离、训练图（第二张）的描述符索引、查询的图（第一张）的描述符索引
M,mask=cv2.findHomography(srcPoints, dstPoints[, method[, ransacReprojThreshold[, mask]]])
#实现单应性匹配，返回的M是一个矩阵，即对关键点srcPoints做M变换能变到dstPoints的位置

透视变换：对第二张图片进行透视变换，透视旋转到与第一张图可以进行拼接的角度

warpImg=cv2.warpPerspective(src,np.linalg.inv(M),dsize[,dst[,flags[,borderMode[,borderValue]]]])
#对图片进行透视变换，变换视角。src是要变换的图片，np.linalg.inv(M)是单应性矩阵M的逆矩阵

加权处理：将第一张图叠在左边，对重叠区进行加权处理，重叠部分，离左边近，左边图的权重就高，右边亦然，两者相加，使得平滑过渡

3.1.3完整代码

import cv2
import numpy as np
import math
from matplotlib import pyplot as plt
from skimage.transform import resize

#定义最少匹配点数目
MIN = 10

img1 = cv2.imread('Desktop/1.png') 
img2 = cv2.imread('Desktop/2.png') 


#圆柱投影
#f为圆柱半径，每次匹配需要调节f
def cylindrical_projection(img , f) :
   rows = img.shape[0]
   cols = img.shape[1]
   
   blank = np.zeros_like(img)
   center_x = int(cols / 2)
   center_y = int(rows / 2)
   
   for  y in range(rows):
       for x in range(cols):
           theta = math.atan((x- center_x )/ f)
           point_x = int(f * math.tan( (x-center_x) / f) + center_x)
           point_y = int( (y-center_y) / math.cos(theta) + center_y)
           
           if point_x >= cols or point_x < 0 or point_y >= rows or point_y < 0:
               pass
           else:
               blank[y , x, :] = img[point_y , point_x ,:]
   return blank

#创建SURF对象
surf=cv2.xfeatures2d.SURF_create(100,nOctaves=4,extended=False,upright=False)

#柱面投影
img1 = cylindrical_projection(img1,1500)
img2 = cylindrical_projection(img2,1500)

#提取特征点、特征描述符
kp1,descrip1=surf.detectAndCompute(img1,None)
kp2,descrip2=surf.detectAndCompute(img2,None)

#FLANN快速匹配器
FLANN_INDEX_KDTREE = 0
indexParams = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
searchParams = dict(checks=50)

flann=cv2.FlannBasedMatcher(indexParams,searchParams)
match=flann.knnMatch(descrip1,descrip2,k=2)


#获取符合条件的匹配点
good=[]
for i,(m,n) in enumerate(match):
        if(m.distance<0.75*n.distance):
                good.append(m)

if len(good)>MIN:
        src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1,1,2)
        ano_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1,1,2)

        #实现单应性匹配，返回关键点srcPoints做M变换能变到dstPoints的位置
        M,mask=cv2.findHomography(src_pts,ano_pts,cv2.RANSAC,5.0)
        #对图片进行透视变换，变换视角。src是要变换的图片，np.linalg.inv(M)是单应性矩阵M的逆矩阵
        warpImg = cv2.warpPerspective(img2, np.linalg.inv(M), (img1.shape[1]+img2.shape[1], img2.shape[0]))
        rows,cols=img1.shape[:2]

        #图像融合，进行加权处理

        for col in range(0,cols):
            if img1[:, col].any() and warpImg[:, col].any():#开始重叠的最左端
                left = col
                break
        for col in range(cols-1, 0, -1):
            if img1[:, col].any() and warpImg[:, col].any():#重叠的最右一列
                right = col
                break

        res = np.zeros([rows, cols, 3], np.uint8)
        for row in range(0, rows):
            for col in range(0, cols):
                if not img1[row, col].any():#如果没有原图，用旋转的填充
                    res[row, col] = warpImg[row, col]
                elif not warpImg[row, col].any():
                    res[row, col] = img1[row, col]
                else:
                    srcImgLen = float(abs(col - left))
                    testImgLen = float(abs(col - right))
                    alpha = srcImgLen / (srcImgLen + testImgLen)
                    res[row, col] = np.clip(img1[row, col] * (1-alpha) + warpImg[row, col] * alpha, 0, 255)

        warpImg[0:img1.shape[0], 0:img1.shape[1]]=res
        img4=cv2.cvtColor(warpImg,cv2.COLOR_BGR2RGB)
        plt.imshow(img4,),plt.show()
        cv2.imwrite("test12.png",warpImg)
        
else:
        print("not enough matches!")

3.1.4 实验结果

在这里插入图片描述

3.1.5 实验结果分析

对只有水平位移的图片进行拼接时，无需使用柱面投影，拼接效果较佳。

实现全景图拼接时，因为不同的图片使用柱面投影的半径不同，导致在最后拼接时出现部分视觉上的一些不平滑，且因为图片分辨率及拼接算法的限制，导致部分区域出现重影及缺失。在以后的迭代版本可以进行优化

3.2 目标识别

3.2.1 思路

使用selective search网络对图片进行目标检测，记录所有预测结果,使用Resnet进行目标识别
对所有预测结果进行判断，如果符合预期，则记录左上角x，左上角y，宽和高
利用非极大抑制得到最精确的取景框
在原始图上绘制符合预期的待候选区域

3.2.2 实现方法

调用Resnet网络

ResNet网络是参考了VGG19网络，在其基础上进行了修改，并通过短路机制加入了残差单元，如图5所示。变化主要体现在ResNet直接使用stride=2的卷积做下采样，并且用global average pool层替换了全连接层。ResNet的一个重要设计原则是：当feature map大小降低一半时，feature map的数量增加一倍，这保持了网络层的复杂度。
```
model = ResNet50(weights='imagenet')#载入ResNet50网络模型，并使用在ImageNet ILSVRC比赛中已经训练好的权重
target_size = (224, 224)#ResNet50的输入大小固定为(224,224)，其他大小会报错
top_n=1#只输出最高概率对应的一类
img = cv2.imread("1.jpg")#待预测图像（三通道图像）
preds = predict(model, target_size, top_n, img)#预测结果
```
使用selective search网络对图片进行目标检测

在图像中寻找物体，可以依据多种特征，例如颜色、纹理、形状等。然而，这些特征并不能通用地用来寻找所有的物体，物体在图像中的尺度也大小不一。为了兼顾各种尺度与特征，selective search的做法是先寻找尺寸较小的区域，然后逐渐将特征相近的小尺度的区域合并为大尺度区域，从而得到内部特征一致的物体图像。
```
# 通过调节三个参数来实现目标的精准检测
# 选择性搜索
img_select, regions = selectivesearch.selective_search(
    img, scale=300, sigma=0.7, min_size=200)

# 计算原始候选区域数量
temp = set()
for i in range(img_select.shape[0]):
    for j in range(img_select.shape[1]):
        temp.add(img_select[i, j, 3])
```

使用非极大抑制：为避免同一个物体出现多个检测框，则对检测框进行排序。如果一个物体存在多个检测框，按照得分排序，取得分最高的检测框。接下来计算其他框与当前框的重合程度，如果程度大于阈值就删除。本次实验阈值取0，不会出现重叠的检测框

def NMS(data, thresh):
    # 计算四角位置
    x1 = data[:, 0]
    w = data[:, 2]
    x2 = x1 + w - 1
    y1 = data[:, 1]
    h = data[:, 3]
    y2 = y1 + h - 1

    # 精确度
    scores = data[:, 4]
    # 检测框的面积
    areas = w * h  

    order = scores.argsort()[::-1]
    # 结果对应的取景框集合
    keep = []  

    while order.size > 0:
        index = order[0]
        keep.append(index)
        ix1 = np.maximum(x1[index], x1[order[1:]])
        ix2 = np.minimum(x2[index], x2[order[1:]])
        iy1 = np.maximum(y1[index], y1[order[1:]])
        iy2 = np.minimum(y2[index], y2[order[1:]])
        iw = np.maximum(0.0, ix2 - ix1 + 1)
        ih = np.maximum(0.0, iy2 - iy1 + 1)
        inter = iw * ih
        # 计算IoU
        ratio = inter / (areas[index] + areas[order[1:]] - inter)
        inds = np.where(ratio <= thresh)[0]
        # 保留IoU小于阈值的inds
        order = order[inds + 1]
    return keep

通过selective search对图片进行目标检测后，使用Resnet网络进行目标识别，判断是否为预期目标，满足则计入输出数组。设置一个阈值，保留置信度排名在阈值内的目标

# 创建一个集合 记录每一个元素的左上角x，左上角y,宽,高，表示候选区域的边框Repository = set()# 载入ResNet50网络模型，并使用在ImageNet ILSVRC比赛中已经训练好的权重for r in regions:    # 排除重复的候选区    if r['rect'] in Repository:        continue    # 根据具体图片的大小，调节区域大小    if r['size'] < 5000:        continue    # 排除扭曲严重的候选区域边框    x, y, w, h = r['rect']    # 调节宽高比或者高宽比进行候选框筛选    if w / h > 1.5 or h / w > 1.5:        continue    # 切割图像，用于输入到resnet    img_cut = img[y:y + h, x:x + w]    # 将切割后图像输入到resnet，并保留概率前num的预测结果，通过调整num来最大限度找到需要的目标    num = 15    pres = resnet.predict(model, target_shape, num, img_cut)    for i in range(num):        # 保存需要的预测结果        if pres[i][1] == 'book_jacket' :            # 设置最小置信度，减小识别误差            if pres[i][2] < 0.03:                continue            Repository.add(r['rect'] + pres[i][1:])

在原始图像上绘制满足条件的候选区域边框

fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))b, g, r = cv2.split(img)img_rgb = cv2.merge([r, g, b])for i in final:    x, y, w, h = i[:4]    text = i[4] + '\n'    text += str(i[5])    print(x, y, w, h)    rect = mpatches.Rectangle(        (x, y), w, h, fill=False, edgecolor='red', linewidth=1)    ax.add_patch(rect)    ax.annotate(text, (x, y+10))

3.2.3 完整代码

import numpy as npimport cv2import matplotlib.pyplot as pltimport matplotlib.patches as mpatchesimport selectivesearchimport resnet_for_image_classify as resnetfrom keras.applications.resnet import ResNet50, preprocess_input, decode_predictions# 非极大值抑制函数def NMS(data, thresh):    # 计算四角位置    x1 = data[:, 0]    w = data[:, 2]    x2 = x1 + w - 1    y1 = data[:, 1]    h = data[:, 3]    y2 = y1 + h - 1    # 精确度    scores = data[:, 4]    # 检测框的面积    areas = w * h      order = scores.argsort()[::-1]    # 结果对应的取景框集合    keep = []      while order.size > 0:        index = order[0]        keep.append(index)        ix1 = np.maximum(x1[index], x1[order[1:]])        ix2 = np.minimum(x2[index], x2[order[1:]])        iy1 = np.maximum(y1[index], y1[order[1:]])        iy2 = np.minimum(y2[index], y2[order[1:]])        iw = np.maximum(0.0, ix2 - ix1 + 1)        ih = np.maximum(0.0, iy2 - iy1 + 1)        inter = iw * ih        # 计算IoU        ratio = inter / (areas[index] + areas[order[1:]] - inter)        inds = np.where(ratio <= thresh)[0]        # 保留IoU小于阈值的inds        order = order[inds + 1]    return keep# 加载图片数据img = cv2.imread("easy123.png")# Resnet 固定图片大小target_shape = (224, 224)# 通过调节三个参数来实现目标的精准检测# 选择性搜索img_select, regions = selectivesearch.selective_search(    img, scale=300, sigma=0.7, min_size=200)# 计算原始候选区域数量temp = set()for i in range(img_select.shape[0]):    for j in range(img_select.shape[1]):        temp.add(img_select[i, j, 3])# 创建一个集合 记录每一个元素的左上角x，左上角y,宽,高，表示候选区域的边框Repository = set()# 载入ResNet50网络模型，并使用在ImageNet ILSVRC比赛中已经训练好的权重model = ResNet50(weights='imagenet')for r in regions:    # 排除重复的候选区    if r['rect'] in Repository:        continue    # 根据具体图片的大小，调节区域大小    if r['size'] < 5000:        continue    # 排除扭曲严重的候选区域边框    x, y, w, h = r['rect']    # 调节宽高比或者高宽比进行候选框筛选    if w / h > 1.5 or h / w > 1.5:        continue    # 切割图像，用于输入到resnet    img_cut = img[y:y + h, x:x + w]    # 将切割后图像输入到resnet，并保留概率前num的预测结果，通过调整num来最大限度找到需要的目标    num = 15    pres = resnet.predict(model, target_shape, num, img_cut)    for i in range(num):        # 保存需要的预测结果        if pres[i][1] == 'book_jacket' :            # 设置最小置信度，减小识别误差            if pres[i][2] < 0.03:                continue            Repository.add(r['rect'] + pres[i][1:])# 利用非极大值抑制得到最精确的取景框arr = []for i in Repository:    nw = []    nw[:4] = i[:4]    nw.append(i[5])    arr.append(nw)data = np.array(arr)# 设置非极大抑制为0，禁止重叠框的出现keep = NMS(data, 0)# 利用非极大值抑制进行取景框的筛选arr = []final = []for i in Repository:    nw = []    nw[:6] = i[:]    arr.append(nw)for i in keep:    final.append(arr[i])# 在原始图像上绘制候选区域边框fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))b, g, r = cv2.split(img)img_rgb = cv2.merge([r, g, b])for i in final:    x, y, w, h = i[:4]    text = i[4] + '\n'    text += str(i[5])    print(x, y, w, h)    rect = mpatches.Rectangle(        (x, y), w, h, fill=False, edgecolor='red', linewidth=1)    ax.add_patch(rect)    ax.annotate(text, (x, y+10))plt.show()

3.2.4 实验结果

在这里插入图片描述

3.2.5 实验结果分析

简单示例只进行了水平拼接，且原图片分辨率较高，识别效果较好，至于右上角的识别框的大小问题应该通过调节参数可以解决。

困难实例为了加快拼接速度，使用了压缩后的图片，分辨率严重降低，导致视觉效果不好，但在目标检测的过程中效果较好，基本将所有需要的目标都完成了识别

llzmgjzhy

关注

6
点赞
踩
70

收藏

觉得还不错? 一键收藏
9
评论
opencv-python实现图像拼接和图像识别（机器视觉）

三.算法实现3.1.图像拼接3.1.1思路提取待拼接图片的特征点、特征描述符，找到待拼接图片的对应的位置点，进行匹配对图片进行柱面投影，产生图像的扭曲效果设置阈值，当匹配点个数达到阈值后，进行图像拼接在拼接前，先对第二张图片进行透视变化，利用已经找到的关键点，使得第二张图片透视旋转到与第一张图片可以进行拼接的角度加权处理，使拼接图接缝处平滑过渡3.1.2 实现方法提取特征点、描述符：使用opencv创建SURF对象，Hessian算法检测关键点。调节SURF对象的参数，在可接受范围
复制链接

扫一扫

专栏目录