Python计算机视觉——第三章图像到图像的映射

最新推荐文章于 2024-08-27 13:42:00 发布

Eaton_chen

最新推荐文章于 2024-08-27 13:42:00 发布

阅读量668

点赞数 1

分类专栏：计算机视觉

本文链接：https://blog.csdn.net/weixin_44395365/article/details/118278180

版权

计算机视觉专栏收录该内容

10 篇文章 8 订阅

订阅专栏

文章目录

3.1 单应性变换
- 3.1.1 直接线性变换算法
- 3.1.2 仿射变换
3.2 图像扭曲

3.1 单应性变换

单应性变换是将一个平面内的点映射到另一个平面内的二维投影变换，平面是指图像或三维中的平面表面，单应性变换在图像配准、图像纠正、纹理扭曲和创建全景图像等具有很强的变换性，单应性变换如下矩阵变换定义。
$\left[ \begin{matrix}x'\\\\y'\\\\w' \end{matrix} \right]=\left[ \begin{matrix} h_1 & h_2 & h_3\\ \\ h_4 & h_5 &h_6 \\ \\ h_7&h_8&H_9 \end{matrix} \right]=\left[ \begin{matrix}x\\\\y\\\\w \end{matrix} \right]$ 或 $x^{'} = H x$ 平面中的点用坐标来表示，我们使用齐次坐标来表示。《计算机图形学(OpenGL版)》的作者F.S. Hill Jr.曾说过一句话：“齐次坐标表示是计算机图形学的重要手段之一，它既能够用来明确区分向量和点，同时也更易用于进行仿射（线性）几何变换。”

我们需要来理解一下什么是齐次坐标系，简而言之，齐次坐标就是用N+1维来代表N维坐标，我们可以在一个2D笛卡尔坐标末尾加上一个额外的变量w来形成2D齐次坐标，因此，一个点(X,Y)在齐次坐标里面变成了（x,y,w），并且有X = x/w，Y = y/w。（x,y,w）(齐次坐标系) 等价于（x/w,y/w）(笛卡尔坐标系)，齐次坐标系的点依赖于尺度定义。

单应性矩阵H具有8个独立的自由度，在统计学上是指可以自由取值的变量个数，矩阵的自由度反映了矩阵中每个元素互相约束的状态。所以单应性矩阵H仅受一个变量的约束。令X=[x,y,w]，通常使用w=1来归一化点，这样，点具有唯一的图像坐标x和y，坐标变换我们可以使用一个矩阵来表示变换。

# 点归一化，转换齐次坐标
def normalize(points):
    """ Normalize a collection of points in 
        homogeneous coordinates so that last row = 1. """

    for row in points:
    #每一行的点集除以最后一行的点集，使最后一行为1。
        row /= points[-1]
    return points
    
    
def make_homog(points):
    """ Convert a set of points (dim*n array) to 
        homogeneous coordinates. """
     #在points的基础上，叠加一行（元素均为1），相当于齐次坐标变换。
    return vstack((points,ones((1,points.shape[1]))))

在进行点和变换处理时，按照列优先的原则存储这些点，n个二维点集等价为齐次坐标意义之下的一个3 $\times$ n数组。(H矩阵尺寸是3 $\times$ 3)，在这些投影变换中，有一些特别重要的变换，如：仿射变换：
$\left[ \begin{matrix}x'\\\\y'\\\\w' \end{matrix} \right]=\left[ \begin{matrix} a_1 & a_2 & t_x\\ \\ a_3 & a_4 &t_y \\ \\ 0&0&1\end{matrix} \right]=\left[ \begin{matrix}x\\\\y\\\\w \end{matrix} \right]=\left[ \begin{matrix}a_1x+a_2y+t_x w\\\\a_3x+a_4y+t_yw\\\\w \end{matrix} \right]=\left[ \begin{matrix}a_1x/w+a_2y/w+t_x \\\\a_3x/w+a_4y/w+t_y\\\\1 \end{matrix} \right]$

仿射变换包含可逆矩阵A和一个平移向量 $t=[t_x,t_y]$ ，可用于图像扭曲。

相似变换：
$\left[ \begin{matrix}x'\\\\y'\\\\1 \end{matrix} \right]=\left[ \begin{matrix} scos(\theta) & -ssin(\theta) & t_x\\ \\ s sin(\theta) & scos(\theta) &t_y \\ \\ 0&0&1\end{matrix} \right]=\left[ \begin{matrix}x\\\\y\\\\w \end{matrix} \right]$
s制定了变换的尺度，R是角度为 $\theta$ 的旋转矩阵， $t=[t_x,t_y]$ 同样是平移向量，s=1是，称为刚体变换，保持距离不变。

3.1.1 直接线性变换算法

单应性矩阵H可由两幅图像中的对应点对计算出来，一个完全射影变换具有8个自由度，根据对应点的约束，每个对应点可以写出两个方程，分别对应于x和y坐标，因此，计算单应性矩阵H需要4个对应点对。X=[x,y,w]表示一个对应点，所以求解单应性矩阵H需要4个方程组一共8个方程（H矩阵有8个未知数），表示为Ah=0，将4个方程的系数堆叠到一个矩阵中，我们使用SVD（奇异值分解）算法找到H的最小二乘解，最小二乘解即为矩阵SVD分解后所得矩阵V的最后一行，改行经过变形后得到矩阵H。代码如下：

# fp是原军阵，tp是映射后的矩阵
def H_from_points(fp,tp):
    """ Find homography H, such that fp is mapped to tp
        using the linear DLT method. Points are conditioned
        automatically. """
    
    if fp.shape != tp.shape:
        raise RuntimeError('number of points do not match')
        
    # condition points (important for numerical reasons)
    # --from points--
    # 映射起始点
    # fp前两个点取均值（横轴方向）
    # 令均值为0，方差为1
    m = mean(fp[:2], axis=1)
    # 求横轴方向前两个点的标准差
    maxstd = max(std(fp[:2], axis=1)) + 1e-9
    # c1是对角矩阵，主对角线方向元素为列表中的值，其余元素为0
    C1 = diag([1/maxstd, 1/maxstd, 1]) 
    C1[0][2] = -m[0]/maxstd
    C1[1][2] = -m[1]/maxstd
    # 乘积
    fp = dot(C1,fp)
    
    # --to points--
    # 映射起始点
    m = mean(tp[:2], axis=1)
    maxstd = max(std(tp[:2], axis=1)) + 1e-9
    C2 = diag([1/maxstd, 1/maxstd, 1])
    C2[0][2] = -m[0]/maxstd
    C2[1][2] = -m[1]/maxstd
    tp = dot(C2,tp)
    
    # create matrix for linear method, 2 rows for each correspondence pair
    # 对应点对创造矩阵A
    nbr_correspondences = fp.shape[1]
    A = zeros((2*nbr_correspondences,9))
    for i in range(nbr_correspondences):        
        A[2*i] = [-fp[0][i],-fp[1][i],-1,0,0,0,
                    tp[0][i]*fp[0][i],tp[0][i]*fp[1][i],tp[0][i]]
        A[2*i+1] = [0,0,0,-fp[0][i],-fp[1][i],-1,
                    tp[1][i]*fp[0][i],tp[1][i]*fp[1][i],tp[1][i]]
    # u大小为(M,M)，s大小为(M,N)，v大小为(N,N)。A = u*s*v，其中s是对矩阵a的奇异值分解。s除了对角元素不为0，其他元素都为0，并且对角元素从大到小排列。s中有n个奇异值，一般排在后面的比较接近0，所以仅保留比较大的r个奇异值。 
    U,S,V = linalg.svd(A)
    H = V[8].reshape((3,3))    
    
    # decondition
    # 反归一化
    H = dot(linalg.inv(C2),dot(H,C1))
    
    # normalize and return
    return H / H[2,2]

3.1.2 仿射变换

仿射变换具有6个自由度，因此我们需要三个对应点来估计矩阵H，通过将 $h_7、h_8$ 设置为0，使用如下的函数来计算仿射变换矩阵：

# 计算H，使得tp是fp经过仿射变换H得到的。
def Haffine_from_points(fp,tp):
    """ Find H, affine transformation, such that 
        tp is affine transf of fp. """
    
    if fp.shape != tp.shape:
        raise RuntimeError('number of points do not match')
        
    # condition points
    # --from points--
    m = mean(fp[:2], axis=1)
    maxstd = max(std(fp[:2], axis=1)) + 1e-9
    C1 = diag([1/maxstd, 1/maxstd, 1]) 
    C1[0][2] = -m[0]/maxstd
    C1[1][2] = -m[1]/maxstd
    fp_cond = dot(C1,fp)
    
    # --to points--
    m = mean(tp[:2], axis=1)
    # C1与C2要具有相同的尺寸
    C2 = C1.copy() #must use same scaling for both point sets
    C2[0][2] = -m[0]/maxstd
    C2[1][2] = -m[1]/maxstd
    tp_cond = dot(C2,tp)
    
    # conditioned points have mean zero, so translation is zero
    A = concatenate((fp_cond[:2],tp_cond[:2]), axis=0)
    U,S,V = linalg.svd(A.T)
    
    # create B and C matrices as Hartley-Zisserman (2:nd ed) p 130.
    tmp = V[:2].T
    B = tmp[:2]
    C = tmp[2:4]
    
    tmp2 = concatenate((dot(C,linalg.pinv(B)),zeros((2,1))), axis=1) 
    H = vstack((tmp2,[0,0,1]))
    
    # decondition
    H = dot(linalg.inv(C2),dot(H,C1))
    
    return H / H[2,2]

3.2 图像扭曲

对图像块应用仿射变换，我们将其称为图像扭曲（或者仿射扭曲），可以使用Scipy中的ndimage包来简单完成，命令语句：transformed_im=ndimage.affine_transform(im,A,b,size)。其中，A为线性变换，b为平移向量，size用来指定输出图像的大小。

from scipy import ndimage
from PIL import Image
from pylab import *

im=array(Image.open('pic/empire.jpg').convert('L'))
H=array([[1.4,0.05,-100],[0.05,1.5,-100],[0,0,1]])
im2=ndimage.affine_transform(im,H[:2,:2],(H[0,2],H[1,2]))

figure()
gray()
imshow(im2)
axis('off')
show()

在这里插入图片描述
array[[1.4,0.05],[0.05,1.5]]是线性变化，[-100,-100]是平移变量，达到图像扭曲的效果，图像中丢失的像素用零填充。

3.2.1 图像中的图像

仿射扭曲有一个简单的例子，将图像或者图像的一部分放置在另一幅图像中，使得它们能够和指定的区域或者标记物对齐，将图像image_in_image()添加到warp.py文件中，该函数的输入参数为两幅图像和一个坐标，该坐标为第一幅图像放置到第二幅图像中的角点坐标。

import matplotlib.delaunay as md 
from scipy import ndimage
from pylab import *
from numpy import *

import homography
    

def image_in_image(im1,im2,tp):
    """ Put im1 in im2 with an affine transformation
        such that corners are as close to tp as possible.
        tp are homogeneous and counter-clockwise from top left. """ 
    
    # points to warp from
    m,n = im1.shape[:2]
    fp = array([[0,m,m,0],[0,0,n,n],[1,1,1,1]])
    
    # compute affine transform and apply
    # import 仿射变换的函数，tp经仿射变换H得到fp
    H = homography.Haffine_from_points(tp,fp)
    im1_t = ndimage.affine_transform(im1,H[:2,:2],
                    (H[0,2],H[1,2]),im2.shape[:2])
    alpha = (im1_t > 0)
    
    return (1-alpha)*im2 + alpha*im1_t

从而创建了alpha图像（将扭曲的图像和第二幅图像融合），扭曲的图像是在扭曲区域边界以外以0来填充的图像，创建一个二值的alpha图像，坐标值可以通过查看绘制的图形手工确定的，也可以用Pylab库中的ginput()函数得到。

import warp
im2=array(Image.open('pic/JiMeiUni.jpg').convert('L').resize((1000,1000)))
im1=array(Image.open('pic/word_happy.jpg').convert('L'))

# 选定坐标
tp=array([[264,538,540,264],[40,36,605,605],[1,1,1,1]])

im3=warp.image_in_image(im1,im2,tp)

figure()
gray()
imshow(im3)

axis('equal')
axis('off')
show()

在这里插入图片描述
将图1插入到图二的上半部分

在这里插入图片描述

tp=array(([675,826,826,677],[55,52,281,277],[1,1,1,1]))（左下）(4个点，应是齐次坐标系）

在强透视效应的情况下，我们不可能使用同一个仿射变换将4个角点变换到它们的目标位置，当你打算使用仿射变换时，有一个很有用的技巧。仿射变换具有6个自由度，对应3个对应点对，所以可以将图形分成两个三角形，分别对它们进行扭曲图形操作。

m,n = im1.shape[:2]
fp = array([[0,m,m,0],[0,0,n,n],[1,1,1,1]])
# first triangle
tp2 = tp[:,:3]
fp2 = fp[:,:3]
# compute H
H = homography.Haffine_from_points(tp2,fp2)
im1_t = ndimage.affine_transform(im1,H[:2,:2],
(H[0,2],H[1,2]),im2.shape[:2])
# alpha for triangle
alpha = warp.alpha_for_triangle(tp2,im2.shape[0],im2.shape[1])
im3 = (1-alpha)*im2 + alpha*im1_t
# second triangle
tp2 = tp[:,[0,2,3]]
fp2 = fp[:,[0,2,3]]
# compute H
H = homography.Haffine_from_points(tp2,fp2)
im1_t = ndimage.affine_transform(im1,H[:2,:2],
(H[0,2],H[1,2]),im2.shape[:2])
# alpha for triangle
alpha = warp.alpha_for_triangle(tp2,im2.shape[0],im2.shape[1])
im4 = (1-alpha)*im3 + alpha*im1_t

imshow(im4)
axis('off')
show()

在这里插入图片描述
其函数的目的是利用两个三角形的仿射弯曲效果，将4个角点变换到它们的目标位置,输出im3图像，即第一个三角形，如图所示。

3.2.2 分段仿射扭曲

给定任意图像的标记点，通过这些点进行三角剖分，然后使用扭曲仿射来扭曲每个三角形，将图像和另一副图像的对应标记点扭曲对应，演示如何使用Matplotlib和Scipy来完成该操作（狄洛克三角剖分）

from scipy.spatial import Delaunay
# 正态分布
x,y=array(random.standard_normal((2,100)))

# 获取三角形列表
tri = Delaunay(np.c_[x,y]).simplices

figure()
for t in tri:
    # 将最后一个点放在最后
    t_ext=[t[0],t[1],t[2],t[0]]
    plot(x[t_ext],y[t_ext],'r')
plot(x,y,'*')
axis('off')
show()
```python
在这里插入代码片

在这里插入图片描述

输出三角形三个点的切片


#首先检查该图像时灰度还是彩色图像，若是彩色图像，则要对每个颜色通道进行扭曲处理。
# 分段仿射图像函数
def pw_affine(fromim,toim,fp,tp,tri):
   # fromim = 将要扭曲的图像 toim = 目标图 fp = 扭曲前的点 tp = 扭曲后的点 tri = 三角剖分
                
    im = toim.copy()
    
    # check if image is grayscale or color
    is_color = len(fromim.shape) == 3
    
    # create image to warp to (needed if iterate colors)
    im_t = zeros(im.shape, 'uint8') 
    
    for t in tri:
        # compute affine transformation
        H = homography.Haffine_from_points(tp[:,t],fp[:,t])
        
        # 遍历颜色通道
        if is_color:
            for col in range(fromim.shape[2]):
            # 仿射变换
                im_t[:,:,col] = ndimage.affine_transform(
                    fromim[:,:,col],H[:2,:2],(H[0,2],H[1,2]),im.shape[:2])
        else:
            im_t = ndimage.affine_transform(
                    fromim,H[:2,:2],(H[0,2],H[1,2]),im.shape[:2])
        
        # 三角形的alpha
        alpha = alpha_for_triangle(tp[:,t],im.shape[0],im.shape[1])
        
        # 将三角形加入到图像中
        im[alpha>0] = im_t[alpha>0]
        
    return im

使用狄洛克三角剖分标记点进行分段仿射扭曲

import homography
import warp
import os
from numpy import * 
fromin = array(Image.open('pic/sunset_tree.jpg'))
# 创造网格
x,y = meshgrid(range(5),range(6))
x = (fromin.shape[1]/4)*x.flatten()
y = (fromin.shape[0]/5)*y.flatten()

#三角剖分
tri = Delaunay(np.c_[x,y]).simplices

#打开图像和目标点
im = array(Image.open('pic/turningtorso1.jpg'))
tp = loadtxt('turninggtorso_points.txt')

figure()
subplot(1, 4, 1)
axis('off')
imshow(im)

# 将点转换成齐次坐标
fp = array(vstack((y, x, ones((1, len(x))))), 'int')
tp = array(vstack((tp[:, 1], tp[:, 0], ones((1, len(tp))))), 'int')

# 扭曲三角形
im = warp.pw_affine(fromim, im, fp, tp, tri)

# 绘制图像
subplot(1, 4, 2)
axis('off')
imshow(fromim)
warp.plot_mesh(fp[1], fp[0], tri)
subplot(1, 4, 3)
axis('off')
imshow(im)
subplot(1, 4, 4)
axis('off')
imshow(im)
warp.plot_mesh(tp[1], tp[0], tri)
show()

在这里插入图片描述
第四张图显示带有三角剖分的图像被扭曲，并使用仿射变换，与第一张图片融合。

3.2.3 图像配准

图像配准是对图像进行变换，使变换后的图像能够在常见的坐标系中对齐，让我们一起看一个对多个人脸图像进行严格配准的例子，使得计算的平均人脸和人脸表现的变化具有意义，这类配准中，实际上是寻找一个相似变换，在对应点之间建立映射。

使用xml.dom模块中的minidom来读取XML文件。

def read_points_from_xml(xmlFileName):
    """ Reads control points for face alignment. """

    xmldoc = minidom.parse(xmlFileName)
    facelist = xmldoc.getElementsByTagName('face')    
    faces = {}    
    for xmlFace in facelist:
        fileName = xmlFace.attributes['file'].value
        xf = int(xmlFace.attributes['xf'].value)
        yf = int(xmlFace.attributes['yf'].value)     
        xs = int(xmlFace.attributes['xs'].value)
        ys = int(xmlFace.attributes['ys'].value)
        xm = int(xmlFace.attributes['xm'].value)
        ym = int(xmlFace.attributes['ym'].value)
        faces[fileName] = array([xf, yf, xs, ys, xm, ym])
    return faces

文件中的标记点会以字典的形式返回，字典的键值为图像的文件名，值为：xf 、yf(左眼)；xs、ys(右眼)；xm、ym(嘴)，参数使用最小二乘解，这些点被映射到目标位置 $x_i',y_i']$ 。
$\left[ \begin{matrix}x_i '\\\\y_i'\\ \end{matrix} \right]=\left[ \begin{matrix} a& -b \\ \\ b & -a \\\end{matrix} \right ] \left[ \begin{matrix}x_i \\\\y_i\\ \end{matrix} \right]+ [t_x,t_y]^T$ 使用linalg.listsq()函数来计算最小二乘解

def write_points_to_xml(faces, xmlFileName):
    xmldoc = minidom.Document()
    xmlFaces = xmldoc.createElement("faces")

    keys = faces.keys()
    for k in keys:
        xmlFace = xmldoc.createElement("face")
        xmlFace.setAttribute("file", k)
        xmlFace.setAttribute("xf", "%d" % faces[k][0])    
        xmlFace.setAttribute("yf", "%d" % faces[k][1])
        xmlFace.setAttribute("xs", "%d" % faces[k][2])    
        xmlFace.setAttribute("ys", "%d" % faces[k][3])
        xmlFace.setAttribute("xm", "%d" % faces[k][4])
        xmlFace.setAttribute("ym", "%d" % faces[k][5])
        xmlFaces.appendChild(xmlFace)

    xmldoc.appendChild(xmlFaces)

    fp = open(xmlFileName, "w")
    fp.write(xmldoc.toprettyxml(encoding='utf-8'))    
    fp.close()


def compute_rigid_transform(refpoints,points):
    """ Computes rotation, scale and translation for
        aligning points to refpoints. """
    
    A = array([    [points[0], -points[1], 1, 0],
                [points[1],  points[0], 0, 1],
                [points[2], -points[3], 1, 0],
                [points[3],  points[2], 0, 1],
                [points[4], -points[5], 1, 0],
                [points[5],  points[4], 0, 1]])

    y = array([    refpoints[0],
                refpoints[1],
                refpoints[2],
                refpoints[3],
                refpoints[4],
                refpoints[5]])
    
    # least sq solution to mimimize ||Ax - y||
    a,b,tx,ty = linalg.lstsq(A,y)[0]
    R = array([[a, -b], [b, a]]) # rotation matrix incl scale

    return R,tx,ty

返回具有尺度地旋转矩阵，以及x和y方向上的平移量

对每个颜色通道进行仿射变换，直接使用第一幅图像中的标记位置作为参考坐标系，来进行配准操作


def rigid_alignment(faces,path,plotflag=False):
    """    Align images rigidly and save as new images.
        path determines where the aligned images are saved
        set plotflag=True to plot the images. """
    
    # take the points in the first image as reference points
    refpoints = faces.values()[0]
    
    # warp each image using affine transform
    for face in faces:
        points = faces[face]
        
        R,tx,ty = compute_rigid_transform(refpoints, points)
        T = array([[R[1][1], R[1][0]], [R[0][1], R[0][0]]])    
        
        im = array(Image.open(os.path.join(path,face)))
        im2 = zeros(im.shape, 'uint8')
        
        # warp each color channel
        for i in range(len(im.shape)):
            im2[:,:,i] = ndimage.affine_transform(im[:,:,i],linalg.inv(T),offset=[-ty,-tx])
            
        if plotflag:
            imshow(im2)
            show()
            
        # crop away border and save aligned images
        h,w = im2.shape[:2]
        border = (w+h)/20
        
        # crop away border
        # 组合路径后返回并裁剪边界
        imsave(os.path.join(path, 'aligned/'+face),im2[border:h-border,border:w-border,:])