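When there is not enough data to train a model, data augmentation can be used to increase the diversity of the training set, and perspective transformation is a useful augmentation technique. This post describes how to apply a perspective transformation to an image.
A perspective transformation applies a sequence of matrix operations, built from translations and rotations of the image, to produce a 3D-like effect. For the underlying theory, see: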
https://stackoverflow.com/questions/17087446/how-to-calculate-perspective-transform-for-opencv-from-rotation-angles
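The main idea is to rotate the image plane around a chosen reference axis. Each transformation needs its own matrix, and the derivation is fairly involved; see the link above for details.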
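The matrix composition described in the linked answer can be sketched roughly as follows. This is a minimal illustration rather than the referenced implementation: the function names `homography_from_pose` and `rot_x`, the focal length `f`, and the default `dz = f` are assumptions made for the example.

```python
import numpy as np


def rot_x(deg):
    """3x3 rotation about the x axis (angle in degrees)."""
    a = np.deg2rad(deg)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, np.cos(a), -np.sin(a)],
                     [0.0, np.sin(a), np.cos(a)]])


def homography_from_pose(R, w, h, f, dx=0.0, dy=0.0, dz=None):
    """Build a 3x3 perspective matrix for a w x h image.

    R is a 3x3 rotation matrix, f is an assumed focal length in pixels,
    and (dx, dy, dz) is the translation applied after the rotation.
    dz defaults to f, which leaves the image unchanged when R is identity.
    """
    if dz is None:
        dz = f
    # 2D -> 3D: move the image centre to the origin and put the image on z = 0.
    A1 = np.array([[1.0, 0.0, -w / 2.0],
                   [0.0, 1.0, -h / 2.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
    # Embed the rotation in a 4x4 homogeneous matrix.
    R4 = np.eye(4)
    R4[:3, :3] = R
    # Translate the rotated plane, pushing it away from the camera along z.
    T = np.eye(4)
    T[:3, 3] = [dx, dy, dz]
    # 3D -> 2D: pinhole projection back to pixel coordinates.
    A2 = np.array([[f, 0.0, w / 2.0, 0.0],
                   [0.0, f, h / 2.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
    return A2 @ T @ R4 @ A1


# Example: tilt a 640x480 image by 30 degrees around the x axis.
M = homography_from_pose(rot_x(30), w=640, h=480, f=800.0)
```

A matrix built this way can be passed to `cv2.warpPerspective`. With no translation in x or y, the image centre stays fixed, which is a quick sanity check on the result.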
For an implementation, see: https://github.com/eborboihuc/rotate_3d
The author provides rotation by an arbitrary angle around the x, y, and z axes. The parameters are:
- theta : the rotation around the x axis
- phi : the rotation around the y axis
- gamma : the rotation around the z axis
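To make the three angles concrete, here is a small sketch that combines them into one rotation matrix. It assumes theta, phi, and gamma are the angles around x, y, and z, given in degrees. It uses standard right-handed rotation matrices composed as Rx · Ry · Rz. The function name `rotation_matrix`, the sign conventions, and the composition order are choices made for this example and may differ from the linked repository.

```python
import numpy as np


def rotation_matrix(theta=0.0, phi=0.0, gamma=0.0):
    """Return a 3x3 rotation matrix for angles given in degrees.

    theta rotates around x, phi around y, gamma around z.
    The result is Rx @ Ry @ Rz (an assumed order).
    """
    t, p, g = np.deg2rad([theta, phi, gamma])
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(t), -np.sin(t)],
                   [0.0, np.sin(t), np.cos(t)]])
    ry = np.array([[np.cos(p), 0.0, np.sin(p)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(p), 0.0, np.cos(p)]])
    rz = np.array([[np.cos(g), -np.sin(g), 0.0],
                   [np.sin(g), np.cos(g), 0.0],
                   [0.0, 0.0, 1.0]])
    return rx @ ry @ rz


# Example: the rotation for phi=45 (around the y axis) with no other rotation.
R = rotation_matrix(phi=45)
```

A rotation around z alone (gamma) is an ordinary in-plane 2D rotation; theta and phi tilt the image plane out of the screen, which is what produces the perspective effect.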
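If the black padding around the transformed image is too large, the image can be cropped: convert it to grayscale, then crop to the bounding rectangle of the connected non-black region. The example also maps the annotation points in each label file through the same matrix so that they match the cropped image: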
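import os

import cv2
import numpy as np
from image_transformer import ImageTransformer  # class from the rotate_3d repository

# Assumed context, not shown in the original fragment: remake_src and remake_dst
# are the input and output directories, image_files lists the image file names in
# remake_src, and each image has a label file with the same stem plus '.txt'.
for imgfile in image_files:
    pre, back = os.path.splitext(imgfile)  # assumed: file stem and extension
    txtname = pre + '.txt'                 # assumed: label file name

    ImageTr = ImageTransformer(os.path.join(remake_src, imgfile))
    # Rotate 45 degrees around the y axis (phi). This expects rotate_along_axis
    # to return the warped image together with the 3x3 perspective matrix.
    res, mat = ImageTr.rotate_along_axis(gamma=0, phi=45, theta=0, dx=100, dy=100, dz=100)
    print(mat)

    # Threshold the grayscale image so the black padding becomes 0, then crop to
    # the bounding rectangle of the non-zero pixels.
    grey = cv2.cvtColor(res, cv2.COLOR_BGR2GRAY)
    ret, thresh = cv2.threshold(grey, 10, 255, cv2.THRESH_BINARY)
    x, y, w, h = cv2.boundingRect(thresh)
    crop = res[y:y + h, x:x + w]

    # Skip images that have no label file.
    if not os.path.exists(os.path.join(remake_src, txtname)):
        continue

    lines = ''
    with open(os.path.join(remake_src, txtname), 'r', encoding='utf-8') as f:
        for line in f:
            # Each line holds 8 integers (4 corner points) followed by the label text.
            fields = line.split(',', 8)
            pt_ = np.array(fields[0:8], np.int32).reshape(4, 2)
            pt_new = np.insert(pt_, 2, values=np.ones(4), axis=1)  # homogeneous points, 4x3
            pt_result = mat.dot(pt_new.T)                           # mat is 3x3, result is 3x4
            poly = np.divide(pt_result, pt_result[2])               # perspective divide
            # Drop the homogeneous row and shift into the cropped image's coordinates.
            poly_ = np.delete(poly, 2, axis=0).T - np.array([x, y])
            tmp = list(map(int, poly_.reshape(-1)))
            tmp.append(fields[8])  # label text, including its trailing newline
            lines += ','.join(map(str, tmp))

    with open(os.path.join(remake_dst, pre + '.txt'), 'w', encoding='utf-8') as ff:
        ff.write(lines)
    cv2.imwrite(os.path.join(remake_dst, pre + back), crop, [int(cv2.IMWRITE_JPEG_QUALITY), 95])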
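The two key steps above, cropping away the black padding and mapping annotation points through the perspective matrix, can be tried in isolation with NumPy alone. The sketch below is a simplified stand-in, not the code used in this post: `crop_black_border` and `warp_points` are hypothetical helper names, and the per-pixel channel maximum replaces `cv2.cvtColor` as the grayscale step.

```python
import numpy as np


def crop_black_border(img, thresh=10):
    """Crop img to the bounding box of pixels brighter than thresh.

    Returns the cropped image and the (x, y) offset of its top-left corner.
    The channel maximum is used as a simple grayscale substitute.
    """
    grey = img if img.ndim == 2 else img.max(axis=2)
    ys, xs = np.nonzero(grey > thresh)
    if ys.size == 0:
        # Nothing brighter than thresh: return the image unchanged.
        return img, (0, 0)
    x0, x1 = xs.min(), xs.max() + 1
    y0, y1 = ys.min(), ys.max() + 1
    return img[y0:y1, x0:x1], (int(x0), int(y0))


def warp_points(mat, pts, offset=(0, 0)):
    """Map an (N, 2) array of points through a 3x3 matrix, then subtract offset."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 3) homogeneous points
    out = homog @ np.asarray(mat, dtype=float).T
    out = out[:, :2] / out[:, 2:3]                    # perspective divide
    return out - np.asarray(offset, dtype=float)


# Example with synthetic data: a bright rectangle inside a black canvas.
canvas = np.zeros((10, 12, 3), dtype=np.uint8)
canvas[2:5, 3:8] = 200
crop, offset = crop_black_border(canvas)

# A pure translation by (5, 7) written as a 3x3 matrix.
shift = np.array([[1.0, 0.0, 5.0],
                  [0.0, 1.0, 7.0],
                  [0.0, 0.0, 1.0]])
new_pts = warp_points(shift, [[0, 0], [1, 1]], offset)
```

Subtracting the crop offset after the perspective divide keeps the annotation coordinates aligned with the cropped image rather than with the full warped canvas.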
Reference: https://nbviewer.jupyter.org/github/manisoftwartist/perspectiveproj/blob/master/perspective.ipynb