The previous two articles covered training a UNet model in mxnet and converting it to ncnn. Three problems remain: 1. the model is large; 2. a single frame takes about 15 seconds to process (on a Mac Pro, with ncnn built without OpenMP); 3. the resulting mask is not particularly good. This article adjusts the network structure to address all three.
1. The model is too large
Reducing the number of convolution filters in every layer by a factor of 4 brings the model down to about 2 MB, and a rough test on sample images shows the quality is still acceptable. To recover accuracy, the training set is augmented with flips, crops, and rotations. The old zero-value padding is also replaced with border-value (edge-replicate) padding, because testing showed that detection errors tended to appear along the zero-padded borders. An earlier experiment also suggests that, if 2 MB is still too large, the descending half of the U can be rebuilt MobileNet-style (depthwise-separable convolutions) to compress the model further.
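The padding change is easy to see in isolation. Below is a minimal numpy sketch; `np.pad`'s `'constant'` and `'edge'` modes mirror the behavior of cv2's `BORDER_CONSTANT` and `BORDER_REPLICATE` used in the code that follows:

```python
import numpy as np

patch = np.array([[1, 2],
                  [3, 4]], dtype=np.uint8)

# Zero padding (the old approach, cv2.BORDER_CONSTANT): introduces an
# artificial dark border, along which detection errors were observed.
zero_pad = np.pad(patch, 1, mode='constant', constant_values=0)

# Edge-value padding (cv2.BORDER_REPLICATE): repeats the boundary pixels,
# so the padded region keeps the statistics of the real image edge.
edge_pad = np.pad(patch, 1, mode='edge')
```

The padded arrays are both 4x4, but `zero_pad`'s border is all zeros while `edge_pad`'s border copies the nearest real pixel.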
#!/usr/bin/env python
# coding=utf8
import os
import sys
import random
import cv2
import mxnet as mx
import numpy as np
from mxnet.io import DataIter, DataBatch
sys.path.append('../')
def padding_and_resize(img, dstwidth, dstheight):
    # Pad the shorter side to make the image square, then resize.
    height = img.shape[0]
    width = img.shape[1]
    top = 0
    bottom = 0
    left = 0
    right = 0
    if width > height:
        top = int((width - height) / 2)
        bottom = int((width - height) - top)
    else:
        left = int((height - width) / 2)
        right = int((height - width) - left)
    # Replicate border values instead of zero-filling (see the note above).
    tmp = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_REPLICATE)
    # Bug fix: resize the padded image, not the original one.
    return cv2.resize(tmp, (dstwidth, dstheight))
def rotate_image(image, angle):
    # grab the dimensions of the image and then determine the center
    (h, w) = image.shape[:2]
    (cX, cY) = (w // 2, h // 2)
    # grab the rotation matrix (applying the negative of the angle to
    # rotate clockwise), then grab the sine and cosine
    # (i.e., the rotation components of the matrix)
    M = cv2.getRotationMatrix2D((cX, cY), -angle, 1.0)
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    # compute the new bounding dimensions of the image
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))
    # adjust the rotation matrix to take into account translation
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY
    # perform the actual rotation and return the image
    return cv2.warpAffine(image, M, (nW, nH), flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REPLICATE)
def get_batch(items, root_path, nClasses, height, width):
    x = []
    y = []
    for item in items:
        flipped = False
        cropped = False
        rotated = False
        rotated_neg = False
        image_path = root_path + item.split(' ')[0]
        label_path = root_path + item.split(' ')[-1].strip()
        # The augmentation is encoded in the image file name; the suffixed
        # files do not exist on disk, the transform is applied on the fly.
        if image_path.find('_flipped.') >= 0:
            image_path = image_path.replace('_flipped', '')
            flipped = True
        elif image_path.find('_cropped.') >= 0:
            image_path = image_path.replace('_cropped', '')
            cropped = True
        elif image_path.find('_rotated.') >= 0:
            image_path = image_path.replace('_rotated', '')
            rotated = True
        elif image_path.find('_rotated_neg.') >= 0:
            image_path = image_path.replace('_rotated_neg', '')
            rotated_neg = True
        im = cv2.imread(image_path, 1)
        lim = cv2.imread(label_path, 1)
        if cropped:
            tmp_height = im.shape[0]
            im = im[:, tmp_height // 5:tmp_height * 4 // 5]
            tmp_height = lim.shape[0]
            lim = lim[:, tmp_height // 5:tmp_height * 4 // 5]
        if flipped:
            im = cv2.flip(im, 1)
            lim = cv2.flip(lim, 1)
        if rotated:
            im = rotate_image(im, 13)
            lim = rotate_image(lim, 13)
        if rotated_neg:
            im = rotate_image(im, -13)
            lim = rotate_image(lim, -13)
        im = padding_and_resize(im, width, height)
        lim = padding_and_resize(lim, width, height)
        im = np.float32(im) / 255.0
        lim = lim[:, :, 0]
        # One-hot encode the label map: (H, W, nClasses) -> (nClasses, H*W)
        seg_labels = np.zeros((height, width, nClasses))
        for c in range(nClasses):
            seg_labels[:, :, c] = (lim == c).astype(int)
        seg_labels = np.reshape(seg_labels, (width * height, nClasses))
        x.append(im.transpose((2, 0, 1)))
        y.append(seg_labels.transpose((1, 0)))
    return mx.nd.array(x), mx.nd.array(y)
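With the suffix convention above, the augmented "files" never need to exist on disk: the training list simply repeats each sample under suffixed names, and the label path stays unchanged because get_batch only strips the suffix from the image path. A hypothetical helper (not part of the original script) that expands a plain list this way:

```python
import os

def expand_augmented(lines):
    """Expand each 'image label' line into its augmented variants.

    get_batch recognises the suffixes in the image path and applies the
    matching transform on the fly; the label path is left as-is, since
    get_batch applies the same transform to the label image itself.
    """
    suffixes = ['', '_flipped', '_cropped', '_rotated', '_rotated_neg']
    out = []
    for line in lines:
        image_path, label_path = line.split(' ')
        base, ext = os.path.splitext(image_path)
        for s in suffixes:
            out.append('%s%s%s %s' % (base, s, ext, label_path))
    return out
```

Each original sample therefore yields five training entries (identity plus four transforms).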
2. A single frame takes about 15 seconds
After the changes in step 1, a frame takes roughly one second, and with OpenMP enabled and a few threads, several frames per second should be achievable. A further idea: give each network layer its own thread and process frames in a pipeline, which might reach real-time video processing.
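The layer-per-thread pipeline idea can be prototyped with Python's standard library. This is only a scheduling sketch under the assumption that each stage is a callable; a real speedup would have to come from native threads inside ncnn, not from Python threads:

```python
import threading
import queue

def stage(fn, inq, outq):
    # Each pipeline stage runs one "layer" in its own thread:
    # pull a frame, apply this stage's transform, push it downstream.
    while True:
        item = inq.get()
        if item is None:          # sentinel: propagate shutdown
            outq.put(None)
            break
        outq.put(fn(item))

def run_pipeline(stages, frames):
    # One queue between every pair of adjacent stages. While one frame is
    # in layer N, the next frame can already be in layer N-1.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for f in frames:
        queues[0].put(f)
    queues[0].put(None)
    results = []
    while True:
        r = queues[-1].get()
        if r is None:
            break
        results.append(r)
    for t in threads:
        t.join()
    return results
```

Because each stage is a single thread reading from a FIFO queue, output order matches input order, which matters for video frames.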
3. The resulting mask is not particularly good
Besides augmenting the samples, a mixed-training trick on the concat layer helps: in up6 = mx.sym.concat(*[trans_conv6, conv5], dim=1, name='concat6'), temporarily replace conv5 with the output of the first convolution of that encoder stage (originally the second one was used), train for a few epochs, then switch back to the original network.
A few result images are attached below.