KittiBox training process
The network framework for Box (Detection) training
Preparation
Unset environment variables fall back to their default values:
import tensorvision.train as train
import tensorvision.utils as utils
No environment variable 'TV_PLUGIN_DIR' found. Set to '/home/keysen/tv-plugins'.
No environment variable 'TV_STEP_SHOW' found. Set to '50'.
No environment variable 'TV_STEP_EVAL' found. Set to '250'.
No environment variable 'TV_STEP_WRITE' found. Set to '1000'.
No environment variable 'TV_MAX_KEEP' found. Set to '10'.
No environment variable 'TV_STEP_STR' found. Set to 'Step {step}/{total_steps}: loss = {loss_value:.2f}; lr = {lr_value:.2e}; {sec_per_batch:.3f} sec (per Batch); {examples_per_sec:.1f} imgs/sec'.
Predefined flags
- 'name' : None
- 'project' : None
- 'save' : True, 'whether to save the run; output goes to the folder TV_DIR_RUNS/debug'
- 'hypes' : 'hypes/kittiBox.json', 'stores the model parameters'
Load kittiBox.json and add the 'dirs' attribute
hypes = json.load(f)
...
utils.set_dirs(hypes, tf.app.flags.FLAGS.hypes)  # create the three directories
...
train.initialize_training_folder(hypes)  # copies files into RUNS/kittiBox../model_files and creates an images folder and an output log file
"dirs": {
"base_path": "/home/keysen/lingck/KittiBox/hypes",
"data_dir": "/home/keysen/lingck/KittiBox/hypes/../DATA",
"files_dir": "model_files",
"image_dir": "/home/keysen/lingck/KittiBox/hypes/../RUNS/kittiBox_2017_05_12_16.53/images",
"output_dir": "/home/keysen/lingck/KittiBox/hypes/../RUNS/kittiBox_2017_05_12_16.53"
}
Output files
- RUNS/KittiBox_date_/images :
- RUNS/KittiBox_date_/model_files :
- architecture.py
- data_input.py
- eval.py
- hypes.py
- objective.py
- solver.py
- RUNS/KittiBox_date_/output.log
Do Training
Load the files under model_files and return a handle for each
modules = utils.load_modules_from_hypes(hypes)  # load the training-related .py files
modules
- ['input'] : ../inputs/Kitti_input.py
- ['arch'] : ../encoder/vgg.py
- ['objective'] : ../decoder/fastBox.py
- ['solver'] : ../optimizer/generic_optimizer.py
- ['eval'] : ../evals/kitti_eval.py
Queue
Attributes
types : tf.float32
grid_size : 39*12
shapes = (
    # image
    [384, 1248, 3],
    # labels
    [12, 39],     # confidences
    [12, 39, 4],  # boxes
    [12, 39]      # mask
)
capacity : 30
Create the queue
queue = tf.FIFOQueue(capacity=capacity, dtypes=dtypes, shapes=shapes)  # create a queue of capacity 30; each slot holds tensors with the shapes above
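The queue semantics can be sketched in plain Python, with a `deque` standing in for `tf.FIFOQueue` and zero-filled NumPy arrays as dummy elements (a sketch only, not the KittiBox code):

```python
import numpy as np
from collections import deque

capacity = 30
grid_h, grid_w = 12, 39            # 12 * 39 = 468 grid cells
shapes = [(384, 1248, 3),          # image
          (grid_h, grid_w),        # confidences
          (grid_h, grid_w, 4),     # boxes
          (grid_h, grid_w)]        # mask

# a bounded FIFO: enqueue one element, dequeue it again
queue = deque(maxlen=capacity)
element = tuple(np.zeros(s, dtype=np.float32) for s in shapes)
queue.append(element)
dequeued = queue.popleft()

assert [a.shape for a in dequeued] == shapes
```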
Build the TF graph from model_files
tv_graph = core.build_training_graph(hypes, queue, modules)
STEPS_1 pre-transform of input_image
Operation: dequeue hypes['batch_size'] elements from the queue, apply brightness and contrast adjustments to each element's image, and return the image and labels (confidences, boxes, mask)
image, confidences, boxes, mask = q.dequeue_many(hypes['batch_size'])  # dequeue 5 elements (batch_size = 5)
image = tf.image.random_brightness(image, max_delta=30)  # random brightness adjustment
image = tf.image.random_contrast(image, lower=0.75, upper=1.25)  # random contrast adjustment
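The two augmentations amount to simple pixel arithmetic. Below is a NumPy sketch; `random_brightness` and `random_contrast` here are hypothetical stand-ins mirroring the documented behavior of the TF ops (brightness adds a random offset, contrast rescales deviations from the per-channel mean):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_brightness(image, max_delta):
    # add one random offset in [-max_delta, max_delta] to every pixel
    delta = rng.uniform(-max_delta, max_delta)
    return image + delta

def random_contrast(image, lower, upper):
    # scale each channel's deviation from its mean by a random factor
    factor = rng.uniform(lower, upper)
    mean = image.mean(axis=(0, 1), keepdims=True)
    return (image - mean) * factor + mean

image = rng.uniform(0, 255, size=(384, 1248, 3)).astype(np.float32)
image = random_brightness(image, max_delta=30)
image = random_contrast(image, lower=0.75, upper=1.25)
assert image.shape == (384, 1248, 3)   # augmentation never changes the shape
```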
STEPS_2 build encoder-VGGNet
Operation: build the full VGG16 network, initialized from a pretrained VGGNet model
logits = encoder.inference(hypes, image, train=True)  # build the network
...
vgg_fcn = fcn8_vgg.FCN8VGG(vgg16_npy_path=vgg16_npy_path)
num_classes = 2 # does not influence training whatsoever
vgg_fcn.wd = hypes['wd']
vgg_fcn.build(images, train=train, num_classes=num_classes,
random_init_fc8=True)
vgg_dict = {'deep_feat': deep_feat,  # pool_5, the encoder output; shared features for the decoder
            'early_feat': vgg_fcn.conv4_3}
return vgg_dict
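The 12x39 spatial size of deep_feat follows directly from VGG16's five 2x2 stride-2 pooling layers, which shrink each spatial dimension by 2**5 = 32. A quick check:

```python
# 384x1248 input through five stride-2 pools -> the 12x39 feature grid
h, w = 384, 1248
for _ in range(5):          # pool1 .. pool5
    h, w = h // 2, w // 2
assert (h, w) == (12, 39)   # matches the 12x39 grid used by the labels
```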
CONV:
input:(5,384,1248,3)
activation function: RELU
Layer name: conv1_1, strides: (1,1,1,1), padding: 'SAME'
Layer shape: (3, 3, 3, 64)
Layer name: conv1_2
Layer shape: (3, 3, 64, 64)
POOL: (1, 2, 2, 1), strides: (1, 2, 2, 1), padding: 'SAME'
Layer name: conv2_1
Layer shape: (3, 3, 64, 128)
Layer name: conv2_2
Layer shape: (3, 3, 128, 128)
Layer name: conv3_1
Layer shape: (3, 3, 128, 256)
Layer name: conv3_2
Layer shape: (3, 3, 256, 256)
Layer name: conv3_3
Layer shape: (3, 3, 256, 256)
Layer name: conv4_1
Layer shape: (3, 3, 256, 512)
Layer name: conv4_2
Layer shape: (3, 3, 512, 512)
Layer name: conv4_3
Layer shape: (3, 3, 512, 512)
Layer name: conv5_1
Layer shape: (3, 3, 512, 512)
Layer name: conv5_2
Layer shape: (3, 3, 512, 512)
Layer name: conv5_3
Layer shape: (3, 3, 512, 512)
FULLY CONNECTED (implemented as convolution kernels):
Layer name: fc6
Layer shape: [7, 7, 512, 4096]
Layer name: fc7
Layer shape: [1, 1, 4096, 4096]
fc8(score_fr):
shape: (1, 1, 4096, 2)
PRED:
pred = tf.argmax(self.score_fr, dimension=3)  # index of the max along the given dimension, i.e. the argmax of the forward-pass score
UPSCORE:
- upscore2: upsample the score_fr layer (same spatial size as pool5)
  - output size: [same first three dims as pool4, num_class]
  - num_class : 2
  - filter_size : [4, 4, 2, 2]
  - strides : [1, 2, 2, 1]
- score_pool4: score pool4
  - output size: [same first three dims as pool4, num_class]
- fuse_pool4: fuse upscore2 with score_pool4
- upscore4: upsample the fuse_pool4 layer
  - output size: [same first three dims as pool3, num_class]
  - num_class : 2
  - filter_size : [4, 4, 2, 2]
  - strides : [1, 2, 2, 1]
- score_pool3: score pool3
  - output size: [same first three dims as pool3, num_class]
- fuse_pool3: fuse upscore4 with score_pool3
- upscore32: upsample the fuse_pool3 layer
  - output size: [same first three dims as input_image, num_class]
  - num_class : 2
  - filter_size : [16, 16, 2, 2]
  - strides : [1, 8, 8, 1]
PRED_UP:
self.pred_up = tf.argmax(self.upscore32, dimension=3)  # index of the max along the given dimension, i.e. the argmax of the upsampled score
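What the two argmax ops compute can be illustrated with NumPy, where `axis` plays the role of `dimension` (the score values below are made up):

```python
import numpy as np

# per pixel, pick the class channel with the highest score
scores = np.array([[[[0.2, 0.8],
                     [0.9, 0.1]]]])   # shape (1, 1, 2, num_classes=2)
pred = np.argmax(scores, axis=3)      # shape (1, 1, 2): one class id per pixel
assert pred.tolist() == [[[1, 0]]]
```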
STEPS_3 build decoder-detection
Operation: build the decoder part, mainly the detector; implemented by objective.py (actually fastBox.py)
decoded_logits = objective.decoder(hypes, logits, train=True)
...
# intermediate transformation
hidden_output = _build_inner_layer(hyp, encoded_features, train)
# encoder_feature (pool5): a 1*1*500 convolution turns the (5, 12, 39, 512) tensor into the (5*39*12, 500) hidden_output
# output layer
pred_boxes, pred_logits, pred_confidences = _build_output_layer(hyp, hidden_output)
# pred_boxes: box positions predicted from encoder_feature: (5*12*39, 1, 4)
# pred_logits: confidences before softmax: (5*12*39, 1, 2)
# pred_confidences: confidences after softmax: (5*12*39, 1, 2)
# rezoom layer: use conv4_3 to raise the output resolution
# input:
rezoom_input = pred_boxes, pred_logits, pred_confidences, early_feat, hidden_output
# output:
rezoom_output = _build_rezoom_layer(hyp, rezoom_input, train)  # rezoom_output consists of the following 5 elements
...
# pred_boxes: box positions predicted from encoder_feature: (5*12*39, 1, 4)
# pred_logits: confidences before softmax: (5*12*39, 1, 2)
# pred_confidences: confidences after the rezoom layer and softmax: (5*12*39, 1, 2)
...
ip1 = tf.nn.relu(tf.matmul(delta_features, delta_weights1))  # (2340, 128)
delta_confs_weights = tf.get_variable('delta2', shape=[dim, hyp['num_classes']])  # (128, 2)
delta_boxes_weights = tf.get_variable('delta_boxes', shape=[dim, 4])  # (128, 4)
rere_feature = tf.matmul(ip1, delta_boxes_weights) * 5  # (2340, 4)
pred_boxes_delta = (tf.reshape(rere_feature, [outer_size, 1, 4]))  # (2340, 1, 4)
feature2 = tf.matmul(ip1, delta_confs_weights) * scale
pred_confs_delta = tf.reshape(feature2, [outer_size, 1, hyp['num_classes']])  # (2340, 1, 2)
# pred_confs_delta: confidences after the rezoom layer, before softmax: (5*12*39, 1, 2)
# pred_boxes_delta: predicted box positions after the rezoom layer
# all of the above are collected in the dlogits dict:
# dlogits['pred_confs_deltas'] = pred_confs_deltas
# dlogits['pred_boxes_deltas'] = pred_boxes_deltas
# dlogits['pred_boxes_new'] = pred_boxes + pred_boxes_deltas
# dlogits['pred_boxes'] = pred_boxes
# dlogits['pred_logits'] = pred_logits
# dlogits['pred_confidences'] = pred_confidences
return dlogits
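How `pred_boxes_new` relates to the raw predictions can be sketched with NumPy placeholders (all values are dummies; only the shapes and the addition mirror the dlogits entries above):

```python
import numpy as np

# outer_size = batch_size * grid cells = 5 * 12 * 39
outer_size = 5 * 12 * 39
pred_boxes = np.zeros((outer_size, 1, 4), dtype=np.float32)
pred_boxes_deltas = np.full((outer_size, 1, 4), 0.5, dtype=np.float32)

# the refined boxes are simply the raw boxes plus the rezoom deltas
pred_boxes_new = pred_boxes + pred_boxes_deltas
assert outer_size == 2340
assert pred_boxes_new.shape == (2340, 1, 4)
```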
STEPS_4 Loss function
Operation: define the loss function from the pred_boxes, pred_logits and pred_confidences obtained above together with the corresponding labels
losses = objective.loss(hypes, decoded_logits, labels)
...
# confidence loss
# pred_class: pred_logits (confidences); true_class: confidences (0 or 1)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=pred_classes, labels=true_classes)  # softmax cross-entropy between labels and predictions
cross_entropy_sum = (tf.reduce_sum(mask_r*cross_entropy))  # sum
confidences_loss = cross_entropy_sum / outer_size * head[0]  # average to get the final classification loss
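The sparse softmax cross-entropy used for the confidence loss can be re-derived in NumPy; this is a numerically stable sketch of the formula, not the TF kernel, and the logits/labels below are made up:

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    # subtract the row max for numerical stability, then take log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # loss per row is minus the log-probability of the true class
    return -log_probs[np.arange(len(labels)), labels]

logits = np.array([[2.0, 0.0], [0.0, 2.0]])  # two cells, two classes
labels = np.array([0, 1])                    # integer class ids, as in KittiBox
xent = sparse_softmax_cross_entropy(logits, labels)
assert xent.shape == (2,)
assert np.all(xent < 0.2)   # both predictions agree with the labels, so loss is small
```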
# box loss
boxes_mask = tf.reshape(
    tf.cast(tf.greater(confidences, 0), 'float32'), (outer_size, 1, 1))
# the delta in the formula: the confidence of each ground truth
# true_boxes: boxes (from labels)
residual = (true_boxes - pred_boxes) * boxes_mask
boxes_loss = tf.reduce_sum(tf.abs(residual)) / outer_size * head[1]  # box-coordinate loss
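The masked L1 box loss can be checked with NumPy: cells whose ground-truth confidence is 0 contribute nothing. The head weight 0.1 below is a hypothetical value for illustration:

```python
import numpy as np

outer_size = 2340                        # 5 * 12 * 39
head = [1.0, 0.1]                        # hypothetical head_weights
confidences = np.zeros((outer_size, 1, 1), np.float32)
confidences[:10] = 1.0                   # only 10 cells contain objects

true_boxes = np.ones((outer_size, 1, 4), np.float32)
pred_boxes = np.zeros((outer_size, 1, 4), np.float32)

# zero out residuals for cells without a ground-truth object
boxes_mask = (confidences > 0).astype(np.float32)
residual = (true_boxes - pred_boxes) * boxes_mask
boxes_loss = np.abs(residual).sum() / outer_size * head[1]
assert np.isclose(boxes_loss, 10 * 4 / outer_size * 0.1)
```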
# since the rezoom layer is used, a rezoom loss is also computed
# input:
rezoom_loss_input = true_boxes, pred_boxes, confidences, boxes_mask, pred_confs_deltas, pred_boxes_deltas, mask_r
# rezoom loss
delta_confs_loss, delta_boxes_loss = _compute_rezoom_loss(hypes, rezoom_loss_input)
...
error = (perm_truth[:, :, 0:2] - pred_boxes[:, :, 0:2]) \
/ tf.maximum(perm_truth[:, :, 2:4], 1.)
square_error = tf.reduce_sum(tf.square(error), 2)
inside = tf.reshape(tf.to_int64(
    tf.logical_and(tf.less(square_error, 0.2**2), tf.greater(classes, 0))), [-1])
# confidence cross-entropy loss
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=pred_confs_deltas, labels=inside)
# residual confidence loss
delta_confs_loss = tf.reduce_sum(cross_entropy*mask_r) \
/ outer_size * hypes['solver']['head_weights'][0] * 0.1
# residual box-coordinate loss
delta_unshaped = perm_truth - (pred_boxes + pred_boxes_deltas)
delta_residual = tf.reshape(delta_unshaped * pred_mask,
[outer_size, hypes['rnn_len'], 4])
sqrt_delta = tf.minimum(tf.square(delta_residual), 10. ** 2)
delta_boxes_loss = (tf.reduce_sum(sqrt_delta) /
outer_size * head[1] * 0.03)
# standard loss terms
loss = confidences_loss + boxes_loss + delta_boxes_loss + delta_confs_loss
# regularization term
reg_loss_col = tf.GraphKeys.REGULARIZATION_LOSSES
weight_loss = tf.add_n(tf.get_collection(reg_loss_col), name='reg_loss')
# total loss
total_loss = weight_loss + loss
# all of the above losses are collected in the losses dict:
losses['total_loss'] = total_loss
losses['loss'] = loss
losses['confidences_loss'] = confidences_loss
losses['boxes_loss'] = boxes_loss
losses['weight_loss'] = weight_loss
if hypes['use_rezoom']:
losses['delta_boxes_loss'] = delta_boxes_loss
losses['delta_confs_loss'] = delta_confs_loss
STEPS_5 define train_op
Operation: define the training op
train_op = optimizer.training(hypes, losses, global_step, learning_rate)
...
# select the optimizer preset in hypes.json
if sol['opt'] == 'Adam':
    opt = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=sol['epsilon'])  # build an optimizer with these parameters
# compute the gradients of the total loss, paired with their variables
grads_and_vars = opt.compute_gradients(total_loss)
# gradient clipping
clipped_grads, norm = tf.clip_by_global_norm(grads, clip_norm)
grads_and_vars = zip(clipped_grads, tvars)
# update ops
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
# apply the gradients to the variables
with tf.control_dependencies(update_ops):
train_op = opt.apply_gradients(grads_and_vars,
global_step=global_step)
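The clipping rule behind `tf.clip_by_global_norm` can be sketched in NumPy: if the joint norm of all gradients exceeds clip_norm, every gradient is scaled by the same factor, preserving their directions (a sketch of the formula, not the TF implementation):

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # global norm is the L2 norm over all gradients jointly
    global_norm = np.sqrt(sum((g ** 2).sum() for g in grads))
    # scale down only when the global norm exceeds the threshold
    scale = min(1.0, clip_norm / global_norm)
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0, 4.0]), np.array([0.0])]   # global norm = 5
clipped, norm = clip_by_global_norm(grads, clip_norm=1.0)
assert norm == 5.0
assert np.allclose(clipped[0], [0.6, 0.8])   # scaled by 1/5, direction kept
```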
STEPS_6 evaluation
Operation: evaluation; compare the logits (predictions) with the labels and compute the accuracy
eval_list = objective.evaluation(hypes, image, labels, decoded_logits, losses, global_step)
...
# accuracy
a = tf.equal(tf.cast(confidences, 'int64'), tf.argmax(pred_confidences_r, 3))
accuracy = tf.reduce_mean(tf.cast(a, 'float32'), name='/accuracy')
# evaluation during training saves some intermediate results
# intermediate pred results
# draw boxes from the resulting confidences; returns the images with boxes drawn
# tf.py_func wraps log_image into a TensorFlow op
pred_log_img = tf.py_func(log_image,
[images, test_pred_confidences,
test_pred_boxes, global_step, 'pred'],
[tf.float32])
# ground-truth results
true_log_img = tf.py_func(log_image,
[images, confidences,
mask, global_step, 'true'],
[tf.uint8])
STEPS_7
The results of the above steps, together with some parameters, are collected in the graph dict:
graph['losses'] = losses
graph['eval_list'] = eval_list
graph['summary_op'] = summary_op
graph['train_op'] = train_op
graph['global_step'] = global_step
graph['learning_rate'] = learning_rate
graph['decoded_logits'] = decoded_logits
Once the graph is built, run its operations in a session
STEPS_1 prepare session
Operation: prepare the session for the graph
tv_sess = core.start_tv_session(hypes)
STEPS_2 logging record
Operation: set up the log writes performed during training
STEPS_3 begin feeding input data
Operation: start loading the data
modules['input'].start_enqueuing_threads(hypes, queue, 'train', sess)
...
# load from the image and label paths recorded in DATA/KittiBox/train.txt
STEPS_4 start training after all is ready
Operation: once everything is ready, run the training loop
run_training(hypes, modules, tv_graph, tv_sess)
...
# interval (in steps) between displays of training info
display_iter = hypes['logging']['display_iter'] #200
# interval between summary writes for monitoring
write_iter = hypes['logging'].get('write_iter', 5*display_iter) #1000
# interval between evaluations
eval_iter = hypes['logging']['eval_iter']#800
# interval between checkpoint saves to local disk
save_iter = hypes['logging']['save_iter'] #2000
# interval between saves of intermediate images
image_iter = hypes['logging'].get('image_iter', 5*save_iter) #10000
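The defaulting logic above is plain `dict.get`: write_iter and image_iter fall back to multiples of the explicitly configured intervals. A sketch with the values from the comments (200/800/2000 are the documented settings; the dict name is illustrative):

```python
# hypothetical hypes['logging'] contents, matching the commented values
logging_cfg = {'display_iter': 200, 'eval_iter': 800, 'save_iter': 2000}

display_iter = logging_cfg['display_iter']
write_iter = logging_cfg.get('write_iter', 5 * display_iter)   # default: 1000
save_iter = logging_cfg['save_iter']
image_iter = logging_cfg.get('image_iter', 5 * save_iter)      # default: 10000

assert write_iter == 1000
assert image_iter == 10000
```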