KittiBox training process
The network framework for Box (Detection) training
Preparation
Unset environment variables fall back to their default values:
import tensorvision.train as train
import tensorvision.utils as utils
No environment variable 'TV_PLUGIN_DIR' found. Set to '/home/keysen/tv-plugins'.
No environment variable 'TV_STEP_SHOW' found. Set to '50'.
No environment variable 'TV_STEP_EVAL' found. Set to '250'.
No environment variable 'TV_STEP_WRITE' found. Set to '1000'.
No environment variable 'TV_MAX_KEEP' found. Set to '10'.
No environment variable 'TV_STEP_STR' found. Set to 'Step {step}/{total_steps}: loss = {loss_value:.2f}; lr = {lr_value:.2e}; {sec_per_batch:.3f} sec (per Batch); {examples_per_sec:.1f} imgs/sec'.
Predefined flags
- 'name' : None
- 'project' : None
- 'save' : True, 'whether to save the run; output goes to the folder TV_DIR_RUNS/debug'
- 'hypes' : 'hypes/kittiBox.json', 'stores the model parameters'
Load kittiBox.json and add the 'dirs' attribute
hypes = json.load(f)
...
utils.set_dirs(hypes, tf.app.flags.FLAGS.hypes)  # create the three directories
...
train.initialize_training_folder(hypes)  # copies files into RUNS/kittiBox../model_files and creates an images folder and an output log file
"dirs": {
"base_path": "/home/keysen/lingck/KittiBox/hypes",
"data_dir": "/home/keysen/lingck/KittiBox/hypes/../DATA",
"files_dir": "model_files",
"image_dir": "/home/keysen/lingck/KittiBox/hypes/../RUNS/kittiBox_2017_05_12_16.53/images",
"output_dir": "/home/keysen/lingck/KittiBox/hypes/../RUNS/kittiBox_2017_05_12_16.53"
}
Output files
- RUNS/KittiBox_date_/images :
- RUNS/KittiBox_date_/model_files :
- architecture.py
- data_input.py
- eval.py
- hypes.py
- objective.py
- solver.py
- RUNS/KittiBox_date_/output.log
Do Training
Load the files under model_files and return a handle for each
modules = utils.load_modules_from_hypes(hypes)  # load the training-related .py files
modules
- ['input'] : ../inputs/Kitti_input.py
- ['arch'] : ../encoder/vgg.py
- ['objective'] : ../decoder/fastBox.py
- ['solver'] : ../optimizer/generic_optimizer.py
- ['eval'] : ../evals/kitti_eval.py
Queue
Attributes
types : tf.float32
grid_size : 39*12
shapes = (
    # image
    [384, 1248, 3],
    # labels
    [12, 39],     # confidences
    [12, 39, 4],  # boxes
    [12, 39]      # mask
)
capacity : 30
Create the queue
queue = tf.FIFOQueue(capacity=capacity, dtypes=dtypes, shapes=shapes)  # create a queue of capacity 30; each slot holds tensors with the shapes above
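The queue semantics can be sketched in plain Python, with a `deque` standing in for `tf.FIFOQueue` and zero-filled NumPy arrays as dummy elements (a sketch only, not the KittiBox code):

```python
import numpy as np
from collections import deque

capacity = 30
grid_h, grid_w = 12, 39            # 12 * 39 = 468 grid cells
shapes = [(384, 1248, 3),          # image
          (grid_h, grid_w),        # confidences
          (grid_h, grid_w, 4),     # boxes
          (grid_h, grid_w)]        # mask

# a bounded FIFO: enqueue one element, dequeue it again
queue = deque(maxlen=capacity)
element = tuple(np.zeros(s, dtype=np.float32) for s in shapes)
queue.append(element)
dequeued = queue.popleft()

assert [a.shape for a in dequeued] == shapes
```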
Build the TF graph from model_files
tv_graph = core.build_training_graph(hypes, queue, modules)
STEPS_1 pre-transform of input_image
Operation: dequeue hypes['batch_size'] elements from the queue, apply brightness and contrast adjustments to each element's image, and return the image and labels (confidences, boxes, mask)
image, confidences, boxes, mask = q.dequeue_many(hypes['batch_size'])  # dequeue 5 elements (batch_size = 5)
image = tf.image.random_brightness(image, max_delta=30)  # random brightness adjustment
image = tf.image.random_contrast(image, lower=0.75, upper=1.25)  # random contrast adjustment
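The two augmentations amount to simple pixel arithmetic. Below is a NumPy sketch; `random_brightness` and `random_contrast` here are hypothetical stand-ins mirroring the documented behavior of the TF ops (brightness adds a random offset, contrast rescales deviations from the per-channel mean):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_brightness(image, max_delta):
    # add one random offset in [-max_delta, max_delta] to every pixel
    delta = rng.uniform(-max_delta, max_delta)
    return image + delta

def random_contrast(image, lower, upper):
    # scale each channel's deviation from its mean by a random factor
    factor = rng.uniform(lower, upper)
    mean = image.mean(axis=(0, 1), keepdims=True)
    return (image - mean) * factor + mean

image = rng.uniform(0, 255, size=(384, 1248, 3)).astype(np.float32)
image = random_brightness(image, max_delta=30)
image = random_contrast(image, lower=0.75, upper=1.25)
assert image.shape == (384, 1248, 3)   # augmentation never changes the shape
```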
STEPS_2 build encoder-VGGNet
Operation: build the full VGG16 network, initialized from a pretrained VGGNet model
logits = encoder.inference(hypes, image, train=True)  # build the network
...
vgg_fcn = fcn8_vgg.FCN8VGG(vgg16_npy_path=vgg16_npy_path)
num_classes = 2 # does not influence training whatsoever
vgg_fcn.wd = hypes['wd']
vgg_fcn.build(images, train=train, num_classes=num_classes,
random_init_fc8=True)
vgg_dict = {'deep_feat': deep_feat,  # pool_5, the encoder output; shared features for the decoder
            'early_feat': vgg_fcn.conv4_3}
return vgg_dict
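The 12x39 spatial size of deep_feat follows directly from VGG16's five 2x2 stride-2 pooling layers, which shrink each spatial dimension by 2**5 = 32. A quick check:

```python
# 384x1248 input through five stride-2 pools -> the 12x39 feature grid
h, w = 384, 1248
for _ in range(5):          # pool1 .. pool5
    h, w = h // 2, w // 2
assert (h, w) == (12, 39)   # matches the 12x39 grid used by the labels
```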
CONV:
input:(5,384,1248,3)
activation function: RELU
Layer name: conv1_1, strides: (1,1,1,1), padding: 'SAME'
Layer shape: (3, 3, 3, 64)
Layer name: conv1_2
Layer shape: (3, 3, 64, 64)
POOL: (1, 2, 2, 1), strides: (1, 2, 2, 1), padding: 'SAME'
Layer name: conv2_1
Layer shape: (3, 3, 64, 128)
Layer name: conv2_2
Layer shape: (3, 3, 128, 128)
Layer name: conv3_1
Layer shape: (3, 3, 128, 256)
Layer name: conv3_2
Layer shape: (3, 3, 256, 256)
Layer name: conv3_3
Layer shape: (3, 3, 256, 256)
Layer name: conv4_1
Layer shape: (3, 3, 256, 512)
Layer name: conv4_2
Layer shape: (3, 3, 512, 512)
Layer name: conv4_3
Layer shape: (3, 3, 512, 512)
Layer name: conv5_1
Layer shape: (3, 3, 512, 512)
Layer name: conv5_2
Layer shape: (3, 3, 512, 512)
Layer name: conv5_3
Layer shape: (3, 3, 512, 512)
FULLY CONNECTED (implemented as convolution kernels):
Layer name: fc6
Layer shape: [7, 7, 512, 4096]
Layer name: fc7
Layer shape: [1, 1, 4096, 4096]
fc8(score_fr):
shape: (1, 1, 4096, 2)
PRED:
pred = tf.argmax(self.score_fr, dimension=3)  # index of the max along the given dimension, i.e. the argmax of the forward-pass score
UPSCORE:
- upscore2: upsample the score_fr layer (same spatial size as pool5)
  - output size: [same first three dims as pool4, num_class]
  - num_class : 2
  - filter_size : [4, 4, 2, 2]
  - strides : [1, 2, 2, 1]
- score_pool4: score pool4
  - output size: [same first three dims as pool4, num_class]
- fuse_pool4: fuse upscore2 with score_pool4
- upscore4: upsample the fuse_pool4 layer
  - output size: [same first three dims as pool3, num_class]
  - num_class : 2
  - filter_size : [4, 4, 2, 2]
  - strides : [1, 2, 2, 1]
- score_pool3: score pool3
  - output size: [same first three dims as pool3, num_class]
- fuse_pool3: fuse upscore4 with score_pool3
- upscore32: upsample the fuse_pool3 layer
  - output size: [same first three dims as input_image, num_class]
  - num_class : 2
  - filter_size : [16, 16, 2, 2]
  - strides : [1, 8, 8, 1]
PRED_UP:
self.pred_up = tf.argmax(self.upscore32, dimension=3)  # index of the max along the given dimension, i.e. the argmax of the upsampled score
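What the two argmax ops compute can be illustrated with NumPy, where `axis` plays the role of `dimension` (the score values below are made up):

```python
import numpy as np

# per pixel, pick the class channel with the highest score
scores = np.array([[[[0.2, 0.8],
                     [0.9, 0.1]]]])   # shape (1, 1, 2, num_classes=2)
pred = np.argmax(scores, axis=3)      # shape (1, 1, 2): one class id per pixel
assert pred.tolist() == [[[1, 0]]]
```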
STEPS_3 build decoder-detection
Operation: build the decoder part, mainly the detector; implemented by objective.py (actually fastBox.py)
decoded_logits = objective.decoder(hypes, logits, train=True)
...
# intermediate transformation
hidden_output = _build_inner_layer(hyp, encoded_features, train)
# encoder_feature (pool5): a 1*1*500 convolution turns the (5, 12, 39, 512) tensor into the (5*39*12, 500) hidden_output
# output layer
pred_boxes, pred_logits, pred_confidences = _build_output_layer(hyp, hidden_output)
# pred_boxes: box positions predicted from encoder_feature: (5*12*39, 1, 4)
# pred_logits: confidences before softmax: (5*12*39, 1, 2)
# pred_confidences: confidences after softmax: (5*12*39, 1, 2)
# rezoom layer: use conv4_3 to raise the output resolution
# input:
rezoom_input = pred_boxes, pred_logits, pred_confidences, early_feat, hidden_output
# output:
rezoom_output = _build_rezoom_layer(hyp, rezoom_input, train)  # rezoom_output consists of the following 5 elements
...
# pred_boxes: box positions predicted from encoder_feature: (5*12*39, 1, 4)
# pred_logits: confidences before softmax: (5*12*39, 1, 2)
# pred_confidences: confidences after the rezoom layer and softmax: (5*12*39, 1, 2)
...
ip1 = tf.nn.relu(tf.matmul(delta_features, delta_weights1))  # (2340, 128)
delta_confs_weights = tf.get_variable('delta2', shape=[dim, hyp['num_classes']])  # (128, 2)
delta_boxes_weights = tf.get_variable('delta_boxes', shape=[dim, 4])  # (128, 4)
rere_feature = tf.matmul(ip1, delta_boxes_weights) * 5  # (2340, 4)
pred_boxes_delta = (tf.reshape(rere_feature, [outer_size, 1, 4]))  # (2340, 1, 4)
feature2 = tf.matmul(ip1, delta_confs_weights) * scale
pred_confs_delta = tf.reshape(feature2, [outer_size, 1, hyp['num_classes']])  # (2340, 1, 2)
# pred_confs_delta: confidences after the rezoom layer, before softmax: (5*12*39, 1, 2)
# pred_boxes_delta: predicted box positions after the rezoom layer
# all of the above are collected in the dlogits dict:
# dlogits['pred_confs_deltas'] = pred_confs_deltas
# dlogits['pred_boxes_deltas'] = pred_boxes_deltas
# dlogits['pred_boxes_new'] = pred_boxes + pred_boxes_deltas
# dlogits['pred_boxes'] = pred_boxes
# dlogits['pred_logits'] = pred_logits
# dlogits['pred_confidences'] = pred_confidences
return dlogits
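How `pred_boxes_new` relates to the raw predictions can be sketched with NumPy placeholders (all values are dummies; only the shapes and the addition mirror the dlogits entries above):

```python
import numpy as np

# outer_size = batch_size * grid cells = 5 * 12 * 39
outer_size = 5 * 12 * 39
pred_boxes = np.zeros((outer_size, 1, 4), dtype=np.float32)
pred_boxes_deltas = np.full((outer_size, 1, 4), 0.5, dtype=np.float32)

# the refined boxes are simply the raw boxes plus the rezoom deltas
pred_boxes_new = pred_boxes + pred_boxes_deltas
assert outer_size == 2340
assert pred_boxes_new.shape == (2340, 1, 4)
```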
STEPS_4 Loss function
Operation: define the loss function from the pred_boxes, pred_logits and pred_confidences obtained above together with the corresponding labels
losses = objective.loss(hypes, decoded_logits, labels)
...
# confidence loss
# pred_class: pred_logits (confidences); true_class: confidences (0 or 1)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=pred_classes, labels=true_classes)  # softmax cross-entropy between labels and predictions
cross_entropy_sum = (tf.reduce_sum(mask_r*cross_entropy))  # sum
confidences_loss = cross_entropy_sum / outer_size * head[0]  # average to get the final classification loss
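The sparse softmax cross-entropy used for the confidence loss can be re-derived in NumPy; this is a numerically stable sketch of the formula, not the TF kernel, and the logits/labels below are made up:

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    # subtract the row max for numerical stability, then take log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # loss per row is minus the log-probability of the true class
    return -log_probs[np.arange(len(labels)), labels]

logits = np.array([[2.0, 0.0], [0.0, 2.0]])  # two cells, two classes
labels = np.array([0, 1])                    # integer class ids, as in KittiBox
xent = sparse_softmax_cross_entropy(logits, labels)
assert xent.shape == (2,)
assert np.all(xent < 0.2)   # both predictions agree with the labels, so loss is small
```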
# box loss
boxes_mask = tf.reshape(
    tf.cast(tf.greater(confidences, 0), 'float32'), (outer_size, 1, 1))
# the delta in the formula: the confidence of each ground truth
# true_boxes: boxes (from labels)
residual = (true_boxes - pred_boxes) * boxes_mask
boxes_loss = tf.reduce_sum(tf.abs(residual)) / outer_size * head[1]  # box-coordinate loss
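The masked L1 box loss can be checked with NumPy: cells whose ground-truth confidence is 0 contribute nothing. The head weight 0.1 below is a hypothetical value for illustration:

```python
import numpy as np

outer_size = 2340                        # 5 * 12 * 39
head = [1.0, 0.1]                        # hypothetical head_weights
confidences = np.zeros((outer_size, 1, 1), np.float32)
confidences[:10] = 1.0                   # only 10 cells contain objects

true_boxes = np.ones((outer_size, 1, 4), np.float32)
pred_boxes = np.zeros((outer_size, 1, 4), np.float32)

# zero out residuals for cells without a ground-truth object
boxes_mask = (confidences > 0).astype(np.float32)
residual = (true_boxes - pred_boxes) * boxes_mask
boxes_loss = np.abs(residual).sum() / outer_size * head[1]
assert np.isclose(boxes_loss, 10 * 4 / outer_size * 0.1)
```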
# since the rezoom layer is used, a rezoom loss is also computed
# input:
rezoom_loss_input = true_boxes, pred_boxes, confidences, boxes_mask, pred_confs_deltas, pred_boxes_deltas, mask_r
# rezoom loss
delta_confs_loss, delta_boxes_loss = _compute_rezoom_loss(hypes, rezoom_loss_input)
...
error = (perm_truth[:, :, 0:2] - pred_boxes[:, :, 0:2]) \
/ tf.maximum(perm_truth[:, :, 2:4], 1.)
square_error = tf.reduce_sum(tf.square(error), 2)
inside = tf.reshape(tf.to_int64(
    tf.logical_and(tf.less(square_error, 0.2**2), tf.greater(classes, 0))), [-1])
# confidence cross-entropy loss
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=pred_confs_deltas, labels=inside)
# residual confidence loss
delta_confs_loss = tf.reduce_sum(cross_entropy*mask_r) \
/ outer_size * hypes['solver']['head_weights'][0] * 0.1
# residual box-coordinate loss
delta_unshaped = perm_truth - (pred_boxes + pred_boxes_deltas)
delta_residual = tf.reshape(delta_unshaped * pred_mask,
[outer_size, hypes['rnn_len'], 4])
sqrt_delta = tf.minimum(tf.square(delta_residual), 10. ** 2)
delta_boxes_loss = (tf.reduce_sum(sqrt_delta) /
outer_size * head[1] * 0.03)
# standard loss terms
loss = confidences_loss + boxes_loss + delta_boxes_loss + delta_confs_loss
# regularization term
reg_loss_col = tf.GraphKeys.REGULARIZATION_LOSSES
weight_loss = tf.add_n(tf.get_collection(reg_loss_col), name='reg_loss')
# total loss
total_loss = weight_loss + loss
# all of the above losses are collected in the losses dict:
losses['total_loss'] = total_loss
losses['loss'] = loss
losses['confidences_loss'] = confidences_loss
losses['boxes_loss'] = boxes_loss
losses['weight_loss'] = weight_loss
if hypes['use_rezoom']:
losses['delta_boxes_loss'] = delta_boxes_loss
losses['delta_confs_loss'] = delta_confs_loss
STEPS_5 define train_op
Operation: define the training op
train_op = optimizer.training(hypes, losses, global_step, learning_rate)
...
# select the optimizer preset in hypes.json
if sol['opt'] == 'Adam':
    opt = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=sol['epsilon'])  # build an optimizer with these parameters
# compute the gradients of the total loss, paired with their variables
grads_and_vars = opt.compute_gradients(total_loss)
# gradient clipping
clipped_grads, norm = tf.clip_by_global_norm(grads, clip_norm)
grads_and_vars = zip(clipped_grads, tvars)
# update ops
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
# apply the gradients to the variables
with tf.control_dependencies(update_ops):
train_op = opt.apply_gradients(grads_and_vars,
global_step=global_step)
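The clipping rule behind `tf.clip_by_global_norm` can be sketched in NumPy: if the joint norm of all gradients exceeds clip_norm, every gradient is scaled by the same factor, preserving their directions (a sketch of the formula, not the TF implementation):

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # global norm is the L2 norm over all gradients jointly
    global_norm = np.sqrt(sum((g ** 2).sum() for g in grads))
    # scale down only when the global norm exceeds the threshold
    scale = min(1.0, clip_norm / global_norm)
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0, 4.0]), np.array([0.0])]   # global norm = 5
clipped, norm = clip_by_global_norm(grads, clip_norm=1.0)
assert norm == 5.0
assert np.allclose(clipped[0], [0.6, 0.8])   # scaled by 1/5, direction kept
```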
STEPS_6 evaluation
Operation: evaluation; compare the logits (predictions) with the labels and compute the accuracy
eval_list = objective.evaluation(hypes, image, labels, decoded_logits, losses, global_step)
...
# accuracy
a = tf.equal(tf.cast(confidences, 'int64'), tf.argmax(pred_confidences_r, 3))
accuracy = tf.reduce_mean(tf.cast(a, 'float32'), name='/accuracy')
# evaluation during training saves some intermediate results
# intermediate pred results
# draw boxes from the resulting confidences; returns the images with boxes drawn
# tf.py_func wraps log_image into a TensorFlow op
pred_log_img = tf.py_func(log_image,
[images, test_pred_confidences,
test_pred_boxes, global_step, 'pred'],
[tf.float32])
# ground-truth results
true_log_img = tf.py_func(log_image,
[images, confidences,
mask, global_step, 'true'],
[tf.uint8])
STEPS_7
The results of the above steps, together with some parameters, are collected in the graph dict:
graph['losses'] = losses
graph['eval_list'] = eval_list
graph['summary_op'] = summary_op
graph['train_op'] = train_op
graph['global_step'] = global_step
graph['learning_rate'] = learning_rate
graph['decoded_logits'] = decoded_logits
Once the graph is built, run its operations in a session
STEPS_1 prepare session
Operation: prepare the session for the graph
tv_sess = core.start_tv_session(hypes)
STEPS_2 logging record
Operation: set up the log writes performed during training
STEPS_3 begin feeding input data
Operation: start loading the data
modules['input'].start_enqueuing_threads(hypes, queue, 'train', sess)
...
# load from the image and label paths recorded in DATA/KittiBox/train.txt
STEPS_4 start training after all is ready
Operation: once everything is ready, run the training loop
run_training(hypes, modules, tv_graph, tv_sess)
...
# interval (in steps) between displays of training info
display_iter = hypes['logging']['display_iter'] #200
# interval between summary writes for monitoring
write_iter = hypes['logging'].get('write_iter', 5*display_iter) #1000
# interval between evaluations
eval_iter = hypes['logging']['eval_iter']#800
# interval between checkpoint saves to local disk
save_iter = hypes['logging']['save_iter'] #2000
# interval between saves of intermediate images
image_iter = hypes['logging'].get('image_iter', 5*save_iter) #10000
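The defaulting logic above is plain `dict.get`: write_iter and image_iter fall back to multiples of the explicitly configured intervals. A sketch with the values from the comments (200/800/2000 are the documented settings; the dict name is illustrative):

```python
# hypothetical hypes['logging'] contents, matching the commented values
logging_cfg = {'display_iter': 200, 'eval_iter': 800, 'save_iter': 2000}

display_iter = logging_cfg['display_iter']
write_iter = logging_cfg.get('write_iter', 5 * display_iter)   # default: 1000
save_iter = logging_cfg['save_iter']
image_iter = logging_cfg.get('image_iter', 5 * save_iter)      # default: 10000

assert write_iter == 1000
assert image_iter == 10000
```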