MultiNet Notes: KittiBox Training Processing

KittiBox training processing

Source code

Network architecture for Box (detection) training

preparation

Environment variables that are not set fall back to their default values:

import tensorvision.train as train  
import tensorvision.utils as utils  
No environment variable 'TV_PLUGIN_DIR' found. Set to '/home/keysen/tv-plugins'.
No environment variable 'TV_STEP_SHOW' found. Set to '50'.
No environment variable 'TV_STEP_EVAL' found. Set to '250'.
No environment variable 'TV_STEP_WRITE' found. Set to '1000'.
No environment variable 'TV_MAX_KEEP' found. Set to '10'.
No environment variable 'TV_STEP_STR' found. Set to 'Step {step}/{total_steps}: loss = {loss_value:.2f}; lr = {lr_value:.2e}; {sec_per_batch:.3f} sec (per Batch); {examples_per_sec:.1f} imgs/sec'.

Predefined flags

  • 'name' : None
  • 'project' : None
  • 'save' : True, "whether to save the run; output goes to the folder TV_DIR_RUNS/debug"
  • 'hypes' : 'hypes/kittiBox.json', "stores the model parameters"

Load kittiBox.json and add the 'dirs' attribute

hypes = json.load(f)
...
utils.set_dirs(hypes, tf.app.flags.FLAGS.hypes)  # set up the three directory paths
...
train.initialize_training_folder(hypes)  # under RUNS/kittiBox../model_files, copy the model files; also create an images folder and an output log file
"dirs": {
    "base_path": "/home/keysen/lingck/KittiBox/hypes", 
    "data_dir": "/home/keysen/lingck/KittiBox/hypes/../DATA", 
    "files_dir": "model_files", 
    "image_dir": "/home/keysen/lingck/KittiBox/hypes/../RUNS/kittiBox_2017_05_12_16.53/images", 
    "output_dir": "/home/keysen/lingck/KittiBox/hypes/../RUNS/kittiBox_2017_05_12_16.53"
  }   

Saved output files

  • RUNS/KittiBox_date_/images :
  • RUNS/KittiBox_date_/model_files :
    • architecture.py
    • data_input.py
    • eval.py
    • hypes.py
    • objective.py
    • solver.py
  • RUNS/KittiBox_date_/output.log

Do Training

Load the files in model_files and return a module handle for each

modules = utils.load_modules_from_hypes(hypes)  # load the training-related .py files

modules

  • [‘input’] : ../inputs/Kitti_input.py
  • [‘arch’] : ../encoder/vgg.py
  • [‘objective’] : ../decoder/fastBox.py
  • [‘solver’] : ../optimizer/generic_optimizer.py
  • [‘eval’] : ../evals/kitti_eval.py
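
For reference, a minimal sketch of how such dynamic loading can work (the key names under hypes['model'] and their mapping to the dict keys above are assumptions; the real utils.load_modules_from_hypes may differ in details):

import imp
import os

def load_modules(hypes):
    # Load each model file listed under hypes['model'] as a Python module
    # and key it like the list above ('input', 'arch', 'objective', ...).
    base = hypes['dirs']['base_path']
    modules = {}
    for key, rel_path in hypes['model'].items():  # e.g. 'input_file': '../inputs/kitti_input.py'
        name = key.replace('_file', '')
        modules[name] = imp.load_source(name, os.path.join(base, rel_path))
    return modules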

The queue

Attributes

types : tf.float32  
grid_size : 39*12  
shapes = (  
    # image
    [384,1248,3],  
    # labels
    [12,39],   # confidences  
    [12,39,4], # boxes  
    [12,39]    # mask  
)  
capacity : 30  

Creating the queue

queue = tf.FIFOQueue(capacity=capacity, dtypes=dtypes, shapes=shapes)  # create a queue with capacity 30; each slot holds one tuple with the shapes above
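
For context, a minimal TF1-style sketch of the enqueue side that pairs with this queue (placeholder and op names are illustrative, not the exact kitti_input.py code):

import tensorflow as tf

shapes = ([384, 1248, 3],   # image
          [12, 39],         # confidences
          [12, 39, 4],      # boxes
          [12, 39])         # mask
dtypes = [tf.float32] * 4

queue = tf.FIFOQueue(capacity=30, dtypes=dtypes, shapes=shapes)

# One placeholder per queue component; a feeding thread later runs
# enqueue_op with a feed_dict mapping these placeholders to one example.
placeholders = [tf.placeholder(tf.float32, shape=s) for s in shapes]
enqueue_op = queue.enqueue(placeholders)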

Build the TensorFlow training graph from model_files

tv_graph = core.build_training_graph(hypes, queue, modules)

STEPS_1 pre-transformation of input_image

Operation: dequeue hypes['batch_size'] elements from the queue, adjust the brightness and contrast of each element's image, and return the image and labels (confidences, boxes, mask) objects.

image, confidences, boxes, mask = q.dequeue_many(hypes['batch_size'])  # dequeue batch_size = 5 elements from the queue
image = tf.image.random_brightness(image, max_delta=30)  # random brightness jitter
image = tf.image.random_contrast(image, lower=0.75, upper=1.25)  # random contrast jitter

STEPS_2 build encoder-VGGNet

Operation: build the complete VGG16 network, using a pre-trained VGG model as the backbone.

logits = encoder.inference(hypes, image, train=True)  # build the network
...
    vgg_fcn = fcn8_vgg.FCN8VGG(vgg16_npy_path=vgg16_npy_path)

    num_classes = 2  # does not influence training whatsoever
    vgg_fcn.wd = hypes['wd']

    vgg_fcn.build(images, train=train, num_classes=num_classes,
                  random_init_fc8=True) 

    vgg_dict = {'deep_feat': deep_feat,  # pool5: the encoder output, the feature shared by the decoders
                'early_feat': vgg_fcn.conv4_3}

    return vgg_dict

CONV:
input: (5,384,1248,3)
activation function: RELU
Layer name: conv1_1, strides: (1,1,1,1), padding: 'SAME'
Layer shape: (3, 3, 3, 64)
Layer name: conv1_2
Layer shape: (3, 3, 64, 64)
POOL: ksize (1, 2, 2, 1), strides: (1, 2, 2, 1), padding: 'SAME' (the same 2x2 max-pool follows each conv block)
Layer name: conv2_1
Layer shape: (3, 3, 64, 128)
Layer name: conv2_2
Layer shape: (3, 3, 128, 128)
Layer name: conv3_1
Layer shape: (3, 3, 128, 256)
Layer name: conv3_2
Layer shape: (3, 3, 256, 256)
Layer name: conv3_3
Layer shape: (3, 3, 256, 256)
Layer name: conv4_1
Layer shape: (3, 3, 256, 512)
Layer name: conv4_2
Layer shape: (3, 3, 512, 512)
Layer name: conv4_3
Layer shape: (3, 3, 512, 512)

Layer name: conv5_1
Layer shape: (3, 3, 512, 512)
Layer name: conv5_2
Layer shape: (3, 3, 512, 512)
Layer name: conv5_3
Layer shape: (3, 3, 512, 512)
FULLY CONNECTED (convolutionalized; fc6 uses a 7x7 kernel, fc7/fc8 use 1x1):
Layer name: fc6
Layer shape: [7, 7, 512, 4096]
Layer name: fc7
Layer shape: [1, 1, 4096, 4096]
fc8 (score_fr):
shape: (1, 1, 4096, 2)

PRED:

pred = tf.argmax(self.score_fr, dimension=3)  # index of the maximum along dimension 3, i.e. the argmax of the forward-pass score


UPSCORE:

  • upscore2: upsample the score_fr layer (which has the same spatial size as pool5)
    • output size: [same first three dims as pool4, num_class]
    • num_class : 2
    • filter_size : [4, 4, 2, 2]
    • strides : [1, 2, 2, 1]
  • score_pool4: score pool4
    • output size: [same first three dims as pool4, num_class]
  • fuse_pool4: fuse upscore2 with score_pool4
  • upscore4: upsample the fuse_pool4 layer
    • output size: [same first three dims as pool3, num_class]
    • num_class : 2
    • filter_size : [4, 4, 2, 2]
    • strides : [1, 2, 2, 1]
  • score_pool3: score pool3
    • output size: [same first three dims as pool3, num_class]
  • fuse_pool3: fuse upscore4 with score_pool3
  • upscore32: upsample the fuse_pool3 layer (see the sketch after this list)
    • output size: [same first three dims as input_image, num_class]
    • num_class : 2
    • filter_size : [16, 16, 2, 2]
    • strides : [1, 8, 8, 1]
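
A simplified sketch of one upscore step as a transposed convolution (the real fcn8_vgg.py additionally initializes the filter with bilinear-interpolation weights; the function name here is illustrative):

import tensorflow as tf

def upscore_layer(x, output_shape, name, ksize=4, stride=2, num_classes=2):
    # Transposed convolution; the filter shape is
    # [ksize, ksize, out_channels, in_channels]: [4, 4, 2, 2] for upscore2/4,
    # [16, 16, 2, 2] with stride 8 for upscore32.
    with tf.variable_scope(name):
        f = tf.get_variable('up_filter',
                            shape=[ksize, ksize, num_classes, num_classes])
        return tf.nn.conv2d_transpose(x, f, output_shape=output_shape,
                                      strides=[1, stride, stride, 1],
                                      padding='SAME')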

PRED_UP:

self.pred_up = tf.argmax(self.upscore32, dimension=3)  # index of the maximum along dimension 3, i.e. the argmax of the upsampled score

STEPS_3 build decoder-detection

Operation: build the decoder part, mainly the detection decoder, by calling objective.py (actually fastBox.py).

decoded_logits = objective.decoder(hypes, logits, train=True)
...
    # intermediate transformation
    hidden_output = _build_inner_layer(hyp, encoded_features, train)
    # the encoder feature (pool5) goes through a 1x1x500 convolution, turning (5, 12, 39, 512) into a (5*39*12, 500) hidden_output

    # output layer
    pred_boxes, pred_logits, pred_confidences = _build_output_layer(hyp, hidden_output)
    # pred_boxes: box positions predicted from the encoder feature: (5*12*39, 1, 4)
    # pred_logits: confidences before softmax: (5*12*39, 1, 2)
    # pred_confidences: confidences after softmax: (5*12*39, 1, 2)

    # rezoom layer: uses conv4_3 to raise the resolution of the output
    # input:
    rezoom_input = pred_boxes, pred_logits, pred_confidences, early_feat, hidden_output
    # output:
    rezoom_output = _build_rezoom_layer(hyp, rezoom_input, train)  # rezoom_output consists of the following 5 elements
        ...
    # pred_boxes: box positions predicted from the encoder feature: (5*12*39, 1, 4)
    # pred_logits: confidences before softmax: (5*12*39, 1, 2)
    # pred_confidences: confidences after the rezoom layer and softmax: (5*12*39, 1, 2)
    ...
    ip1 = tf.nn.relu(tf.matmul(delta_features, delta_weights1)) #(2340,128)
    delta_confs_weights = tf.get_variable('delta2', shape=[dim, hyp['num_classes']]) #(128,2)
    delta_boxes_weights = tf.get_variable('delta_boxes', shape=[dim, 4])#(128,4)
    rere_feature = tf.matmul(ip1, delta_boxes_weights) * 5 #(2340,4)
    pred_boxes_delta = (tf.reshape(rere_feature, [outer_size, 1, 4])) #(2340,1,4)
    feature2 = tf.matmul(ip1, delta_confs_weights) * scale
    pred_confs_delta = tf.reshape(feature2, [outer_size, 1, hyp['num_classes']]) #(2340,1,2)
    # pred_confs_delta: confidences after the rezoom layer, before softmax: (5*12*39, 1, 2)
    # pred_boxes_delta: predicted box positions after the rezoom layer

    # all of the above are then collected in the dlogits variable:
        # dlogits['pred_confs_deltas'] = pred_confs_deltas 
        # dlogits['pred_boxes_deltas'] = pred_boxes_deltas 
        # dlogits['pred_boxes_new'] = pred_boxes + pred_boxes_deltas
        # dlogits['pred_boxes'] = pred_boxes 
        # dlogits['pred_logits'] = pred_logits  
        # dlogits['pred_confidences'] = pred_confidences
    return  dlogits
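
To make the shapes above concrete, here is a sketch of the inner layer under the stated dimensions (variable names are assumptions; the real _build_inner_layer in fastBox.py differs in details such as dropout and scaling):

import tensorflow as tf

def build_inner_layer(encoded_features, batch_size=5, grid_h=12, grid_w=39):
    # 1x1 convolution: (5, 12, 39, 512) -> (5, 12, 39, 500),
    # then flatten the grid: -> (5*12*39, 500) = (2340, 500)
    w = tf.get_variable('ip', shape=[1, 1, 512, 500])
    hidden = tf.nn.conv2d(encoded_features, w, strides=[1, 1, 1, 1],
                          padding='SAME')
    return tf.reshape(hidden, [batch_size * grid_h * grid_w, 500])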

STEPS_4 Loss function

Operation: define the loss function from the pred_boxes, pred_logits and pred_confidences obtained above, together with the corresponding labels.

losses = objective.loss(hypes, decoded_logits, labels)
...
    # confidence loss
    # pred_class: pred_logits (confidences), true_class: confidences (0 or 1)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=pred_classes, labels=true_classes)  # softmax cross-entropy between labels and predictions
    cross_entropy_sum = (tf.reduce_sum(mask_r*cross_entropy))  # sum
    confidences_loss = cross_entropy_sum / outer_size * head[0]  # average to get the final classification loss

    # box loss
    boxes_mask = tf.reshape(
        tf.cast(tf.greater(confidences, 0), 'float32'), (outer_size, 1, 1))
    # the delta in the formula: the groundtruth confidence of each cell

    # true_boxes: boxes (from labels)
    residual = (true_boxes - pred_boxes) * boxes_mask
    boxes_loss = tf.reduce_sum(tf.abs(residual)) / outer_size * head[1]  # the box-coordinate loss

    # since a rezoom layer is used, a rezoom loss is computed as well
    # input:
    rezoom_loss_input = true_boxes, pred_boxes, confidences, boxes_mask, pred_confs_deltas, pred_boxes_deltas, mask_r
    # rezoom loss
    delta_confs_loss, delta_boxes_loss = _compute_rezoom_loss(hypes, rezoom_loss_input)
    ...
        error = (perm_truth[:, :, 0:2] - pred_boxes[:, :, 0:2]) \
            / tf.maximum(perm_truth[:, :, 2:4], 1.)
        square_error = tf.reduce_sum(tf.square(error), 2)
        inside = tf.reshape(tf.to_int64( tf.logical_and(tf.less(square_error, 0.2**2), tf.greater(classes, 0)) ), [-1])

        # confidence cross-entropy loss
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=pred_confs_deltas, labels=inside)

        # delta (residual) confidence loss
        delta_confs_loss = tf.reduce_sum(cross_entropy*mask_r) \
        / outer_size * hypes['solver']['head_weights'][0] * 0.1 

        # delta (residual) box-coordinate loss
        delta_unshaped = perm_truth - (pred_boxes + pred_boxes_deltas)
        delta_residual = tf.reshape(delta_unshaped * pred_mask,
                                [outer_size, hypes['rnn_len'], 4])
        sqrt_delta = tf.minimum(tf.square(delta_residual), 10. ** 2)
        delta_boxes_loss = (tf.reduce_sum(sqrt_delta) /
                        outer_size * head[1] * 0.03)

    # the regular loss terms
    loss = confidences_loss + boxes_loss + delta_boxes_loss + delta_confs_loss

    # regularization term
    reg_loss_col = tf.GraphKeys.REGULARIZATION_LOSSES
    weight_loss = tf.add_n(tf.get_collection(reg_loss_col), name='reg_loss')
    # total loss
    total_loss = weight_loss + loss
    # all of the losses above are collected in the losses dict:
        losses['total_loss'] = total_loss
        losses['loss'] = loss
        losses['confidences_loss'] = confidences_loss
        losses['boxes_loss'] = boxes_loss
        losses['weight_loss'] = weight_loss
        if hypes['use_rezoom']:
            losses['delta_boxes_loss'] = delta_boxes_loss
            losses['delta_confs_loss'] = delta_confs_loss
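
In formula form, with $N$ = outer_size ($5 \cdot 12 \cdot 39 = 2340$), $m_i$ the mask, $\delta_i = [c_i^{true} > 0]$ and $(w_0, w_1)$ the head weights, the two main terms above are:

$$L_{conf} = \frac{w_0}{N}\sum_{i=1}^{N} m_i\,\mathrm{CE}(p_i,\, c_i^{true}), \qquad L_{boxes} = \frac{w_1}{N}\sum_{i=1}^{N} \delta_i\,\bigl\lVert b_i^{true} - b_i^{pred} \bigr\rVert_1$$

and total_loss adds the two rezoom (delta) terms plus the regularization (weight) loss.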

STEPS_5 define train_op

Operation: define the training op.

train_op = optimizer.training(hypes, losses, global_step, learning_rate)
...
    # pick the optimizer configured in hypes.json
    if sol['opt'] == 'Adam':
            opt = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=sol['epsilon'])  # build an optimizer with the given parameters

    # get the (gradient, variable) pairs for the total loss
    grads_and_vars = opt.compute_gradients(total_loss)
    # gradient clipping (grads/tvars split added here for completeness)
    grads, tvars = zip(*grads_and_vars)
    clipped_grads, norm = tf.clip_by_global_norm(grads, clip_norm)
    grads_and_vars = zip(clipped_grads, tvars)

    # update ops
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

    # apply the gradients to the variables
    with tf.control_dependencies(update_ops):
            train_op = opt.apply_gradients(grads_and_vars,
                                           global_step=global_step)

STEPS_6 evaluation

Operation: evaluation; compare the logits (predictions) with the labels and compute the accuracy.

eval_list = objective.evaluation(hypes, image, labels, decoded_logits, losses, global_step)
...
    # accuracy
    a = tf.equal(tf.cast(confidences, 'int64'), tf.argmax(pred_confidences_r, 3))
    accuracy = tf.reduce_mean(tf.cast(a, 'float32'), name='/accuracy')

    # evaluation during training saves some intermediate results
    # intermediate pred results:
    # draw boxes according to the predicted confidences and return the image with boxes drawn
    # tf.py_func wraps log_image as a TensorFlow op
    pred_log_img = tf.py_func(log_image,
                              [images, test_pred_confidences,
                               test_pred_boxes, global_step, 'pred'],
                              [tf.float32])

    # groundtruth results
    true_log_img = tf.py_func(log_image,
                              [images, confidences,
                               mask, global_step, 'true'],
                              [tf.uint8])

STEPS_7
The results of the operations above, together with some of the parameters, are collected in the graph object:

    graph['losses'] = losses  
    graph['eval_list'] = eval_list  
    graph['summary_op'] = summary_op  
    graph['train_op'] = train_op  
    graph['global_step'] = global_step  
    graph['learning_rate'] = learning_rate  
    graph['decoded_logits'] = decoded_logits  

Once the graph has been built, the graph operations are run in a session.

STEPS_1 prepare session

Operation: prepare the session for the graph.

tv_sess = core.start_tv_session(hypes)
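
A rough sketch of what preparing such a session involves (names assumed; the real core.start_tv_session may differ): a Session, a Saver for checkpoints, merged summaries, a summary writer, and variable initialization.

import tensorflow as tf

sess = tf.Session()
saver = tf.train.Saver(max_to_keep=10)   # TV_MAX_KEEP defaults to 10
summary_op = tf.summary.merge_all()      # merge all summaries added to the graph
summary_writer = tf.summary.FileWriter(hypes['dirs']['output_dir'],
                                       graph=sess.graph)
sess.run(tf.global_variables_initializer())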

STEPS_2 logging record

Operation: write logs during training.
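
The original walkthrough gives no code for this step; a plausible sketch (the handler setup is an assumption, not the exact tensorvision implementation) of persisting console output to the run's output.log:

import logging
import os

logfile = os.path.join(hypes['dirs']['output_dir'], 'output.log')
handler = logging.FileHandler(logfile)  # append training messages to output.log
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
logging.getLogger().addHandler(handler)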


STEPS_3 begin feeding input data

Operation: start loading the data.

modules['input'].start_enqueuing_threads(hypes, queue, 'train', sess)
...
    # load the images and labels from the paths recorded in DATA/KittiBox/train.txt
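
A minimal sketch of the enqueuing thread (load_kitti_data is a hypothetical generator standing in for the real parsing code; enqueue_op and placeholders are the ones sketched in the queue section above):

import threading

def enqueue_loop(sess, enqueue_op, placeholders, data_gen):
    # data_gen yields (image, confidences, boxes, mask) tuples built from the
    # entries of train.txt; each tuple is pushed into the FIFO queue.
    for datum in data_gen:
        sess.run(enqueue_op, feed_dict=dict(zip(placeholders, datum)))

t = threading.Thread(target=enqueue_loop,
                     args=(sess, enqueue_op, placeholders,
                           load_kitti_data(hypes)))  # load_kitti_data: hypothetical
t.daemon = True
t.start()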

STEPS_4 start training after all is ready

Operation: once everything is ready, run the training.

run_training(hypes, modules, tv_graph, tv_sess)
...
    # interval (in steps) for displaying training info
    display_iter = hypes['logging']['display_iter']  # 200

    # interval for writing summaries (for monitoring)
    write_iter = hypes['logging'].get('write_iter', 5*display_iter)  # 1000

    # evaluation interval
    eval_iter = hypes['logging']['eval_iter']  # 800

    # interval for saving checkpoints to disk
    save_iter = hypes['logging']['save_iter']  # 2000

    # interval for saving intermediate images
    image_iter = hypes['logging'].get('image_iter', 5*save_iter)  # 10000
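
A condensed sketch of the main loop implied by these intervals (start_step, saver, summary_writer, checkpoint_path and hypes['solver']['max_steps'] are assumed; the real run_training also formats output with TV_STEP_STR and runs the eval_list):

for step in range(start_step, hypes['solver']['max_steps']):
    # one optimization step; also fetch the loss for display
    _, loss_value = sess.run([tv_graph['train_op'],
                              tv_graph['losses']['total_loss']])
    if step % display_iter == 0:
        print('Step %d: loss = %.2f' % (step, loss_value))
    if step % write_iter == 0:
        summary_writer.add_summary(sess.run(tv_graph['summary_op']), step)
    if step % save_iter == 0:
        saver.save(sess, checkpoint_path, global_step=step)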

Intermediate detection results
