(Work in progress) MultiBin Reproduction Using Transfer Learning

Paper: 3D Bounding Box Estimation Using Deep Learning and Geometry

1. Environment setup

System: Ubuntu 18.04
GPU + driver: Nvidia TITAN Xp + CUDA 11.2 + cuDNN 8.2.132
GPU deep learning environment: Python 3.8.10 + tensorflow-gpu 2.5.0 + keras-nightly 2.5.0 + keras-preprocessing 1.1.2
CPU deep learning environment: Python 3.8.12 + tensorflow-cpu 2.7.0 + keras 2.7.0
Other dependencies: graphviz pydot opencv-python ipython numpy

  1. Message: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    Fix: this is an informational log line rather than an error. To hide it, raise the log level before importing TensorFlow:

    import os
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # must be set before `import tensorflow`
    
  2. Error: fit_generator() got an unexpected keyword argument 'max_q_size'
    Fix: this argument (the maximum size of the internal training queue) has been renamed to max_queue_size.

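  3. Error: Unable to import SGD and Adam from 'keras.optimizers'
    Fix: import the optimizer from its versioned module (importing from tensorflow.keras.optimizers also works):

    from keras.optimizers import adam_v2
    minimizer = adam_v2.Adam(learning_rate=1e-5)  # 'lr' is deprecated, see item 7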
    
  4. Error: TypeError: 'range' object does not support item assignment
    Fix: in Python 3, range() returns an immutable range object rather than a list, so change a = range(0,N) to a = list(range(0,N)).

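  5. Error: Invalid argument: TypeError: 'NoneType' object is not subscriptable
    Fix: the error comes from the line below. Debugging showed that one image could not be read, so img was None. Opening that image in the dataset folder revealed nothing obviously wrong; as a temporary workaround I removed it from the dataset, after which the code ran.

    img = copy.deepcopy(img[ymin:ymax+1,xmin:xmax+1]).astype(np.float32)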
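
Rather than waiting for training to crash, the unreadable files can be located up front. The following is a minimal sketch, assuming the images are loaded with cv2.imread (which returns None when a file cannot be decoded); the function name find_unreadable and the read_image parameter are hypothetical and not part of the original code.

```python
def find_unreadable(paths, read_image=None):
    """Return the paths for which read_image(path) yields None."""
    if read_image is None:
        # Assumed default reader: cv2.imread returns None on failure.
        import cv2
        read_image = cv2.imread
    return [p for p in paths if read_image(p) is None]
```

Running it once over the image list before training gives the files to inspect or exclude, so the generator never receives a None image.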
    
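  6. Warning: WARNING:tensorflow:'period' argument is deprecated. Please use 'save_freq' to specify the frequency in number of batches seen.
    Fix: replace period with save_freq. An integer save_freq counts batches, and val_loss is only available at the end of an epoch, so use 'epoch' to save once per epoch:

    checkpoint = ModelCheckpoint('weights_20111219.hdf5', monitor='val_loss', verbose=1, save_best_only=True, mode='min', save_freq='epoch')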
    
  7. Warning: UserWarning: The 'lr' argument is deprecated, use 'learning_rate' instead.
    Fix: pass learning_rate instead of lr:

    minimizer = SGD(learning_rate=0.0001)
    
  8. Warning: UserWarning: 'Model.fit_generator' is deprecated and will be removed in a future version. Please use 'Model.fit', which supports generators.
    Fix: call model.fit with the same arguments:

    model.fit(train_gen,
              steps_per_epoch=2000,  # np.floor(all_exams/batch_size)
              epochs=500,
              verbose=1,
              callbacks=[early_stop, checkpoint, tensorboard],
              validation_data=valid_gen,
              validation_steps=valid_num,
              class_weight=None,
              max_queue_size=3,
              workers=1,
              use_multiprocessing=False,
              shuffle=True,
              initial_epoch=0)
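
When steps_per_epoch is given, the generator passed to model.fit is typically written as an endless loop that reshuffles the sample order every epoch. Below is a minimal sketch of just the index handling; batch_index_generator is a hypothetical name, and the real train_gen would additionally load the images and targets for each batch of indices. It also shows why item 4 needs list(range(...)): shuffling assigns items in place.

```python
import random

def batch_index_generator(num_samples, batch_size, seed=None):
    """Yield lists of sample indices forever, reshuffling every epoch."""
    rng = random.Random(seed)
    keys = list(range(num_samples))  # a list, so shuffle can assign items
    while True:
        rng.shuffle(keys)
        for start in range(0, num_samples, batch_size):
            yield keys[start:start + batch_size]
```

With this layout one pass over the data takes ceil(num_samples / batch_size) steps, which is the natural value for steps_per_epoch.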
    

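2. Network construction

  1. Network structure
    Transfer learning is used: a pretrained VGG16 network extracts the image features, and the weights of some convolutional layers are retrained to adapt to the new dataset. After removing VGG16's fully connected layers, new fully connected output branches are added as described in the paper.
    [Figure: VGG16 transfer learning]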

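  2. Input: cropped and resized images
    The ground-truth txt files are read, and the 2D ground-truth boxes of vehicles (Car, Van, Truck) whose truncated and occluded values are both below 0.1 are used as 2D detection boxes. The image is cropped to each box and resized to the input size; inputs can also be flipped horizontally for data augmentation.
    Size: 224×224×3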
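
The label-filtering step can be sketched as follows. This is an illustrative sketch rather than the code used here: the field order follows the KITTI object label format (type, truncated, occluded, alpha, 2D box, dimensions, location, rotation_y), while the function names, the threshold parameters, and the choice to keep objects below the thresholds are assumptions. Cropping, resizing, and flipping are not shown.

```python
VEHICLES = ("Car", "Van", "Truck")  # classes named in the text

def parse_kitti_line(line):
    """Parse one line of a KITTI object label file into a dict."""
    f = line.split()
    return {
        "type": f[0],
        "truncated": float(f[1]),
        "occluded": float(f[2]),
        "alpha": float(f[3]),
        # 2D box in integer pixels: left, top, right, bottom
        "box": tuple(int(float(v)) for v in f[4:8]),
        # 3D size in metres: height, width, length
        "dims": tuple(float(v) for v in f[8:11]),
        "rot_y": float(f[14]),
    }

def keep_object(obj, max_trunc=0.1, max_occ=0.1):
    """Keep vehicles that are only lightly truncated and occluded (assumed rule)."""
    return (obj["type"] in VEHICLES
            and obj["truncated"] < max_trunc
            and obj["occluded"] < max_occ)

# Made-up label line, for illustration only.
sample = "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"
obj = parse_kitti_line(sample)
print(obj["box"], keep_object(obj))
```

The box tuple gives (xmin, ymin, xmax, ymax) for the crop img[ymin:ymax+1, xmin:xmax+1] used in the data generator.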

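  3. Output: a multi-task network whose outputs all serve 3D detection: regression of the 3D dimensions, regression of the heading angle, and the confidence of the bin the heading angle falls in
    Sizes: dimensions 3, heading BIN×2, confidence BIN
    Note: because a vehicle's size correlates strongly with its class, dimensions is the difference between the vehicle's actual size and the mean size of its class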
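
To make the heading and confidence targets concrete, here is one possible way to encode an angle into MultiBin targets. It is a sketch under stated assumptions, not necessarily the encoding used by the paper or the referenced repositories: bin centers are placed at (i + 0.5) * 2π / BIN, each bin is widened by the overlap fraction, and the residual angle is stored as a (cos, sin) pair. The name encode_multibin is hypothetical.

```python
import math

def encode_multibin(angle, num_bins=2, overlap=0.1):
    """Return (confidence, orientation) targets for one angle in radians."""
    wedge = 2 * math.pi / num_bins
    half_width = wedge / 2 * (1 + overlap)  # widened so neighbouring bins overlap
    confidence = [0.0] * num_bins
    orientation = [[0.0, 0.0] for _ in range(num_bins)]
    for i in range(num_bins):
        center = (i + 0.5) * wedge  # assumed bin-center placement
        # Residual to the bin center, wrapped into [-pi, pi)
        residual = (angle - center + math.pi) % (2 * math.pi) - math.pi
        if abs(residual) <= half_width:
            confidence[i] = 1.0
            orientation[i] = [math.cos(residual), math.sin(residual)]
    total = sum(confidence)  # at least 1, since the bins cover the full circle
    confidence = [c / total for c in confidence]
    return confidence, orientation
```

The orientation target then has shape BIN×2 and the confidence target has length BIN, matching the output sizes listed above. Bins that do not cover the angle keep a zero vector, so a loss that averages over all bins needs to account for them.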

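  4. Loss functions
    dimension_loss uses mean squared error.
    For confidence_loss, the paper says:

    The confidence loss Lconf is equal to the softmax loss of the confidences of each bin.

    One reproduction write-up I consulted activates the output layer with softmax and then applies mean squared error directly.
    That differs from the definition of the softmax loss, which is the cross-entropy of the softmax output: L = -Σ_i y_i log(p_i), where y is the target and p is the softmax probability vector. Keras's categorical_crossentropy may therefore also work as the confidence loss.
    Update: categorical_crossentropy can indeed be used directly.
    A loss function implemented according to the softmax loss definition: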

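    def softmax_loss(y_true, y_pred):
        # y_pred is already the softmax output; clip it to avoid log(0)
        loss = -y_true * tf.math.log(tf.clip_by_value(y_pred, 1e-8, 1.0))
        # Sum over bins, then average over the batch
        # (for BIN = 2 this equals tf.reduce_mean over all elements times 2)
        loss = tf.reduce_sum(loss, axis=1)
        return tf.reduce_mean(loss)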
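
To see how the softmax (cross-entropy) loss differs from mean squared error on the same softmax output, the two can be compared on a few hand-picked predictions for a single sample. This is a plain-Python illustration with made-up probabilities, independent of the TensorFlow code above.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-8):
    """Softmax loss for one sample: -sum_i y_i * log(p_i)."""
    return -sum(t * math.log(min(max(p, eps), 1.0))
                for t, p in zip(y_true, y_pred))

def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences over the bins."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1.0, 0.0]  # the angle falls in the first bin
for y_pred in ([0.9, 0.1], [0.5, 0.5], [0.01, 0.99]):
    print(y_pred, cross_entropy(y_true, y_pred), mean_squared_error(y_true, y_pred))
```

Mean squared error on probabilities never exceeds 1, whereas the cross-entropy keeps growing as the probability of the correct bin approaches 0, so confidently wrong predictions are penalized much more heavily.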
    

    For orientation_loss, the paper mentions an L2 loss. The implementation:

    def orientation_loss(y_true, y_pred):
        y_pred = l2_normalize(y_pred)
        y_true = l2_normalize(y_true)
    
        loss = tf.square(y_true[:,:,0]-y_pred[:,:,0]) + tf.square(y_true[:,:,1]-y_pred[:,:,1])
    
        return tf.reduce_mean(loss)
    

    Over many training runs the orientation loss always stayed at about 0.55 (range [0, 1]).
    I then tried cosine similarity as the loss function,
    cos(θ) = (A · B) / (‖A‖ ‖B‖). The implementation:

    def cosine_similarity(y_true, y_pred):
        y_pred = l2_normalize(y_pred)
        y_true = l2_normalize(y_true)
        
        loss = -(y_true[:,:,0]*y_pred[:,:,0] + y_true[:,:,1]*y_pred[:,:,1])
        
        return (tf.reduce_mean(loss)+1)/2
    

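    Before the final rescaling, the loss is the negated cosine similarity, which lies in [-1, 1]: -1 means the vectors point in the same direction, 0 means they are perpendicular, and 1 means they are exactly opposite. Without the rescaling, the loss at the end of training was always around -0.9. I was unsure whether a negative loss affects the weight updates in backpropagation (it should not, since adding a constant leaves the gradients unchanged), so I rescaled it to [0, 1], where smaller means more similar. After rescaling, however, the trained loss was about 0.55, the same as with the L2 loss and close to what random output would give. (This is an input problem: the target should be theta rather than r_y; to be fixed.)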
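
The two orientation losses are closely related. For a single bin, write a for the normalized target vector and b for the normalized prediction, both of unit length, with φ the angle between them. The squared L2 distance and the rescaled cosine loss then satisfy:

```latex
\begin{aligned}
\lVert \mathbf{a} - \mathbf{b} \rVert^2
  &= \lVert \mathbf{a} \rVert^2 + \lVert \mathbf{b} \rVert^2 - 2\,\mathbf{a}\cdot\mathbf{b}
   = 2 - 2\cos\varphi, \\
\frac{1 - \cos\varphi}{2}
  &= \frac{\lVert \mathbf{a} - \mathbf{b} \rVert^2}{4}.
\end{aligned}
```

For unit vectors the two losses therefore differ only by a constant factor and share the same minimum at φ = 0, so switching from one to the other is not expected to change what the network learns.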

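  5. Network parameters
    Training parameters mentioned in the paper:

    Overlap: 0.1
    Learning Rate: 0.0001
    Optimizer: SGD
    Iterations: 20K
    Batch Size: 8
    Best Model: chosen by cross validation

    Parameters that the paper compared and found to work well:

    Bins: 2
    FC width of orientation: 256

    For the number of layers to unfreeze in transfer learning, I tried 0/2/4/8. In every case EarlyStop was triggered by the rate of loss decrease and stopped training before epoch 50. With few unfrozen layers the dimensions loss was large; unfreezing more layers reduced it noticeably. Even with all layers unfrozen, however, EarlyStop still fired before epoch 50, while the orientation and confidence losses were still large. (Transfer learning only needs to train the weights of the fully connected layers; with all feature-learning layers frozen, 30 epochs already give fairly good results. To be revised.)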
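
The unfreezing experiments can be driven by a small helper such as the one below. This is a sketch: unfreeze_last is a hypothetical name, and it relies only on the trainable attribute that Keras layers expose.

```python
def unfreeze_last(layers, num_unfrozen):
    """Freeze every layer except the last num_unfrozen ones."""
    cutoff = len(layers) - num_unfrozen
    for i, layer in enumerate(layers):
        layer.trainable = i >= cutoff
    return layers
```

A call such as unfreeze_last(base_model.layers, 4) would go before model.compile, because changes to trainable only take effect when the model is compiled. Note that base_model.layers also contains the input and pooling layers, so the count is in layers, not in convolutional layers.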

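  6. Network code
    The network built from the VGG16 convolutional layers plus the MultiBin FC layers:

    import tensorflow as tf
    from tensorflow.keras.applications.vgg16 import VGG16
    from tensorflow.keras.layers import (Input, Flatten, Dense, Dropout,
                                         LeakyReLU, Reshape, Lambda)
    from tensorflow.keras.models import Model

    BIN = 2  # number of orientation bins (see "Bins: 2" above)

    def l2_normalize(x):
        # Assumed definition of the helper: normalize each bin's 2-vector
        return tf.nn.l2_normalize(x, axis=2)

    def net_construct():
        # Construct the network
        # Use VGG16 (without its FC layers) to get feature maps of the images
        inputs = Input(shape=(224,224,3))
        base_model = VGG16(input_tensor=inputs, weights='imagenet', include_top=False)
    
        # Optionally freeze the first layers:
        # for i, layer in enumerate(base_model.layers):
        #    if(i <= 6):
        #         layer.trainable = False
    
        x = base_model.output
    
        x = Flatten()(x)
    
        # Dimension branch: 3 residuals relative to the class mean size
        dimension   = Dense(512)(x)
        dimension   = LeakyReLU(alpha=0.1)(dimension)
        dimension   = Dropout(0.5)(dimension)
        dimension   = Dense(3)(dimension)
        dimension   = LeakyReLU(alpha=0.1, name='dimension')(dimension)
    
        # Orientation branch: one L2-normalized 2-vector per bin
        orientation = Dense(256)(x)
        orientation = LeakyReLU(alpha=0.1)(orientation)
        orientation = Dropout(0.5)(orientation)
        orientation = Dense(BIN*2)(orientation)
        orientation = LeakyReLU(alpha=0.1)(orientation)
        orientation = Reshape((BIN,-1))(orientation)
        orientation = Lambda(l2_normalize, name='orientation')(orientation)
    
        # Confidence branch: softmax over the bins
        confidence  = Dense(256)(x)
        confidence  = LeakyReLU(alpha=0.1)(confidence)
        confidence  = Dropout(0.5)(confidence)
        confidence  = Dense(BIN, activation='softmax', name='confidence')(confidence)
    
        model = Model(inputs=base_model.input, outputs=[dimension, orientation, confidence])
    
        return model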
    

References:
https://keras.io/zh/#_2
https://zhuanlan.zhihu.com/p/34044634
(The source link for the VGG16 transfer learning figure will be added later.)
https://github.com/shashwat14/Multibin
https://github.com/smallcorgi/3D-Deepbox
https://github.com/experiencor/image-to-3d-bbox
