(Still in process) MultiBin Reproduction Using Transfer Learning（使用迁移学习复现MultiBin）

本文链接：https://blog.csdn.net/qq_42266322/article/details/122093675

论文题目：3D Bounding Box Estimation Using Deep Learning and Geometry

1. 环境搭建

系统：ubuntu18.04
显卡+驱动：Nvidia TITAN Xp + CUDA 11.2 + cuDNN 8.2.132
深度学习GPU环境搭建：python 3.8.10 + tensorflow-gpu 2.5.0 + keras-nightly 2.5.0 + keras-preprocessing 1.1.2
深度学习CPU环境搭建：python 3.8.12 + tensorflow-cpu 2.7.0 + keras 2.7.0
其他依赖功能包：graphviz pydot opencv-python ipython numpy

报错：I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
解决：
```
python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
```
报错：fit_generator() got an unexpected keyword argument 'max_q_size'
解决：内部训练队列的最大大小，改为max_queue_size

报错：Unable to import SGD and Adam from 'keras.optimizers
解决：

from keras.optimizers import adam_v2
minimizer = adam_v2.Adam(lr=1e-5)

报错：TypeError: 'range' object does not support item assignment
解决：将上面例子的代码： a = range(0,N)改为a = list(range(0,N))
报错：Invalid argument: TypeError: 'NoneType' object is not subscriptable
解决：问题出在以下代码，检查后发现在某一张图片无法读取，为空，进入数据集后查看图片未发现问题，暂将该图片移出，代码成功运行。
```
img = copy.deepcopy(img[ymin:ymax+1,xmin:xmax+1]).astype(np.float32)
```
警告：WARNING:tensorflow:'period' argument is deprecated. Please use 'save_freq' to specify the frequency in number of batches seen.
解决：
```
checkpoint  = ModelCheckpoint('weights_20111219.hdf5', monitor='val_loss', verbose=1, save_best_only=True, mode='min', save_freq=1) 
```
警告：UserWarning: The 'lr' argument is deprecated, use 'learning_rate' instead.
解决：
```
minimizer  = SGD(learning_rate=0.0001)
```

警告：UserWarning: 'Model.fit_generator' is deprecated and will be removed in a future version. Please use 'Model.fit', which supports generators.
解决：

model.fit(train_gen,
		 steps_per_epoch = 2000, # np.floor(all_exams/batch_size), 
		 epochs = 500, 
		  verbose = 1, 
		  callbacks = [early_stop, checkpoint, tensorboard], 
		  validation_data = valid_gen,
		  validation_steps = valid_num,
		  class_weight = None, 
		  max_queue_size = 3, 
		  workers = 1, 
		  use_multiprocessing = False, 
		  shuffle = True, 
		  initial_epoch = 0)

2. 网络搭建

网络结构
采用迁移学习，使用训练好的vgg16网络来学习图像特征，重新学习部分卷积层的权重用以适应新的数据集。去掉vgg16网络的全连接层后，按照论文所示添加全连接层输出。
输入：裁剪并且resize后的图像
读取真值txt文件，将truncated和occluded值均大于0.1的车辆（Car、Van、Trunk）二维真值用作二维检测框，裁剪图像，按照输入尺寸进行resize，可对输入进行水平翻转，进行数据增强。
尺寸：224×224×3
输出：multi-task网络，输出都服务于三维检测，包括三维尺寸的回归、heading角的回归以及heading角所属BIN的置信度
尺寸：dimensions 3 heading BIN×2 confidence BIN
注：由于车辆的尺寸与其所属分类关联性强，因此dimensions是车辆实际尺寸与所属类别平均尺寸差值
损失函数
dimension_loss采用mean sqared error
confidence_loss 原文内容为：

The confidence loss Lconf is equal to the softmax loss of the confidences of each bin.

参考他人复现文章中的confidence_loss设定为：输出层采用softmax来激活，然后直接采用mean squared error。
这种方法与softmax loss定义存在出入：keras中的categorical_crossentropy或许也可以用于confidence的损失函数。
补充：确实可以直接用categorical_crossentropy。
按照softmax loss定义实现的loss function：
```
def softmax_loss(y_true, y_pred):
    loss = - y_true * tf.math.log(tf.clip_by_value(y_pred,1e-8,1.0))
    # loss = tf.reduce_sum(loss, axis=1)
	loss = tf.reduce_mean(loss)*2
	
    return loss
```
orientation_loss原文提到的为L2 loss，实现代码如下：
```
def orientation_loss(y_true, y_pred):
    y_pred = l2_normalize(y_pred)
    y_true = l2_normalize(y_true)

    loss = tf.square(y_true[:,:,0]-y_pred[:,:,0]) + tf.square(y_true[:,:,1]-y_pred[:,:,1])

    return tf.reduce_mean(loss)
```
多次训练后发现orientation大小始终在0.55左右（范围为[0, 1]）
尝试使用cosine similarity（余弦相似度）用作loss function
实现代码如下：
```
def cosine_similarity(y_true, y_pred):
    y_pred = l2_normalize(y_pred)
    y_true = l2_normalize(y_true)
    
    loss = -(y_true[:,:,0]*y_pred[:,:,0] + y_true[:,:,1]*y_pred[:,:,1])
    
    return (tf.reduce_mean(loss)+1)/2
```
余弦相似度范围为[-1, 1]，-1代表向量方向一致，0代表向量垂直，1代表向量完全反向，未归一化处理时，训练结束的loss一直为-0.9左右，loss值为负可能会影响权值反向传播更新（不确定是否有影响），归一化后为[0, 1]值越小向量越相似。但是归一化之后训练得到的loss为0.55左右，同上L2 loss，近似于随机数结果。（输入的问题，不应该是r_y而是theta，待修改）
网络参数
论文中提及的训练参数：

Overlap: 0.1
Learning Rate: 0.0001
Optimizer: SGD
Iterations: 20K
Batch Size: 8
Best Model: chosed by cross validation

论文中进行了比较并且能取得较好效果的参数

Bins: 2
FC width of orientation: 256

迁移学习unfreeze的layers数量，尝试了0/2/4/8均在50个epoch前由于下降速率过度调动了EarlyStop停止训练。解冻层数较小时，dimensions的loss很大，解冻层数增加后对dimensions loss明显降低。但解冻所有layers后，仍在50个epoch前调动了EarlyStop，此时orientation和confidence的loss还很大。（迁移学习只需要对全连接层的权值进行训练，当学习层全部freeze之后，30个epoch已经能实现较好效果，待修改）

网络结构
使用vgg16卷积层+MultiBin FC层的网络结构如下：

def net_construct():
    # Construct the network
    # Use vgg-16 to get feature maps of images
    inputs = Input(shape=(224,224,3))
    base_model = VGG16(input_tensor=inputs, weights='imagenet', include_top=False)

    # for i, layer in enumerate(base_model.layers):
    #    if(i <= 6):
    #         layer.trainable = False

    x = base_model.output

    x = Flatten()(x)

    dimension   = Dense(512)(x)
    dimension   = LeakyReLU(alpha=0.1)(dimension)
    dimension   = Dropout(0.5)(dimension)
    dimension   = Dense(3)(dimension)
    dimension   = LeakyReLU(alpha=0.1, name='dimension')(dimension)

    orientation = Dense(256)(x)
    orientation = LeakyReLU(alpha=0.1)(orientation)
    orientation = Dropout(0.5)(orientation)
    orientation = Dense(BIN*2)(orientation)
    orientation = LeakyReLU(alpha=0.1)(orientation)
    orientation = Reshape((BIN,-1))(orientation)
    orientation = Lambda(l2_normalize, name='orientation')(orientation)

    confidence  = Dense(256)(x)
    confidence  = LeakyReLU(alpha=0.1)(confidence)
    confidence  = Dropout(0.5)(confidence)
    confidence  = Dense(BIN, activation='softmax', name='confidence')(confidence)

    model = Model(inputs=base_model.input, outputs=[dimension, orientation, confidence])

    return model

参考链接：
https://keras.io/zh/#_2
https://zhuanlan.zhihu.com/p/34044634
vgg16网络迁移学习图片的参考链接晚些补充
https://github.com/shashwat14/Multibin
https://github.com/smallcorgi/3D-Deepbox
https://github.com/experiencor/image-to-3d-bbox