Hierarchical Classification (Continued): Implementing It with B-CNN (Branch CNN)


胡扑扑


Category: Machine Learning. Tags: python, neural networks, deep learning

Article link: https://blog.csdn.net/weixin_43295229/article/details/105699180


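This post introduces the use of B-CNN (Branch Convolutional Neural Network) for hierarchical classification. Through a walkthrough of the paper and a hands-on implementation, it shows how to build and train a B-CNN model for multi-level classification problems. It covers the B-CNN architecture, the loss function, the branch training strategy, and the first steps of an implementation.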


1. Recap of the Previous Post

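Picking up the questions left open in the previous post on multi-level classification, this update shares some real progress.
In the previous post I set out to build a hierarchical classification model with the following structure:
[Figure: the intended hierarchical (multi-level) classification model]
What I actually built, however, was a model like this:
[Figure: the model actually built in the previous post]

The final classification results were decent, but I am well aware that this was largely thanks to the pretrained VGG16 model and to the relatively small number of classes. Such an implementation would struggle to meet real engineering needs, where there are often thousands or tens of thousands of classes; combine such a simple classification scheme with millions of samples and the outcome is easy to guess. So the natural question is how to build a model like the one in the first figure above: I believe hierarchical classification can improve a model's ability to cope with more classes and more data. That was exactly my earlier confusion: how does one actually implement such a hierarchical model?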

2. B-CNN (Branch Convolutional Neural Network)

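Thanks to a teacher at my university for the pointer! This time I came across something good, B-CNN (Branch CNN), which resolved most of my confusion, so I am sharing it here.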

Zhu X, Bain M. B-CNN: Branch Convolutional Neural Network for Hierarchical Classification[J]. 2017.

2.1 B-CNN Architecture

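Let me first introduce the key components of B-CNN through a few excerpts from the paper. To start, here is the architecture diagram:
[Figure: B-CNN architecture diagram from the paper]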

A possible way to embed a hierarchy of classes into a CNN model is to output multiple predictions along the CNN layers as the data flow through, from coarse to fine. In this case, lower layers output coarser predictions while higher layers output finer predictions.

We name it Branch Convolutional Neural Network (B-CNN) as it contains several branch networks along the main convolution workflow to do predictions hierarchically.

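In other words, the paper argues that to build a hierarchical classifier that goes from coarse to fine classes, the lower layers of the model should output the coarse (parent) class predictions while the higher layers output the fine (child) class predictions. The model is named B-CNN because it contains multiple output branches (see the figure above) that make the hierarchical predictions.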

A B-CNN model uses existent CNN components as building blocks to construct a network with internal output branches. The network shown at the bottom in Figure 1a is a traditional convolutional neural network. It can be an arbitrary ConvNet with multiple layers. The middle part in Figure 1a shows the output branch networks of a B-CNN. Each branch net produces a prediction on the corresponding level in the label tree (Figure 1b, shown in same color). On the top of each branch, fully connected layers and a softmax layer are used to produce the output in one-hot representation. Branch nets can consist of ConvNets and fully connected neural networks. But for simplicity, in our experiments, we only use fully connected neural networks as our branch nets.

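In short, in the B-CNN built in the paper, the horizontal direction of the figure (from the input to the final fine prediction) consists mostly of ConvNet blocks, while the branches that split off vertically use fully connected layers to produce the predictions, a choice the authors make for simplicity in their experiments.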
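To make the trunk-plus-branches layout concrete, here is a minimal sketch using the Keras functional API. It is not the paper's network: the number of blocks, the filter counts, the 64-unit dense layer, the class counts (2, 4, 8), and the helper names `conv_block`, `branch`, and `build_bcnn` are all assumptions chosen for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers


def conv_block(x, filters):
    """One trunk block: a 3x3 convolution followed by 2x2 max pooling."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.MaxPooling2D(2)(x)


def branch(x, num_classes, name):
    """Fully connected output branch ending in a softmax layer."""
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)
    return layers.Dense(num_classes, activation="softmax", name=name)(x)


def build_bcnn(input_shape=(150, 150, 3), num_c1=2, num_c2=4, num_fine=8):
    """Toy B-CNN: class counts and layer sizes are illustrative placeholders."""
    inputs = keras.Input(shape=input_shape)
    x = conv_block(inputs, 32)
    coarse1 = branch(x, num_c1, "coarse1")  # lowest layers -> coarsest level
    x = conv_block(x, 64)
    coarse2 = branch(x, num_c2, "coarse2")
    x = conv_block(x, 128)
    fine = branch(x, num_fine, "fine")      # highest layers -> finest level
    return keras.Model(inputs, [coarse1, coarse2, fine], name="toy_bcnn")
```

Calling `build_bcnn()` returns a single model with three outputs, ordered from coarse to fine, so one forward pass yields a prediction for every level of the hierarchy.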

2.2 Loss Function & Loss Weights

[Figure omitted]

When the image is fed into B-CNN, the network will output three corresponding predictions as the data flow through and each level’s loss will contribute to the final loss function base on the loss weights distribution.

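The paper also introduces the notion of a label tree: for the different numbers of classes at each level, from Coarse 1 down to the final Fine level (Coarse 3), there are three different one-hot labels. When an image is fed into the model, the model outputs three prediction vectors, one per level. The loss at each level is assigned a loss weight, and the three losses are combined according to these weights to form the model's final loss.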
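As a small illustration of how one fine label expands into three one-hot labels, the sketch below walks a toy label tree upward from the fine class. The class names, the two mapping dictionaries, and the helper `hierarchical_labels` are invented for this example and do not come from the paper or from my dataset.

```python
import numpy as np

# Hypothetical label tree: fine class -> coarse 2 -> coarse 1.
FINE_TO_C2 = {
    "cat": "mammal", "dog": "mammal", "sparrow": "bird",
    "car": "road vehicle", "truck": "road vehicle", "boat": "watercraft",
}
C2_TO_C1 = {
    "mammal": "animal", "bird": "animal",
    "road vehicle": "vehicle", "watercraft": "vehicle",
}

# A fixed class order per level, so that indices are stable.
FINE = sorted(FINE_TO_C2)
C2 = sorted(C2_TO_C1)
C1 = sorted(set(C2_TO_C1.values()))


def one_hot(index, size):
    """Return a float vector of length `size` with a 1 at `index`."""
    vec = np.zeros(size, dtype=np.float32)
    vec[index] = 1.0
    return vec


def hierarchical_labels(fine_name):
    """Map a fine class name to (coarse1, coarse2, fine) one-hot labels."""
    c2_name = FINE_TO_C2[fine_name]
    c1_name = C2_TO_C1[c2_name]
    return (
        one_hot(C1.index(c1_name), len(C1)),
        one_hot(C2.index(c2_name), len(C2)),
        one_hot(FINE.index(fine_name), len(FINE)),
    )
```

Because the coarse labels are derived from the fine label, only the fine annotation and the tree itself need to be stored with the data.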

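The loss function defined in the paper, for the $i$-th sample in a mini-batch, is:

$$L_i = \sum_{k=1}^{K} -A_k \log\left(\frac{e^{f^{k}_{y_i}}}{\sum_{j} e^{f^{k}_{j}}}\right)$$

Briefly, $K$ is the number of levels, $A_k$ is the loss weight of level $k$, $f^{k}_{j}$ is the $j$-th class score output at level $k$, $y_i$ is the true class of the sample at that level, and the log term is the cross-entropy loss. A loss of this form accounts for the loss at every level and for how much each one contributes to the final loss.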
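The weighted sum of per-level cross-entropy losses described above can be checked numerically with a few lines of NumPy. This is only a sketch for a single sample; the function names `softmax` and `bcnn_loss` and the use of integer class indices instead of one-hot vectors are my own choices.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array of class scores."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()


def bcnn_loss(scores_per_level, true_class_per_level, loss_weights):
    """Sum over levels of weight * cross-entropy for one sample.

    scores_per_level: one 1-D array of raw class scores per level.
    true_class_per_level: one integer class index per level.
    loss_weights: one non-negative weight per level.
    """
    total = 0.0
    for scores, true_class, weight in zip(
        scores_per_level, true_class_per_level, loss_weights
    ):
        probs = softmax(np.asarray(scores, dtype=np.float64))
        total += -weight * np.log(probs[true_class])
    return float(total)


# Two levels with uniform scores: cross-entropy is ln(2) and ln(4).
example = bcnn_loss([np.zeros(2), np.zeros(4)], [0, 3], [0.5, 0.5])
```

In Keras the same weighting is what the `loss_weights` argument of `Model.compile` expresses for a multi-output model, with `categorical_crossentropy` as the loss of each output.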

2.3 Branch Training Strategy

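And now for the strategy. What caught my eye once again is the training strategy the paper proposes for this branched model. So what exactly is it?
The paper proposes that by adjusting the loss weights of the levels, training can be made to focus on different levels at different stages (that is, at different epochs). For example, for a two-level model the loss weights could start at [0.9, 0.1], so the model focuses on learning the first level; after, say, 30 epochs they could be changed to [0.2, 0.8], so the model shifts its focus to the second level.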

This procedure requires the classifier to extract lower features first with coarse instructions and fine tune parameters with fine instructions later. It to an extent prevents the vanishing gradient problem which would make the updates to parameters on lower layers very difficult when the network is very deep.

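As the paper explains, this focused branch training strategy is interpretable, and it can to some extent mitigate the vanishing gradient problem that deep networks often face.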
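The schedule itself is just a lookup from epoch number to a weight vector. Below is a minimal sketch using the two-level example from this section ([0.9, 0.1] at first, [0.2, 0.8] from epoch 30 on, counting epochs from 0); the names `SCHEDULE` and `loss_weights_for_epoch` are hypothetical.

```python
# (start_epoch, loss_weights) pairs, sorted by start_epoch.
SCHEDULE = [
    (0, [0.9, 0.1]),   # focus on the first (coarse) level
    (30, [0.2, 0.8]),  # then shift the focus to the second level
]


def loss_weights_for_epoch(epoch, schedule=SCHEDULE):
    """Return the weights of the latest stage that has started by `epoch`."""
    current = schedule[0][1]
    for start_epoch, weights in schedule:
        if epoch >= start_epoch:
            current = weights
    return current
```

In Keras 2.x, one common way to apply such a schedule during training is to keep each weight in a backend variable and update it with `K.set_value` inside the `on_epoch_begin` method of a custom `keras.callbacks.Callback`.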

3. Implementation

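OK, next let's implement B-CNN by combining my earlier code with the paper's code. Note that the original paper's code is available here -> B-CNN
First, the required packages and some preset values: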

import keras
import numpy as np
import os
import cv2
import tensorflow as tf
from keras.models import Model
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, Input
from keras.initializers import he_normal
from keras import optimizers
from keras.callbacks import LearningRateScheduler
from keras.layers import BatchNormalization  # public import path; keras.layers.normalization is version-specific
from keras.utils import get_file
from keras import backend as K
import matplotlib
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from keras.preprocessing.image import img_to_array
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split


Image_width=150
Image_height=150
Image_channels=3
Image_size=(Image_width,Image_height)
Image_shape=(Image_width,Image_height,Image_channels)
batch_size=15

# Loss weights for the different hierarchy levels
alpha = K.variable(value
