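1: First, understand the overall model structure
Key network structures:
The MSB structure
The network architecture applied to crowd counting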
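2: Understanding it at the code level
2.1 The MSB (Multi-Scale Blob) structure. Four convolutions with different kernel sizes (9×9, 7×7, 5×5 and 3×3) are applied to the same input, and their outputs are concatenated along the channel axis. Kernels of different sizes cover receptive fields of different sizes, so the block extracts features at multiple scales. The concatenated result then goes through Batch Normalization and finally a ReLU activation for the non-linear transform (in the code below, each branch also applies its own ReLU before concatenation). That is the whole MSB structure; it is quite simple.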
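```python
from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D, concatenate
from tensorflow.keras.regularizers import l2


def MSB(filters):
    """Multi-Scale Blob.

    Arguments:
        filters: int, number of filters in each of the four branches.

    Returns:
        f: function, layer function.
    """
    params = {'activation': 'relu', 'padding': 'same',
              'kernel_regularizer': l2(5e-4)}

    def f(x):
        # Four parallel branches with different kernel sizes (multi-scale);
        # 'same' padding keeps their spatial sizes equal.
        x1 = Conv2D(filters, 9, **params)(x)
        x2 = Conv2D(filters, 7, **params)(x)
        x3 = Conv2D(filters, 5, **params)(x)
        x4 = Conv2D(filters, 3, **params)(x)
        # Concatenate along the channel axis: 4 * filters output channels.
        x = concatenate([x1, x2, x3, x4])
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        return x

    return f
```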
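As a rough check on the description above, the following plain-Python sketch does the shape and parameter bookkeeping for one MSB block without Keras. The helpers `conv_output_size` and `msb_summary` are hypothetical names introduced for illustration; the sketch assumes stride-1 convolutions with biases (the Keras `Conv2D` default) and the four per-channel values Keras keeps for `BatchNormalization`.

```python
# Hypothetical helper (not from the original post): shape and parameter
# bookkeeping for one MSB block, in plain Python so it runs without Keras.

KERNEL_SIZES = (9, 7, 5, 3)  # branch kernel sizes used by MSB


def conv_output_size(size, kernel, padding):
    """Output length of a stride-1 convolution along one spatial axis."""
    return size + 2 * padding - kernel + 1


def msb_summary(height, width, in_channels, filters):
    """Return (out_h, out_w, out_channels, conv_params, bn_params) for one MSB."""
    shapes = []
    conv_params = 0
    for k in KERNEL_SIZES:
        pad = (k - 1) // 2  # 'same' padding for an odd kernel at stride 1
        shapes.append((conv_output_size(height, k, pad),
                       conv_output_size(width, k, pad)))
        # k*k*in_channels weights per filter plus one bias per filter
        conv_params += k * k * in_channels * filters + filters
    # Every branch keeps the input's spatial size, so concatenating along the
    # channel axis is valid.
    assert len(set(shapes)) == 1
    out_h, out_w = shapes[0]
    out_channels = len(KERNEL_SIZES) * filters
    # Keras BatchNormalization stores gamma, beta, moving mean and moving
    # variance for each channel (only gamma and beta are trainable).
    bn_params = 4 * out_channels
    return out_h, out_w, out_channels, conv_params, bn_params


# First MSB in MSCNN: its input is the 224x224x64 output of Conv2D(64, 9),
# and MSB(4 * 16) gives each branch 64 filters.
print(msb_summary(224, 224, 64, 64))  # (224, 224, 256, 672000, 1024)

# Without padding ('valid'), the branches would disagree in size.
print([conv_output_size(224, k, 0) for k in KERNEL_SIZES])  # [216, 218, 220, 222]
```

The second print shows why `padding='same'` matters: without it the four branches produce maps of different sizes and `concatenate` would fail. Note also that, as written, `MSB(filters)` outputs `4 * filters` channels, so `MSB(4 * 16)` yields 256 channels.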
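2.2 The MSCNN structure. This is the overall network architecture. It combines ordinary convolution and pooling layers, and its most important component is the MSB multi-scale convolution. Its output stage is not a conventional fully connected layer: the fully connected step is implemented with 1×1 convolutions, which do have trainable parameters. This is the "MLP convolution" idea; because the 1×1 convolution is applied at every spatial position, the output stays a spatial map instead of collapsing into a single vector, and the extra layer strengthens the model's feature extraction. The data flow is: input -> Conv2D -> MSB -> MaxPooling -> MSB -> MSB -> MaxPooling -> MSB -> MSB -> Conv2D -> Conv2D. The input is 224×224×3 and the output is 56×56×1, i.e. 3136 values: each of the two max-pooling layers halves the height and width (224 -> 112 -> 56).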
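```python
from tensorflow.keras.layers import Conv2D, Input, MaxPooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

# MSB is defined in the previous code block.


def MSCNN(input_shape):
    """Multi-scale convolutional neural network for crowd counting.

    Arguments:
        input_shape: tuple, image shape as (height, width, channels).

    Returns:
        model: Model, Keras model.
    """
    inputs = Input(shape=input_shape)
    x = Conv2D(64, 9, activation='relu', padding='same')(inputs)
    x = MSB(4 * 16)(x)
    x = MaxPooling2D()(x)  # halves height and width (224 -> 112)
    x = MSB(4 * 32)(x)
    x = MSB(4 * 32)(x)
    x = MaxPooling2D()(x)  # halves again (112 -> 56)
    x = MSB(3 * 64)(x)
    x = MSB(3 * 64)(x)
    # 1x1 convolutions act as a fully connected layer applied at every pixel.
    x = Conv2D(1000, 1, activation='relu', kernel_regularizer=l2(5e-4))(x)
    x = Conv2D(1, 1, activation='relu')(x)  # single-channel output map
    model = Model(inputs=inputs, outputs=x)
    return model


# Example usage: model = MSCNN((224, 224, 3)); output shape (None, 56, 56, 1)
```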
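To check the 224×224×3 -> 56×56×1 claim, here is a small plain-Python shape tracer that walks the same layer sequence as `MSCNN`. It is a sketch, not Keras itself: `trace_shapes` and `LAYERS` are hypothetical names, and it assumes 'same'-padded stride-1 convolutions, `MaxPooling2D` at its Keras defaults (2×2 window, stride 2), and that each MSB outputs four times its `filters` argument, as in the MSB code above.

```python
# Hypothetical shape tracer (not from the original post): follows the layer
# sequence of MSCNN and records (height, width, channels) after each step.
# Assumes 'same' convolutions and MaxPooling2D defaults (2x2 window, stride 2).

LAYERS = [
    ("conv", 64),     # Conv2D(64, 9)
    ("msb", 4 * 16),  # MSB(4 * 16)
    ("pool", None),   # MaxPooling2D()
    ("msb", 4 * 32),
    ("msb", 4 * 32),
    ("pool", None),
    ("msb", 3 * 64),
    ("msb", 3 * 64),
    ("conv", 1000),   # 1x1 Conv2D acting as a per-pixel fully connected layer
    ("conv", 1),      # 1x1 Conv2D producing a single-channel map
]


def trace_shapes(height, width, channels, layers=LAYERS):
    shapes = [(height, width, channels)]
    for kind, n in layers:
        if kind == "conv":
            channels = n                # 'same' padding: spatial size unchanged
        elif kind == "msb":
            channels = 4 * n            # four branches of n filters, concatenated
        elif kind == "pool":
            height, width = height // 2, width // 2  # 2x2 window, stride 2
        shapes.append((height, width, channels))
    return shapes


shapes = trace_shapes(224, 224, 3)
print(shapes[-1])  # (56, 56, 1)
h, w, c = shapes[-1]
print(h * w * c)   # 3136
```

The trace also shows the channel widths the code actually produces (256, 512 and 768 after the three MSB stages), which is useful when estimating memory for the full-resolution early layers.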
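The statement that the final 1×1 convolutions play the role of a fully connected layer can be illustrated without any framework. In this hypothetical sketch (`dense`, `conv1x1` and `conv1x1_params` are illustrative names, and the activation is left out), a 1×1 convolution is simply the same dense layer applied to the channel vector at each pixel, and its parameter count depends only on the channel counts.

```python
# Hypothetical illustration (not from the original post): a 1x1 convolution is
# the same fully connected layer applied independently at every pixel.
# Feature maps are plain nested lists of shape H x W x C (channels last).


def dense(vec, weights, bias):
    """Fully connected layer: weights has shape [out][in], bias has shape [out]."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def conv1x1(feature_map, weights, bias):
    """1x1 convolution (stride 1, no activation) on an H x W x C_in map."""
    return [[dense(pixel, weights, bias) for pixel in row]
            for row in feature_map]


def conv1x1_params(in_channels, out_channels):
    """Weights plus biases; independent of the map's height and width."""
    return in_channels * out_channels + out_channels


feature_map = [[[1, 2, 3], [0, 1, 0]],
               [[2, 0, 1], [1, 1, 1]]]   # 2 x 2 x 3
weights = [[1, 0, -1], [0.5, 0.5, 0.5]]  # 3 input channels -> 2 output channels
bias = [0, 1]

out = conv1x1(feature_map, weights, bias)
print(out)  # [[[-2, 4.0], [0, 1.5]], [[1, 2.5], [0, 2.5]]]

# Conv2D(1000, 1) on the 768-channel output of the last MSB(3 * 64):
print(conv1x1_params(768, 1000))  # 769000
```

For comparison, a conventional dense layer mapping the whole flattened 56×56×768 map to 1000 units would need 56 × 56 × 768 × 1000 ≈ 2.4 billion weights, and it would also throw away the spatial layout that the 56×56 output depends on.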