Crowd Counting Using Bayesian Multi-Scale Neural Networks


Original article: https://towardsdatascience.com/crowd-counting-using-bayesian-multi-scale-neural-networks-4e3d46cd048b


Convolutional neural networks that estimate a density map over the image have been highly successful for crowd counting. However, dense crowd counting remains an open problem because of severe occlusion and perspective distortion, under which people appear at widely varying shapes and sizes. This blog presents our research on crowd counting, which combines convolutional neural networks with uncertainty quantification.

Important Points

We propose a new network that uses a ResNet-based feature extractor, a downsampling block built from dilated convolutions, and an upsampling block built from transposed convolutions.
We present a novel aggregation module that makes our network robust to the perspective view problem.
We present the optimization details, loss functions and the algorithm used in our work.
We used the ShanghaiTech, UCF-CC-50 and UCF-QNRF datasets for training and testing.
Using MSE and MAE as evaluation metrics, our network outperforms previous state-of-the-art approaches while giving uncertainty estimates in a principled Bayesian manner.

Introduction

Crowd counting has a range of applications, such as counting the participants at political rallies, social gatherings and sports events.

Crowd counting is difficult, especially in dense crowds, for two main reasons:

Clutter, overlap and occlusion are often present.
In perspective view, it is difficult to account for the shape and size of objects relative to the background.

Many algorithms have been proposed in the literature to tackle this problem. Most of them use some form of convolutional neural network to predict a density map over the input image, and then sum the map to obtain the object count.
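To make the density-map idea concrete, here is a minimal sketch in plain Python. It assumes the common convention of placing one normalized Gaussian per annotated head, so that each person contributes exactly 1 to the sum; the function names, the grid size and the `sigma` value are illustrative assumptions, not details taken from this work.

```python
import math

def density_map(points, height, width, sigma=1.5):
    """Build a toy density map: one normalized Gaussian per annotated point.

    `points` is a list of (row, col) head positions. The kernel width `sigma`
    is an illustrative assumption, not a value from the article.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for py, px in points:
        # Unnormalized Gaussian weights over the whole grid.
        weights = [
            [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, weights))
        # Normalize so that each person adds exactly 1 to the map's sum.
        for y in range(height):
            for x in range(width):
                dmap[y][x] += weights[y][x] / total
    return dmap

def count_from_density(dmap):
    """The predicted count is the sum (discrete integral) of the density map."""
    return sum(map(sum, dmap))

heads = [(2, 3), (5, 5), (7, 1)]          # three annotated people
dmap = density_map(heads, height=10, width=10)
estimated_count = count_from_density(dmap)
print(round(estimated_count, 6))
```

A network trained for crowd counting predicts a map like `dmap` directly from the image; the count is then recovered by the same summation.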

Datasets

The following datasets were used in this work for training and testing the network:

ShanghaiTech consists of two datasets, Part A and Part B. Part A has 300 training images and 182 testing images, while Part B has 400 training images and 316 testing images.
UCF-CC-50 contains 50 grayscale images of different resolutions. The average count per image is 1,280, and the minimum and maximum counts are 94 and 4,543, respectively.
UCF-QNRF is the third dataset used in this work. It has 1,535 images with 1.25 million point annotations; 1,201 images are used for training and 334 for testing.

Network Architecture

The network architecture used in this work is described in the following points:

A ResNet-based feature extractor is combined with dilated convolutions to form the downsampling block. This helps extract object details at various scales, which addresses the perspective view problem faced by earlier approaches.
The upsampling block uses transposed convolutions. Skip connections between the downsampling and upsampling blocks create an additional pathway, which helps avoid overfitting.
The last part has three heads: a density map head, whose output gives the absolute count when integrated, an epistemic uncertainty head and an aleatoric uncertainty head.
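The roles of dilated and transposed convolutions can be illustrated with the standard size arithmetic for these layers. The sketch below is only a helper for reasoning about shapes; the kernel, stride and padding values in the example are hypothetical and are not the settings used in this network.

```python
def effective_kernel(kernel, dilation):
    """Span covered by a dilated kernel: gaps of (dilation - 1) between taps."""
    return dilation * (kernel - 1) + 1

def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a (dilated) convolution along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def transposed_conv_out(size, kernel, stride=1, padding=0,
                        output_padding=0, dilation=1):
    """Output length of a transposed convolution along one spatial axis."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)

# A 3x3 kernel with dilation 2 covers a 5x5 window with the same 9 weights.
span = effective_kernel(3, 2)

# Hypothetical downsampling step: 64 -> 32 with a strided, dilated 3x3 conv.
down = conv_out(64, kernel=3, stride=2, padding=2, dilation=2)

# Hypothetical upsampling step: 32 -> 64 with a 4x4 transposed conv.
up = transposed_conv_out(down, kernel=4, stride=2, padding=1)

print(span, down, up)
```

Dilation widens the window seen by each filter without adding weights, which is why it is useful for objects at several scales, while the transposed convolution restores the spatial resolution needed for a pixel-level density map.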

The network architecture, with layer-wise details, is shown in Figure 1:

Figure 1: Our neural network architecture

Here 1×1 and 3×3 denote the filter sizes; 64, 128 and 256 denote the receptive field; conv denotes a dilated convolutional layer; and conv-2 denotes a transposed convolutional layer.

Optimization

To mitigate the vanishing gradient problem, instance normalization is applied after both the dilated convolutional and the transposed convolutional layers, as defined in Equation 1:
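The image for Equation 1 is missing from this copy. Assuming the standard instance-normalization form, with x the layer input and ε a small stabilizing constant (both symbols are assumptions, not taken from the source), and with σ denoting the variance as in the definitions that follow, it would read:

```latex
y = w \ast x + b, \qquad
\mathrm{IN}(y) = \gamma \, \frac{y - \mu}{\sqrt{\sigma + \epsilon}} + \beta
```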

where w and b are the weight and bias terms of the convolution layer, γ and β are the weight and bias terms of the instance normalization layer, and µ and σ are the mean and variance of the input.

We propose a new technique to aggregate filters of sizes 1×1, 3×3 and 5×5. ReLU is applied after every convolutional and transposed convolutional layer. Our aggregation module is shown in Figure 2:

The filter branches make our network robust, and the module can be extended with more filters to tackle crowd counting in dense scenes. Aggregation modules stacked on top of each other behave as ensembles, which reduces overfitting, a common challenge when training deep networks.

Loss Functions

Most existing work uses a pixel-wise Euclidean loss to train the network. It measures the estimation error at the pixel level and is defined in Equation 2:
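The image for Equation 2 is missing from this copy. Using the symbols defined just below, the usual pixel-wise Euclidean loss takes the following form; the name L_C is an assumption, chosen to match the weight αC that appears with the final loss:

```latex
L_C(\theta) = \frac{1}{N} \left\lVert F(X, \theta) - Y \right\rVert_2^2
```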

where θ denotes the set of network parameters, N is the number of pixels in the density map, X is the input image, Y is the corresponding ground truth density map, and F(X, θ) is the estimated density map.

We also incorporate the SSIM index in our loss to measure the deviation of the prediction from the ground truth. It computes the similarity between two images from three local statistics: mean, variance and covariance. SSIM values range from -1 to 1, and the value is 1 when the two images are identical. The SSIM index is defined in Equation 3:
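The image for Equation 3 is missing from this copy. The standard SSIM definition, which matches the three statistics named above, is given here as a reconstruction. The symbols µ_X and µ_Y (local means), σ_X² and σ_Y² (local variances) and σ_XY (local covariance) of the estimated and ground truth maps are notational assumptions:

```latex
\mathrm{SSIM} = \frac{(2 \mu_X \mu_Y + C_1)(2 \sigma_{XY} + C_2)}
                     {(\mu_X^2 + \mu_Y^2 + C_1)(\sigma_X^2 + \sigma_Y^2 + C_2)}
```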

where C1 and C2 are small constants that avoid division by zero. The loss function can be written by averaging the SSIM index over all pixels, as shown in Equation 4:
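The image for Equation 4 is missing from this copy. A common way to turn SSIM into a loss, assumed here rather than confirmed by the source, is to subtract the mean local SSIM from 1, where x ranges over pixel locations:

```latex
L_S = 1 - \frac{1}{N} \sum_{x} \mathrm{SSIM}(x)
```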

where N is the number of pixels in the density maps. LS measures the difference between the network predictions and the ground truth. Adding the two terms gives the final loss function, shown in Equation 5:
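The image for Equation 5 is missing from this copy. Given the two weights defined just below, the weighted sum would read as follows, with L_C as an assumed name for the Euclidean term:

```latex
L = \alpha_C L_C + \alpha_S L_S
```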

where αC and αS are constants. In our experiments, we set both αC and αS to 0.5 to weight the two terms equally.
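The combined loss can be sketched in plain Python on flattened density maps. This is a simplified illustration under stated assumptions: SSIM is computed once over the whole map rather than over local windows, the SSIM term is taken as 1 minus SSIM, and the constants `c1` and `c2` use the conventional values for data in the range 0 to 1. None of the function names come from the original work.

```python
def mean(values):
    return sum(values) / len(values)

def euclidean_loss(pred, target):
    """Mean squared pixel error between two flattened density maps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """SSIM from a single global window (a simplification of local SSIM)."""
    mx, my = mean(x), mean(y)
    vx = mean([(a - mx) ** 2 for a in x])          # variance of x
    vy = mean([(b - my) ** 2 for b in y])          # variance of y
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def total_loss(pred, target, alpha_c=0.5, alpha_s=0.5):
    """Weighted sum of the Euclidean term and the (assumed) 1 - SSIM term."""
    l_c = euclidean_loss(pred, target)
    l_s = 1.0 - global_ssim(pred, target)
    return alpha_c * l_c + alpha_s * l_s

pred = [0.1, 0.4, 0.3, 0.2]      # toy flattened prediction
target = [0.1, 0.5, 0.2, 0.2]    # toy flattened ground truth
loss = total_loss(pred, target)
perfect = total_loss(target, target)
print(loss > 0, abs(perfect) < 1e-12)
```

The Euclidean term penalizes per-pixel errors, while the SSIM term penalizes differences in local structure, so the two terms complement each other.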

Evaluation Metrics

For crowd counting, Mean Absolute Error (MAE) and Mean Squared Error (MSE) are commonly used for quantitative comparison. These metrics are defined in Equation 6 and Equation 7, respectively:
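The images for Equations 6 and 7 are missing from this copy. The forms below follow the usual crowd-counting convention, in which the metric reported as MSE includes a square root (so it is a root mean squared error); whether the original Equation 7 includes the root cannot be confirmed from this copy:

```latex
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| C_i - C_i^{GT} \right|, \qquad
\mathrm{MSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left| C_i - C_i^{GT} \right|^2}
```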

where N is the number of test samples, and Ci and CGTi are the estimated and ground truth counts for the i-th sample, each obtained by integrating the corresponding density map.

MAE reflects the accuracy of the predictions, while MSE measures their robustness.
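A small sketch shows how the two metrics behave on per-image counts. It assumes the crowd-counting convention in which the value reported as MSE is the square root of the mean squared error; the counts in the example are made up for illustration.

```python
import math

def mae(pred_counts, gt_counts):
    """Mean absolute error over per-image counts."""
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)

def mse(pred_counts, gt_counts):
    """Root of the mean squared count error (assumed crowd-counting convention)."""
    return math.sqrt(
        sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts)) / len(gt_counts))

predicted = [105, 48, 990]        # made-up estimated counts
ground_truth = [100, 50, 1000]    # made-up true counts

mae_value = mae(predicted, ground_truth)
mse_value = mse(predicted, ground_truth)
print(round(mae_value, 3), round(mse_value, 3))
```

Because the errors are squared before averaging, a single image with a large miss raises MSE more than MAE, which is why MSE is read as a robustness measure.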

Uncertainty Estimation

There are two main sources of uncertainty in model predictions. Epistemic uncertainty arises from our lack of knowledge, while aleatoric uncertainty arises from stochasticity present in the data. Epistemic uncertainty is often called model uncertainty, and it can be explained away given enough data. It can be computed with Bayesian neural networks, in which the weights are parameterized by distributions instead of point estimates. However, crowd counting requires understanding the inherent nuances of the data, such as occlusions and scale ambiguity, so aleatoric uncertainty is also important.
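One widely used way to separate the two kinds of uncertainty is to run several stochastic forward passes and compare them. The sketch below assumes that each pass returns a predicted count together with a predicted variance; this is a generic decomposition, not necessarily the estimator used in this work, and the sample values are made up.

```python
from statistics import mean, pvariance

def decompose_uncertainty(samples):
    """Split predictive uncertainty from T stochastic forward passes.

    `samples` is a list of (predicted_count, predicted_variance) pairs,
    one per pass with freshly sampled weights (hypothetical inputs).
    """
    means = [m for m, _ in samples]
    variances = [v for _, v in samples]
    epistemic = pvariance(means)   # disagreement between weight samples
    aleatoric = mean(variances)    # average noise the model attributes to the data
    return mean(means), epistemic, aleatoric

# Four made-up passes over the same image.
samples = [(100.0, 4.0), (104.0, 5.0), (96.0, 3.0), (100.0, 4.0)]
count, epistemic, aleatoric = decompose_uncertainty(samples)
print(count, epistemic, aleatoric)
```

More training data would be expected to shrink the spread between passes (the epistemic part), while the predicted data noise (the aleatoric part) remains wherever occlusion or scale ambiguity is inherent to the image.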

Algorithm

The algorithm used in this work is shown below:

Experimental Results

As shown in Table 2, our method obtains the lowest Mean Squared Error (MSE) and Mean Absolute Error (MAE) on both subsets of the ShanghaiTech dataset.

As shown in Table 3, our method obtains the lowest MSE and MAE on the UCF-CC-50 dataset.

As shown in Table 4, our method obtains the lowest MSE and MAE on the UCF-QNRF dataset.

Our proposed network has the fewest parameters among the compared works, as shown in Table 5:

Figure 3 and Figure 4 illustrate qualitative results for sample images from the ShanghaiTech and UCF-QNRF datasets, respectively.

Redder regions represent higher uncertainty.

Two conclusions can be drawn from the above two figures:

Epistemic uncertainty and aleatoric uncertainty are correlated, especially where the crowd density is high.
The model is less certain in dense crowds, so uncertainty is high in those locations.

Conclusions

In this blog, we presented a novel neural network for crowd counting, built on a ResNet-based feature extractor and a new feature aggregation module. The downsampling blocks use dilated convolutional layers, while the upsampling blocks use transposed convolutional layers. Skip connections between the blocks create an additional pathway, which helps prevent overfitting. We presented the optimization details, loss functions and algorithm used in this work. Our method not only outperforms previous state-of-the-art methods but also gives a measure of uncertainty, which helps address the well-known black-box problem of neural networks.

Translated from: https://towardsdatascience.com/crowd-counting-using-bayesian-multi-scale-neural-networks-4e3d46cd048b
