Paper: Automated Vehicle Detection and Classification: Models, Methods, and Techniques, Translation (Part 3)

Latest recommended article published on 2022-02-15 13:32:28

zhouzhouzhou_li

4 VEHICLE MAKE RECOGNITION

A deeper level of vehicle classification is to recognize a vehicle's make or manufacturer (VMR) rather than its type alone. Most works achieve this through vehicle logo recognition (VLR) (Huang et al. 2015). In this section, we outline the challenges in VMR, then review and discuss the feature extraction techniques and classification schemes that have been used in the literature. Section 1 presented an overview of the challenges and issues that AVC works need to tackle. In the context of VMR, Multiplicity refers to the diverse shapes, designs, and appearances of the various vehicle models within a make-class; Ambiguity refers to the problem that occurs when vehicles of different makes have a similar appearance, shape, or design. These issues are aggravated by changing viewpoints or vehicle orientations, lighting or environmental conditions, and modifications to vehicles' original appearance.

4.1 Review

In most VMR works, a vehicle's logo is detected first, followed by feature extraction or representation of the logo region, which is then used for classification. Psyllos et al. (2010), Yu et al. (2013), Ou et al. (2014), and Badura and Skotnicka (2014) employed SIFT-like features (Lowe 1999) or SIFT-based representations of logos with different classification schemes. Psyllos et al. (2010) used simple nearest-neighbor matching to match the SIFT descriptors of a query image against the descriptors in the database. They then used the generalized Hough Transform (GHT) (Ballard 1981; Grimson 1990) to find the database logo image closest to the query image. Finally, a geometrical validation step checked the keypoint coordinates in the query image and its matched database image. Badura and Skotnicka (2014) used SIFT descriptors (Lowe 1999) from logo regions and employed an exhaustive matching strategy to find the closest database logo image.
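As a rough illustration of the nearest-neighbor matching step described above, the following minimal sketch lets each query descriptor vote for the database logo that owns its closest descriptor. It is not the implementation of any cited work: the function name `match_logo`, the toy 3-D descriptors (real SIFT descriptors are 128-D), and the simple majority vote are assumptions, and the GHT and geometric validation stages are omitted.

```python
import math
from collections import Counter


def match_logo(query_descriptors, database):
    """Return the database label that collects the most nearest-neighbor votes.

    database maps a (hypothetical) make label to a list of descriptors.
    Returns None when there are no query descriptors to match.
    """
    # Flatten the database into (label, descriptor) pairs for a linear scan.
    entries = [(label, d) for label, descs in database.items() for d in descs]
    votes = Counter()
    for q in query_descriptors:
        # Exhaustive search for the closest database descriptor (Euclidean).
        best_label, _ = min(entries, key=lambda e: math.dist(q, e[1]))
        votes[best_label] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy data, purely illustrative.
database = {
    "make_a": [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9]],
    "make_b": [[1.0, 1.0, 0.0], [0.9, 1.0, 0.1]],
}
query = [[0.05, 0.0, 0.95], [0.0, 0.1, 1.0], [0.8, 0.9, 0.0]]
print(match_logo(query, database))
```

The linear scan mirrors the exhaustive matching strategy mentioned in the text; a practical system would likely use an approximate nearest-neighbor index instead.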

Instead of using SIFT descriptors directly, Yu et al. (2013) proposed using Dense-SIFT features (Bosch et al. 2006) of logo images to build Bag-of-Words-based representations (histograms) through a spatial pyramid scheme, following the approach of Lazebnik et al. (2006). A multi-class SVM classifier was shown to yield better results than k-NN and SIFT matching. However, they did not consider occlusion scenarios in their work. Ou et al. (2014) also used local Dense-SIFT descriptors, but generated Locality-constrained Linear Codes (LLCs) (Wang et al. 2010) from them, which were then weighted and max-pooled through the spatial pyramid scheme of Lazebnik et al. (2006). For classification, a linear SVM was used. Although they used logo images with different scales, orientations, and rotations, occlusion scenarios were not considered.
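The Bag-of-Words representation with a spatial pyramid can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline of Yu et al. (2013): the codebook is given rather than learned, descriptors are toy 2-D vectors, hard assignment to the nearest codeword is used, and the per-level weights of the original spatial pyramid scheme are left out. All function names are hypothetical.

```python
import math


def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (hard assignment)."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(descriptor, codebook[i]))


def bow_histogram(descriptors, codebook):
    """Normalized histogram of codeword assignments (all zeros if empty)."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def spatial_pyramid(features, codebook, width, height, levels=2):
    """Concatenate per-cell histograms; level l uses a 2^l x 2^l grid.

    features is a list of (x, y, descriptor) tuples in image coordinates.
    """
    vector = []
    for level in range(levels):
        cells = 2 ** level
        for row in range(cells):
            for col in range(cells):
                in_cell = [
                    d for x, y, d in features
                    if min(int(x * cells / width), cells - 1) == col
                    and min(int(y * cells / height), cells - 1) == row
                ]
                vector.extend(bow_histogram(in_cell, codebook))
    return vector


# Toy example: two codewords, four features in a 10x10 image.
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [
    (1, 1, [0.1, 0.0]),
    (9, 1, [0.9, 1.0]),
    (1, 9, [0.0, 0.2]),
    (9, 9, [1.0, 0.8]),
]
pyramid = spatial_pyramid(features, codebook, width=10, height=10)
print(len(pyramid), pyramid)
```

With K codewords and two levels, the resulting vector has 5K entries (one whole-image histogram plus four quadrant histograms), which would then be passed to a classifier such as an SVM.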

Llorca et al. (2013) explored both appearance-based features, such as SIFT and Histogram of Oriented Gradients (HOG) features (Dalal and Triggs 2005), and texture-based features, such as Local Binary Patterns (LBPs) (Ojala et al. 2002), for VMR. Using a multi-class SVM-based classifier, they found that texture-based features like LBPs are not suitable for VMR, either alone or in combination with other types of features such as HOG.
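For readers unfamiliar with LBPs, the sketch below computes the basic 8-neighbor variant on a small grayscale image given as nested lists. It is a simplification for illustration only: Ojala et al. (2002) describe multiresolution, rotation-invariant uniform patterns, which are not implemented here, and the neighbor ordering and function names are assumptions.

```python
def lbp_codes(image):
    """Basic 3x3 LBP code for every interior pixel of a grayscale image."""
    # Neighbors visited clockwise, starting at the top-left pixel (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for r in range(1, len(image) - 1):
        row_codes = []
        for c in range(1, len(image[0]) - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbor is at least as bright as the center.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(image):
    """256-bin histogram of LBP codes, usable as a texture descriptor."""
    hist = [0] * 256
    for row in lbp_codes(image):
        for code in row:
            hist[code] += 1
    return hist


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
corner = [[9, 1, 1], [1, 5, 1], [1, 1, 1]]
print(lbp_codes(flat), lbp_codes(corner))
```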

Xiao et al. (2015) proposed Sharpness Histogram Features (SHF) to represent a logo. SHF takes the sharpness of edge points into account and pools them into a histogram, forming a global feature representation of the logo. To overcome the difficulty of finding the optimal kernel function and parameters for an SVM, they developed a weighted ensemble of multi-class SVM classifiers. The ensemble comprises different multi-class SVM classifiers that were trained with different kernel functions and parameters. Each multi-class SVM classifier in the ensemble was given a weight according to its own correct classification rate on the test set. The final classification output of the ensemble was then made based on a weighted combination rule.
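One common way to realize a weighted combination rule is a weighted majority vote, sketched below. The text does not specify the exact rule used by Xiao et al. (2015), so this is an assumption; the labels, weights, and the function name `weighted_vote` are hypothetical, and the member classifiers are represented only by their predicted labels.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across classifiers.

    predictions[i] is the label predicted by classifier i, and weights[i]
    is that classifier's weight (for example, its classification rate).
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Three hypothetical classifiers: one strong, two weak.
predictions = ["audi", "bmw", "bmw"]
print(weighted_vote(predictions, [0.95, 0.40, 0.45]))  # weighted by accuracy
print(weighted_vote(predictions, [1.0, 1.0, 1.0]))     # plain majority vote
```

The example shows why weighting matters: a single accurate classifier can outvote two weaker ones that agree with each other, which a plain majority vote would not allow.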

In an urban ITS, vehicle images captured from roads or crossings may be of low resolution or quality. Peng et al. (2015) proposed Statistical Random Sparse Distribution (SRSD) features to tackle such cases. An SRSD feature describes the statistical distribution of a grayscale image based on correlations between random pixel pairs that are sparsely sampled. They used multi-scale scanning and the nearest-neighbor model for classification. They reported higher accuracies than HOG-SVM and SIFT-based approaches, but at a greater time cost. Moreover, failures were encountered in cases of tilted or dirt-covered logos.

VMR works based on logo recognition depend on accurate logo detection and segmentation, which is a challenging problem under real-world conditions such as lighting or environmental changes and occlusions. In an attempt to remove the dependency on precise logo detection, Huang et al. (2015) proposed Hierarchical Feature Maps (HFMs) extracted by a CNN from larger ROIs around logos. At different levels, different feature maps are obtained by convolutions with different kernels. Between consecutive convolutions, the images are down-sampled through max-pooling to reduce the feature maps' resolution. The features are finally encoded into a 1D vector to be classified by a back-propagation neural network classifier, which is the last layer of the CNN structure. Huang et al. (2015) observed reduced accuracy in cases of logos with distorted views (rotation and orientation), complex structures, and blur. In addition, it is unclear how their technique would perform in cases of occluded logos.
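The convolution, max-pooling, and flattening steps described above can be sketched in plain Python as follows. This illustrates the general mechanism only and is not the network of Huang et al. (2015): it uses a single convolution stage, hand-picked kernels instead of learned ones, no nonlinearity, and no classifier. All function names are assumptions.

```python
def convolve2d_valid(image, kernel):
    """'Valid' 2D convolution as used in CNNs (cross-correlation, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j]
             for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max-pooling that halves the resolution for size=2."""
    return [
        [max(feature_map[r + i][c + j]
             for i in range(size) for j in range(size))
         for c in range(0, len(feature_map[0]) - size + 1, size)]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


def extract_features(image, kernels):
    """One feature map per kernel, pooled and flattened into a 1D vector."""
    pooled = [max_pool(convolve2d_valid(image, k)) for k in kernels]
    return [v for fmap in pooled for row in fmap for v in row]


# Toy 6x6 "image" and two hand-picked 3x3 kernels (illustrative only).
image = [[r * 6 + c for c in range(6)] for r in range(6)]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
features = extract_features(image, [identity, vertical_edge])
print(features)
```

Each 3x3 kernel turns the 6x6 input into a 4x4 feature map, pooling reduces it to 2x2, and the two pooled maps are concatenated into an 8-element vector that a classifier layer would consume.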

In Table 2, we provide a brief summary of the feature extraction, global representation, and classification approaches employed in these recent and representative VMR works, along with the number of classes and images used to validate them.

4.2 Discussion

A major limitation of VMR works is their reliance on localizing logo regions in input images from video streams. Many such works use license plates as cues around which they construct ROIs, assuming that these ROIs contain the vehicle logos. However, these approaches fail when the location of the license plate varies between vehicles. In addition, most approaches are inapplicable in real-time applications, due to time-intensive feature representation techniques or slow logo detection schemes. Much research is needed to develop real-time VMR systems.

The previous studies do not propose solutions for cases in which vehicle logos may be partially occluded or partially undetected. In these conditions, the VMR performance of existing solutions would deteriorate drastically. Huang et al. (2015) tried to mitigate these issues by using a larger logo ROI instead of a precise ROI, thereby increasing the chance of having the logo completely enclosed in the ROI. This technique would still fail if logos were occluded. Moreover, they assumed that the logo would be in a specific area above the license plate. For vehicles with logos outside these assumed regions, the VMR systems would fail. A potential direction to explore would be to remove the need for a logo detection and localization module from VMR systems, perhaps by considering the entire vehicle face, to increase processing speed and enhance accuracy under occlusions.
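To make the plate-based ROI assumption concrete, the sketch below derives a logo search region directly above a detected license plate and checks whether a logo box falls inside it. The scale factors, coordinates, and function names are hypothetical and are not taken from Huang et al. (2015) or any other cited work; the example only illustrates why a larger ROI tolerates more variation in logo position, and why a logo outside the assumed region is still missed.

```python
def logo_roi_from_plate(plate, image_size, width_scale=1.0, height_scale=2.0):
    """Region directly above the plate, scaled from the plate's own size.

    plate and the returned ROI are (x, y, w, h) boxes with a top-left origin.
    The ROI is clipped to the image bounds.
    """
    px, py, pw, ph = plate
    img_w, img_h = image_size
    roi_w = pw * width_scale
    roi_h = ph * height_scale
    center_x = px + pw / 2
    x0 = max(0, center_x - roi_w / 2)
    x1 = min(img_w, center_x + roi_w / 2)
    y0 = max(0, py - roi_h)
    y1 = py  # the ROI's bottom edge rests on the plate's top edge
    return (x0, y0, x1 - x0, y1 - y0)


def contains(roi, box):
    """True when box lies entirely inside roi."""
    rx, ry, rw, rh = roi
    bx, by, bw, bh = box
    return (rx <= bx and ry <= by
            and bx + bw <= rx + rw and by + bh <= ry + rh)


plate = (100, 200, 80, 20)   # hypothetical plate detection
image_size = (320, 240)
logo = (130, 130, 20, 20)    # hypothetical ground-truth logo box

tight_roi = logo_roi_from_plate(plate, image_size)
large_roi = logo_roi_from_plate(plate, image_size,
                                width_scale=1.5, height_scale=4.0)
print(contains(tight_roi, logo), contains(large_roi, logo))
```

A logo placed below or far to the side of the plate would fall outside both regions, which is the failure mode noted in the discussion.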