Paper translation (Part 2): Automated Vehicle Detection and Classification: Models, Methods, and Techniques

zhouzhouzhou_li

3 VEHICLE TYPE RECOGNITION

The objective of Vehicle Type Recognition (VTR) is to classify vehicles into high-level categories such as van, minivan, truck, sedan, bus, taxi, and so on. VTR works do not recognize the exact make and model. An automated VTR system helps in applications such as electronic toll collection and traffic studies and analyses. With the development of computer vision techniques and traffic surveillance cameras, vision-based VTR systems have gained a great deal of attention over the years. In this section, we review some of the representative VTR works in terms of feature extraction and representation, classification, and datasets.

The challenges mentioned in Section 1 are a major hurdle for robust VTR systems. In the context of VTR, multiplicity refers to the issue of a vehicle type having diverse shapes, sizes, and appearance. Ambiguity, on the other hand, refers to the issue of different vehicle types being visually similar in terms of shape, size, or appearance. For example, many VTR systems have difficulty differentiating between an SUV and a minivan or between a sedan and a hatchback. Other challenges include occlusions and varying lighting conditions.

3.1 Review

To mitigate the above-mentioned challenges and issues in VTR, a variety of geometry-, texture-, and appearance-based feature extraction approaches have been explored in combination with different classification techniques. In Table 1, we summarize some of the representative VTR works in terms of feature extraction and classification techniques.

3.1.1 Geometry-Based Approaches. One of the earliest works to explore VC was by Gupte et al. (2000, 2002), who used a stationary camera focused on highway scenes. Features applied in VC included the length and height of rectangular patches enclosing vehicle blobs. Their work provides limited classification of vehicles into two categories (trucks and non-trucks, or cars and non-cars), based on simple length and height criteria. Since the classification was based only on the ROI’s dimensions, fine-level classification of vehicle types may not be achievable. For example, a long sedan and a small SUV may have the same length and height. The technique used by Gupte et al. (2000, 2002) fails in cases of occluding vehicles (i.e., vehicles moving close together), as it considers them as a single vehicle. Abdelbaki et al. (2001) utilized laser intensity images to extract geometrical features such as length, width, height, and width change patterns, as well as speed. The width change patterns were used to detect the presence of a trailer attached to the vehicle passing below the laser scanner. However, their work assumed only one vehicle per image at a time. It is not clear if their approach would work in cases of multiple and close-by vehicles in an image. Another length-based work suffering from close or occluding vehicles is that of Avery et al. (2004), who distinguished between trucks and other vehicles. To overcome the occlusion problem, Huang and Liao (2004) proposed a method to detect occlusions through motion field analysis and separated the occluded vehicles. They developed a hierarchical classifier based on a given vehicle silhouette’s aspect ratio, compact ratio, length, and height. It first differentiated vehicles as large (bus, van truck, truck, trailer) or small (sedan, van, pick-up). Then, fine-level classification distinguished between different types of large and small vehicles. Such a coarse-to-fine hierarchical scheme of classification seems promising, as it helps fine-level classifiers by reducing their search space. However, because of incomplete or irregular silhouettes and the use of a silhouette dimensions-based classifier, the system is highly prone to misclassification, despite the successful separation of occluded vehicles. Hence, using richer features is necessary.
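
The coarse-to-fine idea can be illustrated with a minimal Python sketch. It is not the classifier of Huang and Liao (2004): the `Silhouette` type, the reduced label set, the use of only length and aspect ratio, and all threshold values are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration; real values would
# have to be tuned on labelled silhouettes from a specific camera view.
LARGE_MIN_LENGTH = 120.0   # pixels
BUS_MIN_ASPECT = 3.0       # length / height
SEDAN_MIN_ASPECT = 2.0     # length / height


@dataclass
class Silhouette:
    length: float  # bounding-box length in pixels
    height: float  # bounding-box height in pixels

    @property
    def aspect_ratio(self) -> float:
        return self.length / self.height


def classify(s: Silhouette) -> tuple[str, str]:
    """Return (coarse size class, fine type) for one vehicle silhouette."""
    # Stage 1: coarse split by size.
    if s.length >= LARGE_MIN_LENGTH:
        # Stage 2a: the fine classifier only has to separate large vehicles.
        fine = "bus" if s.aspect_ratio >= BUS_MIN_ASPECT else "truck"
        return "large", fine
    # Stage 2b: the fine classifier only has to separate small vehicles.
    fine = "sedan" if s.aspect_ratio >= SEDAN_MIN_ASPECT else "van"
    return "small", fine
```

Each second-stage rule chooses among fewer candidate types than a flat classifier would, which is the reduction in search space mentioned above. The sketch also shows the weakness: two vehicles with similar dimensions always receive the same label.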

Chen et al. (2011) used various geometry- and shape-based features with classifiers such as Support Vector Machines (SVMs), Random Forest, and model-based matching. A major limitation of model-based classification approaches is the need for camera calibration. Various geometrical features based on tail-lights, license plate (LP), and bounding box were used in Kafai and Bhanu (2012). For each tail-light, its width, distance from the LP, and angle with the LP are measured. Also, the perpendicular distance from the LP’s centroid to a line connecting the tail-lights’ centroids is taken. Other features include bounding box width and height, the LP’s distance to the bottom side of the bounding box, and the area of the vehicle mask. For classification, they employed a hybrid dynamic Bayesian network. Wang et al. (2014) also used geometry-based features, such as length and height of vehicle silhouettes, but applied simple Euclidean distance-based matching for classification.
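
Some of the rear-view measurements described for Kafai and Bhanu (2012) reduce to plane geometry on centroids. The sketch below is an illustration only: the function names, the feature keys, and the definition of the angle (direction from the LP centroid to the tail-light centroid, relative to the image x-axis) are assumptions, since the text does not spell them out.

```python
import math

Point = tuple[float, float]  # (x, y) centroid in image coordinates


def perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        raise ValueError("a and b must be distinct points")
    # Magnitude of the 2D cross product divided by the base length.
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / norm


def rear_view_features(lp: Point, left: Point, right: Point) -> dict[str, float]:
    """Geometric features from the LP and the two tail-light centroids."""
    feats: dict[str, float] = {}
    for name, light in (("left", left), ("right", right)):
        feats[f"{name}_dist_to_lp"] = math.dist(light, lp)
        # Assumed angle definition: direction from LP to the tail-light.
        feats[f"{name}_angle_deg"] = math.degrees(
            math.atan2(light[1] - lp[1], light[0] - lp[0])
        )
    feats["lp_to_taillight_line"] = perpendicular_distance(lp, left, right)
    return feats
```

Tail-light width, bounding-box size, and mask area would come from the segmentation step, which is outside the scope of this sketch.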

3.1.2 Appearance-Based Approaches. Such approaches use appearance features based on edges, gradients, corners, or a combination of these to classify vehicles. Buch et al. (2009) proposed appearance-based features such as 3D-HOG (an extension of the 2D Histogram of Oriented Gradients (Dalal and Triggs 2005)), based on 3D models to describe vehicle types, and performed model-based matching for classification. Simple vehicle blob features were used in Morris and Trivedi (2008) after transformation through Fisher’s Linear Discriminant Analysis (Belhumeur et al. 1997). A weighted k-Nearest Neighbor (wkNN) classifier was then employed to classify vehicles into eight types (Sedan, Pickup, SUV, Van, Merged, Bike, Truck, and Semi). Peng et al. (2012) employed eigenvectors of vehicle front faces in an adaptive multi-class PCA-based classifier to classify vehicles into Trucks, Buses, Minivans, Passenger cars, and Sedans. However, their approach is sensitive to changes in camera angle, vehicle orientation, and lighting conditions. de S. Matos and de Souza (2012) used edge-based features such as Edge Points Number to first classify a test sample as a large or small vehicle. Then, the PCA of image blocks is used as features to further classify a small vehicle into a motorcycle or car, and a large vehicle into a bus or truck, through an adaptive k-NN classifier. Another edge-based approach is described in Zhang et al. (2013), who used the Edge Orientation Histogram (EOH) (Levi and Weiss 2004) to represent vehicles. An EOH of an image is a collection of histograms describing the local edge orientation distribution of its sub-images. The training EOHs were used to obtain prototypes for each class (bus, light truck, car, and van) through clustering techniques such as Self-Organizing Maps (SOM) (Duda et al. 2000), Neural Gas (NG) (Witoelar et al. 2008), and K-Means. The selected prototypes from each class were then used to train Kernel Auto-associator (KAA) networks (Zhang et al. 2005, 2004). The classification was done using a hybrid KAA-based classifier with a reconstruction errors-based rejection option.
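
To make the EOH description concrete, here is a simplified sketch in plain Python. It is not the implementation of Levi and Weiss (2004) or Zhang et al. (2013): it assumes central-difference gradients instead of Sobel masks, a fixed `grid` x `grid` split into sub-images, unsigned gradient orientation, and magnitude-weighted votes. The function name and parameters are hypothetical.

```python
import math


def edge_orientation_histogram(img, grid=2, bins=4):
    """Concatenated per-cell orientation histograms of a grayscale image.

    img is a list of rows of numeric pixel values. The result has
    grid * grid * bins entries; each cell histogram is normalised to sum
    to 1 (or left as zeros if the cell contains no gradient).
    """
    h, w = len(img), len(img[0])
    hists = [[0.0] * bins for _ in range(grid * grid)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # horizontal gradient
            gy = img[y + 1][x] - img[y - 1][x]   # vertical gradient
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            # Unsigned orientation in [0, pi); clamp guards rounding at pi.
            theta = math.atan2(gy, gx) % math.pi
            b = min(int(theta / math.pi * bins), bins - 1)
            cell = (y * grid // h) * grid + (x * grid // w)
            hists[cell][b] += mag
    feature = []
    for hist in hists:
        total = sum(hist)
        feature.extend(v / total if total else 0.0 for v in hist)
    return feature
```

The resulting vectors are what a clustering step such as K-Means would group into per-class prototypes. Note that the gradient direction is perpendicular to the edge itself, so a vertical edge votes into the bin around 0 radians.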

Peng et al. (2014a) demonstrated the use of dense Boosted Binary Features of image patches embedded into a sparse global representation via Spatial Pyramid-based Features Quantization, similar to Lazebnik et al. (2006). A non-linear SVM with an intersection kernel was used to classify the global representations of different vehicle types. The patches were obtained by virtually dividing the vehicle image into a fixed grid and not based on key-points, which made their method prone to occlusion-related failures. As an extension, Peng et al. (2014b) used Histogram of Sparse Codes (Ren and Ramanan 2013) in a Spatial Pyramid-based structure with SVM classifiers. However, their approach is easily affected by occlusions or partial vehicle views and requires the proper alignment of vehicle ROIs.
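
The two generic building blocks named here, a spatial pyramid over a fixed grid of patches and a histogram intersection kernel, can be sketched as follows. This assumes each patch has already been quantized to an integer codeword index, and it omits the per-level weights used by Lazebnik et al. (2006). The function names are hypothetical and the code is not taken from Peng et al.

```python
def spatial_pyramid(codes, vocab_size, levels=2):
    """Concatenate codeword histograms over 1x1, 2x2, ... cell layouts.

    codes is a 2D grid (list of rows) holding one integer codeword index
    per image patch.
    """
    rows, cols = len(codes), len(codes[0])
    feature = []
    for level in range(levels):
        cells = 2 ** level  # cells per side at this pyramid level
        for cy in range(cells):
            for cx in range(cells):
                hist = [0] * vocab_size
                for y in range(cy * rows // cells, (cy + 1) * rows // cells):
                    for x in range(cx * cols // cells, (cx + 1) * cols // cells):
                        hist[codes[y][x]] += 1
                feature.extend(hist)
    return feature


def intersection_kernel(a, b):
    """Histogram intersection: sum of element-wise minima."""
    return sum(min(x, y) for x, y in zip(a, b))
```

Because every cell covers a fixed image region, a vehicle that is shifted or partly hidden fills the wrong cells, which is consistent with the alignment and occlusion sensitivity noted above. A kernel matrix built with `intersection_kernel` could be passed to an SVM implementation that accepts precomputed kernels.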

Recently, motivated by the success of convolutional neural networks (CNNs) in other image classification problems (Garcia and Delakis 2004; Krizhevsky et al. 2012; Szegedy et al. 2014; Zeiler and Fergus 2014; Zhang et al. 2016; Cao et al. 2016; Shen et al. 2016), researchers have started exploring CNNs for VTR as well. For example, He et al. (2015a) exploit a CNN for both vehicle detection and classification. The CNN layers were used to generate image feature representations, while an SVM was used as the final classifier. Zhou and Cheung (2016) also utilized a CNN for VTR in two settings. In the first setting, a pre-trained CNN of Krizhevsky et al. (2012) was used to extract features that were then classified by a linear SVM classifier. In the second setting, the pre-trained CNN model was fine-tuned on the authors’ dataset and directly used for feature extraction and classification. The former approach yielded their best results, outperforming techniques based on Fisher Vectors (Sánchez et al. 2013).

Other works based on CNNs include Dong et al. (2014) and Dong et al. (2015), which proposed learning useful local and global features through a two-stage CNN and applied softmax regression as the CNN’s output layer for classification. While an unsupervised approach was employed in Dong et al. (2014), a semi-supervised approach was proposed in Dong et al. (2015). In the semi-supervised CNN, a large dataset of unlabelled data was used for unsupervised learning of the convolution filters and a small dataset of labelled data was utilized for supervised learning of the output layer parameters. Their work outperformed the works of Petrovic and Cootes (2004a), Psyllos et al. (2011), and Peng et al. (2012).
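
A softmax regression output layer maps a feature vector to class probabilities. The sketch below shows only that final step, with hypothetical weights and labels. It is not Dong et al.'s network, and the convolutional stages that would produce `features` are omitted.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features, weights, biases, labels):
    """Softmax regression: one linear score per class, then softmax."""
    scores = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs
```

In the semi-supervised setting described above, only `weights` and `biases` of this layer would be fitted on the small labelled set, while the feature extractor is learned from unlabelled data.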

3.1.3 Texture-Based Approaches. Another important class of discriminative image features is texture. Many works in the computer vision field have employed texture-based features (Mammeri et al. 2014b). To detect vehicles using texture, Zhang et al. (2007) applied a texture descriptor known as Multi-Block Local Binary Patterns (MB-LBP) and an AdaBoost classifier based on multi-branch regression trees. The basic LBP operator builds a binary string for every pixel, considering its relationships with its 3×3 neighborhood pixels. In the multi-scale version of LBP, known as the MS-LBP (Ojala et al. 2002), neighborhoods of larger scales were considered for pixel comparisons. In contrast, instead of pixel-wise comparisons, the MB-LBP compares average pixel intensities in sub-blocks of an image patch. So, the central rectangle’s average intensity is compared to the neighboring rectangles’ average intensities to build the binary string representation. We find that, for vehicle detection, texture-based approaches have not gained attention, possibly owing to their high computational cost and high sensitivity to noise and distortions.
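
The relationship between basic LBP and MB-LBP can be shown in a few lines of Python. This is a sketch under assumptions: the neighbour ordering, the `>=` comparison convention, and square blocks are choices made here for illustration, and the function names are hypothetical.

```python
# Clockwise offsets (dy, dx) of the 8 neighbours, starting at the top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, y, x):
    """Basic LBP: compare the 8 neighbours of (y, x) with the centre value."""
    center = img[y][x]
    code = 0
    for dy, dx in OFFSETS:
        code = (code << 1) | (1 if img[y + dy][x + dx] >= center else 0)
    return code


def block_mean(img, top, left, size):
    """Average intensity of the size x size block with the given corner."""
    total = sum(
        img[y][x]
        for y in range(top, top + size)
        for x in range(left, left + size)
    )
    return total / (size * size)


def mb_lbp_code(img, top, left, size):
    """MB-LBP: apply the LBP comparison to a 3x3 arrangement of block means."""
    means = [
        [block_mean(img, top + r * size, left + c * size, size) for c in range(3)]
        for r in range(3)
    ]
    return lbp_code(means, 1, 1)
```

With `size=1`, `mb_lbp_code` reduces to the basic pixel-wise operator. Larger blocks average out pixel noise at the cost of spatial detail.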

3.1.4 Mixed Approaches. Ma and Grimson (2005) employed implicit and explicit shape models based on edges and modified Scale-Invariant Feature Transform (SIFT) for describing vehicle types and used a two-class Bayesian Decision Rule for classification. Mithun et al. (2012) used a combination of geometrical, shape-invariant and texture-based features extracted from multiple time-spatial images. The geometrical features such as width, area, compactness, aspect ratio, ratio of fitting ellipse’s axes, and rectangularity were used to first differentiate between the size categories of vehicles (e.g., two-wheeler, four-wheeler, six-wheeler). Then, the type of vehicle within a particular size-category is determined based on shape-invariant image moments (Gonzalez and Woods 2002) and texture-based statistical features such as mean, variance, skewness, and entropy of pixel values of key vehicular blobs. The classification is done in two levels, employing a k-NN in each. At the first level, a k-NN predicts the size-category; at the second level, a k-NN predicts the type-class. Chen et al. (2012a) employed a combination of geometrical measurements-based features and appearance-based features called Intensity Pyramid HOG (IPHOG), which were classified through SVMs.
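
The texture-based statistical features listed for Mithun et al. (2012), namely mean, variance, skewness, and entropy of pixel values, can be computed as in the sketch below. It assumes population (not sample) statistics and Shannon entropy in bits over the empirical value distribution, since the text does not give the exact definitions. The function name is hypothetical.

```python
import math
from collections import Counter


def texture_stats(pixels):
    """Mean, variance, skewness, and entropy of a flat list of pixel values."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(var)
    # Skewness is undefined for a constant region; report 0.0 in that case.
    skew = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3) if std > 0 else 0.0
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": var, "skewness": skew, "entropy": entropy}
```

In a two-level scheme like the one described, such a feature vector would feed the second k-NN, after the geometric features have selected the size category.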

Utilizing 3D laser profiles and intensity images taken by laser scanners, Sandhawalia et al. (2013) extracted geometrical features (such as width, length, height), appearance-based features such as Fisher Vectors (Perronnin et al. 2010), and shape profiles. An advantage of their methods is the ability to distinguish between trucks, trucks with a single trailer, and trucks with double trailers. Although their techniques achieved high accuracies, it is unclear if these are applicable in real-time applications, as processing speeds were not reported. Moreover, it was noted that some materials exhibited low reflectivity (e.g., due to weather conditions or high speed), which led to missing readings in the laser profile. In cases where major part(s) of the vehicle’s surface may exhibit low reflectivity, the resulting laser scanner profiles could be incomplete, causing classifier confusion. Since their work seems to assume one vehicle per image at a time, it is unclear if their technique can distinguish between the case of a vehicle with a trailer and the case of two different but close-by vehicles.

A combination of geometrical features (width, height, fractal dimensions, average block width, and height) and edge-based points-count features was used in de S. Matos and de Souza (2013). A hierarchical classifier based on Adaptive k-NN was first used to classify a test vehicle as large or small, and then into a bus or truck category (if large) or into a car or motorcycle category (if small). However, they concluded that using such geometrical features may cause confusion when differentiating between a bus and a truck due to similar lengths, widths, or heights. Moreover, occlusions could produce incorrect edge-point counts, leading to further misclassification.

3.2 Discussion

Most works assume a relatively static background and rely on background estimation or vehicle tracking, which adds computational complexity and limits these approaches in cases of inaccurate segmentation of vehicle ROIs, due to occlusions or changing lighting conditions. The solution proposed by Mithun et al. (2012) attempted to use only the key vehicular frames for background estimation, eliminating the need to analyze each frame and thereby reducing computational complexity. An interesting approach for overcoming lighting-related issues was proposed by Peng et al. (2012), who trained separate classifiers for different lighting conditions. For a test image, the lighting condition (e.g., day or night) is first identified to select the respective VTR classifier. Much work is needed to develop highly robust VTR systems for varying real-world lighting conditions.
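
The per-lighting-condition idea attributed to Peng et al. (2012) amounts to a dispatch step in front of the classifiers. The sketch below is an assumption-heavy illustration: the text does not say how the lighting condition is identified, so a mean-brightness threshold stands in for that step, and `night_threshold`, the function names, and the classifier callables are all hypothetical.

```python
def lighting_condition(img, night_threshold=60.0):
    """Label a grayscale image 'day' or 'night' by mean brightness.

    The threshold is a placeholder; a real system would need a more
    robust cue than average intensity.
    """
    total = sum(sum(row) for row in img)
    mean = total / (len(img) * len(img[0]))
    return "night" if mean < night_threshold else "day"


def classify_with_lighting(img, classifiers):
    """Pick the classifier trained for the detected lighting condition."""
    condition = lighting_condition(img)
    return classifiers[condition](img)
```

Here `classifiers` is a mapping such as `{"day": day_model, "night": night_model}`, where each value is any callable that takes an image and returns a vehicle type.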

On the other hand, works that were not based on foreground masks (e.g., Buch et al. (2009)) suffered from classifier confusion in differentiating between vehicle types of similar sizes and/or appearance, such as between an SUV and a van or between a van and a bus. The approaches based on geometrical features are sensitive to occlusions and could lead to severe confusion between vehicle classes with similar lengths, widths, or heights. Texture-based approaches have not gained popularity in the VTR literature, possibly because of the large variation in texture among vehicles belonging to the same type-class, as well as their high sensitivity and computational costs.

The most popular approach has been to use appearance-based features for VTR. However, VTR systems that are not sensitive to real-world occlusions have not yet been achieved. Although geometrical features alone may have severe limitations, they may be employed for a first-level coarse vehicle classification by size (large, medium, or small). Richer appearance-based features could then be used for fine-level classification into specific types. Such hierarchical classification schemes seem to offer a promising direction for VTR research (Mithun et al. 2012; de S. Matos and de Souza 2013). Another interesting direction would be to develop VTR systems that do not require camera calibration. Much work is needed in VTR to tackle multiplicity, ambiguity, and occlusion-related issues.
