Vehicle Identification Via Sparse Representation


In this paper, we propose a system that uses video cameras to perform vehicle identification. We tackle this problem by reconstructing an input using multiple linear regression models and compressed sensing, which provide new ways to deal with three crucial issues in vehicle identification, namely, feature extraction, online vehicle identification database buildup, and robustness to occlusions and misalignment. The results show the capability of the proposed approach.

Index Terms: Sparse representation, vehicle identification.





States conduct traffic monitoring for many reasons, including highway planning and design and motor vehicle enforcement. Traffic monitoring can be classified into two types, namely, flow monitoring and route monitoring. Flow monitoring observes the amount of traffic flowing through a checkpoint of interest, whereas route monitoring identifies the route of a vehicle of interest. Unlike flow monitoring, route monitoring generally needs to know the identity of the observed vehicle and is therefore more difficult. This route monitoring capability can provide valuable information for freight logistics analysis, forecast modeling, and future transportation infrastructure planning.


For vehicle detection using a video or image sequence, the most obvious approach has been to first compute the stationary background (BG) image and then identify the moving vehicles as those pixels in the image that significantly differ from the BG, which is called BG subtraction [1]. However, traffic shadows cause serious problems during subtraction, and slow-moving or stationary traffic is difficult to detect. This led to the emergence of adaptive BG methods [1], [2]. After BG subtraction, connected regions in the foreground (FG) image, namely, blobs, are associated with different vehicles and tracked over time using different algorithms, such as cross correlation [3], mean shift [4], etc. Moreover, learning-based systems and a hidden Markov model have been proposed for on-road vehicle detection and tracking in [5]–[7].

The ability to detect and track vehicles in video enables us to further classify or identify vehicles of interest. For video-based vehicle classification, many techniques have been proposed, such as support vector machines (SVMs) [8], principal component analysis (PCA) with neural networks (NNs) [9], a weighted k-nearest neighbor [10], and a backpropagation NN [11]. Unlike classification problems, which assign vehicles to different categories, the video-based vehicle identification problem is to maintain the identity of a vehicle as it travels through multiple video camera sites. In [12], Zeng and Crisman proposed a color-based vehicle matching system with a highest reported true positive rate of 16.42%. However, their experimental setup was too ideal to reflect real traffic conditions. Their system needs to know the average time for vehicles to travel from site 1 to site 2 to reduce the number of candidate vehicles for matching. It is very likely that one cannot find a corresponding vehicle in the candidate set since the size of the candidate set for their system is typically eight vehicles.

Moreover, Kogut and Trivedi [13] combined color features and the spatial organization of vehicles within platoons to improve the identification accuracy (IA). A maximum positive match rate of 45% was reported in their work. Nevertheless, their results were based on only 22 samples, which is too few to cover different traffic conditions. In addition, given a platoon of vehicles at site 2, it is very difficult to find the corresponding platoon from site 1 since the platoons of vehicles may significantly change when the two sites are far from each other. In this case, the algorithm will fail since its performance depends on the spatial organization of the vehicles within their platoons. Another video-based vehicle identification system achieved impressive performance by using multiple individual vehicle features, such as color, external dimensions, points of optical demarcation, etc. [14], [15]. However, this system needs specially designed hardware for top-down camera views, where each camera also needs to be manually calibrated before performing identification. Moreover, all their results were obtained by using highly overlapped vehicle databases, where the overlap rate is about 85%. Thus, the performance of their system on data with low overlap is still unknown. In short, the results obtained from previous vehicle identification research [12]–[15] are all subject to given constraints, which makes it unclear how a video-based identification system would perform without the aforementioned constraints. Recently, Wright et al. proposed a face recognition algorithm [16] using sparse representation, which offers very competitive performance for face recognition. Moreover, sparse representation has also been employed for scene, object, and pattern classification in [17]–[19]. Based on the idea of sparse representation for object classification and identification, we propose a video-based vehicle identification framework in this paper.
The constructed system was designed and tested under a realistic setup, in contrast with the aforementioned limitations in previous research. The main contributions and accomplishments of our proposed system are as follows.

1) We use video cameras to capture the critical information of vehicles for the purpose of vehicle tracking when they enter the state, and we use additional video cameras to track their routes. Unlike [14] and [15], our system does not need specially designed hardware or calibrated cameras. Moreover, the cameras can be placed at the side of the highway, which makes our system easier to deploy.

2) We treat the problem of vehicle identification from different video sources as signal reconstruction from multiple linear regression models and solve it using theory from an emerging signal processing area, compressive sensing. By employing Bayesian formalism to compute the $\ell_1$ minimization of the sparse weights, the proposed framework provides new ways to deal with three crucial issues in vehicle identification, namely, feature extraction, online vehicle identification database building, and robustness to occlusion and misalignment. For feature extraction, we use simple downsampled features that offer good identification performance as long as the feature space is sparse enough. The theory also provides a validation scheme to decide whether a newly entering vehicle has already been included in the database. Moreover, by taking advantage of downsample-based features, one can easily introduce features of newly entering vehicles into the vehicle identification database without using training algorithms, e.g., PCA [9]. Finally, Bayesian formalism provides a measure of confidence for each sparse weight.

3) Different from previous research [12]–[15], where only about 100 vehicles were used for testing and the testing databases were highly overlapped, we conduct extensive experiments on different types of vehicles on interstate highways to verify the efficiency and accuracy of our proposed system. In our experiments, more than 1200 vehicles were used for testing, and the overlap rate of the testing databases is less than 48%. The results show that the proposed framework works well on all kinds of





Our vehicle identification system is able to detect, track, and identify each vehicle and transmit vehicle information to a service center for further route tracking and other traffic monitoring tasks. The system includes three main components, namely, video cameras, a service center, and clients. Video cameras are used to gather traffic information, including environment conditions, illumination conditions, and vehicle information. Several parallel video cameras are set up along the side of the highway. These video cameras should be reliable, network accessible, high resolution, and high speed. We propose to use Axis 223M network cameras. The service center component, which is the most critical part, collects the images from the video cameras and applies our vehicle identification algorithm to obtain the identification results. The clients are terminals that query the identification results from the service center and produce reports of desired statistics and routing information.


B. Process Flow

The process flow for the video camera feeds used for vehicle identification is shown in Fig. 1. Each video feed is sent to the service center for further processing, e.g., the $i$th video camera VC(i) in Fig. 1. At the service center, the images from the video camera are processed by the video processor module, which performs FG/BG detecting, blob detecting, blob tracking, and moving direction and speed detecting to extract features contributing to a unique vehicle ID. Then, these vehicle IDs from different video cameras are saved into a database with corresponding indices. When a vehicle ID, e.g., the vehicle ID from VC(j), is requested by the client, the given vehicle information of VC(j) is compared with the other VC databases VC(1), ..., VC(m) except VC(j), where $m$ denotes the total number of VCs. If a corresponding ID is found in VC(k), $k \neq j$, the system reports that this vehicle was captured by the $k$th VC; otherwise, it reports −1, which means that this vehicle has not been captured by any VC before.
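The query step of this process flow can be sketched as follows. This is a minimal illustration under stated assumptions and not the implemented system: the function name `find_previous_camera`, the dictionary layout of the per-camera databases, and the toy `close` predicate are hypothetical stand-ins for the identification algorithm developed later in the paper.

```python
from typing import Callable, Dict, List, Sequence

Feature = Sequence[float]


def find_previous_camera(
    query: Feature,
    j: int,
    databases: Dict[int, List[Feature]],
    matches: Callable[[Feature, Feature], bool],
) -> int:
    """Return the index k != j of a camera whose database holds a match, else -1."""
    for k in sorted(databases):
        if k == j:
            continue  # the requesting camera's own database is excluded
        if any(matches(query, stored) for stored in databases[k]):
            return k
    return -1


def close(a: Feature, b: Feature, tol: float = 0.1) -> bool:
    """Toy matching rule (an assumption): every feature differs by at most tol."""
    return max(abs(x - y) for x, y in zip(a, b)) <= tol


# Hypothetical databases for three cameras VC(1), VC(2), VC(3).
dbs = {1: [[0.2, 0.9]], 2: [[0.5, 0.5]], 3: [[0.8, 0.1]]}
print(find_previous_camera([0.5, 0.52], j=3, databases=dbs, matches=close))  # 2
print(find_previous_camera([0.0, 0.0], j=1, databases=dbs, matches=close))   # -1
```

The first query, issued from VC(3), is found in the database of VC(2); the second has no counterpart in any other database, so −1 is reported.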

A. Vehicle Detecting and Tracking

Vehicle detecting and tracking is the first stage for any further identification processing. The four main components in our vehicle detecting and tracking scheme are shown as the video processing section in Fig. 1. 1) FG/BG detecting: We adopt the approach in [2], which provides an adaptive BG mixture model for real-time tracking by modeling the values of any pixel as a mixture of Gaussians. This method is robust to lighting changes, tracking through cluttered regions, slow-moving objects, and so on. 2) Blob detecting: Our blob detector is implemented based on [20] to detect any newly entering object in each frame using the output from the FG/BG estimation module. 3) Blob tracking: The blob tracking module provides a way to track blobs from the current frame to the next frame [4]. 4) Moving direction and speed detecting: This is accomplished using optical flow estimation [21], which calculates the motion between two video frames at times $t$ and $t + \tau$. In our scheme, we use the blobs with the same index in different video frames to calculate the optical flow. The aforementioned algorithms offer high sensitivity for blob detecting and tracking; however, the false-positive rate (FPR) could be high due to clutter from the motion of leaves and grass. Moreover, we may only be interested in one direction of traffic flow. To tackle these issues, we utilize three filters to exclude these unwanted blobs.


1) The blob histogram (BH) filter excludes blobs for which the number of observations of a given blob ID across video frames is less than $\tau_{BH}$, where $\tau_{BH}$ is a predetermined threshold.

2) The motion distance (MDs) filter excludes blobs whose moving distance is less than a given threshold $\tau_{MDs}$ (in pixels).

3) The motion direction (MDr) filter excludes blobs whose motion direction is not the same as the preassigned direction $\tau_{MDr}$ (right, left, up, down, etc.).
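The three filters can be sketched as follows. This is an illustrative sketch only: the `BlobTrack` container, the helper names, and the threshold values are hypothetical, and two details are assumptions not specified in the text, namely that the moving distance is the net displacement between the first and last observed blob centers and that the direction is quantized to four labels.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple


@dataclass
class BlobTrack:
    blob_id: int
    centers: List[Tuple[float, float]]  # one (x, y) center per frame observed


def distance(track: BlobTrack) -> float:
    """Net displacement in pixels between first and last center (assumed definition)."""
    (x0, y0), (x1, y1) = track.centers[0], track.centers[-1]
    return hypot(x1 - x0, y1 - y0)


def direction(track: BlobTrack) -> str:
    """Coarse direction from first to last center; image y grows downward."""
    (x0, y0), (x1, y1) = track.centers[0], track.centers[-1]
    dx, dy = x1 - x0, y1 - y0
    if abs(dx) >= abs(dy):
        return "right" if dx >= 0 else "left"
    return "down" if dy >= 0 else "up"


def keep_track(track: BlobTrack, tau_bh: int, tau_mds: float, tau_mdr: str) -> bool:
    if len(track.centers) < tau_bh:    # BH filter: too few observations
        return False
    if distance(track) < tau_mds:      # MDs filter: moved too little
        return False
    return direction(track) == tau_mdr  # MDr filter: wrong direction


tracks = [
    BlobTrack(1, [(0, 50), (40, 52), (80, 51), (120, 50)]),  # vehicle moving right
    BlobTrack(2, [(10, 10), (11, 10)]),                      # seen in only two frames
    BlobTrack(3, [(5, 5), (6, 5), (5, 6), (6, 6)]),          # swaying clutter
    BlobTrack(4, [(120, 80), (80, 80), (40, 80), (0, 80)]),  # opposite direction
]
kept = [t.blob_id for t in tracks if keep_track(t, tau_bh=3, tau_mds=20.0, tau_mdr="right")]
print(kept)  # [1]
```

Applying the filters in this order means the inexpensive observation count rejects short-lived clutter before any geometry is computed.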

B. Vehicle Identification Via Sparse Representation and Bayesian Formalism

A basic problem in vehicle identification is to determine if a newly entering vehicle has already been registered in a database or not and to find a corresponding vehicle ID if such a record exists. The core idea of the proposed vehicle identification algorithm is based on sparse representation, where a similar idea was used in [16] for face recognition.

1) Sparse Representation of a Vehicle: Before generating the sparse representation of a vehicle and finding its corresponding vehicle ID, we first arrange the database into matrices built from labeled training samples of $M$ different vehicles. Let $k_i$ denote the number of training images for the $i$th vehicle ID, where $i = 1, \ldots, M$, and let $k = k_1 + k_2 + \cdots + k_M$ denote the total number of images in the database. We reshape each $w \times h$ image into a column vector $\nu \in \mathbb{R}^{c}$, where $c = wh$; the $k_i$ training images of the $i$th vehicle ID form the columns of a matrix $\Phi_i = [\nu_{i,1}, \nu_{i,2}, \ldots, \nu_{i,k_i}] \in \mathbb{R}^{c \times k_i}$; and all $k$ images in the database are combined into a new matrix

$$\Phi = [\Phi_1, \Phi_2, \ldots, \Phi_M] = [\nu_{1,1}, \nu_{1,2}, \ldots, \nu_{M,k_M}] \in \mathbb{R}^{c \times k}.$$

For a newly entering vehicle $y \in \mathbb{R}^{c}$, if sufficient training samples in the database share the same features as the incoming vehicle (this happens when the incoming vehicle was previously captured, e.g., with vehicle ID $i$), then the vehicle can be approximately represented as a linear combination of the training samples in $\Phi_i$, i.e.,

$$y = \Phi_i \theta = \theta_{i,1}\nu_{i,1} + \theta_{i,2}\nu_{i,2} + \cdots + \theta_{i,k_i}\nu_{i,k_i} \tag{1}$$

where $\theta = [\theta_{i,1}, \theta_{i,2}, \ldots, \theta_{i,k_i}]^T$ and $\theta_{i,j} \in \mathbb{R}$, $j = 1, 2, \ldots, k_i$. However, we do not know the identity of the incoming vehicle at the beginning. Fortunately, we can instead represent the incoming vehicle $y \in \mathbb{R}^{c}$ using the entire set of images in the database with a relatively small increase in computational complexity. The linear combination of all the training samples is written as

$$y = \Phi x_s = [\Phi_1, \Phi_2, \ldots, \Phi_M]\, x_s \tag{2}$$

where, with high probability, $x_s = [0, \ldots, 0, \theta_{i,1}, \theta_{i,2}, \ldots, \theta_{i,k_i}, 0, \ldots, 0]^T \in \mathbb{R}^{k}$ is a coefficient vector whose only nonzero entries are those associated with the $i$th vehicle ID.
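The construction of $\Phi$ and the role of the sparse coefficient vector in (2) can be illustrated with a small sketch. The helper names and the toy 2 × 2 images are hypothetical, and the final rule, which picks the vehicle ID whose coefficients carry the largest total magnitude, is only an assumed way of reading an identity off $x_s$.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]  # h rows of w pixel values


def flatten(image: Image) -> List[float]:
    """Reshape an h-by-w image into a vector of length c = w*h (row-major order)."""
    return [pixel for row in image for pixel in row]


def build_dictionary(images_by_id: Dict[int, List[Image]]) -> Tuple[List[List[float]], List[int]]:
    """Return the columns of Phi and, for each column, the vehicle ID it belongs to."""
    columns, labels = [], []
    for vehicle_id in sorted(images_by_id):
        for image in images_by_id[vehicle_id]:
            columns.append(flatten(image))
            labels.append(vehicle_id)
    return columns, labels


def combine(columns: List[List[float]], x: List[float]) -> List[float]:
    """Compute y = Phi x, with Phi stored column by column."""
    c = len(columns[0])
    return [sum(x[j] * columns[j][r] for j in range(len(columns))) for r in range(c)]


def dominant_id(x: List[float], labels: List[int]) -> int:
    """Vehicle ID whose coefficients have the largest total magnitude (assumed rule)."""
    mass: Dict[int, float] = {}
    for coeff, label in zip(x, labels):
        mass[label] = mass.get(label, 0.0) + abs(coeff)
    return max(mass, key=mass.get)


# Toy database: vehicle 1 has k_1 = 2 images, vehicle 2 has k_2 = 1 image.
db = {
    1: [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]],
    2: [[[0.0, 0.0], [1.0, 1.0]]],
}
columns, labels = build_dictionary(db)
x_s = [0.5, 0.5, 0.0]        # nonzero only on the columns of vehicle 1
y = combine(columns, x_s)    # the observation that x_s would explain
print(labels, y, dominant_id(x_s, labels))
```

Here $c = 4$ and $k = 3$, and the coefficient vector that reproduces `y` is supported only on the columns of vehicle 1, which is the structure that (2) describes.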




To find the $x_s$ that accurately determines the identity of the incoming vehicle, we need to solve the linear equation $y = \Phi x$. In general, measurement data may be noisy; thus, $y$ may not be exactly representable as a sparse combination of training samples, and (2) is rewritten as

$$y = \Phi x_s + \Upsilon_z \tag{3}$$

where $\Upsilon_z \in \mathbb{R}^{c}$ is noise with bounded energy $\|\Upsilon_z\|_2 < \varepsilon$. Nevertheless, this is an underdetermined equation, and it does not have a unique solution $x_s$. To find a sparse solution $x_s$ while avoiding an NP-hard combinatorial search, the problem is cast as an $\ell_1$-norm minimization problem, i.e.,

$$\hat{x} = \arg\min_{x} \|x\|_1 \quad \text{subject to} \quad \|\Phi x - y\|_2 \le \varepsilon. \tag{4}$$
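To make the effect of the $\ell_1$ objective concrete, the following sketch solves a closely related penalized form, $\min_x \tfrac{1}{2}\|\Phi x - y\|_2^2 + \lambda\|x\|_1$, using the iterative soft-thresholding algorithm (ISTA). This is a generic solver chosen for illustration, not the Bayesian method used in this paper; the function names, the value of $\lambda$, the iteration count, and the toy matrix are all assumptions.

```python
from typing import List

Matrix = List[List[float]]  # row-major: c rows, k columns


def matvec(A: Matrix, x: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def soft_threshold(v: float, t: float) -> float:
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(Phi: Matrix, y: List[float], lam: float = 0.01, iters: int = 2000) -> List[float]:
    """Minimize 0.5*||Phi x - y||_2^2 + lam*||x||_1 by iterative soft thresholding."""
    Phi_t = transpose(Phi)
    # The squared Frobenius norm upper-bounds the largest eigenvalue of Phi^T Phi,
    # so its reciprocal is a safe (conservative) step size.
    step = 1.0 / sum(a * a for row in Phi for a in row)
    x = [0.0] * len(Phi[0])
    for _ in range(iters):
        residual = [p - q for p, q in zip(matvec(Phi, x), y)]
        grad = matvec(Phi_t, residual)
        x = [soft_threshold(xi - step * g, step * lam) for xi, g in zip(x, grad)]
    return x


# Underdetermined toy system (c = 2, k = 3): y equals the third column of Phi,
# but y = column 1 + column 2 is another exact, less sparse, representation.
Phi = [[1.0, 0.0, 1.0],
       [0.0, 1.0, 1.0]]
y = [1.0, 1.0]
x_hat = ista(Phi, y)
print([round(v, 3) for v in x_hat])  # weight concentrates on the third column
```

Both $[1, 1, 0]^T$ and $[0, 0, 1]^T$ reproduce `y` exactly, but the second has the smaller $\ell_1$ norm, so the estimate places essentially all of its weight on the third column.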



2) Sparse Solution Via Bayesian Formalism: To find the sparse solution of the $\ell_1$-norm minimization problem, numerous methods have been proposed, such as orthogonal matching pursuit [22], least ... methods [24], sparsity adaptive matching pursuit [25], and the gradient method [26]. However, the aforementioned methods only provide approximate sparse solutions and do not indicate how likely the given solutions are to be optimal. Therefore, we use Bayesian formalism instead, which returns both a sparse solution $x$ and probability information indicating the uncertainty of the solution relative to the actual sparse $x$. Our approach is based on [27], which extends Tipping's relevance vector machine (RVM) theory [28].

First, we assume that $x$ is the sum of two parts $x_b$ and $x_e$ (thus, $x = x_b + x_e$), where $x_b \in \mathbb{R}^{k}$ is the vector whose nonzero entries are only at the $L$ largest coefficients of $x$, and $x_e \in \mathbb{R}^{k}$ is the vector whose nonzero entries are only at the remaining coefficients. Moreover, since we assume that measurements can be noisy as in (3), the vector corresponding to a vehicle $y$ is rewritten as

$$y = \Phi x + \Upsilon_z = \Phi x_b + \Phi x_e + \Upsilon_z = \Phi x_b + \Upsilon_e + \Upsilon_z = \Phi x_b + \Upsilon \tag{5}$$

where $\Upsilon_e = \Phi x_e$. Using the central limit theorem [29], we assume that both $\Upsilon_e$ and $\Upsilon_z$ are zero mean and approximately Gaussian distributed; then, $\Upsilon = \Upsilon_e + \Upsilon_z$ can be approximated as Gaussian noise with zero mean and unknown variance $\sigma^2$. Then, the Gaussian likelihood is given by

$$p(y \mid x_b, \sigma^2) = (2\pi\sigma^2)^{-c/2} \exp\left(-\frac{1}{2\sigma^2}\,\|y - \Phi x_b\|_2^2\right). \tag{6}$$
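The split $x = x_b + x_e$ and the logarithm of the likelihood in (6) can be sketched as follows. The function names and the toy values are hypothetical, and taking the "largest" coefficients to mean largest in magnitude is an assumption.

```python
from math import log, pi
from typing import List, Tuple


def split_largest(x: List[float], L: int) -> Tuple[List[float], List[float]]:
    """Split x into x_b (the L largest-magnitude entries) and x_e (the rest)."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:L])
    x_b = [v if i in keep else 0.0 for i, v in enumerate(x)]
    x_e = [0.0 if i in keep else v for i, v in enumerate(x)]
    return x_b, x_e


def gaussian_log_likelihood(
    y: List[float], Phi: List[List[float]], x_b: List[float], sigma2: float
) -> float:
    """log p(y | x_b, sigma^2) for y = Phi x_b + noise, noise ~ N(0, sigma^2 I)."""
    c = len(y)
    predicted = [sum(a * b for a, b in zip(row, x_b)) for row in Phi]
    squared_error = sum((p - q) ** 2 for p, q in zip(predicted, y))
    return -0.5 * c * log(2 * pi * sigma2) - squared_error / (2 * sigma2)


x = [0.02, -0.9, 0.01, 0.7]
x_b, x_e = split_largest(x, L=2)
print(x_b)  # [0.0, -0.9, 0.0, 0.7]
print(x_e)  # [0.02, 0.0, 0.01, 0.0]

Phi = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-by-2 matrix, Phi is read row by row here
print(gaussian_log_likelihood([1.0, 2.0], Phi, [1.0, 2.0], sigma2=0.5))
print(gaussian_log_likelihood([1.0, 2.0], Phi, [0.0, 0.0], sigma2=0.5))
```

Working with the logarithm avoids numerical underflow when $c$ is large; a coefficient vector that explains `y` better yields a higher log-likelihood, as the last two lines show.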

Given $\Phi$ and $y$, the problem now is to estimate the sparse vector $x_b$ and the noise variance $\sigma^2$. By Bayes' rule, we have

$$p(x_b, \sigma^2 \mid y) = \frac{p(y \mid x_b, \sigma^2)\, p(x_b, \sigma^2)}{p(y)}. \tag{7}$$

Note that $x_b$ is sparse and can be modeled by a Laplace distribution [30]. However, the Laplace prior is not conjugate to the Gaussian likelihood, and thus, the inference problem cannot be written in closed form [30]. Therefore, instead of the Laplace prior, we adopt a hierarchical sparseness prior [28] that has properties similar to those of the Laplace prior and allows convenient conjugate-exponential analysis on $x_b$. Then, based on the priors defined according to [28], the posterior can be decomposed as

$$p(x_b, \alpha, \sigma^2 \mid y) = p(x_b \mid y, \alpha, \sigma^2)\, p(\alpha, \sigma^2 \mid y). \tag{8}$$

