Translated material: Motion Segmentation and Pose Recognition with Motion History Gradients

Since I needed to use these two authors' code, I translated this paper to help deepen my own understanding.

The translation is rough; please bear with me.

Motion Segmentation and Pose Recognition with Motion History Gradients

Gary R. Bradski
Intel Corporation, Microcomputer Research Labs
SC12-303, 2200 Mission College Blvd.
Santa Clara, CA 95052-8119 USA
gary.bradski@intel.com

James Davis
MIT Media Lab
E15-390, 20 Ames St.
Cambridge, MA 02139 USA
jdavis@media.mit.edu



Abstract
This paper uses a simple method for representing motion
in successively layered silhouettes that directly encode
system time termed the timed Motion History Image
(tMHI). This representation can be used to both (a)
determine the current pose of the object and to (b)
segment and measure the motions induced by the object
in a video scene. These segmented regions are not
“motion blobs”, but instead motion regions naturally
connected to the moving parts of the object of interest.
This method may be used as a very general gesture
recognition “toolbox”. We use it to recognize waving
and overhead clapping motions to control a music
synthesis program.

1. Introduction and Related Work
Three years ago, a PC cost about $2500 and a low end
video camera and capture board cost about $300. Today
the computer could be had for under $700 and an
adequate USB camera costs about $50. It is not surprising
then that there is an increasing interest in the recognition
of human motion and action in real-time vision. For
example, during these three years this topic has been
addressed by [5][6][7][8][9][10][11][12][15][17][24]
among others. Several survey papers in this period have
reviewed computer vision based motion recognition [25],
human motion capture [[22][23]] and human motion
analysis [1]. In particular, with the advent of inexpensive
and powerful hardware, tracking/surveillance systems,
human computer interfaces, and entertainment domains
have a heightened interest in understanding and
recognizing human movements. For example, monitoring
applications may wish to signal only when a person is
seen moving in a particular area (perhaps within a
dangerous or secure area), interface systems may require
the understanding of gesture as a means of input or
control, and entertainment applications may want to
analyze the actions of the person to better aid in the
immersion or reactivity of the experience.

One possible motion representation is found by collecting
optical flow over the image or region of interest
throughout the sequence, but this is computationally
expensive and many times not robust. For example,
hierarchical [2] and/or robust estimation [4] is often
needed, and optical flow frequently signals unwanted
motion in regions such as loose and textured clothing.
Moreover, in the absence of some type of grouping,
optical flow happens frame to frame whereas human
gestures may span seconds. Despite these difficulties,
optical flow signals have been grouped into regional blobs
and used successfully for gesture recognition [9].


An alternative approach was proposed in [13] where
successive layering of image silhouettes is used to
represent patterns of motions. Every time a new frame
arrives, the existing silhouettes are decreased in value
subject to some threshold and the new silhouette (if any)
is overlaid at maximal brightness. This layered motion
image is termed a Motion History Image (MHI). MHI
representations have the advantage that a range of times
from frame to frame to several seconds may be encoded in
a single image. Thus MHIs span the time scales of human
gestures.

In this paper, we generalize the Motion History Image to
directly encode actual time in a floating point format
which we call the timed Motion History Image (tMHI).
We take Hu Moment shape descriptors [19] of the current
silhouette to recognize pose. A gradient of the tMHI is
used to determine normal optical flow (e.g. motion flow
orthogonal to object boundaries). The motion is then
segmented relative to object boundaries and the motion
orientation and magnitude of each region is obtained. The
processing flow is summarized in Figure 1 where numbers
indicate which section that processing step is described in.
The end result is recognized pose, and motion to that pose
-- a general “tool” for use in object motion analysis or
gesture recognition. Section 5 compares the
computational advantages of our approach with the optical
flow approaches such as used in [9]. We use our
approach in section 6 to recognize walking, waving and
clapping motions to control musical synthesis.

2. Pose and Motion Representation

2.1. Silhouettes and Pose Recognition
The algorithm as shown in Figure 1 depends on generating
silhouettes of the object of interest. Almost any silhouette
generation method can be used. Possible methods of
silhouette generation include stereo disparity or stereo
depth subtraction [3], infrared back-lighting [12], frame
differencing [13], color histogram back-projection [6],
texture blob segmentation, range imagery foreground
segmentation etc. We chose a simple background
subtraction method for the purposes of this paper as
described below.

2.1.1. Silhouette Generation
Although there is recent work on more sophisticated
methods of background subtraction [14][18][21], we use
a simplistic method here. We label as foreground those
pixels that are a set number of standard deviations from
the mean RGB background. Then a pixel dilation and
region growing method is applied to remove noise and
extract the silhouette. A limitation of using silhouettes is
that no motion inside the body region can be seen. For
example, a silhouette generated from a camera facing a
person would not show the hands moving in front of the
body. One possibility to help overcome this problem is to
simultaneously use multiple camera views. Another
approach would be to separately segment flesh-colored
regions and overlay them when they cross the foreground
silhouette.
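The background-subtraction step above can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions, not the authors' implementation: the deviation threshold k, the mask encoding, and all names are invented, and the paper's pixel dilation and region-growing noise cleanup are omitted.

```python
import numpy as np

def make_silhouette(frame, bg_mean, bg_std, k=3.0):
    """Label as foreground every pixel that lies more than k standard
    deviations from the per-pixel mean RGB background (k is illustrative)."""
    # frame, bg_mean, bg_std: HxWx3 float arrays
    dev = np.abs(frame.astype(np.float64) - bg_mean) / (bg_std + 1e-6)
    fg = (dev > k).any(axis=2)          # foreground if any channel deviates
    return fg.astype(np.uint8) * 255    # binary silhouette mask

# toy example: static gray background, one bright patch in the new frame
bg_mean = np.full((4, 4, 3), 100.0)
bg_std = np.full((4, 4, 3), 5.0)
frame = bg_mean.copy()
frame[1:3, 1:3] = 200.0                 # the "moving object"
sil = make_silhouette(frame, bg_mean, bg_std)
```

The real system would then dilate and region-grow this mask before using it as the silhouette input to the tMHI.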

2.1.2. Mahalanobis Match to Hu Moments of Silhouette Pose

For recognition of silhouette pose, seven higher-order Hu
moments [19] provide shape descriptors that are invariant
to translation and scale. Since these moments are of
different orders, we must use the Mahalanobis distance
metric [26] for matching based on a statistical measure of
closeness to training examples:

    maha(x) = (x - m)^T K^(-1) (x - m)    (1)

where x is the moment feature vector, m is the mean of
the training moment vectors, and K^(-1) is the inverse
covariance matrix for the training vectors. The
discriminatory power of these moment features for the
silhouette poses is indicated by a short example. For this
example, the training set consisted of 5 people each doing
5 repetitions of the 3 gestural poses ("Y", "T", and "Left
Arm") shown in Figure 2. A sixth person who had not
practiced the gestures was brought in to perform the
gestures.


Table 1 shows typical results for pose discrimination. We
can see that even the confusable poses “Y” and “T” are
separated by more than an order of magnitude making it
easy to set thresholds to recognize test poses against
trained model poses.
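The Mahalanobis match of Eq. (1) can be sketched as follows. This assumes the seven Hu-moment features have already been extracted from silhouettes; here they are replaced by synthetic vectors, so the "pose" data and all names are purely illustrative.

```python
import numpy as np

def mahalanobis(x, m, K_inv):
    """maha(x) = (x - m)^T K^-1 (x - m), Eq. (1): distance of a moment
    feature vector x from a trained pose model (m, K_inv)."""
    d = x - m
    return float(d @ K_inv @ d)

# toy training set: 7-dim "Hu moment" vectors for one pose (values invented)
rng = np.random.default_rng(0)
train = rng.normal(loc=1.0, scale=0.1, size=(25, 7))   # 5 people x 5 reps
m = train.mean(axis=0)
K_inv = np.linalg.inv(np.cov(train, rowvar=False))     # inverse covariance

close = mahalanobis(m + 0.01, m, K_inv)   # probe near the trained pose
far = mahalanobis(m + 1.0, m, K_inv)      # probe from a different pose
```

As in Table 1, the distance to the correct model is far smaller than to an incorrect one, so a simple threshold on maha(x) suffices for recognition.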

An alternative approach to pose recognition uses gradient
histograms of the segmented silhouette region [5].


2.2. timed Motion History Images (tMHI)
In this paper, we use a floating point Motion History
Image [10] where new silhouette values are copied in with
a floating point timestamp in the format:
seconds.milliseconds. This MHI representation is updated
as follows:

    tMHI_δ(x, y) = τ    if current silhouette at (x, y)
                 = 0    else if tMHI_δ(x, y) < (τ − δ)

where τ is the current time-stamp, and δ is the maximum
time duration constant (typically a few seconds)
representation independent of system speed or frame rate
(within limits) so that a given gesture will cover the same
MHI area at different capture rates. We call this
representation the timed Motion History Image (tMHI).
Figure 3 shows a schematic representation for a person
doing an upward arm movement.
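The update rule above can be sketched in numpy. This mirrors Figure 3's upward arm movement in one dimension; the timestamps and the choice of δ are illustrative, not from the paper.

```python
import numpy as np

def update_tmhi(tmhi, silhouette, tau, delta):
    """Timed MHI update: silhouette pixels are stamped with the float
    time tau; non-silhouette pixels older than tau - delta are cleared;
    everything else keeps its old timestamp."""
    tmhi = tmhi.copy()
    tmhi[silhouette > 0] = tau
    tmhi[(silhouette == 0) & (tmhi < tau - delta)] = 0.0
    return tmhi

# a blob sweeping rightward over three frames, with delta = 1.5 s
tmhi = np.zeros((1, 5))
for t, col in [(1.0, 0), (2.0, 1), (3.0, 2)]:
    sil = np.zeros((1, 5))
    sil[0, col] = 1
    tmhi = update_tmhi(tmhi, sil, t, delta=1.5)
# the t=1.0 layer is now older than tau - delta = 1.5 s and has been cleared
```

Because real timestamps are stored rather than frame counts, the same gesture covers the same tMHI area regardless of capture rate, as the text notes.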



2.3. Motion History Gradients
Notice in the right image in Figure 3 (tMHI) that if we
took the gradient of the tMHI, we would get direction
vectors pointing in the direction of the movement of the
arm. Note that these gradient vectors will point
orthogonal to the moving object boundaries at each “step”
in the tMHI giving us a normal optical flow representation
(see middle left image, Figure 4). Gradients of the tMHI
can be calculated efficiently by convolution with
separable Sobel filters in the X and Y directions yielding
the spatial derivatives F_x(x, y) and F_y(x, y). Gradient
orientation at each pixel is then:

    φ(x, y) = arctan( F_y(x, y) / F_x(x, y) )

We must be careful, though, when calculating the gradient
information because it is only valid at locations within the
tMHI. The surrounding boundary of the tMHI should not
be used because non-silhouette (zero valued) pixels would
be included in the gradient calculation, thus corrupting the
result. Only tMHI interior silhouette pixels should be
examined. Additionally, we must not use gradients of
MHI pixels that have a contrast which is too low (inside a
silhouette) or too high (large temporal disparity) in their
local neighborhood. Figure 4 center, left shows raw tMHI
gradients. Applying the above criteria to the raw gradients
yields a masked region of valid gradients in Figure 4
center, right.
After calculating the motion gradients, we can then extract
motion features to varying scales. For instance, we can
generate a radial histogram of the motion orientations
which then can be used directly for recognition as done in
[10]. But an even simpler measure is to find the global
motion orientation as discussed next.


3. Global Gradient Orientation
Calculation of the global orientation should be weighted
by normalized tMHI values to give more influence to the
most current motion within the template. A simple
calculation for the global weighted orientation is as
follows:

    φ = φ_ref + ( Σ_{x,y} angDiff(φ(x,y), φ_ref) · norm(τ, δ, tMHI(x,y)) ) / ( Σ_{x,y} norm(τ, δ, tMHI(x,y)) )

where φ is the global motion orientation, φ_ref is the base
reference angle (the peak value in the histogram of
orientations), φ(x, y) is the motion orientation map found
from the gradient convolutions, norm(τ, δ, tMHI(x, y))
is a normalized tMHI value (linearly normalizing the
tMHI from 0-1 using the current time-stamp τ and
duration δ), and angDiff(φ(x, y), φ_ref) is the
minimum, signed angular difference of an orientation
from the reference angle. A histogram-based reference
angle (φ_ref) is required due to problems associated with
averaging circular distance measurements. Figure 4 shows
from left to right a tMHI, the raw gradients, the masked
region of valid gradients and finally the orientation
histogram with the global direction vector calculated.
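The weighted global orientation can be sketched as follows. The 18-bin histogram used to pick φ_ref and the linear ramp implementing norm() are assumptions chosen for illustration.

```python
import numpy as np

def ang_diff(a, b):
    """Minimum signed angular difference a - b, mapped to (-180, 180]."""
    d = (a - b) % 360.0
    return np.where(d > 180.0, d - 360.0, d)

def global_orientation(theta, valid, tmhi, tau, delta, bins=18):
    """Global motion orientation: a histogram-peak reference angle plus
    the tMHI-weighted mean signed deviation from that reference."""
    ang = theta[valid]
    if ang.size == 0:
        return 0.0
    hist, edges = np.histogram(ang % 360.0, bins=bins, range=(0.0, 360.0))
    ref = edges[np.argmax(hist)] + 180.0 / bins           # peak bin centre
    # norm(): linearly map timestamps in [tau - delta, tau] onto [0, 1]
    w = np.clip((tmhi[valid] - (tau - delta)) / delta, 0.0, 1.0)
    return float((ref + np.sum(ang_diff(ang, ref) * w) / np.sum(w)) % 360.0)

# uniform upward motion: every valid pixel points at 90 degrees
theta = np.full((3, 3), 90.0)
valid = np.ones((3, 3), bool)
tmhi = np.full((3, 3), 5.0)
phi = global_orientation(theta, valid, tmhi, tau=5.0, delta=2.0)
```

Averaging signed deviations from a histogram peak, rather than averaging raw angles, avoids the circular-mean problem the text mentions (e.g. 1° and 359° averaging to 180°).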


4. Motion Segmentation
Any segmentation scheme begs the question as to what is
being segmented. Segmentation by collecting "blobs" of
similar direction motion collected frame to frame from
optical flow as done in [9] doesn't guarantee that the
motion corresponds to the actual movement of objects in a
scene. We want to group motion regions that were
produced by the movement of parts or the whole of the
object of interest. A novel modification to the tMHI
gradient algorithm has an advantage in this regard -- by
labeling motion regions connected to the current
silhouette using a downward stepping floodfill, we can
identify areas of motion directly attached to parts of the
object of interest.

4.1. Motion Attached to Object
By construction, the most recent silhouette has the
maximal values (e.g. most recent timestamp) in the tMHI.
We scan the image until we find this value, then "walk"
along the most recent silhouette's contour to find attached
areas of motion. Below, let dT be a time difference
threshold, for example, the time difference between each
video frame. The algorithm for creating masks to segment
motion regions is as follows (with reference to Figure 6):

1. Scan the tMHI until we find a pixel of the current
   timestamp. This is a boundary pixel of the most
   recent silhouette (Figure 6b).
2. "Walk" around the boundary of the current silhouette
   region looking outside for recent (within dT)
   unmarked motion history "steps". When a suitable
   step is found, mark it with a downward floodfill
   (Figure 6b). If the size of the fill isn't big enough,
   zero out the area.
3. Store the segmented motion masks that were found.
4. Continue the boundary "walk" until the silhouette has
   been circumnavigated.

In the algorithm above, "downfill" refers to floodfills that
will fill (replace with a labeled value) pixels with the same
value, or pixels of a value one step (within dT) lower
than the current pixel being filled. The segmentation
algorithm then relies on 2 parameters: (1) the maximum
allowable downward step distance dT (e.g. how far back
in time can a past motion be considered to be connected to
the current silhouette); (2) the minimum acceptable size
of the downward flood fill (else zero it out because the
region is too small -- a motion "noise" region).
The algorithm above produces segmentation masks that
are used to select portions of the valid motion history
gradient described in Section 2.3. These segmented
regions may then be labeled with their weighted regional
orientation as described in Section 3. Since these
segmentation masks derive directly from past motion that
"spilled" from the current silhouette boundary of the
object, the motion regions are directly connected to the
object itself. We give segmentation examples in the
section below.
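The downward-stepping floodfill at the heart of the mask-building step can be sketched like this. The 4-connectivity, the min_size value, and the seed choice are assumptions; the real algorithm seeds fills from a boundary walk of the current silhouette.

```python
import numpy as np

def downfill(tmhi, seed, dT, min_size=2):
    """Downward-stepping floodfill from a seed pixel: grow into
    4-neighbours whose timestamp equals the current pixel's or is lower
    by at most dT; discard fills smaller than min_size (the "noise" test)."""
    H, W = tmhi.shape
    mask = np.zeros((H, W), bool)
    stack = [seed]
    mask[seed] = True
    while stack:
        i, j = stack.pop()
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < H and 0 <= nj < W and not mask[ni, nj]:
                v = tmhi[ni, nj]
                # step only downward in time, and by no more than dT
                if v > 0 and 0 <= tmhi[i, j] - v <= dT:
                    mask[ni, nj] = True
                    stack.append((ni, nj))
    return mask if mask.sum() >= min_size else np.zeros((H, W), bool)

# three successive silhouette layers stamped 3.0, 2.0, 1.0 (dT = 1.0),
# plus a disconnected stale fragment at 0.5 that must not be collected
tmhi = np.array([[3.0, 2.0, 1.0, 0.0, 0.5]])
mask = downfill(tmhi, (0, 0), dT=1.0)
# the fill walks 3.0 -> 2.0 -> 1.0 but never reaches the 0.5 region
```

Because each fill can only step backward in time from the current silhouette, the resulting mask is motion attached to the object rather than an arbitrary motion blob.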


4.2. Motion Segmentation Examples
Figure 8 shows a hand opening and closing in front of a
camera. Note that the small arrows correctly catch the
finger motion while the global motion is ambiguous.



Figure 9 shows a kicking motion from left to right. At left,
hands had just been brought down as indicated by the
large global motion arrow. The small segmentation arrow
is already catching the leftward lean of the body at right.
In the center-left image, the left leg lean and right leg
motion are detected. At center right, the left hand motion
and right leg are indicated. At right, the downward leg
motion and rightward lean of the body are found.



Figure 10 shows segmented motion and recognized pose
for lifting the arms into a “T” position and then dropping
the arms back down. The large arrow indicates global
motion over a few seconds, the smaller arrows show
segmented motion as long as the corresponding silhouette
region moved less than 0.2 seconds ago.




References

[1] Aggarwal, J. and Q. Cai. Human motion analysis: a review. IEEE Nonrigid and Articulated Motion Workshop, pages 90-102, 1997.

[2] Bergen, J., Anandan, P., Hanna, K., and R. Hingorani. Hierarchical model-based motion estimation. In Proc. European Conf. on Comp. Vis., pages 237-252, 1992.

[3] Beymer, D. and Konolige, K. Real-Time Tracking of Multiple People Using Stereo. IEEE FRAME-RATE Workshop, http://www.eecs.lehigh.edu/FRAME/, 1999.

[4] Black, M. and P. Anandan. A framework for robust estimation of optical flow. In Proc. Int. Conf. Comp. Vis., pages 231-236, 1993.

[5] Bradski, G., Yeo, B-L. and M. Yeung. Gesture for video content navigation. In SPIE '99, 3656-24 S6, 1999.

[6] Bradski, G. Computer Vision Face Tracking For Use in a Perceptual User Interface. Intel Technology Journal, http://developer.intel.com/technology/itj/q21998/articles/art_2.htm, Q2 1998.

[7] Bregler, C. Learning and recognizing human dynamics in video sequences. In Proc. Comp. Vis. and Pattern Rec., pages 568-574, June 1997.

[8] Cham, T. and J. Rehg. A multiple hypothesis approach to figure tracking. In Proc. Perceptual User Interfaces, pages 19-24, November 1998.

[9] Cutler, R. and M. Turk. View-based interpretation of real-time optical flow for gesture recognition. Int. Conf. on Automatic Face and Gesture Recognition, pages 416-421, 1998.

[10] Davis, J. Recognizing movement using motion histograms. MIT Media Lab Technical Report #487, March 1999.

[11] Davis, J. and A. Bobick. Virtual PAT: a virtual personal aerobics trainer. In Proc. Perceptual User Interfaces, pages 13-18, November 1998.

[12] Davis, J. and A. Bobick. A robust human-silhouette extraction technique for interactive virtual environments. In Proc. Modelling and Motion Capture Techniques for Virtual Environments, pages 12-25, 1998.

[13] Davis, J. and A. Bobick. The representation and recognition of human movement using temporal templates. In Proc. Comp. Vis. and Pattern Rec., pages 928-934, June 1997.

[14] Elgammal, A., Harwood, D. and Davis, L. Non-parametric Model for Background Subtraction. IEEE FRAME-RATE Workshop, http://www.eecs.lehigh.edu/FRAME/, 1999.

[15] Freeman, W., Anderson, D., Beardsley, P., et al. Computer vision for interactive computer graphics. IEEE Computer Graphics and Applications, Vol. 18, Num. 3, pages 42-53, May-June 1998.

[16] Gavrila, D. The visual analysis of human movement: a survey. Computer Vision and Image Understanding, Vol. 73, Num. 1, pages 82-98, January 1999.

[17] Haritaoglu, I., Harwood, D., and L. Davis. W4S: A real-time system for detecting and tracking people in 2½D. European Conf. on Comp. Vis., pages 877-892, 1998.

[18] Horprasert, T., Harwood, D. and Davis, L. A Statistical Approach for Real-time Robust Background Subtraction and Shadow Detection. IEEE FRAME-RATE Workshop, http://www.eecs.lehigh.edu/FRAME/, 1999.

[19] Hu, M. Visual pattern recognition by moment invariants. IRE Trans. Information Theory, Vol. IT-8, Num. 2, 1962.

[20] Krueger, M. Artificial Reality II. Addison-Wesley, 1991.

[21] Martins, F., Nickerson, B., Bostrom, V. and Hazra, R. Implementation of a Real-time Foreground/Background Segmentation System on the Intel Architecture. IEEE FRAME-RATE Workshop, http://www.eecs.lehigh.edu/FRAME/, 1999.

[22] Moeslund, T. Summaries of 107 computer vision-based human motion capture papers. University of Aalborg Technical Report LIA 99-01, March 1999.

[23] Moeslund, T. Computer vision-based human motion capture -- a survey. University of Aalborg Technical Report LIA 99-02, March 1999.

[24] Pinhanez, C. Representation and recognition of action in interactive spaces. MIT Media Lab Ph.D. Thesis, June 1999.

[25] Shah, M. and R. Jain. Motion-Based Recognition. Kluwer Academic, 1997.

[26] Therrien, C. Decision Estimation and Classification. John Wiley and Sons, Inc., 1989.

[27] Assembly optimized performance libraries in Image Processing, Signal Processing, JPEG, Pattern Recognition and Matrix math can be downloaded from http://developer.intel.com/vtune/perflibst/.
