ABSTRACT
We introduce a discriminative hidden-state approach for the recognition of human gestures. Gesture sequences often have a complex underlying structure, and models that can incorporate hidden structures have proven to be advantageous for recognition tasks. Most existing approaches to gesture recognition with hidden states employ a Hidden Markov Model or suitable variant to model gesture streams; a significant limitation of these models is the requirement of conditional independence of observations. In addition, hidden states in a generative model are selected to maximize the likelihood of generating all the examples of a given gesture class, which is not necessarily optimal for discriminating the gesture class against other gestures. Previous discriminative approaches to gesture sequence recognition have shown promising results, but have not incorporated hidden states nor addressed the problem of predicting the label of an entire sequence. In this paper, we derive a discriminative sequence model with a hidden state structure, and demonstrate its utility both in a detection and in a multi-way classification formulation. We evaluate our method on the task of recognizing human arm and head gestures, and compare the performance of our method to both generative hidden-state and discriminative fully-observable models.
generative models
However, these generative models assume that observations are conditionally independent. This restriction makes it difficult or impossible to accommodate long-range dependencies among observations or multiple overlapping features of the observations.
CRF
Conditional random fields (CRFs) use an exponential distribution to model the entire label sequence given the observation sequence [10, 9, 21]. This avoids the independence assumption between observations and allows non-local dependencies between states and observations. A Markov assumption may still be enforced in the state sequence, allowing inference to be performed efficiently using dynamic programming. CRFs assign a label to each observation (e.g., each time point in a sequence), and they neither capture hidden states nor directly provide a way to estimate the conditional probability of a class label for an entire sequence.
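The dynamic-programming inference mentioned above can be illustrated with a minimal sketch of the forward recursion for a linear-chain CRF. The function names, the split into `unary` and `pairwise` score tables, and the toy numbers are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(labels, unary, pairwise):
    """Unnormalized log-score of one label sequence.

    unary[t][k]    : score of label k at time t (may depend on any part of x)
    pairwise[j][k] : score of moving from label j to label k
    """
    score = sum(unary[t][k] for t, k in enumerate(labels))
    score += sum(pairwise[a][b] for a, b in zip(labels, labels[1:]))
    return score


def log_partition(unary, pairwise):
    """log Z(x) via the forward recursion: O(T * K^2) instead of O(K^T)."""
    num_labels = len(pairwise)
    alpha = list(unary[0])
    for t in range(1, len(unary)):
        alpha = [
            unary[t][k]
            + logsumexp([alpha[j] + pairwise[j][k] for j in range(num_labels)])
            for k in range(num_labels)
        ]
    return logsumexp(alpha)


def log_prob(labels, unary, pairwise):
    """log P(labels | x) for a linear-chain CRF."""
    return sequence_score(labels, unary, pairwise) - log_partition(unary, pairwise)


# Toy example: T = 3 time steps, K = 2 labels (all numbers are made up).
unary = [[0.5, -0.2], [1.0, 0.3], [-0.4, 0.8]]
pairwise = [[0.6, -0.1], [-0.3, 0.9]]

# Brute-force enumeration of all K^T sequences, used only as a sanity check.
all_sequences = list(product(range(2), repeat=3))
brute_force = logsumexp([sequence_score(s, unary, pairwise) for s in all_sequences])
total_prob = sum(math.exp(log_prob(s, unary, pairwise)) for s in all_sequences)
```

Because the Markov assumption restricts the pairwise terms to adjacent labels, the recursion only needs to keep one score per label at each time step. Note that the output is still a distribution over per-frame label sequences, not over a single class label for the whole sequence.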
HCRF
a gesture class detector, where a single class is discriminatively trained against all other gestures;
or as a multi-way gesture classifier, where discriminative models for multiple gestures are simultaneously trained.
The latter approach has the potential to share useful hidden state structures across the different classification tasks, allowing higher recognition rates.
The advantages of MEMMs are that they can model arbitrary features of observation sequences and can therefore accommodate overlapping features.
If we assume that s is observed and that there is a single class label y, then the conditional probability of s given x becomes a regular CRF.
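The reduction can be written out explicitly. The notation below follows the usual HCRF formulation; the potential function Ψ and the parameters θ are not defined in these notes, so the symbols should be read as assumptions rather than as the paper's exact definitions.

```latex
% HCRF: the class posterior marginalizes over hidden-state sequences s
P(y \mid x; \theta)
  = \sum_{s} P(y, s \mid x; \theta)
  = \frac{\sum_{s} \exp \Psi(y, s, x; \theta)}
         {\sum_{y'} \sum_{s'} \exp \Psi(y', s', x; \theta)}

% Special case: s is observed and there is only one class label y,
% so the sums over y' and over the numerator's s disappear
P(s \mid x; \theta)
  = \frac{\exp \Psi(s, x; \theta)}
         {\sum_{s'} \exp \Psi(s', x; \theta)}
```

The second expression has the form of a CRF over the state sequence s, which is the sense in which the HCRF generalizes the CRF.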
In this work, we modify the original HCRF approach to model sequences where the underlying graphical model captures temporal dependencies across frames, and to incorporate long-range dependencies.
An HCRF can learn a discriminative state distribution and can be easily extended to incorporate long-range dependencies.
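To make the role of the hidden states concrete, here is a minimal sketch of how a class posterior could be computed for a chain-structured HCRF by summing over all hidden-state sequences with a forward recursion. The decomposition of the potential into `obs_scores`, `state_label`, and `transition` tables is an assumed parameterization chosen for illustration, and the toy numbers are made up; the paper's actual feature functions are not reproduced here.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def class_log_score(y, obs_scores, state_label, transition):
    """log of the sum over hidden sequences s of exp Psi(y, s, x).

    obs_scores[t][h]    : compatibility of hidden state h with frame t
    state_label[y][h]   : compatibility of hidden state h with class y
    transition[y][g][h] : compatibility of the edge g -> h with class y
    """
    num_states = len(state_label[y])
    alpha = [obs_scores[0][h] + state_label[y][h] for h in range(num_states)]
    for t in range(1, len(obs_scores)):
        alpha = [
            obs_scores[t][h]
            + state_label[y][h]
            + logsumexp([alpha[g] + transition[y][g][h] for g in range(num_states)])
            for h in range(num_states)
        ]
    return logsumexp(alpha)


def class_posterior(obs_scores, state_label, transition):
    """P(y | x) for every class y, normalizing over all classes."""
    scores = [
        class_log_score(y, obs_scores, state_label, transition)
        for y in range(len(state_label))
    ]
    log_z = logsumexp(scores)
    return [math.exp(s - log_z) for s in scores]


# Toy example: 3 frames, 2 hidden states, 2 gesture classes (made-up numbers).
obs_scores = [[0.9, -0.5], [0.2, 0.4], [-0.7, 1.1]]
state_label = [[0.8, -0.2], [-0.3, 0.6]]
transition = [
    [[0.5, 0.1], [-0.4, 0.3]],   # class 0
    [[-0.2, 0.7], [0.0, 0.9]],   # class 1
]

posterior = class_posterior(obs_scores, state_label, transition)
predicted = max(range(len(posterior)), key=posterior.__getitem__)
```

In this sketch the hidden states are shared across classes, and only the class-dependent tables decide how strongly each hidden state or transition supports a given label. The model outputs one label for the entire sequence, in contrast to a CRF's per-frame labels.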
2. What is the relationship between the hidden states of an HCRF and the label?