Conditional Random Fields (CRF)

jackfirst86

Original: http://www.inference.phy.cam.ac.uk/hmw26/crf/

Well written; I will translate it when I have time.

This page contains material on, or relating to, conditional random fields. I shall continue to update this page as research on conditional random fields advances, so do check back periodically. If you feel there is something that should be on here but isn't, then please email me (hmw26 -at- srcf.ucam.org) and let me know.

introduction

Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data, such as sequences, trees and lattices. The underlying idea is that of defining a conditional probability distribution over label sequences given a particular observation sequence, rather than a joint distribution over both label and observation sequences. The primary advantage of CRFs over hidden Markov models (HMMs) is their conditional nature, resulting in the relaxation of the independence assumptions required by HMMs in order to ensure tractable inference. Additionally, CRFs avoid the label bias problem, a weakness exhibited by maximum entropy Markov models (MEMMs) and other conditional Markov models based on directed graphical models. CRFs outperform both MEMMs and HMMs on a number of real-world tasks in many fields, including bioinformatics, computational linguistics and speech recognition.
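To make the idea of a conditional distribution over label sequences concrete, here is a minimal sketch of a linear-chain CRF in plain Python. It is an illustration only, not code from the original page: the names `emit`, `trans`, `sequence_score`, `log_partition` and `log_prob` are hypothetical, and the numbers are made up. The sketch assumes the observation sequence has already been turned into per-position log-potentials, so that p(y | x) is the exponentiated score of y divided by a normalizer Z(x) summed over all label sequences.

```python
import itertools
import math

def sequence_score(labels, emit, trans):
    """Unnormalized log-score of one label sequence.

    emit[t][y]  : assumed log-potential of label y at position t
                  (it may depend on the whole observation sequence)
    trans[a][b] : assumed log-potential of moving from label a to label b
    """
    score = emit[0][labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1]][labels[t]] + emit[t][labels[t]]
    return score

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_partition(emit, trans):
    """log Z(x) via the forward recursion, O(T * K^2) instead of O(K^T)."""
    n_labels = len(trans)
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            emit[t][b] + logsumexp([alpha[a] + trans[a][b] for a in range(n_labels)])
            for b in range(n_labels)
        ]
    return logsumexp(alpha)

def log_prob(labels, emit, trans):
    """log p(labels | observations) under the linear-chain model."""
    return sequence_score(labels, emit, trans) - log_partition(emit, trans)

# Toy example: 3 positions, 2 labels (all values are invented).
emit = [[0.5, 0.1], [0.2, 1.0], [0.3, 0.3]]
trans = [[0.4, -0.2], [-0.5, 0.6]]

# The conditional probabilities of all 2^3 label sequences should sum to 1.
total = sum(
    math.exp(log_prob(y, emit, trans))
    for y in itertools.product(range(2), repeat=3)
)
print(round(total, 6))
```

Because the model only normalizes over label sequences, the potentials in `emit` are free to use arbitrary, overlapping features of the observations, which is the relaxation of independence assumptions mentioned above.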

tutorial

Hanna M. Wallach. Conditional Random Fields: An Introduction. Technical Report MS-CIS-04-21. Department of Computer and Information Science, University of Pennsylvania, 2004.

papers by year

2001

John Lafferty, Andrew McCallum, Fernando Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML-2001), 2001.

We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.

2002

Hanna Wallach. Efficient Training of Conditional Random Fields. M.Sc. thesis, Division of Informatics, University of Edinburgh, 2002.

This thesis explores a number of parameter estimation techniques for conditional random fields, a recently introduced probabilistic model for labelling and segmenting sequential data. Theoretical and practical disadvantages of the training techniques reported in current literature on CRFs are discussed. We hypothesise that general numerical optimisation techniques result in improved performance over iterative scaling algorithms for training CRFs. Experiments run on a subset of a well-known text chunking data set confirm that this is indeed the case. This is a highly promising result, indicating that such parameter estimation techniques make CRFs a practical and efficient choice for labelling sequential data, as well as a theoretically sound and principled probabilistic framework.

Thomas G. Dietterich. Machine Learning for Sequential Data: A Review. In Structural, Syntactic, and Statistical Pattern Recognition; Lecture Notes in Computer Science, Vol. 2396, T. Caelli (Ed.), pp. 15–30, Springer-Verlag, 2002.

Statistical learning problems in many fields involve sequential data. This paper formalizes the principal learning tasks and describes the methods that have been developed within the machine learning research community for addressing these problems. These methods include sliding window methods, recurrent sliding windows, hidden Markov models, conditional random fields, and graph transformer networks. The paper also discusses some open research issues.

2003

Fei Sha and Fernando Pereira. Shallow Parsing with Conditional Random Fields. In Proceedings of the 2003 Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT/NAACL-03), 2003.

Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling tasks in language processing, shallow parsing has received much attention, with the development of standard evaluation datasets and extensive comparison among methods. We show here how to train a conditional random field to achieve performance as good as any reported base noun-phrase chunking method on the CoNLL task, and better than any reported single model. Improved training methods based on modern optimization algorithms were critical in achieving these results. We present extensive comparisons between models and training methods that confirm and strengthen previous results on shallow parsing and training methods for maximum-entropy models.

Andrew McCallum. Efficiently Inducing Features of Conditional Random Fields. In Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence (UAI-2003), 2003.

Conditional Random Fields (CRFs) are undirected graphical models, a special case of which correspond to conditionally-trained finite state machines. A key advantage of CRFs is their great flexibility to include a wide variety of arbitrary, non-independent features of the input. Faced with this freedom, however, an important question remains: what features should be used? This paper presents an efficient feature induction method for CRFs. The method is founded on the principle of iteratively constructing feature conjunctions that would significantly increase conditional log-likelihood if added to the model. Automated feature induction enables not only improved accuracy and dramatic reduction in parameter count, but also the use of larger cliques, and more freedom to liberally hypothesize atomic input variables that may be relevant to a task. The method applies to linear-chain CRFs, as well as to more arbitrary CRF structures, such as Relational Markov Networks, where it corresponds to learning clique templates, and can also be understood as supervised structure learning. Experimental results on named entity extraction and noun phrase segmentation tasks are presented.

David Pinto, Andrew McCallum, Xing Wei and W. Bruce Croft. Table Extraction Using Conditional Random Fields. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), 2003.

The ability to find tables and extract information from them is a necessary component of data mining, question answering, and other information retrieval tasks. Documents often contain tables in order to communicate densely packed, multi-dimensional information. Tables do this by employing layout patterns to efficiently indicate fields and records in two-dimensional form. Their rich combination of formatting and content presents difficulties for traditional language modeling techniques, however. This paper presents the use of conditional random fields (CRFs) for table extraction, and compares them with hidden Markov models (HMMs). Unlike HMMs, CRFs support the use of many rich and overlapping layout and language features, and as a result, they perform significantly better. We show experimental results on plain-text government statistical reports in which tables are located with 92% F1, and their constituent lines are classified into 12 table-related categories with 94% accuracy. We also discuss future work on undirected graphical models for segmenting columns, finding cells, and classifying them as data cells or label cells.

Andrew McCallum and Wei Li. Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons. In Proceedings of the Seventh Conference on Natural Language Learning (CoNLL), 2003.

Wei Li and Andrew McCallum. Rapid Development of Hindi Named Entity Recognition Using Conditional Random Fields and Feature Induction. In ACM Transactions on Asian Language Information Processing (TALIP), 2003.

This paper describes our application of conditional random fields with feature induction to a Hindi named entity recognition task. With only five days development time and little knowledge of this language, we automatically discover relevant features by providing a large array of lexical tests and using feature induction to automatically construct the features that most increase conditional likelihood. In an effort to reduce overfitting, we use a combination of a Gaussian prior and early stopping based on the results of 10-fold cross validation.
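The abstract above does not spell out the training objective. As a rough illustration only, a zero-mean Gaussian prior on CRF weights is commonly written as a penalized conditional log-likelihood. The symbols here are assumptions, not taken from the paper: N training pairs (x^(i), y^(i)), feature weights λ_k, and a single prior variance σ².

```latex
\mathcal{L}(\lambda)
  = \sum_{i=1}^{N} \log p_{\lambda}\bigl(y^{(i)} \mid x^{(i)}\bigr)
  - \sum_{k} \frac{\lambda_k^{2}}{2\sigma^{2}}
```

Under this form, a smaller σ² pulls the weights more strongly toward zero. Early stopping is a separate safeguard: the optimizer is halted before it fully maximizes the objective, with the stopping point chosen by held-out performance.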

Yasemin Altun and Thomas Hofmann. Large Margin Methods for Label Sequence Learning. In Proceedings of 8th European Conference on Speech Communication and Technology (EuroSpeech), 2003.

Label sequence learning is the problem of inferring a state sequence from an observation sequence, where the state sequence may encode a labeling, annotation or segmentation of the sequence. In this paper we give an overview of discriminative methods developed for this problem. Special emphasis is put on large margin methods by generalizing multiclass Support Vector Machines and AdaBoost to the case of label sequences. An experimental evaluation demonstrates the advantages over classical approaches like Hidden Markov Models and the competitiveness with methods like Conditional Random Fields.

Simon Lacoste-Julien. Combining SVM with graphical models for supervised classification: an introduction to Max-Margin Markov Networks. CS281A Project Report, UC Berkeley, 2003.

The goal of this paper is to present a survey of the concepts needed to understand the novel Max-Margin Markov Networks (M³-net) framework, a new formalism invented by Taskar, Guestrin and Koller which combines both the advantages of the graphical models and the Support Vector Machines (SVMs) to solve the problem of multi-label multi-class supervised classification. We will compare generative models, discriminative graphical models and SVMs for this task, introducing the basic concepts at the same time, leading at the end to a presentation of the M³-net paper.

2004

Andrew McCallum, Khashayar Rohanimanesh and Charles Sutton. Dynamic Conditional Random Fields for Jointly Labeling Multiple Sequences. Workshop on Syntax, Semantics, Statistics; 16th Annual Conference on Neural Information Processing Systems (NIPS 2003), 2004.

Conditional random fields (CRFs) for sequence modeling have several advantages over joint models such as HMMs, including the ability to relax strong independence assumptions made in those models, and the ability to incorporate arbitrary overlapping features. Previous work has focused on linear-chain CRFs, which correspond to finite-state machines, and have efficient exact inference algorithms. Often, however, we wish to label sequence data in multiple interacting ways, for example, performing part-of-speech tagging and noun phrase segmentation simultaneously, increasing joint accuracy by sharing information between them. We present dynamic conditional random fields (DCRFs), which are CRFs in which each time slice has a set of state variables and edges, a distributed state representation as in dynamic Bayesian networks, and parameters are tied across slices. (They could also be called conditionally-trained Dynamic Markov Networks.) Since exact inference can be intractable in these models, we perform approximate inference using the tree-based reparameterization framework (TRP). We also present empirical results comparing DCRFs with linear-chain CRFs on natural-language data.

Kevin Murphy, Antonio Torralba and William T. Freeman. Using the forest to see the trees: a graphical model relating features, objects and scenes. In Advances in Neural Information Processing Systems 16 (NIPS 2003), 2004.

Standard approaches to object detection focus on local patches of the image, and try to classify them as background or not. We propose to use the scene context (image as a whole) as an extra source of (global) information, to help resolve local ambiguities. We present a conditional random field for jointly solving the tasks of object detection and scene classification.

Sanjiv Kumar and Martial Hebert. Discriminative Fields for Modeling Spatial Dependencies in Natural Images. In Advances in Neural Information Processing Systems 16 (NIPS 2003), 2004.

In this paper we present Discriminative Random Fields (DRF), a discriminative framework for the classification of natural image regions by incorporating neighborhood spatial dependencies in the labels as well as the observed data. The proposed model exploits local discriminative models and allows to relax the assumption of conditional independence of the observed data given the labels, commonly used in the Markov Random Field (MRF) framework. The parameters of the DRF model are learned using penalized maximum pseudo-likelihood method. Furthermore, the form of the DRF model allows the MAP inference for binary classification problems using the graph min-cut algorithms. The performance of the model was verified on the synthetic as well as the real-world images. The DRF model outperforms the MRF model in the experiments.

Ben Taskar, Carlos Guestrin and Daphne Koller. Max-Margin Markov Networks. In Advances in Neural Information Processing Systems 16 (NIPS 2003), 2004.

In typical classification tasks, we seek a function which assigns a label to a single object. Kernel-based approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ability to use high-dimensional feature spaces, and from their strong theoretical guarantees. However, many real-world tasks involve sequential, spatial, or structured data, where multiple labels must be assigned. Existing kernel-based methods ignore structure in the problem, assigning labels independently to each object, losing much useful information. Conversely, probabilistic graphical models, such as Markov networks, can represent correlations between labels, by exploiting problem structure, but cannot handle high-dimensional feature spaces, and lack strong theoretical generalization guarantees. In this paper, we present a new framework that combines the advantages of both approaches: Maximum margin Markov (M³) networks incorporate both kernels, which efficiently deal with high-dimensional features, and the ability to capture correlations in structured data. We present an efficient algorithm for learning M³ networks based on a compact quadratic program formulation. We provide a new theoretical bound for generalization in structured domains. Experiments on the task of handwritten character recognition and collective hypertext classification demonstrate very significant gains over previous approaches.

Burr Settles. Biomedical Named Entity Recognition Using Conditional Random Fields and Rich Feature Sets. To appear in Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA), 2004.

A demo of the system can be downloaded here.

As the wealth of biomedical knowledge in the form of literature increases, there is a rising need for effective natural language processing tools to assist in organizing, curating, and retrieving this information. To that end, named entity recognition (the task of identifying words and phrases in free text that belong to certain classes of interest) is an important first step for many of these larger information management goals. In recent years, much attention has been focused on the problem of recognizing gene and protein mentions in biomedical abstracts. This paper presents a framework for simultaneously recognizing occurrences of PROTEIN, DNA, RNA, CELL-LINE, and CELL-TYPE entity classes using Conditional Random Fields with a variety of traditional and novel features. I show that this approach can achieve an overall F measure around 70, which seems to be the current state of the art.

Charles Sutton, Khashayar Rohanimanesh and Andrew McCallum. Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), 2004.

In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges, a distributed state representation as in dynamic Bayesian networks (DBNs), and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data.

John Lafferty, Xiaojin Zhu and Yan Liu. Kernel conditional random fields: representation and clique selection. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), 2004.

Kernel conditional random fields (KCRFs) are introduced as a framework for discriminative modeling of graph-structured data. A representer theorem for conditional graphical models is given which shows how kernel conditional random fields arise from risk minimization procedures defined using Mercer kernels on labeled graphs. A procedure for greedily selecting cliques in the dual representation is then proposed, which allows sparse representations. By incorporating kernels and implicit feature spaces into conditional graphical models, the framework enables semi-supervised learning algorithms for structured data through the use of graph kernels. The framework and clique selection methods are demonstrated in synthetic data experiments, and are also applied to the problem of protein secondary structure prediction.

Xuming He, Richard Zemel, and Miguel Á. Carreira-Perpiñán. Multiscale conditional random fields for image labelling. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), 2004.

We propose an approach to include contextual features for labeling images, in which each pixel is assigned to one of a finite set of labels. The features are incorporated into a probabilistic framework which combines the outputs of several components. Components differ in the information they encode. Some focus on the image-label mapping, while others focus solely on patterns within the label field. Components also differ in their scale, as some focus on fine-resolution patterns while others on coarser, more global structure. A supervised version of the contrastive divergence algorithm is applied to learn these features from labeled image data. We demonstrate performance on two real-world image databases and compare it to a classifier and a Markov random field.

Yasemin Altun, Alex J. Smola, Thomas Hofmann. Exponential Families for Conditional Random Fields. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI-2004), 2004.

In this paper we define conditional random fields in reproducing kernel Hilbert spaces and show connections to Gaussian Process classification. More specifically, we prove decomposition results for undirected graphical models and we give constructions for kernels. Finally we present efficient means of solving the optimization problem using reduced rank decompositions and we show how stationarity can be exploited efficiently in the optimization process.

Michelle L. Gregory and Yasemin Altun. Using Conditional Random Fields to Predict Pitch Accents in Conversational Speech. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), 2004.

The detection of prosodic characteristics is an important aspect of both speech synthesis and speech recognition. Correct placement of pitch accents aids in more natural sounding speech, while automatic detection of accents can contribute to better word-level recognition and better textual understanding. In this paper we investigate probabilistic, contextual, and phonological factors that influence pitch accent placement in natural, conversational speech in a sequence labeling setting. We introduce Conditional Random Fields (CRFs) to the pitch accent prediction task in order to incorporate these factors efficiently in a sequence model. We demonstrate the usefulness and the incremental effect of these factors in a sequence model by performing experiments on hand labeled data from the Switchboard Corpus. Our model outperforms the baseline and previous models of pitch accent prediction on the Switchboard Corpus.

Brian Roark, Murat Saraclar, Michael Collins and Mark Johnson. Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), 2004.

This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. However, using the feature set output from the perceptron algorithm (initialized with their weights), CRF training provides an additional 0.5% reduction in word error rate, for a total 1.8% absolute reduction from the baseline of 39.2%.

Ryan McDonald and Fernando Pereira. Identifying Gene and Protein Mentions in Text Using Conditional Random Fields. BioCreative, 2004.

Trausti T. Kristjansson, Aron Culotta, Paul Viola and Andrew McCallum. Interactive Information Extraction with Constrained Conditional Random Fields. In Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI 2004), 2004.

Information Extraction methods can be used to automatically "fill-in" database forms from unstructured data such as Web documents or email. State-of-the-art methods have achieved low error rates but invariably make a number of errors. The goal of an interactive information extraction system is to assist the user in filling in database fields while giving the user confidence in the integrity of the data. The user is presented with an interactive interface that allows both the rapid verification of automatic field assignments and the correction of errors. In cases where there are multiple errors, our system takes into account user corrections, and immediately propagates these constraints such that other fields are often corrected automatically. Linear-chain conditional random fields (CRFs) have been shown to perform well for information extraction and other language modelling tasks due to their ability to capture arbitrary, overlapping features of the input in a Markov model. We apply this framework with two extensions: a constrained Viterbi decoding which finds the optimal field assignments consistent with the fields explicitly specified or corrected by the user; and a mechanism for estimating the confidence of each extracted field, so that low-confidence extractions can be highlighted. Both of these mechanisms are incorporated in a novel user interface for form filling that is intuitive and speeds the entry of data, providing a 23% reduction in error due to automated corrections.
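The constrained Viterbi decoding mentioned in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `constrained_viterbi`, the `constraints` dictionary and the toy scores are all hypothetical, and the sketch assumes a linear-chain model with per-position log-potentials `emit[t][y]` and transition log-potentials `trans[a][b]`. Positions fixed by the user simply have every other label scored as minus infinity, so the best path is forced through the corrected label.

```python
def constrained_viterbi(emit, trans, constraints=None):
    """Highest-scoring label sequence, optionally forcing some positions.

    emit[t][y]  : assumed log-potential of label y at position t
    trans[a][b] : assumed log-potential of moving from label a to label b
    constraints : hypothetical dict {position: required_label}
    """
    constraints = constraints or {}
    n_labels = len(trans)
    neg_inf = float("-inf")

    def allowed(t, y):
        # A position is free unless the user fixed it to a specific label.
        return t not in constraints or constraints[t] == y

    delta = [emit[0][y] if allowed(0, y) else neg_inf for y in range(n_labels)]
    back = []  # back[t-1][b] = best previous label when position t has label b
    for t in range(1, len(emit)):
        new_delta, pointers = [], []
        for b in range(n_labels):
            best_a = max(range(n_labels), key=lambda a: delta[a] + trans[a][b])
            score = delta[best_a] + trans[best_a][b] + emit[t][b]
            new_delta.append(score if allowed(t, b) else neg_inf)
            pointers.append(best_a)
        delta = new_delta
        back.append(pointers)

    # Follow the back-pointers from the best final label.
    path = [max(range(n_labels), key=lambda y: delta[y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Toy scores (invented): 3 positions, 2 labels.
emit = [[3.0, 0.0], [0.0, 0.1], [0.0, 1.5]]
trans = [[1.0, -1.0], [-1.0, 1.0]]

free_path = constrained_viterbi(emit, trans)
fixed_path = constrained_viterbi(emit, trans, constraints={1: 1})
print(free_path, fixed_path)
```

In this toy example, fixing only position 1 also changes the decoded label at position 2, which mirrors the abstract's point that a single user correction can propagate to other fields.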

Thomas G. Dietterich, Adam Ashenfelter and Yaroslav Bulatov. Training Conditional Random Fields via Gradient Tree Boosting. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), 2004.

Conditional Random Fields (CRFs; Lafferty, McCallum, & Pereira, 2001) provide a flexible and powerful model for learning to assign labels to elements of sequences in such applications as part-of-speech tagging, text-to-speech mapping, protein and DNA sequence analysis, and information extraction from web pages. However, existing learning algorithms are slow, particularly in problems with large numbers of potential input features. This paper describes a new method for training CRFs by applying Friedman's (1999) gradient tree boosting method. In tree boosting, the CRF potential functions are represented as weighted sums of regression trees. Regression trees are learned by stage-wise optimizations similar to Adaboost, but with the objective of maximizing the conditional likelihood P(Y|X) of the CRF model. By growing regression trees, interactions among features are introduced only as needed, so although the parameter space is potentially immense, the search algorithm does not explicitly consider the large space. As a result, gradient tree boosting scales linearly in the order of the Markov model and in the order of the feature interactions, rather than exponentially like previous algorithms based on iterative scaling and gradient descent.

John Lafferty, Yan Liu and Xiaojin Zhu. Kernel Conditional Random Fields: Representation, Clique Selection, and Semi-Supervised Learning. Technical Report CMU-CS-04-115, Carnegie Mellon University, 2004.

Kernel conditional random fields are introduced as a framework for discriminative modeling of graph-structured data. A representer theorem for conditional graphical models is given which shows how kernel conditional random fields arise from risk minimization procedures defined using Mercer kernels on labeled graphs. A procedure for greedily selecting cliques in the dual representation is then proposed, which allows sparse representations. By incorporating kernels and implicit feature spaces into conditional graphical models, the framework enables semi-supervised learning algorithms for structured data through the use of graph kernels. The clique selection and semi-supervised methods are demonstrated in synthetic data experiments, and are also applied to the problem of protein secondary structure prediction.

Fuchun Peng and Andrew McCallum. Accurate Information Extraction from Research Papers using Conditional Random Fields. In Proceedings of Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT/NAACL-04), 2004.

With the increasing use of research paper search engines, such as CiteSeer, for both literature search and hiring decisions, the accuracy of such systems is of paramount importance. This paper employs Conditional Random Fields (CRFs) for the task of extracting various common fields from the headers and citation of research papers. The basic theory of CRFs is becoming well-understood, but best-practices for applying them to real-world data require additional exploration. This paper makes an empirical exploration of several factors, including variations on Gaussian, exponential and hyperbolic priors for improved regularization, and several classes of features and Markov order. On a standard benchmark data set, we achieve new state-of-the-art performance, reducing error in average F1 by 36%, and word error rate by 78% in comparison with the previous best SVM results. Accuracy compares even more favorably against HMMs.

Yasemin Altun, Thomas Hofmann and Alexander J. Smola. Gaussian process classification for segmenting and annotating sequences. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), 2004.

Many real-world classification tasks involve the prediction of multiple, inter-dependent class labels. A prototypical case of this sort deals with prediction of a sequence of labels for a sequence of observations. Such problems arise naturally in the context of annotating and segmenting observation sequences. This paper generalizes Gaussian Process classification to predict multiple labels by taking dependencies between neighboring labels into account. Our approach is motivated by the desire to retain rigorous probabilistic semantics, while overcoming limitations of parametric methods like Conditional Random Fields, which exhibit conceptual and computational difficulties in high-dimensional input spaces. Experiments on named entity recognition and pitch accent prediction tasks demonstrate the competitiveness of our approach.

Yasemin Altun and Thomas Hofmann. Gaussian Process Classification for Segmenting and Annotating Sequences. Technical Report CS-04-12, Department of Computer Science, Brown University, 2004.

Multiclass classification refers to the problem of assigning labels to instances where labels belong to some finite set of elements. Often, however, the instances to be labeled do not occur in isolation, but rather in observation sequences. One is then interested in predicting the joint label configuration, i.e. the sequence of labels, using models that take possible interdependencies between label variables into account. This scenario subsumes problems of sequence segmentation and annotation. In this paper, we investigate the use of Gaussian Process (GP) classification for label sequences.

2005

Cristian Sminchisescu, Atul Kanaujia, Zhiguo Li and Dimitris Metaxas. Conditional Models for Contextual Human Motion Recognition. In Proceedings of the International Conference on Computer Vision (ICCV 2005), Beijing, China, 2005.

We present algorithms for recognizing human motion in monocular video sequences, based on discriminative Conditional Random Field (CRF) and Maximum Entropy Markov Models (MEMM). Existing approaches to this problem typically use generative (joint) structures like the Hidden Markov Model (HMM). Therefore they have to make simplifying, often unrealistic assumptions on the conditional independence of observations given the motion class labels and cannot accommodate overlapping features or long term contextual dependencies in the observation sequence. In contrast, conditional models like the CRFs seamlessly represent contextual dependencies, support efficient, exact inference using dynamic programming, and their parameters can be trained using convex optimization. We introduce conditional graphical models as complementary tools for human motion recognition and present an extensive set of experiments that show how these typically outperform HMMs in classifying not only diverse human activities like walking, jumping, running, picking or dancing, but also for discriminating among subtle motion styles like normal walk and wander walk.

Ariadna Quattoni, Michael Collins and Trevor Darrell. Conditional Random Fields for Object Recognition. In Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005.

We present a discriminative part-based approach for the recognition of object classes from unsegmented cluttered scenes. Objects are modeled as flexible constellations of parts conditioned on local observations found by an interest operator. For each object class the probability of a given assignment of parts to local features is modeled by a Conditional Random Field (CRF). We propose an extension of the CRF framework that incorporates hidden variables and combines class conditional CRFs into a unified framework for part-based object recognition. The parameters of the CRF are estimated in a maximum likelihood framework and recognition proceeds by finding the most likely class under our model. The main advantage of the proposed CRF framework is that it allows us to relax the assumption of conditional independence of the observed data (i.e. local features) often used in generative approaches, an assumption that might be too restrictive for a considerable number of object classes. We illustrate the potential of the model in the task of recognizing cars from rear and side views.

Joseph Bockhorst and Mark Craven. Markov Networks for Detecting Overlapping Elements in Sequence Data. In Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005.

Many sequential prediction tasks involve locating instances of patterns in sequences. Generative probabilistic language models, such as hidden Markov models (HMMs), have been successfully applied to many of these tasks. A limitation of these models, however, is that they cannot naturally handle cases in which pattern instances overlap in arbitrary ways. We present an alternative approach, based on conditional Markov networks, that can naturally represent arbitrarily overlapping elements. We show how to efficiently train and perform inference with these models. Experimental results from a genomics domain show that our models are more accurate at locating instances of overlapping patterns than are baseline models based on HMMs.

Antonio Torralba, Kevin P. Murphy, William T. Freeman. Contextual models for object detection using boosted random fields. In Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005.

We seek to both detect and segment objects in images. To exploit both local image data as well as contextual information, we introduce Boosted Random Fields (BRFs), which uses Boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes.

Sunita Sarawagi and William W. Cohen. Semi-Markov Conditional Random Fields for Information Extraction. In Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005.

We describe semi-Markov conditional random fields (semi-CRFs), a conditionally trained version of semi-Markov chains. Intuitively, a semi-CRF on an input sequence x outputs a "segmentation" of x, in which labels are assigned to segments (i.e., subsequences) of x rather than to individual elements x_i of x. Importantly, features for semi-CRFs can measure properties of segments, and transitions within a segment can be non-Markovian. In spite of this additional power, exact learning and inference algorithms for semi-CRFs are polynomial-time, often only a small constant factor slower than conventional CRFs. In experiments on five named entity recognition problems, semi-CRFs generally outperform conventional CRFs.
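
To make the idea of labeling segments rather than individual elements concrete, here is a minimal sketch of segment-level Viterbi decoding. It is an illustration under stated assumptions, not the authors' implementation: `segment_score`, `max_len`, and the toy scoring function `toy_score` are hypothetical names invented for this example, and a trained semi-CRF would compute segment scores from learned feature weights.

```python
def semi_markov_viterbi(n, labels, segment_score, max_len):
    """Best segmentation of positions 0..n-1 (n >= 1) into labeled segments.

    segment_score(start, end, label, prev_label) scores the segment covering
    positions start..end-1; prev_label is None for the first segment.
    max_len bounds the segment length, giving O(n * max_len * |labels|^2) work.
    """
    neg_inf = float("-inf")
    # best[end][y]: best score of a segmentation of the first `end` positions
    # whose last segment carries label y; back[end][y] stores (start, prev_label).
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            for y in labels:
                if start == 0:
                    candidates = [(segment_score(0, end, y, None), None)]
                else:
                    candidates = [
                        (best[start][p] + segment_score(start, end, y, p), p)
                        for p in labels
                        if best[start][p] > neg_inf
                    ]
                for score, p in candidates:
                    if score > best[end][y]:
                        best[end][y] = score
                        back[end][y] = (start, p)
    # Backtrack from the best label of the final segment.
    y = max(labels, key=lambda lab: best[n][lab])
    total = best[n][y]
    segments, end = [], n
    while end > 0:
        start, prev = back[end][y]
        segments.append((start, end, y))
        end, y = start, prev
    segments.reverse()
    return segments, total


# Toy example: a hand-written scorer that rewards long capitalized "LOC" spans.
tokens = ["visit", "New", "York", "City", "today"]


def toy_score(start, end, label, prev_label):
    span = tokens[start:end]
    if label == "LOC":
        # A segment-level feature: the whole span must be capitalized.
        return float(len(span) ** 2) if all(w[0].isupper() for w in span) else -5.0
    if len(span) == 1:
        return 0.5 if span[0][0].islower() else -1.0
    return -5.0  # discourage multi-token "O" segments


segments, total = semi_markov_viterbi(len(tokens), ["O", "LOC"], toy_score, max_len=3)
print(segments, total)
```

Because the score is attached to a whole segment, a feature such as "every token in the span is capitalized" can be expressed directly, which a per-token model cannot do.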

Yuan Qi, Martin Szummer and Thomas P. Minka. Bayesian Conditional Random Fields. To appear in Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS 2005), 2005.

We propose Bayesian Conditional Random Fields (BCRFs) for classifying interdependent and structured data, such as sequences, images or webs. BCRFs are a Bayesian approach to training and inference with conditional random fields, which were previously trained by maximizing likelihood (ML) (Lafferty et al., 2001). Our framework eliminates the problem of overfitting, and offers the full advantages of a Bayesian treatment. Unlike the ML approach, we estimate the posterior distribution of the model parameters during training, and average over this posterior during inference. We apply an extension of EP method, the power EP method, to incorporate the partition function. For algorithmic stability and accuracy, we flatten the approximation structures to avoid two-level approximations. We demonstrate the superior prediction accuracy of BCRFs over conditional random fields trained with ML or MAP on synthetic and real datasets.
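
The contrast this abstract draws between ML training and the Bayesian treatment can be written out in generic terms. The notation here is an assumption made for illustration and is not taken from the paper: w denotes the model parameters, D = {(x^{(i)}, y^{(i)})} the training data, and (x^*, y^*) a test input and its labeling.

```latex
% ML training: a single point estimate of the parameters
\hat{w}_{\mathrm{ML}} = \arg\max_{w} \prod_{i} p\bigl(y^{(i)} \mid x^{(i)}, w\bigr)

% Bayesian training: a posterior distribution over the parameters
p(w \mid D) \propto p(w) \prod_{i} p\bigl(y^{(i)} \mid x^{(i)}, w\bigr)

% Bayesian inference: average predictions over that posterior
p(y^{*} \mid x^{*}, D) = \int p(y^{*} \mid x^{*}, w)\, p(w \mid D)\, dw
```

Each factor p(y | x, w) contains the CRF partition function, so the posterior has no closed form; per the abstract, this is where the authors bring in the power EP approximation.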

Aron Culotta, David Kulp and Andrew McCallum. Gene Prediction with Conditional Random Fields. Technical Report UM-CS-2005-028. University of Massachusetts, Amherst, 2005.

Given a sequence of DNA nucleotide bases, the task of gene prediction is to find subsequences of bases that encode proteins. Reasonable performance on this task has been achieved using generatively trained sequence models, such as hidden Markov models. We propose instead the use of a discriminatively trained sequence model, the conditional random field (CRF). CRFs can naturally incorporate arbitrary, non-independent features of the input without making conditional independence assumptions among the features. This can be particularly important for gene finding, where including evidence from protein databases, EST data, or tiling arrays may improve accuracy. We evaluate our model on human genomic data, and show that CRFs perform better than HMM-based models at incorporating homology evidence from protein databases, achieving a 10% reduction in base-level errors.

Yang Wang and Qiang Ji. A Dynamic Conditional Random Field Model for Object Segmentation in Image Sequences. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), Volume 1, 2005.

This paper presents a dynamic conditional random field (DCRF) model to integrate contextual constraints for object segmentation in image sequences. Spatial and temporal dependencies within the segmentation process are unified by a dynamic probabilistic framework based on the conditional random field (CRF). An efficient approximate filtering algorithm is derived for the DCRF model to recursively estimate the segmentation field from the history of video frames. The segmentation method employs both intensity and motion cues, and it combines dynamic information and spatial interaction of the observed data. Experimental results show that the proposed approach effectively fuses contextual constraints in video sequences and improves the accuracy of object segmentation.

software

MALLET: A Machine Learning for Language Toolkit.

MALLET is an integrated collection of Java code useful for statistical natural language processing, document classification, clustering, information extraction, and other machine learning applications to text.

ABNER: A Biomedical Named Entity Recognizer.

ABNER is a text analysis tool for molecular biology. It is essentially an interactive, user-friendly interface to a system designed as part of the NLPBA/BioNLP 2004 Shared Task challenge.

MinorThird.