Four tutorial-style articles on text analysis worth reading carefully

From: http://jacoxu.com/?p=415

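These four articles are mentioned frequently; the original post is at: http://blog.sciencenet.cn/blog-611051-535693.html
They are certainly not the only four that cover text analysis in depth; they are simply the ones I have read so far. Recommendations of other good tutorial-style articles are welcome.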

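First: a detailed introduction to parameter estimation for discrete data, rather than using the Gaussian distribution as the running example as most textbooks do. In my view the part most worth reading is its inference for LDA via Gibbs sampling, where the derivations are worked out in great detail; it is required reading for many people learning LDA and related topic models.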
@TECHREPORT{Hei09,
author = {Heinrich, Gregor},
title = {Parameter Estimation for Text Analysis},
institution = {vsonix GmbH and University of Leipzig},
year = {2009},
type = {Technical Report Version 2.9},
abstract = {Presents parameter estimation methods common with discrete probability
distributions, which is of particular interest in text modeling.
Starting with maximum likelihood, a posteriori and Bayesian estimation,
central concepts like conjugate distributions and Bayesian networks
are reviewed. As an application, the model of latent Dirichlet allocation
(LDA) is explained in detail with a full derivation of an approximate
inference algorithm based on Gibbs sampling, including a discussion
of Dirichlet hyperparameter estimation.},
}
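
To give a feel for the kind of discrete-data estimation the report covers, here is a minimal sketch (not taken from the report itself) that contrasts the maximum-likelihood estimate of word probabilities with the posterior mean under a symmetric Dirichlet prior. The function name `estimate_word_probs` and the parameter `alpha` are my own illustrative choices, and the toy vocabulary is made up.

```python
from collections import Counter

def estimate_word_probs(tokens, vocab, alpha=0.0):
    """Estimate multinomial word probabilities from observed tokens.

    alpha is a symmetric Dirichlet pseudo-count (illustrative name):
    alpha = 0 gives the maximum-likelihood estimate n_w / N, while
    alpha > 0 gives the posterior mean (n_w + alpha) / (N + V * alpha).
    """
    counts = Counter(tokens)
    # Only count tokens that belong to the vocabulary.
    n_total = sum(counts[w] for w in vocab)
    denom = n_total + alpha * len(vocab)
    return {w: (counts[w] + alpha) / denom for w in vocab}

# Toy data (assumed for illustration only).
vocab = ["topic", "model", "gibbs"]
tokens = ["topic", "topic", "model"]

ml = estimate_word_probs(tokens, vocab)               # unseen word gets 0
bayes = estimate_word_probs(tokens, vocab, alpha=1.0)  # unseen word gets mass
```

With `alpha = 0` the unseen word "gibbs" gets probability zero, while any positive `alpha` smooths the estimate so that every vocabulary word keeps some mass.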

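Second: as its abstract states, this one is particularly well suited to computer scientists. It involves relatively little mathematics, so it fits readers who are less concerned with mathematical detail. "Uninitiated" means roughly "layperson" or "novice", which makes it clear what Resnik and Hardisty set out to do with this paper.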
@TECHREPORT{RH10,
author = {Resnik, Philip and Hardisty, Eric},
title = {Gibbs Sampling for the Uninitiated},
institution = {University of Maryland},
year = {2010},
type = {Technical Report CS-TR-4956, UMIACS-TR-2010-04, LAMP-153},
abstract = {This document is intended for computer scientists who would like to
try out a Markov Chain Monte Carlo (MCMC) technique, particularly
in order to do inference with Bayesian models on problems related
to text processing. We try to keep theory to the absolute minimum
needed, though we work through the details much more explicitly than
you usually see even in ``introductory'' explanations. That means we've
attempted to be ridiculously explicit in our exposition and notation.

After providing the reasons and reasoning behind Gibbs sampling (and
at least nodding our heads in the direction of theory), we work through
an example application in detail---the derivation of a Gibbs sampler
for a Na\"{i}ve Bayes model. Along with the example, we discuss some
practical implementation issues, including the integrating out of
continuous parameters when possible. We conclude with some pointers
to literature that we've found to be somewhat more friendly to uninitiated
readers.

Note: as of June 3, 2010 we have corrected some small errors in the
original April 2010 report.},
keywords = {Gibbs Sampling; Markov Chain Monte Carlo; Na\"{i}ve Bayes; Bayesian
Inference; Tutorial},
url = {http://drum.lib.umd.edu/bitstream/1903/10058/3/gsfu.pdf}
}

Third: Knight is a leading researcher well known to everyone working in NLP, so he needs no further introduction.
@ELECTRONIC{Kni09,
author = {Knight, Kevin},
title = {Bayesian Inference with Tears: A Tutorial Workbook for Natural Language
Researchers},
url = {http://www.isi.edu/natural-language/people/bayes-with-tears.pdf},
}

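Fourth: co-written by Blei, often called the father of LDA, and his student Gershman, this gives a detailed introduction to Bayesian nonparametric models, and discusses the Chinese Restaurant Process (CRP) and the Indian Buffet Process in a particularly intuitive way.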
@ARTICLE{GB11,
author = {Gershman, Samuel J. and Blei, David M.},
title = {A Tutorial on Bayesian Nonparametric Models},
journal = {Journal of Mathematical Psychology},
year = {2011},
abstract = {A key problem in statistical modeling is model selection, that is,
how to choose a model at an appropriate level of complexity. This
problem appears in many settings, most prominently in choosing the
number of clusters in mixture models or the number of factors in
factor analysis. In this tutorial, we describe Bayesian nonparametric
methods, a class of methods that side-steps this issue by allowing
the data to determine the complexity of the model. This tutorial
is a high-level introduction to Bayesian nonparametric methods and
contains several examples of their application.},
keywords = {Bayesian Methods; Chinese Restaurant Process; Indian Buffet Process},
}
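
For readers who have not met the Chinese Restaurant Process before, the following is a minimal simulation sketch of the standard CRP seating rule, not code from the tutorial. The function name `sample_crp` and its parameters `num_customers`, `concentration`, and `seed` are my own illustrative choices; `concentration` is assumed to be positive.

```python
import random

def sample_crp(num_customers, concentration, seed=0):
    """Simulate table assignments under a Chinese Restaurant Process.

    Each new customer joins existing table k with probability
    proportional to its current size, or opens a new table with
    probability proportional to `concentration` (assumed > 0).
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for _ in range(num_customers):
        # The last weight corresponds to opening a new table.
        weights = table_sizes + [concentration]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[table] += 1
        assignments.append(table)
    return assignments, table_sizes

assignments, table_sizes = sample_crp(num_customers=50, concentration=1.0)
```

The number of occupied tables is not fixed in advance; it grows with the data, which is the "let the data determine the complexity" idea the abstract describes. Larger values of `concentration` tend to produce more tables.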


