概率估计（极大似然估计、拉普拉斯平滑定理、M-估计的关系）

最新推荐文章于 2024-07-07 14:35:39 发布

cyningsun

最新推荐文章于 2024-07-07 14:35:39 发布

阅读量1.2w

点赞数 2

分类专栏：文本分类文章标签：机器学习文本分类

文本分类专栏收录该内容

9 篇文章 0 订阅

订阅专栏

英文原文链接：http://www.temida.si/~bojan/probability_estimation.php

原文：

Probability estimation

1 Introduction

Let us assume that in an experiment we have conducted n independent trials, of which there are r successes. The rest of the trials (n-r) are failures. Examples of such experiments are tossing a coin (e.g. success is a head), throwing a dice (e.g. success is 6 on top), shooting a free throw in basketball (success is a scored point), etc.

In this report we will discuss the following question: how to estimate the probability of success in the next (n+1) trial.

In addition, we would like to devote special attention to the cases when the sample size is effectively small. In this context we will discuss and compare three methods: relative frequency, Laplace's law of succession and m-estimate.

What exactly do we mean by effectively small sample size? Good (1965) states then when there are more than three successes and more than three failures in our examples set, there is little difference between any of the described methods for many practical purposes. So, we will basically be dealing with the cases when either the number of successes or the number of failures are small (e.g. 0, 1, 2). Note that such situations can occur also when the actual sample size is large,

but we divide the trial set to subsets that fulfill certain conditions for the purpose of estimating conditional probabilities in such subsets.

2 Estimates for success in the next trial

2.1 Relative frequency

Relative frequency is sometimes called also maximum likelihood estimate. Probability of success in the nest trial is computed according to the following formula:

P=r/n

The estimation of probabilities can be regarded as a relatively simple task when the sample size is large enough. In such case we would hardly require any theory. Bernoulli's theorem states that when n is large enough, the probability of success in the next trial can be reliably estimated by the relative frequency P=r/n More formally, for any arbitrary small ε and δ such n₀ can be found that for every n≥n₀ the following inequation holds:

P(|r/n -P| <ε) >1 - δ

However, after completing just one trial, which was a failure, the relative frequency probability estimate of a success in the next trial would be 0.

2.2 Laplace's law of succession

In order to alleviate such zero probability estimations, a modified formula was proposed:

P = (r+1)/(n+2)

In this formula a uniform prior probability is assumed (Good, 1965). In fact, the rationale behind adding 1 in the numerator and 2 in denominator is the following:

before performing the actual experiment, we assume that there were two trials, one successful and one failure.

2.3 Bayesian m-estimate

More general Bayesian probability estimate is described in (Cestnik, 1990, 1991). To calculate a general Bayesian probability estimate on the basis of evidence,

a prior probability distribution has to be assumed first. Then, given the evidence, the prior distribution is updated to a posterior one, from which the expectation can be taken as a point estimate of p. The task of determining a prior probability distribution has been identified as an intrinsic difficulty of the Bayesian approach.

P=(r + P_am)/(n + m)

3 Estimation of conditional probabilities

The basic idea behind the proposed m-estimate (Cestnik, 1990, 1991) for estimating conditional probabilities is that the prior probabilities can be estimated from an unconditional sample. The remaining parameter m, which is related to the variance, has to be determined also. The prior variance is computed by the following formula:

Var(p) = P_a(1-P_a)/(m+1)

The parameter m is inversely proportional to the variance of the prior distribution. It also leverages the tradeoff between relative frequency and prior probability, as can be observed from the following form of m-estimate:

P=n/(n+m) * r/n + m/(n+m) * P_a

4 Literature

B. Cestnik: Estimating probabilities in machine learning, Ph.D. thesis, University of Ljubljana, Faculty of Computer and Information Science, 1991.

B. Cestnik: Estimating probabilities: A crucial task in machine learning. In: Carlucci Aiello, Luigia (ed.). ECAI 90. London: Pitman, 1990, str. 147-149.

B. Cestnik, I. Bratko: On estimating probabilities in tree pruning. In: Kodratoff, Yves. Machine learning - EWSL-91 : European working session on learning, Porto, Portugal, March 6-8, 1991: proceedings, (Lecture notes in computer science, Lecture notes in artificial intelligence, 482). Berlin [etc.]: Springer-Verlag, 1991, str. 138-150.

S. Džeroski, B. Cestnik, I. Petrovski: Using the m-estimate in rule induction. CIT. J. Comput. Inf. Technol., 1993, vol. 1, str. 37-46

I. J. Good: The Estimation of Probabilities: An Essay on Modern Bayesian Methods, Cambridge, Mass., M.I.T. Press, 1965.

译文：

概率估计

1引言

假设在一次实验中，我们进行了n次独立试验，其中有r次成功。其余的试验（n-r）失败。这样的试验例如：掷硬币（例如，头像的面为成功），掷骰子（例如，六朝上为成功），篮球中的罚球投篮（一个得分点是成功），等等。

在这份报告中，我们将讨论以下问题：如何估计下次（第n +1次）试验成功的概率。此外，我们将特别关注样例大小足够小的情况。在这种情况下我们将讨论和比较三种方法：相对频率，拉普拉斯平滑定理和M-估计。

足够小究竟是什么意思？Good（1965）指出当在我们的样例集中有三个以上的成功和三个以上的失败时，对于许多实际应用来说以上鸟叔的任何方法之间差别不大。所以，我们基本上要处理的情况是失败的数目或者成功的数目是很小的（如，0,1，2）。注意在这种情况下也常发生在样本大小很大时，但我们将分割试验集成子集以满足一定条件来估计这些子集上的条件概率。

2估计下次试验成功的概率

2.1相对频率

有时也称相对概率为极大似然估计。下次试验成功的概率按照以下公式计算：

P=r/n

当样本数量足够大的概率估计，可以看作是一个相对简单的任务。在这种情况下，我们就不需要任何理论。伯努利定理之处，当n足够大时，下次试验成功的概率可以可信的按相对频率P=r/n估计。更正式地说，对于任何任意小ε和δ，存在n₀使得对于每一个n≥n₀，有以下不等式：