Beta Distribution


Introduction

As a computer science student, I had a hard time understanding the Beta distribution. What made it worse is that every time I tried to look something up on the internet, most sources focused only on the pdf (probability density function).

In this post, I am going to talk about the Beta distribution and some intuitive interpretations behind it.

An Example

Suppose we have two coins (A and B), and we run a statistical experiment to identify whether these coins are biased or not.

For coin A, we tossed it 5 times and the results are: 1,0,0,0,0. (1 indicates Tail and 0 indicates Head)

For coin B, we tossed 10 times and the results are: 1,1,0,0,0,0,0,0,0,0.

The observed proportion of Tails for these two coins is identical: 0.2. Is it safe to say that both coins equally favor Tail?

The answer is no. According to the law of large numbers (LLN), the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.

That means that, although the standard deviations of the two samples are the same, the standard error of A is larger than that of B because of its smaller sample size. (Note that the standard error of a sample average measures the rough size of the difference between the population average and the sample average.)
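As a rough illustration of the standard-error argument, the sketch below uses the usual formula for the standard error of a sample proportion, sqrt(p(1-p)/n). The function name `standard_error` is made up for this example, and the plug-in value 0.2 is simply the observed Tail proportion from the two experiments.

```python
import math

def standard_error(p_hat, n):
    """Standard error of a sample proportion p_hat estimated from n tosses."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)

# Both coins show the same observed Tail proportion; only the sample size differs.
se_a = standard_error(0.2, 5)    # coin A: 5 tosses
se_b = standard_error(0.2, 10)   # coin B: 10 tosses

print(f"coin A: SE = {se_a:.4f}")
print(f"coin B: SE = {se_b:.4f}")
```

The per-toss spread p(1-p) is the same for both coins, so the only thing shrinking the standard error of coin B is the larger n.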

Now we know that the expected outcomes of coins A and B are the same, while our confidence in the two estimates differs. Let $P_A$ denote the probability for A to be Tail, and $P_B$ the probability for B to be Tail. We want to know how plausible each value between 0 and 1 is for $P_A$ and $P_B$. In other words, we want a probability (uncertainty) over probabilities.

Below is the distribution (uncertainty) that we wanted. (The code that generates the image below is linked in the original post.)

In the above graph, the red line illustrates coin A, the green line illustrates coin B, and the blue line corresponds to another coin with 80 Heads and 20 Tails.

From the red curve (coin A), we can see that, even though there is one Tail in the five tosses, the density of its Tail probability peaks around zero. That means the most probable value of the probability for coin A to be Tail is close to zero.

From the green curve (coin B), we can see that the peak is close to 0.15. That means the most probable value of the probability for coin B to be Tail is close to 0.15.

From the blue curve, the peak is close to 0.2. That means the most probable value of the probability for it to be Tail is close to 0.2.

Also, we can see that although the expected probabilities for these coins to be Tail are the same (0.2), the shapes of the probability distributions are different. The more data we get, the more the distribution concentrates in a small region.

That is the Beta distribution.
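The three curves can be reproduced numerically with a small sketch. It assumes, as my reading of the plot description, that each coin is modelled as Beta(number of Tails, number of Heads), so coin A is Beta(1, 4), coin B is Beta(2, 8) and the third coin is Beta(20, 80). The helper names `beta_pdf` and `grid_peak` are invented for this example, and the density is only handled for parameters of at least 1.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density at x, assuming a >= 1 and b >= 1."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    # log B(a, b) via log-gamma keeps the normalizer stable for large counts.
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return x ** (a - 1) * (1.0 - x) ** (b - 1) / math.exp(log_norm)

def grid_peak(a, b, steps=1000):
    """Location of the highest density value on an evenly spaced grid."""
    xs = [i / steps for i in range(steps + 1)]
    return max(xs, key=lambda x: beta_pdf(x, a, b))

# (Tails, Heads) counts for the three coins, used as (alpha, beta).
coins = {"A": (1, 4), "B": (2, 8), "C": (20, 80)}
peaks = {name: grid_peak(a, b) for name, (a, b) in coins.items()}

for name, peak in peaks.items():
    print(f"coin {name}: peak near {peak:.3f}")
```

Under these assumptions the peak for coin A sits exactly at 0, the peak for coin B is at 0.125 (roughly where the green curve tops out), and the peak for the third coin is just below 0.2.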

Definition

The pdf of Beta Distribution is:

$$
P(x) =
\begin{cases}
\dfrac{(1-x)^{\beta-1}\,x^{\alpha-1}}{B(\alpha,\beta)}, & x \in [0,1] \\
0, & \text{otherwise}
\end{cases}
$$

where $B(\alpha,\beta)$ is a normalizing constant that makes the pdf integrate to 1 over $[0,1]$. (The density itself may take values larger than 1.)

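$$
\begin{aligned}
B(\alpha,\beta) &= \int_0^1 y^{\alpha-1}(1-y)^{\beta-1}\,dy \\
&= \left[\, y^{\alpha-1}\left(-\frac{(1-y)^{\beta}}{\beta}\right) \right]_0^1 + \frac{\alpha-1}{\beta}\int_0^1 y^{\alpha-2}(1-y)^{\beta}\,dy \\
&= 0 + \frac{\alpha-1}{\beta}\int_0^1 y^{\alpha-2}(1-y)^{\beta}\,dy \\
&= \frac{\alpha-1}{\beta}\,B(\alpha-1,\beta+1) \\
&= \frac{(\alpha-1)(\alpha-2)\cdots 1}{\beta(\beta+1)\cdots(\beta+\alpha-2)}\int_0^1 (1-y)^{\alpha+\beta-2}\,dy \\
&= \frac{(\alpha-1)(\alpha-2)\cdots 1}{\beta(\beta+1)\cdots(\beta+\alpha-1)} \\
&= \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}
\end{aligned}
$$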

where $\Gamma(x)$ is the Gamma function. For a positive integer $x$,

$$
\Gamma(x) = (x-1)!
$$
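The chain of equalities for $B(\alpha,\beta)$ can be sanity-checked numerically. The sketch below compares three routes: the Gamma-function form, the product form from the integration-by-parts recursion (which assumes an integer $\alpha$), and a plain midpoint-rule integral. All three function names are made up for this example.

```python
import math

def beta_fn_gamma(a, b):
    """B(a, b) from the closed form Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_fn_product(a, b):
    """B(a, b) as (a-1)! / (b (b+1) ... (b+a-1)); requires a positive integer a."""
    denominator = 1.0
    for k in range(a):
        denominator *= b + k
    return math.factorial(a - 1) / denominator

def beta_fn_integral(a, b, steps=20_000):
    """B(a, b) by midpoint-rule integration of y^(a-1) * (1-y)^(b-1) on [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y ** (a - 1) * (1.0 - y) ** (b - 1)
    return total * h

a, b = 2, 8  # coin B from the example, assuming alpha = Tails and beta = Heads
print(beta_fn_gamma(a, b), beta_fn_product(a, b), beta_fn_integral(a, b))
```

For these parameters the exact value is 1/72, and the three routes should agree up to the integration error.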

The Beta distribution can express a wide range of pdf shapes; the above graph shows several of them.

Mean

The expected value of the Beta distribution is $\frac{\alpha}{\alpha+\beta}$, which answers the intuitive question of why coin A and coin B have the same expected value.

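$$
\eta = \int_0^1 x\,P(x)\,dx = \int_0^1 x\,\frac{(1-x)^{\beta-1}\,x^{\alpha-1}}{B(\alpha,\beta)}\,dx = \frac{\alpha}{\alpha+\beta}
$$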
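The last equality skips a step. One way to fill it in, using only the Gamma form of $B(\alpha,\beta)$ and the identity $\Gamma(z+1) = z\,\Gamma(z)$, is to notice that the integrand is an unnormalized Beta density with $\alpha$ raised by one:

$$
\begin{aligned}
\eta &= \frac{1}{B(\alpha,\beta)}\int_0^1 x^{\alpha}(1-x)^{\beta-1}\,dx
= \frac{B(\alpha+1,\beta)}{B(\alpha,\beta)} \\
&= \frac{\Gamma(\alpha+1)\,\Gamma(\beta)}{\Gamma(\alpha+\beta+1)}\cdot\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}
= \frac{\alpha\,\Gamma(\alpha)}{(\alpha+\beta)\,\Gamma(\alpha+\beta)}\cdot\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)}
= \frac{\alpha}{\alpha+\beta}
\end{aligned}
$$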

Variance

The variance of a Beta distribution is:

$$
\operatorname{var}(X) = E\left[(x-\eta)^2\right] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}
$$

This explains why, when the expected values are the same, the dispersion of the Beta distribution becomes smaller and smaller as the number of trials grows.
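Plugging the three coins from the example into the mean and variance formulas makes this concrete. The sketch again assumes each coin is Beta(number of Tails, number of Heads); `beta_mean` and `beta_var` are names chosen for this example.

```python
def beta_mean(a, b):
    """Mean of Beta(a, b): a / (a + b)."""
    return a / (a + b)

def beta_var(a, b):
    """Variance of Beta(a, b): a*b / ((a + b)^2 * (a + b + 1))."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

# (Tails, Heads) counts used as (alpha, beta).
coins = {"A": (1, 4), "B": (2, 8), "C": (20, 80)}
stats = {name: (beta_mean(a, b), beta_var(a, b)) for name, (a, b) in coins.items()}

for name, (mean, var) in stats.items():
    print(f"coin {name}: mean = {mean:.3f}, variance = {var:.5f}")
```

All three means come out as 0.2, while the variance drops as the total count $\alpha+\beta$ grows from 5 to 10 to 100.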

Conjugate Prior

One important application of the Beta distribution is that it can be used as a conjugate prior for the binomial distribution in Bayesian analysis.

In Bayesian probability theory, if the posterior distributions  P(h|D)  are in the same family as the prior probability distribution  P(h) , the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior.

A conjugate prior is an algebraic convenience, giving a closed-form expression for the posterior: otherwise a difficult numerical integration may be necessary. Further, conjugate priors may give intuition, by more transparently showing how a likelihood function updates a prior distribution.

Suppose that, for a binomial process, we have already observed $\alpha$ successes and $\beta$ failures. We use this information as a prior to model a further $s$ successes and $f$ failures.

The prior is a Beta distribution:

$$
P(x) = \frac{(1-x)^{\beta-1}\,x^{\alpha-1}}{B(\alpha,\beta)}
$$

The likelihood is a Binomial distribution:

$$
P(s,f \mid x) = \binom{s+f}{s}\,x^{s}(1-x)^{f}
$$

The posterior is another Beta distribution:

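$$
P(x \mid s,f) = \frac{P(s,f \mid x)\,P(x)}{\int P(s,f \mid x)\,P(x)\,dx} = \frac{x^{s+\alpha-1}(1-x)^{f+\beta-1}}{B(s+\alpha,\,f+\beta)} = \mathrm{Beta}(s+\alpha,\,f+\beta)
$$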

This posterior distribution could then be used as the prior for more samples, with the hyperparameters simply adding each extra piece of information as it comes.
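The update rule is small enough to sketch directly. In the example below the function names are invented, the prior Beta(1, 4) is an assumption that reuses coin A's counts (Tails as successes), and the extra observations are made-up batches chosen so that the posteriors land on Beta(2, 8) and Beta(20, 80). The last lines check Bayes' rule numerically: likelihood times prior divided by the posterior density should be the same constant (the evidence) at every $x$.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density at x in [0, 1], assuming a >= 1 and b >= 1."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return x ** (a - 1) * (1.0 - x) ** (b - 1) / math.exp(log_norm)

def binomial_likelihood(x, s, f):
    """Probability of s successes and f failures when the success probability is x."""
    return math.comb(s + f, s) * x ** s * (1.0 - x) ** f

def update(alpha, beta, s, f):
    """Conjugate update: Beta(alpha, beta) prior plus (s, f) data gives Beta(alpha+s, beta+f)."""
    return alpha + s, beta + f

prior = (1, 4)                                # assumed prior, coin A's counts
posterior = update(*prior, s=1, f=4)          # after 5 more tosses
posterior_2 = update(*posterior, s=18, f=72)  # after 90 more tosses
print(posterior, posterior_2)

# Bayes' rule check: this ratio should not depend on x.
ratios = [
    binomial_likelihood(x, 1, 4) * beta_pdf(x, *prior) / beta_pdf(x, *posterior)
    for x in (0.1, 0.3, 0.7)
]
print(ratios)
```

Updating in two batches gives the same hyperparameters as a single update with all the data, which is what makes the posterior reusable as the next prior.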

Reposted from: http://xiangacadia.github.io/statistics/2014/07/29/Beta-Distribution.html