Notes on the Dirichlet Distribution and Dirichlet Process

Notes on the Dirichlet Distribution and Dirichlet Process

In [3]:
%matplotlib inline
 

Note: I wrote this post in an IPython notebook. It might be rendered better on NBViewer.

Dirichlet Distribution

The symmetric Dirichlet distribution (DD) can be considered a distribution of distributions. Each sample from the DD is acategorial distribution over K categories. It is parameterized G0, a distribution over K categories and α, a scale factor.

The expected value of the DD is G0. The variance of the DD is a function of the scale factor. When α is large, samples from DD(αG0) will be very close to G0. When α is small, samples will vary more widely.

We demonstrate below by setting G0=[.2,.2,.6] and varying α from 0.1 to 1000. In each case, the mean of the samples is roughly G0, but the standard deviation is decreases as α increases.

In [10]:
import numpy as np
from scipy.stats import dirichlet np.set_printoptions(precision=2) def stats(scale_factor, G0=[.2, .2, .6], N=10000): samples = dirichlet(alpha = scale_factor * np.array(G0)).rvs(N) print " alpha:", scale_factor print " element-wise mean:", samples.mean(axis=0) print "element-wise standard deviation:", samples.std(axis=0) print for scale in [0.1, 1, 10, 100, 1000]: stats(scale) 
 
                          alpha: 0.1
              element-wise mean: [ 0.2  0.2  0.6]
element-wise standard deviation: [ 0.38  0.38  0.47]

                          alpha: 1
              element-wise mean: [ 0.2  0.2  0.6]
element-wise standard deviation: [ 0.28  0.28  0.35]

                          alpha: 10
              element-wise mean: [ 0.2  0.2  0.6]
element-wise standard deviation: [ 0.12  0.12  0.15]

                          alpha: 100
              element-wise mean: [ 0.2  0.2  0.6]
element-wise standard deviation: [ 0.04  0.04  0.05]

                          alpha: 1000
              element-wise mean: [ 0.2  0.2  0.6]
element-wise standard deviation: [ 0.01  0.01  0.02]

 

Dirichlet Process

 

The Dirichlet Process can be considered a way to generalizethe Dirichlet distribution. While the Dirichlet distribution is parameterized by a discrete distribution G0 and generates samples that are similar discrete distributions, the Dirichlet process is parameterized by a generic distribution H0 and generates samples which are distributions similar to H0. The Dirichlet process also has a parameter α that determines how similar how widely samples will vary from H0.

We can construct a sample H (recall that H is a probability distribution) from a Dirichlet process DP(αH0) by drawing a countably infinite number of samples θk from H0) and setting:

H=k=1πkδ(xθk)

where the πk are carefully chosen weights (more later) that sum to 1. (δ is the Dirac delta function.)

H, a sample from DP(αH0), is a probability distributionthat looks similar to H0 (also a distribution). In particular, His a discrete distribution that takes the value θk with probability πk. This sampled distribution H is a discrete distribution even if H0 has continuous support; the support ofH is a countably infinite subset of the support H0.

The weights (pk values) of a Dirichlet process sample related the Dirichlet process back to the Dirichlet distribution.

Gregor Heinrich writes:

The defining property of the DP is that its samples have weights πk and locations θk distributed in such a way that when partitioning S(H) into finitely many arbitrary disjoint subsets S1,,Sj J<, the sums of the weights πk in each of these J subsets are distributed according to a Dirichlet distribution that is parameterized by α and a discrete base distribution (likeG0) whose weights are equal to the integrals of the base distribution H0 over the subsets Sn.

As an example, Heinrich imagines a DP with a standard normal base measure H0N(0,1). Let H be a sample fromDP(H) and partition the real line (the support of a normal distribution) as S1=(,1]S2=(1,1], and S3=(1,] then

H(S1),H(S2),H(S3)Dir(αerf(1),α(erf(1)erf(1)),α(1erf(1)))

where H(Sn) be the sum of the πk values whose θk lie in Sn.

These Sn subsets are chosen for convenience, however similar results would hold for any choice of Sn. For any sample from a Dirichlet process, we can construct a sample from a Dirichletdistribution by partitioning the support of the sample into a finite number of bins.

There are several equivalent ways to choose the πk so that this property is satisfied: the Chinese restaurant process, the stick-breaking process, and the Pólya urn scheme.

To generate {πk} according to a stick-breaking process we define βk to be a sample from Beta(1,α)π1 is equal to β1. Successive values are defined recursively as

πk=βkj=1k1(1βj).

Thus, if we want to draw a sample from a Dirichlet distribution, we could, in theory, sample an infinite number of θk values from the base distribution H0, an infinite number of βk values from the Beta distribution. Of course, sampling an infinite number of values is easier in theory than in practice.

However, by noting that the πk values are positive values summing to 1, we note that, in expectation, they must get increasingly small as k. Thus, we can reasonably approximate a sample HDP(αH0) by drawing enoughsamples such that Kk=1πk1.

We use this method below to draw approximate samples from several Dirichlet processes with a standard normal (N(0,1)) base distribution but varying α values.

Recall that a single sample from a Dirichlet process is a probability distribution over a countably infinite subset of the support of the base measure.

The blue line is the PDF for a standard normal. The black lines represent the θk and πk values; θk is indicated by the position of the black line on the x-axis; πk is proportional to the height of each line.

We generate enough πk values are generated so their sum is greater than 0.99. When α is small, very few θk's will have corresponding πk values larger than 0.01. However, as αgrows large, the sample becomes a more accurate (though still discrete) approximation of N(0,1).

In [13]:
import matplotlib.pyplot as plt
from scipy.stats import beta, norm def dirichlet_sample_approximation(base_measure, alpha, tol=0.01): betas = [] pis = [] betas.append(beta(1, alpha).rvs()) pis.append(betas[0]) while sum(pis) < (1.-tol): s = np.sum([np.log(1 - b) for b in betas]) new_beta = beta(1, alpha).rvs() betas.append(new_beta) pis.append(new_beta * np.exp(s)) pis = np.array(pis) thetas = np.array([base_measure() for _ in pis]) return pis, thetas def plot_normal_dp_approximation(alpha): plt.figure() plt.title("Dirichlet Process Sample with N(0,1) Base Measure") plt.suptitle("alpha: %s" % alpha) pis, thetas = dirichlet_sample_approximation(lambda: norm().rvs(), alpha) pis = pis * (norm.pdf(0) / pis.max()) plt.vlines(thetas, 0, pis, ) X = np.linspace(-4,4,100) plt.plot(X, norm.pdf(X)) plot_normal_dp_approximation(.1) plot_normal_dp_approximation(1) plot_normal_dp_approximation(10) plot_normal_dp_approximation(1000

转载于:https://www.cnblogs.com/yymn/p/4687198.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值