【ML】Markov Chain Monte Carlo (MCMC) --- Slice Sampler and Hierarchical Models

Slice sampler

Another MCMC algorithm is slice sampling.

In slice sampling, the Markov chain is constructed using an auxiliary variable that represents slices through the (unnormalized) posterior, constructed using only the current parameter value. Like Gibbs sampling, there is no tuning process and all proposals are accepted. For slice sampling, you either need the inverse distribution function or some way to estimate it.

Suppose we want to draw random samples from a Gaussian posterior (the code below uses $\mathcal{N}(5, 3^2)$):

  • Start with some value $x$.
  • Sample $y$ from $\mathcal{U}(0, f(x))$; this is the horizontal slice that gives the method its name.
  • Sample the next $x$ from $f^{-1}(y)$; this is typically done numerically.

Sampling alternates between the two directions: draw $y$ from the uniform distribution $\mathcal{U}(0, f(x))$, then draw $x$ from $f^{-1}(y)$.

import numpy as np
import scipy.stats as ss
import matplotlib.pyplot as plt

def slice_sampler():
    dist = ss.norm(5, 3)  # target Gaussian distribution
    w = 0.5               # step size for widening the slice
    x = dist.rvs()        # starting point

    niters = 1000  # collect 1000 samples
    xs = []
    while len(xs) < niters:
        y = np.random.uniform(0, dist.pdf(x))  # y is a sample from U[0, f(x)]
        # step out to bracket the slice {x: f(x) >= y}
        lb = x
        rb = x
        while y < dist.pdf(lb): lb -= w
        while y < dist.pdf(rb): rb += w
        # draw x from the slice, shrinking the bracket on rejection
        while True:
            xp = np.random.uniform(lb, rb)
            if y < dist.pdf(xp):
                break
            if xp < x: lb = xp
            else: rb = xp
        x = xp
        xs.append(x)

    plt.hist(xs, bins=20)
    plt.grid(True)
    plt.show()

slice_sampler()

(Figure: histogram of the slice sampler output)
Notes:

  • The slice may consist of disjoint pieces for multimodal distributions (see the sketch after this list).
  • The slice can be a rectangular hyperslab for multivariate posterior distributions.
  • Sampling from the slice is non-trivial and may involve iterative rejection steps.
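To illustrate the first point, here is a minimal sketch (not from the original post; the mixture and slice height are made up for illustration) that evaluates a horizontal slice through a bimodal density on a grid and shows that it splits into disjoint intervals:

import numpy as np
import scipy.stats as ss

# Bimodal target: equal mixture of N(-3, 1) and N(3, 1)
f = lambda x: 0.5 * ss.norm(-3, 1).pdf(x) + 0.5 * ss.norm(3, 1).pdf(x)

y = 0.1  # height of the horizontal slice
grid = np.linspace(-8, 8, 10001)
inside = f(grid) >= y  # membership in the slice {x: f(x) >= y}

# Boundaries of each contiguous run of grid points inside the slice
edges = np.flatnonzero(np.diff(inside.astype(int)))
print(grid[edges].reshape(-1, 2))  # two disjoint intervals, one per mode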

2D slice sampler

The iteration alternates between the two directions; at step $t$,
$$
\begin{aligned}
&u^{(t+1)}\sim U_{[0,\, f(x^{(t)})]}\\
&x^{(t+1)}\sim U_{A^{(t+1)}}\\
&A^{(t+1)}=\{x: f(x)\geq u^{(t+1)}\}
\end{aligned}
$$
Given the pdf
$$f(x)=\frac{1}{Z}\exp\left(-\frac{(x+1)^2}{2}\right), \quad x\in(0, 1)$$
Starting from $x=0.10$, the trajectory of the first 30 sample points is shown below (MATLAB code).
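Working with the unnormalized density $\tilde f(x)=e^{-(x+1)^2/2}$, the slice has a closed form here, which is what the limit variable in the code below implements:
$$\tilde f(x)\geq u \iff (x+1)^2\leq -2\ln u \iff x\in\left[-1-\sqrt{-2\ln u},\ -1+\sqrt{-2\ln u}\right]$$
and intersecting with the support $(0, 1)$ gives $A^{(t+1)}=\left(0,\ \min\{1,\ -1+\sqrt{-2\ln u}\}\right)$.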

% 2D slice sampler
T = 0:10000;
T = T/10000;
% unnormalized N(-1,1) density, truncated to (0,1)
y = exp(-(T+1).^2/2);
plot(T,y);
hold on;
x = 0.10; % starting point
u = rand *(exp(-(x+1).^2/2));
x_s = [x];
u_s = [u];
for k = 1:30
    limit = -1 + sqrt(-2*log(u)); % right edge of the slice {x: f(x) >= u}
    limit = min([limit 1]);       % truncate to the support (0,1)
    x = rand * limit;             % horizontal move: x ~ U(A)
    x_s = [x_s x];
    u_s = [u_s u];
    u = rand *(exp(-(x+1).^2/2)); % vertical move: u ~ U[0, f(x)]
    x_s = [x_s x];
    u_s = [u_s u];
end
plot(x_s,u_s,'-*');
hold off;

(Figure: trajectory of the first 30 points of the 2D slice sampler)

Comparing the sampling results for different starting points (MATLAB code):

%% compare different starting points
starts = [0.01 0.01; 0.99 0.0001; 0.25 0.0025]; % rows of [x0 u0]
figure;
for s = 1:3
    x = starts(s,1);
    u = starts(s,2);
    x_s = [x];
    u_s = [u];
    for k = 1:50
        limit = -1 + sqrt(-2*log(u)); % right edge of the slice
        limit = min([limit 1]);       % truncate to the support (0,1)
        x = rand * limit;
        x_s = [x_s x];
        u_s = [u_s u];
        u = rand *(exp(-(x+1).^2/2));
        x_s = [x_s x];
        u_s = [u_s u];
    end
    subplot(1,3,s);
    plot(x_s,u_s,'*');
    hold on;
    plot(T,y);
end

(Figure: 2D slice samples for the three starting points, overlaid on the density)

General Slice Sampler

In the general slice sampler, the set $A^{(t+1)}$ can be hard to determine when the pdf is complicated; in that case the pdf can be decomposed into a product of simpler functions:
$$
\begin{aligned}
&f(x)\propto \prod_{i=1}^k f_i(x)\\
&(x, \omega_1, \dots, \omega_k)\sim p(x, \omega_1, \dots, \omega_k)\propto \prod_{i=1}^k \pi_{0\leq \omega_i\leq f_i(x)}\\
&f_i(x)=\int_{0\leq \omega_i\leq f_i(x)} d\omega_i
\end{aligned}
$$
where $\pi$ denotes the indicator function.
The slice sampler algorithm is then
$$
\left\{
\begin{aligned}
&\omega_1^{(t+1)}\sim U_{[0,\, f_1(x^{(t)})]}\\
&\quad\vdots\\
&\omega_k^{(t+1)}\sim U_{[0,\, f_k(x^{(t)})]}\\
&x^{(t+1)}\sim U_{A^{(t+1)}},\quad A^{(t+1)}=\{y:\ f_i(y)\geq \omega_i^{(t+1)},\ i=1, 2, \dots, k\}
\end{aligned}
\right.
$$

Example: take
$$f(x)=(1+\sin^2(3x))(1+\cos^4(5x))\exp(-x^2/2)=f_1(x)f_2(x)f_3(x)$$
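Only the slice for $f_3$ has a convenient closed form,
$$e^{-x^2/2}\geq \omega_3 \iff |x|\leq\sqrt{-2\ln \omega_3}$$
so the code below draws $x$ uniformly on $[-\sqrt{-2\ln\omega_3},\ \sqrt{-2\ln\omega_3}]$ and rejects until the $f_1$ and $f_2$ constraints, $\sin^2(3x)\geq\omega_1-1$ and $\cos^4(5x)\geq\omega_2-1$, are also satisfied.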

% general slice sampler
x = 0;
u1 = rand*(1+sin(3*x)^2); % w1 ~ U[0, f1(x)]
u2 = rand*(1+cos(5*x)^4); % w2 ~ U[0, f2(x)]
u3 = rand*(exp(-x^2/2));  % w3 ~ U[0, f3(x)]
x_s = zeros(1, 10000);
for k = 1:10000
    limit = sqrt(-2*log(u3)); % f3 slice: |x| <= sqrt(-2*log(u3))
    x = -limit + 2*limit*rand; % propose x uniformly on the f3 slice
    while ((sin(3*x))^2 < u1-1 || (cos(5*x))^4 < u2-1) % reject until f1, f2 constraints hold
        x = -limit + 2*limit*rand;
    end
    u1 = rand*(1+sin(3*x)^2);
    u2 = rand*(1+cos(5*x)^4);
    u3 = rand*(exp(-x^2/2));
    x_s(k) = x;
end
histogram(x_s, 150, 'FaceColor', [0.55, 0.55, 0.55], 'FaceAlpha', 0.90, 'EdgeColor', [0.9, 0.9, 0.9], 'Normalization', 'probability'); % histogram of the samples
hold on;
yyaxis right; % second y-axis for the target density
t = -4:0.1:4;
yt = (1+sin(3.*t).^2).*(1+cos(5.*t).^4).*exp(-t.^2/2);
plot(t, yt, 'Color', 'red', 'LineStyle', '-.');
hold off

(Figure: histogram of the samples against the target density)

Hierarchical models

A hierarchical model has the following structure. First, the data are specified to come from a distribution with parameter $\theta$:
$$X\sim f(X\mid\theta)$$
The parameter itself comes from another distribution, with parameter $\lambda$:
$$\theta\sim g(\theta\mid\lambda)$$
and $\lambda$ comes from a prior distribution:
$$\lambda\sim h(\lambda)$$
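As a concrete sketch of this structure (my own illustration; the specific gamma, exponential, and Poisson choices stand in for $h$, $g$, and $f$), generating data from such a hierarchy proceeds top-down:

import numpy as np

rng = np.random.default_rng(0)

# lambda ~ h(lambda): hyperparameter drawn from its prior
lam = rng.gamma(shape=2.0, scale=1.0)

# theta_i ~ g(theta | lambda): one parameter per group, all sharing lambda
n_groups = 5
thetas = rng.exponential(scale=1.0 / lam, size=n_groups)

# X_ij ~ f(X | theta_i): observations within each group
data = [rng.poisson(lam=theta, size=20) for theta in thetas]

for i, (theta, xs) in enumerate(zip(thetas, data)):
    print(f'group {i}: theta={theta:.3f}, sample mean={xs.mean():.3f}')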

The essential idea of the hierarchical model is that because the $\theta$s are not independent, but rather are drawn from a common distribution with parameter $\lambda$, we can share information across the $\theta$s by also estimating $\lambda$ at the same time.

Example: suppose we test a batch of coins from the same mint for bias, inferring each coin's bias from repeated coin-toss results. There are two standard one-level ways to attack the problem (contrasted in the sketch after this list):

  1. Estimate the bias of each coin from its coin-toss data independently of all the others (per-coin estimates).
  2. Pool the results together and estimate the same bias for all coins (a single pooled estimate).
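A minimal sketch of the two approaches on simulated tosses (the coin count, toss count, and true biases are made up for illustration):

import numpy as np

rng = np.random.default_rng(1)

true_biases = np.array([0.3, 0.45, 0.5, 0.55, 0.7])  # hypothetical per-coin biases
n_tosses = 50
heads = rng.binomial(n_tosses, true_biases)  # heads observed for each coin

per_coin = heads / n_tosses  # approach 1: independent estimate per coin
pooled = heads.sum() / (n_tosses * len(true_biases))  # approach 2: one shared estimate

print('per-coin estimates:', per_coin)
print('pooled estimate:   ', pooled)
# A hierarchical model sits between the two: each coin keeps its own theta,
# but the thetas share a common prior whose parameter is also estimated.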

Note that because of the conditionally independent structure of hierarchical models, Gibbs sampling is often a natural choice for the MCMC sampling strategy.

Suppose we observe 10 pumps at a nuclear power plant, where pump $i$ fails $y_i$ times over an observation period $t_i$. The failure counts are modeled with a Poisson likelihood whose rate parameter $\lambda_i$ (the expected number of failures per unit time) differs from pump to pump; since the observation periods differ, the rate is scaled to $\lambda_i t_i$.

The likelihood $f$ is
$$\prod_{i=1}^{10}\text{Poisson}(\lambda_i t_i)$$
where each $\lambda_i$ is drawn from the prior density $g$,
$$\Gamma(\alpha, \beta)$$
We set $\alpha=1.8$, and the hyperparameter $\beta$ is given the distribution
$$\Gamma(\gamma, \delta)$$
with $\gamma=0.01$ and $\delta=1$.
The hierarchical model has 11 unknown parameters (the 10 $\lambda_i$ and $\beta$), with posterior
$$p(\lambda, \beta\mid y, t)=\prod_{i=1}^{10}\text{Poisson}(\lambda_i t_i)\times \Gamma(\alpha, \beta)\times \Gamma(\gamma, \delta)$$
The full conditionals are
$$
\left\{
\begin{aligned}
&p(\lambda_i\mid \lambda_{-i}, \beta, y, t)=\Gamma(y_i+\alpha,\ t_i+\beta)\\
&p(\beta\mid \lambda, y, t)=\Gamma\left(10\alpha+\gamma,\ \delta+\sum_{i=1}^{10}\lambda_i\right)
\end{aligned}
\right.
$$
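The first conditional follows by collecting the $\lambda_i$ factors of the joint density (standard gamma-Poisson conjugacy):
$$p(\lambda_i\mid \lambda_{-i}, \beta, y, t)\propto (\lambda_i t_i)^{y_i}e^{-\lambda_i t_i}\times \lambda_i^{\alpha-1}e^{-\beta\lambda_i}\propto \lambda_i^{y_i+\alpha-1}e^{-(t_i+\beta)\lambda_i}$$
which is the kernel of $\Gamma(y_i+\alpha,\ t_i+\beta)$ (shape, rate). The conditional for $\beta$ follows the same way from the $\Gamma(\gamma, \delta)$ hyperprior and the ten $\Gamma(\alpha, \beta)$ prior factors.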

import numpy as np
import matplotlib.pyplot as plt

G = np.random.gamma  # gamma sampler: G(shape, scale, size)

# Draw each lambda_i from its full conditional Gamma(y_i + alpha, t_i + beta)
def lambda_update(alpha, beta, y, t):
    return G(size=len(y), shape=y+alpha, scale=1.0/(t+beta))

# Draw beta from its full conditional Gamma(10*alpha + gamma, delta + sum(lambda))
def beta_update(alpha, gamma, delta, lam, y):
    return G(size=1, shape=len(y)*alpha+gamma, scale=1.0/(delta+lam.sum()))

def gibbs(niter, y, t, alpha, gamma, delta):
    lambdas_ = np.zeros((niter, len(y)), float)
    betas_ = np.zeros((niter, 1), float)

    lambda_ = y/t  # initial values: empirical failure rates

    for i in range(niter):
        beta_ = beta_update(alpha, gamma, delta, lambda_, y)
        lambda_ = lambda_update(alpha, beta_, y, t)

        betas_[i] = beta_
        lambdas_[i, :] = lambda_

    return betas_, lambdas_

def hierarchical_model_demo():
    alpha = 1.8
    gamma = 0.01
    delta = 1.0
    y = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22], int)
    t = np.array([94.32, 15.72, 62.88, 125.76, 5.24, 31.44, 1.05, 1.05, 2.10, 10.48], float)
    niter = 1000

    betas, lambdas = gibbs(niter, y, t, alpha, gamma, delta)
    print('betas mean: {:.3f}'.format(betas.mean()))
    print('betas std: {:.3f}'.format(betas.std(ddof=1)))
    print('lambdas mean:')
    print(lambdas.mean(axis=0))
    print('lambdas std:')
    print(lambdas.std(ddof=1, axis=0))
    plt.figure(figsize=(10, 20))
    plt.subplots_adjust(hspace=0.6)
    for i in range(lambdas.shape[1]):
        plt.subplot(5, 2, i+1)
        plt.plot(lambdas[::10, i], ':', linewidth=1)
        plt.title(r'Trace for $\lambda_{}$'.format(i))
        plt.grid(True)
    plt.show()

hierarchical_model_demo()

(Figure: trace plots for each $\lambda_i$)


References

Computational Statistics in Python
马尔可夫链蒙特卡洛算法 (三) slice sampling
Slice sampler
