【ML】Markov Chain Monte Carlo (MCMC) --- Slice Sampler and Hierarchical Models

Slice sampler

Another MCMC algorithm is slice sampling.

In slice sampling, the Markov chain is constructed using an auxiliary variable that represents slices through the (unnormalized) posterior, constructed using only the current parameter value. Like Gibbs sampling, there is no tuning process and all proposals are accepted. For slice sampling, you either need the inverse distribution function or some way to estimate it.

Suppose we want to draw random samples from a Gaussian posterior (the code below uses $\mathcal{N}(5, 3^2)$):

  • Start with some value $x$.
  • Sample $y$ from $\mathcal{U}(0, f(x))$; this is the horizontal slice that gives the method its name.
  • Sample the next $x$ from $f^{-1}(y)$; this is typically done numerically.

Sampling alternates between the two directions: draw $y$ from the uniform distribution $\mathcal{U}(0, f(x))$, then draw $x$ from $f^{-1}(y)$.

import numpy as np
import scipy.stats as ss
import matplotlib.pyplot as plt

def slice_sampler():
    dist = ss.norm(5, 3)  # target Gaussian distribution
    w = 0.5               # step size for widening the slice
    x = dist.rvs()        # starting point

    niters = 1000  # collect 1000 samples
    xs = []
    while len(xs) < niters:
        y = np.random.uniform(0, dist.pdf(x))  # y is a sample from U[0, f(x)]
        # step out to bracket the slice {x: f(x) >= y}
        lb = x
        rb = x
        while y < dist.pdf(lb): lb -= w
        while y < dist.pdf(rb): rb += w
        # draw x from the slice, shrinking the bracket on rejection
        while True:
            xp = np.random.uniform(lb, rb)
            if y < dist.pdf(xp):
                break
            if xp < x: lb = xp
            else: rb = xp
        x = xp
        xs.append(x)

    plt.hist(xs, bins=20)
    plt.grid(True)
    plt.show()

slice_sampler()

(Figure: histogram of the slice sampler output)
Notes:

  • The slice may consist of disjoint pieces for multimodal distributions (see the sketch after this list).
  • The slice can be a rectangular hyperslab for multivariate posterior distributions.
  • Sampling from the slice is non-trivial and may involve iterative rejection steps.
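To illustrate the first point, here is a minimal sketch (not from the original post; the mixture and slice height are made up for illustration) that evaluates a horizontal slice through a bimodal density on a grid and shows that it splits into disjoint intervals:

import numpy as np
import scipy.stats as ss

# Bimodal target: equal mixture of N(-3, 1) and N(3, 1)
f = lambda x: 0.5 * ss.norm(-3, 1).pdf(x) + 0.5 * ss.norm(3, 1).pdf(x)

y = 0.1  # height of the horizontal slice
grid = np.linspace(-8, 8, 10001)
inside = f(grid) >= y  # membership in the slice {x: f(x) >= y}

# Boundaries of each contiguous run of grid points inside the slice
edges = np.flatnonzero(np.diff(inside.astype(int)))
print(grid[edges].reshape(-1, 2))  # two disjoint intervals, one per mode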

2D slice sampler

The iteration alternates between the two directions; at step $t$,
$$
\begin{aligned}
&u^{(t+1)}\sim U_{[0,\, f(x^{(t)})]}\\
&x^{(t+1)}\sim U_{A^{(t+1)}}\\
&A^{(t+1)}=\{x: f(x)\geq u^{(t+1)}\}
\end{aligned}
$$
Given the pdf
$$f(x)=\frac{1}{Z}\exp\left(-\frac{(x+1)^2}{2}\right), \quad x\in(0, 1)$$
Starting from $x=0.10$, the trajectory of the first 30 sample points is shown below (MATLAB code).
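Working with the unnormalized density $\tilde f(x)=e^{-(x+1)^2/2}$, the slice has a closed form here, which is what the limit variable in the code below implements:
$$\tilde f(x)\geq u \iff (x+1)^2\leq -2\ln u \iff x\in\left[-1-\sqrt{-2\ln u},\ -1+\sqrt{-2\ln u}\right]$$
and intersecting with the support $(0, 1)$ gives $A^{(t+1)}=\left(0,\ \min\{1,\ -1+\sqrt{-2\ln u}\}\right)$.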

% 2D slice sampler
T = 0:10000;
T = T/10000;
% unnormalized N(-1,1) density, truncated to (0,1)
y = exp(-(T+1).^2/2);
plot(T,y);
hold on;
x = 0.10; % starting point
u = rand *(exp(-(x+1).^2/2));
x_s = [x];
u_s = [u];
for k = 1:30
    limit = -1 + sqrt(-2*log(u)); % right edge of the slice {x: f(x) >= u}
    limit = min([limit 1]);       % truncate to the support (0,1)
    x = rand * limit;             % horizontal move: x ~ U(A)
    x_s = [x_s x];
    u_s = [u_s u];
    u = rand *(exp(-(x+1).^2/2)); % vertical move: u ~ U[0, f(x)]
    x_s = [x_s x];
    u_s = [u_s u];
end
plot(x_s,u_s,'-*');
hold off;

(Figure: trajectory of the first 30 points of the 2D slice sampler)

Comparing the sampling results for different starting points (MATLAB code):

%% compare different starting points
starts = [0.01 0.01; 0.99 0.0001; 0.25 0.0025]; % rows of [x0 u0]
figure;
for s = 1:3
    x = starts(s,1);
    u = starts(s,2);
    x_s = [x];
    u_s = [u];
    for k = 1:50
        limit = -1 + sqrt(-2*log(u)); % right edge of the slice
        limit = min([limit 1]);       % truncate to the support (0,1)
        x = rand * limit;
        x_s = [x_s x];
        u_s = [u_s u];
        u = rand *(exp(-(x+1).^2/2));
        x_s = [x_s x];
        u_s = [u_s u];
    end
    subplot(1,3,s);
    plot(x_s,u_s,'*');
    hold on;
    plot(T,y);
end

(Figure: 2D slice samples for the three starting points, overlaid on the density)

General Slice Sampler

In the general slice sampler, the set $A^{(t+1)}$ can be hard to determine when the pdf is complicated; in that case the pdf can be decomposed into a product of simpler functions:
$$
\begin{aligned}
&f(x)\propto \prod_{i=1}^k f_i(x)\\
&(x, \omega_1, \dots, \omega_k)\sim p(x, \omega_1, \dots, \omega_k)\propto \prod_{i=1}^k \pi_{0\leq \omega_i\leq f_i(x)}\\
&f_i(x)=\int_{0\leq \omega_i\leq f_i(x)} d\omega_i
\end{aligned}
$$
where $\pi$ denotes the indicator function.
The slice sampler algorithm is then
$$
\left\{
\begin{aligned}
&\omega_1^{(t+1)}\sim U_{[0,\, f_1(x^{(t)})]}\\
&\quad\vdots\\
&\omega_k^{(t+1)}\sim U_{[0,\, f_k(x^{(t)})]}\\
&x^{(t+1)}\sim U_{A^{(t+1)}},\quad A^{(t+1)}=\{y:\ f_i(y)\geq \omega_i^{(t+1)},\ i=1, 2, \dots, k\}
\end{aligned}
\right.
$$

Example: take
$$f(x)=(1+\sin^2(3x))(1+\cos^4(5x))\exp(-x^2/2)=f_1(x)f_2(x)f_3(x)$$
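Only the slice for $f_3$ has a convenient closed form,
$$e^{-x^2/2}\geq \omega_3 \iff |x|\leq\sqrt{-2\ln \omega_3}$$
so the code below draws $x$ uniformly on $[-\sqrt{-2\ln\omega_3},\ \sqrt{-2\ln\omega_3}]$ and rejects until the $f_1$ and $f_2$ constraints, $\sin^2(3x)\geq\omega_1-1$ and $\cos^4(5x)\geq\omega_2-1$, are also satisfied.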

% general slice sampler
x = 0;
u1 = rand*(1+sin(3*x)^2); % w1 ~ U[0, f1(x)]
u2 = rand*(1+cos(5*x)^4); % w2 ~ U[0, f2(x)]
u3 = rand*(exp(-x^2/2));  % w3 ~ U[0, f3(x)]
x_s = zeros(1, 10000);
for k = 1:10000
    limit = sqrt(-2*log(u3)); % f3 slice: |x| <= sqrt(-2*log(u3))
    x = -limit + 2*limit*rand; % propose x uniformly on the f3 slice
    while ((sin(3*x))^2 < u1-1 || (cos(5*x))^4 < u2-1) % reject until f1, f2 constraints hold
        x = -limit + 2*limit*rand;
    end
    u1 = rand*(1+sin(3*x)^2);
    u2 = rand*(1+cos(5*x)^4);
    u3 = rand*(exp(-x^2/2));
    x_s(k) = x;
end
histogram(x_s, 150, 'FaceColor', [0.55, 0.55, 0.55], 'FaceAlpha', 0.90, 'EdgeColor', [0.9, 0.9, 0.9], 'Normalization', 'probability'); % histogram of the samples
hold on;
yyaxis right; % second y-axis for the target density
t = -4:0.1:4;
yt = (1+sin(3.*t).^2).*(1+cos(5.*t).^4).*exp(-t.^2/2);
plot(t, yt, 'Color', 'red', 'LineStyle', '-.');
hold off

(Figure: histogram of the samples against the target density)

Hierarchical models

A hierarchical model has the following structure. First, the data are specified to come from a distribution with parameter $\theta$:
$$X\sim f(X\mid\theta)$$
The parameter itself comes from another distribution, with parameter $\lambda$:
$$\theta\sim g(\theta\mid\lambda)$$
and $\lambda$ comes from a prior distribution:
$$\lambda\sim h(\lambda)$$
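As a concrete sketch of this structure (my own illustration; the specific gamma, exponential, and Poisson choices stand in for $h$, $g$, and $f$), generating data from such a hierarchy proceeds top-down:

import numpy as np

rng = np.random.default_rng(0)

# lambda ~ h(lambda): hyperparameter drawn from its prior
lam = rng.gamma(shape=2.0, scale=1.0)

# theta_i ~ g(theta | lambda): one parameter per group, all sharing lambda
n_groups = 5
thetas = rng.exponential(scale=1.0 / lam, size=n_groups)

# X_ij ~ f(X | theta_i): observations within each group
data = [rng.poisson(lam=theta, size=20) for theta in thetas]

for i, (theta, xs) in enumerate(zip(thetas, data)):
    print(f'group {i}: theta={theta:.3f}, sample mean={xs.mean():.3f}')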

The essential idea of the hierarchical model is that because the $\theta$s are not independent, but rather are drawn from a common distribution with parameter $\lambda$, we can share information across the $\theta$s by also estimating $\lambda$ at the same time.

Example: suppose we test a batch of coins from the same mint for bias, inferring each coin's bias from repeated coin-toss results. There are two standard one-level ways to attack the problem (contrasted in the sketch after this list):

  1. Estimate the bias of each coin from its coin-toss data independently of all the others (per-coin estimates).
  2. Pool the results together and estimate the same bias for all coins (a single pooled estimate).
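A minimal sketch of the two approaches on simulated tosses (the coin count, toss count, and true biases are made up for illustration):

import numpy as np

rng = np.random.default_rng(1)

true_biases = np.array([0.3, 0.45, 0.5, 0.55, 0.7])  # hypothetical per-coin biases
n_tosses = 50
heads = rng.binomial(n_tosses, true_biases)  # heads observed for each coin

per_coin = heads / n_tosses  # approach 1: independent estimate per coin
pooled = heads.sum() / (n_tosses * len(true_biases))  # approach 2: one shared estimate

print('per-coin estimates:', per_coin)
print('pooled estimate:   ', pooled)
# A hierarchical model sits between the two: each coin keeps its own theta,
# but the thetas share a common prior whose parameter is also estimated.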

Note that because of the conditionally independent structure of hierarchical models, Gibbs sampling is often a natural choice for the MCMC sampling strategy.

Suppose we observe 10 pumps at a nuclear power plant, where pump $i$ fails $y_i$ times over an observation period $t_i$. The failure counts are modeled with a Poisson likelihood whose rate parameter $\lambda_i$ (the expected number of failures per unit time) differs from pump to pump; since the observation periods differ, the rate is scaled to $\lambda_i t_i$.

The likelihood $f$ is
$$\prod_{i=1}^{10}\text{Poisson}(\lambda_i t_i)$$
where each $\lambda_i$ is drawn from the prior density $g$,
$$\Gamma(\alpha, \beta)$$
We set $\alpha=1.8$, and the hyperparameter $\beta$ is given the distribution
$$\Gamma(\gamma, \delta)$$
with $\gamma=0.01$ and $\delta=1$.
The hierarchical model has 11 unknown parameters (the 10 $\lambda_i$ and $\beta$), with posterior
$$p(\lambda, \beta\mid y, t)=\prod_{i=1}^{10}\text{Poisson}(\lambda_i t_i)\times \Gamma(\alpha, \beta)\times \Gamma(\gamma, \delta)$$
The full conditionals are
$$
\left\{
\begin{aligned}
&p(\lambda_i\mid \lambda_{-i}, \beta, y, t)=\Gamma(y_i+\alpha,\ t_i+\beta)\\
&p(\beta\mid \lambda, y, t)=\Gamma\left(10\alpha+\gamma,\ \delta+\sum_{i=1}^{10}\lambda_i\right)
\end{aligned}
\right.
$$
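The first conditional follows by collecting the $\lambda_i$ factors of the joint density (standard gamma-Poisson conjugacy):
$$p(\lambda_i\mid \lambda_{-i}, \beta, y, t)\propto (\lambda_i t_i)^{y_i}e^{-\lambda_i t_i}\times \lambda_i^{\alpha-1}e^{-\beta\lambda_i}\propto \lambda_i^{y_i+\alpha-1}e^{-(t_i+\beta)\lambda_i}$$
which is the kernel of $\Gamma(y_i+\alpha,\ t_i+\beta)$ (shape, rate). The conditional for $\beta$ follows the same way from the $\Gamma(\gamma, \delta)$ hyperprior and the ten $\Gamma(\alpha, \beta)$ prior factors.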

import numpy as np
import matplotlib.pyplot as plt

G = np.random.gamma  # gamma sampler: G(shape, scale, size)

# Draw each lambda_i from its full conditional Gamma(y_i + alpha, t_i + beta)
def lambda_update(alpha, beta, y, t):
    return G(size=len(y), shape=y+alpha, scale=1.0/(t+beta))

# Draw beta from its full conditional Gamma(10*alpha + gamma, delta + sum(lambda))
def beta_update(alpha, gamma, delta, lam, y):
    return G(size=1, shape=len(y)*alpha+gamma, scale=1.0/(delta+lam.sum()))

def gibbs(niter, y, t, alpha, gamma, delta):
    lambdas_ = np.zeros((niter, len(y)), float)
    betas_ = np.zeros((niter, 1), float)

    lambda_ = y/t  # initial values: empirical failure rates

    for i in range(niter):
        beta_ = beta_update(alpha, gamma, delta, lambda_, y)
        lambda_ = lambda_update(alpha, beta_, y, t)

        betas_[i] = beta_
        lambdas_[i, :] = lambda_

    return betas_, lambdas_

def hierarchical_model_demo():
    alpha = 1.8
    gamma = 0.01
    delta = 1.0
    y = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22], int)
    t = np.array([94.32, 15.72, 62.88, 125.76, 5.24, 31.44, 1.05, 1.05, 2.10, 10.48], float)
    niter = 1000

    betas, lambdas = gibbs(niter, y, t, alpha, gamma, delta)
    print('betas mean: {:.3f}'.format(betas.mean()))
    print('betas std: {:.3f}'.format(betas.std(ddof=1)))
    print('lambdas mean:')
    print(lambdas.mean(axis=0))
    print('lambdas std:')
    print(lambdas.std(ddof=1, axis=0))
    plt.figure(figsize=(10, 20))
    plt.subplots_adjust(hspace=0.6)
    for i in range(lambdas.shape[1]):
        plt.subplot(5, 2, i+1)
        plt.plot(lambdas[::10, i], ':', linewidth=1)
        plt.title(r'Trace for $\lambda_{}$'.format(i))
        plt.grid(True)
    plt.show()

hierarchical_model_demo()

(Figure: trace plots for each $\lambda_i$)


References

Computational Statistics in Python
马尔可夫链蒙特卡洛算法 (三) slice sampling
Slice sampler
