导航
Slice sampler
另一种MCMC
算法是切片采样,和Gibbs sampling
类似,采样过程中没有tunning processes
并且所有提取的样本都被接受.
In slice sampling, the Markov chain is constructed by using an auxiliary variable representing slices through the (unnormalized) posterior that is constructed using only the current parameter value. Like Gibbs sampling, there is no tunning process and all proposals are accepted. For slice sampling, you either need the inverse distribution function or some way to estimate it.
假设要从后验分布 N ( 0 , 1 ) \mathcal{N}(0, 1) N(0,1)中采集随机样本
Start with some value x x x-sample y y y from U ( 0 , f ( x ) ) \mathcal{U}(0, f(x)) U(0,f(x)) this is the horizontal
slice
that gives the method its name - sample the next x x x from f − 1 ( x ) f^{-1}(x) f−1(x) - this is typically done numerically.
在两个方向上交替采样,在均匀分布 U ( 0 , f ( x ) ) \mathcal{U}(0, f(x)) U(0,f(x))上采集样本 y y y,再从 f − 1 ( y ) f^{-1}(y) f−1(y)上采集 x x x
def slice_sampler():
dist = ss.norm(5, 3) # Gaussian分布
w = 0.5
x = dist.rvs()
niters = 1000 # 采集1000个样本
xs = []
while len(xs)<niters:
y = np.random.uniform(0, dist.pdf(x)) # y为U[0, f(x)]上的样本
lb = x
rb = x
while y<dist.pdf(lb): lb-=w
while y<dist.pdf(rb): rb+=w
x = np.random.uniform(lb, rb) # x样本
if y > dist.pdf(x):
if np.abs(x-lb)<np.abs(x-rb): lb=x
else: lb=y
else: xs.append(x)
plt.hist(xs, bins=20)
plt.grid(True)
plt.legend()
plt.show()
slice_sampler()
Notes:
- the slice may consist of disjoint pieces for multimodal distributions.
- the slice can be rectangular hyperslab for multivariable psterior distributions.
- sampling from the slice is non-trivial and may involve iterative rejection steps.
2D slice sample
在两个方向上交替迭代取值,即在第i
次迭代中
u
(
t
+
1
)
∼
U
[
0
,
f
(
x
t
)
]
x
(
t
+
1
)
∼
U
A
(
t
+
1
)
A
(
t
+
1
)
=
{
x
:
f
(
x
)
≥
u
(
t
+
1
)
}
\begin{aligned} &u^{(t+1)}\sim U_{[0, f(x^t)]}\\ &x^{(t+1)}\sim U_{A^{(t+1)}}\\ &A^{(t+1)}=\{x: f(x)\geq u^{(t+1)}\} \end{aligned}
u(t+1)∼U[0,f(xt)]x(t+1)∼UA(t+1)A(t+1)={x:f(x)≥u(t+1)}
给定pdf
f
(
x
)
=
1
Z
exp
(
−
−
(
x
+
1
)
2
2
)
,
x
∈
(
0
,
1
)
f(x)=\frac{1}{Z}\exp(-\frac{-(x+1)^2}{2}), x\in(0, 1)
f(x)=Z1exp(−2−(x+1)2),x∈(0,1)
设置起点值为
x
=
0.10
x=0.10
x=0.10,采集前30个样本点的轨迹如下(Matlab Code)
% 2D slice sampler
T = 0:10000;
T = T/10000;
% N(-1,1)
y = exp(-(T+1).^2/2);
plot(T,y);
hold on;
x = 0.10; % 设置初始点
u = rand *(exp(-(x+1).^2/2));
x_s = [x];
u_s = [u];
for k = 1:30;
limit = -1 + sqrt(-2*log(u));
limit = min([limit 1]);
x = rand * limit;
x_s = [x_s x];
u_s = [u_s u];
u = rand *(exp(-(x+1).^2/2));
x_s = [x_s x];
u_s = [u_s u];
end
plot(x_s,u_s,'-*');
hold off;
设置不同起始点对比采样结果
%%
x = 0.01;
u = 0.01;
x_s = [x];
u_s = [u];
for k = 1:50;
limit = -1 + sqrt(-2*log(u));
limit = min([limit 1]);
x = rand * limit;
x_s = [x_s x];
u_s = [u_s u];
u = rand *(exp(-(x+1).^2/2));
x_s = [x_s x];
u_s = [u_s u];
end
figure;
subplot(1,3,1);
plot(x_s,u_s,'*');hold on;plot(T,y);
x = 0.99;
u = 0.0001;
x_s = [x];
u_s = [u];
for k = 1:50;
limit = -1 + sqrt(-2*log(u));
limit = min([limit 1]);
x = rand * limit;
x_s = [x_s x];
u_s = [u_s u];
u = rand *(exp(-(x+1).^2/2));
x_s = [x_s x];
u_s = [u_s u];
end
subplot(1,3,2);
plot(x_s,u_s,'*');hold on;plot(T,y);
x = 0.25;
u = 0.0025;
x_s = [x];
u_s = [u];
for k = 1:50;
limit = -1 + sqrt(-2*log(u));
limit = min([limit 1]);
x = rand * limit;
x_s = [x_s x];
u_s = [u_s u];
u = rand *(exp(-(x+1).^2/2));
x_s = [x_s x];
u_s = [u_s u];
end
subplot(1,3,3);
plot(x_s,u_s,'*');hold on;plot(T,y);
General Slice Sampler
在一般slice sampler
中,由于pdf函数的复杂性,集合
A
(
t
+
1
)
A^{(t+1)}
A(t+1)的范围比较难以确定,可以将pdf函数分解为多个较为简单函数积处理
f
(
x
)
∝
∏
i
=
1
k
f
i
(
x
)
(
x
,
ω
1
,
…
,
ω
k
)
∼
p
(
x
,
ω
1
,
…
,
ω
k
)
∝
∏
i
=
1
k
π
0
≤
ω
i
≤
f
i
(
x
)
f
i
(
x
)
=
∫
0
≤
w
i
≤
f
i
(
x
)
d
ω
i
\begin{aligned} &f(x)\propto \prod_{i=1}^k f_i(x)\\ &(x, \omega_1, \dots, \omega_k)\sim p(x, \omega_1, \dots, \omega_k)\propto \prod_{i=1}^k \pi_{0\leq \omega_i\leq f_i(x)}\\ &f_i(x)=\int 0\leq w_i\leq f_i(x)d\omega_i \end{aligned}
f(x)∝i=1∏kfi(x)(x,ω1,…,ωk)∼p(x,ω1,…,ωk)∝i=1∏kπ0≤ωi≤fi(x)fi(x)=∫0≤wi≤fi(x)dωi
slice sampler algorithm
{
ω
1
t
+
1
∼
U
[
0
,
f
1
(
x
(
t
)
)
]
⋮
w
k
t
+
1
∼
U
[
0
,
f
k
(
x
(
t
)
)
]
x
(
t
+
1
)
∼
U
A
(
t
+
1
)
A
(
t
+
1
)
=
{
y
;
f
i
(
y
)
≥
w
i
(
t
+
1
)
,
i
=
1
,
2
,
…
,
k
}
\left\{ \begin{aligned} &\omega_1^{t+1}\sim U_{[0, f_1(x(t))]}\\ &\vdots\\ &w_k^{t+1}\sim U_{[0, f_k(x(t))]}\\ &x^{(t+1)}\sim U_{A^{(t+1)}}\\ &A^{(t+1)}=\{y; f_i(y)\geq w_i^{(t+1)}, i=1, 2, \dots, k\} \end{aligned} \right.
⎩⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎧ω1t+1∼U[0,f1(x(t))]⋮wkt+1∼U[0,fk(x(t))]x(t+1)∼UA(t+1)A(t+1)={y;fi(y)≥wi(t+1),i=1,2,…,k}
设
f
(
x
)
=
(
1
+
sin
2
(
3
x
)
)
(
1
+
cos
4
(
5
x
)
)
exp
(
−
x
2
/
2
)
=
f
1
(
x
)
f
2
(
x
)
f
3
(
x
)
f(x)=(1+\sin^2(3x))(1+\cos^4(5x))\exp(-x^2/2)=f_1(x)f_2(x)f_3(x)
f(x)=(1+sin2(3x))(1+cos4(5x))exp(−x2/2)=f1(x)f2(x)f3(x)
% general slice sample
x=0;
u1 = rand*(1+sin(3*x)^2);
u2 = rand*(1+cos(5*x)^4);
u3 = rand*(exp(-x^2/2));
x_s = zeros(1, 10000);
for k=1:10000
limit = sqrt(-2*log(u3));
x=-limit+2*limit*rand;
while ((sin(3*x))^2<u1-1 || (cos(5*x))^4<u2-1)
x=-limit+2*limit*rand;
end
u1 = rand*(1+sin(3*x)^2);
u2 = rand*(1+cos(5*x)^4);
u3 = rand*(exp(-x^2/2));
x_s(k)=x;
end
histogram(x_s, 150, 'FaceColor', [0.55, 0.55, 0.55], 'FaceAlpha', 0.90, 'EdgeColor', [0.9, 0.9, 0.9], 'Normalization', 'probability'); % 采样直方图
hold on;
yyaxis right; % 开启右轴
t = -4:0.1:4;
yt = (1+sin(3.*t).^2).*(1+cos(5.*t).^4).*exp(-t.^2/2);
plot(t, yt, 'Color', 'red', 'LineStyle', '-.');
hold off
Hierarchical models
层次模型的结构如下,首先指定数据来自参数为
θ
\theta
θ的分布
X
∼
f
(
X
∣
θ
)
X\sim f(X\mid\theta)
X∼f(X∣θ)
参数本身也来自另一个以
λ
\lambda
λ为参数的分布
θ
∼
g
(
θ
∣
λ
)
\theta\sim g(\theta\mid\lambda)
θ∼g(θ∣λ)
λ
\lambda
λ来自先验分布
λ
∼
h
(
λ
)
\lambda\sim h(\lambda)
λ∼h(λ)
The essential idea of the hierarchical model is because the θ \theta θs are not independent but rather are drawn from a common distribution with parameter λ \lambda λ,
we can share information across the
θ \theta θsby also estimating
λ \lambda λ at the same time.
例:对一批来自相同铸币厂的缺陷硬币进行检测,可以通过多次投掷结果使用统计推断的方法找出bias
,一般有两种方法解决该问题
- estimate the bias of each coin from its coin toss data independently of all the others.(孤立样本点估计)
- pool the results together and estimate the same bias for all coins.(所有样本点聚集估计)
由于层次模型的条件独立性,一般使用Gibbs sampling
作为MCMC
的采样策略.
Note that because of the conditionally independent structure of hierarchial models, Gibbs sampling is often a natural choice for the MCMC sampling strategy.
假设核电站的泵每10个有
y
i
y_i
yi个是次品,每次观测泵的时间点为
t
i
t_i
ti,使用Poisson likelihood
对失败的次数进行建模,其中强度参数
λ
i
\lambda_i
λi(表示失败的次数)在每个泵中都不相同,由于每个泵的观测时间点是不同的,需要将
λ
i
\lambda_i
λi放缩至
λ
i
t
i
\lambda_it_i
λiti.
似然函数
f
f
f为
∏
i
=
1
10
P
o
i
s
s
o
n
(
λ
i
t
i
)
\prod_{i=1}^{10}Poisson(\lambda_it_i)
i=1∏10Poisson(λiti)
其中参数
λ
\lambda
λ服从先验概率密度函数
g
g
g
Γ
(
α
,
β
)
\Gamma(\alpha, \beta)
Γ(α,β)
设置
α
=
1.8
\alpha=1.8
α=1.8,超参数
β
\beta
β的分布为
Γ
(
γ
,
δ
)
\Gamma(\gamma, \delta)
Γ(γ,δ)
其中
γ
=
0.01
;
δ
=
1
\gamma=0.01;\delta=1
γ=0.01;δ=1
层次模型中有11个未知参数(10个
λ
\lambda
λ和
β
\beta
β),后验分布为
p
(
λ
,
β
∣
y
,
t
)
=
∏
i
=
1
10
P
o
i
s
s
o
n
(
λ
i
t
i
)
×
Γ
(
α
,
β
)
×
Γ
(
γ
,
δ
)
p(\lambda, \beta\mid y, t)=\prod_{i=1}^{10}Poisson(\lambda_it_i)\times \Gamma(\alpha, \beta)\times \Gamma(\gamma, \delta)
p(λ,β∣y,t)=i=1∏10Poisson(λiti)×Γ(α,β)×Γ(γ,δ)
条件分布为
{
p
(
λ
i
∣
λ
−
i
,
β
,
y
,
t
)
=
Γ
(
y
i
+
α
,
t
i
+
β
)
p
(
β
∣
λ
,
y
,
t
)
=
Γ
(
10
α
+
γ
,
δ
+
∑
i
=
1
10
λ
i
)
\left\{ \begin{aligned} &p(\lambda_i\mid \lambda_{-i}, \beta, y, t)=\Gamma(y_i+\alpha, t_i+\beta)\\ &p(\beta\mid \lambda, y, t)=\Gamma(10\alpha+\gamma, \delta+\sum_{i=1}^{10}\lambda_i) \end{aligned} \right.
⎩⎪⎪⎨⎪⎪⎧p(λi∣λ−i,β,y,t)=Γ(yi+α,ti+β)p(β∣λ,y,t)=Γ(10α+γ,δ+i=1∑10λi)
# 根据关于lambda_i的条件概率公式进行Gamma抽样
def lambda_update(alpha, beta, y, t):
return G(size=len(y), shape=y+alpha, scale=1.0/(t+beta))
# 根据关于beta的条件概率公式进行Gamma抽样
def beta_update(alpha, gamma, delta, lam, y):
return G(size=1, shape=len(y)*alpha+gamma, scale=1.0/(delta+lam.sum()))
def gibbs(niter, y, t, alpha, gamma, delta):
lambdas_ = np.zeros((niter, len(y)), np.float)
betas_ = np.zeros((niter, 1), np.float)
lambda_ = y/t # 设置初始值
for i in range(niter):
beta_ = beta_update(alpha, gamma, delta, lambda_, y)
lambda_ = lambda_update(alpha, beta_, y, t)
betas_[i] = beta_
lambdas_[i, :]=lambda_
return betas_, lambdas_
def hierarchical_model_demo():
alpha = 1.8
gamma = 0.01
delta = 1.0
beta0 = 1
y = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22], np.int)
t = np.array([94.32, 15.72, 62.88, 125.76, 5.24, 31.44, 1.05, 1.05, 2.10, 10.48], np.float)
niter = 1000
betas, lambdas = gibbs(niter, y, t, alpha, gamma, delta)
print('betas mean: {:.3f}'.format(betas.mean()))
print('betas std: {:.3f}'.format(betas.std(ddof=1)))
print('lambdas mean:')
print(lambdas.mean(axis=0))
print('lambdas std:')
print(lambdas.std(ddof=1, axis=0))
plt.figure(figsize=(10, 20))
plt.subplots_adjust(left=None, bottom=None, right=None, top=None,wspace=None, hspace=0.6)
for i in range(lambdas.shape[1]):
plt.subplot(5,2,i+1)
plt.plot(lambdas[::10, i], ':', linewidth=1)
plt.title('Trace for $\lambda_{}$'.format(i))
plt.grid(True)
plt.show()
python Code download
References
Computational Statistics in Python
马尔可夫链蒙特卡洛算法 (三) slice sampling
Slice sampler