1. Introduction: Simulated annealing for global optimization:
Instead of wanting to approximate
p(x)
p
(
x
)
, we want to find the global maximum. For example, if
p(x)
p
(
x
)
is the likelihood or posterior distribution, we often want to compute the ML and maximum a posteriori (MAP) estimates. As mentioned earlier, we could run a Markov chain of invariant distribution
p(x)
p
(
x
)
and estimate the global mode by
This method is inefficient because the random samples only rarely come from the vicinity of the mode. Unless the distribution has large probability mass around the mode, computing resources will be wasted exploring areas of no interest.
A more principled strategy is to adopt simulated annealing. This technique involves simulating a non-homogeneous Markov chain whose invariant distribution at iteration i i is no longer equal to , but to
where Ti T i is a decreasing cooling schedule with limi→∞Ti=0 l i m i → ∞ T i = 0 .
The reason for doing this is that, under weak regularity assumptions on p(x) p ( x ) , p∞(x) p ∞ ( x ) is a probability density that concentrates itself on the set of global maxima of p(x) p ( x ) .
Didi Lv: p∞(x) p ∞ ( x ) make the smaller modes become to zero, and the maximum mode becomes ∞ ∞ .
2. Problem:
Running the Simulated annealing algorithm with a Gaussian proposal distribution and a bimodal target distribution p(x)∝0.3 exp(−0.2x2)+0.7 exp(−0.2(x−10)2) p ( x ) ∝ 0.3 e x p ( − 0.2 x 2 ) + 0.7 e x p ( − 0.2 ( x − 10 ) 2 ) for 5000 iterations with Ti=(C ln(i+T0))−1 T i = ( C l n ( i + T 0 ) ) − 1 , where C C and are problem-dependent. In our test, we set C=1, T0=1 C = 1 , T 0 = 1 .
3. Cases:
Case 1: General acceptance probability for Simulated annealing:
Acceptance probability: A(x,x∗)=min{1,p(x∗)1Tiq(x|x∗)p(x)1Tiq(x∗|x)} A ( x , x ∗ ) = m i n { 1 , p ( x ∗ ) 1 T i q ( x | x ∗ ) p ( x ) 1 T i q ( x ∗ | x ) }
Case 2: Metropolis algorithm for Simulated annealing:
Acceptance probability: q(x∗|x)=q(x|x∗)⇒A(x,x∗)=min{1,p(x∗)1Tip(x)1Ti} q ( x ∗ | x ) = q ( x | x ∗ ) ⇒ A ( x , x ∗ ) = m i n { 1 , p ( x ∗ ) 1 T i p ( x ) 1 T i }
Case 3: Independent sampler for Simulated annealing:
Acceptance probability: q(x∗|x)=q(x∗)⇒A(x,x∗)=min{1,p(x∗)1Tiq(x)p(x)1Tiq(x∗)} q ( x ∗ | x ) = q ( x ∗ ) ⇒ A ( x , x ∗ ) = m i n { 1 , p ( x ∗ ) 1 T i q ( x ) p ( x ) 1 T i q ( x ∗ ) }
4. Pseudo code:
5. Matlab code:
% Metropolis(-Hastings) algorithm
% true (target) pdf is p(x) where we know it but can't sample data.
% proposal (sample) pdf is q(x*|x)=N(x,10) where we can sample.
%%
clc
clear;
X(1)=0;
N=5e3;
p = @(x) 0.3*exp(-0.2*x.^2) + 0.7*exp(-0.2*(x-10).^2);
C = 1; T_0 = 2;
T = @(x) 1.0/(C*log(x+T_0));
dx=0.5; xx=-10:dx:20; fp=p(xx); plot(xx,fp) % plot the true p(x)
%% MH algorithm
sig=(10);
for i=1:N-1
u=rand;
x=X(i);
xs=normrnd(x,sig); % new sample xs based on existing x from proposal pdf.
pxs=p(xs);
px=p(x);
qxs=normpdf(xs,x,sig);
qx=normpdf(x,xs,sig); % get p,q.
if u<min(1,pxs^(1/T(i))*qx/(px^(1/T(i))*qxs)) % case 1: pesudo code
% if u<min(1,pxs^(1/T(i))/(px)^(1/T(i))) % case 2: Metropolis algorithm
% if u<min(1,pxs^(1/T(i))/qxs/(px^(1/T(i))/qx)) % case 3: independent sampler
X(i+1)=xs;
else
X(i+1)=x;
end
end
% compare pdf of the simulation result with true pdf.
N0=1; close all;figure; %N/5;
nb=histc(X(N0+1:N),xx);
bar(xx+dx/2,nb/(N-N0)/dx); % plot samples.
A=sum(fp)*dx;
hold on; plot(xx,fp/A,'r') % compare.
% figure(2); plot(N0+1:N,X(N0+1:N)) % plot the traces of x.
% compare cdf with true cdf.
F1(1)=0;
F2(1)=0;
for i=2:length(xx)
F1(i)=F1(i-1)+nb(i)/(N-N0);
F2(i)=F2(i-1)+fp(i)*dx/A;
end
figure
plot(xx,[F1' F2'])
max(F1-F2) % this is the true possible measure of accuracy.
6. Results:
Result of case 1:
Result of case 2:
Result of case 3:
Reference:
[1]. An Introduction to MCMC for Machine Learning.
[2]. https://blog.csdn.net/Eric2016_Lv/article/details/79691392