Task 2: Controlling the Degrees of Freedom with PPCA
(This experiment comes from the experiment in Section 4.3 of the PPCA paper by Michael E. Tipping and Christopher M. Bishop. These are notes I derived and organized myself, after consulting references, while working on an assignment from my advisor. The notation is not entirely consistent throughout, but this should not hinder reading. If you find any errors, please contact me. These notes are for study only; if you repost them, please cite the source. Thank you!)
1 Isotropic Covariance Model
The Isotropic Covariance Model is a special case of the probabilistic PCA model in which the latent dimension is $q = 0$. Since the latent dimension is zero, there is no latent space and hence no latent weight matrix W, so the model reduces to:
$$t = \mu + \epsilon, \qquad \epsilon \sim N(\mathbf{0},\ \sigma^{2}\mathbf{I}_d)$$
Hence the covariance matrix is $C=\sigma^{2}I_d$, which contains only a single free parameter.
From the derivation in Task 1, the log-likelihood of the observed data is:
$$
\begin{aligned}
L &= \log(l) = \log\Big(\prod_{n=1}^{N} p(t_n \mid C)\Big) \\
  &= -\frac{N}{2}\Big\{ d\log(2\pi) + \log|C| + \frac{1}{N}\sum_{n=1}^{N}(t_n-\mu)'\,C^{-1}(t_n-\mu) \Big\} \\
  &= -\frac{N}{2}\Big\{ d\log(2\pi) + \log|C| + \mathrm{tr}(C^{-1}S) \Big\}
\end{aligned} \tag{1}
$$
Let $\sigma^{2}=\tau$. Differentiating (1) with respect to $\tau$ (dropping the constant factor $-\tfrac{N}{2}$, which does not affect the stationary point) gives:
$$
\begin{aligned}
dL_{\tau} &= d\log(|C|) + d\,\mathrm{tr}(C^{-1}S) \\
          &= \mathrm{tr}(C^{-1}\,d\tau) - \mathrm{tr}(C^{-1}\,d\tau\, C^{-1}S) \\
          &= \mathrm{tr}(C^{-1} - C^{-2}S)\,d\tau \qquad (\text{since } d\tau \text{ is a scalar, it can be pulled out of the trace})
\end{aligned} \tag{2}
$$
$$\frac{dL_{\tau}}{d\tau} = \mathrm{tr}(C^{-1} - C^{-2}S) = 0 \tag{3}$$
Since $C=\tau I_d$, we have $C^{-1}=\tau^{-1} I_d$, so (3) gives:
$$\mathrm{tr}(\tau^{-1} I_d - \tau^{-2} S) = 0$$

$$\tau d = \mathrm{tr}(S)$$

$$\tau = \frac{\mathrm{tr}(S)}{d} = \frac{\sum_{i=1}^{d}\lambda_i}{d} \tag{4}$$
where the $\lambda_i$ are the eigenvalues of $S$.
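As a quick sanity check of (4) (my own sketch, not part of the original experiment), the following MATLAB snippet compares the closed-form estimate $\tau=\mathrm{tr}(S)/d$ with a grid search over the isotropic log-likelihood on synthetic data; the data dimensions and the grid are arbitrary choices.
% Sanity check (sketch): the closed-form tau = trace(S)/d from (4) should
% maximize the isotropic log-likelihood over a grid of candidate values.
rng(0);
N = 200; d = 5;
X = randn(N,d)*diag([3 2 1 1 0.5]);          % synthetic data with unequal column scales
T = X - repmat(mean(X,1),N,1);
S = T'*T/N;                                   % sample covariance matrix
tauHat = trace(S)/d;                          % closed-form estimate (4)
taus = linspace(0.5*tauHat, 2*tauHat, 1001);  % grid around the estimate
L = arrayfun(@(t) -N*(d*log(2*pi) + d*log(t) + trace(S)/t)/2, taus);
[~,k] = max(L);
fprintf('closed form: %.4f, grid maximizer: %.4f\n', tauHat, taus(k));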
2 Diagonal Covariance Model
The Diagonal Covariance Model is another special case of the probabilistic PCA model in which the latent dimension is $q = 0$. Since the latent dimension is zero, there is no latent space and hence no latent weight matrix W, so the model reduces to:
$$t = \mu + \epsilon, \qquad \epsilon \sim N(\mathbf{0},\ \Sigma)$$
Hence the covariance matrix $C=\Sigma=\mathrm{diag}(\tau_1,\dots,\tau_d)$ is diagonal; for dimension $d$, it contains $d$ free parameters.
To simplify the computation, write $C=\tau_1 E_{11} + \dots + \tau_d E_{dd}$, where $E_{ii}$ denotes the matrix whose $(i,i)$ entry is 1 and whose other entries are all 0. Then $C^{-1}=\tau_1^{-1} E_{11} + \dots + \tau_d^{-1} E_{dd}$. Differentiating the log-likelihood (1) with respect to $\tau_i$ gives:
$$
\begin{aligned}
dL_{\tau_i} &= d\log(|C|) + d\,\mathrm{tr}(C^{-1}S) \\
            &= \mathrm{tr}(C^{-1} E_{ii}\, d\tau_i) - \mathrm{tr}(C^{-1} E_{ii}\, d\tau_i\, C^{-1} S) \qquad (E_{ii}E_{jj} = 0 \text{ for } i \neq j) \\
            &= \mathrm{tr}(\tau_{i}^{-1} E_{ii} - \tau_{i}^{-2} E_{ii} S)\, d\tau_i
\end{aligned} \tag{5}
$$
$$\mathrm{tr}(\tau_{i}^{-1} E_{ii} - \tau_{i}^{-2} E_{ii} S) = 0$$

$$\tau_{i}^{-1} - \tau_{i}^{-2}\,\mathrm{tr}(E_{ii} S) = 0$$

$$\tau_{i} = \mathrm{tr}(E_{ii} S)$$

$$C = \mathrm{diag}(S)$$
Since $\mathrm{tr}(E_{ii}S) = S_{ii}$, the $i$-th diagonal element of $S$, this says that when $C$ is diagonal, the maximum-likelihood estimate of $C$ is the diagonal matrix formed from the diagonal elements of the sample covariance matrix $S$.
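As a numerical check (again my own sketch, not part of the original experiment), maximizing the diagonal-model log-likelihood over the $d$ diagonal entries with fminsearch should recover the diagonal of $S$; parameterizing through $\log\tau_i$ is just a convenient way to keep the entries positive.
% Sanity check (sketch): numerically maximizing the log-likelihood over the
% d diagonal entries should recover diag(S).
rng(1);
N = 300; d = 4;
X = randn(N,d)*diag([2 1.5 1 0.5]);          % synthetic data
T = X - repmat(mean(X,1),N,1);
S = T'*T/N;                                   % sample covariance matrix
% negative log-likelihood as a function of log(tau_1),...,log(tau_d)
negL = @(logTau) N*(d*log(2*pi) + sum(logTau) + sum(diag(S)'./exp(logTau)))/2;
logTauHat = fminsearch(negL, zeros(1,d));     % numerical maximizer
disp([exp(logTauHat); diag(S)']);             % the two rows should agree closely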
3 Probabilistic PCA
The derivation of probabilistic PCA was given in detail in Task 1. In this experiment we only consider the cases q = 1, 2 and 3.
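As a brief reminder of the closed-form result from the paper (derived in detail in Task 1), the maximum-likelihood solution for latent dimension $q$ is

$$W_{\mathrm{ML}} = U_q\,\big(\Lambda_q - \sigma^2 I_q\big)^{1/2} R, \qquad \sigma^2_{\mathrm{ML}} = \frac{1}{d-q}\sum_{j=q+1}^{d}\lambda_j,$$

where the columns of $U_q$ are the eigenvectors of $S$ belonging to its $q$ largest eigenvalues $\lambda_1 \ge \dots \ge \lambda_q$ (the diagonal of $\Lambda_q$), $R$ is an arbitrary $q\times q$ orthogonal matrix, and the corresponding covariance is $C = W_{\mathrm{ML}}W_{\mathrm{ML}}^{T} + \sigma^2_{\mathrm{ML}} I_d$.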
4 Full Covariance Model
The Full Covariance Model is a special case of the probabilistic PCA model in which the latent dimension is $q=d-1$, with covariance matrix $C=WW^{T}+\sigma^{2}I$. The derivation is exactly as for probabilistic PCA in Section 3; simply set $q = d-1$.
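A short remark of my own on why $q=d-1$ amounts to a full covariance model, following from the closed-form solution recalled in Section 3: in that case $\sigma^2_{\mathrm{ML}}=\lambda_d$, and since the eigenvectors $u_1,\dots,u_d$ of $S$ form a complete orthonormal basis,

$$C_{\mathrm{ML}} = U_{d-1}\big(\Lambda_{d-1}-\lambda_d I\big)U_{d-1}^{T} + \lambda_d I_d = \sum_{i=1}^{d-1}(\lambda_i-\lambda_d)\,u_i u_i^{T} + \lambda_d\sum_{i=1}^{d} u_i u_i^{T} = \sum_{i=1}^{d}\lambda_i\, u_i u_i^{T} = S,$$

so the maximum-likelihood covariance is the full sample covariance $S$, with $d(d+1)/2$ free parameters.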
5 Experiments
5.1 Isotropic model experiment (MATLAB code)
The Isotropic.m file returns the noise variance matrix C:
function [ C ] = Isotropic( data, q )
% Isotropic covariance model (q = 0, no latent space)
% q  the dimension of the latent space (unused here, kept for a common interface)
% C  noise variance matrix
[N,d] = size(data);
mu = mean(data,1);
T = data - repmat(mu,N,1);      % Tn - mu, N x d matrix
S = T'*T/N;                     % sample covariance matrix
sigma = trace(S)/d;             % maximum-likelihood estimate, equation (4)
C = sigma*eye(d);
L = -N*(d*log(2*pi) + log(det(C)) + trace(C\S))/2;  % maximized log-likelihood (not returned)
end
5.2 Diagonal model experiment (MATLAB code)
The Diagonal.m file returns the noise variance matrix C:
function [ C ] = Diagonal( data, q )
% Diagonal covariance model (q = 0, no latent space)
% q  the dimension of the latent space (unused here, kept for a common interface)
% C  noise variance matrix
[N,d] = size(data);
mu = mean(data,1);
T = data - repmat(mu,N,1);      % Tn - mu, N x d matrix
S = T'*T/N;                     % sample covariance matrix
C = diag(diag(S));              % maximum-likelihood estimate: diag(S)
L = -N*(d*log(2*pi) + log(det(C)) + trace(C\S))/2;  % maximized log-likelihood (not returned)
end
5.3 Probabilistic PCA experiment (MATLAB code)
This mainly reuses the code from Task 1, except that here only the matrix C is returned:
function [ C ] = myppca( data, q )
% myppca: EM algorithm for probabilistic PCA
% data  N x d data matrix
% q     number of factors (dimension of the latent space)
% C     estimated covariance matrix w*w' + sigma*I
[N,d] = size(data);
mu = mean(data,1);
T = data - repmat(mu,N,1);      % Tn - mu, N x d matrix
Iq = eye(q);                    % q x q identity matrix
Id = eye(d);                    % d x d identity matrix
w = randn(d,q);                 % initial value for w: d x q standard normal random numbers
sigma = 1/randg;                % initial value for sigma: a random positive number
S = T'*T/N;                     % sample covariance matrix
niter = 500;
L = zeros(1,niter);
error = 1e-4;                   % relative convergence tolerance
for i = 2:niter
    M = w'*w + sigma*Iq;
    C = w*w' + sigma*Id;
    detC = det(C);
    invC = inv(C);
    invM = inv(M);
    L(i) = -N*(d*log(2*pi) + log(detC) + trace(invC*S))/2;
    if abs(L(i)-L(i-1)) <= error*abs(L(i-1))
        break;
    end
    % E step
    Ex = invM*w'*T';                % posterior means of the latent variables, q x N
    Exx = N*sigma*invM + Ex*Ex';    % sum of the posterior second moments
    % M step
    w = T'*Ex'*inv(Exx);
    sigma = (dot(T(:),T(:)) - 2*trace(Ex'*w'*T') + trace(Exx*w'*w))/(N*d);
end
L = L(2:i);                     % log-likelihood trace (not returned)
end
5.4 Full model experiment (MATLAB code)
Reuse the myppca.m code from Section 5.3 and simply set q = 17 (i.e. q = d - 1).
5.5 Custom ppcaBootstrapTotal function for the bootstrap procedure
The bootstrap procedure is as follows:
- Step 1: Draw N sample points from the data at random with replacement (N is the number of points in the original data; because sampling is with replacement, duplicates will occur). These N points form the training set trainData; the points that were never drawn form testData (roughly 36.8% of the original data).
- Step 2: Fit the model on trainData to obtain the parameter estimates, then plug them into testData to compute the negative log-likelihood, i.e. the prediction error (see the formula after this list).
- Step 3: Repeat Step 1 and Step 2 m times, producing m prediction errors.
- Step 4: Average the m prediction errors.
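Written out explicitly (my restatement of Step 2, matching the code below), the prediction error recorded for one bootstrap replicate is the average negative log-likelihood of the test points under the trained model:

$$\mathrm{PE} = -\frac{1}{n}\sum_{m=1}^{n}\log p(t_m \mid \hat\mu,\hat C) = \frac{1}{2}\Big(d\log(2\pi) + \log|\hat C| + \mathrm{tr}\big(\hat C^{-1}S_{\mathrm{test}}\big)\Big),$$

where $n$ is the number of test points and $S_{\mathrm{test}} = \frac{1}{n}\sum_{m}(t_m-\hat\mu)(t_m-\hat\mu)'$ is computed around the training mean $\hat\mu$; the constant term $\frac{d}{2}\log(2\pi)$ is dropped in the code and in the reported results.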
function [ppcaNLerror, ppcaNL] = ppcaBootstrapTotal( data, niter, q, f )
% ppcaBootstrapTotal: bootstrap estimate of the prediction error
% input:
% data:  N x d data matrix;
% niter: number of bootstrap replicates;
% q:     the dimension of the latent space;
% f:     a function handle: @Isotropic, @Diagonal or @myppca
% output:
% ppcaNLerror: average prediction error (negative log-likelihood per test point)
% ppcaNL:      the niter individual prediction errors
[N,d] = size(data);
ppcaNL = zeros(1,niter);
for i = 1:niter
    rng(i);                                   % set the seed for reproducibility
    index = unidrnd(N,N,1);                   % draw N indices with replacement (discrete uniform)
    uniqueIndex = unique(index);              % indices drawn at least once
    testIndex = setdiff([1:N]', uniqueIndex); % indices never drawn -> test data
    trainData = data(index,:);
    testData = data(testIndex,:);
    [C] = f(trainData,q);                     % covariance estimate from the training data
    n = size(testData,1);                     % number of test points
    trainMu = mean(trainData,1);
    testT = testData - repmat(trainMu,n,1);   % Tn - mu, n x d matrix
    testS = testT'*testT/n;                   % covariance matrix of the test data
    testL = -n*( log(det(C)) + trace(C\testS))/2;  % test log-likelihood, constant d*log(2*pi) omitted
    ppcaNL(i) = -testL/n;                     % per-observation negative log-likelihood
end
ppcaNLerror = mean(ppcaNL);
end
Run the bootstrap procedure on the virus3.txt data for each of the four models above, generating 500 bootstrap samples:
data = load('virus3.txt');
f1 = @Isotropic;
[ppcaNLerror] = ppcaBootstrapTotal(data,500,0,f1);
f2 = @Diagonal;
[ppcaNLerror] = ppcaBootstrapTotal(data,500,0,f2);
f3 = @myppca;
[ppcaNLerror] = ppcaBootstrapTotal(data,500,1,f3);
[ppcaNLerror] = ppcaBootstrapTotal(data,500,2,f3);
[ppcaNLerror] = ppcaBootstrapTotal(data,500,3,f3);
[ppcaNLerror] = ppcaBootstrapTotal(data,500,17,f3);
The results obtained (with the constant term removed) are:
| Model | Isotropic | Diagonal | PPCA(q=1) | PPCA(q=2) | PPCA(q=3) | Full |
|---|---|---|---|---|---|---|
| Number of parameters | 1 | 18 | 19 | 36 | 52 | 171 |
| Prediction error | 23.65 | 18.49 | 22.92 | 19.28 | 20.14 | 180.91 |
Results from the original paper:

| Model | Isotropic | Diagonal | PPCA(q=1) | PPCA(q=2) | PPCA(q=3) | Full |
|---|---|---|---|---|---|---|
| Number of parameters | 1 | 18 | 19 | 36 | 52 | 171 |
| Prediction error | 18.6 | 19.6 | 16.8 | 14.8 | 15.6 | 3193.5 |
I could not reproduce the paper's numbers despite many attempts, but for the three PPCA models the conclusion is the same in the paper's results and in my own: the prediction error is smallest at q = 2 (two latent variables).
6 Open issues
- I have not yet figured out why I cannot reproduce the paper's results.