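I am finding it hard to link the theory with the implementation. I would appreciate help in finding where my understanding is wrong.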
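Notation: matrices are in bold capitals and vectors in bold lowercase letters.
$\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ is a dataset of $N$ observations, each of $d$ variables. So, given these observed $d$-dimensional data vectors, the $M$-dimensional principal axes are $\mathbf{w}_j$, for $j \in \{1, \ldots, M\}$, where $M$ is the target dimension.
The $M$ principal components of the observed data matrix would be $\mathbf{Y} = \mathbf{X}\mathbf{W}$, where matrix $\mathbf{Y} \in \mathbb{R}^{N \times M}$, matrix $\mathbf{X} \in \mathbb{R}^{N \times d}$, and matrix $\mathbf{W} \in \mathbb{R}^{d \times M}$.
The columns of $\mathbf{W}$ form an orthogonal basis for the $M$ features, and the output $\mathbf{y}_n$ is the principal component projection that minimizes the squared reconstruction error $\sum_n \lVert \mathbf{x}_n - \hat{\mathbf{x}}_n \rVert^2$; $\hat{\mathbf{x}}_n = \mathbf{W}\mathbf{y}_n$ gives the optimal reconstruction of $\mathbf{x}_n$.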
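The data model is
X(i,j) = A(i,:)*S(:,j) + noise
where PCA should be applied to X to obtain the output S. As I understand it, S should be equal to Y.
Question 1: The reduced data Y is not equal to the S used in the model. Where is my understanding wrong?
Question 2: How do I reconstruct so that the error is minimized?
Please help. Thank you.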
clear all
clc
n1 = 5; %d dimension
n2 = 500; % number of examples
ncomp = 2; % target reduced dimension
%Generating data according to the model
% X(i,j) = A(i,:)*S(:,j) + noise
Ar = orth(randn(n1,ncomp))*diag(ncomp:-1:1);
T = 1:n2;
%generating synthetic data from a dynamical model
S = [ exp(-T/150).*cos( 2*pi*T/50 )
exp(-T/150).*sin( 2*pi*T/50 ) ];
% Normalizing to zero mean and unit variance
S = ( S - repmat( mean(S,2), 1, n2 ) );
S = S ./ repmat( sqrt( mean( S.^2, 2 ) ), 1, n2 );
Xr = Ar * S;
Xrnoise = Xr + 0.2 * randn(n1,n2);
h1 = plot(T, S'); % tsplot is not a built-in MATLAB function; plot each source against T instead
X = Xrnoise;
XX = X';
[pc, ~] = eigs(cov(XX), ncomp);
Y = XX*pc;
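A short derivation may help with Question 1. It is only a sketch: it writes the model in matrix form using the names from the code above, with $E$ as a hypothetical symbol for the noise matrix, and it assumes the columns of pc are orthonormal (as the eigenvectors of a covariance matrix are).

```latex
\begin{aligned}
X &= A_r S + E, \qquad A_r \in \mathbb{R}^{5 \times 2},\; S \in \mathbb{R}^{2 \times 500} \\
Y^{\top} &= \mathrm{pc}^{\top} X = (\mathrm{pc}^{\top} A_r)\, S + \mathrm{pc}^{\top} E
\end{aligned}
```

The factor $\mathrm{pc}^{\top} A_r$ is a $2 \times 2$ matrix that is in general not the identity, so each column of Y is a linear mixture of the rows of S. Under these assumptions PCA would recover the subspace spanned by the sources, but not the sources themselves.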
Update [Aug 10]
Based on the answer, here is the full code:
clear all
clc
n1 = 5; %d dimension
n2 = 500; % number of examples
ncomp = 2; % target reduced dimension
%Generating data according to the model
% X(i,j) = A(i,:)*S(:,j) + noise
Ar = orth(randn(n1,ncomp))*diag(ncomp:-1:1);
T = 1:n2;
%generating synthetic data from a dynamical model
S = [ exp(-T/150).*cos( 2*pi*T/50 )
exp(-T/150).*sin( 2*pi*T/50 ) ];
% Normalizing to zero mean and unit variance
S = ( S - repmat( mean(S,2), 1, n2 ) );
S = S ./ repmat( sqrt( mean( S.^2, 2 ) ), 1, n2 );
Xr = Ar * S;
Xrnoise = Xr + 0.2 * randn(n1,n2);
X = Xrnoise;
XX = X';
[pc, ~] = eigs(cov(XX), ncomp);
Y = XX*pc; %Y are the principal components of X'
%what you call pc is misleading, these are not the principal components
%These Y columns are orthogonal, and should span the same space
%as S, approximately (not exactly, since you introduced noise).
%If you want to reconstruct
%the original data can be retrieved by projecting
%the principal components back on the original space like this:
Xrnoise_reconstructed = Y*pc';
%Then, you still need to project it through
%to the S space, if you want to reconstruct S
S_reconstruct = Ar'*Xrnoise_reconstructed';
plot(1:length(S_reconstruct),S_reconstruct,'r')
hold on
plot(1:length(S),S)
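The resulting plot is very different from the one shown in the answer. Only one component of S exactly matches a component of S_reconstruct. Shouldn't the entire original two-dimensional space of the source input S be reconstructed?
Even if I remove the noise, only one component of S is reconstructed exactly.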
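One possible explanation, offered as a sketch rather than a definitive diagnosis: Ar is built as orth(...)*diag(ncomp:-1:1), so its columns are orthogonal but not unit length, and Ar'*Ar is diag([4 1]) rather than the identity. Projecting with Ar' would then scale the first source by 4 and leave the second untouched, which matches the observation that only one component agrees. The noise-free check below reuses the names from the code above; G, S_scaled and S_hat are hypothetical names introduced for illustration, and pinv(Ar) is a substitution for Ar', not something taken from the answer.

```matlab
n1 = 5; n2 = 500; ncomp = 2;
Ar = orth(randn(n1,ncomp))*diag(ncomp:-1:1);   % orthogonal columns with norms 2 and 1
T = 1:n2;
S = [ exp(-T/150).*cos( 2*pi*T/50 )
      exp(-T/150).*sin( 2*pi*T/50 ) ];
S = S - repmat( mean(S,2), 1, n2 );                 % zero mean per source
S = S ./ repmat( sqrt( mean( S.^2, 2 ) ), 1, n2 );  % unit variance per source

X = Ar * S;                        % noise-free data (assumption: noise removed)
G = Ar' * Ar;                      % Gram matrix of the mixing matrix

[pc, ~] = eigs(cov(X'), ncomp);    % principal axes, as in the code above
Xrec = (X'*pc) * pc';              % rank-2 reconstruction of X'

S_scaled = Ar' * Xrec';            % projection used in the update above
S_hat    = pinv(Ar) * Xrec';       % pseudo-inverse undoes the column scaling

disp(G)                            % inspect the Gram matrix
disp(norm(S_hat - S, 'fro'))       % distance between the recovered and true sources
```

With noise added back, S_hat would only approximate S, since pinv(Ar) also maps the noise that lies inside the two-dimensional signal subspace.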