演示如何使用偏最小二乘回归方法

http://www.cdpcenter.org/files/plsr/ 点击打开链接

PLSR Demonstration

This is an example of PLSR on simulated spectroscopic data to determine the concentration of red and blue dye in various solutions. The M-file may be downloaded here. The M-file implementing the NIPALS algorithm may be downloaded here (html version).

Contents

Constructing the data

Let's contruct spectra for 4 dyes (2 red, 1 blue, and 1 yellow) at 40 points (x, that presumabely correspond to wavelengths):

randn('state', 0);
numInputs = 40;
x = linspace(-8,12,numInputs);
red1Spectrum = 10*evpdf(-x,-2,2);
red2Spectrum = 9.8*evpdf(-x,-1.5,2);
blueSpectrum = 4*evpdf(-x,4,1.);
yellowSpectrum = 5*evpdf(-x,1,1.2);

Spectra plots

Combine the spectra into one array and plot them

dyeSpectra = [
    red1Spectrum
    red2Spectrum
    blueSpectrum
    yellowSpectrum];
set(gcf,'DefaultAxesColorOrder',[1 0 0; 0.5 0 0; 0 0 1; 1 1 0]);
plot(1:numInputs,dyeSpectra);
xlabel('Wavelength')
ylabel('Optical Density')
legend('red1','red2','blue','yellow')
title('Spectra for dyes')

Create stock solutions

Let's have 3 stock solutions:

  • Stock 1 consists of 67% red1, 27% red2, and 6% yellow
  • Stock 2 consists of 36% red1, 56% red2, and 8% yellow
  • Stock 3 consists of 95% blue and 5% yellow
stocks= [
    0.67 0.27 0.00 0.06
    0.36 0.56 0.00 0.08
    0.00 0.00 0.95 0.05];

Create sample solutions

Let's make samples of each stock at different concentrations, and a few mixtures:

  • Dilutions of stock 1 (samples1)
  • Dilutions of stock 2 (samples2)
  • Dilutions of stock 3 (samples3)
  • Mixtures of stocks 1 and 3 (samples13)
  • Mixtures of stocks 2 and 3 (samples23)
samples1 = [0.01 0.02 0.04 0.08]'*stocks(1,:);
samples2 = [0.03 0.06 0.09 0.12]'*stocks(2,:);
samples3 = [0.01 0.02 0.04 0.08 0.16]'*stocks(3,:);
samples13 = ([0.03 0.05 0.08]'*stocks(1,:) + [0.07 0.05 0.02]'*stocks(3,:));
samples23 = ([0.02 0.03 0.04]'*stocks(2,:) + [0.03 0.02 0.01]'*stocks(3,:));
samples = [samples1; samples2; samples3; samples13; samples23];

Create sample spectra

sampleSpectra = samples*dyeSpectra;
sampleSpectra = max(1e-6, sampleSpectra + 0.002*randn(size(sampleSpectra)) );

plot(sampleSpectra(end-2,:),'k-')
xlabel('Wavelength')
ylabel('Optical Density')
title('Spectrum for a mixture of red + blue')

Construct the raw inputs (X) and outputs (Y).

The inputs (Xraw) are just the sample spectra.

The two outputs (Yraw) are

  • the total concentration of red dyes, and
  • the concentratinon blue dye.
Xraw = sampleSpectra;
Yraw = [samples(:,1)+samples(:,2), samples(:,3)];

Write the outputs to a text file for analysis in Simca-P

csvwrite('spec.txt',[Yraw Xraw])

Apply PLSR to the data

First mean-center the data. However, we should not variance scale the data because all the are all in the same units, and larger data points really should be weighted more heavily (because they are more informative).

Then, let's extract three principal components

numSamples = size(Xraw,1);
numOutputs = size(Yraw,2);
Xmean = mean(Xraw,1);
X = Xraw - repmat(Xmean, [numSamples 1]);
% Xmean = mean(Xraw,2);
% X = Xraw - repmat(Xmean, [1 numInputs]);
Ymean = mean(Yraw,1);
Y = Yraw - repmat(Ymean, [numSamples 1]);
% Ymean = mean(Yraw,2);
% Y = Yraw - repmat(Ymean, [1 numOutputs]);
[P,Q,W,B] = nipals(X,Y,3);
T = X*W;
U = Y*Q;
Ypred = X*W*B*Q';
YrawPred = Ypred + repmat(Ymean, [numSamples 1]);
% YrawPred = Ypred + repmat(Ymean, [1 numOutputs]);

Analysis of predictive ability

plot(Yraw(:,1),Yraw(:,2),'kx',YrawPred(:,1),YrawPred(:,2),'bo')
xlabel('Red Conc.')
ylabel('Blue Conc.')
legend('Expt.', 'Pred.')
title('Comparison of Predictive Ability')

Analysis of the Y loadings

PC1 is has a positive "red" weight and a negative "blue" weight. PC2 is mostly "blue", but some "red."

clf
colormap([1 0 0; 0 0 1]);
bar(Q')
set(gca,'XTickLabel',{'PC1','PC2','PC3'})
legend('Red','Blue');
title('Y Loadings')

Analysis of the X loadings

The X loadings are linear combinations of "red" and "blue" spectra. As indicated by the Y loadings, PC1 is "red" minus "blue", and PC2 is "blue" plus a little "red". PC3 looks like a lot of noise. As we'll later see using Simca-P, PC3 is not statistically significant.

clf
plot(1:numInputs,P);
legend('PC1','PC2','PC3')
xlabel('Arbitrary Wavelength')
title('X Loadings');

Analysis of the scores

s = {samples1, samples2, samples3, samples13, samples23};
legendText = {'1','2','3','1+2','2+3'};
symbols = 'xo+s^';
colors = [1 0 0; 0.5 0 0; 0 0 1; 0.7 0 0.7; 0.3 0 0.3];
clf
hold on
idx1 = 1;
for i=1:numel(s)
    idx2 = idx1 + size(s{i},1) - 1;
    Ti = T(idx1:idx2,:);
    plot(Ti(:,1),Ti(:,2),symbols(i),'MarkerFaceColor',colors(i,:),'MarkerEdgeColor',colors(i,:));
    idx1 = idx2+1;
end
xlabel('PC1');
ylabel('PC2');
title('X Scores')
legend(legendText{:});
hold off

Training the PLSR model mostly with blue solutions

At first, I was surprised to find that each principal component doesn't correspond to a single color. Then I realized it has a lot to do with the data you use to train the model. The first principal component is the one that explains the most variance in your data. So if most of your samples are just "blue", then PC1 will be a blue spectrum. If most of your samples are mixtures, then PC1 will be a mixture of the spectra.

Let's confirm this by adding a bunch more blue solutions to our samples.

n = 100;
blueConc = 0.5*randn(n,1);
samplesB = zeros(n,4);
samplesB(:,3) = blueConc;
samplesB = [samples; samplesB];
YrawB = [samplesB(:,1)+samplesB(:,2), samplesB(:,3)];
sampleSpectraB = blueConc*blueSpectrum;
sampleSpectraB = sampleSpectraB + max(1e-6,0.002*randn(size(sampleSpectraB)));
sampleSpectraB = [sampleSpectra; sampleSpectraB];
XrawB = sampleSpectraB;

csvwrite('specB.txt',[YrawB XrawB])

numSamplesB = size(XrawB,1);
numOutputsB = size(YrawB,2);
XmeanB = mean(XrawB,1);
XB = XrawB - repmat(XmeanB, [numSamplesB 1]);
YmeanB = mean(YrawB,1);
YB = YrawB - repmat(YmeanB, [numSamplesB 1]);
[PB,QB,WB,BB] = nipals(XB,YB,3);
clf
colormap([1 0 0; 0 0 1]);
bar(QB')
set(gca,'XTickLabel',{'PC1','PC2','PC3'})
legend('Red','Blue');
title('Y Loadings')

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值