Partial Least Squares (PLS) was not originally designed for classification and discrimination problems, but it is often used for this purpose. The qualitative response matrix Y is internally recoded as a dummy block matrix that records the class membership of each observation, i.e., each response category is encoded by an indicator variable. PLS regression (now PLS-DA) is then run as if Y were a continuous matrix, and it performs well on large datasets where linear discriminant analysis would struggle with collinearity.
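The dummy recoding step can be illustrated in a few lines of base R (a minimal sketch with made-up class labels; the recoding done internally by a PLS-DA implementation may differ in detail):

```r
# Recode a qualitative response into a dummy (indicator) block matrix,
# mirroring what PLS-DA does internally before fitting a PLS regression
y <- factor(c("tumor", "normal", "tumor", "normal", "normal"))
Y <- model.matrix(~ y - 1)      # one indicator column per class, no intercept
colnames(Y) <- levels(y)
# every row contains a single 1 marking that observation's class
rowSums(Y)
```

PLS regression is then fitted against this 0/1 matrix Y as if it were continuous.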
PLS Discriminant Analysis (PLS-DA) is a linear classification model that can predict the class of new samples.
Sparse PLS-DA (sPLS-DA) additionally selects the most predictive or discriminative features in the data, helping to classify the samples.
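The sparse variant is available in mixOmics as splsda, where keepX sets how many variables to retain on each component (a sketch assuming the mixOmics package and its breast.tumors example data; the keepX value of 25 per component is an arbitrary choice for illustration):

```r
library(mixOmics)

data(breast.tumors)
X <- breast.tumors$gene.exp
Y <- breast.tumors$sample$treatment

# sPLS-DA: keep only the 25 most discriminative genes per component
splsda.breast <- splsda(X, Y, ncomp = 2, keepX = c(25, 25))

# the selected (non-zero loading) variables on component 1
selectVar(splsda.breast, comp = 1)$name
```

In practice keepX is usually tuned, e.g. with mixOmics' cross-validation tools, rather than fixed by hand.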
Method 1: the plsda function in the mixOmics package
install.packages("mixOmics")
library(mixOmics)
Note that plsda itself does not report R2 or Q2.
data(breast.tumors)
X <- breast.tumors$gene.exp
Y <- breast.tumors$sample$treatment
plsda.breast <- plsda(X, Y, ncomp = 2)
plotIndiv(plsda.breast, ind.names = TRUE, ellipse = TRUE, legend = TRUE)
Obtaining VIP values
library(RVAideMemoire)
PLSDA.VIP(plsda.breast, graph = TRUE)
Method 2: the plsDA function in the DiscriMiner package
Performs PLS discriminant analysis, optionally with leave-k-out cross-validation.
# load iris dataset
data(iris)
# PLS discriminant analysis specifying number of components = 2
my_pls1 = plsDA(iris[,1:4], iris$Species, autosel=FALSE, comps=2)
my_pls1$confusion
my_pls1$error_rate
# plot circle of correlations
plot(my_pls1)
# PLS discriminant analysis with automatic selection of components
my_pls2 = plsDA(iris[,1:4], iris$Species, autosel=TRUE)
my_pls2$confusion
my_pls2$error_rate
# PLS discriminant analysis with learn-test validation
learning = c(1:40, 51:90, 101:140)
testing = c(41:50, 91:100, 141:150)
my_pls3 = plsDA(iris[,1:4], iris$Species, validation="learntest",
learn=learning, test=testing)
my_pls3$confusion
my_pls3$error_rate
my_pls3$R2
R2X R2Xcum R2Y R2Ycum
t1 0.7332792 0.7332792 0.47105647 0.4710565
t2 0.2223789 0.9556582 0.07414135 0.5451978
my_pls3$Q2
Q2.setosa Q2.versicolor Q2.virginica Q2.global
t1 0.86919713 0.046061518 0.48123542 0.46549802
t2 0.18666695 0.143015160 0.08398902 0.12743855
t3 0.06393944 0.003635849 0.09319238 0.03866148
t4 0.01852317 -0.015969033 -0.02068680 -0.01891103
my_pls3$VIP
Component 1 Component 2 Model VIP
Sepal.Length 0.9271868 0.8856696 0.8856696
Sepal.Width 0.6638666 0.9222639 0.9222639
Petal.Length 1.1682002 1.0874663 1.0874663
Petal.Width 1.1553848 1.0873986 1.0873986
plot(my_pls3)
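The leave-k-out cross-validation mentioned above can be requested through the validation argument (a hedged sketch: the argument names validation = "crossval", cv = "LKO", and k follow the DiscriMiner documentation for plsDA, but check them against your installed version):

```r
library(DiscriMiner)

data(iris)
# PLS discriminant analysis with leave-k-out cross-validation
# (assumption: validation = "crossval" with cv = "LKO" and k = 5 folds)
my_pls4 = plsDA(iris[,1:4], iris$Species, validation = "crossval",
                cv = "LKO", k = 5)
my_pls4$confusion
my_pls4$error_rate
```

The confusion matrix and error rate are then computed on the held-out folds rather than on the training data.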