LDA: how to pass sample data to lda()

Article link: https://blog.csdn.net/qq_44675529/article/details/115799883

Iris classification example

library(MASS)

# (1) Input the sample data
nx = nrow(iris)
irisdata = iris[1:nx, 1:4]   # the four numeric measurements
irisgrp = iris[1:nx, 5]      # class labels (Species)

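# (2) Build the LDA model
(lda.sol = lda(irisdata, irisgrp))  # wrapping the assignment in parentheses also prints the result
lda.sol = lda(irisdata, irisgrp)    # equivalent: assign first ...
lda.sol                             # ... then print explicitly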

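# (3) Predict on the iris data itself (resubstitution)
result = predict(lda.sol, irisdata)
table(irisgrp, result$class)    # result$class holds the predicted classes
yhat = result$class
table(irisgrp, yhat)            # same table, using a named prediction vector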

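# (4) See how the prediction output follows from the four parts of the lda output
result$x
P = lda.sol$scaling  # projection (dimension-reduction) matrix
means = lda.sol$means %*% P  # project the class mean vectors
total_means = as.vector(lda.sol$prior %*% means)  # overall projected mean: the class means averaged with weights lda.sol$prior
n_samples = nrow(irisdata)
x <- as.matrix(irisdata) %*% P - (rep(1, n_samples) %o% total_means) # project the samples, then shift; rep(1, n) %o% v stacks n copies of v as rows
# result$x and x are the same:
# the two columns of result$x are the projected samples minus the projected overall mean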

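# (5) Plot the two-dimensional x
plot(x, cex=1.2,   # draw point symbols at 1.2 times the default size (default is 1)
	 pch=rep(22:24, c(50,50,50)),  # point shape per class
	 col=rep(2:4, c(50,50,50)),  # point border colour per class
	 bg=rep(2:4, c(50,50,50)))  # point fill colour (used by pch 21-25), not the plot background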


library(ggplot2)
library(ggpubr)
df = data.frame(x1=x[, 1], x2=x[,2], actual=irisgrp, predicted=result$class)

ggplot(df, aes(x1, x2, color=predicted))+geom_point()

ggplot(df, aes(x1, x2, color=actual, shape=predicted))+geom_point()
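Step (4) claims that the hand-computed x equals result$x. A minimal sketch of that check follows; it repeats the setup so that it runs on its own, and the name `same` is only illustrative. It uses sweep() in place of the outer product, which should be equivalent here.

```r
library(MASS)

irisdata <- iris[, 1:4]
irisgrp  <- iris[, 5]
lda.sol  <- lda(irisdata, irisgrp)
result   <- predict(lda.sol, irisdata)

P <- lda.sol$scaling
# Prior-weighted overall mean, projected into discriminant space
total_means <- as.vector(lda.sol$prior %*% lda.sol$means %*% P)
# Project every sample, then subtract the projected overall mean column by column
x <- sweep(as.matrix(irisdata) %*% P, 2, total_means)

# Compare with the scores returned by predict(), ignoring dimnames
same <- isTRUE(all.equal(x, result$x, check.attributes = FALSE))
same
```

If `same` is TRUE, the scores from predict() are just the centred data multiplied by lda.sol$scaling.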

Results

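Prior probabilities of groups: the share of each class in the training data (the default when no prior is given). It is a probability vector, accessed as lda.sol$prior.
Group means: the mean vector of each class, accessed as lda.sol$means.
Coefficients of linear discriminants: the projection matrix, accessed as lda.sol$scaling. It has one row per input variable and one column per discriminant (4 x 2 here); multiplying a sample, written as a row vector, by this matrix on the right reduces its dimension.
Proportion of trace: the share of between-group variance captured by each discriminant after projection. In this example the 4-dimensional vectors are reduced to 2 dimensions, and LD1 accounts for nearly all of it.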
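The four parts of the printed output can also be read directly from the fitted object. The sketch below assumes the same iris fit as above; `fit` and `prop_trace` are illustrative names. The proportion of trace is not stored as such, so it is recomputed here from the singular values in `fit$svd`.

```r
library(MASS)

fit <- lda(iris[, 1:4], iris[, 5])

fit$prior     # prior probabilities of groups
fit$means     # group means: one row per class, one column per variable
fit$scaling   # coefficients of linear discriminants (4 x 2 for iris)

# Proportion of trace: squared singular values, normalised to sum to 1
prop_trace <- fit$svd^2 / sum(fit$svd^2)
names(prop_trace) <- colnames(fit$scaling)
prop_trace
```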

Predicting on the iris data itself as a test
result = predict(lda.sol, irisdata)
table(irisgrp, result$class)  # result$class is the prediction; rows are the true labels, columns the predicted ones
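The table above can be reduced to a single accuracy figure. This is a minimal sketch with illustrative names (`fit`, `pred`, `tab`, `accuracy`, `errors`). Because the model is tested on its own training data, the figure is likely to be optimistic.

```r
library(MASS)

fit  <- lda(iris[, 1:4], iris[, 5])
pred <- predict(fit, iris[, 1:4])

# Rows: true labels; columns: predicted labels
tab <- table(actual = iris[, 5], predicted = pred$class)

# Correct predictions sit on the diagonal
accuracy <- sum(diag(tab)) / sum(tab)
errors   <- sum(tab) - sum(diag(tab))

tab
accuracy
errors
```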
(See the reposted article for details.)
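The part to focus on here is the dimension reduction; the decision thresholds in the reduced space are determined by the Bayes method.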
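To make the Bayes step concrete, the sketch below rebuilds the posterior probabilities from the discriminant scores. It assumes, as the MASS documentation describes, that the scaling makes the within-group covariance spherical, so each class score is the log prior minus half the squared distance to the projected class mean. The names `m`, `z`, `log_score`, `post` and `yhat` are illustrative. This is one way to reproduce the result, not necessarily how predict() is implemented internally.

```r
library(MASS)

X   <- as.matrix(iris[, 1:4])
grp <- iris[, 5]
fit  <- lda(X, grp)
pred <- predict(fit, X)

# Class means in discriminant space, centred the same way as pred$x
centre <- colSums(fit$prior * fit$means)
m <- scale(fit$means, center = centre, scale = FALSE) %*% fit$scaling  # 3 x 2
z <- pred$x                                                            # 150 x 2

# Bayes score per class: log prior - 0.5 * squared distance to the class mean
log_score <- sapply(seq_len(nrow(m)), function(k) {
  log(fit$prior[k]) - 0.5 * rowSums(sweep(z, 2, m[k, ])^2)
})

# Normalise to posterior probabilities (subtract the row maximum for stability)
post <- exp(log_score - apply(log_score, 1, max))
post <- post / rowSums(post)
colnames(post) <- rownames(m)

# Assign each sample to the class with the largest posterior
yhat <- factor(colnames(post)[max.col(post)], levels = levels(grp))

all.equal(unname(post), unname(pred$posterior))
```

The boundary between two classes is the set of points where their scores are equal, which is the "threshold" referred to above.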
(Material from "Cases and Data Analysis".)