R Discriminant Analysis

qq_57518649

Last modified 2023-05-04 20:12:42


Tags: R language machine learning, python

First published 2023-05-04 16:02:38

Article link: https://blog.csdn.net/qq_57518649/article/details/130475902


Commonly used discriminant analysis methods include distance discrimination, Bayes discrimination and Fisher discrimination.

# 1. Distance discrimination

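A sample X is assigned to the population it is closest to. Common choices of distance are the Mahalanobis distance and the Euclidean distance. The Euclidean distance depends on the units in which the variables are measured, whereas the Mahalanobis distance does not depend on the units and removes the effect of correlation between the variables.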
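To make the point about units concrete, here is a small base-R sketch (not from the original post): converting one variable from centimetres to millimetres changes the Euclidean distance, but leaves the Mahalanobis distance unchanged as long as the mean vector and covariance matrix are re-estimated on the rescaled data. The choice of setosa and of Sepal.Length is arbitrary.

```r
# Euclidean vs. Mahalanobis distance before and after changing units.
x <- as.matrix(iris[iris$Species == "setosa", 1:4])  # population: setosa flowers (cm)
p <- x[1, ]                                          # one flower from that population

euclid <- function(v, centre) sqrt(sum((v - centre)^2))

# Distances in the original units (cm); mahalanobis() returns the squared distance
d_euc_cm <- euclid(p, colMeans(x))
d_mah_cm <- mahalanobis(p, colMeans(x), cov(x))

# Convert Sepal.Length to mm and re-estimate the mean vector and covariance matrix
x_mm <- x
x_mm[, "Sepal.Length"] <- x_mm[, "Sepal.Length"] * 10
p_mm <- x_mm[1, ]
d_euc_mm <- euclid(p_mm, colMeans(x_mm))
d_mah_mm <- mahalanobis(p_mm, colMeans(x_mm), cov(x_mm))

c(euclid_cm = d_euc_cm, euclid_mm = d_euc_mm)  # different values
c(mahal_cm = d_mah_cm, mahal_mm = d_mah_mm)    # equal up to rounding error
```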

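For a population G with mean vector $\mu$ and covariance matrix $\Sigma$, the squared Mahalanobis distance from X to G is:

$d^{2}(X,G)=(X-\mu)^{T}\Sigma^{-1}(X-\mu)$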
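As a check on the formula, the following sketch evaluates $(X-\mu)^{T}\Sigma^{-1}(X-\mu)$ directly with matrix operations and compares it with R's built-in `mahalanobis()`. Estimating $\mu$ and $\Sigma$ from the setosa flowers and using row 51 (a versicolor) as X are arbitrary illustrative choices.

```r
# Squared Mahalanobis distance from the formula, with setosa as population G
G     <- as.matrix(iris[iris$Species == "setosa", 1:4])
mu    <- colMeans(G)              # estimated mean vector mu
Sigma <- cov(G)                   # estimated covariance matrix Sigma
X     <- unlist(iris[51, 1:4])    # one versicolor flower

dev        <- X - mu
d2_formula <- drop(t(dev) %*% solve(Sigma) %*% dev)   # (X - mu)^T Sigma^{-1} (X - mu)
d2_builtin <- mahalanobis(X, mu, Sigma)                # stats::mahalanobis()

c(formula = d2_formula, builtin = d2_builtin)
```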

1.1 Example: classifying the iris data by distance discrimination

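First fix the random seed and draw 100 row indices at random, then split iris into a training set made of those 100 samples and a test set made of the remaining 50.

```r
set.seed(1234)
sa <- sample(1:150, 100)   # row indices of the training set
sa
dtrain <- iris[sa, ]       # training set: 100 samples, including Species
head(dtrain)
dtest <- iris[-sa, 1:4]    # test set: the remaining 50 samples, measurements only
head(dtest)
```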

```r
> d1<-subset(dtrain,Species=="setosa");dim(d1)
[1] 32  5
> d2<-subset(dtrain,Species=="versicolor");dim(d2)
[1] 32  5
> d3<-subset(dtrain,Species=="virginica");dim(d3)
[1] 36  5
```

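Use mahalanobis() to compute the squared Mahalanobis distance from every test sample to each class, using that class's training mean vector and covariance matrix:

```r
ma1 <- mahalanobis(dtest, colMeans(d1[, 1:4]), cov(d1[, 1:4]))  # class 1 (setosa): its mean and covariance
ma2 <- mahalanobis(dtest, colMeans(d2[, 1:4]), cov(d2[, 1:4]))  # class 2 (versicolor)
ma3 <- mahalanobis(dtest, colMeans(d3[, 1:4]), cov(d3[, 1:4]))  # class 3 (virginica)
# ma1, ma2 and ma3 are numeric vectors of length 50, one distance per test sample;
# as.integer() turns the Species factor into the class codes 1, 2, 3
distance <- cbind(ma1, ma2, ma3, species = as.integer(iris[-sa, 5]))
head(distance)
----------Output-------------
         ma1       ma2      ma3 species
1   1.232350 114.25341 172.8199       1
7   5.021288  99.33383 144.6328       1
11  3.516306 125.52716 185.0110       1
```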

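Each sample is assigned to the class with the smallest of the three distances. In the first row ma1 is the smallest, so the sample is assigned to class 1 (setosa); this matches its true species, so it is classified correctly.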
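The "smallest distance wins" rule can be applied to all 50 test rows at once. The sketch below repeats the split above and builds the three distance columns in a loop instead of one by one; `dmat`, `pred` and `actual` are names chosen for this illustration. Because R 3.6.0 changed the default algorithm behind `sample()`, the split, and therefore the accuracy, may differ from the post depending on your R version.

```r
# Minimum-Mahalanobis-distance classification of all 50 test samples at once.
set.seed(1234)
sa     <- sample(1:150, 100)      # training rows, drawn as in the post
dtrain <- iris[sa, ]
dtest  <- iris[-sa, 1:4]

classes <- levels(iris$Species)
# 50 x 3 matrix: column k holds each test sample's squared distance to class k,
# computed with class k's own training mean vector and covariance matrix
dmat <- sapply(classes, function(k) {
  g <- dtrain[dtrain$Species == k, 1:4]
  mahalanobis(dtest, colMeans(g), cov(g))
})

pred   <- factor(classes[apply(dmat, 1, which.min)], levels = classes)
actual <- iris$Species[-sa]
table(predicted = pred, actual = actual)   # confusion matrix
mean(pred == actual)                        # share of test samples classified correctly
```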

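The WMDB package provides wmd(), which carries out the whole distance-discrimination procedure and reports the accuracy. Here it is applied to all 150 iris samples, which serve both as the data the rule is built from and as the data being classified:

```r
library(WMDB)
dta <- iris[, 1:4]
species <- gl(3, 50)   # class labels 1, 2, 3, each repeated 50 times
wmd(dta, species)
------Output------
[1] "num of wrong judgement"
[1]  69  73  78  84 107 130 134 135
[1] "samples divided to"
[1] 3 3 3 3 2 2 2 2
[1] "samples actually belongs to"
[1] 2 2 2 2 3 3 3 3
Levels: 1 2 3
[1] "percent of right judgement"
[1] 0.9466667
```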

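In the output, num of wrong judgement lists the indices of the misclassified samples: 8 samples are misclassified, and the accuracy is 0.9466667 (142/150). Because the same samples are used to build the rule and to evaluate it, this figure is likely optimistic. Adding a weight matrix to wmd() raises the accuracy to 0.98: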

```r
wmd(dta, species, diag(rep(0.25, 4)))   # third argument: a diagonal 4 x 4 weight matrix
---------Output----------
[1] "num of wrong judgement"
[1] 71 73 84
[1] "samples divided to"
[1] 3 3 3
[1] "samples actually belongs to"
[1] 2 2 2
Levels: 1 2 3
[1] "percent of right judgement"
[1] 0.98
```
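Since wmd() classifies the same samples it was fitted on, a leave-one-out evaluation gives a less optimistic picture. The following is a base-R sketch of leave-one-out for the plain (unweighted) minimum-Mahalanobis rule, assuming each class uses its own covariance matrix; it is not WMDB's implementation, so its numbers need not match the output above. `classify_one()` is a hypothetical helper.

```r
# Leave-one-out evaluation of the minimum-Mahalanobis-distance rule in base R
X <- as.matrix(iris[, 1:4])
y <- iris$Species
classes <- levels(y)

# Hypothetical helper: classify sample i using all other samples
classify_one <- function(i) {
  d <- sapply(classes, function(k) {
    g <- X[-i, , drop = FALSE][y[-i] == k, , drop = FALSE]  # class k, sample i left out
    mahalanobis(X[i, ], colMeans(g), cov(g))
  })
  names(which.min(d))                    # class with the smallest distance
}

loo_pred <- factor(sapply(seq_len(nrow(X)), classify_one), levels = classes)
mean(loo_pred == y)      # leave-one-out accuracy
which(loo_pred != y)     # indices of misclassified samples
```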

# 2. Bayes discrimination

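There are two variants: the standard Bayes discriminant rule, and the Bayes rule that takes the loss (cost) of misclassification into account.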
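Both variants can be sketched in base R, assuming each class follows a multivariate normal distribution with its own mean and covariance matrix. The standard rule picks the class with the largest posterior probability $p_k f_k(x)$ (normalised); the cost-sensitive rule picks the decision with the smallest expected loss $\sum_j P(j \mid x)\,L(j,k)$, which reduces to the standard rule under 0-1 loss. The helpers `log_dmvnorm()`, `fit_bayes()`, `bayes_posterior()` and `bayes_classify()` and the cost value 5 are hypothetical; this is not the WMDB dbayes() implementation.

```r
# Bayes discrimination under a multivariate normal model (illustrative helpers).
log_dmvnorm <- function(x, mu, S) {
  # log density of N(mu, S) at each row of x
  p <- length(mu)
  -0.5 * (p * log(2 * pi) + as.numeric(determinant(S, logarithm = TRUE)$modulus) +
          mahalanobis(x, mu, S))
}

# Mean vector and covariance matrix of each class
fit_bayes <- function(X, y) {
  lapply(split(as.data.frame(X), y), function(g) list(mu = colMeans(g), S = cov(g)))
}

# Posterior P(class k | x), proportional to prior_k * f_k(x)
bayes_posterior <- function(fit, X, prior) {
  logf <- sapply(names(fit), function(k) log_dmvnorm(X, fit[[k]]$mu, fit[[k]]$S))
  logp <- sweep(logf, 2, log(prior), "+")
  logp <- logp - apply(logp, 1, max)        # subtract row maximum for numerical stability
  pr <- exp(logp)
  pr / rowSums(pr)
}

# L[j, k] = loss of assigning to class k when the true class is j;
# choose the class with the smallest expected loss
bayes_classify <- function(post, L) {
  risk <- post %*% L
  colnames(L)[apply(risk, 1, which.min)]
}

X <- as.matrix(iris[, 1:4]); y <- iris$Species
fit  <- fit_bayes(X, y)
post <- bayes_posterior(fit, X, prior = rep(1/3, 3))

L01 <- 1 - diag(3); dimnames(L01) <- list(levels(y), levels(y))
pred_std <- bayes_classify(post, L01)        # 0-1 loss: the standard Bayes rule
mean(pred_std == as.character(y))            # accuracy on the training samples

# Hypothetical costs: calling a virginica "versicolor" is 5 times as costly
L_cost <- L01; L_cost["virginica", "versicolor"] <- 5
pred_cost <- bayes_classify(post, L_cost)
table(standard = pred_std, with_cost = pred_cost)
```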

Example: the iris data

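```r
library(WMDB)
dta <- iris[, 1:4]
species <- gl(3, 50)
dbayes(dta, species)
------Output-------
[1] "num of wrong judgement"
[1] 69 71 73 78 84
[1] "samples divided to"
[1] 3 3 3 3 3
[1] "samples actually belongs to"
[1] 2 2 2 2 2
Levels: 1 2 3
[1] "percent of right judgement"
[1] 0.9666667
```

Here 5 of the 150 samples are misclassified, an accuracy of 0.9666667 (145/150), again measured on the same samples the rule was built from.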

# 3. Fisher discrimination

Example: the iris data

```r
library(MASS)
# iris3 holds the same measurements as a 50 x 4 x 3 array, one slice per species
diris <- data.frame(rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3]),
                    species = rep(c(1, 2, 3), rep(50, 3)))
head(diris)
----------Output---------
     Sepal.L. Sepal.W. Petal.L. Petal.W. species
1      5.1      3.5      1.4      0.2       1
2      4.9      3.0      1.4      0.2       1
3      4.7      3.2      1.3      0.2       1
```

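Next, split the data into a training set and a test set. Here 75 randomly chosen samples (half of the data) form the training set and the remaining 75 form the test set. lda() does not require the two sets to have the same size; the even split is simply a choice.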

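```r
sa <- sample(1:150, 75)    # no seed is set, so your split and the counts below will differ
sa
table(diris$species[sa])   # number of training samples in each class
--------Output--------
 1  2  3 
28 22 25
```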

Fisher discriminant analysis:

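```r
# Fit on the training rows only, with equal prior probabilities for the three classes
z <- lda(species ~ ., diris, prior = c(1, 1, 1) / 3, subset = sa)
z
-------Output-------
Prior probabilities of groups:
        1         2         3 
0.3333333 0.3333333 0.3333333 
Group means:
  Sepal.L. Sepal.W. Petal.L.  Petal.W.
1 5.053571 3.478571    1.475 0.2321429
2 5.845455 2.781818    4.250 1.3363636
3 6.452000 2.876000    5.516 1.9480000
Coefficients of linear discriminants:
                LD1        LD2
Sepal.L.  0.7514301  0.3194847
Sepal.W.  1.7350213  2.4273666
Petal.L. -1.9374791 -0.3495995
Petal.W. -2.7515642  1.5539094
Proportion of trace:
   LD1    LD2 
0.9945 0.0055
```

In the output, Prior probabilities of groups are the prior class probabilities, Group means are the sample means of each class, Coefficients of linear discriminants are the coefficients of the discriminant functions LD1 and LD2, and Proportion of trace is the share of the between-class separation accounted for by each discriminant.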
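The "Proportion of trace" line can be recomputed from the `svd` component of an lda fit, which stores the ratio of between-group to within-group standard deviations on each discriminant. This sketch fits all 150 samples of the built-in iris data frame rather than the 75-sample training set above, so its numbers differ from the output shown.

```r
library(MASS)
# "Proportion of trace" recomputed from the singular values stored in an lda fit
fit <- lda(Species ~ ., data = iris)
fit$svd                                   # between/within-group SD ratio per discriminant
prop_trace <- fit$svd^2 / sum(fit$svd^2)  # the quantity printed as "Proportion of trace"
round(prop_trace, 4)
```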

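The classification results are obtained with predict(). For an lda fit it returns a list with three components: class (the predicted classes), posterior (the posterior probability of each class) and x (the scores on the linear discriminants).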

```r
pre <- predict(z, diris[-sa, ])   # classify the 75 test samples
pre$class
-------Output-------
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
[29] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3
[57] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Levels: 1 2 3
```
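The other two components of the list returned by predict() can be inspected in the same way. The sketch below uses the iris data frame and an arbitrary seed (both assumptions, chosen only so it runs on its own) and checks that the predicted class is always the one with the largest posterior probability.

```r
library(MASS)
set.seed(1)                          # arbitrary seed, only to make the sketch reproducible
train <- sample(1:150, 75)
fit <- lda(Species ~ ., data = iris, prior = c(1, 1, 1) / 3, subset = train)
pre <- predict(fit, iris[-train, ])

names(pre)                           # the three components of the result
head(round(pre$posterior, 3))        # posterior probability of each species
head(pre$x)                          # scores on LD1 and LD2

# The predicted class is the column with the largest posterior probability
argmax <- colnames(pre$posterior)[apply(pre$posterior, 1, which.max)]
all(as.character(pre$class) == argmax)
```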

Compute the accuracy of the Fisher discriminant on the test set:

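```r
pred_class <- pre$class                 # avoid naming a variable `class`, which masks base::class()
diris$species[-sa]                      # true classes of the test samples
sum(pred_class == diris$species[-sa])   # number of correctly classified test samples
---------Output----------
[1] 75
```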

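All 75 test samples are classified correctly, so the accuracy on this test set is 100%. This holds for this particular random split; a different split may misclassify a few samples.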
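A single random split gives a noisy estimate of accuracy. As a sketch (using the iris data frame instead of diris), lda() can perform leave-one-out cross-validation directly through its `CV = TRUE` argument: each flower is classified by a model fitted to the other 149, and the returned list contains the cross-validated classes and posteriors.

```r
library(MASS)
# Leave-one-out cross-validation: each flower is classified by an LDA model
# fitted to the other 149, so every sample serves as test data exactly once
cv <- lda(Species ~ ., data = iris, prior = c(1, 1, 1) / 3, CV = TRUE)
table(predicted = cv$class, actual = iris$Species)   # confusion matrix
mean(cv$class == iris$Species)                        # leave-one-out accuracy
```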
