Identifying Poisonous Mushrooms with Rule Learning Algorithms

小墨&晓末

Last modified: 2024-02-20 13:15:16

Tags: R, algorithms, machine learning

First published: 2024-02-19 11:19:11

Article link: https://blog.csdn.net/jd1813346972/article/details/136165510

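CSDN 小墨&晓末: https://blog.csdn.net/jd1813346972

About the author: first-year master's student | Statistics | practical tutorials
Proficient in Python, Matlab, R and other mainstream programming tools
Winner of more than ten national-level competition awards; has taken part in industry-funded research projects at the 100,000 and 400,000 funding levels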

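Contents

1 Purpose
2 Data source
3 Case study
- 3.1 Exploring the data
  - 3.1.1 Reading the data and checking data types
  - 3.1.2 Inspecting the mushroom data
- 3.2 Building and improving the model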

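1 Purpose

Apply rule learning algorithms to identify poisonous mushrooms.

2 Data source

The data used in this demonstration come from the Center for Machine Learning and Intelligent Systems at the University of California, Irvine (home of the UCI Machine Learning Repository).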

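3 Case study

3.1 Exploring the data

3.1.1 Reading the data and checking data types

Code:

# Read the data, converting character columns to factors
data1 <- read.csv("G:\\机器学习\\第三次作业\\mushrooms.csv", stringsAsFactors = TRUE)
str(data1$veil_type)  # check the data type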

Output:

##  Factor w/ 1 level "partial": 1 1 1 1 1 1 1 1 1 1 ...

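The output shows that the veil_type column is a factor with a single level, "partial". Because every sample takes the same value, the column carries no information for telling the classes apart.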
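Rather than inspecting columns one at a time, single-valued columns can be found programmatically. The following is a minimal base-R sketch on a small made-up data frame; the name `toy` and its values are illustrative assumptions, not rows from the mushroom file.

```r
# Toy data frame standing in for the mushroom data (values are illustrative only)
toy <- data.frame(
  veil_type = factor(c("partial", "partial", "partial", "partial")),
  odor      = factor(c("foul", "none", "almond", "none")),
  type      = factor(c("poisonous", "edible", "edible", "edible"))
)

# Count the distinct observed values in each column
n_values <- vapply(toy, function(col) length(unique(col)), integer(1))

# Columns with a single value cannot help separate the classes
constant_cols <- names(n_values)[n_values == 1]
print(constant_cols)

# Drop all such columns in one step
toy_clean <- toy[, setdiff(names(toy), constant_cols), drop = FALSE]
print(names(toy_clean))
```

The same idea should carry over to data1, where it would be expected to flag veil_type.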

3.1.2 Inspecting the mushroom data

Code:

data1$veil_type <- NULL  # drop the veil_type variable
table(data1$type)        # check how many mushrooms are poisonous
prop.table(table(data1$type))  # class proportions

Output:

> table(data1$type)     # check how many mushrooms are poisonous
##    edible poisonous 
##      4208      3916
> prop.table(table(data1$type))
##    edible poisonous 
## 0.5179714 0.4820286

The results show that about 52% of the samples (4,208 mushrooms) are edible and about 48% (3,916) are poisonous.
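The class shares also give a useful reference point: a classifier that always predicts the larger class would be right about 52% of the time. A small sketch of that arithmetic, using only the counts reported above (the variable names are chosen here for illustration):

```r
# Class counts copied from the table() output
counts <- c(edible = 4208, poisonous = 3916)

# Share of each class in the data
shares <- counts / sum(counts)
print(round(shares, 4))

# Accuracy of always predicting the most frequent class
majority_class    <- names(which.max(shares))
baseline_accuracy <- max(shares)
print(majority_class)
print(round(baseline_accuracy, 4))
```

Any learned rule set should be judged against this baseline rather than against 0%.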

3.2 Building and improving the model

3.2.1 Training a model on the data

Code:

library(RWeka)  # load the package
data1_1R <- OneR(type ~ ., data = data1)  # 1R rule learner
data1_1R

Output:

## odor:
##  almond  -> edible
##  anise   -> edible
##  creosote    -> poisonous
##  fishy   -> poisonous
##  foul    -> poisonous
##  musty   -> poisonous
##  none    -> edible
##  pungent -> poisonous
##  spicy   -> poisonous
## (8004/8124 instances correct)

The 1R (single-rule) algorithm selects odor as the feature for its rule. This one rule correctly predicts the edibility of 8,004 of the 8,124 mushroom samples.
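To make the idea behind 1R concrete, here is a simplified base-R sketch: for each feature, predict the majority class for every feature value, count the training errors, and keep the feature with the fewest errors. This is an illustration of the principle, not RWeka's implementation; the helper `one_rule` and the toy data are assumptions made for this example.

```r
# Toy data (hypothetical values, not taken from the mushroom file)
toy <- data.frame(
  odor      = c("foul", "foul", "none", "none", "almond", "none"),
  cap_color = c("brown", "white", "brown", "white", "brown", "brown"),
  type      = c("poisonous", "poisonous", "edible", "edible", "edible", "poisonous"),
  stringsAsFactors = TRUE
)

# For one feature: predict the majority class for each feature value
# and count how many training rows that rule gets wrong
one_rule <- function(feature, target) {
  counts   <- table(feature, target)                  # value x class counts
  majority <- colnames(counts)[apply(counts, 1, which.max)]
  names(majority) <- rownames(counts)
  errors   <- sum(counts) - sum(apply(counts, 1, max))
  list(rule = majority, errors = errors)
}

features <- setdiff(names(toy), "type")
rules    <- lapply(toy[features], one_rule, target = toy$type)
errors   <- vapply(rules, function(r) r$errors, numeric(1))

best <- names(which.min(errors))  # feature whose rule makes the fewest errors
print(errors)
print(best)
print(rules[[best]]$rule)
```

On this toy data the odor rule makes one error and the cap_color rule makes three, so odor is chosen. Ties within a feature value are broken here by taking the first class, which is a simplification.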

3.2.2 Evaluating model performance

Code:

summary(data1_1R)  # model performance

Output:

## 
## === Summary ===
## 
## Correctly Classified Instances        8004               98.5229 %
## Incorrectly Classified Instances       120                1.4771 %
## Kappa statistic                          0.9704
## Mean absolute error                      0.0148
## Root mean squared error                  0.1215
## Relative absolute error                  2.958  %
## Root relative squared error             24.323  %
## Total Number of Instances             8124     
## 
## === Confusion Matrix ===
## 
##     a    b   <-- classified as
##  4208    0 |    a = edible
##   120 3796 |    b = poisonous

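The results show that 120 poisonous mushrooms were misclassified as edible, all 4,208 edible mushrooms were correctly classified as edible, and 3,796 poisonous mushrooms were correctly classified as poisonous. The overall accuracy is about 98.5% (8,004 of 8,124).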
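The accuracy and Kappa statistic can be recomputed by hand from the confusion matrix, which helps when checking what the summary reports. A minimal sketch, with the matrix copied from the output above (rows are actual classes, columns are predicted classes); the variable names are chosen here for illustration:

```r
# Confusion matrix from the summary() output: rows = actual, columns = predicted
cm <- matrix(c(4208,    0,
                120, 3796),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("edible", "poisonous"),
                             predicted = c("edible", "poisonous")))

n        <- sum(cm)
accuracy <- sum(diag(cm)) / n                      # observed agreement
expected <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
cohen_kappa <- (accuracy - expected) / (1 - expected)

# Share of poisonous mushrooms wrongly labelled edible
false_edible_rate <- cm["poisonous", "edible"] / sum(cm["poisonous", ])

print(round(accuracy, 6))
print(round(cohen_kappa, 4))
print(round(false_edible_rate, 4))
```

The last figure matters most in this application: all 120 errors are poisonous mushrooms labelled edible, which is the more dangerous kind of mistake.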

3.2.3 Improving model performance

Use the RIPPER algorithm to improve the model.
Code:

data1_JRip <- JRip(type ~ ., data = data1)
data1_JRip           # improved rule set
summary(data1_JRip)  # model performance

Output:

> data1_JRip           # improved rule set
## JRIP rules:
## ===========
## 
## (odor = foul) => type=poisonous (2160.0/0.0)
## (gill_size = narrow) and (gill_color = buff) => type=poisonous (1152.0/0.0)
## (gill_size = narrow) and (odor = pungent) => type=poisonous (256.0/0.0)
## (odor = creosote) => type=poisonous (192.0/0.0)
## (spore_print_color = green) => type=poisonous (72.0/0.0)
## (stalk_surface_below_ring = scaly) and (stalk_surface_above_ring = silky) => type=poisonous (68.0/0.0)
## (habitat = leaves) and (cap_color = white) => type=poisonous (8.0/0.0)
## (stalk_color_above_ring = yellow) => type=poisonous (8.0/0.0)
##  => type=edible (4208.0/0.0)
## 
## Number of Rules : 9

> summary(data1_JRip)  # model performance
## === Summary ===
## 
## Correctly Classified Instances        8124              100      %
## Incorrectly Classified Instances         0                0      %
## Kappa statistic                          1     
## Mean absolute error                      0     
## Root mean squared error                  0     
## Relative absolute error                  0      %
## Root relative squared error              0      %
## Total Number of Instances             8124     
## 
## === Confusion Matrix ===
## 
##     a    b   <-- classified as
##  4208    0 |    a = edible
##     0 3916 |    b = poisonous

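Using the RIPPER rule learning algorithm improves performance: the classifier builds 9 rules (8 for poisonous mushrooms plus a default rule for edible ones) and classifies all 8,124 samples correctly, an accuracy of 100% on the data it was trained on.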
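The JRip rules form an ordered list: each sample is checked against the rules from top to bottom, the first matching rule decides the class, and the final default rule catches everything else. The sketch below transcribes the printed rules into plain R to show that logic. The function `classify_mushroom` and the four toy rows are assumptions for illustration, not part of RWeka or of the data set.

```r
# Ordered rules transcribed from the JRip output; the first match decides
classify_mushroom <- function(m) {
  if (m$odor == "foul") return("poisonous")
  if (m$gill_size == "narrow" && m$gill_color == "buff") return("poisonous")
  if (m$gill_size == "narrow" && m$odor == "pungent") return("poisonous")
  if (m$odor == "creosote") return("poisonous")
  if (m$spore_print_color == "green") return("poisonous")
  if (m$stalk_surface_below_ring == "scaly" &&
      m$stalk_surface_above_ring == "silky") return("poisonous")
  if (m$habitat == "leaves" && m$cap_color == "white") return("poisonous")
  if (m$stalk_color_above_ring == "yellow") return("poisonous")
  "edible"  # default rule when nothing above matches
}

# Hypothetical samples (made-up attribute combinations)
toy <- data.frame(
  odor                     = c("foul", "none", "none", "none"),
  gill_size                = c("broad", "narrow", "broad", "broad"),
  gill_color               = c("brown", "buff", "brown", "brown"),
  spore_print_color        = c("black", "black", "green", "black"),
  stalk_surface_below_ring = "smooth",
  stalk_surface_above_ring = "smooth",
  habitat                  = "woods",
  cap_color                = "brown",
  stalk_color_above_ring   = "white",
  stringsAsFactors = FALSE
)

# Apply the rule list to each row in turn
predicted <- vapply(seq_len(nrow(toy)),
                    function(i) classify_mushroom(toy[i, ]),
                    character(1))
print(predicted)
```

The first three toy rows are caught by the first, second and fifth rules respectively, and the fourth falls through to the default. With a fitted model, predict() on the JRip object would normally be used instead of hand-written rules.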