Machine Learning, Supervised Learning: Classification

Latest recommended article published 2024-07-14 13:30:06

hahahahhavvincy

Category column: Machine Learning Classification. Article tags: machine learning, KNN, decision tree, naive Bayes

Article link: https://blog.csdn.net/weixin_44271575/article/details/98057478

KNN (K-Nearest Neighbors)

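Compute the distance between the point A to be classified and every point in the existing dataset, find the K nearest points, and assign A to the class that occurs most often among those K neighbors.
K is usually kept small, and the best value of K is chosen by cross-validation.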
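
The two steps above can be sketched in plain Python. This is only a minimal illustration, not the scikit-learn implementation: the Euclidean distance, the toy points, the activity labels, and the helper names `knn_predict` and `leave_one_out_accuracy` are all assumptions made for the example. Leave-one-out is used here as the simplest form of cross-validation.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training point (Euclidean, an assumption).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(points, labels, k):
    """Estimate accuracy for a given k by holding out one point at a time."""
    correct = 0
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest_points, rest_labels, points[i], k) == labels[i]:
            correct += 1
    return correct / len(points)

# Toy data (hypothetical): two well-separated clusters.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["walk", "walk", "walk", "run", "run", "run"]

print(knn_predict(points, labels, (1.1, 1.0), k=3))

# Pick the candidate k with the best leave-one-out accuracy.
best_k = max([1, 3, 5], key=lambda k: leave_one_out_accuracy(points, labels, k))
print(best_k)
```

On this tiny dataset K=5 performs badly because each held-out point is outvoted by the other cluster, which is one reason small values of K are usually preferred.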

Decision Tree

Decide the final class by asking a sequence of questions about the attributes of the point to be classified.
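
To make the idea of "a sequence of questions" concrete, here is a small hand-written tree and a function that walks it. The attributes (`heart_rate`, `acceleration`), the thresholds, and the class names are hypothetical and chosen only for illustration; in practice a library such as scikit-learn learns the questions from the training data.

```python
# A hand-written tree (hypothetical attributes and thresholds, for illustration only).
# Internal nodes ask one question about an attribute; leaves hold a class label.
tree = {
    "attribute": "heart_rate",
    "threshold": 100,
    "below": {"label": "resting"},
    "at_or_above": {
        "attribute": "acceleration",
        "threshold": 2.0,
        "below": {"label": "walking"},
        "at_or_above": {"label": "running"},
    },
}

def classify(node, sample):
    """Walk from the root to a leaf, answering one attribute question per level."""
    while "label" not in node:
        if sample[node["attribute"]] < node["threshold"]:
            node = node["below"]
        else:
            node = node["at_or_above"]
    return node["label"]

print(classify(tree, {"heart_rate": 80, "acceleration": 0.1}))
print(classify(tree, {"heart_rate": 130, "acceleration": 3.5}))
```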

Naive Bayes (NBayes)

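Assume that the features are independent of one another given the class, use the prior probability of each class together with the per-feature likelihoods to compute the posterior probability of the object, and assign the object to the class with the largest posterior.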
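
A minimal sketch of this rule for continuous features, assuming each feature follows a Gaussian distribution within each class (the same assumption the `GaussianNB` classifier used later makes). The helper names, the toy data, and the small variance-smoothing constant are assumptions for the example, not part of the original post. Log probabilities are used so that the product over features becomes a sum.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate the prior of each class and the mean/variance of each feature."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(samples)
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive (an assumption).
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model

def log_gaussian(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(model, sample):
    """Return the class with the largest log posterior (up to a shared constant)."""
    scores = {}
    for label, (prior, stats) in model.items():
        # Independence assumption: the joint likelihood is a product over
        # features, so its logarithm is a sum of per-feature terms.
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, mean, var) for x, (mean, var) in zip(sample, stats)
        )
    return max(scores, key=scores.get)

# Toy data (hypothetical): two features per sample, two classes.
samples = [(1.0, 2.0), (1.2, 1.9), (0.8, 2.1), (4.0, 6.0), (4.2, 5.8), (3.9, 6.1)]
labels = ["sit", "sit", "sit", "run", "run", "run"]

model = fit_gaussian_nb(samples, labels)
print(predict(model, (1.1, 2.0)))
print(predict(model, (4.1, 6.0)))
```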

Example: Rating and Comparing Human Motion State Information

Feature file format at a glance: [figure]
Label file format at a glance:

import pandas as pd
import numpy as np
# Imputer for filling in missing values
from sklearn.impute import SimpleImputer
# Utility for shuffling the training set
from sklearn.utils import shuffle
# Evaluation of the prediction results
from sklearn.metrics import classification_report

# The three classifiers
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# Read every feature file and label file in the given lists and return the merged arrays
def load_datasets(feature_paths, label_paths):
    # Feature array: one column per feature dimension (41)
    feature = np.empty((0, 41))
    # Label array: one column, matching the label dimension (1)
    label = np.empty((0, 1))
    for file in feature_paths:
        # Comma-separated; '?' marks a missing value; the file has no header row
        df = pd.read_table(file, delimiter=',', na_values='?', header=None)
        # Fill missing values with the column mean
        imp = SimpleImputer(missing_values=np.nan, strategy='mean')
        # Fit the imputer on this file
        imp.fit(df)
        # Apply the imputation
        df = imp.transform(df)
        # Append the newly read rows to the feature array
        feature = np.concatenate((feature, df))

    for file in label_paths:
        df = pd.read_table(file, header=None)
        label = np.concatenate((label, df))
    # Flatten the label array to one dimension
    label = np.ravel(label)
    return feature, label

if __name__ == '__main__':
    # File paths
    featurePaths = ['A/A.feature','B/B.feature','C/C.feature','D/D.feature','E/E.feature']
    labelPaths = ['A/A.label','B/B.label','C/C.label','D/D.label','E/E.label']
    # The first four files form the training set
    x_train,y_train = load_datasets(featurePaths[:4],labelPaths[:4])
    # The last file forms the test set
    x_test,y_test = load_datasets(featurePaths[4:],labelPaths[4:])
    # Use all of the training data and shuffle it before fitting the classifiers
    # (train_test_split with test_size=0.0 raises an error in current scikit-learn)
    x_train, y_train = shuffle(x_train, y_train)

    # Create each classifier, train it, and predict on the test set
    print('Start training knn')
    knn = KNeighborsClassifier().fit(x_train, y_train)
    print('Training done')
    answer_knn = knn.predict(x_test)
    print('Prediction done')

    print('Start training DT')
    dt = DecisionTreeClassifier().fit(x_train, y_train)
    print('Training done')
    answer_dt = dt.predict(x_test)
    print('Prediction done')

    print('Start training Bayes')
    gnb = GaussianNB().fit(x_train, y_train)
    print('Training done')
    answer_gnb = gnb.predict(x_test)
    print('Prediction done')

    # Evaluate the classification results
    print('\n\nThe classification report for knn:')
    print(classification_report(y_test, answer_knn))
    print('\n\nThe classification report for DT:')
    print(classification_report(y_test, answer_dt))
    print('\n\nThe classification report for Bayes:')
    print(classification_report(y_test, answer_gnb))

# Output of the run
/*
Start training knn
Training done
Prediction done
Start training DT
Training done
Prediction done
Start training Bayes
Training done
Prediction done


The classification report for knn:
             precision    recall  f1-score   support

        0.0       0.56      0.60      0.58    102341
        1.0       0.92      0.93      0.93     23699
        2.0       0.94      0.78      0.85     26864
        3.0       0.83      0.82      0.82     22132
        4.0       0.85      0.88      0.87     32033
        5.0       0.39      0.21      0.27     24646
        6.0       0.77      0.89      0.82     24577
        7.0       0.80      0.95      0.87     26271
       12.0       0.32      0.33      0.33     14281
       13.0       0.16      0.22      0.19     12727
       16.0       0.90      0.67      0.77     24445
       17.0       0.89      0.96      0.92     33034
       24.0       0.00      0.00      0.00      7733

avg / total       0.69      0.69      0.68    374783



The classification report for DT:
             precision    recall  f1-score   support

        0.0       0.50      0.73      0.60    102341
        1.0       0.66      0.96      0.78     23699
        2.0       0.83      0.86      0.84     26864
        3.0       0.93      0.73      0.82     22132
        4.0       0.62      0.84      0.71     32033
        5.0       0.70      0.51      0.59     24646
        6.0       0.06      0.01      0.02     24577
        7.0       0.33      0.15      0.20     26271
       12.0       0.63      0.62      0.63     14281
       13.0       0.66      0.55      0.60     12727
       16.0       0.56      0.07      0.13     24445
       17.0       0.85      0.85      0.85     33034
       24.0       0.37      0.32      0.34      7733

avg / total       0.58      0.61      0.57    374783



The classification report for Bayes:
             precision    recall  f1-score   support

        0.0       0.62      0.81      0.70    102341
        1.0       0.97      0.91      0.94     23699
        2.0       1.00      0.65      0.79     26864
        3.0       0.60      0.66      0.63     22132
        4.0       0.91      0.77      0.83     32033
        5.0       1.00      0.00      0.00     24646
        6.0       0.87      0.72      0.79     24577
        7.0       0.31      0.47      0.37     26271
       12.0       0.52      0.59      0.55     14281
       13.0       0.61      0.50      0.55     12727
       16.0       0.89      0.72      0.79     24445
       17.0       0.75      0.91      0.82     33034
       24.0       0.59      0.24      0.34      7733

avg / total       0.74      0.68      0.67    374783

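Conclusion: in this example, KNN and naive Bayes classify better than the decision tree (average f1-score of 0.68 and 0.67 versus 0.57).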
*/
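
For readers unfamiliar with the columns in the reports above, the following sketch shows how precision, recall, f1-score, and support are computed for one class, and how a support-weighted average (which, to my knowledge, is what the `avg / total` row reports in the scikit-learn versions that print it) combines them. The function names and the toy labels are assumptions for the example; this is not scikit-learn's own code.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, F1 and support for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Support is the number of true samples of this class.
    return precision, recall, f1, tp + fn

def weighted_average(y_true, y_pred):
    """Average the per-class metrics, weighting each class by its support."""
    total = len(y_true)
    sums = [0.0, 0.0, 0.0]
    for label in sorted(set(y_true)):
        p, r, f, support = per_class_metrics(y_true, y_pred, label)
        for i, value in enumerate((p, r, f)):
            sums[i] += value * support / total
    return tuple(sums)

# Toy labels (hypothetical): one sample of class 0 is misclassified as class 1.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]

print(per_class_metrics(y_true, y_pred, 0))
print(weighted_average(y_true, y_pred))
```

A row such as precision 1.00 with recall 0.00 in the reports above means the classifier almost never predicts that class: the few predictions it makes are correct, but nearly all true samples of the class are missed.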