A Simple Introduction to Using sklearn


路和远方


Article link: https://blog.csdn.net/shuzhuchengfu/article/details/115179621


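This post shows how to use Python's sklearn library for k-nearest neighbors (KNN) classification. It first converts image data into vectors, then loads the training and test data from local files. The data is split with train_test_split, a KNeighborsClassifier is trained, and the accuracy on the test data is computed. Finally, the model is saved and reloaded to make a prediction.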



import os

import joblib
import numpy as np
from sklearn import neighbors
from sklearn.model_selection import train_test_split


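# Convert a 32x32 text image (one digit character per pixel) into a 1024-element vector
def image2vector(filename):
    returnVect = np.zeros(1024)
    # Use a context manager so the file is closed after reading
    with open(filename) as fr:
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                returnVect[i * 32 + j] = int(lineStr[j])
    return returnVect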


# Load every file in a directory; the label is the part of the file name before "_"
def getSourceDatas(filepath):
    datas = []
    labels = []
    for files in os.listdir(filepath):
        datas.append(image2vector(os.path.join(filepath, files)))
        labels.append(files.strip().split("_")[0])
    return datas, labels


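if __name__ == "__main__":
    # Load the data
    dir_path = os.getcwd()
    train_path = os.path.join(dir_path, "trainingDigits")
    test_path = os.path.join(dir_path, "testDigits")
    train_datas, train_labels = getSourceDatas(train_path)
    test_datas, test_labels = getSourceDatas(test_path)
    # Hold out 30% of the training data as a validation split
    x_train, x_test, y_train, y_test = train_test_split(train_datas, train_labels, test_size=0.3)
    # Train on the 70% split (the original fitted on all training data and never used the split)
    clf = neighbors.KNeighborsClassifier()
    clf.fit(x_train, y_train)
    # Accuracy on the held-out validation split
    print("Validation accuracy: " + str(clf.score(x_test, y_test)))
    # Accuracy on the test data
    score = clf.score(test_datas, test_labels)
    print("Accuracy: " + str(score))
    # Save the model
    joblib.dump(clf, 'clf.pkl')
    # Load the model
    clf3 = joblib.load('clf.pkl')
    print("Prediction:")
    print(clf3.predict(test_datas[0:1]))
    print("Actual label:")
    print(test_labels[0:1])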
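
To make the input format concrete, the sketch below writes one synthetic digit file and parses it the same way the code above does. It assumes, based only on how the files are read above, that each file holds 32 lines of 32 digit characters and that the label is the part of the file name before the underscore. The file name `3_0.txt`, the pixel pattern, and the helper names `image_to_vector` and `label_from_name` are made up for illustration.

```python
import os
import tempfile

import numpy as np


def image_to_vector(path):
    """Read a 32x32 text image into a flat 1024-element vector."""
    with open(path) as f:
        # Take the first 32 characters of each of the first 32 lines.
        rows = [[int(ch) for ch in f.readline()[:32]] for _ in range(32)]
    return np.array(rows, dtype=float).reshape(1024)


def label_from_name(file_name):
    """The label is whatever precedes the first underscore."""
    return file_name.strip().split("_")[0]


# Hypothetical sample: a 32x32 image whose top half is 1s and bottom half is 0s.
rows = ["1" * 32] * 16 + ["0" * 32] * 16

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = "3_0.txt"  # made-up name following a "<label>_<index>" pattern
    path = os.path.join(tmp_dir, file_name)
    with open(path, "w") as f:
        f.write("\n".join(rows) + "\n")

    vector = image_to_vector(path)
    label = label_from_name(file_name)

print(vector.shape)        # (1024,)
print(int(vector.sum()))   # 512
print(label)               # 3
```

Note that the label stays a string ("3", not 3), which is fine for KNeighborsClassifier as long as training and test labels use the same type.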

Test and training data, plus the source code
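
For readers who do not have the data files at hand, the same split, train, score, save, and reload steps can be sketched with a dataset that ships with scikit-learn. This is not the post's data: it swaps in `load_digits` (8x8 images, 64 features per sample) for the local 32x32 text files. The fixed `random_state=0` and the temporary directory for the model file are also assumptions added to keep the example repeatable and self-cleaning.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the bundled digits dataset stands in for trainingDigits/testDigits.
digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0
)

# Train a KNN classifier with the default settings (n_neighbors=5).
clf = KNeighborsClassifier()
clf.fit(x_train, y_train)

# Mean accuracy on the held-out 30%.
score = clf.score(x_test, y_test)
print("Accuracy:", score)

# Save the fitted model, then load it back from disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "clf.pkl")
    joblib.dump(clf, model_path)
    restored = joblib.load(model_path)

# The reloaded model should predict exactly like the original one.
prediction = restored.predict(x_test[0:1])
print("Prediction:", prediction)
print("Actual label:", y_test[0:1])
```

Because KNN stores the training samples themselves, the saved file grows with the size of the training set.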