Machine Learning -- KNN Algorithm (Ⅲ Tumor Prediction Case -- Wrapped in Functions)

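This section takes the Jupyter code from the previous section, Machine Learning -- KNN Algorithm (Tumor Prediction Case), and refactors it into functions.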

(1) First, import the required modules and packages:

import numpy as np
from collections import Counter

(2) Function that loads the dataset:

def loadData():
    """
    Load the dataset (built by hand here; later it will be read from a file)
    :return: X_train, a numpy array of training features, and y_train, a numpy array of training labels
    """
    raw_data_X = [[3.3935, 2.3312],
                  [3.1101, 1.7815],
                  [1.3438, 3.3684],
                  [3.5823, 4.6792],
                  [2.2804, 2.8670],
                  [7.4234, 4.6965],
                  [5.7451, 3.5340],
                  [9.1722, 2.5111],
                  [7.7928, 3.4241],
                  [7.9398, 0.7916]]
    raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    X_train = np.array(raw_data_X)
    y_train = np.array(raw_data_y)
    return X_train, y_train
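
The docstring notes that the dataset will later be read from a file. The following is a minimal sketch of what that could look like, assuming a CSV layout of two feature columns followed by a label column. The function name `load_data_from_csv` and the file layout are illustrative assumptions, not the loader used later in this series.

```python
import io

import numpy as np


def load_data_from_csv(source):
    """Hypothetical loader: each row is `feature_1,feature_2,label`."""
    # ndmin=2 keeps the result two-dimensional even for a single-row file
    data = np.loadtxt(source, delimiter=",", ndmin=2)
    X_train = data[:, :-1]             # every column except the last is a feature
    y_train = data[:, -1].astype(int)  # the last column is the class label
    return X_train, y_train


# An in-memory stand-in for a file, holding the first and last rows of the data above
sample = io.StringIO("3.3935,2.3312,0\n7.9398,0.7916,1\n")
X_train, y_train = load_data_from_csv(sample)
print(X_train.shape, y_train)
```

`np.loadtxt` accepts a file path as well as a file-like object, so the same function should work with a real CSV file on disk.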

(3) Function implementing kNN classification:

def kNN_classify(k, X_train, y_train, x):
    """
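    kNN classification
    :param k: number of nearest neighbours that vote
    :param X_train: feature values of the training set
    :param y_train: target values (labels) of the training set
    :param x: the single sample to predict, as a 1-D feature vector
    :return: the predicted label
    """
    # Use assertions to check that the inputs are valid
    assert 1 <= k <= X_train.shape[0], "k must be between 1 and the number of training samples"
    assert X_train.shape[0] == y_train.shape[0], "X_train and y_train must have the same number of samples"
    assert X_train.shape[1] == x.shape[0], "x must have the same number of features as X_train"

    # Compute the Euclidean distance from every training sample to x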
    distances = []
    for x_train in X_train:
        d = (np.sum((x_train - x) ** 2)) ** 0.5
        distances.append(d)

    sorted_index = np.argsort(distances)

    top_K = [y_train[i] for i in sorted_index[:k]]

    # most_common(1) returns a list such as [(1, 5)]: label 1 with 5 votes
    return Counter(top_K).most_common(1)[0][0]
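
The distance loop above can also be written as a single NumPy expression. The sketch below is one possible variant, not the version used in this article: `knn_classify_vectorized` and the small demo arrays are hypothetical names and data introduced only for illustration. It relies on broadcasting to subtract `x` from every row and on `np.linalg.norm` with `axis=1` to get one Euclidean distance per training sample.

```python
from collections import Counter

import numpy as np


def knn_classify_vectorized(k, X_train, y_train, x):
    """Same voting rule as kNN_classify, with distances computed in one NumPy call."""
    assert 1 <= k <= X_train.shape[0], "k must be between 1 and the number of training samples"
    # Broadcasting subtracts x from each row; axis=1 yields one norm per row
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(distances)[:k]
    # Convert to plain Python ints so the Counter keys are ordinary integers
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]


# Hypothetical toy data: two points near the origin (label 0), three near (5, 5) (label 1)
X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y_demo = np.array([0, 0, 1, 1, 1])
print(knn_classify_vectorized(3, X_demo, y_demo, np.array([5.5, 5.5])))  # prints 1
```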

(4) Main entry point:

if __name__ == "__main__":
    # Load the dataset
    X_train, y_train = loadData()
    # Sample to predict
    x = np.array([8.0936, 3.3657])
    k = 6
    print(kNN_classify(k, X_train, y_train, x))
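
To see where the prediction comes from, the sketch below repeats the distance and voting steps on the same data and query point and prints the intermediate results. The variable names `nearest` and `votes` are illustrative. Five of the six nearest neighbours carry label 1, so the predicted class is 1.

```python
from collections import Counter

import numpy as np

X_train = np.array([[3.3935, 2.3312], [3.1101, 1.7815], [1.3438, 3.3684],
                    [3.5823, 4.6792], [2.2804, 2.8670], [7.4234, 4.6965],
                    [5.7451, 3.5340], [9.1722, 2.5111], [7.7928, 3.4241],
                    [7.9398, 0.7916]])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
x = np.array([8.0936, 3.3657])
k = 6

# Euclidean distance from each training sample to x
distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
# Indices of the k closest training samples, nearest first
nearest = np.argsort(distances)[:k]
# Tally the labels of those k neighbours
votes = Counter(y_train[nearest].tolist())

print(nearest.tolist())     # [8, 7, 5, 6, 9, 3]
print(votes.most_common())  # [(1, 5), (0, 1)]
```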

Complete code

import numpy as np
from collections import Counter


def kNN_classify(k, X_train, y_train, x):
    """
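    kNN classification
    :param k: number of nearest neighbours that vote
    :param X_train: feature values of the training set
    :param y_train: target values (labels) of the training set
    :param x: the single sample to predict, as a 1-D feature vector
    :return: the predicted label
    """
    # Use assertions to check that the inputs are valid
    assert 1 <= k <= X_train.shape[0], "k must be between 1 and the number of training samples"
    assert X_train.shape[0] == y_train.shape[0], "X_train and y_train must have the same number of samples"
    assert X_train.shape[1] == x.shape[0], "x must have the same number of features as X_train"

    # Compute the Euclidean distance from every training sample to x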
    distances = []
    for x_train in X_train:
        d = (np.sum((x_train - x) ** 2)) ** 0.5
        distances.append(d)

    sorted_index = np.argsort(distances)

    top_K = [y_train[i] for i in sorted_index[:k]]

    # most_common(1) returns a list such as [(1, 5)]: label 1 with 5 votes
    return Counter(top_K).most_common(1)[0][0]


def loadData():
    """
    Load the dataset (built by hand here; later it will be read from a file)
    :return: X_train, a numpy array of training features, and y_train, a numpy array of training labels
    """
    raw_data_X = [[3.3935, 2.3312],
                  [3.1101, 1.7815],
                  [1.3438, 3.3684],
                  [3.5823, 4.6792],
                  [2.2804, 2.8670],
                  [7.4234, 4.6965],
                  [5.7451, 3.5340],
                  [9.1722, 2.5111],
                  [7.7928, 3.4241],
                  [7.9398, 0.7916]]
    raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    X_train = np.array(raw_data_X)
    y_train = np.array(raw_data_y)
    return X_train, y_train


if __name__ == "__main__":
    # Load the dataset
    X_train, y_train = loadData()
    # Sample to predict
    x = np.array([8.0936, 3.3657])
    k = 6
    print(kNN_classify(k, X_train, y_train, x))

 
