KNN and Data Normalization

KNN - the K-Nearest Neighbors algorithm

Characteristics
  • The underlying idea is extremely simple
  • Requires very little mathematics
  • Good results (drawbacks?)
  • Can illustrate many issues that come up when applying machine learning algorithms
  • Gives a fairly complete picture of the machine learning workflow
The K-Nearest Neighbors Algorithm

Pick a value of k, say k = 3.
For a new point A, find the k = 3 training points closest to A; these neighbors then vote to decide A's class.

Computing the distance: the Euclidean distance between two points a and b with n features,

d(a, b) = sqrt((a_1 - b_1)^2 + (a_2 - b_2)^2 + ... + (a_n - b_n)^2)
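For example, the distance between the points (1, 2) and (4, 6) is sqrt((4 - 1)^2 + (6 - 2)^2) = sqrt(9 + 16) = 5.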
The KNN process

import numpy as np
from math import sqrt
from collections import Counter


# The KNN process as a single function
def KNN_classify(k, X_train, y_train, x):
    assert 1 <= k <= X_train.shape[0], "k must be valid"
    assert X_train.shape[0] == y_train.shape[0], \
        'the size of X_train must be equal to the size of y_train'
    assert X_train.shape[1] == x.shape[0], \
        'the feature number of x must be equal to X_train'
    # Euclidean distance from x to every training sample
    distances = [sqrt(np.sum((x_train - x) ** 2)) for x_train in X_train]
    # indices of the training samples sorted by distance, ascending
    nearest = np.argsort(distances)
    # labels of the k nearest neighbors
    topk_y = [y_train[i] for i in nearest[:k]]
    # vote: return the most common label among the k nearest neighbors
    votes = Counter(topk_y)
    return votes.most_common(1)[0][0]
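
A minimal usage sketch of KNN_classify on a small made-up 2D dataset (raw_X, raw_y, and x_new below are purely illustrative, not from the original post):

# tiny illustrative dataset: two features per sample, two classes (0 and 1)
raw_X = np.array([[1.0, 2.1], [1.5, 1.8], [1.3, 2.4],
                  [6.2, 5.9], [6.8, 6.3], [7.1, 5.5]])
raw_y = np.array([0, 0, 0, 1, 1, 1])
x_new = np.array([6.5, 6.0])                  # the sample to classify
print(KNN_classify(3, raw_X, raw_y, x_new))   # the 3 nearest neighbors all have label 1, so this prints 1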

The k-nearest neighbors algorithm is special in that it can be regarded as an algorithm with no model.
To stay consistent with other algorithms, the training dataset itself can be considered the model.

KNN in sklearn
from sklearn.neighbors import KNeighborsClassifier
KNN_classifier = KNeighborsClassifier(n_neighbors=6)   # create the classifier
KNN_classifier.fit(X_train, y_train)                   # fit the training set
KNN_classifier.predict(x.reshape(1, -1))               # predict expects a 2D array, so reshape the single sample
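
For a runnable end-to-end sketch, the same three calls can be run on sklearn's bundled iris dataset (used here purely as an example):

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, y_train = iris.data, iris.target        # 150 samples, 4 features, 3 classes

KNN_classifier = KNeighborsClassifier(n_neighbors=6)
KNN_classifier.fit(X_train, y_train)
print(KNN_classifier.predict(X_train[:5]))       # predict the first five training samples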
Reorganizing KNN to follow the sklearn interface
import numpy as np
from math import sqrt
from collections import Counter

# Reorganized as a class with an sklearn-style fit/predict interface
class KNNClassifier:
    def __init__(self, k):
        """Initialize the KNN classifier"""
        assert k >= 1, 'k must be valid'
        self.k = k
        self._X_train = None
        self._y_train = None

    def fit(self, X_train, y_train):
        """Fit the classifier by storing the training set"""
        assert X_train.shape[0] == y_train.shape[0], \
            'the size of X_train must be equal to the size of y_train'
        assert self.k <= X_train.shape[0], 'the size of X_train must be at least k'
        self._X_train = X_train
        self._y_train = y_train
        return self

    def predict(self, X_predict):
        """Predict the label of every sample in X_predict"""
        assert self._X_train is not None and self._y_train is not None, 'must fit before predict'
        assert X_predict.shape[1] == self._X_train.shape[1], 'the feature number of X_predict must be equal to X_train'
        y_predict = [self._predict(x) for x in X_predict]
        return np.array(y_predict)

    def _predict(self, x):
        """Predict the label of a single sample x"""
        assert x.shape[0] == self._X_train.shape[1], 'the feature number of x must be equal to X_train'
        # Euclidean distance from x to every training sample
        distances = [sqrt(np.sum((x_train - x) ** 2)) for x_train in self._X_train]
        nearest = np.argsort(distances)
        topk_y = [self._y_train[i] for i in nearest[:self.k]]
        votes = Counter(topk_y)
        return votes.most_common(1)[0][0]

    def __repr__(self):
        return f'KNN(k={self.k})'

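A quick sanity check of the class, reusing the toy raw_X and raw_y from the sketch above (illustrative data, not from the original post):

knn_clf = KNNClassifier(k=3)
knn_clf.fit(raw_X, raw_y)
print(knn_clf.predict(np.array([[6.5, 6.0], [1.2, 2.0]])))   # expected: [1 0]
print(knn_clf)                                               # KNN(k=3)
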
Evaluating the Performance of a Machine Learning Algorithm