KNN - K-Nearest Neighbors
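Characteristics
- The idea is extremely simple
- It requires very little mathematics
- It works well in practice (but what are its drawbacks?)
- It can illustrate many of the issues that come up when applying machine learning algorithms
- It gives a more complete picture of the machine learning workflow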
The k-nearest neighbors algorithm

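Pick a value of k, say k = 3.
Given the position of a point A, find the 3 (that is, k) points closest to A and let their classes vote; the majority class becomes the predicted class of A.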
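As a small illustration of the voting step (the labels are made up for this example), suppose the 3 nearest neighbors of A carry the labels 1, 0 and 1. `collections.Counter` tallies the votes and `most_common(1)` returns the winning label together with its count:

```python
from collections import Counter

# Hypothetical labels of the k = 3 nearest neighbors of point A
topk_y = [1, 0, 1]

votes = Counter(topk_y)                  # counts: label 1 -> 2 votes, label 0 -> 1 vote
predicted = votes.most_common(1)[0][0]   # label with the most votes
print(predicted)  # → 1
```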
Computing the distance (the code in this article uses the Euclidean distance):
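The original formula did not survive extraction. Judging from the code, which computes `sqrt(np.sum((x_train - x) ** 2))`, it is the Euclidean distance between two points with n features, x = (x_1, ..., x_n) and y = (y_1, ..., y_n):

$$
d(x, y) = \sqrt{\sum_{i=1}^{n} \left(x_i - y_i\right)^2}
$$

For n = 2 this reduces to the familiar planar distance $\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$.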


The KNN procedure
```python
import numpy as np
from math import sqrt
from collections import Counter

# KNN procedure
def KNN_classify(k, X_train, y_train, x):
    assert 1 <= k <= X_train.shape[0], "k must be valid"
    assert X_train.shape[0] == y_train.shape[0], \
        "the size of X_train must be equal to the size of y_train"
    assert X_train.shape[1] == x.shape[0], \
        "the feature number of x must be equal to X_train"

    # Euclidean distance from x to every training sample
    distances = [sqrt(np.sum((x_train - x) ** 2)) for x_train in X_train]
    # indices of the training samples, sorted by increasing distance
    nearest = np.argsort(distances)
    # labels of the k nearest neighbors
    topk_y = [y_train[i] for i in nearest[:k]]
    votes = Counter(topk_y)
    # the label that received the most votes
    return votes.most_common(1)[0][0]
```
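The list comprehension above computes the distances one training sample at a time. Below is a possible vectorized variant, sketched under the assumption that `X_train` is a 2-D NumPy array and `x` a 1-D array; NumPy broadcasting subtracts `x` from every row at once. The function name `knn_classify_vectorized` and the toy data are hypothetical, not part of the original code.

```python
import numpy as np
from collections import Counter

def knn_classify_vectorized(k, X_train, y_train, x):
    """Hypothetical vectorized variant of KNN_classify (same inputs and output)."""
    assert 1 <= k <= X_train.shape[0], "k must be valid"
    assert X_train.shape[0] == y_train.shape[0], \
        "the size of X_train must be equal to the size of y_train"
    assert X_train.shape[1] == x.shape[0], \
        "the feature number of x must be equal to X_train"

    # (n_samples, n_features) - (n_features,) broadcasts over every row
    distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
    nearest = np.argsort(distances)[:k]          # indices of the k closest samples
    votes = Counter(y_train[nearest].tolist())   # count the neighbors' labels
    return votes.most_common(1)[0][0]

# Made-up toy data: two well-separated clusters with labels 0 and 1
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
x = np.array([8.2, 8.4])

print(knn_classify_vectorized(3, X_train, y_train, x))  # → 1
```

Both versions return the same label; the vectorized one simply avoids a Python-level loop over the training set.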
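The k-nearest neighbors algorithm is rather unusual: it can be regarded as an algorithm that has no model at all.
To keep it consistent with other algorithms, we can treat the training dataset itself as the model.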
KNN in sklearn
```python
from sklearn.neighbors import KNeighborsClassifier

KNN_classifier = KNeighborsClassifier(n_neighbors=6)  # create the classifier object
KNN_classifier.fit(X_train, y_train)                  # fit the training set
KNN_classifier.predict(x.reshape(1, -1))              # predict; expects a 2-D array
```
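A self-contained version of the sklearn snippet, using a made-up toy dataset (the data and the choice `n_neighbors=3` are assumptions for this example). `predict` expects a 2-D array of shape (n_samples, n_features), so a single sample is reshaped with `reshape(1, -1)`:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up toy data: two well-separated clusters with labels 0 and 1
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

knn_clf = KNeighborsClassifier(n_neighbors=3)
knn_clf.fit(X_train, y_train)

x = np.array([8.2, 8.4])
# a single sample must be passed as a 2-D array with one row
y_pred = knn_clf.predict(x.reshape(1, -1))
print(y_pred)  # → [1]
```

With only 6 training samples, `n_neighbors=6` would let every sample vote and produce a 3-3 tie here, which is why a smaller k is used in this toy example.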
Reorganizing the KNN code to mimic the sklearn interface
```python
import numpy as np
from math import sqrt
from collections import Counter

# Reorganized version
class KNNClassifier:

    def __init__(self, k):
        """Initialize the kNN classifier."""
        assert k >= 1, "k must be valid"
        self.k = k
        self._X_train = None
        self._y_train = None

    def fit(self, X_train, y_train):
        """"Train" the classifier: kNN simply stores the training set."""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"
        assert self.k <= X_train.shape[0], \
            "the size of X_train must be at least k"
        self._X_train = X_train
        self._y_train = y_train
        return self

    def predict(self, X_predict):
        """Return an array with the predicted label of every row of X_predict."""
        assert self._X_train is not None and self._y_train is not None, \
            "must fit before predict"
        assert X_predict.shape[1] == self._X_train.shape[1], \
            "the feature number of X_predict must be equal to X_train"
        y_predict = [self._predict(x) for x in X_predict]
        return np.array(y_predict)

    def _predict(self, x):
        """Predict the label of a single sample x."""
        assert x.shape[0] == self._X_train.shape[1], \
            "the feature number of x must be equal to X_train"
        distances = [sqrt(np.sum((x_train - x) ** 2)) for x_train in self._X_train]
        nearest = np.argsort(distances)
        topk_y = [self._y_train[i] for i in nearest[:self.k]]
        votes = Counter(topk_y)
        return votes.most_common(1)[0][0]

    def __repr__(self):
        return f"KNN(k={self.k})"
```
