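This post is part of a series of notes on the book 《Web安全之机器学习入门》 (Introduction to Machine Learning for Web Security), covering Chapters 5 through 17.
This section walks through the basic usage of k-nearest neighbors (KNN). The training set is a handful of points in the 2D plane, and the number of neighbors is set to 2: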
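from sklearn.neighbors import NearestNeighbors
import numpy as np

# Six training points in the 2D plane, forming two clusters
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
# Build a ball-tree index that returns the 2 nearest neighbors per query
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
# Query with the training points themselves, so each point's nearest neighbor is itself
distances, indices = nbrs.kneighbors(X)
print(distances)  # distance to each of the 2 neighbors
print(indices)    # index of each neighbor in X
# Dense 0/1 connectivity matrix: entry (i, j) is 1 if point j is a neighbor of point i
print(nbrs.kneighbors_graph(X).toarray())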
The output is as follows:
[[0.         1.        ]
 [0.         1.        ]
 [0.         1.41421356]
 [0.         1.        ]
 [0.         1.        ]
 [0.         1.41421356]]
[[0 1]
 [1 0]
 [2 1]
 [3 4]
 [4 3]
 [5 4]]
[[1. 1. 0. 0. 0. 0.]
 [1. 1. 0. 0. 0. 0.]
 [0. 1. 1. 0. 0. 0.]
 [0. 0. 0. 1. 1. 0.]
 [0. 0. 0. 1. 1. 0.]
 [0. 0. 0. 0. 1. 1.]]
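To make the three printed arrays easier to read, here is a minimal pure-Python sketch of the same neighbor search done by brute force. It is only an illustration: scikit-learn's `ball_tree` algorithm builds a tree index rather than scanning every point, and the helper name `brute_force_kneighbors` is made up for this note. It assumes Euclidean distance and Python 3.8+ (for `math.dist`).

```python
import math

# Same six training points as in the example above.
X = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]

def brute_force_kneighbors(points, query, k):
    """Hypothetical helper: return (distances, indices) of the k points closest to query."""
    # Pair every Euclidean distance with its point index, then sort by distance.
    ranked = sorted((math.dist(p, query), i) for i, p in enumerate(points))
    nearest = ranked[:k]
    return [d for d, _ in nearest], [i for _, i in nearest]

# Query with the training points themselves, as the sklearn example does.
results = [brute_force_kneighbors(X, q, 2) for q in X]
knn_distances = [d for d, _ in results]
knn_indices = [idx for _, idx in results]

# Connectivity matrix: row i has a 1 in column j when j is among i's neighbors.
graph = [[1 if j in row else 0 for j in range(len(X))] for row in knn_indices]

print(knn_indices)
print(graph)
```

Because each query point is itself in the training set, its first neighbor is always itself at distance 0, which is why the first column of the distance array is all zeros and the connectivity matrix has ones on its diagonal.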
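Likewise, using KNN for supervised learning is straightforward. This code is not included in the author's companion GitHub repository, so I typed it in by hand: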
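from sklearn.neighbors import KNeighborsClassifier

# Four 1-D samples: the first two belong to class 0, the last two to class 1
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
# Classify by majority vote among the 3 nearest training samples
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)
print(neigh.predict([[1.1]]))        # predicted class label
print(neigh.predict_proba([[0.9]]))  # per-class probability estimates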
The output is as follows:
[0]
[[0.66666667 0.33333333]]
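The probabilities above can be reproduced by hand. The sketch below is a rough pure-Python rendering of the voting step, assuming uniform weights (each of the 3 neighbors counts as one vote) and a single feature. The function names `knn_predict_proba` and `knn_predict` are invented for this note and are not scikit-learn's implementation.

```python
from collections import Counter

# Same training data as in the classifier example above.
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

def knn_predict_proba(train_X, train_y, query, k):
    """Hypothetical helper: fraction of the k nearest neighbors in each class."""
    # Rank training samples by absolute distance along the single feature.
    ranked = sorted(range(len(train_X)), key=lambda i: abs(train_X[i][0] - query[0]))
    votes = Counter(train_y[i] for i in ranked[:k])
    classes = sorted(set(train_y))
    return [votes[c] / k for c in classes]

def knn_predict(train_X, train_y, query, k):
    """Hypothetical helper: the class with the most votes among the k neighbors."""
    proba = knn_predict_proba(train_X, train_y, query, k)
    classes = sorted(set(train_y))
    return classes[proba.index(max(proba))]

label = knn_predict(X, y, [1.1], 3)
proba = knn_predict_proba(X, y, [0.9], 3)
print(label)
print(proba)
```

For the query 0.9, the three nearest samples are 1, 0, and 2, with labels 0, 0, and 1, so class 0 gets two of three votes and class 1 gets one, matching the 0.667 and 0.333 printed by `predict_proba`.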