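Pros and cons of the KNN algorithm:
Pros: high accuracy, relatively insensitive to outliers
Cons: high computational complexity (a brute-force prediction measures the distance to every training sample) and high space complexity (the whole training set must be kept in memory)
Applicable data types: numeric and nominal
KNN is a supervised classification algorithm: given an unlabeled sample, it compares that sample's features with those of the labeled samples in the training set, takes the class labels of the K most similar (nearest) samples, and assigns the class that occurs most often among those K.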
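The description above can be sketched in a few lines of NumPy. This is only a minimal illustration, assuming Euclidean distance as the similarity measure; the helper name `knn_classify` and the four sample points are hypothetical and chosen for the example.

```python
from collections import Counter

import numpy as np


def knn_classify(query, x_train, y_train, k=3):
    """Hypothetical helper: label one query point by majority vote of its k nearest neighbors."""
    # Euclidean distance from the query to every training sample (assumed metric)
    distances = np.sqrt(((x_train - query) ** 2).sum(axis=1))
    # Indices of the k training samples closest to the query
    nearest = np.argsort(distances)[:k]
    # Count the labels of those neighbors and return the most frequent one
    votes = Counter(str(y_train[i]) for i in nearest)
    return votes.most_common(1)[0][0]


# Illustrative data only: two points per class
x_train = np.array([[1.0, 1.1], [1.3, 0.8], [2.5, 2.0], [3.4, 2.5]])
y_train = np.array(['a', 'a', 'b', 'b'])
label = knn_classify(np.array([1.2, 1.0]), x_train, y_train, k=3)
print(label)
```

Every call computes a distance to all training samples, which is where the computational and memory costs listed among the cons come from.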
Implementing the KNN algorithm with sklearn
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split


def sklearn_test():
    # Only affects the commented-out manual split below; train_test_split uses random_state
    np.random.seed(0)
    iris = datasets.load_iris()
    iris_x, iris_y = iris.data, iris.target
    # Manual alternative: shuffle the indices and hold out the last 10 samples
    # indices = np.random.permutation(len(iris_x))
    # iris_x_train, iris_x_test = iris_x[indices[:-10]], iris_x[indices[-10:]]
    iris_x_train, iris_x_test, iris_y_train, iris_y_test = train_test_split(
        iris_x, iris_y, test_size=0.1, random_state=42)
    knn = KNeighborsClassifier()  # default n_neighbors=5
    knn.fit(iris_x_train, iris_y_train)
    iris_y_predict = knn.predict(iris_x_test)
    # Per-class probabilities for each test sample
    probability = knn.predict_proba(iris_x_test)
    # Mean accuracy on the held-out test set
    score = knn.score(iris_x_test, iris_y_test, sample_weight=None)
    print('the predicted result of iris is:', iris_y_predict,
          'and the real result of iris is:', iris_y_test)
    print('the accuracy is: %.2f' % score)
    # print("the neighbor point of last test sample:", neighborpoint)
    print("the probability is:", probability)
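The commented-out line mentions the neighbors of the last test sample, but `neighborpoint` is never defined. One possible way to obtain such neighbors is the `kneighbors` method of `KNeighborsClassifier`; the sketch below is an assumption about what was intended, not the original author's code, and its variable names are illustrative.

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.1, random_state=42)
knn = KNeighborsClassifier()  # default n_neighbors=5
knn.fit(x_train, y_train)

# kneighbors expects a 2-D array, so slice the last test sample instead of indexing it
distances, indices = knn.kneighbors(x_test[-1:], n_neighbors=5)
# indices refer to rows of the training set, so they can be used to look up labels
neighbor_labels = y_train[indices[0]]
print("distances to the 5 nearest training samples:", distances[0])
print("labels of those neighbors:", neighbor_labels)
```

The prediction for that sample is the majority label among `neighbor_labels`, which ties the library call back to the voting rule described earlier.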
Steps for implementing the KNN algorithm in code:
import numpy as np
import matplotlib.pyplot as plt


# Create the training set
def create_data():
    x_train = np.array([[1, 1.1],
                        [1.3, 0.8],
                        [1.4, 1.2],
                        [1.1, 0.9],
                        [0.8, 1.5],
                        [2.5, 2],
                        [3.4, 2.5],
                        [3.7, 2.5],
                        [2, 3]])
    y_train = np.array(['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'])
    return x_train, y_train


# Query point to classify
x_tes