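Implementing the k-Nearest Neighbors (kNN) Algorithm
I recently started studying machine learning algorithms, and this is my first blog post. Its main purpose is to keep a record of what I learn each day. The course material comes from 慕课网 (imooc).
First, import the numpy and matplotlib libraries that we will need.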
import numpy as np
import matplotlib.pyplot as plt
Training dataset
raw_data_X=[[3.393533211,2.331273381],
[3.110073483,1.781539638],
[1.343808831,3.368360954],
[3.582294042,4.679179110],
[2.280362439,2.866990263],
[7.423436942,4.696522875],
[5.745051997,3.533989803],
[9.172168622,2.511101045],
[7.792783481,3.424088941],
[7.939820817,0.791637231]
]
raw_data_y=[0,0,0,0,0,1,1,1,1,1]
x_train=np.array(raw_data_X)
x_train
y_train=np.array(raw_data_y)
y_train
Visualize the data with a scatter plot, using a different color for each class
plt.scatter(x_train[y_train==0,0],x_train[y_train==0,1],color='g')
plt.scatter(x_train[y_train==1,0],x_train[y_train==1,1],color='r')
plt.show()
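The indexing in the scatter calls, such as x_train[y_train==0,0], combines a boolean mask on the rows with a column index. Here is a minimal sketch of what that selects, using a tiny made-up array (X_demo and y_demo are illustrative names, not part of the data above):

```python
import numpy as np

# Hypothetical toy data: three samples with two features each.
X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])
y_demo = np.array([0, 1, 0])

# y_demo == 0 is a boolean array that marks the rows belonging to class 0.
mask = (y_demo == 0)

# Rows where the mask is True, column 0 (first feature) and column 1 (second feature).
first_feature = X_demo[mask, 0]
second_feature = X_demo[mask, 1]

print(mask)            # [ True False  True]
print(first_feature)   # [1. 5.]
print(second_feature)  # [2. 6.]
```

So each scatter call receives the x coordinates and y coordinates of one class only, which is what lets the two classes be drawn in different colors.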
Add a test point and plot it in blue
x=np.array([8.093607318,3.365731514])
plt.scatter(x_train[y_train==0,0],x_train[y_train==0,1],color='g')
plt.scatter(x_train[y_train==1,0],x_train[y_train==1,1],color='r')
plt.scatter(x[0],x[1],color='b')
plt.show()
The kNN procedure
# Euclidean distance from the test point x to every training point
from math import sqrt
distances=[sqrt(np.sum((point-x)**2)) for point in x_train]
distances
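The same distances can probably be computed without a Python loop. The sketch below is an alternative, not what the course used: it relies on NumPy broadcasting and np.linalg.norm, and the name distances_vec is my own. The data is repeated so that the snippet runs on its own.

```python
import numpy as np

# Same training points and test point as above.
x_train = np.array([[3.393533211, 2.331273381],
                    [3.110073483, 1.781539638],
                    [1.343808831, 3.368360954],
                    [3.582294042, 4.679179110],
                    [2.280362439, 2.866990263],
                    [7.423436942, 4.696522875],
                    [5.745051997, 3.533989803],
                    [9.172168622, 2.511101045],
                    [7.792783481, 3.424088941],
                    [7.939820817, 0.791637231]])
x = np.array([8.093607318, 3.365731514])

# Broadcasting subtracts x from every row of x_train;
# axis=1 takes the Euclidean norm of each row separately.
distances_vec = np.linalg.norm(x_train - x, axis=1)
print(distances_vec)
```

The result is a NumPy array with one distance per training point, in the same order as the list comprehension version.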
# Indices of the training points, sorted from nearest to farthest
nearest=np.argsort(distances)
nearest
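Note that np.argsort returns positions, not the sorted values themselves. A small sketch with made-up numbers (d_demo and order are illustrative names) shows the difference:

```python
import numpy as np

# Hypothetical distances, just for illustration.
d_demo = np.array([4.0, 1.5, 3.2, 0.7])

# order[i] is the index of the i-th smallest element of d_demo.
order = np.argsort(d_demo)

print(order)          # [3 1 2 0]
print(d_demo[order])  # [0.7 1.5 3.2 4. ]
```

That is why the indices can be used directly to look up the labels of the nearest training points.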
k=6
# Labels (y values) of the k=6 nearest points
topk_y=[y_train[i] for i in nearest[:k]]
topk_y
# Count how many times each label appears among the k nearest points
from collections import Counter
votes=Counter(topk_y)
# The label with the most votes is the prediction
predict=votes.most_common(1)[0][0]
predict
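To tie the steps together, here is a rough sketch that wraps them in one function. The name knn_classify, its argument order, and the check on k are my own choices for illustration, not something taken from the course. The data is repeated so that the snippet runs on its own.

```python
import numpy as np
from collections import Counter


def knn_classify(k, x_train, y_train, x):
    """Predict the label of x by majority vote among its k nearest training points.

    Hypothetical helper for illustration; it follows the steps shown above.
    """
    if not 1 <= k <= len(x_train):
        raise ValueError("k must be between 1 and the number of training points")

    # Step 1: Euclidean distance from x to every training point.
    distances = np.linalg.norm(x_train - x, axis=1)
    # Step 2: indices of the training points, nearest first.
    nearest = np.argsort(distances)
    # Step 3: labels of the k nearest points, as plain Python ints.
    topk_y = y_train[nearest[:k]].tolist()
    # Step 4: majority vote. If two labels tie, Counter returns the one it saw first.
    votes = Counter(topk_y)
    return votes.most_common(1)[0][0]


x_train = np.array([[3.393533211, 2.331273381],
                    [3.110073483, 1.781539638],
                    [1.343808831, 3.368360954],
                    [3.582294042, 4.679179110],
                    [2.280362439, 2.866990263],
                    [7.423436942, 4.696522875],
                    [5.745051997, 3.533989803],
                    [9.172168622, 2.511101045],
                    [7.792783481, 3.424088941],
                    [7.939820817, 0.791637231]])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
x = np.array([8.093607318, 3.365731514])

predict = knn_classify(6, x_train, y_train, x)
print(predict)  # 1
```

With k=6, five of the six nearest neighbors belong to class 1, so the test point is predicted as class 1, the same result as the step-by-step version.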
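These are my own study notes. Corrections and criticism are welcome, and if anything here infringes on someone's rights, I will take it down.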