Machine Learning: kNN


kNN
0. Loading the required modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split

# Render plots inline in Jupyter
%matplotlib inline
1. Loading the data
1.1 Data preview
# Load the dataset
fruits_df = pd.read_table('fruit_data_with_colors.txt')

# Preview the data
fruits_df.head()
   fruit_label  fruit_name  fruit_subtype  mass  width  height  color_score
0            1       apple   granny_smith   192    8.4     7.3         0.55
1            1       apple   granny_smith   180    8.0     6.8         0.59
2            1       apple   granny_smith   176    7.4     7.2         0.60
3            2    mandarin       mandarin    86    6.2     4.7         0.80
4            2    mandarin       mandarin    84    6.0     4.6         0.79
print('Number of samples:', len(fruits_df))
Number of samples: 59
# Pass the column by keyword; recent seaborn versions no longer accept it positionally
sns.countplot(x='fruit_name', data=fruits_df)


1.2 Data processing
# Build a dictionary that maps each target label to its fruit name
fruit_name_dict = dict(zip(fruits_df['fruit_label'], fruits_df['fruit_name']))
print(fruit_name_dict)
{1: 'apple', 2: 'mandarin', 3: 'orange', 4: 'lemon'}
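The dictionary has only four entries even though the table has 59 rows, because a dict keeps one value per key and later pairs overwrite earlier ones. A minimal sketch with a few made-up rows (the lists below are illustrative stand-ins for the two DataFrame columns, not the real data):

```python
# Illustrative rows only; the notebook zips two DataFrame columns instead.
labels = [1, 1, 2, 3, 3, 4]
names = ['apple', 'apple', 'mandarin', 'orange', 'orange', 'lemon']

# zip pairs each label with its name; dict keeps one entry per distinct label.
label_to_name = dict(zip(labels, names))
print(label_to_name)  # {1: 'apple', 2: 'mandarin', 3: 'orange', 4: 'lemon'}
```

This only works cleanly because every label always maps to the same name; if a label appeared with two different names, the last one seen would silently win.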
# Split the dataset into training and test sets
X = fruits_df[['mass', 'width', 'height', 'color_score']]
y = fruits_df['fruit_label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/4, random_state=0)
print('Dataset size: {}, training samples: {}, test samples: {}'.format(len(X), len(X_train), len(X_test)))
Dataset size: 59, training samples: 44, test samples: 15
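The 44/15 split follows from how a fractional test_size becomes a row count. The sketch below assumes the test count is rounded up and the training set takes the remainder, which matches the numbers printed above; `split_sizes` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
import math

def split_sizes(n_samples, test_fraction):
    """Hypothetical helper: row counts for a fractional test split."""
    n_test = math.ceil(n_samples * test_fraction)  # test share, rounded up
    n_train = n_samples - n_test                   # remainder goes to training
    return n_train, n_test

print(split_sizes(59, 1 / 4))  # (44, 15)
```

With 59 rows, a quarter is 14.75, so the test set gets 15 rows and the training set the remaining 44.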
2. Visualizing the feature variables
# Pairwise view of the features, colored by fruit name
sns.pairplot(data=fruits_df, hue='fruit_name', vars=['mass', 'width', 'height', 'color_score'])

%matplotlib notebook
from mpl_toolkits.mplot3d import Axes3D

label_color_dict = {1: 'red', 2: 'green', 3: 'blue', 4: 'yellow'}
colors = list(map(lambda label: label_color_dict[label], y_train))

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X_train['width'], X_train['height'], X_train['color_score'], c=colors, marker='o', s=100)
ax.set_xlabel('width')
ax.set_ylabel('height')
ax.set_zlabel('color_score')
plt.show()

3. Building/selecting the model
from sklearn.neighbors import KNeighborsClassifier

# Build the kNN model
knn = KNeighborsClassifier(n_neighbors=5)
4. Training the model
knn.fit(X_train, y_train)
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')
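The parameters printed above (Minkowski metric with p=2, uniform weights, 5 neighbors) describe a plain majority vote among the nearest points by Euclidean distance. As a rough illustration of that idea, here is a from-scratch sketch; `knn_predict` and the toy points are made up for this example, and the tie-breaking is simplistic and not guaranteed to match scikit-learn's.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=5):
    """Predict a label for `query` by majority vote among its k nearest points."""
    # Euclidean distance (Minkowski with p=2) from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest training points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Uniform weights: each neighbor casts exactly one vote.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up (width, height) points for two classes, for illustration only.
points = [(7.0, 7.0), (7.5, 7.2), (8.0, 7.5), (6.0, 4.5), (6.2, 4.7), (5.9, 4.3)]
labels = [1, 1, 1, 2, 2, 2]

print(knn_predict(points, labels, (7.4, 7.1), k=3))  # 1
print(knn_predict(points, labels, (6.1, 4.6), k=3))  # 2
```

Note that "training" a kNN model mostly means storing the data; the distance computations happen at prediction time.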
5. Testing the model
y_pred = knn.predict(X_test)
print('Predicted labels:', y_pred)
Predicted labels: [3 1 4 4 1 1 3 3 1 4 2 1 3 1 4]
print('True labels:', y_test.values)
True labels: [3 3 4 3 1 1 3 4 3 1 2 1 3 3 3]
from sklearn.metrics import accuracy_score

acc = accuracy_score(y_test, y_pred)
print('Accuracy:', acc)
Accuracy: 0.5333333333333333
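The accuracy is simply the fraction of test samples whose predicted label equals the true label. It can be checked by hand from the two label arrays printed above, as in this small sketch (plain Python lists are used here in place of the NumPy arrays):

```python
# Labels copied from the printed output above.
y_pred = [3, 1, 4, 4, 1, 1, 3, 3, 1, 4, 2, 1, 3, 1, 4]
y_true = [3, 3, 4, 3, 1, 1, 3, 4, 3, 1, 2, 1, 3, 3, 3]

# Accuracy = number of matching positions / number of test samples.
n_correct = sum(p == t for p, t in zip(y_pred, y_true))
accuracy = n_correct / len(y_true)
print(n_correct, len(y_true), accuracy)  # 8 15 0.5333333333333333
```

Eight of the fifteen test samples are classified correctly, which gives the 0.533 reported by accuracy_score.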
6. Effect of k on the results
k_range = range(1, 20)
acc_scores = []

for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    acc_scores.append(knn.score(X_test, y_test))
    
plt.figure()
plt.xlabel('k')
plt.ylabel('accuracy')
plt.plot(k_range, acc_scores, marker='o')
plt.xticks([0, 5, 10, 15, 20])

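To read the best k off the curve programmatically, one option is to take the smallest k that reaches the highest accuracy. The sketch below uses a hypothetical helper `best_k` and placeholder scores that are not the notebook's results; in the notebook, k_range and acc_scores from the loop above would be passed in instead.

```python
def best_k(k_values, scores):
    """Hypothetical helper: smallest k whose score equals the maximum score."""
    top = max(scores)
    for k, score in zip(k_values, scores):
        if score == top:
            return k, score

# Placeholder accuracies for illustration only (not measured results).
demo_k = range(1, 6)
demo_scores = [0.60, 0.55, 0.65, 0.65, 0.50]
print(best_k(demo_k, demo_scores))  # (3, 0.65)
```

Preferring the smallest k among ties is an arbitrary choice made for this sketch; a larger k gives smoother decision boundaries and may be the better pick in practice.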
# Look at only the width and height features
from ml_visualization import plot_fruit_knn

plot_fruit_knn(X_train, y_train, 1)
plot_fruit_knn(X_train, y_train, 5)
plot_fruit_knn(X_train, y_train, 10)




 