cs231n: Image Classification
K-Nearest Neighbor Classification (NN & KNN)
NN (Nearest Neighbors)
- Concept: the algorithm searches the known dataset for the single data point closest to the new point and uses its label as the prediction
- Usage example
CIFAR-10 dataset

```python
import numpy as np
import pickle
import os
from sklearn.neighbors import NearestNeighbors

def load_CIFAR_batch(filename):
    """Load a single batch from a CIFAR file."""
    with open(filename, 'rb') as f:
        datadict = pickle.load(f, encoding='bytes')
        X = datadict[b'data']
        Y = datadict[b'labels']
        X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype("float")
        Y = np.array(Y)
        return X, Y

def load_CIFAR10(ROOT):
    """Load the full CIFAR-10 dataset (5 training batches + 1 test batch)."""
    xs = []
    ys = []
    for b in range(1, 6):
        f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
        X, Y = load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)
    Xtr = np.concatenate(xs)
    Ytr = np.concatenate(ys)
    del X, Y
    Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
    return Xtr, Ytr, Xte, Yte

# Usage example
Xtr, Ytr, Xte, Yte = load_CIFAR10('.\\cifar-10-batches-py\\')
# Flatten each 32x32x3 image into a single 3072-dimensional row
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3)
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3)

# Euclidean distance by default
nn = NearestNeighbors()
# To use Manhattan distance instead:
# nn = NearestNeighbors(metric='manhattan')
nn.fit(Xtr_rows)

# Find the nearest training neighbors of each test point
distances, indices = nn.kneighbors(Xte_rows)

# Predict: take the label of the single nearest neighbor
Yte_predict = Ytr[indices[:, 0]]

# Compute accuracy
accuracy = np.mean(Yte_predict == Yte)
print('accuracy: %f' % accuracy)
```
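For intuition, the prediction step above can also be written by hand. The following is a minimal from-scratch sketch of nearest-neighbor prediction with the L1 (Manhattan) distance; it assumes `Xtr_rows`, `Ytr`, and `Xte_rows` from the loading code above, and the function name is illustrative. It is brute force, so it is slow on the full test set:

```python
def predict_nn_l1(Xtr_rows, Ytr, Xte_rows):
    """Predict each test row's label as the label of its L1-nearest
    training row (brute force, one test point at a time)."""
    num_test = Xte_rows.shape[0]
    predictions = np.zeros(num_test, dtype=Ytr.dtype)
    for i in range(num_test):
        # L1 distance from test point i to every training point
        distances = np.sum(np.abs(Xtr_rows - Xte_rows[i, :]), axis=1)
        # Take the label of the closest training point
        predictions[i] = Ytr[np.argmin(distances)]
    return predictions

# e.g. on a small subset: predict_nn_l1(Xtr_rows, Ytr, Xte_rows[:100])
```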
KNN (K-Nearest Neighbors)
- Key property: it has a hyperparameter K; the algorithm considers the K nearest neighbors (K is a positive integer) and predicts by majority vote over their labels. K is usually tuned on held-out data (see the sketch after the examples below)
- Usage examples
Iris dataset ^3704e6d4-9610-bc19

```python
# Import the required libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into training and test sets: 70% for training, 30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Create a KNN classifier instance with K = 3
knn = KNeighborsClassifier(n_neighbors=3)

# Train the model
knn.fit(X_train, y_train)

# Predict on the test set
y_pred = knn.predict(X_test)

# Report the results
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
```
Diabetes dataset ^7b0db5e8-1e2d-1a80

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Load the data from an Excel file
df = pd.read_excel('knn.xlsx')

# Prepare features and target:
# drop the 'Outcome' column to get the feature matrix X as a NumPy array
X = df.drop('Outcome', axis=1).to_numpy()
# the 'Outcome' column is the target vector Y
Y = df['Outcome'].to_numpy()

# Split into training and test sets; the test set is 30% of the data
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)

# Initialize the KNN classifier with K = 3
knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier on the training set
knn.fit(X_train, y_train)

# Predict on the test set
y_pred = knn.predict(X_test)

# Evaluate the model
# Confusion matrix: actual vs. predicted classes
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
# Classification report: precision, recall, F1 score, etc.
print("\nClassification Report:\n", classification_report(y_test, y_pred))
# Accuracy: fraction of correctly classified samples
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy: %f' % accuracy)
```
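Since K is a hyperparameter, it should be chosen on validation data rather than the test set. Below is a minimal sketch of tuning K with 5-fold cross-validation on the iris data; the candidate values of K are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate several candidate values of K and keep the best one
best_k, best_score = None, 0.0
for k in [1, 3, 5, 7, 9, 11]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print('K=%2d  mean CV accuracy: %.3f' % (k, scores.mean()))
    if scores.mean() > best_score:
        best_k, best_score = k, scores.mean()

print('best K:', best_k)
```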
Distance functions
- L1 distance (Manhattan distance)
- L2 distance (Euclidean distance)
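Comparing two images $I_1$ and $I_2$ pixel by pixel (with $p$ indexing pixels), the two distances are:

$$d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|$$

$$d_2(I_1, I_2) = \sqrt{\sum_p \left( I_1^p - I_2^p \right)^2}$$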
Difference
- In sklearn terms, KNN (KNeighborsClassifier) is a supervised learner, while the NN example uses NearestNeighbors, an unsupervised neighbor search whose results are turned into predictions by looking up the training labels manually
Linear classifier
- Concept
- A linear classifier tries to find a straight line (in two dimensions), a plane (in three dimensions), or a hyperplane (in higher dimensions) that separates the different classes. The key property of such a classifier is that its decisions are based on a linear function of the input.
- General form
- $f(x, W) = Wx + b$
- $x$ is the feature vector, $W$ is the model weight matrix, and $b$ is the bias term
- Purpose
- Given a feature vector $x$, the classifier outputs a vector of scores, one per class
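A minimal sketch of this score computation for CIFAR-10 shapes (10 classes, 3072-dimensional flattened images); the random weights below are placeholders standing in for learned parameters:

```python
import numpy as np

num_classes, num_features = 10, 32 * 32 * 3  # CIFAR-10 shapes

# In practice W and b are learned from data; random/zero values stand in here
W = np.random.randn(num_classes, num_features) * 0.0001
b = np.zeros(num_classes)

x = np.random.randn(num_features)  # one flattened image

scores = W.dot(x) + b  # f(x, W) = Wx + b, one score per class
predicted_class = np.argmax(scores)  # the class with the highest score
print(scores.shape, predicted_class)
```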