K-Nearest Neighbors and Cross-validation

Table of Contents

Imports and Data Preprocessing

K-Nearest Neighbors

Testing the k-NN Algorithm

Cross-validation

Plotting the Cross-validation Results

Accuracy with the Best k

Noteworthy Functions

np.sum

np.argsort

np.bincount

np.delete


Theory:

CS231n笔记--图片线性分类_iwill323的博客-CSDN博客

Imports and Data Preprocessing

# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the notebook
# rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/CIFAR10'  # use forward slashes; backslashes are escape characters in a plain string

# Cleaning up variables to prevent loading data multiple times (which may cause memory issue)
try:
    del X_train, y_train
    del X_test, y_test
    print('Clear previously loaded data.')
except NameError:
    # The variables simply don't exist yet on the first run
    pass

X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

# Subsample the data for more efficient code execution in this exercise
num_training = 5000
X_train = X_train[:num_training]
y_train = y_train[:num_training]

num_test = 500
X_test = X_test[:num_test]
y_test = y_test[:num_test]
print(X_train.shape, X_test.shape)
# Reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
print(X_train.shape, X_test.shape)  # (5000, 3072) (500, 3072)

K-Nearest Neighbors

# Helper code: the kNN classifier
import numpy as np

class KNearestNeighbor(object):
    """ a kNN classifier with L2 distance """

    def __init__(self):
        pass

    def train(self, X, y):
        """
        Train the classifier. For k-nearest neighbors this is just memorizing the training data.

        Inputs:
        - X: A numpy array of shape (num_train, D) containing the training data
          consisting of num_train samples each of dimension D.
        - y: A numpy array of shape (num_train,) containing the training labels,
          where y[i] is the label for X[i].
        """
        self.X_train = X
        self.y_train = y

    def predict(self, X, k=1, num_loops=0):
        """
        Predict labels for test data using this classifier.

        Inputs:
        - X: A numpy array of shape (num_test, D) containing test data consisting
          of num_test samples each of dimension D.
        - k: The number of nearest neighbors that vote for the predicted labels.
        - num_loops: distance implementation selector; this excerpt includes
          only the fully vectorized path (num_loops=0).

        Returns:
        - y: A numpy array of shape (num_test,) containing predicted labels,
          where y[i] is the predicted label for the test point X[i].
        """
        dists = self.compute_distances_no_loops(X)
        return self.predict_labels(dists, k=k)

    def compute_distances_no_loops(self, X):
        # Fully vectorized L2 distances using (x - y)^2 = x^2 - 2*x*y + y^2
        test_sq = np.sum(X ** 2, axis=1, keepdims=True)   # (num_test, 1)
        train_sq = np.sum(self.X_train ** 2, axis=1)      # (num_train,)
        cross = X.dot(self.X_train.T)                     # (num_test, num_train)
        return np.sqrt(test_sq - 2 * cross + train_sq)

    def predict_labels(self, dists, k=1):
        # Majority vote among the labels of the k nearest training points
        num_test = dists.shape[0]
        y_pred = np.zeros(num_test, dtype=self.y_train.dtype)
        for i in range(num_test):
            closest_y = self.y_train[np.argsort(dists[i])[:k]]  # np.argsort: neighbor indices by distance
            y_pred[i] = np.argmax(np.bincount(closest_y))       # np.bincount: vote counts per label
        return y_pred
Group-wise cross-validation is a type of cross-validation used when the data has a group structure, for example when samples are collected from different subjects, experiments, or measurement devices. The data is divided into groups, and each group is held out in turn, so the model is always evaluated on groups it never saw during training; this gives a more realistic assessment of generalization performance in real-world scenarios. Here is an example of group-wise cross-validation built on the K-fold technique:

from sklearn.model_selection import GroupKFold
from sklearn.linear_model import LogisticRegression

# Assuming we have features X, labels y, and groups g
X = ...
y = ...
groups = ...

# Create a group-wise cross-validation iterator
gkf = GroupKFold(n_splits=5)

# Initialize a model
model = LogisticRegression()

# Perform group-wise cross-validation
for train_index, test_index in gkf.split(X, y, groups):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit the model on the training data
    model.fit(X_train, y_train)

    # Evaluate the model on the test data
    score = model.score(X_test, y_test)

    # Print the evaluation score
    print("Validation score: ", score)

In this example, GroupKFold produces 5 folds such that no group appears in both the training and test indices of the same fold. The model is trained and evaluated once per fold, and each fold's score is printed to assess its performance.
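The same fold-based idea, without the group structure, is how k is chosen for the kNN classifier above. Below is a minimal sketch of that selection loop; num_folds = 5 and the k_choices list are assumptions in the spirit of the CS231n assignment, with np.array_split forming the folds:

num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]

# Split the 5000 training samples into equal folds
X_train_folds = np.array_split(X_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)

k_to_accuracies = {}
for k in k_choices:
    k_to_accuracies[k] = []
    for fold in range(num_folds):
        # Hold out one fold for validation, train on the remaining folds
        X_val, y_val = X_train_folds[fold], y_train_folds[fold]
        X_tr = np.vstack(X_train_folds[:fold] + X_train_folds[fold + 1:])
        y_tr = np.hstack(y_train_folds[:fold] + y_train_folds[fold + 1:])

        classifier = KNearestNeighbor()
        classifier.train(X_tr, y_tr)
        y_val_pred = classifier.predict(X_val, k=k)
        k_to_accuracies[k].append(np.mean(y_val_pred == y_val))

for k in sorted(k_to_accuracies):
    print('k = %d, mean accuracy = %f' % (k, np.mean(k_to_accuracies[k])))

The k with the highest mean validation accuracy would then be used to retrain on the full training subset and to report the final test accuracy.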
