I studied other people's solutions online and then rewrote my own. Since array_split returns a list, many people online reach for np.vstack and the like to stitch the folds back together; I found that unnecessarily cumbersome, so I simply convert the list into a NumPy array and work with that directly.
num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]
X_train_folds = []
y_train_folds = []
################################################################################
# TODO: #
# Split up the training data into folds. After splitting, X_train_folds and #
# y_train_folds should each be lists of length num_folds, where #
# y_train_folds[i] is the label vector for the points in X_train_folds[i]. #
# Hint: Look up the numpy array_split function. #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
X_train_folds = np.array(np.array_split(X_train, num_folds, axis=0))
y_train_folds = np.array(np.array_split(y_train, num_folds, axis=0))
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
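To sanity-check the split, here is a minimal sketch with made-up small shapes standing in for the real (5000, 3072) training set, showing that wrapping np.array_split in np.array stacks the folds into one 3-D array:

```python
import numpy as np

# Toy stand-ins for X_train / y_train: 10 examples, 4 features.
X_toy = np.arange(40).reshape(10, 4)
y_toy = np.arange(10)

folds = 5
X_folds = np.array(np.array_split(X_toy, folds, axis=0))
y_folds = np.array(np.array_split(y_toy, folds, axis=0))

print(X_folds.shape)  # (5, 2, 4): num_folds x examples-per-fold x features
print(y_folds.shape)  # (5, 2)
```

Note this only works cleanly because the number of examples divides evenly by num_folds, so every fold has the same shape and np.array can stack them; with ragged folds you would get an object array (or an error on recent NumPy versions).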
# A dictionary holding the accuracies for different values of k that we find
# when running cross-validation. After running cross-validation,
# k_to_accuracies[k] should be a list of length num_folds giving the different
# accuracy values that we found when using that value of k.
k_to_accuracies = {}
################################################################################
# TODO: #
# Perform k-fold cross validation to find the best value of k. For each #
# possible value of k, run the k-nearest-neighbor algorithm num_folds times, #
# where in each case you use all but one of the folds as training data and the #
# last fold as a validation set. Store the accuracies for all fold and all #
# values of k in the k_to_accuracies dictionary. #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
for ki in k_choices:
    k_to_accuracies[ki] = []
    for fi in range(num_folds):
        # Use every fold except fold fi as training data.
        x_train_tem = np.delete(X_train_folds, fi, axis=0).reshape(-1, X_train_folds.shape[2])
        # Flatten the labels back to 1-D: predict_labels uses np.bincount,
        # which expects a label vector, not an (N, 1) column.
        y_train_tem = np.delete(y_train_folds, fi, axis=0).reshape(-1)
        # Fold fi serves as the validation set.
        x_test_tem = X_train_folds[fi]
        y_test_tem = y_train_folds[fi]
        classifier = KNearestNeighbor()
        classifier.train(x_train_tem, y_train_tem)
        dists = classifier.compute_distances_no_loops(x_test_tem)
        y_test_pred = classifier.predict_labels(dists, ki)
        num_correct = np.sum(y_test_pred == y_test_tem)
        accuracy = float(num_correct) / x_test_tem.shape[0]
        k_to_accuracies[ki].append(accuracy)
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))
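A common follow-up (not shown in the starter code) is to average the per-fold accuracies and pick the k with the best mean. A sketch, using made-up accuracy values in place of the real k_to_accuracies:

```python
import numpy as np

# Hypothetical per-fold accuracies standing in for the real results.
k_to_accuracies = {
    1:  [0.26, 0.27, 0.25, 0.26, 0.28],
    5:  [0.27, 0.29, 0.28, 0.28, 0.29],
    10: [0.27, 0.28, 0.27, 0.28, 0.28],
}

# Mean accuracy across folds for each k, then the argmax over k.
mean_acc = {k: np.mean(v) for k, v in k_to_accuracies.items()}
best_k = max(mean_acc, key=mean_acc.get)
print(best_k)  # 5 for these made-up numbers
```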
Many people find the cross-validation part confusing, and the starter code shares some of the blame: the "test" set here is actually a validation set, i.e. still part of the training data. We have 5000 training examples, split into 5 folds as above, so in each round 4 folds are training data and the remaining 1 fold is the validation ("test") set. My approach is blunt and simple: at iteration fi, fold fi becomes the validation set. For example, when fi = 1, out of folds 0, 1, 2, 3, 4 we remove fold 1 and train on folds 0, 2, 3, 4, which I implement by deleting row fi from the stacked array. X_train_folds starts with shape (5, 1000, 3072); after deleting one fold it becomes (4, 1000, 3072). But the classifier expects data shaped (num_examples, 3072), so we keep the last dimension of 3072 and merge the first two, letting reshape's -1 infer the size automatically. The validation data works the same way, except that it is a single fold and so already has one fewer dimension, which is why its handling differs slightly.
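The shape bookkeeping above can be checked on a toy array; the small dimensions below are scaled-down stand-ins for the real (5, 1000, 3072):

```python
import numpy as np

# (num_folds, examples-per-fold, feature-dim) toy folds array.
folds = np.arange(5 * 2 * 3).reshape(5, 2, 3)

fi = 1
train_part = np.delete(folds, fi, axis=0)            # drop fold 1 -> (4, 2, 3)
train_flat = train_part.reshape(-1, folds.shape[2])  # merge first two dims -> (8, 3)
val_part = folds[fi]                                 # fold 1 alone -> (2, 3)

print(train_part.shape, train_flat.shape, val_part.shape)
```

np.delete does not modify folds in place; it returns a copy with fold fi removed, so the full folds array is intact for the next iteration of the loop.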