cs231n Assignment: KNN Cross-Validation Code

I studied other people's solutions online and rewrote this myself. Since array_split returns a Python list, many people recombine the folds later with np.vstack and the like; I found that too cumbersome, so I simply convert the list into an array up front and work with that.
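To see why the conversion works, here is a toy illustration (the sizes are made up, much smaller than the real 5000×3072 data): np.array_split returns a Python list of sub-arrays, and np.array stacks them into a single 3-D array when the folds are equal-sized.

```python
import numpy as np

X = np.arange(20 * 4).reshape(20, 4)   # 20 samples, 4 features
folds = np.array_split(X, 5, axis=0)   # a list of 5 arrays, each (4, 4)
folds_arr = np.array(folds)            # stacked into shape (5, 4, 4)
print(type(folds).__name__, folds_arr.shape)
```

Note this stacking only works because 20 divides evenly by 5; with unequal folds np.array could not build a rectangular array, which is why others fall back on vstack.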

num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]

X_train_folds = []
y_train_folds = []
################################################################################
# TODO:                                                                        #
# Split up the training data into folds. After splitting, X_train_folds and    #
# y_train_folds should each be lists of length num_folds, where                #
# y_train_folds[i] is the label vector for the points in X_train_folds[i].     #
# Hint: Look up the numpy array_split function.                                #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
X_train_folds = np.array(np.array_split(X_train, num_folds, axis=0))
y_train_folds = np.array(np.array_split(y_train, num_folds, axis=0))

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

# A dictionary holding the accuracies for different values of k that we find
# when running cross-validation. After running cross-validation,
# k_to_accuracies[k] should be a list of length num_folds giving the different
# accuracy values that we found when using that value of k.
k_to_accuracies = {}


################################################################################
# TODO:                                                                        #
# Perform k-fold cross validation to find the best value of k. For each        #
# possible value of k, run the k-nearest-neighbor algorithm num_folds times,   #
# where in each case you use all but one of the folds as training data and the #
# last fold as a validation set. Store the accuracies for all fold and all     #
# values of k in the k_to_accuracies dictionary.                               #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
for ki in k_choices:
    k_to_accuracies[ki] = []
    for fi in range(num_folds):
        # Use every fold except fi as training data; fold fi is the validation set.
        x_train_tem = np.delete(X_train_folds, fi, axis=0).reshape(-1, X_train_folds.shape[2])
        # Keep the labels 1-D: predict_labels runs np.bincount on them,
        # which fails on a 2-D (N, 1) array.
        y_train_tem = np.delete(y_train_folds, fi, axis=0).reshape(-1)
        x_test_tem = X_train_folds[fi]
        y_test_tem = y_train_folds[fi]

        classifier = KNearestNeighbor()
        classifier.train(x_train_tem, y_train_tem)
        dists = classifier.compute_distances_no_loops(x_test_tem)
        y_test_pred = classifier.predict_labels(dists, k=ki)
        num_correct = np.sum(y_test_pred == y_test_tem)
        accuracy = float(num_correct) / x_test_tem.shape[0]
        k_to_accuracies[ki].append(accuracy)


# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))
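After the accuracies are printed, the natural next step in the assignment is to pick the k whose mean accuracy across folds is highest. A minimal sketch, using a stand-in `k_to_accuracies` dict with made-up numbers:

```python
import numpy as np

# Stand-in results; in the real notebook this dict is filled by the
# cross-validation loop above.
k_to_accuracies = {1: [0.24, 0.26, 0.25], 10: [0.28, 0.29, 0.27]}

# Choose the k with the best average validation accuracy.
best_k = max(k_to_accuracies, key=lambda k: np.mean(k_to_accuracies[k]))
print(best_k)  # 10 for these made-up numbers
```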

Many people find the cross-validation part confusing, and the starter code shares some of the blame: what it calls "test" here is really a validation set, i.e. a slice of the training data. We have 5000 training examples, split into 5 folds, so on each iteration 4 folds serve as training data and the remaining one as validation ("test") data. My approach is blunt and simple: on iteration fi, the fold with index fi becomes the validation set. For example, when fi = 1, out of folds 0 through 4 we drop fold 1 and train on folds 0, 2, 3, 4. So I just delete row fi from the stacked array: X_train_folds starts at shape (5, 1000, 3072) and becomes (4, 1000, 3072) after the deletion. But the classifier expects data of shape (num_samples, 3072), so we keep the last dimension (3072) and collapse the first two into one, letting reshape's -1 infer their product. The validation data is handled slightly differently because a single fold is already one dimension smaller than the training stack, so it needs no reshape.
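The shape walk-through above can be checked directly. A small sketch with stand-in sizes (10 samples per fold, 8 features instead of 1000×3072, so it runs instantly; the logic is identical):

```python
import numpy as np

num_folds, per_fold, dim = 5, 10, 8
X_train_folds = np.zeros((num_folds, per_fold, dim))          # (5, 10, 8)
y_train_folds = np.zeros((num_folds, per_fold), dtype=int)    # (5, 10)

fi = 1  # fold 1 is held out as validation
x_tr = np.delete(X_train_folds, fi, axis=0).reshape(-1, dim)  # (5,10,8) -> (4,10,8) -> (40, 8)
y_tr = np.delete(y_train_folds, fi, axis=0).reshape(-1)       # (5,10)   -> (4,10)   -> (40,)
x_va = X_train_folds[fi]                                      # already 2-D: (10, 8)
print(x_tr.shape, y_tr.shape, x_va.shape)
```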
