Common Kaggle Operations and Errors

Latest recommended article published on 2024-07-01 15:34:32

carpediemZJ

Article link: https://blog.csdn.net/qq_36478773/article/details/102725011

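This article covers common operations on the Kaggle platform, including how to read files, plot a confusion matrix with sklearn, and wrap an API interface. For sklearn, it highlights points to watch when training a model, such as the data shape that the fit function requires. It also lists a common error, name 'file' is not defined, and gives a fix.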

Summary generated automatically by CSDN.

Common operations

Before reading a file, list the contents of the input folder:

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

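Files uploaded to Kaggle sit directly under the input folder (ignore the subfolder that the interface displays beneath input).
Note: there must be no space between the magic prefix % and the command. Writing % ls is an error; write %ls instead.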
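
To check where the uploaded files actually sit, the folder can also be walked from plain Python instead of calling ls. The following is a minimal sketch, not part of the original post: the helper name list_input_files is hypothetical, and the default root ../input is taken from the listing command above.

```python
import os


def list_input_files(root="../input"):
    """Return every file path found under root, sorted alphabetically."""
    paths = []
    # os.walk visits root and all of its subfolders; a missing root yields nothing.
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirname, filename))
    return sorted(paths)


# Print the full path of each uploaded file.
for path in list_input_files():
    print(path)
```

Printing full paths shows whether a file sits directly under input or inside a subfolder, so the path passed to a loader can be copied as-is.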

A Kaggle notebook starts in kaggle/working by default.

The input folder is therefore one level up, and the file can be read directly with xArr, yArr = loadDataSet('../input/ex0.txt').
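
The post does not show loadDataSet itself, so the following is only a sketch of what such a loader might look like. It assumes (this is not confirmed by the source) that ex0.txt is a tab-separated file of numbers whose last column is the target value. It also prints the current directory so the relative path can be checked.

```python
import os


def loadDataSet(file_name):
    """Parse a tab-separated numeric file (assumed format).

    Every column except the last is treated as a feature and the
    last column as the target value.
    """
    xArr, yArr = [], []
    with open(file_name) as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            values = [float(v) for v in line.split("\t")]
            xArr.append(values[:-1])
            yArr.append(values[-1])
    return xArr, yArr


# Show the starting directory; the relative path below is resolved against it.
print(os.getcwd())

data_path = "../input/ex0.txt"
if os.path.exists(data_path):  # only load when the file is really there
    xArr, yArr = loadDataSet(data_path)
    print(len(xArr), "rows loaded")
```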

Using the sklearn machine learning library

Plotting a confusion matrix

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
import numpy as np

def cm_plot(original_label, predict_label, pic=None):
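    cm = confusion_matrix(original_label, predict_label)   # build the n*n confusion matrix directly
    plt.matshow(cm, cmap=plt.cm.YlOrRd)     # draw the matrix with the YlOrRd colormap; matshow opens its own figure, so no plt.figure() is needed
    plt.colorbar()    # add the color scale bar
    for x in range(len(cm)):          # x indexes rows (true labels)
        for y in range(len(cm)):      # y indexes columns (predicted labels)
            plt.annotate(cm[x, y], xy=(y, x)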