While using a support vector machine (SVM) to classify the iris dataset,
my code raised the following KeyError:
KeyError: "None of [Int64Index([0, 1, 2, 3], dtype='int64')] are in the [columns]"
The original code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
import numpy as np
data = pd.read_csv(r"G:\实验6/iris.csv")
x, y = data[range(4)], data[4]
y = pd.Categorical(y).codes
x = x[[0,1,2,3]]
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1, train_size=0.6)
clf=svm.SVC(C=0.4,kernel='rbf',gamma=20,decision_function_shape='ovr')
clf.fit(x_train, y_train.ravel())
print('Training set accuracy:', accuracy_score(y_train, clf.predict(x_train)))
print('Test set accuracy:', accuracy_score(y_test, clf.predict(x_test)))
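At first I read a lot of blog posts
and assumed the cause was a data type problem or a pandas version issue. I followed their suggested changes, but the error persisted.
After more digging, it turned out to be neither. Looking at the iris data file itself, I found that
it has no header row. By default, pd.read_csv treats the first line of the file as column names, so the columns are not labeled 0 to 4 and data[range(4)] cannot find them. For a detailed explanation, see: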
https://www.cnblogs.com/komean/p/10629311.html
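To see the difference concretely, here is a minimal sketch that reads the same headerless text twice, once with the default settings and once with header=None. The two-row SAMPLE string is made up for illustration and only stands in for iris.csv; the variable names are my own.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the usual iris layout (no header row).
SAMPLE = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"

# Default settings: the first line is consumed as the column names.
with_default = pd.read_csv(io.StringIO(SAMPLE))

# header=None: every line is data, and columns get integer labels 0..4.
with_header_none = pd.read_csv(io.StringIO(SAMPLE), header=None)

print(list(with_default.columns))      # string labels taken from the first row
print(list(with_header_none.columns))  # integer labels
print(len(with_default), len(with_header_none))  # one data row is lost by default

# Selecting integer labels fails when the labels are actually strings.
raised_key_error = False
try:
    with_default[[0, 1, 2, 3]]
except KeyError as err:
    raised_key_error = True
    print('KeyError:', err)

# The same selection works once the columns really are 0..4.
features = with_header_none[[0, 1, 2, 3]]
print(features.shape)
```

Printing data.columns right after loading is a quick way to check which of the two cases you are in before changing anything else.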
Passing header=None to pd.read_csv fixes the error.
The revised code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
import numpy as np
data = pd.read_csv(r"G:\实验6/iris.csv",header=None)
x, y = data[range(4)], data[4]
y = pd.Categorical(y).codes
x = x[[0,1,2,3]]
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1, train_size=0.6)
clf=svm.SVC(C=0.4,kernel='rbf',gamma=20,decision_function_shape='ovr')
clf.fit(x_train, y_train.ravel())
print('Training set accuracy:', accuracy_score(y_train, clf.predict(x_train)))
print('Test set accuracy:', accuracy_score(y_test, clf.predict(x_test)))
The error is gone and the script runs successfully.
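As a possible variation, the features and label can be selected by position with iloc instead of by column label, which avoids depending on what the labels happen to be. This is only a sketch: the three-row SAMPLE string is invented for illustration and stands in for iris.csv. Note that header=None is still needed here, otherwise the first data row would still be consumed as column names.

```python
import io
import pandas as pd

# Hypothetical headerless sample in the iris layout; stands in for iris.csv.
SAMPLE = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

data = pd.read_csv(io.StringIO(SAMPLE), header=None)

# Positional selection: the first four columns are features, the fifth is the label.
x = data.iloc[:, :4]
y = pd.Categorical(data.iloc[:, 4]).codes  # encode class names as integer codes

print(x.shape)
print(list(y))
```

The resulting x and y can be passed to train_test_split in the same way as in the revised code.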