Python pandas apply function raises KeyError: None of [['xxx', 'yyy','zzz']] are in the [index]

荒野雄兵

Article link: https://blog.csdn.net/daerzei/article/details/84993865

Reproducing the problem

While using logistic regression to classify the Iris dataset, I ran into the following error:

Traceback (most recent call last):
  File "/home/dong/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-e2c3655ccb47>", line 1, in <module>
    runfile('/home/dong/opt/workspace/github/huanLing/src/main/com/dong/ai/learn/classification/Iris_learn.py', wdir='/home/dong/opt/workspace/github/huanLing/src/main/com/dong/ai/learn/classification')
  File "/opt/tools/pycharm-2018.3/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/opt/tools/pycharm-2018.3/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/dong/opt/workspace/github/huanLing/src/main/com/dong/ai/learn/classification/Iris_learn.py", line 60, in <module>
    X = datas[names[0:-1]]
  File "/home/dong/.local/lib/python3.6/site-packages/pandas/core/series.py", line 810, in __getitem__
    return self._get_with(key)
  File "/home/dong/.local/lib/python3.6/site-packages/pandas/core/series.py", line 851, in _get_with
    return self.loc[key]
  File "/home/dong/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1478, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/dong/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1901, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/home/dong/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1143, in _getitem_iterable
    self._validate_read_indexer(key, indexer, axis)
  File "/home/dong/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1206, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)))
KeyError: "None of [['sepal length', 'sepal width', 'petal length', 'petal width']] are in the [index]"

The relevant part of the code:

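import warnings

import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, label_binarize

# Suppress convergence warnings
warnings.filterwarnings(action='ignore', category=ConvergenceWarning)

# Load the data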
path = "datas/iris.data"
names = ['sepal length', 'sepal width', 'petal length', 'petal width', 'cla']

df = pd.read_csv(path, header=None, names=names)
df['cla'].value_counts()
df.head()


def parseRecord(record):
    result=[]
    r = zip(names,record)
    for name,v in r:
        if name == 'cla':
            if v == 'Iris-setosa':
                result.append(1)
            elif v == 'Iris-versicolor':
                result.append(2)
            elif v == 'Iris-virginica':
                result.append(3)
            else:
                result.append(np.nan)
        else:
            result.append(float(v))
    return result


# 1. Convert the data to numeric values and split it
# Data conversion
datas = df.apply(lambda r: parseRecord(r), axis=1)
# Drop invalid records
datas = datas.dropna(how='any')
# Split into features and labels
X = datas[names[0:-1]]
Y = datas[names[-1]]

## Data sampling (split into training and test sets)
X_train,X_test,Y_train,Y_test = train_test_split(X, Y, test_size=0.4, random_state=0)
print("Total samples: %d; training samples: %d; features: %d; test samples: %d" % (len(X), len(X_train), X_train.shape[1], X_test.shape[0]))

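# 2. Standardize the data
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

# 3. Feature selection (not performed here)

# 4. Dimensionality reduction (not performed here)

# 5. Build the model
lr = LogisticRegressionCV(Cs=np.logspace(-4,1,50), cv=3,fit_intercept=True, penalty='l2', solver='lbfgs', tol=0.01, multi_class='multinomial')
# solver options: 'newton-cg', 'lbfgs', 'liblinear', 'sag' (LogisticRegressionCV defaults to 'lbfgs')
# 'sag': stochastic average gradient
# multi_class='multinomial' fits one softmax model across the three classes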
lr.fit(X_train, Y_train.astype(int))

y_test_hot = label_binarize(Y_test.astype(int), classes=(1, 2, 3))
print(y_test_hot)

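# Get the decision scores for the test set
lr_y_score = lr.decision_function(X_test)

# Compute the ROC curve
lr_fpr, lr_tpr, lr_threasholds = metrics.roc_curve(y_test_hot.ravel(), lr_y_score.ravel())

# lr_threasholds holds the decision thresholds
# Compute the AUC
lr_auc = metrics.auc(lr_fpr, lr_tpr)
print("Logistic regression accuracy on the training set: ", lr.score(X_train, Y_train.astype(int)))
print("Logistic regression AUC: ", lr_auc)

# 7. Model prediction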
print(lr_y_score)
lr_y_predict = lr.predict(X_test)
print(lr.predict_proba(X_test))

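Analysis

According to the traceback, the line X = datas[names[0:-1]] is at fault. But I searched for a long time, and the Iris models other people had written used exactly the same code without any problem.
The final message, KeyError: "None of [['sepal length', 'sepal width', 'petal length', 'petal width']] are in the [index]", roughly means that the requested labels (sepal length, sepal width, and so on) are not in the index. Note that the traceback passes through pandas/core/series.py, so datas is a Series here rather than a DataFrame, and the lookup runs against its row index. At that moment I wanted to scream; the message is hardly self-explanatory.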
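The situation can be reproduced in a few lines. The following is a minimal sketch, assuming pandas 0.23 or later; the two-column frame and the helper to_list are made up for illustration and are not part of the original script. When the applied function returns a plain list, apply(axis=1) hands back a Series of lists, so a lookup by column name fails:

```python
import pandas as pd

# Hypothetical toy frame standing in for the Iris data.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})


def to_list(row):
    # Returns a plain Python list, as parseRecord does in the article.
    return [float(v) for v in row]


datas = df.apply(to_list, axis=1)
print(type(datas).__name__)  # a Series whose elements are lists

try:
    datas[["a", "b"]]  # looks for the labels "a" and "b" in the Series index
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

The labels are searched in the Series index (the row numbers), which is why the message complains about the [index] rather than the columns.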
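Then an expert pointed out that the Iris dataset is old, so the example code is probably old too, and the pandas API it was written against may have changed since; I should check the latest API documentation. Experts will be experts. Just telling me what was wrong would have saved a lot of trouble, but then I wouldn't have learned much.
So, off to the API documentation.
The link:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html
Click through if you want the original; it is a lot of English.
The gist:
DataFrame.apply takes these parameters: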

func
axis=0
broadcast
raw
reduce
result_type
args
**kwds
Now for the descriptions of a few of these parameters.
The func parameter

func : function

Function to apply to each column or row.

This means you pass a function to apply as its argument, and that function is applied to each column or each row.

The axis parameter

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

Axis along which the function is applied:

0 or ‘index’: apply function to each column.
1 or ‘columns’: apply function to each row.

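This means axis specifies whether func is applied to the columns or to the rows. You can pass 0 or 1, or equivalently 'index' or 'columns'. 0 ('index') applies func to each column, and 1 ('columns') applies it to each row. The default is 0.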
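A small example makes the direction easier to remember. This is only an illustrative sketch with a made-up frame (the names col_sums and row_sums are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (or "index"): func receives each column in turn.
col_sums = df.apply(lambda s: s.sum(), axis=0)

# axis=1 (or "columns"): func receives each row in turn.
row_sums = df.apply(lambda s: s.sum(), axis=1)
row_sums_named = df.apply(lambda s: s.sum(), axis="columns")

print(col_sums.to_dict())  # one entry per column
print(row_sums.tolist())   # one entry per row
```

With axis=0 the result is indexed by the column names; with axis=1 it is indexed by the row labels.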

The broadcast parameter

Only relevant for aggregation functions:

False or None : 
	returns a Series whose length is the length of the index or the number of columns (based on the axis parameter)
	
True : 
	results will be broadcast to the original shape of the frame, the original index and columns will be retained.
	
Deprecated since version 0.23.0: This argument will be removed in a future version, replaced by result_type=’broadcast’.

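This means that if the parameter is False or None, apply returns a Series whose length is the number of columns (axis=0) or the number of rows (axis=1), depending on the axis parameter.

If the parameter is True, the results are broadcast back to the original shape of the frame, and the original index and columns are retained.

However, this parameter is deprecated as of pandas 0.23 and is replaced by result_type='broadcast'.

Reading this description gave me an idea: maybe the result of func was not being broadcast back to each field of the row?

I added the parameter and tried again. Sure enough, the problem was solved.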
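The difference is visible on a tiny frame. This is a sketch under the assumption of pandas 0.23 or later; the frame and the helper doubled are invented for the example:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})


def doubled(row):
    # Returns a list with one value per original column.
    return [v * 2 for v in row]


# Default result_type=None: a list result becomes one Series element per row.
as_series = df.apply(doubled, axis=1)

# result_type="broadcast": the lists are written back into the original shape.
as_frame = df.apply(doubled, axis=1, result_type="broadcast")

print(type(as_series).__name__, type(as_frame).__name__)
print(list(as_frame.columns))
```

Because the broadcast result keeps the original column labels, selecting columns by name works again.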

Solution

Option 1:
Add the parameter broadcast=True (deprecated since pandas 0.23, as noted above)

datas = df.apply(func, axis=1, broadcast=True)

Option 2:
Add the parameter result_type='broadcast'

datas = df.apply(func, axis=1, result_type='broadcast')
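To check the second option end to end without the real data file, here is a hedged sketch on a hypothetical three-row sample laid out like iris.data. The sample values, the labels dictionary, and the name parse_record are assumptions for illustration; parse_record follows the same idea as parseRecord above. The explicit astype calls are there because the broadcast result of a mixed-type frame may come back with object dtype:

```python
import numpy as np
import pandas as pd

names = ['sepal length', 'sepal width', 'petal length', 'petal width', 'cla']

# Hypothetical sample in the layout of iris.data (not read from the real file).
df = pd.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
        [7.0, 3.2, 4.7, 1.4, 'Iris-versicolor'],
        [6.3, 3.3, 6.0, 2.5, 'Iris-virginica'],
    ],
    columns=names,
)

labels = {'Iris-setosa': 1, 'Iris-versicolor': 2, 'Iris-virginica': 3}


def parse_record(record):
    # Floats for the features, an integer code for the class (NaN if unknown).
    return [labels.get(v, np.nan) if name == 'cla' else float(v)
            for name, v in zip(names, record)]


datas = df.apply(parse_record, axis=1, result_type='broadcast')
datas = datas.dropna(how='any')

# Column selection by name now works because datas is a DataFrame.
X = datas[names[0:-1]].astype(float)
Y = datas[names[-1]].astype(int)
print(X.shape, Y.tolist())
```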
