Python_pandas Is Not a Panda (1): Several Ways to Index a DataFrame

Author: 氵文大师

Last modified 2024-05-30 14:13:57

Column: Pandas Is Not a Panda (pandas不是熊猫); tag: python

First published 2020-03-27 23:20:14

Original link: https://blog.csdn.net/HaoZiHuang/article/details/105147013

0. Preface

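pandas is not a 🐼 panda! I only learned today, to my embarrassment, that Python's pandas library is not called pandas because its author likes pandas: the name is built from three words. Which three? I'll leave that as an easter egg for you to look up 🐶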

In this post you will see several ways to index a pandas DataFrame.

This is the DataFrame used in the examples:

import pandas as pd
import numpy as np

example_array = np.random.uniform(size=(5,5))
columns = ['张飞', '关羽', '赵云', '黄忠', '马超']
index = ['血量', '智力', '敏捷', '攻击', '防御']
example_df = pd.DataFrame(example_array, columns=columns, index=index)

(The data are randomly generated, so your numbers will differ from the ones shown below.)

1. Direct indexing

First, note that a DataFrame cannot be indexed with bare integer positions the way an ndarray can:

>>> example_array[3, 4]
0.5356935275536827
>>> example_df[3, 4]
KeyError: (3, 4)
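Why the KeyError? `DataFrame[...]` looks the whole key up among the column labels, so `(3, 4)` is treated as a single column name that does not exist. The sketch below checks this, assuming a deterministic `np.arange` array in place of the random data so the numbers are reproducible (`arr` and `df` are illustrative names, not from the original):

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data: arr[r, c] == 5 * r + c
arr = np.arange(25, dtype=float).reshape(5, 5)
columns = ['张飞', '关羽', '赵云', '黄忠', '马超']
index = ['血量', '智力', '敏捷', '攻击', '防御']
df = pd.DataFrame(arr, columns=columns, index=index)

numpy_value = arr[3, 4]  # NumPy: row position 3, column position 4

# DataFrame.__getitem__ looks the tuple (3, 4) up as ONE column label;
# no column has that label, so pandas raises KeyError.
try:
    df[3, 4]
    raised = False
except KeyError:
    raised = True

print(numpy_value, raised)  # 19.0 True
```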

Let's start with an example:

>>> example_df['张飞']['血量']
0.8552836040307202
>>> # This is direct indexing: first get one column as a Series, then look up a row label in it

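Note that direct indexing must go column first, then row. The reason is simple: direct indexing works by first fetching one column as a Series, then looking up a row label in that Series' index to get the value.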

This is equivalent to:

>>> series = example_df['张飞']
>>> series['血量']
0.8552836040307202
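>>> # You can also get a column with attribute syntax, DataFrame.column_name. It is less
>>> # elegant and only works when the name is a valid Python identifier that does not
>>> # clash with an existing DataFrame attribute or method, e.g.:
>>> series = example_df.张飞      # the 张飞 column as a Series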
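The column-then-row rule can be checked directly. This is a sketch, assuming deterministic `np.arange` data instead of random values (`df` is an illustrative name): it shows that `df['张飞']` is a Series indexed by the row labels, that the three spellings from the text read the same cell, and that putting the row label first fails because `'血量'` is not a column:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data: value at (row r, column c) is 5 * r + c
df = pd.DataFrame(
    np.arange(25, dtype=float).reshape(5, 5),
    columns=['张飞', '关羽', '赵云', '黄忠', '马超'],
    index=['血量', '智力', '敏捷', '攻击', '防御'],
)

column = df['张飞']                 # step 1: one column, as a Series
print(type(column).__name__)       # Series
print(list(column.index))          # its index holds the row labels

a = df['张飞']['血量']               # chained: column first, then row
b = column.loc['血量']               # the same lookup, label-based on the Series
c = df.张飞['血量']                  # attribute access to the same column
print(a, b, c)                      # 0.0 0.0 0.0

# Row label first fails: '血量' is a row label, not a column label.
try:
    df['血量']['张飞']
    row_first_works = True
except KeyError:
    row_first_works = False
print(row_first_works)              # False
```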

2. Label-based indexing

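Direct indexing, as shown above, only works column first, then row.
What if we want row first, then column? For that we need the .loc attribute: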

>>> example_df.loc['敏捷']['张飞'] # labels are passed row first, then column
0.016355149069134045

As you may have guessed, this works as follows:

>>> row = example_df.loc['敏捷'] # dataframe.loc['xx'] also returns a Series
>>> row['张飞']
0.016355149069134045
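A quick check of that explanation, as a sketch with deterministic `np.arange` data standing in for the random values (`df` is an illustrative name): `df.loc['敏捷']` is a Series indexed by the column labels and named after the row, and it reaches the same cell as direct indexing with the order reversed:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data: value at (row r, column c) is 5 * r + c
df = pd.DataFrame(
    np.arange(25, dtype=float).reshape(5, 5),
    columns=['张飞', '关羽', '赵云', '黄忠', '马超'],
    index=['血量', '智力', '敏捷', '攻击', '防御'],
)

row = df.loc['敏捷']               # one row, as a Series
print(list(row.index))             # indexed by the column labels
print(row.name)                    # named after the row label: 敏捷

via_loc = row['张飞']               # row first, then column
via_direct = df['张飞']['敏捷']      # column first, then row
print(via_loc, via_direct)         # 10.0 10.0
```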

One special feature is that you can separate the row and column labels with a comma instead of chaining two lookups:

>>> example_df.loc['敏捷']['张飞']
>>> example_df.loc['敏捷', '张飞']  # both return the same value
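A sketch comparing the two forms, assuming deterministic `np.arange` data instead of random values (`df` and `sub` are illustrative names). The comma form does the lookup in one step, and it also accepts lists of labels on either axis, in which case it returns a sub-DataFrame rather than a scalar:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data: value at (row r, column c) is 5 * r + c
df = pd.DataFrame(
    np.arange(25, dtype=float).reshape(5, 5),
    columns=['张飞', '关羽', '赵云', '黄忠', '马超'],
    index=['血量', '智力', '敏捷', '攻击', '防御'],
)

chained = df.loc['敏捷']['张飞']    # two lookups: row Series, then column label
single = df.loc['敏捷', '张飞']     # one lookup with a (row, column) pair
print(chained, single)             # 10.0 10.0

# Lists of labels on either axis return a sub-DataFrame instead of a scalar.
sub = df.loc[['敏捷', '攻击'], ['张飞', '关羽']]
print(sub.shape)                   # (2, 2)
print(sub.to_numpy().tolist())     # [[10.0, 11.0], [15.0, 16.0]]
```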

3. Positional (integer) indexing

What if I really want to access elements by position, the way NumPy does? For that we can use the .iloc attribute:

>>> example_df.iloc[1][1] # value in the second row, second column
0.09624632299283986

As you may have guessed, just like NumPy, you can also access the element this way:

>>> example_df.iloc[1, 1] # value in the second row, second column
0.09624632299283986

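Note that the order here is row first, then column, the same as NumPy, so there is nothing extra to memorize. Prefer the comma form iloc[1, 1]: in iloc[1][1] the second [1] is applied to a Series with string labels and falls back to positional lookup, which newer pandas versions deprecate (it emits a FutureWarning).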
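A sketch of positional access, assuming deterministic `np.arange` data so the DataFrame can be compared with the underlying array (`arr` and `df` are illustrative names). If you do want a chained form, `.iloc[1].iloc[1]` keeps both steps positional and so avoids the Series positional fallback; positional slices also behave as in NumPy, with the end excluded:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data: arr[r, c] == 5 * r + c
arr = np.arange(25, dtype=float).reshape(5, 5)
df = pd.DataFrame(
    arr,
    columns=['张飞', '关羽', '赵云', '黄忠', '马超'],
    index=['血量', '智力', '敏捷', '攻击', '防御'],
)

by_pair = df.iloc[1, 1]              # (row position, column position), like arr[1, 1]
by_chain = df.iloc[1].iloc[1]        # chained, but positional at both steps
print(by_pair, by_chain, arr[1, 1])  # 6.0 6.0 6.0

# Positional slices work as in NumPy: start included, end excluded.
block = df.iloc[1:3, 0:2]
same_as_numpy = np.array_equal(block.to_numpy(), arr[1:3, 0:2])
print(same_as_numpy)                 # True
```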

4. Mixed indexing

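This kind of indexing only exists in older versions and has been removed from the version I'm using (.ix was deprecated in pandas 0.20.0 and removed in 1.0.0):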

>>> pd.__version__
'1.0.3'

Briefly: mixed indexing used dataframe.ix
(it now raises AttributeError: 'DataFrame' object has no attribute 'ix')

It let you index by label and by position in the same expression:

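example_df.ix[2:4, ['张飞', '关羽']]
# selects the rows at positions 2 and 3 (0-based, end excluded) and the columns 张飞 and 关羽
# no longer available (removed in pandas 1.0)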

But what if we still want this kind of mixed indexing?
We can use dataframe.loc or dataframe.iloc instead:

>>> example_df.loc[ example_df.index[2:4], ['张飞', '关羽'] ]
>>> example_df.iloc[ 2:4, example_df.columns.get_indexer(['张飞', '关羽']) ]

Try it yourself: both forms give the same result.
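A sketch that tries exactly that, assuming deterministic `np.arange` data instead of random values (`df`, `via_loc` and `via_iloc` are illustrative names). `DataFrame.equals` compares shape, labels and values, so `True` means the two replacements for `.ix` select the same block:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data: value at (row r, column c) is 5 * r + c
df = pd.DataFrame(
    np.arange(25, dtype=float).reshape(5, 5),
    columns=['张飞', '关羽', '赵云', '黄忠', '马超'],
    index=['血量', '智力', '敏捷', '攻击', '防御'],
)

# Replacement for the removed df.ix[2:4, ['张飞', '关羽']]
rows = df.index[2:4]                            # row labels at positions 2 and 3
cols = df.columns.get_indexer(['张飞', '关羽'])   # column positions of the two labels

via_loc = df.loc[rows, ['张飞', '关羽']]          # everything expressed as labels
via_iloc = df.iloc[2:4, cols]                    # everything expressed as positions

print(list(rows), cols.tolist())   # ['敏捷', '攻击'] [0, 1]
print(via_loc.equals(via_iloc))    # True
```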

5. `DataFrame.index` and `DataFrame.columns`

DataFrame.index and DataFrame.columns return the row labels and the column labels (the "name lists"), respectively:

>>> example_df.index
Index(['血量', '智力', '敏捷', '攻击', '防御'], dtype='object')
>>> example_df.columns
Index(['张飞', '关羽', '赵云', '黄忠', '马超'], dtype='object')

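Strictly speaking, neither is a plain list: both are instances of pandas.core.indexes.base.Index (available publicly as pd.Index), a special list-like class.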
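A sketch of working with these Index objects, assuming deterministic `np.arange` data (the values do not matter here; `df`, `idx`, `cols` and `labels` are illustrative names): they support list-like positional access and slicing, `in` tests against the labels, and `.tolist()` converts them to a plain Python list:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(25, dtype=float).reshape(5, 5),
    columns=['张飞', '关羽', '赵云', '黄忠', '马超'],
    index=['血量', '智力', '敏捷', '攻击', '防御'],
)

idx, cols = df.index, df.columns
print(isinstance(idx, pd.Index), isinstance(cols, pd.Index))  # True True
print(cols[0], cols[1:3].tolist())    # 张飞 ['关羽', '赵云']
print('赵云' in cols, '血量' in cols)   # True False: 'in' checks the labels

labels = cols.tolist()                # a plain Python list of the column labels
print(type(labels).__name__, labels)
```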

example_df.columns.get_indexer(['张飞', '关羽']) finds the integer positions of ['张飞', '关羽'] among the column labels:

>>> example_df.columns.get_indexer(['张飞', '关羽'])
array([0, 1])
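One detail worth knowing: for a label that is not present, `get_indexer` returns `-1` instead of raising. Because `-1` is also a valid position for `.iloc` (the last column), it is worth checking for it before passing the result on. A sketch, where `'吕布'` is a hypothetical missing label and `df` uses deterministic `np.arange` data; `Index.get_loc` is shown for looking up a single label:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(25, dtype=float).reshape(5, 5),
    columns=['张飞', '关羽', '赵云', '黄忠', '马超'],
    index=['血量', '智力', '敏捷', '攻击', '防御'],
)

wanted = ['张飞', '关羽', '吕布']            # '吕布' is hypothetical: NOT a column
positions = df.columns.get_indexer(wanted)
print(positions.tolist())                  # [0, 1, -1]: -1 marks the missing label

# Guard before handing positions to .iloc, where -1 would mean "last column".
missing = [label for label, pos in zip(wanted, positions) if pos == -1]
print(missing)                             # ['吕布']

print(df.columns.get_loc('赵云'))           # 2: position of a single label
```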

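To sum up:

Direct indexing: dataframe['column_name']['row_name']
Label-based indexing: dataframe.loc['row_name']['column_name'] or dataframe.loc['row_name', 'column_name']
Positional indexing: dataframe.iloc[row_pos, col_pos] or dataframe.iloc[row_pos][col_pos] (newer pandas versions deprecate the chained form, so prefer the comma form)
Mixed indexing (.ix) has been removed; use the equivalent dataframe.loc or dataframe.iloc forms instead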
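As a compact recap, this sketch reads the same cell (row 敏捷, column 张飞) with each form from the summary, assuming deterministic `np.arange` data (`df` and `ways` are illustrative names). For the chained positional case it uses `.iloc[2].iloc[0]`, which stays positional at both steps:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data: value at (row r, column c) is 5 * r + c
df = pd.DataFrame(
    np.arange(25, dtype=float).reshape(5, 5),
    columns=['张飞', '关羽', '赵云', '黄忠', '马超'],
    index=['血量', '智力', '敏捷', '攻击', '防御'],
)

# Row '敏捷' is position 2 and column '张飞' is position 0.
ways = {
    'direct': df['张飞']['敏捷'],           # column first, then row
    'loc chained': df.loc['敏捷']['张飞'],  # row first, then column
    'loc pair': df.loc['敏捷', '张飞'],     # one label-based lookup
    'iloc pair': df.iloc[2, 0],            # one positional lookup
    'iloc chained': df.iloc[2].iloc[0],    # positional at both steps
}
for name, value in ways.items():
    print(f'{name:>12}: {value}')          # every line ends in 10.0
```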
