How to add a new index to a DataFrame: the often-overlooked index settings in pandas

weixin_30273263

Article link: https://blog.csdn.net/weixin_30273263/article/details/112248795

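This post discusses why setting an index matters in a pandas DataFrame, covering the automatically assigned index, creating multi-level indexes, and the parameters of DataFrame.set_index: drop, append, inplace and verify_integrity. Examples show how these parameters modify or extend the index.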

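An index is generally defined as a label that uniquely identifies a row of data.

This differs slightly from the concept as it is understood in databases or Excel. In Excel the notion of an index is not prominent; more often we create something index-like ourselves, such as 1, 2, 3, to mark each record. A database, on the other hand, will often assign a row number for you automatically. In pandas the index plays a much larger role, so let's take a closer look.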

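Where an index is needed

pandas always assigns an index automatically, whether you want one or not
A multi-level index can be set
Selecting data with loc relies on the index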
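The first and third points can be illustrated with a minimal sketch. The column names and values below are illustrative and simply mirror the sample data used later in this post.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10], 'sale': [55, 40, 84, 31]})

# With no index specified, pandas assigns a default RangeIndex (0, 1, 2, ...).
default_index = df.index

# .loc selects rows by index label; after set_index the labels are the month values.
by_month = df.set_index('month')
sale_in_july = by_month.loc[7, 'sale']
```

Here `by_month.loc[7, 'sale']` looks up the row labelled 7, not the row at position 7, which is why the choice of index matters for selection.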

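Why set an index

After creating data or importing it from an external source, you need to set the index if the existing one is unsuitable or if you need a multi-level index.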

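Key points:

There are three ways to set an index: pass a single column name to use that column as the index, append a column to the existing index to form a multi-level index, or pass a list of columns to create a multi-level index directly
To modify the original data, set inplace=True
If you are worried about duplicates, set verify_integrity=True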

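What this article covers (the content is fairly simple):

Parameter details
The different ways of setting an index
Examples for each parameter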

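pandas.DataFrame.set_index: parameter details

DataFrame.set_index(self, keys, drop=True, append=False, inplace=False, verify_integrity=False)

Set the DataFrame index using existing columns.

Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The index can replace the existing index or expand on it.

The above is the official description. In practice, the main use is to turn existing columns into the index or to extend the index with them. Extending the index produces a multi-level index, and so does passing a list.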

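keys : label or array-like or list of labels/arrays

This parameter can be a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, "array" covers Series, Index, np.ndarray, and instances of Iterator.

drop : bool, default True

Delete the columns that are used as the new index.

append : bool, default False

Whether to append the columns to the existing index.

inplace : bool, default False

If True, modify the DataFrame in place; if False (the default), return a new object.

verify_integrity : bool, default False

Check the new index for duplicates. Otherwise the check is deferred until necessary. Setting this to False improves the performance of this method.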

Examples:

Create the data:

import pandas as pd
df = pd.DataFrame({'month': [1, 4, 7, 10], 'year': [2012, 2014, 2013, 2014], 'sale': [55, 40, 84, 31]})
df

Replace the existing index:

df.set_index('month')

Append a column to the existing index to create a multi-level index:

df.set_index('month', append=True)
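To make the effect of append=True concrete, the sketch below inspects the resulting index. The variable names (`appended`, `levels`, `names`, `first_label`) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

appended = df.set_index('month', append=True)

# The original default index becomes level 0 (unnamed) and 'month' becomes level 1.
levels = appended.index.nlevels
names = list(appended.index.names)

# Each row label is now a tuple of (old index value, month value).
first_label = appended.index[0]
```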

Pass a list to create a multi-level index:

df.set_index(['year', 'month'])
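With a multi-level index, a row is addressed by a tuple of labels, one per level. The following is a small sketch of such a lookup on the same sample data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

multi = df.set_index(['year', 'month'])

# Level order follows the order of the list passed to set_index.
level_names = list(multi.index.names)

# A full (year, month) tuple identifies exactly one row here.
sale_2014_oct = multi.loc[(2014, 10), 'sale']
```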

A second way to create a multi-level index, combining an Index object with a column name:

df.set_index([pd.Index([1, 2, 3, 4]), 'year'])
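Since keys also accepts arrays and Series, the same pattern works with other array-like objects. This sketch assumes made-up labels ('a' to 'd') and a hypothetical Series named `quarter`; neither appears in the original data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# A plain array of the same length as df becomes the index; no column is removed.
labels = np.array(['a', 'b', 'c', 'd'])
by_array = df.set_index(labels)

# A named Series combined with a column name gives a two-level index.
quarter = pd.Series(['q1', 'q2', 'q3', 'q4'], name='quarter')
mixed = df.set_index([quarter, 'year'])
```

Note that only keys given as column names are dropped from the columns; arrays passed directly were never columns to begin with.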

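drop: True is the default, so this call is equivalent to df.set_index('month'); the month column is removed from the columns once it becomes the index.

df.set_index('month', drop=True)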
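For comparison, here is a short sketch of drop=False, which keeps the column in place as well as using it for the index. The variable names `kept` and `dropped` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# drop=False: 'month' is used as the index and also stays as a regular column.
kept = df.set_index('month', drop=False)

# drop=True (the default): 'month' moves into the index and leaves the columns.
dropped = df.set_index('month')
```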

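inplace: if True, the original data is replaced.

Default False: df itself is left unchanged and a new DataFrame is returned.

Set to True:

df.set_index('month', inplace=True)

After this call month is the index of df and no longer a column, so re-create df before running the examples that follow.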
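The difference between the two modes can be checked with a small sketch. A copy (`in_place`, an illustrative name) is used so the original df stays available for comparison.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# inplace=False (default): a new DataFrame is returned and df keeps its old index.
returned = df.set_index('month')
original_index_name = df.index.name

# inplace=True: the object itself is modified and the call returns None.
in_place = df.copy()
result = in_place.set_index('month', inplace=True)
```

Because the in-place call returns None, writing `df = df.set_index('month', inplace=True)` would overwrite df with None; use one style or the other, not both.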

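verify_integrity: checks whether the new index contains duplicates. An index should generally not contain duplicates, so set this if you are concerned.

df.set_index('month', verify_integrity=True)

Nothing changes compared with the plain call, because month contains no duplicate values.

Modify the source data so that month contains a duplicate:

df.loc[0, "month"] = 4
df

df.set_index('month', verify_integrity=True)

This call now raises a ValueError, because the value 4 appears twice in month.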
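The duplicate check can be observed by catching the exception. This is a minimal sketch; the variable names are illustrative, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# Introduce a duplicate: month now contains the value 4 twice.
df.loc[0, 'month'] = 4

try:
    df.set_index('month', verify_integrity=True)
    error_message = None
except ValueError as exc:
    # pandas refuses to build an index with duplicate keys when asked to verify.
    error_message = str(exc)

# Without the check, the same call succeeds and yields a non-unique index.
without_check = df.set_index('month')
has_unique_index = without_check.index.is_unique
```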

Official documentation:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html#pandas-dataframe-set-index

Original English text from the official documentation:

DataFrame.set_index(self, keys, drop=True, append=False, inplace=False, verify_integrity=False)

Set the DataFrame index using existing columns.

Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The index can replace the existing index or expand on it.

Parameters:

keys : label or array-like or list of labels/arrays

This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, “array” encompasses Series, Index, np.ndarray, and instances of Iterator.

drop : bool, default True

Delete columns to be used as the new index.

append : bool, default False

Whether to append columns to existing index.

inplace : bool, default False

Modify the DataFrame in place (do not create a new object).

verify_integrity : bool, default False

Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method.
