The Index Object and Basic Index Operations

芒果去核

Category: Pandas Notes. Tags: python, pandas

Article link: https://blog.csdn.net/weixin_45499440/article/details/120982536

pandas Notes 004

IV. The Index Object and Basic Index Operations

import pandas as pd
import numpy as np

1. The Index Object

1.1 Series and DataFrame

The indexes of both a Series and a DataFrame are Index objects.

Series:

pd1 = pd.Series(range(5),index = ['A','B','C','D','E'])  # build a Series and pass the index labels as a list
print(pd1)
print("="*20)
print(type(pd1.index))   # the index of the Series is an Index object

A    0
B    1
C    2
D    3
E    4
dtype: int64
====================
<class 'pandas.core.indexes.base.Index'>

DataFrame:

pd2 = pd.DataFrame(np.arange(9).reshape(3,3),index=['A','B','C'],columns=['M','N','Q'])
# build a DataFrame from a 2-D array and set the row and column labels
print(pd2)
print("="*20)
print(type(pd2.index))   # the row index of the DataFrame is an Index object

   M  N  Q
A  0  1  2
B  3  4  5
C  6  7  8
====================
<class 'pandas.core.indexes.base.Index'>
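The column labels of a DataFrame are stored the same way as its row labels. A minimal sketch to check this (the name `df_demo` is illustrative and not part of the original notes):

```python
import numpy as np
import pandas as pd

# Illustrative frame with the same shape and labels as pd2 above
df_demo = pd.DataFrame(np.arange(9).reshape(3, 3),
                       index=['A', 'B', 'C'], columns=['M', 'N', 'Q'])

# Both axes are labelled by Index objects
rows_are_index = isinstance(df_demo.index, pd.Index)
cols_are_index = isinstance(df_demo.columns, pd.Index)

print(rows_are_index, cols_are_index)
print(list(df_demo.columns))
```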

1.2 Index objects are immutable

pd1.index[1] = 2  # raises TypeError

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-1226982f94cb> in <module>
----> 1 pd1.index[1] = 2  # raises TypeError

F:\Anaconda_all\Anaconda\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
   4275     @final
   4276     def __setitem__(self, key, value):
-> 4277         raise TypeError("Index does not support mutable operations")
   4278 
   4279     def __getitem__(self, key):

TypeError: Index does not support mutable operations
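Single labels cannot be assigned, but labels can still be changed in other ways. The sketch below shows two common options, `rename` and replacing the whole index; the variable names are illustrative only.

```python
import pandas as pd

s_demo = pd.Series(range(5), index=['A', 'B', 'C', 'D', 'E'])

# Item assignment on the Index itself is rejected
try:
    s_demo.index[1] = 'X'
    item_assignment_failed = False
except TypeError:
    item_assignment_failed = True

# rename returns a new Series with one label changed; s_demo is untouched
renamed = s_demo.rename({'B': 'X'})

# Assigning a complete new set of labels to the attribute is allowed
s_demo.index = ['a', 'b', 'c', 'd', 'e']

print(item_assignment_failed)
print(list(renamed.index))
print(list(s_demo.index))
```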

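1.3 Common Index types

Index: the generic index
Int64Index: integer index (a separate class in older pandas; removed in pandas 2.0, where integer labels use a plain Index)
MultiIndex: hierarchical (multi-level) index
DatetimeIndex: index of timestamps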
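As a quick illustration, three of these types can be constructed directly. This is only a sketch with made-up labels, restricted to constructors that exist in both older and newer pandas releases:

```python
import pandas as pd

# Generic index of string labels
generic_idx = pd.Index(['A', 'B', 'C'])

# Hierarchical index with two levels
multi_idx = pd.MultiIndex.from_tuples([('x', 1), ('x', 2), ('y', 1)])

# Index of three consecutive daily timestamps
date_idx = pd.date_range('2021-01-01', periods=3, freq='D')

print(type(generic_idx).__name__)
print(type(multi_idx).__name__, multi_idx.nlevels)
print(type(date_idx).__name__)
```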

2. Basic index operations

Reindexing
Adding
Deleting
Modifying
Querying
Advanced indexing

2.1 Reindexing with reindex

2.1.1 Series

ps1 = pd.Series(range(5),index = ['A','B','C','D','E'])
ps1

A    0
B    1
C    2
D    3
E    4
dtype: int64

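ps2 = ps1.reindex(['b','A','C','d','E','F'])  # rebuild the row index
print(ps1)   # the original Series is unchanged
print("="*30)
print(ps2)   # labels absent from the original get NaN; labels present keep their original values, whatever order they are listed in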

A    0
B    1
C    2
D    3
E    4
dtype: int64
==============================
b    NaN
A    0.0
C    2.0
d    NaN
E    4.0
F    NaN
dtype: float64
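The result above becomes float64 because NaN cannot be stored in an integer column. If a default value makes sense for the missing labels, `reindex` accepts a `fill_value` argument. A small sketch (`s_demo` and `filled` are illustrative names):

```python
import pandas as pd

s_demo = pd.Series(range(5), index=['A', 'B', 'C', 'D', 'E'])

# Labels missing from s_demo get 0 instead of NaN
filled = s_demo.reindex(['b', 'A', 'C', 'd', 'E', 'F'], fill_value=0)

print(filled)
```

Labels are case-sensitive, so 'b' and 'd' do not match 'B' and 'D' and receive the fill value.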

2.1.2 DataFrame

ps3 = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
ps3

	a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11

Rebuilding the row index:

# rebuild the row index
ps4 = ps3.reindex(['e','B','A'])
print(ps3)    # the original DataFrame is unchanged
print("="*20)
print(ps4)

   a  b   c   d
A  0  1   2   3
B  4  5   6   7
C  8  9  10  11
====================
     a    b    c    d
e  NaN  NaN  NaN  NaN
B  4.0  5.0  6.0  7.0
A  0.0  1.0  2.0  3.0

Rebuilding the column index:

# rebuild the column index
ps5 = ps3.reindex(columns = ['b','c','q','v'])
print(ps3)     # the original DataFrame is unchanged
print("="*20)
print(ps5)

    b	c	q	v
A	1	2	NaN	NaN
B	5	6	NaN	NaN
C	9	10	NaN	NaN
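Rows and columns can also be reindexed in a single call by passing both `index` and `columns`. A minimal sketch with an illustrative frame:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(12).reshape(3, 4),
                       index=['A', 'B', 'C'], columns=['a', 'b', 'c', 'd'])

# Reorder and extend both axes at once; unknown labels become NaN
both = df_demo.reindex(index=['B', 'A', 'e'], columns=['b', 'c', 'q'])

print(both)
```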

2.2 Adding

2.2.1 Series

p1 = pd.Series(range(5),index = ['A','B','C','D','E'])
p1

A    0
B    1
C    2
D    3
E    4
dtype: int64

Modifying the original Series:

# assigning to a new label adds an entry to the original Series
p1['F'] = 9
p1

A    0
B    1
C    2
D    3
E    4
F    9
dtype: int64

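Leaving the original Series unchanged:

# append returns a new Series; the original is not modified
# (Series.append was removed in pandas 2.0; pd.concat([p1, s1]) gives the same result there)
s1 = pd.Series({'g':666})
p2 = p1.append(s1)
print(p1)     # the original is unchanged
print("="*20)
print(p2)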

A    0
B    1
C    2
D    3
E    4
F    9
dtype: int64
====================
A      0
B      1
C      2
D      3
E      4
F      9
g    666
dtype: int64
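On pandas 2.0 and later, where `Series.append` is no longer available, the same non-mutating addition can be sketched with `pd.concat`. The variable names below are illustrative:

```python
import pandas as pd

s_old = pd.Series(range(5), index=['A', 'B', 'C', 'D', 'E'])
s_extra = pd.Series({'g': 666})

# concat returns a new Series; s_old keeps its five entries
combined = pd.concat([s_old, s_extra])

print(combined)
```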

2.2.2 DataFrame

Adding columns

# DataFrame example
q = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
q

	a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11

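Plain assignment appends the new column on the far right and modifies the original DataFrame.

q['t'] = 9      # every value in the new column t is 9
print(q)
print("="*20)
q['y'] = [10,12,14]  # give the new column explicit values
print(q)
print("="*20)
q['m'] = ['19','32','24']  # quoted values are stored as strings
print(q)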

   a  b   c   d  t
A  0  1   2   3  9
B  4  5   6   7  9
C  8  9  10  11  9
====================
   a  b   c   d  t   y
A  0  1   2   3  9  10
B  4  5   6   7  9  12
C  8  9  10  11  9  14
====================
   a  b   c   d  t   y   m
A  0  1   2   3  9  10  19
B  4  5   6   7  9  12  32
C  8  9  10  11  9  14  24
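When the original DataFrame should stay untouched, `DataFrame.assign` is one alternative: it returns a copy with the extra columns. A short sketch with illustrative names:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(12).reshape(3, 4),
                       index=['A', 'B', 'C'], columns=['a', 'b', 'c', 'd'])

# assign returns a new DataFrame; df_demo keeps its four columns
with_cols = df_demo.assign(t=9, y=[10, 12, 14])

print(list(df_demo.columns))
print(list(with_cols.columns))
```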

Adding a column at a given position (insert)

# add a column at a given position
u = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
u

    a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11

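insert modifies the original DataFrame in place.

u.insert(0,'t',2)  # insert column t at position 0, every value 2
print(u)
print("="*20)
u.insert(1,'r',[6,66,666])  # insert column r at position 1
print(u)
print("="*20)
u.insert(2,'s',['7','77','777'])  # insert column s at position 2 (string values)
print(u)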

   t  a  b   c   d
A  2  0  1   2   3
B  2  4  5   6   7
C  2  8  9  10  11
====================
   t    r  a  b   c   d
A  2    6  0  1   2   3
B  2   66  4  5   6   7
C  2  666  8  9  10  11
====================
   t    r    s  a  b   c   d
A  2    6    7  0  1   2   3
B  2   66   77  4  5   6   7
C  2  666  777  8  9  10  11

Adding rows

# add rows
qt = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
qt

	a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11

Using label-based indexing with loc:

# assigning through loc modifies the original DataFrame
qt.loc['D'] = [1,11,111,1111]  # add row D
qt

    a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	1	11	111	1111

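Using append:

row = {'a':6,'b':6,'c':6,'d':6}
# appending a dict requires ignore_index=True (the existing row labels are discarded); without it an error is raised
# (DataFrame.append was removed in pandas 2.0; use pd.concat there)
qt1 = qt.append(row,ignore_index=True)
print(qt)  # the original is unchanged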
print("="*20)
print(qt1)

   a   b    c     d
A  0   1    2     3
B  4   5    6     7
C  8   9   10    11
D  1  11  111  1111
====================
   a   b    c     d
0  0   1    2     3
1  4   5    6     7
2  8   9   10    11
3  1  11  111  1111
4  6   6    6     6
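The append call above throws away the row labels A to D. One way to add a row without losing them, and one that also works on pandas 2.0 and later, is to wrap the row in a one-row DataFrame and use `pd.concat`. This is a sketch; the label 'E' and the variable names are made up for the example:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(12).reshape(3, 4),
                       index=['A', 'B', 'C'], columns=['a', 'b', 'c', 'd'])

# One-row DataFrame whose index carries the label of the new row
new_row = pd.DataFrame([{'a': 6, 'b': 6, 'c': 6, 'd': 6}], index=['E'])

# concat returns a new DataFrame and keeps the existing row labels
extended = pd.concat([df_demo, new_row])

print(extended)
```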

2.3 Deleting

2.3.1 del

del modifies the original object.

Series

k1 = pd.Series(range(5),index = ['A','B','C','D','E'])
k1

A    0
B    1
C    2
D    3
E    4
dtype: int64

del k1['A']  # delete the entry labelled 'A'
k1

B    1
C    2
D    3
E    4
dtype: int64

DataFrame

k2 = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
k2

    a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11

del k2['b']   # delete column b
k2

2.3.2 drop

drop does not modify the original; it returns a new object without the dropped labels.

Series

kt1 = pd.Series(range(4),index = ['A','B','C','D'])
kt1

A    0
B    1
C    2
D    3
dtype: int64

Dropping a single label:

# drop a single label from the axis
kt2 = kt1.drop('A')
print(kt1)  # the original is unchanged
print("="*20)
print(kt2)

A    0
B    1
C    2
D    3
dtype: int64
====================
B    1
C    2
D    3
dtype: int64

Dropping several labels:

# drop several labels
kt3 = kt1.drop(['A','C'])
print(kt1)  # the original is unchanged
print("="*20)
print(kt3)

A    0
B    1
C    2
D    3
dtype: int64
====================
B    1
D    3
dtype: int64

DataFrame

tj1 = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
tj1

    m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

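Rows are dropped by default (axis=0)

# rows are dropped by default (axis=0)
tj2 = tj1.drop('B')  # drop one row
print(tj1)   # the original is unchanged
print("="*20)
print(tj2)
print("="*20)
tj3 = tj1.drop(['A','C'])  # drop several rows
print(tj1)    # the original is unchanged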
print("="*20)
print(tj3)

    m   n   o   p
A   0   1   2   3
B   4   5   6   7
C   8   9  10  11
D  12  13  14  15
====================
    m   n   o   p
A   0   1   2   3
C   8   9  10  11
D  12  13  14  15
====================
    m   n   o   p
A   0   1   2   3
B   4   5   6   7
C   8   9  10  11
D  12  13  14  15
====================
    m   n   o   p
B   4   5   6   7
D  12  13  14  15

Dropping columns (axis=1 or axis='columns')

# drop columns (axis=1 or axis='columns')
tj4 = tj1.drop('m',axis=1)  # drop one column
print(tj1)
print("="*20)
print(tj4)
print("="*20)
tj5 = tj1.drop(['m','o'],axis='columns')  # drop several columns
print(tj1)
print("="*20)
print(tj5)

    m   n   o   p
A   0   1   2   3
B   4   5   6   7
C   8   9  10  11
D  12  13  14  15
====================
    n   o   p
A   1   2   3
B   5   6   7
C   9  10  11
D  13  14  15
====================
    m   n   o   p
A   0   1   2   3
B   4   5   6   7
C   8   9  10  11
D  12  13  14  15
====================
    n   p
A   1   3
B   5   7
C   9  11
D  13  15

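The inplace parameter of drop()

With inplace=True the labels are removed from the original object and no new object is returned.

# inplace=True: drop on the original object; no new object is returned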
bt = pd.Series(range(4),index = ['A','B','C','D'])
bt

A    0
B    1
C    2
D    3
dtype: int64

bt.drop('A',inplace=True)
bt

B    1
C    2
D    3
dtype: int64
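A common slip with `inplace=True` is to assign the result back to the variable, which replaces the object with `None`. The sketch below makes the return value visible (names are illustrative):

```python
import pandas as pd

s_demo = pd.Series(range(4), index=['A', 'B', 'C', 'D'])

# With inplace=True the call returns None and s_demo itself is changed
result = s_demo.drop('A', inplace=True)

print(result)
print(list(s_demo.index))
```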

2.4 Modifying

2.4.1 Series

bpr = pd.Series(range(4),index = ['A','B','C','D'])
bpr

A    0
B    1
C    2
D    3
dtype: int64

By label

bpr['A'] = 666  # label-based
bpr

A    666
B      1
C      2
D      3
dtype: int64

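By position

bpr[1] = 777  # position-based (integer keys on a labelled Series are deprecated in recent pandas; bpr.iloc[1] = 777 is the explicit form)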
bpr

A    666
B    777
C      2
D      3
dtype: int64
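The same two modifications can be written with explicit accessors, which avoids any ambiguity between labels and positions. A minimal sketch, assuming the same starting Series as above under the illustrative name `s_demo`:

```python
import pandas as pd

s_demo = pd.Series(range(4), index=['A', 'B', 'C', 'D'])

s_demo.loc['A'] = 666   # label-based assignment
s_demo.iloc[1] = 777    # position-based assignment (second element)

print(s_demo)
```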

2.4.2 DataFrame

tu1 = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
tu1

    m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

[] assignment targets columns by default

obj['column']

tu1['p'] = 4   # set every value in column p to 4
tu1

    m	n	o	p
A	0	1	2	4
B	4	5	6	4
C	8	9	10	4
D	12	13	14	4

obj['column']

tu1['n'] = ['2','22','222','2222']
tu1

    m	n	o	p
A	0	2	2	4
B	4	22	6	4
C	8	222	10	4
D	12	2222	14	4

obj.column

# obj.column: same effect as obj['column'] above
tu1.m = [1,2,3,4]
tu1

    m	n	o	p
A	1	2	2	4
B	2	22	6	4
C	3	222	10	4
D	4	2222	14	4

Modifying rows with label-based indexing (loc)

# label-based indexing with loc
td1 = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
td1

    m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

loc['row label']

td1.loc['A'] = 666  # set every value in row A to 666
td1

    m	n	o	p
A	666	666	666	666
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

Modifying a single value

# modify one value
td1.loc['B','p'] = 100  # set the value at row B, column p to 100
td1

    m	n	o	p
A	666	666	666	666
B	4	5	6	100
C	8	9	10	11
D	12	13	14	15

2.5 Querying

2.5.1 Series

cc = pd.Series(range(4),index = ['A','B','C','D'])
cc

A    0
B    1
C    2
D    3
dtype: int64

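Selecting one element

cc['A']  # by label

cc[0]   # by position (cc.iloc[0] is the explicit form)

Slicing

# positional slice
cc[1:4]  # start included, stop excluded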

B    1
C    2
D    3
dtype: int64

# label slice
cc['B':'D']    # both endpoints included

B    1
C    2
D    3
dtype: int64

Selecting non-contiguous elements (double brackets)

cc[['A','B']]  # list of labels

A    0
B    1
dtype: int64

cc[[0,1]]   # list of positions

A    0
B    1
dtype: int64

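Boolean indexing

# the comparison yields True where the condition holds and False elsewhere
cc > 2

A    False
B    False
C    False
D     True
dtype: bool

Indexing with that mask returns the values whose entry is True:

cc[cc>2]   # keep the values where the condition is True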

D    3
dtype: int64
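Several conditions can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A small sketch with illustrative names:

```python
import pandas as pd

s_demo = pd.Series(range(4), index=['A', 'B', 'C', 'D'])

# Values strictly between 0 and 3
middle = s_demo[(s_demo > 0) & (s_demo < 3)]

# Values equal to 0 or equal to 3
ends = s_demo[(s_demo == 0) | (s_demo == 3)]

print(middle)
print(ends)
```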

2.5.2 DataFrame

red = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
red

	m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

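Selecting columns

Note: 1. By default [] selects columns only; passing a row label raises an error. 2. Columns must be selected by name; a position such as red[0] does not work.

# 1. column selection ([] selects columns by default; a row label raises an error)
red['n']  # select by column name, not by position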

A     1
B     5
C     9
D    13
Name: n, dtype: int32
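The two restrictions in the note can be checked directly. The sketch below catches the `KeyError` raised in both cases and then shows `loc` and `iloc` as the accessors for row labels and positions (`df_demo` is an illustrative copy of the frame above):

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(16).reshape(4, 4),
                       index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o', 'p'])

# 'A' is a row label, not a column name
try:
    df_demo['A']
    row_label_fails = False
except KeyError:
    row_label_fails = True

# 0 is not a column name either, so positions do not work with []
try:
    df_demo[0]
    position_fails = False
except KeyError:
    position_fails = True

row_a = df_demo.loc['A']        # row selected by label
first_col = df_demo.iloc[:, 0]  # column selected by position

print(row_label_fails, position_fails)
print(row_a.tolist(), first_col.tolist())
```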

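Selecting several columns (not necessarily adjacent)

# select several columns (not necessarily adjacent)
red[['m','p']]

Selecting a single value

# select a single value
red['m']['B']  # the first bracket selects the column, the second the row

Slicing

# slicing
red[1:3]  # a slice selects rows; slicing columns requires loc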

    m	n	o	p
B	4	5	6	7
C	8	9	10	11

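2.6 Advanced indexing

loc: label-based indexing
iloc: position-based indexing
ix: mixed label and position indexing (removed in pandas 1.0; use loc or iloc instead)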
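Because `.ix` is no longer available in current pandas, a lookup that mixes a row label with a column position has to be translated to one of the other two accessors. One possible way to do that, sketched with an illustrative frame:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(16).reshape(4, 4),
                       index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o', 'p'])

# Row by label, column by position: convert the position to a label for loc
by_loc = df_demo.loc['B', df_demo.columns[2]]

# Or convert the label to a position for iloc
by_iloc = df_demo.iloc[df_demo.index.get_loc('B'), 2]

print(by_loc, by_iloc)
```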

2.6.1 loc: label-based indexing

Selects by the user-defined index labels.

Series

ts = pd.Series(range(4),index = ['A','B','C','D'])
ts

A    0
B    1
C    2
D    3
dtype: int64

ts.loc['A':'C']   # on a Series, loc behaves like the plain label slice ts['A':'C'] (both endpoints included)

A    0
B    1
C    2
dtype: int64

DataFrame

green = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
green

    m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

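green.loc['A','m']  # row A, column m (first row, first column)

green.loc['A':'C','m':'n']  # the first argument is the row range (or a single row), the second the column range (or a single column)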
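The notes do not show the results of these two calls. The sketch below reproduces them on an illustrative copy of the frame and adds two further forms that `loc` accepts, a list of labels and a boolean mask:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(16).reshape(4, 4),
                       index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o', 'p'])

single = df_demo.loc['A', 'm']               # one value
block = df_demo.loc['A':'C', 'm':'n']        # label slices include both endpoints
picked = df_demo.loc[['A', 'D'], ['m', 'p']] # lists of labels, not necessarily adjacent
masked = df_demo.loc[df_demo['m'] > 4, 'n']  # rows where column m exceeds 4, column n only

print(single)
print(block)
print(picked)
print(masked)
```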

2.6.2 iloc: position-based indexing

Works like loc, but selects by integer position instead of label.

Series

lol = pd.Series(range(4),index = ['A','B','C','D'])
lol

A    0
B    1
C    2
D    3
dtype: int64

lol.iloc[1]

lol.iloc[1:3]  # start included, stop excluded

B    1
C    2
dtype: int64

DataFrame

gto = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
gto

	m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

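gto.iloc[0,1]  # the first argument is the row, the second the column: the value in the first row, second column

Positional slices include the start and exclude the stop

gto.iloc[1:3,0:3]  # rows first, then columns (start included, stop excluded)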

	m	n	o
B	4	5	6
C	8	9	10
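Like Python lists, `iloc` also accepts negative positions and lists of positions. A short sketch on an illustrative copy of the frame:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(16).reshape(4, 4),
                       index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o', 'p'])

last_row = df_demo.iloc[-1]               # last row, counted from the end
corners = df_demo.iloc[[0, -1], [0, -1]]  # first and last rows, first and last columns

print(last_row)
print(corners)
```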
