Data Analysis Class Questions + Verification (1), 2022.03.02: Can positional indexing still be used on a DataFrame after its index is set?

Article link: https://blog.csdn.net/m0_48275578/article/details/123244708

Question

DataFrame indexing

1. [ ] => column indexing
Note: if the columns have names, they must be indexed by name.

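import numpy as np
import pandas as pd

r = np.random
r.seed(1)  # fix the seed so the random numbers are reproducible
# df1: row labels 'A'..'C', default integer column labels 0..3
df1 = pd.DataFrame(r.randint(1, 100, (3, 4)), index=list('ABC'))
# df2: default integer row labels 0..2, column labels 'a'..'d'
df2 = pd.DataFrame(r.randint(1, 100, (3, 4)), columns=list('abcd'))
print(df1)
print(df2)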

Output

    0   1   2   3
A  38  13  73  10
B  76   6  80  65
C  17   2  77  72
    a   b   c   d
0   7  26  51  21
1  19  85  12  29
2  30  15  51  69

1. Columns without names (default integer labels)

print(df1[1])

Output

A    13
B     6
C     2
Name: 1, dtype: int32

2. Columns with names

# print(df2[1])  # raises KeyError: df2 has no column labelled 1
print(df2['a'])

Output

0     7
1    19
2    30
Name: a, dtype: int32

print(df1.columns)
print(type(df1.columns[0]))
print(df2.columns)

RangeIndex(start=0, stop=4, step=1)
<class 'int'>
Index(['a', 'b', 'c', 'd'], dtype='object')

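Summary

Indexing a DataFrame with [] and a single name or position selects a column.
Once columns is specified, the position can no longer be used inside [] (it raises KeyError); only the column name works.
Specifying index changes only the row labels, so the columns keep their default integer labels 0, 1, 2, ... and df1[1] still works. Strictly speaking this is still a label lookup: the labels simply coincide with the positions.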
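
The summary can be checked with a small sketch. It assumes the same values as df1 and df2 above, written out literally instead of drawn from the random generator so the result does not depend on the seed; the variable names are illustrative only. It also shows .iloc and .loc, which make the choice between position and label explicit.

```python
import pandas as pd

# Same values as df1 and df2 in the article, written out literally.
df1 = pd.DataFrame([[38, 13, 73, 10], [76, 6, 80, 65], [17, 2, 77, 72]],
                   index=list('ABC'))
df2 = pd.DataFrame([[7, 26, 51, 21], [19, 85, 12, 29], [30, 15, 51, 69]],
                   columns=list('abcd'))

# df1[1] looks up the column whose label is the integer 1.
label_lookup = df1[1].tolist()

# df2 has no column labelled 1, so the same expression raises KeyError.
try:
    df2[1]
    raised = False
except KeyError:
    raised = True

# .iloc selects by position whatever the labels are; .loc selects by label.
second_col_by_position = df2.iloc[:, 1].tolist()
second_col_by_label = df2.loc[:, 'b'].tolist()

print(label_lookup)            # [13, 6, 2]
print(raised)                  # True
print(second_col_by_position)  # [26, 85, 15]
print(second_col_by_label)     # [26, 85, 15]
```

Using .iloc for positions and .loc for labels avoids relying on whether the column labels happen to be integers.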

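2. Slicing => row indexing

print(df1['A':'B'])  # selects rows 'A' through 'B' (label slice, end included)
print(df2[:2])  # selects rows 0 and 1 (position slice, end excluded)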

    0   1   2   3
A  38  13  73  10
B  76   6  80  65
    a   b   c   d
0   7  26  51  21
1  19  85  12  29
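
As a hedged cross-check, the same row slices can be written with the explicit accessors. The sketch below assumes the literal values of df1 and df2 shown earlier and compares the [] slices with .loc (labels) and .iloc (positions); the variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([[38, 13, 73, 10], [76, 6, 80, 65], [17, 2, 77, 72]],
                   index=list('ABC'))
df2 = pd.DataFrame([[7, 26, 51, 21], [19, 85, 12, 29], [30, 15, 51, 69]],
                   columns=list('abcd'))

rows_by_label = df1.loc['A':'B']   # label slice: both ends are included
rows_by_position = df2.iloc[:2]    # position slice: the end is excluded

print(rows_by_label.index.tolist())         # ['A', 'B']
print(rows_by_position.index.tolist())      # [0, 1]
print(rows_by_label.equals(df1['A':'B']))   # True
print(rows_by_position.equals(df2[:2]))     # True
```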

3. Fancy indexing => column indexing

print(df1[[0, 2]])  # selects columns 0 and 2
print(df2[['b', 'c']])  # selects columns 'b' and 'c'
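
The article does not show the output of these two calls. A minimal sketch, assuming the literal values of df1 and df2 shown earlier, illustrates what a list of labels returns and how it differs from a single label:

```python
import pandas as pd

df1 = pd.DataFrame([[38, 13, 73, 10], [76, 6, 80, 65], [17, 2, 77, 72]],
                   index=list('ABC'))
df2 = pd.DataFrame([[7, 26, 51, 21], [19, 85, 12, 29], [30, 15, 51, 69]],
                   columns=list('abcd'))

cols_int = df1[[0, 2]]       # list of integer labels
cols_str = df2[['b', 'c']]   # list of string labels

print(type(cols_int).__name__)    # DataFrame
print(cols_int.values.tolist())   # [[38, 73], [76, 80], [17, 77]]
print(cols_str.columns.tolist())  # ['b', 'c']

# A single label returns a Series; a one-element list returns a one-column DataFrame.
print(type(df2['b']).__name__)    # Series
print(df2[['b']].shape)           # (3, 1)
```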

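4. Boolean indexing => either with a Series or with a DataFrame
Note: with a boolean Series, only the rows where the Series is True are shown.
With a boolean DataFrame, the elements where the mask is False become NaN.

1. Indexing with a Series

(1) Comparing a column

s1 = df1[0] > 50		# True where the element in column 0 is greater than 50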
print(s1)
print(type(s1))
print(df1[s1])

A    False
B     True
C    False
Name: 0, dtype: bool
<class 'pandas.core.series.Series'>
    0  1   2   3
B  76  6  80  65
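
A short sketch of the same row filter, assuming the literal values of df1 shown earlier. It checks that df1[mask] and df1.loc[mask] agree, and shows one way to combine two conditions with & (the second condition is an illustrative assumption, not taken from the article).

```python
import pandas as pd

df1 = pd.DataFrame([[38, 13, 73, 10], [76, 6, 80, 65], [17, 2, 77, 72]],
                   index=list('ABC'))

mask = df1[0] > 50            # boolean Series, one value per row
kept = df1[mask]              # keeps the rows where mask is True
same = df1.loc[mask]          # explicit form of the same selection

print(kept.index.tolist())    # ['B']
print(kept.equals(same))      # True

# Each condition needs its own parentheses because & binds tighter than >.
both = df1[(df1[0] > 10) & (df1[3] > 50)]
print(both.index.tolist())    # ['B', 'C']
```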

Note
Comparing rows (a row slice) produces a boolean DataFrame, not a Series.

2. Indexing with a DataFrame

(1) The whole DataFrame

d1 = df2 > 50
print(d1)
print(type(d1))
print(df2[d1])

       a      b      c      d
0  False  False   True  False
1  False   True  False  False
2  False  False   True   True
<class 'pandas.core.frame.DataFrame'>
    a     b     c     d
0 NaN   NaN  51.0   NaN
1 NaN  85.0   NaN   NaN
2 NaN   NaN  51.0  69.0
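
The result above prints 51.0 rather than 51 because NaN cannot be stored in an integer column, so the masked columns are converted to floating point. The sketch below, assuming the literal values of df2 shown earlier, checks this and compares df2[d1] with DataFrame.where, which performs the same element-wise masking.

```python
import pandas as pd

df2 = pd.DataFrame([[7, 26, 51, 21], [19, 85, 12, 29], [30, 15, 51, 69]],
                   columns=list('abcd'))

d1 = df2 > 50                 # boolean DataFrame with the same shape as df2
masked = df2[d1]              # values where d1 is False become NaN

print(masked.equals(df2.where(d1)))   # True
print(masked['c'].dtype)              # float64
print(int(masked.count().sum()))      # 4 (number of values that survived the mask)
```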

(2) Comparing rows

s2 = df1[1:2] > 50
print(s2)
print(type(s2))
print(df1[s2])

      0      1     2     3
B  True  False  True  True
<class 'pandas.core.frame.DataFrame'>
      0   1     2     3
A   NaN NaN   NaN   NaN
B  76.0 NaN  80.0  65.0
C   NaN NaN   NaN   NaN
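
In the result above the mask contains only row 'B', yet rows 'A' and 'C' still appear, filled with NaN: the mask is aligned to df1 by label and the missing rows are treated as False. The sketch below, assuming the literal values of df1 shown earlier, checks this. It also shows one possible alternative when the goal is to keep the columns in which row 'B' exceeds 50; that alternative is an illustration, not something the article does.

```python
import pandas as pd

df1 = pd.DataFrame([[38, 13, 73, 10], [76, 6, 80, 65], [17, 2, 77, 72]],
                   index=list('ABC'))

s2 = df1[1:2] > 50            # boolean DataFrame containing only row 'B'
result = df1[s2]              # rows missing from the mask count as False

print(result.index.tolist())                 # ['A', 'B', 'C']
print(bool(result.loc['A'].isna().all()))    # True
print(result.loc['B'].notna().tolist())      # [True, False, True, True]

# Alternative: a boolean Series over the columns selects whole columns.
cols = df1.loc[:, df1.loc['B'] > 50]
print(cols.columns.tolist())                 # [0, 2, 3]
```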