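0 Introduction
Continuing with Pandas: when working with real tables, you constantly need to select subsets of the data. Most of this comes down to operations on a DataFrame, such as selecting by column header or by row index.
1 Selecting data from a DataFrame in Pandas
First, import the Pandas and NumPy libraries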
import pandas as pd
import numpy as np
Create a DataFrame indexed by six consecutive dates
dates = pd.date_range('20200101',periods=6)
df1 = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates,columns=['A','B','C','D'])
df1
| | A | B | C | D |
|---|---|---|---|---|
| 2020-01-01 | 0 | 1 | 2 | 3 |
| 2020-01-02 | 4 | 5 | 6 | 7 |
| 2020-01-03 | 8 | 9 | 10 | 11 |
| 2020-01-04 | 12 | 13 | 14 | 15 |
| 2020-01-05 | 16 | 17 | 18 | 19 |
| 2020-01-06 | 20 | 21 | 22 | 23 |
Select a column by its header
df1['A']
2020-01-01 0
2020-01-02 4
2020-01-03 8
2020-01-04 12
2020-01-05 16
2020-01-06 20
Freq: D, Name: A, dtype: int32
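Attribute access with .A returns the same result, provided the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method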
df1.A
2020-01-01 0
2020-01-02 4
2020-01-03 8
2020-01-04 12
2020-01-05 16
2020-01-06 20
Freq: D, Name: A, dtype: int32
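To make the difference between the two spellings concrete, here is a minimal sketch. The column names 'unit price' and 'count' are hypothetical and are introduced only for this illustration; they are not part of the original table.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200101', periods=6)
df1 = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# For a simple column name, both spellings return the same Series.
same = df1['A'].equals(df1.A)

# Hypothetical column names, used only to show where attribute access breaks down.
demo = df1.rename(columns={'B': 'unit price', 'C': 'count'})

# 'unit price' contains a space, so it can only be reached with brackets.
unit_price = demo['unit price']

# demo.count resolves to the DataFrame.count method, not to the 'count' column.
count_is_method = callable(demo.count)
count_column = demo['count']
```

Bracket access works for any column name, so it is the safer default in scripts; attribute access is mostly a convenience for interactive work.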
Slice rows by integer position (the end of the slice is excluded)
df1[0:2]
| | A | B | C | D |
|---|---|---|---|---|
| 2020-01-01 | 0 | 1 | 2 | 3 |
| 2020-01-02 | 4 | 5 | 6 | 7 |
Slice rows by label (both the start and the end label are included)
df1['20200102':'20200104']
| | A | B | C | D |
|---|---|---|---|---|
| 2020-01-02 | 4 | 5 | 6 | 7 |
| 2020-01-03 | 8 | 9 | 10 | 11 |
| 2020-01-04 | 12 | 13 | 14 | 15 |
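The two slices above follow different endpoint rules, which is easy to miss. The sketch below spells the same selections out with the explicit .iloc and .loc indexers and compares the results; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200101', periods=6)
df1 = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# Positional slice: start included, end excluded, so positions 1 and 2.
by_position = df1.iloc[1:3]

# Label slice: both endpoints included, so 2020-01-02 and 2020-01-03.
by_label = df1.loc['20200102':'20200103']

# Extending the label slice by one day adds a third row.
three_days = df1.loc['20200102':'20200104']
```

Here by_position and by_label contain the same two rows, even though the positional slice is written as 1:3 and the label slice names its last row directly.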
Select data by label with .loc
# Select data by label
df1.loc['20200102']
A 4
B 5
C 6
D 7
Name: 2020-01-02 00:00:00, dtype: int32
Select a specific row and a list of columns
df1.loc['20200101',['A','B','C']]
A 0
B 1
C 2
Name: 2020-01-01 00:00:00, dtype: int32
A : in the first position of .loc selects all rows
df1.loc[:,['A','B']]
| | A | B |
|---|---|---|
| 2020-01-01 | 0 | 1 |
| 2020-01-02 | 4 | 5 |
| 2020-01-03 | 8 | 9 |
| 2020-01-04 | 12 | 13 |
| 2020-01-05 | 16 | 17 |
| 2020-01-06 | 20 | 21 |
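The shape of what .loc returns depends on whether each selector is a single label or a list of labels. The following sketch shows the three cases; it uses pd.Timestamp for the row label, which is an assumption made here for explicitness rather than something the examples above require.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200101', periods=6)
df1 = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

day = pd.Timestamp('2020-01-02')

# Single row label and single column label: one scalar value.
cell = df1.loc[day, 'A']

# Single row label and a list of columns: a Series indexed by column name.
row = df1.loc[day, ['A', 'B']]

# A list of row labels (even of length one) and a list of columns: a DataFrame.
frame = df1.loc[[day], ['A', 'B']]
```

Wrapping a label in a list is therefore a simple way to keep the result two-dimensional when later code expects a DataFrame.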
Select data by integer position with .iloc
# Select data by position
df1.iloc[2]
A 8
B 9
C 10
D 11
Name: 2020-01-03 00:00:00, dtype: int32
df1.iloc[1:3,2:4] # rows at positions 1 and 2, columns at positions 2 and 3 (the end of each slice is excluded)
| | C | D |
|---|---|---|
| 2020-01-02 | 6 | 7 |
| 2020-01-03 | 10 | 11 |
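Besides single positions and slices, .iloc also accepts negative positions and lists of positions. A short sketch of those variants, using the same table:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200101', periods=6)
df1 = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# A negative position counts from the end: this is the last row.
last_row = df1.iloc[-1]

# Lists pick arbitrary positions: rows 0, 2, 4 and columns 0 (A) and 3 (D).
picked = df1.iloc[[0, 2, 4], [0, 3]]

# Two scalars return a single value: row position 1, column position 2.
cell = df1.iloc[1, 2]
```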
Mixing positions and labels
The .ix indexer that used to allow this was deprecated in pandas 0.20 and removed in pandas 1.0, so convert the positional part to labels and use .loc instead
# Rows by position, columns by label
df1.loc[df1.index[2:4],['A','C']]
| | A | C |
|---|---|---|
| 2020-01-03 | 8 | 10 |
| 2020-01-04 | 12 | 14 |
# Rows by label, columns by position (written with .loc, since .ix is no longer available)
df1.loc['20200102':'20200104',df1.columns[2:4]]
| | C | D |
|---|---|---|
| 2020-01-02 | 6 | 7 |
| 2020-01-03 | 10 | 11 |
| 2020-01-04 | 14 | 15 |
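Because .ix no longer exists in pandas 1.0 and later, a mixed selection has to be expressed entirely in labels (.loc) or entirely in positions (.iloc). The sketch below shows one way to do each conversion and checks that both routes agree; it is one possible approach, not the only one.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200101', periods=6)
df1 = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# Rows by position, columns by label.
# Route 1: turn the row positions into labels, then use .loc.
via_loc = df1.loc[df1.index[2:4], ['A', 'C']]
# Route 2: turn the column labels into positions, then use .iloc.
col_positions = df1.columns.get_indexer(['A', 'C'])
via_iloc = df1.iloc[2:4, col_positions]

# Rows by label, columns by position: turn the column positions into labels.
label_rows = df1.loc['20200102':'20200104', df1.columns[2:4]]

routes_agree = via_loc.equals(via_iloc)
```

Converting positions to labels through df1.index or df1.columns keeps the selection on a single, unambiguous indexer, which is the reason .ix was retired.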
Select a column and test whether each value is greater than a given number; the result is a Series of True/False values
df1.A > 6
2020-01-01 False
2020-01-02 False
2020-01-03 True
2020-01-04 True
2020-01-05 True
2020-01-06 True
Freq: D, Name: A, dtype: bool
Pass that boolean Series back to the DataFrame to keep only the rows where it is True
df1[df1.A > 6]
| | A | B | C | D |
|---|---|---|---|---|
| 2020-01-03 | 8 | 9 | 10 | 11 |
| 2020-01-04 | 12 | 13 | 14 | 15 |
| 2020-01-05 | 16 | 17 | 18 | 19 |
| 2020-01-06 | 20 | 21 | 22 | 23 |
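Boolean masks can also be combined and used together with a column selection. The sketch below extends the filter above; the thresholds are arbitrary values chosen for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200101', periods=6)
df1 = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
both = (df1.A > 6) & (df1.B < 20)
either = (df1.A < 4) | (df1.D > 20)

# .loc accepts a boolean mask for the rows and labels for the columns.
subset = df1.loc[both, ['A', 'B']]

# A mask on its own filters whole rows; ~ inverts it.
edges = df1[either]
middle = df1[~either]
```

Python's plain and / or keywords do not work element by element on a Series, which is why the bitwise operators are used here.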