Filtering data with pandas

zzpdbk

Category: Python data processing

Article link: https://blog.csdn.net/zzpdbk/article/details/79270967

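This article looks at how to use pandas' loc and iloc for data selection in practice, including locating rows and columns by axis label and by integer position, and filtering with functions.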

Continuously updated; includes examples from my own day-to-day use.

loc and iloc: these two indexers select rows or columns by axis labels (loc) or by integer position (iloc).

import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12).reshape((3, 4)), columns=['one', 'two', 'three', 'four'], index=['CZ', 'RP', 'HS'])

In [4]: data
Out[4]: 
    one  two  three  four
CZ    0    1      2     3
RP    4    5      6     7
HS    8    9     10    11

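.loc[index, columns]: the row selector comes first and the column selector second; swapping them raises an error, because the labels are then looked up on the wrong axis. index and columns may each be a list, but every label in it must exist in data's index or columns respectively.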

In [7]: data.loc['CZ',['one','three']]
Out[7]: 
one      0
three    2
Name: CZ, dtype: int32

In [8]: data.loc[['CZ','HS'],['one','three']]
Out[8]: 
    one  three
CZ    0      2
HS    8     10

iloc does the same kind of selection by integer position (here row 1, i.e. RP):

In [12]: data.iloc[1,[0,2]]
Out[12]: 
one      4
three    6
Name: RP, dtype: int32
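
To make the correspondence between the two indexers concrete, here is a minimal sketch that rebuilds the same data frame, selects the same cells once by label and once by position, and then shows what happens when the loc arguments are swapped. The variable names by_label, by_position, same and swapped_raises are only illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(12).reshape((3, 4)),
    columns=["one", "two", "three", "four"],
    index=["CZ", "RP", "HS"],
)

# Same cells, addressed by label (loc) and by integer position (iloc).
by_label = data.loc[["CZ", "HS"], ["one", "three"]]
by_position = data.iloc[[0, 2], [0, 2]]
same = by_label.equals(by_position)

# With the arguments swapped, pandas looks for 'one' and 'three'
# among the row labels, finds none of them, and raises KeyError.
try:
    data.loc[["one", "three"], ["CZ", "HS"]]
    swapped_raises = False
except KeyError:
    swapped_raises = True

print(same, swapped_raises)
```

Position-based selection keeps working if the labels are renamed, while label-based selection keeps working if rows or columns are reordered; which one is safer depends on the data at hand.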

Both the row and the column selector of .loc and .iloc also accept slices.

In [14]: data.iloc[:, :2]
Out[14]: 
    one  two
CZ    0    1
RP    4    5
HS    8    9
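
One detail worth keeping in mind when slicing: a label slice with loc includes both endpoints, whereas a position slice with iloc excludes the stop position, as ordinary Python slicing does. The sketch below illustrates this on the same data frame; label_slice and position_slice are illustrative names.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(12).reshape((3, 4)),
    columns=["one", "two", "three", "four"],
    index=["CZ", "RP", "HS"],
)

# loc: the end labels 'RP' and 'two' are part of the result.
label_slice = data.loc["CZ":"RP", "one":"two"]

# iloc: position 2 is excluded, so this covers rows 0-1 and columns 0-1.
position_slice = data.iloc[0:2, 0:2]

print(label_slice)
print(label_slice.equals(position_slice))
```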

Filtering with a boolean mask

In [16]: data[data.two > 2]
Out[16]: 
    one  two  three  four
RP    4    5      6     7
HS    8    9     10    11
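
The comparison inside the brackets is itself a boolean Series, one value per row, and it can also be passed to loc to pick rows and columns in one step. A small sketch, using bracket access (data["two"]) because attribute access such as data.two only works when the column name is a valid Python identifier; mask, rows and subset are illustrative names.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(12).reshape((3, 4)),
    columns=["one", "two", "three", "four"],
    index=["CZ", "RP", "HS"],
)

# The condition evaluates to a boolean Series aligned with the rows.
mask = data["two"] > 2

# Keep the rows where the mask is True.
rows = data[mask]

# Same row filter, restricted to two columns via loc.
subset = data.loc[mask, ["one", "four"]]

print(mask.tolist())
print(subset)
```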

When there is more than one condition, combine them with & (and) or | (or), and wrap each condition in parentheses.

In [17]: data
Out[17]: 
    one  two  three  four
CZ    0    1      2     3
RP    4    5      6     7
HS    8    9     10    11

In [19]: data[(data.two > 2) & (data.three < 7)]  # rows where data.two > 2 and data.three < 7
Out[19]: 
    one  two  three  four
RP    4    5      6     7

In [21]: data[(data.two > 2) | (data.three < 7)]  # rows where data.two > 2 or data.three < 7
Out[21]: 
    one  two  three  four
CZ    0    1      2     3
RP    4    5      6     7
HS    8    9     10    11
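
The parentheses matter because & and | bind more tightly than comparison operators, and Python's plain and / or keywords do not work element-wise on a Series. The sketch below shows the two combinations from above, negation with ~, and the error raised by the and keyword; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(12).reshape((3, 4)),
    columns=["one", "two", "three", "four"],
    index=["CZ", "RP", "HS"],
)

big_two = data["two"] > 2
small_three = data["three"] < 7

both = data[big_two & small_three]     # element-wise and
either = data[big_two | small_three]   # element-wise or
not_big_two = data[~big_two]           # element-wise not

# The `and` keyword asks for the truth value of a whole Series,
# which pandas refuses to guess, so it raises ValueError.
try:
    data[big_two and small_three]
    and_raises = False
except ValueError:
    and_raises = True

print(list(both.index), list(either.index), list(not_big_two.index), and_raises)
```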

Filtering with a function

Define a filter function f:

In [22]: def f(x):
    ...:     if x > 5:
    ...:         return 'big'
    ...:     return 'small'

In [25]: data.one.map(f)
Out[25]: 
CZ    small
RP    small
HS      big
Name: one, dtype: object

This returns a new Series with f applied to each value; the original data is unchanged. To change data itself, assign the result back.

In [26]: data.one = data.one.map(f)

In [27]: data
Out[27]: 
      one  two  three  four
CZ  small    1      2     3
RP  small    5      6     7
HS    big    9     10    11
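
Overwriting the column replaces the original numbers with labels. If the numbers are still needed, one possible alternative is to use the mapped labels only as a filter condition, or to store them in a separate column. The following is a sketch under that assumption; the column name one_label and the variable names are hypothetical and not part of the original example.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(12).reshape((3, 4)),
    columns=["one", "two", "three", "four"],
    index=["CZ", "RP", "HS"],
)

def f(x):
    # Same rule as above: values greater than 5 count as 'big'.
    if x > 5:
        return "big"
    return "small"

labels = data["one"].map(f)

# Use the labels as a boolean condition to filter rows.
big_rows = data[labels == "big"]

# Keep the labels next to the numbers; assign returns a new DataFrame
# and leaves `data` untouched. 'one_label' is a hypothetical column name.
labelled = data.assign(one_label=labels)

print(big_rows)
print(labelled)
```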