Picking up from yesterday's post, today let's take a proper look at the .loc indexer in Pandas!
Let's first see what the documentation says:
pandas provides a suite of methods in order to have purely label based indexing.
The .loc attribute is the primary access method. The following are valid inputs:
- A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index. This use is not an integer position along the index)
- A list or array of labels ['a', 'b', 'c']
- A slice object with labels 'a':'f' (note that contrary to usual python slices, both the start and the stop are included, when present in the index! - also see Slicing with labels)
- A boolean array
- A callable, see Selection By Callable
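To make the list above concrete, here is a minimal sketch that tries each of the five input types on one small Series. The values are fixed numbers I picked for illustration (not the random data used in the rest of this post), so the results are reproducible.

```python
import pandas as pd

# Illustrative data: fixed values instead of random ones.
s = pd.Series([10, 20, 30, 40, 50, 60], index=list('abcdef'))

single = s.loc['b']                          # a single label -> scalar
by_list = s.loc[['a', 'c', 'e']]             # a list of labels -> Series
by_slice = s.loc['b':'d']                    # a label slice, both ends included
by_mask = s.loc[s > 30]                      # a boolean array aligned with the index
by_callable = s.loc[lambda x: x % 20 == 0]   # a callable that returns a valid indexer

print(single)                 # 20
print(by_list.tolist())       # [10, 30, 50]
print(by_slice.tolist())      # [20, 30, 40]
print(by_mask.tolist())       # [40, 50, 60]
print(by_callable.tolist())   # [20, 40, 60]
```

The callable form receives the Series itself and is handy in method chains, where the intermediate object has no name you could refer to.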
>>> import numpy as np
>>> import pandas as pd
>>> s1 = pd.Series(np.random.randn(6), index=list('abcdef'))
>>> s1
a -0.354041
b 0.286674
c -1.144354
d -2.290284
e -0.299573
f -0.011348
dtype: float64
We defined a pandas.Series holding 6 random numbers whose index labels are the six characters a through f. We can fetch the data we want by index label:
>>> s1.loc['b']
0.28667372019035603
I want to stress once more that we retrieve data by the label of the index, which has nothing to do with the position of that entry within the index:
>>> s2 = pd.Series(np.random.randn(6), index=[5, 4, 3, 2, 1, 0])
>>> s2
5 0.063622
4 -0.789719
3 -0.916464
2 1.023828
1 -0.440047
0 0.269705
dtype: float64
>>> s2.loc[5]
0.063622356476971106
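As you can see, we created another pandas.Series of 6 random numbers, this time with the integers 5, 4, 3, 2, 1, 0 as index labels. These labels have nothing to do with which row each value occupies in the Series. When we pass the integer 5, we get back the value in the row labeled 5, not the value at position 5.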
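A quick way to convince yourself of the label-versus-position distinction is to put .loc next to .iloc, which is the position-based counterpart. The sketch below uses fixed values of my own choosing so the two results are easy to tell apart.

```python
import pandas as pd

# Illustrative data: the labels run 5..0 while the positions run 0..5.
s = pd.Series([50, 40, 30, 20, 10, 0], index=[5, 4, 3, 2, 1, 0])

by_label = s.loc[5]       # the row whose label is 5 (the first row)
by_position = s.iloc[5]   # the row at position 5 (the last row)

print(by_label)      # 50
print(by_position)   # 0
```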
We can also use a label slice (a slice object with labels) to fetch several values at once, or to assign to them:
>>> s1.loc['b':'f']
b 0.286674
c -1.144354
d -2.290284
e -0.299573
f -0.011348
dtype: float64
>>> s1.loc['b':]
b 0.286674
c -1.144354
d -2.290284
e -0.299573
f -0.011348
dtype: float64
>>> s1.loc['d':'f'] = 0
>>> s1
a -0.354041
b 0.286674
c -1.144354
d 0.000000
e 0.000000
f 0.000000
dtype: float64
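As you can see, slicing here behaves differently from slicing a native Python list: both the start and the stop on either side of the colon are included. Keep this difference in mind!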
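For a side-by-side comparison, the sketch below slices a plain list, slices by label with .loc, and slices by position with .iloc. The data is made up for illustration; only the endpoints matter here.

```python
import pandas as pd

letters = list('abcdef')
s = pd.Series(range(6), index=letters)

list_slice = letters[1:3]        # plain Python: stop is excluded
label_slice = s.loc['b':'d']     # .loc: both 'b' and 'd' are included
position_slice = s.iloc[1:3]     # .iloc: stop is excluded, like a list

print(list_slice)                     # ['b', 'c']
print(label_slice.index.tolist())     # ['b', 'c', 'd']
print(position_slice.index.tolist())  # ['b', 'c']
```

Only the label-based slice includes its stop. The usual explanation is that with labels there is no general notion of "the label after 'd'", so an exclusive stop would be awkward to write.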
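For a pandas DataFrame, .loc works in much the same way, except that it takes a row indexer and, optionally, a column indexer. The code below demonstrates this without much further commentary: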
>>> df1 = pd.DataFrame(np.random.randn(6, 4),
...                    index=list('abcdef'),
...                    columns=list('ABCD'))
>>> df1
A B C D
a 0.031419 0.658151 1.069829 -1.366788
b 0.889844 -1.402487 0.183858 -0.037312
c 0.278374 -0.122152 0.429787 -1.251808
d -0.935268 -0.768464 -1.343263 -0.435845
e -0.612629 -1.538650 -1.774796 1.013778
f -1.313907 -0.472731 -1.635683 0.140725
>>> df1.loc[['a', 'b', 'd'], :]
A B C D
a 0.031419 0.658151 1.069829 -1.366788
b 0.889844 -1.402487 0.183858 -0.037312
d -0.935268 -0.768464 -1.343263 -0.435845
>>> df1.loc['d':, 'A':'C']
A B C
d -0.935268 -0.768464 -1.343263
e -0.612629 -1.538650 -1.774796
f -1.313907 -0.472731 -1.635683
>>> df1.loc['a']
A 0.031419
B 0.658151
C 1.069829
D -1.366788
Name: a, dtype: float64
>>> df1.xs('a')
A 0.031419
B 0.658151
C 1.069829
D -1.366788
Name: a, dtype: float64
>>> df1.loc['a'] > 1
A False
B False
C True
D False
Name: a, dtype: bool
>>> df1.loc[:, df1.loc['a'] > 1]
C
a 1.069829
b 0.183858
c 0.429787
d -1.343263
e -1.774796
f -1.635683
>>> df1.loc['a', 'A']
0.03141854106892028
>>> df1.at['a', 'A']
0.03141854106892028
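Since the session above uses random numbers, here is a small reproducible sketch of the same DataFrame ideas: boolean row selection combined with a column list, assignment through a label slice, and scalar access. The frame and its values are hypothetical, chosen only to keep the results readable.

```python
import pandas as pd

# Illustrative frame with fixed values.
df = pd.DataFrame(
    {'A': [1, -2, 3], 'B': [4, 5, -6], 'C': [7, 8, 9]},
    index=list('xyz'),
)

# Rows where column A is positive, restricted to columns B and C.
positive_a = df.loc[df['A'] > 0, ['B', 'C']]

# Assignment through .loc: set column C to 0 for rows x through y (inclusive).
df.loc['x':'y', 'C'] = 0

# Scalar access: .loc and .at return the same value for a single cell.
same = df.loc['z', 'A'] == df.at['z', 'A']

print(positive_a.index.tolist())   # ['x', 'z']
print(df['C'].tolist())            # [0, 0, 9]
print(same)                        # True
```

For reading or writing a single cell, .at is the more specialized accessor; .loc is the general tool that also handles lists, slices and masks.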
One last thing:
>>> s = pd.Series(list('abcde'), index=[0, 3, 2, 5, 4])
>>> s
0 a
3 b
2 c
5 d
4 e
dtype: object
>>> s.sort_index()
0 a
2 c
3 b
4 e
5 d
dtype: object
>>> s.sort_index().loc[1:6]
2 c
3 b
4 e
5 d
dtype: object
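As shown above, we can sort by index labels and then filter the data by label. Note that neither 1 nor 6 is actually a label in this index; because the index has been sorted, .loc[1:6] selects every label that falls between those two bounds.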
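To see why the sort_index() call matters, the sketch below runs the same slice on the unsorted Series first. As far as I know, pandas raises a KeyError when a slice bound is missing from an index that is not sorted, so the code catches that and records the outcome.

```python
import pandas as pd

s = pd.Series(list('abcde'), index=[0, 3, 2, 5, 4])

# Unsorted index: 1 and 6 are not labels, so the slice bounds cannot be located.
try:
    s.loc[1:6]
    unsorted_ok = True
except KeyError:
    unsorted_ok = False

# Sorted index: missing bounds are fine, labels between them are selected.
sorted_result = s.sort_index().loc[1:6]

print(unsorted_ok)                    # False
print(sorted_result.index.tolist())   # [2, 3, 4, 5]
print(sorted_result.tolist())         # ['c', 'b', 'e', 'd']
```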
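That's all for .loc in pandas! All the code used in this article can be found on my Github. If you spot any mistakes in the article or the code, please do point them out, and feel free to leave a comment!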