Series indexing (obj[…]) works much like NumPy array indexing, except that the index values of a Series are not limited to integers. Here are a few examples:
import pandas as pd
import numpy as np

obj = pd.Series(np.arange(4.0), index=['a', 'b', 'c', 'd'])
print(obj)
Result:
a 0.0
b 1.0
c 2.0
d 3.0
dtype: float64
obj['b']
# 1.0
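obj.iloc[1]  # by position; plain obj[1] also falls back to position, but that is deprecated in recent pandas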
# 1.0
obj[2:4]
# c 2.0
# d 3.0
obj[['b','a','d']]
# b 1.0
# a 0.0
# d 3.0
obj.iloc[[1, 3]]  # list of positions
# b 1.0
# d 3.0
obj[obj<2]
# a 0.0
# b 1.0
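The same selections can be written with the explicit accessors .loc (labels) and .iloc (integer positions), so obj[...] never has to guess whether a key is a label or a position. A minimal sketch reproducing the examples above:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.0), index=['a', 'b', 'c', 'd'])

# Label-based lookup through .loc
by_label = obj.loc['b']
# Position-based lookup through .iloc (never interpreted as a label)
by_position = obj.iloc[1]

# Both accessors also accept lists
picked_labels = obj.loc[['b', 'a', 'd']]
picked_positions = obj.iloc[[1, 3]]

# A boolean Series aligned on the same index acts as a row filter
small = obj[obj < 2]

print(by_label, by_position)             # 1.0 1.0
print(picked_labels.tolist())            # [1.0, 0.0, 3.0]
print(picked_positions.index.tolist())   # ['b', 'd']
print(small.index.tolist())              # ['a', 'b']
```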
Slicing with labels differs from ordinary Python slicing: the end point is inclusive.
obj['b':'c']
# b 1.0
# c 2.0
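To see the difference side by side, the sketch below slices the same Series once by label and once by position. The label slice keeps its end label, while the positional slice stops before its end position, exactly like a Python list.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.0), index=['a', 'b', 'c', 'd'])

# Label slice: the end label 'c' is included
label_slice = obj.loc['a':'c']
# Positional slice: position 2 (label 'c') is excluded
position_slice = obj.iloc[0:2]

print(label_slice.index.tolist())     # ['a', 'b', 'c']
print(position_slice.index.tolist())  # ['a', 'b']
```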
Setting values through these selections is just as simple:
obj['b':'c'] = 5
print(obj)
Result:
a 0.0
b 5.0
c 5.0
d 3.0
dtype: float64
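Assignment accepts the same selectors as reading, including a boolean mask, and it modifies the Series in place. A small sketch combining a label slice with a mask:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.0), index=['a', 'b', 'c', 'd'])

# Assign through an inclusive label slice: 'b' and 'c' become 5.0
obj.loc['b':'c'] = 5
# Assign through a boolean mask: every value greater than 4 becomes -1.0
obj[obj > 4] = -1

print(obj.tolist())  # [0.0, -1.0, -1.0, 3.0]
```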
Indexing a DataFrame with [] actually retrieves one or more columns:
data = pd.DataFrame(np.arange(16).reshape((4,4)),
index=['Ohio','Colorado','Utah','New York'],
columns=['one','two','three','four'])
# one two three four
# Ohio 0 1 2 3
# Colorado 4 5 6 7
# Utah 8 9 10 11
# New York 12 13 14 15
data['two']
# Ohio 1
# Colorado 5
# Utah 9
# New York 13
data[['three','one']]
# three one
# Ohio 2 0
# Colorado 6 4
# Utah 10 8
# New York 14 12
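The shape of the result depends on the key: a single label returns a Series, while a list of labels (even a one-element list) returns a DataFrame whose columns follow the order of the list. A short check:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

col = data['two']               # single column label -> Series
cols = data[['three', 'one']]   # list of labels -> DataFrame, in list order
single = data[['two']]          # one-element list -> still a DataFrame

print(type(col).__name__, type(cols).__name__, type(single).__name__)
# Series DataFrame DataFrame
print(cols.columns.tolist())    # ['three', 'one']
```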
This style of indexing has a few special cases. First, a slice or a boolean array selects rows:
print(data[:2])
Result:
one two three four
Ohio 0 1 2 3
Colorado 4 5 6 7
print(data[data['three'] > 5])
Result:
one two three four
Colorado 4 5 6 7
Utah 8 9 10 11
New York 12 13 14 15
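The sketch below checks these special cases: a slice inside [] picks the same rows as .iloc, a boolean Series filters rows, but a single scalar inside [] is still looked up among the columns, so passing a row label raises KeyError.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# A slice inside [] selects rows by position, the same rows as .iloc[:2]
first_two = data[:2]
same_as_iloc = first_two.equals(data.iloc[:2])

# A boolean Series inside [] filters rows
filtered = data[data['three'] > 5]

# A single scalar inside [] is treated as a column key,
# so the row label 'Ohio' is not found
try:
    data['Ohio']
    row_label_raised = False
except KeyError:
    row_label_raised = True

print(same_as_iloc)              # True
print(filtered.index.tolist())   # ['Colorado', 'Utah', 'New York']
print(row_label_raised)          # True
```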
Another use is indexing with a boolean DataFrame, such as one produced by a scalar comparison:
data<5
# one two three four
# Ohio True True True True
# Colorado True False False False
# Utah False False False False
# New York False False False False
data[data<5] = 0
print(data)
Result:
one two three four
Ohio 0 0 0 0
Colorado 0 5 6 7
Utah 8 9 10 11
New York 12 13 14 15
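The assignment above changes data in place. When the original frame should stay untouched, DataFrame.mask (the complement of DataFrame.where) builds the same result as a new object. A minimal sketch comparing the two approaches:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# In-place: assign 0 wherever the boolean DataFrame is True
in_place = data.copy()
in_place[in_place < 5] = 0

# Non-mutating: mask() returns a new frame with 0 where the condition holds
masked = data.mask(data < 5, 0)

print(np.array_equal(masked.to_numpy(), in_place.to_numpy()))  # True
print(data.loc['Ohio'].tolist())  # [0, 1, 2, 3] (original unchanged)
```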
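For label-based indexing on the rows of a DataFrame, pandas introduced the special indexing field ix. It let you select a subset of rows and columns from a DataFrame using NumPy-style notation together with axis labels. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0; current code uses loc for label-based selection and iloc for integer-position selection.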
data.loc[['Colorado', 'Utah'], ['four', 'one', 'two']]  # by position: data.iloc[[1, 2], [3, 0, 1]]
# four one two
# Colorado 7 0 5
# Utah 11 8 9
data.loc['Colorado', ['two', 'three']]
# two 5
# three 6
# Name: Colorado, dtype: int32
data.iloc[2]  # third row by position
# one 8
# two 9
# three 10
# four 11
# Name: Utah, dtype: int32
data.loc[:'Utah', 'two']  # label slice, 'Utah' included
# Ohio 0
# Colorado 5
# Utah 9
# Name: two, dtype: int32
data.loc[data['three'] > 5, data.columns[:3]]  # boolean rows, first three columns
# one two three
# Colorado 0 5 6
# Utah 8 9 10
# New York 12 13 14
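One behavioral difference matters when migrating from ix: .loc raises a KeyError when a requested label is missing from the index (for example a misspelled 'Utath'), instead of silently returning a row of NaN. When NaN-filled rows for unknown labels are actually wanted, reindex asks for them explicitly. A sketch of both behaviors on the modified data:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data[data < 5] = 0

# .loc refuses a list that contains a label not present in the index
try:
    data.loc[['Colorado', 'Utath'], ['four', 'one', 'two']]
    missing_raised = False
except KeyError:
    missing_raised = True

# reindex keeps the known label and fills the unknown one with NaN
subset = data.reindex(index=['Colorado', 'Utath'],
                      columns=['four', 'one', 'two'])

print(missing_raised)                    # True
print(subset.loc['Colorado'].tolist())   # [7.0, 0.0, 5.0]
print(subset.loc['Utath'].isna().all())  # True
```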