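1. loc
Selects rows by their index label.
1.1 loc[1]
Selects the row whose index label is 1 (here the index holds integers). This is a label lookup, not a position lookup.
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = [0,1]
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df.loc[1])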
'''
a 4
b 5
c 6
Name: 1, dtype: int64
'''
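Because loc matches labels rather than positions, the two can disagree when integer labels are not in order. The sketch below uses a hypothetical frame (not from the original example) whose labels are deliberately reversed to make the difference visible.

```python
import pandas as pd

# Hypothetical frame: integer labels are deliberately out of order.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=[1, 0], columns=['a', 'b', 'c'])

by_label = df.loc[1]      # row whose label is 1 (the first row here)
by_position = df.iloc[1]  # second row by position (its label is 0)

print(by_label.tolist())     # [1, 2, 3]
print(by_position.tolist())  # [4, 5, 6]
```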
1.2 loc['d']
Selects the row labeled 'd' (here the index holds strings).
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
'''
a b c
d 1 2 3
e 4 5 6
'''
print (df.loc['d'])
'''
a 1
b 2
c 3
Name: d, dtype: int64
'''
With a string index, passing a row number to loc raises an error:
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
'''
a b c
d 1 2 3
e 4 5 6
'''
print(df.loc[1])
'''
TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'>
with these indexers [1] of <class 'int'>
(message from an older pandas release; recent releases raise KeyError: 1 instead)
'''
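The exact exception for a missing label depends on the pandas version, so code that has to run on several versions can catch both types. This is only a sketch of one defensive pattern; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=['d', 'e'], columns=['a', 'b', 'c'])

row = None
error_name = None
try:
    row = df.loc[1]  # 1 is not a label in the index ['d', 'e']
except (KeyError, TypeError) as exc:
    # The exception type varies between pandas versions, so both are caught.
    error_name = type(exc).__name__
    print('label 1 not found:', error_name)

# Position-based access works regardless of the label type.
second_row = df.iloc[1]
print(second_row.tolist())  # [4, 5, 6]
```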
1.3 Trying to select a column this way raises an error
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df.loc['a'])
'''
KeyError: 'the label [a] is not in the [index]'
'''
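The lookup fails because a single key passed to loc is matched against the row index. A minimal sketch of two ways to reach column 'a' instead, using the same frame as above:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=['d', 'e'], columns=['a', 'b', 'c'])

col_direct = df['a']      # [] looks the key up among the columns
col_loc = df.loc[:, 'a']  # all rows, column labeled 'a'

print(col_direct.tolist())         # [1, 4]
print(col_loc.equals(col_direct))  # True
```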
1.4 loc can select multiple rows
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
'''
a b c
d 1 2 3
e 4 5 6
'''
print (df.loc['d':])
'''
a b c
d 1 2 3
e 4 5 6
'''
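Besides an open-ended slice, loc also accepts a list of labels or a slice with both bounds. The sketch below assumes a three-row frame (one row more than the example above, added only for illustration).

```python
import pandas as pd

# Hypothetical three-row frame so that the selections differ from the whole table.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=['d', 'e', 'f'], columns=['a', 'b', 'c'])

picked = df.loc[['d', 'f']]  # explicit list of row labels
sliced = df.loc['d':'e']     # label slice from 'd' through 'e'

print(picked.index.tolist())  # ['d', 'f']
print(sliced.index.tolist())  # ['d', 'e']
```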
1.5 Extending loc: selecting specific columns of one row
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
'''
a b c
d 1 2 3
e 4 5 6
'''
print (df.loc['d',['b','c']])
'''
b 2
c 3
Name: d, dtype: int64
'''
1.6 Extending loc: selecting a column
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
'''
a b c
d 1 2 3
e 4 5 6
'''
print (df.loc[:,['c']])
'''
c
d 3
e 6
'''
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])
print(df)
print(df.loc[['viper', 'sidewinder'], ['max_speed', 'shield']])
'''
max_speed shield
cobra 1 2
viper 4 5
sidewinder 7 8
max_speed shield
viper 4 5
sidewinder 7 8
'''
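Of course, the most direct way to get a column is df[column_label]. The loc form is an alternative when the labels are only known at run time (for example, passed in as a list) or when rows are selected at the same time.
Note that a label-based slice such as df.loc[1:3] includes both endpoints (labels 1, 2 and 3), unlike ordinary Python slicing, which excludes the end.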
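The endpoint behavior is easiest to see side by side. The frame below is a hypothetical one with integer labels 0 to 4, chosen so that labels and positions coincide and only the slice semantics differ.

```python
import pandas as pd

# Hypothetical frame: labels 0..4 match the positions 0..4.
df = pd.DataFrame({'x': [10, 20, 30, 40, 50]}, index=[0, 1, 2, 3, 4])

label_slice = df.loc[1:3, 'x']      # labels 1, 2 and 3 (end included)
position_slice = df.iloc[1:3, 0]    # positions 1 and 2 (end excluded)

print(label_slice.tolist())     # [20, 30, 40]
print(position_slice.tolist())  # [20, 30]
```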
2. iloc
Selects rows by integer position (row number).
2.1 Pass the position of the row you want
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
'''
a b c
d 1 2 3
e 4 5 6
'''
print (df.iloc[1])
'''
a 4
b 5
c 6
Name: e, dtype: int64
'''
2.2 Indexing by row label raises an error
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
'''
a b c
d 1 2 3
e 4 5 6
'''
print (df.iloc['e'])
'''
TypeError: Cannot index by location index with a non-integer key
'''
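When only the label is known but a position is needed, the index can translate one into the other. The sketch below uses Index.get_loc, which returns an integer position when the label is unique in the index (an assumption that holds for this frame).

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=['d', 'e'], columns=['a', 'b', 'c'])

pos = df.index.get_loc('e')  # position of label 'e' (unique label assumed)
row = df.iloc[pos]           # same row that df.loc['e'] would return

print(pos)           # 1
print(row.tolist())  # [4, 5, 6]
```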
2.3 Row positions can likewise select multiple rows
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
'''
a b c
d 1 2 3
e 4 5 6
'''
print (df.iloc[0:])
'''
a b c
d 1 2 3
e 4 5 6
'''
2.4 Selecting columns with iloc
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
'''
a b c
d 1 2 3
e 4 5 6
'''
print (df.iloc[:,[1]])
'''
b
d 2
e 5
'''
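iloc takes a row position and a column position together, and each of them may be an integer, a slice, or a list. A short sketch of the three forms on the same frame:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=['d', 'e'], columns=['a', 'b', 'c'])

cell = df.iloc[0, 1]               # single value: row 0, column 1
first_two_cols = df.iloc[:, 0:2]   # all rows, columns 0 and 1 (end excluded)
corners = df.iloc[[0, 1], [0, 2]]  # rows 0 and 1, columns 0 and 2

print(cell)                             # 2
print(first_two_cols.columns.tolist())  # ['a', 'b']
print(corners.values.tolist())          # [[1, 3], [4, 6]]
```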
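3. ix: mixed indexing that combines the previous two
Note: .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the examples in this section only run on older releases. Use loc or iloc in new code.
3.1 Indexing by row number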
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
'''
a b c
d 1 2 3
e 4 5 6
'''
print(df.ix[1])
'''
a 4
b 5
c 6
Name: e, dtype: int64
'''
3.2 Indexing by row label
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
'''
a b c
d 1 2 3
e 4 5 6
'''
print(df.ix['e'])
'''
a 4
b 5
c 6
Name: e, dtype: int64
'''
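Since .ix is not available in pandas 1.0 and later, the same lookups can be written with loc and iloc. The sketch below shows one possible translation, including a mixed case (row by label, column by position); it is an illustration, not the only way to do it.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=['d', 'e'], columns=['a', 'b', 'c'])

by_position = df.iloc[1]  # replaces df.ix[1]
by_label = df.loc['e']    # replaces df.ix['e']

# Mixed lookup: convert the position to a label (or the label to a position) first.
mixed = df.loc['e', df.columns[1]]
mixed_alt = df.iloc[df.index.get_loc('e'), 1]

print(by_position.equals(by_label))  # True
print(mixed, mixed_alt)              # 5 5
```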
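[ ], or __getitem__()
Unlike the row-first indexers above, [ ] is column-first and cannot index along several axes at once. It also accepts a slice object (by label or by integer position) or a boolean array, in which case it selects rows. The supported forms are:
- column label (selects one column): df['A']
- list of column labels (selects several columns): df[['A','B']]
- slice object (selects rows): df[:3], df['a':'c']
- boolean array (selects rows): df[df.A>0.5]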
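The four forms listed above can be tried on a small frame. The data below is hypothetical, with string row labels so that both the positional and the label slice are meaningful.

```python
import pandas as pd

# Hypothetical data: column 'A' holds floats for the boolean filter.
df = pd.DataFrame({'A': [0.1, 0.7, 0.9, 0.3], 'B': [1, 2, 3, 4]},
                  index=['a', 'b', 'c', 'd'])

one_col = df['A']          # column label -> Series
two_cols = df[['A', 'B']]  # list of column labels -> DataFrame
first_rows = df[:3]        # integer slice -> first three rows
label_rows = df['a':'c']   # label slice -> rows 'a' through 'c'
filtered = df[df.A > 0.5]  # boolean array -> rows where A > 0.5

print(first_rows.index.tolist())  # ['a', 'b', 'c']
print(label_rows.index.tolist())  # ['a', 'b', 'c']
print(filtered.index.tolist())    # ['b', 'c']
```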
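[ ] slicing
Square brackets can slice a DataFrame, somewhat like slicing a Python list. Depending on what is passed, they select rows, columns, or a block of both.
# Row selection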
In [7]: data[1:5]
Out[7]:
fecha rnd_1 rnd_2 rnd_3
1 2012-04-11 1 16 3
2 2012-04-12 7 6 1
3 2012-04-13 2 16 7
4 2012-04-14 4 17 7
# Column selection
In [10]: data[['rnd_1', 'rnd_3']]
Out[10]:
rnd_1 rnd_3
0 8 12
1 1 3
2 7 1
3 2 7
4 4 7
5 12 8
# Block selection
In [11]: data[:7][['rnd_1', 'rnd_2']]
Out[11]:
rnd_1 rnd_2
0 8 17
1 1 16
2 7 6
3 2 16
4 4 17
5 12 19
6 2 7
However, several columns cannot be selected with a slice inside the list, the way 1:5 selects rows:
In [12]: data[['rnd_1':'rnd_3']]
File "<ipython-input-13-6291b6a83eb0>", line 1
data[['rnd_1':'rnd_3']]
^
SyntaxError: invalid syntax
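A slice over column labels is possible with loc, or by position with iloc. The frame below is a hypothetical two-row stand-in that reuses the column names and two of the rows shown in the output above.

```python
import pandas as pd

# Hypothetical stand-in for `data`, reusing the column names from the session above.
data = pd.DataFrame({'fecha': ['2012-04-11', '2012-04-12'],
                     'rnd_1': [1, 7],
                     'rnd_2': [16, 6],
                     'rnd_3': [3, 1]},
                    index=[1, 2])

by_label = data.loc[:, 'rnd_1':'rnd_3']  # label slice over columns, end included
by_position = data.iloc[:, 1:4]          # positions 1 to 3, end excluded

print(by_label.columns.tolist())     # ['rnd_1', 'rnd_2', 'rnd_3']
print(by_label.equals(by_position))  # True
```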
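.at and .iat
These are similar to loc and iloc, but return a single value: .at, like loc, looks a value up by labels only, while .iat, like iloc, looks a value up by row number and column number.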
df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  columns=['A', 'B', 'C'])
print(df)
'''
A B C
0 0 2 3
1 0 4 1
2 10 20 30
'''
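print(df.at[2, 'B'])
# 20
print(df.at[2, 2])
# ValueError: At based indexing on an non-integer index can only have non-integer indexers
# (older pandas; recent versions raise KeyError: 2)
# get_value was deprecated in pandas 0.21 and removed in 1.0; use .at instead
print(df.get_value(2,'B'))
# 20
print(df.get_value(2,2))
# fails as well: get_value can only look values up by label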
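For comparison, the same cell can be read by position with .iat. A minimal sketch on the frame used above:

```python
import pandas as pd

df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  columns=['A', 'B', 'C'])

by_label = df.at[2, 'B']    # row label 2, column label 'B'
by_position = df.iat[2, 1]  # row position 2, column position 1

print(by_label)     # 20
print(by_position)  # 20
```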