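By maymay: pandas' default bracket indexing selects columns. For example, df['x'] returns the column labelled 'x' as a Series, while df[['x']] returns it as a DataFrame. For rows, plain brackets only accept a slice of row numbers: df[0:3] returns rows 0 through 2, and the result is a DataFrame, not a Series. (df[[0:3]] is not valid Python syntax.)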
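A minimal sketch of the three bracket forms described above. It uses a hypothetical 5-row frame with columns 'x' and 'y':

```python
import pandas as pd

# Hypothetical 5-row frame with columns 'x' and 'y'
df = pd.DataFrame({'x': [10, 20, 30, 40, 50], 'y': list('abcde')})

col_series = df['x']    # single column label -> Series
col_frame = df[['x']]   # list of column labels -> DataFrame
rows = df[0:3]          # slice of row numbers -> rows 0, 1, 2 as a DataFrame

print(type(col_series).__name__)        # Series
print(type(col_frame).__name__)         # DataFrame
print(type(rows).__name__, len(rows))   # DataFrame 3
```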
- loc: select rows by row label
1.1 loc[1] selects the row whose label is 1 (integer index)
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = [0,1]
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df.loc[1])
a 4
b 5
c 6
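loc[1] works here only because the label 1 happens to exist in the index. loc treats an integer as a label, never as a position. The following sketch shows this with hypothetical integer labels 10 and 20 that do not match the row positions:

```python
import pandas as pd

data = [[1, 2, 3], [4, 5, 6]]
# Hypothetical integer labels that do not match row positions
df = pd.DataFrame(data=data, index=[10, 20], columns=['a', 'b', 'c'])

row = df.loc[20]      # label 20 -> the second row
print(row.tolist())   # [4, 5, 6]

try:
    df.loc[1]         # 1 is not a label in this index
except KeyError as exc:
    print('KeyError:', exc)
```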
1.2 loc['d'] selects the row labelled 'd' (string index)
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df.loc['d'])
a 1
b 2
c 3
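The shape of the result depends on what you pass to loc. A single label returns a Series, while a list of labels (even one with a single element) returns a DataFrame. A small sketch using the same frame:

```python
import pandas as pd

data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data=data, index=['d', 'e'], columns=['a', 'b', 'c'])

as_series = df.loc['d']     # one label -> Series indexed by the column names
as_frame = df.loc[['d']]    # list with one label -> one-row DataFrame

print(type(as_series).__name__, as_series.name)  # Series d
print(type(as_frame).__name__, as_frame.shape)   # DataFrame (1, 3)
```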
1.3 Selecting a column this way raises an error, because loc looks for 'a' among the row labels
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df.loc['a'])
KeyError: 'the label [a] is not in the [index]'
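To reach column 'a' through loc, put the column label in the second position, after a row selector. The sketch below compares this with plain bracket indexing:

```python
import pandas as pd

data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data=data, index=['d', 'e'], columns=['a', 'b', 'c'])

# loc takes the row selector first, so the column label goes in the second slot
col_a = df.loc[:, 'a']
same = df['a']             # plain [] indexing looks up column labels

print(col_a.tolist())      # [1, 4]
print(col_a.equals(same))  # True
```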
1.4 loc can select multiple rows
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df.loc['d':])
a b c
d 1 2 3
e 4 5 6
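Besides a slice, loc accepts other row selectors that return several rows. The sketch below shows two of them: a list of labels, which keeps the list's order, and a boolean Series aligned on the index:

```python
import pandas as pd

data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data=data, index=['d', 'e'], columns=['a', 'b', 'c'])

picked = df.loc[['e', 'd']]       # list of labels; rows come back in list order
filtered = df.loc[df['a'] > 1]    # boolean mask; keeps rows where the mask is True

print(picked.index.tolist())    # ['e', 'd']
print(filtered.index.tolist())  # ['e']
```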
1.5 Extending loc: selecting a row and specific columns
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df.loc['d',['b','c']])
b 2
c 3
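Note that the index here was set to a list of strings. With the default integer index, loc can select a row and some of its columns in the same way: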
import pandas as pd
data = [[1,2,3],[4,5,6]]
columns=['a','b','c']
df = pd.DataFrame(data=data, columns=columns)
print(df.loc[0,['b','c']])
The output is a Series:
b 2
c 3
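Both examples above pass a list of column labels. As a sketch, the column selector can also be a label slice (both ends included), and lists on both axes return a DataFrame:

```python
import pandas as pd

data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data=data, index=['d', 'e'], columns=['a', 'b', 'c'])

part = df.loc['d', 'b':'c']             # label slice over columns; 'c' is included
block = df.loc[['d', 'e'], ['a', 'c']]  # lists on both axes -> DataFrame

print(part.tolist())   # [2, 3]
print(block.shape)     # (2, 2)
```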
loc can also select the single value at a given row and column:
import pandas as pd
data = [[1,2,3],[4,5,6]]
columns=['a','b','c']
df = pd.DataFrame(data=data, columns=columns)
print(df.loc[0,'b'])
The output is the value at that position (first row, second column):
2
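For a single cell, pandas also provides dedicated scalar accessors: .at, which is label-based like loc, and .iat, which is position-based like iloc. A sketch comparing them on the same frame:

```python
import pandas as pd

data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data=data, columns=['a', 'b', 'c'])

v_loc = df.loc[0, 'b']   # label-based, as above
v_at = df.at[0, 'b']     # label-based scalar accessor
v_iat = df.iat[0, 1]     # position-based scalar accessor (row 0, column 1)

print(v_loc, v_at, v_iat)  # 2 2 2
```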
1.6 Extending loc: selecting a column
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df.loc[:,['c']])
c
d 3
e 6
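As with rows, the type of a column selection depends on the selector. A single label after the colon returns a Series, a list returns a DataFrame, and a label slice returns a range of columns. A short sketch:

```python
import pandas as pd

data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data=data, index=['d', 'e'], columns=['a', 'b', 'c'])

col_series = df.loc[:, 'c']     # single label -> Series
col_frame = df.loc[:, ['c']]    # list of labels -> DataFrame
first_two = df.loc[:, 'a':'b']  # label slice, 'b' included

print(type(col_series).__name__, col_series.tolist())  # Series [3, 6]
print(type(col_frame).__name__, col_frame.shape)       # DataFrame (2, 1)
print(first_two.columns.tolist())                      # ['a', 'b']
```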
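Of course, the most direct way to get a column is df[column_label]. The loc form is handy when the row selection and the column selection need to go into the same call.
Note that label slices with loc include both endpoints: on an integer index, df.loc[1:3] returns the rows labelled 1, 2 and 3, unlike ordinary Python slicing.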
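The difference is easiest to see by putting the same slice through loc and iloc on a default integer index. The 5-row frame below is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'v': [10, 20, 30, 40, 50]})  # default index 0..4

by_label = df.loc[1:3]    # labels 1, 2, 3: end label included
by_pos = df.iloc[1:3]     # positions 1, 2: end position excluded

print(by_label.index.tolist())  # [1, 2, 3]
print(by_pos.index.tolist())    # [1, 2]
```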
- iloc: select rows by row number (position)
2.1 Pass the number of the row you want
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df.iloc[1])
a 4
b 5
c 6
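Because iloc is purely positional, it also accepts negative positions counted from the end, as with Python lists. The returned Series is still named after the row's label. A sketch:

```python
import pandas as pd

data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data=data, index=['d', 'e'], columns=['a', 'b', 'c'])

last = df.iloc[-1]      # last row by position
print(last.tolist())    # [4, 5, 6]
print(last.name)        # e  (the row label, even though we selected by position)
```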
2.2 Passing a label (a string) to iloc raises an error
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df.iloc['a'])
TypeError: cannot do label indexing on <class 'pandas.core.index.Index'> with these indexers [a] of <type 'str'>
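If you only know a row's label but need iloc, one option is to convert the label to a position first with Index.get_loc. A sketch, assuming the labels are unique (for a unique index, get_loc returns a single integer):

```python
import pandas as pd

data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data=data, index=['d', 'e'], columns=['a', 'b', 'c'])

pos = df.index.get_loc('d')   # label 'd' -> position 0
row = df.iloc[pos]

print(pos, row.tolist())      # 0 [1, 2, 3]
```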
2.3 Likewise, a range of row numbers selects multiple rows
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df.iloc[0:])
a b c
d 1 2 3
e 4 5 6
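Unlike loc, an iloc slice excludes its end position, and a list of positions returns rows in the order given. A sketch on the same frame:

```python
import pandas as pd

data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data=data, index=['d', 'e'], columns=['a', 'b', 'c'])

first_only = df.iloc[0:1]    # end position 1 excluded -> only the first row
reordered = df.iloc[[1, 0]]  # list of positions, returned in list order

print(first_only.index.tolist())  # ['d']
print(reordered.index.tolist())   # ['e', 'd']
```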
2.4 Selecting columns with iloc
import pandas as pd
data = [[1,2,3],[4,5,6]]
index = ['d','e']
columns=['a','b','c']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df.iloc[:,[1]])
b
d 2
e 5
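The iloc column selector mirrors loc but uses positions. A bare integer returns a Series, a list returns a DataFrame, two integers return a single value, and slices on both axes return a sub-block. A sketch:

```python
import pandas as pd

data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data=data, index=['d', 'e'], columns=['a', 'b', 'c'])

col_series = df.iloc[:, 1]     # column at position 1 as a Series
col_frame = df.iloc[:, [1]]    # same column as a one-column DataFrame
cell = df.iloc[1, 2]           # row 1, column 2 -> single value
sub = df.iloc[0:2, 0:2]        # first two rows and first two columns

print(col_series.tolist(), col_frame.shape)  # [2, 5] (2, 1)
print(cell, sub.shape)                       # 6 (2, 2)
```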
ix is deprecated (it was removed in pandas 1.0); use loc and iloc instead.
Example
Suppose dfPapers is a DataFrame with a 'citation' column.
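ix allowed a row position to be mixed with a column label in one call. A common way to express that with the remaining indexers is to convert one side so that both are labels or both are positions. A sketch of two equivalent rewrites (the old ix call is shown only as a comment):

```python
import pandas as pd

data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data=data, index=['d', 'e'], columns=['a', 'b', 'c'])

# Old mixed style, no longer available: df.ix[0, 'b']
# Option 1: turn the row position into a label, then use loc
v1 = df.loc[df.index[0], 'b']
# Option 2: turn the column label into a position, then use iloc
v2 = df.iloc[0, df.columns.get_loc('b')]

print(v1, v2)  # 2 2
```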
cite = dfPapers.loc[1,['citation']]
print(type(cite))
print(list(cite))
<class 'pandas.core.series.Series'>
['1775098\n,1648782\n,2206608\n,2857290\n,2674433\n,85903\n,3075632\n,1229189\n,']
cite = dfPapers.loc[1,'citation']
print(type(cite)) #<class 'str'>
print(cite)
<class 'str'>
1775098
,1648782
,2206608
,2857290
,2674433
,85903
,3075632
,1229189
,
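The example above shows that loc[1, ['citation']] wraps the cell in a Series, while loc[1, 'citation'] returns the raw string. The string form is the one to parse. The sketch below builds a hypothetical stand-in for dfPapers that contains only the 'citation' column, filled with the string format shown above. It then splits the string into IDs. The parsing rule (split on commas, strip newlines, drop empty pieces) is an assumption based on that format:

```python
import pandas as pd

# Hypothetical stand-in for dfPapers: only the 'citation' column is modelled,
# using the "ID\n," format shown above; row 0 is a made-up empty entry
dfPapers = pd.DataFrame({
    'citation': [
        '',
        '1775098\n,1648782\n,2206608\n,2857290\n,2674433\n,85903\n,3075632\n,1229189\n,',
    ]
})

cell = dfPapers.loc[1, 'citation']        # scalar column label -> the str itself
wrapped = dfPapers.loc[1, ['citation']]   # list of column labels -> Series

# Split on commas, strip the newlines, and drop the empty trailing piece
ids = [part.strip() for part in cell.split(',') if part.strip()]

print(type(cell).__name__, type(wrapped).__name__)  # str Series
print(ids[:3], len(ids))  # ['1775098', '1648782', '2206608'] 8
```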
Reference: http://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing