Indexers
1. Indexers
Column indexing on a table:
To take a single column out of a table, pass its column name in brackets:
import pandas
data=pandas.read_csv('/Users/liubingfeng/Desktop/test.csv')
print(data.head())
print(data['long'].head())
long lat start_time_format end_time_format
0 116.864643 38.310846 2020/11/23 6:35 2020/11/23 7:25
1 116.864762 38.311371 2020/11/23 7:08 2020/11/23 7:25
2 116.831244 38.309362 2020/11/23 8:23 2020/11/23 8:40
3 116.831309 38.309470 2020/11/23 8:23 2020/11/23 8:40
4 116.831309 38.309470 2020/11/23 8:23 2020/11/23 8:40
0 116.864643
1 116.864762
2 116.831244
3 116.831309
4 116.831309
Name: long, dtype: float64
Selecting multiple columns (pass a list of names):
print(data[['long','lat']].head())
long lat
0 116.864643 38.310846
1 116.864762 38.311371
2 116.831244 38.309362
3 116.831309 38.309470
4 116.831309 38.309470
A column can also be accessed with dot (attribute) notation:
print(data.long.head())
0 116.864643
1 116.864762
2 116.831244
3 116.831309
4 116.831309
Name: long, dtype: float64
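Dot notation only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method. Names that contain spaces cannot be accessed this way, and for non-ASCII names such as Chinese the bracket form is the safer choice.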
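A minimal sketch of this limitation, using a small made-up DataFrame. The column names `trip count` and `count` are illustrative assumptions and do not come from the CSV above.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "long": [116.86, 116.83],
    "trip count": [3, 5],   # name contains a space
    "count": [1, 2],        # name clashes with the DataFrame.count method
})

# Attribute access works for a plain identifier such as "long".
same = df.long.equals(df["long"])

# A name with a space is not a valid attribute name, so use brackets.
trips = df["trip count"]

# df.count resolves to the DataFrame method, not the column.
count_is_method = callable(df.count)
count_col = df["count"]
```

Bracket access behaves the same for every column name, which is why it is the safer default in reusable code.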
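Row indexing on a Series:
When a single label is requested, the result is a scalar if the label matches exactly one entry, and a Series if it matches several entries: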
a=pandas.Series([0.1,1,-2,3.14159,4,5],index=['a','a','a','b','b','c'])
print(a['a'])
a 0.1
a 1.0
a -2.0
dtype: float64
print(a['c'])
5.0
Multiple labels:
print(a[['b','c']])
b 3.14159
b 4.00000
c 5.00000
dtype: float64
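To take the elements between two labels, use a slice. This requires that each endpoint label is unique in the index or that the index is sorted (as it is here, which is why the duplicated label 'b' works as an endpoint). Note that a label slice includes both endpoints.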
print(a['b':'c':2])
b 3.14159
c 5.00000
dtype: float64
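The sketch below illustrates the endpoint rule for label slices. It reuses the values from the example above; the unsorted index is an assumption added for contrast.

```python
import pandas as pd

values = [0.1, 1, -2, 3.14159, 4, 5]

# Sorted index: the duplicated label 'b' is still a valid slice endpoint.
s_sorted = pd.Series(values, index=["a", "a", "a", "b", "b", "c"])
inclusive = s_sorted["b":"c"]   # both endpoints are included

# Unsorted index with a duplicated endpoint: the slice bound is ambiguous.
s_unsorted = pd.Series(values, index=["b", "a", "a", "b", "a", "c"])
try:
    s_unsorted["b":"c"]
    raised = False
except KeyError:
    raised = True
```

Sorting the index first (for example with `sort_index()`) is one way to make such slices well defined.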
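When data is read in with one of the reader functions and no column is designated as the index, a default integer index starting at 0 is generated.
The index can also be specified explicitly: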
a=pandas.Series(data=[1,2,'q',[4,5],{'name':'chen'}],index=[1,2,3,4,5],
dtype='object',name='chen')
print(a[4])
[4, 5]
print(a[[2,3]])
2 2
3 q
Name: chen, dtype: object
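With an integer index, a[4] above is interpreted as the label 4, not as the position 4. A small sketch, on made-up values, of how `loc` and `iloc` make that choice explicit:

```python
import pandas as pd

# Integer labels that start at 1, so label and position differ.
s = pd.Series(["p", "q", "r", "s"], index=[1, 2, 3, 4])

by_label = s.loc[4]       # label 4, the last element
by_position = s.iloc[3]   # position 3, the same element
pair = s.loc[[2, 3]]      # a list of labels returns a Series
```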
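The loc indexer:
The general form of the loc indexer is loc[*, *], where the first * selects rows and the second * selects columns. If the second position is omitted and only loc[*] is written, the * filters rows.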
data=data.set_index('end_time_format')
print(data.head())
print(data.loc['early'])
long lat start_time_format
end_time_format
early 116.864643 38.310846 2020/11/23 6:35
early 116.864762 38.311371 2020/11/23 7:08
late 116.831244 38.309362 2020/11/23 8:23
late 116.831309 38.309470 2020/11/23 8:23
late 116.831309 38.309470 2020/11/23 8:23
long lat start_time_format
end_time_format
early 116.864643 38.310846 2020/11/23 6:35
early 116.864762 38.311371 2020/11/23 7:08
print(data.loc['early','long'])
end_time_format
early 116.864643
early 116.864762
Name: long, dtype: float64
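Selecting rows and columns together; here, the first four rows of the long and lat columns (the outputs from this point on show the default integer index, that is, data as read from the CSV before set_index):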
print(data.loc[0:3,['long','lat']])
long lat
0 116.864643 38.310846
1 116.864762 38.311371
2 116.831244 38.309362
3 116.831309 38.309470
Slicing columns as well:
print(data.loc[0:3,'long':'start_time_format'])
long lat start_time_format
0 116.864643 38.310846 2020/11/23 6:35
1 116.864762 38.311371 2020/11/23 7:08
2 116.831244 38.309362 2020/11/23 8:23
3 116.831309 38.309470 2020/11/23 8:23
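If a DataFrame has an integer index, integer slices in loc follow the same rule as the string label slices above: they slice by label and include both endpoints, so 0:3 returns the first four rows. The start and end labels must not be duplicated in the index (unless the index is sorted).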
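For contrast, a short sketch of label slicing with `loc` against positional slicing with `iloc`. The coordinates are made up for illustration.

```python
import pandas as pd

# Made-up coordinates; the default integer index 0..4 is used.
df = pd.DataFrame({
    "long": [116.1, 116.2, 116.3, 116.4, 116.5],
    "lat": [38.1, 38.2, 38.3, 38.4, 38.5],
})

by_label = df.loc[0:3, ["long", "lat"]]   # labels 0, 1, 2, 3: four rows
by_position = df.iloc[0:3]                # positions 0, 1, 2: three rows
```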
Filtering by a condition:
print(data.loc[data.long>116.85])
long lat start_time_format end_time_format
0 116.864643 38.310846 2020/11/23 6:35 2020/11/23 7:25
1 116.864762 38.311371 2020/11/23 7:08 2020/11/23 7:25
13 116.864731 38.311345 2020/11/23 10:11 2020/11/23 10:36
Compound conditions:
c_1=data.long>116.85
c_2=data.lat<40
c_3=(data.long+data.lat)<200
print(data.loc[(c_1 | c_2)&c_3])
long lat start_time_format end_time_format
0 116.864643 38.310846 2020/11/23 6:35 2020/11/23 7:25
1 116.864762 38.311371 2020/11/23 7:08 2020/11/23 7:25
2 116.831244 38.309362 2020/11/23 8:23 2020/11/23 8:40
3 116.831309 38.309470 2020/11/23 8:23 2020/11/23 8:40
4 116.831309 38.309470 2020/11/23 8:23 2020/11/23 8:40
5 116.831495 38.309779 2020/11/23 8:30 2020/11/23 8:40
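When conditions are written inline instead of being stored in variables first, each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` and `<`. A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "long": [116.86, 116.83, 116.90],
    "lat": [38.31, 41.00, 39.50],
})

# Each comparison is wrapped in parentheses before combining with &.
mask = (df["long"] > 116.85) & (df["lat"] < 40)
selected = df.loc[mask]

# ~ negates a mask: the rows that do not satisfy both conditions.
rest = df.loc[~mask]
```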
Wrapping the condition in a function (loc calls it with the DataFrame as the argument):
def con(x):
    c_1=x.long>116.85
    c_2=x.lat<40
    c_3=(x.long+x.lat)<200
    return (c_1 | c_2)&c_3
print(data.loc[con])
long lat start_time_format end_time_format
0 116.864643 38.310846 2020/11/23 6:35 2020/11/23 7:25
1 116.864762 38.311371 2020/11/23 7:08 2020/11/23 7:25
2 116.831244 38.309362 2020/11/23 8:23 2020/11/23 8:40
3 116.831309 38.309470 2020/11/23 8:23 2020/11/23 8:40
4 116.831309 38.309470 2020/11/23 8:23 2020/11/23 8:40
5 116.831495 38.309779 2020/11/23 8:30 2020/11/23 8:40
Lambda expressions (the return value must be a valid loc argument, such as a label or a slice object):
print(data.loc[lambda x:3,lambda x:'long'])
116.831309
print(data.loc[lambda x:slice(1,3)])
long lat start_time_format end_time_format
1 116.864762 38.311371 2020/11/23 7:08 2020/11/23 7:25
2 116.831244 38.309362 2020/11/23 8:23 2020/11/23 8:40
3 116.831309 38.309470 2020/11/23 8:23 2020/11/23 8:40
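Callables are most useful inside a method chain, where the intermediate DataFrame has no variable name to refer to. A sketch under that assumption; the column name `total` and the threshold are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "long": [116.86, 116.83, 116.90],
    "lat": [38.31, 38.30, 38.35],
})

# Each lambda receives the DataFrame produced by the previous step.
result = (
    df.assign(total=lambda x: x["long"] + x["lat"])
      .loc[lambda x: x["total"] > 155.15, ["long", "total"]]
)
```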
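When assigning to a DataFrame or Series, assign directly after a single indexer call. With chained indexing, the assignment lands on a temporary copy, so the original elements are not actually modified and pandas emits a SettingWithCopyWarning.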
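A minimal sketch of the recommended single-step assignment, on made-up data. The chained form appears only as a comment because its exact behaviour and warning depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "long": [116.86, 116.83, 116.90],
    "lat": [38.31, 38.30, 38.35],
})

# Discouraged (chained indexing): df[df["long"] > 116.85]["lat"] = 0.0
# That assignment may land on a temporary copy and leave df unchanged.

# Preferred: one loc call selects the rows and the column, then assigns.
df.loc[df["long"] > 116.85, "lat"] = 0.0
```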