pandas Basics 03

Latest recommended article published on 2024-07-18 06:35:50

Tracey_Chen

Category column: Tracey's Python Programming

Article link: https://blog.csdn.net/qq_41048228/article/details/111569532

Indexers

1. Indexers
Column indexing on a table:
To take one column out of a DataFrame, index it with the column name:

import pandas
data=pandas.read_csv('/Users/liubingfeng/Desktop/test.csv')
print(data.head())
print(data['long'].head())

     long        lat start_time_format  end_time_format
0  116.864643  38.310846   2020/11/23 6:35  2020/11/23 7:25
1  116.864762  38.311371   2020/11/23 7:08  2020/11/23 7:25
2  116.831244  38.309362   2020/11/23 8:23  2020/11/23 8:40
3  116.831309  38.309470   2020/11/23 8:23  2020/11/23 8:40
4  116.831309  38.309470   2020/11/23 8:23  2020/11/23 8:40
0    116.864643
1    116.864762
2    116.831244
3    116.831309
4    116.831309
Name: long, dtype: float64

To take several columns, pass a list of column names:

print(data[['long','lat']].head())

         long        lat
0  116.864643  38.310846
1  116.864762  38.311371
2  116.831244  38.309362
3  116.831309  38.309470
4  116.831309  38.309470
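
The difference between the two bracket forms is easy to miss, so here is a minimal sketch. The small `data` frame is a hypothetical stand-in for test.csv, built from the first rows printed above; it is only meant to show which type comes back.

```python
import pandas as pd

# Hypothetical stand-in for test.csv, using values from the output above
data = pd.DataFrame({
    "long": [116.864643, 116.864762, 116.831244],
    "lat": [38.310846, 38.311371, 38.309362],
})

one_col = data["long"]             # a single name returns a Series
two_cols = data[["long", "lat"]]   # a list of names returns a DataFrame
one_col_frame = data[["long"]]     # a one-element list is still a DataFrame

print(type(one_col).__name__)
print(type(two_cols).__name__)
print(one_col_frame.shape)
```

So if a later step needs a DataFrame with a single column, wrapping the name in a list is enough.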

A column can also be taken with attribute (dot) access:

print(data.long.head())
0    116.864643
1    116.864762
2    116.831244
3    116.831309
4    116.831309
Name: long, dtype: float64

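This only works when the column name is a valid Python identifier, so a name containing a space cannot be used this way, and it also fails when the name clashes with an existing DataFrame attribute or method. For such names, and to be safe for Chinese column names as well, use the bracket form data['name'].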
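
A small sketch of where dot access works and where it does not. The column names "start time" and "count" are made up for illustration and are not in the original data.

```python
import pandas as pd

# Hypothetical frame: one ordinary name, one with a space, one that
# collides with the DataFrame.count method
df = pd.DataFrame({
    "long": [116.864643],
    "start time": ["2020/11/23 6:35"],
    "count": [1],
})

same = df.long.equals(df["long"])   # dot and bracket access agree here

# df.start time would be a SyntaxError, so brackets are required
spaced = df["start time"]

# df.count resolves to the DataFrame method, not the column
is_method = callable(df.count)
count_col = df["count"]

print(same, is_method, count_col.iloc[0])
```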

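Row indexing on a Series:
If a label matches a single entry in the Series, the scalar value is returned; if it matches several entries, a Series is returned: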

a=pandas.Series([0.1,1,-2,3.14159,4,5],index=['a','a','a','b','b','c'])
print(a['a'])

a    0.1
a    1.0
a   -2.0
dtype: float64

print(a['c'])

5.0

Several labels at once:

print(a[['b','c']])

b    3.14159
b    4.00000
c    5.00000
dtype: float64

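To take the elements between two labels, use a slice. This requires either that both labels occur only once in the index, or that the index is sorted (as it is here, where 'b' is repeated). Note that a label slice includes both endpoints.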

print(a['b':'c':2])
b    3.14159
c    5.00000
dtype: float64
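
The following sketch illustrates both halves of that rule. `sorted_s` reuses the Series from above; `unsorted_s` is a made-up Series whose index is neither unique nor sorted.

```python
import pandas as pd

# Same values and labels as the Series above; the index is sorted
sorted_s = pd.Series([0.1, 1, -2, 3.14159, 4, 5],
                     index=["a", "a", "a", "b", "b", "c"])
both_ends = sorted_s["b":"c"]   # every 'b' plus the closing 'c'
print(len(both_ends))

# Hypothetical counter-example: 'b' is duplicated and the index is unsorted
unsorted_s = pd.Series([1, 2, 3], index=["b", "a", "b"])
try:
    unsorted_s["b":"a"]
    raised = False
except KeyError:
    raised = True       # pandas cannot locate a slice bound for 'b'
print(raised)
```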

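When data is loaded with a read function and no column is specified as the index, a default integer index starting from 0 is generated.
An index can also be specified explicitly: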

a=pandas.Series(data=[1,2,'q',[4,5],{'name':'chen'}],index=[1,2,3,4,5],
                   dtype='object',name='chen')
print(a[4])


[4, 5]

print(a[[2,3]])

2    2
3    q
Name: chen, dtype: object

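The loc indexer:
The general form of the loc indexer is loc[*, *], where the first * selects rows and the second * selects columns, both by label. If the second position is omitted and it is written loc[*], the * selects rows.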

data=data.set_index('end_time_format')
print(data.head())
print(data.loc['early'])

                       long        lat start_time_format
end_time_format                                         
early            116.864643  38.310846   2020/11/23 6:35
early            116.864762  38.311371   2020/11/23 7:08
late             116.831244  38.309362   2020/11/23 8:23
late             116.831309  38.309470   2020/11/23 8:23
late             116.831309  38.309470   2020/11/23 8:23


                       long        lat start_time_format
end_time_format                                         
early            116.864643  38.310846   2020/11/23 6:35
early            116.864762  38.311371   2020/11/23 7:08

print(data.loc['early','long'])

end_time_format
early    116.864643
early    116.864762
Name: long, dtype: float64
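
As a rough sketch of the loc[*, *] form, the frame below is a hypothetical three-row version of the table above, with the 'early' and 'late' labels already set as the index.

```python
import pandas as pd

# Hypothetical miniature of the table above, indexed by end_time_format
df = pd.DataFrame(
    {
        "long": [116.864643, 116.864762, 116.831244],
        "lat": [38.310846, 38.311371, 38.309362],
    },
    index=pd.Index(["early", "early", "late"], name="end_time_format"),
)

rows = df.loc["early"]           # row selection only: all 'early' rows
col = df.loc["early", "long"]    # rows and one column: a Series
all_rows = df.loc[:, "lat"]      # ':' keeps every row

print(rows.shape)
print(len(col))
print(all_rows.tolist())
```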

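Rows and columns can be selected together. Here, using the table with its default integer index (as originally read in), we take the first 4 rows of the long and lat columns: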

print(data.loc[0:3,['long','lat']])

         long        lat
0  116.864643  38.310846
1  116.864762  38.311371
2  116.831244  38.309362
3  116.831309  38.309470

Slices also work for the columns:

print(data.loc[0:3,'long':'start_time_format'])

     long        lat start_time_format
0  116.864643  38.310846   2020/11/23 6:35
1  116.864762  38.311371   2020/11/23 7:08
2  116.831244  38.309362   2020/11/23 8:23
3  116.831309  38.309470   2020/11/23 8:23

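If the DataFrame uses an integer index, an integer slice in loc follows the same rules as the string labels above: it is a slice over labels and includes both endpoints, so 0:3 takes the first four rows. The start and end labels must not be duplicated in the index, unless the index is sorted.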
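
To make the inclusive endpoint concrete, here is a minimal sketch on a hypothetical five-row copy of the table (values taken from the output above), next to an ordinary Python list slice for contrast.

```python
import pandas as pd

# Hypothetical stand-in for test.csv with the default integer index
data = pd.DataFrame({
    "long": [116.864643, 116.864762, 116.831244, 116.831309, 116.831309],
    "lat": [38.310846, 38.311371, 38.309362, 38.309470, 38.309470],
    "start_time_format": ["2020/11/23 6:35", "2020/11/23 7:08",
                          "2020/11/23 8:23", "2020/11/23 8:23",
                          "2020/11/23 8:23"],
})

first_four = data.loc[0:3, ["long", "lat"]]              # labels 0, 1, 2, 3
col_range = data.loc[0:3, "long":"start_time_format"]    # column slice, inclusive too
plain_list = list(range(5))[0:3]                         # list slicing drops the end

print(first_four.shape)
print(col_range.columns.tolist())
print(plain_list)
```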

Filtering rows with a boolean condition:

print(data.loc[data.long>116.85])

          long        lat start_time_format   end_time_format
0   116.864643  38.310846   2020/11/23 6:35   2020/11/23 7:25
1   116.864762  38.311371   2020/11/23 7:08   2020/11/23 7:25
13  116.864731  38.311345  2020/11/23 10:11  2020/11/23 10:36

Compound conditions:

c_1=data.long>116.85
c_2=data.lat<40
c_3=(data.long+data.lat)<200
print(data.loc[(c_1 | c_2)&c_3])


          long        lat start_time_format   end_time_format
0   116.864643  38.310846   2020/11/23 6:35   2020/11/23 7:25
1   116.864762  38.311371   2020/11/23 7:08   2020/11/23 7:25
2   116.831244  38.309362   2020/11/23 8:23   2020/11/23 8:40
3   116.831309  38.309470   2020/11/23 8:23   2020/11/23 8:40
4   116.831309  38.309470   2020/11/23 8:23   2020/11/23 8:40
5   116.831495  38.309779   2020/11/23 8:30   2020/11/23 8:40
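
A short sketch of how the boolean operators combine. The frame is a hypothetical four-row sample and the lat threshold 38.3094 is chosen only so that the masks differ. Note that each comparison needs its own parentheses when written inline, because & and | bind more tightly than > and <.

```python
import pandas as pd

# Hypothetical four-row sample using values from the table above
data = pd.DataFrame({
    "long": [116.864643, 116.864762, 116.831244, 116.831309],
    "lat": [38.310846, 38.311371, 38.309362, 38.309470],
})

c_1 = data.long > 116.85      # True for rows 0 and 1
c_2 = data.lat < 38.3094      # True for row 2 only (illustrative threshold)

both = data.loc[c_1 & c_2]          # and
either = data.loc[c_1 | c_2]        # or
neither = data.loc[~(c_1 | c_2)]    # not

# Inline form: data.loc[(data.long > 116.85) & (data.lat < 38.3094)]
print(both.empty, either.index.tolist(), neither.index.tolist())
```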

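The condition can also be wrapped in a function. loc calls the function with the DataFrame itself as the argument: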

def con(x):
    c_1=x.long>116.85
    c_2=x.lat<40
    c_3=(x.long+x.lat)<200
    return (c_1 | c_2)&c_3
print(data.loc[con])


          long        lat start_time_format   end_time_format
0   116.864643  38.310846   2020/11/23 6:35   2020/11/23 7:25
1   116.864762  38.311371   2020/11/23 7:08   2020/11/23 7:25
2   116.831244  38.309362   2020/11/23 8:23   2020/11/23 8:40
3   116.831309  38.309470   2020/11/23 8:23   2020/11/23 8:40
4   116.831309  38.309470   2020/11/23 8:23   2020/11/23 8:40
5   116.831495  38.309779   2020/11/23 8:30   2020/11/23 8:40

Lambda expressions work the same way; the value they return is used as the row or column selector:

print(data.loc[lambda x:3,lambda x:'long'])

116.831309

print(data.loc[lambda x:slice(1,3)])

         long        lat start_time_format  end_time_format
1  116.864762  38.311371   2020/11/23 7:08  2020/11/23 7:25
2  116.831244  38.309362   2020/11/23 8:23  2020/11/23 8:40
3  116.831309  38.309470   2020/11/23 8:23  2020/11/23 8:40
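
One place where passing a callable seems to pay off is method chaining, where the intermediate table has no name yet. The sketch below assumes a hypothetical four-row sample; the `total` column and the 155.15 threshold are invented for the example.

```python
import pandas as pd

# Hypothetical four-row sample using values from the table above
data = pd.DataFrame({
    "long": [116.864643, 116.864762, 116.831244, 116.831309],
    "lat": [38.310846, 38.311371, 38.309362, 38.309470],
})

def high_long(df):
    # loc passes the DataFrame in; return a boolean mask
    return df.long > 116.85

by_func = data.loc[high_long]
by_lambda = data.loc[lambda df: df.long > 116.85, ["long"]]

# In a chain, the lambda sees the frame produced by assign()
chained = (
    data.assign(total=data.long + data.lat)
        .loc[lambda df: df.total > 155.15]
)

print(by_func.index.tolist(), by_lambda.shape, chained.index.tolist())
```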

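When assigning to a DataFrame or Series, apply a single indexer and assign directly. Assigning after several chained indexing steps writes to a temporary copy, so the original elements are not actually modified and pandas emits a SettingWithCopyWarning.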
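
A minimal sketch of the recommended single-step assignment, on a hypothetical four-row sample. The chained form is left as a comment because its exact behaviour (a warning, or no effect at all) depends on the pandas version.

```python
import pandas as pd

# Hypothetical four-row sample using values from the table above
data = pd.DataFrame({
    "long": [116.864643, 116.864762, 116.831244, 116.831309],
    "lat": [38.310846, 38.311371, 38.309362, 38.309470],
})

# Chained indexing: the assignment may land on a temporary copy
# data[data.long > 116.85]["lat"] = 0.0

# One loc call selects rows and column together, then assigns in place
data.loc[data.long > 116.85, "lat"] = 0.0

print(data.lat.tolist())
```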