Basic Pandas Data-Processing Operations: The Core Data Structure Series

志存高远脚踏实地

Article link: https://blog.csdn.net/weixin_44451032/article/details/99406282

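How the Pandas data types relate: 0-dimensional scalar -> 1-dimensional Series -> 2-dimensional DataFrame -> 3-dimensional hierarchically indexed DataFrame. Series and DataFrame are the core data structures of Pandas.

A Series is usually created from a list or a dictionary, and it can also be created in other ways.

Creating a Series from a list

import pandas as pd
# Create a Series from a list
pd.Series([1,2,3,4,5])  # pandas creates the index automatically; each label maps to one value, like a dict key-value pair; an index can also be specified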

0    1
1    2
2    3
3    4
4    5
dtype: int64

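pandas creates the index automatically, and each index label maps to one value, much like the key-value pairs of a dictionary. A custom index can also be specified.

Creating a Series from a list with a specified index

# Create a Series from a list and specify the index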
pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])

a    1
b    2
c    3
d    4
e    5
dtype: int64
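
To make the difference between the automatic index and a specified index concrete, here is a minimal sketch. The variable names `default_series` and `labeled_series` are illustrative and do not come from the original post.

```python
import pandas as pd

# Without an explicit index, pandas builds a RangeIndex starting at 0.
default_series = pd.Series([1, 2, 3, 4, 5])

# With an explicit index, each label maps to the value at the same position.
labeled_series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(list(default_series.index))   # [0, 1, 2, 3, 4]
print(list(labeled_series.index))   # ['a', 'b', 'c', 'd', 'e']
print(labeled_series['c'])          # 3
```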

Creating a Series from a dictionary: the dictionary's keys become the index

# Create a Series from a dictionary; the keys become the index
my_info = {'name':'gaozhiyuan','age':20,'sex':'male'}
pd.Series(my_info)

name    gaozhiyuan
age             20
sex           male
dtype: object
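
A dictionary can also be combined with an explicit `index` argument. The sketch below assumes a hypothetical extra label `'city'` that is not in the dictionary, to show how pandas handles a missing key.

```python
import pandas as pd

my_info = {'name': 'gaozhiyuan', 'age': 20, 'sex': 'male'}

# Passing index= alongside a dict selects the keys in the given order.
# A label that is missing from the dict ('city' is hypothetical) becomes NaN.
subset = pd.Series(my_info, index=['age', 'name', 'city'])

print(subset)
```

Only the labels listed in `index` appear in the result, so `'sex'` is dropped here.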

Creating a Series from a scalar

# Create a Series from a scalar; the value is repeated for every index label
pd.Series(5,index=[1,2,3,4,5])

1    5
2    5
3    5
4    5
5    5
dtype: int64

# A range object can also be used to create a Series
pd.Series(range(5))

0    0
1    1
2    2
3    3
4    4
dtype: int64

Creating a Series from a NumPy array

# Create a Series from a NumPy array
import numpy as np
import pandas as pd
pd.Series(np.arange(5),index=np.arange(5,0,-1))

5    0
4    1
3    2
2    3
1    4
dtype: int32
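
The `int32` above, where the earlier examples showed `int64`, most likely reflects the default integer type of `np.arange` on the author's platform; on other platforms or NumPy versions the same code may report `int64`. A minimal sketch that requests the dtype explicitly, so the result does not depend on that default:

```python
import numpy as np
import pandas as pd

# Request int64 explicitly instead of relying on the platform default.
values = np.arange(5, dtype=np.int64)
series = pd.Series(values, index=np.arange(5, 0, -1))

print(series.dtype)   # int64
print(series[5])      # 0, the value stored under label 5
```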

Querying data in a Series

# Querying data in a Series
# Define a Series for a class: the index holds student IDs and the values hold names
class_series = pd.Series(['xiaoming','xiaohong','xiaolan'],index=[2017,2016,2018])
class_series

2017    xiaoming
2016    xiaohong
2018     xiaolan
dtype: object

# Look up the name of the student whose ID is 2016
class_series[2016]

'xiaohong'

Inspecting the shape

# Inspect the shape
class_series.shape  # for one-dimensional data, a 1-tuple holding the number of values

(3,)

Querying the values and the index

# Query the values and the index
class_series.values

array(['xiaoming', 'xiaohong', 'xiaolan'], dtype=object)

type(class_series.values)  # the values are a numpy.ndarray, because pandas is built on top of NumPy

The values are stored as a numpy.ndarray, because pandas is built on top of NumPy.

numpy.ndarray

class_series.index

Int64Index([2017, 2016, 2018], dtype='int64')

Viewing a single index label or value

# View a single index label or value
class_series.values[1],class_series.index[1]

('xiaohong', 2016)

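Index queries: custom-label lookup and default positional lookup (single value)

# Index queries: custom-label lookup and default positional lookup, single value
class_series = pd.Series(['xiaoming','xiaohong','xiaolan'],index=['2017','2016','2018'])
# Both lookups return the same result only when the index is not integer-typed; with an integer index, [] treats the key as a label.
# In practice, once a custom index is defined, query by its labels. Recent pandas releases deprecate the positional fallback of [] in favor of .iloc.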
class_series['2016'],class_series[1]

('xiaohong', 'xiaohong')
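
Because plain `[]` can mean either a label or a position depending on the index type, the two lookups can be written explicitly with `.loc` and `.iloc`. This is a minimal sketch of that alternative, not the form the original post uses; `by_label` and `by_position` are illustrative names.

```python
import pandas as pd

class_series = pd.Series(['xiaoming', 'xiaohong', 'xiaolan'],
                         index=['2017', '2016', '2018'])

by_label = class_series.loc['2016']    # label-based lookup
by_position = class_series.iloc[1]     # position-based lookup

print(by_label, by_position)           # xiaohong xiaohong
```

The explicit accessors behave the same way whether the index holds strings or integers.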

Querying multiple values

# Query multiple values
class_series = pd.Series(['xiaoming','xiaohong','xiaolan'],index=['2017','2016','2018'])  
class_series[['2017','2016']]

2017    xiaoming
2016    xiaohong
dtype: object

class_series[[0,1]]

2017    xiaoming
2016    xiaohong
dtype: object

Slice queries: start, stop, step

# Slice queries: start, stop, step
class_series[0:2]

2017    xiaoming
2016    xiaohong
dtype: object

class_series['2017':'2018':2]

2017    xiaoming
2018     xiaolan
dtype: object

class_series[0::2]

2017    xiaoming
2018     xiaolan
dtype: object
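
One detail in the slices above is easy to miss: a label slice includes its stop label, while a positional slice excludes its stop position. The sketch below makes this explicit with `.loc` and `.iloc`; the variable names are illustrative.

```python
import pandas as pd

class_series = pd.Series(['xiaoming', 'xiaohong', 'xiaolan'],
                         index=['2017', '2016', '2018'])

# Label slice: both the start and the stop label are included.
label_slice = class_series.loc['2017':'2016']

# Positional slice: the stop position is excluded, as with Python lists.
position_slice = class_series.iloc[0:1]

# A step takes every n-th element.
every_other = class_series.iloc[::2]

print(len(label_slice), len(position_slice))   # 2 1
print(list(every_other.index))                 # ['2017', '2018']
```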

Boolean queries

# Boolean query
class_series == 'xiaoming'

2017     True
2016    False
2018    False
dtype: bool

Indexing with the boolean values returned by a comparison

# Index with the boolean result of the comparison
class_series[class_series == 'xiaoming']

2017    xiaoming
dtype: object
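
Boolean masks are not limited to a single equality test. As a rough sketch, `isin` builds a mask from several candidate values and `~` negates a mask; the names `mask`, `selected`, and `not_ming` are illustrative.

```python
import pandas as pd

class_series = pd.Series(['xiaoming', 'xiaohong', 'xiaolan'],
                         index=['2017', '2016', '2018'])

# isin returns True wherever the value is one of the listed names.
mask = class_series.isin(['xiaoming', 'xiaolan'])
selected = class_series[mask]

# ~ inverts a boolean mask; keep the comparison in parentheses.
not_ming = class_series[~(class_series == 'xiaoming')]

print(list(selected.index))   # ['2017', '2018']
print(list(not_ming))         # ['xiaohong', 'xiaolan']
```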

Vectorized (element-wise) operations on a Series

# Vectorized operations on a Series: the operation is applied to every element at once
class_series = pd.Series(['xiaoming','xiaohong','xiaolan'],index=['2017','2016','2018'])
class_series + '_1703'

2017    xiaoming_1703
2016    xiaohong_1703
2018     xiaolan_1703
dtype: object

class_score = pd.Series([90,88,92],index=[1701,1702,1703])
class_score

1701    90
1702    88
1703    92
dtype: int64

# Subtract 10 points from every score
class_score - 10

1701    80
1702    78
1703    82
dtype: int64

# Compute the total score
np.sum(class_score),class_score.sum()

(270, 270)

# Compute the average score
np.average(class_score),class_score.mean()

(90.0, 90.0)

# Find the maximum
np.max(class_score),class_score.max()

(92, 92)
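
To show what "vectorized" means here, the following sketch compares the single-expression form with an explicit Python loop that yields the same result. The loop is included only for comparison, and `vectorized` and `looped` are illustrative names.

```python
import pandas as pd

class_score = pd.Series([90, 88, 92], index=[1701, 1702, 1703])

# Vectorized: one expression applied to every element, index preserved.
vectorized = class_score - 10

# Equivalent explicit loop, shown only for comparison.
looped = pd.Series([score - 10 for score in class_score],
                   index=class_score.index)

print(list(vectorized))             # [80, 78, 82]
print(vectorized.equals(looped))    # True
```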

Modifying a value

# Modify a value
class_score[1702] = 91
class_score

1701    90
1702    91
1703    92
dtype: int64

Modifying multiple values

# Modify multiple values
class_score[[1701,1702,1703]] = [90,90,90]
class_score

1701    90
1702    90
1703    90
dtype: int64
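
The same assignments can be written with `.loc`, which always treats the key as a label, and a boolean mask can select the elements to overwrite. This is a sketch of one possible style; the replacement scores 95 and 96 are made up for illustration.

```python
import pandas as pd

class_score = pd.Series([90, 88, 92], index=[1701, 1702, 1703])

# Single label.
class_score.loc[1702] = 91

# Several labels at once (95 and 96 are illustrative values).
class_score.loc[[1701, 1703]] = [95, 96]

# Conditional update: raise every score below 92 to 92.
class_score.loc[class_score < 92] = 92

print(list(class_score))   # [95, 92, 96]
```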

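Modifying the index with rename

The argument may be a scalar, a hashable sequence, a dict-like object, or a function. A dict-like object or a function changes the index labels.

Among Python's built-in types, mutable objects such as list and dict are not hashable, while immutable objects such as int, str, and tuple are hashable.

A scalar or a hashable sequence changes only the name attribute of the Series.

# index : scalar, hashable sequence, dict-like or function, optional. dict-like or functions are transformations to apply to the index.
# Scalar or hashable sequence-like will alter the ``Series.name`` attribute.
# A scalar or hashable sequence changes the name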
class_score.rename(1),class_score.rename('1111')

(1701    90
 1702    90
 1703    90
 Name: 1, dtype: int64, 1701    90
 1702    90
 1703    90
 Name: 1111, dtype: int64)

A dict-like object or a function can change the index labels of a Series

# A dict-like object or a function can change the index labels of a Series
class_score_2 = pd.Series([89,90,91])
class_score_2

0    89
1    90
2    91
dtype: int64

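Changing the index labels with a function

# Change the index labels with a function: either an anonymous lambda or a named function works (lambda is covered in the author's post on lambda anonymous functions and filter)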
def square(x):
    return x**2
class_score_2.rename(square),class_score_2.rename(lambda x:x**2)

(0    89
 1    90
 4    91
 dtype: int64, 0    89
 1    90
 4    91
 dtype: int64)

Changing the index labels with a dict-like object

# Change the index labels with a dict-like object; keys that are not in the index (4 here) are ignored
class_score_2.rename({0:89,1:90,4:91})

89    89
90    90
2     91
dtype: int64

Inspecting class_score_2 again shows that its index is unchanged: by default, rename returns a new Series rather than modifying the original.

class_score_2

0    89
1    90
2    91
dtype: int64
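
Since rename leaves the original untouched, the usual way to keep the new labels is to bind the result to a name. A minimal sketch follows; the labels `'a'` and `'b'` and the name `renamed` are illustrative.

```python
import pandas as pd

class_score_2 = pd.Series([89, 90, 91])

# rename returns a new Series; the original keeps its index.
renamed = class_score_2.rename({0: 'a', 1: 'b'})
print(list(class_score_2.index))   # [0, 1, 2]
print(list(renamed.index))         # ['a', 'b', 2]

# To keep the change, rebind the name to the returned Series.
class_score_2 = class_score_2.rename(lambda x: x + 1)
print(list(class_score_2.index))   # [1, 2, 3]
```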
