Pandas Data Structures: The Series Object

透明的星星

Last modified on 2024-07-14 23:20:24

Tags: pandas, artificial intelligence, python

First published on 2024-07-14 22:04:50

Article link: https://blog.csdn.net/2401_85384231/article/details/140424052

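A Series is a one-dimensional labeled array, that is, an array with an index (the axis labels are collectively called the index). It can hold any data type, such as integers, strings, floats, and Python objects. The basic way to create one is pd.Series(data, index). If index is not specified, it defaults to [0, 1, 2, ..., len(data) - 1] (for a dict, the dict keys are used instead). data can take any of the three forms described in the sections that follow.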
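As a quick illustration of the default index, here is a minimal sketch. The values 10, 20, 30 and the labels 'x', 'y', 'z' are made up for the example.

```python
import numpy as np
import pandas as pd

values = np.array([10, 20, 30])  # illustrative data, not from the article

# No index given: labels default to 0 .. len(data) - 1
ser = pd.Series(values)
default_labels = list(ser.index)

# Explicit index: one label per value
labeled = pd.Series(values, index=['x', 'y', 'z'])

print(default_labels)   # [0, 1, 2]
print(labeled['y'])     # 20
```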

Three forms of data

Python dict

import pandas as pd

# When index is not specified, the dict keys become the index
# and the corresponding dict values become the Series values
ser = pd.Series({'a': 0, 'b': 1, 'c': 2})  # ser looks like this:
# a    0
# b    1
# c    2
# dtype: int64

# When the dict has fewer items than index has labels,
# the labels without a matching key are filled with NaN
ser2 = pd.Series({'a': 0, 'b': 1, 'c': 2}, index=['a', 'b', 'c', 'd'])
# ser2 looks like this:
# a    0.0
# b    1.0
# c    2.0
# d    NaN
# dtype: float64

# When every dict key is in index but the key order differs from the index order,
# values are matched by key, not by order/position
ser2_ = pd.Series({'a': 0, 'c': 1, 'd': 2}, index=['a', 'b', 'c', 'd'])
# ser2_ looks like this:
# a    0.0
# b    NaN
# c    1.0
# d    2.0
# dtype: float64


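# When the dict keys and index do not fully correspond, index decides the result:
# keys that are not in index (here 'f') are dropped, and index labels
# with no matching key (here 'd') get NaN
ser3 = pd.Series({'a': 0, 'b': 1, 'c': 2, 'f': 3}, index=['a', 'b', 'c', 'd'])
# ser3 looks like this: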
# a    0.0
# b    1.0
# c    2.0
# d    NaN
# dtype: float64

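# When the dict has more items than index has labels, index still decides the result:
# values are matched by key (not taken by position), and dict items whose keys
# are not in index (here 'f' and 'g') are dropped
ser4 = pd.Series({'a': 0, 'b': 1, 'c': 2, 'f': 3, 'g': 4}, index=['a', 'b', 'c', 'd'])
# ser4 looks like this: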
# a    0.0
# b    1.0
# c    2.0
# d    NaN
# dtype: float64
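The dict rules above all reduce to one idea: values are looked up by key for each label in index. The following is a minimal sketch of that rule. The dict contents are made up for illustration, with the keys deliberately out of order.

```python
import pandas as pd

# Illustrative data: keys are out of order, 'b' is missing, 'f' is extra
data = {'c': 2, 'a': 0, 'f': 3}
ser = pd.Series(data, index=['a', 'b', 'c'])

# 'a' and 'c' are matched by key, 'b' has no key so it becomes NaN,
# and 'f' is dropped because it is not in index
print(ser)
print(pd.isna(ser['b']))   # True
print('f' in ser.index)    # False
```

Because a missing label introduces NaN, the resulting dtype is float64 even though the dict values are integers.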

ndarray

import numpy as np
import pandas as pd

ser = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
# Example result (the values are random and differ on every run):
# a    0.167915
# b   -1.310394
# c    2.540096
# d    0.440984
# e    0.430818
# dtype: float64

# When the length of the ndarray differs from the length of index, an error is raised
ser = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e', 'f'])
# Raises: ValueError: Length of values (5) does not match length of index (6)

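# The ndarray must be one-dimensional
# (note: np.random.randn takes the dimensions as separate integers, not as a tuple)
ser = pd.Series(np.random.randn(5, 5), index=['a', 'b', 'c', 'd', 'e'])
# Raises an error such as: ValueError: Data must be 1-dimensional
# (the exact exception type and wording depend on the pandas version)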
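The two ndarray constraints can be handled defensively. The sketch below catches the length mismatch, and it shows two ways of getting one-dimensional data out of a 2-D array before building a Series. The array contents and the labels 'x', 'y', 'z' are assumptions made for the example.

```python
import numpy as np
import pandas as pd

values = np.arange(5, dtype=float)
labels = ['a', 'b', 'c', 'd', 'e', 'f']  # one label too many

# Length mismatch between data and index raises ValueError
try:
    pd.Series(values, index=labels)
    length_error = None
except ValueError as exc:
    length_error = str(exc)
print(length_error)

# A 2-D array has to be reduced to 1-D first
matrix = np.arange(6).reshape(2, 3)
flat = pd.Series(matrix.ravel())                          # all 6 values
first_row = pd.Series(matrix[0], index=['x', 'y', 'z'])   # one row only
```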

Scalar

import pandas as pd

# When index is not specified, a Series with a single element is created
ser = pd.Series(5.0)
# ser looks like this:
# 0    5.0
# dtype: float64

# When index is specified, the scalar is repeated for every index label
ser1 = pd.Series(5.0, index=['a', 'b', 'c', 'd'])
# ser1 looks like this:
# a    5.0
# b    5.0
# c    5.0
# d    5.0
# dtype: float64

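Series characteristics that follow from how it is created

Because a Series can be built in the ways shown above, it also behaves partly like an ndarray and partly like a Python dict (assume ser is an existing Series object):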

Access a Series element by index label

value = ser['a']  # get the value stored under the index label 'a'

Check whether an index label exists

if 'a' in ser:  # the in keyword tests the index labels, not the values
    print(ser['a'])

Modify the element stored under a given index label

ser['a'] = 1.0  # set the value under the index label 'a' to 1.0

A get method similar to dict.get

value = ser.get('a', np.nan)  # get the value under 'a'; return NaN if the label does not exist (requires import numpy as np)

Convert to a NumPy array

arr = ser.to_numpy()  # to_numpy() returns the values as a NumPy array

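Apply NumPy arithmetic

# NumPy functions can be applied to a Series directly, for example:
ser2 = np.exp(ser)
# np.exp is applied to every value in ser.
# The returned ser2 is still a Series with the same index; only the values change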
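The dict-like and ndarray-like operations listed above can be tried together in one short script. This is a minimal sketch with made-up values. The label 'z' is deliberately absent in order to show what get returns for a missing label.

```python
import numpy as np
import pandas as pd

ser = pd.Series({'a': 0.0, 'b': 1.0, 'c': 2.0})  # illustrative data

has_a = 'a' in ser               # True: `in` checks the index labels
has_zero = 0.0 in ser            # False: the values are not checked
missing = ser.get('z')           # None when no default is given
fallback = ser.get('z', np.nan)  # explicit default for a missing label

ser['a'] = 10.0                  # dict-style assignment by label
arr = ser.to_numpy()             # plain ndarray, the labels are gone
exp_ser = np.exp(ser)            # still a Series, same index
```

Note that `in` tests the labels rather than the values, which matches dict behavior but differs from testing membership in a list or ndarray.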

Vectorized operations and label alignment

A Series supports the vast majority of NumPy's vectorized operations, for example:

new_ser = ser + ser  # ser is a Series
new_ser = np.exp(ser)
new_ser = ser * 2
# These operations change only the values of the Series; the index is preserved

An important difference between a Series and a NumPy array is that Series operands are aligned automatically by index label, for example:

ser = pd.Series({'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4})
new_ser = ser[1:] + ser[:-1]
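# Here ser[1:] holds the last four items of ser, with index ['b', 'c', 'd', 'e'],
# and ser[:-1] holds the first four items, with index ['a', 'b', 'c', 'd'].
# Before the addition, the two operands are aligned on the union of their indexes.
# ser[1:] has no label 'a', so it is treated as having 'a' with the value NaN;
# likewise ser[:-1] is treated as having 'e' with the value NaN.
# Only then are the values under matching labels added together.
# In short: a label present in one Series but missing from the other is filled in
# on the other side with NaN, and this works in both directions.
# The result new_ser is therefore: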
# a    NaN
# b    2.0
# c    4.0
# d    6.0
# e    NaN
# dtype: float64
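When the NaN values produced by alignment are not wanted, two common options are to fill the missing side with a default or to keep only the overlapping labels. The sketch below reuses the same data and uses Series.add with fill_value and Series.dropna. Treating a missing value as 0 is an assumption made for this example, not something the article prescribes.

```python
import pandas as pd

ser = pd.Series({'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4})
left = ser.iloc[1:]    # labels b, c, d, e
right = ser.iloc[:-1]  # labels a, b, c, d

aligned = left + right                   # NaN at 'a' and 'e'
filled = left.add(right, fill_value=0)   # a missing side counts as 0
overlap = aligned.dropna()               # keep only the shared labels

# For contrast: plain NumPy arrays are added by position, not by label
by_position = left.to_numpy() + right.to_numpy()

print(filled.tolist())       # [0.0, 2.0, 4.0, 6.0, 4.0]
print(overlap.tolist())      # [2.0, 4.0, 6.0]
print(by_position.tolist())  # [1, 3, 5, 7]
```

The positional result differs from the label-aligned one because 'b' in left is paired with 'a' in right, which is exactly the mismatch that index alignment avoids.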

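The name attribute

A Series has a name attribute, which shows up as the column name when the Series is placed in a DataFrame. It can be set at creation time with pd.Series(data, name='...'), set or changed afterwards with series.rename(), and read with series.name.

Note: series.rename() is not an in-place operation by default, so the Series after the rename is not the same object as the one before it; pass inplace=True to modify the original instead. Also, if rename() receives a function or a dict, it changes the index labels of the Series rather than its name.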
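The different behaviors of rename() are easy to confuse, so here is a minimal sketch that exercises each of them. The names 'score', 'points', and 'final' and the data are made up for illustration.

```python
import pandas as pd

ser = pd.Series([1, 2, 3], index=['a', 'b', 'c'], name='score')

renamed = ser.rename('points')       # scalar: new Series with a new name
relabeled = ser.rename({'a': 'x'})   # dict: changes index labels, name is kept
upper = ser.rename(str.upper)        # function: applied to each index label
frame = ser.to_frame()               # the name becomes the column label

print(ser.name)               # score (the original is unchanged)
print(renamed.name)           # points
print(list(relabeled.index))  # ['x', 'b', 'c']
print(list(upper.index))      # ['A', 'B', 'C']
print(list(frame.columns))    # ['score']

# In-place variant, applied to a copy so the original stays intact
other = ser.copy()
other.rename('final', inplace=True)
print(other.name)             # final
```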
