Pandas Data Processing: The Pandas Series Object


Author: 初一



Column: 数据处理 (Data Processing) | Tags: Python, Data Analysis, Pandas

Article link: https://blog.csdn.net/weixin_43060843/article/details/91437468


Reading notes on "Python Data Science Handbook"

3.2.1 The Pandas Series Object

A Pandas Series is a one-dimensional array of indexed data. It can be created from a list or array as follows:

In [1]: import numpy as np
   ...: import pandas as pd
In [2]: data = pd.Series([1, 2, 3, 4])
In [3]: data
Out[3]:
0    1
1    2
2    3
3    4
dtype: int64

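A Series binds a sequence of values to a sequence of indices, which we can access through the values and index attributes. The values attribute returns a familiar NumPy array, and the index attribute returns an array-like object of type pd.Index.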

In [4]: data.values
Out[4]: array([1, 2, 3, 4], dtype=int64)

In [5]: data.index
Out[5]: RangeIndex(start=0, stop=4, step=1)
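To confirm what these two attributes return, one can simply check their types. A minimal sketch reusing the same data; `to_numpy()` is an alternative accessor from current pandas that the original notes do not mention:

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 2, 3, 4])

# .values (and the newer .to_numpy()) hold the data as a plain NumPy array
print(isinstance(data.values, np.ndarray))   # True
print(data.to_numpy().tolist())              # [1, 2, 3, 4]

# .index holds the labels; a default integer index is a RangeIndex
print(isinstance(data.index, pd.RangeIndex)) # True
print(data.index.tolist())                   # [0, 1, 2, 3]
```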

As with a NumPy array, data can be accessed through the associated index using Python's familiar square-bracket notation:

In [6]: data[1]      # access by index label
Out[6]: 2

In [7]: data[::-1]   # reverse order
Out[7]:
3    4
2    3
1    2
0    1
dtype: int64

In [8]: data[1:3]    # slice
Out[8]:
1    2
2    3
dtype: int64
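With the default RangeIndex, labels and positions coincide, so the lookups above give the same result whether they are read as labels or as positions. A small sketch making this explicit with `.loc` (label-based) and `.iloc` (position-based); both are standard pandas accessors, though they are not introduced in these notes:

```python
import pandas as pd

data = pd.Series([1, 2, 3, 4])

# With a RangeIndex, label 1 sits at position 1
print(data[1], data.loc[1], data.iloc[1])   # 2 2 2

# Reversing returns a new Series; each value keeps its original label
reversed_data = data[::-1]
print(reversed_data.index.tolist())         # [3, 2, 1, 0]

# Integer slicing excludes the end position, as with Python lists
print(data[1:3].tolist())                   # [2, 3]
```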

1. Series as a generalized NumPy array

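So far, the Series object may seem basically interchangeable with a one-dimensional NumPy array. The essential difference is the index: a NumPy array has an implicitly defined integer index used to access its values, whereas a Pandas Series has an explicitly defined index associated with its values.
This explicit index gives the Series object additional capabilities. For example, the index need not be integers; it can consist of values of any desired type. If needed, strings can be used as the index: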

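In [8]: data = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
In [9]: data
Out[9]:
a    1
b    2
c    3
d    4
dtype: int64

# Accessing values
In [10]: data['a']       # by label (key)
Out[10]: 1

In [11]: data.iloc[0]    # by position; plain data[0] relies on a positional fallback deprecated since pandas 2.1
Out[11]: 1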

The index can also be non-contiguous or out of order:

In [12]: data = pd.Series([1, 2, 3, 4], index=[2, 5, 3, 7])
In [13]: data
Out[13]:
2    1
5    2
3    3
7    4
dtype: int64
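When the index is made of non-contiguous integers, a label and a position no longer refer to the same element. A minimal sketch using the standard `.loc` (label) and `.iloc` (position) accessors, which these notes do not otherwise cover:

```python
import pandas as pd

data = pd.Series([1, 2, 3, 4], index=[2, 5, 3, 7])

# .loc looks up by label: label 5 holds the second value
print(data.loc[5])                 # 2
# .iloc looks up by position: position 0 holds the first value (label 2)
print(data.iloc[0])                # 1
# Label 3 and position 3 point to different elements
print(data.loc[3], data.iloc[3])   # 3 4
```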

2. Series as a specialized dictionary

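You can think of a Pandas Series as a specialized Python dictionary. A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, whereas a Series maps typed keys to a set of typed values.
The Series-as-dictionary analogy becomes even clearer when a Series is constructed directly from a Python dictionary: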

In [15]: population_dict = {
    'a': 1,
    'b': 2,
    'c': 3,
    'd': 4,
    'e': 5
}
In [16]: population = pd.Series(population_dict)
In [17]: population
Out[17]:
a    1
b    2
c    3
d    4
e    5
dtype: int64

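When a Series is created from a dictionary, the index is taken from the dictionary's keys in insertion order (pandas 0.23+ on Python 3.6+; earlier versions sorted the keys). Typical dictionary-style item access still works: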

In [18]: population['a']
Out[18]: 1
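Other dictionary idioms carry over as well. A minimal sketch, assuming the same population_dict as above; `keys()` and `to_dict()` are existing Series methods, and `in` tests membership against the index rather than the values:

```python
import pandas as pd

population_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
population = pd.Series(population_dict)

# Membership tests check the index (the keys), not the values
print('a' in population)          # True
print(1 in population)            # False

# keys() returns the index, mirroring dict.keys()
print(list(population.keys()))    # ['a', 'b', 'c', 'd', 'e']

# to_dict() converts the Series back to a plain dict
print(population.to_dict() == population_dict)  # True
```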

# Unlike a dictionary, a Series also supports array-style operations such as slicing; note that a label slice includes the end label.
In [19]: population['a':'c']
Out[19]:
a    1
b    2
c    3
dtype: int64
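The inclusive end is specific to label-based slicing; positional slicing keeps Python's usual exclusive end. A short comparison, using `.iloc` for the positional slice (an accessor not covered in these notes):

```python
import pandas as pd

population = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})

# Slicing by label includes the end label 'c'
print(population['a':'c'].tolist())     # [1, 2, 3]

# Slicing by position excludes the end position 2
print(population.iloc[0:2].tolist())    # [1, 2]
```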

3. Constructing Series objects

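All of these ways of constructing a Pandas Series take the following form:
pd.Series(data, index=index)
Here index is an optional argument, and data can be one of many types. For example, data can be a list or a NumPy array, in which case index defaults to an integer sequence: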

In [20]: pd.Series([2, 4, 6])
Out[20]:
0    2
1    4
2    6
dtype: int64
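A list and a NumPy array produce the same default integer index, and passing index= replaces those labels. A minimal sketch; values are compared via tolist() because the integer dtype NumPy picks can differ by platform:

```python
import numpy as np
import pandas as pd

from_list = pd.Series([2, 4, 6])
from_array = pd.Series(np.array([2, 4, 6]))

# Without index=, both get the default integer index 0..n-1
print(from_list.index.tolist())                    # [0, 1, 2]
print(from_array.index.tolist())                   # [0, 1, 2]
print(from_list.tolist() == from_array.tolist())   # True

# Passing index= replaces the default labels; its length must match the data
labelled = pd.Series([2, 4, 6], index=['x', 'y', 'z'])
print(labelled['y'])                               # 4
```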

data can also be a scalar, which is repeated to fill each specified index:

In [21]: pd.Series(5, index=[1, 2, 3])
Out[21]:
1    5
2    5
3    5
dtype: int64
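The length of a scalar-built Series comes entirely from the index. A small sketch; the one-element case without index= reflects current pandas behavior:

```python
import pandas as pd

# The scalar is broadcast to every label in index=
filled = pd.Series(5, index=[1, 2, 3])
print(filled.tolist())   # [5, 5, 5]

# Without index=, a scalar produces a Series of length 1 with label 0
single = pd.Series(5)
print(len(single), single.index.tolist())   # 1 [0]
```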

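data can also be a dictionary, in which case index defaults to the dictionary keys. In current pandas the keys keep their insertion order, as the output below shows (older versions sorted them):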

In [22]: pd.Series({2: 'a', 1: 'b', 3: 'c'})
Out[22]:
2    a
1    b
3    c
dtype: object
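If the sorted-key order of older pandas versions is wanted, it can be recovered explicitly. A minimal sketch using sort_index(), an existing Series method that the notes do not mention:

```python
import pandas as pd

s = pd.Series({2: 'a', 1: 'b', 3: 'c'})

# The index keeps the dictionary's insertion order
print(s.index.tolist())                # [2, 1, 3]

# sort_index() returns a new Series ordered by label
print(s.sort_index().index.tolist())   # [1, 2, 3]
print(s.sort_index().tolist())         # ['b', 'a', 'c']
```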

In each case, the index can be set explicitly to select the desired result:

# The Series keeps only the key-value pairs whose keys are listed in index
In [23]: pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[2, 3])
Out[23]:
2    a
3    c
dtype: object

# Labels in index that are missing from the dictionary are filled with NaN
In [24]: pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[1, 2, 3, 4])
Out[24]:
1      b
2      a
3      c
4    NaN
dtype: object
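Two details about an explicit index are worth checking. The result follows the order of the labels in index=, not the dictionary order, and with integer values a missing label changes the dtype, because NaN is a floating-point value. A minimal sketch:

```python
import pandas as pd

source = {2: 'a', 1: 'b', 3: 'c'}

# The result follows the label order given in index=
picked = pd.Series(source, index=[3, 2])
print(picked.tolist())         # ['c', 'a']

# Integer values plus a missing label ('c') are upcast to float64
counts = pd.Series({'a': 1, 'b': 2}, index=['a', 'b', 'c'])
print(counts.dtype)            # float64
print(counts.isna().tolist())  # [False, False, True]
```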