Scientific Data Libraries (Pandas), Section 1: The pandas Series Type


Lucky20171225



Article link: https://blog.csdn.net/qq_42119837/article/details/112869996


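This article covers the pandas Series data structure in detail: what a Series fundamentally is (an array of values paired with an index), two ways to create one, how to slice and index it, and how to handle missing values. It also explains how to use the Series where and mask methods.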


What a Series is

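A Series is a one-dimensional, array-like object. It is essentially made up of two arrays: one holds the object's keys (the index) and the other holds its values, with each key mapping to a value (key -> value).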
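The key/value pairing described above can be seen directly in a small sketch. The labels and numbers here are arbitrary illustration values; `.index` and `.to_numpy()` are standard pandas accessors for the two underlying arrays.

```python
import pandas as pd

# A Series pairs an index (the keys) with an array of values.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s.index)       # the key array
print(s.to_numpy())  # the value array

# Looking up a key returns the value stored at the same position.
print(s["b"])        # 20

# The two arrays always have the same length.
assert len(s.index) == len(s.to_numpy())
```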

Creating a Series

First, import the pandas module:

import pandas as pd

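Method 1: pass a list to Series. If index is omitted, it defaults to integers starting at 0. You can supply your own index labels, but the number of labels must equal the number of values.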

In [18]: pd.Series([1,5,8,23,5])
Out[18]:
0     1
1     5
2     8
3    23
4     5
dtype: int64

In [17]: pd.Series([1,5,8,23,5],index=list("abcde"))
Out[17]:
a     1
b     5
c     8
d    23
e     5
dtype: int64
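The rule that the index length must match the number of values can be checked with a quick sketch: when the lengths differ, pandas raises a ValueError instead of guessing how to pair labels with values. The variable names `ok` and `mismatch_raised` are illustrative only.

```python
import pandas as pd

values = [1, 5, 8, 23, 5]

# Five values with five labels: the lengths match, so this works.
ok = pd.Series(values, index=list("abcde"))

# Five values with only three labels: pandas raises ValueError.
try:
    pd.Series(values, index=list("abc"))
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)
```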

Method 2: create a Series from a dict. The dict's keys become the Series index.

In [20]: temp_dict = {"name": "xiaohong", "age": 30, "tel": 10086}

In [21]: t2=pd.Series(temp_dict)

In [22]: t2
Out[22]:
name    xiaohong
age           30
tel        10086
dtype: object
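The `dtype: object` in the output above comes from mixing a string with integers. A short sketch, reusing the same dict plus a hypothetical integer-only dict named `nums`, shows that keys keep their insertion order and that an all-integer dict gets an integer dtype instead.

```python
import pandas as pd

temp_dict = {"name": "xiaohong", "age": 30, "tel": 10086}
t2 = pd.Series(temp_dict)

# Keys become index labels, in the dict's insertion order.
print(list(t2.index))  # ['name', 'age', 'tel']

# Mixing a string with integers forces the catch-all object dtype.
print(t2.dtype)        # object

# A dict whose values are all integers gets an integer dtype instead.
nums = pd.Series({"age": 30, "tel": 10086})
print(nums.dtype)      # int64
```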

Slicing and indexing a Series


In [22]: t2
Out[22]:
name    xiaohong
age           30
tel        10086
dtype: object

In [36]: t2.index
Out[36]: Index(['name', 'age', 'tel'], dtype='object')

In [37]: for i in t2.index:   # the index is iterable, so it can be looped over
    ...:     print(i)
    ...:
name
age
tel

In [38]: len(t2.index)   # number of index labels
Out[38]: 3

In [39]: list(t2.index)
Out[39]: ['name', 'age', 'tel']

In [40]: list(t2.index)[:2]
Out[40]: ['name', 'age']

In [41]: t2.values
Out[41]: array(['xiaohong', 30, 10086], dtype=object)

In [42]: type(t2.values)
Out[42]: numpy.ndarray

In [43]: type(t2.index)
Out[43]: pandas.core.indexes.base.Index
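The session above inspects the index and values arrays; the Series itself can also be indexed and sliced. A minimal sketch of the common patterns, using the same dict: `.loc` selects by label and `.iloc` by integer position (both standard pandas accessors). Note that a label slice includes its end label, while a position slice excludes its stop position. The numeric Series `s` is an illustration value.

```python
import pandas as pd

t2 = pd.Series({"name": "xiaohong", "age": 30, "tel": 10086})

# Label-based access with .loc
print(t2.loc["age"])            # 30
print(t2.loc[["name", "tel"]])  # a new Series with two entries

# Position-based access with .iloc (the stop position is excluded)
print(t2.iloc[0])               # xiaohong
print(t2.iloc[:2])              # name and age

# Label slices with .loc include the end label
print(t2.loc["name":"age"])     # name and age

# Boolean indexing on a numeric Series keeps entries where the mask is True
s = pd.Series([1, 5, 8, 23, 5], index=list("abcde"))
print(s[s > 5])                 # c -> 8, d -> 23
```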

Missing values in pandas

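pandas marks missing values with NaN, and the isnull and notnull functions check for missing data. In the example below, the dict is passed together with an explicit index; the label "num" has no matching key in the dict, so its value becomes NaN.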

temp_dict = {"name": "xiaohong", "age": 30, "tel": 10086}
t2=pd.Series(temp_dict)
In [47]: states=["name","num","tel"]
    ...: t3=pd.Series(temp_dict,index=states)

In [48]: t3
Out[48]:
name    xiaohong
num          NaN
tel        10086
dtype: object

In [49]: pd.isnull(t3)
Out[49]:
name    False
num      True
tel     False
dtype: bool

In [50]: pd.notnull(t3)
Out[50]:
name     True
num     False
tel      True
dtype: bool
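Once missing entries are detected, they are usually counted, dropped, or filled. A sketch using the same `t3`: `isna` is the newer name for the same check as `isnull`, and `dropna` and `fillna` are standard Series methods. The placeholder string `"unknown"` is an arbitrary choice for illustration.

```python
import pandas as pd

temp_dict = {"name": "xiaohong", "age": 30, "tel": 10086}
t3 = pd.Series(temp_dict, index=["name", "num", "tel"])

# isna is equivalent to isnull; summing the boolean mask counts the NaNs.
missing = t3.isna()
print(missing.sum())         # 1

# Drop the missing entries...
print(t3.dropna())           # only name and tel remain

# ...or replace them with a placeholder value.
print(t3.fillna("unknown"))  # num -> unknown
```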

The where method of Series

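Series.where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False, raise_on_error=None)
(This is the signature from older pandas releases; in pandas 2.x the errors and try_cast parameters have been removed, leaving Series.where(cond, other=nan, *, inplace=False, axis=None, level=None).)
Where cond is True, the original value is kept; otherwise it is replaced by the value of other. If other is omitted, it defaults to NaN.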

In [3]: s = pd.Series(range(5))

In [4]: s.where(s>0)  # keep values that satisfy the condition; the rest become NaN, so the dtype becomes float64
Out[4]:
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64

In [5]: s.where(s>0,10)  # values that fail the condition are replaced with 10
Out[5]:
0    10
1     1
2     2
3     3
4     4
dtype: int64
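Beyond a boolean Series and a scalar, where also accepts a callable for cond (it receives the Series and must return a boolean mask) and a Series for other (aligned on the index). A sketch of both on the same `range(5)` data; the names `clipped` and `negated` are illustrative.

```python
import pandas as pd

s = pd.Series(range(5))

# cond as a callable: keep values below 3, replace the rest with 3.
clipped = s.where(lambda x: x < 3, 3)
print(clipped.tolist())  # [0, 1, 2, 3, 3]

# other as a Series: odd values are replaced by their negation.
negated = s.where(s % 2 == 0, -s)
print(negated.tolist())  # [0, -1, 2, -3, 4]
```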

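The mask method does the opposite of where: values for which cond is True are replaced with other, and values for which it is False are kept.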
In [6]: s.mask(s>0,10)
Out[6]:
0     0
1    10
2    10
3    10
4    10
dtype: int64
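Because mask is the mirror image of where, calling mask with a condition gives the same result as calling where with the negated condition. A quick check on the same data:

```python
import pandas as pd

s = pd.Series(range(5))
cond = s > 0

# mask replaces where cond is True; where replaces where cond is False.
via_mask = s.mask(cond, 10)
via_where = s.where(~cond, 10)

print(via_mask.equals(via_where))  # True
print(via_mask.tolist())           # [0, 10, 10, 10, 10]
```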