Python Third-Party Modules, Data Analysis, Pandas Module: Series (1)

Latest recommended article published on 2023-05-19 17:04:35

EdVzAs

Article tags: python, data analysis, Series, pandas

Article link: https://blog.csdn.net/weixin_46131409/article/details/109908408

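This article covers the Series data structure in Pandas in detail, including how to create one, its basic operations, attributes, and methods, with plenty of examples to aid understanding.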

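I. Introduction

A Series is built on NumPy's ndarray: it is a labeled, 1-dimensional, homogeneous array in which every element has 1 label, which makes it similar to a Python dict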

The Series is the primary building block of pandas and represents a one-dimensional labeled array based on the
NumPy ndarray

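II. Usage
1. Creating objects:

For the index parameter, see: http://liao.cpython.org/pandas03/

Create a Series object: pd.Series([data=None,index=(0,1...n-1),dtype=<dtype>,name=None,copy=False,fastpath=False])
  #Parameters:
    data: the data to store; array-like/Iterable/dict/scalar
      #If it is a dict and <index> is not given, the keys become <index> and the values become <data>
      #The values themselves need not be hashable (a Series can hold lists, for example); only the labels must be
    index: the labels (row names); array-like/dict/pandas.Index object, and should have the same length as <data>
      #Labels need not be unique; every label in <index> must be hashable
      #If <index> is given and <data> is a dict, values are looked up in <data> by label; labels that are not keys of <data> get NaN
      #If <index> is itself a dict, its keys are used as the labels
    dtype: the data type of the values in <data>; str/numpy.dtype/ExtensionDtype, inferred from <data> by default
    name: the name of the Series; str
    copy: whether to explicitly copy the input data; bool
    fastpath: a fast-path switch; bool
      #Not documented in the source or on the official site; it appears to be internal, so avoid passing it

#Examples: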
>>> pd.Series()
<stdin>:1: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
Series([], dtype: float64)
>>> pd.Series([1,2,3,4,5],index=['a','b','c','f','e'])
a    1
b    2
c    3
f    4
e    5
dtype: int64
>>> pd.Series(range(3))
0    0
1    1
2    2
dtype: int64
>>> s=pd.Series([1,2,3,4,5],index=['a','b','c','f','e'],name="aaa")
>>> s.name
'aaa'
>>> pd.Series([1,2,3,4,5],index=range(5),dtype="float64")
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64
>>> pd.Series({'a':3,'b':4,'c':5,'f':6,'e':8})
a    3
b    4
c    5
f    6
e    8
dtype: int64
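
As a small sketch of how a dict <data> interacts with an explicit <index> (the sample values and variable names are made up for illustration), the labels are looked up in the dict rather than overwriting its keys:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
data = {"a": 3, "b": 4, "c": 5}

# Each label is looked up in the dict; "x" is not a key, so it becomes NaN.
# Because NaN is a float, the resulting dtype is float64.
s = pd.Series(data, index=["c", "a", "x"])
print(s)
```

The key "b" is simply dropped here, because it is not among the requested labels.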

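2. Operations
(1) Indexing and slicing:

See Python.Third-Party Modules.Data Analysis.Pandas Module.Indexing and Slicing, section I.1

(2) Membership tests:

Test whether a value is in a Series object: <val> in <S>.values
  #Note: .values is required here
Test whether a label is in a Series object: <val> in <S>

#Examples: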
>>> s=pd.Series([1,2,3,4,5],index=["a","b","c","d","e"])
>>> 1 in s
False
>>> "a" in s
True
>>> 1 in s.values
True
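
The two membership tests can be put side by side in a short script. This is only a sketch; the variable names are illustrative, and the last line shows one alternative (an element-wise comparison followed by any()) that avoids .values:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

label_found = "a" in s                  # `in` on a Series checks the labels
value_as_label = 1 in s                 # 1 is a value, not a label
value_found = 1 in s.values             # checks the underlying ndarray
value_found_alt = bool(s.eq(1).any())   # element-wise ==, then any()

print(label_found, value_as_label, value_found, value_found_alt)
```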

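(3) Filtering by condition:

<S>[<cond>]
  #Returns a new Series without the elements where <cond> is False; <S> itself is unchanged
  #Parameters:
    cond: the condition to test
      #Note: the test must be applied to <S> as a whole (or to all of its values, or to all of its labels)

#Examples: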
>>> s=pd.Series([1,2,3,4,5],index=["a","b","c","d","e"])
>>> s[3!=s.values]
a    1
b    2
d    4
e    5
dtype: int64
>>> s[3==s.values]
c    3
dtype: int64
>>> s["a"==s.index]
a    1
dtype: int64
>>> s[3<=s]#equivalent to testing s.values
c    3
d    4
e    5
dtype: int64
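
Conditions can also be combined. The sketch below (variable names are illustrative) uses the element-wise operators & and ~, which need parentheses around each comparison, and confirms that the original Series is left untouched:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# & is element-wise "and"; each comparison must be parenthesized.
middle = s[(s >= 2) & (s <= 4)]

# ~ negates a boolean mask.
not_three = s[~(s == 3)]

print(middle)
print(not_three)
print(len(s))  # the original still has all 5 elements
```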

(4) Arithmetic and comparison:

Arithmetic is allowed between any 2 Series:
>>> s1=pd.Series([1,2,3,4],index=["a","b","c","d"])
>>> s1+s1
a    2
b    4
c    6
d    8
dtype: int64
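Note that arithmetic first "aligns" the operands: elements of the 2 Series that share a label are combined, and every label found in only one of them gets NaN: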
>>> s2=pd.Series([1,2,3,4],index=["a","f","e","d"])
>>> s1+s2
a    2.0
b    NaN
c    NaN
d    8.0
e    NaN
f    NaN
dtype: float64
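When labels are duplicated, identical indexes are combined element by element, while differing indexes give every pairing of the elements that share a label (not every pairing of all elements):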
>>> s3=pd.Series([1,1],index=["a","a"])
>>> s4=pd.Series([2,1],index=["a","a"])
>>> s3+s4
a    3
a    2
dtype: int64
>>> s3=pd.Series([1,1,1,1],index=["a","a","b","c"])
>>> s4=pd.Series([1,2,3,4],index=["a","a","a","b"])
>>> s3+s4
a    2.0
a    3.0
a    4.0
a    2.0
a    3.0
a    4.0
b    5.0
c    NaN
dtype: float64
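
When the NaN results of alignment are not wanted, one option is the method form of the operator with fill_value. The sketch below reuses the s1 and s2 from the alignment example and treats a label missing from one side as 0; whether 0 is the right stand-in depends on the data:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
s2 = pd.Series([1, 2, 3, 4], index=["a", "f", "e", "d"])

# Series.add aligns on labels like +, but substitutes fill_value
# for a label that is missing from one of the two operands.
total = s1.add(s2, fill_value=0)
print(total)
```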

######################################################################################################################

Comparing 2 Series with the operators is only allowed when their labels are identical (including order):
>>> s1>s2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Euler\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\ops\common.py", line 65, in new_method
    return method(self, other)
  File "C:\Users\Euler\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\ops\__init__.py", line 365, in wrapper
    raise ValueError("Can only compare identically-labeled Series objects")
ValueError: Can only compare identically-labeled Series objects
>>> s1=pd.Series([1,1,1],index=["a","b","c"])
>>> s2=pd.Series([-1,1,2],index=["a","b","c"])
>>> s1>s2
a     True
b    False
c    False
dtype: bool
>>> s3=pd.Series([-1,1,2],index=["a","c","b"])
>>> s1>s3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Euler\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\ops\common.py", line 65, in new_method
    return method(self, other)
  File "C:\Users\Euler\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\ops\__init__.py", line 365, in wrapper
    raise ValueError("Can only compare identically-labeled Series objects")
ValueError: Can only compare identically-labeled Series objects
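
If the 2 Series hold the same labels in a different order, they can be brought into the same order before comparing. The following is a sketch of two ways to do that, not the only ones: reindexing one operand, or using the method form gt, which aligns on labels by itself:

```python
import pandas as pd

s1 = pd.Series([1, 1, 1], index=["a", "b", "c"])
s3 = pd.Series([-1, 1, 2], index=["a", "c", "b"])

# Option 1: reorder s3 to follow s1's labels, then use the operator.
by_label = s1 > s3.reindex(s1.index)

# Option 2: the method form aligns the two operands on their labels.
flex = s1.gt(s3)

print(by_label)
print(flex)
```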

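III. Attributes:

Some attributes, such as index and name, can be assigned as well as read

Get the labels: <S>.index
  #This is the <index> argument given at creation, but as a pandas.core.indexes.base.Index object

#Examples: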
>>> s=pd.Series([1,2,3,4,5],index=["a","b","c","d","e"])
>>> s.index
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
>>> type(s.index)
<class 'pandas.core.indexes.base.Index'>
>>> s.index=["e","d","c","d","a"]
>>> s
e    1
d    2
c    3
d    4
a    5
dtype: int64

#################################################################################################

Get the data: <S>.values
  #This is the <data> argument given at creation, but as a numpy.ndarray object
  #Duplicate values are not removed

#Examples (continued):
>>> s.values
array([1, 2, 3, 4, 5], dtype=int64)
>>> type(s.values)
<class 'numpy.ndarray'>

#################################################################################################

Get the number of elements: <S>.size

#Examples (continued):
>>> s.size
5

#################################################################################################

Get the data type of the elements: <S>.dtype
  #This is the <dtype> argument given at creation, but as a numpy.dtype object

#Examples (continued):
>>> s.dtype
dtype('int64')
>>> type(s.dtype)
<class 'numpy.dtype'>

#################################################################################################

Check whether all values in <S> are unique: <S>.is_unique

#Examples (continued):
>>> s.is_unique
True

#################################################################################################

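Access the string methods: <S>.str
  #Returns a pandas.core.strings.StringMethods object, so that the methods of that class can be used (see Python.Pandas Module.Data Processing, section II)
  #Requires all values in <S> to be str

#Examples: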
>>> s=pd.Series([1,2,3,4,5],index=["a","b","c","d","e"])
>>> s.str
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Euler\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\generic.py", line 5132, in __getattr__
    return object.__getattribute__(self, name)
  File "C:\Users\Euler\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\accessor.py", line 187, in __get__
    accessor_obj = self._accessor(obj)
  File "C:\Users\Euler\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\strings.py", line 2100, in __init__
    self._inferred_dtype = self._validate(data)
  File "C:\Users\Euler\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\strings.py", line 2157, in _validate
    raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!
>>> s=pd.Series(["1","2","3","4","5"],index=["a","b","c","d","e"])
>>> s.str
<pandas.core.strings.StringMethods object at 0x000001EC9DEE1A60>
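
For a numeric Series, one way around the AttributeError is to convert the values to str first. The sketch below does that and then calls one string method, zfill, purely as an example of what the accessor offers:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Convert every value to str so that the .str accessor becomes available.
as_text = s.astype(str)

# Pad each string with leading zeros to a width of 3.
padded = as_text.str.zfill(3)
print(padded)
```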

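IV. Methods

Unless inplace=True is passed, none of the functions below modify the Series they are called on or the Series passed as arguments
NumPy ufuncs (element-wise array functions) can also be applied to a Series

1. Basic operations
(1) Reading:

Get the values at the given positions: <S>.take(<indices>[,axis=0,is_copy=None,**kwargs])
  #Parameters:
    indices: the integer positions to read; array-like

#Examples: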
>>> s=pd.Series([1,2,3,4,5,6,7])
>>> s.take([2,4])
2    3
4    5
dtype: int64

#################################################################################################

Get the first n rows of a Series: <S>.head(<n>)
  #Returns a Series object
Get the last n rows of a Series: <S>.tail(<n>)
  #Returns a Series object
  #Parameters:
    n: the number of rows to get; int

#Examples:
>>> s=pd.Series([1,2,3,4,5],index=["a","b","c","d","e"])
>>> s.head(3)
a    1
b    2
c    3
dtype: int64
>>> s.tail(2)
d    4
e    5
dtype: int64

#################################################################################################

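Draw random values from <S>: <S>.sample(<n>)
  #Returns a Series object made of the drawn values and their labels in <S>
  #Parameters:
    n: how many values to draw; defaults to 1

#Examples: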
>>> s=pd.Series([1,2,3,4,5,6,10,11,12,13])
>>> s.sample()
0    1
dtype: int64
>>> s.sample(5)
3     4
2     3
4     5
6    10
8    12
dtype: int64

#################################################################################################

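Keep the values that satisfy a condition: <S>.where(<cond>)
  #The original value where the condition holds; NaN where it does not
Keep the values that do not satisfy a condition: <S>.mask(<cond>)
  #The original value where the condition does not hold; NaN where it does
  #Parameters:
    cond: the condition
      #Note: build the condition by comparing <S> itself (see the examples)

#Examples: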
>>> s=pd.Series([-4,-3,-2,-1,0,1,2,3,4])
>>> s.where(s>0)
0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    1.0
6    2.0
7    3.0
8    4.0
dtype: float64
>>> s.mask(s>0)
0   -4.0
1   -3.0
2   -2.0
3   -1.0
4    0.0
5    NaN
6    NaN
7    NaN
8    NaN
dtype: float64
>>> s=pd.Series(['a','b','c'])
>>> s.where(s>'a')
0    NaN
1      b
2      c
dtype: object
>>> s.mask(s>'a')
0      a
1    NaN
2    NaN
dtype: object
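
Both methods also accept a second argument that is used instead of NaN. The sketch below (variable names are illustrative) replaces the non-matching elements with a scalar in one case and with values taken from another Series in the other:

```python
import pandas as pd

s = pd.Series([-4, -3, -2, -1, 0, 1, 2, 3, 4])

# Keep positive values; use 0 everywhere else instead of NaN.
clipped = s.where(s > 0, 0)

# Where the value is positive, take the corresponding element of -s.
negated = s.mask(s > 0, -s)

print(clipped.tolist())
print(negated.tolist())
```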

#################################################################################################

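Check whether the given values are in <S>: <S>.isin(<value>)
  #Returns a Series object: True at the positions whose element is one of <value>, False elsewhere
  #Note: if <S> contains unhashable data, errors are likely; in the run shown below, making the last element of <value> a
  #     Series object works around this (although <value> itself still must not contain unhashable data)
  #Parameters:
    value: the values to look for; set/list-like

#Examples: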
>>> s=pd.Series(['lama','cow','lama','beetle','lama','hippo'],name='animal')
>>> s.isin(['cow','lama'])
0     True
1     True
2     True
3    False
4     True
5    False
Name: animal, dtype: bool
>>> s.isin(['lama'])
0     True
1    False
2     True
3    False
4     True
5    False
Name: animal, dtype: bool
>>> s=pd.Series([1,2,'lama',[1,2]])
>>> s.isin(['lama'])
TypeError: unhashable type: 'list'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Euler\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\series.py", line 4685, in isin
    result = algorithms.isin(self, values)
  File "C:\Users\Euler\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\algorithms.py", line 460, in isin
    return f(comps, values)
  File "pandas\_libs\hashtable_func_helper.pxi", line 454, in pandas._libs.hashtable.ismember_object
SystemError: <built-in method view of numpy.ndarray object at 0x0000022DA286B300> returned a result with an error set
>>> s.isin(['lama',s])
0    False
1    False
2     True
3    False
dtype: bool
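
A more explicit alternative for a Series that holds unhashable elements is to test each element with an ordinary function. This is a sketch under the assumption that only str values can ever match; is_wanted and wanted are hypothetical names introduced for the illustration:

```python
import pandas as pd

s = pd.Series([1, 2, "lama", [1, 2]])

# Hypothetical set of values to look for.
wanted = {"lama"}

def is_wanted(value):
    # Only strings are checked against the set, so the unhashable
    # list [1, 2] is never hashed and never raises TypeError.
    return isinstance(value, str) and value in wanted

found = s.apply(is_wanted)
print(found)
```

This runs a Python function per element, so it is slower than isin on large data.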

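(2) Modifying:

Change the data type: <S>.astype("<dtype>"[,copy=True,errors="raise"])
  #Parameters:
    dtype: the target data type; str
    copy: when True, returns a converted copy; when False, the result may share data with <S> instead of copying it
      #Use False with care, since later changes to the values may then propagate to other objects
    errors: what to do when <dtype> cannot be applied; "raise" raises an error, "ignore" suppresses it and returns <S> unchanged

#Examples: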
>>> s=pd.Series([0,1,2,3],index=["a","c","b","e"])
>>> s.dtype
dtype('int64')
>>> s=s.astype('float64')
>>> s.dtype
dtype('float64')

#################################################################################################

Change the object name or the labels: <S>.rename([index=None,axis=None,copy=True,inplace=False,level=None,errors="ignore"])
  #Parameters:
    index: how to rename; scalar/hashable sequence/dict-like/function

#Examples:
>>> s=pd.Series([0,1,2,3],index=["a","c","b","e"])
>>> s.rename(index="name")#scalar: changes <S>.name
a    0
c    1
b    2
e    3
Name: name, dtype: int64
>>> s.rename({"a":"A","b":"B","d":"D"})#dict: changes the labels
A    0
c    1
B    2
e    3
dtype: int64
>>> def f(x):
...     return x.upper() if x.islower() else x.lower()
...
>>> s.rename(f)#function: changes the labels
A    0
C    1
B    2
E    3
dtype: int64

#################################################################################################

Replace the given values: <S>.replace([to_replace=None,value=None,inplace=False,limit=None,regex=False,method="pad"])
  #Parameters:
    to_replace: the values to replace; str (regex supported)/list/dict (maps old values to new values; value should then be None)/Series/num/None
    value: the replacement values; scalar/dict (maps columns to replacement values)/list/str (regex supported)/None

#Examples:
>>> s=pd.Series([1,2,-999,4,-999,-999,7])
>>> s.replace(-999,np.nan)
0    1.0
1    2.0
2    NaN
3    4.0
4    NaN
5    NaN
6    7.0
dtype: float64
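
The dict form of to_replace handles several substitutions in one call. In this sketch the mapping itself is made up for illustration; each key is an old value and each dict value is its replacement:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, -999, 4, -999, -999, 7])

# Replace the sentinel -999 with NaN and 7 with 70 in a single call.
cleaned = s.replace({-999: np.nan, 7: 70})
print(cleaned)
```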

#################################################################################################

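Reset the row labels to the default: <S>.reset_index([level=None,drop=False,name=None,inplace=False])
  #Parameters:
    level: the row label level(s) to reset; int/str/tuple/list
    drop: whether to discard the original row labels; bool
      #If they are not discarded, they are inserted as a new column named index
    name: the column name for the original data; scalar, defaults to <S>.name

#Examples: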
>>> s=pd.Series([1,2,3,4,5],index=["a","b","c","d","e"])
>>> s.name="aaa"
>>> s.reset_index()
  index  aaa
0     a    1
1     b    2
2     c    3
3     d    4
4     e    5
>>> s.reset_index(drop=True)
0    1
1    2
2    3
3    4
4    5
Name: aaa, dtype: int64
>>> s.reset_index(name=111)
  index  111
0     a    1
1     b    2
2     c    3
3     d    4
4     e    5

(3) Element transformations

See: https://blog.csdn.net/yangjjuan/article/details/104430332

Apply a function to each column: <df>.apply(<func>,axis=0,raw=False,result_type=None,args=(),**kwds)
  #See Python.Third-Party Modules.Data Analysis.Pandas Module.DataFrame, section III.5

#################################################################################################

Apply the given function(s): <s>.aggregate([func=None,axis=0,*args,**kwargs])
  #Also available as <s>.agg(); works both element by element and on the whole column
  #Parameters:
    func: the function(s) to apply; function/function list-like/str

#Examples:
>>> s=pd.Series([-1,3,3,0,2,6,4,-7,3,0])
>>> def f(x):
...     return x**2
...
>>> def f3(x):
...     return x**3
...
>>> s.agg([f,f3])
    f   f3
0   1   -1
1   9   27
2   9   27
3   0    0
4   4    8
5  36  216
6  16   64
7  49 -343
8   9   27
9   0    0
>>> s.agg("sum")
13

#################################################################################################

Apply a function to each element: <S>.transform(<func>[,*args,**kwargs])
  #Parameters:
    func: the function; function/str/list/dict

#Examples:
>>> s=pd.Series(['男','女','男','女','男','女','男','男'])
>>> s.transform(lambda x:x+x)
0    男男
1    女女
2    男男
3    女女
4    男男
5    女女
6    男男
7    男男
dtype: object
>>> s.transform("rank")
0    6.0
1    2.0
2    6.0
3    2.0
4    6.0
5    2.0
6    6.0
7    6.0
dtype: float64
>>> s.transform({"男":"rank"})
男  0    6.0
   1    2.0
   2    6.0
   3    2.0
   4    6.0
   5    2.0
   6    6.0
   7    6.0
dtype: float64

#################################################################################################

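Apply a mapping to each element: <S>.map(<arg>[,na_action=None])
  #Note: this is the Series method (in the pandas version used here, DataFrame has no map); elements that could not be mapped become NaN
  #Parameters:
    arg: the mapping; function/collections.abc.Mapping subclass/Series

#Examples: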
>>> s=pd.Series([-1,3,3,0,2,6,4,-7,3,0])
>>> def f(x):
...     return x**2-1
...
>>> s.map(f)
0     0
1     8
2     8
3    -1
4     3
5    35
6    15
7    48
8     8
9    -1
dtype: int64
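
The NaN behaviour is easiest to see with a dict as the mapping. In this sketch the animal names and leg counts are made-up sample data; "bird" has no entry in the dict, so it maps to NaN:

```python
import pandas as pd

# Hypothetical sample data.
s = pd.Series(["cat", "dog", "bird"])

# Elements without a matching key become NaN, so the result is float64.
legs = s.map({"cat": 4, "dog": 4})
print(legs)
```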

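(4) Adding:

Insert the given elements: <S>.append(<nS>)
  #append takes another Series rather than a single scalar element, so insertion is done by concatenating 2 Series objects
  #Series.append was removed in pandas 2.0; on newer versions use pd.concat([<S>,<nS>])
  #Parameters:
    nS: the elements to insert
      #The inserted elements keep the labels they had in <nS>

#Examples: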
>>> s=pd.Series([1,2,3,4,5],index=["a","b","c","d","e"])
>>> ns=pd.Series([10,11],index=["a","b"])
>>> s.append(ns)
a     1
b     2
c     3
d     4
e     5
a    10
b    11
dtype: int64
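
On pandas versions where Series.append is no longer available, pd.concat gives the same result. The sketch below repeats the example above with pd.concat and also shows ignore_index=True, which discards the original labels and renumbers from 0:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
ns = pd.Series([10, 11], index=["a", "b"])

# Same result as s.append(ns): labels are kept, duplicates included.
combined = pd.concat([s, ns])

# Discard the labels and number the elements 0..n-1 instead.
renumbered = pd.concat([s, ns], ignore_index=True)

print(combined)
print(renumbered)
```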

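(5) Deleting:

Delete the given elements: <S>.drop([labels=None,axis=0,index=None,inplace=False,errors="raise"])
  #Parameters: errors works as in .astype
    labels/index: the labels of the elements to delete; single label/label list
      #At least 1 of these 2 parameters must be given
      #The 2 parameters do the same thing, so giving 1 is enough; a positional argument also works
    axis: can only be 0 for a Series object
    inplace: when True, deletes from <S> directly and returns None; when False, returns a copy without the given elements

#Examples: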
>>> s=pd.Series([1,2,"a","a","b"],index=['q','w','e','r','t'])
>>> s.drop('q')
w    2
e    a
r    a
t    b
dtype: object

#################################################################################################

Truncate to a range: <S>.truncate([before=None,after=None,axis=None,copy=True])
  #Parameters:
    before: drops everything before this label; date/str/int (index label)
    after: drops everything after this label; date/str/int (index label)

#Examples:
>>> s=pd.Series([0,1,2,3,4,5,6,7,8,9])
>>> s.truncate()
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64
>>> s.truncate(before=6)
6    6
7    7
8    8
9    9
dtype: int64
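
before and after can be given together. The following sketch keeps only the labels from 2 through 5; both endpoints are included in the result:

```python
import pandas as pd

s = pd.Series(range(10))

# Drop everything before label 2 and everything after label 5.
middle = s.truncate(before=2, after=5)
print(middle)
```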