Reference: Python Data Analysis and Presentation
Reference: pandas official website
Reference: User Guide
Reference: Getting started tutorials
The Series type:
A Series consists of a set of data and the index associated with that data
Demo 1:
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> a = pd.Series([9,8,7,6])
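>>> # The left column of the output is the automatically generated index; the right column holds the values
# The dtype is a NumPy data type,
# because pandas is built on top of NumPy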
>>> a
0 9
1 8
2 7
3 6
dtype: int64
>>> # User-defined index
b1 = pd.Series([9,8,7,6],index=["a","b",'c','d'])
>>> b1
a 9
b 8
c 7
d 6
dtype: int64
>>> b2 = pd.Series([9,8,7,6],["a","b",'c','d']) # index is the second positional parameter, so index= can be omitted
>>> b2
a 9
b 8
c 7
d 6
dtype: int64
>>>
>>>
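Creating a Series:
- A Series can be created from the following types:
- A Python list; the index must have as many entries as the list has elements
- A scalar value; the index determines the size of the Series
- A Python dict; the keys become the index, and passing index selects entries from the dict
- An ndarray; both the index and the data can be created from ndarrays
- Other functions, such as range()
Experiment 2: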
>>> ################################################
>>> # Create a Series from a scalar
>>> s = pd.Series(25,index=['a','b','c','d']) # the index must be supplied here; it sets the length of the Series
>>> s
a 25
b 25
c 25
d 25
dtype: int64
>>> # Create a Series from a dict
>>> dd = pd.Series({'a':99,'b':88,'c':77})
>>> dd
a 99
b 88
c 77
dtype: int64
>>>
>>> # Specify the index order; labels missing from the dict get NaN, meaning the value is empty
>>> # In other words, index selects entries from the dict
>>> ddd = pd.Series({'a':99,'b':88,'c':77},index = ['c','a','b','d'])
>>> ddd
c 77.0
a 99.0
b 88.0
d NaN
dtype: float64
>>> # Create a Series from a NumPy ndarray
>>> n = pd.Series(np.arange(5))
>>> n
0 0
1 1
2 2
3 3
4 4
dtype: int32
>>> # Create both the data and the index from NumPy ndarrays
>>> m = pd.Series(np.arange(5),index=np.arange(19,14,-1))
>>> m
19 0
18 1
17 2
16 3
15 4
dtype: int32
>>> b = pd.Series([9,8,7,6],index=["a","b",'c','d'])
>>> b
a 9
b 8
c 7
d 6
dtype: int64
>>>
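The list of sources above mentions range(), which Experiment 2 does not exercise. A minimal sketch of that case follows; the variable names r and labeled are made up for illustration and are not from the original notes.

```python
import pandas as pd

# Build a Series from a range object; an automatic integer index is generated.
r = pd.Series(range(3))

# Pair the same range with custom labels; the index length must match the data length.
labeled = pd.Series(range(3), index=["x", "y", "z"])

print(r.tolist())      # values taken from the range
print(labeled["y"])    # value stored under the label "y"
```

As with a list, the number of labels has to equal the number of values produced by range().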
- Basic Series operations:
- A Series has two parts: index and values
- Series operations resemble ndarray operations
- Series operations resemble Python dict operations
Experiment 3:
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> b = pd.Series([9,8,7,6],index=["a","b",'c','d'])
>>> b
a 9
b 8
c 7
d 6
dtype: int64
>>> # Get the index of the Series
>>> b.index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> # Get the data of the Series
>>> b.values
array([9, 8, 7, 6], dtype=int64)
>>> # The automatic index and the custom index coexist inside a Series
>>> b['b']
8
>>> b[1]
8
>>> b["b"]
8
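>>> b[["c",'d',0]] # the two indexing schemes coexist, but they cannot be mixed in one lookup
Traceback (most recent call last):
File "<pyshell#12>", line 1, in <module>
b[["c",'d',0]] # the two indexing schemes coexist, but they cannot be mixed in one lookup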
File "D:\Python\Python37\lib\site-packages\pandas\core\series.py", line 910, in __getitem__
return self._get_with(key)
File "D:\Python\Python37\lib\site-packages\pandas\core\series.py", line 958, in _get_with
return self.loc[key]
File "D:\Python\Python37\lib\site-packages\pandas\core\indexing.py", line 1768, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "D:\Python\Python37\lib\site-packages\pandas\core\indexing.py", line 1954, in _getitem_axis
return self._getitem_iterable(key, axis=axis)
File "D:\Python\Python37\lib\site-packages\pandas\core\indexing.py", line 1595, in _getitem_iterable
keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
File "D:\Python\Python37\lib\site-packages\pandas\core\indexing.py", line 1553, in _get_listlike_indexer
keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
File "D:\Python\Python37\lib\site-packages\pandas\core\indexing.py", line 1655, in _validate_read_indexer
"Passing list-likes to .loc or [] with any missing labels "
KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'
>>> b[["c",'d','a']]
c 7
d 6
a 9
dtype: int64
>>>
>>>
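Because plain [] has to guess whether a key is a label or a position, it may help to state the intent explicitly. The sketch below uses the .loc (label) and .iloc (position) accessors on the same Series as Experiment 3. This is a suggestion added here, not part of the original session, and newer pandas releases may warn when an integer such as b[1] is used positionally on a label index.

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"])

by_label = b.loc["b"]             # lookup by custom label
by_position = b.iloc[1]           # lookup by automatic (positional) index
subset = b.loc[["c", "d", "a"]]   # a list of labels returns a Series in that order

print(by_label, by_position)
print(subset.tolist())
```

Keeping labels in .loc and positions in .iloc avoids the mixed-key error shown in the traceback above.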
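- Series operations resemble ndarray operations:
- Indexing works the same way, using []
- NumPy operations and functions can be applied to a Series
- A list of custom index labels can be used to select elements
- Slicing by the automatic index also slices the custom index along with the values
Experiment 4: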
>>>
>>> # Indexing and slicing
b = pd.Series([9,8,7,6],index=["a","b",'c','d'])
>>> b
a 9
b 8
c 7
d 6
dtype: int64
>>> # Indexing
b[3]
6
>>> b['a']
9
>>> # Slicing
b[:3]
a 9
b 8
c 7
dtype: int64
>>> # Compute the median
b.median()
7.5
>>> b[b>b.median()]
a 9
b 8
dtype: int64
>>> np.exp(b)
a 8103.083928
b 2980.957987
c 1096.633158
d 403.428793
dtype: float64
>>>
>>>
>>>
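One detail worth checking when slicing is that the two index schemes treat the end point differently. The following sketch, added here as an illustration with made-up variable names, compares a positional slice with a label slice on the same Series.

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"])

# Positional slice: the end position is excluded, as with ndarray and list.
by_position = b.iloc[0:2]

# Label slice: the end label is included.
by_label = b.loc["a":"c"]

print(list(by_position.index))
print(list(by_label.index))
```

In both cases the custom labels travel with the values, which matches the last bullet point above.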
- Series operations resemble Python dict operations:
- Access by custom index label
- Membership tests with the in keyword
- The .get() method
Experiment 5:
>>>
>>> b = pd.Series([9,8,7,6],index=["a","b",'c','d'])
>>> b['b']
8
>>> # Check whether a label is in the index of the Series
'c' in b
True
>>> # The automatic index is not checked
0 in b
False
>>> "f" in b
False
>>> # Similar to the dict get method
b.get('f',100)
100
>>> b.get('a',100)
9
>>>
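Since in tests the index, like the keys of a dict, a value lookup needs a different route. A small sketch under that reading follows; the use of .values and .isin() here is an addition to the original notes.

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"])

label_hit = "c" in b           # True: "c" is a label in the index
value_as_label = 9 in b        # False: 9 is a value, not a label
value_hit = 9 in b.values      # True: search the underlying data instead
mask = b.isin([9, 6])          # element-wise membership test, returns a boolean Series
missing = b.get("f")           # without a default, .get() returns None for a missing label

print(label_hit, value_as_label, value_hit)
print(mask.tolist())
print(missing)
```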
A Series automatically aligns data with different indexes during arithmetic
Experiment 6:
>>>
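>>> # Alignment of Series
>>> a = pd.Series([1,2,3],['c','d','e'])
>>> b = pd.Series([9,8,7,6],['a','b','c','d'])
>>> # A Series automatically aligns data with different indexes during arithmetic
# The index of the result is the union of the two indexes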
>>> a + b
a NaN
b NaN
c 8.0
d 8.0
e NaN
dtype: float64
>>>
>>>
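When the NaN entries produced by alignment are not wanted, one option is the .add() method with fill_value, which substitutes a default for a label missing on one side. This is a sketch added for illustration, reusing the two Series from Experiment 6. Whether 0 is a sensible default depends on the data.

```python
import pandas as pd

a = pd.Series([1, 2, 3], ["c", "d", "e"])
b = pd.Series([9, 8, 7, 6], ["a", "b", "c", "d"])

# Labels present in only one Series are treated as 0 on the other side.
total = a.add(b, fill_value=0)

# Alternatively, keep only the labels that exist in both Series.
common = (a + b).dropna()

print(total.tolist())
print(list(common.index))
```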
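Both a Series object and its index can have a name, stored in the .name attribute. A Series object can be modified at any time, and the change takes effect immediately:
Experiment 7: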
>>>
>>>
>>> a = pd.Series([1,2,3],['c','d','e'])
>>>
>>>
>>>
>>> b = pd.Series([9,8,7,6],['a','b','c','d'])
>>> b
a 9
b 8
c 7
d 6
dtype: int64
>>> b.name
>>> b.index.name
>>> b.name = 'Lin Mazi Series object'
>>> b.name
'Lin Mazi Series object'
>>> b.index.name = 'Mazi index column'
>>> b.index.name
'Mazi index column'
>>> b
Mazi index column
a 9
b 8
c 7
d 6
Name: Lin Mazi Series object, dtype: int64
>>>
>>>
>>> b = pd.Series([9,8,7,6],['a','b','c','d'])
>>> b
a 9
b 8
c 7
d 6
dtype: int64
>>> b['a'] = 15
>>> b.name = 'Baoer Series object'
>>> b
a 15
b 8
c 7
d 6
Name: Baoer Series object, dtype: int64
>>> b.name = 'Baoer new Series object'
>>> b
a 15
b 8
c 7
d 6
Name: Baoer new Series object, dtype: int64
>>> b[['b','c']] = 20200910
>>> b
a 15
b 20200910
c 20200910
d 6
Name: Baoer new Series object, dtype: int64
>>>
>>>
>>>
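The name can also be set when the Series is created, and it is used as the column label if the Series is later turned into a DataFrame. The sketch below illustrates this; the names "scores" and "points" are invented for the example.

```python
import pandas as pd

# Set the name at construction time instead of assigning .name afterwards.
s = pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"], name="scores")

# rename() with a scalar returns a new Series with a different name; s keeps its own.
renamed = s.rename("points")

# The Series name becomes the column label of the resulting DataFrame.
frame = s.to_frame()

print(s.name, renamed.name)
print(frame.columns.tolist())
```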
Summary:
A Series is a one-dimensional "labeled" array
index_0 ---> data_a
Basic Series operations resemble those of ndarray and dict, and arithmetic aligns on the index