Python Basics Tutorial (3): Pandas Series

ngany

Category: Python. Tags: python, pandas

Article link: https://blog.csdn.net/ngany/article/details/113802598

CodingDict Pandas tutorial: http://codingdict.com/article/8270
Learn with a Tsinghua CS PhD, Python quantitative financial analysis: https://www.bilibili.com/video/BV1i741147LS?t

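1. Introduction and Installation

Pandas is an open-source Python library built on NumPy. It provides high-performance data manipulation and analysis tools through its powerful data structures. The name Pandas derives from "panel data", an econometrics term for multidimensional data.

In 2008, developer Wes McKinney began developing Pandas when he needed a high-performance, flexible tool for data analysis.

Before Pandas, Python was mainly used for data munging and preparation and contributed little to data analysis itself. Pandas solved this problem. With Pandas, regardless of where the data comes from, we can carry out the five typical steps of data processing and analysis: load, prepare, manipulate, model, and analyze.

Python with Pandas is used in a wide range of academic and commercial fields, including finance, economics, statistics, and analytics.

Installation: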

pip install pandas

Import (the examples below also use NumPy):

import numpy as np
import pandas as pd

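2. Series: a one-dimensional data object

A Series is a one-dimensional, array-like object made up of a sequence of values and an associated set of data labels (the index).

Creating a Series

A Pandas Series can be created with the following constructor: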

pandas.Series( data, index, dtype, copy)

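No	Parameter	Description
1	data	The data, in one of several forms such as an ndarray, a list, or a constant (scalar)
2	index	Index values must be hashable and the index must have the same length as the data; the values need not be unique. If no index is passed, it defaults to 0..n-1, equivalent to np.arange(n).
3	dtype	The data type. If omitted, the data type is inferred
4	copy	Whether to copy the input data. Defaults to False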
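The parameters in the table can be tried out with a short sketch like the one below. The variable names (s_const, s_float, s_default) are chosen here for illustration and do not come from the original tutorial.

```python
import numpy as np
import pandas as pd

# data as a constant: the value is repeated for every index label
s_const = pd.Series(5, index=['a', 'b', 'c'])

# data as an ndarray, with an explicit dtype instead of the inferred one
s_float = pd.Series(np.array([1, 2, 3]), dtype='float64')

# no index passed: the labels default to 0..n-1
s_default = pd.Series([10, 20, 30])

print(s_const)
print(s_float.dtype)
print(list(s_default.index))
```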

Creating a Series from a list

a = pd.Series([2,3,4,5],index=['a','b','c','d'])
print(a)

Output

a    2
b    3
c    4
d    5
dtype: int64

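Creating a Series from a NumPy array

sr = pd.Series(np.arange(4),index=['a','b','c','d'])
print(sr)

Output (the dtype follows NumPy's default integer type, so it may show int64 instead of int32 on your platform)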

a    0
b    1
c    2
d    3
dtype: int32

Creating a Series from a dict

sr = pd.Series({'a':0,'b':1,'c':2,'d':3})
print(sr)

Output

a    0
b    1
c    2
d    3
dtype: int64

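Basic Series attributes

A Series behaves like a cross between a list (array) and a dict.

Attribute or method	Description
axes	Returns a list of the row axis labels.
dtype	Returns the dtype of the object.
empty	True if the Series is empty.
ndim	Returns the number of dimensions of the underlying data, which is 1 by definition.
size	Returns the number of elements in the underlying data.
values	Returns the Series as an ndarray.
index	Returns the index.
head()	Returns the first n rows.
tail()	Returns the last n rows.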
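As a quick illustration of the table: most of its entries are attributes and are read without parentheses, while head and tail are methods that take the number of rows. The Series s below is a made-up example.

```python
import pandas as pd

s = pd.Series([2, 3, 4, 5], index=['a', 'b', 'c', 'd'])

print(s.axes)      # list holding the single row-axis index
print(s.dtype)     # dtype of the values
print(s.empty)     # False, the Series has elements
print(s.ndim)      # 1
print(s.size)      # 4
print(s.values)    # the underlying ndarray
print(s.index)     # the index object
print(s.head(2))   # first two rows
print(s.tail(2))   # last two rows
```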

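Series arithmetic

Arithmetic with a scalar

sr1 = pd.Series(np.arange(5),index=list('abcde'))  # pass the label list itself; wrapping it in [] would build a MultiIndex
print(sr1*2)

Result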

a    0
b    2
c    4
d    6
e    8
dtype: int32

Arithmetic between two Series

sr1 = pd.Series(np.arange(5),index=list('abcde'))
sr2 = sr1.copy()
print(sr1+sr2+10)

Result

a    10
b    12
c    14
d    16
e    18
dtype: int32

Slicing and indexing

Indexing by label

a = pd.Series([2,3,4,5],index=['a','b','c','d'])
print(a['a'])   # index by label

Output

2

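Indexing by position

This works like indexing an array. Older pandas versions also accept a[0] here, but since pandas 2.1 that positional fallback on a labeled index is deprecated, so iloc is used instead.

a = pd.Series([2,3,4,5],index=['a','b','c','d'])
print(a.iloc[0])   # index by position

Output

2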

Indexing a dict-built Series by key

# create a Series from a dict
sr = pd.Series({'a':0,'b':1,'c':2,'d':3})
print(sr['a'])  # index by key

Output

0

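Fancy indexing

# create a Series from a dict
sr = pd.Series({'a':0,'b':1,'c':2,'d':3})
print(sr.iloc[[1,3]])    # fancy indexing by position (plain sr[[1,3]] is deprecated on a labeled index)
print(sr[['b','c']])     # fancy indexing by label

Output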

b    1
d    3
dtype: int64
b    1
c    2
dtype: int64

Boolean indexing

# create a Series from a dict
sr = pd.Series({'a':0,'b':1,'c':2,'d':3})
print(sr[sr>1])    # boolean indexing: select every element greater than 1

Output

c    2
d    3
dtype: int64
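Boolean masks can also be combined. The sketch below reuses the same Series and assumes two illustrative names, middle and outer; note that pandas uses the element-wise operators &, | and ~ rather than the Python keywords and, or and not, and that each condition needs its own parentheses.

```python
import pandas as pd

sr = pd.Series({'a': 0, 'b': 1, 'c': 2, 'd': 3})

# elements strictly between 0 and 3
middle = sr[(sr > 0) & (sr < 3)]

# everything else: negate the combined mask with ~
outer = sr[~((sr > 0) & (sr < 3))]

print(middle)
print(outer)
```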

Slicing by position

a = pd.Series([2,3,4,5],index=['a','b','c','d'])
b = a[1:3]    # start is included, end is excluded
print(b)

Output

b    3
c    4
dtype: int64

Slicing by label

a = pd.Series([2,3,4,5],index=['a','b','c','d'])
b = a['b':'d']    # both start and end are included
print(b)

Output

b    3
c    4
d    5
dtype: int64

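The integer index problem

pandas objects with an integer index often trip up beginners, because the positions and the labels of a Series can conflict.

Solution: use the loc and iloc attributes

loc: interprets the key as a label
iloc: interprets the key as a position

sr = pd.Series(np.arange(20))
sr2 = sr[10:].copy()
print(sr2.loc[10])  # this 10 is interpreted as a label
print(sr2.iloc[9])  # this 9 is interpreted as a position
a = np.arange(20)
print(a[-1])        # print the last element

Output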

10
19
19
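The conflict shows up most clearly with negative keys. In the sketch below (the flag lookup_failed is an illustrative name), sr2[-1] does not return the last element as it would for a NumPy array: with an integer index the key is looked up as a label, and no label -1 exists.

```python
import numpy as np
import pandas as pd

sr2 = pd.Series(np.arange(20))[10:].copy()   # labels 10..19, positions 0..9

try:
    sr2[-1]                 # looked up as a label, and there is no label -1
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(lookup_failed)
print(sr2.iloc[-1])         # last element by position
print(sr2.loc[19])          # element whose label is 19
```

Using loc or iloc explicitly removes the ambiguity, since the reader and pandas both know which interpretation is intended.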

Series data alignment

When operating on two Series objects, Pandas aligns them by index before computing.

# Series data alignment
sr1 = pd.Series([32,15,42],index=['a','b','c'])
sr2 = pd.Series([21,55,11],index=['c','a','b'])
sr3 = sr1+sr2
print(sr3)

Output

a    87
b    26
c    63
dtype: int64

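When the two Series have different lengths, they are still aligned by index before computing. If a label is missing from one of them, the result for that label is NaN (Not a Number), which Pandas uses as its missing-value marker, and the result's dtype becomes float.

# Series data alignment
sr1 = pd.Series([32,15,42,10],index=['a','b','c','d'])
sr2 = pd.Series([21,55,11],index=['c','a','b'])
sr3 = sr1+sr2
print(sr3)

Output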

a    87.0
b    26.0
c    63.0
d     NaN
dtype: float64
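If a NaN is not wanted for labels that exist on only one side, one option is the add method with its fill_value parameter, which substitutes a value for the missing side before adding. This is a small sketch on the same data, not something the original tutorial covers.

```python
import pandas as pd

sr1 = pd.Series([32, 15, 42, 10], index=['a', 'b', 'c', 'd'])
sr2 = pd.Series([21, 55, 11], index=['c', 'a', 'b'])

# 'd' is missing from sr2, so it is treated as 0 there instead of producing NaN
sr3 = sr1.add(sr2, fill_value=0)
print(sr3)
```

Here the entry for d keeps the value 10.0 from sr1, and the result is still float64.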

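Handling missing data

Consider an online survey for a product. Respondents often do not share everything they are asked about: some describe their experience but not how long they have used the product; others report how long they have used it and their experience, but not their contact details. One way or another, part of the data is always missing, and this is very common in practice. Let's see how to handle missing values (such as NA or NaN) with Pandas.
Related methods:

Method	Description
isnull	Tests whether each value is missing
notnull	Tests whether each value is not missing
fillna	Fills missing values
dropna	Drops missing values. On a DataFrame it takes an axis parameter, which defaults to axis = 0 (along rows), so any row containing an NA is dropped entirely. A Series has a single axis, so the missing entries themselves are dropped.

Example: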

sr1 = pd.Series([32,15,42,10],index=['a','b','c','d'])
sr2 = pd.Series([21,55,11],index=['c','a','b'])
sr3 = sr1+sr2
print(sr3)

Output

a    87.0
b    26.0
c    63.0
d     NaN
dtype: float64

notnull

print(sr3.notnull())

Output

a     True
b     True
c     True
d    False
dtype: bool

isnull

print(sr3.isnull())

Output

a    False
b    False
c    False
d     True
dtype: bool

dropna

print(sr3.dropna())     # drop all NA values

Output

a    87.0
b    26.0
c    63.0
dtype: float64

fillna

Fill NA with 0

print(sr3.fillna(0))    # fill NA with 0

Output

a    87.0
b    26.0
c    63.0
d     0.0
dtype: float64

Fill NA with the mean

print(sr3.fillna(sr3.mean()))    # fill NA with the mean

Output

a    87.000000
b    26.000000
c    63.000000
d    58.666667
dtype: float64
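The methods above also combine naturally. As a minimal sketch on made-up data (the names s, n_missing and filled are illustrative), the boolean result of isnull can be summed to count the missing values, and fillna accepts any scalar, such as the median:

```python
import numpy as np
import pandas as pd

s = pd.Series([87.0, np.nan, 63.0, np.nan], index=['a', 'b', 'c', 'd'])

# True counts as 1, so the sum is the number of missing values
n_missing = s.isnull().sum()

# median() skips NaN by default, so it is computed from 87.0 and 63.0
filled = s.fillna(s.median())

print(n_missing)
print(filled)
```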
