pandas: Operations on Labeled Arrays

Latest recommended article published 2023-06-12 17:55:36

qingyd

Original link: https://www.shiyanlou.com/courses/1091

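This article introduces how to create, manipulate, and compute with the pandas data structures Series and DataFrame, including how to create one- and two-dimensional arrays, copy a DataFrame, handle missing values, and perform indexing, slicing, arithmetic, and statistical analysis. It also covers a range of Series and DataFrame operations, such as querying by label or position, data cleaning, preprocessing, and plotting, with the aim of helping readers understand how pandas is used in data processing.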

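pandas is a data-processing library built on NumPy. Its two most widely used data structures are Series (a one-dimensional array) and DataFrame (a two-dimensional array).
A Series can hold any data type: integers, strings, floats, Python objects, and so on. It is a labeled array, so its elements can be located by label.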
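
To make "located by label" concrete, here is a minimal sketch. The labels 'x', 'y', 'z' and the values are illustrative assumptions, not taken from the original tutorial.

```python
import pandas as pd

# A Series holding mixed Python objects; labels and values are made up.
s = pd.Series([1, "two", 3.0], index=["x", "y", "z"])

by_label = s["y"]        # look up by label
by_position = s.iloc[1]  # look up by integer position

print(by_label, by_position)
print(s.dtype)  # mixed values are stored with the generic object dtype
```

Both lookups return the same element. The label form keeps working if the Series is reordered, which is the main reason to prefer it.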

Import pandas: import pandas as pd
Check the pandas version: print(pd.__version__)

Creating a one-dimensional array

Create a Series from a list:
arr = [0, 1, 2, 3, 4]
s1 = pd.Series(arr)
Create a Series from an ndarray:
import numpy as np
n = np.random.randn(5)  # create a random ndarray
index = ['a', 'b', 'c', 'd', 'e']
s2 = pd.Series(n, index=index)
Create a Series from a dictionary:
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}  # sample dictionary
s3 = pd.Series(d)
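Basic Series operations
Change the index of a Series:
s1.index = ['A', 'B', 'C', 'D', 'E']  # the new index
Concatenate Series vertically:
s4 = pd.concat([s3, s1])  # append s1 to s3 (Series.append was removed in pandas 2.0)
Delete an element by index label:
s4 = s4.drop('e')  # delete the value labeled e
Modify the element at a given label:
s4['A'] = 6  # set the value labeled A to 6
Look up an element by label:
s4['B']
Slice a Series:
For example, access the first 3 values of s4:
s4[:3]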
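
The basic operations above can be run end to end as the following sketch. It rebuilds s1 and s3 with the same values as the tutorial and uses pd.concat for the concatenation step, since Series.append was removed in pandas 2.0.

```python
import pandas as pd

# Same data as the tutorial's s1 (after re-indexing) and s3.
s1 = pd.Series([0, 1, 2, 3, 4], index=["A", "B", "C", "D", "E"])
s3 = pd.Series({"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})

s4 = pd.concat([s3, s1])  # s3 first, then s1
s4 = s4.drop("e")         # drop returns a new Series
s4["A"] = 6               # assign by label

print(s4["B"])               # value labeled B
print(s4.iloc[:3].tolist())  # first three values, selected by position
print(list(s4.index))
```

Note that drop does not modify the Series in place, which is why the result is assigned back to s4.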
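Series arithmetic
Series addition:
Addition is aligned on the index; a label that appears in only one of the two Series produces NaN (a missing value).
s4.add(s3)
Series subtraction:
Subtraction is aligned on the index; a label that appears in only one of the two Series produces NaN (a missing value).
s4.sub(s3)
Series multiplication:
Multiplication is aligned on the index; a label that appears in only one of the two Series produces NaN (a missing value).
s4.mul(s3)
Series division:
Division is aligned on the index; a label that appears in only one of the two Series produces NaN (a missing value).
s4.div(s3)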
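
The index alignment described above is easier to see on a small case. The following sketch uses two short Series with made-up values (left and right are illustrative names) and also shows the fill_value parameter of add, which substitutes a default for a label missing on one side instead of producing NaN.

```python
import pandas as pd

# Two Series that share only the labels 'b' and 'c'.
left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([10, 20, 30], index=["b", "c", "d"])

aligned = left.add(right)               # NaN where a label exists on one side only
filled = left.add(right, fill_value=0)  # treat the missing side as 0

print(aligned)
print(filled)
```

The result index is the union of both indexes. sub, mul, and div accept fill_value in the same way.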
Median of a Series:
s4.median()
Sum of a Series:
s4.sum()
Maximum of a Series:
s4.max()
Minimum of a Series:
s4.min()
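
One detail worth knowing about these reductions is that they skip NaN by default. A minimal sketch, using a made-up Series rather than the tutorial's s4:

```python
import numpy as np
import pandas as pd

# Illustrative values with one missing entry.
s = pd.Series([4.0, np.nan, 1.0, 3.0])

print(s.median())  # median of the non-missing values 1, 3, 4
print(s.sum())     # NaN is ignored
print(s.max(), s.min())

strict_sum = s.sum(skipna=False)  # propagate NaN instead of skipping it
print(strict_sum)
```

Passing skipna=False is useful when a missing value should invalidate the whole result rather than be silently dropped.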

Creating a two-dimensional array

Create a DataFrame from a NumPy array:
dates = pd.date_range('today', periods=6)
num_arr = np.random.randn(6, 4)
columns = ['A', 'B', 'C', 'D']
df1 = pd.DataFrame(num_arr, index=dates, columns=columns)
Create a DataFrame from a dictionary of lists:
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df2 = pd.DataFrame(data, index=labels)
Check the data types of a DataFrame:
df2.dtypes
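
As a small illustration of what dtypes reports, the sketch below builds a reduced frame (small is an illustrative name, with only two of the tutorial's columns). It shows that a numeric column containing np.nan is stored as float64, while a column of plain integers stays int64.

```python
import numpy as np
import pandas as pd

# A cut-down version of the tutorial's data: one column with a missing value.
small = pd.DataFrame(
    {"age": [2.5, np.nan, 3], "visits": [1, 3, 2]},
    index=["a", "b", "c"],
)

print(small.dtypes)

# An integer list with a missing value is promoted to float64,
# because NaN is a floating-point value.
promoted = pd.Series([1, np.nan])
print(promoted.dtype)
```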
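Basic DataFrame operations
Preview the first 5 rows of a DataFrame:
This is very useful for getting a quick look at the structure of an unfamiliar dataset.
df2.head()  # shows 5 rows by default; pass a number to preview a different row count
View the last 3 rows of a DataFrame:
df2.tail(3)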
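
A short sketch of head and tail on a ten-row frame with the same index labels as df2. The single visits column is a stand-in, not the tutorial's data.

```python
import pandas as pd

# Ten rows labeled 'a' through 'j', like df2 in the tutorial.
df = pd.DataFrame({"visits": range(10)}, index=list("abcdefghij"))

first_five = df.head()  # default is 5 rows
first_two = df.head(2)  # explicit row count
last_three = df.tail(3)

print(first_five)
print(last_three)
```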
View the index of a DataFrame:
df2.index
Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'], dtype='object')
View the column names of a DataFrame:
df2.columns
Index(['animal', 'age', 'visits', 'priority'], dtype='object')
View the values of a DataFrame:
df2.values
View summary statistics of a DataFrame:
df2.describe()
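
By default describe summarizes only the numeric columns, and its count row excludes missing values. The sketch below demonstrates both points on a made-up four-row frame.

```python
import numpy as np
import pandas as pd

# Illustrative data: one text column, one numeric column with a gap.
df = pd.DataFrame({
    "animal": ["cat", "dog", "snake", "cat"],
    "age": [2.5, np.nan, 0.5, 3.0],
    "visits": [1, 3, 2, 3],
})

summary = df.describe()
print(summary)

# Individual statistics can be read with label-based indexing.
age_count = summary.loc["count", "age"]
age_mean = summary.loc["mean", "age"]
```

Here the text column animal is left out of the summary, and the count for age is 3 rather than 4.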
Transpose a DataFrame:
df2.T
Sort a DataFrame by a column:
df2.sort_values(by='age')  # sort by age in ascending order
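
sort_values also accepts ascending and na_position, which matter here because the age column contains missing values. A minimal sketch with made-up ages:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [2.5, np.nan, 0.5, 7.0]}, index=["a", "b", "c", "d"])

# Ascending sort: missing values are placed last by default.
asc = df.sort_values(by="age")

# Descending sort, with missing values moved to the top.
desc_nan_first = df.sort_values(by="age", ascending=False, na_position="first")

print(list(asc.index))
print(list(desc_nan_first.index))
```

sort_values returns a new DataFrame and leaves the original row order untouched.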
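Slice the rows of a DataFrame:
df2[1:3]
Select from a DataFrame by label (single column):
df2['age']
Select from a DataFrame by label (multiple columns):
df2[['age', 'animal']]  # pass a list of column names
Select from a DataFrame by position:
df2.iloc[1:3]  # the 2nd and 3rd rows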
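
Position-based and label-based slicing differ in how they treat the end of the range: iloc excludes the stop position, while loc includes the stop label. A sketch on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"visits": [1, 3, 2, 3]}, index=["a", "b", "c", "d"])

by_position = df.iloc[1:3]   # positions 1 and 2; the stop position is excluded
by_label = df.loc["b":"c"]   # labels b through c; the stop label is included
single = df.loc["b", "visits"]  # one cell, by row label and column name

print(by_position)
print(by_label)
print(single)
```

Both slices select the same two rows here, even though the numeric slice stops at 3 and the label slice stops at 'c'.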
Copy a DataFrame:

Create a copy of the DataFrame so that several independent workflows can use the same dataset.

df3 = df2.copy()
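
The following sketch shows why the copy matters: plain assignment only creates a second name for the same object, whereas copy() produces independent data. The frame and the names alias and duplicate are illustrative.

```python
import pandas as pd

original = pd.DataFrame({"visits": [1, 3, 2]}, index=["a", "b", "c"])

alias = original             # same object under a second name
duplicate = original.copy()  # independent copy of the data

duplicate.loc["a", "visits"] = 100  # does not touch the original
print(original.loc["a", "visits"])

alias.loc["b", "visits"] = 50       # changes the original too
print(original.loc["b", "visits"])
```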

Check whether DataFrame elements are missing:
df3.isnull()  # returns True where a value is missing
   animal    age  visits  priority
a   False  False   False     False
b   False  False   False     False
c   False  False   False     False
d   False   True   False     False
e   False  False   False     False
f   False  False   False     False
g   False  False   False     False
h   False   True   False     False
i   False  False   False     False
j   False  False   False     False
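
A boolean mask like the one above is usually a first step. The sketch below, on a made-up three-row frame, counts the missing values per column and then shows two common follow-ups, fillna and dropna. Filling with 0 is only an illustrative choice, not a recommendation from the original tutorial.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"animal": ["cat", "dog", "snake"], "age": [2.5, np.nan, 0.5]},
    index=["a", "b", "c"],
)

# True counts as 1, so summing the mask gives missing values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

filled = df.fillna({"age": 0})  # replace missing ages with 0
dropped = df.dropna()           # or drop any row containing a missing value

print(filled)
print(dropped)
```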
Add a column:
num = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 8, 9],