【Python Basics Knowledge Base】Functions in the Pandas Library

holysll

Article link: https://blog.csdn.net/holysll/article/details/89396976

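Pandas is a tool built on top of NumPy. It supports most NumPy-style array computing, in particular array functions and data processing without explicit for loops. Although Pandas adopts much of NumPy's coding style, the key difference is that Pandas is designed for tabular or heterogeneous data, whereas NumPy is better suited to homogeneous numerical array data.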
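A minimal sketch of the homogeneous versus heterogeneous distinction; the column names and values are made up for illustration:

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single dtype: mixing ints and a string
# forces every element into a common (Unicode string) dtype.
arr = np.array([1, 2, "x"])
print(arr.dtype.kind)  # 'U'

# A DataFrame is tabular: each column keeps its own dtype.
df = pd.DataFrame({
    "id": [1, 2, 3],            # integer column
    "name": ["a", "b", "c"],    # string column
    "score": [9.5, 8.0, 7.25],  # float column
})
print(df.dtypes)

# Vectorized, loop-free arithmetic on a column, as in NumPy.
doubled = df["score"] * 2
print(doubled)
```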

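The data structures and data-processing tools in Pandas are designed to make data cleaning and analysis fast and convenient. Pandas is often used together with numerical computing tools such as NumPy and SciPy and with the visualization library Matplotlib; its many standard data models, functions and methods support efficient processing of large datasets.

The most commonly used data structures in Pandas are Series and DataFrame.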

import pandas as pd
from pandas import Series, DataFrame

I. Series

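A Series is a one-dimensional array-like object. It contains a sequence of values (of types similar to NumPy types) together with an associated array of data labels, called its index. A Series supports almost all the indexing operations and functions of ndarrays and dicts, combining the strengths of both.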
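To make the "dict plus ndarray" description concrete, here is a minimal sketch on a made-up Series; the labels and values are arbitrary:

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

# ndarray-like: vectorized arithmetic, boolean masks, NumPy ufuncs
doubled = s * 2
positive = s[s > 0]
exp_vals = np.exp(s)

# dict-like: membership tests and key lookup work on the index labels
has_b = "b" in s      # True: 'b' is an index label
missing = 7 in s      # False: 'in' checks labels, not values
value_a = s["a"]      # -5
```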

1. Creating a Series

(1) Default (automatic) index from a list

import pandas as pd
import numpy as np
a = pd.Series([9,8,7,6])
a
Out[17]: 
0    9
1    8
2    7
3    6
dtype: int64

(2) Custom index

import pandas as pd
import numpy as np
b = pd.Series([9,8,7,6], index=['a','b','c','d'])
b
Out[18]: 
a    9
b    8
c    7
d    6
dtype: int64

(3) From a single scalar value, repeated for every index label

import pandas as pd
import numpy as np
c = pd.Series(25, index=['a','b','c','d'])
c
Out[19]: 
a    25
b    25
c    25
d    25
dtype: int64

(4) From a dict; the dict keys become the index

import pandas as pd
import numpy as np
d = pd.Series({'a':9, 'b':8, 'c':7, 'd':6})
d
Out[20]: 
a    9
b    8
c    7
d    6
dtype: int64


e = pd.Series({'a':9, 'b':8, 'c':7, 'd':6}, index=['a','b','c','d','e'])
e
Out[21]: 
a    9.0
b    8.0
c    7.0
d    6.0
e    NaN
dtype: float64
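As the output above shows, passing index together with a dict selects (and orders) the keys, and a label without a matching key becomes NaN, which is why the dtype changes to float64. A short sketch with made-up labels showing how such entries can be detected and dropped:

```python
import pandas as pd

data = {"a": 9, "b": 8, "c": 7, "d": 6}

# Only the labels listed in `index` are kept, in that order;
# 'e' has no matching key, so it gets NaN and the dtype becomes float64.
e = pd.Series(data, index=["d", "a", "e"])

mask = e.isna()       # boolean Series marking missing values
clean = e.dropna()    # the Series without the NaN entry
print(e)
print(clean)
```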

(5) From an ndarray

import pandas as pd
import numpy as np
f = pd.Series(np.arange(5))
g = pd.Series(np.arange(5), index=np.arange(6,1,-1))
print(f)
print(g)

Out[22]: 
0    0
1    1
2    2
3    3
4    4
dtype: int32
6    0
5    1
4    2
3    3
2    4
dtype: int32

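Note: a Series always has an index. If none is given, a default integer index 0 to n-1 is generated automatically; if a custom index is supplied, it replaces the default one and all label lookups use the custom labels. This matters when the custom index itself consists of integers (as with g above): g[2] then refers to the label 2, not to the third position. Also, the dtype shown for np.arange (int32 here) is platform dependent; on most Linux and macOS systems it is int64.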
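A short sketch of that pitfall, reusing the g Series from example (5); .loc and .iloc make label-based versus position-based access explicit:

```python
import numpy as np
import pandas as pd

g = pd.Series(np.arange(5), index=np.arange(6, 1, -1))  # index: 6, 5, 4, 3, 2

# With an integer index, g[2] looks up the LABEL 2, not the third position.
by_label = g[2]       # value stored under label 2 -> 4

# .loc is always label-based, .iloc is always position-based.
by_loc = g.loc[6]     # label 6 -> 0
by_iloc = g.iloc[2]   # third position -> 2
```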

2. Common Series attributes

(1) values and index: get the values and the index of a Series

import pandas as pd
import numpy as np
b = pd.Series([9,8,7,6], index=['a','b','c','d'])
b.index

Out[29]: Index(['a', 'b', 'c', 'd'], dtype='object')

b.values
Out[30]: array([9, 8, 7, 6], dtype=int64)

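b.iloc[1]        # by position (plain b[1] on a label index is deprecated in recent pandas)
Out[31]: 8

b['b']           # by label
Out[32]: 8

b.reindex(['b','c','d',0])   # labels missing from the index become NaN
Out[33]: 
b    8.0
c    7.0
d    6.0
0    NaN
dtype: float64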

(2) The name of a Series and the name of its index

import pandas as pd
import numpy as np
b = pd.Series([9,8,7,6], index=['a','b','c','d'])
b.name                        # None by default
b.name = 'Series object'
b.index.name = 'index column'
b

Out[35]: 
index column
a    9
b    8
c    7
d    6
Name: Series object, dtype: int64

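(3) A Series also supports ndarray attributes such as dtype, shape, ndim and T, which are not all demonstrated here. The table below, taken from the official documentation, lists the Series attributes for reference. It was taken from an older pandas release, so some entries no longer exist in current versions.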

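`Series.array`	The ExtensionArray of the data backing this Series or Index.
`Series.values`	Return the Series as an ndarray or ndarray-like, depending on the dtype.
`Series.dtype`	Return the dtype object of the underlying data.
`Series.ftype`	Return whether the data is sparse or dense (removed in pandas 1.0).
`Series.shape`	Return a tuple of the shape of the underlying data.
`Series.nbytes`	Return the number of bytes in the underlying data.
`Series.ndim`	Number of dimensions of the underlying data; 1 by definition.
`Series.size`	Return the number of elements in the underlying data.
`Series.strides`	Return the strides of the underlying data (removed in pandas 1.0).
`Series.itemsize`	Return the size of the dtype of the items of the underlying data (removed in pandas 1.0).
`Series.base`	Return the base object if the memory of the underlying data is shared (removed in pandas 1.0).
`Series.T`	Return the transpose, which for a Series is the Series itself.
`Series.memory_usage`([index, deep])	Return the memory usage of the Series.
`Series.hasnans`	Return True if there are any NaNs; enables various performance speedups.
`Series.flags`
`Series.empty`	Whether the Series is empty; returns a bool.
`Series.dtypes`	Return the dtype object of the underlying data.
`Series.ftypes`	Return whether the data is sparse or dense (removed in pandas 1.0).
`Series.data`	Return the data pointer of the underlying data (removed in pandas 1.0).
`Series.is_copy`	Return the copy (removed in pandas 1.0).
`Series.name`	Return the name of the Series.
`Series.put`(*args, **kwargs)	Apply the put method to the values attribute, if it has one (removed in pandas 1.0).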
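A quick sketch of several attributes from the table, using a small made-up Series containing one NaN; only attributes that still exist in current pandas are used:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, np.nan, 3.0], index=["x", "y", "z"])

print(s.dtype)      # float64
print(s.shape)      # (3,)
print(s.ndim)       # 1
print(s.size)       # 3
print(s.hasnans)    # True, because of the NaN
print(s.empty)      # False
print(s.nbytes)     # 24: three float64 values x 8 bytes each
print(s.T.equals(s))                 # True: the transpose of a 1-D Series is itself
print(s.memory_usage(index=False))   # bytes used by the values only
```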

3. Series conversion functions

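`Series.astype`(dtype[, copy, errors])	Cast a pandas object to the specified dtype `dtype`.
`Series.infer_objects`()	Attempt to infer better dtypes for object columns.
`Series.convert_objects`([convert_dates, ...])	(DEPRECATED) Attempt to infer better dtypes for object columns.
`Series.copy`([deep])	Make a copy of this object's index and data.
`Series.bool`()	Return the bool of a single-element PandasObject (deprecated in pandas 2.1).
`Series.to_numpy`([dtype, copy])	A NumPy ndarray representing the values in this Series or Index.
`Series.to_period`([freq, copy])	Convert a Series from a DatetimeIndex to a PeriodIndex with the desired frequency (inferred from the index if not passed).
`Series.to_timestamp`([freq, how, copy])	Cast a PeriodIndex to a DatetimeIndex of timestamps, at the beginning of each period.
`Series.to_list`()	Return a list of the values.
`Series.get_values`()	Same as values (but handles sparseness conversions); is a view (removed in pandas 1.0).
`Series.__array__`([dtype])	Return the values as a NumPy array.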
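A minimal sketch of the most common conversions (astype, to_list, to_numpy, copy) on a made-up Series of numeric strings:

```python
import pandas as pd

s = pd.Series(["1", "2", "3"])

nums = s.astype("int64")     # strings -> 64-bit integers
as_list = nums.to_list()     # plain Python list of Python ints: [1, 2, 3]
as_array = nums.to_numpy()   # NumPy ndarray with dtype int64

# copy() is deep by default, so changing the copy
# leaves the original Series untouched.
c = nums.copy()
c.iloc[0] = 100
```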

4. Series indexing and iteration functions

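`Series.get`(key[, default])	Get an item from the object for the given key (DataFrame column, Panel slice, etc.); return default if the key is not found.
`Series.at`	Access a single value for a row/column label pair.
`Series.iat`	Access a single value for a row/column pair by integer position.
`Series.loc`	Access a group of rows and columns by label(s) or a boolean array.
`Series.iloc`	Purely integer-location based indexing for selection by position.
`Series.__iter__`()	Return an iterator over the values.
`Series.iteritems`()	Lazily iterate over (index, value) tuples (removed in pandas 2.0; use items()).
`Series.items`()	Lazily iterate over (index, value) tuples.
`Series.keys`()	Alias for the index.
`Series.pop`(item)	Return the item and drop it from the object.
`Series.item`()	Return the element of the underlying data as a Python scalar (current pandas requires the Series to contain exactly one element).
`Series.xs`(key[, axis, level, drop_level])	Return a cross-section from the Series/DataFrame.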
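The accessors above can be compared side by side on a small made-up Series. Note that pop modifies the Series in place:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

got_b = s.get("b")               # 20
got_z = s.get("z", default=-1)   # -1: missing key returns the default instead of raising
at_c = s.at["c"]                 # 30: single value by label
iat_0 = s.iat[0]                 # 10: single value by position
sub_loc = s.loc[["a", "c"]]      # subset by labels
sub_iloc = s.iloc[1:]            # subset by positions

pairs = list(s.items())          # (label, value) tuples

popped = s.pop("a")              # returns 10 and removes 'a' from s
```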

5. Binary operator functions

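`Series.add`(other[, level, fill_value, axis])	Addition of series and other, element-wise (binary operator add).
`Series.sub`(other[, level, fill_value, axis])	Subtraction of series and other, element-wise (binary operator sub).
`Series.mul`(other[, level, fill_value, axis])	Multiplication of series and other, element-wise (binary operator mul).
`Series.div`(other[, level, fill_value, axis])	Floating division of series and other, element-wise (binary operator truediv).
`Series.truediv`(other[, level, fill_value, axis])	Floating division of series and other, element-wise (binary operator truediv).
`Series.floordiv`(other[, level, fill_value, axis])	Integer division of series and other, element-wise (binary operator floordiv).
`Series.mod`(other[, level, fill_value, axis])	Modulo of series and other, element-wise (binary operator mod).
`Series.pow`(other[, level, fill_value, axis])	Exponential power of series and other, element-wise (binary operator pow).
`Series.radd`(other[, level, fill_value, axis])	Reverse addition, other + series, element-wise (binary operator radd).
`Series.rsub`(other[, level, fill_value, axis])	Reverse subtraction, other - series, element-wise (binary operator rsub).
`Series.rmul`(other[, level, fill_value, axis])	Reverse multiplication, other * series, element-wise (binary operator rmul).
`Series.rdiv`(other[, level, fill_value, axis])	Reverse floating division, other / series, element-wise (binary operator rtruediv).
`Series.rtruediv`(other[, level, fill_value, axis])	Reverse floating division, other / series, element-wise (binary operator rtruediv).
`Series.rfloordiv`(other[, level, fill_value, ...])	Reverse integer division, other // series, element-wise (binary operator rfloordiv).
`Series.rmod`(other[, level, fill_value, axis])	Reverse modulo, other % series, element-wise (binary operator rmod).
`Series.rpow`(other[, level, fill_value, axis])	Reverse exponential power, other ** series, element-wise (binary operator rpow).
`Series.combine`(other, func[, fill_value])	Combine the Series with a Series or scalar according to func.
`Series.combine_first`(other)	Combine Series values, choosing the calling Series's values first.
`Series.round`([decimals])	Round each value in the Series to the given number of decimals.
`Series.lt`(other[, level, fill_value, axis])	Less than of series and other, element-wise (binary operator lt).
`Series.gt`(other[, level, fill_value, axis])	Greater than of series and other, element-wise (binary operator gt).
`Series.le`(other[, level, fill_value, axis])	Less than or equal to of series and other, element-wise (binary operator le).
`Series.ge`(other[, level, fill_value, axis])	Greater than or equal to of series and other, element-wise (binary operator ge).
`Series.ne`(other[, level, fill_value, axis])	Not equal to of series and other, element-wise (binary operator ne).
`Series.eq`(other[, level, fill_value, axis])	Equal to of series and other, element-wise (binary operator eq).
`Series.product`([axis, skipna, level, ...])	Return the product of the values for the requested axis.
`Series.dot`(other)	Compute the dot product between the Series and the columns of other.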
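The main practical difference between the plain operators (+, -, ...) and the method forms above is the fill_value parameter. A small sketch with two made-up, partially overlapping Series:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Operators align on index labels; a label present in only one Series gives NaN.
plain = s1 + s2                     # a: NaN, b: 12, c: 23, d: NaN

# The method form can substitute a fill value for the missing side.
filled = s1.add(s2, fill_value=0)   # a: 1, b: 12, c: 23, d: 30

# combine_first: keep s1's values, fill the gaps from s2.
merged = s1.combine_first(s2)       # a: 1, b: 2, c: 3, d: 30

# Comparison methods return boolean Series.
mask = s1.gt(1)                     # a: False, b: True, c: True

# Dot product of two Series with the same labels: 1 + 2 + 3.
dot = s1.dot(pd.Series([1, 1, 1], index=["a", "b", "c"]))
```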

6. Computations and descriptive statistics

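`Series.abs`()	Return a Series/DataFrame with the absolute numeric value of each element.
`Series.all`([axis, bool_only, skipna, level])	Return whether all elements are True, potentially over an axis.
`Series.any`([axis, bool_only, skipna, level])	Return whether any element is True, potentially over an axis.
`Series.autocorr`([lag])	Compute the lag-N autocorrelation.
`Series.between`(left, right[, inclusive])	Return a boolean Series equivalent to left <= series <= right.
`Series.clip`([lower, upper, axis, inplace])	Trim values at the input threshold(s).
`Series.clip_lower`(threshold[, axis, inplace])	(DEPRECATED) Trim values below a given threshold (removed in pandas 1.0).
`Series.clip_upper`(threshold[, axis, inplace])	(DEPRECATED) Trim values above a given threshold (removed in pandas 1.0).
`Series.corr`(other[, method, min_periods])	Compute the correlation with another Series, excluding missing values.
`Series.count`([level])	Return the number of non-NA/null observations in the Series.
`Series.cov`(other[, min_periods])	Compute the covariance with another Series, excluding missing values.
`Series.cummax`([axis, skipna])	Return the cumulative maximum over a DataFrame or Series axis.
`Series.cummin`([axis, skipna])	Return the cumulative minimum over a DataFrame or Series axis.
`Series.cumprod`([axis, skipna])	Return the cumulative product over a DataFrame or Series axis.
`Series.cumsum`([axis, skipna])	Return the cumulative sum over a DataFrame or Series axis.
`Series.describe`([percentiles, include, exclude])	Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding `NaN` values.
`Series.diff`([periods])	First discrete difference of elements.
`Series.factorize`([sort, na_sentinel])	Encode the object as an enumerated type or categorical variable.
`Series.kurt`([axis, skipna, level, numeric_only])	Return the unbiased kurtosis over the requested axis, using Fisher's definition of kurtosis (kurtosis of the normal distribution == 0.0).
`Series.mad`([axis, skipna, level])	Return the mean absolute deviation of the values for the requested axis (removed in pandas 2.0).
`Series.max`([axis, skipna, level, numeric_only])	Return the maximum of the values for the requested axis.
`Series.mean`([axis, skipna, level, numeric_only])	Return the mean of the values for the requested axis.
`Series.median`([axis, skipna, level, ...])	Return the median of the values for the requested axis.
`Series.min`([axis, skipna, level, numeric_only])	Return the minimum of the values for the requested axis.
`Series.mode`([dropna])	Return the mode(s) of the dataset.
`Series.nlargest`([n, keep])	Return the largest n elements.
`Series.nsmallest`([n, keep])	Return the smallest n elements.
`Series.pct_change`([periods, fill_method, ...])	Percentage change between the current and a prior element.
`Series.prod`([axis, skipna, level, ...])	Return the product of the values for the requested axis.
`Series.quantile`([q, interpolation])	Return the value at the given quantile.
`Series.rank`([axis, method, numeric_only, ...])	Compute numerical data ranks (1 through n) along an axis.
`Series.sem`([axis, skipna, level, ddof, ...])	Return the unbiased standard error of the mean over the requested axis.
`Series.skew`([axis, skipna, level, numeric_only])	Return the unbiased skew over the requested axis, normalized by N-1.
`Series.std`([axis, skipna, level, ddof, ...])	Return the sample standard deviation over the requested axis.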