Python Cool Libraries Tour: the Third-Party Library Pandas (057)

神奇夜光杯

Published 2024-08-01 07:45:00

Article link: https://blog.csdn.net/ygb_1024/article/details/140799602

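I. Usage in Detail

216. The pandas.Series.filter method

216-1. Syntax

216-2. Parameters

216-3. Functionality

216-4. Return value

216-5. Notes

216-6. Usage

216-6-1. Data preparation

216-6-2. Code example

216-6-3. Output

217. The pandas.Series.backfill method

217-1. Syntax

217-2. Parameters

217-3. Functionality

217-4. Return value

217-5. Notes

217-6. Usage

217-6-1. Data preparation

217-6-2. Code example

217-6-3. Output

218. The pandas.Series.dropna method

218-1. Syntax

218-2. Parameters

218-3. Functionality

218-4. Return value

218-5. Notes

218-6. Usage

218-6-1. Data preparation

218-6-2. Code example

218-6-3. Output

219. The pandas.Series.ffill method

219-1. Syntax

219-2. Parameters

219-3. Functionality

219-4. Return value

219-5. Notes

219-6. Usage

219-6-1. Data preparation

219-6-2. Code example

219-6-3. Output

220. The pandas.Series.fillna method

220-1. Syntax

220-2. Parameters

220-3. Functionality

220-4. Return value

220-5. Notes

220-6. Usage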

I. Usage in Detail

216. The pandas.Series.filter method

216-1. Syntax

# 216. The pandas.Series.filter method
pandas.Series.filter(items=None, like=None, regex=None, axis=None)
Subset the dataframe rows or columns according to the specified index labels.

Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.

Parameters:
items
list-like
Keep labels from axis which are in items.

like
str
Keep labels from axis for which “like in label == True”.

regex
str (regular expression)
Keep labels from axis for which re.search(regex, label) == True.

axis
{0 or ‘index’, 1 or ‘columns’, None}, default None
The axis to filter on, expressed either as an index (int) or axis name (str). By default this is the info axis, ‘columns’ for DataFrame. For Series this parameter is unused and defaults to None.

Returns:
same type as input object

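216-2. Parameters

216-2-1. items (optional, default None): a list-like of labels to keep. items, like, and regex are mutually exclusive: passing more than one of them raises a TypeError rather than silently ignoring the others. For example, items=['a', 'b', 'c'] selects the elements labeled 'a', 'b', and 'c'.

216-2-2. like (optional, default None): a string. Elements whose label contains this string as a substring are kept. For example, like='one' selects every element whose label contains 'one'.

216-2-3. regex (optional, default None): a regular expression. Labels are tested with re.search, so the pattern may match anywhere in the label unless it is anchored. For example, regex='^a.*e$' selects the elements whose label starts with 'a' and ends with 'e'.

216-2-4. axis (optional, default None): has no practical use for a Series, which only has an index axis; leave it at its default.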
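
The mutual-exclusivity rule and the re.search matching described above can be checked with a short sketch. The Series and its labels are made up for illustration; only the filter call itself comes from pandas.

```python
import pandas as pd

# Illustrative data: the labels matter here, not the values.
data = pd.Series([1, 2, 3], index=["one", "two", "three"])

# Passing two of items / like / regex at once is rejected.
try:
    data.filter(items=["one"], like="o")
    combined_error = None
except TypeError as exc:
    combined_error = exc

# regex uses re.search, so an unanchored pattern matches anywhere in the label.
anywhere = data.filter(regex="e")   # labels that contain "e"
anchored = data.filter(regex="^t")  # labels that start with "t"

print(type(combined_error).__name__)
print(list(anywhere.index))
print(list(anchored.index))
```

Anchoring the pattern with ^ or $ is the usual way to get "starts with" or "ends with" behavior, since like only offers substring matching.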

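216-3. Functionality

Selects a subset of a Series by its index labels and returns the elements whose labels match the given condition. The filter is applied to the labels only, never to the values.

216-4. Return value

Returns a new Series containing the elements whose labels match the given condition.

216-5. Notes

Use cases:

216-5-1. Data cleaning: during data cleaning, filter can keep specific labels or label patterns and discard the rest. For example, keep only the entries whose labels follow a given pattern, or drop entries with irrelevant labels.

216-5-2. Selecting specific fields: when a Series holds several named fields, filter quickly selects those whose labels contain a given string, such as a keyword shared by related fields.

216-5-3. Data exploration: while exploring data, filter gives a quick view of the subset whose labels match a condition. For example, look at every entry whose label contains a given substring before analyzing it further.

216-5-4. Logs and text data: when the index labels carry the identifying information (for example a log source or level), filter can select the entries whose labels match a pattern. It does not search the log text itself; finding a keyword in the values requires a boolean mask on the values instead.

216-5-5. Custom reports: when building a custom report, filter can select data by label or label pattern to produce a more targeted result. For example, report only the entries whose labels contain a given date or keyword.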

216-6. Usage

216-6-1. Data preparation

None

216-6-2. Code example

# 216. The pandas.Series.filter method
# 216-1. Data cleaning
import pandas as pd
# Create a Series
data = pd.Series([1, 2, 3, 4, 5], index=['one', 'two', 'three', 'four', 'five'])
# Keep the elements labeled 'one' and 'three'
cleaned_data = data.filter(items=['one', 'three'])
print("Data cleaning example:")
print(cleaned_data)

# 216-2. Selecting specific fields
# Reuse the same Series
data = pd.Series([1, 2, 3, 4, 5], index=['one', 'two', 'three', 'four', 'five'])
# Use like to keep the elements whose label contains 'o'
selected_data = data.filter(like='o')
print("\nSpecific field selection example:")
print(selected_data)

# 216-3. Data exploration
# Create a Series whose labels are date strings
dates = pd.Series([10, 20, 30, 40, 50], index=['2024-01-01', '2024-02-01', '2024-03-01', '2024-04-01', '2024-05-01'])
# Use like to keep the elements whose label contains '2024-03'
march_data = dates.filter(like='2024-03')
print("\nData exploration example:")
print(march_data)

# 216-4. Logs and text data
# Create a Series of log messages
logs = pd.Series(['Error: Disk full', 'Warning: CPU high', 'Info: System rebooted'],
                 index=['log1', 'log2', 'log3'])
# regex is matched against the index labels ('log1', 'log2', 'log3'), not the
# values, so nothing matches 'Error' and the result is an empty Series
error_logs = logs.filter(regex='Error')
print("\nLogs and text data example:")
print(error_logs)

216-6-3. Output

# 216. The pandas.Series.filter method
# 216-1. Data cleaning
# Data cleaning example:
# one      1
# three    3
# dtype: int64

# 216-2. Selecting specific fields
# Specific field selection example:
# one     1
# two     2
# four    4
# dtype: int64

# 216-3. Data exploration
# Data exploration example:
# 2024-03-01    30
# dtype: int64

# 216-4. Logs and text data
# Logs and text data example:
# Series([], dtype: object)
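
The empty result in the last example follows from filter looking only at labels. As a minimal sketch of the alternative, selecting by the text of the values can be done with a boolean mask built from Series.str.contains. The log data is the same as in the example; the variable names by_label and by_value are only illustrative.

```python
import pandas as pd

logs = pd.Series(
    ["Error: Disk full", "Warning: CPU high", "Info: System rebooted"],
    index=["log1", "log2", "log3"],
)

# filter() inspects only the labels log1, log2, log3, so nothing matches.
by_label = logs.filter(regex="Error")

# A boolean mask over the values selects by the log text instead.
by_value = logs[logs.str.contains("Error")]

print(len(by_label))
print(by_value)
```

If the labels themselves encoded the level (for instance 'error_1', 'warning_1'), filter(like='error') would be the natural choice; here the level lives in the values, so the mask is the appropriate tool.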

217. The pandas.Series.backfill method

217-1. Syntax

# 217. The pandas.Series.backfill method
pandas.Series.backfill(*, axis=None, inplace=False, limit=None, downcast=_NoDefault.no_default)
Fill NA/NaN values by using the next valid observation to fill the gap.

Deprecated since version 2.0: Series/DataFrame.backfill is deprecated. Use Series/DataFrame.bfill instead.

Returns:
Series/DataFrame or None
Object with missing values filled or None if inplace=True.

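217-2. Parameters

217-2-1. axis (optional, default None): has no effect on a Series, which is one-dimensional. For a DataFrame it selects the fill direction: 0 or 'index' fills down each column, 1 or 'columns' fills across each row.

217-2-2. inplace (optional, default False): if False, a new Series is returned and the original data is left untouched; if True, the operation is applied directly to the original Series.

217-2-3. limit (optional, default None): the maximum number of consecutive missing values to fill in each gap. With limit=2, at most the two NaNs immediately before each valid value are filled, and longer gaps are only partially filled.

217-2-4. downcast (optional): attempts to convert the result to a smaller compatible dtype where possible, for example float64 to int64.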

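217-3. Functionality

Fills the missing values in a Series, taking each fill value from the next valid observation.

217-4. Return value

If inplace=False, returns a new Series in which missing values have been filled with the next valid value. If inplace=True, returns None and the original Series is modified.

217-5. Notes

This method is deprecated since pandas 2.0; use pandas.Series.bfill instead, which fills in the same way.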
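
As a sketch of the recommended replacement, the following uses Series.bfill and also shows how limit leaves long gaps only partially filled. The data mirrors the example used in this section; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, np.nan, 2, np.nan, 3, np.nan])

# bfill is the non-deprecated spelling of backfill.
full = data.bfill()

# With limit=1 only the NaN directly before each valid value is filled,
# so the two-element gap at positions 1-2 is filled only at position 2.
limited = data.bfill(limit=1)

print(full.tolist())
print(limited.tolist())
```

The trailing NaN stays missing in both results because there is no later valid value to take from.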

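217-6. Usage

217-6-1. Data preparation

None

217-6-2. Code example

# 217. The pandas.Series.backfill method
import pandas as pd
import numpy as np
# Create a Series that contains missing values
data = pd.Series([1, np.nan, np.nan, 2, np.nan, 3, np.nan])
# Fill missing values with backfill
# (deprecated since pandas 2.0; data.bfill() is the recommended spelling)
filled_data = data.backfill()
print("Original data:")
print(data)
print("\nFilled data:")
print(filled_data)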

217-6-3. Output

# 217. The pandas.Series.backfill method
# Original data:
# 0    1.0
# 1    NaN
# 2    NaN
# 3    2.0
# 4    NaN
# 5    3.0
# 6    NaN
# dtype: float64
# 
# Filled data:
# 0    1.0
# 1    2.0
# 2    2.0
# 3    2.0
# 4    3.0
# 5    3.0
# 6    NaN
# dtype: float64

218. The pandas.Series.dropna method

218-1. Syntax

# 218. The pandas.Series.dropna method
pandas.Series.dropna(*, axis=0, inplace=False, how=None, ignore_index=False)
Return a new Series with missing values removed.

See the User Guide for more on which values are considered missing, and how to work with missing data.

Parameters:
axis
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

inplace
bool, default False
If True, do operation inplace and return None.

how
str, optional
Not in use. Kept for compatibility.

ignore_index
bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.

New in version 2.0.0.

Returns:
Series or None
Series with NA entries dropped from it or None if inplace=True.

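218-2. Parameters

218-2-1. axis (optional, default 0): not actually used for a Series. A Series is one-dimensional, so the only axis is 0 (the index); the parameter exists for compatibility with DataFrame.

218-2-2. inplace (optional, default False): if True, the operation is applied directly to the original Series and None is returned; if False, a new Series is returned and the original is left unchanged.

218-2-3. how (optional, default None): has no effect on a Series. It is kept for compatibility with DataFrame, where it controls whether a row or column is dropped when any or all of its values are missing.

218-2-4. ignore_index (optional, default False): if True, the returned Series is relabeled 0, 1, ..., n - 1 after the NaNs are removed; if False, the surviving values keep their original labels. Added in pandas 2.0.0.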
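
A minimal sketch of what ignore_index changes, assuming pandas 2.0.0 or later (the version in which the parameter was added). The reset_index(drop=True) line is shown as an equivalent spelling for older versions; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 2, np.nan, 3, 4, np.nan])

# Default: the surviving values keep their original labels.
kept = data.dropna()

# ignore_index=True relabels the result 0..n-1 (pandas >= 2.0.0).
renumbered = data.dropna(ignore_index=True)

# Equivalent result without the ignore_index parameter.
renumbered_old = data.dropna().reset_index(drop=True)

print(list(kept.index))
print(list(renumbered.index))
print(renumbered.equals(renumbered_old))
```

Keeping the original labels is useful when the result must be aligned back to other data; relabeling is convenient when only the positions matter.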

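218-3. Functionality

Removes missing values from a Series. This is useful during data cleaning and preprocessing, where incomplete data points need to be discarded.

218-4. Return value

If inplace=True, returns None and modifies the original Series; if inplace=False, returns a new Series from which all missing values (NaN) have been removed.

218-5. Notes

None

218-6. Usage

218-6-1. Data preparation

None

218-6-2. Code example

# 218. The pandas.Series.dropna method
import pandas as pd
import numpy as np
# Create a Series that contains missing values
data = pd.Series([1, np.nan, 2, np.nan, 3, 4, np.nan])
# Remove the missing values with dropna
cleaned_data = data.dropna()
print("Original data:")
print(data)
print("\nData after removing missing values:")
print(cleaned_data)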

218-6-3. Output

# 218. The pandas.Series.dropna method
# Original data:
# 0    1.0
# 1    NaN
# 2    2.0
# 3    NaN
# 4    3.0
# 5    4.0
# 6    NaN
# dtype: float64
# 
# Data after removing missing values:
# 0    1.0
# 2    2.0
# 4    3.0
# 5    4.0
# dtype: float64

219. The pandas.Series.ffill method

219-1. Syntax

# 219. The pandas.Series.ffill method
pandas.Series.ffill(*, axis=None, inplace=False, limit=None, limit_area=None, downcast=_NoDefault.no_default)
Fill NA/NaN values by propagating the last valid observation to next valid.

Parameters:
axis
{0 or ‘index’} for Series, {0 or ‘index’, 1 or ‘columns’} for DataFrame
Axis along which to fill missing values. For Series this parameter is unused and defaults to 0.

inplace
bool, default False
If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).

limit
int, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

limit_area
{None, ‘inside’, ‘outside’}, default None
If limit is specified, consecutive NaNs will be filled with this restriction.

None: No fill restriction.

‘inside’: Only fill NaNs surrounded by valid values (interpolate).

‘outside’: Only fill NaNs outside valid values (extrapolate).

New in version 2.2.0.

downcast
dict, default is None
A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).

Deprecated since version 2.2.0.

Returns:
Series/DataFrame or None
Object with missing values filled or None if inplace=True.

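219-2. Parameters

219-2-1. axis (optional, default None): not actually used for a Series, which is one-dimensional, so the parameter has no effect.

219-2-2. inplace (optional, default False): if True, the operation is applied directly to the original Series and None is returned; if False, a new Series is returned and the original is left unchanged.

219-2-3. limit (optional, default None): the maximum number of consecutive NaN values to fill in each gap. A gap longer than limit is only partially filled.

219-2-4. limit_area (optional, default None): restricts which NaNs are filled. None applies no restriction, 'inside' fills only NaNs surrounded by valid values, and 'outside' fills only NaNs that lie outside the valid values, at the ends of the Series. Added in pandas 2.2.0.

219-2-5. downcast (optional): attempts to convert the filled result to a smaller compatible dtype, for example float64 to int64 where possible. Deprecated since pandas 2.2.0.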
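
The effect of limit on a forward fill can be seen in a short sketch. The data mirrors the example used in this section, and only the limit parameter is exercised here, since limit_area requires pandas 2.2.0 or later.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, np.nan, 2, np.nan, 3, 4, np.nan])

# Without a limit every NaN that follows a valid value is filled.
full = data.ffill()

# With limit=1 only the first NaN after each valid value is filled,
# so position 2 (the second NaN of a two-element gap) stays missing.
limited = data.ffill(limit=1)

print(full.tolist())
print(limited.tolist())
```

A small limit is a common safeguard in time series work, where carrying a stale value across a long gap can be worse than leaving the gap visible.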

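219-3. Functionality

Fills missing values along the given axis with the previous valid value. This is known as forward fill and is especially useful for time series data.

219-4. Return value

If inplace=True, returns None and modifies the original Series; if inplace=False, returns a new Series in which every missing value (NaN) that has a preceding valid value is filled with it.

219-5. Notes

None

219-6. Usage

219-6-1. Data preparation

None

219-6-2. Code example

# 219. The pandas.Series.ffill method
import pandas as pd
import numpy as np
# Create a Series that contains missing values
data = pd.Series([1, np.nan, np.nan, 2, np.nan, 3, 4, np.nan])
# Fill missing values with ffill
filled_data = data.ffill()
print("Original data:")
print(data)
print("\nData after forward fill:")
print(filled_data)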

219-6-3. Output

# 219. The pandas.Series.ffill method
# Original data:
# 0    1.0
# 1    NaN
# 2    NaN
# 3    2.0
# 4    NaN
# 5    3.0
# 6    4.0
# 7    NaN
# dtype: float64
# 
# Data after forward fill:
# 0    1.0
# 1    1.0
# 2    1.0
# 3    2.0
# 4    2.0
# 5    3.0
# 6    4.0
# 7    4.0
# dtype: float64

220. The pandas.Series.fillna method

220-1. Syntax

# 220. The pandas.Series.fillna method
pandas.Series.fillna(value=None, *, method=None, axis=None, inplace=False, limit=None, downcast=_NoDefault.no_default)
Fill NA/NaN values using the specified method.

Parameters:
value
scalar, dict, Series, or DataFrame
Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.

method
{‘backfill’, ‘bfill’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series:

ffill: propagate last valid observation forward to next valid.

backfill / bfill: use next valid observation to fill gap.

Deprecated since version 2.1.0: Use ffill or bfill instead.

axis
{0 or ‘index’} for Series, {0 or ‘index’, 1 or ‘columns’} for DataFrame
Axis along which to fill missing values. For Series this parameter is unused and defaults to 0.

inplace
bool, default False
If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).

limit
int, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

downcast
dict, default is None
A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).

Deprecated since version 2.2.0.

Returns:
Series/DataFrame or None
Object with missing values filled or None if inplace=True.

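220-2. Parameters

220-2-1. value (optional, default None): the value used to fill missing entries: a scalar, dict, Series, or DataFrame (not a list). A dict or Series fills only the labels it contains. value and method cannot be combined; passing both raises a ValueError rather than ignoring value.

220-2-2. method (optional, default None): the fill method. Deprecated since pandas 2.1.0 in favor of the ffill and bfill methods. The options are:

220-2-2-1. 'ffill' or 'pad': forward fill, that is, fill each missing value with the previous non-missing value.

220-2-2-2. 'bfill' or 'backfill': backward fill, that is, fill each missing value with the next non-missing value.

220-2-3. axis (optional, default None): only meaningful for a DataFrame, where 0 or 'index' fills along the index (down each column) and 1 or 'columns' fills along the columns (across each row). For a Series the parameter has no effect.

220-2-4. inplace (optional, default False): boolean. If True, the operation is applied to the original object and no new object is returned.

220-2-5. limit (optional, default None): integer. With method, the maximum number of consecutive NaNs to fill in each gap; without method, the maximum total number of NaNs to fill along the axis.

220-2-6. downcast (optional): a dict of item->dtype, or the string 'infer', which tries to downcast to a suitable equivalent dtype (for example float64 to int64 where possible). Deprecated since pandas 2.2.0.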
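
The different forms of value, and the meaning of limit when no method is given, can be illustrated with a short sketch. The data mirrors the example used in this section; the fill values -1.0 and 0 and the variable names are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, np.nan, 5])

# A dict fills only the labels it names; label 4 stays missing.
by_label = s.fillna({2: -1.0})

# Without a method, limit caps the total number of NaNs filled.
first_only = s.fillna(0, limit=1)

# A computed scalar, here the mean of the non-missing values.
with_mean = s.fillna(s.mean())

print(by_label.tolist())
print(first_only.tolist())
print(with_mean.tolist())
```

Filling with a statistic such as the mean keeps the Series complete but changes its distribution, so it is a modeling choice rather than a neutral default.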

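220-3. Functionality

Replaces missing values with a given value or fills them using a given method, so that gaps in the data can be handled before further processing.

220-4. Return value

If inplace=False, returns a new Series with the missing values filled; if inplace=True, modifies the original Series directly and returns None.

220-5. Notes

None

220-6. Usage

220-6-1. Data preparation

None

220-6-2. Code example

# 220. The pandas.Series.fillna method
import pandas as pd
import numpy as np
# Create a Series that contains missing values
s = pd.Series([1, 2, np.nan, 4, np.nan, 5])
# Fill missing values with a given value
filled_s = s.fillna(value=0)
print(filled_s)
# Forward fill; s.ffill() replaces s.fillna(method='ffill'),
# whose method argument is deprecated since pandas 2.1.0
ffill_s = s.ffill()
print(ffill_s)
# Backward fill; s.bfill() replaces s.fillna(method='bfill')
bfill_s = s.bfill()
print(bfill_s)
# Fill in place: s itself is modified and the call returns None
s.fillna(value=0, inplace=True)
print(s)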

220-6-3. Output

# 220. The pandas.Series.fillna method
# 0    1.0
# 1    2.0
# 2    0.0
# 3    4.0
# 4    0.0
# 5    5.0
# dtype: float64
# 0    1.0
# 1    2.0
# 2    2.0
# 3    4.0
# 4    4.0
# 5    5.0
# dtype: float64
# 0    1.0
# 1    2.0
# 2    4.0
# 3    4.0
# 4    5.0
# 5    5.0
# dtype: float64
# 0    1.0
# 1    2.0
# 2    0.0
# 3    4.0
# 4    0.0
# 5    5.0
# dtype: float64