A Tour of Cool Python Libraries: The Third-Party Library Pandas (108)

神奇夜光杯

Published on 2024-08-29 08:00:00

Article link: https://blog.csdn.net/ygb_1024/article/details/141596497

I. Usage in Detail

471. The pandas.DataFrame.map method

471-1. Syntax

# 471. The pandas.DataFrame.map method
pandas.DataFrame.map(func, na_action=None, **kwargs)
Apply a function to a Dataframe elementwise.

New in version 2.1.0: DataFrame.applymap was deprecated and renamed to DataFrame.map.

This method applies a function that accepts and returns a scalar to every element of a DataFrame.

Parameters:
func
callable
Python function, returns a single value from a single value.

na_action
{None, ‘ignore’}, default None
If ‘ignore’, propagate NaN values, without passing them to func.

**kwargs
Additional keyword arguments to pass as keywords arguments to func.

Returns:
DataFrame
Transformed DataFrame.

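471-2. Parameters

471-2-1. func (required): the function applied to every element of the DataFrame. It must be a callable that takes a single value and returns a single value. Unlike Series.map, DataFrame.map does not accept a dict or a Series as a mapping.

471-2-2. na_action (optional, default None): controls how missing values (NaN) are handled. With 'ignore', missing values are returned unchanged and are not passed to func; with the default None, they are passed to func like any other value.

471-2-3. **kwargs (optional): additional keyword arguments forwarded to func.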

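471-3. Functionality

Applies a function to every element of a DataFrame, which is useful for element-wise data cleaning and conversion. The result has the same shape as the input. To replace values through a dictionary, wrap the lookup in a callable (for example, lambda x: mapping.get(x, x)) or use DataFrame.replace.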

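471-4. Return value

Returns a new DataFrame in which every element is the result of applying func. If na_action is set to 'ignore', NaN values are left unchanged.

471-5. Notes

None

471-6. Usage

471-6-1. Data preparation

None

471-6-2. Code example

# 471. The pandas.DataFrame.map method
import pandas as pd
# Create a sample DataFrame (the None in column A becomes NaN)
data = {'A': [1, 2, None, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data)
# Use map to double every element
result = df.map(lambda x: x * 2)
print(result)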

471-6-3. Output

# 471. The pandas.DataFrame.map method
#      A   B
# 0  2.0  10
# 1  4.0  12
# 2  NaN  14
# 3  8.0  16

472. The pandas.DataFrame.pipe method

472-1. Syntax

# 472. The pandas.DataFrame.pipe method
pandas.DataFrame.pipe(func, *args, **kwargs)
Apply chainable functions that expect Series or DataFrames.

Parameters:
func
function
Function to apply to the Series/DataFrame. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame.

*args
iterable, optional
Positional arguments passed into func.

**kwargs
mapping, optional
A dictionary of keyword arguments passed into func.

Returns:
the return type of func.

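472-2. Parameters

472-2-1. func (required): a function that takes the DataFrame as its first argument. It may perform any operation and return any type of result. Alternatively, a (callable, data_keyword) tuple can be passed, where data_keyword names the keyword argument of the callable that should receive the DataFrame.

472-2-2. *args (optional): additional positional arguments passed to func.

472-2-3. **kwargs (optional): additional keyword arguments passed to func.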
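
The (callable, data_keyword) form of func is easy to overlook, so here is a small sketch of it. The helper add_constant and its parameter names are hypothetical; the point is only that its DataFrame argument is not the first one.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def add_constant(amount, frame):
    """Hypothetical helper whose DataFrame argument is NOT the first one."""
    return frame + amount

# The tuple tells pipe to pass df as the keyword argument `frame`;
# 10 is forwarded as the positional argument `amount`.
shifted = df.pipe((add_constant, 'frame'), 10)
print(shifted)
```

Without the tuple, pipe would pass df as the first positional argument, which here would land in amount instead of frame.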

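472-3. Functionality

Passes the DataFrame to a function (or method) so that custom processing steps can be combined into a pipeline. Because each step is written as a method call, several operations can be chained in the order they run, which keeps the code clear and readable.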
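
To illustrate the readability argument, the following sketch compares nested function calls with a pipe chain. The two step functions, keep_rows_above and add_total, are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Hypothetical processing steps; each takes a DataFrame and returns a new one.
def keep_rows_above(frame, column, threshold):
    return frame[frame[column] > threshold]

def add_total(frame):
    return frame.assign(total=frame.sum(axis=1))

# Nested calls have to be read from the inside out ...
nested = add_total(keep_rows_above(df, 'A', 1))

# ... while a pipe chain reads top to bottom, in the order the steps run.
chained = (
    df.pipe(keep_rows_above, 'A', 1)
      .pipe(add_total)
)
print(chained)
```

Both spellings give the same result; the chained form simply scales better as more steps are added.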

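472-4. Return value

Returns whatever func returns: usually a processed DataFrame, but any other type is possible, depending on how the function is implemented.

472-5. Notes

None

472-6. Usage

472-6-1. Data preparation

None

472-6-2. Code example

# 472. The pandas.DataFrame.pipe method
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# Define a custom function
def add_columns(dataframe, col1, col2):
    # assign returns a new DataFrame, so the original df is not modified
    return dataframe.assign(C=dataframe[col1] + dataframe[col2])
# Use pipe to pass the DataFrame to the custom function
result = df.pipe(add_columns, 'A', 'B')
print(result)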

472-6-3. Output

# 472. The pandas.DataFrame.pipe method
#    A  B  C
# 0  1  4  5
# 1  2  5  7
# 2  3  6  9

473. The pandas.DataFrame.agg method

473-1. Syntax

# 473. The pandas.DataFrame.agg method
pandas.DataFrame.agg(func=None, axis=0, *args, **kwargs)
Aggregate using one or more operations over the specified axis.

Parameters:
func
function, str, list or dict
Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:

function

string function name

list of functions and/or function names, e.g. [np.sum, 'mean']

dict of axis labels -> functions, function names or list of such.

axis
{0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

*args
Positional arguments to pass to func.

**kwargs
Keyword arguments to pass to func.

Returns:
scalar, Series or DataFrame
The return can be:

scalar : when Series.agg is called with single function

Series : when DataFrame.agg is called with a single function

DataFrame : when DataFrame.agg is called with several functions.

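473-2. Parameters

473-2-1. func (optional, default None): the aggregation to apply. It can be a single function or other callable, a string function name (such as 'sum', 'mean', 'max' or 'min'), a list of functions and/or names to apply several aggregations at once, or a dict that maps axis labels to functions, names or lists of them.

473-2-2. axis (optional, default 0): {0 or 'index', 1 or 'columns'}, the direction of aggregation. 0 applies the function to each column, giving one result per column; 1 applies the function to each row, giving one result per row.

473-2-3. *args (optional): additional positional arguments passed to the aggregation function.

473-2-4. **kwargs (optional): additional keyword arguments passed to the aggregation function.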
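
The axis and **kwargs parameters are easiest to understand side by side. In the sketch below, scaled_range and its scale parameter are hypothetical; the example only shows how the arguments are routed.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def scaled_range(values, scale=1):
    """Hypothetical aggregation: (max - min) multiplied by `scale`."""
    return (values.max() - values.min()) * scale

# axis=0 (default): the function receives each column, one result per column.
per_column = df.agg(scaled_range, scale=10)

# axis=1: the function receives each row, one result per row.
per_row = df.agg(scaled_range, axis=1, scale=10)

print(per_column)
print(per_row)
```

The first result is indexed by the column labels A and B, the second by the row labels 0 to 2.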

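473-3. Functionality

Aggregates the rows or columns of a DataFrame. One or more functions can be applied to summarize the data, which makes the method especially useful for statistical calculations in data analysis.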

473-4. Return value

Returns a Series when called with a single function and a DataFrame when called with several functions.
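
A quick way to see how the return type depends on the arguments is to inspect the results directly. This is a minimal sketch on made-up data.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

single = df.agg('sum')            # one function: Series indexed by column
several = df.agg(['sum', 'max'])  # list of functions: DataFrame, one row per function
scalar = df['A'].agg('sum')       # Series.agg with one function: a scalar

print(type(single).__name__)
print(type(several).__name__)
print(scalar)
```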

473-5. Notes

None

473-6. Usage

473-6-1. Data preparation

None

473-6-2. Code example

# 473. The pandas.DataFrame.agg method
# 473-1. A single aggregation function
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# Apply the sum aggregation to each column
result = df.agg('sum')
print(result)

# 473-2. Multiple aggregation functions
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# Apply several aggregation functions to each column
result = df.agg(['sum', 'mean'])
print(result)

# 473-3. Aggregating by row
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# Apply the aggregation function to each row
result = df.agg('sum', axis=1)
print(result)

# 473-4. Applying a custom function
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# Custom aggregation function: the range (max - min) of the values
def custom_func(x):
    return x.max() - x.min()
# Apply the custom function
result = df.agg(custom_func)
print(result)

473-6-3. Output

# 473. The pandas.DataFrame.agg method
# 473-1. A single aggregation function
# A     6
# B    15
# dtype: int64

# 473-2. Multiple aggregation functions
#         A     B
# sum   6.0  15.0
# mean  2.0   5.0

# 473-3. Aggregating by row
# 0    5
# 1    7
# 2    9
# dtype: int64

# 473-4. Applying a custom function
# A    2
# B    2
# dtype: int64

474. The pandas.DataFrame.aggregate method

474-1. Syntax

# 474. The pandas.DataFrame.aggregate method
pandas.DataFrame.aggregate(func=None, axis=0, *args, **kwargs)
Aggregate using one or more operations over the specified axis.

Parameters:
func
function, str, list or dict
Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:

function

string function name

list of functions and/or function names, e.g. [np.sum, 'mean']

dict of axis labels -> functions, function names or list of such.

axis
{0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

*args
Positional arguments to pass to func.

**kwargs
Keyword arguments to pass to func.

Returns:
scalar, Series or DataFrame
The return can be:

scalar : when Series.agg is called with single function

Series : when DataFrame.agg is called with a single function

DataFrame : when DataFrame.agg is called with several functions

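474-2. Parameters

474-2-1. func (optional, default None): the aggregation to apply. It can be a single function or other callable, a string function name (such as 'sum', 'mean', 'max' or 'min'), a list of functions and/or names to apply several aggregations at once, or a dict that maps axis labels to functions, names or lists of them.

474-2-2. axis (optional, default 0): {0 or 'index', 1 or 'columns'}, the direction of aggregation. 0 applies the function to each column, giving one result per column; 1 applies the function to each row, giving one result per row.

474-2-3. *args (optional): additional positional arguments passed to the aggregation function.

474-2-4. **kwargs (optional): additional keyword arguments passed to the aggregation function.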

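474-3. Functionality

Identical to agg (agg is an alias of aggregate). It performs aggregation calculations on the data of a DataFrame, letting you summarize the data with one or more aggregation functions for further analysis.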
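
Because the two names refer to the same operation, they can be used interchangeably. The short check below, on made-up data, compares the results of both spellings.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

via_agg = df.agg(['min', 'max'])
via_aggregate = df.aggregate(['min', 'max'])

# Both spellings produce identical results.
print(via_agg.equals(via_aggregate))
```

Which name to use is a matter of style; agg is simply shorter.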

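474-4. Return value

Returns a Series when called with a single function (or a dict with one function per column) and a DataFrame when called with several functions.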
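
With a dict, the return type depends on whether each column gets one function or a list of them. The sketch below uses made-up data to show both cases.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# One function per column: the result is a Series.
as_series = df.aggregate({'A': 'sum', 'B': 'mean'})

# Lists of functions per column: the result is a DataFrame; combinations
# that were not requested (such as 'mean' for column A) are filled with NaN.
as_frame = df.aggregate({'A': ['sum', 'min'], 'B': ['mean']})

print(as_series)
print(as_frame)
```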

474-5. Notes

None

474-6. Usage

474-6-1. Data preparation

None

474-6-2. Code example

# 474. The pandas.DataFrame.aggregate method
# 474-1. A single aggregation function
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# Apply the sum aggregation to each column
result = df.aggregate('sum')
print(result)

# 474-2. Multiple aggregation functions
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# Apply several aggregation functions to each column
result = df.aggregate(['sum', 'mean'])
print(result)

# 474-3. Aggregating by row
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# Apply the aggregation function to each row
result = df.aggregate('sum', axis=1)
print(result)

# 474-4. Applying a custom function
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# Custom aggregation function: the range (max - min) of the values
def custom_func(x):
    return x.max() - x.min()
# Apply the custom function
result = df.aggregate(custom_func)
print(result)

# 474-5. Using a dict to specify different aggregation functions
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# Use a dict to assign a different aggregation function to each column
result = df.aggregate({
    'A': 'sum',
    'B': 'mean'
})
print(result)

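474-6-3. Output

# 474. The pandas.DataFrame.aggregate method
# 474-1. A single aggregation function
# A     6
# B    15
# dtype: int64

# 474-2. Multiple aggregation functions
#         A     B
# sum   6.0  15.0
# mean  2.0   5.0

# 474-3. Aggregating by row
# 0    5
# 1    7
# 2    9
# dtype: int64

# 474-4. Applying a custom function
# A    2
# B    2
# dtype: int64

# 474-5. Using a dict to specify different aggregation functions
# A    6.0
# B    5.0
# dtype: float64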

475. The pandas.DataFrame.transform method

475-1. Syntax

# 475. The pandas.DataFrame.transform method
pandas.DataFrame.transform(func, axis=0, *args, **kwargs)
Call func on self producing a DataFrame with the same axis shape as self.

Parameters:
func
function, str, list-like or dict-like
Function to use for transforming the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. If func is both list-like and dict-like, dict-like behavior takes precedence.

Accepted combinations are:

function

string function name

list-like of functions and/or function names, e.g. [np.exp, 'sqrt']

dict-like of axis labels -> functions, function names or list-like of such.

axis
{0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

*args
Positional arguments to pass to func.

**kwargs
Keyword arguments to pass to func.

Returns:
DataFrame
A DataFrame that must have the same length as self.

Raises:
ValueError
If the returned DataFrame has a different length than self.

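475-2. Parameters

475-2-1. func (required): a function, a string function name, a list-like of these, or a dict-like that maps axis labels to them. The function must return a result with the same length as its input, so element-wise or cumulative operations (such as 'sqrt', 'abs' or 'cumsum') and suitable user-defined functions are accepted, whereas reductions such as 'mean' or 'std' raise a ValueError.

475-2-2. axis (optional, default 0): {0 or 'index', 1 or 'columns'}, the direction of the transformation. 0 or 'index' applies the function to each column; 1 or 'columns' applies it to each row.

475-2-3. *args (optional): additional positional arguments passed to the transformation function.

475-2-4. **kwargs (optional): additional keyword arguments passed to the transformation function.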
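
The difference between length-preserving functions and reductions can be checked with a few lines. This is a minimal sketch on made-up data; the variable rejected exists only to record whether the ValueError was raised.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 9], 'B': [16, 25, 36]})

# A string naming an element-wise or cumulative operation keeps the length.
roots = df.transform('sqrt')
running = df.transform('cumsum')

# A reduction such as 'mean' collapses each column to a single value,
# so transform rejects it with a ValueError.
try:
    df.transform('mean')
    rejected = False
except ValueError:
    rejected = True

print(roots)
print(running)
print(rejected)
```

For reductions, agg is the appropriate method; transform is meant for results that line up row by row with the input.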

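475-3. Functionality

Transforms the data in a DataFrame and returns a result with the same length as the original. The given function is applied column by column or row by row. Because the shape of the data is preserved, the method is very useful in data preprocessing and feature engineering.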
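
Besides a single function, transform also takes a list or a dict of functions. The sketch below, on made-up data, shows what happens to the columns in each case while the row index stays the same.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# A list of functions keeps the row index but adds a column level:
# each original column gets one sub-column per function.
both = df.transform([np.sqrt, np.exp])

# A dict applies a different function to each named column.
mixed = df.transform({'A': np.sqrt, 'B': lambda x: x * 2})

print(both.columns.tolist())
print(mixed)
```

In both cases the number of rows matches the original DataFrame, which is the property transform guarantees.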

475-4. Return value

Returns a DataFrame with the same length as the original, containing the results of applying func.

475-5. Notes

None

475-6. Usage

475-6-1. Data preparation

None

475-6-2. Code example

# 475. The pandas.DataFrame.transform method
# 475-1. Transforming with a custom (lambda) function
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
# Standardize each column: subtract the mean, divide by the standard deviation
result = df.transform(lambda x: (x - x.mean()) / x.std())
print(result)

# 475-2. Passing an existing function
# Transform with NumPy's square-root function
import numpy as np
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
result = df.transform(np.sqrt)
print(result)

475-6-3. Output

# 475. The pandas.DataFrame.transform method
# 475-1. Transforming with a custom (lambda) function
#      A    B
# 0 -1.0 -1.0
# 1  0.0  0.0
# 2  1.0  1.0

# 475-2. Passing an existing function
#           A         B
# 0  1.000000  2.000000
# 1  1.414214  2.236068
# 2  1.732051  2.449490