Python Cool Library Tour - Third-Party Library Pandas (109) - CSDN Blog

476. pandas.DataFrame.groupby method

476-1. Syntax

# 476. pandas.DataFrame.groupby method
pandas.DataFrame.groupby(by=None, axis=_NoDefault.no_default, level=None, as_index=True, sort=True, group_keys=True, observed=_NoDefault.no_default, dropna=True)
Group DataFrame using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters:
by: mapping, function, label, pd.Grouper or list of such
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.

axis: {0 or ‘index’, 1 or ‘columns’}, default 0
Split along rows (0) or columns (1). For Series this parameter is unused and defaults to 0.

Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version. For axis=1, do frame.T.groupby(...) instead.

level: int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.

as_index: bool, default True
Return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

sort: bool, default True
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original DataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no longer sort the values.

group_keys: bool, default True
When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result’s index (and column) labels match the inputs, and are included otherwise.

Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the result from apply is a like-indexed Series or DataFrame. Specify group_keys explicitly to include the group keys or not.

Changed in version 2.0.0: group_keys now defaults to True.

observed: bool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.

dropna: bool, default True
If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.

Returns:
pandas.api.typing.DataFrameGroupBy
Returns a groupby object that contains information about the groups.

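476-2. Parameters

476-2-1. by (optional, default None): a column label, list of column labels, function, mapping (dict or Series), or pd.Grouper that defines the groups. A function is called on each label of the index, not on each row's values. A dict or Series maps index labels to group names.

476-2-2. axis (optional): {0 or 'index', 1 or 'columns'}, the axis to split along. 0 groups rows and 1 groups columns. This keyword is deprecated since pandas 2.1.0. To group columns, use frame.T.groupby(...) instead.

476-2-3. level (optional, default None): int, level name, or a list of these. If the DataFrame has a MultiIndex, this selects the level(s) to group by. Do not specify both by and level.

476-2-4. as_index (optional, default True): bool. If True, the group keys become the index of the aggregated result. If False, they are returned as ordinary columns ("SQL-style" output). It has no effect on filtrations such as head(), tail(), and nth(), or on transformations.

476-2-5. sort (optional, default True): bool, whether to sort the group keys in the result. Setting it to False can improve performance; groups then appear in the order they first occur. The order of rows within each group is preserved either way.

476-2-6. group_keys (optional, default True): bool. This only matters for apply(). If True, the group keys are added to the index of the combined result so each piece can be identified. If False, they are not added.

476-2-7. observed (optional): bool. It only applies when at least one grouper is Categorical. If True, only categories that actually occur in the data are shown. If False, all categories are shown, including unobserved ones. The current default is False; since 2.1.0 this default is deprecated and will change to True in a future version.

476-2-8. dropna (optional, default True): bool. If True, rows whose group key is NA are dropped. If False, NA is treated as a group key of its own.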
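The parameter list above is easier to follow with a few concrete calls. What follows is a minimal sketch, assuming pandas 2.x. The small frame, its column names, and the dict mapping are made up for illustration. It shows `by` as a label, a function, and a dict, plus the effect of `as_index=False` and `dropna=False`.

```python
import pandas as pd

# Illustrative data: the last row has a missing group key
df = pd.DataFrame({
    "key": ["x", "y", "x", None],
    "val": [1, 2, 3, 4],
})

# by as a column label; as_index=False keeps 'key' as a regular column (SQL-style)
flat = df.groupby("key", as_index=False)["val"].sum()
print(flat)

# dropna=False keeps the missing key as a group of its own
with_na = df.groupby("key", dropna=False)["val"].sum()
print(with_na)

# by as a function: it is called on each index label (0, 1, 2, 3 here)
by_parity = df.groupby(lambda idx: "even" if idx % 2 == 0 else "odd")["val"].sum()
print(by_parity)

# by as a dict: maps index labels to group names
by_dict = df.groupby({0: "first", 1: "first", 2: "second", 3: "second"})["val"].sum()
print(by_dict)
```

With the default `dropna=True`, the row whose key is missing silently disappears from `flat`. Pass `dropna=False` when those rows matter.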

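476-3. Functionality

Splits the DataFrame into groups, applies an operation (aggregation, transformation, or filtering) to each group, and combines the results. This makes it easy to compute per-group statistics such as the mean, sum, or count.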
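The three kinds of operation named above differ mainly in the shape of what they return. Here is a small sketch, assuming pandas 2.x. The `team`/`score` data and the threshold of 85 are hypothetical. Aggregation gives one row per group; transformation returns a result aligned with the original rows; filtration keeps or drops whole groups.

```python
import pandas as pd

# Hypothetical data: team 'a' totals 90, team 'b' totals 80
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "score": [10, 20, 30, 60, 50],
})
g = df.groupby("team")["score"]

# Aggregation: one row per group, one column per statistic
agg = g.agg(["mean", "sum", "count"])
print(agg)

# Transformation: same length as df, so it can be stored as a new column
df["team_mean"] = g.transform("mean")
print(df)

# Filtration: keep only rows of groups whose total score exceeds 85
big = df.groupby("team").filter(lambda grp: grp["score"].sum() > 85)
print(big)
```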

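476-4. Return value

Returns a DataFrameGroupBy object that holds the grouping information. The final result depends on the method called on it, for example:

.mean() returns a DataFrame of per-group means.
.sum() returns a DataFrame of per-group sums.
.size() returns a Series with the number of rows in each group.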
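The return types above can be checked directly. In this sketch, assuming pandas 2.x, one value is deliberately set to missing (a made-up tweak of the example data below). This shows that `size()` counts rows while `count()` counts non-NA values per column.

```python
import pandas as pd

# Same shape as the example below, but with one missing value in 'B'
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": [1, 2, 3, None],
    "C": [5, 6, 7, 8],
})
grouped = df.groupby("A")

means = grouped.mean()    # DataFrame: one row per group, NaN skipped
sizes = grouped.size()    # Series: rows per group, NaN rows included
counts = grouped.count()  # DataFrame: non-NA values per column per group

print(type(means).__name__, type(sizes).__name__)
print(sizes)
print(counts)
```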

476-5. Notes

None

476-6. Usage

476-6-1. Data preparation

None

476-6-2. Code example

# 476. pandas.DataFrame.groupby method
import pandas as pd
# Create a sample DataFrame
data = {
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 4],
    'C': [5, 6, 7, 8]
}
df = pd.DataFrame(data)
# Group by column 'A' and sum columns 'B' and 'C' within each group
result = df.groupby(by='A').sum()
print(result)

476-6-3. Output

# 476. pandas.DataFrame.groupby method
#      B   C
# A
# bar  6  14
# foo  4  12

477. pandas.DataFrame.rolling method

477-1. Syntax

# 477. pandas.DataFrame.rolling method
pandas.DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=_NoDefault.no_default, closed=None, step=None, method='single')
Provide rolling window calculations.

Parameters:
window: int, timedelta, str, offset, or BaseIndexer subclass
Size of the moving window.

If an integer, the fixed number of observations used for each window.

If a timedelta, str, or offset, the time period of each window. Each window will be variable-sized, based on the observations included in the time period. This is only valid for datetimelike indexes. To learn more about offsets and frequency strings, see the pandas documentation on offset aliases.

If a BaseIndexer subclass, the window boundaries based on the defined get_window_bounds method. Additional rolling keyword arguments, namely min_periods, center, closed and step will be passed to get_window_bounds.

min_periods: int, default None
Minimum number of observations in window required to have a value; otherwise, result is np.nan.

For a window that is specified by an offset, min_periods will default to 1.

For a window that is specified by an integer, min_periods will default to the size of the window.

center: bool, default False
If False, set the window labels as the right edge of the window index.

If True, set the window labels as the center of the window index.

win_type: str, default None
If None, all points are evenly weighted.

If a string, it must be a valid scipy.signal window function.

Certain Scipy window types require additional parameters to be passed in the aggregation function. The additional parameters must match the keywords specified in the Scipy window type method signature.

on: str, optional
For a DataFrame, a column label or Index level on which to calculate the rolling window, rather than the DataFrame’s index.

Provided integer column is ignored and excluded from result since an integer index is not used to calculate the rolling window.

axis: int or str, default 0
If 0 or 'index', roll across the rows.

If 1 or 'columns', roll across the columns.

For Series this parameter is unused and defaults to 0.

Deprecated since version 2.1.0: The axis keyword is deprecated. For axis=1, transpose the DataFrame first instead.

closed: str, default None
If 'right', the first point in the window is excluded from calculations.

If 'left', the last point in the window is excluded from calculations.

If 'both', no points in the window are excluded from calculations.

If 'neither', the first and last points in the window are excluded from calculations.

Default None ('right').

step: int, default None
New in version 1.5.0.

Evaluate the window at every step result, equivalent to slicing as [::step]. window must be an integer. Using a step argument other than None or 1 will produce a result with a different shape than the input.

method: str {‘single’, ‘table’}, default ‘single’
New in version 1.3.0.

Execute the rolling operation per single column or row ('single') or over the entire object ('table').

This argument is only implemented when specifying engine='numba' in the method call.

Returns:
pandas.api.typing.Window or pandas.api.typing.Rolling
An instance of Window is returned if win_type is passed. Otherwise, an instance of Rolling is returned.

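477-2. Parameters

477-2-1. window (required): int, offset (timedelta, str, or DateOffset), or BaseIndexer subclass. An integer means a fixed number of observations per window. An offset such as '2D' means a time span, which is only valid for a datetime-like index (or a datetime column given via on). The window size determines which values each aggregation sees.

477-2-2. min_periods (optional, default None): int, the minimum number of non-NA observations a window must contain to produce a value; otherwise the result is NaN. With the default None, it equals the window size for an integer window and 1 for an offset-based window.

477-2-3. center (optional, default False): bool. If True, each window is labeled at its center, extending about half a window to each side of the current point. If False, the current point is the right edge of the window.

477-2-4. win_type (optional, default None): str, a scipy.signal window type such as "boxcar", "triang", or "blackman", used for weighted calculations (requires SciPy). Some window types need extra parameters passed to the aggregation method. With None, all points are weighted equally.

477-2-5. on (optional, default None): str. For a DataFrame, this is the column label (or index level) to compute the rolling window on instead of the index. It is typically a datetime column in time-series data.

477-2-6. axis (optional): {0 or 'index', 1 or 'columns'}, the axis to roll along. 0 rolls across rows (the default) and 1 across columns. This keyword is deprecated since pandas 2.1.0; transpose the DataFrame instead.

477-2-7. closed (optional, default None): {'right', 'left', 'both', 'neither'}, which window endpoints are included in the calculation. None means 'right': the first point in the window is excluded and the right endpoint is included.

477-2-8. step (optional, default None): int. The window is evaluated only at every step-th position, which is equivalent to slicing the result with [::step]. This requires an integer window and was added in pandas 1.5.0.

477-2-9. method (optional, default 'single'): {'single', 'table'}. 'single' computes per column (or row); 'table' computes over the entire object. This is only implemented when engine='numba' is passed to the aggregation method.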
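To see how min_periods, center, and step change the output, here is a short sketch on an illustrative Series. It assumes pandas 1.5 or later, because of `step`.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default min_periods for an integer window equals the window size,
# so the first two windows (fewer than 3 values) give NaN
default = s.rolling(window=3).sum()

# min_periods=1 emits a value as soon as one observation is in the window
partial = s.rolling(window=3, min_periods=1).sum()

# center=True labels each window at its middle position
centered = s.rolling(window=3, center=True).sum()

# step=2 evaluates only every 2nd window, like slicing the result with [::2]
stepped = s.rolling(window=3, step=2).sum()

print(pd.DataFrame({"default": default, "partial": partial, "centered": centered}))
print(stepped)
```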

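477-3. Functionality

Provides a rolling (moving) window view of the data, so that statistics such as the mean, standard deviation, or sum can be computed over each window of the given size. Rolling windows are especially common in time-series analysis, where they help reveal trends and periodic patterns.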
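In time-series work the window is often a time span rather than a row count. Below is a minimal sketch, assuming pandas 2.x; the dates and values are invented. It uses `on` to roll over a datetime column with a '2D' window. For offset windows the default `closed='right'` means each window covers (t - 2 days, t], and `min_periods` defaults to 1.

```python
import pandas as pd

# Invented daily data with a gap on 2024-01-03; 'date' must be monotonic
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"]),
    "value": [1, 2, 3, 4],
})

# Sum of all rows whose date lies in the 2 days ending at the current row
rolled = df.rolling("2D", on="date").sum()
print(rolled)
```

Because the window is defined by time, the row on 2024-01-04 only sees itself: the previous row is exactly two days earlier and falls on the excluded left edge.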

477-4. Return value

Returns a Rolling object, or a Window object if win_type is given. Aggregation methods such as .mean(), .sum(), and .std() are then called on it to produce the result.

477-5. Notes

None

477-6. Usage

477-6-1. Data preparation

None

477-6-2. Code example

# 477. pandas.DataFrame.rolling method
import pandas as pd
# Create a sample DataFrame
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1]
}
df = pd.DataFrame(data)
# Compute the rolling mean with a window size of 3
rolling_mean = df.rolling(window=3).mean()
print(rolling_mean)

477-6-3. Output

# 477. pandas.DataFrame.rolling method
#      A    B
# 0  NaN  NaN
# 1  NaN  NaN
# 2  2.0  4.0
# 3  3.0  3.0
# 4  4.0  2.0

478. pandas.DataFrame.expanding method

478-1. Syntax

# 478. pandas.DataFrame.expanding method
pandas.DataFrame.expanding(min_periods=1, axis=_NoDefault.no_default, method='single')
Provide expanding window calculations.

Parameters:
min_periods: int, default 1
Minimum number of observations in window required to have a value; otherwise, result is np.nan.

axis: int or str, default 0
If 0 or 'index', roll across the rows.

If 1 or 'columns', roll across the columns.

For Series this parameter is unused and defaults to 0.

method: str {‘single’, ‘table’}, default ‘single’
Execute the rolling operation per single column or row ('single') or over the entire object ('table').

This argument is only implemented when specifying engine='numba' in the method call.

New in version 1.3.0.

Returns:
pandas.api.typing.Expanding.

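478-2. Parameters

478-2-1. min_periods (optional, default 1): int, the minimum number of non-NA observations required to produce a value. If the window contains fewer, the result is NaN.

478-2-2. axis (optional): {0 or 'index', 1 or 'columns'}, the axis to expand along. 0 expands across rows (the default) and 1 across columns. As with rolling, this keyword is deprecated since pandas 2.1.0.

478-2-3. method (optional, default 'single'): {'single', 'table'}. 'single' computes per column (or row); 'table' computes over the entire object. This is only implemented when engine='numba' is passed to the aggregation method.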
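The effect of min_periods is easiest to see with a missing value in the data. Here is a brief sketch on an illustrative Series, assuming pandas 2.x.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, 4.0])

# Default min_periods=1: a value appears as soon as one non-NA observation is seen
default_res = s.expanding().sum()

# min_periods=2: NaN until at least two non-NA observations have accumulated
strict = s.expanding(min_periods=2).sum()

print(pd.DataFrame({"default": default_res, "min_periods=2": strict}))
```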

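478-3. Functionality

Performs expanding (cumulative) calculations: the window starts at the first observation and grows to include every observation up to the current one. This gives running sums, running means, running maxima, and so on, which suits time series and other data that accumulates in order.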
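Because each window covers everything from the first row onward, some expanding aggregations coincide with pandas' cumulative methods. This sketch, assuming pandas 2.x and illustrative data without NaN, compares them. The expanding results are float64, so the cumulative versions are cast to float before comparing.

```python
import pandas as pd

s = pd.Series([3, 1, 4, 1, 5])

running_sum = s.expanding().sum()
running_max = s.expanding().max()
running_mean = s.expanding().mean()  # no direct cum* equivalent

# For data without NaN these match cumsum()/cummax(), apart from dtype
print(running_sum.equals(s.cumsum().astype(float)))
print(running_max.equals(s.cummax().astype(float)))
print(running_mean)
```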

478-4. Return value

Returns an Expanding object, on which aggregation methods such as .sum(), .mean(), .max(), and .min() can be called.

478-5. Notes

None

478-6. Usage

478-6-1. Data preparation

None

478-6-2. Code example

# 478. pandas.DataFrame.expanding method
import pandas as pd
# Create a sample DataFrame
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1]
}
df = pd.DataFrame(data)
# Compute the cumulative sum over an expanding window
cumulative_sum = df.expanding().sum()
print(cumulative_sum)

478-6-3. Output

# 478. pandas.DataFrame.expanding method
#       A     B
# 0   1.0   5.0
# 1   3.0   9.0
# 2   6.0  12.0
# 3  10.0  14.0
# 4  15.0  15.0

479. pandas.DataFrame.ewm method

479-1. Syntax

# 479. pandas.DataFrame.ewm method
pandas.DataFrame.ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=_NoDefault.no_default, times=None, method='single')
Provide exponentially weighted (EW) calculations.
Exactly one of com, span, halflife, or alpha must be provided if times is not provided. If times is provided, halflife and one of com, span or alpha may be provided.
Parameters:
com: float, optional
Specify decay in terms of center of mass
$\alpha = 1/(1 + com)$, for $com \geq 0$.
span: float, optional
Specify decay in terms of span
$\alpha = 2/(span + 1)$, for $span \geq 1$.
halflife: float, str, timedelta, optional
Specify decay in terms of half-life
$\alpha = 1 - \exp\left(-\ln(2)/halflife\right)$, for $halflife > 0$.
If times is specified, a timedelta convertible unit over which an observation decays to half its value. Only applicable to mean(), and halflife value will not apply to the other functions.
alpha: float, optional
Specify smoothing factor $\alpha$ directly
$0 < \alpha \leq 1$.
min_periods: int, default 0
Minimum number of observations in window required to have a value; otherwise, result is np.nan.
adjust: bool, default True
Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings (viewing EWMA as a moving average).
ignore_na: bool, default False
Ignore missing values when calculating weights.
axis: {0, 1}, default 0
If 0 or 'index', calculate across the rows.
If 1 or 'columns', calculate across the columns.
For Series this parameter is unused and defaults to 0.
times: np.ndarray, Series, default None
Only applicable to mean().
Times corresponding to the observations. Must be monotonically increasing and datetime64[ns] dtype.
If 1-D array like, a sequence with the same shape as the observations.
method: str {‘single’, ‘table’}, default ‘single’
New in version 1.4.0.
Execute the rolling operation per single column or row ('single') or over the entire object ('table').
This argument is only implemented when specifying engine='numba' in the method call.
Only applicable to mean()
Returns:
pandas.api.typing.ExponentialMovingWindow.
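For reference, the pandas user guide on exponentially weighted windows describes the two forms that `adjust` switches between. They are restated here with $\alpha$ obtained from com, span, halflife, or alpha as above, for observations $x_0, \dots, x_t$:

$$y_t = \frac{\sum_{i=0}^{t} (1 - \alpha)^i \, x_{t-i}}{\sum_{i=0}^{t} (1 - \alpha)^i} \qquad (\text{adjust=True})$$

$$y_0 = x_0, \qquad y_t = (1 - \alpha)\, y_{t-1} + \alpha\, x_t \qquad (\text{adjust=False})$$

As a worked check against the example later in this section: span=3 gives $\alpha = 2/(3 + 1) = 0.5$. With adjust=True, $y_1 = (2 + 0.5 \cdot 1)/(1 + 0.5) \approx 1.666667$, which matches column A of that output.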

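479-2. Parameters

479-2-1. com (optional, default None): float, specifies decay in terms of center of mass: alpha = 1/(1+com), i.e. com = 1/alpha - 1, with com ≥ 0. A larger com means slower decay.

479-2-2. span (optional, default None): float, specifies decay in terms of span: alpha = 2/(span+1), with span ≥ 1.

479-2-3. halflife (optional, default None): float, str, or timedelta, the half-life, i.e. the time it takes for an observation's weight to decay to half: alpha = 1 - exp(-ln(2)/halflife), with halflife > 0. When times is given, halflife is a timedelta-convertible period and applies only to mean().

479-2-4. alpha (optional, default None): float, the smoothing factor itself, with 0 < alpha ≤ 1. The closer it is to 1, the more weight recent observations get. Unless times is given, exactly one of com, span, halflife, and alpha must be specified.

479-2-5. min_periods (optional, default 0): int, the minimum number of observations required to produce a value. If there are fewer, the result is NaN.

479-2-6. adjust (optional, default True): bool. If True, the result is divided by a decaying adjustment factor in the beginning periods to account for the imbalance in relative weights (viewing the EWMA as a moving average). If False, the result follows the recursion y_t = (1 - alpha) * y_{t-1} + alpha * x_t with y_0 = x_0.

479-2-7. ignore_na (optional, default False): bool, controls how missing values affect the weights. If False, weights are based on absolute positions, so NAs still count toward the decay. If True, weights are based on relative positions, and NAs are ignored.

479-2-8. axis (optional): {0 or 'index', 1 or 'columns'}, the axis to calculate along. 0 calculates across rows and 1 across columns. This keyword is deprecated since pandas 2.1.0.

479-2-9. times (optional, default None): np.ndarray or Series of monotonically increasing datetime64[ns] timestamps with the same length as the observations. Used together with halflife, it weights observations by elapsed time. Only applicable to mean().

479-2-10. method (optional, default 'single'): {'single', 'table'}. 'single' computes per column (or row); 'table' computes over the entire object. This is only implemented with engine='numba' and only applies to mean().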
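The decay parameters are interchangeable ways of fixing the same alpha. In the sketch below, assuming pandas 2.x and an illustrative Series, span=3, com=1, alpha=0.5, and halflife=1 all correspond to alpha = 0.5, so they should give the same means up to floating-point error. The sketch also rebuilds the adjust=False result with an explicit loop to show the recursion.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# All four describe alpha = 0.5:
# 2/(span+1) = 1/(1+com) = 1 - exp(-ln(2)/halflife) = 0.5
by_span = s.ewm(span=3).mean()
by_com = s.ewm(com=1).mean()
by_alpha = s.ewm(alpha=0.5).mean()
by_halflife = s.ewm(halflife=1).mean()

# adjust=False follows y_t = (1 - alpha) * y_{t-1} + alpha * x_t, with y_0 = x_0
recursive = s.ewm(alpha=0.5, adjust=False).mean()

# The same recursion written out by hand
manual = [s.iloc[0]]
for x in s.iloc[1:]:
    manual.append(0.5 * manual[-1] + 0.5 * x)

print(pd.DataFrame({"span=3": by_span, "halflife=1": by_halflife,
                    "adjust=False": recursive, "manual": manual}))
```

Note that adjust=True (the default) and adjust=False diverge in the early periods, where the adjustment factor matters most.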

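479-3. Functionality

Creates an exponentially weighted window object, on which weighted statistics such as the weighted mean or weighted sum can be computed. Typical uses include:

Smoothing time-series data
Computing moving averages
Analyzing financial time series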

479-4. Return value

Returns an ExponentialMovingWindow object, on which aggregation methods such as .mean(), .sum(), .var(), and .std() can be called.

479-5. Notes

None

479-6. Usage

479-6-1. Data preparation

None

479-6-2. Code example

# 479. pandas.DataFrame.ewm method
import pandas as pd
# Create a sample DataFrame
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1]
}
df = pd.DataFrame(data)
# Compute the exponentially weighted mean
ewm_mean = df.ewm(span=3).mean()
print(ewm_mean)

479-6-3. Output

# 479. pandas.DataFrame.ewm method
#           A         B
# 0  1.000000  5.000000
# 1  1.666667  4.333333
# 2  2.428571  3.571429
# 3  3.266667  2.733333
# 4  4.161290  1.838710

480. pandas.DataFrame.abs method

480-1. Syntax

# 480. pandas.DataFrame.abs method
pandas.DataFrame.abs()
Return a Series/DataFrame with absolute numeric value of each element.

This function only applies to elements that are all numeric.

Returns:
abs
Series/DataFrame containing the absolute value of each element.

480-2. Parameters

None

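480-3. Functionality

Returns a new DataFrame in which each element is the absolute value of the corresponding element in the original: negative values become positive and non-negative values stay unchanged. All elements must be numeric.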
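Since abs() requires all-numeric data, a frame that mixes text and numbers needs its numeric columns selected first. This is a small sketch, assuming pandas 2.x; the column names are illustrative. In the pandas versions I am aware of, calling abs() on the whole mixed frame raises a TypeError from the string column. For complex values, abs() returns the magnitude.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b"],
    "x": [-1.5, 2.0],
    "z": [3 + 4j, -1 + 0j],
})

# Calling abs() on the whole frame fails because of the string column
try:
    df.abs()
    raised = False
except TypeError:
    raised = True
print("TypeError raised:", raised)

# Apply abs() only to the numeric columns; complex values become their magnitude
numeric_abs = df[["x", "z"]].abs()
print(numeric_abs)
```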

480-4. Return value

Returns a new DataFrame containing the absolute value of each element of the original DataFrame; the original is not modified.

480-5. Notes

None

480-6. Usage

480-6-1. Data preparation

None

480-6-2. Code example

# 480. pandas.DataFrame.abs method
import pandas as pd
# Create a sample DataFrame
data = {
    'A': [-1, -2, -3, 4, 5],
    'B': [5, -4, 3, -2, -1]
}
df = pd.DataFrame(data)
# Compute the absolute values
abs_df = df.abs()
print(abs_df)

480-6-3. Output

# 480. pandas.DataFrame.abs method
#    A  B
# 0  1  5
# 1  2  4
# 2  3  3
# 3  4  2
# 4  5  1