Pandas GroupBy Objects

Latest recommended article published on 2024-06-26 22:04:40

Claroja

Article link: https://blog.csdn.net/claroja/article/details/71080293

Creating a GroupBy object

A GroupBy object is created by calling pandas.DataFrame.groupby() or pandas.Series.groupby().

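GroupBy = DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)

Note: this is the signature from older pandas releases. The squeeze parameter was removed in pandas 2.0, and axis is deprecated in recent versions.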

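Parameter	Description
by	mapping, function, str, or iterable
axis	int, default 0
level	int, level name, or sequence of such, default None (selects the index level(s) to group by when the index is a MultiIndex)
as_index	boolean, default True (use the group keys as the index of the result)
sort	boolean, default True (sort the group keys)
group_keys	boolean, default True (when calling apply, add the group keys to the index to identify the pieces)
squeeze	boolean, default False (reduce the dimensionality of the return type if possible)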

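Parameter notes:
1. as_index
By default the groupby keys become the index of the result, which does not match what SQL users expect. Set as_index=False to keep the keys as ordinary columns.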

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A' : [1, 1, 2, 2,1, 2, 2, 2],
    'B' : [15,14,15,12,13,14,15,16]})
df.groupby("A").sum()
out:
    B
A    
1  42
2  72

df.groupby("A", as_index=False).sum()
df.groupby("A").sum().reset_index() # equivalent to as_index=False
out:
   A   B
0  1  42
1  2  72
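
The sort and level parameters from the table above can be sketched in the same way. This is a minimal illustration, assuming a small made-up DataFrame; the variable names and the extra column "C" are not from the original article.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [2, 1, 2, 1],
    "B": [10, 20, 30, 40],
})

# sort=True (the default): group keys appear in sorted order.
sorted_sum = df.groupby("A").sum()

# sort=False: groups keep the order in which their keys first appear.
unsorted_sum = df.groupby("A", sort=False).sum()

# level: group by a level of a MultiIndex instead of by a column.
indexed = df.set_index(["A", "B"])
indexed["C"] = [1, 2, 3, 4]  # hypothetical value column
by_level = indexed.groupby(level="A").sum()

print(sorted_sum)
print(unsorted_sum)
print(by_level)
```

Passing sort=False can also save a little time on large inputs when the order of the groups does not matter.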

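Indexing and iteration

A GroupBy object is iterable. Each iteration yields a (group name, sub-DataFrame) pair, where the sub-DataFrame holds the rows of that group.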

Attribute / method	Description
GroupBy.__iter__()	Groupby iterator
GroupBy.groups	dict {group name -> group labels}
GroupBy.indices	dict {group name -> group indices}
GroupBy.get_group(name[, obj])	Constructs NDFrame from group with provided name
Grouper([key, level, freq, axis, sort])	A Grouper allows the user to specify a groupby instruction for a target
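
The attributes above can be tried out as follows. This is a minimal sketch that reuses the example DataFrame from the as_index section; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 1, 2, 2, 2],
    "B": [15, 14, 15, 12, 13, 14, 15, 16],
})
grouped = df.groupby("A")

# Iterating yields (group name, sub-DataFrame) pairs.
sizes = {}
for name, group in grouped:
    sizes[int(name)] = len(group)

# groups maps each group name to the index labels of its rows.
labels_of_1 = list(grouped.groups[1])

# indices maps each group name to the integer positions of its rows.
positions_of_2 = grouped.indices[2].tolist()

# get_group returns the sub-DataFrame for a single group name.
group_1 = grouped.get_group(1)

print(sizes)
print(labels_of_1, positions_of_2)
print(group_1)
```

With the default integer index, labels and positions coincide; they differ once the DataFrame has a non-default index.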

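Function application

GroupBy.apply(func, *args, **kwargs): applies func to the sub-DataFrame of each group and combines the results. Inside func, any DataFrame method can be used.
GroupBy.aggregate(func, *args, **kwargs): reduces each group to a single value. It accepts aggregation functions such as np.sum or their string names such as "sum". The descriptive-statistics methods listed below are in effect shortcuts for the corresponding aggregate call (agg is the short alias of aggregate).
GroupBy.transform(func, *args, **kwargs): applies func group by group and returns a result with the same index as the input.
GroupBy.filter(func, ...): keeps only the rows of the groups for which func returns True.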
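
The difference between these four methods is easiest to see side by side. The following is a minimal sketch on the example DataFrame used earlier; the lambdas and variable names are illustrative assumptions, not part of the original article.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 1, 2, 2, 2],
    "B": [15, 14, 15, 12, 13, 14, 15, 16],
})
grouped_b = df.groupby("A")["B"]

# agg / aggregate: one value per group; accepts a name or a list of names.
totals = grouped_b.agg("sum")
summary = grouped_b.agg(["min", "max"])

# apply: call an arbitrary function on each group's Series.
spread = grouped_b.apply(lambda s: s.max() - s.min())

# transform: the result keeps the index of the input, computed per group.
centered = grouped_b.transform(lambda s: s - s.mean())

# filter: keep only the rows that belong to groups passing the condition.
big_groups = df.groupby("A").filter(lambda g: len(g) > 3)

print(totals)
print(summary)
print(spread)
print(centered)
print(big_groups)
```

As a rule of thumb, agg shrinks each group to one row, transform keeps the original shape, filter drops whole groups, and apply is the general fallback when none of the others fits.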

Descriptive statistics

Functions common to DataFrame and Series

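Function	Description
Statistical functions
GroupBy.sum()	Sum of each group
GroupBy.ohlc()	Compute open, high, low and close values of each group, excluding missing values
GroupBy.cumcount([ascending])	Number each item in each group from 0 to the length of that group - 1.
GroupBy.mean(*args, **kwargs)	Mean of each group, excluding missing values
GroupBy.prod()	Compute prod of group values
GroupBy.var([ddof])	Variance of each group, excluding missing values
GroupBy.std([ddof])	Standard deviation of each group, excluding missing values
GroupBy.sem([ddof])	Standard error of the mean of each group, excluding missing values
GroupBy.size()	Size of each group (number of rows)
GroupBy.count()	Number of values in each group, excluding missing values
GroupBy.max()	Maximum of each group
GroupBy.min()	Minimum of each group
GroupBy.median()	Median of each group
Indexing functions
GroupBy.first()	Compute first of group values
GroupBy.head([n])	Returns first n rows of each group.
GroupBy.last()	Compute last of group values
GroupBy.tail([n])	Returns last n rows of each group
GroupBy.nth(n[, dropna])	The n-th row of each group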
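
Several of these functions differ only in how they treat missing values. The sketch below illustrates this, assuming a small made-up DataFrame with NaN entries; the data and variable names are not from the original article.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 2],
    "B": [np.nan, 14.0, 15.0, np.nan, 16.0],
})
grouped = df.groupby("A")

# size counts all rows of a group; count ignores missing values.
sizes = grouped.size()
counts = grouped["B"].count()

# mean skips missing values.
means = grouped["B"].mean()

# first() returns the first non-missing value of each group ...
first_values = grouped["B"].first()
# ... while head(1) returns the first row of each group, missing or not.
head_rows = grouped.head(1)

# cumcount numbers the rows inside each group, starting at 0.
numbering = grouped.cumcount()

print(sizes, counts, means, first_values, sep="\n")
print(head_rows)
print(numbering)
```

Note that head() keeps the original row index, whereas first() returns one row per group indexed by the group key.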

Functions that differ between DataFrame and Series

Function	Description
DataFrameGroupBy.agg(arg, *args, **kwargs)	Aggregate using input function or dict of {column -> ...}
DataFrameGroupBy.all([axis, bool_only, ..])	Return whether all elements are True over requested axis
DataFrameGroupBy.any([axis, bool_only, ..])	Return whether any element is True over requested axis
DataFrameGroupBy.bfill([limit])	Backward fill the values
DataFrameGroupBy.corr([method, min_periods])	Compute pairwise correlation of columns, excluding NA/null values
DataFrameGroupBy.count()	Compute count of group, excluding missing values
DataFrameGroupBy.cov([min_periods])	Compute pairwise covariance of columns, excluding NA/null values
DataFrameGroupBy.cummax([axis, skipna])	Return cumulative max over requested axis.
DataFrameGroupBy.cummin([axis, skipna])	Return cumulative minimum over requested axis.
DataFrameGroupBy.cumprod([axis])	Cumulative product for each group
DataFrameGroupBy.cumsum([axis])	Cumulative sum for each group
DataFrameGroupBy.describe([percentiles, ..])	Generate various summary statistics, excluding NaN values.
DataFrameGroupBy.diff([periods, axis])	1st discrete difference of object
DataFrameGroupBy.ffill([limit])	Forward fill the values
DataFrameGroupBy.fillna([value, method, ..])	Fill NA/NaN values using the specified method
DataFrameGroupBy.hist(data[, column, by, ..])	Draw histogram of the DataFrame’s series using matplotlib / pylab.
DataFrameGroupBy.idxmax([axis, skipna])	Return index of first occurrence of maximum over requested axis.
DataFrameGroupBy.idxmin([axis, skipna])	Return index of first occurrence of minimum over requested axis.
DataFrameGroupBy.mad([axis, skipna, level])	Return the mean absolute deviation of the values for the requested axis (removed in pandas 2.0)
DataFrameGroupBy.pct_change([periods, ..])	Percent change over given number of periods.
DataFrameGroupBy.plot	Class implementing the .plot attribute for groupby objects
DataFrameGroupBy.quantile([q, axis, ..])	Return values at the given quantile over requested axis, a la numpy.percentile.
DataFrameGroupBy.rank([axis, method, ..])	Compute numerical data ranks (1 through n) along axis.
DataFrameGroupBy.resample(rule, *args, **kwargs)	Provide resampling when using a TimeGrouper
DataFrameGroupBy.shift([periods, freq, axis])	Shift each group by periods observations
DataFrameGroupBy.size()	Compute group sizes
DataFrameGroupBy.skew([axis, skipna, level, ..])	Return unbiased skew over requested axis
DataFrameGroupBy.take(indices[, axis, ..])	Analogous to ndarray.take
DataFrameGroupBy.tshift([periods, freq, axis])	Shift the time index, using the index’s frequency if available (removed in pandas 2.0)
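
Many functions in this table return one value per input row rather than one per group. A minimal sketch of a few of them follows, assuming a shortened version of the example DataFrame; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 1, 2],
    "B": [15, 14, 15, 12, 13, 14],
})
grouped = df.groupby("A")

# Row-wise results: same length and index as df, computed within each group.
running_total = grouped.cumsum()   # cumulative sum per group
previous = grouped.shift(1)        # previous value in the same group
change = grouped.diff()            # difference to the previous value
rank_in_group = grouped.rank()     # rank inside the group (1 = smallest)

# Group-wise result: index label of the largest value in each group.
best_row = grouped.idxmax()

print(running_total)
print(previous)
print(change)
print(rank_in_group)
print(best_row)
```

The first row of each group has no predecessor, so shift and diff return NaN there.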

Series-only functions

Function	Description
SeriesGroupBy.nlargest(*args, **kwargs)	Return the largest n elements.
SeriesGroupBy.nsmallest(*args, **kwargs)	Return the smallest n elements.
SeriesGroupBy.nunique([dropna])	Returns number of unique elements in the group
SeriesGroupBy.unique()	Return np.ndarray of unique values in the object.
SeriesGroupBy.value_counts([normalize, ..])
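
A SeriesGroupBy is obtained by selecting a single column from a grouped DataFrame. The sketch below tries three of the functions above on the earlier example data; the variable names are illustrative. In newer pandas releases some of these (for example nunique and value_counts) are also available on DataFrameGroupBy.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 1, 2, 2, 2],
    "B": [15, 14, 15, 12, 13, 14, 15, 16],
})
series_grouped = df.groupby("A")["B"]  # selecting one column gives a SeriesGroupBy

# Number of distinct values in each group.
distinct = series_grouped.nunique()

# The two largest values of each group, indexed by (group key, original index).
top2 = series_grouped.nlargest(2)

# Frequency of each value inside each group, indexed by (group key, value).
freq = series_grouped.value_counts()

print(distinct)
print(top2)
print(freq)
```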

DataFrame-only functions

Function	Description
DataFrameGroupBy.corrwith(other[, axis, drop])	Compute pairwise correlation between rows or columns of two DataFrame objects.
DataFrameGroupBy.boxplot(grouped[, ..])	Make box plots from DataFrameGroupBy data.