Pandas GroupBy Objects

Creating a GroupBy object

A GroupBy object is created by calling pandas.DataFrame.groupby() or pandas.Series.groupby().

GroupBy = DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)
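| Parameter | Description |
| --- | --- |
| by | mapping, function, str, or iterable |
| axis | int, default 0 |
| level | int, level name, or sequence of such, default None (selects the index level to group by when the index is a MultiIndex) |
| as_index | boolean, default True (use the by keys as the index of the result) |
| sort | boolean, default True (sort the group keys) |
| group_keys | boolean, default True (when calling apply, add the group keys to the index to identify the pieces) |
| squeeze | boolean, default False (reduce the dimensionality of the return type where possible; deprecated in later pandas releases) |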
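
The effect of sort and level is easiest to see on a tiny example. The following is a minimal sketch; the DataFrame, the Series, and the index names "outer" and "inner" are made up for illustration and are not from the original article.

```python
import pandas as pd

# Hypothetical data: the keys appear in the order 2, 1
df = pd.DataFrame({"A": [2, 1, 2, 1], "B": [10, 20, 30, 40]})

# sort=True (default): group keys are sorted in the result
sorted_sum = df.groupby("A").sum()

# sort=False: groups keep the order in which they first appear
unsorted_sum = df.groupby("A", sort=False).sum()

print(sorted_sum.index.tolist())    # [1, 2]
print(unsorted_sum.index.tolist())  # [2, 1]

# level: group by one level of a MultiIndex instead of by a column
idx = pd.MultiIndex.from_tuples(
    [("x", "a"), ("x", "b"), ("y", "a"), ("y", "b")],
    names=["outer", "inner"],
)
s = pd.Series([1, 2, 3, 4], index=idx)
by_outer = s.groupby(level="outer").sum()
print(by_outer.to_dict())  # {'x': 3, 'y': 7}
```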

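Parameter notes:
1. as_index
By default the groupby keys become the index of the result, which does not match SQL conventions. Set as_index=False to keep them as ordinary columns.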

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A' : [1, 1, 2, 2,1, 2, 2, 2],
    'B' : [15,14,15,12,13,14,15,16]})
df.groupby("A").sum()
out:
    B
A    
1  42
2  72

df.groupby("A", as_index=False).sum()
df.groupby("A").sum().reset_index()  # equivalent to as_index=False
out:
   A   B
0  1  42
1  2  72

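Indexing and iteration

A GroupBy object is iterable. Each iteration yields a (group name, group) pair, where the group is the sub-DataFrame holding that group's rows.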
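
As a minimal sketch of iteration, the loop below reuses the sample data from the as_index example and records the number of rows in each group. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 1, 2, 2, 2],
    "B": [15, 14, 15, 12, 13, 14, 15, 16],
})

group_sizes = {}
for name, group in df.groupby("A"):
    # name is the group key; group is a DataFrame with that group's rows
    group_sizes[name] = len(group)

print(group_sizes)
```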

| Attribute | Description |
| --- | --- |
| GroupBy.__iter__() | Groupby iterator |
| GroupBy.groups | dict {group name -> group labels} |
| GroupBy.indices | dict {group name -> group indices} |
| GroupBy.get_group(name[, obj]) | Constructs NDFrame from group with provided name |
| Grouper([key, level, freq, axis, sort]) | A Grouper allows the user to specify a groupby instruction for a target |
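
The difference between groups (index labels) and indices (integer positions) only shows up when the index is not the default 0..n-1. The sketch below assumes a letter index chosen purely for illustration; the data values are the same as in the earlier example.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 1, 2, 2, 1, 2, 2, 2],
     "B": [15, 14, 15, 12, 13, 14, 15, 16]},
    index=list("abcdefgh"),  # hypothetical labels, for illustration
)
grouped = df.groupby("A")

labels_of_1 = list(grouped.groups[1])          # index labels of group 1
positions_of_1 = grouped.indices[1].tolist()   # integer positions of group 1
rows_of_1 = grouped.get_group(1)               # the rows of group 1

print(labels_of_1)     # ['a', 'b', 'e']
print(positions_of_1)  # [0, 1, 4]
print(rows_of_1["B"].tolist())  # [15, 14, 13]
```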

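Function application

  1. GroupBy.apply(func, *args, **kwargs): applies func to each group's sub-DataFrame and combines the results. Inside func, any DataFrame method can be used.
  2. GroupBy.aggregate(func, *args, **kwargs): reduces each group to a summary value. func can be a callable such as np.sum or a name such as "sum". The descriptive statistics methods listed later are essentially shorthand for the corresponding agg call (agg is the short alias of aggregate).
  3. GroupBy.transform(func, *args, **kwargs): applies func to each group and returns a result with the same index as the original object.
  4. filter: keeps only the rows of those groups for which func returns True.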
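
The four methods above can be compared side by side. This is a minimal sketch using the sample data from the as_index example; the lambdas and variable names are illustrative, and column B is selected before apply so that the function only sees the values being summarized.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 1, 2, 2, 2],
    "B": [15, 14, 15, 12, 13, 14, 15, 16],
})
grouped_b = df.groupby("A")["B"]

# apply: run an arbitrary function on each group's data
value_range = grouped_b.apply(lambda s: s.max() - s.min())

# aggregate / agg: one value per group and per function
summary = grouped_b.agg(["sum", "mean"])

# transform: result has the same index as the original rows
centered = grouped_b.transform(lambda s: s - s.mean())

# filter: keep the rows of groups that satisfy a condition
large_groups = df.groupby("A").filter(lambda g: len(g) > 3)

print(value_range.tolist())     # [2, 4]
print(summary["sum"].tolist())  # [42, 72]
print(len(large_groups))        # 5
```

Note how the shapes differ: apply and agg return one entry per group, transform returns one entry per original row, and filter returns a subset of the original rows.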

Descriptive statistics

Functions common to DataFrame and Series

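Statistical functions

| Function | Description |
| --- | --- |
| GroupBy.sum() | Sum of each group |
| GroupBy.ohlc() | Compute open, high, low and close values of each group, excluding missing values |
| GroupBy.cumcount([ascending]) | Number each item in each group from 0 to the length of that group - 1. |
| GroupBy.mean(*args, **kwargs) | Mean, excluding missing values |
| GroupBy.prod() | Compute prod of group values |
| GroupBy.var([ddof]) | Variance, excluding missing values |
| GroupBy.std([ddof]) | Standard deviation, excluding missing values |
| GroupBy.sem([ddof]) | Standard error of the mean, excluding missing values |
| GroupBy.size() | Group size (number of rows, including missing values) |
| GroupBy.count() | Number of elements in each group, excluding missing values |
| GroupBy.max() | Maximum of each group |
| GroupBy.min() | Minimum of each group |
| GroupBy.median() | Median of each group |

Indexing functions

| Function | Description |
| --- | --- |
| GroupBy.first() | Compute first of group values |
| GroupBy.head([n]) | Returns first n rows of each group. |
| GroupBy.last() | Compute last of group values |
| GroupBy.tail([n]) | Returns last n rows of each group |
| GroupBy.nth(n[, dropna]) | The n-th row of each group |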
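
The distinction between size() and count(), and the way the statistics skip missing values, is clearer with a NaN in the data. The sketch below uses a made-up DataFrame with hypothetical column names key and val.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in group "a"
df = pd.DataFrame({
    "key": ["a", "a", "a", "b", "b"],
    "val": [1.0, np.nan, 3.0, 4.0, 5.0],
})
g = df.groupby("key")["val"]

sizes = g.size()    # counts rows, NaN included
counts = g.count()  # counts non-missing values only
means = g.mean()    # NaN is skipped

first_rows = df.groupby("key").head(1)  # first row of each group
running = df.groupby("key").cumcount()  # position of each row in its group

print(sizes.tolist())             # [3, 2]
print(counts.tolist())            # [2, 2]
print(means.tolist())             # [2.0, 4.5]
print(first_rows.index.tolist())  # [0, 3]
print(running.tolist())           # [0, 1, 2, 0, 1]
```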

Functions that behave differently for DataFrame and Series

| Function | Description |
| --- | --- |
| DataFrameGroupBy.agg(arg, *args, **kwargs) | Aggregate using input function or dict of {column -> |
| DataFrameGroupBy.all([axis, bool_only, ..]) | Return whether all elements are True over requested axis |
| DataFrameGroupBy.any([axis, bool_only, ..]) | Return whether any element is True over requested axis |
| DataFrameGroupBy.bfill([limit]) | Backward fill the values |
| DataFrameGroupBy.corr([method, min_periods]) | Compute pairwise correlation of columns, excluding NA/null values |
| DataFrameGroupBy.count() | Compute count of group, excluding missing values |
| DataFrameGroupBy.cov([min_periods]) | Compute pairwise covariance of columns, excluding NA/null values |
| DataFrameGroupBy.cummax([axis, skipna]) | Return cumulative max over requested axis. |
| DataFrameGroupBy.cummin([axis, skipna]) | Return cumulative minimum over requested axis. |
| DataFrameGroupBy.cumprod([axis]) | Cumulative product for each group |
| DataFrameGroupBy.cumsum([axis]) | Cumulative sum for each group |
| DataFrameGroupBy.describe([percentiles, ..]) | Generate various summary statistics, excluding NaN values. |
| DataFrameGroupBy.diff([periods, axis]) | 1st discrete difference of object |
| DataFrameGroupBy.ffill([limit]) | Forward fill the values |
| DataFrameGroupBy.fillna([value, method, ..]) | Fill NA/NaN values using the specified method |
| DataFrameGroupBy.hist(data[, column, by, ..]) | Draw histogram of the DataFrame’s series using matplotlib / pylab. |
| DataFrameGroupBy.idxmax([axis, skipna]) | Return index of first occurrence of maximum over requested axis. |
| DataFrameGroupBy.idxmin([axis, skipna]) | Return index of first occurrence of minimum over requested axis. |
| DataFrameGroupBy.mad([axis, skipna, level]) | Return the mean absolute deviation of the values for the requested axis |
| DataFrameGroupBy.pct_change([periods, ..]) | Percent change over given number of periods. |
| DataFrameGroupBy.plot | Class implementing the .plot attribute for groupby objects |
| DataFrameGroupBy.quantile([q, axis, ..]) | Return values at the given quantile over requested axis, a la numpy.percentile. |
| DataFrameGroupBy.rank([axis, method, ..]) | Compute numerical data ranks (1 through n) along axis. |
| DataFrameGroupBy.resample(rule, *args, **kwargs) | Provide resampling when using a TimeGrouper |
| DataFrameGroupBy.shift([periods, freq, axis]) | Shift each group by periods observations |
| DataFrameGroupBy.size() | Compute group sizes |
| DataFrameGroupBy.skew([axis, skipna, level, ..]) | Return unbiased skew over requested axis |
| DataFrameGroupBy.take(indices[, axis, ..]) | Analogous to ndarray.take |
| DataFrameGroupBy.tshift([periods, freq, axis]) | Shift the time index, using the index’s frequency if available. |
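
Several of the functions in this table are row-wise operations that restart in every group rather than reductions. The following minimal sketch shows cumsum and shift on the sample data from the as_index example; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 1, 2, 2, 2],
    "B": [15, 14, 15, 12, 13, 14, 15, 16],
})
g = df.groupby("A")["B"]

# Running total that restarts for each value of A
running_total = g.cumsum()

# Previous value within the same group; NaN for each group's first row
previous = g.shift(1)

print(running_total.tolist())  # [15, 29, 15, 27, 42, 41, 56, 72]
print(int(previous.isna().sum()))  # 2
```

Both results keep the original row order and index, so they can be assigned back to the DataFrame as new columns.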

Series-only functions

| Function | Description |
| --- | --- |
| SeriesGroupBy.nlargest(*args, **kwargs) | Return the largest n elements. |
| SeriesGroupBy.nsmallest(*args, **kwargs) | Return the smallest n elements. |
| SeriesGroupBy.nunique([dropna]) | Returns number of unique elements in the group |
| SeriesGroupBy.unique() | Return np.ndarray of unique values in the object. |
| SeriesGroupBy.value_counts([normalize, ..]) | |
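
A SeriesGroupBy is obtained by selecting a single column from a grouped DataFrame. The sketch below, again on the sample data from the as_index example, tries nunique and nlargest; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 1, 2, 2, 2],
    "B": [15, 14, 15, 12, 13, 14, 15, 16],
})
g = df.groupby("A")["B"]  # selecting one column gives a SeriesGroupBy

# Number of distinct B values in each group
distinct = g.nunique()

# The two largest B values in each group
top2 = g.nlargest(2)

print(distinct.tolist())  # [3, 4]
print(top2.tolist())      # [15, 14, 16, 15]
```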

DataFrame-only functions

| Function | Description |
| --- | --- |
| DataFrameGroupBy.corrwith(other[, axis, drop]) | Compute pairwise correlation between rows or columns of two DataFrame objects. |
| DataFrameGroupBy.boxplot(grouped[, ..]) | Make box plots from DataFrameGroupBy data. |