同时遍历多个dataframe_业界使用最多的Python中Dataframe的重塑变形

最新推荐文章于 2024-02-25 17:26:34 发布

weixin_39684454

最新推荐文章于 2024-02-25 17:26:34 发布

阅读量740

点赞数

文章标签：同时遍历多个dataframe

本文链接：https://blog.csdn.net/weixin_39684454/article/details/112264484

版权

本文介绍了如何在Python中使用DataFrame进行数据重塑变形，重点讲解了pivot、pivot_table、stack和unstack等方法的使用。通过实例展示了如何通过这些方法对数据进行统计分析和维度转换，帮助读者掌握数据处理技巧。

摘要由CSDN通过智能技术生成

喜欢的同学记得点赞、转发、收藏哦～

欢迎关注公众号：计算广告生态 即时查看更多经典博文

目录设定

pivot
pivot_table
stack/unstack

pivot

pivot函数用于从给定的表中创建出新的派生表

pivot有三个参数:

索引
列
值

def pivot_simple(index, columns, values):    """    index : ndarray        Labels to use to make new frame's index    columns : ndarray        Labels to use to make new frame's columns    values : ndarray        Values to use for populating new frame's values

pivot函数将创建一个新表，其行和列索引是相应参数的唯一值读取数据：

from collections import OrderedDictfrom pandas import DataFrameimport pandas as pdimport numpy as np data = OrderedDict((    ("item", ['Item1', 'Item1', 'Item2', 'Item2']),    ('color', ['red', 'blue', 'red', 'black']),    ('user',  ['1', '2', '3', '4']),    ('bm',   ['1', '2', '3', '4'])))data = DataFrame(data)<=============================================    item  color user bm0  Item1    red    1  11  Item1   blue    2  22  Item2    red    3  33  Item2  black    4  4

利用pivot进行变形：

df = data.pivot(index='item', columns='color', values='user')print(df)<==============================================color black  blue reditem                 Item1  None     2   1Item2     4  None   3

将上述数据中的bm也包含进来

df = data.pivot(index='item', columns='color')print(df)<==============================================color black  blue red black  blue reditem                                 Item1  None     2   1  None     2   1Item2     4  None   3     4  None   3

pivot_table

先看如下例子，使用pivot实现

data = OrderedDict((    ("item", ['Item1', 'Item1', 'Item1', 'Item2']),    ('color', ['red', 'blue', 'red', 'black']),    ('user',  ['1', '2', '3', '4']),    ('bm',   ['1', '2', '3', '4'])))data = DataFrame(data) df = data.pivot(index='item', columns='color', values='user')<==============================================ValueError: Index contains duplicate entries, cannot reshape

可以看到，现在index和columns对应的位置有不同的值。因此，必须确保我们指定的列和行没有重复的数据，才可以用pivot函数

pivot_table方法实现了类似pivot方法的功能它可以在指定的列和行有重复的情况下使用我们可以使用均值、中值或其他的聚合函数来计算重复条目中的单个值

先看pivot_table这个方法

pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',fill_value=None, margins=False, dropna=True,margins_name='All')

data = OrderedDict((    ("item", ['Item1', 'Item1', 'Item1', 'Item2']),    ('color', ['red', 'blue', 'red', 'black']),    ('user',  ['1', '2', '3', '4']),    ('bm',   ['1', '2', '3', '4'])))data = DataFrame(data) df = data.pivot_table(index='item', columns='color', values='user', aggfunc=np.min)print(df)<==============================================color black  blue   reditem                   Item1  None     2     1Item2     4  None  None

pivot_table()是pivot()的泛化，它允许在数据集中聚合具有相同目标的多个值。

对于不用的列使用通的统计方法

使用字典来实现

df_nodmp5.pivot_table(index="ad_network_name",values=["mt_income","impression"], aggfunc={"mt_income":[np.sum],"impression":[np.sum]})

stack/unstack

事实上，变换一个表只是堆叠DataFrame的一种特殊情况

假设我们有一个在行列上有多个索引的DataFrame。堆叠DataFrame意味着移动最里面的列索引成为最里面的行索引，反向操作称之为取消堆叠，意味着将最里面的行索引移动为最里面的列索引。

from pandas import DataFrameimport pandas as pdimport numpy as np # 建立多个行索引row_idx_arr = list(zip(['r0', 'r0'], ['r-00', 'r-01']))row_idx = pd.MultiIndex.from_tuples(row_idx_arr)print "行索引:"print row_idx# 建立多个列索引col_idx_arr = list(zip(['c0', 'c0', 'c1'], ['c-00', 'c-01', 'c-10']))col_idx = pd.MultiIndex.from_tuples(col_idx_arr)print "列索引:"print col_idx# 创建DataFramed = DataFrame(np.arange(6).reshape(2,3), index=row_idx, columns=col_idx)d = d.applymap(lambda x: (x // 3, x % 3))print "dataframe:"print d# Stack/Unstacks = d.stack()u = d.unstack()print(s)print(u)<=====================================================## 行索引:MultiIndex(levels=[[u'r0'], [u'r-00', u'r-01']],           labels=[[0, 0], [0, 1]])## 列索引:MultiIndex(levels=[[u'c0', u'c1'], [u'c-00', u'c-01', u'c-10']],           labels=[[0, 0, 1], [0, 1, 2]])## dataframe:             c0              c1           c-00    c-01    c-10r0 r-00  (0, 0)  (0, 1)  (0, 2)   r-01  (1, 0)  (1, 1)  (1, 2)## stack:                  c0      c1r0 r-00 c-00  (0, 0)     NaN        c-01  (0, 1)     NaN        c-10     NaN  (0, 2)   r-01 c-00  (1, 0)     NaN        c-01  (1, 1)     NaN        c-10     NaN  (1, 2)## unstack:        c0                              c1              c-00            c-01            c-10              r-00    r-01    r-00    r-01    r-00    r-01r0  (0, 0)  (1, 0)  (0, 1)  (1, 1)  (0, 2)  (1, 2)

--END--

喜欢本文的同学记得点赞、转发、收藏~

更多内容，欢迎大家关注我们的公众号：计算广告生态

同时也带来更多系列文章以及干货！