Python data processing tip: how to flatten columns after pivot_table

才能我浪费

Article link: https://blog.csdn.net/hawkman/article/details/103941009

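Machine learning work often calls for something like a spreadsheet pivot table. Pandas provides pivot and pivot_table for this. Of the two, pivot_table is the more powerful: it can aggregate values (sum, mean, and so on) while building the table, whereas pivot only reshapes the data.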
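
To make the difference concrete, here is a minimal sketch. The tiny DataFrame and its column names (key, col, val) are made up for illustration and are not part of the original examples. It assumes that pivot raises a ValueError when an index/column pair occurs more than once, while pivot_table aggregates the duplicates.

```python
import pandas as pd

# Hypothetical sample data: the ("foo", "x") pair appears twice.
df = pd.DataFrame({
    "key": ["foo", "foo", "bar"],
    "col": ["x", "x", "y"],
    "val": [1, 2, 3],
})

# pivot only reshapes, so duplicate (index, column) pairs are an error.
try:
    df.pivot(index="key", columns="col", values="val")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates (here by summing them).
table = pd.pivot_table(df, index="key", columns="col", values="val",
                       aggfunc="sum")
print(pivot_failed)
print(table)
```

Cells with no matching rows come out as NaN unless fill_value is given, as in the official examples that follow.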

pivot_table documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html

A few examples from the official documentation:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
...                          "bar", "bar", "bar", "bar"],
...                    "B": ["one", "one", "one", "two", "two",
...                          "one", "one", "two", "two"],
...                    "C": ["small", "large", "large", "small",
...                          "small", "large", "small", "small",
...                          "large"],
...                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
...                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
>>> df
     A    B      C  D  E
0  foo  one  small  1  2
1  foo  one  large  2  4
2  foo  one  large  2  5
3  foo  two  small  3  5
4  foo  two  small  3  6
5  bar  one  large  4  6
6  bar  one  small  5  8
7  bar  two  small  6  9
8  bar  two  large  7  9

This first example aggregates values by taking the sum.

>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                     columns=['C'], aggfunc=np.sum)
>>> table
C        large  small
A   B
bar one    4.0    5.0
    two    7.0    6.0
foo one    4.0    1.0
    two    NaN    6.0

We can also fill missing values using the fill_value parameter.

>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                     columns=['C'], aggfunc=np.sum, fill_value=0)
>>> table
C        large  small
A   B
bar one      4      5
    two      7      6
foo one      4      1
    two      0      6

The next example aggregates by taking the mean across multiple columns.

>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                     aggfunc={'D': np.mean,
...                              'E': np.mean})
>>> table
                  D         E
A   C
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333

We can also calculate multiple types of aggregations for any given value column.

>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                     aggfunc={'D': np.mean,
...                              'E': [min, max, np.mean]})
>>> table
                  D    E
               mean  max      mean  min
A   C
bar large  5.500000  9.0  7.500000  6.0
    small  5.500000  9.0  8.500000  8.0
foo large  2.000000  5.0  4.500000  4.0
    small  2.333333  6.0  4.333333  2.0

The problem now is that the resulting DataFrame has multi-level columns. For example, the columns of the last example look like this:

table.columns:

MultiIndex(levels=[['D', 'E'], ['max', 'mean', 'min']],
           labels=[[0, 1, 1, 1], [1, 0, 1, 2]])
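
The repr above comes from an older pandas release; newer versions may print the same index differently. As a small sketch for inspecting such an index without relying on the repr, the snippet below rebuilds the four column labels by hand (an assumption made for illustration; in practice you would use table.columns directly) and looks at them level by level.

```python
import pandas as pd

# Rebuild the column index of the last example by hand (illustration only).
columns = pd.MultiIndex.from_tuples(
    [("D", "mean"), ("E", "max"), ("E", "mean"), ("E", "min")]
)

n_levels = columns.nlevels               # number of header rows
top = list(columns.get_level_values(0))  # first level, one entry per column
as_tuples = columns.tolist()             # each column label is a tuple

print(n_levels)
print(top)
print(as_tuples)
```

Because every label is a tuple, a single column has to be selected with the full tuple, for example table[("E", "max")], which is part of why flat names are more convenient downstream.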

For later computation we usually want something simpler to work with, that is, we want to flatten the columns. One way to do it:

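# Join the two levels of each column label, e.g. ('E', 'max') -> 'E_max'
table.columns = [s1 + '_' + str(s2) for (s1, s2) in table.columns.tolist()]
# Turn the index levels A and C back into ordinary columns
table.reset_index(inplace=True)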

The result:

table.columns

Index(['A', 'C', 'D_mean', 'E_max', 'E_mean', 'E_min'], dtype='object')
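
Putting the pieces together, the following self-contained sketch reruns the last official example and flattens its columns. The helper name flatten_columns and its sep parameter are hypothetical, introduced here only for illustration. String aggregation names ("mean", "min", "max") are used in place of np.mean, min and max, on the assumption that they produce the same level names.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

table = pd.pivot_table(df, values=["D", "E"], index=["A", "C"],
                       aggfunc={"D": "mean", "E": ["min", "max", "mean"]})


def flatten_columns(frame, sep="_"):
    """Hypothetical helper: join all levels of each MultiIndex column label."""
    flat = frame.copy()
    if isinstance(flat.columns, pd.MultiIndex):
        # str() guards against non-string level values such as integers.
        flat.columns = [sep.join(str(part) for part in label)
                        for label in flat.columns.tolist()]
    # Move the row index (A, C) back into ordinary columns.
    return flat.reset_index()


flat = flatten_columns(table)
print(flat.columns.tolist())
```

Unlike the two-line version above, this sketch returns a new DataFrame instead of modifying table in place, and it works for any number of column levels.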

[Figure: result of the full example]
