理解Pandas的Transform

最新推荐文章于 2024-06-28 13:46:47 发布

fightingoyo

最新推荐文章于 2024-06-28 13:46:47 发布

阅读量681

点赞数

文章标签： python 数据分析

原文链接：https://www.jianshu.com/p/509d7b97088c

版权

Pandas具有丰富的功能，transform是与groupby（pandas中最有用的操作之一）组合使用的。一般情况下，我们在groupby之后使用aggregate , filter 或 apply来汇总数据，transform可能稍难理解。

aggregation会返回数据的缩减版本，而transformation能返回完整数据的某一变换版本供我们重组。这样的transformation，输出的形状和输入一致。一个常见的例子是通过减去分组平均值来居中数据。

实践

加载数据

import pandas as pd

data = {'account':[383080,383080,383080,412290,412290,412290,412290,412290,218895,218895,218895,218895],
       "name":['Will LLC','Will LLC','Will LLC','Jerde-Hilpert','Jerde-Hilpert','Jerde-Hilpert','Jerde-Hilpert','Jerde-Hilpert','Kulans Inc','Kulans Inc','Kulans Inc','Kulans Inc'],
       "order":[1001,1001,1001,1005,1005,1005,1005,1005,1006,1006,1006,1006],
       'sku':['B1-20000','B1-27722','B1-86481','S1-06532','S1-82801','S1-06532','S1-47412','S1-27722','S1-27722','B1-33087','B1-33364','B1-20000'],
       "quantity":[7,11,3,48,21,9,44,36,32,23,3,-1],
       'unit price':[33.69,21.12,35.99,55.82,13.62,92.55,78.91,25.42,95.66,22.55,72.3,72.18],
       "ext price":[235.83,232.32,107.97,2679.36,286.02,832.95,3472.04,915.12,3061.12,518.65,216.90,-72.18]}

df = pd.DataFrame(data)
df

	account	name	order	sku	quantity	unit price	ext price
0	383080	Will LLC	1001	B1-20000	7	33.69	235.83
1	383080	Will LLC	1001	B1-27722	11	21.12	232.32
2	383080	Will LLC	1001	B1-86481	3	35.99	107.97
3	412290	Jerde-Hilpert	1005	S1-06532	48	55.82	2679.36
4	412290	Jerde-Hilpert	1005	S1-82801	21	13.62	286.02
5	412290	Jerde-Hilpert	1005	S1-06532	9	92.55	832.95
6	412290	Jerde-Hilpert	1005	S1-47412	44	78.91	3472.04
7	412290	Jerde-Hilpert	1005	S1-27722	36	25.42	915.12
8	218895	Kulans Inc	1006	S1-27722	32	95.66	3061.12
9	218895	Kulans Inc	1006	B1-33087	23	22.55	518.65
10	218895	Kulans Inc	1006	B1-33364	3	72.30	216.90
11	218895	Kulans Inc	1006	B1-20000	-1	72.18	-72.18

可以看到数据包含了不同的订单（order），以及订单里的不同商品的数量（quantity）、单价（unit price）和总价（ext price）

我们的任务是为数据表添加一列，表示不同商品在所在订单的价钱占比。

第一种方法实现步骤：

df.groupby(by='order')['ext price'].agg(sum)

order
1001     576.12
1005    8185.49
1006    3724.49
Name: ext price, dtype: float64

df.groupby(by='order')['ext price'].agg(sum).rename('order_total')

order
1001     576.12
1005    8185.49
1006    3724.49
Name: order_total, dtype: float64

order_total = df.groupby(by='order')['ext price'].agg(sum).rename('order_total').reset_index()
order_total

	order	order_total
0	1001	576.12
1	1005	8185.49
2	1006	3724.49

df_1 = df.merge(order_total)
df_1

	account	name	order	sku	quantity	unit price	ext price	order_total
0	383080	Will LLC	1001	B1-20000	7	33.69	235.83	576.12
1	383080	Will LLC	1001	B1-27722	11	21.12	232.32	576.12
2	383080	Will LLC	1001	B1-86481	3	35.99	107.97	576.12
3	412290	Jerde-Hilpert	1005	S1-06532	48	55.82	2679.36	8185.49
4	412290	Jerde-Hilpert	1005	S1-82801	21	13.62	286.02	8185.49
5	412290	Jerde-Hilpert	1005	S1-06532	9	92.55	832.95	8185.49
6	412290	Jerde-Hilpert	1005	S1-47412	44	78.91	3472.04	8185.49
7	412290	Jerde-Hilpert	1005	S1-27722	36	25.42	915.12	8185.49
8	218895	Kulans Inc	1006	S1-27722	32	95.66	3061.12	3724.49
9	218895	Kulans Inc	1006	B1-33087	23	22.55	518.65	3724.49
10	218895	Kulans Inc	1006	B1-33364	3	72.30	216.90	3724.49
11	218895	Kulans Inc	1006	B1-20000	-1	72.18	-72.18	3724.49

df_1['percent_of _order'] = df_1['ext price']/df_1['order_total']
df_1

	account	name	order	sku	quantity	unit price	ext price	order_total	percent_of _order
0	383080	Will LLC	1001	B1-20000	7	33.69	235.83	576.12	0.409342
1	383080	Will LLC	1001	B1-27722	11	21.12	232.32	576.12	0.403249
2	383080	Will LLC	1001	B1-86481	3	35.99	107.97	576.12	0.187409
3	412290	Jerde-Hilpert	1005	S1-06532	48	55.82	2679.36	8185.49	0.327330
4	412290	Jerde-Hilpert	1005	S1-82801	21	13.62	286.02	8185.49	0.034942
5	412290	Jerde-Hilpert	1005	S1-06532	9	92.55	832.95	8185.49	0.101759
6	412290	Jerde-Hilpert	1005	S1-47412	44	78.91	3472.04	8185.49	0.424170
7	412290	Jerde-Hilpert	1005	S1-27722	36	25.42	915.12	8185.49	0.111798
8	218895	Kulans Inc	1006	S1-27722	32	95.66	3061.12	3724.49	0.821890
9	218895	Kulans Inc	1006	B1-33087	23	22.55	518.65	3724.49	0.139254
10	218895	Kulans Inc	1006	B1-33364	3	72.30	216.90	3724.49	0.058236
11	218895	Kulans Inc	1006	B1-20000	-1	72.18	-72.18	3724.49	-0.019380

第二种方法实现步骤（transform）

df.groupby(by='order')['ext price'].transform(sum)

0      576.12
1      576.12
2      576.12
3     8185.49
4     8185.49
5     8185.49
6     8185.49
7     8185.49
8     3724.49
9     3724.49
10    3724.49
11    3724.49
Name: ext price, dtype: float64

不再是只显示3个订单的对应项，而是保持了与原始数据集相同数量的项目，这样就很好继续了。这就是transform的独特之处。

df['order_total'] = df.groupby(by='order')['ext price'].transform(sum)
df['percent_of_order'] = df['ext price']/df['order_total']
df

	account	name	order	sku	quantity	unit price	ext price	order_total	percent_of_order
0	383080	Will LLC	1001	B1-20000	7	33.69	235.83	576.12	0.409342
1	383080	Will LLC	1001	B1-27722	11	21.12	232.32	576.12	0.403249
2	383080	Will LLC	1001	B1-86481	3	35.99	107.97	576.12	0.187409
3	412290	Jerde-Hilpert	1005	S1-06532	48	55.82	2679.36	8185.49	0.327330
4	412290	Jerde-Hilpert	1005	S1-82801	21	13.62	286.02	8185.49	0.034942
5	412290	Jerde-Hilpert	1005	S1-06532	9	92.55	832.95	8185.49	0.101759
6	412290	Jerde-Hilpert	1005	S1-47412	44	78.91	3472.04	8185.49	0.424170
7	412290	Jerde-Hilpert	1005	S1-27722	36	25.42	915.12	8185.49	0.111798
8	218895	Kulans Inc	1006	S1-27722	32	95.66	3061.12	3724.49	0.821890
9	218895	Kulans Inc	1006	B1-33087	23	22.55	518.65	3724.49	0.139254
10	218895	Kulans Inc	1006	B1-33364	3	72.30	216.90	3724.49	0.058236
11	218895	Kulans Inc	1006	B1-20000	-1	72.18	-72.18	3724.49	-0.019380

甚至可以一步解决：

df['percent_of_order'] = df['ext price']/df.groupby(by='order')['ext price'].transform(sum)
df

	account	name	order	sku	quantity	unit price	ext price	order_total	percent_of_order
0	383080	Will LLC	1001	B1-20000	7	33.69	235.83	576.12	0.409342
1	383080	Will LLC	1001	B1-27722	11	21.12	232.32	576.12	0.403249
2	383080	Will LLC	1001	B1-86481	3	35.99	107.97	576.12	0.187409
3	412290	Jerde-Hilpert	1005	S1-06532	48	55.82	2679.36	8185.49	0.327330
4	412290	Jerde-Hilpert	1005	S1-82801	21	13.62	286.02	8185.49	0.034942
5	412290	Jerde-Hilpert	1005	S1-06532	9	92.55	832.95	8185.49	0.101759
6	412290	Jerde-Hilpert	1005	S1-47412	44	78.91	3472.04	8185.49	0.424170
7	412290	Jerde-Hilpert	1005	S1-27722	36	25.42	915.12	8185.49	0.111798
8	218895	Kulans Inc	1006	S1-27722	32	95.66	3061.12	3724.49	0.821890
9	218895	Kulans Inc	1006	B1-33087	23	22.55	518.65	3724.49	0.139254
10	218895	Kulans Inc	1006	B1-33364	3	72.30	216.90	3724.49	0.058236
11	218895	Kulans Inc	1006	B1-20000	-1	72.18	-72.18	3724.49	-0.019380