使用pandas groupby的一个案例

57 篇文章 1 订阅
11 篇文章 1 订阅

mongodb中有下面一组数据,业务场景就不介绍了,看数据应该很好理解

[
    {
        "_id":"606fdd0805f5710008a82223",
        "brand_id":"1",
        "brand_name":"APPLE",
        "series_id":"11",
        "series_name":"iPhone",
        "product_name":"iPhone12",
        "cover":[
            {
                "oss_name":"0e4ebf0ac613a757fff916a90b2224c9.jpg"
            }
        ]
    },
    {
        "_id":"606d7f741492340008497daf",
        "brand_id":"1",
        "brand_name":"APPLE",
        "series_id":"11",
        "series_name":"MacBook",
        "product_name":"MacBook Pro",
        "cover":[
            {
                "oss_name":"56f42ec6aa7b87010c3b2b09310d37d6.png"
            }
        ]
    },
    {
        "_id":"6080d04d2cc7f20008f22670",
        "brand_id":"2",
        "brand_name":"SONY",
        "series_id":"21",
        "series_name":"PlayStation",
        "product_name":"PS5",
        "cover":[
            {
                "oss_name":"0e4ebf0ac613a757fff916a90b2224c9.jpg"
            }
        ]
    }
]

我们希望对品牌、系列进行分组聚合变为如下数据

[
    {
        "brand_id":"1",
        "brand_name":"APPLE",
        "series_list":[
            {
                "series_name":"iPhone",
                "series_id":"11",
                "products_list":[
                    {
                        "_id":"606fdd0805f5710008a82223",
                        "cover":[
                            {
                                "oss_name":"0e4ebf0ac613a757fff916a90b2224c9.jpg"
                            }
                        ],
                        "product_name":"iPhone12"
                    }
                ]
            },
            {
                "series_name":"MacBook",
                "series_id":"12",
                "products_list":[
                    {
                        "_id":"606fdd0805f5710008a82223",
                        "cover":[
                            {
                                "oss_name":"0e4ebf0ac613a757fff916a90b2224c9.jpg"
                            }
                        ],
                        "product_name":"MacBook Pro"
                    }
                ]
            }
        ]
    },
    {
        "brand_id":"2",
        "brand_name":"SONY",
        "series_list":[
            {
                "series_name":"PlayStation",
                "series_id":"21",
                "products_list":[
                    {
                        "_id":"6080d04d2cc7f20008f22670",
                        "cover":[
                            {
                                "oss_name":"0e4ebf0ac613a757fff916a90b2224c9.jpg"
                            }
                        ],
                        "product_name":"PS5"
                    }
                ]
            }
        ]
    }
]

相同品牌的系列分组到同一品牌下的series_list集合中,相同品牌和系列的产品分组到同一品牌下的同一series_list中的products_list集合中

使用mongodb的aggregate框架理论上能实现,但应该比较复杂,如果有高手了解还望不吝赐教。
由于有一定的pandas基础,我尝试使用pandas处理上面的数据

预处理

在分组聚合前,先对数据进行预处理,改一下数据结构,变为如下结构

[
    {
        "brand_id":"1",
        "brand_name":"APPLE",
        "series_list":{
            "series_name":"iPhone",
            "series_id":"11",
            "products_list":{
                "_id":"606fdd0805f5710008a82223",
                "cover":[
                    {
                        "oss_name":"0e4ebf0ac613a757fff916a90b2224c9.jpg"
                    }
                ],
                "product_name":"iPhone12.0"
            }
        }
    },
    {
        "brand_id":"1",
        "brand_name":"APPLE",
        "series_list":{
            "series_name":"MacBook",
            "series_id":"11",
            "products_list":{
                "_id":"606d7f741492340008497daf",
                "cover":[
                    {
                        "oss_name":"56f42ec6aa7b87010c3b2b09310d37d6.png"
                    }
                ],
                "product_name":"MacBook Pro"
            }
        }
    },
    {
        "brand_id":"2",
        "brand_name":"SONY",
        "series_list":{
            "series_name":"PlayStation",
            "series_id":"21",
            "products_list":{
                "_id":"6080d04d2cc7f20008f22670",
                "cover":[
                    {
                        "oss_name":"0e4ebf0ac613a757fff916a90b2224c9.jpg"
                    }
                ],
                "product_name":"PS5"
            }
        }
    }
]
for product in prodocts_list:
    product['series_list'] = {}
    product['series_list']['series_name'] = product.pop('series_name')
    product['series_list']['series_id'] = product.pop('series_id')
    product['series_list']['products_list'] = {}
    product['series_list']['products_list']['_id'] = product.pop('_id')
    product['series_list']['products_list']['cover'] = product.pop('cover')
    product['series_list']['products_list']['product_name'] = product.pop('product_name')

分组聚合

进行两次groupby操作,第一次groupby将所有相同品牌的产品分组到同一series_list中

>>> brands = pd.DataFrame(prodocts_list).groupby(['brand_id', 'brand_name']).agg(list).reset_index().to_dict('records')
[
    {
        "brand_id":"1",
        "brand_name":"APPLE",
        "series_list":[
            {
                "series_name":"iPhone",
                "series_id":"11",
                "products_list":{
                    "_id":"606fdd0805f5710008a82223",
                    "cover":[
                        {
                            "oss_name":"0e4ebf0ac613a757fff916a90b2224c9.jpg"
                        }
                    ],
                    "product_name":"iPhone12"
                }
            },
            {
                "series_name":"MacBook",
                "series_id":"11",
                "products_list":{
                    "_id":"606d7f741492340008497daf",
                    "cover":[
                        {
                            "oss_name":"56f42ec6aa7b87010c3b2b09310d37d6.png"
                        }
                    ],
                    "product_name":"MacBook Pro"
                }
            }
        ]
    },
    {
        "brand_id":"2",
        "brand_name":"SONY",
        "series_list":[
            {
                "series_name":"PlayStation",
                "series_id":"21",
                "products_list":{
                    "_id":"6080d04d2cc7f20008f22670",
                    "cover":[
                        {
                            "oss_name":"0e4ebf0ac613a757fff916a90b2224c9.jpg"
                        }
                    ],
                    "product_name":"PS5"
                }
            }
        ]
    }
]

第二次groupby,则对每个品牌中的series_list进行groupby,将相同系列的产品分组到同一products_list中

>>> for brand in brands:
    brand['series_list'] = pd.DataFrame(brand['series_list']).groupby(['series_name', 'series_id']).agg(list).reset_index().to_dict('records')
[
    {
        "brand_id":"1",
        "brand_name":"APPLE",
        "series_list":[
            {
                "series_name":"MacBook",
                "series_id":"11",
                "products_list":[
                    {
                        "_id":"606d7f741492340008497daf",
                        "cover":[
                            {
                                "oss_name":"56f42ec6aa7b87010c3b2b09310d37d6.png"
                            }
                        ],
                        "product_name":"MacBook Pro"
                    }
                ]
            },
            {
                "series_name":"iPhone",
                "series_id":"11",
                "products_list":[
                    {
                        "_id":"606fdd0805f5710008a82223",
                        "cover":[
                            {
                                "oss_name":"0e4ebf0ac613a757fff916a90b2224c9.jpg"
                            }
                        ],
                        "product_name":"iPhone12"
                    }
                ]
            }
        ]
    },
    {
        "brand_id":"2",
        "brand_name":"SONY",
        "series_list":[
            {
                "series_name":"PlayStation",
                "series_id":"21",
                "products_list":[
                    {
                        "_id":"6080d04d2cc7f20008f22670",
                        "cover":[
                            {
                                "oss_name":"0e4ebf0ac613a757fff916a90b2224c9.jpg"
                            }
                        ],
                        "product_name":"PS5"
                    }
                ]
            }
        ]
    }
]
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
使用pandasgroupby()方法时,可以结合apply()方法对分组后的数据进行操作。引用中的代码演示了如何对变量df的"品种"列进行聚合,并使用apply()方法打印每个分组后的数据。通过这种方式,我们可以对每个分组进行自定义的操作或计算。引用中的代码演示了如何在groupby()时使用as_index=False参数,以避免结果中产生RangeIndex作为索引。通过这种方式,我们可以获得一个带有自定义列名的结果。引用中的文章则提供了更多关于使用groupby().apply()的案例和技巧。总的来说,pandasgroupby()与apply()的结合可以帮助我们实现更灵活和高级的数据分析和处理。<span class="em">1</span><span class="em">2</span><span class="em">3</span> #### 引用[.reference_title] - *1* *3* [python数据分析高阶应用技巧-pandas库聚合案例groupby().apply()写法强化】](https://blog.csdn.net/weixin_48964486/article/details/127588129)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v93^chatsearchT3_1"}}] [.reference_item style="max-width: 50%"] - *2* [Pandas.groupby.apply 的一些细节](https://blog.csdn.net/a597688570/article/details/127992318)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v93^chatsearchT3_1"}}] [.reference_item style="max-width: 50%"] [ .reference_list ]

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值