探索在数据开发中利用chatgpt提高代码质量---第一篇_order by ${field} ${order} chatgpt-CSDN博客

本文链接：https://blog.csdn.net/csdn_wxl_use/article/details/131559388

文章比较了两种SQL查询方法，用于根据商品销售金额分组并计算每组的唯一浏览用户数。ChatGPT生成的代码虽然运行时间较长（107.6s），但通过限制top_goods的查询范围和减少连接数据量，提高了效率。然而，原始代码可能需要调整以确保用户在同组内只被计算一次。

摘要由CSDN通过智能技术生成

题目一：

order订单表，字段为：goods_id， amount ；

pv 浏览表，字段为：goods_id，uid；

goods按照总销售金额排序，分成top10,top10~top20,其他三组

求每组商品的浏览用户数（同组内同一用户只能算一次）

（1）数据准备：

-- 创建订单表 (order_table)
 CREATE TABLE bdms.order_table ( goods_id INT, amount DECIMAL(10, 2) ); 
-- 插入示例数据 
INSERT INTO bdms.order_table (goods_id, amount) VALUES (1, 10.5), (2, 20.0), (1, 15.3), (3, 8.2), (2, 18.7);
-- 创建浏览表(pv_table)
CREATE TABLE bdms.pv_table (
  goods_id INT,
  uid INT
);
-- 插入示例数据
INSERT INTO bdms.pv_table (goods_id, uid)VALUES
  (1, 1001),
  (2, 1002),
  (1, 1003),
  (3, 1001),
  (2, 1004),
  (3, 1005),
  (1, 1002);

（2）自己写法：

with t1 as(
select goods_id
	  ,case when nn<= 10 then 'top10'
            when 10<nn<= 20 then 'top10~top20'
       else 'other' 
	   end as goods_group
from
(
    select goods_id
    ,row_number() over(partition by goods_id order by sale_sum desc) as nn
    from
    (
        select goods_id,sum(amount) as sale_sum  --  每种商品总金额
        from bdms.order_table
        group by goods_id
    ) aa
) bb
)
select b.goods_group,count(distinct a.uid) as num
from bdms.pv_table a 
left join t1 b
on a.goods_id = b.goods_id
group by b.goods_id;

（3）Chatgpt生成：

WITH top_goods AS (
  SELECT goods_id, total_sales,rn
  FROM (
    SELECT goods_id, SUM(amount) AS total_sales,
           ROW_NUMBER() OVER (ORDER BY SUM(amount) DESC) AS rn
    FROM bdms.order_table
    GROUP BY goods_id
  ) t
  WHERE rn <= 20
),
grouped_goods as(
SELECT tg.goods_id,
        CASE WHEN tg.rn <= 10 THEN 'top10'
			WHEN 10<tg.rn <= 20 THEN 'top10~top20'
			ELSE 'other'
	   END AS sales_group
  FROM top_goods tg
)
SELECT g.sales_group, COUNT(DISTINCT pv.uid) AS unique_users
FROM grouped_goods g
LEFT JOIN (
			SELECT DISTINCT goods_id, uid
			FROM bdms.pv_table pv 
			) pv
ON g.goods_id = pv.goods_id
GROUP BY g.sales_group;

（4）分析结果：

a.对比运行时长：

自己写的运行时长：88.2s

Chatgpt运行时长：107.6s

b.结果阐述：

缺点：根据运行时长对比，Chatgpt生成的代码运行时间相对较长，而且根据本题来说要求同组内同一用户只能算一次，利用Chatgpt优化生成的代码时，并不能每次都生成满足符合题目要求的sql，还需要自己再修改一下

优点：在top_goods查询中，通过限定条件rn <= 20，使临时表的结果数据量大大减少，当数据量很大的时候，为后面查询该表时提高了查询效率。在外部查询中，将浏览表（pv_table）的子查询改为获取 DISTINCT 的商品ID和用户ID，以减少左连接时的数据量。虽然运行时长相对较长，但是对于代码的质量和严谨上是过关的。