SQL数据分析流程:
分析需求:
ESC电商公司想做一次区域销售分析,希望数据分析师调取各年度、各季度的销售冠军的省份及其销售数据。销售数据应包括:总销售额、总完成订单数目、平均每个订单销售额。并且回答以下两个业务问题:
- 2018年第二季度的销售冠军是哪个省份?(文字)
- 山西省2018年第一季度总销售额同比2017年的第一季度增长多少?(文字 + 数学公式)
- Submit full code
得到需求结果,立马映射出需求数据结构框架:
-
表头:销售冠军省份(首列) + 年份 + 季度 + 总销售额(按照各年度、季度聚类) + 总订单数目 + 平均每个订单销售额
分析目标细化: -
最终返回结果:销售冠军省份及销售数据
-
聚合方式: 各年、各季度、总销售额最大省份
数据定位:
- 提供每个订单下单日期、销售额、订单id 以及可衍生变量——订单平均销售额的
order_info
- 提供订单与订单之间联系的顾客信息表customer_info
详细解答步骤:
①得到季度信息、年度信息数据 (YEAR、MONTH)
②通过表间连接获得省份信息 (INNER JOIN)
③为各省份每年每季度计算销售数据(聚合)
总销售额 (SUM)
总完成订单数目 (COUNT)
平均每个订单销售额度 (AVG)
④获得各年度、各季度销售冠军的省份及其销售数据
计算各年度、各季度最高销售额(GROUP BY 、MAX)
通过表间连接获得各年度、各季度最高销售额对应的省份及其他销售数据 (INNER JOIN)
—— ———— ———— ————— ——— ———— ———— ———— ———— ———— ——— ——
探索阶段:(不符合业务要求)
# 生成c表
SELECT
a.`province`,
#b.`create_time`,
YEAR(create_time) AS `year`,
CASE
WHEN MONTH(create_time) >=10 THEN 4
WHEN MONTH(create_time) >=7 THEN 3
WHEN MONTH(create_time) >=4 THEN 2
ELSE 1
END AS `quarter`,
SUM(payment_amount) AS `total_sales`,
COUNT(order_id) AS `order_count`,
AVG(payment_amount) AS `avg_payment`
FROM customer_info AS a
INNER JOIN `order_info` AS b
ON a.customer_id = b.customer_id
GROUP BY `province`, `year`,`quarter`
# 进一步由c表生成d表
SELECT
`year`,
`quarter`,
MAX(`total_sales`) AS `total_sales` #我要适当修改一些字段命名
FROM
(
SELECT
a.`province`,
#b.`create_time`,
YEAR(create_time) AS `year`,
CASE
WHEN MONTH(create_time) >=10 THEN 4
WHEN MONTH(create_time) >=7 THEN 3
WHEN MONTH(create_time) >=4 THEN 2
ELSE 1
END AS `quarter`,
SUM(payment_amount) AS `total_sales`,
COUNT(order_id) AS `order_count`,
AVG(payment_amount) AS `avg_payment`
FROM customer_info AS a
INNER JOIN `order_info` AS b
ON a.customer_id = b.customer_id
GROUP BY `province`, `year`,`quarter`
) AS c
GROUP BY `year`,`quarter`
# 生成终表
SELECT
c.`province`,
d.`year`,
d.`quarter`,
d.`total_sales`,
c.`order_count`,
c.`avg_payment`
FROM
(
SELECT
`year`,
`quarter`,
MAX(`total_sales`) AS `total_sales` # 适当修改一些字段命名
FROM
(
SELECT
a.`province`,
#b.`create_time`,
YEAR(create_time) AS `year`,
CASE
WHEN MONTH(create_time) >=10 THEN 4
WHEN MONTH(create_time) >=7 THEN 3
WHEN MONTH(create_time) >=4 THEN 2
ELSE 1
END AS `quarter`,
SUM(payment_amount) AS `total_sales`,
COUNT(order_id) AS `order_count`,
AVG(payment_amount) AS `avg_payment`
FROM customer_info AS a
INNER JOIN `order_info` AS b
ON a.customer_id = b.customer_id
GROUP BY `province`, `year`,`quarter`
) AS c
GROUP BY `year`,`quarter`
) AS d
INNER JOIN
(
SELECT
a.`province`,
#b.`create_time`,
YEAR(create_time) AS `year`,
CASE
WHEN MONTH(create_time) >=10 THEN 4
WHEN MONTH(create_time) >=7 THEN 3
WHEN MONTH(create_time) >=4 THEN 2
ELSE 1
END AS `quarter`,
SUM(payment_amount) AS `total_sales`,
COUNT(order_id) AS `order_count`,
AVG(payment_amount) AS `avg_payment`
FROM customer_info AS a
INNER JOIN `order_info` AS b
ON a.customer_id = b.customer_id
GROUP BY `province`, `year`,`quarter`
) AS c
ON c.`year` = d.`year` AND c.`quarter` = d.`quarter`AND c.`total_sales` = d.`total_sales`
/*You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '= d.`quarter`, c.`total_sales` = d.`total_sales`'
ON c.`year` = d.`year` , c.`quarter` = d.`quarter`, c.`total_sales` = d.`total_sales`*/
1.最终结果输出:
2.2018第二季度的销售冠军: 吉林省
3:山西省2018年第一季度总销售额同比2017年第一季度增长:(119001.49 - 46850.5) / 46850.5 = 154%
____________________________________________________________________________________________________________-
代码了逻辑思路:
- 计算产品销售记录的年份、季度 - “订单信息表/ order_info” (YEAR / CASE WHE )
- 计算各年度、各季度的销售额 - “订单信息表/ order_info” (GROUP BY / SUM / COUNT)
- 连接顾客信息表,生成省份、年度、季度、总销售额、订单数量、订单平均销售额字段的临时表c表
- 进一步通过MAX函数,并以年度、季度进行分组,生成年度、季度、最大总销售额为字段的临时表d表
- 继而,通过inner join 语句连接c表、d表,并连接省份、订单总数、订单平均销售额等信息获得最终的各年度、季度的冠军数据表;
技术回顾:
技术要点:多字段聚合、多字段作为聚合键进行表的拼接(AND);值得回顾;
一步到位比较难,还时要多些写,多思考;
后续更新:
sjk专项练习 + 项目实战 + xiaozao