SparkSQL 实战一

1. 数据准备

1.1 数据说明

💡 数据集见文末!

数据集为活品交易数据集,每个订单可能包含多个货品,每个订单可以产生多次交易,不同的货品有不同的单价,表结构如下图:

image-20211224092350123

1.2 案例类准备

case class tbDate (dateid:String, years:Int, theyear:Int, month:Int, day:Int, weekday:Int, week:Int, quarter:Int, period:Int, halfmonth:Int) extends Serializable
case class tbStockDetail (ordernumber:String, rownum:Int, itemid:String, number:Int, price:Double, amount:Double) extends Serializable
case class tbStock (ordernumber:String,locationid:String,dateid:String) extends Serializable

1.3 加载数据

val tbDateRdd = sc.textFile("PracticalExercisesData/tbDate.txt")
val tbDateDs = tbDateRdd.map(_.split(","))
	.map(attr => tbDate(attr(0), attr(1).trim().toInt, attr(2).trim().toInt, attr(3).trim().toInt, attr(4).trim().toInt, attr(5).trim().toInt, attr(6).trim().toInt, attr(7).trim().toInt, attr(8).trim().toInt, attr(9).trim().toInt)).toDS()

image-20211224100041819

val tbStockRdd = sc.textFile("PracticalExercisesData/tbStock.txt")
val tbStockDs = tbStockRdd.map(_.split(","))
	.map(attr => tbStock(attr(0), attr(1), attr(2))).toDS()

image-20211224100113552

val tbStockDetailRdd = sc.textFile("PracticalExercisesData/tbStockDetail.txt")
val tbStockDetailDs = tbStockDetailRdd.map(_.split(","))
	.map(attr => tbStockDetail(attr(0), attr(1).trim().toInt, attr(2), attr(3).trim().toInt, attr(4).trim().toDouble, attr(5).trim().toDouble)).toDS

image-20211224100147075

1.4 创建视图

tbDateDs.createOrReplaceTempView("tbDate")
tbStockDs.createOrReplaceTempView("tbStock")
tbStockDetailDs.createOrReplaceTempView("tbStockDetail")

2. 业务分析

2.1 计算所有订单中每年的销售单数、销售总额

spark.sql(
    """
    | select
    |   t3.theyear,
    |   count(distinct t1.ordernumber),
    |   sum(t2.amount)
    | from tbStock t1
    |   join tbStockDetail t2 on t1.ordernumber = t2.ordernumber
    |   join tbDate t3 on t1.dateid = t3.dateid
    | group by t3.theyear
    | order by t3.theyear
    |""".stripMargin)

image-20211224101905256

2.2 计算所有订单每年最大金额订单的销售额

1️⃣ 统计每年,每个订单一共有多少销售额

spark.sql(
    """
    | select
    |   t1.dateid,
    |   t1.ordernumber,
    |   sum(t2.amount) SumOfAmount
    | from tbStock t1
    |   join tbStockDetail t2 on t1.ordernumber = t2.ordernumber
    | group by t1.dateid, t1.ordernumber
    |""".stripMargin).createOrReplaceTempView("t3")

image-20211224103137072

2️⃣ 以上一步查询结果为基础表,和表 tbDate 使用 dateid join,求出每年最大金额订单的销售额

spark.sql(
    """
    | select
    |   t4.theyear,
    |   max(t3.SumOfAmount) MaxOfAmount
    | from t3
    |   join tbDate t4 on t3.dateid = t4.dateid
    | group by theyear
    | order by theyear desc
    |""".stripMargin)

image-20211224104555295

2.3 计算所有订单中每年最畅销货品

1️⃣ 求出每年每个货品的销售额

spark.sql(
    """
    | select
    |   t3.theyear,
    |   t2.itemid,
    |   sum(t2.amount) SumOfAmount
    | from tbStock t1
    |   join tbStockDetail t2 on t1.ordernumber = t2.ordernumber
    |   join tbDate t3 on t1.dateid = t3.dateid
    | group by t3.theyear, t2.itemid
    |""".stripMargin).createOrReplaceTempView("t4")

image-20211224105521387

2️⃣ 在第一步的基础上,统计每年单个货品中的最大金额

spark.sql(
    """
    | select
    |   t4.theyear,
    |   max(t4.SumOfAmount) MaxOfAmount
    | from t4
    | group by t4.theyear
    |""".stripMargin).createOrReplaceTempView("t5")

image-20211224105543867

3️⃣ 用最大销售额和统计好的每个货品的销售额join,以及用年join,集合得到最畅销货品那一行信息

spark.sql(
    """
    | select
    |   t4.theyear,
    |   t4.itemid,
    |   t5.MaxOfAmount
    | from
    |   t4 join t5 on t4.theyear = t5.theyear
    |   and t4.SumOfAmount = t5.MaxOfAmount
    | order by t4.theyear
    |""".stripMargin)

image-20211224110354117

🌈 数据集:

链接:https://pan.baidu.com/s/1vVituBq8ZGiCVBDCYu6sgg
提取码:czck

 


❤️ END ❤️
  • 0
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

JOEL-T99

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值