Problem: compute monthly sales and cumulative total sales
1. Data description
(1) Data format
a,01,150
a,01,200
b,01,1000
b,01,800
c,01,250
c,01,220
b,01,6000
a,02,2000
a,02,3000
b,02,1000
b,02,1500
c,02,350
c,02,280
a,03,350
a,03,250
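(2) Field meanings: shop, month, amount
2. Requirement: for each shop, compute its sales for each month and its cumulative total sales up to and including that month
Scala implementation
Reading a local file into Scala
If you want to solve this in IDEA with Scala, you first have to read the file contents in and hold them in a variable, like this: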
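import scala.io.Source
// Standard idiom. A relative path is resolved against the working directory (in IDEA normally the project root), so create the jd folder there
val lines=Source.fromFile("jd/store.txt","utf-8").getLines().toArray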
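The one-liner above never closes the file. Here is a minimal sketch that reads the same kind of file and releases the handle afterwards. It assumes a UTF-8 file, and the object name `ReadStore` is hypothetical:

```scala
import scala.io.Source

object ReadStore {
  // Reads every line of a UTF-8 text file and always closes the file handle.
  // A relative path is resolved against the JVM working directory.
  def readLines(path: String): Array[String] = {
    val source = Source.fromFile(path, "utf-8")
    try source.getLines().toArray // toArray consumes the lazy iterator before close()
    finally source.close()
  }
}
```

Usage would be `val lines = ReadStore.readLines("jd/store.txt")`. On Scala 2.13+, `scala.util.Using` is another way to get the same guarantee.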
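To see the output of every step more easily, I use the Scala REPL in a command-prompt (DOS) window instead, which is easier to follow.
1. Create a variable to hold the data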
scala> var lines=Array("a,01,150","a,01,200","b,01,1000","b,01,800","c,01,250"
| ,"c,01,220","b,01,6000","a,02,2000","a,02,3000","b,02,1000"
| ,"b,02,1500","c,02,350","c,02,280","a,03,350","a,03,250")
lines: Array[String] = Array(a,01,150, a,01,200, b,01,1000,
b,01,800, c,01,250, c,01,220, b,01,6000, a,02,2000, a,02,3000,
b,02,1000, b,02,1500, c,02,350, c,02,280, a,03,350, a,03,250)
scala> lines.foreach(println)
a,01,150
a,01,200
b,01,1000
b,01,800
c,01,250
c,01,220
b,01,6000
a,02,2000
a,02,3000
b,02,1000
b,02,1500
c,02,350
c,02,280
a,03,350
a,03,250
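Monthly sales for each shop
Approach: group by the two fields shop and month, and add up the corresponding amounts. If you treat shop and month as a single unit, isn't this just like wordCount?
Let's go.
Here I combine the two into a tuple.
Split and convert
Split each line on "," into an Array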
scala> lines.map(x=>x.split(","))
res1: Array[Array[String]] = Array(Array(a, 01, 150)
, Array(a, 01, 200), Array(b, 01, 1000), Array(b, 01, 800)
,Array(c, 01, 250), Array(c, 01, 220), Array(b, 01, 6000)
,Array(a, 02, 2000), Array(a, 02, 3000), Array(b, 02, 1000)
,Array(b, 02, 1500), Array(c, 02, 350), Array(c, 02, 280)
,Array(a, 03, 350), Array(a, 03, 250))
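Convert each array into a tuple of type ((String, Int), Int). The first element of the tuple is itself a tuple (shop, month), which we treat as one unit and later group by.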
scala> lines.map(x=>x.split(","))
.map(x=>((x(0),x(1).toInt),x(2).toInt))
res4: Array[((String, Int), Int)] =
Array(
((a,1),150), ((a,1),200), ((b,1),1000),
((b,1),800), ((c,1),250), ((c,1),220),
((b,1),6000), ((a,2),2000), ((a,2),3000),
((b,2),1000), ((b,2),1500), ((c,2),350),
((c,2),280), ((a,3),350), ((a,3),250)
)
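These two steps can be merged into one, but the result is ugly (every line gets split three times), so I'd rather not: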
scala> lines.map(x=>((x.split(",")(0),x.split(",")(1).toInt),x.split(",")(2).toInt))
res6: Array[((String, Int), Int)] = Array(((a,1),150),
((a,1),200), ((b,1),1000), ((b,1),800), ((c,1),250),
((c,1),220), ((b,1),6000), ((a,2),2000), ((a,2),3000),
((b,2),1000), ((b,2),1500), ((c,2),350), ((c,2),280),
((a,3),350), ((a,3),250))
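A cleaner way to do the parse in one step is to split each line only once and pattern-match the pieces. This is a sketch. `ParseSales` and `parse` are hypothetical names, and a line without exactly three fields is rejected with an exception:

```scala
object ParseSales {
  // Split once, then destructure the three fields: shop, month, amount.
  // "01".toInt is 1, so the month part of the key is a plain Int.
  def parse(line: String): ((String, Int), Int) =
    line.split(",") match {
      case Array(shop, month, amount) => ((shop, month.trim.toInt), amount.trim.toInt)
      case _ => throw new IllegalArgumentException(s"expected shop,month,amount but got: $line")
    }
}
```

`lines.map(ParseSales.parse)` then gives the same `Array[((String, Int), Int)]` as the two-step version.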
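Group by (shop, month), then aggregate
groupBy produces a Map: each key is a grouping value, and its value is an Array of all the elements that have that key. The grouping works: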
scala> lines.map(x=>x.split(","))
.map(x=>((x(0),x(1).toInt),x(2).toInt))
.groupBy(x=>x._1)
res9: scala.collection.immutable.Map[(String, Int),Array[((String, Int), Int)]] =
Map(
(c,1) -> Array(((c,1),250), ((c,1),220)),
(b,2) -> Array(((b,2),1000), ((b,2),1500)),
(c,2) -> Array(((c,2),350), ((c,2),280)),
(a,3) -> Array(((a,3),350), ((a,3),250)),
(a,1) -> Array(((a,1),150), ((a,1),200)),
(a,2) -> Array(((a,2),2000), ((a,2),3000)),
(b,1) -> Array(((b,1),1000), ((b,1),800), ((b,1),6000))
)
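With the grouping done, aggregate.
When you map over a Map and don't want to change the keys, use Map's own mapValues: it transforms only the values and leaves every key as it was.
An example:
d is a Map.
I only want to add 1 to every value, but map transforms the whole key-value pair, so I have to carry the key along every time, (x=>(x._1,x._2+1)), which is tedious.
With mapValues I operate on the value directly, which is much nicer.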
scala> val d=Map("a"->1,"b"->2)
d: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2)
scala> d.map(x=>(x._1,x._2+1))
res14: scala.collection.immutable.Map[String,Int] = Map(a -> 2, b -> 3)
scala> d.mapValues(x=>x+1)
res15: scala.collection.immutable.Map[String,Int] = Map(a -> 2, b -> 3)
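A version note, hedged: in Scala 2.12 `mapValues` returns a lazy view that re-runs the function on every access. In Scala 2.13 `Map.mapValues` is deprecated in favor of `d.view.mapValues(...).toMap`. If you want a strict result that compiles on both, `transform` or a plain `map` with a pattern works. Here is a sketch (the object name is hypothetical):

```scala
object MapValuesDemo {
  val d = Map("a" -> 1, "b" -> 2)

  // Rebuild each pair, keeping the key unchanged.
  val viaMap: Map[String, Int] = d.map { case (k, v) => (k, v + 1) }

  // transform receives (key, value) and returns only the new value; keys stay as they are.
  val viaTransform: Map[String, Int] = d.transform((_, v) => v + 1)
}
```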
Now aggregate the values:
scala> lines.map(x=>x.split(","))
.map(x=>((x(0),x(1).toInt),x(2).toInt))
.groupBy(x=>x._1)
.mapValues(x=>x.map(x=>x._2).sum)
res20: scala.collection.immutable.Map[(String, Int),Int] =
Map(
(c,2) -> 630,
(b,2) -> 2500,
(a,2) -> 5000,
(a,3) -> 600,
(b,1) -> 7800,
(c,1) -> 470,
(a,1) -> 350
)
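The same monthly totals can also be built in a single pass with `foldLeft`, accumulating into a Map keyed by (shop, month). This sketch is an alternative to groupBy + mapValues, not what the transcript runs. It assumes the same 15 input lines, and `MonthlyTotals` is a hypothetical name:

```scala
object MonthlyTotals {
  val lines = Array("a,01,150", "a,01,200", "b,01,1000", "b,01,800", "c,01,250",
    "c,01,220", "b,01,6000", "a,02,2000", "a,02,3000", "b,02,1000",
    "b,02,1500", "c,02,350", "c,02,280", "a,03,350", "a,03,250")

  // Fold every record into a Map keyed by (shop, month), adding amounts as we go.
  val monthly: Map[(String, Int), Int] =
    lines.map(_.split(","))
      .map(f => ((f(0), f(1).toInt), f(2).toInt))
      .foldLeft(Map.empty[(String, Int), Int]) { case (acc, (key, amount)) =>
        acc.updated(key, acc.getOrElse(key, 0) + amount)
      }
}
```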
The order is a bit jumbled, but the result is there. Let's print it:
scala> lines.map(x=>x.split(","))
.map(x=>((x(0),x(1).toInt),x(2).toInt))
.groupBy(x=>x._1)
.mapValues(x=>x.map(x=>x._2).sum)
.foreach(x=>
println(s"${x._1._1}\t${x._1._2}\t${x._2}\t")
)
c 2 630
b 2 2500
a 2 5000
a 3 600
b 1 7800
c 1 470
a 1 350
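Before regrouping anything, note that the tuple key already has an ordering (shop first, then month). Simply sorting the entries by key therefore gives a readable report. This is a sketch, with the seven totals copied from the result above; `SortedReport` is a hypothetical name:

```scala
object SortedReport {
  // Monthly totals copied from the groupBy/mapValues result above.
  val monthly: Map[(String, Int), Int] = Map(
    ("c", 2) -> 630, ("b", 2) -> 2500, ("a", 2) -> 5000, ("a", 3) -> 600,
    ("b", 1) -> 7800, ("c", 1) -> 470, ("a", 1) -> 350)

  // Tuple keys sort by shop first, then month, via the implicit Ordering for tuples.
  val rows: Seq[String] =
    monthly.toSeq.sortBy(_._1).map { case ((shop, month), total) => s"$shop\t$month\t$total" }

  def main(args: Array[String]): Unit = rows.foreach(println)
}
```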
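That solves the first part, but the output is hard to read. Grouping by shop and sorting by month would look much cleaner. In Hive, grouping and sorting are exactly how you make a result readable, so let's try the same here.
First group by shop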
scala> lines.map(x=>x.split(","))
.map(x=>((x(0),x(1).toInt),x(2).toInt))
.groupBy(x=>x._1)
.mapValues(x=>x.map(x=>x._2).sum)
.groupBy(x=>x._1._1)
res23: scala.collection.immutable.Map[String,scala.collection.immutable.Map[(String, Int),Int]] =
Map(
b -> Map((b,2) -> 2500, (b,1) -> 7800),
a -> Map((a,2) -> 5000, (a,3) -> 600, (a,1) -> 350),
c -> Map((c,2) -> 630, (c,1) -> 470)
)
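Grouping is done; now sort each group by month. Pay special attention here: each group's value is itself a Map, and Map has no sortBy (or any other sorting method), so the Map must be converted to an Array before it can be sorted.
Convert each group's key-value pairs into an Array: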
scala> lines.map(x=>x.split(","))
.map(x=>((x(0),x(1).toInt),x(2).toInt))
.groupBy(x=>x._1).mapValues(x=>x.map(x=>x._2).sum)
.groupBy(x=>x._1._1)
.mapValues(x=>x.toArray)
res31: scala.collection.immutable.Map[String,Array[((String, Int), Int)]] =
Map(
b -> Array(((b,2),2500), ((b,1),7800)),
a -> Array(((a,2),5000), ((a,3),600), ((a,1),350)),
c -> Array(((c,2),630), ((c,1),470))
)
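The structure is getting complicated now: inside each value, the shop name repeats what is already the key. Dropping it improves readability and is easy, and I added the sort in the same step: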
scala> lines.map(x=>x.split(","))
.map(x=>((x(0),x(1).toInt),x(2).toInt))
.groupBy(x=>x._1)
.mapValues(x=>x.map(x=>x._2).sum)
.groupBy(x=>x._1._1)
.mapValues(x => x.toArray.map(x => Array(x._1._2, x._2)).sortBy(x=>x(0)))
res35: scala.collection.immutable.Map[String,Array[Array[Int]]] =
Map(
b -> Array(Array(1, 7800), Array(2, 2500)),
a -> Array(Array(1, 350), Array(2, 5000), Array(3, 600)),
c -> Array(Array(1, 470), Array(2, 630))
)
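The same grouping can keep (month, total) as a typed tuple instead of an `Array[Int]`, which avoids the `x(0)`/`x(1)` indexing later on. This is a sketch using the seven totals from above; `ByShop` is a hypothetical name:

```scala
object ByShop {
  // Monthly totals copied from the groupBy/mapValues result above.
  val monthly: Map[(String, Int), Int] = Map(
    ("c", 2) -> 630, ("b", 2) -> 2500, ("a", 2) -> 5000, ("a", 3) -> 600,
    ("b", 1) -> 7800, ("c", 1) -> 470, ("a", 1) -> 350)

  // Group by shop, drop the repeated shop name, and sort each shop's months.
  val byShop: Map[String, Seq[(Int, Int)]] =
    monthly.toSeq
      .groupBy { case ((shop, _), _) => shop }
      .map { case (shop, entries) =>
        shop -> entries.map { case ((_, month), total) => (month, total) }.sortBy(_._1)
      }
}
```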
Now the data is grouped by shop and sorted by month.
Time to print it:
scala> lines.map(x=>x.split(","))
.map(x=>((x(0),x(1).toInt),x(2).toInt))
.groupBy(x=>x._1)
.mapValues(x=>x.map(x=>x._2).sum)
.groupBy(x=>x._1._1)
.mapValues(x => x.toArray.map(x => Array(x._1._2, x._2)).sortBy(x=>x(0)))
.foreach(x=>{
print(s"${x._1}\t")
x._2.foreach(x=>print(s"(${x(0)}\t${x(1)})\t"))
println()
})
b (1 7800) (2 2500)
a (1 350) (2 5000) (3 600)
c (1 470) (2 630)
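The shops themselves still come out in hash order (b, a, c). The sketch below also sorts by shop and builds each row with `mkString`, giving a similar tab-separated layout. The (month, total) tuples are copied from the result above, and `PrintByShop` is a hypothetical name:

```scala
object PrintByShop {
  // (month, total) pairs per shop, copied from the grouped and sorted result above.
  val byShop: Map[String, Seq[(Int, Int)]] = Map(
    "b" -> Seq((1, 7800), (2, 2500)),
    "a" -> Seq((1, 350), (2, 5000), (3, 600)),
    "c" -> Seq((1, 470), (2, 630)))

  // Sort the shops as well, since a Map does not promise any iteration order.
  val rows: Seq[String] = byShop.toSeq.sortBy(_._1).map { case (shop, months) =>
    shop + "\t" + months.map { case (m, t) => s"($m\t$t)" }.mkString("\t")
  }

  def main(args: Array[String]): Unit = rows.foreach(println)
}
```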
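Cumulative total sales for each shop up to each month
Approach: this clearly builds on the result above, which is why I worked so hard on the grouping and sorting.
Here's that result again: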
scala> lines.map(x=>x.split(","))
.map(x=>((x(0),x(1).toInt),x(2).toInt))
.groupBy(x=>x._1)
.mapValues(x=>x.map(x=>x._2).sum)
.groupBy(x=>x._1._1)
.mapValues(x => x.toArray.map(x => Array(x._1._2, x._2)).sortBy(x=>x(0)))
res35: scala.collection.immutable.Map[String,Array[Array[Int]]] =
Map(
b -> Array(Array(1, 7800), Array(2, 2500)),
a -> Array(Array(1, 350), Array(2, 5000), Array(3, 600)),
c -> Array(Array(1, 470), Array(2, 630))
)
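The keys stay as they are; I only need to accumulate each shop's monthly sales. There are two ways to accumulate within a single array and return every intermediate result. For example: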
scala> c
res46: Array[Int] = Array(1, 2, 3, 4, 5)
// this is the one to pick, of course
scala> c.scan(0)(_+_).tail
res50: Array[Int] = Array(1, 3, 6, 10, 15)
scala> c.inits.toArray.map(x=>x.sum).reverse.tail
res51: Array[Int] = Array(1, 3, 6, 10, 15)
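Here are the two approaches side by side, as a sketch. It uses `scanLeft`, which gives the same result as `scan` for `+`. `inits` sums every prefix separately, so it is O(n²), while the scan is O(n). `RunningSum` is a hypothetical name:

```scala
object RunningSum {
  val c = Array(1, 2, 3, 4, 5)

  // scanLeft emits the seed followed by each partial result; tail drops the seed 0.
  val viaScan: Array[Int] = c.scanLeft(0)(_ + _).tail

  // inits yields prefixes from the longest down to the empty one; summing each prefix,
  // reversing, and dropping the empty prefix's 0 gives the same running sums.
  val viaInits: Array[Int] = c.inits.map(_.sum).toArray.reverse.tail
}
```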
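Now there's another problem: the data isn't in a single array, and after accumulating I still have to match each result to its month. So:
1. Split into two arrays: map
2. Accumulate the sales array: scan together with tail
3. Pair up the elements of the two arrays again: zip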
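Here are the three steps for a single shop, sketched with (month, total) tuples instead of `Array(month, total)`. That representation is an assumption; with tuples, `unzip` does step 1 in one call. `ZipRunning` is a hypothetical name:

```scala
object ZipRunning {
  // Shop a's (month, monthly total) pairs, already sorted by month.
  val shopA: Array[(Int, Int)] = Array((1, 350), (2, 5000), (3, 600))

  // Step 1: split into a months array and a totals array.
  val (months, totals) = shopA.unzip
  // Steps 2 and 3: running sum over the totals, then pair each sum with its month.
  val cumulative: Array[(Int, Int)] = months.zip(totals.scanLeft(0)(_ + _).tail)
}
```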
First store the result above in a variable, and then the fun begins.
scala> val result=
lines.map(x => x.split(","))
.map(x => ((x(0), x(1).toInt), x(2).toInt))
.groupBy(_._1)
.mapValues(x => x.map(x => x._2).sum)
.groupBy(_._1._1)
.mapValues(x => x.toArray.map(x => Array(x._1._2, x._2)).sortBy(x=>x(0)))
result: scala.collection.immutable.Map[String,Array[Array[Int]]] =
Map(
b -> Array(Array(1, 7800), Array(2, 2500)),
a -> Array(Array(1, 350), Array(2, 5000), Array(3, 600)),
c -> Array(Array(1, 470), Array(2, 630))
)
Start accumulating
Done in one step:
scala> result.mapValues(x=>x.map(_(0)).zip(x.map(_(1)).scan(0)(_+_).tail))
res66: scala.collection.immutable.Map[String,Array[(Int, Int)]] =
Map(
b -> Array((1,7800), (2,10300)),
a -> Array((1,350), (2,5350), (3,5950)),
c -> Array((1,470), (2,1100))
)
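For reference, here is the whole pipeline from raw lines to cumulative totals as one function, under the same input format. It is a sketch that swaps `mapValues` for `map` with a pattern (so it stays strict on 2.12 and avoids the deprecation on 2.13) and uses tuples instead of `Array[Int]`. `cumulativeByShop` is a hypothetical name:

```scala
object CumulativeSales {
  def cumulativeByShop(lines: Seq[String]): Map[String, Seq[(Int, Int)]] = {
    // Monthly totals keyed by (shop, month).
    val monthly: Map[(String, Int), Int] =
      lines.map(_.split(","))
        .map(f => ((f(0), f(1).toInt), f(2).toInt))
        .groupBy(_._1)
        .map { case (key, rows) => key -> rows.map(_._2).sum }

    // Per shop: sort by month, then pair each month with its running total.
    monthly.toSeq
      .groupBy(_._1._1)
      .map { case (shop, entries) =>
        val sorted = entries.map { case ((_, month), total) => (month, total) }.sortBy(_._1)
        val (months, totals) = sorted.unzip
        shop -> months.zip(totals.scanLeft(0)(_ + _).tail)
      }
  }
}
```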
As an extension: want the maximum monthly sales up to each month? Sure, just replace the addition inside scan with a max. Here we go:
scala> result.mapValues(x=>x.map(_(0)).zip(x.map(_(1)).scan(0)((x,y)=>Array(x,y).max).tail))
res69: scala.collection.immutable.Map[String,Array[(Int, Int)]] =
Map(
b -> Array((1,7800), (2,7800)),
a -> Array((1,350), (2,5000), (3,5000)),
c -> Array((1,470), (2,630))
)
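One caveat: seeding scan with 0 only works for max here because the sales figures are never negative. The sketch below adds a seed-free variant that starts from the first month instead. `RunningMax` and its two functions are hypothetical names:

```scala
object RunningMax {
  // Seeded with 0 as in the transcript: correct only if no total is negative.
  def seededMax(xs: Seq[Int]): Seq[Int] = xs.scanLeft(0)(_ max _).tail

  // Seed-free: start from the first element (requires a non-empty Seq).
  def seedFreeMax(xs: Seq[Int]): Seq[Int] = xs.tail.scanLeft(xs.head)(_ max _)
}
```

For shop a's totals `Seq(350, 5000, 600)` both versions agree. They differ only when every value is negative, where the seeded one would report 0.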
If the function inside scan is unclear, compare these three:
scala> result.mapValues(x=>x.map(_(0)).zip(x.map(_(1)).scan(0)(_+_).tail))
res66: scala.collection.immutable.Map[String,Array[(Int, Int)]] =
Map(
b -> Array((1,7800), (2,10300)),
a -> Array((1,350), (2,5350), (3,5950)),
c -> Array((1,470), (2,1100))
)
scala> result.mapValues(x=>x.map(_(0)).zip(x.map(_(1)).scan(0)((x,y)=>Array(x,y).sum).tail))
res70: scala.collection.immutable.Map[String,Array[(Int, Int)]] =
Map(
b -> Array((1,7800), (2,10300)),
a -> Array((1,350), (2,5350), (3,5950)),
c -> Array((1,470), (2,1100))
)
scala> result.mapValues(x=>x.map(_(0)).zip(x.map(_(1)).scan(0)((x,y)=>x+y).tail))
res71: scala.collection.immutable.Map[String,Array[(Int, Int)]] =
Map(
b -> Array((1,7800), (2,10300)),
a -> Array((1,350), (2,5350), (3,5950)),
c -> Array((1,470), (2,1100))
)
All three give identical results, so I just changed sum to max and it was done.
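Since only the operator changes, it can be passed in as a parameter. Below is a sketch of a hypothetical helper `runningBy` that takes any binary operator. It assumes every shop has at least one month and that its months are already sorted, and the data is copied from the grouped result above:

```scala
object RunningBy {
  // Running aggregate over each shop's (month, total) pairs with any binary operator,
  // e.g. _ + _ for cumulative sums or _ max _ for running maxima.
  def runningBy(byShop: Map[String, Seq[(Int, Int)]])(op: (Int, Int) => Int): Map[String, Seq[(Int, Int)]] =
    byShop.map { case (shop, rows) =>
      val (months, totals) = rows.unzip
      // Seed with the first month's total, so no artificial 0 is needed.
      shop -> months.zip(totals.tail.scanLeft(totals.head)(op))
    }

  val byShop: Map[String, Seq[(Int, Int)]] = Map(
    "a" -> Seq((1, 350), (2, 5000), (3, 600)),
    "b" -> Seq((1, 7800), (2, 2500)),
    "c" -> Seq((1, 470), (2, 630)))

  val sums = runningBy(byShop)(_ + _)
  val maxes = runningBy(byShop)(_ max _)
}
```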
Once the data is properly grouped and sorted, everything else is a breeze. Mom never has to worry about my programming again!