#I. Accumulators
##1. Official documentation
![](http://img.blog.itpub.net/blog/2019/07/25/31047f6c155f551f.png?x-oss-process=style/bb)
##2. Explanation
```
An accumulator only supports adding. Updates happen inside tasks, and only the driver can read the result.
```
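The add-only behaviour described above can be sketched as follows. This is a minimal example, assuming Spark 2.x or later and a local master; the application name and the accumulator name `evenCount` are illustrative and do not come from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local SparkContext for illustration only (app name and master are assumptions)
val sc = new SparkContext(
  new SparkConf().setAppName("AccumulatorSketch").setMaster("local[2]"))

// A named accumulator is listed in the Web UI for the stage that updates it
val evenCount = sc.longAccumulator("evenCount")

// Tasks can only add to the accumulator; they never read it
sc.parallelize(1 to 10, 2).foreach { n =>
  if (n % 2 == 0) evenCount.add(1L)
}

// Only the driver reads the merged value
val total: Long = evenCount.value
println(s"even numbers seen: $total")

sc.stop()
```

Because the update happens inside `foreach`, which is an action, each task's contribution is applied once.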
##3. Test
![](http://img.blog.itpub.net/blog/2019/07/25/d754b858a53f8bb5.png?x-oss-process=style/bb)
##4. Result screenshot (Web UI)
![](http://img.blog.itpub.net/blog/2019/07/25/b14141693249018a.png?x-oss-process=style/bb)
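##5. Use cases
```
With a large volume of data, some records will be bad. Counting them with an accumulator gives a simple data quality monitor.
```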
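A rough sketch of that data quality use case, under stated assumptions: the `id,age` line format, the sample records, and the accumulator names `totalRecords` and `badRecords` are all hypothetical and only serve the illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Try

// Local SparkContext for illustration only (app name and master are assumptions)
val sc = new SparkContext(
  new SparkConf().setAppName("DataQualitySketch").setMaster("local[2]"))

// Hypothetical input: "id,age" lines, two of them malformed
val lines = sc.parallelize(Seq("G301,18", "G302,abc", "G303", "G304,25"))

val totalRecords = sc.longAccumulator("totalRecords")
val badRecords = sc.longAccumulator("badRecords")

// Count every line, and separately every line that fails to parse
val parsed = lines.flatMap { line =>
  totalRecords.add(1L)
  val fields = line.split(",")
  val record = Try((fields(0), fields(1).toInt)).toOption
  if (record.isEmpty) badRecords.add(1L)
  record.toList
}

// Accumulators are only updated once an action runs the transformation
val good = parsed.collect()

val bad: Long = badRecords.value
val total: Long = totalRecords.value
println(s"good=${good.length}, bad=$bad, total=$total, badRatio=${bad.toDouble / total}")

sc.stop()
```

Here the counters are updated inside a transformation, so a re-executed task or stage can apply its update more than once. Treat the numbers as monitoring figures rather than exact counts, and avoid running several actions over `parsed` without caching it.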
#II. Broadcast variables
##1. Official documentation
![](http://img.blog.itpub.net/blog/2019/07/25/ff2ee9b625d5a42d.png?x-oss-process=style/bb)
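##2. Join code
```scala
import org.apache.spark.SparkContext

def commonJoin(sc: SparkContext): Unit = {
  // Key both RDDs by id so that they can be joined
  val peopleInfo = sc.parallelize(Array(("G301","糊涂虫"),("G302","森老"),("G303","Gordon"))).map(x => (x._1, x))
  val peopleDetail = sc.parallelize(Array(("G301","清华大学",18))).map(x => (x._1, x))
  // Large table joined with a small table on the key: from a join b on a.id = b.id
  peopleInfo.join(peopleDetail)
    .map(x => x._1 + "," + x._2._1._2 + "," + x._2._2._2 + "," + x._2._2._3)
    .collect()        // an action is needed, otherwise the lazy join never runs
    .foreach(println)
}
```
##3. Result (Web UI)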
![](http://img.blog.itpub.net/blog/2019/07/25/ff386cbf9acc0089.png?x-oss-process=style/bb)
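##4. Prerequisites for using broadcast variables
```
A broadcast variable requires one side to be small: a large table joined with a small one, and the small one must fit in memory.
How small is small enough depends on your memory; with enough memory you can broadcast more.
A broadcast variable is kept in memory.
```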
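One way to act on the memory prerequisite is to estimate the size of the small table on the driver before broadcasting it. The sketch below assumes a local master and uses Spark's `org.apache.spark.util.SizeEstimator` (a developer API); the 10 MB budget, the variable names, and the `"unknown"` fallback are assumptions made for this example, not rules from Spark or from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.SizeEstimator

// Local SparkContext for illustration only (app name and master are assumptions)
val sc = new SparkContext(
  new SparkConf().setAppName("BroadcastSizeSketch").setMaster("local[2]"))

// Small lookup table that already lives on the driver
val smallTable: Map[String, String] =
  Map("G301" -> "糊涂虫", "G302" -> "森老", "G303" -> "Gordon")

// Hypothetical budget: 10 MB is an arbitrary limit chosen for this sketch
val maxBroadcastBytes = 10L * 1024 * 1024
val estimatedBytes = SizeEstimator.estimate(smallTable)
require(estimatedBytes <= maxBroadcastBytes,
  s"lookup table too large to broadcast: $estimatedBytes bytes")

val lookup = sc.broadcast(smallTable)

// Each task reads its executor-local copy through .value
val names = sc.parallelize(Seq("G301", "G303", "G999"))
  .map(id => lookup.value.getOrElse(id, "unknown"))
  .collect()

println(names.mkString(","))

sc.stop()
```

The estimate is approximate, so the budget should leave headroom relative to executor memory.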
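##5. Join code using a broadcast variable
```scala
import org.apache.spark.SparkContext

def broadcastJoin(sc: SparkContext): Unit = {
  // Small table: collect it to the driver as a Map
  val peopleInfo = sc.parallelize(Array(("G301","糊涂虫"),("G302","森老"),("G303","Gordon"))).collectAsMap()
  val peopleDetail = sc.parallelize(Array(("G301","清华大学",18))).map(x => (x._1, x))
  // Broadcast the map through sc
  val peopleBroadcast = sc.broadcast(peopleInfo)
  // mapPartitions: compare each record of the table against the broadcast map
  peopleDetail.mapPartitions(x => {
    val map = peopleBroadcast.value // the broadcast copy, read from memory
    // The source is cut off after `for((key,value)`; the rest is an assumed completion
    for ((key, value) <- x if map.contains(key))
      yield (key, map.getOrElse(key, ""), value._2)
  }).collect().foreach(println)
}
```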
Source: ITPUB blog, link: http://blog.itpub.net/69941978/viewspace-2651740/. Please credit the source when reposting; otherwise legal liability will be pursued.