spark core 4

Memory
execution memory: used for computation (shuffles, joins, sorts, aggregations)
storage memory:   used for caching and for propagating internal data

Spark < 1.6:  StaticMemoryManager (fixed split between the two regions)
Spark 1.6+:   UnifiedMemoryManager (execution and storage share a unified region)
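With the UnifiedMemoryManager the split is tunable rather than static. A sketch of the relevant knobs (0.6 and 0.5 are the defaults in recent Spark versions, shown here only for illustration, not as recommendations):

```shell
# spark.memory.fraction: share of the heap (minus reserved memory) managed by Spark
# spark.memory.storageFraction: part of that region protected for storage (caching)
./spark-shell --master yarn \
  --conf spark.memory.fraction=0.6 \
  --conf spark.memory.storageFraction=0.5
```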




SizeEstimator.estimate(file)  // org.apache.spark.util.SizeEstimator: approximate in-memory size of an object
 
// A boxed collection pays per-element object overhead (header + pointer):
List<Integer> list = new ArrayList<Integer>();
// a primitive array stores the same values far more compactly:
int[] array = new int[10];  // size 10 chosen only to make the line compile


// Instead of a map of many small objects per record...
Map<Integer, Student> students = new HashMap<Integer, Student>();
// ...records can be packed into one flat string:
// id:name,age,grade.....$id:name,age,grade.....
// 1:大舅哥,25,1$2:二舅哥,24,2....

// Nesting makes it worse: a class that wraps a collection of
// objects multiplies the per-object overhead.
class Classes {
    List<舅哥> s = new ArrayList<舅哥>();
}
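The flat-string idea above can be sketched in plain Scala. The Student case class and the encode/decode helpers below are hypothetical, only to show the round trip between an object map and its compact string form:

```scala
// Hypothetical sketch: pack many small records into one flat string
// ("id:name,age,grade$id:name,age,grade") to avoid per-object overhead.
case class Student(name: String, age: Int, grade: Int)

def encode(students: Map[Int, Student]): String =
  students.toSeq.sortBy(_._1)
    .map { case (id, s) => s"$id:${s.name},${s.age},${s.grade}" }
    .mkString("$")

def decode(flat: String): Map[Int, Student] =
  flat.split("\\$").map { rec =>
    val Array(id, rest) = rec.split(":", 2)
    val Array(name, age, grade) = rest.split(",")
    id.toInt -> Student(name, age.toInt, grade.toInt)
  }.toMap

val flat = encode(Map(1 -> Student("wenzi", 25, 1), 2 -> Student("xiaolu", 24, 2)))
// flat == "1:wenzi,25,1$2:xiaolu,24,2"
```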


json
Data locality (move the computation to the data, not the data to the computation)





RDD
preferredLocations — the hosts where a partition's data resides; the scheduler tries to run the task there


Broadcast variables


Pattern: build a lookup map once, then reference it inside the task function:
function(map)




peoples:
1 wenzi
2 xiaolu

details:
1 school1 201
2 school2 202
3 school3 203

join result (id 3 has no match and is dropped):
1 wenzi  school1 201
2 xiaolu school2 202






Shuffle (reduce-side) join, in SQL terms: a join b on a.id = b.id






// Key both sides by id
val peoples = sc.parallelize(Array(("1", "wenzi"), ("2", "xiaolu")))
  .map(x => (x._1, x))

val details = sc.parallelize(Array(
  ("1", "school1", "201"),
  ("2", "school2", "202"),
  ("3", "school3", "203")
)).map(x => (x._1, x))

// join shuffles both sides; each result is (id, (people, detail))
peoples.join(details).map(x => {
  x._1 + "," + x._2._1._2 + "," + x._2._2._2 + "," + x._2._2._3
}).collect.foreach(println)
















// Collect the small side to the driver as a Map, then broadcast it
// once per executor instead of shipping it with every task
val peoples = sc.parallelize(Array(("1", "wenzi"), ("2", "xiaolu")))
  .collectAsMap()
val peoplesBroadcast = sc.broadcast(peoples)

val details = sc.parallelize(Array(
  ("1", "school1", "201"),
  ("2", "school2", "202"),
  ("3", "school3", "203")
)).map(x => (x._1, x))

// Map-side join: no shuffle; each partition probes the broadcast map
details.mapPartitions(partition => {
  val broadcastPeoples = peoplesBroadcast.value
  for ((key, value) <- partition if broadcastPeoples.contains(key))
    yield (key, broadcastPeoples(key), value._2, value._3)
}).collect().foreach(println)
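The filter-and-yield logic inside the mapPartitions body can be sanity-checked on plain Scala collections, with no SparkContext. The local names below are illustrative stand-ins only:

```scala
// Local stand-ins: the broadcast side is just a Map, the big side a Seq
val peoplesLocal = Map("1" -> "wenzi", "2" -> "xiaolu")
val detailsLocal = Seq(
  ("1", ("1", "school1", "201")),
  ("2", ("2", "school2", "202")),
  ("3", ("3", "school3", "203")))

// Same for-comprehension as in the mapPartitions body:
// keep a detail record only if its key exists in the small map
val joined =
  for ((key, value) <- detailsLocal if peoplesLocal.contains(key))
    yield (key, peoplesLocal(key), value._2, value._3)
// id "3" is dropped because it has no matching person
```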






# Request 2 executors, each with 1 core and 1G of memory
./spark-shell --master yarn \
--executor-memory=1G \
--num-executors=2 \
--executor-cores=1




 
 
 
 