Item-Based Collaborative Filtering: Incremental Computation Implemented on Spark
Controller
1. Data statistics
user count:=========>8239237
itemCode count:=====>7421567
spark result distinct count ======>5826484
2. Running subtasks
In reverse order
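3. Spark cluster information
Spark is initialized with a fixed (hard) resource allocation; during computation, resources are allocated dynamically.
Collaborative filtering is a data-heavy workload: it needs a large amount of memory, while its CPU requirements are moderate.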
4. Parameter configuration
sparkConf.set("spark.executor.memory", "7G");
sparkConf.set("spark.executor.cores", "1");
sparkConf.set("spark.executor.heartbeatInterval", "20s");
sparkConf.set("spark.kryoserializer.buffer.max", "256m");
sparkConf.set("spark.speculation", "true");
sparkConf.set("spark.worker.timeout", "500");
sparkConf.set("spark.core.connection.ack.wait.timeout", "600");
sparkConf.set("spark.cores.max", "4");
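To make the memory-heavy, CPU-light sizing concrete, here is a minimal Java sketch that holds the same settings in a plain map and derives the upper bounds they imply. It assumes a standalone-style deployment, where spark.cores.max caps the total cores granted to the application; the class name SparkSettingsSketch and its helper methods are hypothetical and not part of the original job.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
public class SparkSettingsSketch {

    // The same key/value pairs as the sparkConf.set(...) calls above.
    static Map<String, String> settings() {
        Map<String, String> s = new LinkedHashMap<>();
        s.put("spark.executor.memory", "7G");
        s.put("spark.executor.cores", "1");
        s.put("spark.executor.heartbeatInterval", "20s");
        s.put("spark.kryoserializer.buffer.max", "256m");
        s.put("spark.speculation", "true");
        s.put("spark.worker.timeout", "500");
        s.put("spark.core.connection.ack.wait.timeout", "600");
        s.put("spark.cores.max", "4");
        return s;
    }

    // Upper bound on executors: total cores for the app divided by cores per executor.
    // Assumption: standalone-style scheduling where spark.cores.max is the app-wide cap.
    static int maxExecutors(Map<String, String> s) {
        int coresMax = Integer.parseInt(s.get("spark.cores.max"));
        int coresPerExecutor = Integer.parseInt(s.get("spark.executor.cores"));
        return coresMax / coresPerExecutor;
    }

    // Parses a value such as "7G"; this sketch only handles the gigabyte suffix.
    static int executorMemoryGb(Map<String, String> s) {
        String m = s.get("spark.executor.memory");
        if (!m.toUpperCase().endsWith("G")) {
            throw new IllegalArgumentException("expected a value like 7G, got " + m);
        }
        return Integer.parseInt(m.substring(0, m.length() - 1));
    }

    public static void main(String[] args) {
        Map<String, String> s = settings();
        int executors = maxExecutors(s);
        int totalGb = executors * executorMemoryGb(s);
        System.out.println("max executors = " + executors);
        System.out.println("max executor heap = " + totalGb + " GB");
        // With Spark on the classpath, the map could be applied with:
        //   SparkConf conf = new SparkConf(); s.forEach(conf::set);
    }
}
```

Under these assumptions the job runs on at most 4 single-core executors with up to 28 GB of executor heap in total, which matches the note that the workload needs a lot of memory but only moderate CPU.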
5. Output file naming rules