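1. Control how input splits are computed by using CombineTextInputFormat, which packs many small files into fewer splits.
2. Implement a custom partitioning strategy to prevent data skew and balance the load across reduce tasks.
3. Size the YarnChild (task JVM) memory appropriately. Look up the vcores (CPU) and memory parameters in the YARN configuration reference.
4. Tune the spill parameters appropriately.
5. Tune the merge parallelism (how many spill files are merged at once) appropriately:
   <property>
     <name>mapreduce.task.io.sort.factor</name>
     <value>10</value>
     <description>The number of streams to merge at once while sorting files. This determines the number of open file handles.</description>
   </property>
6. Compress the map-side output (spill files and the final map output fetched by reducers) with gzip to save network bandwidth during the shuffle. Map output compression is off by default, so mapreduce.map.output.compress must be set to true:
   <property>
     <name>mapreduce.map.output.compress</name>
     <value>true</value>
   </property>
   <property>
     <name>mapreduce.map.output.compress.codec</name>
     <value>org.apache.hadoop.io.compress.GzipCodec</value>
   </property>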
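Item 1 can also be applied through configuration rather than in the driver. The fragment below is a sketch assuming the new (mapreduce) API: it selects CombineTextInputFormat and caps each combined split at 128 MB. The 128 MB value is an illustrative assumption, not a figure from these notes. In a driver, job.setInputFormatClass(CombineTextInputFormat.class) and CombineTextInputFormat.setMaxInputSplitSize(job, size) set these same two properties.

```xml
<!-- Use CombineTextInputFormat so many small files are packed into fewer splits. -->
<property>
  <name>mapreduce.job.inputformat.class</name>
  <value>org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat</value>
</property>
<!-- Upper bound for one combined split, in bytes (134217728 = 128 MB, illustrative). -->
<property>
  <name>mapreduce.input.fileinputformat.split.maxsize</name>
  <value>134217728</value>
</property>
```

Fewer, larger splits mean fewer map tasks, which cuts per-task JVM startup and scheduling overhead when the input consists of many small files.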
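For item 3, each map or reduce task runs in its own YARN container as a YarnChild JVM, and both the container and the heap are sized per task type. The values below are illustrative assumptions, not tuned numbers. The heap (-Xmx) is set to roughly 80% of the container, a common rule of thumb that leaves room for non-heap memory. Every request must also fit within the scheduler limits yarn.scheduler.maximum-allocation-mb and yarn.scheduler.maximum-allocation-vcores configured in yarn-site.xml.

```xml
<!-- Container memory requested from YARN per task, in MB. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<!-- YarnChild JVM heap, kept below the container size (about 80% here). -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276m</value>
</property>
<!-- Virtual cores requested per task. -->
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>2</value>
</property>
```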
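For item 4, spilling is governed mainly by the size of the in-memory sort buffer and the fill ratio at which a background spill starts. The defaults are 100 MB and 0.80; the 200 MB below is an illustrative assumption. A larger buffer produces fewer spill files and therefore fewer merge passes, but the buffer is allocated from the map task's heap, so it has to fit comfortably inside the -Xmx given in mapreduce.map.java.opts.

```xml
<!-- Size of the map-side sort buffer, in MB (default 100). -->
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>200</value>
</property>
<!-- Buffer fill ratio that triggers a spill to disk (default 0.80). -->
<property>
  <name>mapreduce.map.sort.spill.percent</name>
  <value>0.80</value>
</property>
```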