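When MapReduce aggregates all the values of a key and a single key has a very large number of values, the reducer that receives that key has far more work than the others. This is data skew. Below is a small trick for dealing with it, using two mappers whose outputs meet in the reducer under the same id as an example. The first mapper appends a random suffix in [0, StaticCommonInfo.ID_SPLIT_NUM) to its key, so the records of one hot id are spread over up to ID_SPLIT_NUM reduce groups. The second mapper emits its record once for every possible suffix, so each of those groups still receives the matching record.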
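To see why the suffix helps, here is a small stand-alone simulation (an illustration, not the author's code). It assumes routing in the style of Hadoop's default HashPartitioner, `(hashCode & Integer.MAX_VALUE) % numReducers`, and uses a hypothetical hot key `hotDir` with 10,000 records, 8 reducers and a split number of 8.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class SaltingDemo {
    // Same formula as Hadoop's default HashPartitioner.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Builds the map-output keys for one hot id; splitNum <= 1 means "no salting".
    static String[] makeKeys(String key, int records, int splitNum, long seed) {
        Random random = new Random(seed);
        String[] keys = new String[records];
        for (int i = 0; i < records; i++) {
            keys[i] = splitNum <= 1 ? key : key + random.nextInt(splitNum);
        }
        return keys;
    }

    // Counts how many records each reducer receives.
    static Map<Integer, Integer> load(String[] keys, int numReducers) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (String k : keys) {
            counts.merge(partition(k, numReducers), 1, Integer::sum);
        }
        return counts;
    }

    // Records handled by the busiest reducer.
    static int maxLoad(Map<Integer, Integer> counts) {
        return counts.values().stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    public static void main(String[] args) {
        int records = 10_000, reducers = 8, splitNum = 8; // hypothetical sizes
        String[] plain = makeKeys("hotDir", records, 1, 42L);
        String[] salted = makeKeys("hotDir", records, splitNum, 42L);
        System.out.println("busiest reducer without salting: " + maxLoad(load(plain, reducers)));
        System.out.println("busiest reducer with salting:    " + maxLoad(load(salted, reducers)));
    }
}
```

Without salting every record lands on one reducer. With salting the work is shared by several reducers; how evenly depends on where the salted keys hash, so in general two sub-keys can still end up on the same reducer.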
In the first mapper, add:
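// Split the id: append a random suffix in [0, ID_SPLIT_NUM) to the key.
// (In a real mapper, create the Random once as a field instead of once per record.)
Random random = new Random();
int num = random.nextInt(StaticCommonInfo.ID_SPLIT_NUM);
// "0".equals(x) is already false for null, so deleted needs no separate blank check.
if ("0".equals(deleted) && !StringUtil.isBlank(picPath) && "0".equals(type)) {
    // Keep only the file name, i.e. the last segment of the path.
    String[] arrayPicPath = picPath.split("/");
    SortKeyWritable sortKeyWritable = new SortKeyWritable();
    sortKeyWritable.key = dirId + String.valueOf(num); // salted key
    sortKeyWritable.value = StaticCommonInfo.IMG_INFO_SIGN;
    output.collect(sortKeyWritable,
            new Text(arrayPicPath[arrayPicPath.length - 1] + StaticCommonInfo.IMG_INFO_SIGN));
}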
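The salted key above is built by plain concatenation of the id and the suffix. As long as ID_SPLIT_NUM is at most 10, every suffix is a single digit and the key is unambiguous (the reducer can recover the id by dropping the last character). Once ID_SPLIT_NUM exceeds 10, ids of different lengths can collide. The sketch below shows the collision and one possible fix, a separator character; the `#` separator and the helper names are hypothetical, and if adopted, both mappers must build the key the same way.

```java
public class SaltKeyCollision {
    // Scheme used by the mappers: id and suffix concatenated directly.
    static String concatKey(String id, int suffix) {
        return id + suffix;
    }

    // Hypothetical alternative: a separator that never appears in the numeric suffix.
    static final char SEP = '#';

    static String separatedKey(String id, int suffix) {
        return id + SEP + suffix;
    }

    // Reducer side: the original id is everything before the last separator.
    static String originalId(String saltedKey) {
        return saltedKey.substring(0, saltedKey.lastIndexOf(SEP));
    }

    public static void main(String[] args) {
        System.out.println(concatKey("1", 10));    // "110"
        System.out.println(concatKey("11", 0));    // also "110": two ids share one group
        System.out.println(separatedKey("1", 10)); // "1#10"
        System.out.println(separatedKey("11", 0)); // "11#0"
        System.out.println(originalId("11#0"));    // "11"
    }
}
```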
In the second mapper, add:
// Split the id: emit this record once for every suffix 0..ID_SPLIT_NUM-1,
// so each salted group produced by the first mapper receives a copy.
SortKeyWritable sortKeyWritable = new SortKeyWritable();
String tag = null;
if (ImgSpaceUtil.isSemanticKey(hashModelTable, name)) {
    tag = StaticCommonInfo.MODEL_TABLE;
} else if (ImgSpaceUtil.isSemanticKey(hashDetailTable, name)) {
    tag = StaticCommonInfo.DETAIL_TABLE;
}
if (tag != null) {
    for (int i = 0; i < StaticCommonInfo.ID_SPLIT_NUM; i++) {
        sortKeyWritable.key = id + String.valueOf(i);
        sortKeyWritable.value = tag;
        // collect() serializes immediately, so reusing sortKeyWritable is safe.
        output.collect(sortKeyWritable, new Text(tag));
    }
}
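The two halves can be checked together with a small in-memory simulation of the shuffle (a sketch, not Hadoop code). A `HashMap` stands in for grouping by key, `SPLIT_NUM` stands in for `StaticCommonInfo.ID_SPLIT_NUM`, the id `42` and file names are made up, and the secondary-sort ordering carried in `sortKeyWritable.value` is ignored. It shows that every salted group of the hot id receives exactly one copy of the small-side record, and that no big-side record is lost.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class SaltedJoinDemo {
    static final int SPLIT_NUM = 4; // stands in for StaticCommonInfo.ID_SPLIT_NUM

    // Big side (like the first mapper): each record goes to ONE random sub-key.
    static void emitBigSide(Map<String, List<String>> groups, String id, String value, Random random) {
        String key = id + random.nextInt(SPLIT_NUM);
        groups.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Small side (like the second mapper): the record is copied to EVERY sub-key.
    static void emitSmallSide(Map<String, List<String>> groups, String id, String tag) {
        for (int i = 0; i < SPLIT_NUM; i++) {
            groups.computeIfAbsent(id + i, k -> new ArrayList<>()).add(tag);
        }
    }

    // Builds the reduce groups for one hot id, as the shuffle would.
    static Map<String, List<String>> shuffle(String id, int bigRecords, String tag, long seed) {
        Map<String, List<String>> groups = new HashMap<>();
        Random random = new Random(seed);
        for (int n = 0; n < bigRecords; n++) {
            emitBigSide(groups, id, "pic" + n + ".jpg", random);
        }
        emitSmallSide(groups, id, tag);
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = shuffle("42", 1000, "MODEL_TABLE", 7L);
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            List<String> values = e.getValue();
            System.out.println(e.getKey() + " -> " + (values.size() - 1)
                    + " pics, tag present: " + values.contains("MODEL_TABLE"));
        }
    }
}
```

The cost of the trick is visible here: the small side is written ID_SPLIT_NUM times, which is cheap when it holds one record per id, while the big side is divided among the reduce groups.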