How to Handle Data Skew in MapReduce


gdp5211314


Article link: https://blog.csdn.net/gdp5211314/article/details/7408489


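When MapReduce aggregates all the values for a key, a key that maps to a very large number of values causes data skew: the reduce call that handles that key receives far more records than the others. This post describes a small trick for dealing with it, using two mappers as an example. The idea is to split each id into ID_SPLIT_NUM sub-keys by appending a numeric suffix.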

In one mapper, add the following, which appends a random suffix to the key:

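// Split the id: append a random suffix in [0, ID_SPLIT_NUM) so that the
// records of one dirId are spread over up to ID_SPLIT_NUM distinct keys.
// Creating the Random once per mapper instead of once per record is cheaper.
// Caution: with no separator, id "1" + suffix "12" collides with id "11" + suffix "2"
// once ID_SPLIT_NUM exceeds 10; a delimiter used by both mappers avoids this.
Random random = new Random();
int num = random.nextInt(StaticCommonInfo.ID_SPLIT_NUM);
// "0".equals(deleted) already rules out null and blank values.
if ("0".equals(deleted) && !StringUtil.isBlank(picPath) && "0".equals(type)) {
    arrayPicPath = picPath.split("/");
    SortKeyWritable sortKeyWritable = new SortKeyWritable();
    sortKeyWritable.key = dirId + Integer.toString(num);
    sortKeyWritable.value = StaticCommonInfo.IMG_INFO_SIGN;
    // Emit the file name (last path segment) followed by the image-info marker.
    output.collect(sortKeyWritable,
            new Text(arrayPicPath[arrayPicPath.length - 1] + StaticCommonInfo.IMG_INFO_SIGN));
}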
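
To see what the random suffix buys, here is a minimal standalone sketch (plain Java, no Hadoop) that counts how many records of one hot id land on each suffixed key. The class name `KeySaltingSketch`, the constant `SPLIT_NUM` (a stand-in for `StaticCommonInfo.ID_SPLIT_NUM`) and the `"_"` separator are assumptions made for illustration and are not part of the original job.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

final class KeySaltingSketch {
    // Hypothetical stand-in for StaticCommonInfo.ID_SPLIT_NUM.
    static final int SPLIT_NUM = 4;

    /** Appends a random suffix in [0, SPLIT_NUM) to the id, with a separator. */
    static String saltKey(String id, Random random) {
        return id + "_" + random.nextInt(SPLIT_NUM);
    }

    /** Counts how many of `records` records for one id land on each suffixed key. */
    static Map<String, Integer> countPerKey(String id, int records, Random random) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int r = 0; r < records; r++) {
            counts.merge(saltKey(id, random), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One hot id with 100,000 records is spread over SPLIT_NUM keys.
        Map<String, Integer> counts = countPerKey("dir42", 100_000, new Random());
        counts.forEach((key, n) -> System.out.println(key + " -> " + n));
    }
}
```

Each suffixed key should receive roughly 1/SPLIT_NUM of the records, so no single reduce call has to process the whole hot id. The distinct keys only run in parallel if the partitioner sends them to different reducers, which depends on the job's configuration.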

In the other mapper, add the following, which emits each record once for every possible suffix:

// Split the id: emit this record once for every suffix in [0, ID_SPLIT_NUM),
// so that whichever suffix the other mapper picked, a matching key exists.
// (The original allocated an int[] only to read its length; loop on the constant instead.)
for (int i = 0; i < StaticCommonInfo.ID_SPLIT_NUM; i++) {
    sortKeyWritable.key = id + Integer.toString(i);
    if (ImgSpaceUtil.isSemanticKey(hashModelTable, name)) {
        sortKeyWritable.value = StaticCommonInfo.MODEL_TABLE;
        output.collect(sortKeyWritable, new Text(StaticCommonInfo.MODEL_TABLE));
    } else if (ImgSpaceUtil.isSemanticKey(hashDetailTable, name)) {
        sortKeyWritable.value = StaticCommonInfo.DETAIL_TABLE;
        output.collect(sortKeyWritable, new Text(StaticCommonInfo.DETAIL_TABLE));
    }
}
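
The two snippets only work as a pair: one side picks a single random suffix, the other side emits every suffix, so each suffixed key still meets its counterpart in the reducer. The following standalone sketch (plain Java, no Hadoop) simulates that pairing and also shows how a reducer could strip the suffix to recover the original id. The original post does not show its reducer, so `originalId`, the `"_"` separator, `SPLIT_NUM` and the class name `SaltedJoinSketch` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class SaltedJoinSketch {
    static final int SPLIT_NUM = 3;   // hypothetical stand-in for ID_SPLIT_NUM
    static final String SEP = "_";    // separator assumed here, not in the original

    /** Skewed side: one random suffix per record. */
    static String saltKey(String id, Random random) {
        return id + SEP + random.nextInt(SPLIT_NUM);
    }

    /** Replicated side: one key per possible suffix. */
    static List<String> replicateKeys(String id) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < SPLIT_NUM; i++) {
            keys.add(id + SEP + i);
        }
        return keys;
    }

    /** Reducer side: drop the suffix after the last separator. */
    static String originalId(String saltedKey) {
        return saltedKey.substring(0, saltedKey.lastIndexOf(SEP));
    }

    /** Simulates the join: every image record must find the table record. */
    static List<String> join(String id, String table, List<String> images, Random random) {
        Set<String> tableKeys = new HashSet<>(replicateKeys(id));
        List<String> joined = new ArrayList<>();
        for (String image : images) {
            String key = saltKey(id, random);
            if (tableKeys.contains(key)) {
                joined.add(originalId(key) + "," + table + "," + image);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> images = Arrays.asList("a.jpg", "b.jpg", "c.jpg", "d.jpg");
        for (String row : join("dir_7", "MODEL_TABLE", images, new Random())) {
            System.out.println(row);
        }
    }
}
```

The trade-off is that the replicated side grows by a factor of SPLIT_NUM, so this approach suits cases where that side is small compared with the skewed side.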