MapReduce Sorting and Grouping

Troy1214

I. What Is an InputSplit

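An InputSplit is a split: within a MapReduce job, it is the smallest unit of input handed to a map task. Splits are a logical concept built on top of files. Put simply, a file is cut into some number of segments, and each segment records information such as <file name, start offset, length, hosts that hold it>. Once a MapTask receives its split, it knows where to start reading data.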
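To make the <file name, start offset, length> part of that tuple concrete, here is a minimal sketch of how a file could be cut into splits. The class `SplitSketch`, the method `computeSplits`, the file name, and the 128 MB split size are all illustrative assumptions. Host locations are left out, and a real input format may apply additional rules (for example around the size of the last split).

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSketch {
    /** One logical split: where it starts in the file and how many bytes it covers. */
    static final class Split {
        final String file;
        final long start;
        final long length;

        Split(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return file + " start=" + start + " length=" + length;
        }
    }

    /** Cuts a file of fileLength bytes into consecutive splits of at most splitSize bytes. */
    static List<Split> computeSplits(String file, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            long length = Math.min(splitSize, fileLength - start);
            splits.add(new Split(file, start, length));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A hypothetical 300 MB file with an assumed 128 MB split size.
        for (Split s : computeSplits("input.txt", 300 * mb, 128 * mb)) {
            System.out.println(s);
        }
    }
}
```

With these assumed sizes the file yields three splits: two full 128 MB ones and a final 44 MB one. Each map task would start reading at its own `start` offset.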

II. Combiner, Partitioner, Shuffle

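The Combiner merges the map-side output once before it leaves the map node, which reduces the amount of data transferred between map and reduce nodes and so improves network I/O performance.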
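The effect of a combiner can be illustrated without a cluster. The sketch below assumes a word-count style job in which the map output is one (word, 1) record per word; `CombinerSketch` and `combine` are hypothetical names, not Hadoop APIs. Summing works as a combine step because addition is associative and commutative, so partial sums on the map side do not change the final result.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CombinerSketch {
    /** Sums the counts of identical keys, as a word-count combiner would on the map side. */
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> record : mapOutput) {
            combined.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Raw map output for the hypothetical line "a b a a b": one (word, 1) record per word.
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String word : "a b a a b".split(" ")) {
            mapOutput.add(new SimpleEntry<String, Integer>(word, 1));
        }
        Map<String, Integer> combined = combine(mapOutput);
        System.out.println("records before combine: " + mapOutput.size()); // 5
        System.out.println("records after combine: " + combined.size());   // 2
        System.out.println(combined);                                      // {a=3, b=2}
    }
}
```

Five records shrink to two before anything crosses the network, which is the saving the combiner is meant to provide.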

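Partitioner: decides which key is assigned to which Reducer. Partitioning serves two main purposes:

1. Producing multiple output files, as the business logic requires

2. Letting multiple reduce tasks run concurrently, which improves the overall efficiency of the job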
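As a rough illustration of how a key can be mapped to a Reducer, the sketch below uses the hash-and-modulo rule that Hadoop's default HashPartitioner is based on. `PartitionSketch` is a hypothetical stand-alone class, and the keys and reducer count are made up for the example.

```java
final class PartitionSketch {
    /** Maps a key to a reducer index in [0, numReduceTasks) using its hash code. */
    static int getPartition(Object key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3; // assumed number of reduce tasks
        for (String key : new String[] {"apple", "banana", "cherry", "apple"}) {
            System.out.println(key + " -> reducer " + getPartition(key, reducers));
        }
    }
}
```

Equal keys always land on the same reducer, which is what allows each reducer to see every value for the keys it owns. A custom partitioner replaces this rule when the business logic needs a specific key-to-file layout.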

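Shuffle: the output of the many map tasks is copied over the network, partition by partition, to the nodes running the corresponding reduce tasks. This process is called the Shuffle.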
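The data movement described here can be simulated in memory. The following is only a sketch under simplifying assumptions: `ShuffleSketch`, `rec`, and `shuffle` are hypothetical names, the records are invented, partitions are chosen by hash and modulo, and the network copy is replaced by adding to an in-memory list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ShuffleSketch {
    /** Small helper that builds one (key, value) map output record. */
    static Map.Entry<String, Integer> rec(String key, int value) {
        return new SimpleEntry<String, Integer>(key, value);
    }

    /**
     * Sends every record from all map outputs to the reducer chosen by hash
     * partitioning and groups values by key. TreeMap keeps each reducer's keys sorted.
     */
    static List<TreeMap<String, List<Integer>>> shuffle(
            List<List<Map.Entry<String, Integer>>> mapOutputs, int numReducers) {
        List<TreeMap<String, List<Integer>>> reducerInputs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducerInputs.add(new TreeMap<String, List<Integer>>());
        }
        for (List<Map.Entry<String, Integer>> oneMapOutput : mapOutputs) {
            for (Map.Entry<String, Integer> record : oneMapOutput) {
                int partition = (record.getKey().hashCode() & Integer.MAX_VALUE) % numReducers;
                reducerInputs.get(partition)
                        .computeIfAbsent(record.getKey(), k -> new ArrayList<Integer>())
                        .add(record.getValue());
            }
        }
        return reducerInputs;
    }

    public static void main(String[] args) {
        // Two hypothetical map tasks, each with its own local output.
        List<Map.Entry<String, Integer>> map1 = Arrays.asList(rec("a", 1), rec("b", 1));
        List<Map.Entry<String, Integer>> map2 = Arrays.asList(rec("a", 1), rec("c", 1));
        List<TreeMap<String, List<Integer>>> result = shuffle(Arrays.asList(map1, map2), 2);
        for (int i = 0; i < result.size(); i++) {
            System.out.println("reducer " + i + " receives " + result.get(i));
        }
    }
}
```

Both records for key "a" end up at the same reducer even though they came from different map tasks.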

III. MapReduce Sorting and Grouping

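import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Custom key, nested inside the job's driver class (the imports go at the top of that file).
private static class MyNewKey implements WritableComparable<MyNewKey> {
    long firstNum;
    long secondNum;

    // Hadoop needs a no-argument constructor to deserialize the key.
    public MyNewKey() {
    }

    public MyNewKey(long first, long second) {
        firstNum = first;
        secondNum = second;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(firstNum);
        out.writeLong(secondNum);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        firstNum = in.readLong();
        secondNum = in.readLong();
    }

    /*
     * This compareTo method is called when keys are sorted.
     */
    @Override
    public int compareTo(MyNewKey anotherKey) {
        // Long.compare avoids the overflow that casting a long difference to int can cause.
        int cmp = Long.compare(firstNum, anotherKey.firstNum);
        if (cmp != 0) {
            // The first columns differ, so they decide the order.
            return cmp;
        }
        return Long.compare(secondNum, anotherKey.secondNum);
    }
}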
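When a compareTo method orders long fields, deriving the result from a subtraction that is cast to int is a common pitfall: the cast keeps only the low 32 bits, so the sign of the result can be wrong. The sketch below, with the hypothetical class `CompareSketch`, contrasts that approach with `Long.compare`.

```java
final class CompareSketch {
    /** Comparison by subtraction and narrowing cast; can report the wrong order. */
    static int compareBySubtraction(long a, long b) {
        return (int) (a - b);
    }

    /** Comparison with Long.compare; the sign of the result is always correct. */
    static int compareSafely(long a, long b) {
        return Long.compare(a, b);
    }

    public static void main(String[] args) {
        long a = 1L << 32; // 4294967296, clearly larger than 0
        long b = 0L;
        // The low 32 bits of the difference are all zero, so the cast yields 0 ("equal").
        System.out.println(compareBySubtraction(a, b)); // 0
        System.out.println(compareSafely(a, b));        // 1
    }
}
```

For small values both versions agree, which is why the problem tends to show up only with large keys such as timestamps or IDs.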

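The custom key type, called A here (for example MyNewKey), implements WritableComparable:

A implements WritableComparable<A> {

}

IntWritable x --> the key is an int
A             --> the key is a custom object

map(k1, v1)       // input "1,5" --> A a1 = new A(1, 5);
    k2 = A
    v2 = NullWritable

reduce(k2, v2)
    k2 --> split back into (1, 5) --> k3 = 1, v3 = 5 --> context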
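The map, sort, and reduce flow outlined above can be tried without Hadoop. This is a minimal sketch under stated assumptions: `SecondarySortSketch`, `PairKey`, and `run` are hypothetical names, the input lines are invented, and `Collections.sort` stands in for the sorting the framework performs between map and reduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SecondarySortSketch {
    /** Stand-in for the custom key: two long columns, ordered by first, then by second. */
    static final class PairKey implements Comparable<PairKey> {
        final long first;
        final long second;

        PairKey(long first, long second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public int compareTo(PairKey other) {
            int cmp = Long.compare(first, other.first);
            return cmp != 0 ? cmp : Long.compare(second, other.second);
        }
    }

    /** Parses "first,second" lines into keys (map), sorts them, and emits "k3<TAB>v3" rows (reduce). */
    static List<String> run(List<String> lines) {
        List<PairKey> keys = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.trim().split(",");
            keys.add(new PairKey(Long.parseLong(parts[0].trim()), Long.parseLong(parts[1].trim())));
        }
        Collections.sort(keys); // stands in for the framework's sort between map and reduce
        List<String> output = new ArrayList<>();
        for (PairKey key : keys) {
            output.add(key.first + "\t" + key.second); // split the key back into k3 and v3
        }
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical input lines, deliberately out of order.
        for (String row : run(Arrays.asList("3,3", "1,5", "3,1", "1,2"))) {
            System.out.println(row);
        }
    }
}
```

The rows come out ordered by the first column and, within equal first columns, by the second, because the ordering lives entirely in the key's compareTo.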

Top-K

input
1
2
3
4
5

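import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TopKMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> { // mapper start
    private int[] tops;

    // 1. setup(): runs once per task, before the first map() call (init)
    @Override
    protected void setup(Context context) {
        tops = new int[5];
        Arrays.fill(tops, Integer.MIN_VALUE); // sentinel for "slot not used yet"
    }

    // 2. map(): runs once for every record of the split
    @Override
    protected void map(LongWritable k1, Text v1, Context context) {
        int val = Integer.parseInt(v1.toString().trim());
        Arrays.sort(tops); // ascending, so tops[0] is the smallest value kept
        if (tops[0] < val) {
            tops[0] = val;
        }
    }

    // 3. cleanup(): runs once per task, after the last map() call (destroy)
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        Arrays.sort(tops);
        for (int i = tops.length - 1, rank = 1; i >= 0 && tops[i] != Integer.MIN_VALUE; i--, rank++) {
            context.write(new IntWritable(rank), new IntWritable(tops[i])); // output: rank, value
        }
    }
} // mapper end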

output
1 5
2 4
3 3
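The same Top-K selection can be checked outside Hadoop. The sketch below swaps in a different technique, a min-heap of at most k elements, instead of re-sorting an array for every record. `TopKSketch` and `topK` are hypothetical names, and k = 3 is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class TopKSketch {
    /** Returns the k largest values in descending order, using a min-heap of size at most k. */
    static List<Integer> topK(Iterable<Integer> values, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // smallest kept value sits at the head
        for (int v : values) {
            if (heap.size() < k) {
                heap.add(v);
            } else if (heap.peek() < v) {
                heap.poll(); // drop the smallest kept value to make room for a larger one
                heap.add(v);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Integer> top = topK(Arrays.asList(1, 2, 3, 4, 5), 3);
        for (int rank = 1; rank <= top.size(); rank++) {
            System.out.println(rank + " " + top.get(rank - 1));
        }
    }
}
```

For the values 1 through 5 this prints the rows `1 5`, `2 4`, `3 3`. Each record costs O(log k) with the heap, compared with a full sort of the array per record.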
