MapReduce Workflow (Part 1)
MapReduce Workflow (Part 2)
Workflow Details
The diagrams above show the complete MapReduce workflow. The Shuffle phase covers only steps 7 through 16. The Shuffle process in detail:
1) The MapTask collects the kv pairs emitted by our map() method and puts them into an in-memory buffer.
2) The in-memory buffer continually spills to local disk files; multiple spill files may be produced.
3) The multiple spill files are merged into one large spill file.
4) During both spilling and merging, the Partitioner is invoked to partition the data, and the data is sorted by key.
5) Each ReduceTask fetches the data for its own partition number from every MapTask machine.
6) A ReduceTask pulls the result files for the same partition from the different MapTasks and merges them (merge sort).
7) Once merged into one large file, the Shuffle process is finished. The ReduceTask's logical computation follows: key-value groups are read from the file one by one, and the user-defined reduce() method is called for each group.
Note:
The buffer size in Shuffle affects the execution efficiency of a MapReduce program: in principle, the larger the buffer, the fewer disk I/O operations and the faster the job runs.
The buffer size can be tuned via the parameter mapreduce.task.io.sort.mb (formerly io.sort.mb); the default is 100 MB.
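A minimal sketch of tuning this buffer in a driver (the 200 MB value here is purely illustrative):
// In the driver's main(), before creating the Job:
Configuration conf = new Configuration();
// Hadoop 2.x property name; io.sort.mb is the deprecated 1.x alias.
conf.setInt("mapreduce.task.io.sort.mb", 200); // map-side sort buffer, in MB
Job job = Job.getInstance(conf);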
Source-Code Walkthrough of the Flow
The Shuffle Mechanism
The data processing that takes place after the map() method and before the reduce() method is called Shuffle.
Partition
1. Motivation
2. The default Partitioner
Code notes:
See the code from the section on the MapReduce parallelism decision mechanism.
job.setNumReduceTasks(10);
job.setNumReduceTasks(5);
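For reference, the default partitioner is Hadoop's HashPartitioner, which dispatches records by the hash of the key:
package org.apache.hadoop.mapreduce.lib.partition;
import org.apache.hadoop.mapreduce.Partitioner;
public class HashPartitioner<K, V> extends Partitioner<K, V> {
    // Clear the sign bit, then take the index modulo the number of reduce tasks.
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}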
3. Steps to define a custom Partitioner: (1) create a class extending Partitioner<K, V> and override getPartition(); (2) register it in the driver with job.setPartitionerClass(); (3) set job.setNumReduceTasks() to the number of partitions.
4. Partitioning summary: if the number of ReduceTasks exceeds the number of partitions, the extra output files are simply empty; if 1 < ReduceTasks < number of partitions, some records have no destination and the job throws an exception; if ReduceTasks = 1, the partitioner is bypassed and a single output file is produced. Partition numbers must start from 0 and increase consecutively.
5. Case analysis
Hands-on Partition Example
1. Requirement
Output the traffic statistics into different files (partitions) according to the province each phone number belongs to.
(1) Input data
(2) Expected output data
Phone numbers starting with 136, 137, 138, and 139 each go into their own file (four files in total); numbers with any other prefix go into a fifth file.
Code walkthrough
pom file dependencies:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.study</groupId>
    <artifactId>mapreduce1205</artifactId>
    <version>1.0-SNAPSHOT</version>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>8</source>
                    <target>8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.8.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>jdk.tools</groupId>
            <artifactId>jdk.tools</artifactId>
            <version>1.8</version>
            <scope>system</scope>
            <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
        </dependency>
    </dependencies>
</project>
Create the MyPartitioner class.
MyPartitioner.java
package com.study.partition;

import com.study.flow.FlowBean;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartitioner extends Partitioner<Text, FlowBean> {
    @Override
    public int getPartition(Text text, FlowBean flowBean, int numPartitions) {
        String phone = text.toString();
        // Route each phone-number prefix to its own partition;
        // everything else falls into partition 4.
        switch (phone.substring(0, 3)) {
            case "136":
                return 0;
            case "137":
                return 1;
            case "138":
                return 2;
            case "139":
                return 3;
            default:
                return 4;
        }
    }
}
PartitionerDriver.java
package com.study.partition;

import com.study.flow.FlowBean;
import com.study.flow.FlowMapper;
import com.study.flow.FlowReducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class PartitionerDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Get a Job instance
        Job job = Job.getInstance(new Configuration());
        // 2. Set the jar/classpath
        job.setJarByClass(PartitionerDriver.class);
        // 3. Set the Mapper and Reducer
        job.setMapperClass(FlowMapper.class);
        job.setReducerClass(FlowReducer.class);
        // Five reduce tasks: one per partition returned by MyPartitioner
        job.setNumReduceTasks(5);
        job.setPartitionerClass(MyPartitioner.class);
        // 4. Set input/output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(FlowBean.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FlowBean.class);
        // 5. Set input/output paths
        FileInputFormat.setInputPaths(job, new Path("d:\\input"));
        FileOutputFormat.setOutputPath(job, new Path("d:\\output"));
        // 6. Submit
        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}
Run it; the console prints the job counters:
2019-11-23 14:34:09,044 INFO [org.apache.hadoop.mapreduce.Job] - Counters: 30
File System Counters
FILE: Number of bytes read=22800
FILE: Number of bytes written=1656440
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=22
Map output records=22
Map output bytes=789
Map output materialized bytes=863
Input split bytes=94
Combine input records=0
Combine output records=0
Reduce input groups=21
Reduce shuffle bytes=863
Reduce input records=22
Reduce output records=21
Spilled Records=44
Shuffled Maps =5
Failed Shuffles=0
Merged Map outputs=5
GC time elapsed (ms)=0
Total committed heap usage (bytes)=1406140416
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1178
File Output Format Counters
Bytes Written=431
Process finished with exit code 0
The output directory is generated on the D: drive, with one file per partition (part-r-00000 through part-r-00004).
Summary:
Note:
Here the number of reduce tasks is set to 5, matching the five partitions, so the job succeeds.
If the partitioner instead returns a partition number such as 8, exceeding the number of reduce tasks,
the job fails and the log reports an "Illegal partition" error.
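For illustration, the relevant driver settings behave as follows (assuming a partitioner that returns indices 0..4):
job.setNumReduceTasks(5);  // matches the partition count: part-r-00000 .. part-r-00004
job.setNumReduceTasks(8);  // more reduce tasks than partitions: three empty extra files
job.setNumReduceTasks(3);  // 1 < 3 < 5: records for partitions 3 and 4 fail the job
                           // with an "Illegal partition" IOException
job.setNumReduceTasks(1);  // exactly one: the partitioner is bypassed, single output file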
WritableComparable Sorting
1. Sorting overview
1. Requirement
Re-sort the results produced by case 2.3 by total traffic.
(1) Input data
Original data
Data after the first pass
(2) Expected output data
13509468723 7335 110349 117684
13736230513 2481 24681 27162
13956435636 132 1512 1644
13846544121 264 0 264
Code walkthrough:
The pom.xml dependencies are identical to those in the previous example.
FlowBean.java
package com.study.writablecomparable;

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// WritableComparable already extends Writable, so implementing it alone is enough.
public class FlowBean implements WritableComparable<FlowBean> {
    private long upFlow;
    private long downFlow;
    private long sumFlow;

    public FlowBean() {}

    @Override
    public String toString() {
        return upFlow + "\t" + downFlow + "\t" + sumFlow;
    }

    public void set(long upFlow, long downFlow) {
        this.upFlow = upFlow;
        this.downFlow = downFlow;
        this.sumFlow = upFlow + downFlow;
    }

    public long getUpFlow() {
        return upFlow;
    }

    public void setUpFlow(long upFlow) {
        this.upFlow = upFlow;
    }

    public long getDownFlow() {
        return downFlow;
    }

    public void setDownFlow(long downFlow) {
        this.downFlow = downFlow;
    }

    public long getSumFlow() {
        return sumFlow;
    }

    public void setSumFlow(long sumFlow) {
        this.sumFlow = sumFlow;
    }

    /**
     * Serialization method
     * @param out data sink provided by the framework
     * @throws IOException
     */
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    /**
     * Deserialization method
     * @param in data source provided by the framework
     * @throws IOException
     */
    @Override
    public void readFields(DataInput in) throws IOException {
        upFlow = in.readLong();
        downFlow = in.readLong();
        sumFlow = in.readLong();
    }

    @Override
    public int compareTo(FlowBean o) {
        // Descending order by total traffic: the shuffle sorts map output
        // keys with this method, so the largest sumFlow comes first.
        return Long.compare(o.sumFlow, this.sumFlow);
    }
}
SortReducer.java
package com.study.writablecomparable;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class SortReducer extends Reducer<FlowBean, Text, Text, FlowBean> {
    @Override
    protected void reduce(FlowBean key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Swap key and value back so each output line reads: phone \t flows
        for (Text value : values) {
            context.write(value, key);
        }
    }
}
SortMapper.java
package com.study.writablecomparable;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

// Emit FlowBean as the key and the phone number as the value, so the
// shuffle sorts records by FlowBean.compareTo (total traffic, descending).
public class SortMapper extends Mapper<LongWritable, Text, FlowBean, Text> {
    private FlowBean flow = new FlowBean();
    private Text phone = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Input line: phone \t upFlow \t downFlow \t sumFlow
        String[] fields = value.toString().split("\t");
        phone.set(fields[0]);
        flow.setUpFlow(Long.parseLong(fields[1]));
        flow.setDownFlow(Long.parseLong(fields[2]));
        flow.setSumFlow(Long.parseLong(fields[3]));
        context.write(flow, phone);
    }
}
SortDriver.java
package com.study.writablecomparable;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class SortDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(SortDriver.class);
        job.setMapperClass(SortMapper.class);
        job.setReducerClass(SortReducer.class);
        job.setMapOutputKeyClass(FlowBean.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FlowBean.class);
        // Read the previous job's output and write the sorted result
        FileInputFormat.setInputPaths(job, new Path("d:\\output"));
        FileOutputFormat.setOutputPath(job, new Path("d:\\output2"));
        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}
Run it; the console prints:
2019-11-23 16:45:46,415 INFO [org.apache.hadoop.mapreduce.Job] - Counters: 30
File System Counters
FILE: Number of bytes read=14248
FILE: Number of bytes written=1655326
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=21
Map output records=21
Map output bytes=753
Map output materialized bytes=825
Input split bytes=465
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=825
Reduce input records=21
Reduce output records=21
Spilled Records=42
Shuffled Maps =5
Failed Shuffles=0
Merged Map outputs=5
GC time elapsed (ms)=0
Total committed heap usage (bytes)=2881486848
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=423
File Output Format Counters
Bytes Written=387
Process finished with exit code 0
We can check on the D: drive:
the output2 directory is generated from the output directory.
Hands-on WritableComparable Sorting (Sorting Within Partitions)
1. Requirement
Within each province's output file, sort the phone numbers by total traffic.
2. Analysis
Building on the previous requirement, add a custom partitioner class that assigns partitions by the phone number's province prefix.
Code walkthrough:
MyPartitioner2.java
package com.study.writablecomparable2;

import com.study.writablecomparable.FlowBean;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartitioner2 extends Partitioner<FlowBean, Text> {
    @Override
    public int getPartition(FlowBean flowBean, Text text, int numPartitions) {
        // The phone number is now the map output *value*,
        // so the prefix is read from the Text parameter.
        switch (text.toString().substring(0, 3)) {
            case "136":
                return 0;
            case "137":
                return 1;
            case "138":
                return 2;
            case "139":
                return 3;
            default:
                return 4;
        }
    }
}
SortDriver.java
package com.study.writablecomparable2;

import com.study.writablecomparable.FlowBean;
import com.study.writablecomparable.SortMapper;
import com.study.writablecomparable.SortReducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class SortDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(SortDriver.class);
        job.setMapperClass(SortMapper.class);
        job.setReducerClass(SortReducer.class);
        job.setMapOutputKeyClass(FlowBean.class);
        job.setMapOutputValueClass(Text.class);
        // Partition by phone prefix; each partition is sorted independently
        job.setPartitionerClass(MyPartitioner2.class);
        job.setNumReduceTasks(5);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FlowBean.class);
        FileInputFormat.setInputPaths(job, new Path("d:\\output"));
        FileOutputFormat.setOutputPath(job, new Path("d:\\output2"));
        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}
Note: before launching, delete the output2 directory on the D: drive, then start the program.
After running the program, the console prints:
2019-11-23 17:06:28,627 INFO [org.apache.hadoop.mapreduce.Job] - Counters: 30
File System Counters
FILE: Number of bytes read=47274
FILE: Number of bytes written=2768858
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=21
Map output records=21
Map output bytes=753
Map output materialized bytes=945
Input split bytes=465
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=945
Reduce input records=21
Reduce output records=21
Spilled Records=42
Shuffled Maps =25
Failed Shuffles=0
Merged Map outputs=25
GC time elapsed (ms)=8
Total committed heap usage (bytes)=5505024000
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=423
File Output Format Counters
Bytes Written=435
Process finished with exit code 0
We can check on the D: drive:
output2 is generated from the output directory.
Each partition file contains only phone numbers with the same prefix, sorted internally by total traffic (sorting within each partition).
Combiner
The Combiner runs on the map side, pre-aggregating map output during spill and merge so that less data is shuffled to the reducers; it applies only when local aggregation does not change the final business result (sums are safe, averages are not).
Steps to implement a custom Combiner: define a class that extends Reducer and override its reduce() method; a minimal sketch follows.
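A minimal sketch of a word-count style combiner, assuming Text keys and IntWritable counts (the class and wiring here are illustrative, not part of the flow example above):
package com.study.combiner;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable sum = new IntWritable();
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable value : values) {
            total += value.get(); // partial sum computed on the map side
        }
        sum.set(total);
        context.write(key, sum); // in/out types must match the map output types
    }
}
Register it in the driver with job.setCombinerClass(WordCountCombiner.class); this works because summing is associative, so combining locally does not change the final totals.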
Hands-on GroupingComparator Example
Requirement
Given the order data shown below,
find the most expensive product in each order.
(1) Input data
(2) Expected output data
Code walkthrough
The pom.xml is the same as in the earlier examples.
OrderBean.java
package com.study.groupingcomparator;

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class OrderBean implements WritableComparable<OrderBean> {
    private String orderId;
    private String productId;
    private double price;

    public String getOrderId() {
        return orderId;
    }

    public void setOrderId(String orderId) {
        this.orderId = orderId;
    }

    public String getProductId() {
        return productId;
    }

    public void setProductId(String productId) {
        this.productId = productId;
    }

    public double getPrice() {
        return price;
    }

    public void setPrice(double price) {
        this.price = price;
    }

    @Override
    public String toString() {
        return orderId + "\t" + productId + "\t" + price;
    }

    @Override
    public int compareTo(OrderBean o) {
        // Primary: order id ascending; secondary: price descending,
        // so the most expensive product of each order comes first.
        int compare = this.orderId.compareTo(o.orderId);
        if (compare == 0) {
            return Double.compare(o.price, this.price);
        } else {
            return compare;
        }
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(orderId);
        out.writeUTF(productId);
        out.writeDouble(price);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.orderId = in.readUTF();
        this.productId = in.readUTF();
        this.price = in.readDouble();
    }
}
OrderMapper.java
package com.study.groupingcomparator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class OrderMapper extends Mapper<LongWritable, Text, OrderBean, NullWritable> {
    private OrderBean orderBean = new OrderBean();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Input line: orderId \t productId \t price
        String[] fields = value.toString().split("\t");
        orderBean.setOrderId(fields[0]);
        orderBean.setProductId(fields[1]);
        orderBean.setPrice(Double.parseDouble(fields[2]));
        context.write(orderBean, NullWritable.get());
    }
}
OrderComparator.java
package com.study.groupingcomparator;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class OrderComparator extends WritableComparator {
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        // Group solely by orderId: all records of one order are fed into the
        // same reduce() call, already sorted by price descending.
        OrderBean oa = (OrderBean) a;
        OrderBean ob = (OrderBean) b;
        return oa.getOrderId().compareTo(ob.getOrderId());
    }
}
OrderReducer.java
package com.study.groupingcomparator;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.Iterator;

public class OrderReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable> {
    @Override
    protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        // Records in each group arrive sorted by price descending;
        // write at most the top two per order.
        Iterator<NullWritable> iterator = values.iterator();
        for (int i = 0; i < 2; i++) {
            if (iterator.hasNext()) {
                context.write(key, iterator.next());
            }
        }
    }
}
OrderDriver.java
package com.study.groupingcomparator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class OrderDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(OrderDriver.class);
        job.setMapperClass(OrderMapper.class);
        job.setReducerClass(OrderReducer.class);
        job.setMapOutputKeyClass(OrderBean.class);
        job.setMapOutputValueClass(NullWritable.class);
        // Group reduce input by orderId only (not by the full OrderBean)
        job.setGroupingComparatorClass(OrderComparator.class);
        job.setOutputKeyClass(OrderBean.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.setInputPaths(job, new Path("d:\\input"));
        FileOutputFormat.setOutputPath(job, new Path("d:\\output"));
        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}
After the code is written and before launching the program, place the data files under the input directory on the D: drive.
If an output directory already exists on the D: drive, it must be deleted before starting the program.
After deleting it, run the program; this time the console log shows an error:
java.lang.Exception: java.lang.NullPointerException
Fix: the NPE occurs because WritableComparator only instantiates key objects when asked to; the custom comparator needs a constructor that calls super(OrderBean.class, true), for example:
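// Add to OrderComparator: register the key class and let WritableComparator
// create OrderBean instances; otherwise compare() receives nulls and throws NPE.
protected OrderComparator() {
    super(OrderBean.class, true);
}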
With the constructor added, run the program again; the console prints:
2019-11-23 18:35:31,522 INFO [org.apache.hadoop.mapreduce.Job] - Counters: 30
File System Counters
FILE: Number of bytes read=1038
FILE: Number of bytes written=547529
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=7
Map output records=7
Map output bytes=175
Map output materialized bytes=195
Input split bytes=103
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=195
Reduce input records=7
Reduce output records=6
Spilled Records=14
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=468713472
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=150
File Output Format Counters
Bytes Written=136
Process finished with exit code 0
We can check on the D: drive: the output directory is generated from the input directory.
The results come out sorted by order id.
Now add one more order to the input data and adjust the reducer.
First approach: replace the top-two loop with a plain for-each over all values:
OrderReducer.java
package com.study.groupingcomparator;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class OrderReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable> {
    int i = 0; // counts reduce() invocations (one per group); not written to the output

    @Override
    protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        i++;
        // Because Hadoop reuses the key object, each step of this loop
        // deserializes the next record into `key`, so every record of the
        // group is written, not the first record repeated.
        for (NullWritable value : values) {
            context.write(key, value);
        }
    }
}
Second approach:
OrderReducer.java
package com.study.groupingcomparator;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.Iterator;

public class OrderReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable> {
    int i = 0; // counts reduce() invocations (one per group); not written to the output

    @Override
    protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        i++;
        // Equivalent to the for-each above, written with an explicit iterator:
        // each iterator.next() refills the reused key object with the next record.
        Iterator<NullWritable> iterator = values.iterator();
        while (iterator.hasNext()) {
            NullWritable value = iterator.next();
            context.write(key, value);
        }
    }
}
To emit each order id only once (deduplicated: on entering reduce(), the key holds the group's first and therefore highest-priced record):
OrderReducer.java
package com.study.groupingcomparator;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class OrderReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable> {
    @Override
    protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        // Without iterating the values, the key still holds the group's first
        // record, which is the most expensive product of the order.
        context.write(key, NullWritable.get());
    }
}
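The difference between the three variants comes down to object reuse: Hadoop hands reduce() a single key object and a single value object, and every iterator.next() deserializes the next record of the group into them, so the key always mirrors the record currently being iterated. Sketched with hypothetical group contents:
// One group = one orderId, records sorted by price descending:
//   on entering reduce():  key = record 1 (highest price)
//   after 1st next():      key = record 1
//   after 2nd next():      key = record 2  <- key fields overwritten
// Writing the key once without iterating emits the max-price record;
// writing it inside the loop emits every record of the group.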