1 Introduction
Diagnosing problems in a Hadoop MapReduce job after it has been submitted to a cluster is painful: you often end up modifying code and adding log output over and over to track down a tiny issue, and with large data volumes each debugging round takes a long time. Good unit testing therefore helps eliminate obvious bugs early (unit tests alone are not enough, of course, since the test environment still differs from the cluster).
Unit testing MapReduce code has some obstacles, however. Several parameters of Map and Reduce are objects injected at runtime by the Hadoop framework, such as OutputCollector, Reporter, and InputSplit, so some form of mocking is required. When I first wrote MapReduce unit tests I wrote a few simple mocks myself, which mostly did the job, but later found that MRUnit worked better, so after some study I adopted it. MRUnit is a unit-testing framework written specifically for Hadoop MapReduce; its API is concise, clear, and practical. It does have weak spots, though, such as no support for MultipleOutputs (which is often used for multi-file output; a later section shows how to extend MRUnit to support it).
2 MRUnit
MRUnit provides several drivers for different test targets:
- MapDriver: tests a Map in isolation.
- ReduceDriver: tests a Reduce in isolation.
- MapReduceDriver: tests a Map and Reduce chained together.
- PipelineMapReduceDriver: tests a pipeline of multiple Map-Reduce pairs.
MapDriver
An example of testing a Map on its own. Suppose we want to compute a seller's average shipping time; the Map collects the time interval of each shipment. The test for the Map (the TimeInfo value type it emits is sketched after the listing):
// The Map under test
private Map mapper;
private MapDriver<LongWritable, Text, Text, TimeInfo> mapDriver;

@Before
public void setUp() {
    mapper = new Map();
    mapDriver = new MapDriver<LongWritable, Text, Text, TimeInfo>();
}

@Test
public void testMap_timeFormat2() {
    String sellerId = "444";
    // Simulate one input line (withInput); assume from this line we can
    // extract the seller (sellerId) and one shipping interval of 10 hours.
    // We expect the output key to be sellerId and the value to be a TimeInfo
    // representing 1 shipment totaling 10 hours (withOutput).
    // The test passes if the Map turns this input into the expected output.
    Text mapInputValue = new Text("……");
    mapDriver.withMapper(mapper)
             .withInput(null, mapInputValue)
             .withOutput(new Text(sellerId), new TimeInfo(1, 10))
             .runTest();
}
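The examples rely on a custom TimeInfo writable that is not listed in the article. A minimal sketch, assuming it carries a shipment count and a total number of hours (field names and types are assumptions), could look like this; note that equals matters, because MRUnit compares expected and actual values with it:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical sketch of the TimeInfo value type: a count of shipments
// and the total hours they took.
public class TimeInfo implements Writable {
    private int count;
    private long hours;

    public TimeInfo() {} // no-arg constructor required by Hadoop serialization

    public TimeInfo(int count, long hours) {
        this.count = count;
        this.hours = hours;
    }

    public int getCount() { return count; }
    public long getHours() { return hours; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(count);
        out.writeLong(hours);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        count = in.readInt();
        hours = in.readLong();
    }

    @Override
    public boolean equals(Object o) { // used by MRUnit's withOutput comparison
        if (!(o instanceof TimeInfo)) return false;
        TimeInfo other = (TimeInfo) o;
        return count == other.count && hours == other.hours;
    }

    @Override
    public int hashCode() {
        return 31 * count + (int) (hours ^ (hours >>> 32));
    }
}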
ReduceDriver
Testing the Reduce in isolation, continuing the same example. The Reduce computes the average time from the totals of the n intervals emitted by the Map or Combiner (a sketch of such a reducer follows the test below).
private Reduce reducer;
private ReduceDriver<Text, TimeInfo, Text, LongWritable> reduceDriver;

@Before
public void setUp() {
    reducer = new Reduce();
    reduceDriver = new ReduceDriver<Text, TimeInfo, Text, LongWritable>(reducer);
}

@Test
public void testReduce() {
    List<TimeInfo> values = new ArrayList<TimeInfo>();
    values.add(new TimeInfo(1, 3)); // 1 shipment, 3 hours
    values.add(new TimeInfo(2, 5)); // 2 shipments totaling 5 hours
    values.add(new TimeInfo(3, 7)); // 3 shipments totaling 7 hours
    // values is the reduce input for seller 444; we expect an average
    // of 2 hours (15 hours over 6 shipments, with integer division).
    reduceDriver.withReducer(reducer)
                .withInput(new Text("444"), values)
                .withOutput(new Text("444"), new LongWritable(2))
                .runTest();
}
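The Reduce under test is not shown in the article either. A minimal sketch consistent with the test above, assuming the TimeInfo fields sketched earlier and integer averaging (an illustration, not the original implementation):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical sketch of the Reduce: averages shipping time per seller
// from the (count, totalHours) pairs emitted by the Map or Combiner.
public class Reduce extends Reducer<Text, TimeInfo, Text, LongWritable> {
    @Override
    protected void reduce(Text sellerId, Iterable<TimeInfo> values, Context context)
            throws IOException, InterruptedException {
        long totalCount = 0;
        long totalHours = 0;
        for (TimeInfo info : values) {
            totalCount += info.getCount();
            totalHours += info.getHours();
        }
        // Integer division: 15 hours over 6 shipments yields 2.
        context.write(sellerId, new LongWritable(totalHours / totalCount));
    }
}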
MapReduceDriver
An example testing the Map and Reduce chained together:
private MapReduceDriver<LongWritable, Text, Text, TimeInfo, Text, LongWritable> mrDriver;
private Map mapper;
private Reduce reducer;

@Before
public void setUp() {
    mapper = new Map();
    reducer = new Reduce();
    mrDriver = new MapReduceDriver<LongWritable, Text, Text, TimeInfo, Text, LongWritable>(mapper, reducer);
}

@Test
public void testMapReduce_3record_1user() {
    Text mapInputValue1 = new Text("……");
    Text mapInputValue2 = new Text("……");
    Text mapInputValue3 = new Text("……");
    // From these three map input lines we expect the reduce output
    // to show an average time of 2 hours for seller 444.
    mrDriver.withInput(null, mapInputValue1)
            .withInput(null, mapInputValue2)
            .withInput(null, mapInputValue3)
            .withOutput(new Text("444"), new LongWritable(2))
            .runTest();
}
3 Enhancing MRUnit
This section describes several features added on top of MRUnit to make it easier to use: support for MultipleOutputs, loading data sets from files, and automatic wiring.
Supporting MultipleOutputs
In many scenarios the reduce phase uses MultipleOutputs to write to multiple files, which MRUnit does not support. After studying the source, I extended MRUnit with two additional drivers, ReduceMultipleOutputsDriver and MapReduceMultipleOutputDriver, to support MultipleOutputs.
ReduceMultipleOutputsDriver
ReduceMultipleOutputsDriver is an enhanced version of ReduceDriver. If the Reduce in the earlier example writes through MultipleOutputs, the plain ReduceDriver test fails.
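For reference, a reducer writing through MultipleOutputs (new-style mapreduce API) typically looks like the following sketch. The bucketing by sellerId % 8 mirrors the test below; everything else is an assumption, and each named output would still have to be registered on the Job with MultipleOutputs.addNamedOutput:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical sketch: same averaging logic as before, but the result goes
// to a named output chosen by sellerId % 8 instead of context.write.
public class Reduce extends Reducer<Text, TimeInfo, Text, LongWritable> {
    private MultipleOutputs<Text, LongWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, LongWritable>(context);
    }

    @Override
    protected void reduce(Text sellerId, Iterable<TimeInfo> values, Context context)
            throws IOException, InterruptedException {
        long totalCount = 0;
        long totalHours = 0;
        for (TimeInfo info : values) {
            totalCount += info.getCount();
            totalHours += info.getHours();
        }
        int bucket = Integer.parseInt(sellerId.toString()) % 8;
        // Each named output ("somePrefix0" .. "somePrefix7") becomes its own file.
        mos.write("somePrefix" + bucket, sellerId, new LongWritable(totalHours / totalCount));
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }
}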
Reworking the test case above with ReduceMultipleOutputsDriver (the changed lines are marked with comments):
private Reduce reducer;
private ReduceMultipleOutputsDriver<Text, TimeInfo, Text, LongWritable> reduceDriver;

@Before
public void setUp() {
    reducer = new Reduce();
    // Note: ReduceDriver is replaced by ReduceMultipleOutputsDriver
    reduceDriver = new ReduceMultipleOutputsDriver<Text, TimeInfo, Text, LongWritable>(reducer);
}

@Test
public void testReduce() {
    List<TimeInfo> values = new ArrayList<TimeInfo>();
    values.add(new TimeInfo(1, 3)); // 1 shipment, 3 hours
    values.add(new TimeInfo(2, 5)); // 2 shipments totaling 5 hours
    values.add(new TimeInfo(3, 7)); // 3 shipments totaling 7 hours
    // values is the reduce input for seller 444;
    // we expect an average of 2 hours.
    reduceDriver.withReducer(reducer)
                .withInput(new Text("444"), values)
                // Note: assuming output files are bucketed by id (444) % 8,
                // we expect the collector named "somePrefix" + 444 % 8
                // to receive this record.
                .withMutiOutput("somePrefix" + 444 % 8, new Text("444"), new LongWritable(2))
                .runTest();
}
MapReduceMultipleOutputDriver
Similar to ReduceMultipleOutputsDriver, MapReduceMultipleOutputDriver supports combined Map-Reduce tests where the Reduce uses MultipleOutputs. The example from the MapReduceDriver section becomes:
private MapReduceMultipleOutputDriver<LongWritable, Text, Text, TimeInfo, Text, LongWritable> mrDriver;
private Map mapper;
private Reduce reducer;

@Before
public void setUp() {
    mapper = new Map();
    reducer = new Reduce();
    // Note: MapReduceDriver is replaced by MapReduceMultipleOutputDriver
    mrDriver = new MapReduceMultipleOutputDriver<LongWritable, Text, Text, TimeInfo, Text, LongWritable>(mapper, reducer);
}

@Test
public void testMapReduce_3record_1user() {
    Text mapInputValue1 = new Text("……");
    Text mapInputValue2 = new Text("……");
    Text mapInputValue3 = new Text("……");
    // From these three map input lines we expect the reduce output
    // to show an average time of 2 hours for seller 444.
    mrDriver.withInput(null, mapInputValue1)
            .withInput(null, mapInputValue2)
            .withInput(null, mapInputValue3)
            // We expect the collector named "somePrefix" + 444 % 8 to receive this record.
            .withMutiOutput("somePrefix" + 444 % 8, new Text("444"), new LongWritable(2))
            .runTest();
}
Loading input from files
As the examples above show, MRUnit tests involve a lot of similar boilerplate, and the input data has to be hard-coded, which is not very elegant; loading the data from a file would be much more convenient. To solve this, MRUnit was further enhanced with annotations and an extended JUnit runner.
The example below reworks the earlier test so that the map input is loaded from a file automatically, eliminating most of the direct MRUnit API calls.
@RunWith(MRUnitJunit4TestClassRunner.class)
public class XXXMRUseAnnotationTest {
    // mrDriver is initialized automatically, and data is loaded if requested
    @MapInputSet
    @MapReduce(mapper = Map.class, reducer = Reduce.class)
    private MapReduceMultipleOutputDriver<LongWritable, Text, Text, TimeInfo, Text, LongWritable> mrDriver;

    @Test
    @MapInputSet("ConsignTimeMRUseAnnotationTest.txt") // input data is loaded from this file
    public void testMapReduce_3record_1user() {
        // only the verification code needs to be written
        mrDriver.withMutiOutput("somePrefix" + 444 % 8, new Text("444"), new LongWritable(2))
                .runTest();
    }
}
Sample code:
Mapper
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SMSCDRMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text status = new Text();
    private static final IntWritable addOne = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws java.io.IOException, InterruptedException {
        // Sample record format: 655209;1;796764372490213;804422938115889;6
        String[] line = value.toString().split(";");
        // If the record is an SMS CDR
        if (Integer.parseInt(line[1]) == 1) {
            status.set(line[4]);
            context.write(status, addOne);
        }
    }
}
Reducer:
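A minimal reducer consistent with the tests below, summing the per-status counts emitted by the mapper (a sketch; the details of the original implementation may differ):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch of SMSCDRReducer: sums the per-status counts emitted by SMSCDRMapper.
public class SMSCDRReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}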
Test class:
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class SMSCDRMapperReducerTest {
    MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
    ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
    MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

    @Before
    public void setup() {
        SMSCDRMapper mapper = new SMSCDRMapper();
        SMSCDRReducer reducer = new SMSCDRReducer();
        mapDriver = MapDriver.newMapDriver(mapper);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
    }

    @Test
    public void testMapper() {
        mapDriver.withInput(new LongWritable(), new Text(
                "655209;1;796764372490213;804422938115889;6"));
        mapDriver.withOutput(new Text("6"), new IntWritable(1));
        mapDriver.runTest();
    }

    @Test
    public void testReducer() {
        List<IntWritable> values = new ArrayList<IntWritable>();
        values.add(new IntWritable(1));
        values.add(new IntWritable(1));
        reduceDriver.withInput(new Text("6"), values);
        reduceDriver.withOutput(new Text("6"), new IntWritable(2));
        reduceDriver.runTest();
    }
}
To summarize the above:
MRUnit overview:
When a Hadoop MapReduce job runs in a cluster, locating problems is troublesome: you may repeatedly modify code and print logs to track down a small issue, and debugging is very time-consuming when the data volume is large. Good unit testing is therefore needed to eliminate obvious bugs early. Unit testing MapReduce faces an obstacle, however: several Map and Reduce parameters, such as OutputCollector, Reporter, and InputSplit, are objects passed in at runtime by the Hadoop framework, so another mechanism is required to supply them. MRUnit is a unit-testing framework written specifically for Hadoop MapReduce, with a clear, concise, and practical API. It also has weak spots, such as no support for MultipleOutputs (often used for multi-file output; see the enhancement described above).
Installing MRUnit:
To use MRUnit in an existing Hadoop project, follow these steps:
(1) Download the latest MRUnit from http://mrunit.apache.org/. I use Hadoop 1.0.4, so the file downloaded is apache-mrunit-1.0.0-hadoop1-bin.tar.gz.
(2) Unpack the archive to obtain hamcrest-core-1.1.jar, junit-4.10.jar, mockito-all-1.8.5.jar, and mrunit-1.0.0-hadoop1.jar.
(3) Add these four jars to the project's build path. In Eclipse: select the project --> right-click Build Path --> Configure Build Path --> Add External JARs.
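If the project is managed with Maven instead, the same library should be available as the dependency org.apache.mrunit:mrunit:1.0.0 with classifier hadoop1 (classifier hadoop2 for Hadoop 2.x builds).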
MRUnit example:
The classic introductory wordcount program illustrates what MRUnit can do.
Mapper:
package com.hadoop;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TxtMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws java.io.IOException, InterruptedException {
        String[] strs = value.toString().split(" ");
        for (String str : strs) {
            context.write(new Text(str), new IntWritable(1));
        }
    }
}
Reducer:
package com.hadoop;

import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TxtReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws java.io.IOException, InterruptedException {
        int sum = 0;
        Iterator<IntWritable> it = values.iterator();
        while (it.hasNext()) {
            IntWritable value = it.next();
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
Test class:
package com.hadoop;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class MapTest {
    private Mapper<LongWritable, Text, Text, IntWritable> mapper;
    private MapDriver<LongWritable, Text, Text, IntWritable> driver;

    @Before
    public void init() {
        mapper = new TxtMapper();
        driver = new MapDriver<LongWritable, Text, Text, IntWritable>(mapper);
    }

    @Test
    public void testMap() throws Exception {
        String text = "hello world goodbye world hello hadoop goodbye hadoop";
        driver.withInput(new LongWritable(), new Text(text))
              .withOutput(new Text("hello"), new IntWritable(1))
              .withOutput(new Text("world"), new IntWritable(1))
              .withOutput(new Text("goodbye"), new IntWritable(1))
              .withOutput(new Text("world"), new IntWritable(1))
              .withOutput(new Text("hello"), new IntWritable(1))
              .withOutput(new Text("hadoop"), new IntWritable(1))
              .withOutput(new Text("goodbye"), new IntWritable(1))
              .withOutput(new Text("hadoop"), new IntWritable(1))
              .runTest();
    }
}
Select the test method and choose Run As > JUnit Test; a green progress bar means the test passed.
If the last expectation is instead written as withOutput(new Text("hadoop"), new IntWritable(2)).runTest(), the following errors appear:
13/09/26 15:58:16 ERROR mrunit.TestDriver: Received unexpected output (hadoop, 1) at position 7.
13/09/26 15:58:16 ERROR mrunit.TestDriver: Missing expected output (hadoop, 2) at position 7.
This shows that MRUnit is doing its job.
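One closing tip: besides runTest(), MRUnit drivers also provide run(), which returns the actual output pairs so you can write your own assertions. A small sketch, reusing the driver field from the MapTest class above (it additionally needs imports for java.util.List, org.apache.hadoop.mrunit.types.Pair, and org.junit.Assert.assertEquals):

@Test
public void testMapWithRun() throws Exception {
    // run() executes the mapper and returns its output instead of asserting internally
    List<Pair<Text, IntWritable>> output = driver
            .withInput(new LongWritable(), new Text("hello hello"))
            .run();
    assertEquals(2, output.size());
    assertEquals(new Text("hello"), output.get(0).getFirst());
    assertEquals(new IntWritable(1), output.get(0).getSecond());
}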