Whether or not to write tests is a personal choice. For me, writing tests is not about looking sophisticated; it is about having more confidence in my code.
Testing MapReduce is admittedly not that convenient, but there are ways. The material below is adapted mostly from the MRUnit Tutorial, which additionally covers testing Counters (i.e., how to read Counter values) and passing parameters via Configuration (i.e., how to get at the conf object inside the mock); a rough sketch of both appears at the end of section 2.
1. The Basics - JUnit
If you don't know JUnit there is not much I can say; fortunately, few people don't.
Maven coordinates: junit:junit
import org.junit.*;
import static org.junit.Assert.*;

public class TestCases {
    @Test
    public void testXXX() {
        assertEquals(1, 1); // assertEquals takes (expected, actual)
    }
}
This is the foundation of functional testing: in general, anything that does not depend much on the environment can be tested at the function level with JUnit. Only once this is done is it worth moving on to the Mapper and Reducer tests below.
2. MapReduce Mock - MRUnit
Maven coordinates
<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <classifier>hadoop2</classifier>
    <scope>test</scope>
</dependency>
Note: the classifier must be set explicitly to pick hadoop1 or hadoop2; the two differ in their APIs.
Below, WordCount serves as the example for how to test each part.
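For context, the tests assume a WordCount job with static nested Map and Reduce classes; the real thing lives in the complete project linked in the appendix. Here is a minimal sketch of what the tests expect (the tokenization details are my assumption):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (token, 1) for every whitespace-separated token.
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all partial counts for the word.
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}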
2.1 Testing the Mapper
- Initialize a MapDriver
WordCount.Map mapper = new WordCount.Map();
mapDriver = MapDriver.newMapDriver(mapper);
- Feed it input and check the output
@Test
public void testMapper() throws IOException {
    mapDriver.withInput(new LongWritable(), new Text("a b a"))
            .withAllOutput(Lists.newArrayList(
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                    new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
            ))
            .runTest();
}
In most cases a test cannot be written this elegantly, for example when float/double values are involved; then you need to pull the results out and check them yourself (which is clearly the more flexible approach anyway).
@Test
public void testMapper2() throws IOException {
    mapDriver.withInput(new LongWritable(), new Text("a b a"));
    List<Pair<Text, IntWritable>> actual = mapDriver.run();
    List<Pair<Text, IntWritable>> expected = Lists.newArrayList(
            new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
            new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
            new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
    );
    // apache commons-collections: element equality that takes each element's multiplicity into account
    assertTrue(CollectionUtils.isEqualCollection(actual, expected));
    assertEquals(1, actual.get(0).getSecond().get());
}
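To make the float/double point concrete, here is a sketch of how such a comparison might look; ScoreMapper and its 0.5 output are made-up stand-ins, not part of this project:

// Hypothetical Mapper emitting DoubleWritable scores (ScoreMapper is made up).
MapDriver<LongWritable, Text, Text, DoubleWritable> scoreDriver =
        MapDriver.newMapDriver(new ScoreMapper());
scoreDriver.withInput(new LongWritable(), new Text("a"));
List<Pair<Text, DoubleWritable>> results = scoreDriver.run();
// Floating-point output needs an explicit tolerance rather than exact equality.
assertEquals(0.5, results.get(0).getSecond().get(), 1e-9);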
2.2 Testing the Reducer
- As with the Mapper, first initialize a ReduceDriver
WordCount.Reduce reducer = new WordCount.Reduce();
reduceDriver = ReduceDriver.newReduceDriver(reducer);
- Feed it input and check the output
@Test
public void testReducer() throws IOException {
    List<IntWritable> values = Lists.newArrayList();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    reduceDriver.withInput(new Text("a"), values);
    reduceDriver.withOutput(new Text("a"), new IntWritable(2));
    reduceDriver.runTest();
}
2.3 Testing the Whole Flow
- Initialize all three pieces: a MapDriver, a ReduceDriver, and a MapReduceDriver
WordCount.Map mapper = new WordCount.Map();
WordCount.Reduce reducer = new WordCount.Reduce();
mapDriver = MapDriver.newMapDriver(mapper);
reduceDriver = ReduceDriver.newReduceDriver(reducer);
mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
- Set the Map input and check the Reduce output
@Test
public void testMapReduce() throws IOException {
    mapReduceDriver.withInput(new LongWritable(), new Text("a b a"))
            .withInput(new LongWritable(), new Text("a b b"))
            .withAllOutput(Lists.newArrayList(
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(3)),
                    new Pair<Text, IntWritable>(new Text("b"), new IntWritable(3))))
            .runTest();
}
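As promised at the top, the Counter and Configuration features from the MRUnit Tutorial look roughly like this. This is an unverified sketch: the counter group/name and the configuration key are invented, and it assumes a Mapper that actually increments that counter and reads that key.

// Sketch only: "wc"/"lines" and "wc.separator" are hypothetical names.
mapDriver.getConfiguration().set("wc.separator", " ");  // pass a parameter into the mocked job
mapDriver.withInput(new LongWritable(), new Text("a b a"))
        .withCounter("wc", "lines", 1L)                 // assert an expected counter value
        .runTest();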
3. Appendix
- Complete project
- pom dependencies
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.mrunit</groupId>
        <artifactId>mrunit</artifactId>
        <version>1.1.0</version>
        <classifier>hadoop2</classifier>
        <scope>test</scope>
    </dependency>
    <!-- junit is also required by the test code; the version here is an assumption -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
</dependencies>
- Code
package du00.tests;

import com.google.common.collect.Lists;
import org.apache.commons.collections.CollectionUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.*;

import static org.junit.Assert.*;

import java.io.IOException;
import java.util.List;

public class WordCountTest {
    MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
    ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
    MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

    @Before
    public void setUp() {
        WordCount.Map mapper = new WordCount.Map();
        WordCount.Reduce reducer = new WordCount.Reduce();
        mapDriver = MapDriver.newMapDriver(mapper);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
    }

    @Test
    public void testMapper() throws IOException {
        mapDriver.withInput(new LongWritable(), new Text("a b a"))
                .withAllOutput(Lists.newArrayList(
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                        new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
                ))
                .runTest();
    }

    /**
     * Sometimes the results are more complex, and pulling them out to compare
     * just part of each record is the better choice, e.g. when a field of the
     * output object is a double.
     *
     * @throws IOException
     */
    @Test
    public void testMapper2() throws IOException {
        mapDriver.withInput(new LongWritable(), new Text("a b a"));
        List<Pair<Text, IntWritable>> actual = mapDriver.run();
        List<Pair<Text, IntWritable>> expected = Lists.newArrayList(
                new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
        );
        // apache commons-collections: element equality that takes each element's multiplicity into account
        assertTrue(CollectionUtils.isEqualCollection(actual, expected));
        assertEquals(1, actual.get(0).getSecond().get());
    }

    @Test
    public void testReducer() throws IOException {
        List<IntWritable> values = Lists.newArrayList();
        values.add(new IntWritable(1));
        values.add(new IntWritable(1));
        reduceDriver.withInput(new Text("a"), values);
        reduceDriver.withOutput(new Text("a"), new IntWritable(2));
        reduceDriver.runTest();
    }

    @Test
    public void testMapReduce() throws IOException {
        mapReduceDriver.withInput(new LongWritable(), new Text("a b a"))
                .withInput(new LongWritable(), new Text("a b b"))
                .withAllOutput(Lists.newArrayList(
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(3)),
                        new Pair<Text, IntWritable>(new Text("b"), new IntWritable(3))))
                .runTest();
    }
}