Topic: How to Write Tests for MapReduce

Whether to write tests is a personal choice. For me, writing tests is not about showing off; it is about having more confidence in my own code.

  Testing MapReduce is admittedly less convenient, but it can be done. The content below is adapted mainly from the MRUnit Tutorial, which additionally covers testing Counters (i.e., how to read Counter values) and passing parameters via Configuration (how to obtain the conf object inside a mock).

1. The Basics - JUnit

  If you don't know JUnit there is not much I can say; fortunately, few people don't.

Maven coordinates: junit:junit

import static org.junit.Assert.*;

import org.junit.*;

public class TestCases {
    @Test
    public void testXXX() {
        assertEquals(1, 1);
    }
}

  This part is the foundation of functional testing for your code. Generally, anything that does not depend on the runtime environment can be covered by function-level JUnit tests. Only after this is done is it worth moving on to the Mapper and Reducer tests below.

2. MapReduce Mock - MRUnit

Maven coordinates

<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <classifier>hadoop2</classifier>
    <scope>test</scope>
</dependency>

  Note: you must explicitly set the classifier to choose between hadoop1 and hadoop2; the two differ in their APIs.
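For a Hadoop 1.x cluster, the same dependency would be declared with the hadoop1 classifier instead (shown here for completeness):

```xml
<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <classifier>hadoop1</classifier>
    <scope>test</scope>
</dependency>
```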
  
  The following uses WordCount as an example to show how to test each part.
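The tests below exercise a WordCount job with inner classes WordCount.Map and WordCount.Reduce, which this excerpt does not show. Stripped of the Hadoop types, the logic they are assumed to implement is the classic pattern; a minimal pure-Java sketch (WordCountLogic is a hypothetical name for illustration):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the logic assumed in WordCount.Map / WordCount.Reduce.
public class WordCountLogic {
    // Map phase: split each input line on whitespace, emit (word, 1) per token.
    static List<AbstractMap.SimpleEntry<String, Integer>> map(String line) {
        List<AbstractMap.SimpleEntry<String, Integer>> out = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            out.add(new AbstractMap.SimpleEntry<>(token, 1));
        }
        return out;
    }

    // Reduce phase: sum all counts that arrived for one word.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }
}
```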

2.1 Testing the Mapper

  • Initialize a MapDriver
WordCount.Map mapper = new WordCount.Map();
mapDriver = MapDriver.newMapDriver(mapper);
  • Feed it input and check the output
@Test
public void testMapper() throws IOException {
    mapDriver.withInput(new LongWritable(), new Text("a b a"))
            .withAllOutput(Lists.newArrayList(
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                    new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
            ))
            .runTest();
}

  In most cases, tests cannot be written this elegantly. When float/double values are involved, for example, you need to pull the results out and check them yourself (which is clearly the more flexible approach anyway).

@Test
public void testMapper2() throws IOException {
    mapDriver.withInput(new LongWritable(), new Text("a b a"));
    List<Pair<Text, IntWritable>> actual = mapDriver.run();

    List<Pair<Text, IntWritable>> expected = Lists.newArrayList(
            new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
            new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
            new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
    );

    // apache commons-collections: element-equality check that also accounts for each element's multiplicity
    assertTrue(CollectionUtils.isEqualCollection(actual, expected));

    assertEquals(1, actual.get(0).getSecond().get());
}
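For float/double fields, this extracted-result style pairs naturally with a tolerance (delta) comparison rather than exact equality; a minimal sketch with plain doubles (JUnit's assertEquals(expected, actual, delta) performs the same check):

```java
public class DeltaCompare {
    public static void main(String[] args) {
        double actual = 0.1 + 0.2;  // 0.30000000000000004 in IEEE-754 doubles

        // Exact comparison is unreliable for floating point:
        System.out.println(actual == 0.3);                  // false

        // Tolerance-based comparison is what tests should use:
        double delta = 1e-9;
        System.out.println(Math.abs(actual - 0.3) < delta); // true
    }
}
```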

2.2 Testing the Reducer

  • As with the Mapper, first initialize a ReduceDriver
WordCount.Reduce reducer = new WordCount.Reduce();
reduceDriver = ReduceDriver.newReduceDriver(reducer);
  • Feed it input and check the output
@Test
public void testReducer() throws IOException {
    List<IntWritable> values = Lists.newArrayList();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    reduceDriver.withInput(new Text("a"), values);
    reduceDriver.withOutput(new Text("a"), new IntWritable(2));
    reduceDriver.runTest();
}

2.3 Testing the Whole Pipeline

  • Initialize all three parts: MapDriver, ReduceDriver, and MapReduceDriver
WordCount.Map mapper = new WordCount.Map();
WordCount.Reduce reducer = new WordCount.Reduce();
mapDriver = MapDriver.newMapDriver(mapper);
reduceDriver = ReduceDriver.newReduceDriver(reducer);
mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
  • Set the Map input and check the Reduce output
@Test
public void testMapReduce() throws IOException {
    mapReduceDriver.withInput(new LongWritable(), new Text("a b a"))
            .withInput(new LongWritable(), new Text("a b b"))
            .withAllOutput(Lists.newArrayList(
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(3)),
                    new Pair<Text, IntWritable>(new Text("b"), new IntWritable(3))))
            .runTest();
}

3. Appendix

  • Dependencies
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.mrunit</groupId>
        <artifactId>mrunit</artifactId>
        <version>1.1.0</version>
        <classifier>hadoop2</classifier>
    </dependency>
</dependencies>
  • Code
package du00.tests;

import com.google.common.collect.Lists;
import org.apache.commons.collections.CollectionUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.*;

import static org.junit.Assert.*;

import java.io.IOException;
import java.util.List;

public class WordCountTest {
    MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
    ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
    MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

    @Before
    public void setUp() {
        WordCount.Map mapper = new WordCount.Map();
        WordCount.Reduce reducer = new WordCount.Reduce();
        mapDriver = MapDriver.newMapDriver(mapper);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
    }

    @Test
    public void testMapper() throws IOException {
        mapDriver.withInput(new LongWritable(), new Text("a b a"))
                .withAllOutput(Lists.newArrayList(
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                        new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
                ))
                .runTest();
    }

    /**
     * Sometimes the results are more complex; extracting them and comparing only part of each
     * result is the better choice, e.g. when one of the object's fields is a double.
     *
     * @throws IOException
     */
    @Test
    public void testMapper2() throws IOException {
        mapDriver.withInput(new LongWritable(), new Text("a b a"));
        List<Pair<Text, IntWritable>> actual = mapDriver.run();

        List<Pair<Text, IntWritable>> expected = Lists.newArrayList(
                new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
        );

        // apache commons-collections: element-equality check that also accounts for each element's multiplicity
        assertTrue(CollectionUtils.isEqualCollection(actual, expected));

        assertEquals(1, actual.get(0).getSecond().get());
    }

    @Test
    public void testReducer() throws IOException {
        List<IntWritable> values = Lists.newArrayList();
        values.add(new IntWritable(1));
        values.add(new IntWritable(1));
        reduceDriver.withInput(new Text("a"), values);
        reduceDriver.withOutput(new Text("a"), new IntWritable(2));
        reduceDriver.runTest();
    }

    @Test
    public void testMapReduce() throws IOException {
        mapReduceDriver.withInput(new LongWritable(), new Text("a b a"))
                .withInput(new LongWritable(), new Text("a b b"))
                .withAllOutput(Lists.newArrayList(
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(3)),
                        new Pair<Text, IntWritable>(new Text("b"), new IntWritable(3))))
                .runTest();
    }
}