Whether or not to write tests is a personal choice. For me, writing tests is not about looking sophisticated; it is about having more confidence in my code.
Testing MapReduce is admittedly not that convenient, but there are ways. The material below is adapted mostly from the MRUnit Tutorial, which additionally covers testing Counters (i.e., how to read Counter values) and passing parameters via Configuration (i.e., how to get at the conf object inside the mock); a rough sketch of both appears at the end of section 2.
1. The Basics - JUnit
If you don't know JUnit there is not much I can say; fortunately, few people don't.
Maven coordinates: junit:junit
import org.junit.*;
import static org.junit.Assert.*;

public class TestCases {
    @Test
    public void testXXX() {
        assertEquals(1, 1); // assertEquals takes (expected, actual)
    }
}
This is the foundation of functional testing: in general, anything that does not depend much on the environment can be tested at the function level with JUnit. Only once this is done is it worth moving on to the Mapper and Reducer tests below.
2. MapReduce Mock - MRUnit
Maven coordinates
<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <classifier>hadoop2</classifier>
    <scope>test</scope>
</dependency>
Note: the classifier must be set explicitly to pick hadoop1 or hadoop2; the two differ in their APIs.
Below, WordCount serves as the example for how to test each part.
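For context, the tests assume a WordCount job with static nested Map and Reduce classes; the real thing lives in the complete project linked in the appendix. Here is a minimal sketch of what the tests expect (the tokenization details are my assumption):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (token, 1) for every whitespace-separated token.
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all partial counts for the word.
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}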
2.1 Testing the Mapper
- Initialize a MapDriver
WordCount.Map mapper = new WordCount.Map();
mapDriver = MapDriver.newMapDriver(mapper);
- Feed it input and check the output
@Test
public void testMapper() throws IOException {
    mapDriver.withInput(new LongWritable(), new Text("a b a"))
            .withAllOutput(Lists.newArrayList(
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                    new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
            ))
            .runTest();
}
In most cases a test cannot be written this elegantly, for example when float/double values are involved; then you need to pull the results out and check them yourself (which is clearly the more flexible approach anyway).
@Test
public void testMapper2() throws IOException {
    mapDriver.withInput(new LongWritable(), new Text("a b a"));
    List<Pair<Text, IntWritable>> actual = mapDriver.run();
    List<Pair<Text, IntWritable>> expected = Lists.newArrayList(
            new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
            new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
            new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
    );
    // apache commons-collections: element equality that takes each element's multiplicity into account
    assertTrue(CollectionUtils.isEqualCollection(actual, expected));
    assertEquals(1, actual.get(0).getSecond().get());
}
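To make the float/double point concrete, here is a sketch of how such a comparison might look; ScoreMapper and its 0.5 output are made-up stand-ins, not part of this project:

// Hypothetical Mapper emitting DoubleWritable scores (ScoreMapper is made up).
MapDriver<LongWritable, Text, Text, DoubleWritable> scoreDriver =
        MapDriver.newMapDriver(new ScoreMapper());
scoreDriver.withInput(new LongWritable(), new Text("a"));
List<Pair<Text, DoubleWritable>> results = scoreDriver.run();
// Floating-point output needs an explicit tolerance rather than exact equality.
assertEquals(0.5, results.get(0).getSecond().get(), 1e-9);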
2.2 Testing the Reducer
- As with the Mapper, first initialize a ReduceDriver
WordCount.Reduce reducer = new WordCount.Reduce();
reduceDriver = ReduceDriver.newReduceDriver(reducer);
- Feed it input and check the output
@Test
public void testReducer() throws IOException {
    List<IntWritable> values = Lists.newArrayList();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    reduceDriver.withInput(new Text("a"), values);
    reduceDriver.withOutput(new Text("a"), new IntWritable(2));
    reduceDriver.runTest();
}
2.3 Testing the Whole Flow
- Initialize all three pieces: a MapDriver, a ReduceDriver, and a MapReduceDriver
WordCount.Map mapper = new WordCount.Map();
WordCount.Reduce reducer = new WordCount.Reduce();
mapDriver = MapDriver.newMapDriver(mapper);
reduceDriver = ReduceDriver.newReduceDriver(reducer);
mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
- Set the Map input and check the Reduce output
@Test
public void testMapReduce() throws IOException {
    mapReduceDriver.withInput(new LongWritable(), new Text("a b a"))
            .withInput(new LongWritable(), new Text("a b b"))
            .withAllOutput(Lists.newArrayList(
                    new Pair<Text, IntWritable>(new Text("a"), new IntWritable(3)),
                    new Pair<Text, IntWritable>(new Text("b"), new IntWritable(3))))
            .runTest();
}
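As promised at the top, the Counter and Configuration features from the MRUnit Tutorial look roughly like this. This is an unverified sketch: the counter group/name and the configuration key are invented, and it assumes a Mapper that actually increments that counter and reads that key.

// Sketch only: "wc"/"lines" and "wc.separator" are hypothetical names.
mapDriver.getConfiguration().set("wc.separator", " ");  // pass a parameter into the mocked job
mapDriver.withInput(new LongWritable(), new Text("a b a"))
        .withCounter("wc", "lines", 1L)                 // assert an expected counter value
        .runTest();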
3. Appendix
- Complete project
- pom dependencies
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.mrunit</groupId>
        <artifactId>mrunit</artifactId>
        <version>1.1.0</version>
        <classifier>hadoop2</classifier>
        <scope>test</scope>
    </dependency>
    <!-- junit is also required by the test code; the version here is an assumption -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
</dependencies>
- Code
package du00.tests;

import com.google.common.collect.Lists;
import org.apache.commons.collections.CollectionUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.*;

import static org.junit.Assert.*;

import java.io.IOException;
import java.util.List;

public class WordCountTest {
    MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
    ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
    MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

    @Before
    public void setUp() {
        WordCount.Map mapper = new WordCount.Map();
        WordCount.Reduce reducer = new WordCount.Reduce();
        mapDriver = MapDriver.newMapDriver(mapper);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
    }

    @Test
    public void testMapper() throws IOException {
        mapDriver.withInput(new LongWritable(), new Text("a b a"))
                .withAllOutput(Lists.newArrayList(
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                        new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
                ))
                .runTest();
    }

    /**
     * Sometimes the results are more complex, and pulling them out to compare
     * just part of each record is the better choice, e.g. when a field of the
     * output object is a double.
     *
     * @throws IOException
     */
    @Test
    public void testMapper2() throws IOException {
        mapDriver.withInput(new LongWritable(), new Text("a b a"));
        List<Pair<Text, IntWritable>> actual = mapDriver.run();
        List<Pair<Text, IntWritable>> expected = Lists.newArrayList(
                new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1)),
                new Pair<Text, IntWritable>(new Text("b"), new IntWritable(1)),
                new Pair<Text, IntWritable>(new Text("a"), new IntWritable(1))
        );
        // apache commons-collections: element equality that takes each element's multiplicity into account
        assertTrue(CollectionUtils.isEqualCollection(actual, expected));
        assertEquals(1, actual.get(0).getSecond().get());
    }

    @Test
    public void testReducer() throws IOException {
        List<IntWritable> values = Lists.newArrayList();
        values.add(new IntWritable(1));
        values.add(new IntWritable(1));
        reduceDriver.withInput(new Text("a"), values);
        reduceDriver.withOutput(new Text("a"), new IntWritable(2));
        reduceDriver.runTest();
    }

    @Test
    public void testMapReduce() throws IOException {
        mapReduceDriver.withInput(new LongWritable(), new Text("a b a"))
                .withInput(new LongWritable(), new Text("a b b"))
                .withAllOutput(Lists.newArrayList(
                        new Pair<Text, IntWritable>(new Text("a"), new IntWritable(3)),
                        new Pair<Text, IntWritable>(new Text("b"), new IntWritable(3))))
                .runTest();
    }
}