MapReduce Development Workflow

Chapter 2: Developing MapReduce Applications

1. Testing Java Programs with JUnit

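A unit test exercises the code behind a single piece of functionality and helps confirm that it behaves as intended. This section briefly shows how to write unit tests in Eclipse with JUnit 4.x, EasyMock, and Mockito.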

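1.1 Installing JUnit 4.x

There are two options. First, download the JUnit 4.x JAR from the JUnit website and add it to the classpath. Second, Eclipse already bundles JUnit, so JUnit 4.x can be used directly.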

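1.2 JUnit Annotations

JUnit uses annotations to mark the methods that should run as tests. To write a test case, annotate the method to be tested with @org.junit.Test. To check whether two objects are equal, use the static methods of org.junit.Assert and call assertEquals(). Static imports are available in Java 5.0 and later, for example: import static org.junit.Assert.*;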

1.3 Before Using JUnit: How Most Java Learners Test a Program

First, write a class named Demo_1:

package demo_junit;

public class Demo_1 {

    // Returns the sum of a and b.
    public double sum(double a, double b) {
        double c = a + b;
        return c;
    }
}

Then write a main method to test the Demo_1 class:

package demo_junit;

public class MainDemo_1 {

    public static void main(String[] args) {
        Demo_1 demo = new Demo_1();
        double result = demo.sum(1.0, 1.0);
        // The result has to be checked by eye in the console.
        System.out.println(result);
    }
}

The Eclipse console shows the expected output, 2.0.

1.4 Running Unit Tests in Eclipse

First, create a JUnit test class named Demo_1Test: select Demo_1.java -> New -> JUnit Test Case

Then write the test code in Demo_1Test:

package test.demo_junit;

import static org.junit.Assert.*;

import org.junit.Test;

import demo_junit.Demo_1;

public class Demo_1Test {

    @Test
    public void test() {
        Demo_1 demo = new Demo_1();
        double result = demo.sum(1.0, 1.0);
        double expected = 2.0; // correct expected value: this test passes
        // Argument order is (expected, actual, delta).
        assertEquals(expected, result, 0.0);
    }

    @Test
    public void test1() {
        Demo_1 demo = new Demo_1();
        double result = demo.sum(1.0, 1.0);
        double expected = 3.0; // deliberately wrong value: this test fails
        assertEquals(expected, result, 0.0);
    }
}

Finally, choose Run As -> JUnit Test from the Eclipse menu. Eclipse reports the result with a green or red status bar.
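The third argument of assertEquals in the tests above is the tolerance (delta) used when comparing doubles. The following is a minimal, dependency-free sketch of roughly what such a check amounts to. ToleranceSketch and closeEnough are hypothetical names introduced here for illustration; they are not part of JUnit.

```java
// Hypothetical helper, not JUnit code: imitates a delta-based comparison of doubles.
final class ToleranceSketch {

    // Returns true when actual lies within delta of expected.
    static boolean closeEnough(double expected, double actual, double delta) {
        return Math.abs(expected - actual) <= delta;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2; // not exactly 0.3 in binary floating point
        System.out.println(closeEnough(0.3, sum, 0.0));       // false: exact comparison
        System.out.println(closeEnough(0.3, sum, 1e-9));      // true: small tolerance
        System.out.println(closeEnough(2.0, 1.0 + 1.0, 0.0)); // true: 1.0 + 1.0 is exact
    }
}
```

A delta of 0.0, as used in Demo_1Test, makes the comparison exact. That is fine for 1.0 + 1.0 but too strict for results that accumulate rounding error.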

1.5 Creating a Test Suite

When there are many test cases, they can be grouped into a test suite that runs all of them together.

Create a second test case, Demo_2Test:

package test.demo_junit;

import static org.junit.Assert.*;

import org.junit.Test;

import demo_junit.Demo_2;

public class Demo_2Test {

    @Test
    public void test() {
        // Demo_2 is not listed in this article; it is assumed to provide multi(double, double).
        Demo_2 demo = new Demo_2();
        double result = demo.multi(1.0, 1.0);
        double expected = 1.0; // correct expected value: this test passes
        // Argument order is (expected, actual, delta).
        assertEquals(expected, result, 0.0);
    }
}

Then select the classes to be tested and right-click: New -> Other -> JUnit Test Suite

package test.demo_junit;

import org.junit.runner.RunWith;
import org.junit.runners.Suite;
import org.junit.runners.Suite.SuiteClasses;

// Runs every test class listed in @SuiteClasses.
@RunWith(Suite.class)
@SuiteClasses({ Demo_1Test.class, Demo_2Test.class })
public class AllTests {
}

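1.6 Annotations and Assert Statements

The annotations available in JUnit 4.x are used as follows:

Table 1. Annotations

Annotation | Description
@Test public void method() | Marks the method as a test method.
@Before public void method() | Runs before each test case.
@After public void method() | Runs after each test case.
@BeforeClass public static void method() | Runs once, before all test cases in the class.
@AfterClass public static void method() | Runs once, after all test cases in the class.
@Ignore | Skips the annotated test method.
@Test(expected=IllegalArgumentException.class) | Expects the test case to throw the given exception.
@Test(timeout=100) | Fails if the test case takes longer than 100 milliseconds.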
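The order in which the lifecycle annotations take effect can be illustrated without JUnit. The sketch below is a hand-written imitation of a run over a class with two tests; LifecycleSketch and the event names are hypothetical, and a real JUnit run does not guarantee the order of the test methods themselves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical imitation of the JUnit 4 lifecycle; no JUnit classes are used.
final class LifecycleSketch {

    // Records the order of calls for a class with two test methods.
    static List<String> run() {
        List<String> events = new ArrayList<String>();
        events.add("beforeClass");      // @BeforeClass: once, before all tests
        for (String test : new String[] {"test", "test1"}) {
            events.add("before");       // @Before: before each test
            events.add(test);           // @Test: the test method itself
            events.add("after");        // @After: after each test
        }
        events.add("afterClass");       // @AfterClass: once, after all tests
        return events;
    }

    public static void main(String[] args) {
        // prints [beforeClass, before, test, after, before, test1, after, afterClass]
        System.out.println(run());
    }
}
```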

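The assert statements available in JUnit 4.x are used as follows:

Table 2. Assert statements

Assert statement | Description
fail(String) | Fails the test unconditionally.
assertTrue(true) | Checks that the value is true.
assertEquals([String message], expected, actual) | Checks that two objects are equal.
assertEquals([String message], expected, actual, tolerance) | Checks that two values are equal within the given tolerance.
assertNull([message], object) | Checks that the object is null.
assertNotNull([message], object) | Checks that the object is not null.
assertSame([String], expected, actual) | Checks that both references point to the same object.
assertNotSame([String], expected, actual) | Checks that the references point to different objects.
assertTrue([message], boolean condition) | Checks that the condition is true.
try { a.shouldThroughException(); fail("Failed"); } catch (RuntimeException e) { assertTrue(true); } | Checks that an exception is thrown.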
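The try/fail/catch idiom for checking that an exception is thrown can be reduced to a small helper. This is a plain-Java sketch with hypothetical names (ExpectedExceptionSketch, throwsRuntimeException); in JUnit 4 the same intent is usually expressed with @Test(expected = ...).

```java
// Hypothetical helper, not part of JUnit: reports whether an action throws a RuntimeException.
final class ExpectedExceptionSketch {

    static boolean throwsRuntimeException(Runnable action) {
        try {
            action.run();
            return false; // corresponds to reaching fail("Failed") in the idiom
        } catch (RuntimeException e) {
            return true;  // the expected exception occurred
        }
    }

    public static void main(String[] args) {
        // Integer.parseInt throws NumberFormatException, a RuntimeException, on bad input.
        System.out.println(throwsRuntimeException(() -> Integer.parseInt("not a number"))); // true
        System.out.println(throwsRuntimeException(() -> Integer.parseInt("42")));           // false
    }
}
```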

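1.7 Why Use Mocks in Unit Tests?

The previous example is very simple, but in a real project the class under test may depend on other classes or third-party libraries. The usual remedy is to create a mock object. A mock object is an empty implementation of an interface whose behavior you control completely. The main benefit of mocking is that it decouples a unit test from its surroundings: if your code depends on another class or interface, a mock can stand in for that dependency and lets you verify how the dependency was called.

For example, suppose class A depends on other classes, which in turn have dependencies of their own.

To test A without mocks, the whole dependency tree has to be built. With mocks, the structure can be broken apart so that only A and mocked versions of its direct dependencies are needed.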
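The idea of replacing a dependency can be shown without any mocking library. In this sketch all names (RateSource, PriceConverter, FixedRateSource) are hypothetical: PriceConverter depends only on the RateSource interface, and the test supplies a hand-written stand-in instead of a real implementation that might call a remote service.

```java
// Hypothetical dependency: in production this might query a remote service.
interface RateSource {
    double rateFor(String currency);
}

// Hypothetical class under test: depends only on the RateSource interface.
final class PriceConverter {
    private final RateSource rates;

    PriceConverter(RateSource rates) {
        this.rates = rates;
    }

    double convert(double amount, String currency) {
        return amount * rates.rateFor(currency);
    }
}

// Hand-written stand-in that returns a fixed rate and counts how often it is called.
final class FixedRateSource implements RateSource {
    int calls = 0;

    public double rateFor(String currency) {
        calls++;
        return 2.0;
    }
}

final class HandWrittenMockDemo {
    public static void main(String[] args) {
        FixedRateSource fake = new FixedRateSource();
        PriceConverter converter = new PriceConverter(fake);
        System.out.println(converter.convert(10.0, "EUR")); // 20.0
        System.out.println(fake.calls);                      // 1
    }
}
```

Libraries such as EasyMock and Mockito generate stand-ins like FixedRateSource at run time, so they do not have to be written by hand.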

1.8 Unit Testing with EasyMock

First, add the EasyMock library to the Eclipse project: easymock-3.1.jar

Then create the following classes (IncomeCalculator is the class under test):

// File: income/Position.java
package income;

public enum Position {
    BOSS, PROGRAMMER, SURFER
}

// File: income/exceptions/PositionException.java
package income.exceptions;

public class PositionException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    public PositionException(String message) {
        super(message);
    }
}

// File: income/method/ICalcMethod.java
package income.method;

import income.Position;

public interface ICalcMethod {
    public abstract double calc(Position position);
}

// File: income/exceptions/CalcMethodException.java
package income.exceptions;

public class CalcMethodException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    public CalcMethodException(String message) {
        super(message);
    }
}

// File: income/IncomeCalculator.java
package income;

import income.exceptions.CalcMethodException;
import income.exceptions.PositionException;
import income.method.ICalcMethod;

public class IncomeCalculator {
    private ICalcMethod calcMethod;
    private Position position;

    public void setCalcMethod(ICalcMethod calcMethod) {
        this.calcMethod = calcMethod;
    }

    public void setPosition(Position position) {
        this.position = position;
    }

    // Delegates to the calculation method; both fields must be set first.
    public double calc() {
        if (calcMethod == null) {
            throw new CalcMethodException("CalcMethod not yet maintained");
        }
        if (position == null) {
            throw new PositionException("Position not yet maintained");
        }
        return calcMethod.calc(position);
    }
}

Next, create the JUnit test case IncomeCalculatorTest:

package test.demo_junit;

import static org.junit.Assert.*;

import income.IncomeCalculator;
import income.Position;
import income.exceptions.CalcMethodException;
import income.exceptions.PositionException;
import income.method.ICalcMethod;

import org.easymock.EasyMock;
import org.junit.Before;
import org.junit.Test;

public class IncomeCalculatorTest {
    private ICalcMethod calcMethod;
    private IncomeCalculator calc;

    @Before
    public void setUp() throws Exception {
        calcMethod = EasyMock.createMock(ICalcMethod.class);
        calc = new IncomeCalculator();
    }

    @Test
    public void test() { // tests IncomeCalculator
        // expect(): tells the mock which value to return for a given argument.
        EasyMock.expect(calcMethod.calc(Position.BOSS)).andReturn(80000.0)
                .times(2);
        EasyMock.expect(calcMethod.calc(Position.PROGRAMMER))
                .andReturn(60000.0);
        // replay() must be called to activate the mock.
        EasyMock.replay(calcMethod);
        calc.setCalcMethod(calcMethod);
        try {
            calc.calc();
            fail("Exception did not occur");
        } catch (PositionException e) {
            // expected: no position has been set yet
        }
        calc.setPosition(Position.BOSS);
        assertEquals(80000.0, calc.calc(), 0.0);
        assertEquals(80000.0, calc.calc(), 0.0);
        calc.setPosition(Position.PROGRAMMER);
        assertEquals(60000.0, calc.calc(), 0.0);
        calc.setPosition(Position.SURFER);
        // verify() checks that the mock was called as expected.
        EasyMock.verify(calcMethod);
    }

    @Test(expected = CalcMethodException.class) // tests the exception
    public void testNoCalc() {
        calc.setPosition(Position.SURFER);
        calc.calc();
    }

    @Test(expected = PositionException.class)
    public void testNoPosition() {
        EasyMock.expect(calcMethod.calc(Position.BOSS)).andReturn(80000.0);
        EasyMock.replay(calcMethod);
        calc.setCalcMethod(calcMethod);
        calc.calc();
    }

    @Test(expected = PositionException.class)
    public void test1() {
        // The mock itself throws the exception for SURFER.
        EasyMock.expect(calcMethod.calc(Position.SURFER)).andThrow(
                new PositionException("Don't know this guy")).times(1);
        EasyMock.replay(calcMethod);
        calc.setPosition(Position.SURFER);
        calc.setCalcMethod(calcMethod);
        calc.calc();
    }
}

EasyMock unit test results: [screenshot not preserved]

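1.9 Unit Testing with Mockito

Mockito is another mocking tool, and a very convenient one. It follows a verify-after-execution model, its syntax is more concise and closer to how programmers think, and it can mock classes as well as interfaces.

First, add the Mockito library to the Eclipse project: mockito-all-1.8.5.jar

Next, use mocks to verify some behavior. Create a unit test class: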

package test.demo_junit;

import static org.mockito.Mockito.*;

import java.util.LinkedList;
import java.util.List;

import org.junit.Test;

public class Demo_1mockito {

    //@Test (disabled in the original; remove the comment marker to run it)
    public void test() {
        @SuppressWarnings("unchecked")
        List<String> mockedList = mock(List.class); // create the mock
        mockedList.add("one"); // use the mock: add the element "one"
        //mockedList.clear();
        mockedList.add("1");
        // verification
        verify(mockedList).add("one"); // verify that add("one") was called
        //verify(mockedList).clear();
        verify(mockedList).add("1"); // changing "1" to "2" makes the test fail
    }

    //@Test
    public void test1() {
        // Concrete classes can be mocked, not only interfaces.
        @SuppressWarnings("rawtypes")
        LinkedList mockedList = mock(LinkedList.class);
        // stubbing
        when(mockedList.get(0)).thenReturn("first");
        when(mockedList.get(1)).thenReturn("second");
        when(mockedList.get(2)).thenThrow(new RuntimeException());
        System.out.println(mockedList.get(0)); // prints first
        System.out.println(mockedList.get(1)); // prints second
        //System.out.println(mockedList.get(2)); // would throw the exception
        System.out.println(mockedList.get(999)); // prints null
        verify(mockedList, atMost(2)).get(0); // called at most twice
    }

    //@Test
    public void test2() {
        @SuppressWarnings("rawtypes")
        LinkedList mockedList = mock(LinkedList.class);
        when(mockedList.get(0)).thenReturn("first");
        when(mockedList.get(0)).thenReturn("oops");
        System.out.println(mockedList.get(0)); // prints oops
        System.out.println(mockedList.get(0)); // prints oops
    }

    @Test
    public void test3() {
        @SuppressWarnings("unchecked")
        LinkedList<String> mockedList = mock(LinkedList.class);
        // using mock
        mockedList.add("once");
        mockedList.add("twice");
        mockedList.add("twice");
        mockedList.add("three times");
        mockedList.add("three times");
        mockedList.add("three times");
        // following two verifications work exactly the same - times(1) is used by default
        verify(mockedList).add("once");
        verify(mockedList, times(1)).add("once");
        // exact number of invocations verification
        verify(mockedList, times(2)).add("twice");
        verify(mockedList, times(3)).add("three times");
        // verification using never(). never() is an alias to times(0)
        verify(mockedList, never()).add("never happened");
        // verification using atLeast()/atMost()
        verify(mockedList, atLeastOnce()).add("three times");
        verify(mockedList, atLeast(0)).add("five times");
        verify(mockedList, atMost(5)).add("three times");
    }
}

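The test() method introduces mocking. Mockito's static mock method creates a mock instance of a class. The mock has all the methods of List (or LinkedList) and gives each of them a minimal default implementation. In the verification phase of test(), verifying that add("one") was called on the mock does not throw, because the method really was called. Verifying add("2") instead does throw, because that call never happened, and the test fails. The two calls made on the mock in the middle of the test do no harm even if they are never verified afterwards.

A few more points are worth noting:

1. By default a mock gives every method a basic implementation that returns null, an empty collection, or a primitive default such as 0.
2. When the same method call is stubbed twice in a row, only the most recent stubbing is used.
3. Once a method has been stubbed, it keeps returning the stubbed value on every call.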
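The three notes above can be imitated with a map-backed stand-in. This is not how Mockito is implemented; StubSketch, stub and get are hypothetical names chosen only to make the rules concrete.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a stubbed get(int) method; not Mockito code.
final class StubSketch {
    private final Map<Integer, String> answers = new HashMap<Integer, String>();

    // Registers the value to return for an index; a later call replaces an earlier one.
    void stub(int index, String value) {
        answers.put(index, value);
    }

    // Returns the stubbed value, or null when nothing was stubbed for the index.
    String get(int index) {
        return answers.get(index);
    }

    public static void main(String[] args) {
        StubSketch list = new StubSketch();
        list.stub(0, "first");
        list.stub(0, "oops");              // note 2: the most recent stubbing wins
        System.out.println(list.get(0));   // oops
        System.out.println(list.get(0));   // oops again (note 3: same value on every call)
        System.out.println(list.get(999)); // null (note 1: default for unstubbed calls)
    }
}
```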

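2. Testing a Mapper with MRUnit

Next comes the process of developing a MapReduce program. First write the map and reduce functions, preferably with unit tests that confirm they behave as expected. Then write a driver program to run the job, check that the driver works, and use the local IDE to debug and fix the program. The following programs can be run in Eclipse.

2.1 The Mapper function: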

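import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);   // year field
        String temp = line.substring(87, 92);   // signed temperature field
        if (!missing(temp)) {
            // Integer.parseInt accepts a leading '+' only on Java 7 and later.
            int airTemperature = Integer.parseInt(temp);
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }

    // "+9999" marks a missing temperature reading.
    private boolean missing(String temp) {
        return temp.equals("+9999");
    }
}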

Test case for the Mapper:

import java.io.IOException;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.*;

public class MaxTemperatureMapperTest {

    @Test
    public void processesValidRecord() throws IOException, InterruptedException {
        Text value = new Text(
                "0043011990999991950051518004+68750+023550FM-12+0382" +
                //              ^^^^ year
                "99999V0203201N00261220001CN9999999N9-00111+99999999999");
                //                                   ^^^^^ temperature
        new MapDriver<LongWritable, Text, Text, IntWritable>()
                .withMapper(new MaxTemperatureMapper())
                .withInputValue(value)
                .withOutput(new Text("1950"), new IntWritable(-11))
                .runTest();
    }
}
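The fixed-width offsets used by the Mapper can be checked in plain Java, without Hadoop or MRUnit. The sketch below applies the same substring(15, 19) and substring(87, 92) calls to the sample record from the test. RecordOffsetsSketch is a hypothetical class name, and stripping the leading plus sign is an extra precaution added here, not part of the Mapper above.

```java
// Hypothetical helper class; applies the Mapper's offsets to one NCDC record line.
final class RecordOffsetsSketch {

    // The sample record used in MaxTemperatureMapperTest.
    static final String SAMPLE =
            "0043011990999991950051518004+68750+023550FM-12+0382"
            + "99999V0203201N00261220001CN9999999N9-00111+99999999999";

    static String year(String line) {
        return line.substring(15, 19);
    }

    // Returns null when the temperature field holds the missing-value marker.
    static Integer temperature(String line) {
        String temp = line.substring(87, 92);
        if (temp.equals("+9999")) {
            return null;
        }
        // Strip a leading plus sign so that parsing also works on older Java versions.
        if (temp.charAt(0) == '+') {
            temp = temp.substring(1);
        }
        return Integer.parseInt(temp);
    }

    public static void main(String[] args) {
        System.out.println(year(SAMPLE));        // 1950
        System.out.println(temperature(SAMPLE)); // -11
    }
}
```

The printed values match the key and value that the MRUnit test expects from the Mapper.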

2.2 The Reducer function:

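import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Keep the largest temperature seen for this year.
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}

Test case for the Reducer: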

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.*;

public class MaxTemperatureReducerTest {

    @Test
    public void returnsMaximumIntegerInValues() throws IOException,
            InterruptedException {
        new ReduceDriver<Text, IntWritable, Text, IntWritable>()
                .withReducer(new MaxTemperatureReducer())
                .withInputKey(new Text("1950"))
                .withInputValues(
                        Arrays.asList(new IntWritable(10), new IntWritable(5)))
                .withOutput(new Text("1950"), new IntWritable(10))
                .runTest();
    }
}

2.3 Running a Job Locally on Test Data

The job driver that finds the maximum temperature:

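import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MaxTemperatureDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: %s [generic options] <input> <output>\n",
                    getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }
        Job job = new Job(getConf(), "Max temperature");
        job.setJarByClass(getClass());
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MaxTemperatureMapper.class);
        // The reducer doubles as the combiner because max is associative.
        job.setCombinerClass(MaxTemperatureReducer.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
        System.exit(exitCode);
    }
}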

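Run the driver from the command line (first package the program above as MaxTemperatureDriver.jar in Eclipse):

# export HADOOP_CLASSPATH=MaxTemperatureDriver.jar   (note: the JAR is in the current directory)

# hadoop mapreduce_test.MaxTemperatureDriver /usr/xjp/input/1929 output1   (note: 1929 is the name of the data file)

Or:

bash-4.1$ hadoop jar MaxTemperatureDriver.jar /usr/xjp/input/1929 output1

The result of the run: [output screenshot not preserved]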

The parser class:

import org.apache.hadoop.io.Text;

public class NcdcRecordParser {

    private static final int MISSING_TEMPERATURE = 9999;

    private String year;
    private int airTemperature;
    private String quality;

    public void parse(String record) {
        year = record.substring(15, 19);
        String airTemperatureString;
        // Remove leading plus sign as parseInt doesn't like them
        if (record.charAt(87) == '+') {
            airTemperatureString = record.substring(88, 92);
        } else {
            airTemperatureString = record.substring(87, 92);
        }
        airTemperature = Integer.parseInt(airTemperatureString);
        quality = record.substring(92, 93);
    }

    public void parse(Text record) {
        parse(record.toString());
    }

    // A reading is valid when it is present and its quality code is acceptable.
    public boolean isValidTemperature() {
        return airTemperature != MISSING_TEMPERATURE
                && quality.matches("[01459]");
    }

    public String getYear() {
        return year;
    }

    public int getAirTemperature() {
        return airTemperature;
    }
}

Using the parser, the Mapper can be rewritten as follows:

public class MaxTemperatureMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {

    private NcdcRecordParser parser = new NcdcRecordParser();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        parser.parse(value);
        if (parser.isValidTemperature()) {
            context.write(new Text(parser.getYear()),
                    new IntWritable(parser.getAirTemperature()));
        }
    }
}

Test for the driver:

package mapreduce_test;

import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.CoreMatchers.nullValue;
import static org.junit.Assert.assertThat;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.junit.Test;

public class MaxTemperatureDriverTest {

    // Skips entries in the output directory whose names start with an underscore.
    public static class OutputLogFilter implements PathFilter {
        public boolean accept(Path path) {
            return !path.getName().startsWith("_");
        }
    }

    @Test
    public void test() throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "file:///");  // use the local file system
        conf.set("mapred.job.tracker", "local");  // use the local job runner

        Path input = new Path("input");
        Path output = new Path("output");

        FileSystem fs = FileSystem.getLocal(conf);
        fs.delete(output, true); // delete old output

        MaxTemperatureDriver driver = new MaxTemperatureDriver();
        driver.setConf(conf);
        int exitCode = driver.run(new String[] {
                input.toString(), output.toString() });
        assertThat(exitCode, is(0));
        checkOutput(conf, output);
    }

    // Compares the single output file with expected.txt line by line.
    private void checkOutput(Configuration conf, Path output) throws IOException {
        FileSystem fs = FileSystem.getLocal(conf);
        Path[] outputFiles = FileUtil.stat2Paths(
                fs.listStatus(output, new OutputLogFilter()));
        assertThat(outputFiles.length, is(1));
        BufferedReader actual = asBufferedReader(fs.open(outputFiles[0]));
        BufferedReader expected = asBufferedReader(fs.open(new Path("expected.txt")));
        String expectedLine;
        while ((expectedLine = expected.readLine()) != null) {
            assertThat(actual.readLine(), is(expectedLine));
        }
        assertThat(actual.readLine(), nullValue());
        actual.close();
        expected.close();
    }

    private BufferedReader asBufferedReader(InputStream in) throws IOException {
        return new BufferedReader(new InputStreamReader(in));
    }
}
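The line-by-line comparison performed by checkOutput() can be tried in isolation with in-memory strings instead of files. This is a minimal sketch; LineCompareSketch and sameLines are hypothetical names, and the sample lines in main are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical helper: compares two text sources line by line, in the style of checkOutput().
final class LineCompareSketch {

    // Returns true when both readers yield exactly the same sequence of lines.
    static boolean sameLines(BufferedReader expected, BufferedReader actual) throws IOException {
        String expectedLine;
        while ((expectedLine = expected.readLine()) != null) {
            if (!expectedLine.equals(actual.readLine())) {
                return false;             // a line differs, or actual ended early
            }
        }
        return actual.readLine() == null; // actual must not contain extra lines
    }

    // Convenience overload for in-memory text.
    static boolean sameLines(String expected, String actual) {
        try {
            return sameLines(new BufferedReader(new StringReader(expected)),
                             new BufferedReader(new StringReader(actual)));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample lines in "year<TAB>temperature" form.
        System.out.println(sameLines("1949\t111\n1950\t22\n", "1949\t111\n1950\t22\n")); // true
        System.out.println(sameLines("1949\t111\n", "1949\t111\n1950\t22\n"));           // false
    }
}
```

The final readLine() check matters: without it, an actual output with extra trailing lines would be accepted.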

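Note: for the test program above to pass, the required libraries must be on the build path. [list of JARs not preserved]

Note that checkOutput() is called to compare the actual output with the expected output line by line. The program above was run on Linux. If it is run from Eclipse on Windows, an error occurs. [error screenshot not preserved]

The fix is to install Cygwin, a Linux-like environment for Windows, so that the program can run. After installing Cygwin on Windows, add its bin directory (for example "C:\Program Files\cygwin\bin") to the PATH environment variable, and the problem goes away. (Cygwin is a Unix-like environment that runs on Windows. It is useful for learning Unix/Linux, for porting applications from Unix to Windows, and for certain specialized development work, in particular embedded development on Windows with the GNU toolchain.)

After the tests pass in the IDE (Eclipse), test on a pseudo-distributed cluster, then on a fully distributed cluster with a small dataset, and finally run on the full dataset.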

Reference: Hadoop: The Definitive Guide, 3rd Edition
