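Chapter 2: Developing MapReduce Applications
I. Testing Java Programs with JUnit
A unit test exercises the code for one piece of functionality and helps ensure that the functionality works as intended. This section briefly explains how to write unit tests in Eclipse with JUnit 4.x, EasyMock, and Mockito.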
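1.1 Installing JUnit 4.x
There are two options. First, the JUnit 4.x JAR can be downloaded from the JUnit website and added to the classpath. Second, Eclipse already bundles JUnit, so JUnit 4.x can be used directly.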
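1.2 JUnit Annotations
JUnit uses annotations to identify the methods that are tests. To write a test case: (1) annotate the test method with @org.junit.Test; (2) to check whether two objects are equal, use org.junit.Assert.* and call assertEquals(); (3) static imports are available in Java 5.0 and later, for example: import static org.junit.Assert.*.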
1.3 Before testing with JUnit, here is how most Java learners test a program
First, write a class Demo_1:

package demo_junit;

public class Demo_1 {
    // Returns the sum of a and b.
    public double sum(double a, double b) {
        double c = a + b;
        return c;
    }
}

Then write a main method to test the Demo_1 class:

package demo_junit;

public class MainDemo_1 {
    public static void main(String[] args) {
        Demo_1 demo = new Demo_1();
        double result = demo.sum(1.0, 1.0);
        // The result has to be checked by eye in the console.
        System.out.println(result);
    }
}
The Eclipse console shows the expected output, 2.0.
1.4 Running unit tests in Eclipse
First, create a JUnit test class Demo_1Test: right-click Demo_1.java -> New -> JUnit Test Case.
Then write the test code in Demo_1Test:

package test.demo_junit;

import static org.junit.Assert.*;

import org.junit.Test;

import demo_junit.Demo_1;

public class Demo_1Test {

    @Test
    public void test() {
        Demo_1 demo = new Demo_1();
        double result = demo.sum(1.0, 1.0);
        double expected = 2.0; // correct expected value: the test passes
        // Argument order is (expected, actual, tolerance).
        assertEquals(expected, result, 0.0);
    }

    @Test
    public void test1() {
        Demo_1 demo = new Demo_1();
        double result = demo.sum(1.0, 1.0);
        double expected = 3.0; // deliberately wrong value: the test fails
        assertEquals(expected, result, 0.0);
    }
}

Finally, choose Run As -> JUnit Test from the Eclipse menu. Eclipse reports the outcome with a green (all passed) or red (at least one failure) status bar.
1.5 Creating a test suite
When there are many test cases, we can create a test suite that includes all the test cases to be run.
Create a second test case, Demo_2Test:

package test.demo_junit;

import static org.junit.Assert.*;

import org.junit.Test;

import demo_junit.Demo_2;

public class Demo_2Test {

    @Test
    public void test() {
        Demo_2 demo = new Demo_2();
        double result = demo.multi(1.0, 1.0);
        double expected = 1.0; // correct expected value: the test passes
        assertEquals(expected, result, 0.0);
    }
}

Then select the test classes to include, right-click, and choose New -> Other -> JUnit Test Suite:

package test.demo_junit;

import org.junit.runner.RunWith;
import org.junit.runners.Suite;
import org.junit.runners.Suite.SuiteClasses;

// Running AllTests runs every test class listed in @SuiteClasses.
@RunWith(Suite.class)
@SuiteClasses({ Demo_1Test.class, Demo_2Test.class })
public class AllTests {
}
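1.6 Annotations and Assert statements
The annotations available in JUnit 4.x are used as follows:
Table 1. Annotations
Annotation | Description
@Test public void method() | Marks the method as a test method.
@Before public void method() | Runs before each test case.
@After public void method() | Runs after each test case.
@BeforeClass public static void method() | Runs once, before all test cases in the class. The method must be static.
@AfterClass public static void method() | Runs once, after all test cases in the class. The method must be static.
@Ignore | Ignores the test method.
@Test(expected=IllegalArgumentException.class) | Expects the test case to throw the specified exception.
@Test(timeout=100) | Fails the test case if it takes longer than 100 milliseconds.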
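The Assert statements available in JUnit 4.x are used as follows:
Table 2. Assert statements
Assert statement | Description
fail(String) | Makes the test fail with the given message.
assertTrue(true) | Always passes; useful as a placeholder.
assertEquals([String message], expected, actual) | Checks that two objects are equal.
assertEquals([String message], expected, actual, tolerance) | Checks that two floating-point values are equal within the given tolerance.
assertNull([message], object) | Checks that the object is null.
assertNotNull([message], object) | Checks that the object is not null.
assertSame([String], expected, actual) | Checks that both references point to the same object.
assertNotSame([String], expected, actual) | Checks that the two references point to different objects.
assertTrue([message], boolean condition) | Checks that the condition is true.
try { a.shouldThroughException(); fail("Failed"); } catch (RuntimeException e) { assertTrue(true); } | Checks that an exception is thrown.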
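The tolerance argument in Table 2 matters because floating-point arithmetic is rarely exact. The following plain-Java sketch illustrates the idea behind a tolerance comparison. The helper name withinTolerance is hypothetical and is not a JUnit API, and JUnit's own comparison may treat special values such as NaN differently.

```java
// Hypothetical helper illustrating the idea behind
// assertEquals(expected, actual, tolerance); it is not part of JUnit.
class ToleranceDemo {

    // True when actual lies within +/- tolerance of expected.
    static boolean withinTolerance(double expected, double actual, double tolerance) {
        return Math.abs(expected - actual) <= tolerance;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2; // not exactly 0.3 in binary floating point
        System.out.println(sum == 0.3);                      // prints false
        System.out.println(withinTolerance(0.3, sum, 1e-9)); // prints true
    }
}
```

A tolerance of 0.0, as used in the tests above, only works when the computed value is exactly representable, as 1.0 + 1.0 is.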
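1.7 Why use mocks in unit tests?
The previous example is very simple, but in a real environment the class under test may depend on other classes or third-party libraries. The best way to handle this is to create a mock object. A mock object is an empty implementation of an interface, and you fully control how it behaves. The main benefit of mocking is that it decouples the unit test from its surroundings: if your code depends on another class or interface, a mock can simulate that dependency and let you verify how the dependency was called.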
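For example, suppose the class under test, A, sits at the top of a tree of dependencies.
To test class A without mocks, we would have to construct the entire dependency tree. With mocks, the structure can be broken apart, so that the classes A depends on are replaced by mock objects.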
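Before turning to a mocking library, the idea can be sketched with a hand-written stand-in in plain Java. Every name below (ExchangeRateSource, PriceConverter, FixedRateSource) is hypothetical and chosen only for illustration. The class under test depends on an interface, and the test supplies an implementation whose behavior it fully controls.

```java
// All types here are hypothetical and exist only for illustration.
class HandWrittenMockDemo {
    public static void main(String[] args) {
        FixedRateSource fake = new FixedRateSource(2.0);
        PriceConverter converter = new PriceConverter(fake);
        System.out.println(converter.convert(10.0, "EUR")); // prints 20.0
        System.out.println(fake.calls);                     // prints 1
    }
}

// The dependency: a real implementation might call a remote service.
interface ExchangeRateSource {
    double rateFor(String currency);
}

// The class under test depends only on the interface.
class PriceConverter {
    private final ExchangeRateSource rates;

    PriceConverter(ExchangeRateSource rates) {
        this.rates = rates;
    }

    double convert(double amount, String currency) {
        return amount * rates.rateFor(currency);
    }
}

// Hand-written stand-in: returns a canned value and counts its calls.
class FixedRateSource implements ExchangeRateSource {
    private final double rate;
    int calls = 0;

    FixedRateSource(double rate) {
        this.rate = rate;
    }

    @Override
    public double rateFor(String currency) {
        calls++;
        return rate;
    }
}
```

Mocking libraries generate this kind of stand-in at run time, so no extra class has to be written for each dependency.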
1.8 Unit testing with EasyMock
First, add the EasyMock JAR to the Eclipse project: easymock-3.1.jar
Then create the following classes (IncomeCalculator is the class under test):
package income;

public enum Position {
    BOSS, PROGRAMMER, SURFER
}

package income.exceptions;

public class PositionException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    public PositionException(String message) {
        super(message);
    }
}

package income.method;

import income.Position;

public interface ICalcMethod {
    public abstract double calc(Position position);
}

package income.exceptions;

public class CalcMethodException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    public CalcMethodException(String message) {
        super(message);
    }
}

package income;

import income.exceptions.CalcMethodException;
import income.exceptions.PositionException;
import income.method.ICalcMethod;

public class IncomeCalculator {
    private ICalcMethod calcMethod;
    private Position position;

    public void setCalcMethod(ICalcMethod calcMethod) {
        this.calcMethod = calcMethod;
    }

    public void setPosition(Position position) {
        this.position = position;
    }

    // Delegates to the calculation method; both fields must be set first.
    public double calc() {
        if (calcMethod == null) {
            throw new CalcMethodException("CalcMethod not yet maintained");
        }
        if (position == null) {
            throw new PositionException("Position not yet maintained");
        }
        return calcMethod.calc(position);
    }
}
Next, create the JUnit test case IncomeCalculatorTest:

package test.demo_junit;

import static org.junit.Assert.*;

import income.IncomeCalculator;
import income.Position;
import income.exceptions.CalcMethodException;
import income.exceptions.PositionException;
import income.method.ICalcMethod;

import org.easymock.EasyMock;
import org.junit.Before;
import org.junit.Test;

public class IncomeCalculatorTest {

    private ICalcMethod calcMethod;
    private IncomeCalculator calc;

    @Before
    public void setUp() throws Exception {
        calcMethod = EasyMock.createMock(ICalcMethod.class);
        calc = new IncomeCalculator();
    }

    @Test
    public void test() { // tests IncomeCalculator
        // expect() records a call with specific arguments, andReturn() sets the
        // value returned for those arguments, and times() sets the expected call count.
        EasyMock.expect(calcMethod.calc(Position.BOSS)).andReturn(80000.0).times(2);
        EasyMock.expect(calcMethod.calc(Position.PROGRAMMER)).andReturn(60000.0);
        // replay() must be called to activate the mock object.
        EasyMock.replay(calcMethod);

        calc.setCalcMethod(calcMethod);
        try {
            calc.calc();
            fail("Exception did not occur");
        } catch (PositionException e) {
            // expected: no position has been set yet
        }
        calc.setPosition(Position.BOSS);
        assertEquals(80000.0, calc.calc(), 0.0);
        assertEquals(80000.0, calc.calc(), 0.0);
        calc.setPosition(Position.PROGRAMMER);
        assertEquals(60000.0, calc.calc(), 0.0);
        calc.setPosition(Position.SURFER);
        // verify() checks that the mock was called as recorded above.
        EasyMock.verify(calcMethod);
    }

    @Test(expected = CalcMethodException.class) // tests the exception
    public void testNoCalc() {
        calc.setPosition(Position.SURFER);
        calc.calc();
    }

    @Test(expected = PositionException.class)
    public void testNoPosition() {
        EasyMock.expect(calcMethod.calc(Position.BOSS)).andReturn(80000.0);
        EasyMock.replay(calcMethod);
        calc.setCalcMethod(calcMethod);
        calc.calc();
    }

    @Test(expected = PositionException.class)
    public void test1() {
        // andThrow() makes the mock throw instead of returning a value.
        EasyMock.expect(calcMethod.calc(Position.SURFER))
                .andThrow(new PositionException("Don't know this guy")).times(1);
        EasyMock.replay(calcMethod);
        calc.setPosition(Position.SURFER);
        calc.setCalcMethod(calcMethod);
        calc.calc();
    }
}
EasyMock unit test result: running IncomeCalculatorTest as a JUnit test should show all four tests passing.
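1.9 Unit testing with Mockito
Mockito is another mocking tool, and a very convenient one. It follows a verify-after-execution model, its syntax is more concise and closer to the way programmers think, and it can mock classes, not just interfaces.
First, add the Mockito JAR to the Eclipse project: mockito-all-1.8.5.jar
Next, use a mock to verify some behavior. Create a unit test class: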
package test.demo_junit;

import static org.mockito.Mockito.*;

import java.util.LinkedList;
import java.util.List; // java.util.List, not the AWT component java.awt.List

import org.junit.Test;

public class Demo_1mockito {

    // @Test
    @SuppressWarnings("unchecked")
    public void test() {
        List<String> mockedList = mock(List.class); // create the mock
        mockedList.add("one"); // use the mock: add the element "one"
        // mockedList.clear();
        mockedList.add("1");
        // verification
        verify(mockedList).add("one"); // verify that add("one") was called
        // verify(mockedList).clear();
        verify(mockedList).add("1"); // changing "1" to "2" makes the test fail
    }

    // @Test
    public void test1() {
        // You can mock concrete classes, not only interfaces.
        @SuppressWarnings("rawtypes")
        LinkedList mockedList = mock(LinkedList.class);
        // stubbing
        when(mockedList.get(0)).thenReturn("first");
        when(mockedList.get(1)).thenReturn("second");
        when(mockedList.get(2)).thenThrow(new RuntimeException());
        System.out.println(mockedList.get(0)); // prints first
        System.out.println(mockedList.get(1)); // prints second
        // System.out.println(mockedList.get(2)); // would throw RuntimeException
        System.out.println(mockedList.get(999)); // prints null
        verify(mockedList, atMost(2)).get(0); // called at most twice
    }

    // @Test
    public void test2() {
        @SuppressWarnings("rawtypes")
        LinkedList mockedList = mock(LinkedList.class);
        when(mockedList.get(0)).thenReturn("first");
        when(mockedList.get(0)).thenReturn("oops"); // overrides the previous stub
        System.out.println(mockedList.get(0)); // prints oops
        System.out.println(mockedList.get(0)); // prints oops
    }

    @Test
    public void test3() {
        @SuppressWarnings("unchecked")
        LinkedList<String> mockedList = mock(LinkedList.class);
        // using mock
        mockedList.add("once");
        mockedList.add("twice");
        mockedList.add("twice");
        mockedList.add("three times");
        mockedList.add("three times");
        mockedList.add("three times");
        // the following two verifications work exactly the same; times(1) is the default
        verify(mockedList).add("once");
        verify(mockedList, times(1)).add("once");
        // exact number of invocations
        verify(mockedList, times(2)).add("twice");
        verify(mockedList, times(3)).add("three times");
        // never() is an alias for times(0)
        verify(mockedList, never()).add("never happened");
        // atLeast() / atMost()
        verify(mockedList, atLeastOnce()).add("three times");
        verify(mockedList, atLeast(0)).add("five times");
        verify(mockedList, atMost(5)).add("three times");
    }
}
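The test() method introduces mocking. With Mockito's static mock method we can create a mock instance of a class. The mock instance exposes all the methods of List or LinkedList and gives each of them a minimal default implementation. In the verification phase of test(), verifying that add("one") was called on the mock raises no exception, because we really did call that method. If we instead verify a call to add("2"), an exception is thrown to indicate that the method was never called with that argument, and the test fails. The calls made on the mock before the verification phase do not have to be verified; leaving any of them unverified causes no failure.
A few more points to note: 1. By default a mock instance gives every method a basic implementation that returns null, an empty collection, or the default value of a primitive type such as 0. 2. When the same method call is stubbed twice in a row, only the most recent stub is used. 3. Once a method has been stubbed, it keeps returning the stubbed value on every call.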
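The three points above can be imitated in plain Java with java.lang.reflect.Proxy. The sketch below is a hypothetical miniature (the class TinyMock and its methods create and stub are invented for illustration) and is not how Mockito is implemented. In particular, Proxy can only stand in for interfaces, whereas Mockito can also mock classes.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature mock; not Mockito's implementation.
class TinyMock implements InvocationHandler {
    // Canned answers, keyed by method name plus argument list, e.g. "get[0]".
    private final Map<String, Object> stubs = new HashMap<>();

    // Creates a proxy for an interface whose calls are routed to the handler.
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> type, TinyMock handler) {
        return (T) Proxy.newProxyInstance(
                TinyMock.class.getClassLoader(), new Class<?>[] { type }, handler);
    }

    // Registers a canned answer; stubbing the same call again replaces it.
    void stub(String method, Object arg, Object value) {
        stubs.put(method + Arrays.toString(new Object[] { arg }), value);
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) {
        String key = method.getName() + Arrays.toString(args);
        if (stubs.containsKey(key)) {
            return stubs.get(key); // point 3: the stub answers every call
        }
        return defaultFor(method.getReturnType()); // point 1: default values
    }

    // Default value for a return type: zero-like for primitives, null otherwise.
    private static Object defaultFor(Class<?> t) {
        if (t == boolean.class) return false;
        if (t == char.class) return '\0';
        if (t == byte.class) return (byte) 0;
        if (t == short.class) return (short) 0;
        if (t == int.class) return 0;
        if (t == long.class) return 0L;
        if (t == float.class) return 0f;
        if (t == double.class) return 0d;
        return null; // void and all reference types
    }

    public static void main(String[] args) {
        TinyMock handler = new TinyMock();
        @SuppressWarnings("unchecked")
        List<String> list = create(List.class, handler);
        System.out.println(list.get(0)); // prints null (unstubbed reference type)
        System.out.println(list.size()); // prints 0 (unstubbed int)
        handler.stub("get", 0, "first");
        handler.stub("get", 0, "oops");  // point 2: the latest stub wins
        System.out.println(list.get(0)); // prints oops
        System.out.println(list.get(0)); // prints oops
    }
}
```

Unlike this sketch, which returns null for every unstubbed reference type, Mockito returns empty collections for collection-typed methods, as the first point notes.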
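II. Testing Mapper Programs with MRUnit
Next we look at the process of developing a MapReduce program. First write the map and reduce functions, preferably with unit tests to make sure they behave as expected. Then write a driver program to run the job and check that it runs, and use a local IDE to debug and fix the program. The following programs can be run in the Eclipse IDE.
2.1 The Mapper function: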
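import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Fixed-width record: year at offsets 15-18, temperature at offsets 87-91.
        String year = line.substring(15, 19);
        String temp = line.substring(87, 92);
        if (!missing(temp)) {
            int airTemperature = Integer.parseInt(temp);
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }

    // "+9999" marks a missing temperature reading.
    private boolean missing(String temp) {
        return temp.equals("+9999");
    }
}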
Mapper test:

import java.io.IOException;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.*;

public class MaxTemperatureMapperTest {

    @Test
    public void processesValidRecord() throws IOException, InterruptedException {
        // Year "1950" is at offsets 15-18; temperature "-0011" is at offsets 87-91.
        Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382"
                + "99999V0203201N00261220001CN9999999N9-00111+99999999999");
        new MapDriver<LongWritable, Text, Text, IntWritable>()
                .withMapper(new MaxTemperatureMapper())
                .withInputValue(value)
                .withOutput(new Text("1950"), new IntWritable(-11))
                .runTest();
    }
}
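2.2 The Reducer function:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Keep the largest temperature seen for this key (the year).
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}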
Reducer test:

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.*;

public class MaxTemperatureReducerTest {

    @Test
    public void returnsMaximumIntegerInValues() throws IOException, InterruptedException {
        new ReduceDriver<Text, IntWritable, Text, IntWritable>()
                .withReducer(new MaxTemperatureReducer())
                .withInputKey(new Text("1950"))
                .withInputValues(Arrays.asList(new IntWritable(10), new IntWritable(5)))
                .withOutput(new Text("1950"), new IntWritable(10))
                .runTest();
    }
}
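2.3 Running on test data locally: running the Job locally
The Job driver that finds the maximum temperature:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MaxTemperatureDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: %s [generic options] <input> <output>\n",
                    getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }
        Job job = new Job(getConf(), "Max temperature");
        job.setJarByClass(getClass());
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MaxTemperatureMapper.class);
        job.setCombinerClass(MaxTemperatureReducer.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
        System.exit(exitCode);
    }
}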
Run the driver from the command line (first package the program above as MaxTemperatureDriver.jar in Eclipse):
# export HADOOP_CLASSPATH=MaxTemperatureDriver.jar (note: the JAR is in the current directory)
# hadoop mapreduce_test.MaxTemperatureDriver /usr/xjp/input/1929 output1 (note: 1929 is the name of the data file)
Or: bash-4.1$ hadoop jar MaxTemperatureDriver.jar /usr/xjp/input/1929 output1
The job writes its result to the output1 directory.
The parser class:

import org.apache.hadoop.io.Text;

public class NcdcRecordParser {

    private static final int MISSING_TEMPERATURE = 9999;

    private String year;
    private int airTemperature;
    private String quality;

    public void parse(String record) {
        year = record.substring(15, 19);
        String airTemperatureString;
        // Remove a leading plus sign, which parseInt rejects before Java 7.
        if (record.charAt(87) == '+') {
            airTemperatureString = record.substring(88, 92);
        } else {
            airTemperatureString = record.substring(87, 92);
        }
        airTemperature = Integer.parseInt(airTemperatureString);
        quality = record.substring(92, 93);
    }

    public void parse(Text record) {
        parse(record.toString());
    }

    // A reading is valid if it is present and its quality code is acceptable.
    public boolean isValidTemperature() {
        return airTemperature != MISSING_TEMPERATURE && quality.matches("[01459]");
    }

    public String getYear() {
        return year;
    }

    public int getAirTemperature() {
        return airTemperature;
    }
}
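The fixed-width offsets used by the parser can be checked without Hadoop on the classpath. The sketch below applies the same substring logic to the sample record from the Mapper test. The class RecordOffsetsDemo and its static methods are hypothetical, written only for this check.

```java
// Hypothetical stand-alone check of the offsets used by NcdcRecordParser;
// it needs no Hadoop classes.
class RecordOffsetsDemo {
    static final int MISSING_TEMPERATURE = 9999;

    static String year(String record) {
        return record.substring(15, 19);
    }

    static int airTemperature(String record) {
        // Skip a leading plus sign, as the parser does.
        int start = record.charAt(87) == '+' ? 88 : 87;
        return Integer.parseInt(record.substring(start, 92));
    }

    static boolean isValid(String record) {
        String quality = record.substring(92, 93);
        return airTemperature(record) != MISSING_TEMPERATURE
                && quality.matches("[01459]");
    }

    public static void main(String[] args) {
        String record = "0043011990999991950051518004+68750+023550FM-12+0382"
                + "99999V0203201N00261220001CN9999999N9-00111+99999999999";
        System.out.println(year(record));           // prints 1950
        System.out.println(airTemperature(record)); // prints -11
        System.out.println(isValid(record));        // prints true
    }
}
```

Replacing offsets 87 to 91 of the record with +9999 makes isValid return false, which is the case the missing-value check is meant to filter out.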
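Using the parser, the Mapper can be written as follows:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private NcdcRecordParser parser = new NcdcRecordParser();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        parser.parse(value);
        // Emit (year, temperature) only for valid readings.
        if (parser.isValidTemperature()) {
            context.write(new Text(parser.getYear()),
                    new IntWritable(parser.getAirTemperature()));
        }
    }
}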
Testing the driver:

package mapreduce_test;

import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.CoreMatchers.nullValue;
//import static org.hamcrest.MatcherAssert.assertThat;
import static org.junit.Assert.assertThat;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.junit.Test;

public class MaxTemperatureDriverTest {

    // Skips files such as _SUCCESS and _logs in the output directory.
    public static class OutputLogFilter implements PathFilter {
        public boolean accept(Path path) {
            return !path.getName().startsWith("_");
        }
    }

    @Test
    public void test() throws Exception {
        // Use the local file system and the local job runner.
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "file:///");
        conf.set("mapred.job.tracker", "local");

        Path input = new Path("input");
        Path output = new Path("output");

        FileSystem fs = FileSystem.getLocal(conf);
        fs.delete(output, true); // delete old output

        MaxTemperatureDriver driver = new MaxTemperatureDriver();
        driver.setConf(conf);
        int exitCode = driver.run(new String[] { input.toString(), output.toString() });
        assertThat(exitCode, is(0));
        checkOutput(conf, output);
    }

    // Compares the single output file with expected.txt, line by line.
    private void checkOutput(Configuration conf, Path output) throws IOException {
        FileSystem fs = FileSystem.getLocal(conf);
        Path[] outputFiles = FileUtil.stat2Paths(
                fs.listStatus(output, new OutputLogFilter()));
        assertThat(outputFiles.length, is(1));
        BufferedReader actual = asBufferedReader(fs.open(outputFiles[0]));
        BufferedReader expected = asBufferedReader(fs.open(new Path("expected.txt")));
        String expectedLine;
        while ((expectedLine = expected.readLine()) != null) {
            assertThat(actual.readLine(), is(expectedLine));
        }
        assertThat(actual.readLine(), nullValue()); // no extra lines in the output
        actual.close();
        expected.close();
    }

    private BufferedReader asBufferedReader(InputStream in) throws IOException {
        return new BufferedReader(new InputStreamReader(in));
    }
}
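Note: for the test program above to pass, the libraries it imports (Hadoop, JUnit, and Hamcrest) must have been added to the project.
Note that checkOutput() is called to compare the actual output with the expected output line by line. Also, the program above was run on Linux. Running it in Eclipse on Windows produces an error.
The fix is to install Cygwin on the Windows system to support running the program. After installing Cygwin, add its bin directory, for example "C:\Program Files\cygwin\bin", to the PATH environment variable, and the problem is resolved. (Cygwin is a Unix-like environment that runs on Windows. It is very useful for learning the Unix/Linux environment, for porting applications from Unix to Windows, and for certain kinds of development work, especially embedded development on Windows with the GNU toolchain.)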
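The line-by-line comparison that checkOutput() performs can be tried in isolation. The sketch below mirrors its loop over in-memory text instead of files. The class LineCompareDemo, the method firstMismatch, and the sample lines are hypothetical and made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper mirroring the loop in checkOutput(),
// reading from strings instead of files.
class LineCompareDemo {

    // Returns the 1-based number of the first differing line,
    // or -1 when both texts contain exactly the same lines.
    static int firstMismatch(String actualText, String expectedText) {
        try (BufferedReader actual = new BufferedReader(new StringReader(actualText));
             BufferedReader expected = new BufferedReader(new StringReader(expectedText))) {
            int lineNo = 0;
            String expectedLine;
            while ((expectedLine = expected.readLine()) != null) {
                lineNo++;
                // readLine() returns null once the actual text runs out.
                if (!expectedLine.equals(actual.readLine())) {
                    return lineNo;
                }
            }
            // The actual text must not contain extra lines.
            return actual.readLine() == null ? -1 : lineNo + 1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample lines in "year<TAB>value" form.
        String expected = "1949\t111\n1950\t22\n";
        System.out.println(firstMismatch("1949\t111\n1950\t22\n", expected)); // prints -1
        System.out.println(firstMismatch("1949\t111\n1950\t23\n", expected)); // prints 2
        System.out.println(firstMismatch("1949\t111\n", expected));           // prints 2
    }
}
```

Checking for leftover lines after the loop is what makes the comparison strict in both directions, just as the final nullValue() assertion does in checkOutput().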
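After the tests pass in the IDE (Eclipse), test on a pseudo-distributed cluster, then test with a small dataset on a fully distributed cluster, and finally run on the full dataset.
Reference: Hadoop: The Definitive Guide, 3rd Edition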