Hadoop: The Definitive Guide Study Notes (3): MapReduce Application Development

lifeising

Article link: https://blog.csdn.net/lazy0zz/article/details/6943547

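Developing a MapReduce program follows a fairly fixed workflow: 1. write the map and reduce functions and unit-test them; 2. write a local driver program and run the job on test data; 3. run the job on the cluster, using IsolationRunner to rerun a failed task on the same input it failed on; 4. tune the job and profile its tasks, for which Hadoop provides hooks to assist the analysis.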

1. Unit testing

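import static org.mockito.Mockito.*;	// use Mockito to build the mocks

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.junit.Test;

public class MapperTest {
	@Test
	@SuppressWarnings("unchecked")
	public void test() throws IOException {
		Mapper mapper = new Mapper();	// Mapper stands for your own mapper class here
		Text value = new Text("...");
		OutputCollector<Text, IntWritable> output = mock(OutputCollector.class);
		mapper.map(null, value, output, null);
		verify(output).collect(new Text(".."), new IntWritable(..));
		// Missing-value test: nothing should be written to the collector
		// verify(output, never()).collect(any(Text.class), any(IntWritable.class));
	}
}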
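
The mapper test above needs a mocked OutputCollector. One way to reduce how much has to be tested through Hadoop at all is to move the record-parsing logic into a plain class with no Hadoop dependencies, so it can be checked with ordinary assertions. The sketch below assumes a hypothetical comma-separated "year,temperature" record format and a hypothetical RecordParser class; neither appears in the original notes.

```java
/** Parses one input line of the assumed form "year,temperature" (hypothetical format). */
final class RecordParser {
    private String year;
    private int temperature;
    private boolean valid;

    /** Parses a line; malformed input marks the record invalid instead of throwing. */
    void parse(String line) {
        valid = false;
        if (line == null) {
            return;
        }
        String[] fields = line.split(",", -1);
        if (fields.length != 2 || fields[0].trim().isEmpty()) {
            return;
        }
        try {
            temperature = Integer.parseInt(fields[1].trim());
            year = fields[0].trim();
            valid = true;
        } catch (NumberFormatException e) {
            // Leave valid == false so the caller can skip or count the record.
        }
    }

    boolean isValid() { return valid; }
    String getYear() { return year; }
    int getTemperature() { return temperature; }
}
```

A mapper's map method could then call parse(value.toString()) and only call output.collect when isValid() returns true, so the missing-value case becomes a plain assertion on isValid().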

2. Local testing

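import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;

public class Driver extends Configured implements Tool {
	@Override
	public int run(String[] args) throws Exception {
		JobConf conf = new JobConf(getConf(), getClass());
		// Configure the JobConf here: input and output paths, map and reduce classes
		JobClient.runJob(conf);	// submit the job and wait for it to finish
		return 0;
	}
}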

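import static org.junit.Assert.assertEquals;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.junit.Test;

public class DriverTest {
	@Test
	public void test() throws Exception {
		JobConf conf = new JobConf();
		conf.set("fs.default.name", "file:///");	// use the local filesystem
		conf.set("mapred.job.tracker", "local");	// use the local job runner
		Path output = new Path("output");	// placeholder output directory
		FileSystem fs = FileSystem.getLocal(conf);
		fs.delete(output, true);	// delete old output
		Driver driver = new Driver();
		driver.setConf(conf);
		int res = driver.run(new String[]{...});
		assertEquals(0, res);
		checkOutput(conf, output);	// helper (not shown): compare actual and expected output line by line
	}
}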
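
The checkOutput helper called above is not shown in these notes. A minimal sketch of what such a helper might do, assuming both the expected and the actual output are small local text files. The original helper takes a Hadoop configuration and path; this version uses plain java.nio paths to stay self-contained, and the class name OutputChecker is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class OutputChecker {
    /**
     * Compares two text files line by line.
     * Returns -1 if they are identical, otherwise the 1-based number
     * of the first line that differs or is missing from one of the files.
     */
    static int firstDifference(Path expected, Path actual) throws IOException {
        List<String> exp = Files.readAllLines(expected, StandardCharsets.UTF_8);
        List<String> act = Files.readAllLines(actual, StandardCharsets.UTF_8);
        int common = Math.min(exp.size(), act.size());
        for (int i = 0; i < common; i++) {
            if (!exp.get(i).equals(act.get(i))) {
                return i + 1;
            }
        }
        // All shared lines match; the files differ only if one is longer.
        return exp.size() == act.size() ? -1 : common + 1;
    }
}
```

In a test this could be wrapped in an assertion that the result is -1, pointing the second argument at the job's output file inside the output directory.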

3. Job debugging (running on the cluster with: hadoop jar xx.jar mainClass args)

System.err.println("error");	// written to the task log, viewable through the web UI
reporter.setStatus("...");	// set the task's status message
reporter.incrCounter(...);	// increment a task counter

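Anything written to standard output or standard error goes straight to the corresponding task log file. (With Streaming, standard output is used for the map or reduce output, so it does not show up in the log.)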

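Using a remote debugger: IsolationRunner reruns a failed task in isolation on the same input it failed on, so a debugger can be attached to it.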

4. Job tuning

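Things to check: number of mappers, number of reducers, combiners, compression of intermediate (map output) data, custom serialization, and shuffle tuning.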
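
Of the items above, the combiner is the one that comes with a correctness condition: Hadoop may run it zero, one, or many times on the map output, so the final result must be the same whether or not partial results were combined first. Taking a maximum satisfies this, while taking a mean does not. A small plain-Java check of that property, using made-up sample values and a hypothetical CombinerCheck class (no Hadoop API involved):

```java
final class CombinerCheck {
    /** Largest of the given values. */
    static int max(int... values) {
        int best = Integer.MIN_VALUE;
        for (int v : values) {
            best = Math.max(best, v);
        }
        return best;
    }

    /** Arithmetic mean of the given values. */
    static double mean(double... values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Pretend the first map task emitted 0, 20, 10 and the second 25, 15.
        int combinedMax = max(max(0, 20, 10), max(25, 15));
        int directMax = max(0, 20, 10, 25, 15);
        System.out.println(combinedMax + " == " + directMax);      // 25 == 25

        double combinedMean = mean(mean(0, 20, 10), mean(25, 15));
        double directMean = mean(0, 20, 10, 25, 15);
        System.out.println(combinedMean + " != " + directMean);    // 15.0 != 14.0
    }
}
```

So a max-style reducer can usually double as the combiner, whereas a mean needs a different intermediate form (for example, carrying a sum and a count) before a combiner is safe.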

5. MapReduce workflows

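A problem can be broken down into several MapReduce jobs: 1. the work of a single mapper can be split across several mappers, which are then chained into one map task with Hadoop's ChainMapper library class, combined with ChainReducer; 2. when running multiple jobs, a linear chain of jobs or a directed acyclic graph (DAG) can be used to control the order in which they run, for example with JobControl.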
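
To make the DAG option concrete: a job may start only after every job it depends on has finished. The sketch below is not Hadoop's JobControl API; it is a plain-Java illustration of that ordering rule, with a hypothetical JobDag class and hypothetical job names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class JobDag {
    // Job name -> names of the jobs it depends on (insertion order is preserved).
    private final Map<String, List<String>> deps = new LinkedHashMap<>();

    void addJob(String name, String... dependsOn) {
        deps.put(name, Arrays.asList(dependsOn));
    }

    /** Returns an order in which every job comes after all of its dependencies. */
    List<String> runOrder() {
        List<String> order = new ArrayList<>();
        while (order.size() < deps.size()) {
            boolean progressed = false;
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                // A job is ready once all of its dependencies are already scheduled.
                if (!order.contains(e.getKey()) && order.containsAll(e.getValue())) {
                    order.add(e.getKey());
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cycle or unknown dependency");
            }
        }
        return order;
    }
}
```

For example, after addJob("join", "cleanA", "cleanB"), addJob("cleanA"), addJob("cleanB") and addJob("report", "join"), runOrder() yields cleanA, cleanB, join, report. A linear chain is simply the special case where each job depends on exactly the previous one.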
