MapReduce Application Development
The Configuration API
Loading and reading configuration
The Configuration class
Configuration conf = new Configuration();
conf.addResource("configuration_1.xml");
assertThat(conf.get("color"), is("yellow"));
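To make it clearer what conf.get("color") is reading: a Hadoop configuration resource is an XML file of name/value properties. The sketch below parses such a file with the JDK's DOM parser instead of Hadoop itself. The class name ConfigXmlSketch and the sample XML are assumptions made for illustration; the real contents of configuration_1.xml are not shown in these notes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class ConfigXmlSketch {
    // Assumed sample resource (hypothetical content in the usual Hadoop layout).
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<configuration>"
        + "<property><name>color</name><value>yellow</value></property>"
        + "<property><name>size</name><value>10</value></property>"
        + "</configuration>";

    // Reads every <property> element into a name -> value map.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable configuration resource", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE).get("color"));
    }
}
```

In real code the Configuration class does this parsing; the sketch only shows the shape of the data behind conf.get(...).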
Combining resources
Configuration conf = new Configuration();
conf.addResource("configuration_1.xml");
conf.addResource("configuration_2.xml");
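When several resources are added, a property defined in a later resource overrides the same property from an earlier one, unless the earlier resource marked it as final. The following is a minimal sketch of that rule using plain maps rather than Hadoop's Configuration. The class ResourceMergeSketch, its method signatures, and the sample values (size 10 then 12, weight heavy marked final then light) are assumptions chosen for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class ResourceMergeSketch {
    private final Map<String, String> props = new LinkedHashMap<>();
    private final Set<String> finalNames = new HashSet<>();

    // Adds one resource; 'finals' names the properties this resource marks as final.
    void addResource(Map<String, String> resource, Set<String> finals) {
        for (Map.Entry<String, String> e : resource.entrySet()) {
            if (finalNames.contains(e.getKey())) {
                continue; // an earlier resource declared this property final
            }
            props.put(e.getKey(), e.getValue()); // later definitions override earlier ones
        }
        finalNames.addAll(finals);
    }

    String get(String name) {
        return props.get(name);
    }

    // Builds the two assumed resources and merges them in order.
    static ResourceMergeSketch demo() {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("size", "10");
        first.put("weight", "heavy");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("size", "12");
        second.put("weight", "light");

        ResourceMergeSketch conf = new ResourceMergeSketch();
        conf.addResource(first, Collections.singleton("weight"));
        conf.addResource(second, Collections.<String>emptySet());
        return conf;
    }

    public static void main(String[] args) {
        ResourceMergeSketch conf = demo();
        System.out.println(conf.get("size") + "," + conf.get("weight"));
    }
}
```

With these assumed values the merged view has size 12 (overridden) and weight heavy (protected by final).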
Variable expansion
1. Expansion using other properties
assertThat(conf.get("size-weight"), is("12,heavy"));
2. System properties take priority over properties defined in resource files:
System.setProperty("size", "14");
assertThat(conf.get("size-weight"), is("14,heavy"));
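The two rules above can be sketched without Hadoop as a small resolver: each ${name} reference is looked up first in the system properties and then in the configuration's own properties. This is a simplified single-pass illustration, not Hadoop's implementation. The class VariableExpansionSketch and the assumed definition size-weight = ${size},${weight} are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VariableExpansionSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Returns the value of 'name' with every ${var} reference expanded once.
    static String get(Map<String, String> props, String name) {
        String raw = props.get(name);
        if (raw == null) {
            return null;
        }
        Matcher m = VAR.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String var = m.group(1);
            String value = System.getProperty(var); // system properties take priority
            if (value == null) {
                value = props.get(var);             // then properties from resources
            }
            if (value == null) {
                value = m.group(0);                 // unresolved references stay as written
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Assumed property set matching the assertions in the notes.
    static Map<String, String> sampleProps() {
        Map<String, String> props = new HashMap<>();
        props.put("size", "12");
        props.put("weight", "heavy");
        props.put("size-weight", "${size},${weight}");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(get(sampleProps(), "size-weight"));
        System.setProperty("size", "14");
        System.out.println(get(sampleProps(), "size-weight"));
    }
}
```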
Setting up the development environment
A Maven POM for building and testing MapReduce applications
<project>
<modelVersion>4.0.0</modelVersion>
<groupId>com.hadoopbook</groupId>
<artifactId>hadoop-book-mr-dev</artifactId>
<version>4.0</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<hadoop.version>2.5.1</hadoop.version>
</properties>
<dependencies>
<!--Hadoop main client artifact-->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<!-- Unit test artifacts -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.mrunit</groupId>
<artifactId>mrunit</artifactId>
<version>1.1.0</version>
<classifier>hadoop2</classifier>
<scope>test</scope>
</dependency>
<!-- Hadoop test artifact for running mini clusters -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-minicluster</artifactId>
<version>${hadoop.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<finalName>hadoop-examples</finalName>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>2.5</version>
<configuration>
<outputDirectory>${basedir}</outputDirectory>
</configuration>
</plugin>
</plugins>
</build>
</project>
MapReduce Combiner
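Not enabled by default; it has to be set explicitly on the job (for example with job.setCombinerClass).
A Combiner is essentially a Reducer. The two differ in where they run: the Combiner runs on the map side against each map task's output, while the Reducer runs on the reduce side after the shuffle.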
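The effect of a combiner can be illustrated with an in-memory word count: the same summing function is applied once per map task as the combiner and once more as the reducer, and fewer records cross the shuffle. This is a sketch in plain Java, not Hadoop code; the class CombinerSketch and its method names are hypothetical. It works because summing is associative and commutative, which matters since Hadoop may run a combiner zero or more times.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CombinerSketch {
    // map: emits (word, 1) for every word in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Sums counts per word; serves as both the combiner and the reducer.
    static Map<String, Integer> sum(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            totals.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return totals;
    }

    // Returns the records that would be sent to the reducer; each string is one map task's input.
    static List<Map.Entry<String, Integer>> shuffle(List<String> splits, boolean useCombiner) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapOutput = map(split);
            if (useCombiner) {
                shuffled.addAll(sum(mapOutput).entrySet()); // local aggregation on the map side
            } else {
                shuffled.addAll(mapOutput);
            }
        }
        return shuffled;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("a b a", "b b c");
        System.out.println(shuffle(splits, false).size() + " records without combiner");
        System.out.println(shuffle(splits, true).size() + " records with combiner");
        System.out.println(sum(shuffle(splits, true)));
    }
}
```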
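MapReduce programming guide:
Programming tips:
Execution flow:
Map phase
Logical input splits, reading records line by line, processing by the map method, partitioning, spilling from the in-memory buffer (with sorting), merging
Reduce phase
Copying (fetching) map output, merging, sorting, calling the reduce method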
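The two phases above can be traced with a small in-memory word count. The sketch below is not Hadoop code: there is no buffer spilling or on-disk merging, and a TreeMap stands in for the reduce-side merge sort. The class MapReduceFlowSketch and its method names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MapReduceFlowSketch {
    // Map task for one split: read lines, call map (emit word,1), partition, sort each partition.
    static List<List<Map.Entry<String, Integer>>> mapTask(List<String> lines, int numReducers) {
        List<List<Map.Entry<String, Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                int p = (word.hashCode() & Integer.MAX_VALUE) % numReducers;
                partitions.get(p).add(new SimpleEntry<>(word, 1));
            }
        }
        for (List<Map.Entry<String, Integer>> partition : partitions) {
            partition.sort(Map.Entry.comparingByKey());
        }
        return partitions;
    }

    // Reduce task: copy its partition from every map task, merge and sort, group by key, reduce.
    static Map<String, Integer> reduceTask(
            List<List<List<Map.Entry<String, Integer>>>> mapOutputs, int partition) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<List<Map.Entry<String, Integer>>> output : mapOutputs) {
            for (Map.Entry<String, Integer> record : output.get(partition)) {
                grouped.computeIfAbsent(record.getKey(), k -> new ArrayList<>())
                       .add(record.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value; // the reduce method: sum the values of one key
            }
            result.put(group.getKey(), sum);
        }
        return result;
    }

    // Runs one map task per split, then one reduce task per partition.
    static List<Map<String, Integer>> run(List<List<String>> splits, int numReducers) {
        List<List<List<Map.Entry<String, Integer>>>> mapOutputs = new ArrayList<>();
        for (List<String> split : splits) {
            mapOutputs.add(mapTask(split, numReducers));
        }
        List<Map<String, Integer>> reducerOutputs = new ArrayList<>();
        for (int p = 0; p < numReducers; p++) {
            reducerOutputs.add(reduceTask(mapOutputs, p));
        }
        return reducerOutputs;
    }
}
```

Calling MapReduceFlowSketch.run with two splits and two reducers returns one sorted result map per reducer, and each word appears in exactly one of them.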
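Key:
Sorting: dictionary (a-z) order by default
Partitioning: key.hashCode() % number of reduce tasks (the default HashPartitioner first masks the hash with Integer.MAX_VALUE so the result is never negative)
Grouping: records with the same key form one group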
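The three roles of the key can be shown side by side in plain Java. The partition method mirrors the formula used by Hadoop's default HashPartitioner, including the mask, because the % operator in Java can return a negative number for a negative hash code. The class KeyRolesSketch and its method names are hypothetical, and String's natural ordering stands in for the key type's comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class KeyRolesSketch {
    // Sorting: keys are ordered by their natural (dictionary) order.
    static List<String> sort(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        return sorted;
    }

    // Partitioning: the mask clears the sign bit so the index is always in [0, numReduceTasks).
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Grouping: consecutive equal keys in the sorted stream form one group.
    static List<List<String>> group(List<String> sortedKeys) {
        List<List<String>> groups = new ArrayList<>();
        for (String key : sortedKeys) {
            boolean startsNewGroup = groups.isEmpty()
                || !groups.get(groups.size() - 1).get(0).equals(key);
            if (startsNewGroup) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> sorted = sort(Arrays.asList("b", "a", "c", "a"));
        System.out.println(sorted);
        System.out.println(group(sorted));
        System.out.println(partition("a", 2));
    }
}
```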
MR
Custom object serialization
Custom sorting
Custom partitioning
Custom grouping
Extension of custom grouping: top N
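The four customizations and the top-N extension fit together as follows: a composite key is serialized field by field, sorted by group and then by value in descending order, partitioned and grouped on the group field only, so the first N records of each group are its top N. The sketch below is plain Java and does not depend on Hadoop. OrderKey, TopNSketch, and the order/amount fields are hypothetical; write and readFields only mirror the method shapes of Hadoop's WritableComparable interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OrderKey implements Comparable<OrderKey> {
    String orderId = "";
    double amount;

    OrderKey() { }
    OrderKey(String orderId, double amount) { this.orderId = orderId; this.amount = amount; }

    // Custom serialization: fields are written and read back in the same order.
    void write(DataOutput out) throws IOException { out.writeUTF(orderId); out.writeDouble(amount); }
    void readFields(DataInput in) throws IOException { orderId = in.readUTF(); amount = in.readDouble(); }

    // Custom sorting: order id ascending, then amount descending.
    @Override
    public int compareTo(OrderKey other) {
        int byId = orderId.compareTo(other.orderId);
        return byId != 0 ? byId : Double.compare(other.amount, amount);
    }
}

final class TopNSketch {
    // Custom partitioning: only the order id decides the reducer.
    static int partition(OrderKey key, int numReduceTasks) {
        return (key.orderId.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Custom grouping: keys with the same order id belong to one reduce call.
    static boolean sameGroup(OrderKey a, OrderKey b) { return a.orderId.equals(b.orderId); }

    // Serializes a key to bytes and reads it back into a fresh object.
    static OrderKey roundTrip(OrderKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            OrderKey copy = new OrderKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Top N: after sorting, the first n keys of each group carry the largest amounts.
    static Map<String, List<Double>> topN(List<OrderKey> keys, int n) {
        List<OrderKey> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        Map<String, List<Double>> result = new LinkedHashMap<>();
        OrderKey previous = null;
        List<Double> current = null;
        for (OrderKey key : sorted) {
            if (previous == null || !sameGroup(previous, key)) {
                current = new ArrayList<>();
                result.put(key.orderId, current);
            }
            if (current.size() < n) {
                current.add(key.amount);
            }
            previous = key;
        }
        return result;
    }
}
```

In a real job these pieces would be wired in through the Job API (setPartitionerClass, setGroupingComparatorClass); the sketch only demonstrates the ordering and grouping logic.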