According to the API documentation:
/**
* The ChainMapper class allows the use of multiple Mapper classes within a
* single Map task.
* <p/>
* The Mapper classes are invoked in a chained (or piped) fashion, the output of
* the first becomes the input of the second, and so on until the last Mapper,
* the output of the last Mapper will be written to the task's output.
* <p/>
* The key functionality of this feature is that the Mappers in the chain do not
* need to be aware that they are executed in a chain. This enables having
* reusable specialized Mappers that can be combined to perform composite
* operations within a single task.
* <p/>
* Special care has to be taken when creating chains that the key/values output
* by a Mapper are valid for the following Mapper in the chain. It is assumed
* all Mappers and the Reduce in the chain use matching output and input key and
* value classes as no conversion is done by the chaining code.
* <p/>
* Using the ChainMapper and the ChainReducer classes, it is possible to compose
* Map/Reduce jobs that look like <code>[MAP+ / REDUCE MAP*]</code>. An
* immediate benefit of this pattern is a dramatic reduction in disk IO.
* <p/>
* IMPORTANT: There is no need to specify the output key/value classes for the
* ChainMapper, this is done by the addMapper for the last mapper in the chain.
* <p/>
**/
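The piping behaviour described in the Javadoc can be illustrated without Hadoop at all. The following is a minimal sketch in plain Java, not the Hadoop API: `ChainSketch`, `Stage`, `chain`, and `run` are hypothetical names invented for this illustration. Each stage turns one record into zero or more records, every output of the first stage is fed to the second, and neither stage knows it is part of a chain.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

public class ChainSketch {
    // Hypothetical stand-in for a Mapper: one input record yields zero or more output records.
    interface Stage<I, O> extends Function<I, List<O>> {}

    // Feeds every output record of `first` into `second`, in the same process,
    // with no intermediate write to disk. No type conversion happens in between,
    // so the output type of `first` must match the input type of `second`.
    static <A, B, C> Stage<A, C> chain(Stage<A, B> first, Stage<B, C> second) {
        return input -> {
            List<C> out = new ArrayList<>();
            for (B mid : first.apply(input)) {
                out.addAll(second.apply(mid));
            }
            return out;
        };
    }

    // Runs a two-stage chain over one line: split into words, then upper-case each word.
    static List<String> run(String line) {
        Stage<String, String> tokenize = l -> {
            List<String> words = new ArrayList<>();
            for (String w : l.trim().split("\\s+")) {
                if (!w.isEmpty()) {
                    words.add(w);
                }
            }
            return words;
        };
        Stage<String, String> upper = w -> Collections.singletonList(w.toUpperCase());
        return chain(tokenize, upper).apply(line);
    }

    public static void main(String[] args) {
        System.out.println(run("hello chain mapper"));
    }
}
```

Because `chain` is generic over the intermediate type `B`, a mismatch between one stage's output and the next stage's input is caught at compile time in this sketch. In the real ChainMapper, as the Javadoc warns, matching key/value classes are the job author's responsibility.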
Example code:
package com.joey.mapred.chainjobs;
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text