Analyzing the Execution of Hadoop's Bundled WordCount Example

hitmediaman


The Hadoop distribution also ships the source code of its examples. The main function of the WordCount.java class is implemented as follows:

public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    System.exit(res);
}

}

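Let's start from the main function and, in a depth-first fashion, take this WordCount word-frequency tool apart step by step, so that we can see more clearly how things actually work inside Hadoop.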

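We begin with the run method of ToolRunner. It takes three parameters: the first is an instance of the Configuration class, the second is an instance of the WordCount class, and args is the command-line argument array received from the console. Evidently we are still a long way from analyzing WordCount itself, because the Configuration class and the args array alone will take a while to trace.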

Below is the implementation of ToolRunner's run method:

public static int run(Configuration conf, Tool tool, String[] args) 
    throws Exception{
    // Even if the conf passed in is null, a Configuration object is instantiated here
    if(conf == null) {
      conf = new Configuration();
    }
    // Build a GenericOptionsParser from conf and args; constructing it parses
    // Hadoop's generic options and applies them to conf
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    // Tool is an interface, and WordCount implements it. Tool declares only a run
    // method, so every Tool implementation must define how it runs.

    // Because Tool extends the Configurable interface, a Tool can be given its
    // initial configuration through setConf()
    tool.setConf(conf);
    
    //get the args w/o generic hadoop args
    // Returns the command-line arguments left over once the generic options are removed
    String[] toolArgs = parser.getRemainingArgs();
    // Run the tool (here the WordCount instance) with toolArgs and return its exit status code
    return tool.run(toolArgs);
}

The run method above is the top-level, most abstract method involved in executing the WordCount example.
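The control flow of ToolRunner.run can be imitated without Hadoop on the classpath. The following is a minimal sketch, not Hadoop's implementation: MiniTool, CountArgsTool and MiniToolRunner are hypothetical names, a plain Map stands in for Configuration, and the generic-option parsing step is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

class MiniToolRunner {
    // Hypothetical stand-in for Hadoop's Tool interface: configurable and runnable.
    interface MiniTool {
        void setConf(Map<String, String> conf);
        int run(String[] args);
    }

    // Hypothetical tool that succeeds only when it receives exactly two arguments.
    static class CountArgsTool implements MiniTool {
        private Map<String, String> conf;

        @Override
        public void setConf(Map<String, String> conf) {
            this.conf = conf;
        }

        @Override
        public int run(String[] args) {
            System.out.println("conf=" + conf + ", args=" + args.length);
            return args.length == 2 ? 0 : 1; // 0 signals success, like a process exit code
        }
    }

    // Same shape as ToolRunner.run: default the conf, configure the tool, run it.
    static int run(Map<String, String> conf, MiniTool tool, String[] args) {
        if (conf == null) {
            conf = new HashMap<>();
        }
        tool.setConf(conf);
        return tool.run(args);
    }

    public static void main(String[] args) {
        int res = run(null, new CountArgsTool(), new String[] {"in", "out"});
        System.out.println("exit status: " + res);
    }
}
```

The point of the design is that the runner knows nothing about what the tool does; it only guarantees that the tool has a configuration before run is called, and it hands the tool's status code back to the caller.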

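At startup the program first has to deal with the Hadoop configuration files, which live in the conf directory under the Hadoop root directory. The configuration class is Configuration, and a Configuration object is constructed with the following constructor: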

  public Configuration() {
    if (LOG.isDebugEnabled()) {
      LOG.debug(StringUtils.stringifyException(new IOException("config()")));
    }
    resources.add("hadoop-default.xml");
    resources.add("hadoop-site.xml");
}

Instantiating a Configuration object simply adds the hadoop-default.xml and hadoop-site.xml configuration files from the conf directory to private ArrayList<Object> resources, so that they can be parsed later.
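The idea of "register resource names now, read them later" can be sketched with the JDK alone. This is an illustration under stated assumptions rather than Hadoop's code: LazyConfSketch is a hypothetical class, in-memory maps stand in for the XML files, the sample values are made up, and it assumes that loading is deferred until the first lookup and that a later resource overrides an earlier one (so hadoop-site.xml wins over hadoop-default.xml).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LazyConfSketch {
    // Hypothetical stand-in for files on disk: resource name -> its key/value pairs.
    private final Map<String, Map<String, String>> files;
    // Resource names in registration order, like Configuration's resources list.
    private final List<String> resources = new ArrayList<>();
    // Filled on the first lookup; null means "not loaded yet".
    private Map<String, String> properties;

    LazyConfSketch(Map<String, Map<String, String>> files) {
        this.files = files;
        // The constructor only records names; nothing is read here.
        resources.add("hadoop-default.xml");
        resources.add("hadoop-site.xml");
    }

    String get(String key) {
        if (properties == null) {
            properties = new HashMap<>();
            for (String name : resources) {
                // Later resources overwrite values from earlier ones.
                properties.putAll(files.getOrDefault(name, Map.of()));
            }
        }
        return properties.get(key);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> files = new HashMap<>();
        // Illustrative values only.
        files.put("hadoop-default.xml", Map.of("fs.default.name", "local", "io.sort.mb", "100"));
        files.put("hadoop-site.xml", Map.of("fs.default.name", "namenode:9000"));
        LazyConfSketch conf = new LazyConfSketch(files);
        System.out.println(conf.get("fs.default.name")); // value from the site resource
        System.out.println(conf.get("io.sort.mb"));      // value from the default resource
    }
}
```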

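The class that actually processes Hadoop's generic options, and updates the configuration accordingly, is GenericOptionsParser, the generic options parser. It needs a Configuration object together with a command-line argument array.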

Here is the constructor of the GenericOptionsParser class:

 public GenericOptionsParser(Configuration conf, String[] args) {
    this(conf, new Options(), args); // an additional Options object is passed along here
}

The Options class is a collection of option objects describing the command-line arguments an application may use. Its constructor looks like this:

  public Options()
    {
        // nothing to do
    }

In fact it does nothing. However, options can be added to an Options object dynamically.

This in turn calls another constructor of the GenericOptionsParser class, shown below:

  public GenericOptionsParser(Configuration conf, Options options, String[] args) {
    parseGeneralOptions(options, conf, args);
}

It goes on to call the GenericOptionsParser member method parseGeneralOptions() to parse the options further:

 /**
   * Parse the user-specified options, get the generic options, and modify
   * configuration accordingly
   * @param conf Configuration to be modified
   * @param args User-specified arguments
   * @return Command-specific arguments
   */
private String[] parseGeneralOptions(Options opts, Configuration conf, 
      String[] args) {
    opts = buildGeneralOptions(opts);
    CommandLineParser parser = new GnuParser();
    try {
      commandLine = parser.parse(opts, args, true);
      processGeneralOptions(conf, commandLine);
      return commandLine.getArgs();
    } catch(ParseException e) {
      LOG.warn("options parsing failed: "+e.getMessage());

      HelpFormatter formatter = new HelpFormatter();
      formatter.printHelp("general options are: ", opts);
    }
    return args;
}

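Here, commandLine is a private member variable of the GenericOptionsParser class.

The parseGeneralOptions() member method of GenericOptionsParser shown above can be regarded as the high-level, abstract method for parsing Hadoop's generic options.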

The buildGeneralOptions() method it calls takes Options opts and returns opts again, as shown below:

 /**
   * Specify properties of each generic option
   */
@SuppressWarnings("static-access")
private Options buildGeneralOptions(Options opts) {
    Option fs = OptionBuilder.withArgName("local|namenode:port")
    .hasArg()
    .withDescription("specify a namenode")
    .create("fs");
    Option jt = OptionBuilder.withArgName("local|jobtracker:port")
    .hasArg()
    .withDescription("specify a job tracker")
    .create("jt");
    Option oconf = OptionBuilder.withArgName("configuration file")
    .hasArg()
    .withDescription("specify an application configuration file")
    .create("conf");
    Option property = OptionBuilder.withArgName("property=value")
    .hasArgs()
    .withArgPattern("=", 1)
    .withDescription("use value for given property")
    .create('D');

    opts.addOption(fs);
    opts.addOption(jt);
    opts.addOption(oconf);
    opts.addOption(property);
    
    return opts;
}

Here is a note on the Option class and how an Option instance is set up.

The buildGeneralOptions() method shown above takes Options opts and returns opts again, and along the way it changes the contents of opts.

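The opts passed in at first is empty (it contains no Option objects, that is, no options), because nothing has been configured on Options opts since it was instantiated. In the later part of the code above, however, opts is populated by adding Option objects to the Options collection.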

Let's see exactly what gets added. Take one entry as an example:

     Option fs = OptionBuilder.withArgName("local|namenode:port")
    .hasArg()
    .withDescription("specify a namenode")
    .create("fs");

     opts.addOption(fs);

An Option represents a single command-line option. Here is the definition of the Option class:

package org.apache.commons.cli;

import java.util.ArrayList;
import java.util.regex.Pattern;

public class Option {

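    // -1 indicates that the number of argument values has not been specified
    public static final int UNINITIALIZED = -1;

    // -2 indicates that the number of argument values is unlimited
    public static final int UNLIMITED_VALUES = -2;

    // The string name that identifies this Option
    private String opt;

    // The long name of this Option
    private String longOpt;

    // Whether this Option has an associated argument
    private boolean hasArg;

    // The display name of this Option's argument
    private String argName = "arg";

    // The description of this Option
    private String description;

    // Whether this Option is required
    private boolean required;

    // Whether this Option's argument value is optional
    private boolean optionalArg;

    // The number of argument values this Option can take
    private int numberOfArgs = UNINITIALIZED;

    // The type of this Option
    private Object type;

    // The list of argument values
    private ArrayList values = new ArrayList();

    // The character used as the value separator
    private char valuesep;

    // The argument pattern and the number of times it is expected to occur
    private Pattern argPattern;
    private int limit;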

    /**
     * Constructs an Option.
     *
     * @param opt the name that identifies this Option
     * @param description the description of this Option
     */
    public Option(String opt, String description)
           throws IllegalArgumentException
    {
        this(opt, null, false, description); // delegates to another constructor
    }

    // Another constructor
    public Option(String opt, boolean hasArg, String description)
           throws IllegalArgumentException
    {
        this(opt, null, hasArg, description);
    }

    // Yet another constructor for an Option
    public Option(String opt, String longOpt, boolean hasArg, 
                  String description)
           throws IllegalArgumentException
    {
        // Validate that the option name is legal
        OptionValidator.validateOption(opt);

        this.opt = opt;
        this.longOpt = longOpt;

       // if hasArg is set then the number of arguments is 1
        if (hasArg)
        {
            this.numberOfArgs = 1;
        }

        this.hasArg = hasArg;
        this.description = description;
    }

    // Returns the ID of this Option
    public int getId()
    {
        return getKey().charAt(0);
    }

   /**
     * Returns the 'unique' Option identifier.
     * 
     * @return the 'unique' Option identifier
     */
    String getKey()
    {
        // if 'opt' is null, then it is a 'long' option
        if (opt == null)
        {
            return this.longOpt;
        }

        return this.opt;
    }

    /** 
     * Returns the name of this Option
     */
    public String getOpt()
    {
        return this.opt;
    }

    /** 
     * Returns the type of this Option
     */
    public Object getType()
    {
        return this.type;
    }

   /** 
     * Sets the type of this Option
     */
    public void setType(Object type)
    {
        this.type = type;
    }

    /** 
     * Returns the long name of this Option
     */
    public String getLongOpt()
    {
        return this.longOpt;
    }

    /** 
     * Sets the long name of this Option
     */
    public void setLongOpt(String longOpt)
    {
        this.longOpt = longOpt;
    }

    /** 
     * Sets whether this Option has an optional argument
     */
    public void setOptionalArg(boolean optionalArg)
    {
        this.optionalArg = optionalArg;
    }

    /** 
     * Returns whether this Option has an optional argument
     */
    public boolean hasOptionalArg()
    {
        return this.optionalArg;
    }

    /** 
     * Whether this Option has a long name
     */
    public boolean hasLongOpt()
    {
        return (this.longOpt != null);
    }

    /** 
     * Whether this Option takes an argument
     */
    public boolean hasArg()
    {
        return (this.numberOfArgs > 0) || (numberOfArgs == UNLIMITED_VALUES);
    }
    // Returns the description of this Option
    public String getDescription()
    {
        return this.description;
    }

    // Sets the description of this Option
    public void setDescription(String description)
    {
        this.description = description;
    }

    // Whether this Option is required
    public boolean isRequired()
    {
        return this.required;
    }

    // Sets whether this Option is required
    public void setRequired(boolean required)
    {
        this.required = required;
    }

    // Sets the display name of the argument value
    public void setArgName(String argName)
    {
        this.argName = argName;
    }

    // Returns the display name of the argument value
    public String getArgName()
    {
        return this.argName;
    }

    // Whether the display name of the argument value has been set
    public boolean hasArgName()
    {
        return (this.argName != null && this.argName.length() > 0);
    }

    // Whether this Option can take more than one argument value
    public boolean hasArgs()
    {
        return (this.numberOfArgs > 1) 
                || (this.numberOfArgs == UNLIMITED_VALUES);
    }

    // Sets the number of argument values this Option can take
    public void setArgs(int num)
    {
        this.numberOfArgs = num;
    }

    // Sets the value separator character
    public void setValueSeparator(char sep)
    {
        this.valuesep = sep;
    }

    // Returns the value separator character
    public char getValueSeparator()
    {
        return this.valuesep;
    }

    // Whether a value separator character has been specified for this Option
    public boolean hasValueSeparator()
    {
        return (this.valuesep > 0);
    }

    // Whether an argument pattern has been specified for this Option
    public boolean hasArgPattern()
    {
        return (limit!=0&&argPattern!=null);
    }

    public void setArgPattern( String argPattern, int limit )
    {
        if(argPattern==null || argPattern.length()==0 || limit==0 )
          return;
        this.argPattern = Pattern.compile(argPattern);
        this.limit = limit;
    }

    // Returns the number of arguments this Option can take
    public int getArgs()
    {
        return this.numberOfArgs;
    }

    // Adds a value to this Option
    void addValue(String value)
    {
        switch (numberOfArgs)
        {
        case UNINITIALIZED:
            throw new RuntimeException("NO_ARGS_ALLOWED");

        default:
            processValue(value);
        }
    }

    // Checks the argument against the argument pattern
    private void checkArgPattern( String arg ) {
      if(!hasArgPattern()) {
        add(arg);
      } else {
        String [] tokens = argPattern.split(arg, -1);
        if(tokens.length != limit+1)
          throw new RuntimeException("ARG_PATTERN_NOT_MATCH");
        for(int i=0; i<= limit; i++) {
          add(tokens[i]);
        }
      }
    }

    // Processes a value of this Option
    private void processValue(String value)
    {
        // this Option has a separator character
        if (hasValueSeparator())
        {
            // get the separator character
            char sep = getValueSeparator();

            // store the index for the value separator
            int index = value.indexOf(sep);

            // while there are more value separators
            while (index != -1)
            {
                // next value to be added 
                if (values.size()/(limit+1) == (numberOfArgs - 1))
                {
                    break;
                }


                // store
                checkArgPattern(value.substring(0, index));


                // parse
                value = value.substring(index + 1);


                // get new index
                index = value.indexOf(sep);
            }
        }


        // check if the argument matches specified pattern; if yes,
        // store the actual value or the last value that has been parsed
        checkArgPattern(value);
    }

    // Adds a value to this Option; throws if numberOfArgs is positive and the list is already full
    private void add(String value)
    {
        if ((numberOfArgs > 0) && (values.size() > (numberOfArgs - 1)))
        {
            throw new RuntimeException("Cannot add value, list full.");
        }


        this.values.add(value);
    }

    // Returns the value of this Option
    public String getValue()
    {
        return hasNoValues() ? null : (String) this.values.get(0);
    }

    // Returns the value of this Option at the given index
    public String getValue(int index)
        throws IndexOutOfBoundsException
    {
        return hasNoValues() ? null : (String) this.values.get(index);
    }

    // Returns the first value of this Option, or the given default if it has no values
    public String getValue(String defaultValue)
    {
        String value = getValue();

        return (value != null) ? value : defaultValue;
    }

    // Returns all values of this Option as a String array
    public String[] getValues()
    {
        return hasNoValues()
               ? null : (String[]) this.values.toArray(new String[] { });
    }

    // Returns all values of this Option as a List
    public java.util.List getValuesList()
    {
        return this.values;
    }

    // Intended for debugging
    public String toString()
    {
        StringBuffer buf = new StringBuffer().append("[ option: ");

        buf.append(this.opt);

        if (this.longOpt != null)
        {
            buf.append(" ").append(this.longOpt);
        }

        buf.append(" ");

        if (hasArg)
        {
            buf.append("+ARG");
        }

        buf.append(" :: ").append(this.description);

        if (this.type != null)
        {
            buf.append(" :: ").append(this.type);
        }

        buf.append(" ]");

        return buf.toString();
    }

    // Whether this Option currently has no values
    private boolean hasNoValues()
    {
        return this.values.size() == 0;
    }
}

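As we can see, an Option carries a lot of information: long name (longOpt), short name (opt), type (type), pattern (argPattern), number of arguments (numberOfArgs), the value separator character, ID, description, and so on.

Only once this information has been set on each Option can the Options object returned by private Options buildGeneralOptions(Options opts) be used for the parsing that follows.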
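The argument-pattern rule in checkArgPattern is easy to miss in the long listing: the value is split on the pattern with a negative limit, and the split must yield exactly limit + 1 tokens. For the -D option, built with withArgPattern("=", 1), that means exactly two tokens. The sketch below reproduces only this rule with java.util.regex. ArgPatternSketch and splitByPattern are hypothetical names, the sample strings are made up, and it throws IllegalArgumentException where the listing throws a plain RuntimeException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ArgPatternSketch {
    // Splits arg on the pattern and requires exactly limit + 1 pieces,
    // the same rule checkArgPattern applies in the Option listing.
    static List<String> splitByPattern(String arg, String regex, int limit) {
        // A negative split limit keeps trailing empty strings, so "key=" yields two tokens.
        String[] tokens = Pattern.compile(regex).split(arg, -1);
        if (tokens.length != limit + 1) {
            throw new IllegalArgumentException("ARG_PATTERN_NOT_MATCH: " + arg);
        }
        List<String> values = new ArrayList<>();
        for (String token : tokens) {
            values.add(token);
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical property name and value, chosen only for illustration.
        System.out.println(splitByPattern("my.example.key=42", "=", 1));
        try {
            splitByPattern("no-equals-sign", "=", 1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A consequence of the exact-count check is that a value containing a second "=" is rejected as well, since it splits into three tokens rather than two.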

Let's continue:

     Option fs = OptionBuilder.withArgName("local|namenode:port")
    .hasArg()
    .withDescription("specify a namenode")
    .create("fs");

     opts.addOption(fs);

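The important class here is OptionBuilder. It is what actually "fills in" an Option; after several such calls, multiple Option objects have been added to the opts collection.

Take a look at the withArgName() method of the OptionBuilder class: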

     /**
     * The next Option created will have the specified argument value 
     * name.
     *
     * @param name the name for the argument value
     * @return the OptionBuilder instance
     */
    public static OptionBuilder withArgName(String name)
    {
        OptionBuilder.argName = name;

        return instance;
    }

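The code above stores name in OptionBuilder's static argName field and returns the shared OptionBuilder instance, so the next Option created will use name as its argument name.

Next, hasArg() is called, which is also a static method of the OptionBuilder class: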

     /**
     * The next Option created will require an argument value.
     *
     * @return the OptionBuilder instance
     */
    public static OptionBuilder hasArg()
    {
        OptionBuilder.numberOfArgs = 1;

        return instance;
    }

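This sets the number of arguments on the same OptionBuilder whose argument name was just specified. The value is 1 because hasArg() declares that the option takes exactly one argument value.

withDescription() is then called to set the description: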

     /**
     * The next Option created will have the specified description
     *
     * @param newDescription a description of the Option's purpose
     * @return the OptionBuilder instance
     */
    public static OptionBuilder withDescription(String newDescription)
    {
        OptionBuilder.description = newDescription;

        return instance;
    }

The key step is the last call: only the create() method of the OptionBuilder class actually creates the Option:

     /**
     * Create an Option using the current settings and with 
     * the specified Option <code>char</code>.
     *
     * @param opt the <code>java.lang.String</code> representation 
     * of the Option
     * @return the Option instance
     * @throws IllegalArgumentException if <code>opt</code> is not
     * a valid character. See Option.
     */
    public static Option create(String opt)
                         throws IllegalArgumentException
    {
        // create the option
        Option option = new Option(opt, description);


        // set the option properties
        option.setLongOpt(longopt);
        option.setRequired(required);
        option.setOptionalArg(optionalArg);
        option.setArgs(numberOfArgs);
        option.setType(type);
        option.setValueSeparator(valuesep);
        option.setArgName(argName);
        option.setArgPattern(argPattern, limit);


        // reset the OptionBuilder properties
        OptionBuilder.reset();

        // return the Option instance
        return option;
    }

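From the way a single Option is set up above, we can see that OptionBuilder is really a helper: it collects the information related to an Option and then assigns all of it in one go to a newly created Option object, which now carries the full details.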
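The collect-then-create-then-reset cycle can be reproduced in a few lines. This is a sketch of the pattern only, not the Commons CLI source: StaticBuilderSketch and Opt are hypothetical names, only three settings are modeled, and the defaults restored by reset are assumptions made for the example. Because the chained calls invoke static methods through an instance reference, the compiler or IDE may warn about static access, which is presumably why the buildGeneralOptions listing carries @SuppressWarnings("static-access").

```java
class StaticBuilderSketch {
    // Hypothetical product type holding the collected settings.
    static final class Opt {
        final String name;
        final String argName;
        final int numberOfArgs;
        final String description;

        Opt(String name, String argName, int numberOfArgs, String description) {
            this.name = name;
            this.argName = argName;
            this.numberOfArgs = numberOfArgs;
            this.description = description;
        }
    }

    // Pending settings live in static fields, so every call shares them.
    private static String argName = "arg";
    private static int numberOfArgs = -1;
    private static String description;
    private static final StaticBuilderSketch INSTANCE = new StaticBuilderSketch();

    private StaticBuilderSketch() { }

    static StaticBuilderSketch withArgName(String name) {
        argName = name;
        return INSTANCE;
    }

    static StaticBuilderSketch hasArg() {
        numberOfArgs = 1;
        return INSTANCE;
    }

    static StaticBuilderSketch withDescription(String text) {
        description = text;
        return INSTANCE;
    }

    // Builds the product from the pending settings, then clears them.
    static Opt create(String name) {
        Opt opt = new Opt(name, argName, numberOfArgs, description);
        reset();
        return opt;
    }

    private static void reset() {
        argName = "arg";
        numberOfArgs = -1;
        description = null;
    }

    @SuppressWarnings("static-access")
    public static void main(String[] args) {
        Opt fs = StaticBuilderSketch.withArgName("local|namenode:port")
                .hasArg()
                .withDescription("specify a namenode")
                .create("fs");
        // "verbose" is a made-up option; it shows that nothing leaks over from fs.
        Opt plain = StaticBuilderSketch.create("verbose");
        System.out.println(fs.name + " " + fs.argName + " " + fs.numberOfArgs);
        System.out.println(plain.name + " " + plain.argName + " " + plain.numberOfArgs);
    }
}
```

The reset at the end of create is what makes the shared static state safe to reuse for the next option. It also means a builder of this shape is not safe to use from several threads at once.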

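Next comes the parse method of CommandLineParser parser. GnuParser extends the abstract class Parser, which is declared as public abstract class Parser implements CommandLineParser, so the implementation of parse is found in Parser: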

     public CommandLine parse(Options options, String[] arguments, 
                             boolean stopAtNonOption)
        throws ParseException
    {
        return parse(options, arguments, null, stopAtNonOption);
    }

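The stopAtNonOption parameter indicates whether parsing should stop as soon as the first non-option token is encountered, with all remaining tokens treated as plain arguments. From commandLine = parser.parse(opts, args, true); in the parseGeneralOptions method earlier, we can see that true is passed in.

This calls the overloaded parse() member method of the Parser class again, shown below. The parsing logic is quite detailed: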
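The effect of stopAtNonOption is easiest to see by running the same arguments through both modes. The following is a simplified sketch, not the Commons CLI parser: StopAtNonOptionSketch is a hypothetical class, every known option is assumed to take exactly one value, and the special handling of "-" and "--" is omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StopAtNonOptionSketch {
    final Map<String, String> options = new LinkedHashMap<>();
    final List<String> remaining = new ArrayList<>();

    // Simplified assumption: every known option takes exactly one value.
    static StopAtNonOptionSketch parse(Set<String> known, String[] args, boolean stopAtNonOption) {
        StopAtNonOptionSketch result = new StopAtNonOptionSketch();
        int i = 0;
        while (i < args.length) {
            String t = args[i++];
            if (t.startsWith("-") && known.contains(t)) {
                if (i >= args.length) {
                    throw new IllegalArgumentException("missing value for " + t);
                }
                result.options.put(t, args[i++]);
                continue;
            }
            if (t.startsWith("-") && !stopAtNonOption) {
                throw new IllegalArgumentException("unrecognized option: " + t);
            }
            // t is a plain argument, or an unknown option while in stop mode.
            result.remaining.add(t);
            if (stopAtNonOption) {
                // Pass everything after the first non-option through untouched.
                while (i < args.length) {
                    result.remaining.add(args[i++]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("-fs", "-jt");
        String[] cmd = {"-fs", "local", "input", "-jt", "output"};
        StopAtNonOptionSketch stop = parse(known, cmd, true);
        StopAtNonOptionSketch scan = parse(known, cmd, false);
        System.out.println(stop.options + " " + stop.remaining);
        System.out.println(scan.options + " " + scan.remaining);
    }
}
```

With stopAtNonOption set to true, the token "input" ends option processing, so "-jt" and "output" are handed on as ordinary arguments. That matches how GenericOptionsParser is used here: generic options come first, and whatever follows is returned to the tool through getRemainingArgs().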

     /**
     * Parse the arguments according to the specified options and
     * properties.
     *
     * @param options the specified Options
     * @param arguments the command line arguments
     * @param properties command line option name-value pairs
     * @param stopAtNonOption stop parsing the arguments when the first
     * non option is encountered.
     *
     * @return the list of atomic option and value tokens
     *
     * @throws ParseException if there are any problems encountered
     * while parsing the command line tokens.
     */
    public CommandLine parse(Options options, String[] arguments, 
                             Properties properties, boolean stopAtNonOption)
        throws ParseException
    {
        // initialise members
        this.options = options;
        requiredOptions = options.getRequiredOptions();
        cmd = new CommandLine();

        boolean eatTheRest = false;

        if (arguments == null)
        {
            arguments = new String[0];
        }

        List tokenList = Arrays.asList(flatten(this.options, 
                                               arguments, 
                                               stopAtNonOption));

        ListIterator iterator = tokenList.listIterator();

       // process each flattened token
        while (iterator.hasNext())
        {
            String t = (String) iterator.next();

            // the value is the double-dash
            if ("--".equals(t))
            {
                eatTheRest = true;
            }

            // the value is a single dash
            else if ("-".equals(t))
            {
                if (stopAtNonOption)
                {
                    eatTheRest = true;
                }
                else
                {
                    cmd.addArg(t);
                }
            }

            // the value is an option
            else if (t.startsWith("-"))
            {
                if (stopAtNonOption && !options.hasOption(t))
                {
                    eatTheRest = true;
                    cmd.addArg(t);
                }
                else
                {
                    processOption(t, iterator);
                }
            }

            // the value is an argument
            else
            {
                cmd.addArg(t);

                if (stopAtNonOption)
                {
                    eatTheRest = true;
                }
            }

            // eat the remaining tokens
            if (eatTheRest)
            {
                while (iterator.hasNext())
                {
                    String str = (String) iterator.next();

                    // ensure only one double-dash is added
                    if (!"--".equals(str))
                    {
                        cmd.addArg(str);
                    }
                }
            }
        }

        processProperties(properties);
        checkRequiredOptions();

        return cmd;
    }
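
To make the effect of stopAtNonOption concrete, here is a minimal sketch of the token loop in plain Java. It is not the Commons CLI code: the class StopAtNonOptionSketch, its method leftoverArgs and the option -verbose are hypothetical, options take no values, and flatten() is left out. It only tracks which tokens end up as left-over arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified re-creation of the token loop in Parser.parse().
final class StopAtNonOptionSketch {

    static List<String> leftoverArgs(Set<String> knownOptions, String[] tokens,
                                     boolean stopAtNonOption) {
        List<String> args = new ArrayList<>();
        boolean eatTheRest = false;
        Iterator<String> it = Arrays.asList(tokens).iterator();
        while (it.hasNext()) {
            String t = it.next();
            if ("--".equals(t)) {
                eatTheRest = true;
            } else if ("-".equals(t)) {
                if (stopAtNonOption) { eatTheRest = true; } else { args.add(t); }
            } else if (t.startsWith("-")) {
                if (stopAtNonOption && !knownOptions.contains(t)) {
                    eatTheRest = true;   // unknown option: keep it as an argument
                    args.add(t);
                } else if (!knownOptions.contains(t)) {
                    // stands in for the exception processOption() raises
                    throw new IllegalArgumentException("Unrecognized option: " + t);
                }
                // a known option is simply consumed in this sketch
            } else {
                args.add(t);             // first plain argument
                if (stopAtNonOption) { eatTheRest = true; }
            }
            if (eatTheRest) {
                while (it.hasNext()) {
                    String rest = it.next();
                    if (!"--".equals(rest)) { args.add(rest); }
                }
            }
        }
        return args;
    }

    public static void main(String[] a) {
        Set<String> known = Set.of("-verbose");
        String[] tokens = {"-verbose", "input", "-m", "4", "output"};
        // With true, everything after the first non-option survives untouched.
        System.out.println(leftoverArgs(known, tokens, true));
    }
}
```

Under these assumptions, passing true is what allows tool-specific arguments such as WordCount's own -m and -r to pass through the generic parser untouched.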

After parsing, an instance of the CommandLine class is returned, and the private member variable commandLine of the GenericOptionsParser class now holds a reference to it.

Here is the implementation of the CommandLine class:

package org.apache.commons.cli;

import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** 
* Represents list of arguments parsed against
* a {@link Options} descriptor.
*
* It allows querying of a boolean {@link #hasOption(String opt)},
* in addition to retrieving the {@link #getOptionValue(String opt)}
* for options requiring arguments.
*/
public class CommandLine {

    // options/arguments that could not be recognized
    private List args = new LinkedList();

    /** the processed options */
    private Map options = new HashMap();

    /** the option name map */
    private Map names = new HashMap();

    /** Map of unique options for ease to get complete list of options */
    private Map hashcodeMap = new HashMap();

    /** the processed options */
    private Option[] optionsArray;

    // Creates a CommandLine instance.
    CommandLine()
    {
        // nothing to do
    }

    // Checks the options HashMap to see whether opt has been set
    public boolean hasOption(String opt)
    {
        return options.containsKey(opt);
    }

    // Delegates to hasOption(String) to check whether opt has been set
    public boolean hasOption(char opt)
    {
        return hasOption(String.valueOf(opt));
    }

    // Returns the value of the option named by String opt, converted to the option's declared type
    public Object getOptionObject(String opt)
    {
        String res = getOptionValue(opt);

        if (!options.containsKey(opt))
        {
            return null;
        }

        Object type = ((Option) options.get(opt)).getType();

        return (res == null)        ? null : TypeHandler.createValue(res, type);
    }

    // Returns the value of the option named by char opt, converted to the option's declared type
    public Object getOptionObject(char opt)
    {
        return getOptionObject(String.valueOf(opt));
    }

    // Returns the first value of the option named by String opt, or null if it has none
    public String getOptionValue(String opt)
    {
        String[] values = getOptionValues(opt);

        return (values == null) ? null : values[0];
    }

    // Returns the first value of the option named by char opt, or null if it has none
    public String getOptionValue(char opt)
    {
        return getOptionValue(String.valueOf(opt));
    }

    /** 
     * Retrieves the array of values, if any, of an option.
     *
     * @param opt string name of the option
     * @return Values of the argument if option is set, and has an argument,
     * otherwise null.
     */
    public String[] getOptionValues(String opt)
    {
        opt = Util.stripLeadingHyphens(opt);

        String key = opt;

        if (names.containsKey(opt))
        {
            key = (String) names.get(opt);
        }

        if (options.containsKey(key))
        {
            return ((Option) options.get(key)).getValues();
        }

        return null;
    }

    // Returns the array of values of the option named by char opt
    public String[] getOptionValues(char opt)
    {
        return getOptionValues(String.valueOf(opt));
    }

    // Returns the value of the option named by String opt, or defaultValue if it has none
    public String getOptionValue(String opt, String defaultValue)
    {
        String answer = getOptionValue(opt);

        return (answer != null) ? answer : defaultValue;
    }

    // Returns the value of the option named by char opt, or defaultValue if it has none
    public String getOptionValue(char opt, String defaultValue)
    {
        return getOptionValue(String.valueOf(opt), defaultValue);
    }

    // Returns the unrecognized options and arguments as an array
    public String[] getArgs()
    {
        String[] answer = new String[args.size()];

        args.toArray(answer);

        return answer;
    }

    // Returns the unrecognized options and arguments as a list
    public List getArgList()
    {
        return args;
    }

    /** 
     * jkeyes
     * - commented out until it is implemented properly
     * <p>Dump state, suitable for debugging.</p>
     *
     * @return Stringified form of this object
     */
    public String toString() {
        StringBuffer buf = new StringBuffer();
            
        buf.append("[ CommandLine: [ options: ");
        buf.append(options.toString());
        buf.append(" ] [ args: ");
        buf.append(args.toString());
        buf.append(" ] ]");
            
        return buf.toString();
    }

    /**
     * Add left-over unrecognized option/argument.
     *
     * @param arg the unrecognised option/argument.
     */
    void addArg(String arg)
    {
        args.add(arg);
    }

    // Adds an Option to the CommandLine; the Option carries its value(s)
    void addOption(Option opt)
    {
        hashcodeMap.put(new Integer(opt.hashCode()), opt);

        String key = opt.getKey();

        if (key == null)
        {
            key = opt.getLongOpt();
        }
        else
        {
            names.put(opt.getLongOpt(), key);
        }

        options.put(key, opt);
    }

    // Returns an iterator over the Option members of the CommandLine
    public Iterator iterator()
    {
        return hashcodeMap.values().iterator();
    }

    // Returns the processed Options as an array
    public Option[] getOptions()
    {
        Collection processed = options.values();


        // reinitialise array
        optionsArray = new Option[processed.size()];

        // return the array
        return (Option[]) processed.toArray(optionsArray);
    }
}

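A CommandLine contains one important HashMap, options, which stores (key, opt) pairs: the key is the option's short name (or its long name when no short name exists) and opt is the Option object. A second map, names, maps each long name to that key, so an option can be set and looked up conveniently under either name.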
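
The interplay of the two maps can be sketched in a few lines of plain Java. This is an illustration under assumptions, not the Commons CLI class: OptionLookupSketch is a hypothetical name, Option objects are replaced by plain String arrays, a regular expression stands in for Util.stripLeadingHyphens, and the long name "jobtracker" is made up for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stripped-down model of CommandLine's two lookup maps.
final class OptionLookupSketch {
    private final Map<String, String[]> options = new HashMap<>(); // key -> values
    private final Map<String, String> names = new HashMap<>();     // long name -> key

    // Mirrors addOption(): prefer the short key, fall back to the long name.
    void addOption(String shortKey, String longName, String... values) {
        String key = shortKey;
        if (key == null) {
            key = longName;
        } else if (longName != null) {
            names.put(longName, key);
        }
        options.put(key, values);
    }

    // Mirrors getOptionValues(): strip hyphens, resolve a long name, look up.
    String[] getOptionValues(String opt) {
        String stripped = opt.replaceFirst("^--?", "");
        String key = names.getOrDefault(stripped, stripped);
        return options.get(key);
    }

    public static void main(String[] args) {
        OptionLookupSketch cmd = new OptionLookupSketch();
        cmd.addOption("jt", "jobtracker", "local");
        System.out.println(Arrays.toString(cmd.getOptionValues("jt")));
        System.out.println(Arrays.toString(cmd.getOptionValues("--jobtracker")));
    }
}
```

Both lookups resolve to the same entry, which is the point of keeping the names map next to options.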

Back in the parseGeneralOptions method, processGeneralOptions() is then called to process the result:

processGeneralOptions(conf, commandLine);

processGeneralOptions is implemented as follows:

    /**
     * Modify configuration according user-specified generic options
     * @param conf Configuration to be modified
     * @param line User-specified generic options
     */
     private void processGeneralOptions(Configuration conf,
      CommandLine line) {
    if (line.hasOption("fs")) {
      conf.set("fs.default.name", line.getOptionValue("fs"));
    }

    if (line.hasOption("jt")) {
      conf.set("mapred.job.tracker", line.getOptionValue("jt"));
    }
    if (line.hasOption("conf")) {
      conf.addResource(new Path(line.getOptionValue("conf")));
    }
    if (line.hasOption('D')) {
      String[] property = line.getOptionValues('D');
      for(int i=0; i<property.length-1; i=i+2) {
        if (property[i]!=null)
          conf.set(property[i], property[i+1]);
      }
    }
}

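A CommandLine instance is passed in, and the information it holds is used to set up the Configuration conf object. The conf object is set up for the benefit of the Hadoop Tool: a tool such as WordCount needs Hadoop's configuration when it starts running, and it obtains that configuration from the conf object populated here.

Note that processGeneralOptions() returns nothing. It only reads the option values from the CommandLine object (for -D, an array in which property names and values alternate) and writes them into conf.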
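
The -D loop is easy to try in isolation. The sketch below reuses the same loop but writes into a java.util.Properties instead of a Hadoop Configuration; the class name DefinePairsSketch and the property names in main are illustrative assumptions.

```java
import java.util.Properties;

// Hypothetical stand-in for the -D branch of processGeneralOptions().
final class DefinePairsSketch {

    // Copies alternating name/value entries into conf, two at a time.
    static Properties apply(String[] property, Properties conf) {
        for (int i = 0; i < property.length - 1; i = i + 2) {
            if (property[i] != null) {
                conf.setProperty(property[i], property[i + 1]);
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        // The trailing "io.sort.mb" has no value and is silently skipped.
        String[] values = {"mapred.reduce.tasks", "2", "io.sort.mb"};
        System.out.println(apply(values, new Properties()));
    }
}
```

Because the loop condition is i < property.length - 1, a name without a following value is dropped rather than causing an ArrayIndexOutOfBoundsException.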

Everything up to this point was the work needed to initialize one GenericOptionsParser parser:

GenericOptionsParser parser = new GenericOptionsParser(conf, args);

The parser instance of GenericOptionsParser can now be used to obtain the arguments that remain once the generic Hadoop options have been removed:

     //get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();

All the prerequisites for running the Hadoop tool are now in place, so it can be started:

  return tool.run(toolArgs);

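The returned status code tells how the tool run went.

The Tool tool above is the WordCount object we instantiated; only now do we enter the WordCount implementation.

That finally completes the discussion of the command line and Hadoop's configuration class. All of it serves to obtain Hadoop's configuration, since a Tool can only run in an environment where that configuration exists.

As is well known, Hadoop implements Google's MapReduce programming model, so a Hadoop Tool that processes data this way has to implement a Map function and a Reduce function, which perform the mapping and the reduction of the data respectively.

The WordCount tool therefore implements Map and Reduce functions as well.

Note that WordCount's mapper class defines two member variables: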

     private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

IntWritable wraps an int so that it can be serialized (written) and compared; a word's frequency count, for example, is such an integer.

Text stores text content, which it encodes and decodes (as UTF-8) when reading and writing. See the org.apache.hadoop.io.Text class for details.

Let's look at the Map implementation first:

/**
   * MapClass is a static nested class. It splits each line of the input into words and emits a <word, 1> pair for each.
   */
public static class MapClass extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
    
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    
    public void map(LongWritable key, Text value, 
                    OutputCollector<Text, IntWritable> output, 
                    Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, one);
      }
    }
}

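StringTokenizer takes the string obtained from String line = value.toString();, which may be long and irregular (words separated by spaces or other whitespace), and extracts each word from it. With the one-argument constructor used here, the delimiters default to whitespace: space, tab, newline, carriage return and form feed.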
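
The tokenizing step can be tried without Hadoop. In the sketch below, the hypothetical class TokenizeSketch replaces output.collect(word, one) with a plain list of "word TAB 1" strings; everything else follows the loop in map().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in for the body of MapClass.map().
final class TokenizeSketch {

    // Returns one "word<TAB>1" entry per token of the line.
    static List<String> mapLine(String line) {
        List<String> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line); // default whitespace delimiters
        while (itr.hasMoreTokens()) {
            pairs.add(itr.nextToken() + "\t1");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Consecutive delimiters are skipped and repeated words are not merged.
        System.out.println(mapLine("hello  hadoop\thello"));
    }
}
```

Note that the word hello is emitted twice, each time with the count 1; merging is left to the Combiner and Reduce steps.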

Then word.set(itr.nextToken()); stores the extracted word in Text word. The set() method of org.apache.hadoop.io.Text looks like this:

  public void set(String string) {
    try {
      ByteBuffer bb = encode(string, true); // encode the incoming word and place the result in the ByteBuffer bb
      bytes = bb.array(); // bytes is a byte array, a member of Text
      length = bb.limit(); // length is the number of bytes the encoded word occupies, also a member of Text
    }catch(CharacterCodingException e) {
      throw new RuntimeException("Should not have happened " + e.toString()); 
    }
}
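
The reason set() keeps length separately from bytes is that the backing array of a ByteBuffer may be larger than the encoded data. The following sketch shows this with the JDK's own UTF-8 encoder; EncodeSketch and its encode method are hypothetical stand-ins and not Hadoop's Text.encode().

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the encoding step inside Text.set().
final class EncodeSketch {

    // Encodes s as UTF-8; position is 0 and limit marks the end of the data.
    static ByteBuffer encode(String s) {
        try {
            return StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
        } catch (CharacterCodingException e) {
            throw new RuntimeException("Should not have happened " + e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer bb = encode("shirdrn");
        byte[] bytes = bb.array();   // backing array, possibly with spare capacity
        int length = bb.limit();     // number of bytes that are actually valid
        System.out.println(length + " valid bytes: "
                + new String(bytes, 0, length, StandardCharsets.UTF_8));
    }
}
```

Only the first length bytes of the array are meaningful, which is why code that reads a Text, such as writeObject() later on, passes both getBytes() and getLength().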

In the map() method above, OutputCollector<Text, IntWritable> output is an output collector: a Map task has to emit intermediate results so that the following Reduce step can merge and reduce them.

OutputCollector<Text, IntWritable> is an interface. Its definition is very simple:

package org.apache.hadoop.mapred;

import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;


/**
* Collects <key, value> pairs. Part of the Map-Reduce framework; used by both Mapper and Reducer.
*/
public interface OutputCollector<K extends WritableComparable,
                                 V extends Writable> {

/** Adds a key/value pair to the output.
   *
   * @param key the key to collect.
   * @param value to value to collect.
   * @throws IOException
   */
void collect(K key, V value) throws IOException;
}

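Collecting <key,value> pairs is easy to picture. Suppose a line of the text file is read, StringTokenizer splits it, and one of the words is "shirdrn". To count its frequency, the first occurrence is represented by the pair <shirdrn,1>, and reading continues. If "shirdrn" turns up again, the pair is once more <shirdrn,1>. So the <key,value> pairs emitted while a Map task runs may contain duplicates, because the Map task does not merge them.

If needed, a Combiner can merge the intermediate results once after the Map has finished; the two occurrences of "shirdrn" above then become <shirdrn,2>. Note that this covers only one Map task. Suppose another Map task also sees "shirdrn" several times and its Combiner produces, say, <shirdrn,4>.

The output of a Combiner is still an intermediate result. That is, once the Map tasks of a job (which consists of several Map subtasks, such as the two above) have finished, the complete set of intermediate results contains the two pairs <shirdrn,2> and <shirdrn,4>. Our goal is the total frequency of "shirdrn", so the expected result is <shirdrn,6>, and producing it is the job of the Reduce task. What a Reduce task outputs is no longer intermediate but final: for each key it emits one output or none.

In addition, before the Combiner merges the intermediate results, a sort may be needed: the output of the Map task is sorted first and then merged by the Combiner.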
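
The arithmetic of this example (2 from one Map task, 4 from another, 6 in total) can be reproduced with ordinary collections. The sketch below is a plain-Java model, not Hadoop code; CombineReduceSketch and its methods combine and reduce are hypothetical names, and a TreeMap stands in for the sorting by key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the combine and reduce steps.
final class CombineReduceSketch {

    // Combiner: merges the <word,1> pairs of ONE map task into partial counts.
    static Map<String, Integer> combine(List<String> mapOutputKeys) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String word : mapOutputKeys) {
            partial.merge(word, 1, Integer::sum);
        }
        return partial;
    }

    // Reducer: sums the partial counts coming from ALL map tasks.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> p : partials) {
            p.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> task1 = combine(Arrays.asList("shirdrn", "shirdrn"));
        Map<String, Integer> task2 =
                combine(Arrays.asList("shirdrn", "shirdrn", "shirdrn", "shirdrn"));
        System.out.println(reduce(Arrays.asList(task1, task2)));
    }
}
```

Because addition is associative and commutative, summing partial sums gives the same total as summing the raw 1s, which is why the same summing logic can serve as both Combiner and Reducer.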

OutputCollector<Text, IntWritable> is an interface, so to learn how its collect method actually gathers data we have to look at concrete implementations. Here are two of them:

The MapTask class defines an interface MapOutputCollector<K extends WritableComparable, V extends Writable>, which extends OutputCollector<K, V>. The class DirectMapOutputCollector<K extends WritableComparable, V extends Writable> implements MapOutputCollector<K, V>, and the MapOutputBuffer class implements MapOutputCollector as well.

Both DirectMapOutputCollector and MapOutputBuffer are thus indirect implementations of OutputCollector, but their collect() methods do different things. The collect() method of DirectMapOutputCollector is very simple and direct:

     public void collect(K key, V value) throws IOException {
      this.out.write(key, value);
    }

Here the work is delegated to the write() method of a RecordWriter; with text output, the concrete implementation is LineRecordWriter, a static class nested inside org.apache.hadoop.mapred.TextOutputFormat. It is what collects the word (more precisely, the key produced by the Map task) and its value. Let's see how write() does that:

     public synchronized void write(K key, V value)
      throws IOException {

      boolean nullKey = key == null || key instanceof NullWritable;
      boolean nullValue = value == null || value instanceof NullWritable;
      if (nullKey && nullValue) {
        return;
      }
      if (!nullKey) {
        writeObject(key);
      }
      if (!(nullKey || nullValue)) {
        out.write(tab);
      }
      if (!nullValue) {
        writeObject(value);
      }
      out.write(newline);
    }

This is quite simple. If key and value are both null (or NullWritable), nothing is written. Otherwise whichever of the two is present is written, a TAB separates them only when both are present, and a newline ends the record. The tab and newline used above are defined statically in the class:

     static {
      try {
        tab = "\t".getBytes(utf8);
        newline = "\n".getBytes(utf8);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + utf8 + " encoding");
      }
    }
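
Putting write() and the two separators together, the record layout can be sketched as a pure function. LineWriteSketch and format are hypothetical names; the sketch returns a String instead of writing to a stream and treats only null, not NullWritable, as absent.

```java
// Hypothetical model of the record layout produced by LineRecordWriter.write().
final class LineWriteSketch {

    // Builds the text of one record; returns "" when nothing would be written.
    static String format(Object key, Object value) {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        if (!nullKey) {
            sb.append(key);
        }
        if (!(nullKey || nullValue)) {
            sb.append('\t');          // separator only when both parts exist
        }
        if (!nullValue) {
            sb.append(value);
        }
        sb.append('\n');              // every written record ends with a newline
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(format("shirdrn", 6));
        System.out.print(format(null, 6));
        System.out.print(format("master", null));
    }
}
```

A record with only a key or only a value is therefore still written, just without the TAB.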

That makes things much clearer, and the implementation of writeObject() clarifies them further:

     private void writeObject(Object o) throws IOException {
      if (o instanceof Text) {
        Text to = (Text) o;
        out.write(to.getBytes(), 0, to.getLength());
      } else {
        out.write(o.toString().getBytes(utf8));
      }
    }

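writeObject() writes the bytes of the key or value to the output stream: directly if the object is a Text, otherwise via its toString() form encoded as UTF-8. Any number of <key,value> pairs can be written this way as long as they are kept apart by the separators, here a TAB within a pair and a newline after it. Concretely, a key such as "shirdrn" is written, then a value such as 6, then a newline; several pairs look like this: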

shirdrn      6
master       2 
hear       4

The actual output in writeObject() goes through java.io.DataOutputStream.write(), to whatever file system or buffer the stream is bound to. Note that in MapTask the DirectMapOutputCollector is only chosen when the job has no Reduce tasks, in which case the Map output is written straight to the job's output. When Reduce tasks exist, MapOutputBuffer is used instead and the intermediate results are buffered before the Reduce tasks merge them.

Now for the Reduce implementation:

/**
   * Reduce is a static nested class. It adds up all the counts collected for one word and emits the sum.
   */
public static class Reduce extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
    
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, 
                       Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
}

The second parameter of reduce, Iterator<IntWritable> values, is an iterator over multiple values. Because the first parameter fixes the key, every value the iterator yields belongs to that key, so counting word frequency only requires summing those values.

Finally the result is emitted again. Since this is the final result, it is written to the specified file in the file system.

The last piece is the driver of the Map/Reduce program, whose purpose is to get the WordCount tool running properly. Here is the run method:

/**
   * The driver of the map/reduce program; it submits the map/reduce job.
   */
public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.setJobName("wordcount");

    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);
    
    conf.setMapperClass(MapClass.class);        
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    
    List<String> other_args = new ArrayList<String>();
    for(int i=0; i < args.length; ++i) {
      try {
        if ("-m".equals(args[i])) {
          conf.setNumMapTasks(Integer.parseInt(args[++i]));
        } else if ("-r".equals(args[i])) {
          conf.setNumReduceTasks(Integer.parseInt(args[++i]));
        } else {
          other_args.add(args[i]);
        }
      } catch (NumberFormatException except) {
        System.out.println("ERROR: Integer expected instead of " + args[i]);
        return printUsage();
      } catch (ArrayIndexOutOfBoundsException except) {
        System.out.println("ERROR: Required parameter missing from " +
                           args[i-1]);
        return printUsage();
      }
    }
    // Make sure there are exactly 2 parameters left.
    if (other_args.size() != 2) {
      System.out.println("ERROR: Wrong number of parameters: " +
                         other_args.size() + " instead of 2.");
      return printUsage();
    }
    conf.setInputPath(new Path(other_args.get(0)));
    conf.setOutputPath(new Path(other_args.get(1)));
        
    JobClient.runJob(conf);
    return 0;
}
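
The option loop in run() can be exercised on its own. In the sketch below, the hypothetical class ToolArgsSketch stores the two task counts in plain fields instead of a JobConf and lets the exceptions propagate instead of calling printUsage(); the loop itself is the same.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the argument handling in WordCount.run().
final class ToolArgsSketch {
    int numMapTasks = -1;      // -1 means "not given on the command line"
    int numReduceTasks = -1;
    final List<String> otherArgs = new ArrayList<>();

    // Consumes "-m <n>" and "-r <n>"; every other argument is kept in order.
    static ToolArgsSketch parse(String[] args) {
        ToolArgsSketch result = new ToolArgsSketch();
        for (int i = 0; i < args.length; ++i) {
            if ("-m".equals(args[i])) {
                result.numMapTasks = Integer.parseInt(args[++i]);
            } else if ("-r".equals(args[i])) {
                result.numReduceTasks = Integer.parseInt(args[++i]);
            } else {
                result.otherArgs.add(args[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToolArgsSketch r = parse(new String[] {"-m", "4", "in", "-r", "2", "out"});
        System.out.println(r.numMapTasks + " " + r.numReduceTasks + " " + r.otherArgs);
    }
}
```

The two arguments left in otherArgs play the role of the input and output paths that run() requires.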

The class to note in run() is JobConf, the job configuration class. It is a subclass of Configuration: besides inheriting Configuration's basic Hadoop settings, it adds settings of its own that are specific to a job.

JobConf is clearly an important class. We will concentrate on the methods that the WordCount tool uses.

First, a JobConf object is instantiated:

JobConf conf = new JobConf(getConf(), WordCount.class);

Starting from this line, let's look at the JobConf constructor:

public JobConf(Configuration conf, Class exampleClass) {
    this(conf);  
    setJarByClass(exampleClass);
}

It first calls the constructor of the same class that takes a single Configuration argument, which simply passes conf on to the Configuration superclass:

public JobConf(Configuration conf) {
    super(conf);
}

It then calls setJarByClass(), which uses the given class to determine the Jar file that the configuration of the current job should reference:

public void setJarByClass(Class cls) {
    String jar = findContainingJar(cls);
    if (jar != null) {
      setJar(jar);
    }   
}

This first looks up the containing Jar file (the return value is the Jar file's path as a string) and, if the result is not null, calls setJar(jar); to record it in the job configuration.

The lookup itself is implemented in findContainingJar():

private static String findContainingJar(Class my_class) {
    ClassLoader loader = my_class.getClassLoader();   // get the class loader of the given class
    String class_file = my_class.getName().replaceAll("\\.", "/") + ".class"; // resource path of the class file
    try {
      for(Enumeration itr = loader.getResources(class_file);
          itr.hasMoreElements();) {
        URL url = (URL) itr.nextElement();
        if ("jar".equals(url.getProtocol())) { // is this a jar: URL?
          String toReturn = url.getPath(); // the path part of the URL
          if (toReturn.startsWith("file:")) { 
            toReturn = toReturn.substring("file:".length()); // drop the leading "file:" prefix
          }
          toReturn = URLDecoder.decode(toReturn, "UTF-8"); // URL-decode
          return toReturn.replaceAll("!.*$", ""); // strip "!" and everything after it, leaving the jar file path
        }
        }
      }
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
    return null;
}
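
The two string transformations in findContainingJar() are easier to follow with concrete values. The sketch below isolates them; JarPathSketch, its method names and the example path are hypothetical, and URLDecoder.decode is called with a Charset argument, which assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical extraction of the string handling in findContainingJar().
final class JarPathSketch {

    // Class name -> resource path that the class loader is asked for.
    static String classResource(String className) {
        return className.replaceAll("\\.", "/") + ".class";
    }

    // Path of a jar: URL -> plain file system path of the jar file.
    static String jarFileFromUrlPath(String urlPath) {
        String path = urlPath;
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        path = URLDecoder.decode(path, StandardCharsets.UTF_8);
        return path.replaceAll("!.*$", ""); // drop "!/entry/inside/the/jar"
    }

    public static void main(String[] args) {
        System.out.println(classResource("org.apache.hadoop.examples.WordCount"));
        System.out.println(jarFileFromUrlPath(
                "file:/opt/my%20jars/wordcount.jar!/org/apache/hadoop/examples/WordCount.class"));
    }
}
```

The dot in the first regular expression has to be escaped as "\\." in Java source; an unescaped "." would match every character of the class name.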

The lookup yields the jar path as a string, which is then passed to setJar(jar):

/**
   * Set the user jar for the map-reduce job.
   * 
   * @param jar the user jar for the map-reduce job.
   */
public void setJar(String jar) { set("mapred.jar", jar); }

The set("mapred.jar", jar); call above uses a method inherited from the Configuration class:

/** 
   * Set the <code>value</code> of the <code>name</code> property.
   * 
   * @param name property name.
   * @param value property value.
   */
public void set(String name, String value) {
    getOverlay().setProperty(name, value);
    getProps().setProperty(name, value);
}

So set("mapred.jar", jar); stores the value in Properties objects; both properties and overlay are members of the Configuration class:

private Properties properties;
private Properties overlay;

At this point JobConf conf holds the basic job configuration and the Jar setting derived from the class.

More detailed configuration follows.

Setting the job name:

conf.setJobName("wordcount");

The setJobName() implementation:

public void setJobName(String name) {
    set("mapred.job.name", name);
}

That is, the job name is wordcount.

Setting the class of the job's output key:

    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);

This corresponds to:

  public void setOutputKeyClass(Class<? extends WritableComparable> theClass) {
    setClass("mapred.output.key.class", theClass, WritableComparable.class);
}

Setting the class of the job's output value:

    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);

This corresponds to:

public void setOutputValueClass(Class<? extends Writable> theClass) {
    setClass("mapred.output.value.class", theClass, Writable.class);
}
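
Both setters delegate to setClass(name, theClass, xface). A reasonable reading is that the class is checked against the expected supertype and then stored by name as an ordinary string property, to be loaded again when it is needed. The following sketch shows that pattern with standard library classes only. ClassSettingDemo and getClassByKey are hypothetical names, and String and Comparable stand in for Text and WritableComparable.

```java
import java.util.HashMap;
import java.util.Map;

class ClassSettingDemo {
    private final Map<String, String> props = new HashMap<>();

    // Check that theClass is compatible with xface, then store its name.
    void setClass(String name, Class<?> theClass, Class<?> xface) {
        if (!xface.isAssignableFrom(theClass)) {
            throw new RuntimeException(theClass + " not " + xface.getName());
        }
        props.put(name, theClass.getName());
    }

    // Load the class back from its stored name; null when the key is unset.
    <U> Class<? extends U> getClassByKey(String name, Class<U> xface)
            throws ClassNotFoundException {
        String className = props.get(name);
        if (className == null) {
            return null;
        }
        return Class.forName(className).asSubclass(xface);
    }

    String get(String name) {
        return props.get(name);
    }

    public static void main(String[] args) throws Exception {
        ClassSettingDemo conf = new ClassSettingDemo();
        conf.setClass("mapred.output.key.class", String.class, Comparable.class);
        System.out.println(conf.get("mapred.output.key.class"));
        System.out.println(conf.getClassByKey("mapred.output.key.class", Comparable.class));
    }
}
```

Storing the name rather than the Class object keeps the configuration a flat set of strings, which is what allows it to be written out and read back on another machine.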

Setting the mapper, combiner, and reducer classes:

    conf.setMapperClass(MapClass.class);        
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

Note that the combiner class is the same class as the reducer: Reduce is also used to perform a local merge of each map task's output before the reduce phase.
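
Reusing the reducer as the combiner is safe for word counting because addition gives the same total whether or not partial sums are formed first. The plain-Java sketch below illustrates this without any Hadoop classes. CombinerDemo, map, sumByKey, and run are hypothetical names, and each input string stands in for one input split.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerDemo {
    // Map step: emit (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Sums the values per key; used as both the combiner and the reducer.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    static Map<String, Integer> run(List<String> splits, boolean useCombiner) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapOutput = map(split);
            if (useCombiner) {
                // Local merge: fewer pairs leave the map side.
                shuffled.addAll(sumByKey(mapOutput).entrySet());
            } else {
                shuffled.addAll(mapOutput);
            }
        }
        return sumByKey(shuffled); // reduce step
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("a b a", "b b c");
        System.out.println(run(splits, false)); // {a=2, b=3, c=1}
        System.out.println(run(splits, true));  // same counts
    }
}
```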

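Reading on, List<String> other_args collects the arguments that remain once the job-specific options have been handled. Those options are -m and -r, which set the number of map tasks and reduce tasks. A loop checks whether either option was given and, if so, stores its value in the job configuration, so that the job runs the way we asked when it starts.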

        if ("-m".equals(args[i])) {
          conf.setNumMapTasks(Integer.parseInt(args[++i]));
        } else if ("-r".equals(args[i])) {
          conf.setNumReduceTasks(Integer.parseInt(args[++i]));
        } else {
          other_args.add(args[i]);
        }

The first sets the number of map tasks:

public void setNumMapTasks(int n) { setInt("mapred.map.tasks", n); }

The other sets the number of reduce tasks:

public void setNumReduceTasks(int n) { setInt("mapred.reduce.tasks", n); }
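
The loop body and the two setters can be tried out together in a small stand-alone sketch. ArgParseDemo and its settings map are hypothetical. The map stands in for the job configuration, and the check for a missing option value is an addition of this sketch, not something shown in the excerpt above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ArgParseDemo {
    final Map<String, Integer> settings = new HashMap<>(); // stands in for JobConf
    final List<String> otherArgs = new ArrayList<>();

    void parse(String[] args) {
        for (int i = 0; i < args.length; ++i) {
            if ("-m".equals(args[i]) || "-r".equals(args[i])) {
                String option = args[i];
                if (i + 1 >= args.length) {
                    // Added guard: the option needs a value after it.
                    throw new IllegalArgumentException("missing value for " + option);
                }
                int n = Integer.parseInt(args[++i]); // consume the value as well
                settings.put("-m".equals(option) ? "mapred.map.tasks"
                                                 : "mapred.reduce.tasks", n);
            } else {
                otherArgs.add(args[i]); // later read as input and output paths
            }
        }
    }

    public static void main(String[] args) {
        ArgParseDemo demo = new ArgParseDemo();
        demo.parse(new String[] {"-m", "4", "in", "-r", "2", "out"});
        System.out.println(demo.settings.get("mapred.map.tasks"));    // 4
        System.out.println(demo.settings.get("mapred.reduce.tasks")); // 2
        System.out.println(demo.otherArgs);                           // [in, out]
    }
}
```

Because of the ++i inside the branch, each option consumes two array elements, so the options may appear before, between, or after the two paths.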

On the command line, the job's input directory and output directory must be given:

    conf.setInputPath(new Path(other_args.get(0)));
    conf.setOutputPath(new Path(other_args.get(1)));

The method that sets the input directory:

public void setInputPath(Path dir) {
    dir = new Path(getWorkingDirectory(), dir);
    set("mapred.input.dir", dir.toString());
}

The method that sets the job output directory:

   public void setOutputPath(Path dir) {
    dir = new Path(getWorkingDirectory(), dir);
    set("mapred.output.dir", dir.toString());
}

Both of them call getWorkingDirectory(), which returns the absolute path of the current working directory:

/**
   * Get the current working directory for the default file system.
   * 
   * @return the directory name.
   */
public Path getWorkingDirectory() {
    String name = get("mapred.working.dir");
    if (name != null) {
      return new Path(name);
    } else {
      try {
        Path dir = FileSystem.get(this).getWorkingDirectory();
        set("mapred.working.dir", dir.toString());
        return dir;
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }
}
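
Two things happen in these methods: the working directory is looked up once and then cached under mapred.working.dir, and a relative path is resolved against it while an absolute path is left alone. The sketch below reproduces both effects with java.net.URI in place of Hadoop's Path. WorkingDirDemo, fsWorkingDir, and fsLookups are hypothetical names, and the fixed string passed to the constructor stands in for the directory reported by the file system.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class WorkingDirDemo {
    private final Map<String, String> props = new HashMap<>();
    private final String fsWorkingDir; // stands in for the file system's answer
    int fsLookups = 0;                 // counts how often the file system is asked

    WorkingDirDemo(String fsWorkingDir) {
        this.fsWorkingDir = fsWorkingDir;
    }

    String getWorkingDirectory() {
        String name = props.get("mapred.working.dir");
        if (name != null) {
            return name;               // cached by an earlier call
        }
        fsLookups++;
        props.put("mapred.working.dir", fsWorkingDir);
        return fsWorkingDir;
    }

    // Resolve child against parent; an absolute child is returned unchanged.
    static String resolve(String parent, String child) {
        String base = parent.endsWith("/") ? parent : parent + "/";
        return URI.create(base).resolve(child).toString();
    }

    void setInputPath(String dir) {
        props.put("mapred.input.dir", resolve(getWorkingDirectory(), dir));
    }

    void setOutputPath(String dir) {
        props.put("mapred.output.dir", resolve(getWorkingDirectory(), dir));
    }

    String get(String name) {
        return props.get(name);
    }

    public static void main(String[] args) {
        WorkingDirDemo conf = new WorkingDirDemo("/user/alice");
        conf.setInputPath("input");     // relative: resolved against /user/alice
        conf.setOutputPath("/data/out"); // absolute: kept as given
        System.out.println(conf.get("mapred.input.dir"));  // /user/alice/input
        System.out.println(conf.get("mapred.output.dir")); // /data/out
        System.out.println(conf.fsLookups);                // 1
    }
}
```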

Finally, once the job has been configured, it is started:

JobClient.runJob(conf);

The startup process itself is quite complex, as you can see by reading the runJob() method of the JobClient class.

Reposted from: http://radarradar.iteye.com/blog/289261
