[Repost] Analyzing the execution of Hadoop's bundled WordCount example (3)

phinecos

Category: Search Engine. Tags: hadoop, hashmap, string, null, properties, iterator

Article link: https://blog.csdn.net/phinecos/article/details/4611999

Continuing on:

Option fs = OptionBuilder.withArgName("local|namenode:port")
    .hasArg()
    .withDescription("specify a namenode")
    .create("fs");
opts.addOption(fs);

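The key class here is OptionBuilder: it is what actually fills in the details of an Option. The same sequence of calls is repeated for each generic option, and every resulting Option is added to opts.

Let's look at the withArgName() method of OptionBuilder: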

    /**
     * The next Option created will have the specified argument value
     * name.
     *
     * @param name the name for the argument value
     * @return the OptionBuilder instance
     */
    public static OptionBuilder withArgName(String name)
    {
        OptionBuilder.argName = name;
        return instance;
    }

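Here the static field argName of OptionBuilder is set to name, and the shared OptionBuilder instance is returned so that further calls can be chained. Because the setting lives in a static field, it applies to the next Option that create() builds.

Next, hasArg() is called. It is also a static method of OptionBuilder: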

    /**
     * The next Option created will require an argument value.
     *
     * @return the OptionBuilder instance
     */
    public static OptionBuilder hasArg()
    {
        OptionBuilder.numberOfArgs = 1;
        return instance;
    }

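This sets the number of argument values for the Option being built, the same one whose argument name was just set. hasArg() always sets the count to 1, meaning the option takes exactly one value.

withDescription() is then called to set the description: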

    /**
     * The next Option created will have the specified description
     *
     * @param newDescription a description of the Option's purpose
     * @return the OptionBuilder instance
     */
    public static OptionBuilder withDescription(String newDescription)
    {
        OptionBuilder.description = newDescription;
        return instance;
    }

The last call is the crucial one: only when create() of OptionBuilder is called is the Option actually created:

    /**
     * Create an Option using the current settings and with
     * the specified Option <code>char</code>.
     *
     * @param opt the <code>java.lang.String</code> representation
     * of the Option
     * @return the Option instance
     * @throws IllegalArgumentException if <code>opt</code> is not
     * a valid character. See Option.
     */
    public static Option create(String opt)
        throws IllegalArgumentException
    {
        // create the option
        Option option = new Option(opt, description);

        // set the option properties
        option.setLongOpt(longopt);
        option.setRequired(required);
        option.setOptionalArg(optionalArg);
        option.setArgs(numberOfArgs);
        option.setType(type);
        option.setValueSeparator(valuesep);
        option.setArgName(argName);
        option.setArgPattern(argPattern, limit);

        // reset the OptionBuilder properties
        OptionBuilder.reset();
        // return the Option instance
        return option;
    }

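From the way this one Option is set up, we can see that OptionBuilder is really a helper: it collects the information that belongs to one Option and then applies all of it at once to a newly created Option object, which then carries the full details.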
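The collect-then-create idea can be tried out without Commons CLI on the classpath. The following is only a minimal sketch: Opt and Builder are hypothetical stand-ins for Option and OptionBuilder, and the UNINITIALIZED constant is an assumption made for this illustration, not the library's own code.

```java
public class StaticBuilderSketch {
    // Hypothetical stand-in for Option: a plain value holder.
    static final class Opt {
        final String name;
        final String description;
        final String argName;
        final int numberOfArgs;

        Opt(String name, String description, String argName, int numberOfArgs) {
            this.name = name;
            this.description = description;
            this.argName = argName;
            this.numberOfArgs = numberOfArgs;
        }
    }

    // Hypothetical stand-in for OptionBuilder: settings are kept in static
    // fields and every setter returns the single shared instance.
    static final class Builder {
        static final int UNINITIALIZED = -1; // assumed "not set" marker
        private static final Builder INSTANCE = new Builder();
        private static String argName;
        private static String description;
        private static int numberOfArgs = UNINITIALIZED;

        private Builder() {
        }

        static Builder withArgName(String name) {
            argName = name;
            return INSTANCE;
        }

        static Builder hasArg() {
            numberOfArgs = 1;
            return INSTANCE;
        }

        static Builder withDescription(String newDescription) {
            description = newDescription;
            return INSTANCE;
        }

        // Copies the collected settings into a new Opt, then clears them
        // so they do not leak into the next Opt.
        static Opt create(String name) {
            Opt opt = new Opt(name, description, argName, numberOfArgs);
            argName = null;
            description = null;
            numberOfArgs = UNINITIALIZED;
            return opt;
        }
    }

    public static void main(String[] args) {
        Opt fs = Builder.withArgName("local|namenode:port")
            .hasArg()
            .withDescription("specify a namenode")
            .create("fs");
        System.out.println(fs.name + " takes " + fs.numberOfArgs + " value: " + fs.argName);

        // Nothing was configured for this one, so it sees only the reset state.
        Opt bare = Builder.create("bare");
        System.out.println(bare.name + " argName=" + bare.argName);
    }
}
```

Because the settings are static, a design like this is only safe when one Option is built at a time, which is how the chained calls above use it.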

Next comes the parse method of CommandLineParser parser. Since the declaration is public abstract class Parser implements CommandLineParser, the implementation of parse is found in the abstract class Parser:

    public CommandLine parse(Options options, String[] arguments,
                             boolean stopAtNonOption)
        throws ParseException
    {
        return parse(options, arguments, null, stopAtNonOption);
    }

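The stopAtNonOption parameter controls what happens when the parser meets a token that is not a recognized option: if it is true, option parsing stops there and the remaining tokens are kept as plain arguments instead of being parsed as options. From commandLine = parser.parse(opts, args, true); in the parseGeneralOptions method seen earlier, we know that true is passed.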
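To make the effect of the flag concrete, here is a small sketch that can be run on its own. It is a simplified model, not the Commons CLI parser: the leftover method is hypothetical, it assumes every known option takes exactly one value, and it treats a lone "-" as a plain argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopAtNonOptionSketch {
    // Returns the tokens that end up as plain (unparsed) arguments.
    // Assumption: every option in "known" takes exactly one value.
    static List<String> leftover(Set<String> known, String[] tokens, boolean stopAtNonOption) {
        List<String> args = new ArrayList<>();
        boolean eatTheRest = false;
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            if (eatTheRest) {
                // once parsing has stopped, everything except "--" is kept as-is
                if (!"--".equals(t)) {
                    args.add(t);
                }
            } else if ("--".equals(t)) {
                eatTheRest = true;
            } else if (t.startsWith("-") && t.length() > 1) {
                if (known.contains(t.substring(1))) {
                    i++; // recognized option: consume its single value
                } else if (stopAtNonOption) {
                    eatTheRest = true;
                    args.add(t);
                } else {
                    throw new IllegalArgumentException("Unrecognized option: " + t);
                }
            } else {
                // a plain argument
                args.add(t);
                if (stopAtNonOption) {
                    eatTheRest = true;
                }
            }
        }
        return args;
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("fs", "jt"));
        String[] tokens = {"-fs", "local", "input", "-jt", "host:9001", "output"};
        // With true, parsing stops at "input" and "-jt host:9001" is left untouched.
        System.out.println(leftover(known, tokens, true));
        // With false, "-jt host:9001" is still parsed as an option after "input".
        System.out.println(leftover(known, tokens, false));
    }
}
```

In this model, passing true means that generic options are only honoured before the first plain argument; whatever follows is handed on unchanged.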

This in turn calls the overloaded parse() member method of Parser, shown below, where the parsing is spelled out in detail:

    /**
     * Parse the arguments according to the specified options and
     * properties.
     *
     * @param options the specified Options
     * @param arguments the command line arguments
     * @param properties command line option name-value pairs
     * @param stopAtNonOption stop parsing the arguments when the first
     * non option is encountered.
     *
     * @return the list of atomic option and value tokens
     *
     * @throws ParseException if there are any problems encountered
     * while parsing the command line tokens.
     */
    public CommandLine parse(Options options, String[] arguments,
                             Properties properties, boolean stopAtNonOption)
        throws ParseException
    {
        // initialise members
        this.options = options;
        requiredOptions = options.getRequiredOptions();
        cmd = new CommandLine();
        boolean eatTheRest = false;
        if (arguments == null)
        {
            arguments = new String[0];
        }
        List tokenList = Arrays.asList(flatten(this.options,
                                               arguments,
                                               stopAtNonOption));
        ListIterator iterator = tokenList.listIterator();
        // process each flattened token
        while (iterator.hasNext())
        {
            String t = (String) iterator.next();
            // the value is the double-dash
            if ("--".equals(t))
            {
                eatTheRest = true;
            }
            // the value is a single dash
            else if ("-".equals(t))
            {
                if (stopAtNonOption)
                {
                    eatTheRest = true;
                }
                else
                {
                    cmd.addArg(t);
                }
            }
            // the value is an option
            else if (t.startsWith("-"))
            {
                if (stopAtNonOption && !options.hasOption(t))
                {
                    eatTheRest = true;
                    cmd.addArg(t);
                }
                else
                {
                    processOption(t, iterator);
                }
            }
            // the value is an argument
            else
            {
                cmd.addArg(t);
                if (stopAtNonOption)
                {
                    eatTheRest = true;
                }
            }
            // eat the remaining tokens
            if (eatTheRest)
            {
                while (iterator.hasNext())
                {
                    String str = (String) iterator.next();
                    // ensure only one double-dash is added
                    if (!"--".equals(str))
                    {
                        cmd.addArg(str);
                    }
                }
            }
        }
        processProperties(properties);
        checkRequiredOptions();
        return cmd;
    }

After parsing, an instance of CommandLine is returned, and commandLine, a private member variable of GenericOptionsParser, now holds a reference to it.

Here is the implementation of CommandLine:

package org.apache.commons.cli;

import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/**
 * Represents list of arguments parsed against
 * a {@link Options} descriptor.
 *
 * It allows querying of a boolean {@link #hasOption(String opt)},
 * in addition to retrieving the {@link #getOptionValue(String opt)}
 * for options requiring arguments.
 */
public class CommandLine {
    // unrecognized options/arguments
    private List args = new LinkedList();
    /** the processed options */
    private Map options = new HashMap();
    /** the option name map */
    private Map names = new HashMap();
    /** Map of unique options for ease to get complete list of options */
    private Map hashcodeMap = new HashMap();
    /** the processed options */
    private Option[] optionsArray;

    // Creates a CommandLine instance.
    CommandLine()
    {
        // nothing to do
    }

    // Checks the options HashMap to see whether opt has been set.
    public boolean hasOption(String opt)
    {
        return options.containsKey(opt);
    }

    // Delegates to hasOption(String) to check whether opt has been set.
    public boolean hasOption(char opt)
    {
        return hasOption(String.valueOf(opt));
    }

    // Returns the value of the option named by String opt, converted to the Option's type.
    public Object getOptionObject(String opt)
    {
        String res = getOptionValue(opt);
        if (!options.containsKey(opt))
        {
            return null;
        }
        Object type = ((Option) options.get(opt)).getType();
        return (res == null) ? null : TypeHandler.createValue(res, type);
    }

    // Returns the value of the option named by char opt, converted to the Option's type.
    public Object getOptionObject(char opt)
    {
        return getOptionObject(String.valueOf(opt));
    }

    // Returns the first value of the option named by String opt.
    public String getOptionValue(String opt)
    {
        String[] values = getOptionValues(opt);
        return (values == null) ? null : values[0];
    }

    // Returns the first value of the option named by char opt.
    public String getOptionValue(char opt)
    {
        return getOptionValue(String.valueOf(opt));
    }

    /**
     * Retrieves the array of values, if any, of an option.
     *
     * @param opt string name of the option
     * @return Values of the argument if option is set, and has an argument,
     * otherwise null.
     */
    public String[] getOptionValues(String opt)
    {
        opt = Util.stripLeadingHyphens(opt);
        String key = opt;
        if (names.containsKey(opt))
        {
            key = (String) names.get(opt);
        }
        if (options.containsKey(key))
        {
            return ((Option) options.get(key)).getValues();
        }
        return null;
    }

    // Returns the array of values of the option named by char opt.
    public String[] getOptionValues(char opt)
    {
        return getOptionValues(String.valueOf(opt));
    }

    // Returns the value of the option named by String opt, or defaultValue if it has none.
    public String getOptionValue(String opt, String defaultValue)
    {
        String answer = getOptionValue(opt);
        return (answer != null) ? answer : defaultValue;
    }

    // Returns the value of the option named by char opt, or defaultValue if it has none.
    public String getOptionValue(char opt, String defaultValue)
    {
        return getOptionValue(String.valueOf(opt), defaultValue);
    }

    // Returns the unrecognized options and arguments as an array.
    public String[] getArgs()
    {
        String[] answer = new String[args.size()];
        args.toArray(answer);
        return answer;
    }

    // Returns the unrecognized options and arguments as a list.
    public List getArgList()
    {
        return args;
    }

    /**
     * jkeyes
     * - commented out until it is implemented properly
     * <p>Dump state, suitable for debugging.</p>
     *
     * @return Stringified form of this object
     */
    public String toString() {
        StringBuffer buf = new StringBuffer();

        buf.append("[ CommandLine: [ options: ");
        buf.append(options.toString());
        buf.append(" ] [ args: ");
        buf.append(args.toString());
        buf.append(" ] ]");

        return buf.toString();
    }

    /**
     * Add left-over unrecognized option/argument.
     *
     * @param arg the unrecognised option/argument.
     */
    void addArg(String arg)
    {
        args.add(arg);
    }

    // Adds an Option to the CommandLine; the Option's values (possibly several) are stored with it.
    void addOption(Option opt)
    {
        hashcodeMap.put(new Integer(opt.hashCode()), opt);
        String key = opt.getKey();
        if (key == null)
        {
            key = opt.getLongOpt();
        }
        else
        {
            names.put(opt.getLongOpt(), key);
        }
        options.put(key, opt);
    }

    // Returns an iterator over the Option members of the CommandLine.
    public Iterator iterator()
    {
        return hashcodeMap.values().iterator();
    }

    // Returns the processed Options as an array.
    public Option[] getOptions()
    {
        Collection processed = options.values();

        // reinitialise array
        optionsArray = new Option[processed.size()];
        // return the array
        return (Option[]) processed.toArray(optionsArray);
    }
}

A CommandLine contains an important HashMap, options, which stores (key, opt) pairs, so that parsed options are easy to record and look up.
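The lookup through the two maps can be modelled in a few lines. This is a sketch under assumptions rather than the library class: OptionLookupSketch is a hypothetical name, values are stored directly as String arrays instead of Option objects, and the long name "filesystem" is invented purely for the demonstration.

```java
import java.util.HashMap;
import java.util.Map;

public class OptionLookupSketch {
    // key (short name, or long name when there is no short one) -> values
    private final Map<String, String[]> options = new HashMap<>();
    // long name -> key, so a long name can be translated before the lookup
    private final Map<String, String> names = new HashMap<>();

    void addOption(String shortName, String longName, String... values) {
        String key = shortName;
        if (key == null) {
            key = longName;
        } else if (longName != null) {
            names.put(longName, key);
        }
        options.put(key, values);
    }

    // Simplified hyphen stripping: removes one leading "--" or "-".
    static String stripLeadingHyphens(String s) {
        if (s.startsWith("--")) {
            return s.substring(2);
        }
        if (s.startsWith("-")) {
            return s.substring(1);
        }
        return s;
    }

    // Returns the first value of the option, or null if it is unknown or has no value.
    String getOptionValue(String opt) {
        String key = stripLeadingHyphens(opt);
        if (names.containsKey(key)) {
            key = names.get(key);
        }
        String[] values = options.get(key);
        return (values == null || values.length == 0) ? null : values[0];
    }

    public static void main(String[] args) {
        OptionLookupSketch line = new OptionLookupSketch();
        line.addOption("fs", "filesystem", "local"); // "filesystem" is a made-up long name
        System.out.println(line.getOptionValue("fs"));
        System.out.println(line.getOptionValue("--filesystem"));
        System.out.println(line.getOptionValue("jt"));
    }
}
```

Keeping a separate names map means each option is stored once under a single key, while both spellings still resolve to it.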

Next, the parseGeneralOptions method calls processGeneralOptions() to do the processing:

processGeneralOptions(conf, commandLine);

processGeneralOptions works as follows:

    /**
     * Modify configuration according user-specified generic options
     * @param conf Configuration to be modified
     * @param line User-specified generic options
     */
    private void processGeneralOptions(Configuration conf,
        CommandLine line) {
        if (line.hasOption("fs")) {
            conf.set("fs.default.name", line.getOptionValue("fs"));
        }
        if (line.hasOption("jt")) {
            conf.set("mapred.job.tracker", line.getOptionValue("jt"));
        }
        if (line.hasOption("conf")) {
            conf.addResource(new Path(line.getOptionValue("conf")));
        }
        if (line.hasOption('D')) {
            String[] property = line.getOptionValues('D');
            for (int i = 0; i < property.length - 1; i = i + 2) {
                if (property[i] != null)
                    conf.set(property[i], property[i + 1]);
            }
        }
    }

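A CommandLine instance is passed in, and its contents are used to set up the Configuration conf object. The purpose is to prepare conf for the Hadoop Tool: a tool such as WordCount needs Hadoop's configuration when it starts, and it gets that from the Configuration conf object set here.

Note that processGeneralOptions() returns nothing. It reads the option values from the CommandLine object (for -D, an array of values) and writes them into conf.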
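The -D loop walks the array two entries at a time, which suggests that the values arrive as alternating property names and values. The sketch below replays that loop with a plain Map standing in for Configuration; the class and method names are hypothetical, and the sample property names are used only as illustrative data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PropertyPairsSketch {
    // Mirrors the loop in processGeneralOptions: entry i is a property name,
    // entry i + 1 is its value; a trailing name without a value is ignored.
    static Map<String, String> applyProperties(String[] property) {
        Map<String, String> conf = new LinkedHashMap<>(); // stand-in for Configuration
        for (int i = 0; i < property.length - 1; i = i + 2) {
            if (property[i] != null) {
                conf.put(property[i], property[i + 1]);
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        // Assumed shape of the values for: -D mapred.reduce.tasks=2 -D io.sort.mb=100
        String[] property = {"mapred.reduce.tasks", "2", "io.sort.mb", "100"};
        System.out.println(applyProperties(property));
    }
}
```

The bound property.length - 1 is what keeps property[i + 1] in range when the array has an odd number of entries.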

Up to this point, everything has been the work needed to initialize a GenericOptionsParser parser:

GenericOptionsParser parser = new GenericOptionsParser(conf, args);

The parser instance of GenericOptionsParser can then be used to get the arguments that remain once Hadoop's generic options have been removed:

// get the args w/o generic hadoop args
String[] toolArgs = parser.getRemainingArgs();

Everything needed to run the Hadoop tool is now in place, so it can be started:

return tool.run(toolArgs);

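The returned status code can be used to check how the tool ran.

The Tool tool above is the WordCount object we instantiated; only at this point does execution enter the WordCount implementation.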
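The overall hand-off (strip the generic options, pass the rest to the tool, return its status code) can be sketched as follows. This is an illustration under assumptions, not Hadoop code: MiniTool is a hypothetical stand-in for Tool, and remainingArgs only understands leading -fs, -jt and -conf options that each take one value.

```java
import java.util.Arrays;

public class ToolRunSketch {
    // Hypothetical stand-in for Hadoop's Tool: run the job, return a status code.
    interface MiniTool {
        int run(String[] args) throws Exception;
    }

    // Simplified stand-in for parser.getRemainingArgs(): skips leading
    // generic options, assuming each one is followed by exactly one value.
    static String[] remainingArgs(String[] args) {
        int i = 0;
        while (i < args.length
                && (args[i].equals("-fs") || args[i].equals("-jt") || args[i].equals("-conf"))) {
            i += 2;
        }
        return Arrays.copyOfRange(args, Math.min(i, args.length), args.length);
    }

    static int run(MiniTool tool, String[] args) throws Exception {
        String[] toolArgs = remainingArgs(args); // args without generic options
        return tool.run(toolArgs);
    }

    public static void main(String[] args) throws Exception {
        // A toy "WordCount" that only checks it received <input> <output>.
        MiniTool wordCount = toolArgs -> toolArgs.length == 2 ? 0 : -1;
        int status = run(wordCount, new String[] {"-fs", "local", "input", "output"});
        System.out.println("exit status = " + status);
    }
}
```

The tool itself never sees -fs local; by the time run() is called, only its own arguments are left.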
