Dissecting The Nutch Crawler - Aside: net.nutch.util.NutchConf

 
Original English source: DissectingTheNutchCrawler
When reposting this article, please credit the source: http://blog.csdn.net/pwlazy

Aside: net.nutch.util.NutchConf

If you have been reading the code along with our discussion, you may have noticed several "private static final" variables at the start of the "command" class definitions. For example, net.nutch.db.WebDBInjector has these definitions for DEFAULT_INTERVAL and NEW_INJECTED_PAGE_SCORE:

private static final byte DEFAULT_INTERVAL =
    (byte) NutchConf.getInt("db.default.fetch.interval", 30);

private static final float NEW_INJECTED_PAGE_SCORE =
    NutchConf.getFloat("db.score.injected", 2.0f);

The values are loaded by calls to net.nutch.util.NutchConf, which is, intuitively enough, a class that loads configuration files. It has two static variables, "List resourceNames" and "Properties properties", and several static methods that manipulate them. Here's a summary of its operations:

  1. "resourceNames" is initialized with the strings "nutch-default.xml" and "nutch-site.xml".

  2. "properties" is initially null.

  3. A call to one of the "getXXX" methods results in a call to getProps(). If (properties == null), loadResource() is called once for each value in "resourceNames", in order.

  4. loadResource() loads the named file, parses the XML, and sets values in "properties" according to the config.
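
The four steps above can be sketched roughly as follows. This is not the actual NutchConf source: the class name SketchConf, the parse() helper, the classpath lookup, and the assumed XML layout (property elements that each contain a name and a value) are illustrative assumptions. Only the two file names, the lazy null check, and the getXXX/getProps/loadResource call chain come from the description.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for the pattern described above; not the real NutchConf.
final class SketchConf {
    // Step 1: files are loaded in list order, so later files override earlier ones.
    private static final List<String> resourceNames = new ArrayList<>();
    // Step 2: stays null until the first getXXX call.
    private static Properties properties;

    static {
        resourceNames.add("nutch-default.xml");
        resourceNames.add("nutch-site.xml");
    }

    private SketchConf() {}

    // Step 3: build the property table lazily, on first use only.
    private static synchronized Properties getProps() {
        if (properties == null) {
            Properties loaded = new Properties();
            for (String name : resourceNames) {
                loadResource(loaded, name);
            }
            properties = loaded;
        }
        return properties;
    }

    // Step 4: find the file (assumed to be on the classpath) and parse it.
    // A missing file is silently skipped in this sketch.
    private static void loadResource(Properties props, String name) {
        try (InputStream in = SketchConf.class.getClassLoader().getResourceAsStream(name)) {
            if (in != null) {
                parse(in, props);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read " + name, e);
        }
    }

    // Assumed layout: <property><name>..</name><value>..</value></property> entries.
    static void parse(InputStream in, Properties props) {
        try {
            NodeList entries = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(in).getElementsByTagName("property");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                String key = entry.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = entry.getElementsByTagName("value").item(0).getTextContent().trim();
                props.setProperty(key, value);
            }
        } catch (Exception ex) {
            throw new IllegalStateException("Malformed configuration", ex);
        }
    }

    // Returns the configured value, or the caller's default if the key is absent.
    static int getInt(String name, int defaultValue) {
        String value = getProps().getProperty(name);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    static float getFloat(String name, float defaultValue) {
        String value = getProps().getProperty(name);
        return value == null ? defaultValue : Float.parseFloat(value);
    }
}
```

Under these assumptions, a call such as SketchConf.getInt("db.default.fetch.interval", 30) triggers the one-time load and falls back to 30 when neither file defines the key. Because the site file is read after the default file, a value set in "nutch-site.xml" would replace the one from "nutch-default.xml" in this sketch.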


