Dissecting The Nutch Crawler - Aside: net.nutch.util.NutchConf

pwlazy

Original English source: DissectingTheNutchCrawler
Please credit the source when reposting: http://blog.csdn.net/pwlazy

Aside: net.nutch.util.NutchConf

If you have been reading the code along with our discussion, you may have noticed several "private static final" variables at the start of the "command" class definitions. For example, net.nutch.db.WebDBInjector defines DEFAULT_INTERVAL and NEW_INJECTED_PAGE_SCORE like this:

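// Default re-fetch interval; 30 is used if the property is not configured.
private static final byte DEFAULT_INTERVAL =
  (byte)NutchConf.getInt("db.default.fetch.interval", 30);

// Score given to newly injected pages; 2.0 is used if the property is not configured.
private static final float NEW_INJECTED_PAGE_SCORE =
  NutchConf.getFloat("db.score.injected", 2.0f);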
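Each getXXX call above takes a property name plus a default. A minimal sketch of how such typed getters can sit on top of a Properties object follows. It assumes, rather than quotes from the Nutch source, that a missing or unparseable value falls back to the default; the class TypedConfSketch and its methods are hypothetical stand-ins for NutchConf.

```java
import java.util.Properties;

// Hypothetical stand-in for NutchConf's typed getters; not the real Nutch source.
class TypedConfSketch {
    // Backing store; in NutchConf this is filled from the XML config files.
    static final Properties props = new Properties();

    static String get(String name) {
        return props.getProperty(name);
    }

    // Parsed int value, or defaultValue if the property is missing or malformed.
    static int getInt(String name, int defaultValue) {
        String value = get(name);
        if (value == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    // Same fallback rule for floats.
    static float getFloat(String name, float defaultValue) {
        String value = get(name);
        if (value == null) {
            return defaultValue;
        }
        try {
            return Float.parseFloat(value.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        // Nothing configured: the defaults given at the call site are used.
        byte interval = (byte) getInt("db.default.fetch.interval", 30);
        float score = getFloat("db.score.injected", 2.0f);
        System.out.println(interval + " " + score);  // 30 2.0

        // A configured value takes precedence over the default.
        props.setProperty("db.default.fetch.interval", "7");
        System.out.println((byte) getInt("db.default.fetch.interval", 30));  // 7
    }
}
```

Note the (byte) cast in DEFAULT_INTERVAL: it silently truncates, so a configured value outside -128..127 would wrap around (for example, (byte) 200 is -56).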

The values are loaded by calls to net.nutch.util.NutchConf, which, as the name suggests, is a class that loads configuration files. It has two static fields: resourceNames, a List, and properties, a Properties object. Several static methods operate on these fields. Here is a summary of how it works:

resourceNames is initialized with the strings "nutch-default.xml" and "nutch-site.xml".
properties is initially null.
Every call to one of the getXXX methods (getInt, getFloat, and so on) first calls getProps(). If properties is null, getProps() calls loadResource() once for each name in resourceNames, in order.
loadResource() loads the named file, parses the XML, and sets values in properties according to the configuration it contains.
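The lazy-loading sequence summarized above can be sketched as follows. This is a minimal illustration, not the actual NutchConf code: the class LazyConfSketch and the FAKE_CLASSPATH map (which stands in for reading the files from the classpath so the example runs on its own) are hypothetical, and the XML layout, a series of property elements each holding a name and a value, is an assumption about the file format. The sketch lets later files overwrite earlier ones, matching the convention that nutch-site.xml overrides nutch-default.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of NutchConf's lazy loading; not the real Nutch source.
class LazyConfSketch {
    // Config files, read in this order; later files overwrite earlier ones.
    static final List<String> resourceNames =
        new ArrayList<>(Arrays.asList("nutch-default.xml", "nutch-site.xml"));

    // Stays null until the first getXXX call triggers loading.
    static Properties properties = null;

    // Hypothetical stand-in for the classpath: resource name -> XML text.
    static final Map<String, String> FAKE_CLASSPATH = new HashMap<>();

    static synchronized Properties getProps() {
        if (properties == null) {
            properties = new Properties();
            for (String name : resourceNames) {
                loadResource(properties, name);
            }
        }
        return properties;
    }

    // Parses one XML file and copies each <property> name/value pair into props.
    static void loadResource(Properties props, String name) {
        String xml = FAKE_CLASSPATH.get(name);
        if (xml == null) {
            return; // assumption: a missing file is simply skipped
        }
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element prop = (Element) nodes.item(i);
                String key = childText(prop, "name");
                String value = childText(prop, "value");
                if (key != null && value != null) {
                    props.setProperty(key, value);
                }
            }
        } catch (Exception e) {
            throw new RuntimeException("cannot parse " + name, e);
        }
    }

    // Text of the first descendant element with the given tag, or null.
    static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    static String get(String name) {
        return getProps().getProperty(name);
    }

    public static void main(String[] args) {
        // Root element name is an assumption; the sketch only looks for <property>.
        FAKE_CLASSPATH.put("nutch-default.xml",
            "<nutch-conf><property><name>db.default.fetch.interval</name>"
            + "<value>30</value></property></nutch-conf>");
        FAKE_CLASSPATH.put("nutch-site.xml",
            "<nutch-conf><property><name>db.default.fetch.interval</name>"
            + "<value>7</value></property></nutch-conf>");
        System.out.println(properties == null);               // true
        System.out.println(get("db.default.fetch.interval")); // 7
    }
}
```

Because getProps() only loads when properties is null, the files are read once, on the first getXXX call; in this sketch, later changes to them are not picked up.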