Fixing errors when installing and using Nutch 1.6

Latest recommended article published on 2024-09-22 13:19:22

树上骑个猴

Category: Java development study. Tags: nutch, networking

This post records the problems I ran into while using Nutch 1.6. The first was the error "Nutch Fetcher: No agents listed in 'http.agent.name' property". The fix is described below; original post: http://blog.csdn.net/chaishen10000/article/details/7183382

Most explanations online say to find nutch-default.xml under {nutch}/conf.

If the property is initially set like this:

<property>
  <name>http.agent.name</name>
  <value></value>
</property>

then the error "Fetcher: No agents listed in 'http.agent.name' property" may be thrown. The cause is that <value></value> is empty. Put something in it (I believe any non-empty string will do), for example:

<property>
  <name>http.agent.name</name>
  <value>ZB nutch agent</value>
</property>

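Editing {nutch}/conf this way has no effect in Nutch 1.6. On closer inspection, 1.6 has a runtime/local directory, and everything runs from that directory. Find conf/nutch-default.xml under runtime/local and apply the change described above there; that resolves the error.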
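To see which copy of the file actually has the value set, a small standalone check can help. The following is a minimal sketch, not Nutch's own code: the class name AgentNameCheck and its methods are hypothetical, the default path is an assumption based on the directory mentioned above, and only the JDK's built-in XML parser is used.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch: reads <property> entries from a
// Hadoop-style configuration file and checks http.agent.name.
class AgentNameCheck {

    // Collect every <property><name>..</name><value>..</value></property> pair.
    static Map<String, String> readProperties(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(in);
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = text(p, "name");
                if (!name.isEmpty()) {
                    props.put(name, text(p, "value"));
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse configuration", e);
        }
    }

    // Trimmed text of the first descendant element with the given tag, or "".
    private static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    // True when http.agent.name is present and not blank.
    static boolean hasAgentName(Map<String, String> props) {
        String agent = props.get("http.agent.name");
        return agent != null && !agent.trim().isEmpty();
    }

    public static void main(String[] args) throws Exception {
        // Assumed default location; pass another file as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0]
                : "runtime/local/conf/nutch-default.xml");
        if (!Files.exists(path)) {
            System.out.println("File not found: " + path);
            return;
        }
        try (InputStream in = Files.newInputStream(path)) {
            boolean ok = hasAgentName(readProperties(in));
            System.out.println(path + ": http.agent.name "
                    + (ok ? "is set" : "is empty"));
        }
    }
}
```

Running it once against {nutch}/conf/nutch-default.xml and once against runtime/local/conf/nutch-default.xml shows whether the edit landed in the copy that Nutch 1.6 reads at run time.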

The second problem I ran into:

Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.

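Fix: in nutch-default.xml, set the value of the http.robots.agents property to spider,* . This is not the officially recommended approach, though. It is better to copy the properties below into nutch-site.xml, whose settings override those in nutch-default.xml (recommended).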

<property>
  <name>http.agent.name</name>
  <value>spider</value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty - 
  please set this to a single word uniquely related to your organization.
 
  NOTE: You should also check other related properties:
 
 http.robots.agents
 http.agent.description
 http.agent.url
 http.agent.email
 http.agent.version
 
  and set their values appropriately.
 
  </description>
</property>
 
<property>
  <name>http.robots.agents</name>
  <value>spider,*</value>
  <description>The agent strings we'll look for in robots.txt files,
  comma-separated, in decreasing order of precedence. You should
  put the value of http.agent.name as the first agent name, and keep the
  default * at the end of the list. E.g.: BlurflDev,Blurfl,*
  </description>
</property>
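
The relationship between the two properties, and the way nutch-site.xml takes precedence over nutch-default.xml, can be sketched as follows. This is a simplified model rather than Nutch's implementation: the class RobotsAgentsCheck and its methods are hypothetical, the configuration is represented as plain maps, the values in main are examples only, and the case-insensitive comparison is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not Nutch code: merges default and site settings and
// checks that http.agent.name comes first in http.robots.agents.
class RobotsAgentsCheck {

    // Site values replace default values that share the same property name.
    static Map<String, String> merge(Map<String, String> defaults,
                                     Map<String, String> site) {
        Map<String, String> merged = new HashMap<>(defaults);
        merged.putAll(site);
        return merged;
    }

    // True when the first comma-separated entry of http.robots.agents
    // equals http.agent.name (case-insensitive comparison is an assumption).
    static boolean agentListedFirst(Map<String, String> conf) {
        String agent = conf.getOrDefault("http.agent.name", "").trim();
        String robots = conf.getOrDefault("http.robots.agents", "").trim();
        if (agent.isEmpty() || robots.isEmpty()) {
            return false;
        }
        // Limit -1 keeps empty entries, so the array is never empty.
        String first = robots.split(",", -1)[0].trim();
        return first.equalsIgnoreCase(agent);
    }

    public static void main(String[] args) {
        // Example values only, standing in for nutch-default.xml.
        Map<String, String> defaults = new HashMap<>();
        defaults.put("http.agent.name", "");
        defaults.put("http.robots.agents", "*");

        // Example values standing in for nutch-site.xml.
        Map<String, String> site = new HashMap<>();
        site.put("http.agent.name", "spider");
        site.put("http.robots.agents", "spider,*");

        System.out.println("defaults only: " + agentListedFirst(defaults));
        System.out.println("with site overrides: "
                + agentListedFirst(merge(defaults, site)));
    }
}
```

Setting only http.agent.name and leaving http.robots.agents untouched would fail this check, which is why the two properties are set together in the snippet above.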

(The third problem, "Input path doesn't exist":