After setting up Nutch 2.2.1 in Eclipse, running it fails with the following error:
org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=http
at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:91)
at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.run(FetcherReducer.java:491)
Solution: change the plugin.includes property.
Before:
<property>
<name>plugin.includes</name>
<value>protocol-selenium|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic
</value>
</property>
After:
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic
</value>
</property>
In other words, replace protocol-selenium with protocol-http or protocol-httpclient.
My recommendation is protocol-httpclient.
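
To see which protocol plugin a given plugin.includes value would enable, the value can be tested as a regular expression against plugin ids. The sketch below assumes Nutch selects plugins by matching each plugin id in full against that regex. The class and method names (PluginIncludesCheck, isIncluded) are hypothetical and not part of Nutch.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: checks plugin ids against a
// plugin.includes value, assuming full-match regex semantics.
class PluginIncludesCheck {

    // Returns true when pluginId matches the whole plugin.includes regex.
    // The value is trimmed because the XML above puts </value> on its own line.
    static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes.trim()).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String rest = "|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)"
                + "|urlnormalizer-(pass|regex|basic)|scoring-opic";
        String before = "protocol-selenium" + rest; // value before the fix
        String after = "protocol-http" + rest;      // value after the fix

        String[] ids = {"protocol-http", "protocol-httpclient", "protocol-selenium"};
        for (String id : ids) {
            System.out.println(id
                    + " before=" + isIncluded(before, id)
                    + " after=" + isIncluded(after, id));
        }
    }
}
```

Under that assumption, the "before" value enables only protocol-selenium, and the "after" value enables only protocol-http. To use httpclient instead, the value has to name protocol-httpclient explicitly, since protocol-http alone does not match it in full.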
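From a web search:
Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it. Supported browsers include IE (7, 8, 9), Mozilla Firefox, Mozilla Suite, and others.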