AboutPlugins

Nutch's plugin system is based on the one used in Eclipse 2.x. Plugins are central to how Nutch works: all of the parsing, indexing, and searching that Nutch does is carried out by plugins.

In writing a plugin, you provide one or more extensions of the existing extension points. The core Nutch extension points are themselves defined in a plugin, the NutchExtensionPoints plugin (they are listed in that plugin's plugin.xml file). Each extension point defines an interface that the extension must implement. The core extension points are:

  • OnlineClusterer -- An extension point interface for online search results clustering algorithms (from javadoc).
  • IndexingFilter -- Permits one to add metadata to the indexed fields. All plugins found which implement this extension point are run sequentially on the parse (from javadoc).
  • Ontology
  • Parser -- Parser implementations read through fetched documents in order to extract data to be indexed. This is what you need to implement if you want Nutch to be able to parse a new type of content, or extract more data from currently parseable content.
  • HtmlParseFilter -- Permits one to add additional metadata to HTML parses (from javadoc).
  • Protocol -- Protocol implementations allow Nutch to use different protocols (ftp, http, etc.) to fetch documents.
  • QueryFilter -- Extension point for query translation. Permits one to add metadata to a query (from javadoc).
  • URLFilter -- URLFilter implementations limit the URLs that Nutch attempts to fetch. The RegexURLFilter distributed with Nutch provides a great deal of control over which URLs Nutch crawls; however, if you have very complicated rules about which URLs to crawl, you can write your own implementation.

  • NutchAnalyzer -- An extension point that provides language-specific analyzers (see the MultiLingualSupport proposal). Because it is still in development, it is not yet in the released javadoc.
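
To make the idea of "an extension implements the interface defined by an extension point" concrete, here is a minimal Java sketch. It deliberately does not use the Nutch API: SimpleUrlFilter is a hypothetical stand-in for an extension-point interface such as URLFilter, and NoQueryUrlFilter, HttpOnlyUrlFilter, UrlFilterChain, and UrlFilterDemo are illustrative names only. Returning null to reject a URL and running every filter in turn are assumptions made for this sketch; consult the javadoc for the real interfaces.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an extension-point interface (not the Nutch API).
interface SimpleUrlFilter {
    // Assumed convention: return the URL to keep it, or null to reject it.
    String filter(String url);
}

// Illustrative extension: rejects any URL that carries a query string.
class NoQueryUrlFilter implements SimpleUrlFilter {
    @Override
    public String filter(String url) {
        return url.contains("?") ? null : url;
    }
}

// Illustrative extension: keeps only http and https URLs.
class HttpOnlyUrlFilter implements SimpleUrlFilter {
    @Override
    public String filter(String url) {
        boolean isHttp = url.startsWith("http://") || url.startsWith("https://");
        return isHttp ? url : null;
    }
}

// Runs every filter in order; the first rejection ends the chain.
class UrlFilterChain {
    static String apply(List<SimpleUrlFilter> filters, String url) {
        String current = url;
        for (SimpleUrlFilter filter : filters) {
            if (current == null) {
                return null;
            }
            current = filter.filter(current);
        }
        return current;
    }
}

class UrlFilterDemo {
    public static void main(String[] args) {
        List<SimpleUrlFilter> filters =
                Arrays.asList(new NoQueryUrlFilter(), new HttpOnlyUrlFilter());
        String[] urls = {
            "http://example.com/page",
            "http://example.com/page?id=1",
            "ftp://example.com/file"
        };
        for (String url : urls) {
            // A null result means at least one filter rejected the URL.
            System.out.println(url + " -> " + UrlFilterChain.apply(filters, url));
        }
    }
}
```

In this sketch the host code depends only on the interface, so a new rule can be added by writing another implementation and putting it in the list, which is the same separation that the extension-point mechanism is meant to provide.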
