[Repost] A flexible indexing tool

Reposted 2005-05-23 18:15:00

SWISH-E is a fast, powerful, flexible, free, and easy-to-use system for indexing collections of Web pages or other files. See the article "How to Index Anything" by Josh Rabinowitz in the Linux Journal for more information.

Key features

  • Quickly index a large number of documents in different formats including text, HTML, and XML
  • Use "filters" to index other types of files such as PDF, gzip, or PostScript
  • Includes a web spider for indexing remote documents over HTTP. Follows Robots Exclusion Rules (including META tags).
  • Can use an external program to supply documents to Swish-e, such as an advanced spider for your web server or a program to read and format records from a relational database.
  • Document "properties" (some subset of the source document, usually defined as META or XML elements) may be stored in the index and returned with search results
  • Document summaries can be returned with each search
  • Word stemming, soundex, metaphone, and double-metaphone indexing for "fuzzy" searching
  • Phrase searching and wildcard searching
  • Limit searches to HTML links
  • Use powerful Regular Expressions to select documents for indexing or exclusion
  • Easily limit searches to parts or all of your web site
  • Results can be sorted by relevance or by any number of properties in ascending or descending order
  • Limit searches to parts of documents such as certain HTML tags (META, TITLE, comments, etc.) or to XML elements.
  • Can report structural errors in your XML and HTML documents
  • Index file is portable between platforms.
  • A Swish-e library is provided to allow embedding Swish-e into your applications for very fast searching. A Perl module is available that provides a standard API for accessing Swish-e.
  • Includes example search script with context summaries and search term and phrase highlighting. Can be used with popular Perl templating systems.
  • Swish-e is fast.
  • It's open source and FREE! You can customize Swish-e and you can contribute your fancy new features to the project.
  • Supported by on-line user and developer groups
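
To make the phrase and wildcard searching items above more concrete, here is a minimal Python sketch of a positional inverted index. It is not Swish-e's implementation or API; the names TinyIndex, tokenize, search_word and search_phrase are hypothetical and exist only for this illustration, and the sample documents are made up.

```python
import fnmatch
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy positional inverted index: word -> {doc_id: [positions]}.

    Illustration only; this is NOT how Swish-e stores its index.
    """

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        """Record the position of every word of the document."""
        for pos, word in enumerate(tokenize(text)):
            self.postings[word].setdefault(doc_id, []).append(pos)

    def search_word(self, pattern):
        """Return ids of documents with a word matching the pattern.

        The pattern may use shell-style wildcards such as '*'.
        """
        pattern = pattern.lower()
        hits = set()
        for word, docs in self.postings.items():
            if fnmatch.fnmatchcase(word, pattern):
                hits.update(docs)
        return hits

    def search_phrase(self, phrase):
        """Return ids of documents where the phrase words are consecutive."""
        words = tokenize(phrase)
        if not words:
            return set()
        hits = set()
        for doc_id, starts in self.postings.get(words[0], {}).items():
            for start in starts:
                # Each later word must sit exactly i positions after the first.
                if all(
                    start + i in self.postings.get(w, {}).get(doc_id, [])
                    for i, w in enumerate(words[1:], start=1)
                ):
                    hits.add(doc_id)
                    break
        return hits


# Hypothetical sample documents, used only to exercise the sketch.
index = TinyIndex()
index.add("a.html", "How to index anything")
index.add("b.txt", "Indexing collections of Web pages")
```

With this sample data, index.search_word("index*") matches both documents, while index.search_phrase("web pages") matches only b.txt because the two words must appear next to each other and in that order. Storing word positions is what makes the phrase check possible; an index that only recorded which documents contain a word could not tell "web pages" from "pages web".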

Further information about SWISH-E is available at http://www.swish-e.org/.
