Installing regain

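I. Adding Paoding-analysis as the Chinese word-segmentation module

This is very simple: only one source file needs to be modified.

Source file (file names were underlined in the original post): src\net\sf\regain\RegainToolKit.java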

import net.paoding.analysis.analyzer.PaodingAnalyzer;
import org.apache.lucene.analysis.cn.ChineseAnalyzer;

public static Analyzer createAnalyzer(String analyzerType,
    String[] stopWordList, String[] exclusionList, String[] untokenizedFieldNames)
    throws RegainException

  if (analyzerType.equalsIgnoreCase("english")) {
    analyzerClassName = StandardAnalyzer.class.getName();
  } else if (analyzerType.equalsIgnoreCase("german")) {
    analyzerClassName = GermanAnalyzer.class.getName();
  } else if (analyzerType.equalsIgnoreCase("chinese")) {
    analyzerClassName = ChineseAnalyzer.class.getName(); // Add by ping.
  } else if (analyzerType.equalsIgnoreCase("paoding")) {
    analyzerClassName = PaodingAnalyzer.class.getName(); // Add by ping.
  }
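
The if/else chain above maps an analyzer type name, ignoring case, to an analyzer class name. The same lookup can be sketched as a table, which may be easier to extend. This is only an illustration and not regain's code: the class AnalyzerTypeSketch and the method resolve are hypothetical names, and the class names are stored as plain strings so the sketch runs without Lucene or Paoding on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not part of regain: table-driven version of the
// analyzer type lookup shown above.
class AnalyzerTypeSketch {

    private static final Map<String, String> ANALYZERS = new LinkedHashMap<>();
    static {
        ANALYZERS.put("english", "org.apache.lucene.analysis.standard.StandardAnalyzer");
        ANALYZERS.put("german", "org.apache.lucene.analysis.de.GermanAnalyzer");
        ANALYZERS.put("chinese", "org.apache.lucene.analysis.cn.ChineseAnalyzer");
        ANALYZERS.put("paoding", "net.paoding.analysis.analyzer.PaodingAnalyzer");
    }

    // Returns the analyzer class name for the given type, ignoring case,
    // or null when the type is unknown.
    static String resolve(String analyzerType) {
        if (analyzerType == null) {
            return null;
        }
        return ANALYZERS.get(analyzerType.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(resolve("Paoding"));
        System.out.println(resolve("french"));
    }
}
```

The type name passed in comes from the analyzerType argument of createAnalyzer, so "paoding" is the value to use once the patched build is in place.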

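Only the one file above needs source changes, but a few more steps are required before everything compiles and runs successfully.
Mainly:
1. Modify the Ant build file, build.xml.
2. Copy paoding-analysis.jar into the lib directory.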

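The changes to build.xml are as follows:
[Only the modified fragments are excerpted here; the added parts were shown in bold in the original post.]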
...
<target name="runtime-desktop" depends="prepare-once, runtime-desktop-fast">
  <echo message="Creating the jars ..." />
  <fileset id="desktop-common-jars" dir="build/included-lib-classes/common">
    <include name="org/apache/lucene/**"/>
    <include name="org/apache/log4j/**"/>
    <include name="org/apache/regexp/**"/>
    <!-- Add by ping. -->
    <include name="net/paoding/analysis/**"/>
    <include name="paoding-*.properties"/>
    <include name="org/apache/commons/**"/>

...
<target name="runtime-server" depends="prepare-once, runtime-server-fast, -web-temps">
  <jar jarfile="build/runtime/crawler/${programname.file}-crawler.jar"
       compress="false"
       index="true">
    <manifest>
      <attribute name="Main-Class" value="net.sf.regain.crawler.Main"/>
    </manifest>
    <fileset dir="build/included-lib-classes/common">
      <include name="org/apache/lucene/**"/>
      <include name="org/apache/log4j/**"/>
      <include name="org/apache/regexp/**"/>
      <!-- Add by ping. -->
      <include name="net/paoding/analysis/**"/>
      <include name="paoding-*.properties"/>
      <include name="org/apache/commons/**"/>
...

<mkdir dir="build/runtime/search/webapps"/>
<war destfile="build/runtime/search/webapps/${programname.file}.war"
     webxml="web/server/web-inf/web.xml">
  <classes dir="build/classes">
    <exclude name="net/sf/regain/crawler/**"/>
    <exclude name="net/sf/regain/ui/desktop/**"/>
    <exclude name="net/sf/regain/util/sharedtag/simple/**"/>
    <exclude name="net/sf/regain/util/ui/**"/>
  </classes>
  <lib dir="lib">
    <include name="lucene-*.jar"/>
    <include name="jakarta-regexp-*.jar"/>
    <include name="log4j-*.jar"/>
    <!--Add by ping.-->
    <include name="paoding-*.jar"/>
    <include name="commons-logging*.jar"/>
  </lib>

...
<mkdir dir="${deploy-target.dir}/${programname.file}/WEB-INF/lib"/>
<copy todir="${deploy-target.dir}/${programname.file}/WEB-INF/lib">
  <fileset dir="lib">
    <include name="lucene-*.jar"/>
    <include name="jakarta-regexp-*.jar"/>
    <include name="log4j-*.jar"/>
    <!--Add by ping.-->
    <include name="paoding-*.jar"/>
    <include name="commons-logging*.jar"/>
  </fileset>
</copy>


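II. Changing the length of search-result fragments


1. By default each search-result fragment is 100 characters long,
which I find rather short; the fragment length can be raised to 300.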

lucene\contrib\highlighter\src\java
org.apache.lucene.search.highlight
SimpleFragmenter.java

public class SimpleFragmenter implements Fragmenter
{
  private static final int DEFAULT_FRAGMENT_SIZE = 100 * 3;
This constant defines the length of a search-result fragment. The default is 100 characters; here it is changed to 300.
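
To get a feel for what the larger constant does, here is a rough sketch of cutting a text into fixed-size fragments. It is not Lucene's implementation (SimpleFragmenter works on token offsets and breaks at token boundaries); FragmentSizeSketch and split are hypothetical names used only for illustration. Note also that SimpleFragmenter offers a constructor that takes the fragment size, which might avoid patching the Lucene source, although that is not the route taken in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: cuts a text into chunks of at most
// fragmentSize characters to show how the size affects the fragment count.
class FragmentSizeSketch {

    static List<String> split(String text, int fragmentSize) {
        if (fragmentSize <= 0) {
            throw new IllegalArgumentException("fragmentSize must be positive");
        }
        List<String> fragments = new ArrayList<>();
        for (int start = 0; start < text.length(); start += fragmentSize) {
            int end = Math.min(text.length(), start + fragmentSize);
            fragments.add(text.substring(start, end));
        }
        return fragments;
    }

    public static void main(String[] args) {
        // Build a 650-character sample text.
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 650; i++) {
            sample.append('x');
        }
        System.out.println(split(sample.toString(), 100).size()); // 7 fragments
        System.out.println(split(sample.toString(), 300).size()); // 3 fragments
    }
}
```

With the larger size each fragment carries more context around a hit, at the cost of a longer result page.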



III. Minor changes to the search results page

1. package net.sf.regain.search.results;
SingleSearchResults.java

public void highlightHitDocument(int index)
    resHighlSummary = highlighter.getBestFragments(tokenStream, text, 3,
        " . . . . . . <br><span class=\"resultTag\">[Result]</span> ");
This defines how the highlighted search result is displayed.
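
The call above asks the highlighter for at most 3 fragments and joins them with the given separator string, so the separator appears only between fragments and not before the first one. A minimal sketch of that joining step, using only the standard library; FragmentJoinSketch and joinBest are hypothetical names, and the fragment texts are made up for the example:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Lucene code: keeps at most maxNumFragments
// fragments and joins them with the separator.
class FragmentJoinSketch {

    static String joinBest(List<String> fragments, int maxNumFragments, String separator) {
        int count = Math.min(Math.max(maxNumFragments, 0), fragments.size());
        return String.join(separator, fragments.subList(0, count));
    }

    public static void main(String[] args) {
        List<String> fragments = Arrays.asList("first hit", "second hit", "third hit", "fourth hit");
        String separator = " . . . . . . <br><span class=\"resultTag\">[Result]</span> ";
        // Only the first three fragments are kept; the separator sits between them.
        System.out.println(joinBest(fragments, 3, separator));
    }
}
```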

2.web\web\common
search.jsp

<search:list msgNoResults="<tr><td colspan='2'>{msg:noResultsFound}<br/><br/></td></tr>">
<tr><td colspan="2">
<search:hit_typeicon imgpath="img/ext"/> <search:hit_link/>
<span class="hitDetails">
(<search:msg key="relevance"/>: <search:hit_score/>)<br/>
<span class="resultTag">[Result]</span>
<search:hit_field field="summary"/><br/>
<search:hit_content/>
<search:hit_path after="<br/>" createLinks="true"/>
<search:hit_field field="mimetype"/> 
<span class="hitInfo"><search:hit_url beautified="true"/> - <search:hit_size/></span><br/>
<br/></span>
</td></tr>
</search:list>

This defines the search results page and the data fields it displays.


3. Add a display style
src\web\common
regain.css

.resultTag {
  color: #0000FF;
  font-weight: bold;
}

4. A small cosmetic change: the button that fetches the document content is labelled in German by default; here the label is translated into English.
src/net/sf/regain/search/sharedlib/hit/ContentTag.java
protected void printEndTag(PageRequest request, PageResponse response,
    Document hit, int hitIndex)
    throws RegainException {

  String content = null;
  content = hit.get("content");
  if (content != null) {
    String hitNumber = Integer.toString(hitIndex + 1);
    response.print("<input type=\"button\" class=\"button\" onclick=\"return toggleMe('hit_" +
        hitNumber + "')\" value=\"Click here Get " + hitNumber + " content\">");


Property files
Dictionary files
Encoding issues


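Adding Paoding Chinese word segmentation to regain, plus server edition settings
Original article: http://monner.iteye.com/blog/254804
----------------------------------------
Supplement:
To use Paoding Chinese word segmentation, set up the dictionary first:
vi /etc/profile
export PAODING_DIC_HOME=/data/paoding/dic
Copy the contents of Paoding's dic directory to /data/paoding/dic.
For Windows settings, see the manual.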
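
Before starting the crawler it can be worth checking that the variable is actually visible to the Java process and points at a populated directory. The following is a minimal sketch using only the standard library; PaodingDicHomeCheck and describe are hypothetical names and not part of Paoding or regain, and the messages are made up for the example.

```java
import java.io.File;
import java.util.Map;

// Hypothetical sketch, not part of Paoding or regain: reports whether
// PAODING_DIC_HOME is set and points at a non-empty directory.
class PaodingDicHomeCheck {

    static String describe(Map<String, String> env) {
        String home = env.get("PAODING_DIC_HOME");
        if (home == null || home.trim().isEmpty()) {
            return "PAODING_DIC_HOME is not set";
        }
        File dir = new File(home);
        if (!dir.isDirectory()) {
            return "PAODING_DIC_HOME is not a directory: " + home;
        }
        String[] entries = dir.list();
        if (entries == null || entries.length == 0) {
            return "PAODING_DIC_HOME is empty: " + home;
        }
        return "OK: " + home + " (" + entries.length + " entries)";
    }

    public static void main(String[] args) {
        // Uses the environment of the current process.
        System.out.println(describe(System.getenv()));
    }
}
```

Keep in mind that a change to /etc/profile only takes effect in shells started afterwards, so a servlet container that is already running will not see it until it is restarted from such a shell.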

Also copy the lucene-memory jar from lucene/contrib/memory into regain/lib, then recompile.

One problem in the server edition needs fixing. If the query text comes out garbled, try changing
src/net/sf/regain/search/SearchToolkit.java
to the following:
    queryString = query.toString().trim();

    // add by robin
    try {
      queryString = new String(queryString.getBytes("iso-8859-1"), "UTF-8");
    } catch (Exception e) {
      // ignored: keep the query string as it was
    }
    request.setContextAttribute(SEARCH_QUERY_CONTEXT_ATTR_NAME, queryString);
  }


  return queryString;
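
The fix works when the servlet container has decoded the UTF-8 bytes of the query as ISO-8859-1, which is presumably what happens here. Because ISO-8859-1 maps every byte to exactly one character, turning the garbled string back into bytes and decoding them as UTF-8 recovers the original text. A minimal, self-contained sketch of that round trip; QueryEncodingSketch and redecode are hypothetical names:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: reproduces the garbling and the repair used above.
class QueryEncodingSketch {

    // Re-interprets a string that was wrongly decoded as ISO-8859-1 as UTF-8.
    static String redecode(String misdecoded) {
        byte[] rawBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Four Chinese characters, written as escapes so the source file
        // encoding does not matter.
        String original = "\u4e2d\u6587\u5206\u8bcd";
        // Simulate the container decoding UTF-8 bytes as ISO-8859-1.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        String repaired = redecode(garbled);
        System.out.println(garbled.equals(original));  // false
        System.out.println(repaired.equals(original)); // true
    }
}
```

The repair is only safe when the input really was mis-decoded. Applying it to a query that already arrived correctly would corrupt any non-Latin characters, which is why the post suggests it only when garbled text actually shows up.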
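----------------------------------------
Key configuration changes for the regain server edition



file:///home/admin/domains/25q.net/

Then in

file:///home/admin/domains/25q.net/

The path has to be added in both of these places; otherwise an "index empty" error occurs.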
