jackrabbit1.4对纯文本(text/plain )的全文检索支持

jackrabbit1.4对纯文本(text/plain )的全文检索支持

采用JCR(JSR170)内容数据库实现API jackrabbit 开发SmartNote程序的时候,发现了一个怪异的现象。

jackrabbit 全文检索对Word、PDF、Html等格式的文件有很好的支持,但是无法对纯文本(text/plain)进行检索。

这种现象让人感觉到很奇怪,如果把文本文件存放在proptery中,可以被检索到,存放在nt:file就没法检索到了。

检查默认产生的workspace.xml文件,可以看到如下信息

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
<param name="path" value="${wsp.home}/index"/>
<param name="textFilterClasses" value="org.apache.jackrabbit.extractor.MsWordTextExtractor,org.apache.jackrabbit.extractor.MsExcelTextExtractor,org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,org.apache.jackrabbit.extractor.PdfTextExtractor,org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,org.apache.jackrabbit.extractor.RTFTextExtractor,org.apache.jackrabbit.extractor.HTMLTextExtractor,org.apache.jackrabbit.extractor.XMLTextExtractor"/>
<param name="extractorPoolSize " value="2"/>
<param name="supportHighlighting" value="true"/>
</SearchIndex>

可以在textFilterClasses中看见对msword、msexcel、ppt、pdf等文件的分词解析。

独缺对text文档解析,只需在其中加上org.apache.jackrabbit.extractor.DefaultTextExtractor就可以解析text文件了

另外需要设置编码,否则也无法检索到纯文本内容,如下是代码片段

Node note = root.addNode("note");
Node filenode = note.addNode("file", "nt:file");
Node resnode = filenode.addNode("jcr:content", "nt:resource");
resnode.setProperty("jcr:mimeType", “text/plain”);
resnode.setProperty("jcr:encoding", "utf-8");

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
提示错误[ERROR] [ERROR] Some problems were encountered while processing the POMs: [ERROR] Unresolveable build extension: Plugin org.apache.maven.wagon:wagon-webdav-jackrabbit:1.0-beta-6 or one of its dependencies could not be resolved: The following artifacts could not be resolved: commons-httpclient:commons-httpclient:jar:3.1 (absent): Could not transfer artifact commons-httpclient:commons-httpclient:jar:3.1 from/to central (https://repo.maven.apache.org/maven2): Connect to repo.maven.apache.org:443 [repo.maven.apache.org/146.75.112.215] failed: connect timed out @ @ [ERROR] The build could not read 1 project -> [Help 1] [ERROR] [ERROR] The project org.drools:droolsjbpm-integration:7.74.0-SNAPSHOT (D:\droolsjbpm-integration-main\droolsjbpm-integration-main\pom.xml) has 1 error [ERROR] Unresolveable build extension: Plugin org.apache.maven.wagon:wagon-webdav-jackrabbit:1.0-beta-6 or one of its dependencies could not be resolved: The following artifacts could not be resolved: commons-httpclient:commons-httpclient:jar:3.1 (absent): Could not transfer artifact commons-httpclient:commons-httpclient:jar:3.1 from/to central (https://repo.maven.apache.org/maven2): Connect to repo.maven.apache.org:443 [repo.maven.apache.org/146.75.112.215] failed: connect timed out -> [Help 2] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException [ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/PluginManagerException
最新发布
06-09

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值