JRE Version Migration Guide & Lucene: How the JDK Version Affects Lucene, and What to Watch for When Upgrading

Latest recommended article published on 2023-10-06 20:51:54

xiaobluesky

JRE Version Migration Guide

If possible, use the same JRE major version at both index and search time. When upgrading to a different JRE major version, consider re-indexing.

In short: use the same JRE version for indexing and for searching, and when you upgrade the JRE, plan to rebuild the index.
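One way to act on this advice is to record the JRE major version when an index is built (for example in Lucene's commit user data, or anywhere else alongside the index) and compare it with the running JRE at search time. The sketch below only does the comparison; the class `JreVersionCheck`, its methods, and the recorded value `"1.8"` are hypothetical, and the storage mechanism is left to the application.

```java
// Hypothetical helper: compares the JRE major version recorded at index time
// with the JRE that is running now. Where the index-time value is stored is
// an assumption left to the application.
public class JreVersionCheck {

    /** Normalizes java.specification.version: "1.4" -> 4, "1.8" -> 8, "11" -> 11. */
    static int majorOf(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot >= 0 ? v.substring(0, dot) : v);
    }

    /** True when the index was built under a different JRE major version. */
    static boolean shouldConsiderReindex(String indexTimeSpec, String searchTimeSpec) {
        return majorOf(indexTimeSpec) != majorOf(searchTimeSpec);
    }

    public static void main(String[] args) {
        String current = System.getProperty("java.specification.version");
        String recorded = "1.8"; // hypothetical value saved when the index was built
        System.out.println("Index JRE " + majorOf(recorded) + ", search JRE "
                + majorOf(current) + ", consider re-indexing: "
                + shouldConsiderReindex(recorded, current));
    }
}
```

A differing major version is only a signal to review the index, not proof that results changed; the Unicode table later in this guide narrows that down.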

Different JRE major versions may implement different versions of Unicode, which will change the way some parts of Lucene treat your text.

For example: with Java 1.4, LetterTokenizer will split around the character U+02C6, but with Java 5 it will not. This is because Java 1.4 implements Unicode 3, but Java 5 implements Unicode 4.

In other words, different JRE versions implement different versions of Unicode: JDK 1.4 implements Unicode 3.0, while JDK 1.5 implements Unicode 4.0.
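The U+02C6 example can be checked directly on whatever JRE you run. LetterTokenizer defines tokens as maximal runs of characters for which `Character.isLetter` is true, so that predicate decides whether it splits around a character. The class `CircumflexCheck` and its method are hypothetical; on Java 5 and later the character should be reported as a modifier letter (Lm) and therefore not as a split point.

```java
public class CircumflexCheck {

    /**
     * A LetterTokenizer-style splitter breaks tokens at every char for which
     * Character.isLetter returns false; this reports whether it would split
     * around the given char on the JRE that is currently running.
     */
    static boolean splitsAround(char c) {
        return !Character.isLetter(c);
    }

    public static void main(String[] args) {
        char circumflex = '\u02C6'; // MODIFIER LETTER CIRCUMFLEX ACCENT
        boolean modifierLetter = Character.getType(circumflex) == Character.MODIFIER_LETTER;
        System.out.println("Java " + System.getProperty("java.specification.version")
                + ": U+02C6 is a modifier letter (Lm): " + modifierLetter
                + ", splits around it: " + splitsAround(circumflex));
    }
}
```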

For reference, JRE major versions with their corresponding Unicode versions:

Java 1.4, Unicode 3.0
Java 5, Unicode 4.0
Java 6, Unicode 4.0
Java 7, Unicode 6.0
Java 8, Unicode 6.2
Java 9, Unicode 8.0 (listed as Unicode 7.0 before its release, when it was not yet officially supported by Lucene)
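A small lookup built from the list above can tell whether a JRE upgrade actually changes the Unicode version (Java 5 to Java 6, for instance, does not). The class `UnicodeByJre` and its methods are hypothetical. Java 9 and later are deliberately left out, so any version missing from the table is treated conservatively as a change.

```java
import java.util.Map;

// Hypothetical lookup: JRE major version -> Unicode version, from the list above.
public class UnicodeByJre {

    static final Map<Integer, String> UNICODE_BY_JRE = Map.of(
            4, "3.0", 5, "4.0", 6, "4.0", 7, "6.0", 8, "6.2");

    /** Returns the Unicode version for a JRE major version, or null if not listed. */
    static String unicodeFor(int jreMajor) {
        return UNICODE_BY_JRE.get(jreMajor);
    }

    /** True if the Unicode versions differ, or if either JRE is not in the table. */
    static boolean unicodeChanged(int fromJre, int toJre) {
        String from = unicodeFor(fromJre);
        String to = unicodeFor(toJre);
        return from == null || to == null || !from.equals(to);
    }

    public static void main(String[] args) {
        System.out.println("5 -> 6 changes Unicode: " + unicodeChanged(5, 6)); // false
        System.out.println("6 -> 7 changes Unicode: " + unicodeChanged(6, 7)); // true
    }
}
```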

In general, whether or not you need to re-index largely depends upon the data that you are searching, and what was changed in any given Unicode version. For example, if you are completely sure that your content is limited to the "Basic Latin" range of Unicode, you can safely ignore this.
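If you want to verify the "Basic Latin only" condition rather than assume it, a scan like the following checks that every code point falls in U+0000 to U+007F. The class and method names are hypothetical, and scanning a sample of documents cannot prove that the whole corpus qualifies.

```java
public class BasicLatinCheck {

    /** True if every code point lies in the Basic Latin block (U+0000..U+007F). */
    static boolean isBasicLatinOnly(CharSequence text) {
        return text.codePoints().allMatch(cp -> cp <= 0x7F);
    }

    public static void main(String[] args) {
        System.out.println(isBasicLatinOnly("Hello, Lucene 3.0!")); // true
        System.out.println(isBasicLatinOnly("caf\u00E9"));          // false
    }
}
```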

Special Notes: LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION

StandardAnalyzer will return the same results under Java 5 as it did under Java 1.4. This is because its Unicode support is largely independent of the runtime JRE (with the exception of lowercasing). However, no casing changes in Unicode 4.0 affect StandardAnalyzer, so if you are using this Analyzer you are NOT affected.
SimpleAnalyzer, StopAnalyzer, LetterTokenizer, LowerCaseFilter, and LowerCaseTokenizer may return different results, along with many other Analyzers and TokenStreams in Lucene's analysis modules. If you are using one of these components, you may be affected.
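The components listed above delegate their character decisions to the JRE. The sketch below approximates the SimpleAnalyzer chain (LetterTokenizer followed by lowercasing) using only `Character.isLetter` and `Character.toLowerCase` on code points, which makes the JRE dependence explicit. `SimpleAnalyzerSketch` is a hypothetical class, not Lucene's implementation; it ignores details such as maximum token length and offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical approximation of LetterTokenizer + lowercasing (the SimpleAnalyzer chain).
public class SimpleAnalyzerSketch {

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                // Both the letter test and the case mapping come from the JRE's Unicode tables.
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // a non-letter ends the current token
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Hello, World!"));        // [hello, world]
        System.out.println(analyze("a\u02C6b").size());       // 1 on Java 5+
    }
}
```

Running the same input through this sketch on two JREs with different Unicode versions is a cheap way to spot tokens that would change after an upgrade.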