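The main classes involved:
NumericRangeQuery: the numeric range query class; it contains NumericRangeTermEnum, the term enumerator for numeric terms
NumericUtils: numeric helper routines used at both index time and search time
NumericTokenStream: the class that tokenizes a numeric field value at index time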
NumericField
1. Core functions
1.1 Numeric conversion function intToPrefixCoded
// Key function: encodes a numeric value as a string using prefix coding
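public static int intToPrefixCoded(final int val, final int shift, final char[] buffer)
{
    if (shift>31 || shift<0)
        throw new IllegalArgumentException("Illegal shift value, must be 0..31");
    // Worked bit pattern for val = 1135626 (0x0011540A):
    //   val         0000 0000 0001 0001 0101 0100 0000 1010
    //   0x80000000  1000 0000 0000 0000 0000 0000 0000 0000
    //   XOR result  1000 0000 0001 0001 0101 0100 0000 1010
    // Reading the XOR result as a two's-complement int (subtract one, then invert, to get the magnitude):
    //   minus one   1000 0000 0001 0001 0101 0100 0000 1001
    //   inverted    0111 1111 1110 1110 1010 1011 1111 0110
    int nChars = (31-shift)/7 + 1, len = nChars+1;
    buffer[0] = (char)( shift); // first character records the shift (Lucene's own NumericUtils stores SHIFT_START_INT + shift here)
    int sortableBits = val ^ 0x80000000; // XOR flips the sign bit so that signed order becomes unsigned order
    sortableBits >>>= shift; // logical (unsigned) right shift drops the low-order bits
    System.out.println(sortableBits); // debug output added for this walkthrough
    while (nChars>=1)
    {
        // Store 7 bits per character for good efficiency when UTF-8 encoding.
        // The whole number is right-justified so that lucene can prefix-encode
        // the terms more efficiently.
        buffer[nChars--] = (char)(sortableBits & 0x7f); // mask with 1111111: keep the low seven bits
        sortableBits >>>= 7; // shift right by seven bits
    }
    // The lowest-index characters hold the most significant bits, so string comparison starts at the high-order end
    return len;
}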
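For example, take
int nMinLongitude = 1135626;
int nMaxLongitude = 1135632;
and encode both with intToPrefixCoded (shift 0).
Because the lowest-index characters hold the most significant bits, two numbers that agree in their high-order bits produce strings that share a prefix.
The character code values, read from buffer[1] to buffer[5] (the shift character in buffer[0] is left out), are
8 0 69 40 10 and
8 0 69 40 16
The two strings share the prefix 8 0 69 40. Lucene stores terms with shared-prefix compression, so terms like these are stored compactly.
The two ints therefore map to terms that have a common prefix and whose lexicographic order matches their numeric order.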
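The two code sequences above can be reproduced with a small standalone program. This is a minimal sketch that copies the bit manipulation from the listing (without the debug print); the class name PrefixCodedDemo and the helper codeValues are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;

class PrefixCodedDemo {
    // Same bit manipulation as the intToPrefixCoded listing, debug print removed.
    static int intToPrefixCoded(int val, int shift, char[] buffer) {
        if (shift > 31 || shift < 0)
            throw new IllegalArgumentException("Illegal shift value, must be 0..31");
        int nChars = (31 - shift) / 7 + 1, len = nChars + 1;
        buffer[0] = (char) shift;
        int sortableBits = (val ^ 0x80000000) >>> shift;
        while (nChars >= 1) {
            buffer[nChars--] = (char) (sortableBits & 0x7f); // low seven bits
            sortableBits >>>= 7;
        }
        return len;
    }

    // Hypothetical helper: code values of buffer[1..len-1] at shift 0,
    // i.e. the encoded value without the leading shift character.
    static int[] codeValues(int val) {
        char[] buffer = new char[6];
        int len = intToPrefixCoded(val, 0, buffer);
        int[] codes = new int[len - 1];
        for (int i = 1; i < len; i++) codes[i - 1] = buffer[i];
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(codeValues(1135626))); // [8, 0, 69, 40, 10]
        System.out.println(Arrays.toString(codeValues(1135632))); // [8, 0, 69, 40, 16]
    }
}
```

Only the last character differs, which is the shared-prefix property the text describes.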
1.2 Bitmap marking function
Marking a bit
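public void set(long index)
{
    int wordNum = expandingWordNum(index); // index of the 64-bit word (long) that holds this bit
    int bit = (int)index & 0x3f;           // position of the bit inside that word (0..63)
    long bitmask = 1L << bit;
    bits[wordNum] |= bitmask;              // set the bit
}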
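The word and bit arithmetic in set can be tried out in isolation. The following is a minimal sketch, not Lucene's OpenBitSet: the class TinyBitSet is hypothetical, and its expandingWordNum is an assumed stand-in that simply grows the array so the addressed word exists.

```java
import java.util.Arrays;

class TinyBitSet {
    private long[] bits = new long[1];

    // Assumed stand-in for expandingWordNum: index / 64 selects the word,
    // and the array is grown when that word does not exist yet.
    private int expandingWordNum(long index) {
        int wordNum = (int) (index >> 6);
        if (wordNum >= bits.length)
            bits = Arrays.copyOf(bits, wordNum + 1);
        return wordNum;
    }

    void set(long index) {
        int wordNum = expandingWordNum(index);
        int bit = (int) index & 0x3f; // index % 64
        bits[wordNum] |= 1L << bit;
    }

    boolean get(long index) {
        int wordNum = (int) (index >> 6);
        return wordNum < bits.length && (bits[wordNum] & (1L << (index & 0x3f))) != 0;
    }

    long word(int i) {
        return bits[i];
    }

    public static void main(String[] args) {
        TinyBitSet b = new TinyBitSet();
        b.set(70); // 70 = 1 * 64 + 6, so bit 6 of word 1
        System.out.println(b.get(70));  // true
        System.out.println(b.word(1));  // 64
    }
}
```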
Call stack
IndexSearcher.search(Weight, Filter, Collector) line: 245
ConstantScoreQuery$ConstantWeight.scorer(IndexReader, boolean, boolean) line: 81
ConstantScoreQuery$ConstantScorer.<init>(ConstantScoreQuery, Similarity, IndexReader, Weight) line: 116
MultiTermQueryWrapperFilter.getDocIdSet(IndexReader) line: 171
MultiTermQueryWrapperFilter$2(MultiTermQueryWrapperFilter$TermGenerator).generate(IndexReader, TermEnum) line: 115
MultiTermQueryWrapperFilter$2.handleDoc(int) line: 169
OpenBitSet.set(long) line: 233
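2. Indexing
At index time, intToPrefixCoded converts each numeric value into a string term.
Values that agree in their high-order bits produce terms that share a prefix, and the order of the terms follows the order of the numbers. A query can therefore start scanning at the term for the smallest value; every term read after that encodes a larger value, whether or not it shares the prefix. This is exactly what a range scan needs.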
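The ordering claim can be checked directly, including for negative numbers, where the XOR with 0x80000000 matters. This is a minimal sketch under the assumption that terms are compared as plain Java strings; SortOrderDemo, encode and orderPreserved are hypothetical names, and the encoding follows the intToPrefixCoded listing at shift 0.

```java
class SortOrderDemo {
    // Encodes val at shift 0, following the intToPrefixCoded listing.
    static String encode(int val) {
        char[] buffer = new char[6];
        int sortableBits = val ^ 0x80000000; // flip the sign bit
        buffer[0] = (char) 0;                // shift 0
        for (int i = 5; i >= 1; i--) {
            buffer[i] = (char) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return new String(buffer);
    }

    // True when the encoded strings are strictly ascending
    // for an input array that is strictly ascending numerically.
    static boolean orderPreserved(int[] ascending) {
        for (int i = 1; i < ascending.length; i++)
            if (encode(ascending[i - 1]).compareTo(encode(ascending[i])) >= 0)
                return false;
        return true;
    }

    public static void main(String[] args) {
        int[] values = {Integer.MIN_VALUE, -1135626, -1, 0, 1, 1135626, 1135632, Integer.MAX_VALUE};
        System.out.println(orderPreserved(values)); // true
    }
}
```

Without the sign-bit flip, negative values would sort after positive ones, because their top bit is 1.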
3. Searching
Search
// Build the numeric query
Integer min = Integer.valueOf(nMinLongitude);
Integer max = Integer.valueOf(nMaxLongitude);
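// Create the NumericRangeQuery; a precision step can also be specified (this overload uses the default step)
Query query = NumericRangeQuery.newIntRange(field,min, max,true, true);// the two flags say whether the lower and upper bounds are inclusive
// The query is then rewritten, which creates NumericRangeTermEnum, the term enumerator used for numeric queries
The call stack is as follows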
IndexSearcher(Searcher).createWeight(Query) line: 232
NumericRangeQuery(Query).weight(Searcher) line: 98
IndexSearcher.rewrite(Query) line: 306
NumericRangeQuery(MultiTermQuery).rewrite(IndexReader) line: 382
MultiTermQuery$1(MultiTermQuery$ConstantScoreAutoRewrite).rewrite(IndexReader, MultiTermQuery) line: 227
NumericRangeQuery.getEnum(IndexReader) line: 302
protected FilteredTermEnum getEnum(final IndexReader reader)
{
    // Create the term enumerator
    return new NumericRangeTermEnum(reader);
}
// While the enumerator is constructed, the numeric range is split into several sub-ranges according to the precision step
NumericUtils.splitIntRange(new NumericUtils.IntRangeBuilder()
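// The split bounds are collected in rangeBounds; each sub-range has a lower and an upper bound
// Call stack, continuing from the previous step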
NumericRangeQuery$NumericRangeTermEnum.<init>(NumericRangeQuery, IndexReader) line: 449
NumericUtils.splitIntRange(NumericUtils$IntRangeBuilder, int, int, int) line: 359
NumericUtils.splitRange(Object, int, int, long, long) line: 367
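To make the splitting step concrete, here is a sketch of how a range can be cut into sub-ranges at different shifts. It is modeled on the behavior of NumericUtils.splitRange as I understand it and is not a copy of Lucene's source; the class SplitRangeDemo, the {min, max, shift} array layout and the helper covered are assumptions for illustration. Edges that do not align with the next coarser precision are emitted at the current shift, and the aligned middle part is passed on to the next shift.

```java
import java.util.ArrayList;
import java.util.List;

class SplitRangeDemo {
    // Splits [minBound, maxBound] for 32-bit values; each entry is {min, max, shift}.
    static List<long[]> split(int precisionStep, long minBound, long maxBound) {
        if (precisionStep < 1)
            throw new IllegalArgumentException("precisionStep must be >= 1");
        final int valSize = 32;
        List<long[]> out = new ArrayList<>();
        if (minBound > maxBound) return out;
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            boolean hasLower = (minBound & mask) != 0L;   // lower edge not aligned
            boolean hasUpper = (maxBound & mask) != mask; // upper edge not aligned
            long nextMin = (hasLower ? minBound + diff : minBound) & ~mask;
            long nextMax = (hasUpper ? maxBound - diff : maxBound) & ~mask;
            boolean wrapped = nextMin < minBound || nextMax > maxBound;
            if (shift + precisionStep >= valSize || nextMin > nextMax || wrapped) {
                add(out, minBound, maxBound, shift); // nothing left for a coarser level
                break;
            }
            if (hasLower) add(out, minBound, minBound | mask, shift);
            if (hasUpper) add(out, maxBound & ~mask, maxBound, shift);
            minBound = nextMin;
            maxBound = nextMax;
        }
        return out;
    }

    // At shift s the low s bits are ignored, so the upper bound covers all of them.
    static void add(List<long[]> out, long min, long max, int shift) {
        max |= (1L << shift) - 1L;
        out.add(new long[]{min, max, shift});
    }

    // Number of distinct values covered by the sub-ranges (they do not overlap).
    static long covered(List<long[]> ranges) {
        long total = 0;
        for (long[] r : ranges) total += r[1] - r[0] + 1;
        return total;
    }

    public static void main(String[] args) {
        for (long[] r : split(4, 10, 293))
            System.out.println(r[0] + ".." + r[1] + " shift=" + r[2]);
        // 10..15 shift=0
        // 288..293 shift=0
        // 16..287 shift=4
    }
}
```

For the longitude example (1135626 to 1135632, precision step 4) the range is too narrow to benefit from a coarser level, so this sketch returns a single sub-range at shift 0.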
Iterate over all matching terms and set bits in the bitmap from each term's postings
// The execution call stack is as follows
IndexSearcher(Searcher).search(Query, Collector) line: 130
IndexSearcher.search(Weight, Filter, Collector) line: 245
ConstantScoreQuery$ConstantWeight.scorer(IndexReader, boolean, boolean) line: 81
ConstantScoreQuery$ConstantScorer.<init>(ConstantScoreQuery, Similarity, IndexReader, Weight) line: 116
MultiTermQueryWrapperFilter.getDocIdSet(IndexReader) line: 171
MultiTermQueryWrapperFilter$2(MultiTermQueryWrapperFilter$TermGenerator).generate(IndexReader, TermEnum) line: 100
// Bitmap marking
abstract class TermGenerator // nested class of MultiTermQueryWrapperFilter
{
    public void generate(IndexReader reader, TermEnum enumerator) throws IOException
    {
        final int[] docs = new int[32];
        final int[] freqs = new int[32];
        TermDocs termDocs = reader.termDocs();
        try {
            int termCount = 0;
            do {
                Term term = enumerator.term(); // enumerator is a NumericRangeTermEnum here (as seen in the debugger)
                if (term == null)
                    break;
                termCount++;
                termDocs.seek(term);
                while (true) {
                    // Read the next batch of postings for this term
                    final int count = termDocs.read(docs, freqs);
                    if (count != 0)
                    {
                        for(int i=0;i<count;i++)
                        {
                            handleDoc(docs[i]); // mark the document in the bitmap
                        }
                    } else {
                        break;
                    }
                }
            } while (enumerator.next()); // advance to the next matching term
            query.incTotalNumberOfTerms(termCount); // record how many terms were visited
        } finally {
            termDocs.close();
        }
    }
    abstract public void handleDoc(int doc);
}
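The enumerate-and-mark loop can be imitated without Lucene. The sketch below is a simplification under stated assumptions: a TreeMap stands in for the term dictionary, an int array stands in for each term's postings, java.util.BitSet stands in for OpenBitSet, and only shift-0 terms are used, so it does not model the multi-precision sub-ranges. RangeScanDemo, encode, sampleIndex and search are hypothetical names.

```java
import java.util.BitSet;
import java.util.TreeMap;

class RangeScanDemo {
    // Encodes val at shift 0, following the intToPrefixCoded listing.
    static String encode(int val) {
        char[] buffer = new char[6];
        int sortableBits = val ^ 0x80000000;
        buffer[0] = (char) 0;
        for (int i = 5; i >= 1; i--) {
            buffer[i] = (char) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return new String(buffer);
    }

    // Toy "index": encoded term -> document ids containing that value.
    static TreeMap<String, int[]> sampleIndex() {
        TreeMap<String, int[]> index = new TreeMap<>();
        index.put(encode(1135620), new int[]{0});
        index.put(encode(1135626), new int[]{1, 4});
        index.put(encode(1135630), new int[]{2});
        index.put(encode(1135640), new int[]{3});
        return index;
    }

    // Visits every term between the encoded bounds (both inclusive)
    // and marks each posting in the bitmap, like handleDoc does.
    static BitSet search(TreeMap<String, int[]> index, int min, int max) {
        BitSet hits = new BitSet();
        for (int[] docs : index.subMap(encode(min), true, encode(max), true).values())
            for (int doc : docs)
                hits.set(doc);
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search(sampleIndex(), 1135626, 1135632)); // {1, 2, 4}
    }
}
```

Because term order matches numeric order, the sorted-map range view visits exactly the terms whose values fall inside the query bounds.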