Lucene DocValues does not guarantee the write order of multi-valued fields
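1. Background
At work I was using script (painless) sorting in ES 5.3.2. The business logic was fairly complex and had to read the list of values stored in a field, for example:
sortfield:["value3","value1","value2"]
Previously the script only checked whether a value was present, which is a boolean test.
A new business requirement, however, depends on the order of the values. Testing showed that the order seen inside the script is not the order in which the values were written, so I had to investigate whether the write order and the read order can differ. Example of the script usage:
if(doc.containsKey('sortfield') && doc['sortfield'].values.contains('value1')){...}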
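The difference between the old boolean check and the new order-dependent requirement can be sketched in plain Java. This is only a model and not Elasticsearch or painless code: the class and method names are hypothetical, and the "observed" order is an assumed example of a reordered list, not output captured from ES.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Plain-Java model (hypothetical names); it does not call Elasticsearch or Lucene.
class OrderVersusMembership {

    // True when both lists hold the same distinct values, ignoring position.
    static boolean sameMembers(List<String> a, List<String> b) {
        return new HashSet<>(a).equals(new HashSet<>(b));
    }

    // True only when both lists hold the same values in the same positions.
    static boolean sameOrder(List<String> a, List<String> b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        List<String> written = Arrays.asList("value3", "value1", "value2");
        // Assumption: an example of the values coming back in a different order.
        List<String> observed = Arrays.asList("value1", "value2", "value3");

        // The old boolean check only needs membership, so reordering is invisible to it.
        System.out.println(observed.contains("value1"));
        System.out.println(sameMembers(written, observed));
        // Any logic that relies on position sees the difference.
        System.out.println(sameOrder(written, observed));
    }
}
```

Under this assumption the membership checks succeed while the positional comparison fails, which is why the problem only surfaced once the order started to matter.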
2. ES and Lucene
ES 5.3.2 is built on Lucene 6.4.2, and its doc values are Lucene's own implementation, so the answer has to come from how Lucene implements doc values.
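2.1. Lucene doc values fields
The test here uses two types:
NumericDocValuesField: storage for numeric values. Note that this type accepts only one value per document; Lucene's multi-valued numeric type is SortedNumericDocValuesField.
SortedSetDocValuesField: multi-valued storage for other (byte/string) values. This is the class the test code imports; SortedDocValuesField is its single-valued counterpart.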
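As a rough mental model of Lucene's two multi-valued doc values types, the sketch below mimics their documented per-document behavior in plain Java: SortedSetDocValuesField values are de-duplicated and sorted, while SortedNumericDocValuesField values are sorted with duplicates kept. The class and method names are hypothetical and nothing here calls Lucene. It also assumes ASCII strings, for which Java's String ordering matches Lucene's byte ordering.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Plain-Java model of per-document multi-valued doc values (hypothetical names).
class MultiValuedDocValuesModel {

    // Models SortedSetDocValuesField: duplicates collapse and values come back sorted.
    static List<String> sortedSet(List<String> written) {
        return new ArrayList<>(new TreeSet<>(written));
    }

    // Models SortedNumericDocValuesField: values come back sorted, duplicates are kept.
    static List<Long> sortedNumeric(List<Long> written) {
        List<Long> copy = new ArrayList<>(written);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        Collections.addAll(strings, "value3", "value1", "value2", "value1");
        System.out.println(sortedSet(strings));

        List<Long> numbers = new ArrayList<>();
        Collections.addAll(numbers, 3L, 1L, 2L, 1L);
        System.out.println(sortedNumeric(numbers));
    }
}
```

In both cases the write order is not recoverable from what is stored, which matches the behavior described in the background section.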
2.2. Simulating writes and reads
Maven pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>kite-lucene-learn</artifactId>
        <groupId>kite-lucene-learn</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <artifactId>lucene642</artifactId>
    <dependencies>
        <!-- https://mvnrepository.com/artifact/junit/junit -->
        <!-- junit is declared once; a second declaration of the same artifact triggers a Maven warning -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>compile</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>6.4.2</version>
        </dependency>
    </dependencies>
</project>
Test class:
package docvalues;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedSetDocValuesField;
import org.apache.lucene.index.*;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.junit.Test;

import java.io.IOException;

public class TestDocValues {
    static final String FIELDNAME_VAL = "sortfield";
    static final String FIELDNAME_ID = "id";

    @Test
    public void testDocValuesMultiValuesSort() throws IOException {
        RAMDirectory ram = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        IndexWriter writer = new IndexWriter(ram, config);