Lucene Source Code Analysis: flush
In the previous chapters we repeatedly ran into the flush operation: index data first accumulates in certain in-memory data structures, and at the appropriate moment the flush function writes it out to files. This chapter analyzes that flush path, starting from DocumentsWriter's doFlush function.
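Before diving in, here is a minimal sketch (not from the Lucene source; the directory path and field name are assumptions for illustration) of what typically reaches this flush path: documents are buffered in memory per thread and flushed once the RAM buffer fills up or a commit forces it.

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class FlushTriggerDemo {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(Paths.get("/tmp/flush-demo"));
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
    config.setRAMBufferSizeMB(16.0); // flush once buffered documents exceed ~16 MB
    try (IndexWriter writer = new IndexWriter(dir, config)) {
      Document doc = new Document();
      doc.add(new TextField("body", "hello lucene", Field.Store.YES));
      writer.addDocument(doc); // buffered in a DocumentsWriterPerThread
      writer.commit();         // forces the flush path analyzed below
    }
  }
}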
DocumentsWriter::doFlush
private boolean doFlush(DocumentsWriterPerThread flushingDWPT) throws IOException, AbortingException {
  boolean hasEvents = false;
  while (flushingDWPT != null) {
    hasEvents = true;
    boolean success = false;
    SegmentFlushTicket ticket = null;
    try {
      try {
        ticket = ticketQueue.addFlushTicket(flushingDWPT);
        final int flushingDocsInRam = flushingDWPT.getNumDocsInRAM();
        boolean dwptSuccess = false;
        try {
          final FlushedSegment newSegment = flushingDWPT.flush();
          ticketQueue.addSegment(ticket, newSegment);
          dwptSuccess = true;
        } finally {
          subtractFlushedNumDocs(flushingDocsInRam);
          if (!flushingDWPT.pendingFilesToDelete().isEmpty()) {
            putEvent(new DeleteNewFilesEvent(flushingDWPT.pendingFilesToDelete()));
            hasEvents = true;
          }
          if (!dwptSuccess) {
            putEvent(new FlushFailedEvent(flushingDWPT.getSegmentInfo()));
            hasEvents = true;
          }
        }
        success = true;
      } finally {
        if (!success && ticket != null) {
          ticketQueue.markTicketFailed(ticket);
        }
      }
      if (ticketQueue.getTicketCount() >= perThreadPool.getActiveThreadStateCount()) {
        putEvent(ForcedPurgeEvent.INSTANCE);
        break;
      }
    } finally {
      flushControl.doAfterFlush(flushingDWPT);
    }
    flushingDWPT = flushControl.nextPendingFlush();
  }
  if (hasEvents) {
    putEvent(MergePendingEvent.INSTANCE);
  }
  final double ramBufferSizeMB = config.getRAMBufferSizeMB();
  if (ramBufferSizeMB != IndexWriterConfig.DISABLE_AUTO_FLUSH &&
      flushControl.getDeleteBytesUsed() > (1024*1024*ramBufferSizeMB/2)) {
    hasEvents = true;
    if (!this.applyAllDeletes(deleteQueue)) {
      putEvent(ApplyDeletesEvent.INSTANCE);
    }
  }
  return hasEvents;
}
The parameter flushingDWPT is a DocumentsWriterPerThread, representing the in-memory indexing state of one document-indexing thread.
ticketQueue is a DocumentsWriterFlushQueue, used to synchronize multiple concurrently flushing threads. Its addFlushTicket function is defined as follows:
DocumentsWriter::doFlush->DocumentsWriterFlushQueue::addFlushTicket
synchronized SegmentFlushTicket addFlushTicket(DocumentsWriterPerThread dwpt) {
  incTickets();
  boolean success = false;
  try {
    final SegmentFlushTicket ticket = new SegmentFlushTicket(dwpt.prepareFlush());
    queue.add(ticket);
    success = true;
    return ticket;
  } finally {
    if (!success) {
      decTickets();
    }
  }
}
addFlushTicket first increments the ticket count via incTickets. prepareFlush applies the deletes already marked against buffered documents before the flush proper begins. The function then creates a SegmentFlushTicket and adds it to the internal queue.
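To see why tickets are needed at all, here is a simplified sketch (my own illustration, not Lucene's actual class): flushes may finish out of order on different threads, but each flush reserves a ticket up front, and completed segments are only published from the head of the queue, so they become visible in flush-start order.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

class TicketQueueSketch<T> {
  static final class Ticket<R> {
    volatile R result;       // filled in when the flush completes
    volatile boolean failed; // set instead if the flush threw
  }

  private final Deque<Ticket<T>> queue = new ArrayDeque<>();

  synchronized Ticket<T> addTicket() { // analogous to addFlushTicket
    Ticket<T> t = new Ticket<>();
    queue.addLast(t);
    return t;
  }

  synchronized void publishCompleted(Consumer<T> publish) {
    // analogous to purging the DocumentsWriterFlushQueue: publish strictly
    // from the head so segment order matches flush-start order
    while (!queue.isEmpty()) {
      Ticket<T> head = queue.peekFirst();
      if (head.result == null && !head.failed) {
        break; // the oldest flush is still running; later results must wait
      }
      queue.removeFirst();
      if (!head.failed) {
        publish.accept(head.result);
      }
    }
  }
}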
Back in DocumentsWriter's doFlush, the function obtains the number of documents currently in memory via getNumDocsInRAM and then hands off to DocumentsWriterPerThread's flush function.
DocumentsWriter::doFlush->DocumentsWriterPerThread::flush
FlushedSegment flush() throws IOException, AbortingException {
  segmentInfo.setMaxDoc(numDocsInRAM);
  final SegmentWriteState flushState = new SegmentWriteState(infoStream, directory, segmentInfo, fieldInfos.finish(), pendingUpdates, new IOContext(new FlushInfo(numDocsInRAM, bytesUsed())));
  final double startMBUsed = bytesUsed() / 1024. / 1024.;
  if (pendingUpdates.docIDs.size() > 0) {
    flushState.liveDocs = codec.liveDocsFormat().newLiveDocs(numDocsInRAM);
    for (int delDocID : pendingUpdates.docIDs) {
      flushState.liveDocs.clear(delDocID);
    }
    flushState.delCountOnFlush = pendingUpdates.docIDs.size();
    pendingUpdates.bytesUsed.addAndGet(-pendingUpdates.docIDs.size() * BufferedUpdates.BYTES_PER_DEL_DOCID);
    pendingUpdates.docIDs.clear();
  }
  if (aborted) {
    return null;
  }
  long t0 = System.nanoTime();
  try {
    consumer.flush(flushState);
    pendingUpdates.terms.clear();
    segmentInfo.setFiles(new HashSet<>(directory.getCreatedFiles()));
    final SegmentCommitInfo segmentInfoPerCommit = new SegmentCommitInfo(segmentInfo, 0, -1L, -1L, -1L);
    final BufferedUpdates segmentDeletes;
    if (pendingUpdates.queries.isEmpty() && pendingUpdates.numericUpdates.isEmpty() && pendingUpdates.binaryUpdates.isEmpty()) {
      pendingUpdates.clear();
      segmentDeletes = null;
    } else {
      segmentDeletes = pendingUpdates;
    }
    FlushedSegment fs = new FlushedSegment(segmentInfoPerCommit, flushState.fieldInfos, segmentDeletes, flushState.liveDocs, flushState.delCountOnFlush);
    sealFlushedSegment(fs);
    return fs;
  } catch (Throwable th) {
  }
}
pendingUpdates in flush holds the IDs of documents pending deletion or update. If there are any, they must be marked: codec here is a Lucene60Codec, and tracing down, its liveDocsFormat returns a Lucene50LiveDocsFormat whose newLiveDocs creates a FixedBitSet used to mark the deleted-or-updated document IDs. Further down, consumer was set to a DefaultIndexingChain in the constructor, so we now focus on DefaultIndexingChain's flush function.
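A minimal sketch of the liveDocs bookkeeping just described (the document count and deleted IDs are made up): newLiveDocs hands back a bitset with every document marked live, and flush clears one bit per buffered delete.

import org.apache.lucene.util.FixedBitSet;

public class LiveDocsSketch {
  public static void main(String[] args) {
    FixedBitSet liveDocs = new FixedBitSet(8); // assume numDocsInRAM == 8
    liveDocs.set(0, 8);  // all 8 documents start out live
    liveDocs.clear(3);   // document 3 was deleted in this segment
    liveDocs.clear(5);   // so was document 5
    int delCount = 8 - liveDocs.cardinality(); // 2, i.e. delCountOnFlush
    System.out.println(delCount);
  }
}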
DefaultIndexingChain's flush function
The code of DefaultIndexingChain's flush function is shown below.
DocumentsWriter::doFlush->DocumentsWriterPerThread::flush->DefaultIndexingChain::flush
public void flush(SegmentWriteState state) throws IOException, AbortingException {
  int maxDoc = state.segmentInfo.maxDoc();
  long t0 = System.nanoTime();
  writeNorms(state);
  t0 = System.nanoTime();
  writeDocValues(state);
  t0 = System.nanoTime();
  writePoints(state);
  t0 = System.nanoTime();
  initStoredFieldsWriter();
  fillStoredFields(maxDoc);
  storedFieldsWriter.finish(state.fieldInfos, maxDoc);
  storedFieldsWriter.close();
  t0 = System.nanoTime();
  Map<String,TermsHashPerField> fieldsToFlush = new HashMap<>();
  for (int i = 0; i < fieldHash.length; i++) {
    PerField perField = fieldHash[i];
    while (perField != null) {
      if (perField.invertState != null) {
        fieldsToFlush.put(perField.fieldInfo.name, perField.termsHashPerField);
      }
      perField = perField.next;
    }
  }
  termsHash.flush(fieldsToFlush, state);
  t0 = System.nanoTime();
  docWriter.codec.fieldInfosFormat().write(state.directory, state.segmentInfo, "", state.fieldInfos, IOContext.DEFAULT);
}
segmentInfo inside the state parameter is the SegmentInfo created in the DocumentsWriterPerThread constructor and holds the segment's metadata; maxDoc returns the number of documents currently held in memory. DefaultIndexingChain's flush then writes the norm data into the .nvm and .nvd files via writeNorms.
DefaultIndexingChain's writeNorms function
DefaultIndexingChain::flush->DefaultIndexingChain::writeNorms
private void writeNorms(SegmentWriteState state) throws IOException {
  boolean success = false;
  NormsConsumer normsConsumer = null;
  try {
    if (state.fieldInfos.hasNorms()) {
      NormsFormat normsFormat = state.segmentInfo.getCodec().normsFormat();
      normsConsumer = normsFormat.normsConsumer(state);
      for (FieldInfo fi : state.fieldInfos) {
        PerField perField = getPerField(fi.name);
        assert perField != null;
        if (fi.omitsNorms() == false && fi.getIndexOptions() != IndexOptions.NONE) {
          perField.norms.finish(state.segmentInfo.maxDoc());
          perField.norms.flush(state, normsConsumer);
        }
      }
    }
    success = true;
  } finally {
  }
}
Lucene60Codec's normsFormat function ultimately returns a Lucene53NormsFormat, whose normsConsumer returns a Lucene53NormsConsumer.
DefaultIndexingChain::flush->writeNorms->Lucene53NormsFormat::normsConsumer
public NormsConsumer normsConsumer(SegmentWriteState state) throws IOException {
  return new Lucene53NormsConsumer(state, DATA_CODEC, DATA_EXTENSION, METADATA_CODEC, METADATA_EXTENSION);
}

Lucene53NormsConsumer(SegmentWriteState state, String dataCodec, String dataExtension, String metaCodec, String metaExtension) throws IOException {
  boolean success = false;
  try {
    String dataName = IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, dataExtension);
    data = state.directory.createOutput(dataName, state.context);
    CodecUtil.writeIndexHeader(data, dataCodec, VERSION_CURRENT, state.segmentInfo.getId(), state.segmentSuffix);
    String metaName = IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, metaExtension);
    meta = state.directory.createOutput(metaName, state.context);
    CodecUtil.writeIndexHeader(meta, metaCodec, VERSION_CURRENT, state.segmentInfo.getId(), state.segmentSuffix);
    maxDoc = state.segmentInfo.maxDoc();
    success = true;
  } finally {
  }
}
IndexFileNames' segmentFileName builds a file name such as _0.nvd from the segment name passed in (e.g. _0) and the extension (e.g. nvd). An output stream is then created through FSDirectory's createOutput, shown below.
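For example (a hedged illustration; the segment name _0 is assumed):

import org.apache.lucene.index.IndexFileNames;

public class SegmentFileNameDemo {
  public static void main(String[] args) {
    System.out.println(IndexFileNames.segmentFileName("_0", "", "nvd"));
    // prints "_0.nvd"
    System.out.println(IndexFileNames.segmentFileName("_0", "Lucene54_0", "dvd"));
    // prints "_0_Lucene54_0.dvd" -- per-field formats use such segment suffixes
  }
}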
DefaultIndexingChain::flush->writeNorms->Lucene53NormsFormat::normsConsumer->TrackingDirectoryWrapper::createOutput
public IndexOutput createOutput(String name, IOContext context) throws IOException {
  IndexOutput output = in.createOutput(name, context);
  createdFileNames.add(name);
  return output;
}

public IndexOutput createOutput(String name, IOContext context) throws IOException {
  ensureOpen();
  pendingDeletes.remove(name);
  maybeDeletePendingFiles();
  return new FSIndexOutput(name);
}

public FSIndexOutput(String name) throws IOException {
  this(name, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
}
createOutput creates an FSIndexOutput for the given file name and returns it.
Back in the Lucene53NormsConsumer constructor, writeIndexHeader then writes the header information into each file.
DefaultIndexingChain::flush->writeNorms->Lucene53NormsFormat::normsConsumer->CodecUtil::writeIndexHeader
public static void writeIndexHeader(DataOutput out, String codec, int version, byte[] id, String suffix) throws IOException {
  writeHeader(out, codec, version);
  out.writeBytes(id, 0, id.length);
  BytesRef suffixBytes = new BytesRef(suffix);
  out.writeByte((byte) suffixBytes.length);
  out.writeBytes(suffixBytes.bytes, suffixBytes.offset, suffixBytes.length);
}

public static void writeHeader(DataOutput out, String codec, int version) throws IOException {
  BytesRef bytes = new BytesRef(codec);
  out.writeInt(CODEC_MAGIC);
  out.writeString(codec);
  out.writeInt(version);
}
Back in DefaultIndexingChain's writeNorms, getPerField retrieves the PerField whose FieldInfo holds the field's metadata:
DefaultIndexingChain::flush->writeNorms->getPerField
private PerField getPerField(String name) {
  final int hashPos = name.hashCode() & hashMask;
  PerField fp = fieldHash[hashPos];
  while (fp != null && !fp.fieldInfo.name.equals(name)) {
    fp = fp.next;
  }
  return fp;
}
Continuing on: PerField's norms member was created as a NormValuesWriter in its constructor. Its finish is a no-op, and its flush looks like this:
DefaultIndexingChain::flush->writeNorms->NormValuesWriter::flush
public void flush(SegmentWriteState state, NormsConsumer normsConsumer) throws IOException {
  final int maxDoc = state.segmentInfo.maxDoc();
  final PackedLongValues values = pending.build();
  normsConsumer.addNormsField(fieldInfo,
      new Iterable<Number>() {
        @Override
        public Iterator<Number> iterator() {
          return new NumericIterator(maxDoc, values);
        }
      });
}
normsConsumer here is the Lucene53NormsConsumer, whose addNormsField is shown below.
DefaultIndexingChain::flush->writeNorms->NormValuesWriter::flush->Lucene53NormsConsumer::addNormsField
public void addNormsField(FieldInfo field, Iterable<Number> values) throws IOException {
  meta.writeVInt(field.number);
  long minValue = Long.MAX_VALUE;
  long maxValue = Long.MIN_VALUE;
  int count = 0;
  for (Number nv : values) {
    final long v = nv.longValue();
    minValue = Math.min(minValue, v);
    maxValue = Math.max(maxValue, v);
    count++;
  }
  if (minValue == maxValue) {
    addConstant(minValue);
  } else if (minValue >= Byte.MIN_VALUE && maxValue <= Byte.MAX_VALUE) {
    addByte1(values);
  } else if (minValue >= Short.MIN_VALUE && maxValue <= Short.MAX_VALUE) {
    addByte2(values);
  } else if (minValue >= Integer.MIN_VALUE && maxValue <= Integer.MAX_VALUE) {
    addByte4(values);
  } else {
    addByte8(values);
  }
}
Going further down, this function writes the per-field norm values through the FSIndexOutput streams created earlier into the .nvd and .nvm files.
DefaultIndexingChain's writeDocValues function
Having covered writeNorms, we now turn to writeDocValues.
DefaultIndexingChain::flush->writeDocValues
private void writeDocValues(SegmentWriteState state) throws IOException {
  int maxDoc = state.segmentInfo.maxDoc();
  DocValuesConsumer dvConsumer = null;
  boolean success = false;
  try {
    for (int i = 0; i < fieldHash.length; i++) {
      PerField perField = fieldHash[i];
      while (perField != null) {
        if (perField.docValuesWriter != null) {
          if (dvConsumer == null) {
            DocValuesFormat fmt = state.segmentInfo.getCodec().docValuesFormat();
            dvConsumer = fmt.fieldsConsumer(state);
          }
          perField.docValuesWriter.finish(maxDoc);
          perField.docValuesWriter.flush(state, dvConsumer);
          perField.docValuesWriter = null;
        }
        perField = perField.next;
      }
    }
    success = true;
  } finally {
  }
}
Much like writeNorms above, writeDocValues walks every PerField. A PerField's docValuesWriter is assigned earlier, during indexing, according to the field's doc-values type, as one of NumericDocValuesWriter, BinaryDocValuesWriter, SortedDocValuesWriter, SortedNumericDocValuesWriter, or SortedSetDocValuesWriter:
DefaultIndexingChain::indexDocValue
private void indexDocValue(PerField fp, DocValuesType dvType, IndexableField field) throws IOException {
  if (fp.fieldInfo.getDocValuesType() == DocValuesType.NONE) {
    fieldInfos.globalFieldNumbers.setDocValuesType(fp.fieldInfo.number, fp.fieldInfo.name, dvType);
  }
  fp.fieldInfo.setDocValuesType(dvType);
  int docID = docState.docID;
  switch (dvType) {
    case NUMERIC:
      if (fp.docValuesWriter == null) {
        fp.docValuesWriter = new NumericDocValuesWriter(fp.fieldInfo, bytesUsed);
      }
      ((NumericDocValuesWriter) fp.docValuesWriter).addValue(docID, field.numericValue().longValue());
      break;
    case BINARY:
      if (fp.docValuesWriter == null) {
        fp.docValuesWriter = new BinaryDocValuesWriter(fp.fieldInfo, bytesUsed);
      }
      ((BinaryDocValuesWriter) fp.docValuesWriter).addValue(docID, field.binaryValue());
      break;
    case SORTED:
      if (fp.docValuesWriter == null) {
        fp.docValuesWriter = new SortedDocValuesWriter(fp.fieldInfo, bytesUsed);
      }
      ((SortedDocValuesWriter) fp.docValuesWriter).addValue(docID, field.binaryValue());
      break;
    case SORTED_NUMERIC:
      if (fp.docValuesWriter == null) {
        fp.docValuesWriter = new SortedNumericDocValuesWriter(fp.fieldInfo, bytesUsed);
      }
      ((SortedNumericDocValuesWriter) fp.docValuesWriter).addValue(docID, field.numericValue().longValue());
      break;
    case SORTED_SET:
      if (fp.docValuesWriter == null) {
        fp.docValuesWriter = new SortedSetDocValuesWriter(fp.fieldInfo, bytesUsed);
      }
      ((SortedSetDocValuesWriter) fp.docValuesWriter).addValue(docID, field.binaryValue());
      break;
    default:
      throw new AssertionError();
  }
}
To keep the analysis concrete, assume from here on that the PerField's docValuesWriter is a BinaryDocValuesWriter.
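For reference, here is how each branch above is reached from the indexing side (a hedged example; the field names are invented):

import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.document.SortedSetDocValuesField;
import org.apache.lucene.util.BytesRef;

public class DocValuesFieldsDemo {
  public static Document build() {
    Document doc = new Document();
    doc.add(new NumericDocValuesField("price", 42L));                  // NUMERIC
    doc.add(new BinaryDocValuesField("payload", new BytesRef("raw"))); // BINARY
    doc.add(new SortedDocValuesField("brand", new BytesRef("acme")));  // SORTED
    doc.add(new SortedNumericDocValuesField("sizes", 7L));             // SORTED_NUMERIC
    doc.add(new SortedSetDocValuesField("tags", new BytesRef("new"))); // SORTED_SET
    return doc;
  }
}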
Back in writeDocValues, docValuesFormat returns a PerFieldDocValuesFormat, and its fieldsConsumer yields a DocValuesConsumer.
DefaultIndexingChain::flush->writeDocValues->PerFieldDocValuesFormat::fieldsConsumer
public final DocValuesConsumer fieldsConsumer(SegmentWriteState state) throws IOException {
  return new FieldsWriter(state);
}
So fieldsConsumer actually returns a FieldsWriter.
Back in DefaultIndexingChain's writeDocValues, the next call goes to the docValuesWriter's flush, i.e. BinaryDocValuesWriter's flush under our assumption.
DefaultIndexingChain::flush->writeDocValues->BinaryDocValuesWriter::flush
public void flush(SegmentWriteState state, DocValuesConsumer dvConsumer) throws IOException {
  final int maxDoc = state.segmentInfo.maxDoc();
  bytes.freeze(false);
  final PackedLongValues lengths = this.lengths.build();
  dvConsumer.addBinaryField(fieldInfo,
      new Iterable<BytesRef>() {
        @Override
        public Iterator<BytesRef> iterator() {
          return new BytesIterator(maxDoc, lengths);
        }
      });
}
BinaryDocValuesWriter's flush mainly calls FieldsWriter's addBinaryField to hand over the buffered binary values.
DefaultIndexingChain::flush->writeDocValues->BinaryDocValuesWriter::flush->FieldsWriter::addBinaryField
public void addBinaryField(FieldInfo field, Iterable<BytesRef> values) throws IOException {
  getInstance(field).addBinaryField(field, values);
}
addBinaryField first obtains a Lucene54DocValuesConsumer through getInstance.
DefaultIndexingChain::flush->writeDocValues->BinaryDocValuesWriter::flush->FieldsWriter::addBinaryField->getInstance
private DocValuesConsumer getInstance(FieldInfo field) throws IOException {
  DocValuesFormat format = null;
  if (field.getDocValuesGen() != -1) {
    final String formatName = field.getAttribute(PER_FIELD_FORMAT_KEY);
    if (formatName != null) {
      format = DocValuesFormat.forName(formatName);
    }
  }
  if (format == null) {
    format = getDocValuesFormatForField(field.name);
  }
  final String formatName = format.getName();
  String previousValue = field.putAttribute(PER_FIELD_FORMAT_KEY, formatName);
  Integer suffix = null;
  ConsumerAndSuffix consumer = formats.get(format);
  if (consumer == null) {
    if (field.getDocValuesGen() != -1) {
      final String suffixAtt = field.getAttribute(PER_FIELD_SUFFIX_KEY);
      if (suffixAtt != null) {
        suffix = Integer.valueOf(suffixAtt);
      }
    }
    if (suffix == null) {
      suffix = suffixes.get(formatName);
      if (suffix == null) {
        suffix = 0;
      } else {
        suffix = suffix + 1;
      }
    }
    suffixes.put(formatName, suffix);
    final String segmentSuffix = getFullSegmentSuffix(segmentWriteState.segmentSuffix,
        getSuffix(formatName, Integer.toString(suffix)));
    consumer = new ConsumerAndSuffix();
    consumer.consumer = format.fieldsConsumer(new SegmentWriteState(segmentWriteState, segmentSuffix));
    consumer.suffix = suffix;
    formats.put(format, consumer);
  } else {
    suffix = consumer.suffix;
  }
  previousValue = field.putAttribute(PER_FIELD_SUFFIX_KEY, Integer.toString(suffix));
  return consumer.consumer;
}
Assuming this is the first call, format is set to a Lucene54DocValuesFormat by getDocValuesFormatForField, suffix 0 is recorded for the format name, and Lucene54DocValuesFormat's fieldsConsumer constructs and returns a Lucene54DocValuesConsumer; the computed segment suffix (e.g. Lucene54_0) later appears in the doc-values file names.
DefaultIndexingChain::flush->writeDocValues->BinaryDocValuesWriter::flush->FieldsWriter::addBinaryField->getInstance->Lucene54DocValuesFormat::fieldsConsumer
public DocValuesConsumer fieldsConsumer(SegmentWriteState state) throws IOException {
  return new Lucene54DocValuesConsumer(state, DATA_CODEC, DATA_EXTENSION, META_CODEC, META_EXTENSION);
}

public Lucene54DocValuesConsumer(SegmentWriteState state, String dataCodec, String dataExtension, String metaCodec, String metaExtension) throws IOException {
  boolean success = false;
  try {
    String dataName = IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, dataExtension);
    data = state.directory.createOutput(dataName, state.context);
    CodecUtil.writeIndexHeader(data, dataCodec, Lucene54DocValuesFormat.VERSION_CURRENT, state.segmentInfo.getId(), state.segmentSuffix);
    String metaName = IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, metaExtension);
    meta = state.directory.createOutput(metaName, state.context);
    CodecUtil.writeIndexHeader(meta, metaCodec, Lucene54DocValuesFormat.VERSION_CURRENT, state.segmentInfo.getId(), state.segmentSuffix);
    maxDoc = state.segmentInfo.maxDoc();
    success = true;
  } finally {
  }
}
As in the earlier analysis, this constructor creates the .dvd and .dvm file output streams and writes their headers.
We will not follow Lucene54DocValuesConsumer's addBinaryField further down; it simply writes the values through these output streams.
DefaultIndexingChain's writePoints function
DefaultIndexingChain::flush->writePoints
private void writePoints(SegmentWriteState state) throws IOException {
  PointsWriter pointsWriter = null;
  boolean success = false;
  try {
    for (int i = 0; i < fieldHash.length; i++) {
      PerField perField = fieldHash[i];
      while (perField != null) {
        if (perField.pointValuesWriter != null) {
          if (pointsWriter == null) {
            PointsFormat fmt = state.segmentInfo.getCodec().pointsFormat();
            pointsWriter = fmt.fieldsWriter(state);
          }
          perField.pointValuesWriter.flush(state, pointsWriter);
          perField.pointValuesWriter = null;
        }
        perField = perField.next;
      }
    }
    if (pointsWriter != null) {
      pointsWriter.finish();
    }
    success = true;
  } finally {
  }
}
As before, pointsFormat in writePoints ultimately returns a Lucene60PointsFormat, whose fieldsWriter yields a Lucene60PointsWriter.
DefaultIndexingChain::writePoints->Lucene60PointsFormat::fieldsWriter
public PointsWriter fieldsWriter(SegmentWriteState state) throws IOException {
  return new Lucene60PointsWriter(state);
}
The PerField member pointValuesWriter is a PointValuesWriter; its flush is shown below.
DefaultIndexingChain::writePoints->PointValuesWriter::flush
public void flush(SegmentWriteState state, PointsWriter writer) throws IOException {
  writer.writeField(fieldInfo,
      new PointsReader() {
        @Override
        public void intersect(String fieldName, IntersectVisitor visitor) throws IOException {
          if (fieldName.equals(fieldInfo.name) == false) {
            throw new IllegalArgumentException();
          }
          for (int i = 0; i < numPoints; i++) {
            bytes.readBytes(packedValue.length * i, packedValue, 0, packedValue.length);
            visitor.visit(docIDs[i], packedValue);
          }
        }

        @Override
        public void checkIntegrity() {
          throw new UnsupportedOperationException();
        }

        @Override
        public long ramBytesUsed() {
          return 0L;
        }

        @Override
        public void close() {
        }

        @Override
        public byte[] getMinPackedValue(String fieldName) {
          throw new UnsupportedOperationException();
        }

        @Override
        public byte[] getMaxPackedValue(String fieldName) {
          throw new UnsupportedOperationException();
        }

        @Override
        public int getNumDimensions(String fieldName) {
          throw new UnsupportedOperationException();
        }

        @Override
        public int getBytesPerDimension(String fieldName) {
          throw new UnsupportedOperationException();
        }

        @Override
        public long size(String fieldName) {
          return numPoints;
        }

        @Override
        public int getDocCount(String fieldName) {
          return numDocs;
        }
      });
}
PointValuesWriter's flush in turn calls Lucene60PointsWriter's writeField, shown below.
DefaultIndexingChain::writePoints->PointValuesWriter::flush->Lucene60PointsWriter::writeField
public void writeField(FieldInfo fieldInfo, PointsReader values) throws IOException {
  boolean singleValuePerDoc = values.size(fieldInfo.name) == values.getDocCount(fieldInfo.name);
  try (BKDWriter writer = new BKDWriter(writeState.segmentInfo.maxDoc(),
                                        writeState.directory,
                                        writeState.segmentInfo.name,
                                        fieldInfo.getPointDimensionCount(),
                                        fieldInfo.getPointNumBytes(),
                                        maxPointsInLeafNode,
                                        maxMBSortInHeap,
                                        values.size(fieldInfo.name),
                                        singleValuePerDoc)) {
    values.intersect(fieldInfo.name, new IntersectVisitor() {
      @Override
      public void visit(int docID) {
        throw new IllegalStateException();
      }

      public void visit(int docID, byte[] packedValue) throws IOException {
        writer.add(packedValue, docID);
      }

      @Override
      public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
        return Relation.CELL_CROSSES_QUERY;
      }
    });
    if (writer.getPointCount() > 0) {
      indexFPs.put(fieldInfo.name, writer.finish(dataOut));
    }
  }
}
Given the PointsReader defined in PointValuesWriter's flush and the visit callback defined here in writeField, writeField ultimately funnels every point into BKDWriter's add. A BKD tree is a disk-oriented variant of the k-d tree used to index multi-dimensional point values; add is defined as follows.
public void add(byte[] packedValue, int docID) throws IOException {
  if (pointCount >= maxPointsSortInHeap) {
    if (offlinePointWriter == null) {
      spillToOffline();
    }
    offlinePointWriter.append(packedValue, pointCount, docID);
  } else {
    heapPointWriter.append(packedValue, pointCount, docID);
  }
  if (pointCount == 0) {
    System.arraycopy(packedValue, 0, minPackedValue, 0, packedBytesLength);
    System.arraycopy(packedValue, 0, maxPackedValue, 0, packedBytesLength);
  } else {
    for (int dim = 0; dim < numDims; dim++) {
      int offset = dim * bytesPerDim;
      if (StringHelper.compare(bytesPerDim, packedValue, offset, minPackedValue, offset) < 0) {
        System.arraycopy(packedValue, offset, minPackedValue, offset, bytesPerDim);
      }
      if (StringHelper.compare(bytesPerDim, packedValue, offset, maxPackedValue, offset) > 0) {
        System.arraycopy(packedValue, offset, maxPackedValue, offset, bytesPerDim);
      }
    }
  }
  pointCount++;
  docsSeen.set(docID);
}
The member heapPointWriter is a HeapPointWriter, which buffers points in memory; offlinePointWriter is an OfflinePointWriter, which writes points to disk. Points initially go to memory through HeapPointWriter; once their number exceeds maxPointsSortInHeap, spillToOffline switches over to disk.
private void spillToOffline() throws IOException {
  offlinePointWriter = new OfflinePointWriter(tempDir, tempFileNamePrefix, packedBytesLength, longOrds, "spill", 0, singleValuePerDoc);
  tempInput = offlinePointWriter.out;
  PointReader reader = heapPointWriter.getReader(0, pointCount);
  for (int i = 0; i < pointCount; i++) {
    boolean hasNext = reader.next();
    offlinePointWriter.append(reader.packedValue(), i, heapPointWriter.docIDs[i]);
  }
  heapPointWriter = null;
}
The OfflinePointWriter constructor opens an output stream for a temporary file named along the lines of "<segment name>bkd_spill<temp file count>.tmp", and append then copies the points accumulated in the HeapPointWriter over to it.
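For context, this is where the point values being spilled come from on the indexing side (a hedged, minimal example; the field names are assumed): each point field below becomes one packedValue/docID pair handed to BKDWriter's add during flush.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;

public class PointFieldsDemo {
  public static Document build() {
    Document doc = new Document();
    doc.add(new IntPoint("year", 1997));       // 1 dimension, 4 bytes per dimension
    doc.add(new IntPoint("location", 52, 13)); // 2 dimensions, numDims = 2
    return doc;
  }
}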
Back in DefaultIndexingChain's writePoints, finish finalizes the .dim data file and writes the per-field index into a separate index file, as follows.
public void finish() throws IOException {
  finished = true;
  CodecUtil.writeFooter(dataOut);
  String indexFileName = IndexFileNames.segmentFileName(writeState.segmentInfo.name,
      writeState.segmentSuffix,
      Lucene60PointsFormat.INDEX_EXTENSION);
  try (IndexOutput indexOut = writeState.directory.createOutput(indexFileName, writeState.context)) {
    CodecUtil.writeIndexHeader(indexOut,
        Lucene60PointsFormat.META_CODEC_NAME,
        Lucene60PointsFormat.INDEX_VERSION_CURRENT,
        writeState.segmentInfo.getId(),
        writeState.segmentSuffix);
    int count = indexFPs.size();
    indexOut.writeVInt(count);
    for (Map.Entry<String,Long> ent : indexFPs.entrySet()) {
      FieldInfo fieldInfo = writeState.fieldInfos.fieldInfo(ent.getKey());
      indexOut.writeVInt(fieldInfo.number);
      indexOut.writeVLong(ent.getValue());
    }
    CodecUtil.writeFooter(indexOut);
  }
}
Writing data to the .fdt and .fdx files
Continuing with DefaultIndexingChain's flush: initStoredFieldsWriter initializes a StoredFieldsWriter, as follows.
DefaultIndexingChain::flush->initStoredFieldsWriter
private void initStoredFieldsWriter() throws IOException {
  if (storedFieldsWriter == null) {
    storedFieldsWriter = docWriter.codec.storedFieldsFormat().fieldsWriter(docWriter.directory, docWriter.getSegmentInfo(), IOContext.DEFAULT);
  }
}
storedFieldsFormat returns a Lucene50StoredFieldsFormat, whose fieldsWriter delegates to CompressingStoredFieldsFormat's fieldsWriter and finally returns a CompressingStoredFieldsWriter.
DefaultIndexingChain::flush->initStoredFieldsWriter->Lucene60Codec::storedFieldsFormat->Lucene50StoredFieldsFormat::fieldsWriter->CompressingStoredFieldsFormat::fieldsWriter
public StoredFieldsWriter fieldsWriter(Directory directory, SegmentInfo si,
                                       IOContext context) throws IOException {
  return new CompressingStoredFieldsWriter(directory, si, segmentSuffix, context,
      formatName, compressionMode, chunkSize, maxDocsPerChunk, blockSize);
}
The CompressingStoredFieldsWriter constructor is shown below.
DefaultIndexingChain::flush->initStoredFieldsWriter->Lucene60Codec::storedFieldsFormat->Lucene50StoredFieldsFormat::fieldsWriter->CompressingStoredFieldsFormat::fieldsWriter->CompressingStoredFieldsWriter::CompressingStoredFieldsWriter
public CompressingStoredFieldsWriter(Directory directory, SegmentInfo si, String segmentSuffix, IOContext context, String formatName, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockSize) throws IOException {
  assert directory != null;
  this.segment = si.name;
  this.compressionMode = compressionMode;
  this.compressor = compressionMode.newCompressor();
  this.chunkSize = chunkSize;
  this.maxDocsPerChunk = maxDocsPerChunk;
  this.docBase = 0;
  this.bufferedDocs = new GrowableByteArrayDataOutput(chunkSize);
  this.numStoredFields = new int[16];
  this.endOffsets = new int[16];
  this.numBufferedDocs = 0;
  boolean success = false;
  IndexOutput indexStream = directory.createOutput(IndexFileNames.segmentFileName(segment, segmentSuffix, FIELDS_INDEX_EXTENSION), context);
  try {
    fieldsStream = directory.createOutput(IndexFileNames.segmentFileName(segment, segmentSuffix, FIELDS_EXTENSION), context);
    final String codecNameIdx = formatName + CODEC_SFX_IDX;
    final String codecNameDat = formatName + CODEC_SFX_DAT;
    CodecUtil.writeIndexHeader(indexStream, codecNameIdx, VERSION_CURRENT, si.getId(), segmentSuffix);
    CodecUtil.writeIndexHeader(fieldsStream, codecNameDat, VERSION_CURRENT, si.getId(), segmentSuffix);
    indexWriter = new CompressingStoredFieldsIndexWriter(indexStream, blockSize);
    indexStream = null;
    fieldsStream.writeVInt(chunkSize);
    fieldsStream.writeVInt(PackedInts.VERSION_CURRENT);
    success = true;
  } finally {
  }
}
The CompressingStoredFieldsWriter constructor creates the .fdt and .fdx files and opens output streams on both.
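Only stored fields reach this writer; a hedged example of how such fields are declared during indexing (the field names are invented):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;

public class StoredFieldsDemo {
  public static Document build() {
    Document doc = new Document();
    doc.add(new StoredField("title", "Lucene in Action")); // stored only, ends up in .fdt
    doc.add(new StringField("id", "42", Field.Store.YES)); // indexed and stored
    return doc;
  }
}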
Back in DefaultIndexingChain's flush, fillStoredFields is called next, which in turn invokes startStoredFields and finishStoredFields. startStoredFields calls the CompressingStoredFieldsWriter's startDocument, which is empty; finishStoredFields calls its finishDocument, shown below.
DefaultIndexingChain::flush->fillStoredFields->finishStoredFields->CompressingStoredFieldsWriter::finishDocument
public void finishDocument() throws IOException {
  if (numBufferedDocs == this.numStoredFields.length) {
    final int newLength = ArrayUtil.oversize(numBufferedDocs + 1, 4);
    this.numStoredFields = Arrays.copyOf(this.numStoredFields, newLength);
    endOffsets = Arrays.copyOf(endOffsets, newLength);
  }
  this.numStoredFields[numBufferedDocs] = numStoredFieldsInDoc;
  numStoredFieldsInDoc = 0;
  endOffsets[numBufferedDocs] = bufferedDocs.length;
  ++numBufferedDocs;
  if (triggerFlush()) {
    flush();
  }
}
After some bookkeeping of counters and offsets, finishDocument uses triggerFlush to decide whether a flush is needed; that flush function is defined in CompressingStoredFieldsWriter.
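triggerFlush itself is not quoted above; in this version it boils down to two chunk limits, roughly the following sketch (reconstructed against the fields used in the surrounding class, so treat it as an approximation rather than the verbatim source):

private boolean triggerFlush() {
  return bufferedDocs.length >= chunkSize    // enough buffered bytes for a chunk
      || numBufferedDocs >= maxDocsPerChunk; // or enough buffered documents
}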
DefaultIndexingChain::flush->fillStoredFields->finishStoredFields->CompressingStoredFieldsWriter::finishDocument->flush
private void flush() throws IOException {
  indexWriter.writeIndex(numBufferedDocs, fieldsStream.getFilePointer());
  final int[] lengths = endOffsets;
  for (int i = numBufferedDocs - 1; i > 0; --i) {
    lengths[i] = endOffsets[i] - endOffsets[i - 1];
  }
  final boolean sliced = bufferedDocs.length >= 2 * chunkSize;
  writeHeader(docBase, numBufferedDocs, numStoredFields, lengths, sliced);
  if (sliced) {
    for (int compressed = 0; compressed < bufferedDocs.length; compressed += chunkSize) {
      compressor.compress(bufferedDocs.bytes, compressed, Math.min(chunkSize, bufferedDocs.length - compressed), fieldsStream);
    }
  } else {
    compressor.compress(bufferedDocs.bytes, 0, bufferedDocs.length, fieldsStream);
  }
  docBase += numBufferedDocs;
  numBufferedDocs = 0;
  bufferedDocs.length = 0;
  numChunks++;
}
fieldsStream is the FSIndexOutput created for the .fdt file; its getFilePointer returns the current write position. indexWriter is the CompressingStoredFieldsIndexWriter created in the CompressingStoredFieldsWriter constructor; its writeIndex is shown below.
DefaultIndexingChain::flush->fillStoredFields->finishStoredFields->CompressingStoredFieldsWriter::finishDocument->flush->CompressingStoredFieldsIndexWriter::writeIndex
void writeIndex(int numDocs, long startPointer) throws IOException {
  if (blockChunks == blockSize) {
    writeBlock();
    reset();
  }
  if (firstStartPointer == -1) {
    firstStartPointer = maxStartPointer = startPointer;
  }
  docBaseDeltas[blockChunks] = numDocs;
  startPointerDeltas[blockChunks] = startPointer - maxStartPointer;
  ++blockChunks;
  blockDocs += numDocs;
  totalDocs += numDocs;
  maxStartPointer = startPointer;
}
Once blockChunks reaches blockSize, writeBlock kicks in and ultimately writes the index information into the .fdx file; we will not follow it further.
Back in CompressingStoredFieldsWriter's flush, writeHeader then writes the chunk header into the .fdt file. The compressor further down is an LZ4FastCompressor, whose compress writes the data buffered in bufferedDocs.bytes into fieldsStream, i.e. the output stream of the .fdt file.
Returning to DefaultIndexingChain's flush, we next look at the finish function.
DefaultIndexingChain::flush->CompressingStoredFieldsWriter::finish
public void finish(FieldInfos fis, int numDocs) throws IOException {
  if (numBufferedDocs > 0) {
    flush();
    numDirtyChunks++;
  } else {
  }
  indexWriter.finish(numDocs, fieldsStream.getFilePointer());
  fieldsStream.writeVLong(numChunks);
  fieldsStream.writeVLong(numDirtyChunks);
  CodecUtil.writeFooter(fieldsStream);
}
CompressingStoredFieldsWriter's flush was analyzed above; fieldsStream here is the .fdt output stream, so the remaining buffered data ends up in the .fdt file.
DefaultIndexingChain::flush->CompressingStoredFieldsWriter::finish->CompressingStoredFieldsIndexWriter::finish
void finish(int numDocs, long maxPointer) throws IOException {
  if (blockChunks > 0) {
    writeBlock();
  }
  fieldsIndexOut.writeVInt(0);
  fieldsIndexOut.writeVLong(maxPointer);
  CodecUtil.writeFooter(fieldsIndexOut);
}
The key call here is writeBlock, which likewise ends up writing the index data into the .fdx file.
Writing data to the .tvd and .tvx files
Back in DefaultIndexingChain's flush: the per-field information collected from the fieldHash array into fieldsToFlush is passed to termsHash's flush. termsHash was created as a FreqProxTermsWriter in DefaultIndexingChain's constructor; its flush is shown below.
DefaultIndexingChain::flush->FreqProxTermsWriter::flush
public void flush(Map<String,TermsHashPerField> fieldsToFlush, final SegmentWriteState state) throws IOException {
  super.flush(fieldsToFlush, state);
  List<FreqProxTermsWriterPerField> allFields = new ArrayList<>();
  for (TermsHashPerField f : fieldsToFlush.values()) {
    final FreqProxTermsWriterPerField perField = (FreqProxTermsWriterPerField) f;
    if (perField.bytesHash.size() > 0) {
      perField.sortPostings();
      allFields.add(perField);
    }
  }
  CollectionUtil.introSort(allFields);
  Fields fields = new FreqProxFields(allFields);
  applyDeletes(state, fields);
  FieldsConsumer consumer = state.segmentInfo.getCodec().postingsFormat().fieldsConsumer(state);
  boolean success = false;
  try {
    consumer.write(fields);
    success = true;
  } finally {
  }
}
First, the flush of FreqProxTermsWriter's parent class ends up calling TermVectorsConsumer's flush, defined as follows.
DefaultIndexingChain::flush->TermVectorsConsumer::flush
void flush(Map<String, TermsHashPerField> fieldsToFlush, final SegmentWriteState state) throws IOException {
  if (writer != null) {
    int numDocs = state.segmentInfo.maxDoc();
    try {
      fill(numDocs);
      writer.finish(state.fieldInfos, numDocs);
    } finally {
    }
  }
}
The writer in TermVectorsConsumer's flush is the CompressingTermVectorsWriter created via Lucene50TermVectorsFormat; we mainly look at its finish function.
DefaultIndexingChain::flush->TermVectorsConsumer::flush->CompressingTermVectorsWriter::finish
public void finish(FieldInfos fis, int numDocs) throws IOException {
  if (!pendingDocs.isEmpty()) {
    flush();
    numDirtyChunks++;
  }
  indexWriter.finish(numDocs, vectorsStream.getFilePointer());
  vectorsStream.writeVLong(numChunks);
  vectorsStream.writeVLong(numDirtyChunks);
  CodecUtil.writeFooter(vectorsStream);
}
indexWriter here is again a CompressingStoredFieldsIndexWriter; its finish was analyzed above and ultimately writes index data into the .tvx file. Next comes the flush function.
DefaultIndexingChain::flush->TermVectorsConsumer::flush->CompressingTermVectorsWriter::finish->flush
private void flush() throws IOException {
  final int chunkDocs = pendingDocs.size();
  assert chunkDocs > 0 : chunkDocs;
  indexWriter.writeIndex(chunkDocs, vectorsStream.getFilePointer());
  final int docBase = numDocs - chunkDocs;
  vectorsStream.writeVInt(docBase);
  vectorsStream.writeVInt(chunkDocs);
  final int totalFields = flushNumFields(chunkDocs);
  if (totalFields > 0) {
    final int[] fieldNums = flushFieldNums();
    flushFields(totalFields, fieldNums);
    flushFlags(totalFields, fieldNums);
    flushNumTerms(totalFields);
    flushTermLengths();
    flushTermFreqs();
    flushPositions();
    flushOffsets(fieldNums);
    flushPayloadLengths();
    compressor.compress(termSuffixes.bytes, 0, termSuffixes.length, vectorsStream);
  }
  pendingDocs.clear();
  curDoc = null;
  curField = null;
  termSuffixes.length = 0;
  numChunks++;
}
indexWriter is the CompressingStoredFieldsIndexWriter; writeIndex was analyzed above and ultimately writes its data into the .tvx file.
CompressingTermVectorsWriter's flush then calls flushNumFields to write index information into the .tvd file, as follows.
DefaultIndexingChain::flush->TermVectorsConsumer::flush->CompressingTermVectorsWriter::finish->flush->flushNumFields
private int flushNumFields(int chunkDocs) throws IOException {
  if (chunkDocs == 1) {
    final int numFields = pendingDocs.getFirst().numFields;
    vectorsStream.writeVInt(numFields);
    return numFields;
  } else {
    writer.reset(vectorsStream);
    int totalFields = 0;
    for (DocData dd : pendingDocs) {
      writer.add(dd.numFields);
      totalFields += dd.numFields;
    }
    writer.finish();
    return totalFields;
  }
}
Both writer and vectorsStream wrap the output stream of the .tvd file, so in the end both write their data into that file.
Looking further at CompressingTermVectorsWriter's flush: much like the flushNumFields source just shown, the subsequent flushFieldNums, flushFields, flushFlags, flushNumTerms, flushTermLengths, flushTermFreqs, flushPositions, flushOffsets, and flushPayloadLengths calls, as well as LZ4FastCompressor's compress, all ultimately write term-vector data into the .tvd file; we will not trace them here.
With TermVectorsConsumer's flush covered, we return to FreqProxTermsWriter's flush. The consumer there is a FieldsWriter; its write function follows.
DefaultIndexingChain::flush->FreqProxTermsWriter::flush->FieldsWriter::write
public void write(Fields fields) throws IOException {
  Map<PostingsFormat,FieldsGroup> formatToGroups = new HashMap<>();
  Map<String,Integer> suffixes = new HashMap<>();
  for (String field : fields) {
    FieldInfo fieldInfo = writeState.fieldInfos.fieldInfo(field);
    final PostingsFormat format = getPostingsFormatForField(field);
    String formatName = format.getName();
    FieldsGroup group = formatToGroups.get(format);
    if (group == null) {
      Integer suffix = suffixes.get(formatName);
      if (suffix == null) {
        suffix = 0;
      } else {
        suffix = suffix + 1;
      }
      suffixes.put(formatName, suffix);
      String segmentSuffix = getFullSegmentSuffix(field,
          writeState.segmentSuffix,
          getSuffix(formatName, Integer.toString(suffix)));
      group = new FieldsGroup();
      group.state = new SegmentWriteState(writeState, segmentSuffix);
      group.suffix = suffix;
      formatToGroups.put(format, group);
    } else {
    }
    group.fields.add(field);
    String previousValue = fieldInfo.putAttribute(PER_FIELD_FORMAT_KEY, formatName);
    previousValue = fieldInfo.putAttribute(PER_FIELD_SUFFIX_KEY, Integer.toString(group.suffix));
  }
  boolean success = false;
  try {
    for (Map.Entry<PostingsFormat,FieldsGroup> ent : formatToGroups.entrySet()) {
      PostingsFormat format = ent.getKey();
      final FieldsGroup group = ent.getValue();
      Fields maskedFields = new FilterFields(fields) {
        @Override
        public Iterator<String> iterator() {
          return group.fields.iterator();
        }
      };
      FieldsConsumer consumer = format.fieldsConsumer(group.state);
      toClose.add(consumer);
      consumer.write(maskedFields);
    }
    success = true;
  } finally {
  }
}
A brief walk through FieldsWriter's write: its most important step is delegating to consumer.write for each format group. The format comes from getPostingsFormatForField and is a Lucene50PostingsFormat, whose fieldsConsumer is shown below.
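getPostingsFormatForField is also the standard per-field extension point: a custom codec can route individual fields to a different postings format, and the suffix machinery above keeps their files apart. A minimal sketch (the field name and the Memory format choice are assumptions for illustration):

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene60.Lucene60Codec;

public class PerFieldCodecDemo {
  public static Codec build() {
    return new Lucene60Codec() {
      @Override
      public PostingsFormat getPostingsFormatForField(String field) {
        return "id".equals(field)
            ? PostingsFormat.forName("Memory")        // special-case one field
            : super.getPostingsFormatForField(field); // Lucene50 for the rest
      }
    };
  }
}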
DefaultIndexingChain::flush->FreqProxTermsWriter::flush->FieldsWriter::write->Lucene50PostingsFormat::fieldsConsumer
public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException {
  PostingsWriterBase postingsWriter = new Lucene50PostingsWriter(state);
  boolean success = false;
  try {
    FieldsConsumer ret = new BlockTreeTermsWriter(state,
        postingsWriter,
        minTermBlockSize,
        maxTermBlockSize);
    success = true;
    return ret;
  } finally {
  }
}
fieldsConsumer ultimately returns a BlockTreeTermsWriter, whose write function is then called, defined as follows.
DefaultIndexingChain::flush->FreqProxTermsWriter::flush->FieldsWriter::write->BlockTreeTermsWriter::write
public void write(Fields fields) throws IOException {
  String lastField = null;
  for (String field : fields) {
    lastField = field;
    Terms terms = fields.terms(field);
    FieldInfo fieldInfo = fieldInfos.fieldInfo(field);
    List<PrefixTerm> prefixTerms;
    if (minItemsInAutoPrefix != 0) {
      prefixTerms = new AutoPrefixTermsWriter(terms, minItemsInAutoPrefix, maxItemsInAutoPrefix).prefixes;
    } else {
      prefixTerms = null;
    }
    TermsEnum termsEnum = terms.iterator();
    TermsWriter termsWriter = new TermsWriter(fieldInfos.fieldInfo(field));
    int prefixTermUpto = 0;
    while (true) {
      BytesRef term = termsEnum.next();
      if (prefixTerms != null) {
        while (prefixTermUpto < prefixTerms.size() && (term == null || prefixTerms.get(prefixTermUpto).compareTo(term) <= 0)) {
          PrefixTerm prefixTerm = prefixTerms.get(prefixTermUpto);
          termsWriter.write(prefixTerm.term, getAutoPrefixTermsEnum(terms, prefixTerm), prefixTerm);
          prefixTermUpto++;
        }
      }
      if (term == null) {
        break;
      }
      termsWriter.write(term, termsEnum, null);
    }
    termsWriter.finish();
  }
}
termsWriter inside BlockTreeTermsWriter's write is a TermsWriter.
TermsWriter's write uses Lucene50PostingsWriter to write the postings data into the .doc, .pos, and .pay files.
TermsWriter's finish uses its internal writeBlocks to write the term index into the .tim and .tip files.
Writing data to the .fnm file
Continuing down DefaultIndexingChain's flush: fieldInfosFormat returns a Lucene60FieldInfosFormat.
DefaultIndexingChain::flush->Lucene60FieldInfosFormat::write
public void write(Directory directory, SegmentInfo segmentInfo, String segmentSuffix, FieldInfos infos, IOContext context) throws IOException {
  final String fileName = IndexFileNames.segmentFileName(segmentInfo.name, segmentSuffix, EXTENSION);
  try (IndexOutput output = directory.createOutput(fileName, context)) {
    CodecUtil.writeIndexHeader(output, Lucene60FieldInfosFormat.CODEC_NAME, Lucene60FieldInfosFormat.FORMAT_CURRENT, segmentInfo.getId(), segmentSuffix);
    output.writeVInt(infos.size());
    for (FieldInfo fi : infos) {
      fi.checkConsistency();
      output.writeString(fi.name);
      output.writeVInt(fi.number);
      byte bits = 0x0;
      if (fi.hasVectors()) bits |= STORE_TERMVECTOR;
      if (fi.omitsNorms()) bits |= OMIT_NORMS;
      if (fi.hasPayloads()) bits |= STORE_PAYLOADS;
      output.writeByte(bits);
      output.writeByte(indexOptionsByte(fi.getIndexOptions()));
      output.writeByte(docValuesByte(fi.getDocValuesType()));
      output.writeLong(fi.getDocValuesGen());
      output.writeMapOfStrings(fi.attributes());
      int pointDimensionCount = fi.getPointDimensionCount();
      output.writeVInt(pointDimensionCount);
      if (pointDimensionCount != 0) {
        output.writeVInt(fi.getPointNumBytes());
      }
    }
    CodecUtil.writeFooter(output);
  }
}
Lucene60FieldInfosFormat's write creates the .fnm file and writes the metadata of every field into it.
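At this point every per-segment file discussed so far exists on disk. A quick way to see them (a sketch; the directory path is assumed, and a commit must have happened first):

import java.nio.file.Paths;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ListSegmentFilesDemo {
  public static void main(String[] args) throws Exception {
    try (Directory dir = FSDirectory.open(Paths.get("/tmp/flush-demo"))) {
      for (String file : dir.listAll()) {
        System.out.println(file); // e.g. _0.fnm, _0.fdt, _0.fdx, _0.nvd, _0.nvm, ...
      }
    }
  }
}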
Combining the index files
Returning up to DocumentsWriterPerThread's flush: a FlushedSegment is created, and sealFlushedSegment is then called to combine the flushed segment's files, as follows.
DocumentsWriter::doFlush->DocumentsWriterPerThread::flush->sealFlushedSegment
void sealFlushedSegment(FlushedSegment flushedSegment) throws IOException {
  SegmentCommitInfo newSegment = flushedSegment.segmentInfo;
  IndexWriter.setDiagnostics(newSegment.info, IndexWriter.SOURCE_FLUSH);
  IOContext context = new IOContext(new FlushInfo(newSegment.info.maxDoc(), newSegment.sizeInBytes()));
  boolean success = false;
  try {
    if (indexWriterConfig.getUseCompoundFile()) {
      Set<String> originalFiles = newSegment.info.files();
      indexWriter.createCompoundFile(infoStream, new TrackingDirectoryWrapper(directory), newSegment.info, context);
      filesToDelete.addAll(originalFiles);
      newSegment.info.setUseCompoundFile(true);
    }
    codec.segmentInfoFormat().write(directory, newSegment.info, context);
    if (flushedSegment.liveDocs != null) {
      final int delCount = flushedSegment.delCount;
      SegmentCommitInfo info = flushedSegment.segmentInfo;
      Codec codec = info.info.getCodec();
      codec.liveDocsFormat().writeLiveDocs(flushedSegment.liveDocs, directory, info, delCount, context);
      newSegment.setDelCount(delCount);
      newSegment.advanceDelGen();
    }
    success = true;
  } finally {
  }
}
The indexWriter in sealFlushedSegment is the IndexWriter; its createCompoundFile combines the segment's files in the index directory, defined as follows.
DocumentsWriter::doFlush->DocumentsWriterPerThread::flush->sealFlushedSegment->IndexWriter::createCompoundFile
final void createCompoundFile(InfoStream infoStream, TrackingDirectoryWrapper directory, final SegmentInfo info, IOContext context) throws IOException {
  boolean success = false;
  try {
    info.getCodec().compoundFormat().write(directory, info, context);
    success = true;
  } finally {
  }
  info.setFiles(new HashSet<>(directory.getCreatedFiles()));
}
compoundFormat returns a Lucene50CompoundFormat, whose write function is defined as follows.
DocumentsWriter::doFlush->DocumentsWriterPerThread::flush->sealFlushedSegment->IndexWriter::createCompoundFile->Lucene50CompoundFormat::write
public void write(Directory dir, SegmentInfo si, IOContext context) throws IOException {
  String dataFile = IndexFileNames.segmentFileName(si.name, "", DATA_EXTENSION);
  String entriesFile = IndexFileNames.segmentFileName(si.name, "", ENTRIES_EXTENSION);
  try (IndexOutput data = dir.createOutput(dataFile, context);
       IndexOutput entries = dir.createOutput(entriesFile, context)) {
    CodecUtil.writeIndexHeader(data, DATA_CODEC, VERSION_CURRENT, si.getId(), "");
    CodecUtil.writeIndexHeader(entries, ENTRY_CODEC, VERSION_CURRENT, si.getId(), "");
    entries.writeVInt(si.files().size());
    for (String file : si.files()) {
      long startOffset = data.getFilePointer();
      try (IndexInput in = dir.openInput(file, IOContext.READONCE)) {
        data.copyBytes(in, in.length());
      }
      long endOffset = data.getFilePointer();
      long length = endOffset - startOffset;
      entries.writeString(IndexFileNames.stripSegmentName(file));
      entries.writeLong(startOffset);
      entries.writeLong(length);
    }
    CodecUtil.writeFooter(data);
    CodecUtil.writeFooter(entries);
  }
}
Lucene50CompoundFormat's write creates the .cfs and .cfe files with their output streams: .cfs stores the concatenated contents of every segment file, while .cfe records each file's offset and length within it.
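Whether this compound-file step runs at all is controlled by the writer configuration (a hedged example; compound files are on by default):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;

public class CompoundFileConfigDemo {
  public static IndexWriterConfig build() {
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
    config.setUseCompoundFile(true); // pack each flushed segment into .cfs/.cfe
    return config;
  }
}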
Back in DocumentsWriterPerThread's sealFlushedSegment, segmentInfoFormat then returns a Lucene50SegmentInfoFormat, whose code follows.
DocumentsWriter::doFlush->DocumentsWriterPerThread::flush->sealFlushedSegment->Lucene50SegmentInfoFormat::write
public void write(Directory dir, SegmentInfo si, IOContext ioContext) throws IOException {
  final String fileName = IndexFileNames.segmentFileName(si.name, "", Lucene50SegmentInfoFormat.SI_EXTENSION);
  try (IndexOutput output = dir.createOutput(fileName, ioContext)) {
    si.addFile(fileName);
    CodecUtil.writeIndexHeader(output,
        Lucene50SegmentInfoFormat.CODEC_NAME,
        Lucene50SegmentInfoFormat.VERSION_CURRENT,
        si.getId(),
        "");
    Version version = si.getVersion();
    output.writeInt(version.major);
    output.writeInt(version.minor);
    output.writeInt(version.bugfix);
    assert version.prerelease == 0;
    output.writeInt(si.maxDoc());
    output.writeByte((byte) (si.getUseCompoundFile() ? SegmentInfo.YES : SegmentInfo.NO));
    output.writeMapOfStrings(si.getDiagnostics());
    Set<String> files = si.files();
    for (String file : files) {
      if (!IndexFileNames.parseSegmentName(file).equals(si.name)) {
        throw new IllegalArgumentException();
      }
    }
    output.writeSetOfStrings(files);
    output.writeMapOfStrings(si.getAttributes());
    CodecUtil.writeFooter(output);
  }
}
Lucene50SegmentInfoFormat's write creates the .si file and its output stream, then writes the segment's metadata into it.
Finally, sealFlushedSegment calls writeLiveDocs to create the .liv file and write the live-document bits, as follows.
DocumentsWriter::doFlush->DocumentsWriterPerThread::flush->sealFlushedSegment->Lucene50LiveDocsFormat::writeLiveDocs
public void writeLiveDocs(MutableBits bits, Directory dir, SegmentCommitInfo info, int newDelCount, IOContext context) throws IOException {
  long gen = info.getNextDelGen();
  String name = IndexFileNames.fileNameFromGeneration(info.info.name, EXTENSION, gen);
  FixedBitSet fbs = (FixedBitSet) bits;
  long data[] = fbs.getBits();
  try (IndexOutput output = dir.createOutput(name, context)) {
    CodecUtil.writeIndexHeader(output, CODEC_NAME, VERSION_CURRENT, info.info.getId(), Long.toString(gen, Character.MAX_RADIX));
    for (int i = 0; i < data.length; i++) {
      output.writeLong(data[i]);
    }
    CodecUtil.writeFooter(output);
  }
}
Back in DocumentsWriter's doFlush, events such as DeleteNewFilesEvent, FlushFailedEvent, ForcedPurgeEvent, MergePendingEvent, and ApplyDeletesEvent are generated according to flush's results or exceptions and appended to the event queue; we stop the analysis here.