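The previous article gave an overview of how indexing moves from IndexWriter to DefaultIndexingChain, and also walked through the basic flow of DefaultIndexingChain.
In short:
process each document received by the DWPT (DocumentsWriterPerThread) one by one -> for each document, process its Fields in order -> for each Field, handle it separately depending on whether it is indexed (inverted), whether it is stored, and whether it has doc values.
So different DWPTs work in parallel with one another, while the work inside a single DWPT is done serially.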
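To make the flow above concrete, here is a minimal toy sketch of it in plain Java. It is not Lucene code: `ToyField`, `processSerially` and `indexInParallel` are hypothetical names made up for this illustration, and the "work" is only a log line. It just shows the shape described above, where each worker (standing in for a DWPT) runs on its own thread while documents and fields inside one worker are handled strictly in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IndexingChainSketch {

    // Hypothetical stand-in for a field; this is NOT Lucene's IndexableField.
    static final class ToyField {
        final String name;
        final boolean indexed, stored, docValues;

        ToyField(String name, boolean indexed, boolean stored, boolean docValues) {
            this.name = name;
            this.indexed = indexed;
            this.stored = stored;
            this.docValues = docValues;
        }
    }

    // One toy "DWPT": documents one by one, fields in order, three independent checks per field.
    static List<String> processSerially(String dwptName, List<List<ToyField>> docs) {
        List<String> log = new ArrayList<>();
        int docId = 0;
        for (List<ToyField> doc : docs) {
            for (ToyField f : doc) {
                String prefix = dwptName + " doc" + docId + " ";
                if (f.indexed) log.add(prefix + "invert " + f.name);
                if (f.stored) log.add(prefix + "store " + f.name);
                if (f.docValues) log.add(prefix + "docvalues " + f.name);
            }
            docId++;
        }
        return log;
    }

    // Several toy "DWPTs" running in parallel, each on its own batch of documents.
    static List<List<String>> indexInParallel(List<List<List<ToyField>>> batches) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, batches.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < batches.size(); i++) {
                final int id = i;
                Callable<List<String>> task = () -> processSerially("dwpt" + id, batches.get(id));
                futures.add(pool.submit(task));
            }
            List<List<String>> logs = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                logs.add(f.get()); // collected in submission order
            }
            return logs;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<ToyField>> docs = Arrays.asList(
            Arrays.asList(new ToyField("title", true, true, false)),
            Arrays.asList(new ToyField("price", false, false, true)));
        for (List<String> log : indexInParallel(Arrays.asList(docs, docs))) {
            log.forEach(System.out::println);
        }
    }
}
```

Within one worker the log order is deterministic, because nothing inside it runs concurrently; only the interleaving between workers is left to the thread pool.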
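The actual work for each field consists of writing data into termsHashPerField, and that data is managed centrally by termsHash. You can think of termsHash as a buffer shared within a single DWPT: it is used to build the index in memory and is written to disk when a flush is needed. Each DWPT that is flushed to disk becomes a segment; depending on how the segment size is configured, segment merging may then be needed as well. That is a topic for later, and flush will get its own analysis in a future article. In this article we continue from the previous one and analyze pf.invert(). The code is as follows: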
public void invert(IndexableField field, boolean first) throws IOException, AbortingException {
  if (first) {
    // First time we're seeing this field (indexed) in
    // this document:
    invertState.reset(); // on the first occurrence, the invert state has to be reset
  }
  IndexableFieldType fieldType = field.fieldType();
  IndexOptions indexOptions = fieldType.indexOptions();
  fieldInfo.setIndexOptions(indexOptions);
  if (fieldType.omitNorms()) {
    fieldInfo.setOmitsNorms();
  }
  final boolean analyzed = fieldType.tokenized() && docState.analyzer != null;
  // only bother checking offsets if something will consume them.
  // TODO: after we fix analyzers, also check if termVectorOffsets will be indexed.
  final boolean checkOffsets = indexOptions == IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
  /*
   * To assist people in tracking down problems in analysis components, we wish to write the field name to the
   * infostream when we fail. We expect some caller to eventually deal with the real exception, so we don't want any
   * 'catch' clauses, but rather a finally that takes note of the problem.
   */
  boolean succeededInProcessingField = false;
  try (TokenStream stream = tokenStream = field.tokenStream(docState.analyzer, tokenStream)) { // get the token stream for the field's content
    // reset the TokenStream to the first token
    stream.reset();
    invertState.setAttributeSource(stream);
    termsHashPerField.start(field, first);
    while (stream.incrementToken()) { // process each term produced from the field's content
      // If we hit an exception in stream.next below
      // (which is fairly common, e.g. if analyzer
      // chokes on a given document), then it's
      // non-aborting and (above) this one document
      // will be marked as deleted, but still
      // consume a docID
      int posIncr = invertState.posIncrAttribute.getPositionIncrement();
      invertState.position += posIncr;
      if (invertState.position < invertState.lastPosition) {