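1. The following code is a simple example of creating an index. The lines highlighted in red in the original post, the construction of IndexWriterConfig and of IndexWriter, are the ones we will trace.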
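import java.io.File;
import java.io.FileReader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class IndexTest {
    public static void main(String[] args) {
        try {
            File fileDir = new File("F:\\document");
            IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_43, new StandardAnalyzer(Version.LUCENE_43));
            config.setInfoStream(System.out);    // print indexing diagnostics to stdout
            config.setOpenMode(OpenMode.CREATE); // overwrite any existing index
            IndexWriter writer = new IndexWriter(FSDirectory.open(new File("F:\\index")), config);
            for (File file : fileDir.listFiles()) {
                Document document = new Document();
                document.add(new TextField("content", new FileReader(file)));
                document.add(new StringField("title", file.getName(), Store.YES));
                writer.addDocument(document);
            }
            writer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}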
2. The statement IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_43, new StandardAnalyzer(Version.LUCENE_43)) creates the configuration object through the IndexWriterConfig constructor. Stepping into that constructor:
public IndexWriterConfig(Version matchVersion, Analyzer analyzer) {
    super(analyzer, matchVersion);
}
We find that it calls the constructor of its parent class, LiveIndexWriterConfig. We continue tracing:
LiveIndexWriterConfig(Analyzer analyzer, Version matchVersion) {
    this.analyzer = analyzer;
    this.matchVersion = matchVersion;
    ramBufferSizeMB = IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB;
    maxBufferedDocs = IndexWriterConfig.DEFAULT_MAX_BUFFERED_DOCS;
    maxBufferedDeleteTerms = IndexWriterConfig.DEFAULT_MAX_BUFFERED_DELETE_TERMS;
    readerTermsIndexDivisor = IndexWriterConfig.DEFAULT_READER_TERMS_INDEX_DIVISOR;
    mergedSegmentWarmer = null;
    termIndexInterval = IndexWriterConfig.DEFAULT_TERM_INDEX_INTERVAL; // TODO: this should be private to the codec, not settable here
    delPolicy = new KeepOnlyLastCommitDeletionPolicy();
    commit = null;
    openMode = OpenMode.CREATE_OR_APPEND;
    similarity = IndexSearcher.getDefaultSimilarity();
    mergeScheduler = new ConcurrentMergeScheduler();
    writeLockTimeout = IndexWriterConfig.WRITE_LOCK_TIMEOUT;
    indexingChain = DocumentsWriterPerThread.defaultIndexingChain;
    codec = Codec.getDefault();
    if (codec == null) {
        throw new NullPointerException();
    }
    infoStream = InfoStream.getDefault();
    mergePolicy = new TieredMergePolicy();
    flushPolicy = new FlushByRamOrCountsPolicy();
    readerPooling = IndexWriterConfig.DEFAULT_READER_POOLING;
    indexerThreadPool = new ThreadAffinityDocumentsWriterThreadPool(IndexWriterConfig.DEFAULT_MAX_THREAD_STATES);
    perThreadHardLimitMB = IndexWriterConfig.DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB;
}
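The pattern in this constructor (the parent class assigns every default in one place, the subclass only delegates, and later setter calls such as setOpenMode override individual values) can be sketched in plain Java. This is only an illustration: SketchLiveConfig, SketchWriterConfig and ConfigDefaultsDemo are hypothetical names, not Lucene classes, and only two of the settings are modelled.

```java
// Hypothetical stand-in for a "live" config: NOT Lucene's LiveIndexWriterConfig.
class SketchLiveConfig {
    enum OpenMode { CREATE, APPEND, CREATE_OR_APPEND }

    static final int DEFAULT_MAX_THREAD_STATES = 8;

    protected OpenMode openMode;
    protected final int maxThreadStates;

    // The parent constructor assigns every default in one place.
    SketchLiveConfig() {
        this.openMode = OpenMode.CREATE_OR_APPEND;
        this.maxThreadStates = DEFAULT_MAX_THREAD_STATES;
    }

    OpenMode getOpenMode() { return openMode; }
    int getMaxThreadStates() { return maxThreadStates; }
}

// Hypothetical stand-in for the user-facing config class.
class SketchWriterConfig extends SketchLiveConfig {
    // The subclass constructor does nothing except delegate to the parent.
    SketchWriterConfig() { super(); }

    // Setters return this so that calls can be chained.
    SketchWriterConfig setOpenMode(OpenMode mode) {
        if (mode == null) {
            throw new IllegalArgumentException("openMode must not be null");
        }
        this.openMode = mode;
        return this;
    }
}

class ConfigDefaultsDemo {
    public static void main(String[] args) {
        SketchWriterConfig config = new SketchWriterConfig();
        System.out.println(config.getOpenMode());        // default set by the parent
        config.setOpenMode(SketchLiveConfig.OpenMode.CREATE);
        System.out.println(config.getOpenMode());        // overridden by the caller
        System.out.println(config.getMaxThreadStates());
    }
}
```

In the indexing example, config.setOpenMode(OpenMode.CREATE) plays the role of the override: it replaces the CREATE_OR_APPEND default assigned in the parent constructor.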
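In the LiveIndexWriterConfig constructor, the assignments highlighted in red in the original post are the ones we care about.
3. First we look at indexerThreadPool = new ThreadAffinityDocumentsWriterThreadPool(IndexWriterConfig.DEFAULT_MAX_THREAD_STATES), which builds the indexing thread pool. We trace the ThreadAffinityDocumentsWriterThreadPool constructor: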
public ThreadAffinityDocumentsWriterThreadPool(int maxNumPerThreads) {
    super(maxNumPerThreads);
    assert getMaxThreadStates() >= 1;
}
We find that it calls the constructor of its parent class, DocumentsWriterPerThreadPool:
DocumentsWriterPerThreadPool(int maxNumThreadStates) {
    if (maxNumThreadStates < 1) {
        throw new IllegalArgumentException("maxNumThreadStates must be >= 1 but was: " + maxNumThreadStates);
    }
    threadStates = new ThreadState[maxNumThreadStates];
    numThreadStatesActive = 0;
}
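What this constructor does (validate the requested size, allocate a fixed-length array of empty slots, start with zero active slots) can be reproduced with a small stand-alone sketch. SketchThreadStatePool is a hypothetical name, and Object stands in for ThreadState; nothing here is Lucene code.

```java
// Hypothetical sketch of a fixed-size slot pool: NOT Lucene's DocumentsWriterPerThreadPool.
class SketchThreadStatePool {
    // One slot per potential indexing thread; all slots are null until initialized.
    private final Object[] threadStates;
    private int numThreadStatesActive;

    SketchThreadStatePool(int maxNumThreadStates) {
        if (maxNumThreadStates < 1) {
            throw new IllegalArgumentException(
                "maxNumThreadStates must be >= 1 but was: " + maxNumThreadStates);
        }
        threadStates = new Object[maxNumThreadStates];
        numThreadStatesActive = 0;
    }

    int getMaxThreadStates() { return threadStates.length; }

    int getActiveThreadStates() { return numThreadStatesActive; }

    public static void main(String[] args) {
        SketchThreadStatePool pool = new SketchThreadStatePool(8);
        System.out.println(pool.getMaxThreadStates());    // capacity fixed at construction
        System.out.println(pool.getActiveThreadStates()); // nothing active yet
        try {
            new SketchThreadStatePool(0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The capacity is fixed when the pool is constructed; the constructor only allocates the array and does not yet create anything to put in it.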
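At this point a ThreadState array has been created; its default length is 8 (IndexWriterConfig.DEFAULT_MAX_THREAD_STATES). Analysing ThreadState shows that each ThreadState is associated with one DocumentsWriterPerThread, and DocumentsWriterPerThread holds the key part of the indexing chain.
4. Next we analyse how each object in the ThreadState array becomes associated with a DocumentsWriterPerThread. We return to this line of the indexing example: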
IndexWriter writer=new IndexWriter(FSDirectory.open(new File("F:\\index")),config);
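Tracing further into the IndexWriter constructor, we find the following statement, which creates the DocumentsWriter object:
docWriter = new DocumentsWriter(codec, config, directory, this, globalFieldNumberMap, bufferedDeletesStream);
5. We continue by tracing the DocumentsWriter constructor: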
DocumentsWriter(Codec codec, LiveIndexWriterConfig config, Directory directory, IndexWriter writer, FieldNumbers globalFieldNumbers,
        BufferedDeletesStream bufferedDeletesStream) {
    this.codec = codec;
    this.directory = directory;
    this.indexWriter = writer;
    this.infoStream = config.getInfoStream();
    this.similarity = config.getSimilarity();
    this.perThreadPool = config.getIndexerThreadPool();
    this.chain = config.getIndexingChain();
    this.perThreadPool.initialize(this, globalFieldNumbers, config);
    flushPolicy = config.getFlushPolicy();
    assert flushPolicy != null;
    flushPolicy.init(this);
    flushControl = new DocumentsWriterFlushControl(this, config);
}
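The call this.perThreadPool.initialize(this, globalFieldNumbers, config), highlighted in red in the original post, initializes the indexing thread pool. Let us see what the initialization does:
void initialize(DocumentsWriter documentsWriter, FieldNumbers globalFieldMap, LiveIndexWriterConfig config) {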
    this.documentsWriter.set(documentsWriter); // thread pool is bound to DW
    this.globalFieldMap.set(globalFieldMap);
    for (int i = 0; i < threadStates.length; i++) {
        final FieldInfos.Builder infos = new FieldInfos.Builder(globalFieldMap);
        threadStates[i] = new ThreadState(new DocumentsWriterPerThread(documentsWriter.directory, documentsWriter, infos, documentsWriter.chain));
    }
}
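The binding performed by initialize can be sketched as follows: every slot receives its own per-thread writer, while all of those writers point back at the same parent. All class names here (SketchParentWriter, SketchPerThreadWriter, SketchThreadState, SketchBoundPool) are hypothetical and chosen only to mirror the roles in the Lucene code above.

```java
// Hypothetical stand-ins for DocumentsWriter, DocumentsWriterPerThread and ThreadState.
class SketchParentWriter { }

class SketchPerThreadWriter {
    final SketchParentWriter parent;
    SketchPerThreadWriter(SketchParentWriter parent) { this.parent = parent; }
}

class SketchThreadState {
    final SketchPerThreadWriter perThread;
    SketchThreadState(SketchPerThreadWriter perThread) { this.perThread = perThread; }
}

class SketchBoundPool {
    private final SketchThreadState[] threadStates;

    SketchBoundPool(int maxNumThreadStates) {
        threadStates = new SketchThreadState[maxNumThreadStates];
    }

    // Fill every slot with a fresh per-thread writer bound to the same parent.
    void initialize(SketchParentWriter parent) {
        for (int i = 0; i < threadStates.length; i++) {
            threadStates[i] = new SketchThreadState(new SketchPerThreadWriter(parent));
        }
    }

    SketchThreadState get(int i) { return threadStates[i]; }

    int size() { return threadStates.length; }

    public static void main(String[] args) {
        SketchParentWriter parent = new SketchParentWriter();
        SketchBoundPool pool = new SketchBoundPool(8);
        pool.initialize(parent);
        // Distinct per-thread writers, one shared parent.
        System.out.println(pool.get(0).perThread != pool.get(1).perThread);
        System.out.println(pool.get(0).perThread.parent == pool.get(1).perThread.parent);
    }
}
```

Giving each slot a private writer lets indexing threads buffer documents independently, with coordination going through the shared parent.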
As initialize shows, every element of the pool's threadStates array is initialized and bound to its own DocumentsWriterPerThread instance.
6. Now let us look at the DocumentsWriterPerThread constructor:
public DocumentsWriterPerThread(Directory directory, DocumentsWriter parent,
        FieldInfos.Builder fieldInfos, IndexingChain indexingChain) {
    this.directoryOrig = directory;
    this.directory = new TrackingDirectoryWrapper(directory);
    this.parent = parent;
    this.fieldInfos = fieldInfos;
    this.writer = parent.indexWriter;
    this.infoStream = parent.infoStream;
    this.codec = parent.codec;
    this.docState = new DocState(this, infoStream);
    this.docState.similarity = parent.indexWriter.getConfig().getSimilarity();
    bytesUsed = Counter.newCounter();
    byteBlockAllocator = new DirectTrackingAllocator(bytesUsed);
    pendingDeletes = new BufferedDeletes();
    intBlockAllocator = new IntBlockAllocator(bytesUsed);
    initialize();
    consumer = indexingChain.getChain(this);
}
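The call indexingChain.getChain(this) is a factory call: one shared chain object builds a separate consumer for each per-thread writer that asks for one. A minimal sketch of that idea, with hypothetical names (SketchChainFactory, SketchConsumer, SketchDwpt) that are not part of Lucene:

```java
// Hypothetical consumer that remembers which per-thread writer it was built for.
class SketchConsumer {
    final SketchDwpt owner;
    SketchConsumer(SketchDwpt owner) { this.owner = owner; }
}

// Hypothetical factory, playing the role IndexingChain plays in the code above.
abstract class SketchChainFactory {
    abstract SketchConsumer getChain(SketchDwpt owner);
}

// Hypothetical per-thread writer.
class SketchDwpt {
    final SketchConsumer consumer;

    SketchDwpt(SketchChainFactory chain) {
        // Last step of construction: ask the shared factory for a private consumer.
        this.consumer = chain.getChain(this);
    }
}

class ChainFactoryDemo {
    // A single shared factory instance, defined as an anonymous subclass.
    static final SketchChainFactory DEFAULT_CHAIN = new SketchChainFactory() {
        @Override
        SketchConsumer getChain(SketchDwpt owner) {
            return new SketchConsumer(owner);
        }
    };

    public static void main(String[] args) {
        SketchDwpt first = new SketchDwpt(DEFAULT_CHAIN);
        SketchDwpt second = new SketchDwpt(DEFAULT_CHAIN);
        System.out.println(first.consumer != second.consumer); // separate consumers
        System.out.println(first.consumer.owner == first);     // each bound to its owner
    }
}
```

Because the factory is shared but its products are not, the writers never share consumer state with one another.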
The last statement of the DocumentsWriterPerThread constructor gives each DocumentsWriterPerThread its own indexing chain.
7. Finally, let us look at what the indexing chain contains:
static final IndexingChain defaultIndexingChain = new IndexingChain() {
    @Override
    DocConsumer getChain(DocumentsWriterPerThread documentsWriterPerThread) {
        final TermsHashConsumer termVectorsWriter = new TermVectorsConsumer(documentsWriterPerThread);
        final TermsHashConsumer freqProxWriter = new FreqProxTermsWriter();
        final InvertedDocConsumer termsHash = new TermsHash(documentsWriterPerThread, freqProxWriter, true,
                new TermsHash(documentsWriterPerThread, termVectorsWriter, false, null));
        final NormsConsumer normsWriter = new NormsConsumer();
        final DocInverter docInverter = new DocInverter(documentsWriterPerThread.docState, termsHash, normsWriter);
        final StoredFieldsConsumer storedFields = new TwoStoredFieldsConsumers(
                new StoredFieldsProcessor(documentsWriterPerThread),
                new DocValuesProcessor(documentsWriterPerThread.bytesUsed));
        return new DocFieldProcessor(documentsWriterPerThread, docInverter, storedFields);
    }
};
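The nesting that getChain builds can be pictured as a tree by reading the constructor arguments from the outside in. The sketch below only records that nesting as labelled nodes and prints it; ChainNode and ChainTreeDemo are hypothetical helpers, and the labels "(primary)" and "(next)" on the two TermsHash instances are descriptive tags added here, not Lucene terms.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical tree node used only to visualise how the consumers are nested.
class ChainNode {
    final String name;
    final List<ChainNode> children;

    ChainNode(String name, ChainNode... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }

    // Depth-first rendering, two spaces of indentation per level.
    void render(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(name).append('\n');
        for (ChainNode child : children) {
            child.render(out, depth + 1);
        }
    }
}

class ChainTreeDemo {
    // Mirrors the constructor nesting in getChain, innermost objects first.
    static ChainNode buildDefaultChain() {
        ChainNode termVectors = new ChainNode("TermVectorsConsumer");
        ChainNode freqProx = new ChainNode("FreqProxTermsWriter");
        ChainNode nextTermsHash = new ChainNode("TermsHash (next)", termVectors);
        ChainNode termsHash = new ChainNode("TermsHash (primary)", freqProx, nextTermsHash);
        ChainNode norms = new ChainNode("NormsConsumer");
        ChainNode docInverter = new ChainNode("DocInverter", termsHash, norms);
        ChainNode storedFields = new ChainNode("TwoStoredFieldsConsumers",
                new ChainNode("StoredFieldsProcessor"),
                new ChainNode("DocValuesProcessor"));
        return new ChainNode("DocFieldProcessor", docInverter, storedFields);
    }

    static String describe() {
        StringBuilder out = new StringBuilder();
        buildDefaultChain().render(out, 0);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running main prints DocFieldProcessor at the root with two branches: DocInverter (which feeds the TermsHash pair and NormsConsumer) and TwoStoredFieldsConsumers (which feeds StoredFieldsProcessor and DocValuesProcessor).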
For the call flow of the indexing chain, see the figure in the original post.
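8. To sum up: when an IndexWriter is created it is given a thread pool whose default size is 8. The pool holds DocumentsWriterPerThread instances, one per ThreadState slot, and each of them has a default indexing chain (IndexingChain) associated with it.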
From the ITPUB blog. Link: http://blog.itpub.net/28624388/viewspace-767366/. Please credit the source when reposting; otherwise legal action may be pursued.