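Limit on content length:
The main purpose is to keep the indexing process from running out of memory. As long as enough memory is available, this value can be set to Integer.MAX_VALUE, which covers any document size currently possible.
Reference material: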
Documents are truncated by default
The indexer by default truncates documents to IndexWriter.DEFAULT_MAX_FIELD_LENGTH terms per field, which is 10,000 in Lucene 2.0.
Rule of thumb: an average page of English text contains about 250 words. (Source: Google Answers.) This means only about 40 pages are indexed by default. If any of your documents are longer than this (and you want them indexed in full), you should raise the limit with IndexWriter.setMaxFieldLength().
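The arithmetic behind the 40-page figure can be sketched in plain Java. This is only an illustration of the rule of thumb quoted above, not part of Lucene: the class name FieldLengthEstimate and both helper methods are hypothetical, and the 250 words-per-page figure is the estimate from the text, not a measured value.

```java
class FieldLengthEstimate {
    // Rule-of-thumb figure quoted above: about 250 words per page of English text.
    static final int WORDS_PER_PAGE = 250;

    // Number of whole pages that fit under a given term limit.
    static int pagesCovered(int maxFieldLength) {
        return maxFieldLength / WORDS_PER_PAGE;
    }

    // Term limit needed to cover a given page count, capped at Integer.MAX_VALUE.
    static int termsNeeded(int pages) {
        long terms = (long) pages * WORDS_PER_PAGE;
        return (int) Math.min(terms, (long) Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(pagesCovered(10_000)); // 40
        System.out.println(termsNeeded(400));     // 100000
    }
}
```

A value computed this way is the kind of argument one would pass to IndexWriter.setMaxFieldLength(); real documents vary, so some headroom on top of the estimate is probably wise.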
public void setMaxFieldLength(int maxFieldLength)
The maximum number of terms that will be indexed for a single field in a document. This limits the amount of memory required for indexing, so that collections with very large files will not crash the indexing process by running out of memory. This setting refers to the number of running terms, not to the number of different terms.
Note: this silently truncates large documents, excluding from the index all terms that occur further in the document. If you know your source documents are large, be sure to set this value high enough to accommodate the expected size. If you set it to Integer.MAX_VALUE, then the only limit is your memory, but you should anticipate an OutOfMemoryError.
By default, no more than DEFAULT_MAX_FIELD_LENGTH terms will be indexed for a field.
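To make the "running terms, not different terms" distinction concrete, here is a minimal stand-alone sketch of the truncation behaviour described above. It does not use Lucene and is not how Lucene implements the limit internally; the class TruncationSketch and its truncate method are hypothetical names, and a field is modelled simply as a list of already-tokenized terms.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

class TruncationSketch {
    // Keep only the first maxFieldLength running terms; everything after is dropped silently.
    static List<String> truncate(List<String> terms, int maxFieldLength) {
        int kept = Math.min(terms.size(), maxFieldLength);
        return new ArrayList<>(terms.subList(0, kept));
    }

    public static void main(String[] args) {
        // A field with 12,000 running terms but only two distinct ones.
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 12_000; i++) {
            terms.add(i % 2 == 0 ? "alpha" : "beta");
        }
        terms.add("omega"); // a single occurrence past the limit

        List<String> indexed = truncate(terms, 10_000);
        System.out.println(indexed.size());                // 10000
        System.out.println(new HashSet<>(indexed).size()); // 2
        System.out.println(indexed.contains("omega"));     // false
    }
}
```

Even though the field contains only two distinct terms in its first 12,000 positions, it is still cut at 10,000 running terms, and "omega" would never be found by a search. Passing Integer.MAX_VALUE as the limit keeps every term, at the cost of the memory needed to hold them.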