segments_* and segments.gen files in Lucene

From the book: Lucene in Action, 2nd Edition


-rw-rw-rw- 1 mike users 12327579 Feb 29 05:29 _2.fdt
-rw-rw-rw- 1 mike users 6400 Feb 29 05:29 _2.fdx
-rw-rw-rw- 1 mike users 33 Feb 29 05:29 _2.fnm
-rw-rw-rw- 1 mike users 1036074 Feb 29 05:29 _2.frq
-rw-rw-rw- 1 mike users 2404 Feb 29 05:29 _2.nrm
-rw-rw-rw- 1 mike users 2128366 Feb 29 05:29 _2.prx
-rw-rw-rw- 1 mike users 14055 Feb 29 05:29 _2.tii
-rw-rw-rw- 1 mike users 1034353 Feb 29 05:29 _2.tis
-rw-rw-rw- 1 mike users 5829 Feb 29 05:29 _2.tvd
-rw-rw-rw- 1 mike users 10227627 Feb 29 05:29 _2.tvf
-rw-rw-rw- 1 mike users 12804 Feb 29 05:29 _2.tvx
-rw-rw-rw- 1 mike users 17 Mar 30 03:34 random.txt
-rw-rw-rw- 1 mike users 20 Feb 29 05:29 segments.gen
-rw-rw-rw- 1 mike users 53 Feb 29 05:29 segments_3

The secret to how Lucene tells its own files apart from strays such as random.txt is the segments
file (segments_3). As you may have guessed from its name, the segments file stores the names and
certain details of all existing index segments. Every time an IndexWriter commits a change to the
index, the generation (the _3 in the listing above) of the segments file is incremented. For example,
a commit to this index would write segments_4 and remove segments_3 as well as any now-unreferenced
files. Before accessing any files in the index directory, Lucene consults this file to figure out
which index files to open and read. Our example index has a single segment, _2, whose name is stored
in this segments file, so Lucene knows to look only for files with the _2 prefix. Lucene also limits
itself to files with known extensions, such as .fdt, .fdx, and the other extensions shown in our
example, so even saving a file with a segment prefix, such as _2.txt, won’t throw Lucene off.
Of course, polluting an index directory with non-Lucene files is strongly discouraged.
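
The selection logic described above can be sketched in a few lines of Python. This is only an
illustration working on a plain list of file names, not Lucene's own code: the function names
latest_segments_file and files_for_segments are hypothetical, the extension set is limited to the
ones visible in the listing, and the segment names are passed in by hand because parsing the binary
segments_3 file is not covered here. The generation suffix is parsed as base 36, which is how
Lucene encodes it.

```python
import re

# Extensions visible in the listing above; real Lucene versions recognize more.
KNOWN_EXTENSIONS = {"fdt", "fdx", "fnm", "frq", "nrm", "prx",
                    "tii", "tis", "tvd", "tvf", "tvx"}

# Matches segments_3, segments_a, ... but not segments.gen.
SEGMENTS_RE = re.compile(r"^segments_([0-9a-z]+)$")


def latest_segments_file(names):
    """Return (file_name, generation) for the highest generation, or None."""
    best = None
    for name in names:
        match = SEGMENTS_RE.match(name)
        if match:
            generation = int(match.group(1), 36)  # suffix is base 36
            if best is None or generation > best[1]:
                best = (name, generation)
    return best


def files_for_segments(names, segment_names):
    """Keep only files with a referenced segment prefix and a known extension."""
    keep = []
    for name in names:
        prefix, dot, extension = name.partition(".")
        if dot and prefix in segment_names and extension in KNOWN_EXTENSIONS:
            keep.append(name)
    return sorted(keep)


listing = ["_2.fdt", "_2.fdx", "_2.txt", "random.txt", "segments.gen", "segments_3"]
print(latest_segments_file(listing))        # ('segments_3', 3)
print(files_for_segments(listing, {"_2"}))  # ['_2.fdt', '_2.fdx']
```

Both _2.txt (unknown extension) and random.txt (unknown prefix) are skipped, which mirrors why stray
files do not confuse Lucene.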

The exact number of files that constitute a Lucene index and each segment varies from index to index
and depends on the number of fields the index contains. However, every index contains a single segments
file and a single segments.gen file. The segments.gen file is always 20 bytes and contains the suffix
(generation) of the current segments file as a redundant way for Lucene to determine the most recent
commit.
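
To see where the 20 bytes could come from, here is a small Python sketch of decoding segments.gen.
It assumes the layout used by Lucene releases of this era (2.x/3.x), which is not stated in the text
above: a 4-byte big-endian format marker followed by the generation written twice as 8-byte
big-endian longs (4 + 8 + 8 = 20). The marker value -2 and the helper name read_segments_gen are
assumptions made for illustration.

```python
import struct

FORMAT_LOCKLESS = -2  # assumed format marker for this file layout


def read_segments_gen(data):
    """Return the generation stored in segments.gen bytes, or None if unusable."""
    if len(data) != 20:
        return None
    marker, gen_first, gen_second = struct.unpack(">iqq", data)
    # The generation is stored twice; a mismatch suggests a partial write.
    if marker != FORMAT_LOCKLESS or gen_first != gen_second:
        return None
    return gen_first


# Build the bytes that would correspond to generation 3 (segments_3).
sample = struct.pack(">iqq", FORMAT_LOCKLESS, 3, 3)
print(len(sample), read_segments_gen(sample))  # 20 3
```

Returning None for a short or inconsistent file fits the idea that segments.gen is only a redundant
hint: a reader that cannot trust it can simply ignore it.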



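Note: because segments.gen is only a redundant record, losing it should not by itself stop Lucene
from finding the most recent commit; Lucene can still locate the highest-generation segments_N file
by listing the index directory.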

