Brief Introduction to Suffix Array
Last Modified: 2000-11-14 (Since: 2000-11-14)
Suffix array is a data structure designed for efficient searching of a large text. The data structure is simply an array containing all the pointers to the text suffixes sorted in lexicographical (alphabetical) order. Each suffix is a string starting at a certain poinsition in the text and ending at the end of the text. Searching a text can be performed by binary search using the suffix array.
Let's get started with the suffix array construction. Suppose that we have the sample text ``abracadabra'' and wish to construct the suffix array for the sample text.
First, we should assign index points to the sample text. Index points specify positions where search can be performed. In our example, index points are assigned character by character. Thus, we can search the sample text with the suffix array at any positions later.
Second, we should sort the index points according to thier corresponding suffixes. The correspondance between the index points and the suffixes looks like
Finally, The resulting index points become the suffix array for the sample text.
Search of the sample text can be performed by binary search using the created suffix array. The following figure shows the process of searching the sample text for `ra'. Numbered arrows shows the order of processing.
Programming Pearls 2nd edition
Chapter 15 is a good reference for learning suffix array.
- Baeza-Yates et al.:
Modern Information Retrieval
Chapter 8 explains suffix array as well as other data structures.
- Manber et al.:
Suffix Arrays - A New Method for On-Line String Searches
The original paper.
- Yamashita: Terminology "Suffix Array" (in Japanese), Journal of Japanese Society for Artificial Intelligence Vol. 15 No. 6 p. 1142