The pattern matching algorithms presented in the previous section speed up the search in a text by preprocessing the pattern (to compute the failure function in the KMP algorithm or the last function in the BM algorithm).
In this section, we take a complementary approach, namely, we present string searching algorithms that preprocess the text. This approach is suitable for applications where a series of queries is performed on a fixed text, so that the initial cost of preprocessing the text is compensated by a speedup in each subsequent query (for example, a Web site that offers pattern matching in Shakespeare’s Hamlet or a search engine that offers Web pages on the Hamlet topic).
A trie (pronounced “try”) is a tree-based data structure for storing strings in order to support fast pattern matching. The main application for tries is in information retrieval. Indeed, the name “trie” comes from the word “retrieval.” In an information retrieval application, such as a search for a certain DNA sequence in a genomic database, we are given a collection S of strings, all defined using the same alphabet. The primary query operations that tries support are pattern matching and prefix matching. The latter operation involves being given a string X, and looking for all the strings in S that contain X as a prefix.
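To make the two query operations concrete, the following is a minimal sketch of a trie over a collection S of strings. It is an illustration under stated assumptions, not the implementation developed in this text: the names TrieNode, Trie, insert, contains, and with_prefix are hypothetical, each node keeps its children in a dictionary keyed by character, and pattern matching is simplified here to testing whether a whole string belongs to S.

```python
class TrieNode:
    """One node of the trie (hypothetical name, for illustration only)."""

    def __init__(self):
        self.children = {}    # maps a character to the child node
        self.is_end = False   # True if some string of S ends at this node


class Trie:
    """A simple trie storing a collection S of strings."""

    def __init__(self, strings=()):
        self.root = TrieNode()
        for s in strings:
            self.insert(s)

    def insert(self, s):
        """Add string s to the trie, creating nodes along its path as needed."""
        node = self.root
        for ch in s:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def _walk(self, x):
        """Follow the path spelled by x; return the node reached, or None."""
        node = self.root
        for ch in x:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, x):
        """Simplified pattern matching: is x exactly one of the strings in S?"""
        node = self._walk(x)
        return node is not None and node.is_end

    def with_prefix(self, x):
        """Prefix matching: return, in sorted order, all strings in S that have x as a prefix."""
        start = self._walk(x)
        if start is None:
            return []
        results = []
        stack = [(start, x)]          # (node, string spelled from the root)
        while stack:
            node, word = stack.pop()
            if node.is_end:
                results.append(word)
            for ch, child in node.children.items():
                stack.append((child, word + ch))
        return sorted(results)


S = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
trie = Trie(S)
print(trie.contains("stop"))      # True
print(trie.contains("sto"))       # False: "sto" is only a prefix
print(trie.with_prefix("be"))     # ['bear', 'bell']
```

In this sketch, both queries first walk down from the root following the characters of X, so the time to locate X depends on the length of X and not on the number of strings in S. Prefix matching then visits only the subtree below the node reached, which contains exactly the strings of S that begin with X.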