simple mapping from words to concepts.
Unfortunately, this problem is difficult because English has different words that mean the same thing
(synonyms), words with multiple meanings (polysemy), and all sorts of ambiguities that obscure the concepts to the
point where even people can have a hard time understanding the intended meaning.
For example, the word "bank" when used together with "mortgage", "loans", and "rates" probably means a
financial institution. However, the word "bank" when used together with "lures", "casting", and "fish" probably
means a stream or river bank.
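The role of context in this example can be sketched in a few lines of Python. This is only an illustration of the idea that neighboring words hint at the intended meaning, not a description of how LSA itself works. The cue-word sets come from the example above, while the names SENSE_CUES and guess_sense are hypothetical and introduced here for illustration.

```python
# Hypothetical cue words for two senses of "bank", taken from the example above.
SENSE_CUES = {
    "financial institution": {"mortgage", "loans", "rates"},
    "river bank": {"lures", "casting", "fish"},
}


def guess_sense(context_words):
    """Return the sense whose cue words overlap most with the context, or None."""
    context = {word.lower() for word in context_words}
    best_sense, best_overlap = None, 0
    for sense, cues in SENSE_CUES.items():
        overlap = len(context & cues)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(guess_sense("the bank raised mortgage rates on new loans".split()))
print(guess_sense("casting lures from the bank to catch fish".split()))
```

With no cue words in the context, guess_sense returns None, which mirrors the point that the word alone does not determine the concept.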
How Latent Semantic Analysis Works
Latent Semantic Analysis arose from the problem of how to find relevant documents from search words.
The fundamental difficulty arises when we compare words to find relevant documents, because what we
really want to do is compare the meanings or concepts behind the words. LSA attempts to solve this
problem by mapping both words and documents into a "concept" space and doing the comparison in this
space.
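To make the difference between comparing words and comparing concepts concrete, here is a minimal Python sketch. It assumes a hand-picked mapping from words to two concepts (vehicles and finance); LSA would derive such a mapping from the documents themselves, so the weights, the example words, and the names WORD_TO_CONCEPT, to_concept_space, cosine, and word_overlap are all hypothetical.

```python
import math

# Hypothetical word-to-concept weights over two concepts: (vehicles, finance).
# These are hand-picked for illustration; LSA would derive them from data.
WORD_TO_CONCEPT = {
    "car": (1.0, 0.0),
    "automobile": (1.0, 0.0),
    "engine": (0.9, 0.1),
    "loan": (0.0, 1.0),
    "mortgage": (0.0, 1.0),
}


def to_concept_space(words):
    """Sum the concept vectors of the known words in a query or document."""
    vec = [0.0, 0.0]
    for word in words:
        weights = WORD_TO_CONCEPT.get(word)
        if weights is not None:
            vec[0] += weights[0]
            vec[1] += weights[1]
    return vec


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def word_overlap(a, b):
    """Number of distinct words shared by two word lists."""
    return len(set(a) & set(b))


query = ["car"]
doc_vehicles = ["automobile", "engine"]
doc_finance = ["mortgage", "loan"]

# The query shares no words with either document...
print(word_overlap(query, doc_vehicles), word_overlap(query, doc_finance))
# ...but in concept space it is close to the vehicle document only.
print(cosine(to_concept_space(query), to_concept_space(doc_vehicles)))
print(cosine(to_concept_space(query), to_concept_space(doc_finance)))
```

A word-level comparison finds nothing in common between the query and either document, while the concept-level comparison ranks the vehicle document far above the finance document.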
Since authors have a wide choice of words available when they write, the concepts can be obscured by
the different word choices of different authors. This essentially random choice of words introduces noise
into the word-concept relationship. Latent Semantic Analysis filters out some of this noise and also
attempts to find the smallest set of concepts that spans all the documents.
In order to make this difficult problem solvable, LSA introduces some dramatic simplifications.