文章目录
Fuzzy Name Matching in Postgres
https://blog.crunchydata.com/blog/fuzzy-name-matching-in-postgresql
- Remember text_pattern_ops for indexed prefix filtering.
- Use lower() and functional indexes for case-insensitive queries.
- Combine indexes on soundex() with expensive tests like levenshtein() to get fast searches for fuzzy queries.
Fuzzy Name Matching in Postgres
https://blog.crunchydata.com/blog/fuzzy-name-matching-in-postgresql
- Prefix Matching
SELECT * FROM names WHERE surname LIKE 'Back%';
CREATE INDEX names_surname_txt ON names (surname);
-- The problem is that the default operator class for the "text" data type does not support pattern matching.
-- If you are going to do prefix matching, you need to create an index using the "text_pattern_ops" operator class.
CREATE INDEX names_surname_txt ON names (surname text_pattern_ops);
- Case Insensitivity
SELECT * FROM names WHERE lower(surname) LIKE lower('Back%');
CREATE INDEX names_surname_txt ON names (lower(surname) text_pattern_ops);
- Levenshtein Matching
- levenshtein(‘mister’,‘mister’) == 0
- levenshtein(‘mister’,‘master’) == 1
- levenshtein(‘mister’,‘maser’) == 2
- levenshtein(‘mister’,‘laser’) == 3
Soundex Indexing
- soundex(‘Harrington’) = H652
- soundex(‘Harington’) = H652
- soundex(‘Herington’) = H652
- soundex(‘Heringtan’) = H652
Soundex + Levenshtein
CREATE INDEX surname_soundex ON names (soundex(surname));
WITH q AS (
SELECT 'Robert' AS qgn, 'Harington' AS qsn
)
SELECT
levenshtein(lower(concat(surname,givenname)),lower(concat(qsn, qgn))) AS leven,
names.*
FROM names, q
WHERE soundex(surname) = soundex(qsn)
AND levenshtein(lower(concat(surname,givenname)),lower(concat(qsn, qgn))) < 2
Conclusion
- Remember text_pattern_ops for indexed prefix filtering.
- Use lower() and functional indexes for case-insensitive queries.
- Combine indexes on soundex() with expensive tests like levenshtein() to get fast searches for fuzzy queries.