Using n-grams at index time for search suggestions in Elasticsearch

Latest recommended article published on 2024-03-27 06:24:52

hsj1213522415

Category: elastic search

Article link: https://blog.csdn.net/hsj1213522415/article/details/96909531

What is an n-gram? An n-gram model splits a word into contiguous pieces of length n.

The n-grams of quick at each of its 5 possible lengths:

ngram length=1: q u i c k
ngram length=2: qu ui ic ck
ngram length=3: qui uic ick
ngram length=4: quic uick
ngram length=5: quick
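
The listing above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `ngrams` is made up for this example and is not part of Elasticsearch.

```python
def ngrams(word, n):
    """Return every contiguous substring of `word` that has length `n`."""
    # Slide a window of width n across the word, one character at a time.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"ngram length={n}:", " ".join(ngrams("quick", n)))
```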

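What is an edge n-gram? It is an n-gram anchored to the edge of the word: every piece must include the boundary character.

For quick, anchoring at the first letter and then taking n-grams gives: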

q
qu
qui
quic
quick
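
Put differently, the edge n-grams of a word are its prefixes. A minimal sketch, assuming the helper name `edge_ngrams` (hypothetical) and borrowing the `min_gram` / `max_gram` parameter names from the Elasticsearch filter settings:

```python
def edge_ngrams(word, min_gram=1, max_gram=20):
    """Return the prefixes of `word` whose lengths run from min_gram to max_gram."""
    # A word shorter than max_gram simply stops at its own length.
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


if __name__ == "__main__":
    for gram in edge_ngrams("quick"):
        print(gram)
```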

Edge n-grams are used to cut every word into smaller tokens, and those tokens are what make prefix search suggestions possible. Take two documents:

doc1: hello world
doc2: hello we

h
he
hel
hell
hello		doc1,doc2

w			doc1,doc2
wo
wor
worl
world

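Each document is tokenized, every token is cut into edge n-grams, and the n-grams go into the inverted index. A search for hello w then resolves like this:

hello --> hello, doc1, doc2
w --> w, doc1, doc2

The search no longer needs to take a prefix and scan the whole inverted index for terms that start with it. It simply looks the prefix up in the inverted index as a term, and if the term is there the document matches, just as it does for a full-text match query.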
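
The lookup can be modelled in plain Python. This is a simplified sketch rather than how Elasticsearch is implemented: whitespace splitting stands in for the real tokenizer, and `build_index` and `phrase_search` are hypothetical names. Every prefix is stored at the position of the word it came from, so a phrase lookup only has to check that the query terms sit at consecutive positions.

```python
from collections import defaultdict


def edge_ngrams(word, min_gram=1, max_gram=20):
    """Return the prefixes of `word` whose lengths run from min_gram to max_gram."""
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


def build_index(docs):
    """Map each edge n-gram to {doc_id: set of word positions}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            # Every prefix of a word is stored at that word's position.
            for gram in edge_ngrams(word):
                index[gram][doc_id].add(position)
    return index


def phrase_search(index, query):
    """Return ids of documents holding the query terms at consecutive positions."""
    terms = query.lower().split()  # the query itself is NOT cut into n-grams
    if not terms or any(term not in index for term in terms):
        return []
    hits = []
    for doc_id in sorted(index[terms[0]]):
        for start in index[terms[0]][doc_id]:
            # Each later term must appear exactly `offset` words after the first.
            if all(start + offset in index[term].get(doc_id, ())
                   for offset, term in enumerate(terms)):
                hits.append(doc_id)
                break
    return hits


if __name__ == "__main__":
    docs = {"doc1": "hello world", "doc2": "hello we"}
    index = build_index(docs)
    print(phrase_search(index, "hello w"))
    print(phrase_search(index, "hello wo"))
```

Both sample documents contain hello followed by a word that starts with w, so the first lookup returns both of them. Typing one more letter (hello wo) narrows the result to doc1.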

Let's try n-grams out.

Create the index:
PUT /my_index
{
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": { 
                    "type":     "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter" 
                    ]
                }
            }
        }
    }
}

Run the analyzer on a sample text to see the effect:
GET /my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "quick brown"
}
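
To get a feel for what this analyzer emits, here is a rough Python approximation of its three stages (tokenize, lowercase, edge n-gram). It is a sketch under assumptions: the regular expression `\w+` only stands in for the standard tokenizer, and `autocomplete_analyze` is a made-up name. Each token is paired with the position of its source word, because all prefixes of one word share that word's position.

```python
import re


def autocomplete_analyze(text, min_gram=1, max_gram=20):
    """Approximate the `autocomplete` analyzer: tokenize, lowercase, edge n-gram."""
    tokens = []
    # Assumption: \w+ is close enough to the standard tokenizer for plain
    # space-separated words, but it is not equivalent for all Unicode text.
    for position, word in enumerate(re.findall(r"\w+", text)):
        word = word.lower()
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append((word[:n], position))
    return tokens


if __name__ == "__main__":
    for token, position in autocomplete_analyze("quick brown"):
        print(position, token)
```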

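Create the mapping. The field is analyzed with autocomplete at index time but with standard at search time, so the query text itself is not cut into n-grams. The field type is text, which replaced string in Elasticsearch 5.0:
PUT /my_index/_mapping/my_type
{
  "properties": {
      "title": {
          "type":     "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
      }
  }
}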
After that, index some data and query it to see the effect.
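
The post does not show the data it indexed. As one possible way to fill the index, the sketch below builds a request body for the `_bulk` API out of the two sample titles used earlier. The document ids and the helper name `bulk_payload` are assumptions made for illustration.

```python
import json


def bulk_payload(titles):
    """Build an NDJSON body for the _bulk API: an action line plus a source line per document."""
    lines = []
    for doc_id, title in enumerate(titles, start=1):
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        lines.append(json.dumps({"title": title}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Send the output as the body of: POST /my_index/my_type/_bulk
    print(bulk_payload(["hello world", "hello we"]), end="")
```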


GET /my_index/my_type/_search 
{
  "query": {
    "match_phrase": {
      "title": "hello w"
    }
  }
}