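- What is an n-gram? In an n-gram model, a word is split into contiguous substrings of length n.
For quick, the n-grams at each of the 5 possible lengths:
ngram length=1: q u i c k
ngram length=2: qu ui ic ck
ngram length=3: qui uic ick
ngram length=4: quic uick
ngram length=5: quick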
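The table above can be reproduced with a few lines of Python. This is only a minimal sketch of the idea, not how Elasticsearch implements it; the function name `ngrams` is made up for illustration.

```python
def ngrams(word: str, n: int) -> list[str]:
    """Return every contiguous substring of `word` that has length `n`."""
    # A window of width n can start at indices 0 .. len(word) - n.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


# Print the grams of "quick" for every length from 1 to 5.
for n in range(1, 6):
    print(f"ngram length={n}:", " ".join(ngrams("quick", n)))
```

A length greater than the word itself simply yields no grams.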
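- What is an edge n-gram? It is an n-gram anchored at the boundary of the word: every resulting gram must include the boundary character.
For quick, anchoring at the first letter and then generating n-grams gives: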
q
qu
qui
quic
quick
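In other words, the edge n-grams of a word are just its prefixes. A minimal Python sketch, assuming the grams are anchored at the start of the word; the name `edge_ngrams` and its `min_gram` / `max_gram` parameters are illustrative and only mirror the filter settings used later in these notes:

```python
def edge_ngrams(word: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    """Return the prefixes of `word` with lengths from min_gram to max_gram."""
    # Never ask for a prefix longer than the word itself.
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("quick"))
```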
- Use edge n-grams to split every word further, and use the resulting grams to implement prefix search suggestions.
hello world
hello we
h
he
hel
hell
hello    doc1, doc2
w        doc1, doc2
wo
wor
worl
world
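When hello world is tokenized, each resulting gram gets an entry in the inverted index. A search for hello w then resolves as:
hello --> hello, doc1
w --> w, doc1
The search no longer has to take a prefix and scan the whole inverted index for terms that start with it. It simply looks the prefix up in the inverted index as an ordinary term, and a hit means a match, which is the same behavior as a full-text match query.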
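The lookup described above can be sketched in Python as a toy inverted index. Several things here are assumptions made for illustration: the ids doc1 and doc2 are assigned to "hello world" and "hello we", words are split on whitespace, and the search requires every query term to match (AND), whereas a match query defaults to OR. The helper names are hypothetical.

```python
from collections import defaultdict


def edge_ngrams(word, min_gram=1, max_gram=20):
    """Return the prefixes of `word` with lengths from min_gram to max_gram."""
    return [word[:n] for n in range(min_gram, min(len(word), max_gram) + 1)]


def build_index(docs):
    """Map every edge n-gram of every word to the ids of the docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            for gram in edge_ngrams(word):
                index[gram].add(doc_id)
    return index


def search(index, query):
    """Look each query term up as an exact key; no scan over the index is needed."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Query terms are NOT split into grams; they are matched as typed.
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)


# Assumed document ids for the two example texts.
docs = {"doc1": "hello world", "doc2": "hello we"}
index = build_index(docs)

print(sorted(search(index, "hello w")))
print(sorted(search(index, "hello wo")))
```

Each query term costs one dictionary lookup, which is the point of paying for the extra grams at index time.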
Let's try n-grams out in practice.
Create the index:
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  }
}
Run the analyzer on some sample text to see the result:
GET /my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "quick brown"
}
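To get a feel for which tokens this request should produce, the analyzer chain can be approximated offline. This is a rough sketch, not the real analyzer: it assumes a plain whitespace split is close enough to the standard tokenizer for this input, and the function name `autocomplete_analyze` is made up.

```python
def autocomplete_analyze(text, min_gram=1, max_gram=20):
    """Approximate the chain: tokenize -> lowercase -> edge n-gram (1 to 20)."""
    tokens = []
    # Assumption: whitespace splitting stands in for the standard tokenizer.
    for word in text.lower().split():
        longest = min(len(word), max_gram)
        for n in range(min_gram, longest + 1):
            tokens.append(word[:n])
    return tokens


print(autocomplete_analyze("quick brown"))
```

Each word contributes one token per prefix length, so "quick brown" becomes ten tokens in total.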
Create the mapping:
PUT /my_index/_mapping/my_type
{
  "properties": {
    "title": {
      "type": "string",
      "analyzer": "autocomplete",
      "search_analyzer": "standard"
    }
  }
}
After that, index some data and check the result:
GET /my_index/my_type/_search
{
  "query": {
    "match_phrase": {
      "title": "hello w"
    }
  }
}