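The previous post mainly covered the term-related queries, along with the range and exists queries. This post works through the remaining ones, most of which are pattern-based or fuzzy-matching queries.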
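Prefix query
The prefix query matches documents whose field contains an indexed term that starts with the given prefix. The prefix itself is not analyzed. It corresponds to PrefixQuery in Lucene.
For example, the following request finds documents whose email field contains a term starting with hatt: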
GET bank/_search
{
"query": {
"prefix": {
"email": {
"value": "hatt"
}
}
},
"profile": "true"
}
Response:
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[pgvIy_S0QwiNETOTSEWFtw][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "MultiTermQueryConstantScoreWrapper",
"description" : "email:hatt*",
"time_in_nanos" : 2344350,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 1888,
"match" : 0,
"next_doc_count" : 1,
"score_count" : 1,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 2649,
"advance_count" : 1,
"score" : 1217,
"build_scorer_count" : 3,
"create_weight" : 7285,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 2331311
}
},
{
"type" : "TermQuery",
"description" : "email:hattiebond",
"time_in_nanos" : 325372,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 495,
"match" : 0,
"next_doc_count" : 1,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 1077,
"advance_count" : 1,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 41312,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 282488
}
},
{
"type" : "MatchNoDocsQuery",
"description" : """MatchNoDocsQuery("empty BooleanQuery")""",
"time_in_nanos" : 1590,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 1,
"create_weight" : 1121,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 469
}
}
],
"rewrite_time" : 32110,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 27544
}
]
}
],
"aggregations" : [ ]
}
]
}
}
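To see why the prefix hatt matches this document, it helps to remember that the comparison is made against indexed terms rather than the original string. The Python sketch below is only a toy model of that behavior. The term list is an assumption about how the email value was tokenized (the profile output confirms only the term hattiebond), and prefix_match is a hypothetical helper, not part of any Elasticsearch client.

```python
# Toy model of prefix matching against indexed terms (not Elasticsearch code).
# Assumption: the analyzer split "hattiebond@netagy.com" into these two terms;
# only "hattiebond" is confirmed by the profile output above.
indexed_terms = ["hattiebond", "netagy.com"]


def prefix_match(terms, prefix):
    """Return the terms that start with the prefix, compared as-is (no analysis)."""
    return [term for term in terms if term.startswith(prefix)]


print(prefix_match(indexed_terms, "hatt"))  # the term "hattiebond" starts with "hatt"
print(prefix_match(indexed_terms, "Hatt"))  # nothing: the prefix is not lowercased
```

The second call shows the practical consequence of the prefix not being analyzed: if the indexed terms are lowercase, an uppercase prefix finds nothing.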
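Wildcard query
The wildcard query matches documents whose field contains an indexed term matching a wildcard expression. The expression is not analyzed. The wildcard * matches any sequence of characters, including the empty sequence, and ? matches any single character. Note that this query can be slow, because it has to iterate over many terms. To avoid extremely slow wildcard queries, the pattern should not start with a wildcard. The wildcard query corresponds to WildcardQuery in Lucene: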
GET bank/_search
{
"query": {
"wildcard": {
"email": {
"value": "hat*bond"
}
}
},
"profile": "true"
}
Response:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[pgvIy_S0QwiNETOTSEWFtw][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "MultiTermQueryConstantScoreWrapper",
"description" : "email:hat*bond",
"time_in_nanos" : 173122,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 1524,
"match" : 0,
"next_doc_count" : 1,
"score_count" : 1,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 2187,
"advance_count" : 1,
"score" : 443,
"build_scorer_count" : 3,
"create_weight" : 8189,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 160779
}
},
{
"type" : "TermQuery",
"description" : "email:hattiebond",
"time_in_nanos" : 18279,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 492,
"match" : 0,
"next_doc_count" : 1,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 984,
"advance_count" : 1,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 9017,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 7786
}
},
{
"type" : "MatchNoDocsQuery",
"description" : """MatchNoDocsQuery("empty BooleanQuery")""",
"time_in_nanos" : 2172,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 1,
"create_weight" : 1742,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 430
}
}
],
"rewrite_time" : 6978,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 8325
}
]
}
],
"aggregations" : [ ]
}
]
}
}
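The meaning of * and ? can be illustrated with a small Python sketch that translates a wildcard expression into a regular expression and tests it against a whole term. This is a simplified model, not how Lucene implements WildcardQuery: wildcard_to_regex and wildcard_match are hypothetical helper names, and escape characters are deliberately not handled.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * and ? into regex equivalents; every other character is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any sequence of characters, including empty
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(term, pattern):
    """True only if the pattern covers the entire term."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("hattiebond", "hat*bond"))             # * spans "tie"
print(wildcard_match("hattiebond", "hat?bond"))             # ? cannot span three characters
print(wildcard_match("hattiebond@netagy.com", "hat*bond"))  # pattern must cover the whole string
```

The last call hints at why the request above finds the document: the pattern is compared with the indexed term hattiebond, not with the full email address.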
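Regexp query
The regexp query matches indexed terms against a regular expression. Keep in mind that the pattern is not analyzed and must match an entire term, so a pattern written against the original, unanalyzed text may return nothing.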
GET bank/_search
{
"query": {
"regexp": {
"email": "hattie.*"
}
},
"profile": "true"
}
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[pgvIy_S0QwiNETOTSEWFtw][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "MultiTermQueryConstantScoreWrapper",
"description" : "email:/hattie.*/",
"time_in_nanos" : 130438,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 588,
"match" : 0,
"next_doc_count" : 1,
"score_count" : 1,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 1144,
"advance_count" : 1,
"score" : 487,
"build_scorer_count" : 3,
"create_weight" : 1909,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 126310
}
},
{
"type" : "TermQuery",
"description" : "email:hattiebond",
"time_in_nanos" : 16071,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 313,
"match" : 0,
"next_doc_count" : 1,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 599,
"advance_count" : 1,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 8629,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 6530
}
},
{
"type" : "MatchNoDocsQuery",
"description" : """MatchNoDocsQuery("empty BooleanQuery")""",
"time_in_nanos" : 1635,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 1,
"create_weight" : 1307,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 328
}
}
],
"rewrite_time" : 6807,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 6617
}
]
}
],
"aggregations" : [ ]
}
]
}
}
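The requirement that the pattern cover a whole term is easy to overlook, so here is a rough Python illustration. It uses Python's re module, whose syntax differs from Lucene's regular expression syntax in several details, so treat it only as a model of the whole-term matching. The term list is an assumption about how the email value was tokenized, and regexp_match is a hypothetical helper.

```python
import re

# Assumption: these are the terms indexed for "hattiebond@netagy.com";
# only "hattiebond" is confirmed by the profile output above.
indexed_terms = ["hattiebond", "netagy.com"]


def regexp_match(terms, pattern):
    """Return the terms that the pattern matches from start to end."""
    return [term for term in terms if re.fullmatch(pattern, term)]


print(regexp_match(indexed_terms, "hattie.*"))  # covers the whole term "hattiebond"
print(regexp_match(indexed_terms, "hattie"))    # nothing: "bond" is left unmatched
```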
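Fuzzy query
The fuzzy query matches string fields by similarity based on edit distance. Older Elasticsearch releases also accepted it on numeric and date fields, where it was treated as a plus-or-minus range; that usage has since been deprecated in favor of the range query.
String fields
For string fields, the fuzzy query generates all candidate terms within the maximum edit distance given by fuzziness, then checks the term dictionary to find which of them actually exist in the index.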
GET bank/_search
{
"query": {
"fuzzy": {
"email": {
"value": "hattiebond",
"prefix_length": 0,
"fuzziness": 0.5
}
}
},
"profile": "true"
}
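Here prefix_length is the number of leading characters that are not fuzzified. Raising it reduces the number of terms that have to be examined. It defaults to 0.
fuzziness is the maximum edit distance, that is, the largest allowed difference between a matching term and the search value. The documented values are 0, 1, 2, and AUTO. The fractional 0.5 used here appears to be treated as an edit distance of 0, since the profile in the response shows a plain TermQuery on email:hattiebond.
Response: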
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 6.505616,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 6.505616,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[UhzKWPIsSgi8QaaJLHVmFg][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "TermQuery",
"description" : "email:hattiebond",
"time_in_nanos" : 464272,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 1137,
"match" : 0,
"next_doc_count" : 1,
"score_count" : 1,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 3451,
"advance_count" : 1,
"score" : 5744,
"build_scorer_count" : 3,
"create_weight" : 99058,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 354882
}
}
],
"rewrite_time" : 291191,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 24215
}
]
}
],
"aggregations" : [ ]
}
]
}
}
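To make the roles of fuzziness and prefix_length more concrete, the Python sketch below computes a plain Levenshtein distance and filters a term list with it. It is a simplified model under stated assumptions: Elasticsearch does not scan terms one by one like this, and by default it also counts a transposition of two adjacent characters as a single edit (the transpositions parameter), which this sketch does not. The names edit_distance and fuzzy_match and the term list are hypothetical.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_match(terms, value, max_edits, prefix_length=0):
    """Keep terms that share the first prefix_length characters with value
    and lie within max_edits of it."""
    return [term for term in terms
            if term[:prefix_length] == value[:prefix_length]
            and edit_distance(term, value) <= max_edits]


# Assumption: the terms indexed for "hattiebond@netagy.com".
indexed_terms = ["hattiebond", "netagy.com"]

print(fuzzy_match(indexed_terms, "hatiebond", 1))   # one missing "t" is tolerated
print(fuzzy_match(indexed_terms, "hatiebond", 0))   # distance 0 means exact match only
print(fuzzy_match(indexed_terms, "xattiebond", 1, prefix_length=1))  # first character must match
```

With max_edits set to 0 the filter degenerates into an exact term match, which is consistent with the TermQuery seen in the profile output above.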