ES默认是动态创建索引和索引类型的mapping的,但是在学习的时候还能这样用,在生产中一定是手动制定mapping!在生产中经常会遇到这样的需求,想用某个字段进行统计,又想对该字段进行模糊查询,解决这种需求的方法就是对该字段创建别名!
mapping结构如下:
1 {2 "settings": {3 "index": {4 "analysis": {5 "filter": {6 "english_keywords": {7 "type" : "keyword_marker",8 "keywords": [9 "topsec"
10 ]11 },12 "english_stemmer": {13 "type" : "stemmer",14 "language" : "english"
15 },16 "english_possessive_stemmer": {17 "type" : "stemmer",18 "language" : "possessive_english"
19 },20 "english_stop": {21 "type" : "stop",22 "stopwords" : "_english_"
23 }24 },25 "analyzer": {29 "english": {30 "type" : "custom",31 "filter": [32 "lowercase",33 "english_stop"
34 ],35 "tokenizer" : "standard"
36 },37 "ik": {38 "filter" : ["lowercase"],39 "type" : "custom",40 "tokenizer" : "ik_max_word"
41 },42 "html": {43 "filter": [44 "lowercase",45 "english_stop"
46 ],47 "char_filter": [48 "html_strip"
49 ],50 "type" : "custom",51 "tokenizer" : "standard"
52 },53 "lower": {54 "filter" : "lowercase",55 "type" : "custom",56 "tokenizer" : "keyword"
57 }58 }59 },60 "number_of_shards" : "1",61 "number_of_replicas" : "0"
62 }63 },64 "mappings": {65 "test": {66 "_all": {67 "enabled" : false
68 },69 "properties": {70 "name": {71 "type" : "keyword"
72 },73 "age": {74 "type" : "keyword",75 "fields": {76 "cn": {77 "analyzer" : "ik",78 "type" : "text"
79 }80 }81 },82
83 "address": {84 "type" : "text"
85 }86 }87 }88 }89 }
字段age的"type" : "keyword",不分词,然后起个别名cn,对它使用ik分词器进行分词!插入四条数据
用age字段对数据进行统计的时候,需要用不分词的age,并且需要使用全匹配规则,语句:
1 {2 "query": {3 "bool": {4 "must": [5 {6 "term": {7 "age": "北京市海淀区西二旗中关村西门"
8 }9 }10 ],11 "must_not": [],12 "should": []13 }14 },15 "from": 0,16 "size": 10,17 "sort": [],18 "aggs": {}19 }
结果:
使用age的分词age.cn进行统计是有问题的,运行的结果说明对age的别名age.cn进行分词,查询条件必须匹配分词器对age的内容进行分词的结果进行匹配,
1 {2 "query": {3 "bool": {4 "must": [5 {6 "term": {7 "age.cn": "北京市海淀区西二旗中关村西门"
8 }9 }10 ],11 "must_not": [],12 "should": []13 }14 },15 "from": 0,16 "size": 10,17 "sort": [],18 "aggs": {}19 }
结果:
1 {2 "query": {3 "bool": {4 "must": [5 {6 "term": {7 "age.cn": "北京市"
8 }9 }10 ],11 "must_not": [],12 "should": []13 }14 },15 "from": 0,16 "size": 10,17 "sort": [],18 "aggs": {}19 }
结果:
如果使用match来统计的话也会有问题,会把不正确的数据也统计出来,使用 match进行统计会把查询条件与内容进行匹配,根据匹配度进行打分,分数高的说明匹配度高,会排在上面
1 {2 "query": {3 "bool": {4 "must": [5 {6 "match": {7 "age.cn": "北京市海淀区西二旗中关村"
8 }9 }10 ],11 "must_not": [],12 "should": []13 }14 },15 "from": 0,16 "size": 10,17 "sort": [],18 "aggs": {}19 }
结果:
下面就是按匹配度打分排名的结果
1 {2 "query": {3 "bool": {4 "must": [5 {6 "match": {7 "age.cn": "北京市昌平区"
8 }9 }10 ],11 "must_not": [],12 "should": []13 }14 },15 "from": 0,16 "size": 10,17 "sort": [],18 "aggs": {}19 }
结果:
总结:统计就用term,不分词,全匹配;模糊查询就用match,分词,不用全匹配!
若有不正之处,请谅解和批评指正,不胜感激!!!!!欢迎大家留言讨论!!!