Elasticsearch Full-Text Search Enterprise Development Notes (Part 3): Mapping Configuration

Send_youCherry

Category: elasticsearch. Tags: elasticsearch, full-text search

Article link: https://blog.csdn.net/ll_mor/article/details/78897879

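Understanding Mapping

What is a mapping

An ES mapping is very much like a data type in a statically typed language: once a variable is declared as int, it can only ever hold int values. Likewise, a field mapped to a numeric type (long, for example) can only store numeric values.

Compared with a language's data types, a mapping carries some extra meaning: it tells ES not only what type of value a field holds, but also how to index that data and whether the data can be searched at all.

When a query does not return the data you expect, there is a good chance the mapping is at fault. When in doubt, check your mapping first.

Anatomy of a mapping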

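A mapping assigns an analyzer to each analyzed text field, and an analyzer is in turn built from zero or more character filters, exactly one tokenizer, and zero or more token filters. When ES indexes a document, it passes the content of each field to the corresponding analyzer, which runs it through these components in turn.

What a filter does is easy to understand: it is a function that transforms data, taking text (or a stream of tokens) in and returning a transformed version. A filter that converts text to lowercase is a good example.

Within an analyzer the filters are arranged in a fixed order, and analysis simply calls them one after another in that order; ES then stores and indexes the final result.

In short, a mapping is a set of instructions for turning input data into searchable index terms.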
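
To make the pipeline above concrete, here is a toy model of an analyzer in plain Python. It is only a sketch of the idea, not how Elasticsearch is implemented: every function name below is hypothetical, the stop word list is made up, and the split into character filters, one tokenizer, and token filters follows the way Elasticsearch structures its analyzers.

```python
import re
from typing import Callable, List

# Hypothetical stand-ins for the three stages of an analyzer.
CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def strip_html(text: str) -> str:
    """Character filter: drop anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def whitespace_tokenizer(text: str) -> List[str]:
    """Tokenizer: split the text on runs of whitespace."""
    return text.split()


def lowercase(tokens: List[str]) -> List[str]:
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Token filter: drop tokens found in a tiny, made-up stop word list."""
    stopwords = {"the", "a", "of"}
    return [t for t in tokens if t not in stopwords]


def analyze(
    text: str,
    char_filters: List[CharFilter],
    tokenizer: Tokenizer,
    token_filters: List[TokenFilter],
) -> List[str]:
    """Run the stages in order and return the terms that would be indexed."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


terms = analyze(
    "<b>The</b> Quick Fox",
    [strip_html],
    whitespace_tokenizer,
    [lowercase, remove_stopwords],
)
print(terms)  # ['quick', 'fox']
```

Changing the order of the token filters changes the result: if remove_stopwords ran before lowercase here, "The" would survive as "the", which is why the order of the chain matters.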

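Installing and configuring the IK + pinyin analyzers

Chinese and English tokenization is close to a must-have when using ES for full-text search. The installation steps are outlined briefly below (detailed walkthroughs are easy to find online; this one follows the write-up by the author nextbang):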

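1. Download the Chinese and pinyin analyzers
   IK Chinese analyzer: https://github.com/medcl/elasticsearch-analysis-ik
   Pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin
2. Install
   On the releases page, find the zip file that matches your ES version, or take the source and build it yourself with mvn package; you can also download the latest master code.
   Go to the plugins directory under the elasticsearch installation directory, then: mkdir pinyin; cd pinyin
   cp the zip file into the pinyin directory and unzip it there
   After deploying the plugin, remember to restart the ES node
3. Configure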

Settings configuration

number_of_shards can only be set when an index is created, so these settings go in the create-index request:

PUT my_index
{
  "settings": {
    "index": {
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "analysis": {
        "analyzer": {
          "default": {
            "tokenizer": "ik_max_word"
          },
          "pinyin_analyzer": {
            "tokenizer": "my_pinyin"
          }
        },
        "tokenizer": {
          "my_pinyin": {
            "keep_separate_first_letter": "false",
            "lowercase": "true",
            "type": "pinyin",
            "limit_first_letter_length": "16",
            "keep_original": "true",
            "keep_full_pinyin": "true"
          }
        }
      }
    }
  }
}

Mapping configuration

PUT my_index/index_type/_mapping
{
  "index_type": {
    "_all": {
      "analyzer": "ik_max_word"
    },
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "include_in_all": true,
        "fields": {
          "pinyin": {
            "type": "text",
            "term_vector": "with_positions_offsets",
            "analyzer": "pinyin_analyzer",
            "boost": 10.0
          }
        }
      }
    }
  }
}

4. Test

Use _analyze to check that the analyzer works:

GET my_index/_analyze
{
  "text": ["刘德华"],
  "analyzer": "pinyin_analyzer"
}

Put some Chinese data into the index:

POST my_index/index_type
{
  "name": "刘德华"
}
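
The same _analyze check can be scripted outside the console. The sketch below uses only the Python standard library; it assumes a node reachable at http://localhost:9200 (an assumption, adjust to your cluster), and the helper names are hypothetical. It builds the request without sending it, so the body can be inspected first.

```python
import json
import urllib.request

# Assumption: a single local node listening on the default HTTP port.
ES_URL = "http://localhost:9200"


def build_analyze_request(index: str, analyzer: str, text: str) -> urllib.request.Request:
    """Build (but do not send) an _analyze request for the given index."""
    body = json.dumps({"analyzer": analyzer, "text": [text]}, ensure_ascii=False)
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_analyze",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response_body: str) -> list:
    """Pull the token strings out of an _analyze response body."""
    return [t["token"] for t in json.loads(response_body).get("tokens", [])]


request = build_analyze_request("my_index", "pinyin_analyzer", "刘德华")
print(request.full_url)
print(request.data.decode("utf-8"))
```

To actually run the check, pass the request to urllib.request.urlopen, read and decode the response, and hand the text to extract_tokens. An empty token list, or an error naming the analyzer, usually points to the plugin not being installed or the node not having been restarted.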

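Mapping design

The table mappings are designed around the needs of the full-text search business. The design principles for this project are:

For every field that appears on a business display page, decide its type, whether it needs to be aggregated, whether it should be indexed, and whether it needs to be analyzed.
Treat each display unit as one mapping: map the main table together with its associated tables as a one-to-many relationship, so that aggregation queries return aggregated results per display unit.

Mapping design for the hotel table: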

{
  "hotel": {
    "mappings": {
      "data": {
        "properties": {
          "address": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 }, "pinyin": { "type": "text", "analyzer": "pinyin" } }, "analyzer": "ik_max_word" },
          "areaCode": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "areaName": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 }, "pinyin": { "type": "text", "analyzer": "pinyin" } }, "analyzer": "ik_max_word" },
          "autotrophy": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "cityCode": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "cityName": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 }, "pinyin": { "type": "text", "analyzer": "pinyin" } }, "analyzer": "ik_max_word" },
          "coordinate": { "type": "geo_point" },
          "createTime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||epoch_millis" },
          "deleted": { "type": "boolean" },
          "dictGuestQualificationName": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 }, "pinyin": { "type": "text", "analyzer": "pinyin" } }, "analyzer": "ik_max_word" },
          "dictTypeByPositionCode": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "dictTypeByPositionName": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 }, "pinyin": { "type": "text", "analyzer": "pinyin" } }, "analyzer": "ik_max_word" },
          "dictTypeByServiceCode": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "dictTypeByServiceName": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 }, "pinyin": { "type": "text", "analyzer": "pinyin" } }, "analyzer": "ik_max_word" },
          "featurePicPath": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "forumStatus": { "type": "long" },
          "geohash": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "grade": { "type": "double" },
          "hotelExtendPicPath1": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "hotelExtendPicPath2": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "hotelExtendPicPath3": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "id": { "type": "long" },
          "initGrade": { "type": "double" },
          "level": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "lobbyPicPath": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 }, "pinyin": { "type": "text", "analyzer": "pinyin" } }, "analyzer": "ik_max_word" },
          "phone": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "provinceCode": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "provinceName": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 }, "pinyin": { "type": "text", "analyzer": "pinyin" } }, "analyzer": "ik_max_word" },
          "reason": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 }, "pinyin": { "type": "text", "analyzer": "pinyin" } }, "analyzer": "ik_max_word" },
          "roomPicPath": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "scale": { "type": "long" },
          "score": { "type": "double" },
          "serviceScope": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 }, "pinyin": { "type": "text", "analyzer": "pinyin" } }, "analyzer": "ik_max_word" },
          "soundphone": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "star": { "type": "long" },
          "statusCode": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "statusName": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "streetName": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 }, "pinyin": { "type": "text", "analyzer": "pinyin" } }, "analyzer": "ik_max_word" },
          "updateTime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||epoch_millis" },
          "vipEnable": { "type": "long" }
        }
      }
    }
  }
}
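
The hotel mapping repeats two field shapes many times: a text field with only a keyword sub-field, and a text field analyzed with ik_max_word that also carries keyword and pinyin sub-fields. One way to keep such a mapping consistent is to generate it. The following is a minimal sketch, assuming you build the mapping body in Python before sending it; the helper names are hypothetical and only a handful of the fields above are shown.

```python
import json


def keyword_subfield() -> dict:
    """The keyword sub-field used throughout the hotel mapping."""
    return {"type": "keyword", "ignore_above": 256}


def code_field() -> dict:
    """text field with only a keyword sub-field (codes, paths, phone numbers)."""
    return {"type": "text", "fields": {"keyword": keyword_subfield()}}


def searchable_text_field() -> dict:
    """text field analyzed with ik_max_word, plus keyword and pinyin sub-fields."""
    return {
        "type": "text",
        "fields": {
            "keyword": keyword_subfield(),
            "pinyin": {"type": "text", "analyzer": "pinyin"},
        },
        "analyzer": "ik_max_word",
    }


# A subset of the hotel fields, built from the helpers above.
properties = {
    "name": searchable_text_field(),
    "cityName": searchable_text_field(),
    "cityCode": code_field(),
    "coordinate": {"type": "geo_point"},
    "star": {"type": "long"},
}

print(json.dumps({"properties": properties}, indent=2))
```

Generating the body this way means a change such as a different ignore_above value is made in one place rather than in dozens of near-identical lines.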

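Mapping explained

The mapping above uses the following parameters:

type: the data type of the field
fields: lets a single field be indexed in more than one way; for example, a string field can be mapped as text for full-text search and also as keyword for sorting or aggregations
analyzer: the analyzer to use for the field
ignore_above: strings longer than this many characters are ignored and not indexed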

…
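
To show how the sub-fields declared under fields are addressed at query time, here is a hedged sketch of a search body built in Python against the hotel mapping. The function name and the aggregation name hotels_per_city are made up for illustration; the field paths name, name.pinyin, and cityName.keyword come from the mapping above.

```python
import json


def build_hotel_search(keyword: str, size: int = 10) -> dict:
    """Search hotel names by Chinese text or pinyin and count hits per city."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": keyword,
                # "name" is analyzed with ik_max_word, "name.pinyin" with pinyin.
                "fields": ["name", "name.pinyin"],
            }
        },
        "aggs": {
            # Aggregations run on the un-analyzed keyword sub-field.
            "hotels_per_city": {
                "terms": {"field": "cityName.keyword"}
            }
        },
    }


body = build_hotel_search("希尔顿")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The full-text part of the query goes to the analyzed fields, while the terms aggregation uses the keyword sub-field, which matches the division of labor that the fields parameter is meant to provide.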

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html#_multi_fields_2
