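Mapping
- Mapping is similar to a schema definition in a database. It is used to:
  - Define the names of the fields in an index
  - Define each field's data type, e.g. string, number, boolean...
  - Configure the inverted index of each field (Elasticsearch builds an inverted index for every field), e.g. whether it is analyzed and which analyzer to use (Analyzed or Not Analyzed, Analyzer)
- Mapping maps JSON documents into the flat format that Lucene requires
- A mapping belongs to a type of an index
  - Each document belongs to a type
  - A type has one mapping definition
  - Starting with 7.0, the mapping no longer needs to specify type information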
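The following small Python sketch makes the "flat format" point concrete. It assumes default object fields rather than the nested type. Inner objects are folded into dotted field paths such as `name.first`. The helper `flatten` is hypothetical and only illustrates the idea. It is not how Elasticsearch or Lucene is implemented, and it ignores arrays of objects.

```python
def flatten(doc, prefix=""):
    """Turn nested JSON objects into dotted field paths (object-type behaviour)."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Inner objects do not become separate documents; their fields are
            # folded into the parent under a dotted name.
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


print(flatten({"name": {"first": "John", "last": "Lennon"}, "age": 19}))
# {'name.first': 'John', 'name.last': 'Lennon', 'age': 19}
```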
Field data types
Simple types
- Text / Keyword
- Date
- Integer / Floating point
- Boolean
- IPv4 & IPv6
Complex types
- Object / Nested
Special types
- geo_point & geo_shape / percolator
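Dynamic Mapping
- When a document is written to an index that does not exist, the index is created automatically
- Dynamic mapping means we do not have to define mappings by hand: Elasticsearch infers the field types from the document content
- The inference is sometimes wrong, e.g. a geo-location field may be inferred as text
- A wrong type can break some features, e.g. range queries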
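Automatic type detection
JSON type | Elasticsearch type |
---|---|
String | date if it matches a date format; float or long if numeric detection is enabled (off by default); otherwise text with a keyword sub-field |
Boolean | boolean |
Floating-point number | float |
Integer | long |
Object | object |
Array | determined by the type of the first non-null value |
Null | ignored |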
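The detection rules in the table can be modelled roughly in Python. This is a sketch under stated assumptions. `infer_type` and the regexes are hypothetical simplifications of Elasticsearch's default dynamic date formats and numeric detection. Real date parsing accepts more variants.

```python
import re

# Simplified stand-ins for the default dynamic date formats
# ("strict_date_optional_time", "yyyy/MM/dd HH:mm:ss Z", "yyyy/MM/dd Z").
# Real Elasticsearch date parsing accepts more variants than these regexes.
DATE_PATTERNS = [
    re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})?)?"),
    re.compile(r"\d{4}/\d{2}/\d{2}( \d{2}:\d{2}:\d{2})?( ?(Z|[+-]\d{4}))?"),
]
LONG_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def infer_type(value, date_detection=True, numeric_detection=False):
    """Rough model of the JSON -> Elasticsearch type table above."""
    if value is None:
        return None  # null: no mapping is added for the field
    if isinstance(value, bool):  # test bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return infer_type(item, date_detection, numeric_detection)
        return None
    if isinstance(value, str):
        if date_detection and any(p.fullmatch(value) for p in DATE_PATTERNS):
            return "date"
        if numeric_detection:
            if LONG_RE.fullmatch(value):
                return "long"
            if FLOAT_RE.fullmatch(value):
                return "float"
        return "text"  # Elasticsearch also adds a .keyword sub-field
    raise TypeError(f"not a JSON value: {value!r}")


print(infer_type("2018-07-12"), infer_type("Chan"), infer_type(False))  # date text boolean
```

The checks run in the same order as the table's string row: date detection first, then numeric detection, then the text fallback.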
- Test
// First test
PUT /mapping_test/_doc/1
{
"firstName":"Chan",
"lastName":"JackIE",
"loginDate":"2018-07-12"
}
// Quoted strings that match a date format are mapped as date; other strings are mapped as text
GET /mapping_test/_mapping
// Response
{
"mapping_test" : {
"mappings" : {
"properties" : {
"firstName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"lastName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"loginDate" : {
"type" : "date"
}
}
}
}
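}
// Second test: delete the index first, otherwise the fields mapped in the first test stay in the mapping
DELETE /mapping_test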
PUT /mapping_test/_doc/1
{
"uid":"123",
"isVip":false,
"isAdmin":"true",
"age":19,
"heigh":180
}
// isVip is mapped as boolean; uid and isAdmin (quoted strings) as text; age and heigh as long
GET /mapping_test/_mapping
// Response
{
"mapping_test" : {
"mappings" : {
"properties" : {
"age" : {
"type" : "long"
},
"heigh" : {
"type" : "long"
},
"isAdmin" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"isVip" : {
"type" : "boolean"
},
"uid" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
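Can the field types in a mapping be changed?
- Two cases
  - Adding new fields
    - dynamic set to true: as soon as a document with a new field is written, the mapping is updated
    - dynamic set to false: the mapping is not updated; the new field's data is not indexed, but it still appears in _source
    - dynamic set to strict: the document write fails
  - Existing fields: once data has been written, changing the field definition is no longer supported
    - The inverted index implemented by Lucene cannot be modified once it has been generated
- To change a field's type, you must use the Reindex API to rebuild the index
- Why
  - Changing a field's data type would make the already-indexed data unsearchable
  - Adding a new field has no such effect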
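Controlling Dynamic Mappings
dynamic | true | false | strict |
---|---|---|---|
Document can be indexed | YES | YES | NO |
Field can be indexed | YES | NO | NO |
Mapping is updated | YES | NO | NO |
- When dynamic is set to false and a document with a new field is written, the document is still indexed, but the new field is not: it is not added to the mapping and is kept only in _source
- When dynamic is set to strict, the write fails with an error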
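The table above can be read as a small decision procedure. The following toy Python model assumes a flat mapping of field name to type and only looks at top-level fields. `index_document` and `StrictDynamicMappingError` are hypothetical names, not Elasticsearch APIs.

```python
class StrictDynamicMappingError(Exception):
    """Stand-in for Elasticsearch's strict_dynamic_mapping_exception."""


def index_document(mapping, doc, dynamic="true"):
    """Toy model of how `dynamic` treats top-level fields missing from `mapping`.

    `mapping` maps field name -> type and is updated in place when dynamic is
    "true". Returns (indexed_fields, source).
    """
    new_fields = [name for name in doc if name not in mapping]
    if new_fields and dynamic == "strict":
        raise StrictDynamicMappingError(f"unmapped fields: {new_fields}")
    if dynamic == "true":
        for name in new_fields:
            mapping[name] = "text"  # simplification: real type comes from dynamic mapping
    # Only mapped fields are searchable; with dynamic "false" new fields are skipped.
    indexed = {name: value for name, value in doc.items() if name in mapping}
    return indexed, dict(doc)  # _source always keeps the whole document


mapping = {"newField": "text"}
print(index_document(mapping, {"otherField": "someValue"}, dynamic="false"))
# ({}, {'otherField': 'someValue'})
```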
Test
PUT /dynamic_mapping_test/_doc/1
{
"newField":"someValue"
}
// Searching on this field finds the document
POST /dynamic_mapping_test/_search
{
"query": {
"match": {
"newField": "someValue"
}
}
}
// Change dynamic to false
PUT /dynamic_mapping_test/_mapping
{
"dynamic":"false"
}
// Write a document containing another new field; the write succeeds
PUT /dynamic_mapping_test/_doc/1
{
"otherField":"someValue"
}
// Searching on otherField finds nothing
POST /dynamic_mapping_test/_search
{
"query": {
"match": {
"otherField": "someValue"
}
}
}
// The index's mapping has no otherField field
GET /dynamic_mapping_test/_mapping
// Response
{
"dynamic_mapping_test" : {
"mappings" : {
"dynamic" : "false",
"properties" : {
"newField" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
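}
// Note: PUT /dynamic_mapping_test/_doc/1 replaces the whole document. To add a field, include the existing fields as well; otherwise the document contains only the new field.
// Change dynamic to strict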
PUT /dynamic_mapping_test/_mapping
{
"dynamic":"strict"
}
// Writing a new field now fails with strict_dynamic_mapping_exception
PUT /dynamic_mapping_test/_doc/1
{
"otherField":"someValue"
}
Explicit mapping settings and common parameters
How to define a mapping explicitly
PUT /index
{
"mappings":{
//define your mappings here
}
}
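Suggestions for custom mappings
- You can write the mapping entirely by hand, following the API reference
- To reduce typing and the chance of errors, you can follow these steps:
  - Create a temporary index and write some sample data
  - Retrieve the temporary index's dynamic mapping through the Mapping API
  - Modify it, then use that configuration to create your index
  - Delete the temporary index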
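Controlling whether a field is indexed
- index: controls whether the current field is indexed. Defaults to true. If set to false, the field cannot be searched
  - Protects data (the field cannot be queried)
  - Saves disk space (no inverted index is created)
// Create the user index with an explicit mapping, setting index to false for the mobile field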
PUT /user
{
"mappings": {
"properties": {
"firstName":{
"type": "text"
},
"lastName":{
"type": "text"
},
"mobile":{
"type": "text",
"index": false
}
}
}
}
// Add a document
PUT /user/_doc/1
{
"firstName":"111",
"lastName":"222",
"mobile":"2222"
}
// Search on the mobile field
POST /user/_search
{
"query": {
"match": {
"mobile": "2222"
}
}
}
// Result: "failed to create query: Cannot search on field [mobile] since it is not indexed."
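- Index Options
  - Four index_options levels control what the inverted index records:
    - docs: the doc id
    - freqs: the doc id and term frequencies
    - positions: the doc id, term frequencies and term positions
    - offsets: the doc id, term frequencies, term positions and character offsets
  - text fields record positions by default; other term-based types such as keyword record docs by default
  - The more that is recorded, the more storage space is used
// The user index already exists from the previous example, so delete it first
DELETE /user
PUT /user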
{
"mappings": {
"properties": {
"firstName":{
"type": "text"
},
"lastName":{
"type": "text",
"index_options": "positions"
},
"mobile":{
"type": "text",
"index": false
}
}
}
}
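The following toy inverted index shows what each index_options level keeps per term and document. It is a Python sketch that assumes whitespace tokenization with lowercasing, a simplification of real analysis. `build_postings` is a hypothetical helper, not Lucene's data structure.

```python
import re

LEVELS = ["docs", "freqs", "positions", "offsets"]


def build_postings(docs, index_options="positions"):
    """Toy inverted index showing what each index_options level records.

    docs maps doc id -> text; tokens are whitespace-separated and lowercased
    (a simplification of real analysis).
    """
    level = LEVELS.index(index_options)
    postings = {}
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\S+", text)):
            entry = postings.setdefault(match.group().lower(), {}).setdefault(doc_id, {})
            if level >= 1:  # freqs
                entry["freq"] = entry.get("freq", 0) + 1
            if level >= 2:  # positions
                entry.setdefault("positions", []).append(position)
            if level >= 3:  # offsets
                entry.setdefault("offsets", []).append((match.start(), match.end()))
    return postings


print(build_postings({1: "a b a"}, "offsets")["a"])
# {1: {'freq': 2, 'positions': [0, 2], 'offsets': [(0, 1), (4, 5)]}}
```

Each level adds to the previous one, which is why storage grows with the level. Phrase queries need positions, and offsets can speed up highlighting.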
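- null_value
  - Use it when null values need to be searchable
  - text fields do not support null_value; keyword does (as do other exact-value types such as numeric, date and boolean)
// Setting null_value on a text field fails
PUT /person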
{
"mappings": {
"properties": {
"firstName":{
"type": "text"
},
"lastName":{
"type": "text"
},
"mobile":{
"type": "text",
"null_value": "Null"
}
}
}
}
// Error: "Mapping definition for [mobile] has unsupported parameters: [null_value : Null]"
// Map mobile as keyword instead
PUT /person
{
"mappings": {
"properties": {
"firstName":{
"type": "text"
},
"lastName":{
"type": "text"
},
"mobile":{
"type": "keyword",
"null_value": "NULL"
}
}
}
}
// Index a document whose mobile is null
PUT /person/_doc/1
{
"firstName":"nie",
"lastName":"peng",
"mobile":null
}
// Querying mobile for "NULL" returns the document
POST /person/_search
{
"query":{
"match": {
"mobile": "NULL"
}
}
}
- Array type
  - Elasticsearch has no dedicated array type, but any field can contain zero or more values of the same type
PUT /person/_doc/2
{
"firstName":"nie",
"lastName":"peng",
"mobile":["111","222","333"]
}
// The field's type does not change
GET /person/_mapping
// Response
{
"person" : {
"mappings" : {
"properties" : {
"firstName" : {
"type" : "text"
},
"lastName" : {
"type" : "text"
},
"mobile" : {
"type" : "keyword",
"null_value" : "NULL"
}
}
}
}
}
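The null_value and array behaviour above can be summarized as "which terms end up in the index". The following Python sketch models this for a keyword field and ignores ignore_above. `field_terms` is a hypothetical helper. It follows the rule that an empty array contains no explicit null, so it indexes nothing.

```python
def field_terms(value, null_value=None):
    """Toy model of the terms a keyword field indexes for one JSON value.

    Each array element contributes its own term; an explicit null is indexed
    as `null_value` when one is configured and is skipped otherwise.
    """
    values = value if isinstance(value, list) else [value]
    terms = []
    for item in values:
        if item is None:
            if null_value is not None:
                terms.append(null_value)
        else:
            terms.append(str(item))
    return terms


print(field_terms(None, null_value="NULL"))   # ['NULL']
print(field_terms(["111", "222", "333"]))     # ['111', '222', '333']
print(field_terms([], null_value="NULL"))     # []
```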
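- copy_to settings
  - In 7.x, _all has been replaced by copy_to
  - Serves specific search needs, such as searching several fields as one
  - copy_to copies a field's value into a target field, achieving an effect similar to _all
  - The copy_to target field does not appear in _source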
PUT users
{
"mappings": {
"properties": {
"firstName":{
"type": "text",
"copy_to": "fullName"
},
"lastName":{
"type": "text",
"copy_to": "fullName"
}
}
}
}
PUT users/_doc/1
{
"firstName":"Ruan",
"lastName":"Yiming"
}
// Search the combined field with a URI search
GET /users/_search?q=fullName:(Ruan Yiming)
// Or with a match query that requires both terms
POST users/_search
{
"query": {
"match": {
"fullName": {
"query": "Ruan Yiming",
"operator": "and"
}
}
}
}
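The following Python sketch illustrates the copy_to example. It assumes the copied values are simply appended to the target field's indexed values, while the stored _source stays unchanged. `apply_copy_to` and `match_and` are hypothetical helpers. The second one is a crude whitespace-split model of a match query with operator "and".

```python
def apply_copy_to(source, copy_to):
    """Toy model of copy_to: build the indexed view, leaving _source untouched.

    copy_to maps a source field name -> target field name.
    """
    indexed = {field: [value] for field, value in source.items()}
    for field, target in copy_to.items():
        if field in source:
            indexed.setdefault(target, []).append(source[field])
    return indexed


def match_and(indexed, field, query):
    """match query with operator "and": every query term must occur in the field."""
    terms = {t.lower() for value in indexed.get(field, []) for t in value.split()}
    return all(q.lower() in terms for q in query.split())


source = {"firstName": "Ruan", "lastName": "Yiming"}
indexed = apply_copy_to(source, {"firstName": "fullName", "lastName": "fullName"})
print(indexed["fullName"])                            # ['Ruan', 'Yiming']
print(match_and(indexed, "fullName", "Ruan Yiming"))  # True
print("fullName" in source)                           # False
```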
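Multi-field feature
When defining a mapping manually, you can add sub-fields to a field and specify an analyzer for each
Exact Values vs Full Text
- Exact values: numbers, dates, or a specific string (e.g. "Apple Store")
  - keyword in Elasticsearch
  - Exact values do not need to be analyzed; the whole value goes into the field's inverted index as a single term
  - Indexing exact values requires no special tokenization
- Full text: unstructured text data
  - text in Elasticsearch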
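The following Python sketch shows one value indexed as both full text and an exact value, mirroring the text plus `.keyword` sub-field that dynamic mapping produced earlier. `standard_like` is a very rough stand-in for the standard analyzer, not its real rules. All function names here are hypothetical.

```python
import re


def standard_like(text):
    """Very rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def keyword_like(text, ignore_above=256):
    """keyword: the whole value is one exact term; longer values are not indexed."""
    return [text] if len(text) <= ignore_above else []


def index_multi_field(field, value):
    """Index one value twice, like the dynamic default of text + .keyword sub-field."""
    return {field: standard_like(value), f"{field}.keyword": keyword_like(value)}


terms = index_multi_field("company", "Apple Store")
print(terms)  # {'company': ['apple', 'store'], 'company.keyword': ['Apple Store']}
print("apple" in terms["company"], "apple" in terms["company.keyword"])  # True False
```

A full-text query on `company` can match "apple", while only the exact string "Apple Store" matches `company.keyword`.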
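Custom analyzers
- When Elasticsearch's built-in analyzers are not enough, you can define a custom analyzer by combining different components
- Character Filter
  - Processes the text before the tokenizer, e.g. adding, removing or replacing characters. Multiple character filters can be configured. They can affect the position and offset information produced by the tokenizer
  - Some built-in character filters
    - HTML strip: removes HTML tags
    - Mapping: string replacement
    - Pattern replace: regex match and replace
- Tokenizer
  - Splits the raw text into terms (tokens) according to certain rules
  - Built-in tokenizers
    - whitespace (splits on whitespace) / standard / uax_url_email / pattern / keyword (no tokenization) / path_hierarchy (splits along file-path levels)
  - You can develop a plugin in Java to implement your own tokenizer
- Token Filter
  - Adds, modifies or deletes the terms output by the tokenizer
  - Built-in token filters
    - lowercase / stop (removes stop words) / synonym (adds synonyms)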
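The three-stage pipeline above can be sketched in plain Python. Character filters run first, then one tokenizer, then the token filters in order. This is a simplified model, not Elasticsearch's implementation. The stop-word set is assumed to mirror Lucene's default English list. `html_strip` here is a crude regex, and all function names are hypothetical.

```python
import re

# Assumed to mirror Lucene's default English stop-word list.
ENGLISH_STOP = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
}


def html_strip(text):
    """Crude char filter: drop tags (the real html_strip also decodes entities)."""
    return re.sub(r"<[^>]*>", "", text)


def whitespace_tokenizer(text):
    return text.split()


def lowercase(tokens):
    return [t.lower() for t in tokens]


def stop(tokens):
    # Case-sensitive, like the stop filter's default ignore_case=false.
    return [t for t in tokens if t not in ENGLISH_STOP]


def analyze(text, char_filters=(), tokenizer=whitespace_tokenizer, token_filters=()):
    """Character filters -> one tokenizer -> token filters, applied in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


sentence = "The rain in Spain falls mainly on the plain."
print(analyze(sentence, token_filters=[lowercase, stop]))
# ['rain', 'spain', 'falls', 'mainly', 'plain.']
print(analyze(sentence, token_filters=[stop, lowercase]))
# ['the', 'rain', 'spain', 'falls', 'mainly', 'plain.']
```

Swapping the two filters changes the result: when stop runs first, the capitalized "The" is not in the stop list and survives.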
// Character filter: html_strip, tokenizer: keyword
POST _analyze
{
"tokenizer": "keyword",
"char_filter": ["html_strip"],
"text":"<b>hello word</b>"
}
// Result
{
"tokens" : [
{
"token" : "hello word",
"start_offset" : 3,
"end_offset" : 17,
"type" : "word",
"position" : 0
}
]
}
// Character filter: mapping ("-" => "_"), tokenizer: standard
POST _analyze
{
"tokenizer": "standard",
"char_filter": [{
"type":"mapping",
"mappings": ["- => _"]
}],
"text":"123-456,I-test? test-990 650-555-1234"
}
// Result
{
"tokens" : [
{
"token" : "123_456",
"start_offset" : 0,
"end_offset" : 7,
"type" : "<NUM>",
"position" : 0
},
{
"token" : "I_test",
"start_offset" : 8,
"end_offset" : 14,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "test_990",
"start_offset" : 16,
"end_offset" : 24,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "650_555_1234",
"start_offset" : 25,
"end_offset" : 37,
"type" : "<NUM>",
"position" : 3
}
]
}
// Character filter: mapping (emoticons to words), tokenizer: standard
POST _analyze
{
"tokenizer": "standard",
"char_filter": [{
"type":"mapping",
"mappings": [":) => happy",":( => sad"]
}],
"text":["I an felling :)","Feeling :( today "]
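}
// Result: "Feeling" jumps from position 3 to 104 because the two array elements are separated by a position_increment_gap of 100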
{
"tokens" : [
{
"token" : "I",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "an",
"start_offset" : 2,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "felling",
"start_offset" : 5,
"end_offset" : 12,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "happy",
"start_offset" : 13,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "Feeling",
"start_offset" : 16,
"end_offset" : 23,
"type" : "<ALPHANUM>",
"position" : 104
},
{
"token" : "sad",
"start_offset" : 24,
"end_offset" : 26,
"type" : "<ALPHANUM>",
"position" : 105
},
{
"token" : "today",
"start_offset" : 27,
"end_offset" : 32,
"type" : "<ALPHANUM>",
"position" : 106
}
]
}
// Regular expression replacement (pattern_replace)
GET _analyze
{
"tokenizer": "standard",
"char_filter": [{
"type":"pattern_replace",
"pattern": "http://(.*)",
"replacement":"$1"
}],
"text":"http://www.elastic.co"
}
// Result
{
"tokens" : [
{
"token" : "www.elastic.co",
"start_offset" : 0,
"end_offset" : 21,
"type" : "<ALPHANUM>",
"position" : 0
}
]
}
// Split by path (path_hierarchy)
GET _analyze
{
"tokenizer": "path_hierarchy",
"text":"/user/a/b/c/d"
}
// Result
{
"tokens" : [
{
"token" : "/user",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 0
},
{
"token" : "/user/a",
"start_offset" : 0,
"end_offset" : 7,
"type" : "word",
"position" : 0
},
{
"token" : "/user/a/b",
"start_offset" : 0,
"end_offset" : 9,
"type" : "word",
"position" : 0
},
{
"token" : "/user/a/b/c",
"start_offset" : 0,
"end_offset" : 11,
"type" : "word",
"position" : 0
},
{
"token" : "/user/a/b/c/d",
"start_offset" : 0,
"end_offset" : 13,
"type" : "word",
"position" : 0
}
]
}
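The path_hierarchy output above follows a simple rule: emit every prefix that ends just before a delimiter, then the whole path, all starting at offset 0. The following Python sketch models this. It ignores the tokenizer's `replacement`, `skip` and `reverse` options, and `path_hierarchy` here is a hypothetical function, not the tokenizer itself.

```python
def path_hierarchy(path, delimiter="/"):
    """Emit every prefix of `path` that ends just before a delimiter, plus the full path.

    Returns (token, start_offset, end_offset) tuples. Simplified: ignores the
    tokenizer's replacement, skip and reverse options.
    """
    tokens = [(path[:i], 0, i) for i, ch in enumerate(path) if ch == delimiter and i > 0]
    tokens.append((path, 0, len(path)))
    return tokens


for token in path_hierarchy("/user/a/b/c/d"):
    print(token)
# ('/user', 0, 5)
# ('/user/a', 0, 7)
# ('/user/a/b', 0, 9)
# ('/user/a/b/c', 0, 11)
# ('/user/a/b/c/d', 0, 13)
```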
// whitespace tokenizer + stop filter: the stop filter is case-sensitive by default, so "The" is kept
GET _analyze
{
"tokenizer": "whitespace",
"filter": ["stop"],
"text":"The rain in Spain falls mainly on the plain."
}
// Result
{
"tokens" : [
{
"token" : "The",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 0
},
{
"token" : "rain",
"start_offset" : 4,
"end_offset" : 8,
"type" : "word",
"position" : 1
},
{
"token" : "Spain",
"start_offset" : 12,
"end_offset" : 17,
"type" : "word",
"position" : 3
},
{
"token" : "falls",
"start_offset" : 18,
"end_offset" : 23,
"type" : "word",
"position" : 4
},
{
"token" : "mainly",
"start_offset" : 24,
"end_offset" : 30,
"type" : "word",
"position" : 5
},
{
"token" : "plain.",
"start_offset" : 38,
"end_offset" : 44,
"type" : "word",
"position" : 8
}
]
}
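// whitespace + lowercase + stop: after lowercasing, "The" becomes "the" and is removed as a stop word. The order of the filters matters: lowercasing before or after stop-word removal gives different results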
GET _analyze
{
"tokenizer": "whitespace",
"filter": ["lowercase","stop"],
"text":"The rain in Spain falls mainly on the plain."
}
// Result
{
"tokens" : [
{
"token" : "rain",
"start_offset" : 4,
"end_offset" : 8,
"type" : "word",
"position" : 1
},
{
"token" : "spain",
"start_offset" : 12,
"end_offset" : 17,
"type" : "word",
"position" : 3
},
{
"token" : "falls",
"start_offset" : 18,
"end_offset" : 23,
"type" : "word",
"position" : 4
},
{
"token" : "mainly",
"start_offset" : 24,
"end_offset" : 30,
"type" : "word",
"position" : 5
},
{
"token" : "plain.",
"start_offset" : 38,
"end_offset" : 44,
"type" : "word",
"position" : 8
}
]
}
- Defining a custom analyzer for a specific index
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
// Name of the custom analyzer
"my_custom_analyzer":{
// Type custom marks this as a custom analyzer
"type":"custom",
// Uses the custom char_filter named emoticons
"char_filter":["emoticons"],
// Uses the custom tokenizer named punctuation
"tokenizer":"punctuation",
// Uses the built-in lowercase filter and the custom filter named english_stop
"filter":["lowercase","english_stop"]
}
},
// Define the punctuation tokenizer here
"tokenizer": {
"punctuation":{
"type":"pattern",
"pattern":"[ .,!?]"
}
},
// Define the emoticons char_filter here
"char_filter": {
"emoticons":{
"type":"mapping",
"mappings":[
":) => happy",":( => sad"
]
}
},
// Define the english_stop filter here
"filter": {
"english_stop":{
"type":"stop",
"stopwords":"_english_"
}
}
}
}
}
// Test the custom analyzer
POST my_index/_analyze
{
"analyzer": "my_custom_analyzer",
"text":"I'm a :) person ,and you??"
}
// Result
{
"tokens" : [
{
"token" : "i'm",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 0
},
{
"token" : "happy",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 2
},
{
"token" : "person",
"start_offset" : 9,
"end_offset" : 15,
"type" : "word",
"position" : 3
},
{
"token" : "you",
"start_offset" : 21,
"end_offset" : 24,
"type" : "word",
"position" : 5
}
]
}
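The following Python sketch reproduces the positions in the result above under stated assumptions. The mapping char filter is modelled with plain string replacement. The pattern tokenizer drops empty pieces. The stop-word set is assumed to mirror Lucene's English list. `my_custom_analyzer` here is a hypothetical Python function, not the Elasticsearch analyzer.

```python
import re

EMOTICONS = {":)": "happy", ":(": "sad"}  # the "emoticons" mapping char filter
# Assumed to mirror Lucene's default English stop-word list ("_english_").
ENGLISH_STOP = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
}


def my_custom_analyzer(text):
    """Emulate my_custom_analyzer, returning (token, position) pairs."""
    for source, target in EMOTICONS.items():  # char_filter: emoticons
        text = text.replace(source, target)
    # tokenizer: pattern "[ .,!?]" splits on those characters; empty pieces are dropped
    pieces = [p for p in re.split(r"[ .,!?]", text) if p]
    result = []
    for position, token in enumerate(pieces):
        token = token.lower()  # filter: lowercase
        if token not in ENGLISH_STOP:  # filter: english_stop
            result.append((token, position))  # removed tokens still use up a position
    return result


print(my_custom_analyzer("I'm a :) person ,and you??"))
# [("i'm", 0), ('happy', 2), ('person', 3), ('you', 5)]
```

The gaps at positions 1 and 4 are the removed stop words "a" and "and".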
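Index Template
- An index template helps you set mappings and settings and applies them automatically, according to matching rules, to newly created indices
- A template only takes effect when an index is newly created. Modifying a template does not affect indices that already exist
- You can define multiple index templates; their settings are "merged" together
- You can specify an "order" value to control the merging
How index templates work
- When an index is newly created:
  - Apply Elasticsearch's default settings and mappings
  - Apply the settings from index templates with a lower order value
  - Apply the settings from index templates with a higher order value, overriding the earlier settings
  - Apply the settings and mappings the user specified when creating the index, overriding those from the templates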
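The four steps above amount to a layered dictionary merge. The following Python sketch models it for flat settings only. Real mapping merges are deep, which this ignores. `resolve_settings` is a hypothetical helper. The shell-style wildcard matching from `fnmatch` stands in for Elasticsearch's index-pattern matching.

```python
from fnmatch import fnmatchcase


def resolve_settings(index_name, templates, request_settings=None, defaults=None):
    """Toy model of legacy index-template resolution for flat settings."""
    merged = dict(defaults or {})  # 1. defaults
    matching = [t for t in templates
                if any(fnmatchcase(index_name, p) for p in t["index_patterns"])]
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        merged.update(template.get("settings", {}))  # 2./3. low order first, high order overrides
    merged.update(request_settings or {})  # 4. settings in the create request win
    return merged


TEMPLATES = [
    {"index_patterns": ["test*"], "order": 1,
     "settings": {"number_of_shards": 1, "number_of_replicas": 2}},
    {"index_patterns": ["*"], "order": 0,
     "settings": {"number_of_shards": 1, "number_of_replicas": 1}},
]
DEFAULTS = {"number_of_shards": 1, "number_of_replicas": 1}  # 7.x defaults

print(resolve_settings("testtemplate", TEMPLATES, defaults=DEFAULTS))
# {'number_of_shards': 1, 'number_of_replicas': 2}
print(resolve_settings("testmy", TEMPLATES, {"number_of_replicas": 5}, DEFAULTS))
# {'number_of_shards': 1, 'number_of_replicas': 5}
```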
// Define an index template named template_default
PUT _template/template_default
{
// Every index matches this template on creation: 1 primary shard and 1 replica
"index_patterns": ["*"],
"order": 0,
"version": 1,
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
}
}
// Define an index template named template_test
PUT _template/template_test
{
// Indices whose names start with test get 1 primary shard and 2 replicas; strings matching a date format are not mapped as date, and strings containing a number are mapped as a numeric type (e.g. long)
"index_patterns": ["test*"],
"order": 1,
"version": 1,
"settings": {
"number_of_shards": 1,
"number_of_replicas": 2
},
"mappings": {
"date_detection":false,
"numeric_detection": true
}
}
// View template information
GET /_template/template_default
// View all templates whose names match a wildcard
GET /_template/temp*
// Write a new document
PUT testtemplate/_doc/1
{
"someNumber":"1",
"someDate":"2019/01/01"
}
// View the mapping
GET testtemplate/_mapping
// Response: following the index template rules, someDate stays text and someNumber becomes long
{
"testtemplate" : {
"mappings" : {
"date_detection" : false,
"numeric_detection" : true,
"properties" : {
"someDate" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"someNumber" : {
"type" : "long"
}
}
}
}
}
// View the settings
GET testtemplate/_settings
// Response: number_of_replicas is 2 because template_test (order 1) overrides the catch-all template (order 0)
{
"testtemplate" : {
"settings" : {
"index" : {
"creation_date" : "1648890629178",
"number_of_shards" : "1",
"number_of_replicas" : "2",
"uuid" : "qAzSpbNkSsuTwgFdbGC5NQ",
"version" : {
"created" : "7080099"
},
"provided_name" : "testtemplate"
}
}
}
}
// Specify settings when creating the index (these override the templates)
PUT testmy
{
"settings":{
"number_of_replicas": 5
}
}
// Add a document
PUT testmy/_doc/1
{
"key":"value"
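}
// View this index's settings
GET testmy/_settings
// Response: number_of_replicas is 5, from the create request; number_of_shards still comes from the templates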
{
"testmy" : {
"settings" : {
"index" : {
"creation_date" : "1648890784839",
"number_of_shards" : "1",
"number_of_replicas" : "5",
"uuid" : "Jn0moCWYRZWBKwWC_brT2A",
"version" : {
"created" : "7080099"
},
"provided_name" : "testmy"
}
}
}
}
// View this index's mapping
GET testmy/_mapping
// Response: date_detection and numeric_detection come from template_test
{
"testmy" : {
"mappings" : {
"date_detection" : false,
"numeric_detection" : true,
"properties" : {
"key" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
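Dynamic Template
Index templates apply to any newly created index whose name matches their patterns; a dynamic template is defined in, and applies to, one specific index
- Sets field types dynamically, based on the data type Elasticsearch detects combined with the field name, for example:
  - Map all strings to keyword, or disable the keyword sub-field
  - Map fields whose names start with is to boolean
  - Map fields whose names start with long_ to long
- A dynamic template is defined in the mapping of a specific index
- Each template has a name
- The rules form an array (dynamic_templates) and are checked in order; the first matching template wins
- The template sets the mapping for the fields it matches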
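The matching described above can be modelled as "walk the array, return the first rule whose conditions all hold". The following Python sketch assumes shell-style wildcards (`fnmatch`) behave like Elasticsearch's simple `*` patterns. It does not model `match_pattern: regex`. `pick_dynamic_template` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def pick_dynamic_template(templates, field_path, detected_type):
    """Return (name, mapping) of the first dynamic template that matches, else None."""
    field_name = field_path.rsplit(".", 1)[-1]
    for entry in templates:
        name, rule = next(iter(entry.items()))
        wanted_type = rule.get("match_mapping_type", "*")
        if wanted_type not in ("*", detected_type):
            continue
        if "match" in rule and not fnmatchcase(field_name, rule["match"]):
            continue
        if "unmatch" in rule and fnmatchcase(field_name, rule["unmatch"]):
            continue
        if "path_match" in rule and not fnmatchcase(field_path, rule["path_match"]):
            continue
        if "path_unmatch" in rule and fnmatchcase(field_path, rule["path_unmatch"]):
            continue
        return name, rule["mapping"]
    return None


TEMPLATES = [
    {"string_as_boolean": {"match_mapping_type": "string", "match": "is*",
                           "mapping": {"type": "boolean"}}},
    {"string_as_keywords": {"match_mapping_type": "string",
                            "mapping": {"type": "keyword"}}},
]
print(pick_dynamic_template(TEMPLATES, "isVIP", "string"))
# ('string_as_boolean', {'type': 'boolean'})
print(pick_dynamic_template(TEMPLATES, "firstName", "string"))
# ('string_as_keywords', {'type': 'keyword'})
```

Because the array is checked in order, putting the catch-all string rule first would shadow the `is*` rule.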
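// my_index already exists from the custom analyzer example; delete it first
DELETE my_index
PUT my_index
{
"mappings": {
// Map every field under name except middle as text, and copy its value into full_name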
"dynamic_templates":[
{
"full_name":{
"path_match":"name.*",
"path_unmatch":"*.middle",
"mapping":{
"type":"text",
"copy_to": "full_name"
}
}
}
]
}
}
// Add a document
PUT my_index/_doc/1
{
"name":{
"first":"John",
"middle":"Winston",
"last":"Lennon"
}
}
// Search via full_name
GET my_index/_search?q=full_name:John
// Delete the index again before trying another set of dynamic templates
DELETE my_index
PUT my_index
{
"mappings": {
"dynamic_templates":[
{
"string_as_boolean":{
//possible values are [object, string, long, double, boolean, date, binary]
"match_mapping_type":"string",
"match":"is*",
"mapping":{
"type":"boolean"
}
}
},
{
"string_as_keywords":{
"match_mapping_type":"string",
"mapping":{
"type":"keyword"
}
}
}
]
}
}
// Add a document
PUT my_index/_doc/1
{
"firstName":"Ruan",
"isVIP":"true"
}
GET my_index/_mapping
// Response: isVIP matches string_as_boolean first; firstName falls through to string_as_keywords
{
"my_index" : {
"mappings" : {
"dynamic_templates" : [
{
"string_as_boolean" : {
"match" : "is*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "boolean"
}
}
},
{
"string_as_keywords" : {
"match_mapping_type" : "string",
"mapping" : {
"type" : "keyword"
}
}
}
],
"properties" : {
"firstName" : {
"type" : "keyword"
},
"isVIP" : {
"type" : "boolean"
}
}
}
}
}