ElasticSearch 6.x Study Notes: 14. Mapping Parameters

人在囧途^o^

Original article: https://blog.csdn.net/chengyuqiang/article/details/79059958

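14.1 Overview of mapping parameters

Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/mapping-params.html
ElasticSearch provides a rich set of mapping parameters for configuring how a field is mapped, for example the field's analyzer, its boost (weight), its date format, and its similarity (scoring) model.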

14.2 analyzer

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/analyzer.html

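Specifies the analyzer used for the field. It applies at both index time and search time, unless a separate search_analyzer is set. The example below configures the ik analyzers, which come from the third-party IK Analysis plugin and require that plugin to be installed.
(1) Create the index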

DELETE my_index
PUT my_index

(2) Analyze text with ik_smart

GET my_index/_analyze
{
  "analyzer": "ik_smart",
  "text":"安徽省长江流域"
}

{
  "tokens": [
    {
      "token": "安徽省",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "长江流域",
      "start_offset": 3,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}

(3) Define the mapping

POST my_index/fulltext/_mapping
{
  "properties": {
      "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
      }
  }
}

(4) Index some documents

POST my_index/fulltext/1
{"content":"美国留给伊拉克的是个烂摊子吗"}

POST my_index/fulltext/2
{"content":"公安部：各地校车将享最高路权"}

POST my_index/fulltext/3
{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}

POST my_index/fulltext/4
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}

(5) Search

POST /my_index/fulltext/_search
{
    "query" : { "match" : { "content" : "中国" }}
}

Search result:

{
  "took": 135,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6489038,
    "hits": [
      {
        "_index": "my_index",
        "_type": "fulltext",
        "_id": "4",
        "_score": 0.6489038,
        "_source": {
          "content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
        }
      },
      {
        "_index": "my_index",
        "_type": "fulltext",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "content": "中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"
        }
      }
    ]
  }
}
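To make the index-time/search-time split concrete, here is a minimal Python sketch. It does not reproduce the ik plugin: it assumes a hypothetical `toy_analyzer` (lowercase plus whitespace split) and only illustrates that the `analyzer` is applied to documents when they are indexed, the `search_analyzer` is applied to the query text, and a `match` query (default OR operator) hits a document when the two token sets overlap.

```python
from collections import defaultdict

def toy_analyzer(text):
    # Hypothetical stand-in for a real analyzer such as ik_max_word:
    # lowercase the text and split it on whitespace.
    return text.lower().split()

def build_index(docs, analyzer):
    # Index time: the field's `analyzer` turns each document into tokens,
    # and the inverted index records which documents contain each token.
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyzer(text):
            inverted[token].add(doc_id)
    return inverted

def match(inverted, query, search_analyzer):
    # Search time: the `search_analyzer` turns the query text into tokens;
    # with the OR operator a document matches if it contains any of them.
    hits = set()
    for token in search_analyzer(query):
        hits |= inverted.get(token, set())
    return sorted(hits)

docs = {1: "Quick brown fox", 2: "Lazy dog", 3: "Brown bear"}
index = build_index(docs, analyzer=toy_analyzer)
print(match(index, "BROWN", search_analyzer=toy_analyzer))  # [1, 3]
```

Because both sides go through the same analyzer, "BROWN" in the query and "Brown" in document 3 become the same token. This is why search_analyzer is normally the same as, or compatible with, analyzer, as in the ik_max_word mapping above.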

14.3 normalizer

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/normalizer.html

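A normalizer normalizes the value of a keyword field before it is indexed, for example by converting all characters to lowercase. It works like an analyzer except that it always produces a single token, and it is also applied to the query text when the field is searched with a query such as match.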

DELETE my_index

PUT my_index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

PUT my_index/type/1 
{"foo": "BÀR"}

PUT my_index/type/2
{"foo": "bar"}

PUT my_index/type/3
{"foo": "baz"}

POST my_index/_refresh

GET my_index/_search
{
  "query": {
    "match": {
      "foo": "BAR"
    }
  }
}

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "type",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "foo": "bar"
        }
      },
      {
        "_index": "my_index",
        "_type": "type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "foo": "BÀR"
        }
      }
    ]
  }
}
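Why do both "BÀR" and "bar" match the query "BAR"? The normalizer turns every stored value into one term at index time and turns the match query text into a term at search time, so all three end up as "bar". The Python sketch below is a rough approximation of the lowercase + asciifolding chain: it strips accents via Unicode NFKD decomposition, which covers this example but not everything Lucene's ASCII folding filter handles. `my_normalizer` here is just a Python function named after the mapping's normalizer.

```python
import unicodedata

def my_normalizer(value):
    # Step 1, "lowercase": BÀR -> bàr
    lowered = value.lower()
    # Step 2, "asciifolding" (approximation): decompose à into a + combining
    # grave accent, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

docs = {1: "BÀR", 2: "bar", 3: "baz"}
# Index time: each keyword value becomes a single normalized term.
terms = {doc_id: my_normalizer(value) for doc_id, value in docs.items()}
# Search time: the match query text goes through the same normalizer.
query_term = my_normalizer("BAR")
hits = sorted(doc_id for doc_id, term in terms.items() if term == query_term)
print(hits)  # [1, 2]
```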

14.4 boost

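The official documentation recommends specifying boost at query time; index-time boosting in the mapping is deprecated.

By specifying a boost value we can control the relative weight of each query clause. The default is 1; a boost greater than 1 increases the relative weight of that clause.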

DELETE my_index

PUT my_index

PUT my_index/my_type/1
{
  "title":"quick brown fox"

}

POST _search
{
    "query": {
        "match" : {
            "title": {
                "query": "quick brown fox",
                "boost": 2
            }
        }
    }
}

Search result:

{
  "took": 48,
  "timed_out": false,
  "_shards": {
    "total": 45,
    "successful": 45,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.7260926,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1.7260926,
        "_source": {
          "title": "quick brown fox"
        }
      }
    ]
  }
}

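The boost parameter increases the relative weight of a clause (boost greater than 1) or decreases it (boost between 0 and 1). The classic explanation, written for older TF/IDF scoring, is that the score is normalized after the boost is applied, so a boost of 2 does not double the final score, and each kind of query has its own normalization.

Lucene 7, on which ES 6.x is built, no longer applies this query normalization. For a single query like the one above the boost therefore acts as a multiplier: each of the three terms contributes about 0.2877, which gives about 0.8630 without the boost and 1.7261 with a boost of 2. In a compound query, boosting one clause still does not double the overall score, because the scores of the other clauses are added to it. In every case, a higher boost produces a higher score for that clause.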
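For this single match query, the max_score can be checked by hand. The sketch assumes Lucene 7's BM25 formula with its default parameters (k1 = 1.2, b = 0.75), and relies on two properties of this example: the shard holding the document contains only this one document, and the title's length (3 terms) equals the average length. Under those assumptions each query term scores ln(4/3) ≈ 0.2877, the same value returned for single-term matches elsewhere in these notes. `bm25_term_score` is a hypothetical helper name.

```python
import math

def bm25_term_score(freq, doc_freq, doc_count, field_len, avg_field_len, k1=1.2, b=0.75):
    # idf as in Lucene 7's BM25Similarity
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # term-frequency normalization (Lucene 7 still includes the (k1 + 1) factor)
    tf_norm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * field_len / avg_field_len))
    return idf * tf_norm

# "quick brown fox": 3 query terms, each appearing once in a 3-term title,
# in a shard that holds a single document.
per_term = bm25_term_score(freq=1, doc_freq=1, doc_count=1, field_len=3, avg_field_len=3)
unboosted = 3 * per_term
boosted = 2 * unboosted
print(round(per_term, 7), round(unboosted, 7), round(boosted, 7))  # 0.2876821 0.8630462 1.7260924
```

The computed 1.7260924 differs from the reported 1.7260926 only in the last digit; Lucene computes scores in single precision.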

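14.5 coerce

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/coerce.html#coerce

The coerce parameter cleans up dirty data and defaults to true. For example, the integer 5 might be sent as the string "5" or as the floating-point number 5.0. With coerce enabled:

Strings are coerced to integers
Floating-point numbers are truncated to integers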

(1) Recreate my_index

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer"
        },
        "number_two": {
          "type": "integer",
          "coerce": false
        }
      }
    }
  }
}

(2) Index a test document; number_one coerces the string "10", so indexing succeeds

PUT my_index/my_type/1
{
  "number_one": "10" 
}

{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

(3) Index another test document; number_two has coerce disabled, so the string "10" is rejected

PUT my_index/my_type/2
{
  "number_two": "10" 
}

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse [number_two]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse [number_two]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Integer value passed as String"
    }
  },
  "status": 400
}
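The difference between the two fields can be modeled with a small Python function. This is a simplified sketch of the rules listed above (strings parsed, floats truncated), not Elasticsearch's actual parsing code; `parse_integer_field` is a hypothetical name, and with coerce disabled the model accepts only genuine integers, whereas the real parser may be more lenient in some edge cases.

```python
def parse_integer_field(value, coerce=True):
    """Simplified model of an integer field receiving a JSON value (not ES source code)."""
    if isinstance(value, bool):
        # JSON booleans are not numbers
        raise ValueError("boolean value passed to integer field")
    if isinstance(value, int):
        return value
    if not coerce:
        # With coerce disabled this model accepts only real integers,
        # mirroring the rejection of "10" for number_two above.
        raise ValueError(f"{type(value).__name__} value passed to integer field")
    if isinstance(value, str):
        # Strings are parsed as numbers: "10" -> 10
        return int(float(value))
    if isinstance(value, float):
        # Floating-point numbers are truncated: 5.9 -> 5
        return int(value)
    raise ValueError(f"cannot parse {value!r} as an integer")

print(parse_integer_field("10"))  # 10 (like number_one)
print(parse_integer_field(5.9))   # 5
```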

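14.6 copy_to

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/copy-to.html
The copy_to parameter lets you build a custom _all field. In other words, the values of several fields can be combined into one "super field"; for example, first_name and last_name can be combined into a full_name field.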

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "first_name": "John",
  "last_name": "Smith"
}

GET my_index/_search
{
  "query": {
    "match": {
      "full_name": { 
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}

{
  "took": 22,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "first_name": "John",
          "last_name": "Smith"
        }
      }
    ]
  }
}
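A rough Python model of what happens above: at index time the values of first_name and last_name are also indexed under full_name, and with "operator": "and" every query term must occur in that field. The lowercase/whitespace `analyze` function is an assumption standing in for the standard analyzer, and the helper names are hypothetical. The copied values are only indexed; _source is not modified, which is why the hit shows only first_name and last_name.

```python
def analyze(text):
    # Stand-in for the standard analyzer: lowercase + whitespace split.
    return text.lower().split()

def index_with_copy_to(doc, copy_to):
    # copy_to maps each source field to the target field it is copied into.
    indexed = {field: analyze(value) for field, value in doc.items()}
    for source, target in copy_to.items():
        if source in doc:
            indexed.setdefault(target, []).extend(analyze(doc[source]))
    return indexed

def match_and(indexed, field, query):
    # operator "and": every query term must appear in the field.
    terms = set(indexed.get(field, []))
    return all(term in terms for term in analyze(query))

doc = {"first_name": "John", "last_name": "Smith"}
indexed = index_with_copy_to(doc, {"first_name": "full_name", "last_name": "full_name"})
print(indexed["full_name"])                            # ['john', 'smith']
print(match_and(indexed, "full_name", "John Smith"))   # True
print(match_and(indexed, "first_name", "John Smith"))  # False
```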

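14.7 doc_values

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/doc-values.html
doc_values speed up sorting and aggregations: alongside the inverted index, an additional column-oriented structure (document to value) is built at index time, trading disk space for speed. Doc values are enabled by default for every field type that supports them (all types except analyzed text). For fields that will definitely never be sorted or aggregated on, they can be disabled.

DELETE my_index

PUT my_index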
{
  "mappings": {
    "my_type": {
      "properties": {
        "status_code": { 
          "type":       "keyword"
        },
        "session_id": { 
          "type":       "keyword",
          "doc_values": false
        }
      }
    }
  }
}
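The space-for-time trade-off can be pictured with two toy structures in Python. The inverted index maps a term to document ids and answers "which documents contain X?"; doc values map a document id to its value and answer "what is document N's value?", which is what sorting and aggregations need. The status_code values are made up for illustration. A field like session_id with doc_values disabled keeps only the first structure, so it remains searchable but cannot be sorted or aggregated on.

```python
from collections import Counter, defaultdict

# Hypothetical status_code values for documents 0..3
status_codes = ["200", "404", "200", "500"]

# Inverted index: term -> doc ids ("which docs have status_code 200?")
inverted = defaultdict(list)
for doc_id, code in enumerate(status_codes):
    inverted[code].append(doc_id)

# Doc values: a column indexed by doc id ("what is doc N's value?").
# Here it is an in-memory list; ES builds this structure on disk at index time.
doc_values = list(status_codes)

def terms_agg(doc_ids):
    # Aggregation: read each matching document's value straight from the column.
    return Counter(doc_values[d] for d in doc_ids)

def sort_docs(doc_ids):
    # Sorting also needs a per-document value lookup.
    return sorted(doc_ids, key=lambda d: doc_values[d])

print(inverted["200"])                 # [0, 2]
print(dict(terms_agg(range(4))))       # {'200': 2, '404': 1, '500': 1}
print(sort_docs([3, 1, 0]))            # [0, 1, 3]
```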

14.8 dynamic

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/dynamic.html
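The dynamic parameter controls what happens when a document contains a field that is not yet in the mapping. It accepts three values:

true: new fields are added to the mapping (default).
false: new fields are ignored: they are not indexed and not searchable (they still appear in _source), and fields must be added to the mapping explicitly.
strict: if a new field is detected, an exception is thrown and the document is rejected.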
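The three settings can be summarized with a toy Python model of top-level fields. `apply_dynamic` is a hypothetical helper; real Elasticsearch infers a type from each new JSON value, while this sketch simply maps every new field as "text".

```python
def apply_dynamic(mapping, doc, dynamic=True):
    """Toy model of the dynamic setting for top-level fields.

    Returns the list of fields that would be indexed. When dynamic is True,
    `mapping` is updated in place with the new fields.
    """
    new_fields = [field for field in doc if field not in mapping]
    if dynamic == "strict" and new_fields:
        # strict: the whole document is rejected
        raise ValueError(f"dynamic introduction of {new_fields} is not allowed")
    if dynamic is True:
        # true: new fields are added to the mapping (type simplified to "text")
        for field in new_fields:
            mapping[field] = "text"
    # false: new fields are left out of the index (they stay in _source)
    return [field for field in doc if field in mapping]

doc = {"title": "test", "content": "test dynamic"}
print(apply_dynamic({"title": "text"}, doc, dynamic=True))   # ['title', 'content']
print(apply_dynamic({"title": "text"}, doc, dynamic=False))  # ['title']
```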

(1) Create the index
dynamic is set to strict; non-boolean values must be quoted.

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic": "strict", 
      "properties": {
        "title": { "type": "text"}
      }
    }
  }
}

(2) Index a new document

PUT my_index/my_type/1
{
  "title": "test",
  "content": "test dynamic"
}

An exception is thrown:

{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed"
  },
  "status": 400
}

14.9 enabled

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/enabled.html

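By default ElasticSearch indexes every field. For a field with enabled set to false, Elasticsearch skips the field's contents entirely: the value can still be retrieved from _source, but it cannot be searched. Because the contents are not parsed, the field can hold data of any type. In 6.x, enabled can only be set on the mapping type and on object fields.

(1) Create the index and index a document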

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name":{"enabled": false}
      } 
    }
  }
}

PUT my_index/my_type/1
{
  "title": "test enabled",
  "name":"chengyuqiang"
}

(2) Retrieve the document

GET my_index/my_type/1

{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "title": "test enabled",
    "name": "chengyuqiang"
  }
}

(3) Search on the field; there are no hits because name is not indexed

GET my_index/_search
{
  "query": {
    "match": {
      "name": "chengyuqiang"
    }
  }
}

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
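The result above can be mirrored with a small Python model: _source keeps the original JSON, while a field with enabled set to false never reaches the searchable index. The helper names and the lowercase/whitespace tokenization are assumptions made for illustration.

```python
def index_document(doc, disabled_fields):
    # _source keeps the original JSON untouched.
    source = dict(doc)
    # Fields with enabled: false are skipped: not parsed, not indexed.
    indexed = {field: str(value).lower().split()
               for field, value in doc.items() if field not in disabled_fields}
    return source, indexed

def search(indexed, field, term):
    # Only indexed fields can be searched.
    return term.lower() in indexed.get(field, [])

source, indexed = index_document({"title": "test enabled", "name": "chengyuqiang"}, {"name"})
print(source["name"])                           # chengyuqiang (still retrievable)
print(search(indexed, "name", "chengyuqiang"))  # False (not searchable)
print(search(indexed, "title", "enabled"))      # True
```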

14.10 fielddata

14.11 format

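Date formats were already covered in section 12.5 (the date type).
It is worth stressing here that epoch_millis means milliseconds since the epoch, while epoch_second means seconds since the epoch.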
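A short Python illustration of why the unit matters. The helpers below are hypothetical and simply interpret a number as milliseconds or seconds since 1970-01-01T00:00:00Z; integer arithmetic with timedelta avoids floating-point rounding. A seconds value that is wrongly treated as epoch_millis lands in January 1970.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def parse_epoch_millis(value):
    # epoch_millis: the number is milliseconds since the Unix epoch
    return EPOCH + timedelta(milliseconds=value)

def parse_epoch_second(value):
    # epoch_second: the number is seconds since the Unix epoch
    return EPOCH + timedelta(seconds=value)

print(parse_epoch_second(1420070400))     # 2015-01-01 00:00:00+00:00
print(parse_epoch_millis(1420070400000))  # 2015-01-01 00:00:00+00:00
print(parse_epoch_millis(1420070400))     # 1970-01-17 10:27:50.400000+00:00 (wrong unit)
```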

14.12 ignore_above

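ignore_above sets the maximum length of a string value: strings longer than this limit are not indexed or stored (the original value still remains in _source).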

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 20 
        }
      }
    }
  }
}

PUT my_index/my_type/1 
{
  "message": "Syntax error"
}

PUT my_index/my_type/2 
{
  "message": "Syntax error with some long stacktrace"
}

GET my_index/_search 
{
  "size":0,
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}

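The mapping sets ignore_above to 20 for message. The value in the first document is shorter than 20 characters, so it is indexed; the value in the second document is longer than 20 characters, so it is not indexed. The terms aggregation therefore returns only "Syntax error" (both documents are still stored, so hits.total is 2):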

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "messages": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Syntax error",
          "doc_count": 1
        }
      ]
    }
  }
}
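The same outcome in a few lines of Python, as a toy model in which a keyword field indexes the whole string as one term unless it exceeds ignore_above characters. `keyword_terms` is a hypothetical helper; "Syntax error" has 12 characters and the second message has 38.

```python
from collections import Counter

IGNORE_ABOVE = 20  # matches the mapping above

def keyword_terms(value, ignore_above=IGNORE_ABOVE):
    # A keyword field indexes the whole string as one term, but strings
    # longer than ignore_above characters are skipped (they stay in _source).
    return [value] if len(value) <= ignore_above else []

messages = {1: "Syntax error", 2: "Syntax error with some long stacktrace"}
stored = list(messages)  # both documents are stored, so hits.total is 2
buckets = Counter(term for msg in messages.values() for term in keyword_terms(msg))
print(len(stored), dict(buckets))  # 2 {'Syntax error': 1}
```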

14.13 ignore_malformed

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/ignore-malformed.html
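ignore_malformed allows malformed data to be ignored. Take an account field such as userid: some people may enter an integer while others enter an email address. Indexing a value of the wrong data type into a field throws an exception, which causes the whole document to be rejected. If ignore_malformed is set to true, the exception is ignored: the malformed field is not indexed, while the other fields are indexed normally.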

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer",
          "ignore_malformed": true
        },
        "number_two": {
          "type": "integer"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text":       "Some text value",
  "number_one": "foo" 
}

PUT my_index/my_type/2
{
  "text":       "Some text value",
  "number_two": "foo" 
}

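In the example above, number_one accepts integers and has ignore_malformed set to true, so document 1 is indexed successfully even though its number_one value is a string. number_two also accepts integers, but ignore_malformed defaults to false, so indexing document 2 fails: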

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse [number_two]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse [number_two]",
    "caused_by": {
      "type": "number_format_exception",
      "reason": "For input string: \"foo\""
    }
  },
  "status": 400
}
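The per-field versus per-document behavior can be sketched in Python. This toy model (hypothetical `index_with_ignore_malformed`) maps each integer field to its ignore_malformed flag and uses `int()` as a stand-in for integer parsing; other fields are passed through unchanged.

```python
def index_with_ignore_malformed(doc, integer_fields):
    """Toy model: integer_fields maps each integer field to its ignore_malformed flag."""
    indexed, ignored = {}, []
    for field, value in doc.items():
        if field not in integer_fields:
            indexed[field] = value  # other fields are indexed unchanged in this model
            continue
        try:
            indexed[field] = int(value)
        except ValueError:
            if integer_fields[field]:
                ignored.append(field)  # ignore_malformed: true -> drop only this field
            else:
                raise  # default: the whole document is rejected

    return indexed, ignored

mapping = {"number_one": True, "number_two": False}
print(index_with_ignore_malformed({"text": "Some text value", "number_one": "foo"}, mapping))
# ({'text': 'Some text value'}, ['number_one'])
```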

14.14 index_options

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/index-options.html
The index_options parameter controls what information is added to the inverted index, for search and highlighting purposes.

Value	Description
docs	Only the doc number is indexed. Can answer the question Does this term exist in this field?
freqs	Doc number and term frequencies are indexed. Term frequencies are used to score repeated terms higher than single terms.
positions	Doc number, term frequencies, and term positions (or order) are indexed. Positions can be used for proximity or phrase queries.
offsets	Doc number, term frequencies, positions, and start and end character offsets (which map the term back to the original string) are indexed. Offsets are used by the unified highlighter to speed up highlighting.

Note: the index_options parameter has been deprecated for numeric fields as of 6.0.0.
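To see what each level adds, here is a toy Python model of the postings recorded for "Quick brown fox". The regex tokenizer is a simplified stand-in for the standard analyzer, and `postings` and `highlight` are hypothetical helpers; the last lines show how stored offsets let a highlighter wrap matches without re-analyzing the text, which reproduces the highlight in the result below.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Simplified stand-in for the standard analyzer:
    # lowercase word tokens together with their character offsets.
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]

LEVELS = ["docs", "freqs", "positions", "offsets"]

def postings(doc_id, text, index_options):
    """What each term's posting records at a given index_options level (toy model)."""
    level = LEVELS.index(index_options)
    entries = defaultdict(dict)
    for position, (term, start, end) in enumerate(tokenize(text)):
        entry = entries[term]
        entry["doc"] = doc_id                                     # docs
        if level >= 1:
            entry["freq"] = entry.get("freq", 0) + 1              # freqs
        if level >= 2:
            entry.setdefault("positions", []).append(position)    # positions
        if level >= 3:
            entry.setdefault("offsets", []).append((start, end))  # offsets
    return dict(entries)

def highlight(text, spans):
    # Wrap each (start, end) character span in <em> tags.
    out, last = [], 0
    for start, end in sorted(spans):
        out.append(text[last:start] + "<em>" + text[start:end] + "</em>")
        last = end
    out.append(text[last:])
    return "".join(out)

text = "Quick brown fox"
index = postings(1, text, "offsets")
spans = index["brown"]["offsets"] + index["fox"]["offsets"]
print(highlight(text, spans))  # Quick <em>brown</em> <em>fox</em>
```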

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "index_options": "offsets"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text": "Quick brown fox"
}

GET my_index/_search
{
  "query": {
    "match": {
      "text": "brown fox"
    }
  },
  "highlight": {
    "fields": {
      "text": {} 
    }
  }
}

{
  "took": 50,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "text": "Quick brown fox"
        },
        "highlight": {
          "text": [
            "Quick <em>brown</em> <em>fox</em>"
          ]
        }
      }
    ]
  }
}

14.15 index

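The index parameter controls whether a field is indexed; a field that is not indexed cannot be searched. The value is true (the default) or false.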

14.16 fields

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/multi-fields.html

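fields (multi-fields) lets the same value be indexed in several different ways. For example, a string field can be mapped as text for full-text search, with a keyword sub-field for aggregations and sorting.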

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": { 
              "type":  "keyword"
            }
          }
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "city": "New York"
}

PUT my_index/my_type/2
{
  "city": "York"
}

GET my_index/_search
{
  "query": {
    "match": {
      "city": "york" 
    }
  },
  "sort": {
    "city.raw": "asc" 
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw" 
      }
    }
  }
}

{
  "took": 31,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": null,
        "_source": {
          "city": "New York"
        },
        "sort": [
          "New York"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": null,
        "_source": {
          "city": "York"
        },
        "sort": [
          "York"
        ]
      }
    ]
  },
  "aggregations": {
    "Cities": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "New York",
          "doc_count": 1
        },
        {
          "key": "York",
          "doc_count": 1
        }
      ]
    }
  }
}
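A toy Python model of the example above: one source value yields two indexed forms, an analyzed "city" used by the match query and an exact "city.raw" used for sorting and the terms aggregation. The lowercase/whitespace `analyze_text` is an assumption standing in for the standard analyzer, and the helper names are hypothetical.

```python
from collections import Counter

def analyze_text(value):
    # Stand-in for the standard analyzer used by the text field: lowercase + whitespace split
    return value.lower().split()

def index_city(value):
    # One source value, two indexed forms: "city" (text) and "city.raw" (keyword)
    return {"city": analyze_text(value), "city.raw": value}

docs = {1: index_city("New York"), 2: index_city("York")}

# match query on "city": the query text is analyzed the same way, so "york" hits both
query_terms = analyze_text("york")
hits = [d for d, fields in docs.items() if any(t in fields["city"] for t in query_terms)]

# sort and terms aggregation on "city.raw": the exact, unanalyzed value
hits.sort(key=lambda d: docs[d]["city.raw"])
buckets = Counter(docs[d]["city.raw"] for d in hits)
print(hits, dict(buckets))  # [1, 2] {'New York': 1, 'York': 1}
```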