ElasticSearch6.x 之映射参数

最新推荐文章于 2023-11-02 10:35:41 发布

在奋斗的大道

最新推荐文章于 2023-11-02 10:35:41 发布

阅读量598

点赞数

分类专栏： elasticsearch 学习笔记

elasticsearch 学习笔记专栏收录该内容

53 篇文章 1 订阅

订阅专栏

本文转载至https://blog.csdn.net/chengyuqiang/article/details/79059958

映射参数概述：

ElasticSearch提供了丰富的映射参数。

官网地址：https://www.elastic.co/guide/en/elasticsearch/reference/6.3/mapping-params.html

1、analyzer

指定分词器，对应索引和查询都有效。如下：IK分词配置

第一步：定义索引并定义映射

第二步：向指定索引填充数据

Post http://192.168.1.74:9200/my_index/it/1
{
   "name" : "中国是个幸福美丽的国度"
}

Post http://192.168.1.74:9200/my_index/it/2
{
   "name" : "广东省的省会城市是羊城(广州)"
}

Post http://192.168.1.74:9200/my_index/it/3
{
   "name" : "中国最南的岛屿叫曾母暗沙"
}

第三步：查询

2、normalizer

elasticsearch标准化配置属性，如下：所有字符转换为小写。

第一步：定义索引，并设置elasticsearch 分析标准。

参数内容：

{
"settings": {
    "analysis": {
      "normalizer":{
        "my_normalizer":{
          "type":"custom",
          "char_filter" : [],
          "filter" : ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "it" : {
      "properties" : {
        "name" : {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

第二步：插入数据：

Post http://192.168.1.74:9200/my_index/it/1
{
   "name" : "bar"
}

Post http://192.168.1.74:9200/my_index/it/2
{
   "name" : "BAR"
}

Post http://192.168.1.74:9200/my_index/it/3
{
   "name" : "BaR"
}

第三步：查询数据：

3、boost

通过指定一个boost值来控制每个查询子句的相对权重，该值默认为1。一个大于1的boost会增加该查询子句的相对权重。

第一步：定义索引，设置权重比值

第二步：填充数据

Post http://192.168.1.74:9200/my_index/it/1
{
   "title" : "client me"
}

Post http://192.168.1.74:9200/my_index/it/2
{
   "title" : "follow"
}

第三步：查询数据

4、coerce

coerce属性用于清除脏数据，coerce的默认值是true。整型数字5有可能会被写成字符串“5”或者浮点数5.0.coerce属性可以用来清除脏数据：

字符串会被强制转换为整数
浮点数被强制转换为整数

【例子】
（1）重新创建my_index

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer"
        },
        "number_two": {
          "type": "integer",
          "coerce": false
        }
      }
    }
  }
}

（2）写入一条测试文档

PUT my_index/my_type/1
{
  "number_one": "10" 
}

{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

（3）写入另一条测试文档

PUT my_index/my_type/2
{
  "number_two": "10" 
}

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse [number_two]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse [number_two]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Integer value passed as String"
    }
  },
  "status": 400
}

5、copy-to

copy_to属性用于配置自定义的_all字段。换言之，就是多个字段可以合并成一个超级字段。比如，first_name和last_name可以合并为full_name字段。
【例子】
（1）

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "first_name": "John",
  "last_name": "Smith"
}

（2）查询

GET my_index/_search
{
  "query": {
    "match": {
      "full_name": { 
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}

{
  "took": 22,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "first_name": "John",
          "last_name": "Smith"
        }
      }
    ]
  }
}

6、doc_values

doc_values是为了加快排序、聚合操作，在建立倒排索引的时候，额外增加一个列式存储映射，是一个空间换时间的做法。默认是开启的，对于确定不需要聚合或者排序的字段可以关闭。

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status_code": { 
          "type":       "keyword"
        },
        "session_id": { 
          "type":       "keyword",
          "doc_values": false
        }
      }
    }
  }
}

7、dynamic

dynamic属性用于检测新发现的字段，有三个取值：

true:新发现的字段添加到映射中。（默认）
flase:新检测的字段被忽略。必须显式添加新字段。
strict:如果检测到新字段，就会引发异常并拒绝文档

【例子】
（1）新建索引
取值为strict，非布尔值要加引号

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic": "strict", 
      "properties": {
        "title": { "type": "text"}
      }
    }
  }
}

（2）插入新文档

PUT my_index/my_type/1
{
  "title": "test",
  "content": "test dynamic"
}

抛出异常

{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed"
  },
  "status": 400
}

8、enabled

ELasticseaech默认会索引所有的字段，enabled设为false的字段，es会跳过字段内容，该字段只能从_source中获取，但是不可搜。而且字段可以是任意类型。

【例子】
（1）新建索引，插入文档

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name":{"enabled": false}
      } 
    }
  }
}
PUT my_index/my_type/1
{
  "title": "test enabled",
  "name":"chengyuqiang"
}

（2）查看文档

GET my_index/my_type/1

{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "title": "test enabled",
    "name": "chengyuqiang"
  }
}

（3）搜索字段

GET my_index/_search
{
  "query": {
    "match": {
      "name": "chengyuqiang"
    }
  }
}

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

9、ignore_above

ignore_above用于指定字段索引和存储的长度最大值，超过最大值的会被忽略

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 20 
        }
      }
    }
  }
}

PUT my_index/my_type/1 
{
  "message": "Syntax error"
}

PUT my_index/my_type/2 
{
  "message": "Syntax error with some long stacktrace"
}

GET my_index/_search 
{
  "size":0,
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}

mapping中指定了ignore_above字段的最大长度为20，第一个文档的字段长小于20，因此索引成功，第二个超过20，因此不索引，返回结果只有”Syntax error”,结果如下

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "messages": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Syntax error",
          "doc_count": 1
        }
      ]
    }
  }
}

10、ignore_malformed

ignore_malformed可以忽略不规则数据。对于账号userid字段，有人可能填写的是整数类型，也有人填写的是邮件格式。给一个字段索引不合适的数据类型发生异常，导致整个文档索引失败。如果ignore_malformed参数设为true，异常会被忽略，出异常的字段不会被索引，其它字段正常索引。

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer",
          "ignore_malformed": true
        },
        "number_two": {
          "type": "integer"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text":       "Some text value",
  "number_one": "foo" 
}

PUT my_index/my_type/2
{
  "text":       "Some text value",
  "number_two": "foo" 
}

上面的例子中number_one接受integer类型，ignore_malformed属性设为true，因此文档一种number_one字段虽然是字符串但依然能写入成功；number_two接受integer类型，默认ignore_malformed属性为false，因此写入失败。

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse [number_two]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse [number_two]",
    "caused_by": {
      "type": "number_format_exception",
      "reason": "For input string: \"foo\""
    }
  },
  "status": 400
}

11、index_options

index_options参数控制将哪些信息添加到倒排索引，用于搜索和突出显示目的。

注意：The index_options parameter has been deprecated for Numeric fields in 6.0.0。6.0.0中的数字字段已弃用index_options参数。

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "index_options": "offsets"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text": "Quick brown fox"
}

GET my_index/_search
{
  "query": {
    "match": {
      "text": "brown fox"
    }
  },
  "highlight": {
    "fields": {
      "text": {} 
    }
  }
}

{
  "took": 50,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "text": "Quick brown fox"
        },
        "highlight": {
          "text": [
            "Quick <em>brown</em> <em>fox</em>"
          ]
        }
      }
    ]
  }
}

12、 index

index属性指定字段是否索引，不索引也就不可搜索，取值可以为true或者false。

13、fields

fields可以让同一文本有多种不同的索引方式，比如一个String类型的字段，可以使用text类型做全文检索，使用keyword类型做聚合和排序。

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": { 
              "type":  "keyword"
            }
          }
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "city": "New York"
}

PUT my_index/my_type/2
{
  "city": "York"
}

GET my_index/_search
{
  "query": {
    "match": {
      "city": "york" 
    }
  },
  "sort": {
    "city.raw": "asc" 
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw" 
      }
    }
  }
}


{
  "took": 31,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": null,
        "_source": {
          "city": "New York"
        },
        "sort": [
          "New York"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": null,
        "_source": {
          "city": "York"
        },
        "sort": [
          "York"
        ]
      }
    ]
  },
  "aggregations": {
    "Cities": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "New York",
          "doc_count": 1
        },
        {
          "key": "York",
          "doc_count": 1
        }
      ]
    }
  }
}

说明：The city.raw field is a keyword version of the city field (.city.raw字段是城市字段的关键字版本。)
The city field can be used for full text search.( city字段可用于全文搜索。)
The city.raw field can be used for sorting and aggregations.( city.raw字段可用于排序和聚合)