7.4.2 - Elasticsearch index field mapping parameters

akka_rz

Category: ELK  Tags: elasticsearch

Article link: https://blog.csdn.net/weixin_28906733/article/details/106417295

Configurable field mapping parameters

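No.	Parameter	Description
1	analyzer	Analyzer used for the field (built-in analyzers include standard, simple, whitespace, and english)
2	boost	Factor applied to the field when the relevance score is calculated
3	coerce	Whether ES converts values of a mismatched type to the field's type
4	copy_to	Copies the field's value into other fields
5	doc_values	Whether the field is also stored in a column-oriented structure
6	dynamic	Whether dynamic mapping is enabled
7	eager_global_ordinals	Whether global ordinals are loaded eagerly at refresh time
8	enabled	Whether the field is parsed and indexed
9	fielddata	In-memory fielddata setting for text fields
10	fields	Multi-fields
11	format	Date format
12	ignore_above	Strings longer than this threshold are not indexed
13	ignore_malformed	Whether malformed values are ignored instead of rejecting the document
14	index_options	Which information is stored in the inverted index
15	index_phrases	Indexes two-term combinations to speed up phrase queries
16	index_prefixes	Indexes term prefixes to speed up prefix queries
17	index	Whether the field is indexed
18	meta	Metadata attached to the field
19	normalizer	Normalizer applied to keyword fields before indexing
20	norms	Whether normalization factors used for scoring are stored
21	null_value	Value substituted for an explicit null so that it can be indexed
22	position_increment_gap	Position gap inserted between the values of a multi-valued text field
23	properties	Sub-fields of an object or nested field
24	search_analyzer	Analyzer used at search time
25	similarity	Scoring algorithm used for the field
26	store	Whether the field value is stored separately from _source
27	term_vector	Whether term vectors are stored for the field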

analyzer parameter

Only text fields support the analyzer parameter.

// Define a custom analyzer
PUT custom_analyzer_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_folded":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":["lowercase","asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "analyzer_text":{
        "type": "text",
        "analyzer": "std_folded"
      }
    }
  }
}

GET custom_analyzer_index/_analyze
{
  "analyzer": "std_folded",
  "text": "Is this deja vu?"
}

GET custom_analyzer_index/_analyze
{
  "field": "analyzer_text",
  "text": "Is this deja vu?"
}

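The search_quote_analyzer setting specifies a dedicated analyzer for phrase queries, which is particularly useful when dealing with stop words.
To keep stop words in phrase queries, the field is configured with three analyzer settings:
1) analyzer: used at index time for all terms, including stop words;
2) search_analyzer: used for non-phrase queries (removes stop words);
3) search_quote_analyzer: used for phrase queries (does not remove stop words).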
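
To make the interplay of the three analyzers concrete, here is a minimal Python sketch that imitates them outside Elasticsearch. It is a simplified model, not the real analysis chain: the tokenizer is a plain regular expression, the stop word set is an assumed small subset of the English list, and the helper names are made up for illustration.

```python
import re

# Assumed subset of the English stop word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "this", "of", "to"}

def index_analyzer(text):
    """Simplified 'standard tokenizer + lowercase': stop words are kept."""
    return re.findall(r"\w+", text.lower())

def search_analyzer(text):
    """Same tokens as index_analyzer, with stop words removed."""
    return [t for t in index_analyzer(text) if t not in STOP_WORDS]

# Phrase queries reuse the index-time analysis, so stop words survive.
search_quote_analyzer = index_analyzer

def phrase_matches(doc_tokens, phrase_tokens):
    """True if phrase_tokens occur as a contiguous run inside doc_tokens."""
    n = len(phrase_tokens)
    return n > 0 and any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )

docs = {1: "The Quick Brown Fox", 2: "A Quick Brown Fox"}
indexed = {doc_id: index_analyzer(title) for doc_id, title in docs.items()}

# Phrase query: "the" is kept, so only document 1 matches.
phrase = search_quote_analyzer("the quick brown fox")
phrase_hits = [d for d, toks in indexed.items() if phrase_matches(toks, phrase)]

# Non-phrase query: "the" is dropped, so both documents match.
terms = search_analyzer("the quick brown fox")
term_hits = [d for d, toks in indexed.items() if all(t in toks for t in terms)]

print(phrase_hits)  # [1]
print(term_hits)    # [1, 2]
```

Because the stop words are indexed but removed only from non-phrase searches, a quoted phrase can still tell "The Quick Brown Fox" apart from "A Quick Brown Fox".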

PUT /custom_analyzer_index_search
{
  "settings": {
    "analysis": {
      "analyzer": {
        "analyzer_index_1":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":["lowercase"]
        },
        "analyzer_index_2":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":["lowercase","english_stop"]
        }
      },
      "filter": {
        "english_stop":{
          "type":"stop",
          "stopwords":"_english_"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "analyzer_index_1",
        "search_analyzer": "analyzer_index_2",
        "search_quote_analyzer": "analyzer_index_1"
      }
    }
  }
}

PUT custom_analyzer_index_search/_doc/1
{
  "title":"The Quick Brown Fox"
}

PUT custom_analyzer_index_search/_doc/2
{
  "title":"A Quick Brown Fox"
}

GET custom_analyzer_index_search/_search

GET custom_analyzer_index_search/_search
{
  "query": {
    "query_string": {
      "query": "\"the quick brown fox\""
    }
  }
}

boost parameter

The boost parameter gives an individual field more weight when the relevance score of a document is calculated.

PUT param_boost_index
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "boost": 2
      },
      "content":{
        "type": "text"
      }
    }
  }
}

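When relevance is scored, a match on title counts twice as much as a match on content, which keeps the default boost of 1.0. The boost applies only to term queries (prefix, range, and fuzzy queries are not boosted).

Reasons not to set boost at index time:
1) An index-time boost cannot be changed without reindexing;
2) A query-time boost achieves the same effect, with the difference that the value can be adjusted as needed;
3) An index-time boost is stored as part of the norm, which is only one byte; this reduces the resolution of the field length normalization factor and can lower the quality of the relevance calculation.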
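
As a sketch of the query-time alternative, the Python snippet below builds a search body in which the boost is attached to the match query on title instead of to the mapping. The function name and the query text are made up for illustration; the resulting JSON would be sent as the body of a _search request.

```python
import json

def boosted_search_body(text, title_boost=2.0):
    """Build a search body that boosts title matches at query time."""
    return {
        "query": {
            "bool": {
                "should": [
                    # The boost lives in the query, so it can change per request.
                    {"match": {"title": {"query": text, "boost": title_boost}}},
                    {"match": {"content": {"query": text}}},
                ]
            }
        }
    }

body = boosted_search_body("quick brown fox")
print(json.dumps(body, indent=2))
```

Changing title_boost only changes the request, so no reindexing is needed to tune it.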

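coerce parameter

Data sent to ES is not always clean. For example, a field may be expected to hold a number while the value arrives as a string. The coerce parameter (enabled by default) tells ES to convert such values to the field's type instead of rejecting them.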

PUT param_coerce_index_1
{
  "mappings": {
    "properties": {
      "number_one":{
        "type": "integer"
      },
      "number_two":{
        "type": "integer",
        "coerce":false
      }
    }
  }
}
// Succeeds: number_one uses the default coerce (true)
PUT param_coerce_index_1/_doc/1
{
  "number_one":"10"
}
// Fails: coercion is disabled for number_two
PUT param_coerce_index_1/_doc/2
{
  "number_two":"10"
}
// The coerce setting of an existing field can still be changed through the update mapping API
PUT /param_coerce_index_1/_mapping
{
  "properties":{
    "number_two":{
      "type":"integer",
      "coerce":true
    }
  }
}

Index-level coerce setting

// The index.mapping.coerce setting controls coercion for the whole index; a field-level coerce overrides it
PUT /param_coerce_index_2
{
  "settings": {
    "index.mapping.coerce":false
  }, 
  "mappings": {
    "properties": {
      "number_one":{
        "type": "integer",
        "coerce":true
      },
      "number_two":{
        "type": "integer"
      }
    }
  }
}
// Succeeds: number_one overrides the index-level setting with coerce: true
PUT param_coerce_index_2/_doc/1
{
  "number_one":"10"
}
// Fails: number_two inherits index.mapping.coerce: false
PUT param_coerce_index_2/_doc/2
{
  "number_two":"20"
}
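
The precedence between the index-level and field-level settings can be sketched in a few lines of Python. This is a simplified model under stated assumptions: both function names are hypothetical, only integer fields are modelled, and the conversion rule (numeric strings are parsed, fractions are truncated) is a rough imitation of what ES does.

```python
def effective_coerce(index_default=True, field_setting=None):
    """A field-level coerce wins; otherwise index.mapping.coerce applies."""
    return index_default if field_setting is None else field_setting

def accept_integer(value, coerce):
    """Return the integer that would be indexed, or raise ValueError."""
    if isinstance(value, int):
        return value
    if not coerce:
        raise ValueError(f"{value!r} is not an integer and coerce is disabled")
    # "10" -> 10, 5.7 -> 5 (the fractional part is truncated)
    return int(float(value))

# Mirrors param_coerce_index_2: index.mapping.coerce is false,
# number_one sets coerce: true, number_two sets nothing.
number_one = effective_coerce(index_default=False, field_setting=True)
number_two = effective_coerce(index_default=False)

print(accept_integer("10", number_one))  # 10
try:
    accept_integer("20", number_two)
except ValueError as err:
    print("rejected:", err)
```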

copy_to parameter

The copy_to parameter copies the values of several fields into one combined field, which can then be queried as a single field.

PUT param_copy_to_index
{
  "mappings": {
    "properties": {
      "first_name":{
        "type": "text",
        "copy_to": "full_name"
      },
      "last_name":{
        "type": "text",
        "copy_to": "full_name"
      },
      "full_name":{
        "type": "text"
      }
    }
  }
}

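// Note the order of first_name and last_name here: the match query with "operator": "and" used below matches regardless of term order; the order would matter only for position-sensitive queries such as match_phrase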
PUT param_copy_to_index/_doc/1
{
  "last_name":"Smith",
  "first_name":"John"
}

// Search the combined full_name field
GET param_copy_to_index/_search
{
  "query": {
    "match": {
      "full_name": {
        "query": "Smith John",
        "operator": "and"
      }
    }
  }
}

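A few points to be clear about:
1) copy_to copies the field value, not the terms produced by analysis;
2) The original _source field is not modified to show the copied values;
3) The same value can be copied to several fields, as in "copy_to": ["field1", "field2"];
4) Values cannot be copied recursively: a value copied from field1 to field2 is not passed on by a copy_to on field2. Copy to all targets directly from the originating field instead.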
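
The points above can be illustrated with a small Python model of what happens at index time. It is only a sketch: index_document and the dictionary layout of the copy_to rules are made up for illustration and do not correspond to any Elasticsearch API.

```python
def index_document(source, copy_to):
    """Return (stored _source, field values as seen by the indexer)."""
    indexed = {}
    for field, value in source.items():
        indexed.setdefault(field, []).append(value)
        # Only fields present in the source trigger a copy, so there is
        # no cascading from one copy_to target to another.
        for target in copy_to.get(field, []):
            # The raw value is copied; each target analyzes it on its own.
            indexed.setdefault(target, []).append(value)
    return dict(source), indexed

copy_to = {"first_name": ["full_name"], "last_name": ["full_name"]}
source, indexed = index_document(
    {"last_name": "Smith", "first_name": "John"}, copy_to
)

print(source)                # {'last_name': 'Smith', 'first_name': 'John'}
print(indexed["full_name"])  # ['Smith', 'John']
```

The stored source has no full_name key, while the indexer sees both names under full_name.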

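doc_values parameter

Most fields are indexed by default, which makes them searchable. The inverted index lets a query look up a term in a sorted list of terms and immediately get the list of documents that contain it.
Sorting, aggregations, and access to field values in scripts need a different access pattern: rather than looking up a term to find documents, they look up a document to find the terms it has in a field.
Doc values are the on-disk data structure, built when a document is indexed, that makes this access pattern possible. They store the same values as _source, but in a column-oriented layout that is far more efficient for sorting and aggregations. Doc values are supported on almost all field types except text and annotated_text.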
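
The two access patterns can be sketched with plain Python dictionaries. This is a toy model, not how Lucene lays out data on disk; the sample status codes are made up.

```python
from collections import Counter, defaultdict

docs = {1: "200", 2: "404", 3: "200"}  # doc id -> status_code

# Inverted index: term -> doc ids. Answers "which documents contain 200?"
inverted_index = defaultdict(list)
for doc_id, term in docs.items():
    inverted_index[term].append(doc_id)

# Doc values: doc id -> value, one column per field.
# Answers "what value does document 2 have?"
doc_values = dict(docs)

# Search goes term -> documents.
hits = inverted_index["200"]

# Aggregation and sorting go document -> value.
buckets = Counter(doc_values[d] for d in doc_values)
sorted_ids = sorted(doc_values, key=lambda d: doc_values[d])

print(hits)           # [1, 3]
print(dict(buckets))  # {'200': 2, '404': 1}
print(sorted_ids)     # [1, 3, 2]
```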

// If a field will never be used for sorting or aggregations, doc_values can be set to false; here it is disabled for session_id
PUT param_doc_value_index
{
  "mappings": {
    "properties": {
      "status_code":{
        "type": "keyword"
      },
      "session_id":{
        "type": "keyword",
        "doc_values": false
      }
    }
  }
}

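dynamic parameter

Accepted values of dynamic

Value	Description
true	Newly detected fields are added to the mapping (the default)
false	Newly detected fields are ignored: they are not indexed and so are not searchable, and they are not added to the mapping, but they still appear in the _source field
strict	If a new field is detected, an exception is thrown and the document is not indexed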
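
The three modes can be modelled with a short Python function. This is a sketch under assumptions: apply_dynamic is a hypothetical helper, the mapping is reduced to a set of field names, and the modes are passed as the strings "true", "false", and "strict".

```python
def apply_dynamic(mapped_fields, doc, dynamic="true"):
    """Return (field names in the mapping afterwards, fields that get indexed)."""
    new_fields = [f for f in doc if f not in mapped_fields]
    if dynamic == "strict" and new_fields:
        raise ValueError(f"strict mapping rejects unknown fields: {new_fields}")
    if dynamic == "true":
        # New fields are added to the mapping and indexed.
        mapped_fields = mapped_fields | set(new_fields)
    # With "false", unmapped fields stay in _source only.
    indexed = [f for f in doc if f in mapped_fields]
    return mapped_fields, indexed

mapping = {"user_id"}
doc = {"user_id": "kimchy", "nickname": "k"}

print(sorted(apply_dynamic(mapping, doc, "true")[0]))   # ['nickname', 'user_id']
print(sorted(apply_dynamic(mapping, doc, "false")[0]))  # ['user_id']
try:
    apply_dynamic(mapping, doc, "strict")
except ValueError as err:
    print("rejected:", err)
```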

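enabled parameter

By default ES tries to index every field, but sometimes the data only needs to be stored, not indexed.
The enabled parameter can be set on the top-level mapping definition and on object fields. It makes ES skip parsing of the field's contents entirely. The JSON can still be retrieved from the _source field, but it cannot be searched on its own or stored in any other form.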

PUT params_enabled_index
{
  "mappings": {
    "properties": {
      "user_id":{
        "type": "keyword"
      },
      "last_updated":{
        "type": "date"
      },
      "session_data":{
        "type": "object",
        "enabled":false
      }
    }
  }
}

PUT params_enabled_index/_doc/session_1
{
  "user_id":"kimchy",
  "session_data":{
    "arbitrary_object":{
      "some_array":["foo","bar",{"clazz":2}]
    }
  },
  "last_updated":"2020-06-01T10:00:00"
}

GET params_enabled_index/_mapping

GET params_enabled_index/_doc/session_1

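The enabled setting of an existing field or of the top-level mapping cannot be changed. Because ES no longer parses the contents of a field whose enabled is false, data that is not an object can be added to a field mapped as object.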

PUT params_enabled_all_index
{
  "mappings": {
    "enabled":false
  }
}
// Updating the field mapping has no effect here
PUT params_enabled_all_index/_mapping
{
  "properties":{
    "username":{
      "type":"text",
      "fields":{
        "keyword":{
          "type":"keyword"
        }
      }
    }
  }
}

PUT params_enabled_all_index/_doc/session_1
{
  "user_id":"kimchy",
  "session_data":{
    "arbitrary_object":{
      "some_array":["foo","bar",{"clazz":2}]
    }
  },
  "last_updated":"2020-06-01T10:00:00"
}
// Add a username field
PUT params_enabled_all_index/_doc/session_3
{
  "user_id":"kimchy",
  "session_data":{
    "arbitrary_object":{
      "some_array":["foo","bar",{"clazz":2}]
    }
  },
  "last_updated":"2020-06-01T10:00:00",
  "username":"bbbb"
}

GET params_enabled_all_index/_mapping
// The document and its full contents can be retrieved by ID
GET params_enabled_all_index/_doc/session_3
// The search returns no hits
GET params_enabled_all_index/_search
{
  "query": {
    "match": {
      "username.keyword": "bbbb"
    }
  }
}

// session_data is mapped as object, but because enabled is false a string value can be inserted
PUT params_enabled_field_parse_ignore_index
{
  "mappings": {
    "properties": {
      "session_data":{
        "type": "object",
        "enabled":false
      }
    }
  }
}

PUT params_enabled_field_parse_ignore_index/_doc/1
{
  "session_data":"foo bar"
}