Elasticsearch Notes: Mapping Parameters


wust_tanyao


https://blog.csdn.net/Interstellar_/article/details/81359301#22.%20term_vector


I. Mapping Parameters

1. analyzer

An analyzer can be specified in a query, on a field, or at the index level (as the index's default analyzer).

PUT /my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "text": { 
          "type": "text",
          "fields": {
            "english": { 
              "type":     "text",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}
 
GET my_index/_analyze 
{
  "field": "text", // uses the standard analyzer
  "text": "The quick Brown Foxes."  // returns [ the, quick, brown, foxes ]
}
 
GET my_index/_analyze 
{
  "field": "text.english", // uses the english analyzer
  "text": "The quick Brown Foxes."  // returns [ quick, brown, fox ]
}

2. normalizer

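A normalizer standardizes the value of a keyword field, for example by lowercasing all characters. It works like an analyzer except that it always produces a single token, and it is applied both before indexing and to the query value at search time.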

PUT index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}
 
PUT index/_doc/1
{
  "foo": "BÀR"
}
 
PUT index/_doc/2
{
  "foo": "bar"
}
 
PUT index/_doc/3
{
  "foo": "baz"
}
 
POST index/_refresh
 
GET index/_search
{
  "query": {
    "term": {
      "foo": "BAR"
    }
  }
}
 
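// Both the term query above and the match query below return documents 1 and 2: BÀR is normalized to bar at index time, and BAR is normalized to bar at query time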
GET index/_search
{
  "query": {
    "match": {
      "foo": "BAR"
    }
  }
}

3. boost

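Sets the weight of a field in relevance scoring; the default is 1.0. Index-time boosting has been deprecated since Elasticsearch 5.0: the boost set in the mapping is applied at query time instead.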

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "boost": 2 
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}
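The Elasticsearch reference recommends query-time boosting over mapping-level boosts. A minimal sketch of the same weighting expressed per request against the my_index mapping above; the query text is illustrative:

```
GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "quick brown fox",
        "boost": 2
      }
    }
  }
}
```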

4. coerce

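coerce attempts to clean up dirty values so that they fit the field's data type; the default is true. For example, the integer 5 might arrive as the string "5" or as the float 5.0. With coerce enabled:

Strings are converted to numbers.

Floating-point values are truncated for integer fields.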
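A sketch following the pattern in the 6.x reference, assuming an index named my_index: number_two disables coercion, so the same string value is accepted for one field and rejected for the other.

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "number_one": {
          "type": "integer"
        },
        "number_two": {
          "type": "integer",
          "coerce": false
        }
      }
    }
  }
}

// Accepted: "10" is coerced to the integer 10
PUT my_index/_doc/1
{
  "number_one": "10"
}

// Rejected: coercion is disabled on number_two
PUT my_index/_doc/2
{
  "number_two": "10"
}
```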

5. copy_to

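copy_to copies the values of several fields into a single field, which can then be queried as one field. For example, first_name and last_name can be combined into a full_name field: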

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}
 
PUT my_index/_doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}
 
GET my_index/_search
{
  "query": {
    "match": {
      "full_name": { 
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}

6. doc_values

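Enabled by default. If you do not need to sort or aggregate on a field, or to access its value from a script, you can set doc_values to false to save disk space.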
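A minimal mapping sketch with hypothetical field names: session_id can still be searched, but it cannot be sorted, aggregated on, or read from scripts. Analyzed text fields do not support doc values at all, so this setting matters mainly for keyword, numeric, date, and similar fields.

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "status_code": {
          "type": "keyword"
        },
        "session_id": {
          "type": "keyword",
          "doc_values": false
        }
      }
    }
  }
}
```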

7. dynamic

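Controls whether new fields are added to the mapping automatically. The default is true. With false, new fields are ignored: they are not indexed or searchable, although they still appear in _source. With strict, a document containing an unknown field is rejected with an exception.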

PUT my_index
{
  "mappings": {
    "_doc": {
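      "dynamic": false, // new top-level fields are ignored
      "properties": {
        "user": { // inherits dynamic: false from the type
          "properties": {
            "name": {
              "type": "text"
            },
            "social_networks": { // new fields under user.social_networks are added dynamically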
              "dynamic": true,
              "properties": {}
            }
          }
        }
      }
    }
  }
}
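To illustrate strict, a sketch with a hypothetical my_index: indexing a document with a field that is not in the mapping is expected to fail with a strict_dynamic_mapping_exception.

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "user": {
          "type": "text"
        }
      }
    }
  }
}

// Rejected: "age" is not in the mapping and dynamic is strict
PUT my_index/_doc/1
{
  "user": "kimchy",
  "age": 30
}
```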

8. enabled

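Some fields we only want to store, not index. Setting enabled to false makes Elasticsearch skip parsing the field entirely: its value can still be retrieved from _source, but it cannot be searched. enabled can only be applied to the mapping type and to object fields.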
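A sketch with hypothetical field names: session_data is an object whose contents are neither parsed nor indexed, so it may hold arbitrary JSON, yet it is still returned in _source.

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "user_id": {
          "type": "keyword"
        },
        "last_updated": {
          "type": "date"
        },
        "session_data": {
          "enabled": false
        }
      }
    }
  }
}

// session_data is kept in _source but is neither parsed nor indexed
PUT my_index/_doc/session_1
{
  "user_id": "kimchy",
  "session_data": {
    "arbitrary_object": {
      "some_array": [ "foo", "bar", { "baz": 2 } ]
    }
  },
  "last_updated": "2015-12-06T18:20:22"
}
```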

9. fielddata

https://www.elastic.co/guide/en/elasticsearch/reference/6.3/fielddata.html
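In brief, text fields cannot be sorted or aggregated on by default because they have no doc values. Setting fielddata to true builds an in-memory structure on the heap that makes this possible, at a potentially large memory cost; a keyword sub-field is usually the better choice. A sketch enabling it on a hypothetical field of an existing my_index through the put-mapping API:

```
PUT my_index/_mapping/_doc
{
  "properties": {
    "my_field": {
      "type": "text",
      "fielddata": true
    }
  }
}
```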

10. format

format is mainly used to specify the format of date values; for the available formats, see https://www.elastic.co/guide/en/elasticsearch/reference/6.3/mapping-date-format.html
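A sketch with a hypothetical date field: several formats can be separated with ||, and each one is tried in turn until one matches. When no format is given, date fields default to strict_date_optional_time||epoch_millis.

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "date": "2015-01-01 12:10:30"
}
```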

11. ignore_above

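ignore_above sets a maximum string length: strings longer than this value are not indexed or stored, although they remain in _source. For arrays of strings, the limit is applied to each element separately.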
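A sketch with hypothetical values: with ignore_above set to 20, both documents are indexed and returned in _source, but the 38-character message is not indexed, so only "Syntax error" should appear in the terms aggregation.

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 20
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "message": "Syntax error"
}

PUT my_index/_doc/2
{
  "message": "Syntax error with some long stacktrace"
}

POST my_index/_refresh

GET my_index/_search
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}
```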

12. ignore_malformed

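ignore_malformed lets a document containing a malformed value be indexed instead of rejected; only the malformed field is skipped. The default is false.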

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer",
          "ignore_malformed": true
        },
        "number_two": {
          "type": "integer"
        }
      }
    }
  }
}
 
// Succeeds: ignore_malformed is enabled on number_one, so "foo" is skipped
PUT my_index/my_type/1
{
  "text":       "Some text value",
  "number_one": "foo" 
}
 
// Fails: number_two does not enable ignore_malformed
PUT my_index/my_type/2
{
  "text":       "Some text value",
  "number_two": "foo" 
}

13. index

Controls whether the field is indexed; the default is true. A field with index set to false is not searchable.
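A minimal sketch with a hypothetical field: session_id is kept in _source, but because it is not indexed, a query against it is expected to be rejected in 6.x.

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "session_id": {
          "type": "keyword",
          "index": false
        }
      }
    }
  }
}

// Expected to fail: session_id is not indexed
GET my_index/_search
{
  "query": {
    "term": {
      "session_id": "abc123"
    }
  }
}
```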

14. index_options

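index_options controls which information is added to the inverted index:

Value	Information indexed
docs	document numbers only
freqs	document numbers and term frequencies
positions	document numbers, term frequencies, and term positions
offsets	document numbers, term frequencies, term positions, and the start and end character offsets of each term

Analyzed text fields default to positions; all other fields default to docs.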
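A sketch with hypothetical data: indexing offsets on a text field lets the highlighter use the postings directly instead of re-analyzing the stored text.

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "text": {
          "type": "text",
          "index_options": "offsets"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "text": "Quick brown fox"
}

POST my_index/_refresh

GET my_index/_search
{
  "query": {
    "match": {
      "text": "brown fox"
    }
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}
```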

15. fields

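fields (multi-fields) lets the same value be indexed in several different ways. For example, a string field can be mapped as text for full-text search, with a keyword sub-field for sorting and aggregations.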

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": { 
              "type":  "keyword"
            }
          }
        }
      }
    }
  }
}
 
PUT my_index/_doc/1
{
  "city": "New York"
}
 
PUT my_index/_doc/2
{
  "city": "York"
}
 
GET my_index/_search
{
  "query": {
    "match": {
      "city": "york" 
    }
  },
  "sort": {
    "city.raw": "asc" 
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw" 
      }
    }
  }
}

16. norms

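Norms store normalization factors that are used at query time to score documents. They are useful for scoring but take a lot of disk space (typically around one byte per document per field). Norms are enabled by default for text fields and disabled for keyword fields; they can be disabled on fields that are only used for filtering or aggregations.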
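Norms can be turned off (but not turned back on) for an existing field with the put-mapping API. A sketch, assuming my_index exists and title is a text field that does not need length-based scoring:

```
PUT my_index/_mapping/_doc
{
  "properties": {
    "title": {
      "type": "text",
      "norms": false
    }
  }
}
```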

17. null_value

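A null value cannot be indexed or searched by default. null_value replaces explicit null values with the specified value so that they can be indexed and searched.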

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "status_code": {
          "type":       "keyword",
          "null_value": "NULL" 
        }
      }
    }
  }
}
 
// Explicit null: indexed as "NULL", so this document is found
PUT my_index/_doc/1
{
  "status_code": null
}
// An empty array is not an explicit null, so it is not replaced and this document is not found
PUT my_index/_doc/2
{
  "status_code": [] 
}
 
GET my_index/_search
{
  "query": {
    "term": {
      "status_code": "NULL" 
    }
  }
}

18. position_increment_gap

https://www.elastic.co/guide/en/elasticsearch/reference/6.3/position-increment-gap.html
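In short: when a text field holds an array, Elasticsearch leaves a gap of positions between consecutive values (100 by default) so that phrase queries do not match across two values. A sketch with hypothetical data, assuming my_index does not exist yet so that names is dynamically mapped as text:

```
PUT my_index/_doc/1
{
  "names": [ "John Abraham", "Lincoln Smith" ]
}

POST my_index/_refresh

// No match: "Abraham" and "Lincoln" are separated by the position gap
GET my_index/_search
{
  "query": {
    "match_phrase": {
      "names": "Abraham Lincoln"
    }
  }
}

// Matches: a slop of 101 bridges the default gap of 100
GET my_index/_search
{
  "query": {
    "match_phrase": {
      "names": {
        "query": "Abraham Lincoln",
        "slop": 101
      }
    }
  }
}
```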

19. search_analyzer

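Usually the same analyzer should be used at index time and at search time, so that the terms in the query have the same format as the terms in the inverted index. Sometimes, though, a different search analyzer is needed, for example when an edge_ngram filter is used for autocomplete.

By default, queries use the analyzer set by the field's analyzer parameter, but this can be overridden with search_analyzer.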

PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "autocomplete", 
          "search_analyzer": "standard" 
        }
      }
    }
  }
}
 
PUT my_index/my_type/1
{
  "text": "Quick Brown Fox" 
}
 
GET my_index/_search
{
  "query": {
    "match": {
      "text": {
        "query": "Quick Br", 
        "operator": "and"
      }
    }
  }
}

20. similarity

Specifies the scoring model (similarity) for the field. Available values are "BM25" (the default), "classic" (TF/IDF), and "boolean" (a simple boolean model).
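A sketch with hypothetical field names: default_field keeps BM25, while boolean_sim_field scores a match only on whether the query terms occur, ignoring term frequency and field length.

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "default_field": {
          "type": "text"
        },
        "boolean_sim_field": {
          "type": "text",
          "similarity": "boolean"
        }
      }
    }
  }
}
```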

21. store

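By default, field values are indexed so that they can be searched, but they are not stored. This means the field can be queried, but its original value cannot be retrieved on its own.

This usually does not matter, because the value is already part of the _source field, which is stored by default, so it can be extracted from _source.

In some cases the store parameter does make sense. For example, if a document has a title, a date, and a very large content field, and we only want to retrieve title and date without extracting them from a large _source, we can map the fields like this: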

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "store": true 
        },
        "date": {
          "type": "date",
          "store": true 
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}
 
PUT my_index/_doc/1
{
  "title":   "Some short title",
  "date":    "2015-01-01",
  "content": "A very long content field..."
}
 
GET my_index/_search
{
  "stored_fields": [ "title", "date" ] 
}