https://blog.csdn.net/Interstellar_/article/details/81359301#22.%20term_vector
I. Mapping Parameters
1. analyzer
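An analyzer can be specified at three levels: per query, per field, or per index. The example below sets it per field: the text field uses the default standard analyzer, while its text.english sub-field uses the english analyzer.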
PUT /my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "text": {
          "type": "text",
          "fields": {
            "english": {
              "type": "text",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}
// Uses the standard analyzer; returns [ the, quick, brown, foxes ]
GET my_index/_analyze
{
  "field": "text",
  "text": "The quick Brown Foxes."
}
// Uses the english analyzer; returns [ quick, brown, fox ]
GET my_index/_analyze
{
  "field": "text.english",
  "text": "The quick Brown Foxes."
}
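The other two levels can be sketched in the same way. The requests below are illustrative only: my_other_index is a hypothetical index name, and the query simply shows where the analyzer parameter goes in a match query.
// Index level: make english the default analyzer for every text field in the index
PUT my_other_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "english"
        }
      }
    }
  }
}
// Query level: analyze the query string with english for this search only
GET my_index/_search
{
  "query": {
    "match": {
      "text.english": {
        "query": "The quick Brown Foxes.",
        "analyzer": "english"
      }
    }
  }
}
At search time the analyzer named in the query takes precedence, followed by the field's search_analyzer, the field's analyzer, and finally the index defaults.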
2. normalizer
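A normalizer standardizes the value of a keyword field before it is indexed, for example by converting all characters to lowercase. Unlike an analyzer, it always produces a single token, and it is also applied to the input of queries such as term and match.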
PUT index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}
PUT index/_doc/1
{
  "foo": "BÀR"
}
PUT index/_doc/2
{
  "foo": "bar"
}
PUT index/_doc/3
{
  "foo": "baz"
}
POST index/_refresh
GET index/_search
{
  "query": {
    "term": {
      "foo": "BAR"
    }
  }
}
// The normalizer converts BAR to bar, so documents 1 and 2 are both returned. The same holds for the match query:
GET index/_search
{
  "query": {
    "match": {
      "foo": "BAR"
    }
  }
}
3. boost
Sets the weight (boost) of a field in relevance scoring. The boost is applied at query time; the default is 1.0.
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "boost": 2
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}
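As a rough illustration of what the mapping-level boost does, a match query on title against this mapping should behave like the query below, which sets the same boost explicitly at query time. The query text quick brown fox is only sample input.
GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "quick brown fox",
        "boost": 2
      }
    }
  }
}
Setting the boost in the query rather than in the mapping has the advantage that the value can be changed without reindexing.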
4. coerce
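The coerce parameter cleans up dirty numeric data; it is enabled (true) by default. For example, the integer 5 might arrive as the string "5" or as the floating-point number 5.0. With coerce enabled:
Strings are converted to numbers.
Floating-point values are truncated to integers when the field has an integer type.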
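A minimal sketch of the difference, in the style of the other examples in this post. The field names number_one and number_two are illustrative.
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "number_one": {
          "type": "integer"
        },
        "number_two": {
          "type": "integer",
          "coerce": false
        }
      }
    }
  }
}
// Accepted: coerce is on by default, so the string "10" is indexed as the integer 10
PUT my_index/_doc/1
{
  "number_one": "10"
}
// Rejected: coerce is disabled for number_two, so a string is not a valid integer
PUT my_index/_doc/2
{
  "number_two": "10"
}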
5. copy_to
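Copies the values of several fields into a single field. For example, first_name and last_name can be combined into a full_name field, which can then be queried as one field.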
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name"
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name"
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}
PUT my_index/_doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}
GET my_index/_search
{
  "query": {
    "match": {
      "full_name": {
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}
6. doc_values
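Enabled by default on the fields that support it. If you never need to sort or aggregate on a field, or to access its values from a script, you can set doc_values to false to save disk space.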
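For instance, a mapping along the following lines keeps doc values for one keyword field and disables them for another. The field names status_code and session_id are illustrative.
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "status_code": {
          "type": "keyword"
        },
        "session_id": {
          "type": "keyword",
          "doc_values": false
        }
      }
    }
  }
}
Here session_id can still be queried, but it can no longer be used for sorting or aggregations.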
7. dynamic
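Controls whether new fields are added to the mapping automatically. The default is true. With false, new fields are ignored: they are not indexed or searchable, although they still appear in _source. With strict, a document that contains an unknown field is rejected with an exception.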
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": false,
      "properties": {
        "user": {
          "properties": {
            "name": {
              "type": "text"
            },
            "social_networks": {
              "dynamic": true,
              "properties": {}
            }
          }
        }
      }
    }
  }
}
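To see the effect of this mapping, one could index a document such as the one below and then inspect the mapping. The values and the age and twitter fields are made up for illustration.
PUT my_index/_doc/1
{
  "user": {
    "name": "John Smith",
    "age": 30,
    "social_networks": {
      "twitter": "jsmith"
    }
  }
}
// user.social_networks.twitter should now appear in the mapping; user.age should not
GET my_index/_mapping
The user object inherits dynamic: false from the type, so age is only kept in _source, while social_networks overrides the setting and accepts new fields.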
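8. enabled
Some fields only need to be stored, not indexed. Setting enabled to false on such a field makes Elasticsearch skip parsing and indexing its content: the value can still be retrieved from _source, but it cannot be searched. The setting applies to the mapping type and to object fields.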
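A minimal sketch, assuming a session document whose session_data object should be kept but never searched. All field names and values here are illustrative.
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "user_id": {
          "type": "keyword"
        },
        "session_data": {
          "enabled": false
        }
      }
    }
  }
}
// Any JSON can be passed in session_data; it is kept in _source but not parsed or indexed
PUT my_index/_doc/session_1
{
  "user_id": "kimchy",
  "session_data": {
    "arbitrary_object": {
      "some_array": [ "foo", "bar" ]
    }
  }
}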
9. fielddata
https://www.elastic.co/guide/en/elasticsearch/reference/6.3/fielddata.html
10. format
The format parameter specifies how date values are parsed. The available formats are listed at https://www.elastic.co/guide/en/elasticsearch/reference/6.3/mapping-date-format.html
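As a short sketch, the mapping below accepts a single pattern for one date field and several alternatives, separated by ||, for another. The field names are illustrative.
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "birthday": {
          "type": "date",
          "format": "yyyy-MM-dd"
        },
        "created_at": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}
// Both values match one of the configured formats
PUT my_index/_doc/1
{
  "birthday": "2015-01-01",
  "created_at": "2015-01-01 12:10:30"
}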
11. ignore_above
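This parameter sets a maximum string length for the field. Strings longer than ignore_above are not indexed or stored, although they remain in _source.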
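The following sketch shows the effect with a limit of 20 characters. The message field and its values are illustrative.
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 20
        }
      }
    }
  }
}
// 12 characters: indexed
PUT my_index/_doc/1
{
  "message": "Syntax error"
}
// 38 characters: the document is accepted, but message is not indexed
PUT my_index/_doc/2
{
  "message": "Syntax error with some long stacktrace"
}
// Both documents are returned as hits, but only the first value appears in the aggregation
GET my_index/_search
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}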
12. ignore_malformed
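This parameter lets Elasticsearch ignore malformed values instead of rejecting the whole document; the default is false. When it is enabled, the malformed field is skipped and the other fields of the document are indexed normally.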
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer",
          "ignore_malformed": true
        },
        "number_two": {
          "type": "integer"
        }
      }
    }
  }
}
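// Succeeds: ignore_malformed is enabled for number_one, so the bad value is skipped and the text field is still indexed
PUT my_index/my_type/1
{
  "text": "Some text value",
  "number_one": "foo"
}
// Fails: ignore_malformed is not enabled for number_two, so the document is rejected
PUT my_index/my_type/2
{
  "text": "Some text value",
  "number_two": "foo"
}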
13. index
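This parameter controls whether the field's values are indexed; the default is true. A field that is not indexed cannot be queried.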
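A minimal sketch, assuming a hypothetical internal_code field that should be returned with the document but never queried:
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text"
        },
        "internal_code": {
          "type": "keyword",
          "index": false
        }
      }
    }
  }
}
// internal_code is kept in _source and returned with the hit, but it is not queryable
PUT my_index/_doc/1
{
  "title": "Some short title",
  "internal_code": "A-42"
}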
14. index_options
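index_options controls what information is added to the inverted index. Analyzed text fields default to positions; all other fields default to docs.
docs | Only the document number is indexed |
freqs | Document numbers and term frequencies are indexed |
positions | Document numbers, term frequencies, and term positions are indexed |
offsets | Document numbers, term frequencies, term positions, and the start and end character offsets of each term are indexed |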
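As a sketch of where the option goes, the mapping below indexes offsets for a text field, which the highlighter can then use. The field name and sample text are illustrative.
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "text": {
          "type": "text",
          "index_options": "offsets"
        }
      }
    }
  }
}
PUT my_index/_doc/1
{
  "text": "Quick brown fox"
}
GET my_index/_search
{
  "query": {
    "match": {
      "text": "brown fox"
    }
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}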
15. fields
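fields (multi-fields) lets the same field be indexed in several different ways. For example, a string field can be mapped as text for full-text search and as keyword for sorting and aggregations.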
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
PUT my_index/_doc/1
{
  "city": "New York"
}
PUT my_index/_doc/2
{
  "city": "York"
}
GET my_index/_search
{
  "query": {
    "match": {
      "city": "york"
    }
  },
  "sort": {
    "city.raw": "asc"
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw"
      }
    }
  }
}
16. norms
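Norms store normalization factors that are used for scoring at query time. They are useful for scoring but take up a lot of disk space. They are enabled by default on text fields and can be disabled on fields that are only used for filtering or aggregations.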
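Norms can be switched off on an existing field through the put mapping API. A minimal sketch for 6.x, assuming an existing index my_index with a text field named title:
PUT my_index/_mapping/_doc
{
  "properties": {
    "title": {
      "type": "text",
      "norms": false
    }
  }
}
Once disabled, norms cannot be re-enabled on that field, and the space is only reclaimed gradually as old segments are merged.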
17. null_value
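By default, a null value cannot be indexed or searched. The null_value parameter replaces explicit null values with the specified value, so that fields containing null can be indexed and searched.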
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "status_code": {
          "type": "keyword",
          "null_value": "NULL"
        }
      }
    }
  }
}
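// An explicit null is replaced by the null_value, so this document can be found
PUT my_index/_doc/1
{
  "status_code": null
}
// An empty array contains no explicit null, so it is not replaced and this document cannot be found
PUT my_index/_doc/2
{
  "status_code": []
}
GET my_index/_search
{
  "query": {
    "term": {
      "status_code": "NULL"
    }
  }
}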
18. position_increment_gap
https://www.elastic.co/guide/en/elasticsearch/reference/6.3/position-increment-gap.html
19. search_analyzer
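Normally the same analyzer should be used at index time and at search time, so that the terms in the query have the same format as the terms in the inverted index. Sometimes, however, a different analyzer makes sense at search time, for example when edge_ngram is used for autocomplete.
By default, queries use the analyzer specified by the field's analyzer parameter, but this can be overridden with search_analyzer.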
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
PUT my_index/my_type/1
{
  "text": "Quick Brown Fox"
}
GET my_index/_search
{
  "query": {
    "match": {
      "text": {
        "query": "Quick Br",
        "operator": "and"
      }
    }
  }
}
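The reason this query matches can be checked with the _analyze API, which shows the terms each analyzer produces for the example above:
// Index time: returns [ q, qu, qui, quic, quick, b, br, bro, brow, brown, f, fo, fox ]
GET my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "Quick Brown Fox"
}
// Search time: returns [ quick, br ]
GET my_index/_analyze
{
  "analyzer": "standard",
  "text": "Quick Br"
}
Both query terms, quick and br, are among the indexed terms, so the document matches even with the and operator. If the autocomplete analyzer were also used at search time, the query would be split into prefixes such as q and b, and far more documents would match.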
20. similarity
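Specifies the scoring (similarity) model used for the field. The built-in options are "BM25" (the default), "classic" (TF/IDF), and "boolean" (a simple boolean model that only checks whether the query terms match).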
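A short sketch of a mapping that keeps the default for one field and uses the boolean similarity for another. The field names are illustrative.
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "default_field": {
          "type": "text"
        },
        "boolean_sim_field": {
          "type": "text",
          "similarity": "boolean"
        }
      }
    }
  }
}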
21. store
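By default, field values are indexed so that they can be searched, but they are not stored. This means the field can be queried, but its original value cannot be retrieved on its own.
This is usually not a problem, because the field value is already part of the _source field, which is stored by default, and can be retrieved from there.
In some cases the store parameter still makes sense. For example, if a document has a title, a date, and a very large content field, you may want to retrieve only the title and date without extracting them from a large _source. In that case the mapping can be set up like this: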
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "store": true
        },
        "date": {
          "type": "date",
          "store": true
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}
PUT my_index/_doc/1
{
  "title": "Some short title",
  "date": "2015-01-01",
  "content": "A very long content field..."
}
GET my_index/_search
{
  "stored_fields": [ "title", "date" ]
}
22. term_vector
https://www.elastic.co/guide/en/elasticsearch/reference/6.3/term-vector.html