mapping parameters

Latest recommended article published on 2022-10-25 13:19:11

姓氏弓长张

Category: elasticsearch. Tags: search

Source URL
analyzer
- search_quote_analyzer
boost
coerce
- index-level default
copy_to
doc_values
dynamic
enabled
fielddata
format
- custom date formats
- built in formats
ignore_above
ignore_malformed
- index-level default
include_in_all
index
index_options

Source URL

https://www.elastic.co/guide/en/elasticsearch/reference/5.0/mapping-params.html

analyzer

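The value of an analyzed string field is passed through an analyzer, which converts the string into a stream of tokens (terms). For example, the string "the quick brown foxes" may, depending on the analyzer used, be analyzed into the tokens quick, brown, fox. With a Chinese analyzer, the resulting tokens depend on your custom dictionary. These tokens are what is actually indexed for the field, which makes searching for individual words inside large blobs of text efficient.
Analysis happens not only at index time but also at query time: the query string must be passed through the same (or a similar) analyzer, so that the terms it looks for are in the same format as the terms in the index.
Elasticsearch ships with a number of built-in analyzers; see:
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/analysis-analyzers.html
You can also configure custom analyzers per index by combining character filters, tokenizers, and token filters; see:
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/analysis-charfilters.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/analysis-tokenizers.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/analysis-tokenfilters.html
For details, see the Analysis chapter of the reference:
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/analysis.html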

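Analyzers can be specified per query, per field, or per index. At index time, Elasticsearch looks for an analyzer in this order:
The analyzer defined in the field mapping
An analyzer named default in the index settings
The standard analyzer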

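At query time, there are a few more layers:
The analyzer defined in a full-text query
The search_analyzer defined in the field mapping
The analyzer defined in the field mapping
An analyzer named default_search in the index settings
An analyzer named default in the index settings
The standard analyzer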
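
The two lookup orders above can be sketched as a pair of fallback chains. This is only an illustrative model, not Elasticsearch code: the function names and the flat dictionaries standing in for the query, the field mapping, and the index analysis settings are assumptions made for the example.

```python
def index_time_analyzer(field_mapping: dict, index_settings: dict) -> str:
    """Pick the index-time analyzer: field mapping, then index default, then standard."""
    return (
        field_mapping.get("analyzer")
        or index_settings.get("default")
        or "standard"
    )


def query_time_analyzer(query: dict, field_mapping: dict, index_settings: dict) -> str:
    """Pick the query-time analyzer, walking the six layers in order."""
    return (
        query.get("analyzer")                      # set in the full-text query
        or field_mapping.get("search_analyzer")    # set in the field mapping
        or field_mapping.get("analyzer")           # index-time analyzer of the field
        or index_settings.get("default_search")    # index settings
        or index_settings.get("default")           # index settings
        or "standard"
    )


# Hypothetical inputs: a field mapped with the english analyzer.
print(index_time_analyzer({"analyzer": "english"}, {"default": "simple"}))
print(query_time_analyzer({}, {}, {"default_search": "simple"}))
```

The first call resolves to the field-level analyzer, the second falls through to the index-level default_search entry.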

The easiest way to specify an analyzer for a particular field is to define it in the field mapping:

PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": { 
          "type": "text",
          "fields": {
            "english": { 
              "type":     "text",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}
GET my_index/_analyze?field=text 
{
  "text": "The quick Brown Foxes."
}
GET my_index/_analyze?field=text.english 
{
  "text": "The quick Brown Foxes."
}

In the example above, the text field is analyzed with the standard analyzer, while the text.english multi-field is analyzed with the english analyzer.

search_quote_analyzer

To be added later. See:
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/analyzer.html#search-quote-analyzer

boost

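Individual fields can be boosted automatically at query time, so that they count more towards the relevance score. Set the boost parameter as follows: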

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "text",
          "boost": 2 
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}

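Matches on the title field carry twice the weight of matches on the content field, which has the default boost of 1.0.
Apart from match queries, the boost is applied only to term queries (prefix, range, and fuzzy queries are not boosted).

The boost is also applied when field values are copied into the _all field. This means that, when querying the _all field, words that originated from the title field score higher than words that originated from the content field. This comes at a cost: queries on the _all field are slower when field boosting is used.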

Deprecated in 5.0.0
index time boost is deprecated. Instead, the field mapping boost is applied at query time. For indices created before 5.0.0 the boost will still be applied at index time

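Why index-time boosting is a bad idea
We advise against using index-time boosting for the following reasons:
You cannot change index-time boost values without reindexing all of your documents.
Every query supports query-time boosting, which achieves the same effect. The difference is that you can tweak the boost value without reindexing.
Index-time boosts are stored as part of the norm, which is only one byte. This reduces the resolution of the field length normalization factor, which can lead to lower quality relevance calculations.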

coerce

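Data is not always clean. Depending on how it is produced, a number might be rendered in the JSON body as a true JSON number, e.g. 5, but it might also be rendered as a string, e.g. "5". Alternatively, a number that should be an integer might be rendered as a floating point, e.g. 5.0, or even "5.0".
Coercion attempts to clean up dirty values so that they fit the datatype of a field.
For instance:
Strings will be coerced to numbers.
Floating points will be truncated for integer values.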

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer"
        },
        "number_two": {
          "type": "integer",
          "coerce": false
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "number_one": "10" 
}

PUT my_index/my_type/2
{
  "number_two": "10" 
}

The number_one field will contain the integer 10. The second document is rejected because coercion is disabled on number_two.
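
As a rough illustration of the behavior described above, the following sketch models coercion for an integer field. It is a simplified stand-in, not the Elasticsearch parser: the function name is hypothetical, and treating every non-integer input as rejected when coerce is false is an assumption made to keep the example small.

```python
def index_integer(value, coerce=True):
    """Return the integer that would be indexed, or raise ValueError to reject it."""
    if isinstance(value, int):
        return value
    if not coerce:
        # Simplification: with coercion off, only true integers are accepted.
        raise ValueError(f"coercion disabled, rejecting {value!r}")
    # Strings are parsed as numbers; any fractional part is truncated.
    return int(float(value))


print(index_integer("10"))   # string coerced to a number
print(index_integer(5.7))    # floating point truncated
try:
    index_integer("10", coerce=False)
except ValueError as error:
    print("rejected:", error)
```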

Tip
The coerce setting is allowed to differ between fields of the same name in the same index. Its value can be updated on existing fields using the PUT mapping API.

index-level default

The index.mapping.coerce setting can be set at the index level to disable coercion globally across all mapping types:

PUT my_index
{
  "settings": {
    "index.mapping.coerce": false
  },
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer",
          "coerce": true
        },
        "number_two": {
          "type": "integer"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{ "number_one": "10" } 

PUT my_index/my_type/2
{ "number_two": "10" }

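The field-level setting on number_one overrides the index-level setting and enables coercion, so document 1 is indexed. The number_two field inherits the index-level setting, so document 2 is rejected.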
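
The precedence between the field-level and index-level settings can be sketched as a simple lookup. The helper name and the flat dictionaries are hypothetical; the same fallback shape applies to index.mapping.ignore_malformed, whose built-in default is false rather than true.

```python
def effective_setting(field_mapping: dict, index_settings: dict, name: str, default):
    """Field-level value wins; otherwise fall back to index.mapping.<name>, then the default."""
    if name in field_mapping:
        return field_mapping[name]
    return index_settings.get(f"index.mapping.{name}", default)


index_settings = {"index.mapping.coerce": False}
number_one = {"type": "integer", "coerce": True}
number_two = {"type": "integer"}

print(effective_setting(number_one, index_settings, "coerce", True))
print(effective_setting(number_two, index_settings, "coerce", True))
```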

copy_to

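The copy_to parameter allows you to create custom _all-style fields. In other words, the values of multiple fields can be copied into a group field, which can then be queried as a single field. For instance, the first_name and last_name fields can be copied to the full_name field: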

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "first_name": "John",
  "last_name": "Smith"
}

GET my_index/_search
{
  "query": {
    "match": {
      "full_name": { 
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}

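Some important points:
It is the field value that is copied, not the terms (which result from the analysis process).
The original _source field will not be modified to show the copied values.
The same value can be copied to multiple fields, with "copy_to": ["field_1", "field_2"].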
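
The points above can be illustrated with a small model of what gets handed to each field for indexing. This is a sketch under assumptions: apply_copy_to is a hypothetical helper, and the mapping is reduced to a flat dictionary of field names.

```python
def apply_copy_to(source: dict, mapping: dict) -> dict:
    """Return the raw values each field receives for indexing; source is not modified."""
    indexed = {name: [value] for name, value in source.items()}
    for name, value in source.items():
        targets = mapping.get(name, {}).get("copy_to", [])
        if isinstance(targets, str):
            targets = [targets]          # copy_to accepts one name or a list of names
        for target in targets:
            # The original field value is copied, not the analyzed terms.
            indexed.setdefault(target, []).append(value)
    return indexed


mapping = {
    "first_name": {"type": "text", "copy_to": "full_name"},
    "last_name": {"type": "text", "copy_to": "full_name"},
    "full_name": {"type": "text"},
}
source = {"first_name": "John", "last_name": "Smith"}

print(apply_copy_to(source, mapping)["full_name"])
print(source)  # unchanged: no full_name key appears in the source
```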

doc_values

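Most fields are indexed by default, which makes them searchable. The inverted index allows queries to look up a search term in a sorted list of unique terms and, from that, immediately access the list of documents that contain the term.
Sorting, aggregations, and access to field values in scripts require a different data access pattern. Instead of looking up the term and finding documents, we need to be able to look up the document and find the terms it has in a field.
Doc values are the on-disk data structure, built at document index time, that makes this access pattern possible. They store the same values as _source, but in a column-oriented fashion that is far more efficient for sorting and aggregations. Doc values are supported on almost all field types, with the notable exception of analyzed string fields.
All fields that support doc values have them enabled by default. If you are sure that you do not need to sort or aggregate on a field, or access its value from a script, you can disable doc values to save disk space: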

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status_code": { 
          "type":       "keyword"
        },
        "session_id": { 
          "type":       "keyword",
          "doc_values": false
        }
      }
    }
  }
}

The status_code field has doc_values enabled by default. The session_id field has doc_values disabled, but can still be queried.
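
To make the two access patterns concrete, here is a toy model that builds both structures from the same documents. The sample data and function names are invented for illustration, and real doc values are a compressed on-disk format rather than a Python dictionary.

```python
from collections import defaultdict

# Hypothetical documents: doc id -> terms of one field.
docs = {1: ["error", "timeout"], 2: ["error"], 3: ["ok"]}


def build_inverted_index(docs: dict) -> dict:
    """term -> sorted doc ids; answers 'which documents contain this term?'."""
    inverted = defaultdict(list)
    for doc_id in sorted(docs):
        for term in docs[doc_id]:
            inverted[term].append(doc_id)
    return dict(inverted)


def build_doc_values(docs: dict) -> dict:
    """doc id -> values (column-oriented); answers 'what does this document hold?'."""
    return {doc_id: sorted(terms) for doc_id, terms in docs.items()}


print(build_inverted_index(docs)["error"])   # search: term to documents
print(build_doc_values(docs)[1])             # sort/aggregate: document to values
```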

dynamic

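By default, fields can be added dynamically to a document, or to inner objects within a document, just by indexing a document that contains the new field. For instance: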

PUT my_index/my_type/2 
{
  "username": "marywhite",
  "email": "mary@white.com",
  "name": {
    "first": "Mary",
    "middle": "Alice",
    "last": "White"
  }
}
GET my_index/_mapping

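For details on how new fields are detected and added to the mapping, see dynamic mapping.
The dynamic setting controls whether new fields can be added dynamically. It accepts three settings:
true Newly detected fields are added to the mapping (default).
false Newly detected fields are ignored. New fields must be added explicitly.
strict If new fields are detected, an exception is thrown and the document is rejected.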

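The dynamic setting may be set at the mapping type level and on each inner object. Inner objects inherit the setting from their parent object or from the mapping type. For instance: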

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic": false, 
      "properties": {
        "user": { 
          "properties": {
            "name": {
              "type": "text"
            },
            "social_networks": { 
              "dynamic": true,
              "properties": {}
            }
          }
        }
      }
    }
  }
}
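
The inheritance rule can be sketched as a walk from the mapping type down to an inner object, where the closest explicit dynamic value wins. The function below is a hypothetical helper operating on the my_type body from the example, not an Elasticsearch API.

```python
def effective_dynamic(type_mapping: dict, path: list):
    """Resolve the dynamic setting for the object at `path` below the mapping type."""
    setting = type_mapping.get("dynamic", True)   # true is the documented default
    node = type_mapping
    for name in path:
        node = node.get("properties", {}).get(name, {})
        setting = node.get("dynamic", setting)    # inherit unless set explicitly
    return setting


my_type = {
    "dynamic": False,
    "properties": {
        "user": {
            "properties": {
                "name": {"type": "text"},
                "social_networks": {"dynamic": True, "properties": {}},
            }
        }
    },
}

print(effective_dynamic(my_type, []))                           # type level
print(effective_dynamic(my_type, ["user"]))                     # inherited from the type
print(effective_dynamic(my_type, ["user", "social_networks"]))  # overridden
```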

Tip
The dynamic setting is allowed to differ between fields of the same name in the same index. Its value can be updated on existing fields using the PUT mapping API.

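Experiment
When a type or inner object is set to dynamic: false, new fields in documents indexed at that point are not indexed. In my test, queries on those fields returned nothing, and I assume aggregations and sorting on them fail as well. After updating the mapping to dynamic: true, newly added documents can be queried on those fields, but documents added while dynamic was false still cannot (that is, data inserted under dynamic: false is not indexed retroactively when dynamic is later set to true).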

enabled

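Elasticsearch tries to index all of the fields you give it, but sometimes you want to just store a field without indexing it. For instance, imagine that you are using Elasticsearch as a web session store. You may want to index the session ID and last update time, but you do not need to query or run aggregations on the session data itself.
The enabled setting, which can be applied only to the mapping type and to object fields, causes Elasticsearch to skip parsing of the contents of the field entirely. The JSON can still be retrieved from the _source field, but it is not searchable or stored in any other way: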

PUT my_index
{
  "mappings": {
    "session": {
      "properties": {
        "user_id": {
          "type":  "keyword"
        },
        "last_updated": {
          "type": "date"
        },
        "session_data": { 
          "enabled": false
        }
      }
    }
  }
}

PUT my_index/session/session_1
{
  "user_id": "kimchy",
  "session_data": { 
    "arbitrary_object": {
      "some_array": [ "foo", "bar", { "baz": 2 } ]
    }
  },
  "last_updated": "2015-12-06T18:20:22"
}

PUT my_index/session/session_2
{
  "user_id": "jpountz",
  "session_data": "none", 
  "last_updated": "2015-12-06T18:22:13"
}

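The session_data field is disabled (enabled: false).
Any arbitrary data can be passed to the session_data field, as it is entirely ignored.
The session_data field also accepts values that are not JSON objects.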

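The entire mapping type may be disabled as well, in which case the document is stored in the _source field, which means it can be retrieved, but none of its contents are indexed in any way (useful when Elasticsearch serves purely as a data store without indexing):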

PUT my_index
{
  "mappings": {
    "session": { 
      "enabled": false
    }
  }
}

PUT my_index/session/session_1
{
  "user_id": "kimchy",
  "session_data": {
    "arbitrary_object": {
      "some_array": [ "foo", "bar", { "baz": 2 } ]
    }
  },
  "last_updated": "2015-12-06T18:20:22"
}

GET my_index/session/session_1 

GET my_index/_mapping

The entire session mapping type is disabled.
The document can be retrieved, but checking the mapping shows that no fields have been added.

Tip
The enabled setting is allowed to differ between fields of the same name in the same index. Its value can be updated on existing fields using the PUT mapping API.

fielddata

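Most fields are indexed by default, which makes them searchable. Sorting, aggregations, and accessing field values in scripts, however, require a different access pattern from search.
Search needs to answer the question "Which documents contain this term?", while sorting and aggregations need to answer a different question: "What is the value of this field for this document?"
Most fields can use index-time, on-disk doc_values for this data access pattern, but text fields do not support doc_values.
Instead, text fields use a query-time, in-memory data structure called fielddata. This data structure is built on demand the first time a field is used for aggregations, sorting, or in a script. It is built by reading the entire inverted index for each segment from disk, inverting the term-to-document relationship, and storing the result in memory, in the JVM heap.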

fielddata is disabled on text fields by default

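Fielddata can consume a lot of heap space, especially when loading high-cardinality text fields. Once fielddata has been loaded into the heap, it remains there for the lifetime of the segment. Loading fielddata is also an expensive process that can cause users to experience latency hits. This is why fielddata is disabled by default.
If you try to sort, aggregate, or access values from a script on a text field, you will see this exception: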
Fielddata is disabled on text fields by default. Set fielddata=true on [your_field_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.

before enabling fielddata

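Before you enable fielddata, consider why you are using a text field for aggregations, sorting, or in a script. It usually does not make sense to do so.
A text field is analyzed before indexing, so that a value like New York can be found by searching for new or for york. A terms aggregation on this field returns a new bucket and a york bucket, when you probably want a single bucket called New York.
Instead, you should have a text field for full-text search and an unanalyzed keyword field with doc_values enabled for aggregations, as follows: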

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": { 
          "type": "text",
          "fields": {
            "keyword": { 
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

Use the my_field field for searches. Use the my_field.keyword field for aggregations, sorting, or in scripts.
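
The bucket problem described above can be reproduced with a toy terms aggregation. The analysis step here is a crude stand-in (lowercase and split on whitespace), and both the function and the sample values are assumptions made for the example.

```python
from collections import Counter


def terms_buckets(values: list, analyzed: bool) -> dict:
    """Count documents per term, either on analyzed tokens or on the whole value."""
    counts = Counter()
    for value in values:
        # Crude stand-in for analysis: lowercase and split on whitespace.
        terms = set(value.lower().split()) if analyzed else {value}
        counts.update(terms)
    return dict(counts)


cities = ["New York", "New York", "York"]
print(terms_buckets(cities, analyzed=True))    # text field: separate new and york buckets
print(terms_buckets(cities, analyzed=False))   # keyword field: one bucket per exact value
```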

enabling fielddata on text fields

You can enable fielddata on an existing text field using the PUT mapping API as follows:

PUT my_index/_mapping/my_type
{
  "properties": {
    "my_field": { 
      "type":     "text",
      "fielddata": true
    }
  }
}

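global ordinals
Global ordinals are a data structure on top of fielddata and doc values that maintains an incremental numbering for each unique term in lexicographic order. Each term has a unique number, and the number of term A is lower than the number of term B when A sorts before B. Global ordinals are only supported on text and keyword fields.
Fielddata and doc values also have ordinals, which are a unique numbering for all terms in a particular segment and field. Global ordinals build on top of this by providing a mapping between the segment ordinals and the global ordinals, the latter being unique across the entire shard.
Global ordinals are used by features that rely on segment ordinals, such as sorting and the terms aggregation, to improve execution time. A terms aggregation relies purely on global ordinals to perform the aggregation at the shard level, and converts global ordinals to the real terms only for the final reduce phase, which combines results from different shards.
Global ordinals for a specified field are tied to all the segments of a shard, whereas fielddata and doc values ordinals are tied to a single segment. For this reason, global ordinals need to be entirely rebuilt whenever a new segment becomes visible.
The loading time of global ordinals depends on the number of terms in a field, but in general it is low, since the source field data has already been loaded. The memory overhead of global ordinals is small because it is very efficiently compressed. Eager loading of global ordinals can move the loading time from the first search request to the refresh itself.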
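
The relationship between segment ordinals and global ordinals can be sketched with plain lists. This is only a conceptual model with invented sample terms; the real structure is compressed and built per shard by Lucene and Elasticsearch.

```python
def build_global_ordinals(segments: list):
    """segments: one sorted term list per segment; list index = segment ordinal."""
    # Global ordinals number every unique term of the shard in sorted order.
    all_terms = sorted(set(term for segment in segments for term in segment))
    global_ordinal = {term: number for number, term in enumerate(all_terms)}
    # For each segment: segment ordinal -> global ordinal.
    segment_to_global = [[global_ordinal[term] for term in segment] for segment in segments]
    return all_terms, segment_to_global


# Two hypothetical segments of one shard.
segments = [["apple", "cherry"], ["banana", "cherry"]]
terms, mapping = build_global_ordinals(segments)
print(terms)     # global ordinal -> term, resolved only when terms are needed
print(mapping)   # "cherry" is ordinal 1 in both segments but has one global ordinal
```

Adding a third segment changes the sorted term list, which is why the whole mapping has to be rebuilt when a new segment becomes visible.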

fielddata_frequency_filter

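Fielddata filtering can be used to reduce the number of terms loaded into memory, and thus reduce memory usage. Terms can be filtered by frequency.
The frequency filter allows you to load only terms whose document frequency falls between a min and a max value, which can be expressed as an absolute number (when the number is bigger than 1.0) or as a percentage (e.g. 0.01 is 1% and 1.0 is 100%). Frequency is calculated per segment. Percentages are based on the number of documents that have a value for the field, as opposed to all documents in the segment.
Small segments can be excluded completely by specifying, with min_segment_size, the minimum number of documents that a segment should contain.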

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "tag": {
          "type": "text",
          "fielddata": true,
          "fielddata_frequency_filter": {
            "min": 0.001,
            "max": 0.1,
            "min_segment_size": 500
          }
        }
      }
    }
  }
}
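
A minimal sketch of the min/max rule, applied to a single segment. Several things here are assumptions rather than documented behavior: the function name, treating both bounds as inclusive, and leaving min_segment_size out of the model entirely.

```python
def loaded_terms(doc_freqs: dict, docs_with_value: int, min_freq: float, max_freq: float) -> list:
    """Return the terms of one segment whose document frequency passes the filter."""

    def to_count(bound: float) -> float:
        # Values above 1.0 are absolute counts; otherwise a fraction of the
        # documents that have a value for the field.
        return bound if bound > 1.0 else bound * docs_with_value

    low, high = to_count(min_freq), to_count(max_freq)
    # Assumption: bounds are treated as inclusive in this sketch.
    return sorted(term for term, freq in doc_freqs.items() if low <= freq <= high)


# Hypothetical segment: 10000 docs have a value for the tag field.
doc_freqs = {"typo": 2, "python": 250, "the": 9000}
print(loaded_terms(doc_freqs, 10000, min_freq=0.001, max_freq=0.1))
```

With these numbers the filter keeps terms that appear in roughly 10 to 1000 documents, so the very rare and the very common term are both skipped.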

format

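In JSON documents, dates are represented as strings. Elasticsearch uses a set of preconfigured formats to recognize and parse these strings into a long value representing milliseconds since the epoch in UTC.
Besides the built-in formats, your own custom formats can be specified using the familiar yyyy/MM/dd syntax.
For details, see the date datatype in the mapping field types documentation, which links to the custom format syntax.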

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type":   "date",
          "format": "yyyy-MM-dd"
        }
      }
    }
  }
}

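Many APIs that support date values also support date math expressions, such as now-1M/d: the current time, minus one month, rounded down to the nearest day. Note that in date math an uppercase M means months, while a lowercase m means minutes.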
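
The parsing step described at the start of this section (date string to a long of UTC epoch milliseconds) can be sketched in a few lines. This uses the Python standard library as a stand-in: the strptime pattern %Y-%m-%d plays the role of the yyyy-MM-dd format, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_millis(text: str, pattern: str = "%Y-%m-%d") -> int:
    """Parse a date string as UTC and return milliseconds since the epoch."""
    parsed = datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


print(to_epoch_millis("2015-01-01"))
print(to_epoch_millis("1970-01-02"))
```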

custom date formats

For the full syntax, see the Joda docs.

built in formats

The following list shows all the built-in formats:

epoch_millis

A formatter for the number of milliseconds since the epoch. Note, that this timestamp is subject to the limits of a Java Long.MIN_VALUE and Long.MAX_VALUE.

epoch_second

A formatter for the number of seconds since the epoch. Note, that this timestamp is subject to the limits of a Java Long.MIN_VALUE and Long.MAX_VALUE divided by 1000 (the number of milliseconds in a second).

date_optional_time or strict_date_optional_time

A generic ISO datetime parser where the date is mandatory and the time is optional. Full details here.

basic_date

A basic formatter for a full date as four digit year, two digit month of year, and two digit day of month: yyyyMMdd.

basic_date_time

A basic formatter that combines a basic date and time, separated by a T: yyyyMMdd'T'HHmmss.SSSZ.

basic_date_time_no_millis

A basic formatter that combines a basic date and time without millis, separated by a T: yyyyMMdd'T'HHmmssZ.

basic_ordinal_date

A formatter for a full ordinal date, using a four digit year and three digit dayOfYear: yyyyDDD.

basic_ordinal_date_time

A formatter for a full ordinal date and time, using a four digit year and three digit dayOfYear: yyyyDDD'T'HHmmss.SSSZ.

basic_ordinal_date_time_no_millis

A formatter for a full ordinal date and time without millis, using a four digit year and three digit dayOfYear: yyyyDDD'T'HHmmssZ.

basic_time

A basic formatter for a two digit hour of day, two digit minute of hour, two digit second of minute, three digit millis, and time zone offset: HHmmss.SSSZ.

basic_time_no_millis

A basic formatter for a two digit hour of day, two digit minute of hour, two digit second of minute, and time zone offset: HHmmssZ.

basic_t_time

A basic formatter for a two digit hour of day, two digit minute of hour, two digit second of minute, three digit millis, and time zone offset prefixed by T: 'T'HHmmss.SSSZ.

basic_t_time_no_millis

A basic formatter for a two digit hour of day, two digit minute of hour, two digit second of minute, and time zone offset prefixed by T: 'T'HHmmssZ.

basic_week_date or strict_basic_week_date

A basic formatter for a full date as four digit weekyear, two digit week of weekyear, and one digit day of week: xxxx'W'wwe.

basic_week_date_time or strict_basic_week_date_time

A basic formatter that combines a basic weekyear date and time, separated by a T: xxxx'W'wwe'T'HHmmss.SSSZ.

basic_week_date_time_no_millis or strict_basic_week_date_time_no_millis

A basic formatter that combines a basic weekyear date and time without millis, separated by a T: xxxx'W'wwe'T'HHmmssZ.

date or strict_date

A formatter for a full date as four digit year, two digit month of year, and two digit day of month: yyyy-MM-dd.

date_hour or strict_date_hour

A formatter that combines a full date and two digit hour of day: yyyy-MM-dd'T'HH.

date_hour_minute or strict_date_hour_minute

A formatter that combines a full date, two digit hour of day, and two digit minute of hour: yyyy-MM-dd'T'HH:mm.

date_hour_minute_second or strict_date_hour_minute_second

A formatter that combines a full date, two digit hour of day, two digit minute of hour, and two digit second of minute: yyyy-MM-dd'T'HH:mm:ss.

date_hour_minute_second_fraction or strict_date_hour_minute_second_fraction

A formatter that combines a full date, two digit hour of day, two digit minute of hour, two digit second of minute, and three digit fraction of second: yyyy-MM-dd'T'HH:mm:ss.SSS.

date_hour_minute_second_millis or strict_date_hour_minute_second_millis

A formatter that combines a full date, two digit hour of day, two digit minute of hour, two digit second of minute, and three digit fraction of second: yyyy-MM-dd'T'HH:mm:ss.SSS.

date_time or strict_date_time

A formatter that combines a full date and time, separated by a T: yyyy-MM-dd'T'HH:mm:ss.SSSZZ.

date_time_no_millis or strict_date_time_no_millis

A formatter that combines a full date and time without millis, separated by a T: yyyy-MM-dd'T'HH:mm:ssZZ.

hour or strict_hour

A formatter for a two digit hour of day: HH.

hour_minute or strict_hour_minute

A formatter for a two digit hour of day and two digit minute of hour: HH:mm.

hour_minute_second or strict_hour_minute_second

A formatter for a two digit hour of day, two digit minute of hour, and two digit second of minute: HH:mm:ss.

hour_minute_second_fraction or strict_hour_minute_second_fraction

A formatter for a two digit hour of day, two digit minute of hour, two digit second of minute, and three digit fraction of second: HH:mm:ss.SSS.

hour_minute_second_millis or strict_hour_minute_second_millis

A formatter for a two digit hour of day, two digit minute of hour, two digit second of minute, and three digit fraction of second: HH:mm:ss.SSS.

ordinal_date or strict_ordinal_date

A formatter for a full ordinal date, using a four digit year and three digit dayOfYear: yyyy-DDD.

ordinal_date_time or strict_ordinal_date_time

A formatter for a full ordinal date and time, using a four digit year and three digit dayOfYear: yyyy-DDD'T'HH:mm:ss.SSSZZ.

ordinal_date_time_no_millis or strict_ordinal_date_time_no_millis

A formatter for a full ordinal date and time without millis, using a four digit year and three digit dayOfYear: yyyy-DDD'T'HH:mm:ssZZ.

time or strict_time

A formatter for a two digit hour of day, two digit minute of hour, two digit second of minute, three digit fraction of second, and time zone offset: HH:mm:ss.SSSZZ.

time_no_millis or strict_time_no_millis

A formatter for a two digit hour of day, two digit minute of hour, two digit second of minute, and time zone offset: HH:mm:ssZZ.

t_time or strict_t_time

A formatter for a two digit hour of day, two digit minute of hour, two digit second of minute, three digit fraction of second, and time zone offset prefixed by T: 'T'HH:mm:ss.SSSZZ.

t_time_no_millis or strict_t_time_no_millis

A formatter for a two digit hour of day, two digit minute of hour, two digit second of minute, and time zone offset prefixed by T: 'T'HH:mm:ssZZ.

week_date or strict_week_date

A formatter for a full date as four digit weekyear, two digit week of weekyear, and one digit day of week: xxxx-'W'ww-e.

week_date_time or strict_week_date_time

A formatter that combines a full weekyear date and time, separated by a T: xxxx-'W'ww-e'T'HH:mm:ss.SSSZZ.

week_date_time_no_millis or strict_week_date_time_no_millis

A formatter that combines a full weekyear date and time without millis, separated by a T: xxxx-'W'ww-e'T'HH:mm:ssZZ.

weekyear or strict_weekyear

A formatter for a four digit weekyear: xxxx.

weekyear_week or strict_weekyear_week

A formatter for a four digit weekyear and two digit week of weekyear: xxxx-'W'ww.

weekyear_week_day or strict_weekyear_week_day

A formatter for a four digit weekyear, two digit week of weekyear, and one digit day of week: xxxx-'W'ww-e.

year or strict_year

A formatter for a four digit year: yyyy.

year_month or strict_year_month

A formatter for a four digit year and two digit month of year: yyyy-MM.

year_month_day or strict_year_month_day

A formatter for a four digit year, two digit month of year, and two digit day of month: yyyy-MM-dd.

ignore_above

Strings longer than the ignore_above setting will not be indexed or stored as a field value (they remain available in _source).

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 20 
        }
      }
    }
  }
}

PUT my_index/my_type/1 
{
  "message": "Syntax error"
}

PUT my_index/my_type/2 
{
  "message": "Syntax error with some long stacktrace"
}

GET _search 
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}

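The message field ignores any string longer than 20 characters.
The first document is indexed successfully. The second document is indexed too, but without indexing its message field.
The search returns both documents, but only the first is present in the terms aggregation.
This option is also useful for protecting against Lucene's term byte-length limit of 32766.
The value for ignore_above is a character count, but Lucene counts bytes. If you use UTF-8 text with many non-ASCII characters, you may want to set the limit to 32766 / 3 = 10922, since such characters may occupy up to 3 bytes each.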
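
The character-versus-byte distinction and the 32766 / 3 arithmetic can be checked directly. The helper names below are hypothetical, and the 3-bytes-per-character figure is the assumption taken from the text above.

```python
LUCENE_MAX_TERM_BYTES = 32766


def safe_ignore_above(max_bytes_per_char: int = 3) -> int:
    """Largest character count that cannot exceed Lucene's byte limit."""
    return LUCENE_MAX_TERM_BYTES // max_bytes_per_char


def is_indexed(value: str, ignore_above: int) -> bool:
    """ignore_above compares the number of characters, not bytes."""
    return len(value) <= ignore_above


print(safe_ignore_above())
print(is_indexed("Syntax error", 20))
print(is_indexed("Syntax error with some long stacktrace", 20))
# Two characters, but six bytes once encoded as UTF-8.
print(len("中文"), len("中文".encode("utf-8")))
```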

ignore_malformed

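Sometimes you do not have much control over the data that you receive. One user may send a login field that is a date, and another may send a login field that is an email address.
Trying to index the wrong datatype into a field throws an exception by default and rejects the whole document. The ignore_malformed parameter, if set to true, allows the exception to be ignored. The malformed field is not indexed, but the other fields in the document are processed normally.
For example: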

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer",
          "ignore_malformed": true
        },
        "number_two": {
          "type": "integer"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text":       "Some text value",
  "number_one": "foo" 
}

PUT my_index/my_type/2
{
  "text":       "Some text value",
  "number_two": "foo" 
}

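Document 1 is indexed: its text field is indexed, but its malformed number_one field is not. Document 2 is rejected because number_two does not allow malformed values.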
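
The difference between skipping one field and rejecting the whole document can be sketched as follows. This is a simplified model for integer fields only; the function and the flat mapping dictionary are hypothetical.

```python
def index_document(doc: dict, mapping: dict) -> dict:
    """Return the fields that get indexed, or raise ValueError to reject the document."""
    indexed = {}
    for name, value in doc.items():
        field = mapping.get(name, {})
        if field.get("type") == "integer":
            try:
                value = int(value)
            except (TypeError, ValueError):
                if field.get("ignore_malformed", False):
                    continue  # skip only the malformed field
                raise ValueError(f"malformed value for [{name}]: {value!r}")
        indexed[name] = value
    return indexed


mapping = {
    "number_one": {"type": "integer", "ignore_malformed": True},
    "number_two": {"type": "integer"},
}

print(index_document({"text": "Some text value", "number_one": "foo"}, mapping))
try:
    index_document({"text": "Some text value", "number_two": "foo"}, mapping)
except ValueError as error:
    print("rejected:", error)
```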

index-level default

The index.mapping.ignore_malformed setting can be set at the index level to ignore malformed content globally across all mapping types:

PUT my_index
{
  "settings": {
    "index.mapping.ignore_malformed": true 
  },
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": { 
          "type": "byte"
        },
        "number_two": {
          "type": "integer",
          "ignore_malformed": false 
        }
      }
    }
  }
}

include_in_all

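The include_in_all parameter provides per-field control over which fields are copied into the _all field. It defaults to true, unless index is set to no.
The following example shows how to exclude the date field from the _all field: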

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": { 
          "type": "text"
        },
        "content": { 
          "type": "text"
        },
        "date": { 
          "type": "date",
          "include_in_all": false
        }
      }
    }
  }
}

The include_in_all parameter can also be set at the type level and on object or nested fields, in which case all sub-fields inherit that setting. For instance:

PUT my_index
{
  "mappings": {
    "my_type": {
      "include_in_all": false, 
      "properties": {
        "title":          { "type": "text" },
        "author": {
          "include_in_all": true, 
          "properties": {
            "first_name": { "type": "text" },
            "last_name":  { "type": "text" }
          }
        },
        "editor": {
          "properties": {
            "first_name": { "type": "text" }, 
            "last_name":  { "type": "text", "include_in_all": true } 
          }
        }
      }
    }
  }
}

note multi-fields and include_in_all
The original field value is added to the _all field, not the terms produced by a field’s analyzer. For this reason, it makes no sense to set include_in_all to true on multi-fields, as each multi-field has exactly the same value as its parent.

index

The index option controls whether field values are indexed. It accepts true or false. Fields that are not indexed cannot be queried.

index_options

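The index_options parameter controls what information is added to the inverted index for search and highlighting purposes. It accepts the following settings:
docs Only the doc number is indexed. Can answer the question: does this term exist in this field?
freqs Doc number and term frequencies are indexed. Term frequencies are used to score repeated terms higher than single terms.
positions Doc number, term frequencies, and term positions (or order) are indexed. Positions can be used for proximity or phrase queries.
offsets Doc number, term frequencies, positions, and start and end character offsets (which map the term back to the original string) are indexed. Offsets are used by the postings highlighter.

Analyzed string fields use positions as the default, and all other fields use docs as the default.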

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "index_options": "offsets"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text": "Quick brown fox"
}

GET my_index/_search
{
  "query": {
    "match": {
      "text": "brown fox"
    }
  },
  "highlight": {
    "fields": {
      "text": {} 
    }
  }
}

The text field will use the postings highlighter by default because offsets are indexed.
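
To show what each index_options level adds, the sketch below builds a toy posting entry per term for a single document. The tokenization (lowercased word characters) and the dictionary layout are assumptions for illustration; Lucene's actual postings format is quite different.

```python
import re


def postings(text: str, index_options: str = "positions") -> dict:
    """Build per-term index data for one document at the given index_options level."""
    levels = ["docs", "freqs", "positions", "offsets"]
    depth = levels.index(index_options)
    result = {}
    for position, match in enumerate(re.finditer(r"\w+", text)):
        term = match.group().lower()
        entry = result.setdefault(term, {})   # docs: the term is known to exist
        if depth >= 1:
            entry["freq"] = entry.get("freq", 0) + 1
        if depth >= 2:
            entry.setdefault("positions", []).append(position)
        if depth >= 3:
            # Character offsets map the term back to the original string.
            entry.setdefault("offsets", []).append((match.start(), match.end()))
    return result


print(postings("Quick brown fox", "docs"))
print(postings("Quick brown fox", "offsets")["brown"])
```

The offsets are what a highlighter needs to mark "brown" in the original string without re-analyzing the text.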
