Mapping Meta-Fields

Latest recommended article published on 2024-09-14 05:49:54

姓氏弓长张


_all field

Introduction

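The _all field is a special catch-all field that concatenates the values of all other fields into one big string, using a space as the delimiter. That string is analyzed and indexed, but not stored, which means it can be searched but not retrieved.
The _all field lets you search for values in documents without knowing which field contains the value. This makes it useful when you start working with a new dataset whose mapping you don't know yet.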

PUT my_index/user/1 
{
  "first_name":    "John",
  "last_name":     "Smith",
  "date_of_birth": "1970-10-24"
}

GET my_index/_search
{
  "query": {
    "match": {
      "_all": "john smith 1970"
    }
  }
}

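Here the _all field will contain the terms [ "john", "smith", "1970", "10", "24" ].
All values are treated as strings. In the example above, date_of_birth is mapped as a date field and is indexed as a single term representing 1970-10-24 00:00:00 UTC. The _all field, however, treats the date as a string and indexes it as the three terms "1970", "24", and "10".
Verification: what matters is the original value supplied at index time. For example, if I set date_of_birth as a number of milliseconds, searching _all for 1970 no longer finds the document, while searching for the complete millisecond number does (the whole number is a single term, so part of it cannot be matched). In other words, the smallest unit of an _all search is the term; a text value is split into several terms.
Experiment: with a Chinese analyzer, querying with the corresponding terms works as well, so _all can be used to find documents in which any field contains a given term.
Note: the _all field is simply the concatenation of each field's values. It does not combine terms from different fields into new terms, so you can only search for terms that occur in the original values.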
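To make the tokenization rules above concrete, here is a minimal Python sketch that models _all for flat documents. It is an approximation, not Elasticsearch's implementation: the helper names are hypothetical, and the regex tokenizer (lowercase, split on non-word characters) only roughly imitates the standard analyzer for ASCII text (it would not segment Chinese, for instance).

```python
import re


def build_all_field(doc: dict) -> str:
    """Join every top-level field value, as a string, with a space (as _all does)."""
    return " ".join(str(value) for value in doc.values())


def analyze(text: str) -> list:
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def match_on_all(doc: dict, query: str) -> bool:
    """Simplified match query on _all: true if any query term is an indexed _all term."""
    all_terms = set(analyze(build_all_field(doc)))
    return any(term in all_terms for term in analyze(query))


doc = {"first_name": "John", "last_name": "Smith", "date_of_birth": "1970-10-24"}
print(analyze(build_all_field(doc)))         # ['john', 'smith', '1970', '10', '24']
print(match_on_all(doc, "john smith 1970"))  # True

# Date sent as epoch milliseconds (1970-10-24 00:00:00 UTC): the whole number is one term.
millis_doc = {"date_of_birth": 25574400000}
print(match_on_all(millis_doc, "1970"))         # False
print(match_on_all(millis_doc, "25574400000"))  # True

# Terms are never merged across fields, so "johnsmith" finds nothing.
print(match_on_all(doc, "johnsmith"))  # False
```

The sketch reproduces the three observations above: the date string splits into three terms, a millisecond value stays one indivisible term, and no term spans two fields.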
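The _all field is a text field, so it accepts the same parameters as other text fields, including analyzer, term_vector, index_options, and store.
The _all field is useful, especially when exploring new data with simple filtering. However, because it concatenates all field values into one big string, it loses the distinction between individual fields (for example, between short, more relevant fields and long, less relevant ones). When relevance depends on which field matched, it is better to query the specific fields.
The _all field is not free: it requires extra CPU cycles and uses more disk space. If you don't need it, you can disable it entirely or control per field whether a value is included in _all.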

Disabling _all or choosing which fields it includes

Disabling _all (it stays enabled for type_1 and is completely disabled for type_2):
PUT my_index
{
  "mappings": {
    "type_1": { 
      "properties": {...}
    },
    "type_2": { 
      "_all": {
        "enabled": false
      },
      "properties": {...}
    }
  }
}
Excluding author (and all of its subfields) from _all:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title":  { "type": "text" },
        "author": {
          "include_in_all": false,
          "properties": {
            "first_name": { "type": "text" },
            "last_name":  { "type": "text" }
          }
        }
      }
    }
  }
}
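The following Python sketch models how include_in_all on an object propagates to its subfields. It assumes, as the Elasticsearch 5.x documentation describes, that the setting is inherited by subfields unless a subfield overrides it; the function name is hypothetical and the code only models which values would be copied, not how Elasticsearch builds the field.

```python
def all_field_values(doc: dict, properties: dict, include: bool = True) -> list:
    """Collect the values that would be copied into _all.

    include_in_all is inherited from the enclosing object unless a field overrides it.
    """
    values = []
    for name, value in doc.items():
        field_mapping = properties.get(name, {})
        field_include = field_mapping.get("include_in_all", include)
        if isinstance(value, dict):
            # Object field: recurse into its subfields, passing the inherited setting down.
            values.extend(
                all_field_values(value, field_mapping.get("properties", {}), field_include)
            )
        elif field_include:
            values.append(str(value))
    return values


properties = {
    "title": {"type": "text"},
    "author": {
        "include_in_all": False,
        "properties": {
            "first_name": {"type": "text"},
            "last_name": {"type": "text"},
        },
    },
}
doc = {"title": "Quick brown fox", "author": {"first_name": "John", "last_name": "Smith"}}
print(all_field_values(doc, properties))  # ['Quick brown fox']
```

Setting "include_in_all": true on last_name alone would bring "Smith" back into _all while first_name stays excluded.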

Using _all in search

The query_string and simple_query_string queries search the _all field by default when no field is specified:

GET _search
{
  "query": {
    "query_string": {
      "query": "john smith 1970"
    }
  }
}

The same applies to the ?q= parameter in URI search requests (which is really the URL form of a query_string query):

GET _search?q=john+smith+1970
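To show why the URI form is equivalent to the query_string body above, this Python sketch decodes the q parameter and wraps it in a query_string request body. It is a simplified model with a hypothetical helper name; real URI search also accepts other parameters (such as df and default_operator) that are ignored here.

```python
from urllib.parse import parse_qs


def uri_search_to_body(query_string: str) -> dict:
    """Translate the q parameter of a URI search into a query_string request body."""
    q = parse_qs(query_string)["q"][0]  # '+' in the URL decodes to a space
    return {"query": {"query_string": {"query": q}}}


print(uri_search_to_body("q=john+smith+1970"))
# {'query': {'query_string': {'query': 'john smith 1970'}}}
```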

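Other queries, such as match and term, require you to specify the _all field explicitly, as in the match query of the first example.
If the _all field is disabled, URI search, query_string, and simple_query_string queries can no longer use it as their default field. You can point them at a different field with the index.query.default_field setting: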

PUT my_index
{
  "mappings": {
    "my_type": {
      "_all": {
        "enabled": false 
      },
      "properties": {
        "content": {
          "type": "text"
        }
      }
    }
  },
  "settings": {
    "index.query.default_field": "content" 
  }
}
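The fallback order can be summarized in a few lines of Python. This is a sketch of the precedence as I understand it for Elasticsearch 5.x, not code from Elasticsearch: a default_field given on the query_string query itself wins, then the index.query.default_field setting, and finally _all. The function name is hypothetical.

```python
from typing import Optional


def resolve_default_field(index_settings: dict, query_default_field: Optional[str] = None) -> str:
    """Pick the field a query_string query searches when the query text names no field."""
    if query_default_field is not None:
        return query_default_field  # default_field passed on the query itself
    # Otherwise the index setting, falling back to _all.
    return index_settings.get("index.query.default_field", "_all")


print(resolve_default_field({}))                                         # _all
print(resolve_default_field({"index.query.default_field": "content"}))  # content
```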

How boost affects _all queries

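Individual fields can be boosted at index time with the boost parameter, which raises their relevance score, and the _all field takes these boosts into account: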

PUT myindex
{
  "mappings": {
    "mytype": {
      "properties": {
        "title": { 
          "type": "text",
          "boost": 2
        },
        "content": { 
          "type": "text"
        }
      }
    }
  }
}

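Using index-time boosts with the _all field has a significant impact on query performance. A better solution is usually to query fields individually, with optional query-time boosting.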
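Query-time boosting can be expressed with the field^boost syntax of a multi_match query. The Python sketch below builds such a request body; the helper name is hypothetical, and title/content are simply reused from the mapping above.

```python
def boosted_multi_match(text: str, field_boosts: dict) -> dict:
    """Build a multi_match body that boosts fields at query time ("field^boost")."""
    fields = [
        name if boost == 1 else f"{name}^{boost}"  # a boost of 1 needs no suffix
        for name, boost in field_boosts.items()
    ]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


body = boosted_multi_match("quick brown fox", {"title": 2, "content": 1})
print(body)
# {'query': {'multi_match': {'query': 'quick brown fox', 'fields': ['title^2', 'content']}}}
```

Because the boost lives in the query rather than in the index, it can be changed per request without reindexing.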

Custom _all-like fields

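There is only one _all field per index, but the copy_to parameter lets you create multiple custom _all-like fields. The example below combines first_name and last_name into a full_name field: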

PUT myindex
{
  "mappings": {
    "mytype": {
      "properties": {
        "first_name": {
          "type":    "text",
          "copy_to": "full_name" 
        },
        "last_name": {
          "type":    "text",
          "copy_to": "full_name" 
        },
        "full_name": {
          "type":    "text"
        }
      }
    }
  }
}

PUT myindex/mytype/1
{
  "first_name": "John",
  "last_name": "Smith"
}

GET myindex/_search
{
  "query": {
    "match": {
      "full_name": "John Smith"
    }
  }
}
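Conceptually, copy_to copies each original field value (not its terms) into the target field at index time, while the stored _source stays unchanged. The Python sketch below models that for flat documents; the helper name is hypothetical, and it accepts either a single target or a list of targets, mirroring the two forms copy_to allows.

```python
def apply_copy_to(doc: dict, properties: dict) -> dict:
    """Return field -> list of values as they would be indexed, with copy_to targets filled."""
    indexed = {name: [value] for name, value in doc.items()}
    for name, value in doc.items():
        target = properties.get(name, {}).get("copy_to")
        if target is None:
            continue
        targets = target if isinstance(target, list) else [target]
        for target_field in targets:
            indexed.setdefault(target_field, []).append(value)  # copy the raw value
    return indexed


properties = {
    "first_name": {"type": "text", "copy_to": "full_name"},
    "last_name": {"type": "text", "copy_to": "full_name"},
    "full_name": {"type": "text"},
}
doc = {"first_name": "John", "last_name": "Smith"}
print(apply_copy_to(doc, properties))
# {'first_name': ['John'], 'last_name': ['Smith'], 'full_name': ['John', 'Smith']}
```

The input document itself is not modified, just as the _source of the indexed document will not show a full_name field.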

_field_names field

Introduction

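The _field_names field indexes the name of every field in a document that contains a non-null value. The exists query uses it to find documents that have, or do not have, a non-null value for a particular field.
The value of the _field_names field is accessible in queries: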

PUT my_index/my_type/1
{
  "title": "This is a document"
}

PUT my_index/my_type/2?refresh=true
{
  "title": "This is another document",
  "body": "This document has a body"
}

GET my_index/_search
{
  "query": {
    "terms": {
      "_field_names": [ "title" ] 
    }
  }
}

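The query above returns the documents that contain a title field. Note: querying the name of a nested field this way returns no results, whereas object fields work normally. My guess is that nested objects are stored separately from the parent document, so their field names are not recorded in the parent's _field_names.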
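A small Python model of this behavior, limited to flat documents: a field name is recorded only if the field holds a non-null value, so an explicit null, an empty array, or an array of only nulls leaves no trace. The helper names and the third and fourth documents are hypothetical additions for illustration.

```python
def has_value(value) -> bool:
    """True if the value counts as present: not null, and not an array of only nulls."""
    if value is None:
        return False
    if isinstance(value, list):
        return any(item is not None for item in value)
    return True


def field_names(doc: dict) -> set:
    """Model of _field_names for a flat document: names of fields holding a non-null value."""
    return {name for name, value in doc.items() if has_value(value)}


def search_field_names(docs: dict, names: list) -> list:
    """Model of a terms query on _field_names: ids of docs containing any given name."""
    wanted = set(names)
    return sorted(doc_id for doc_id, doc in docs.items() if field_names(doc) & wanted)


docs = {
    "1": {"title": "This is a document"},
    "2": {"title": "This is another document", "body": "This document has a body"},
    "3": {"title": "Explicit null body", "body": None},
    "4": {"title": "Body of nulls only", "body": [None, None]},
}
print(search_field_names(docs, ["title"]))  # ['1', '2', '3', '4']
print(search_field_names(docs, ["body"]))   # ['2']
```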

_id field

Introduction

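Each indexed document is associated with a _type and an _id. The _id field is not indexed, because its value can be derived automatically from the _uid field.
The value of the _id field is accessible in certain queries (term, terms, match, query_string, simple_query_string), but not in aggregations, scripts, or sorting. Use the _uid field instead where you need those.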

GET my_index/_search
{
  "query": {
    "terms": {
      "_id": [ "1", "2" ] 
    }
  }
}

The same result can also be obtained with the ids query.
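The claim that _id can be derived from _uid relies on the 5.x _uid format, where the value is the type and the id joined by "#" (for example my_type#1). The Python sketch below recovers both parts; the function name is hypothetical, and it assumes type names never contain "#", so only the first "#" is treated as the separator and any later one belongs to the id.

```python
def split_uid(uid: str) -> tuple:
    """Split a 5.x-style _uid value ("type#id") into (type, id)."""
    doc_type, sep, doc_id = uid.partition("#")  # split on the first '#' only
    if not sep:
        raise ValueError(f"not a _uid value: {uid!r}")
    return doc_type, doc_id


print(split_uid("my_type#1"))    # ('my_type', '1')
print(split_uid("my_type#a#b"))  # ('my_type', 'a#b')
```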

_index field

Introduction

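When querying across multiple indices, it is sometimes useful to add query clauses that apply only to documents from certain indices. The _index field allows matching on the index a document was indexed into. Its value is accessible in term and terms queries, aggregations, scripts, and when sorting.
_index is exposed as a virtual field: it is not added to the Lucene index as a real field. This means you can use _index in a term or terms query (or any query that is rewritten to a term query, such as match, query_string, or simple_query_string), but it does not support prefix, wildcard, regexp, or fuzzy queries.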

GET index_1,index_2/_search
{
  "query": {
    "terms": {
      "_index": ["index_1", "index_2"] 
    }
  },
  "aggs": {
    "indices": {
      "terms": {
        "field": "_index", 
        "size": 10
      }
    }
  },
  "sort": [
    {
      "_index": { 
        "order": "asc"
      }
    }
  ],
  "script_fields": {
    "index_name": {
      "script": {
        "lang": "painless",
        "inline": "doc['_index']" 
      }
    }
  }
}

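The request above filters on _index, aggregates and sorts the results by _index, and returns each document's index name as the script field index_name.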
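Each part of that request can be mirrored on a plain list of hits, which may help when reading its response. The Python sketch below is a model only: the hits are hypothetical (including a third index, to show the filter dropping something), and the bucket order follows the terms aggregation default of doc count descending with ties broken by term.

```python
from collections import Counter

# Hypothetical hits from a search across index_1, index_2 and index_3.
hits = [
    {"_index": "index_2", "_id": "a"},
    {"_index": "index_1", "_id": "b"},
    {"_index": "index_3", "_id": "c"},
    {"_index": "index_1", "_id": "d"},
]

# terms query on _index: keep only documents from the listed indices.
wanted = {"index_1", "index_2"}
matched = [hit for hit in hits if hit["_index"] in wanted]

# terms aggregation on _index: doc count descending, ties broken by index name.
counts = Counter(hit["_index"] for hit in matched)
buckets = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# sort on _index ascending (sorted() is stable, so ties keep their original order).
ordered = sorted(matched, key=lambda hit: hit["_index"])

# script field index_name: each hit's own _index value.
index_names = [hit["_index"] for hit in ordered]

print(buckets)                          # [('index_1', 2), ('index_2', 1)]
print([hit["_id"] for hit in ordered])  # ['b', 'd', 'a']
```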

_meta field

Introduction

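Each mapping type can have custom metadata associated with it. Elasticsearch itself does not use this metadata at all, but it can be used to store application-specific information, such as the class a document belongs to: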

PUT my_index
{
  "mappings": {
    "user": {
      "_meta": { 
        "class": "MyApp::User",
        "version": {
          "min": "1.0",
          "max": "1.3"
        }
      }
    }
  }
}

The _meta information can be retrieved with the GET mapping API and updated on an existing type with the PUT mapping API.
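As an example of how an application might consume this metadata, the Python sketch below reads the version range from a GET mapping response and checks a client version against it. The response shape (index name, then mappings, then type name) follows the 5.x GET mapping API and is trimmed to the _meta block; the function names and the "no range means no restriction" policy are hypothetical choices for illustration.

```python
def parse_version(version: str) -> tuple:
    """Turn "1.3" into (1, 3) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def meta_version_ok(mapping_response: dict, index: str, doc_type: str, app_version: str) -> bool:
    """Check app_version against the version range stored in a type's _meta."""
    meta = mapping_response[index]["mappings"][doc_type].get("_meta", {})
    bounds = meta.get("version")
    if bounds is None:
        return True  # hypothetical policy: no range recorded means no restriction
    low, high = parse_version(bounds["min"]), parse_version(bounds["max"])
    return low <= parse_version(app_version) <= high


# GET my_index/_mapping response for the mapping above, trimmed to _meta.
response = {
    "my_index": {
        "mappings": {
            "user": {
                "_meta": {"class": "MyApp::User", "version": {"min": "1.0", "max": "1.3"}}
            }
        }
    }
}
print(meta_version_ok(response, "my_index", "user", "1.2"))  # True
print(meta_version_ok(response, "my_index", "user", "1.4"))  # False
```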

_parent field

Introduction

A parent-child relationship can be established between documents of different types in the same index:

PUT my_index
{
  "mappings": {
    "my_parent": {},
    "my_child": {
      "_parent": {
        "type": "my_parent" 
      }
    }
  }
}

PUT my_index/my_parent/1 
{
  "text": "This is a parent document"
}

PUT my_index/my_child/2?parent=1 
{
  "text": "This is a child document"
}

PUT my_index/my_child/3?parent=1&refresh=true 
{
  "text": "This is another child document"
}

GET my_index/my_parent/_search
{
  "query": {
    "has_child": { 
      "type": "my_child",
      "query": {
        "match": {
          "text": "child document"
        }
      }
    }
  }
}

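First define the parent and child types in the mapping; then, when indexing a child document, pass the parent document's id in the URL (?parent=1).
For details on the has_child and has_parent queries and the children aggregation, see the query DSL and aggregation chapters.
The value of the _parent field is accessible in queries, aggregations, and scripts.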

Parent-child restrictions

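The parent and child types must be different: a parent-child relationship cannot be established between documents of the same type.
The _parent.type setting can only point to a type that does not exist yet. This means that a type cannot become a parent type after it has been created.
Parent and child documents must be indexed on the same shard. The parent id is used as the routing value for the child, which ensures that the child lands on the same shard as its parent. This means the same parent value must be provided when getting, deleting, or updating a child document (which is rather inconvenient).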
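The routing rule behind the last restriction can be sketched as shard = hash(routing) % number_of_primary_shards. In the Python model below, zlib.crc32 stands in for the Murmur3-based hash Elasticsearch actually uses, so the shard numbers will not match a real cluster; the function names are hypothetical.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """shard = hash(routing) % num_primary_shards, with crc32 standing in for Murmur3."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def shard_for_child(child_id: str, parent_id: str, num_primary_shards: int) -> int:
    """A child is routed by its parent's id; child_id is deliberately unused for routing."""
    return shard_for(parent_id, num_primary_shards)


num_shards = 5
parent_shard = shard_for("1", num_shards)  # parent document 1, routed by its own id
for child_id in ("2", "3"):
    # Children 2 and 3 were indexed with ?parent=1, so they share the parent's shard.
    assert shard_for_child(child_id, "1", num_shards) == parent_shard
```

A GET for child 2 that omits parent=1 would be routed with shard_for("2", 5), which in general is a different shard; that is why the parent value has to be supplied again.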

Global ordinals

Parent-child uses global ordinals to speed up joins. Global ordinals need to be rebuilt after any change to a shard. The more parent id values are stored in a shard, the longer it takes to rebuild the global ordinals for the _parent field.
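Why does any change to a shard force a rebuild, and why does the cost grow with the number of parent ids? A minimal Python sketch of the idea, under the simplifying assumption that each segment is just a list of distinct parent-id terms: global ordinals merge every segment's sorted term dictionary into one sorted list and map each segment-local ordinal to a position in it. The function name is hypothetical, and the real implementation uses compact, compressed structures rather than Python lists.

```python
def build_global_ordinals(segments: list) -> tuple:
    """Merge per-segment term dictionaries into one global ordinal space.

    Returns (global_terms, seg_to_global), where seg_to_global[s][o] is the
    global ordinal of the term whose ordinal is o inside segment s.
    """
    global_terms = sorted(set().union(*segments))
    global_ord = {term: ordinal for ordinal, term in enumerate(global_terms)}
    seg_to_global = [
        # Segment-local ordinals follow the segment's own sorted term order.
        [global_ord[term] for term in sorted(set(segment))]
        for segment in segments
    ]
    return global_terms, seg_to_global


# Parent ids present in two segments of one shard (hypothetical data).
segments = [["p1", "p3"], ["p2", "p3"]]
terms, mapping = build_global_ordinals(segments)
print(terms)    # ['p1', 'p2', 'p3']
print(mapping)  # [[0, 2], [1, 2]]
```

If a refresh adds a segment containing "p0", every existing global ordinal shifts by one, so the whole mapping must be rebuilt; the more distinct parent ids there are, the more work that rebuild takes.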

Global ordinals, by default, are built lazily: the first parent-child query or aggregation after a refresh will trigger building of global ordinals. This can introduce a significant latency spike for your users. You can use eager_global_ordinals to shift the cost of building global ordinals from query time to refresh time, by mapping the _parent field as follows:

PUT my_index
{
  "mappings": {
    "my_parent": {},
    "my_child": {
      "_parent": {
        "type": "my_parent",
        "eager_global_ordinals": true
      }
    }
  }
}

The amount of heap used by global ordinals can be checked as follows: