Elasticsearch 5.x in Practice, Day 05-03: Mapping Meta-Fields

weixin_33758863

Tags: big data, java, python

Original article: https://my.oschina.net/LucasZhu/blog/1491778

II. Meta-Fields (Metadata)

2.1 _all

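The _all field is a catch-all field that concatenates the values of all other fields into one string, separated by spaces. It is analyzed and indexed, but not stored. Use it when you want documents that contain a keyword and do not care which field the keyword appears in.
Example: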

PUT http://192.168.20.46:9200/my_index/blog/1
{
  "title":    "Master Java",
  "content":     "learn java",
  "author": "Tom"
}

Because the values are analyzed, the _all field contains the lowercased terms [ "master", "java", "learn", "tom" ].
Search:

POST http://192.168.20.46:9200/my_index/_search?pretty
{
  "query": {
    "match": {
      "_all": "Java"
    }
  }
}
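
As a rough illustration of what _all holds for the document above, the sketch below joins the field values with spaces and splits them into lowercase terms. The helper names build_all_terms and matches_all are hypothetical, and the regular expression only approximates the standard analyzer.

```python
import re

def build_all_terms(doc):
    """Join every field value with spaces, then split into lowercase terms.

    This only approximates the standard analyzer (lowercasing plus
    splitting on non-word characters).
    """
    combined = " ".join(str(value) for value in doc.values())
    return re.findall(r"\w+", combined.lower())

def matches_all(doc, keyword):
    """Rough stand-in for a single-keyword match query on _all."""
    return keyword.lower() in build_all_terms(doc)

doc = {"title": "Master Java", "content": "learn java", "author": "Tom"}
terms = build_all_terms(doc)
print(terms)
print(matches_all(doc, "Java"))
```

The keyword is lowercased before the lookup because a match query analyzes its input the same way the field was analyzed.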

Use copy_to to build a custom _all-style field:

PUT http://192.168.20.46:9200/my_index
{
  "mappings": {
    "mytype": {
      "properties": {
        "title": {
          "type":    "text",
          "copy_to": "full_content" 
        },
        "content": {
          "type":    "text",
          "copy_to": "full_content" 
        },
        "full_content": {
          "type":    "text"
        }
      }
    }
  }
}

POST http://192.168.20.46:9200/my_index/mytype/1
{
  "title": "Master Java",
  "content": "learn Java"
}
POST http://192.168.20.46:9200/my_index/_search?pretty
{
  "query": {
    "match": {
      "full_content": "java"
    }
  }
}
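
When many fields should feed the same combined field, the mapping body can be generated rather than written by hand. This is a minimal sketch that only builds the JSON body; mapping_with_copy_to is a hypothetical helper, and sending the body to the cluster is left out.

```python
import json

def mapping_with_copy_to(type_name, source_fields, target_field):
    """Build a 5.x-style mapping where each source field copies into target_field."""
    properties = {
        name: {"type": "text", "copy_to": target_field}
        for name in source_fields
    }
    # The target itself is a plain text field with no copy_to of its own.
    properties[target_field] = {"type": "text"}
    return {"mappings": {type_name: {"properties": properties}}}

body = mapping_with_copy_to("mytype", ["title", "content"], "full_content")
print(json.dumps(body, indent=2))
```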

2.2 _field_names

The _field_names field stores the names of all fields in a document that hold a non-null value. It is mainly used by the exists query. Example:

terms query:

POST http://192.168.20.46:9200/my_index/my_type/1
{
  "title": "This is a document"
}
POST http://192.168.20.46:9200/my_index/my_type/2
{
  "title": "This is another document",
  "body": "This document has a body"
}

POST http://192.168.20.46:9200/my_index/_search
{
  "query": {
    "terms": {
      "_field_names": [ "body" ] 
    }
  }
}

The result contains only the second document, because the first document has no body field.
The exists query gives the same result:

POST http://192.168.20.46:9200/my_index/_search
{
    "query": {
        "exists" : { "field" : "body" }
    }
}
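
The behaviour of the two queries above can be modelled in a few lines. This is a conceptual sketch, not how Elasticsearch stores the data; field_names and exists are hypothetical helpers working on in-memory dictionaries.

```python
def field_names(doc):
    """Names of fields holding a non-null value (a simplification of _field_names)."""
    return sorted(name for name, value in doc.items() if value is not None)

def exists(docs, field):
    """Ids of the documents in which `field` is present, like the exists query."""
    return [doc_id for doc_id, doc in docs.items() if field in field_names(doc)]

docs = {
    "1": {"title": "This is a document"},
    "2": {"title": "This is another document", "body": "This document has a body"},
}
print(exists(docs, "body"))
print(exists(docs, "title"))
```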

2.3 _id

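Every indexed document has a _type and an _id. The _id field can be used in term, terms, match, query_string, and simple_query_string queries, but not in aggregations, scripts, or sorting (use _uid for those). Example: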

POST http://192.168.20.46:9200/my_index/my_type/1
{
  "text": "Document with ID 1"
}

POST http://192.168.20.46:9200/my_index/my_type/2
{
  "text": "Document with ID 2"
}

POST http://192.168.20.46:9200/my_index/_search
{
  "query": {
    "terms": {
      "_id": [ "1", "2" ] 
    }
  }
}

2.4 _index

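When searching across several indices, you sometimes want to restrict a clause to documents from specific indices. The _index field makes this possible: the index name can be used in term and terms queries, aggregations, scripts, and sorting.

_index is a virtual field that is not actually added to the Lucene index. It supports term and terms queries (and therefore match, query_string, and simple_query_string as well), but not prefix, wildcard, regexp, or fuzzy queries.

Example with two indices, one document indexed into each: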

POST http://192.168.20.46:9200/index_1/my_type/1
{
  "text": "Document in index 1"
}

POST http://192.168.20.46:9200/index_2/my_type/2
{
  "text": "Document in index 2"
}

POST http://192.168.20.46:9200/index_1,index_2/_search
{
  "query": {
    "terms": {
      "_index": ["index_1", "index_2"] 
    }
  },
  "aggs": {
    "indices": {
      "terms": {
        "field": "_index", 
        "size": 10
      }
    }
  },
  "sort": [
    {
      "_index": { 
        "order": "asc"
      }
    }
  ],
  "script_fields": {
    "index_name": {
      "script": {
        "lang": "painless",
        "inline": "doc['_index']" 
      }
    }
  }
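}
==> (response as captured by the author; index_1 apparently already held one extra document, so three hits are returned)
{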
    "took": 1210,
    "timed_out": false,
    "_shards": {
        "total": 10,
        "successful": 10,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": null,
        "hits": [
            {
                "_index": "index_1",
                "_type": "my_type",
                "_id": "2",
                "_score": null,
                "fields": {
                    "index_name": [
                        "index_1"
                    ]
                },
                "sort": [
                    "index_1"
                ]
            },
            {
                "_index": "index_1",
                "_type": "my_type",
                "_id": "1",
                "_score": null,
                "fields": {
                    "index_name": [
                        "index_1"
                    ]
                },
                "sort": [
                    "index_1"
                ]
            },
            {
                "_index": "index_2",
                "_type": "my_type",
                "_id": "2",
                "_score": null,
                "fields": {
                    "index_name": [
                        "index_2"
                    ]
                },
                "sort": [
                    "index_2"
                ]
            }
        ]
    },
    "aggregations": {
        "indices": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "index_1",
                    "doc_count": 2
                },
                {
                    "key": "index_2",
                    "doc_count": 1
                }
            ]
        }
    }
}
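
The buckets in the response above are simply per-index document counts. The sketch below reproduces them from the three hits; terms_agg is a hypothetical helper that mimics only the counting and ordering of a terms aggregation.

```python
from collections import Counter

# The (_index, _id) pairs of the three hits in the response above.
hits = [
    {"_index": "index_1", "_id": "2"},
    {"_index": "index_1", "_id": "1"},
    {"_index": "index_2", "_id": "2"},
]

def terms_agg(hits, field, size=10):
    """Count hits per value of `field`, largest buckets first."""
    counts = Counter(hit[field] for hit in hits)
    # Order by count descending, then by key so that ties stay stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]

buckets = terms_agg(hits, "_index")
print(buckets)
```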

2.5 _parent

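_parent establishes a parent-child relationship between documents in the same index. The example below first declares the relationship in the mapping, then indexes a parent document, then indexes child documents with the parent id, and finally finds the parent through a query on its children.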

PUT http://192.168.20.46:9200/my_index
{
  "mappings": {
    "my_parent": {},
    "my_child": {
      "_parent": {
        "type": "my_parent"
      }
    }
  }
}
POST http://192.168.20.46:9200/my_index/my_parent/1
{
  "text": "This is a parent document"
}

POST http://192.168.20.46:9200/my_index/my_child/2?parent=1 
{
  "text": "This is a child document"
}

POST http://192.168.20.46:9200/my_index/my_child/3?parent=1&refresh=true 
{
  "text": "This is another child document"
}
POST http://192.168.20.46:9200/my_index/my_parent/_search
{
  "query": {
    "has_child": { 
      "type": "my_child",
      "query": {
        "match": {
          "text": "child document"
        }
      }
    }
  }
}
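
Conceptually, has_child selects the parents that have at least one child matching the inner query. The sketch below models that with in-memory data; the has_child function and the substring predicate are hypothetical simplifications of the real query and of match.

```python
parents = {"1": {"text": "This is a parent document"}}
children = [
    {"_id": "2", "_parent": "1", "text": "This is a child document"},
    {"_id": "3", "_parent": "1", "text": "This is another child document"},
]

def has_child(parents, children, predicate):
    """Ids of the parents that have at least one child satisfying `predicate`."""
    matching_parent_ids = {c["_parent"] for c in children if predicate(c)}
    return [parent_id for parent_id in parents if parent_id in matching_parent_ids]

# A substring test stands in for the match query on the child's text field.
result = has_child(parents, children, lambda c: "child" in c["text"].lower())
print(result)
```

Each parent appears once in the result no matter how many of its children match.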

2.6 _routing

Routing. Elasticsearch uses the following formula to decide which shard a document is placed on:

shard_num = hash(_routing) % num_primary_shards

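By default the _routing value is the document's _id, or the parent id if the document has a _parent. A custom value can be supplied with the routing parameter. For example, to keep all blog posts published by user1 on the same shard, pass routing at index time and search on that routing value: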

POST http://192.168.20.46:9200/my_index/my_type/1?routing=user1&refresh=true
{
  "title": "This is a document"
}
GET http://192.168.20.46:9200/my_index/my_type/1?routing=user1
==>
{
    "_index": "my_index",
    "_type": "my_type",
    "_id": "1",
    "_version": 2,
    "_routing": "user1",
    "found": true,
    "_source": {
        "title": "This is a document"
    }
}

POST http://192.168.20.46:9200/my_index/_search
{
  "query": {
    "terms": {
      "_routing": [ "user1" ] 
    }
  }
}
==>
{
    "took": 33,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "my_index",
                "_type": "my_type",
                "_id": "1",
                "_score": 0.2876821,
                "_routing": "user1",
                "_source": {
                    "title": "This is a document"
                }
            }
        ]
    }
}

POST http://192.168.20.46:9200/my_index/_search?routing=user1,user2 
{
  "query": {
    "match": {
      "title": "document"
    }
  }
}
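
The shard formula and the default routing rule described in this section can be sketched as follows. zlib.crc32 is only a stand-in for the hash function Elasticsearch uses internally, so the shard numbers it yields will not match a real cluster; shard_for and effective_routing are hypothetical names.

```python
import zlib

def effective_routing(doc_id, parent=None, routing=None):
    """Pick the routing value: explicit routing, else the parent id, else the _id."""
    if routing is not None:
        return routing
    if parent is not None:
        return parent
    return doc_id

def shard_for(routing_value, num_primary_shards):
    """shard_num = hash(_routing) % num_primary_shards, with crc32 as a stand-in hash."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

# Two posts by the same user land on the same shard when routed by user.
shard_a = shard_for(effective_routing("1", routing="user1"), 5)
shard_b = shard_for(effective_routing("2", routing="user1"), 5)
print(shard_a == shard_b)
```

Since the shard depends on num_primary_shards, changing that number would move existing documents, which is why the number of primary shards is fixed when the index is created.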

To make routing mandatory, declare it as required in the mapping:

PUT http://192.168.20.46:9200/my_index2
{
  "mappings": {
    "my_type": {
      "_routing": {
        "required": true 
      }
    }
  }
}
POST http://192.168.20.46:9200/my_index2/my_type/1
{
  "text": "No routing value provided"
}

Because no routing value is supplied, this request is rejected with a routing_missing_exception.

2.7 _source

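The _source field holds the original document body as it was passed at index time. It is enabled by default, but it can be disabled: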

PUT http://192.168.20.46:9200/tweets
{
  "mappings": {
    "tweet": {
      "_source": {
        "enabled": false
      }
    }
  }
}

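In general, do not disable it unless you can do without all of the following:

update, update_by_query, and reindex
highlighting
backing up data, changing the mapping, or upgrading the index
debugging queries or aggregations by looking at the original document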
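
To see why update depends on _source, consider a simplified model of a partial update: the stored document is read back, merged with the changes, and indexed again. The function partial_update below is a hypothetical illustration of that idea, not Elasticsearch's actual implementation.

```python
def partial_update(stored_source, changes):
    """Merge `changes` into the stored _source, as a partial update conceptually does."""
    if stored_source is None:
        # With _source disabled there is nothing to merge the changes into.
        raise ValueError("_source is not available, so the document cannot be rebuilt")
    merged = dict(stored_source)
    merged.update(changes)
    return merged

updated = partial_update({"title": "This is a document"}, {"views": 1})
print(updated)

try:
    partial_update(None, {"views": 1})
    failed = False
except ValueError:
    failed = True
print(failed)
```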

2.8 _type

Every indexed document has a _type and an _id. The _type field can be used in queries, aggregations, scripts, and sorting. Example:

POST http://192.168.20.46:9200/my_index/type_1/1
{
  "text": "Document with type 1"
}
POST http://192.168.20.46:9200/my_index/type_2/2?refresh=true
{
  "text": "Document with type 2"
}

POST http://192.168.20.46:9200/my_index/_search
{
  "query": {
    "terms": {
      "_type": [ "type_1", "type_2" ] 
    }
  },
  "aggs": {
    "types": {
      "terms": {
        "field": "_type", 
        "size": 10
      }
    }
  },
  "sort": [
    {
      "_type": { 
        "order": "desc"
      }
    }
  ],
  "script_fields": {
    "type": {
      "script": {
        "lang": "painless",
        "inline": "doc['_type']" 
      }
    }
  }
}

2.9 _uid

_uid is the combination of _type and _id, in the form type#id. Like _type, it can be used in queries, aggregations, scripts, and sorting. Example:

POST http://192.168.20.46:9200/my_index/my_type/1
{
  "text": "Document with ID 1"
}
POST http://192.168.20.46:9200/my_index/my_type/2?refresh=true
{
  "text": "Document with ID 2"
}

POST http://192.168.20.46:9200/my_index/_search
{
  "query": {
    "terms": {
      "_uid": [ "my_type#1", "my_type#2" ] 
    }
  },
  "aggs": {
    "UIDs": {
      "terms": {
        "field": "_uid", 
        "size": 10
      }
    }
  },
  "sort": [
    {
      "_uid": { 
        "order": "desc"
      }
    }
  ],
  "script_fields": {
    "UID": {
      "script": {
         "lang": "painless",
         "inline": "doc['_uid']" 
      }
    }
  }
}
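
The _uid values used in the terms query above follow the type#id pattern. A minimal sketch of composing and splitting such values, with hypothetical helper names make_uid and split_uid:

```python
def make_uid(doc_type, doc_id):
    """Compose a _uid value in the type#id form used by the query above."""
    return f"{doc_type}#{doc_id}"

def split_uid(uid):
    """Split a _uid back into (type, id) at the first '#'."""
    doc_type, _, doc_id = uid.partition("#")
    return doc_type, doc_id

uids = [make_uid("my_type", doc_id) for doc_id in ("1", "2")]
print(uids)
print(split_uid(uids[0]))
```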

Reposted from: https://my.oschina.net/LucasZhu/blog/1491778