Elasticsearch (5) --- Mapping

30岁老阿姨

Last modified 2023-12-22 09:02:18

Column: ElasticSearch    Tags: elasticsearch, big data, search engine

First published 2023-11-01 09:54:56

Original link: https://blog.csdn.net/yaya_jn/article/details/134154497

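Mapping

Dynamic mapping

When a document is indexed into an index that has no mapping yet, ES infers each field's type from its JSON value and adds the field to the mapping automatically. Below, the books index starts with an empty mapping; indexing a single document creates mappings for id, publish_date and name: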

# curl -XPUT node3:9200/books?pretty
{
  "acknowledged" : true
}

# curl node3:9200/books/_mapping?pretty
{
  "books" : {
    "mappings" : { }
  }
}

# curl -XPOST node3:9200/books/it/1?pretty -d '{
"id":1,
"publish_date":"2017-06-01",
"name":"master Elasticsearch"
}'
{
  "_index" : "books",
  "_type" : "it",
  "_id" : "1",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "created" : true
}


# curl node3:9200/books/_mapping?pretty
{
  "books" : {
    "mappings" : {
      "it" : {
        "properties" : {
          "id" : {
            "type" : "long"
          },
          "name" : {
            "type" : "string"
          },
          "publish_date" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          }
        }
      }
    }
  }
}

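If you use ES as your primary data store and want an exception to be thrown whenever an unknown field appears, so that the problem gets noticed, then leaving dynamic mapping enabled is not appropriate. The dynamic setting in a mapping controls whether new fields are added automatically. It accepts the following values:

true    (default) new fields are added to the mapping automatically
false   new fields are ignored: they are not added to the mapping or indexed, but they remain in _source
strict  strict mode: an exception is thrown when a new field is encountered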
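To make the three modes concrete, here is a small Python simulation. It is an illustration only, not Elasticsearch code: it assumes a flattened mapping represented as a dict from field name to type name, and the function apply_dynamic, the exception class and the crude type guessing are all hypothetical.

```python
class StrictDynamicMappingError(Exception):
    """Hypothetical stand-in for ES's strict_dynamic_mapping_exception."""


# Crude type guessing for the "true" case (an assumption for illustration).
_GUESSED_TYPES = {bool: "boolean", int: "long", float: "float", str: "text"}


def apply_dynamic(mapping, doc, dynamic="true"):
    """Return the mapping that results from indexing `doc`.

    mapping: dict of field name -> type name (flat, simplified)
    dynamic: "true", "false" or "strict"
    """
    if dynamic not in ("true", "false", "strict"):
        raise ValueError(f"unknown dynamic setting: {dynamic!r}")
    new_fields = [name for name in doc if name not in mapping]
    if new_fields and dynamic == "strict":
        # The whole document is rejected; nothing is indexed.
        raise StrictDynamicMappingError(
            f"mapping set to strict, dynamic introduction of [{new_fields[0]}] is not allowed"
        )
    updated = dict(mapping)  # never mutate the caller's mapping
    if dynamic == "true":
        for name in new_fields:
            value = doc[name]
            if value is None:
                continue  # a null value does not add a field
            updated[name] = _GUESSED_TYPES.get(type(value), "object")
    # dynamic == "false": unknown fields stay out of the mapping
    # (ES still keeps them in _source, they are just not searchable).
    return updated


if __name__ == "__main__":
    mapping = {"title": "text", "publish_date": "date"}
    doc = {"title": "master Elasticsearch", "author": "Tom"}
    print(apply_dynamic(mapping, doc, "true"))   # author added as text
    print(apply_dynamic(mapping, doc, "false"))  # author ignored
    try:
        apply_dynamic(mapping, doc, "strict")
    except StrictDynamicMappingError as err:
        print("rejected:", err)
```

The strict branch mirrors the strict_dynamic_mapping_exception shown in the transcript for the author field.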

# curl -XDELETE node3:9200/books?pretty
{
  "acknowledged" : true
}

# curl -XPOST node3:9200/books?pretty -d '{
  "mappings": {
    "it": {
      "dynamic": "strict",
      "properties": {
        "title": {
          "type": "string"
        },
        "publish_date": {
          "type": "date"
        }
      }
    }
  }
}'

# curl node3:9200/books/_mapping?pretty
{
  "books" : {
    "mappings" : {
      "it" : {
        "dynamic" : "strict",
        "properties" : {
          "publish_date" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          "title" : {
            "type" : "string"
          }
        }
      }
    }
  }
}


# curl -XPOST node2:9200/books/it/1?pretty -d '{
"title":"master Elasticsearch",
"publish_date":"2017-06-01"
}'
{
  "_index" : "books",
  "_type" : "it",
  "_id" : "1",
  "_version" : 3,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "created" : true
}

# curl -XPOST node2:9200/books/it/2?pretty -d '{
"title":"master Elasticsearch"
}'
{
  "_index" : "books",
  "_type" : "it",
  "_id" : "2",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "created" : true
}

# curl -XPOST node2:9200/books/it/3?pretty -d '{
 "title":"master Elasticsearch",
 "publish_date":"2017-06-01",
 "author":"Tom"
}'
{
  "error" : {
    "root_cause" : [ {
      "type" : "strict_dynamic_mapping_exception",
      "reason" : "mapping set to strict, dynamic introduction of [author] within [it] is not allowed"
    } ],
    "type" : "strict_dynamic_mapping_exception",
    "reason" : "mapping set to strict, dynamic introduction of [author] within [it] is not allowed"
  },
  "status" : 400
}

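When ES encounters a new string field, it checks whether the string contains a recognizable date. If the value looks like a date, such as 2017-09-12, the field is mapped as date; otherwise it is added as a string. This can cause problems. For example:

{"note":"2017-09-12"}

The note field is detected as a date on this first document, but if the next document is:

{"note":"Logged out"}

indexing it fails with an exception, because "Logged out" cannot be parsed as a date. Date detection can be turned off by setting date_detection to false on the root object (the type mapping):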

# curl -XPOST node2:9200/my_index?pretty
{
  "acknowledged" : true
}


# curl node2:9200/my_index/_mapping?pretty
{
  "my_index" : {
    "mappings" : { }
  }
}

# curl -XPOST node2:9200/my_index1?pretty -d '{
  "mappings": {
    "my_type": {
      "date_detection": false
    }
  }
}'
{
  "acknowledged" : true
}

# curl node2:9200/my_index1/_mapping?pretty
{
  "my_index1" : {
    "mappings" : {
      "my_type" : {
        "date_detection" : false
      }
    }
  }
}
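The note example above can be reproduced with a few lines of Python. This is a hypothetical sketch: looks_like_date only recognizes the yyyy-MM-dd shape (real ES detection is driven by its dynamic_date_formats setting), and "text" stands in for the string type shown by ES 2.x.

```python
from datetime import datetime


def looks_like_date(value):
    """Very rough stand-in for ES date detection: only yyyy-MM-dd is recognized."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def index_string(mapping, field, value, date_detection=True):
    """Index one string value, updating `mapping` (field -> type) in place."""
    if field not in mapping:
        # The first value seen decides the field's type.
        mapping[field] = "date" if date_detection and looks_like_date(value) else "text"
    elif mapping[field] == "date" and not looks_like_date(value):
        # Once the field is a date, non-date strings are rejected.
        raise ValueError(f"failed to parse field [{field}] of type [date]: {value!r}")


if __name__ == "__main__":
    detected = {}
    index_string(detected, "note", "2017-09-12")
    try:
        index_string(detected, "note", "Logged out")
    except ValueError as err:
        print("second document rejected:", err)

    no_detection = {}
    index_string(no_detection, "note", "2017-09-12", date_detection=False)
    index_string(no_detection, "note", "Logged out", date_detection=False)
    print(no_detection)  # {'note': 'text'}
```

With date_detection turned off, both documents are accepted because the first value no longer locks the field into the date type.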

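Static mapping

The mapping is specified manually when the index is created, much like declaring column types in a SQL CREATE TABLE statement.

Compared with dynamic mapping, a static mapping is more detailed and precise: every field's type and options are decided up front instead of being guessed from the first document.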

# curl -XPOST node2:9200/my_index?pretty -d '{
  "mappings":{
    "user":{
      "_all":{"enabled":false},
      "properties":{
        "title":{"type":"string"},
        "name":{"type":"string"}, 
        "age":{"type":"integer"}
      }
    },
    "blogpost":{
      "_all":{"enabled":false}, 
      "properties":{
        "title":{"type":"string"},
        "body":{"type":"string"},
        "user_id":{"type":"string"},
        "created":{
          "type":"date",
          "format":"strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}'
{
  "acknowledged" : true
}

# curl node2:9200/my_index/_mapping?pretty
{
  "my_index" : {
    "mappings" : {
      "blogpost" : {
        "_all" : {
          "enabled" : false
        },
        "properties" : {
          "body" : {
            "type" : "string"
          },
          "created" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          "title" : {
            "type" : "string"
          },
          "user_id" : {
            "type" : "string"
          }
        }
      },
      "user" : {
        "_all" : {
          "enabled" : false
        },
        "properties" : {
          "age" : {
            "type" : "integer"
          },
          "name" : {
            "type" : "string"
          },
          "title" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

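Field types

JSON value	Automatically inferred field type
null	No field is added
true or false	boolean
Floating-point number	float
Integer	long
JSON object	object
Array	Determined by the first non-null value in the array
String	date (if date detection is enabled), double or long (if numeric_detection is enabled), otherwise text (with a keyword sub-field)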
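The table above can be approximated by a short Python function. This is a hypothetical sketch: infer_type is not an ES API, date detection here only recognizes yyyy-MM-dd (ES uses its dynamic_date_formats setting), and numeric detection is off by default, as it is in ES.

```python
import json
from datetime import datetime


def _is_date(s):
    try:
        datetime.strptime(s, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def infer_type(value, date_detection=True, numeric_detection=False):
    """Return the inferred type name, or None when no field would be added."""
    if value is None:
        return None
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Decided by the first non-null element.
        for item in value:
            if item is not None:
                return infer_type(item, date_detection, numeric_detection)
        return None
    # Otherwise the value is a string.
    if date_detection and _is_date(value):
        return "date"
    if numeric_detection:
        try:
            int(value)
            return "long"
        except ValueError:
            pass
        try:
            float(value)
            return "double"
        except ValueError:
            pass
    return "text"  # ES 5.x+ also adds a keyword sub-field


if __name__ == "__main__":
    doc = json.loads('{"id": 1, "publish_date": "2017-06-01", "name": "master Elasticsearch"}')
    print({k: infer_type(v) for k, v in doc.items()})
    # {'id': 'long', 'publish_date': 'date', 'name': 'text'}
```

For the books document from the first transcript this yields long, date and text, matching the generated mapping (ES 2.x reports string where 5.x+ uses text).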

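ES field types fall into four groups: core types, complex types, geo types and special types.

Category	Subcategory	Types
Core	String	string, text, keyword
Core	Numeric	long, integer, short, byte, double, float, half_float, scaled_float
Core	Date	date
Core	Boolean	boolean
Core	Binary	binary
Core	Range	range
Complex	Array	array
Complex	Object	object
Complex	Nested	nested
Geo	Geo-point	geo_point
Geo	Geo-shape	geo_shape
Special	IP	ip
Special	Completion (autocomplete)	completion
Special	Token count	token_count
Special	Attachment	attachment
Special	Percolator	percolator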

A. string

Deprecated in ES 5.x and removed in 6.0; it has been replaced by text and keyword.

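B. text

Use this type for any field that needs full-text search. The field's content is analyzed: before the inverted index is built, the analyzer splits the string into individual terms. text fields are generally not used for sorting or aggregations.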

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

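C. keyword

Suited to structured content such as email addresses, hostnames, status codes and tags, and typically used for filtering, sorting and aggregations. Unlike text, a keyword field is not analyzed, so it can only be matched by its exact value.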
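The difference between text and keyword comes down to what ends up in the inverted index. The Python sketch below is a deliberate simplification: its analyze function only lowercases and splits on non-alphanumeric characters, which is an assumption standing in for the real standard analyzer, and build_index and term_query are hypothetical helpers.

```python
import re


def analyze(value):
    """Simplified analyzer (assumption): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def build_index(docs, field_type):
    """Build an inverted index (term -> set of doc ids) for one field."""
    index = {}
    for doc_id, value in docs.items():
        # text: analyzed into many terms; keyword: the whole value is one term.
        terms = analyze(value) if field_type == "text" else [value]
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def term_query(index, term):
    """Exact lookup of a single term, like a term query (no analysis)."""
    return index.get(term, set())


if __name__ == "__main__":
    docs = {1: "John Smith", 2: "john.smith@example.com"}
    as_text = build_index(docs, "text")
    as_keyword = build_index(docs, "keyword")
    print(term_query(as_text, "smith"))                      # {1, 2}
    print(term_query(as_keyword, "smith"))                   # set()
    print(term_query(as_keyword, "john.smith@example.com"))  # {2}
```

As a text field, the email address is broken into terms and matches "smith"; as a keyword field, only the complete, exact value matches.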

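D. Numeric types

Type	Range / precision
long	-2^63 to 2^63-1
integer	-2^31 to 2^31-1
short	-32768 to 32767
byte	-128 to 127
double	64-bit double-precision IEEE 754 floating point
float	32-bit single-precision IEEE 754 floating point
half_float	16-bit half-precision IEEE 754 floating point
scaled_float	Floating point stored as a long, scaled by a fixed factor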
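The Elasticsearch numeric-type documentation advises picking the smallest integer type that is sufficient for the data. The Python helper below is hypothetical (not part of any ES client) and simply uses the bounds from the table to make that choice.

```python
# Signed two's-complement bounds, from smallest to largest type.
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(values):
    """Return the smallest ES integer type whose range covers every value."""
    lo, hi = min(values), max(values)
    for name, tmin, tmax in INTEGER_TYPES:
        if tmin <= lo and hi <= tmax:
            return name
    raise ValueError("values exceed the range of long")


if __name__ == "__main__":
    print(smallest_integer_type([200, 3000]))  # short
```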

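When storing floating-point values, consider scaled_float first. scaled_float uses a scaling factor to turn the floating-point value into a long. For a price accurate to the cent, for example, set the scaling factor to 100 and the value is stored as an integer. All APIs still treat the price as a floating-point number while ES stores an integer underneath, which saves space because integers compress better than floating-point numbers.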

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_of_bytes": {"type": "integer"},
        "time_in_seconds": {"type": "float"},
        "price": {
          "type": "scaled_float",
          "scaling_factor": 100
        }
      }
    }
  }
}
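A Python sketch of the scaled_float idea described above: multiply by the scaling factor and keep the nearest integer, then divide again when the value is read back. encode_scaled and decode_scaled are hypothetical names, and Python's tie-breaking on exact .5 values may differ from what ES does.

```python
def encode_scaled(value, scaling_factor):
    """Value -> stored long: multiply by the factor, round to the nearest integer.

    Python's round() breaks exact .5 ties to the even neighbour; this sketch does
    not try to reproduce Elasticsearch's own tie-breaking.
    """
    return round(value * scaling_factor)


def decode_scaled(stored, scaling_factor):
    """Stored long -> value as seen through the API."""
    return stored / scaling_factor


if __name__ == "__main__":
    factor = 100  # prices to the cent, as in the mapping above
    for price in (19.99, 0.5, 0.123):
        stored = encode_scaled(price, factor)
        print(price, "->", stored, "->", decode_scaled(stored, factor))
```

Note the last case: 0.123 comes back as 0.12, because anything finer than 1/scaling_factor is lost at index time.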

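E. date

A date in ES can be given in any of these forms:

A formatted date string, such as 2015-01-01 or 2015/01/01 12:10:30
A long number of milliseconds since the epoch (1970-01-01 00:00:00 UTC)
An integer number of seconds since the epoch

The default format is "strict_date_optional_time||epoch_millis", which accepts ISO 8601 strings such as 2015-01-01 or 2015-01-01T12:10:30Z, as well as epoch milliseconds. Strings like 2015/01/01 12:10:30, or timestamps in seconds, need a custom format such as "yyyy/MM/dd HH:mm:ss" or "epoch_second".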

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{"date": "2015-01-01"}

PUT my_index/my_type/2
{"date": "2015-01-01T12:10:30Z"}

PUT my_index/my_type/3
{"date": 1420070400001}
All three values are accepted. Internally ES stores every date as a long holding milliseconds since the epoch (UTC).
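A rough Python sketch of how the three documents above all end up as the same kind of value. to_epoch_millis is hypothetical and handles only the ISO 8601 shapes used in this example; it is not ES's date parser. Zone-less dates are treated as UTC, as ES does.

```python
from datetime import datetime, timezone


def to_epoch_millis(value):
    """Normalize an ISO 8601 date string or epoch-millis number to epoch millis."""
    if isinstance(value, int):  # already epoch_millis
        return value
    for fmt in ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
        try:
            dt = datetime.strptime(value, fmt)  # %z accepts "Z" on Python 3.7+
        except ValueError:
            continue
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # no zone given: assume UTC
        return int(dt.timestamp() * 1000)
    raise ValueError(f"unsupported date: {value!r}")


if __name__ == "__main__":
    for v in ("2015-01-01", "2015-01-01T12:10:30Z", 1420070400001):
        print(v, "->", to_epoch_millis(v))
```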

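ES meta-fields

Category	Field	Purpose
Identity meta-fields	_index	The index the document belongs to
Identity meta-fields	_uid	Composite field made of _type and _id
Identity meta-fields	_type	The document's type
Identity meta-fields	_id	The document's ID
Source meta-fields	_source	The original JSON of the document
Source meta-fields	_size	Size of the _source field (provided by the mapper-size plugin)
Indexing meta-fields	_all	Catch-all field that contains the values of all the document's fields
Indexing meta-fields	_field_names	All fields in the document that contain non-null values
Routing meta-fields	_parent	Defines a parent-child relationship between documents
Routing meta-fields	_routing	Custom routing value that sends a document to a particular shard
Custom meta-field	_meta	Application-specific metadata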

_index

_index supports term and terms queries, aggregations, scripts and sorting on the index name, but not prefix, wildcard, regexp or fuzzy queries.

# curl -XPUT node3:9200/index_1/my_type/1?pretty -d '{
 "text":"Document in index 1"
}'
{
  "_index" : "index_1",
  "_type" : "my_type",
  "_id" : "1",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}

# curl -XPUT node3:9200/index_2/my_type/2?refresh=true -d '{
"text":"Document in index 2"
}'
{"_index":"index_2","_type":"my_type","_id":"2","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"created":true}

# curl node3:9200/index_1,index_2/_search?pretty -d '{
   "query":{
     "terms":{"_index":["index_1", "index_2"]}
   },
   "aggs":{
     "indices":{
       "terms":{
       "field":"_index",
         "size":10
       }
     }
   },
   "sort":[
     {
       "_index":{
         "order":"asc"
       }
     }
   ]
}'
{
  "took" : 105,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [ {
      "_index" : "index_1",
      "_type" : "my_type",
      "_id" : "1",
      "_score" : null,
      "_source" : {
        "text" : "Document in index 1"
      },
      "sort" : [ "index_1" ]
    }, {
      "_index" : "index_2",
      "_type" : "my_type",
      "_id" : "2",
      "_score" : null,
      "_source" : {
        "text" : "Document in index 2"
      },
      "sort" : [ "index_2" ]
    } ]
  },
  "aggregations" : {
    "indices" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "index_1",
        "doc_count" : 1
      }, {
        "key" : "index_2",
        "doc_count" : 1
      } ]
    }
  }
}

_type

Every indexed document has a _type and an _id field. _type can be used in queries, aggregations, scripts and sorting.

# curl -XPUT node2:9200/my_index/type_1/1?pretty -d '{
 "text":"Document with type 1"
}'
{
  "_index" : "my_index",
  "_type" : "type_1",
  "_id" : "1",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "created" : true
}

# curl -XPUT node2:9200/my_index/type_2/2?pretty -d '{
"text":"Document with type 2"
}'
{
  "_index" : "my_index",
  "_type" : "type_2",
  "_id" : "2",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "created" : true
}


# curl node3:9200/my_index/_search?pretty -d '{
   "query":{
     "terms":{
       "_type":["type_1", "type_2"]
     }
   },
  "aggs":{
    "types":{
      "terms":{
        "field":"_type",
        "size":"10"
      }
    }
  },
  "sort":[
    {
      "_type":{
        "order":"desc"
      }
    }
  ]
}'
{
  "took" : 25,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [ {
      "_index" : "my_index",
      "_type" : "type_2",
      "_id" : "2",
      "_score" : null,
      "_source" : {
        "text" : "Document with type 2"
      },
      "sort" : [ "type_2" ]
    }, {
      "_index" : "my_index",
      "_type" : "type_1",
      "_id" : "1",
      "_score" : null,
      "_source" : {
        "text" : "Document with type 1"
      },
      "sort" : [ "type_1" ]
    } ]
  },
  "aggregations" : {
    "types" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "type_1",
        "doc_count" : 1
      }, {
        "key" : "type_2",
        "doc_count" : 1
      } ]
    }
  }
}

_id

_id can be used in term, terms, match, query_string and simple_query_string queries, but not in aggregations, scripts or sorting.

# curl node2:9200/my_index/_search?pretty -d '{
"query":{
"terms":{"_id":["1", "2"]}
}
}'
{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.35355338,
    "hits" : [ {
      "_index" : "my_index",
      "_type" : "type_2",
      "_id" : "2",
      "_score" : 0.35355338,
      "_source" : {
        "text" : "Document with type 2"
      }
    }, {
      "_index" : "my_index",
      "_type" : "type_1",
      "_id" : "1",
      "_score" : 0.35355338,
      "_source" : {
        "text" : "Document with type 1"
      }
    } ]
  }
}
