Elasticsearch data types

hy飞无

Article link: https://blog.csdn.net/hyhanyu/article/details/97898184

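String: string fields are split into two types, text and keyword.
text is analyzed (tokenized) by default and cannot be used in aggregations by default.

If we let Elasticsearch map the data dynamically, it maps a string as a text field and also adds a keyword sub-field.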

Index a document:

PUT test_field/_doc/1
{
  "name": "test1"
}

View the index mapping:

GET test_field/_mapping

Response:

{
  "test_field" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

View the attributes of a specific field, including default values:

GET test_field/_mapping/field/name?include_defaults=true

{
  "test_field" : {
    "mappings" : {
      "_doc" : {
        "name" : {
          "full_name" : "name",
          "mapping" : {
            "name" : {
              "type" : "text",
              "boost" : 1.0,
              "index" : true,
              "store" : false,
              "doc_values" : false,
              "term_vector" : "no",
              "norms" : true,
              "index_options" : "positions",
              "eager_global_ordinals" : false,
              "similarity" : "BM25",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "boost" : 1.0,
                  "index" : true,
                  "store" : false,
                  "doc_values" : true,
                  "term_vector" : "no",
                  "norms" : false,
                  "index_options" : "docs",
                  "eager_global_ordinals" : false,
                  "similarity" : "BM25",
                  "null_value" : null,
                  "include_in_all" : true,
                  "ignore_above" : 256,
                  "normalizer" : null,
                  "split_queries_on_whitespace" : false
                }
              },
              "analyzer" : "default",
              "search_analyzer" : "default",
              "search_quote_analyzer" : "default",
              "include_in_all" : true,
              "position_increment_gap" : -1,
              "fielddata" : false
            }
          }
        }
      }
    }
  }
}

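The response shows that fielddata is false and that the analyzer is the default one. To aggregate on a text field, fielddata has to be set to true. Fielddata is held in heap memory, so aggregating on the keyword sub-field is usually the cheaper option.

Keyword: not analyzed, and can be used in aggregations. Its length is limited: a single term cannot exceed Lucene's limit of 32766 bytes, and the ignore_above parameter (256 in the dynamic mapping above, effectively unlimited by default in an explicit mapping) makes Elasticsearch skip indexing longer strings. To create a string field that is not analyzed, use the keyword type. Setting index to false on a text field does not turn off analysis; it makes the field unsearchable.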
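The effect of the ignore_above value of 256 in the mapping response above can be pictured with a small sketch. This is plain Python, not Elasticsearch code: the helper name keyword_term is hypothetical, and it models only the rule that a string longer than the limit is left out of the keyword index rather than truncated.

```python
from typing import Optional


def keyword_term(value: str, ignore_above: int = 256) -> Optional[str]:
    """Return the term a keyword field would index, or None if it is skipped.

    Hypothetical helper: it models only the ignore_above rule, which counts
    characters and skips (does not truncate) strings over the limit.
    """
    if len(value) > ignore_above:
        return None  # too long: the value is not indexed in the keyword field
    return value  # a keyword field indexes the whole string as a single term


if __name__ == "__main__":
    print(keyword_term("test1"))            # short value, indexed unchanged
    print(keyword_term("x" * 300) is None)  # 300 characters exceed the limit
```

In the sketch the document is still accepted when the value is skipped; only the keyword term is missing, which is why such values would not show up in term queries or aggregations on the sub-field.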

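Numeric types:

long, integer, short, byte, double, float, half_float, scaled_float

long, integer, short, byte, double, float and half_float each have their own value range; choose the smallest type that fits the data.

scaled_float: requires scaling_factor, the scaling factor.

The scaling factor is used when encoding values. At index time each value is multiplied by this factor and rounded to the closest long. For example, a scaled_float with a scaling_factor of 10 stores 2.34 internally as 23, and all search-time operations (queries, aggregations, sorting) behave as if the document had a value of 2.3. A higher scaling_factor improves accuracy but also increases space requirements. This parameter is required.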

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "price": {
          "type": "scaled_float",
          "scaling_factor": 100
        }
      }
    }
  }
}

29.567 * 100 = 2956.7, which is rounded to 2957.

PUT my_index/_doc/1
{
  "price": "29.567"
}

Query with 29.57, which is scaled to 2957:

GET my_index/_search
{
  "query": {
    "term": {
      "price": "29.57"
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "my_index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      "price" : "29.567"
    }
  }
]

Query with 29.569 (29.569 * 100 = 2956.9, which is also rounded to 2957):

GET my_index/_search
{
  "query": {
    "term": {
      "price": "29.569"
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "my_index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      "price" : "29.567"
    }
  }
]
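The two term queries above hit the same document because the query value is scaled and rounded in the same way as the indexed value. A minimal Python sketch of that arithmetic follows. The function names are made up for illustration, and round-half-up is an assumption about how the rounding behaves, not a copy of the Elasticsearch implementation.

```python
import math


def encode_scaled_float(value: float, scaling_factor: float) -> int:
    """Multiply by the factor and round half up to the nearest integer."""
    return math.floor(value * scaling_factor + 0.5)


def decode_scaled_float(stored: int, scaling_factor: float) -> float:
    """Value that search-time operations effectively see."""
    return stored / scaling_factor


if __name__ == "__main__":
    factor = 100
    indexed = encode_scaled_float(29.567, factor)
    print("stored long:", indexed)
    for query in (29.57, 29.569, 29.56):
        # A term query matches when the query value scales to the same long.
        print(query, encode_scaled_float(query, factor) == indexed)
```

With a factor of 100 every value between 29.565 and 29.575 collapses to the same stored long, so a larger factor is needed if those values must stay distinguishable.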

Choosing a numeric type whose range fits the data helps save space.

Date type:

Date: several date formats can be specified, separated by ||.

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}
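The format string in the mapping above lists alternatives that are tried in order. The Python sketch below imitates that fallback. The strptime patterns are an assumed translation of the two date patterns, parse_date is a hypothetical name, and all values are treated as UTC for simplicity.

```python
from datetime import datetime, timezone

# Assumed Python equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd".
PATTERNS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_date(value) -> datetime:
    """Try each pattern in order, then fall back to epoch milliseconds."""
    text = str(value)
    for pattern in PATTERNS:
        try:
            return datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # this pattern did not match, try the next one
    # epoch_millis: an integer number of milliseconds since 1970-01-01 UTC.
    return datetime.fromtimestamp(int(text) / 1000, tz=timezone.utc)


if __name__ == "__main__":
    for raw in ("2015-01-01 12:10:30", "2015-01-01", 1420070400000):
        print(raw, "->", parse_date(raw).isoformat())
```

A value that fits none of the alternatives raises ValueError in the sketch, which mirrors the fact that a document with an unparseable date is rejected.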

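Boolean type:

Boolean

Binary type:

Binary

Note: the Base64-encoded binary value must not contain embedded newlines. By default the field is not stored and is not searchable.

Range types (I do not know a use case for them yet)

integer_range, float_range, long_range, double_range, date_range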

PUT range_index
{
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "_doc": {
      "properties": {
        "expected_attendees": {
          "type": "integer_range"
        },
        "time_frame": {
          "type": "date_range",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}

PUT range_index/_doc/1?refresh
{
  "expected_attendees" : {
    "gte" : 10,
    "lte" : 20
  },
  "time_frame" : {
    "gte" : "2015-10-31 12:00:00",
    "lte" : "2015-11-01"
  }
}

GET range_index/_search
{
  "query": {
    "term": {
      "expected_attendees": {
        "value": "11"
      }
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "range_index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      "expected_attendees" : {
        "gte" : 10,
        "lte" : 20
      },
      "time_frame" : {
        "gte" : "2015-10-31 12:00:00",
        "lte" : "2015-11-01"
      }
    }
  }
]

PUT range_index/_doc/2?refresh
{
  "expected_attendees" : {
    "gte" : 5,
    "lte" : 10
  },
  "time_frame" : {
    "gte" : "2015-10-31 12:00:00",
    "lte" : "2015-11-01"
  }
}

Range queries: on range fields, the range query accepts a relation parameter with three possible values: WITHIN, CONTAINS and INTERSECTS (the default).

INTERSECTS: matches documents whose range overlaps the query range at any point.

GET range_index/_search
{
  "query": {
    "range": {
      "expected_attendees": {
        "gte": 5,
        "lte": 10,
        "relation": "INTERSECTS"
      }
    }
  }
}

Response:

"hits" : {
  "total" : 2,
  "max_score" : 1.0,
  "hits" : [
    {
      "_index" : "range_index",
      "_type" : "_doc",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "expected_attendees" : {
          "gte" : 10,
          "lte" : 20
        },
        "time_frame" : {
          "gte" : "2015-10-31 12:00:00",
          "lte" : "2015-11-01"
        }
      }
    },
    {
      "_index" : "range_index",
      "_type" : "_doc",
      "_id" : "2",
      "_score" : 1.0,
      "_source" : {
        "expected_attendees" : {
          "gte" : 5,
          "lte" : 10
        },
        "time_frame" : {
          "gte" : "2015-10-31 12:00:00",
          "lte" : "2015-11-01"
        }
      }
    }
  ]
}

WITHIN: matches documents whose range lies entirely within the query range.

GET range_index/_search
{
  "query": {
    "range": {
      "expected_attendees": {
        "gte": 5,
        "lte": 10,
        "relation": "WITHIN"
      }
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "range_index",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 1.0,
    "_source" : {
      "expected_attendees" : {
        "gte" : 5,
        "lte" : 10
      },
      "time_frame" : {
        "gte" : "2015-10-31 12:00:00",
        "lte" : "2015-11-01"
      }
    }
  }
]

CONTAINS: matches documents whose range entirely contains the query range.

GET range_index/_search
{
  "query": {
    "range": {
      "expected_attendees": {
        "gte": 7,
        "lte": 9,
        "relation": "CONTAINS"
      }
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "range_index",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 1.0,
    "_source" : {
      "expected_attendees" : {
        "gte" : 5,
        "lte" : 10
      },
      "time_frame" : {
        "gte" : "2015-10-31 12:00:00",
        "lte" : "2015-11-01"
      }
    }
  }
]
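The three relation values reduce to simple interval comparisons. The sketch below replays the three queries above against the two expected_attendees ranges in plain Python. The names matches and search are hypothetical, and only inclusive integer bounds (gte/lte) are modelled.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (gte, lte) bounds


def matches(doc: Range, query: Range, relation: str = "INTERSECTS") -> bool:
    """Compare a document range with a query range under one relation."""
    doc_lo, doc_hi = doc
    q_lo, q_hi = query
    if relation == "INTERSECTS":  # the ranges share at least one value
        return doc_lo <= q_hi and q_lo <= doc_hi
    if relation == "WITHIN":      # document range lies inside the query range
        return q_lo <= doc_lo and doc_hi <= q_hi
    if relation == "CONTAINS":    # document range covers the query range
        return doc_lo <= q_lo and q_hi <= doc_hi
    raise ValueError(f"unknown relation: {relation}")


def search(docs: Dict[str, Range], query: Range, relation: str) -> List[str]:
    """Return the ids of all documents whose range matches."""
    return [doc_id for doc_id, rng in docs.items() if matches(rng, query, relation)]


if __name__ == "__main__":
    docs = {"1": (10, 20), "2": (5, 10)}
    print(search(docs, (5, 10), "INTERSECTS"))
    print(search(docs, (5, 10), "WITHIN"))
    print(search(docs, (7, 9), "CONTAINS"))
```

Document 1 appears only in the INTERSECTS result because its range 10 to 20 touches the query range at the single value 10.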

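Geo-point type: geo_point

Parameters:

ignore_malformed: if true, malformed geo-points are ignored; if false, a malformed geo-point throws an exception and Elasticsearch rejects the whole document.

ignore_z_value: if true, a point with three dimensions is accepted, but only the latitude and longitude are indexed and the third dimension is ignored; if false, a point with more than latitude and longitude throws an exception.

null_value: a geo-point value that is substituted for explicit null values so that they can be searched. The default is null, which means the field is treated as missing. Note that the substituted value does not change the document itself, only how it is indexed.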

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}

The input formats shown in the official documentation:

PUT my_index/_doc/1
{
  "text": "Geo-point as an object",
  "location": {
    "lat": 41.12,
    "lon": -71.34
  }
}

PUT my_index/_doc/2
{
  "text": "Geo-point as a string",
  "location": "41.12,-71.34"
}

PUT my_index/_doc/3
{
  "text": "Geo-point as a geohash",
  "location": "drm3btev3e86"
}

PUT my_index/_doc/4
{
  "text": "Geo-point as an array",
  "location": [ -71.34, 41.12 ]
}

Note that the array and the string use opposite orders: an array is [lon, lat], while a string is "lat,lon".
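Because the order differs between formats, it is easy to swap latitude and longitude by accident. The following Python sketch normalises the object, string and array forms to one (lat, lon) tuple. The function name is hypothetical, and geohash strings are deliberately left out of the sketch.

```python
from typing import Dict, List, Tuple, Union

GeoInput = Union[Dict[str, float], str, List[float]]


def parse_geo_point(value: GeoInput) -> Tuple[float, float]:
    """Normalise a geo_point input to a (lat, lon) tuple."""
    if isinstance(value, dict):
        # Object form: explicit "lat" and "lon" keys.
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, str):
        if "," not in value:
            raise ValueError("geohash strings are not handled in this sketch")
        # String form: "lat,lon".
        lat, lon = value.split(",")
        return float(lat), float(lon)
    if isinstance(value, (list, tuple)):
        # Array form: [lon, lat], the reverse of the string form.
        lon, lat = value[0], value[1]
        return float(lat), float(lon)
    raise TypeError(f"unsupported geo_point input: {value!r}")


if __name__ == "__main__":
    for raw in ({"lat": 41.12, "lon": -71.34}, "41.12,-71.34", [-71.34, 41.12]):
        print(raw, "->", parse_geo_point(raw))
```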

Geo-shape type: the Geo-Shape datatype is used for area queries.

IP type: supports IPv4 and IPv6 addresses.

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "ip_addr": {
          "type": "ip"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "ip_addr": "192.168.1.1"
}

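Complex types

Object type: a single JSON object. (When an array of objects is stored, Elasticsearch flattens it, so queries that combine several fields of one element can return unexpected matches. Use the nested type for arrays of objects.)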

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "region": {
          "type": "keyword"
        },
        "manager": {
          "properties": {
            "age": { "type": "integer" },
            "name": {
              "properties": {
                "first": { "type": "keyword" },
                "last": { "type": "keyword" }
              }
            }
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "region" : "region1",
  "manager": [
    {
      "age" : 1,
      "name" : {
        "first" : "first1",
        "last" : "last2"
      }
    },
    {
      "age" : 2,
      "name" : {
        "first" : "first2",
        "last" : "last2"
      }
    }
  ]
}

GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "manager.age": {
              "value": "1"
            }
          }
        },
        {
          "term": {
            "manager.name.first": {
              "value": "first2"
            }
          }
        }
      ]
    }
  }
}

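The query for age = 1 and first = first2 returns the document, even though the element with age = 1 has first = first1. This happens because Elasticsearch flattens the array of objects into:

{ "manager.age": [ 1, 2 ], "manager.name.first": [ "first1", "first2" ] }

Each condition is then checked against the whole array, so the document is returned as soon as every queried value appears somewhere in it.

Nested type: an array of JSON objects in which each object can be queried independently of the others.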

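Arrays: Elasticsearch has no dedicated array type, so a field does not have to be declared as an array. Any field can hold zero or more values, but all values in an array must be of the same data type.

Common array forms: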

an array of strings: [ "one", "two" ]
an array of integers: [ 1, 2 ]
an array of arrays: [ 1, [ 2, 3 ]] which is the equivalent of [ 1, 2, 3 ]
an array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 }]

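Note: arrays of objects do not work as you would expect: you cannot query each object independently of the other objects in the array. If you need to do that, use the nested datatype instead of the object datatype.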
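The difference between the object and nested behaviour can be reproduced with the manager array from the object example. The Python sketch below flattens the array the way the object type is described to do, then evaluates the same two conditions per document and per element. All function names are hypothetical, and the nested check is only a model of "each object is matched on its own".

```python
from typing import Any, Dict, List


def flatten(prefix: str, objects: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Flatten an array of objects into multi-valued dotted fields."""
    flat: Dict[str, List[Any]] = {}

    def walk(path: str, node: Any) -> None:
        if isinstance(node, dict):
            for key, child in node.items():
                walk(f"{path}.{key}", child)
        else:
            flat.setdefault(path, []).append(node)

    for obj in objects:
        walk(prefix, obj)
    return flat


def object_match(objects: List[Dict[str, Any]], conditions: Dict[str, Any]) -> bool:
    """Object type: every condition is checked against the flattened arrays."""
    flat = flatten("manager", objects)
    return all(value in flat.get(field, []) for field, value in conditions.items())


def nested_match(objects: List[Dict[str, Any]], conditions: Dict[str, Any]) -> bool:
    """Nested type (modelled): all conditions must hold within one element."""
    return any(object_match([obj], conditions) for obj in objects)


if __name__ == "__main__":
    managers = [
        {"age": 1, "name": {"first": "first1", "last": "last2"}},
        {"age": 2, "name": {"first": "first2", "last": "last2"}},
    ]
    query = {"manager.age": 1, "manager.name.first": "first2"}
    print(flatten("manager", managers))
    print("object:", object_match(managers, query))
    print("nested:", nested_match(managers, query))
```

The flattened form loses the link between age 1 and first1, which is exactly why the object-type query matches while the per-element check does not.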

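Multi-fields:

A field in Elasticsearch can be indexed in more than one way. For example, a string can be mapped as text for full-text search and as keyword for aggregations. You can also define additional sub-fields that use different analyzers for different use cases.

Field aliases (the alias datatype, available from version 6.4 onward):

Create a field named distance and an alias named route_length_miles that points to it.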

PUT trips
{
  "mappings": {
    "_doc": {
      "properties": {
        "distance": {
          "type": "long"
        },
        "route_length_miles": {
          "type": "alias",
          "path": "distance"
        },
        "transit_mode": {
          "type": "keyword"
        }
      }
    }
  }
}

Index two documents:

PUT trips/_doc/1
{
  "distance": "40",
  "transit_mode" :"1"
}

PUT trips/_doc/2
{
  "distance": "28",
  "transit_mode" :"2"
}

GET _search
{
  "query": {
    "range" : {
      "route_length_miles" : {
        "gte" : 39
      }
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "trips",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      "distance" : "40",
      "transit_mode" : "1"
    }
  }
]

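Usage note: almost all queries support field aliases.

Restrictions on the alias target:
The target must be a concrete field, not an object or another field alias.

The target must already exist when the alias is created.

If nested objects are defined, the alias must have the same nested scope as its target.

A field alias can have only one target field.

Aliases are currently not supported as a copy_to target or in _source filtering.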

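token_count: a field of type token_count is really an integer field that accepts string values, analyzes them, and indexes the number of tokens in the string. In other words, it records how many terms the string is split into.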

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "length": {
              "type": "token_count",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1

{ "name": "John Smith" }

PUT my_index/_doc/2

{ "name": "Rachel Alice Williams" }

GET my_index/_search
{
  "query": {
    "term": {
      "name.length": 3
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "my_index",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 1.0,
    "_source" : {
      "name" : "Rachel Alice Williams"
    }
  }
]
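Why only document 2 is returned can be checked by counting tokens by hand. The Python sketch below approximates the standard analyzer with a simple word regex. That approximation is an assumption that is good enough for plain English names, and count_tokens is a hypothetical helper name.

```python
import re
from typing import Dict, List


def count_tokens(text: str) -> int:
    """Count word tokens, a rough stand-in for the standard analyzer."""
    return len(re.findall(r"\w+", text))


def term_on_length(docs: Dict[str, str], wanted: int) -> List[str]:
    """Model a term query on the token_count sub-field."""
    return [doc_id for doc_id, name in docs.items() if count_tokens(name) == wanted]


if __name__ == "__main__":
    docs = {"1": "John Smith", "2": "Rachel Alice Williams"}
    for doc_id, name in docs.items():
        print(doc_id, name, count_tokens(name))
    print(term_on_length(docs, 3))
```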

percolator type (I do not know yet what it is useful for):

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "query": {
          "type": "percolator"
        },
        "field": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT my_index/_doc/match_value1
{
  "query" : {
    "match" : {
      "field" : "value"
    }
  }
}

PUT my_index/_doc/2
{
  "query" : {
    "match" : {
      "field" : "aa"
    }
  }
}

PUT my_index/_doc/3
{
  "query" : {
    "match" : {
      "field" : "value aa"
    }
  }
}

GET my_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "field" : "value aa"
      }
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "my_index",
    "_type" : "_doc",
    "_id" : "3",
    "_score" : 0.2876821,
    "_source" : {
      "query" : {
        "match" : {
          "field" : "value aa"
        }
      }
    },
    "fields" : {
      "_percolator_document_slot" : [
      ]
    }
  }
]

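My current understanding: a query is first stored in the query field. When a percolate search is then run with a document, the stored queries that match that document are returned. I do not know yet what this is useful for.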
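The percolator reverses the usual direction of search: queries are stored and a document is matched against them, which is commonly used for alerting on incoming documents. The Python sketch below models the example above. It assumes that, because field is mapped as keyword, a match query on it compares the whole string, which would explain why only the query with id 3 is returned. The names STORED_QUERIES and percolate are hypothetical.

```python
from typing import Dict, List

# Stored queries from the example: document id -> text of the match on "field".
STORED_QUERIES: Dict[str, str] = {
    "match_value1": "value",
    "2": "aa",
    "3": "value aa",
}


def percolate(document: Dict[str, str], stored: Dict[str, str]) -> List[str]:
    """Return the ids of stored queries that match the given document.

    Assumption: "field" is a keyword, so it is indexed as one term and a
    stored match query only matches when the whole string is equal.
    """
    value = document.get("field")
    return [query_id for query_id, wanted in stored.items() if wanted == value]


if __name__ == "__main__":
    print(percolate({"field": "value aa"}, STORED_QUERIES))
    print(percolate({"field": "aa"}, STORED_QUERIES))
```

If field were mapped as text instead, the document would be tokenized and all three stored queries would be expected to match, so the mapping of the percolated fields matters as much as the stored queries.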

Join datatype: this type creates a parent/child relation between documents of the same index.

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": "answer"
          }
        }
      }
    }
  }
}

Here question is the parent and answer is the child.

Create the question documents:

PUT my_index/_doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": {
    "name": "question"
  }
}

PUT my_index/_doc/2?refresh
{
  "text": "This is another question",
  "my_join_field": {
    "name": "question"
  }
}

Create the answer documents. The routing parameter is required here and must be the routing value of the parent, because parent and child documents have to be indexed on the same shard. Both answers below belong to parent 1, so both use routing=1.

PUT my_index/_doc/3?routing=1&refresh
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}

PUT my_index/_doc/4?routing=1&refresh
{
  "text": "This is another answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}

Query:

GET my_index/_search
{
  "query": {
    "parent_id": {
      "type": "answer",
      "id": "1"
    }
  }
}
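The reason children are routed by their parent can be illustrated with a toy shard function. In the Python sketch below, zlib.crc32 is only a stand-in for the routing hash Elasticsearch really uses, the shard count of 2 is made up, and the sketch assumes every child is indexed with its parent's id as routing. The last function models what the parent_id query above selects.

```python
import zlib
from typing import Dict, List, Optional

NUM_SHARDS = 2  # hypothetical shard count for the illustration


def shard_for(routing: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from the routing value (crc32 is a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def make_doc(doc_id: str, name: str, parent: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Parents route by their own id, children by the id of their parent."""
    routing = parent if parent is not None else doc_id
    return {"id": doc_id, "name": name, "parent": parent, "routing": routing}


DOCS = [
    make_doc("1", "question"),
    make_doc("2", "question"),
    make_doc("3", "answer", parent="1"),
    make_doc("4", "answer", parent="1"),
]


def parent_id_query(docs: List[dict], child_type: str, parent_id: str) -> List[str]:
    """Model of the parent_id query: children of one type under one parent."""
    return [d["id"] for d in docs if d["name"] == child_type and d["parent"] == parent_id]


if __name__ == "__main__":
    for doc in DOCS:
        print(doc["id"], doc["name"], "shard", shard_for(doc["routing"]))
    print(parent_id_query(DOCS, "answer", "1"))
```

Since the shard depends only on the routing value, a child that carries its parent's routing always lands on the parent's shard, whatever hash function is used.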

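Restrictions

1. Only one join field is allowed per index.
2. Parent and child documents must be indexed on the same shard. This means the same routing value has to be provided when getting, deleting, or updating a child document.

3. An element can have multiple children but only one parent.

4. A new relation can be added to an existing join field.

5. A child can also be added to an existing element, but only if the element is already a parent.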

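Global ordinals:

The join field uses global ordinals to speed up joins. Global ordinals need to be rebuilt after any change to a shard. The more parent id values are stored in a shard, the longer it takes to rebuild the global ordinals for the join field.

By default, global ordinals are built eagerly: if the index has changed, the global ordinals for the join field are rebuilt as part of the refresh. This can add significant time to the refresh. Most of the time, however, this is the right trade-off, because otherwise the global ordinals are rebuilt when the first parent-join query or aggregation is used. That can cause a significant latency spike for your users, and it is usually worse, because several global ordinals for the join field may be rebuilt within a single refresh interval when many writes occur.

Multiple children per parent: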

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": ["answer", "comment"],
            "answer": "vote"
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "name" : "text1",
  "my_join_field" : "question"
}

PUT my_index/_doc/2?routing=1
{
  "name" : "answer1",
  "my_join_field": {
    "name" : "answer",
    "parent": "1"
  }
}

PUT my_index/_doc/3?routing=1
{
  "name" : "comment1",
  "my_join_field": {
    "name" : "comment",
    "parent": "1"
  }
}

PUT my_index/_doc/4?routing=1
{
  "name" : "vote1",
  "my_join_field": {
    "name" : "vote",
    "parent": "2"
  }
}

Note: a grandchild must also be on the same shard as its grandparent, so its routing must be the routing value of the grandparent.

GET my_index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "answer",
      "query": {
        "match_all": {}
      }
    }
  }
}
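The multi-level example can be traced with a small in-memory model. The Python sketch below stores the four documents with their join name and parent, finds the top-level ancestor whose id serves as the routing value, and models the has_parent query with a match_all inner query. The function names are hypothetical, and the model ignores scoring and sharding.

```python
from typing import Dict, List, Optional

# id -> join name and parent id, taken from the four documents above.
DOCS: Dict[str, Dict[str, Optional[str]]] = {
    "1": {"name": "question", "parent": None},
    "2": {"name": "answer", "parent": "1"},
    "3": {"name": "comment", "parent": "1"},
    "4": {"name": "vote", "parent": "2"},
}


def root_of(doc_id: str, docs: Dict[str, Dict[str, Optional[str]]]) -> str:
    """Walk up to the top-level ancestor, whose id is used as routing."""
    while docs[doc_id]["parent"] is not None:
        doc_id = docs[doc_id]["parent"]
    return doc_id


def has_parent(docs: Dict[str, Dict[str, Optional[str]]], parent_type: str) -> List[str]:
    """Return children whose direct parent has the given join name."""
    return [
        doc_id
        for doc_id, doc in docs.items()
        if doc["parent"] is not None and docs[doc["parent"]]["name"] == parent_type
    ]


if __name__ == "__main__":
    print("routing for vote 4:", root_of("4", DOCS))
    print("has_parent answer:", has_parent(DOCS, "answer"))
    print("has_parent question:", has_parent(DOCS, "question"))
```

In this model the query with parent_type answer selects only the vote document, and the vote's routing resolves to the question with id 1, matching routing=1 in the request for document 4.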
