Elasticsearch Mapping (Part 3)

666呀

Article link: https://blog.csdn.net/Suubyy/article/details/118339978

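Field data types

Each field has a field data type, or field type. This type indicates the kind of data the field contains, such as strings or boolean values. For example, you can index strings to both text and keyword fields. However, text field values are analyzed for full-text search, while keyword strings are left as-is for filtering and sorting.

Field types are grouped by family. Types in the same family support the same search functionality but may have different space usage or performance characteristics.

Currently, the only type family is keyword, which consists of the keyword, constant_keyword, and wildcard field types. Other type families have only a single field type. For example, the boolean type family consists of one field type: boolean.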

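Common types

binary: a binary value encoded as a Base64 string
boolean: true and false values
Keywords: the keyword family, including keyword, constant_keyword, and wildcard
Numbers: numeric types, such as long and double, used to express amounts
Dates: date types, including date and date_nanos
alias: defines an alias for an existing field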

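Objects and relational types

object: a JSON object
flattened: an entire JSON object stored as a single field value
nested: a JSON object that preserves the relationship between its subfields
join: defines a parent/child relationship for documents in the same index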

Structured data types

range: range types, such as long_range, double_range, date_range, and ip_range
ip: IPv4 and IPv6 addresses
version: software versions
murmur3: computes and stores hashes of values

Aggregate data types

aggregate_metric_double: pre-aggregated metric values
histogram: pre-aggregated numerical values in the form of a histogram

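Text search types

text: analyzed, unstructured text
annotated-text: text containing special markup, used for identifying named entities
completion: used for auto-complete suggestions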
search_as_you_type: a text-like type for as-you-type completion
token_count: a count of the tokens in a text

Document ranking types

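Aggregate metric (`aggregate metric`)

The aggregate_metric_double field type stores pre-aggregated numeric values for metric aggregations. An aggregate_metric_double field is an object containing one or more of the following metric sub-fields: min, max, sum, and value_count. When you run certain metric aggregations on an aggregate_metric_double field, the aggregation uses the values of the related sub-field. For example, a min aggregation on an aggregate_metric_double field returns the minimum value of all min sub-fields.

Note: An aggregate_metric_double field stores a single numeric value for each metric sub-field. Array values are not supported. min, max, and sum values are double numbers; value_count is a long number.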

curl -X PUT "localhost:9200/my-index?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "my-agg-metric-field": {
        "type": "aggregate_metric_double",
        "metrics": [ "min", "max", "sum", "value_count" ],
        "default_metric": "max"
      }
    }
  }
}
'

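Parameters for `aggregate_metric_double` fields

metrics: (Required, array of strings) Array of metric sub-fields to store. Each value corresponds to a metric aggregation. Valid values are min, max, sum, and value_count. You must specify at least one value.
default_metric: (Required, string) Default metric sub-field to use for queries, scripts, and aggregations that don't use a sub-field. Must be a value from the metrics array.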

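Uses

We designed aggregate_metric_double fields for use with the following aggregations:

min: returns the minimum value of all min sub-fields
max: returns the maximum value of all max sub-fields
sum: returns the sum of the values of all sum sub-fields
value_count: returns the sum of the values of all value_count sub-fields
avg: there is no avg sub-field; the result is the sum of all sum sub-fields divided by the sum of all value_count sub-fields

If you use an aggregate_metric_double field with other aggregations, the field uses the default_metric value, which behaves as a double field. The default_metric is also used in scripts and in the following queries: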

exists
range
term
terms

Examples

The following create index API request creates an index with an aggregate_metric_double field named agg_metric. The request sets max as the field's default_metric.

curl -X PUT "localhost:9200/stats-index?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "agg_metric": {
        "type": "aggregate_metric_double",
        "metrics": [ "min", "max", "sum", "value_count" ],
        "default_metric": "max"
      }
    }
  }
}
'

The following index API requests add documents with pre-aggregated data in the agg_metric field.

curl -X PUT "localhost:9200/stats-index/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "agg_metric": {
    "min": -302.50,
    "max": 702.30,
    "sum": 200.0,
    "value_count": 25
  }
}
'
curl -X PUT "localhost:9200/stats-index/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
  "agg_metric": {
    "min": -93.00,
    "max": 1702.30,
    "sum": 300.00,
    "value_count": 25
  }
}
'

You can run min, max, sum, value_count, and avg aggregations on the agg_metric field.

curl -X POST "localhost:9200/stats-index/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "metric_min": { "min": { "field": "agg_metric" } },
    "metric_max": { "max": { "field": "agg_metric" } },
    "metric_value_count": { "value_count": { "field": "agg_metric" } },
    "metric_sum": { "sum": { "field": "agg_metric" } },
    "metric_avg": { "avg": { "field": "agg_metric" } }
  }
}
'

The aggregation results are based on the values of the related metric sub-fields.

{
...
  "aggregations": {
    "metric_min": {
      "value": -302.5
    },
    "metric_max": {
      "value": 1702.3
    },
    "metric_value_count": {
      "value": 50
    },
    "metric_sum": {
      "value": 500.0
    },
    "metric_avg": {
      "value": 10.0
    }
  }
}
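As a rough illustration of where these numbers come from, the Python sketch below recomputes them from the two documents indexed above. It models only the arithmetic described in this section, not how Elasticsearch implements it; the function name `combine_metrics` is made up for this example.

```python
def combine_metrics(docs):
    """Combine pre-aggregated sub-fields across documents."""
    total_sum = sum(d["sum"] for d in docs)
    total_count = sum(d["value_count"] for d in docs)
    return {
        "min": min(d["min"] for d in docs),
        "max": max(d["max"] for d in docs),
        "sum": total_sum,
        "value_count": total_count,
        # avg has no sub-field of its own: it is derived from sum and value_count
        "avg": total_sum / total_count,
    }


# The agg_metric values of documents 1 and 2 from the example above
docs = [
    {"min": -302.50, "max": 702.30, "sum": 200.0, "value_count": 25},
    {"min": -93.00, "max": 1702.30, "sum": 300.00, "value_count": 25},
]
result = combine_metrics(docs)
print(result)
```

The avg of 10.0 is simply 500.0 / 50, which is why no avg sub-field needs to be stored.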

Queries on aggregate_metric_double fields use the default_metric value.

curl -X GET "localhost:9200/stats-index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "agg_metric": {
        "value": 702.30
      }
    }
  }
}
'

The search returns the following hit. The value of the default_metric sub-field, max, matches the query value.

{
  ...
    "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "stats-index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "agg_metric": {
            "min": -302.5,
            "max": 702.3,
            "sum": 200.0,
            "value_count": 25
          }
        }
      }
    ]
  }
}

Alias (`alias`)

An alias mapping defines an alternate name for a field in the index. The alias can be used in place of the target field in search requests:

curl -X PUT "localhost:9200/trips?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "distance": {
        "type": "long"
      },
      "route_length_miles": {
        "type": "alias",
        "path": "distance" 
      },
      "transit_mode": {
        "type": "keyword"
      }
    }
  }
}
'
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range" : {
      "route_length_miles" : {
        "gte" : 39
      }
    }
  }
}
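'

"path": "distance" is the path to the target field. Note that this must be the full path, including any parent objects.

Almost all components of the search request accept field aliases. In particular, aliases can be used in queries, aggregations, and sort fields, as well as when requesting docvalue_fields, stored_fields, suggestions, and highlights. Scripts also support aliases.

In some parts of the search request, and when requesting field capabilities, field wildcard patterns can be provided. In these cases, the wildcard pattern matches field aliases in addition to concrete fields: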

curl -X GET "localhost:9200/trips/_field_caps?fields=route_*,transit_mode&pretty"

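Alias targets (`alias targets`)

There are a few restrictions on the target of an alias:

The target must be a concrete field, not an object or another field alias.
The target field must exist at the time the alias is created.
If nested objects are defined, a field alias must have the same nested scope as its target.

Unsupported APIs

Writes to field aliases are not supported: attempting to use an alias in an index or update request will result in a failure. Likewise, aliases cannot be used as the target of copy_to or in multi-fields.

Because alias names are not present in the document source, aliases cannot be used when performing source filtering. For example, the following request returns an empty result for _source: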

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query" : {
    "match_all": {}
  },
  "_source": "route_length_miles"
}
'
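To see why the result is empty, it may help to picture source filtering as a plain key lookup on the stored JSON. The Python sketch below is a simplified model under that assumption; `filter_source` and the document values are hypothetical, not part of any Elasticsearch client.

```python
def filter_source(source, includes):
    """Keep only the top-level keys of the stored _source named in `includes`."""
    return {key: value for key, value in source.items() if key in includes}


# Hypothetical stored document: the alias route_length_miles exists only
# in the mapping, never in the source itself.
stored_source = {"distance": 42, "transit_mode": "train"}

by_alias = filter_source(stored_source, ["route_length_miles"])
by_target = filter_source(stored_source, ["distance"])
print(by_alias)   # nothing in the source carries the alias name
print(by_target)
```

Filtering by the target field name, distance, is the way to get the value back.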

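Currently, only the search and field capabilities APIs accept and resolve field aliases. Other APIs that accept field names, such as term vectors, cannot be used with field aliases.

Finally, some queries, such as terms, geo_shape, and more_like_this, allow query information to be fetched from an indexed document. Because field aliases are not supported when fetching documents, the part of the query that specifies the lookup path cannot refer to a field by its alias.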

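Arrays (`array`)

In Elasticsearch, there is no dedicated array data type. Any field can contain zero or more values by default; however, all values in the array must be of the same data type.

When a field is added dynamically, the first value in the array determines the field type.

Arrays may contain null values, which are either replaced by the configured null_value or skipped entirely. An empty array [] is treated as a missing field, that is, a field with no values.

Nothing needs to be pre-configured in order to use arrays in documents; they are supported out of the box: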

curl -X PUT "localhost:9200/my-index-000001/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "message": "some arrays in this document...",
  "tags":  [ "elasticsearch", "wow" ], 
  "lists": [ 
    {
      "name": "prog_list",
      "description": "programming list"
    },
    {
      "name": "cool_list",
      "description": "cool stuff list"
    }
  ]
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
  "message": "no arrays in this document...",
  "tags":  "elasticsearch",
  "lists": {
    "name": "prog_list",
    "description": "programming list"
  }
}
'
curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "tags": "elasticsearch" 
    }
  }
}
'
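The practical consequence is that a single value and an array of values are searched the same way. The Python sketch below models that idea with exact string comparison, which is a simplification (a real match query analyzes the text first); the helper names are invented for this example.

```python
def as_values(field_value):
    """Treat a missing field, a single value, and a list uniformly as a list."""
    if field_value is None:
        return []
    return field_value if isinstance(field_value, list) else [field_value]


def has_value(doc, field, wanted):
    """True if any value of the field equals the wanted value."""
    return wanted in as_values(doc.get(field))


doc1 = {"tags": ["elasticsearch", "wow"]}  # array of values
doc2 = {"tags": "elasticsearch"}           # single value
doc3 = {"tags": []}                        # empty array behaves like a missing field

matches = [has_value(d, "tags", "elasticsearch") for d in (doc1, doc2, doc3)]
print(matches)
```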

Binary (`binary`)

The binary type accepts a binary value as a Base64-encoded string. The field is not stored by default and is not searchable.

curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "blob": {
        "type": "binary"
      }
    }
  }
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Some binary blob",
  "blob": "U29tZSBiaW5hcnkgYmxvYg==" 
}
'
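The blob value in the request above is just the Base64 encoding of the bytes "Some binary blob". A small Python sketch of how such a value could be produced and decoded on the client side, using only the standard library:

```python
import base64

raw = b"Some binary blob"

# Encode the raw bytes to the Base64 string sent in the document body
encoded = base64.b64encode(raw).decode("ascii")
print(encoded)

# Decoding the string recovers the original bytes
decoded = base64.b64decode(encoded)
print(decoded)
```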

Boolean

Boolean fields accept JSON true and false values, but can also accept strings that are interpreted as either true or false:

False values	`false`, `"false"`, `""` (empty string)
True values	`true`, `"true"`
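The table above can be read as a small parsing rule. The following Python sketch mirrors it; it is an illustration of the rule only, and the function name `parse_boolean` is an assumption of this example rather than anything Elasticsearch exposes.

```python
def parse_boolean(value):
    """Interpret a value according to the true/false table above."""
    if value is True or value == "true":
        return True
    if value is False or value in ("false", ""):
        return False
    # Anything else (for example "yes" or 1) is not an accepted boolean value
    raise ValueError(f"not an accepted boolean value: {value!r}")


accepted = [parse_boolean(v) for v in (True, "true", False, "false", "")]
print(accepted)
```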

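Date

JSON has no date data type, so dates in Elasticsearch can be:

a string containing a formatted date, e.g. "2015-01-01" or "2015/01/01 12:10:30"
a number representing milliseconds since the epoch
a number representing seconds since the epoch

Internally, dates are converted to UTC (if the time zone is specified) and stored as a long number representing milliseconds since the epoch.

Queries on dates are internally converted to range queries on this long representation, and the results of aggregations and stored fields are converted back to a string according to the date format associated with the field.

Note: Dates are always rendered as strings, even if they were initially supplied as a long in the JSON document.

Date formats can be customized, but if no format is specified, the following default is used:

"strict_date_optional_time||epoch_millis"

This means the field accepts dates with optional timestamps that conform to the formats supported by strict_date_optional_time, or milliseconds since the epoch.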

curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date" 
      }
    }
  }
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{ "date": "2015-01-01" }
'
curl -X PUT "localhost:9200/my-index-000001/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{ "date": "2015-01-01T12:10:30Z" }
'
curl -X PUT "localhost:9200/my-index-000001/_doc/3?pretty" -H 'Content-Type: application/json' -d'
{ "date": 1420070400001 }
'
curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "sort": { "date": "asc"} 
}
'
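To make the internal representation concrete, the Python sketch below converts the three example values to milliseconds since the epoch and sorts the documents by that number. It is a simplified model that handles only the inputs shown above; `to_epoch_millis` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value):
    """Convert an ISO-style date string or an epoch-millis number to epoch millis."""
    if isinstance(value, int):
        return value
    # Older Python versions do not accept a trailing "Z" in fromisoformat
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # no zone given: assume UTC
    return (parsed - EPOCH) // timedelta(milliseconds=1)


docs = {"1": "2015-01-01", "2": "2015-01-01T12:10:30Z", "3": 1420070400001}
millis = {doc_id: to_epoch_millis(value) for doc_id, value in docs.items()}
order = sorted(millis, key=millis.get)
print(millis)
print(order)
```

Document 3 sorts between documents 1 and 2 because its value is one millisecond after midnight on 2015-01-01.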

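Multiple date formats

Multiple formats can be specified by separating them with ||. Each format is tried in turn until a matching format is found. The first format is used to convert the milliseconds-since-the-epoch value back into a string.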

curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "date": {
        "type":   "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
'
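The "try each format in turn" behaviour can be sketched in Python as follows. The strptime patterns are a hand-written, assumed translation of the mapping's patterns, and `parse_date` and `render` are hypothetical names; treat this as an illustration of the idea rather than of Elasticsearch's parser.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumed Python equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd"
PATTERNS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]


def parse_date(value):
    """Try each pattern in order, then fall back to epoch_millis."""
    text = str(value)
    for pattern in PATTERNS:
        try:
            return datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # this pattern did not match, try the next one
    if text.lstrip("-").isdigit():
        return EPOCH + timedelta(milliseconds=int(text))
    raise ValueError(f"no format matched: {value!r}")


def render(moment):
    """Render a date with the first format in the list."""
    return moment.strftime(PATTERNS[0])


rendered = [render(parse_date(v)) for v in ("2015-01-01 12:10:30", "2015-01-01", 1420070400000)]
print(rendered)
```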

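Geographic point (`Geo-point`)

Fields of type geo_point accept latitude-longitude pairs, which can be used to:

find geo-points within a bounding box, within a certain distance of a central point, within a polygon, or within a geo_shape query
aggregate documents geographically or by distance from a central point
integrate distance into a document's relevance score
sort documents by distance

A geo-point can be specified in five ways, as shown below. Note that the string form is ordered lat,lon, while the array and WKT forms are ordered lon, lat: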

curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "text": "Geo-point as an object",
  "location": { 
    "lat": 41.12,
    "lon": -71.34
  }
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
  "text": "Geo-point as a string",
  "location": "41.12,-71.34" 
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/3?pretty" -H 'Content-Type: application/json' -d'
{
  "text": "Geo-point as a geohash",
  "location": "drm3btev3e86" 
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/4?pretty" -H 'Content-Type: application/json' -d'
{
  "text": "Geo-point as an array",
  "location": [ -71.34, 41.12 ] 
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/5?pretty" -H 'Content-Type: application/json' -d'
{
  "text": "Geo-point as a WKT POINT primitive",
  "location" : "POINT (-71.34 41.12)" 
}
'
curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "geo_bounding_box": { 
      "location": {
        "top_left": {
          "lat": 42,
          "lon": -72
        },
        "bottom_right": {
          "lat": 40,
          "lon": -74
        }
      }
    }
  }
}
'
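Because the coordinate order differs between the input forms, it can be useful to normalize them in client code. The Python sketch below converts four of the five forms to a (lat, lon) tuple; geohash decoding is deliberately left out, and `to_lat_lon` is a hypothetical helper, not an Elasticsearch API.

```python
import re


def to_lat_lon(location):
    """Normalize object, array, WKT, and "lat,lon" string forms to (lat, lon)."""
    if isinstance(location, dict):
        return (location["lat"], location["lon"])
    if isinstance(location, (list, tuple)):
        lon, lat = location  # array form is [lon, lat]
        return (lat, lon)
    wkt = re.fullmatch(r"POINT\s*\(\s*(\S+)\s+(\S+)\s*\)", location)
    if wkt:  # WKT form is POINT (lon lat)
        return (float(wkt.group(2)), float(wkt.group(1)))
    if "," in location:  # string form is "lat,lon"
        lat, lon = location.split(",")
        return (float(lat), float(lon))
    raise ValueError("geohashes and other forms are not handled in this sketch")


points = [
    to_lat_lon({"lat": 41.12, "lon": -71.34}),
    to_lat_lon("41.12,-71.34"),
    to_lat_lon([-71.34, 41.12]),
    to_lat_lon("POINT (-71.34 41.12)"),
]
print(points)
```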

`geo-shape`

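The geo_shape data type facilitates the indexing of and searching with arbitrary geographic shapes such as rectangles and polygons. It should be used when either the data being indexed or the queries being executed contain shapes other than just points.

The geo_shape mapping maps GeoJSON geometry objects to the geo_shape type. To enable it, users must explicitly map fields to the geo_shape type.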

`IP`

An ip field can index and store either IPv4 or IPv6 addresses.

curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "ip_addr": {
        "type": "ip"
      }
    }
  }
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "ip_addr": "192.168.1.1"
}
'
curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "ip_addr": "192.168.0.0/16"
    }
  }
}
'

The most common way to query IP addresses is to use CIDR notation: [ip_address]/[prefix_length]. For example:

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "ip_addr": "192.168.0.0/16"
    }
  }
}
'

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "ip_addr": "2001:db8::/48"
    }
  }
}
'
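For readers less familiar with CIDR notation, the Python standard library can show what "an address falls inside a block" means. This sketch illustrates the matching idea only and says nothing about how Elasticsearch evaluates the query; `in_cidr` is a made-up helper name.

```python
import ipaddress


def in_cidr(address, cidr):
    """True if the address belongs to the CIDR block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


checks = [
    in_cidr("192.168.1.1", "192.168.0.0/16"),  # the indexed address above
    in_cidr("10.0.0.1", "192.168.0.0/16"),     # outside the block
    in_cidr("2001:db8::1", "2001:db8::/48"),   # IPv6 works the same way
]
print(checks)
```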

`JOIN`

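The join data type is a special field that creates parent/child relations between documents of the same index. The relations section defines a set of possible relations within the documents, each relation consisting of a parent name and a child name. A parent/child relation can be defined as follows: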

curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "my_id": {
        "type": "keyword"
      },
      "my_join_field": { 
        "type": "join",
        "relations": {
          "question": "answer" 
        }
      }
    }
  }
}
'

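To index a document with a join, the name of the relation and the optional parent of the document must be provided in the source. For instance, the following example creates two parent documents in the question context: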

curl -X PUT "localhost:9200/my-index-000001/_doc/1?refresh&pretty" -H 'Content-Type: application/json' -d'
{
  "my_id": "1",
  "text": "This is a question",
  "my_join_field": {
    "name": "question" 
  }
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/2?refresh&pretty" -H 'Content-Type: application/json' -d'
{
  "my_id": "2",
  "text": "This is another question",
  "my_join_field": {
    "name": "question"
  }
}
'

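When indexing parent documents, you can choose to specify just the name of the relation as a shortcut instead of encapsulating it in the normal object notation: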

curl -X PUT "localhost:9200/my-index-000001/_doc/1?refresh&pretty" -H 'Content-Type: application/json' -d'
{
  "my_id": "1",
  "text": "This is a question",
  "my_join_field": "question" 
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/2?refresh&pretty" -H 'Content-Type: application/json' -d'
{
  "my_id": "2",
  "text": "This is another question",
  "my_join_field": "question"
}
'

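When indexing a child, the name of the relation as well as the parent ID of the document must be added to the _source.

Note: The lineage of a parent must be indexed in the same shard, so you must always route child documents using their greater parent ID.

For instance, the following example shows how to index two child documents: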

curl -X PUT "localhost:9200/my-index-000001/_doc/3?routing=1&refresh&pretty" -H 'Content-Type: application/json' -d'
{
  "my_id": "3",
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer", 
    "parent": "1" 
  }
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/4?routing=1&refresh&pretty" -H 'Content-Type: application/json' -d'
{
  "my_id": "4",
  "text": "This is another answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}
'
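The reason for routing=1 in these requests can be sketched as follows: a document's shard is derived from a hash of its routing value, which defaults to the document ID. The Python model below uses crc32 purely as a stand-in hash (it is not the hash Elasticsearch uses), and the shard count and the function `shard_for` are assumptions made for this example.

```python
import zlib

NUMBER_OF_SHARDS = 5  # hypothetical shard count


def shard_for(doc_id, routing=None):
    """Pick a shard from the routing value, falling back to the document ID."""
    key = routing if routing is not None else doc_id
    # crc32 is an illustrative stand-in, NOT Elasticsearch's real hash function
    return zlib.crc32(key.encode("utf-8")) % NUMBER_OF_SHARDS


parent_shard = shard_for("1")  # parent document 1, routed by its own ID
child_shards = [shard_for("3", routing="1"), shard_for("4", routing="1")]
print(parent_shard, child_shards)
```

Whatever hash is used, giving the children the parent's ID as the routing value is what guarantees that they land on the parent's shard.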

`keyword`

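keyword: used for structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags.
constant_keyword: used for keyword fields that always contain the same value.
wildcard: used for unstructured machine-generated content. The wildcard type is optimized for fields with large values or high cardinality.

Keyword fields are often used in sorting, aggregations, and term-level queries, such as term.

`nested` (nested objects)

The nested type is a specialized version of the object data type that allows arrays of objects to be indexed in a way that they can be queried independently of each other.

How arrays of objects are flattened

Elasticsearch has no concept of inner objects. Therefore, it flattens object hierarchies into a simple list of field names and values. For example, consider the following document: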

curl -X PUT "localhost:9200/my-index-000001/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "group" : "fans",
  "user" : [ 
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}
'

The previous document would be transformed internally into a document that looks more like this:

{
  "group" :        "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" :  [ "smith", "white" ]
}

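The user.first and user.last fields are flattened into multi-value fields, and the association between alice and white is lost. This document would incorrectly match a query for alice AND smith: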

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.first": "Alice" }},
        { "match": { "user.last":  "Smith" }}
      ]
    }
  }
}
'
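The loss of association is easy to reproduce outside Elasticsearch. The Python sketch below flattens the user array into multi-value fields and shows the false match; lowercasing stands in for text analysis, and `flatten_object_array` is a name invented for this example.

```python
def flatten_object_array(field, objects):
    """Collapse an array of objects into one multi-value list per sub-field."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            # Lowercasing is a rough stand-in for what an analyzer would do
            flat.setdefault(f"{field}.{key}", []).append(value.lower())
    return flat


users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]
flat = flatten_object_array("user", users)

# Each condition is checked against a whole list, so the pairing is gone
false_match = "alice" in flat["user.first"] and "smith" in flat["user.last"]
print(flat)
print(false_match)
```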

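Using `nested` fields for arrays of objects

If you need to index arrays of objects and maintain the independence of each object in the array, use the nested data type instead of the object data type.

Internally, nested objects index each object in the array as a separate hidden document, meaning that each nested object can be queried independently of the others with the nested query: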

curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "user": {
        "type": "nested" 
      }
    }
  }
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}
'
curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "Smith" }} 
          ]
        }
      }
    }
  }
}
'
curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "White" }} 
          ]
        }
      },
      "inner_hits": { 
        "highlight": {
          "fields": {
            "user.first": {}
          }
        }
      }
    }
  }
}
'
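In contrast to the flattened form, nested matching requires a single object to satisfy every condition. A minimal Python model of that rule, again using case-insensitive comparison as a stand-in for analysis (the function `nested_match` is hypothetical):

```python
def nested_match(objects, conditions):
    """True if at least one object satisfies all conditions at once."""
    return any(
        all(obj.get(key, "").lower() == wanted.lower() for key, wanted in conditions.items())
        for obj in objects
    )


users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]

alice_smith = nested_match(users, {"first": "Alice", "last": "Smith"})
alice_white = nested_match(users, {"first": "Alice", "last": "White"})
print(alice_smith, alice_white)
```

This corresponds to the two queries above: the first finds nothing because Alice and Smith are not in the same nested object, while the second matches.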

`Number` (numeric)

The following numeric types are supported:

`long`	A signed 64-bit integer with a minimum value of `-2^63` and a maximum value of `2^63-1`.
`integer`	A signed 32-bit integer with a minimum value of `-2^31` and a maximum value of `2^31-1`.
`short`	A signed 16-bit integer with a minimum value of `-32,768` and a maximum value of `32,767`.
`byte`	A signed 8-bit integer with a minimum value of `-128` and a maximum value of `127`.
`double`	A double-precision 64-bit IEEE 754 floating point number, restricted to finite values.
`float`	A single-precision 32-bit IEEE 754 floating point number, restricted to finite values.
`half_float`	A half-precision 16-bit IEEE 754 floating point number, restricted to finite values.
`scaled_float`	A floating point number that is backed by a `long`, scaled by a fixed `double` scaling factor.
`unsigned_long`	An unsigned 64-bit integer with a minimum value of 0 and a maximum value of `2^64-1`.
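When choosing among the integer types, the bounds in the table are what matter. The Python sketch below encodes those bounds and picks the narrowest type that can hold a given value; it is only a helper for reasoning about the table, and `smallest_type` is not an Elasticsearch function.

```python
# Bounds taken from the table above, ordered from narrowest to widest
INTEGER_BOUNDS = {
    "byte": (-2**7, 2**7 - 1),
    "short": (-2**15, 2**15 - 1),
    "integer": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
    "unsigned_long": (0, 2**64 - 1),
}


def smallest_type(value):
    """Return the first (narrowest) integer type whose range contains the value."""
    for name, (low, high) in INTEGER_BOUNDS.items():
        if low <= value <= high:
            return name
    raise ValueError(f"{value} does not fit any integer type")


choices = [smallest_type(v) for v in (127, 128, 32768, 2**31, 2**63)]
print(choices)
```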

`Object` (object)

JSON documents are hierarchical in nature: a document may contain inner objects which, in turn, may contain inner objects themselves:

curl -X PUT "localhost:9200/my-index-000001/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{ 
  "region": "US",
  "manager": { 
    "age":     30,
    "name": { 
      "first": "John",
      "last":  "Smith"
    }
  }
}
'

Internally, this document is indexed as a simple, flat list of key-value pairs, something like this:

{
  "region":             "US",
  "manager.age":        30,
  "manager.name.first": "John",
  "manager.name.last":  "Smith"
}
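The flattening into dotted keys can be expressed as a short recursive function. The Python sketch below reproduces the flat form shown above from the original document; it is a conceptual model, and `flatten` is a hypothetical helper.

```python
def flatten(obj, prefix=""):
    """Turn nested objects into a flat dict keyed by dotted paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into inner objects, extending the dotted path
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


document = {
    "region": "US",
    "manager": {"age": 30, "name": {"first": "John", "last": "Smith"}},
}
flat = flatten(document)
print(flat)
```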

An explicit mapping for the above document could look like this:

curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": { 
      "region": {
        "type": "keyword"
      },
      "manager": { 
        "properties": {
          "age":  { "type": "integer" },
          "name": { 
            "properties": {
              "first": { "type": "text" },
              "last":  { "type": "text" }
            }
          }
        }
      }
    }
  }
}
'

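`Point` (point)

The point data type facilitates the indexing of and searching for arbitrary x, y pairs that fall in a two-dimensional planar coordinate system.

You can query documents of this type using the shape query.

There are four ways that a point may be specified, as demonstrated below: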

curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "location": {
        "type": "point"
      }
    }
  }
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "text": "Point as an object",
  "location": { 
    "x": 41.12,
    "y": -71.34
  }
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
  "text": "Point as a string",
  "location": "41.12,-71.34" 
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/4?pretty" -H 'Content-Type: application/json' -d'
{
  "text": "Point as an array",
  "location": [41.12, -71.34] 
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/5?pretty" -H 'Content-Type: application/json' -d'
{
  "text": "Point as a WKT POINT primitive",
  "location" : "POINT (41.12 -71.34)" 
}
'

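`range` (range)

Range field types represent a continuous range of values between an upper and a lower bound. For example, a range can represent any date in October or any integer from 0 to 9. They are defined using the operators gt or gte for the lower bound, and lt or lte for the upper bound.

They can be used for querying, and have limited support for aggregations. The only supported aggregations are histogram and cardinality.

The following range types are supported: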

`integer_range`	A range of signed 32-bit integers with a minimum value of `-2^31` and maximum of `2^31-1`.
`float_range`	A range of single-precision 32-bit IEEE 754 floating point values.
`long_range`	A range of signed 64-bit integers with a minimum value of `-2^63` and maximum of `2^63-1`.
`double_range`	A range of double-precision 64-bit IEEE 754 floating point values.
`date_range`	A range of `date` values. Date ranges support various date formats through the `format` mapping parameter. Regardless of the format used, date values are parsed into an unsigned 64-bit integer representing milliseconds since the Unix epoch in UTC. Values containing the `now` date math expression are not supported.
`ip_range`	A range of ip values supporting either IPv4 or IPv6 (or mixed) addresses.

Below is an example of configuring a mapping with various range fields, followed by an example that indexes several range types.

curl -X PUT "localhost:9200/range_index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "properties": {
      "expected_attendees": {
        "type": "integer_range"
      },
      "time_frame": {
        "type": "date_range", 
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
'
curl -X PUT "localhost:9200/range_index/_doc/1?refresh&pretty" -H 'Content-Type: application/json' -d'
{
  "expected_attendees" : { 
    "gte" : 10,
    "lt" : 20
  },
  "time_frame" : {
    "gte" : "2015-10-31 12:00:00", 
    "lte" : "2015-11-01"
  }
}
'

The following is an example of a term query on the integer_range field named "expected_attendees". 12 is a value inside the range, so it matches.

curl -X GET "localhost:9200/range_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query" : {
    "term" : {
      "expected_attendees" : {
        "value": 12
      }
    }
  }
}
'
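Whether a value falls inside a stored range depends on which bound operators were used. The Python sketch below checks a value against a range written with gt, gte, lt, and lte, as an illustration of why 12 matches the document above; `range_contains` is a made-up helper.

```python
def range_contains(bounds, value):
    """Check a value against a range given as {gt|gte|lt|lte: bound}."""
    checks = {
        "gt": lambda bound: value > bound,
        "gte": lambda bound: value >= bound,
        "lt": lambda bound: value < bound,
        "lte": lambda bound: value <= bound,
    }
    return all(checks[op](bound) for op, bound in bounds.items())


expected_attendees = {"gte": 10, "lt": 20}  # the range stored in document 1
results = [range_contains(expected_attendees, v) for v in (9, 10, 12, 20)]
print(results)
```

Note that 10 is included because the lower bound uses gte, while 20 is excluded because the upper bound uses lt.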

The result produced by the above query:

{
  "took": 13,
  "timed_out": false,
  "_shards" : {
    "total": 2,
    "successful": 2,
    "skipped" : 0,
    "failed": 0
  },
  "hits" : {
    "total" : {
        "value": 1,
        "relation": "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "range_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "expected_attendees" : {
            "gte" : 10, "lt" : 20
          },
          "time_frame" : {
            "gte" : "2015-10-31 12:00:00", "lte" : "2015-11-01"
          }
        }
      }
    ]
  }
}

The following is an example of a range query on the date_range field named "time_frame".

curl -X GET "localhost:9200/range_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query" : {
    "range" : {
      "time_frame" : { 
        "gte" : "2015-10-31",
        "lte" : "2015-11-01",
        "relation" : "within" 
      }
    }
  }
}
'
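The relation parameter decides how the stored range is compared with the query range. Besides within, range queries on range fields also accept contains and intersects (the default). The Python sketch below models the three relations on closed integer intervals; the numbers are hypothetical stand-ins for timestamps, and `relation_matches` is not an Elasticsearch API.

```python
def relation_matches(field, query, relation="intersects"):
    """Compare a stored (low, high) range with a query (low, high) range."""
    f_low, f_high = field
    q_low, q_high = query
    if relation == "within":      # stored range lies entirely inside the query range
        return q_low <= f_low and f_high <= q_high
    if relation == "contains":    # stored range entirely covers the query range
        return f_low <= q_low and q_high <= f_high
    if relation == "intersects":  # the two ranges overlap somewhere
        return f_low <= q_high and q_low <= f_high
    raise ValueError(f"unknown relation: {relation}")


stored = (12, 48)  # hypothetical stored range
outcomes = {
    "within": relation_matches(stored, (0, 48), "within"),
    "contains": relation_matches(stored, (0, 48), "contains"),
    "intersects": relation_matches(stored, (40, 60), "intersects"),
}
print(outcomes)
```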

This query produces a similar result:

{
  "took": 13,
  "timed_out": false,
  "_shards" : {
    "total": 2,
    "successful": 2,
    "skipped" : 0,
    "failed": 0
  },
  "hits" : {
    "total" : {
        "value": 1,
        "relation": "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "range_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "expected_attendees" : {
            "gte" : 10, "lt" : 20
          },
          "time_frame" : {
            "gte" : "2015-10-31 12:00:00", "lte" : "2015-11-01"
          }
        }
      }
    ]
  }
}

In addition to the range format above, IP ranges can be provided in CIDR notation:

curl -X PUT "localhost:9200/range_index/_mapping?pretty" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "ip_allowlist": {
      "type": "ip_range"
    }
  }
}
'
curl -X PUT "localhost:9200/range_index/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
  "ip_allowlist" : "192.168.0.0/16"
}
'

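`shape` (shape)

The shape data type facilitates the indexing of and searching with arbitrary x, y Cartesian shapes such as rectangles and polygons. It can be used to index and query geometries whose coordinates fall in a two-dimensional planar coordinate system.

You can query documents of this type using the shape query.

`text` (text)

A field to index full-text values, such as the body of an email or the description of a product. These fields are analyzed, that is, they are passed through an analyzer to convert the string into a list of individual terms before being indexed. The analysis process allows Elasticsearch to search for individual words within each full-text field. Text fields are not used for sorting and are seldom used for aggregations (although the significant text aggregation is a notable exception).

Text fields are best suited for unstructured but human-readable content. If you need to index unstructured machine-generated content, see Mapping unstructured content. If you need to index structured content such as email addresses, hostnames, status codes, or tags, you should probably use a keyword field instead.

Below is an example of a mapping for a text field: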

curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "full_name": {
        "type":  "text"
      }
    }
  }
}
'

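Use a field as both `text` and `keyword`

Sometimes it is useful to have both a full-text (text) and a keyword (keyword) version of the same field: one for full-text search and the other for aggregations and sorting. This can be achieved with multi-fields.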
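As a minimal sketch of what such a multi-field mapping could look like, the Python snippet below builds the request body as a dictionary and prints it as JSON. The field name city and the sub-field name raw are illustrative choices, not taken from this article; the fields parameter itself is the standard way to declare multi-fields.

```python
import json

# Hypothetical mapping: "city" is analyzed for full-text search, while the
# "city.raw" sub-field keeps the exact value for sorting and aggregations.
mapping = {
    "mappings": {
        "properties": {
            "city": {
                "type": "text",
                "fields": {
                    "raw": {"type": "keyword"},
                },
            }
        }
    }
}

body = json.dumps(mapping, indent=2)
print(body)
```

With a mapping like this, a match query would target city, while a sort or terms aggregation would target city.raw.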

`token count`

A field of type token_count is really an integer field that accepts string values, analyzes them, and then indexes the number of tokens in the string.

For example:

curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name": { 
        "type": "text",
        "fields": {
          "length": { 
            "type":     "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{ "name": "John Smith" }
'
curl -X PUT "localhost:9200/my-index-000001/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{ "name": "Rachel Alice Williams" }
'
curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "name.length": 3 
    }
  }
}
'
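To see why this term query matches only the second document, the Python sketch below counts tokens with a simple word regular expression. This is a rough stand-in for the standard analyzer, which behaves the same on these simple names but differs on more complex text; `token_count` here is a hypothetical function.

```python
import re


def token_count(text):
    """Count word-like tokens; a rough approximation of the standard analyzer."""
    return len(re.findall(r"\w+", text))


docs = {"1": "John Smith", "2": "Rachel Alice Williams"}
counts = {doc_id: token_count(name) for doc_id, name in docs.items()}

# Equivalent of the term query on name.length with the value 3
matching_ids = [doc_id for doc_id, count in counts.items() if count == 3]
print(counts)
print(matching_ids)
```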
