ElasticSearch5.x实践_day05_01_Mapping_Field datatype

一、Field datatype(字段数据类型)

1.1string类型

ELasticsearch 5.X之后的字段类型不再支持string,由text或keyword取代。 如果仍使用string,会给出警告。

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type":  "string"
        }
      }
    }
  }
}
#! Deprecation: The [string] field is deprecated, please use [text] or [keyword] instead on [title]
{
  "acknowledged": true,
  "shards_acknowledged": true
}

1.2 text类型

text取代了string,当一个字段是要被全文搜索的,比如Email内容、产品描述,应该使用text类型。设置text类型以后,字段内容会被分析,在生成倒排索引以前,字符串会被分析器分成一个一个词项。text类型的字段不用于排序,很少用于聚合(termsAggregation除外)。

把full_name字段设为text类型的Mapping如下:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "full_name": {
          "type":  "text"
        }
      }
    }
  }
}

1.3 keyword类型

keyword类型适用于索引结构化的字段,比如email地址、主机名、状态码和标签。如果字段需要进行过滤(比如查找已发布博客中status属性为published的文章)、排序、聚合。keyword类型的字段只能通过精确值搜索到。

1.4 数字类型

对于数字类型,ELasticsearch支持以下几种:

类型取值范围
long-2^63至2^63-1
integer-2^31至2^31-1
short-32,768至32768
byte-128至127
double64位双精度IEEE 754浮点类型
float32位单精度IEEE 754浮点类型
half_float16位半精度IEEE 754浮点类型
scaled_float缩放类型的的浮点数(比如价格只需要精确到分,price为57.34的字段缩放因子为100,存起来就是5734)  scaled缩放 [skeɪld]

对于float、half_float和scaled_float,-0.0和+0.0是不同的值,使用term查询查找-0.0不会匹配+0.0,同样range查询中上边界是-0.0不会匹配+0.0,下边界是+0.0不会匹配-0.0。

对于数字类型的数据,选择以上数据类型的注意事项:

  1. 在满足需求的情况下,尽可能选择范围小的数据类型。比如,某个字段的取值最大值不会超过100,那么选择byte类型即可。迄今为止吉尼斯记录的人类的年龄的最大值为134岁,对于年龄字段,short足矣。字段的长度越短,索引和搜索的效率越高。
  2. 优先考虑使用带缩放因子的浮点类型。

例子:

POST my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_of_bytes": {
          "type": "integer"
        },
        "time_in_seconds": {
          "type": "float"
        },
        "price": {
          "type": "scaled_float",
          "scaling_factor": 100
        }
      }
    }
  }
}

1.5 Object类型

JSON天生具有层级关系,文档会包含嵌套的对象:

POST my_index/my_type/1
{ 
  "region": "US",
  "manager": { 
    "age":     30,
    "name": { 
      "first": "John",
      "last":  "Smith"
    }
  }
}

上面的文档中,整体是一个JSON,JSON中包含一个manager,manager又包含一个name。最终,文档会被索引成一平的key-value对:

{
  "region":             "US",
  "manager.age":        30,
  "manager.name.first": "John",
  "manager.name.last":  "Smith"
}

上面文档结构的Mapping如下:

PUT my_index
{
  "mappings": {
    "my_type": { 
      "properties": {
        "region": {
          "type": "keyword"
        },
        "manager": { 
          "properties": {
            "age":  { "type": "integer" },
            "name": { 
              "properties": {
                "first": { "type": "text" },
                "last":  { "type": "text" }
              }
            }
          }
        }
      }
    }
  }
}

1.6 date类型

JSON中没有日期类型,所以在ELasticsearch中,日期类型可以是以下几种:

  1. 日期格式的字符串:e.g. “2015-01-01” or “2015/01/01 12:10:30”.
  2. long类型的毫秒数( milliseconds-since-the-epoch)
  3. integer的秒数(seconds-since-the-epoch)

日期格式可以自定义,如果没有自定义,默认格式如下:

"strict_date_optional_time||epoch_millis"
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date" 
        }
      }
    }
  }
}

POST my_index/my_type/1
{ "date": "2015-01-01" } 

POST my_index/my_type/2
{ "date": "2015-01-01T12:10:30Z" } 

POST my_index/my_type/3
{ "date": 1420070400001 } 

POST my_index/_search
{
  "sort": { "date": "asc"} 
}

查看三个日期类型:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "date": "2015-01-01T12:10:30Z"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "date": "2015-01-01"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "3",
        "_score": 1,
        "_source": {
          "date": 1420070400001
        }
      }
    ]
  }
}

排序结果:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": null,
        "_source": {
          "date": "2015-01-01"
        },
        "sort": [
          1420070400000
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "3",
        "_score": null,
        "_source": {
          "date": 1420070400001
        },
        "sort": [
          1420070400001
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": null,
        "_source": {
          "date": "2015-01-01T12:10:30Z"
        },
        "sort": [
          1420114230000
        ]
      }
    ]
  }
}

1.7 Array类型

ELasticsearch没有专用的数组类型,默认情况下任何字段都可以包含一个或者多个值,但是一个数组中的值要是同一种类型。例如:

  1. 字符数组: [ “one”, “two” ]
  2. 整型数组:[1,3]
  3. 嵌套数组:[1,[2,3]],等价于[1,2,3]
  4. 对象数组:[ { “name”: “Mary”, “age”: 12 }, { “name”: “John”, “age”: 10 }]

注意事项:

  • 动态添加数据时,数组的第一个值的类型决定整个数组的类型
  • 混合数组类型是不支持的,比如:[1,”abc”]
  • 数组可以包含null值,空数组[ ]会被当做missing field对待。

1.8 binary类型

binary类型接受base64编码的字符串,默认不存储也不可搜索。

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": {
          "type": "text"
        },
        "blob": {
          "type": "binary"
        }
      }
    }
  }
}

POST my_index/my_type/1
{
  "name": "Some binary blob",
  "blob": "U29tZSBiaW5hcnkgYmxvYg==" 
}

搜索blog字段:

POST my_index/_search
{
  "query": {
    "match": {
      "blob": "test" 
    }
  }
}

返回结果:
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "Binary fields do not support searching",
        "index_uuid": "fgA7UM5XSS-56JO4F4fYug",
        "index": "my_index"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "3dQd1RRVTMiKdTckM68nPQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "Binary fields do not support searching",
          "index_uuid": "fgA7UM5XSS-56JO4F4fYug",
          "index": "my_index"
        }
      }
    ]
  },
  "status": 400
}

Base64加密、解码工具:http://www1.tc711.com/tool/BASE64.htm

1.9 ip类型

ip类型的字段用于存储IPV4或者IPV6的地址。

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "ip_addr": {
          "type": "ip"
        }
      }
    }
  }
}

POST my_index/my_type/1
{
  "ip_addr": "192.168.1.1"
}

POST my_index/_search
{
  "query": {
    "term": {
      "ip_addr": "192.168.0.0"
    }
  }
}

1.10 range类型

range类型支持以下几种:

类型范围
integer_range-2^31至2^31-1
float_range32-bit IEEE 754
long_range-2^63至2^63-1
double_range64-bit IEEE 754
date_range64位整数,毫秒计时

range类型的使用场景:比如前端的时间选择表单、年龄范围选择表单等。
例子:

PUT range_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "expected_attendees": {
          "type": "integer_range"
        },
        "time_frame": {
          "type": "date_range", 
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}

POST range_index/my_type/1
{
  "expected_attendees" : { 
    "gte" : 10,
    "lte" : 20
  },
  "time_frame" : { 
    "gte" : "2015-10-31 12:00:00", 
    "lte" : "2015-11-01"
  }
}

上面代码创建了一个range_index索引,expected_attendees的人数为10到20,时间是2015-10-31 12:00:00至2015-11-01。

查询:

"query" : {
    "range" : {

      "time_frame" : {
        "gte" : "2015-08-01",
        "lte" : "2015-12-01",
        "relation" : "within"
      }
    }
  }

POST range_index/_search
{
  "query" : {
    "range" : {
      "time_frame" : { 
        "gte" : "2015-08-01",
        "lte" : "2015-12-01",
        "relation" : "within" 
      }
    }
  }
}

查询结果:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "range_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "expected_attendees": {
            "gte": 10,
            "lte": 20
          },
          "time_frame": {
            "gte": "2015-10-31 12:00:00",
            "lte": "2015-11-01"
          }
        }
      }
    ]
  }
}

1.11 nested类型

nested嵌套类型是object中的一个特例,可以让array类型的Object独立索引和查询。 使用Object类型有时会出现问题,比如文档 my_index/my_type/1的结构如下:

POST my_index/my_type/1
{
  "group" : "fans",
  "user" : [ 
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

user字段会被动态添加为Object类型。
最后会被转换为以下平整的形式:

{
  "group" :        "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" :  [ "smith", "white" ]
}

user.first和user.last会被平铺为多值字段,Alice和White之间的关联关系会消失。上面的文档会不正确的匹配以下查询(虽然能搜索到,实际上不存在Alice Smith):

POST my_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.first": "Alice" }},
        { "match": { "user.last":  "Smith" }}
      ]
    }
  }
}

使用nested字段类型解决Object类型的不足:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "user": {
          "type": "nested" 
        }
      }
    }
  }
}

POST my_index/my_type/1
{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

POST my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "Smith" }} 
          ]
        }
      }
    }
  }
}

POST my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "White" }} 
          ]
        }
      },
      "inner_hits": { 
        "highlight": {
          "fields": {
            "user.first": {}
          }
        }
      }
    }
  }
}

1.12token_count类型

token_count用于统计词频:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": { 
          "type": "text",
          "fields": {
            "length": { 
              "type":     "token_count",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

POST my_index/my_type/1
{ "name": "John Smith" }

POST my_index/my_type/2
{ "name": "Rachel Alice Williams" }

POST my_index/_search
{
  "query": {
    "term": {
      "name.length": 3 
    }
  }
}

1.13 geo point 类型

地理位置信息类型用于存储地理位置信息的经纬度:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}

POST my_index/my_type/1
{
  "text": "Geo-point as an object",
  "location": { 
    "lat": 41.12,
    "lon": -71.34
  }
}

POST my_index/my_type/2
{
  "text": "Geo-point as a string",
  "location": "41.12,-71.34" 
}

POST my_index/my_type/3
{
  "text": "Geo-point as a geohash",
  "location": "drm3btev3e86" 
}

POST my_index/my_type/4
{
  "text": "Geo-point as an array",
  "location": [ -71.34, 41.12 ] 
}

POST my_index/_search
{
  "query": {
    "geo_bounding_box": { 
      "location": {
        "top_left": {
          "lat": 42,
          "lon": -72
        },
        "bottom_right": {
          "lat": 40,
          "lon": -74
        }
      }
    }
  }
}

 

 

 

 

 

 

 

 

转载于:https://my.oschina.net/LucasZhu/blog/1491253

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值