Learn Elasticsearch with 新栋BOOK (Part 2): Field Attributes

Author: 新栋BOOK

Article link: https://blog.csdn.net/wangxindong11/article/details/53899902

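To talk about Elasticsearch field attributes, let's start from the mapping below. Ever since JSON came along, countless books and blog posts have used blog posts and users as their sample data, and we'll follow the crowd this once: posts and users again.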


{
  "mappings":{
    "post":{
      "properties":{
        "id":{"type":"long","store":"yes"},
        "name":{"type":"string","store":"yes","index":"not_analyzed"},
        "published":{"type":"date","store":"yes"},
        "contents":{"type":"string","store":"no","index":"analyzed"}
      }
    },
    "user":{
      "properties":{
        "id":{"type":"long","store":"yes"},
        "name":{"type":"string","store":"yes","index":"not_analyzed"},
        "age":{"type":"integer","store":"no"}
      }
    }
  }
}

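I. Core types

The id, name, published and contents entries under properties are all fields. Like the columns of a traditional SQL table, each field has a type. Elasticsearch has the following five core data types:

1. string: character strings (since Elasticsearch 5.0 this type has been split into text, for analyzed full-text values, and keyword, for exact values; the token_count example below already uses text)

2. number: numbers

The numeric types are byte, short, integer, long, float and double.

3. date: dates

4. boolean: true/false values

5. binary: binary data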
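Choosing among the integer types comes down to value range: byte, short, integer and long are signed 8-, 16-, 32- and 64-bit integers, and the number-type documentation listed in the references advises picking the smallest type that is enough for your data. The Python sketch below applies that rule. The helper `smallest_integer_type` is hypothetical and belongs to no Elasticsearch client; in Elasticsearch you still declare the type yourself in the mapping, as the sample mapping does with integer for age and long for id.

```python
# Signed ranges of the Elasticsearch integer types
# (byte/short/integer/long = 8/16/32/64-bit signed integers).
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(values):
    """Hypothetical helper: return the narrowest type whose range covers every value."""
    lo, hi = min(values), max(values)
    for name, type_min, type_max in INTEGER_TYPES:
        if type_min <= lo and hi <= type_max:
            return name
    raise ValueError("values do not fit in a 64-bit long")


print(smallest_integer_type([18, 35, 70]))     # byte (e.g. ages)
print(smallest_integer_type([200]))            # short
print(smallest_integer_type([1482921600000]))  # long (e.g. epoch milliseconds)
```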

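II. Non-core types

Besides these core types, there are several other types, for example:

1. The ip type. It is quite useful, since it saves you from working out how to search IP data yourself: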

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "ip_addr": {
          "type": "ip"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "ip_addr": "192.168.1.1"
}

GET my_index/_search
{
  "query": {
    "term": {
      "ip_addr": "192.168.0.0/16"
    }
  }
}
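The term query above passes a CIDR range rather than a single address, and it matches document 1 because 192.168.1.1 falls inside 192.168.0.0/16. The Python sketch below models only those matching semantics using the standard-library ipaddress module. It is not how Elasticsearch stores or searches ip fields internally, and the second document (10.0.0.7) and the helper `term_ip` are hypothetical.

```python
import ipaddress

# Toy "index": document id -> value of the ip_addr field.
# Document "1" mirrors the PUT above; document "2" is made up for contrast.
docs = {"1": "192.168.1.1", "2": "10.0.0.7"}


def term_ip(docs, query):
    """Return ids whose IP equals `query`, or lies inside it if `query` is a CIDR range."""
    # A bare address such as "10.0.0.7" becomes a single-address /32 network.
    network = ipaddress.ip_network(query, strict=False)
    return sorted(
        doc_id for doc_id, ip in docs.items()
        if ipaddress.ip_address(ip) in network
    )


print(term_ip(docs, "192.168.0.0/16"))  # ['1']
print(term_ip(docs, "10.0.0.7"))        # ['2']
```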

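2. The token_count type. It indexes the number of tokens that the given analyzer produces for a string value, and is usually added as a sub-field through multi-fields: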

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": { 
          "type": "text",
          "fields": {
            "length": { 
              "type":     "token_count",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

PUT my_index/my_type/1
{ "name": "John Smith" }

PUT my_index/my_type/2
{ "name": "Rachel Alice Williams" }

GET my_index/_search
{
  "query": {
    "term": {
      "name.length": 3 
    }
  }
}

This query returns only Rachel Alice Williams, because that is the only document whose name consists of 3 tokens.
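The counting behind name.length can be approximated in a few lines of Python. This is a toy model under one stated assumption: the real standard analyzer follows Unicode word-boundary rules, while the sketch simply counts runs of word characters, which gives the same counts for simple names like these. The helpers `token_count` and `term_length` are hypothetical.

```python
import re

# The two documents indexed above: document id -> name.
docs = {"1": "John Smith", "2": "Rachel Alice Williams"}


def token_count(text):
    # Rough stand-in for the standard analyzer (assumption): count word-character runs.
    return len(re.findall(r"\w+", text))


def term_length(docs, n):
    """Mimic a term query on name.length: ids whose name has exactly n tokens."""
    return sorted(doc_id for doc_id, name in docs.items() if token_count(name) == n)


print(term_length(docs, 3))  # ['2']
print(term_length(docs, 2))  # ['1']
```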

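III. Common attributes

Besides data types, Elasticsearch fields also have attributes. In "id":{"type":"long","store":"yes"}, store is an attribute; in "name":{"type":"string","store":"yes","index":"not_analyzed"}, index is an attribute too.

We will focus on two of these attributes, index and store. They are common attributes in Elasticsearch, meaning that fields of every type have them.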

1. index: the default for string fields is analyzed

"name":{"type":"string","store":"yes","index":"analyzed"}

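The field value is run through an analyzer (for example, split into words) before it is indexed, so it can be matched by full-text search. This is the normal usage.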

"name":{"type":"string","store":"yes","index":"not_analyzed"}

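The field is indexed without analysis, so a search must match the whole value exactly: if name = "wangxiaoming", only a search for wangxiaoming will match it. The not_analyzed setting applies only to fields whose type is string.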

"name":{"type":"string","store":"yes","index":"no"}

In this case the field cannot be searched at all (its value is still kept in _source).
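The difference between the three index settings is easiest to see in a toy inverted index. In the Python sketch below, `build_index` and `lookup` are hypothetical helpers, the analyzer is a simplified stand-in (lowercase, then split into words) rather than Elasticsearch's real one, and `lookup` does an exact term lookup, roughly like a term query. The second sample value, "Wang Xiaoming", is made up to show what analysis changes.

```python
import re
from collections import defaultdict

# Hypothetical sample data, reusing the article's "wangxiaoming" example.
docs = {"1": "wangxiaoming", "2": "Wang Xiaoming"}


def analyze(text):
    # Simplified analyzer (assumption): lowercase, then split into word tokens.
    return re.findall(r"\w+", text.lower())


def build_index(docs, mode):
    """Build a toy inverted index (term -> set of doc ids) for one field.

    mode mimics the index attribute: "analyzed", "not_analyzed" or "no".
    """
    index = defaultdict(set)
    if mode == "no":
        return dict(index)  # the field never enters the index
    for doc_id, value in docs.items():
        # analyzed: index each token; not_analyzed: index the whole value as one term
        terms = analyze(value) if mode == "analyzed" else [value]
        for term in terms:
            index[term].add(doc_id)
    return dict(index)


def lookup(index, term):
    """Exact term lookup against the inverted index (like a term query)."""
    return sorted(index.get(term, set()))


for mode in ("analyzed", "not_analyzed", "no"):
    index = build_index(docs, mode)
    print(mode, lookup(index, "xiaoming"), lookup(index, "wangxiaoming"))
# analyzed ['2'] ['1']
# not_analyzed [] ['1']
# no [] []
```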

2. store: the default is no

"age":{"type":"integer","store":"no"}

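no means the field is not stored separately. That does not mean it can't be found when you search; it only means its value is retrieved from the _source field.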

"age":{"type":"integer","store":"yes"}

yes means the field is stored separately, in addition to _source.

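A word about _source: it is the original source document. When you index data, Elasticsearch keeps a copy of the source document in _source. If a field has store set to yes (the default is no), that field is additionally stored on its own, outside _source.

This mainly pays off for documents with very large fields. Because _source keeps all fields together as one document, returning any value from it after a hit means reading the whole thing, and a field with a lot of content eats bandwidth and hurts performance. Often we don't need the full content returned anyway (perhaps only a summary), so there is no need to pull it back along with _source. (You can, of course, also specify which fields to return at query time.)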
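The cost argument behind store can be illustrated with a small model. The `ToyStore` class below is hypothetical and ignores how Lucene actually lays out and compresses stored data; it only counts how many characters must be read to return one field. With name stored separately, fetching it reads just that value, while without it the whole _source, including a large contents field, has to be read and parsed.

```python
import json


class ToyStore:
    """Hypothetical model: a _source blob per document plus separately
    stored copies of the fields mapped with "store": "yes"."""

    def __init__(self, stored_fields):
        self.stored_fields = set(stored_fields)
        self.sources = {}  # doc id -> JSON text of the whole document (_source)
        self.stored = {}   # (doc id, field) -> value of a stored field

    def index(self, doc_id, doc):
        self.sources[doc_id] = json.dumps(doc)
        for field in self.stored_fields.intersection(doc):
            self.stored[(doc_id, field)] = doc[field]

    def get_field(self, doc_id, field):
        """Return (value, characters read) for a single field."""
        if (doc_id, field) in self.stored:
            value = self.stored[(doc_id, field)]
            return value, len(json.dumps(value))
        # Not stored: read and parse the whole _source, then pick the field.
        source = self.sources[doc_id]
        return json.loads(source)[field], len(source)


post = {"name": "wangxiaoming", "contents": "x" * 10_000}  # hypothetical large post

with_store = ToyStore(stored_fields=["name"])
without_store = ToyStore(stored_fields=[])
for store in (with_store, without_store):
    store.index("1", post)

print(with_store.get_field("1", "name"))                # ('wangxiaoming', 14)
print(without_store.get_field("1", "name")[1] > 10_000)  # True: whole _source read
```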

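The field attributes covered above are the ones most often used in day-to-day development. For other field types and mapping parameters, see the official Elasticsearch documentation.

References: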

https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html