ELK Study Notes (Part 6)

mengyeweiwu

Tags: elasticsearch

Article link: https://blog.csdn.net/mengyeweiwu/article/details/111019324

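1. The copy_to parameter in ES
2. Creating mappings for nested data
3. Configuring ES shards
4. Other match query options
5. Installing and introducing the IK analyzer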

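I. The copy_to parameter

The copy_to parameter copies the value of one field into another field.
1. Creating a mapping with copy_to (the parameter is passed when the field types are defined in the mapping)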

PUT s7
{
  "mappings": {
    "properties": {
      "t1":{
        "type": "text",
        "copy_to": "t3"
      },
      "t2":{
        "type": "text",
        "copy_to":"t3"
      },
      "t3":{
        "type": "text"
      }
    }
  }
}

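2. Verifying the copy_to parameter
Even though t3 is never assigned a value directly, it can still be used as the query field, because the values of t1 and t2 are copied into it when the document is indexed.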

PUT s7/_doc/1
{
  "t1":"soul",
  "t2":"madam"
}

GET s7/_search
{
  "query": {
    "match": {
      "t3": "soul"
    }
  }
}
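
To make the behavior above concrete, here is a minimal Python sketch of the idea behind copy_to. It is a toy model and not Elasticsearch code: the function name indexed_values is made up for illustration, and real ES also analyzes the text before indexing it.

```python
# Toy model of copy_to: not Elasticsearch code, only the idea.
MAPPING = {
    "t1": {"type": "text", "copy_to": "t3"},
    "t2": {"type": "text", "copy_to": "t3"},
    "t3": {"type": "text"},
}


def indexed_values(source, mapping):
    """Return the values that would be indexed under each field for one document."""
    indexed = {}
    for field, value in source.items():
        indexed.setdefault(field, []).append(value)
        target = mapping.get(field, {}).get("copy_to")
        if target is not None:
            # The value is also indexed under the target field;
            # the submitted document itself is left untouched.
            indexed.setdefault(target, []).append(value)
    return indexed


doc = {"t1": "soul", "t2": "madam"}
fields = indexed_values(doc, MAPPING)
print(fields["t3"])  # values that become searchable through t3
print("t3" in doc)   # whether the original document gained a t3 key
```

The sketch shows why the search on t3 finds document 1: the value "soul" is searchable through t3 even though the document that was sent contains only t1 and t2.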


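II. Nested data types

How do you create a mapping when a document contains objects inside objects, for example an info object that holds addr plus a phone object of its own? Each inner object declares its fields under its own properties key. Note that this creates ordinary object fields, not the separate nested field type that ES also provides.
Create the index with the nested mapping: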

PUT s12
{
  "mappings": {
    "properties": {
      "name":{
        "type": "text"
      },
      "age":{
        "type": "long"
      },
      "info":{
        "properties": {
          "addr":{
            "type":"text"
          },
          "phone":{
            "properties":{
              "iphone":{
                "type":"text"
              }
            }
          }
        }
      }
    }
  }
}
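
Fields inside objects are addressed by their dotted path, such as info.phone.iphone. The following Python sketch flattens a document into such paths. The document values are invented for illustration (the original sample data is not available), and flatten is a hypothetical helper, not an ES API.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted field paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Descend into the inner object and extend the path.
            flat.update(flatten(value, prefix=path + "."))
        else:
            flat[path] = value
    return flat


# Hypothetical document that fits the s12 mapping (values are made up).
doc = {
    "name": "tom",
    "age": 18,
    "info": {"addr": "beijing", "phone": {"iphone": "10010"}},
}
flat = flatten(doc)
for path in sorted(flat):
    print(path, "=", flat[path])
```

A match query against the inner field would then use the full path, for example "info.phone.iphone", as the field name.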

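III. ES shard settings

1. What is a shard
Put simply, a shard is a block of files that holds part of an index's data, and it is the smallest unit in which ES stores and distributes that data.
When there are a large number of documents, a single node may not be enough because of memory limits, insufficient disk throughput, or an inability to respond to client requests quickly enough. In that case the data can be split into smaller shards, and the shards can be placed on different servers.
When the index you query is spread across multiple shards, ES sends the query to every relevant shard and combines the results, so the application does not need to know that the shards exist.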

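2. What is a replica
A replica is a copy of a primary shard. When a primary shard is lost, for example because the node holding it becomes unavailable, the cluster promotes a replica to be the new primary shard. Replicas therefore act as a safety measure.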

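3. Setting shards and replicas in ES
number_of_shards is the number of primary shards the index is split into. It can only be specified when the index is created and cannot be changed on that index afterwards.
number_of_replicas is the number of replicas kept for each primary shard. It can be changed dynamically later.

## Set multiple shards and replicas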
PUT s13
{
  "mappings": {
    "properties": {
      "t1":{
        "type": "text"
      }
    }
  },
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 5
  }
}
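
Why is number_of_shards fixed at creation time? ES decides where a document lives by hashing its routing value (the document _id by default) and taking the result modulo the number of primary shards. The Python sketch below illustrates that idea under a loud assumption: zlib.crc32 stands in for the hash ES really uses, and pick_shard is a hypothetical helper, so the shard numbers will not match a real cluster.

```python
import zlib


def pick_shard(routing, number_of_shards):
    """Map a routing value (the document _id by default) to a shard number."""
    # Assumption: crc32 is only a stand-in for the hash Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


ids = [str(i) for i in range(1, 11)]
with_5 = {doc_id: pick_shard(doc_id, 5) for doc_id in ids}
with_6 = {doc_id: pick_shard(doc_id, 6) for doc_id in ids}
moved = [doc_id for doc_id in ids if with_5[doc_id] != with_6[doc_id]]
print(with_5)
print("ids that would land on a different shard with 6 shards:", moved)
```

If the shard count changed, the same _id could map to a different shard and existing documents could no longer be found where the formula points. Replicas are only copies of existing shards, so their number does not affect routing and can be changed freely.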


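IV. Other match query options

1. match_phrase: phrase queries
If you use match to search for a Chinese word, ES by default splits the word into single characters and queries each character separately. For example, a match query for 中国 (China) returns every document that contains either of those two characters.

## match_phrase phrase query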
PUT t1/_doc/1
{
  "title":"中国是我的国家"
}
PUT t1/_doc/2
{
  "title":"美国军事力量很强"
}
PUT t1/_doc/3
{
  "title":"中间有很多人"
}
GET t1/_search
{
  "query": {
    "match": {
      "title": "中国"
    }
  }
}

With match_phrase, only documents that contain the whole phrase are returned:

GET t1/_search
{
  "query": {
    "match_phrase": {
      "title": "中国"
    }
  }
}

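2. match_phrase also accepts a slop parameter. When the terms of the phrase are not adjacent in the document, a plain match_phrase query does not match, but slop specifies how large a gap between the terms is allowed, and ES tolerates the other tokens that fall inside that gap.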

GET t1/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "中国我国家",
        "slop": 2
      }
    }
  }
}
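
The effect of slop can be sketched in plain Python. This is a simplified model of sloppy phrase matching, not the actual Lucene implementation: it assumes one token per character (which is how the default analyzer treats Chinese text), ignores scoring, and phrase_match is a hypothetical helper. It measures how far each query term sits from the position an exact phrase would give it and accepts the match when the spread of those shifts is at most slop.

```python
from itertools import product


def tokens(text):
    """One token per character, mimicking the default analyzer on Chinese text."""
    return [ch for ch in text if not ch.isspace()]


def phrase_match(doc_text, query_text, slop=0):
    """Simplified sloppy-phrase check based on token positions."""
    doc, query = tokens(doc_text), tokens(query_text)
    # For every query term, list the positions where it occurs in the document.
    candidates = [[i for i, tok in enumerate(doc) if tok == term] for term in query]
    if not query or any(not positions for positions in candidates):
        return False
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # each query term needs its own document position
        # How far each term is from where an exact phrase would place it.
        shifts = [doc_pos - q_pos for q_pos, doc_pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


title = "中国是我的国家"
print(phrase_match(title, "中国", slop=0))
print(phrase_match(title, "中国我国家", slop=0))
print(phrase_match(title, "中国我国家", slop=2))
```

In this model the query 中国我国家 fails against 中国是我的国家 without slop, because 是 and 的 sit between the query terms, and succeeds once slop is 2.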

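3. Left-prefix queries
Often you only remember the first part of a word when searching. For example, you want to find apple but only remember the three letters app. For this case ES provides the match_phrase_prefix query: you supply only the leading part of the term, and ES matches documents containing terms that start with it.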

GET t2/_search
{
  "query": {
    "match_phrase_prefix": {
      "title": "app"
    }
  }
}
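
A minimal Python sketch of the matching rule follows, assuming whitespace tokenization and lowercasing. The titles are hypothetical (the post does not show what index t2 contains), and phrase_prefix_match is an illustrative helper rather than an ES API. All query terms except the last must match exactly and in order, and the last term only has to be a prefix of the next token.

```python
def phrase_prefix_match(doc_text, query_text):
    """True if the query terms occur consecutively, the last one as a prefix."""
    doc = doc_text.lower().split()
    query = query_text.lower().split()
    if not query:
        return False
    *exact, prefix = query
    for start in range(len(doc) - len(query) + 1):
        window = doc[start:start + len(query)]
        # Leading terms must match exactly; the final term only by prefix.
        if window[:-1] == exact and window[-1].startswith(prefix):
            return True
    return False


# Hypothetical titles; the original post does not show the contents of t2.
titles = ["apple pie", "maple syrup", "mobile application"]
hits = [title for title in titles if phrase_prefix_match(title, "app")]
print(hits)
```

In real ES the number of terms the prefix may expand to is capped by the max_expansions option of match_phrase_prefix, which this sketch does not model.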

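
4. Multi-field queries
multi_match is a multi-field query that can also do the work of match_phrase and match_phrase_prefix.
When the same text should be searched in several fields of a document, use a multi-field query. It also accepts a type parameter, which can be set to phrase or phrase_prefix to get the behavior of match_phrase or match_phrase_prefix.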

GET t2/_search
{
  "query": {
    "multi_match": {
      "query": "app",
      "fields": ["title", "title1"],
      "type": "phrase_prefix"
    }
  }
}

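V. Installing and introducing the IK analyzer

When we analyze Chinese text in Kibana with the built-in analyzer, the text is split one character at a time, which is clearly not a sensible way to segment Chinese.
For example, analyzing a sentence: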

GET _analyze
{
  "analyzer": "chinese",
  "text":"上海自来水来自中国"
}

The result is:

{
  "tokens" : [
    {
      "token" : "上",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "海",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "自",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "来",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "水",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "来",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    },
    {
      "token" : "自",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<IDEOGRAPHIC>",
      "position" : 6
    },
    {
      "token" : "中",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "<IDEOGRAPHIC>",
      "position" : 7
    },
    {
      "token" : "国",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "<IDEOGRAPHIC>",
      "position" : 8
    }
  ]
}

So we need to install a Chinese analyzer to solve this problem.

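2. Installation steps

Installing the IK analyzer
https://github.com/medcl/elasticsearch-analysis-ik/releases
Download the release that matches your ES version.

First download the IK analyzer zip package for your ES version and upload it to the ES server. The ES installation directory contains a plugins directory; create a directory named ik inside it.
Then unzip the package and copy its contents into the ik directory.
Copy the ik directory to the other ES nodes.
Restart all ES nodes.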

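3. Verifying the IK analyzer
IK provides two analyzers, ik_smart and ik_max_word.
ik_smart produces the coarsest segmentation (the fewest tokens), while ik_max_word produces the finest-grained segmentation.
Let's try each of them.

3.1 ik_smart: coarsest segmentation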

GET _analyze
{
  "analyzer": "ik_smart",
  "text":"上海自来水来自中国"
}

The result is:

{
  "tokens" : [
    {
      "token" : "上海",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "自来水",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "来自",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "中国",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}
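
To give a feel for dictionary-based segmentation, here is a short Python sketch of forward maximum matching, a classic textbook technique. It is not IK's actual algorithm, and the dictionary is a tiny hand-made assumption that only covers this sentence. Even so, it shows how a word list turns the character stream into word tokens.

```python
def forward_max_match(text, dictionary):
    """Greedy left-to-right segmentation that prefers the longest dictionary word."""
    max_len = max(len(word) for word in dictionary)
    result, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            # Fall back to a single character when no dictionary word fits.
            if length == 1 or piece in dictionary:
                result.append(piece)
                i += length
                break
    return result


# Tiny hand-made dictionary, just enough for the example sentence.
WORDS = {"上海", "自来水", "自来", "来自", "中国"}
segments = forward_max_match("上海自来水来自中国", WORDS)
print(segments)
```

With this toy dictionary the sketch produces the same four words as the ik_smart result above. A finer-grained analyzer such as ik_max_word would additionally emit shorter words that overlap the longer ones, such as 自来.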

3.2 ik_max_word: finest-grained segmentation

GET _analyze
{
  "analyzer": "ik_max_word",
  "text":"上海自来水来自中国"
}

The result is:

{
  "tokens" : [
    {
      "token" : "上海",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "自来水",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "自来",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "水",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "来自",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "中国",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 5
    }
  ]
}