ELK Advanced Search, Part 5: Index Management

yangyanping20108

Last modified 2022-05-30 22:56:01

Category: Search. Tags: java, backend

First published 2022-04-20 11:38:34

Article link: https://blog.csdn.net/yangyanping20108/article/details/124293552

Creating an index

Index creation syntax

PUT /index
{
    "settings": { 
       "index":{
          "number_of_shards":"3",
          "number_of_replicas":"2"
        }
    },
    "mappings": {
       "dynamic":false,
       "properties" : {
            "field1" : { "type" : "text" },
            "field2" : {"type"  : "integer" }
        }
    },
    "aliases": {
        "otherName": {}
    }
}

Creating a news index

Create an index named article with 3 primary shards, 2 replicas per shard, and the alias news.

PUT  /article
{
    "settings":{
        "number_of_shards":3,
        "number_of_replicas":2
    },
    "mappings":{
        "dynamic":false,
        "properties":{
            "title":{
                "type":"text",
                "analyzer":"ik_smart",
                "search_analyzer":"ik_max_word"
            },
            "content":{
                "type":"text",
                "analyzer":"ik_smart",
                "search_analyzer":"ik_max_word"
            },
            "categoryName":{
                "type":"keyword"
            },
            "view_count":{
              "type": "integer"
            },
            "publishTime":{
                "type":"date",
                "format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
            }
        }
    },
    "aliases":{
        "news":{

        }
    }
}
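
As a quick sanity check on these settings, the number of shard copies the cluster has to host is the number of primaries times one plus the number of replicas. A minimal Python sketch of that arithmetic (the function name is made up for illustration):

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Each primary shard exists once, plus `number_of_replicas` copies of it."""
    return number_of_shards * (1 + number_of_replicas)


# The article index: 3 primaries, each with 2 replicas.
print(total_shard_copies(3, 2))  # 9
```

A replica is never allocated on the same node as its primary or as another copy of the same shard, so this index needs at least three data nodes to be fully allocated. On a single-node test cluster the replicas stay unassigned and the index health is yellow.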

Insert four documents using Kibana

PUT /article/_doc/1
{
    "title":"芯片人才争夺",
    "content":"芯片人才争夺“生猛”,需要大量人才",
    "categoryName":"科技",
    "view_count" : 60,
    "publishTime":"2022-04-19 12:00:00"
}

PUT /article/_doc/2
{
    "title":"2021年我国数字阅读用户规模破5亿 人均电子阅读11.58本",
    "content":"2021年，我国数字阅读用户规模为5.06亿，相比2020年增长了2.49%；人均阅读量电子阅读11.58本，有声阅读7.08本。在首届全民阅读大会数字阅读分论坛暨第八届数字阅读年会上，中国音像与数字出版协会发布发布了《2021年度中国数字阅读报告》，展现了过去一年中国数字阅读行业发展情况与特点",
    "categoryName":"科技",
    "view_count" : 80,
    "publishTime":"2022-04-26 12:00:00"
}

PUT /article/_doc/3
{
    "title":"徙的鸟死于城市灯光，气象雷达如何拯救它们",
    "content":"城市灯光会吸引迁徙的鸟类，并产生致死的后果。发表在4月21日《科学》杂志上的一篇文章，讲述了美国的鸟类学家如何利用气象雷达和建模，降低人造灯光影响下的鸟类死亡率",
    "categoryName":"科技",
    "view_count":70,
    "publishTime":"2022-04-26 12:00:00"
}

PUT /article/_doc/4
{
    "title":"季度动力电池装机量排行榜",
    "content":"在汽车电动化时代，中国领跑全球；而在汽车动力电池领域，中国的宁德时代继续领跑全球",
    "categoryName":"汽车",
    "view_count" : 50,
    "publishTime":"2022-04-26 12:00:00"
}

Basic search syntax

GET /<target>/_search
GET /_search
POST /<target>/_search
POST /_search

Get a document by ID

GET /article/_doc/1

-- Output
{
  "_index" : "article",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "芯片人才争夺",
    "content" : "芯片人才争夺“生猛”,需要大量人才",
    "categoryName" : "科技",
    "view_count" : 60,
    "publishTime" : "2022-04-19 12:00:00"
  }
}

Query through the alias

GET /news/_doc/1

-- Output
{
  "_index" : "article",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "芯片人才争夺",
    "content" : "芯片人才争夺“生猛”,需要大量人才",
    "categoryName" : "科技",
    "view_count" : 60,
    "publishTime" : "2022-04-19 12:00:00"
  }
}

Search all documents without conditions

GET /news/_search

{
  "took" : 60,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "title" : "2021年我国数字阅读用户规模破5亿 人均电子阅读11.58本",
          "content" : "2021年，我国数字阅读用户规模为5.06亿，相比2020年增长了2.49%；人均阅读量电子阅读11.58本，有声阅读7.08本。在首届全民阅读大会数字阅读分论坛暨第八届数字阅读年会上，中国音像与数字出版协会发布发布了《2021年度中国数字阅读报告》，展现了过去一年中国数字阅读行业发展情况与特点",
          "categoryName" : "科技",
          "view_count" : 80,
          "publishTime" : "2022-04-26 12:00:00"
        }
      },
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "徙的鸟死于城市灯光，气象雷达如何拯救它们",
          "content" : "城市灯光会吸引迁徙的鸟类，并产生致死的后果。发表在4月21日《科学》杂志上的一篇文章，讲述了美国的鸟类学家如何利用气象雷达和建模，降低人造灯光影响下的鸟类死亡率",
          "categoryName" : "科技",
          "view_count" : 70,
          "publishTime" : "2022-04-26 12:00:00"
        }
      },
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "title" : "季度动力电池装机量排行榜",
          "content" : "在汽车电动化时代，中国领跑全球；而在汽车动力电池领域，中国的宁德时代继续领跑全球",
          "categoryName" : "汽车",
          "view_count" : 50,
          "publishTime" : "2022-04-26 12:00:00"
        }
      },
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "芯片人才争夺",
          "content" : "芯片人才争夺“生猛”,需要大量人才",
          "categoryName" : "科技",
          "view_count" : 60,
          "publishTime" : "2022-04-19 12:00:00"
        }
      }
    ]
  }
}

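Field	Description
took	Time the search took, in milliseconds
timed_out	Whether the search timed out (here it did not)
_shards	The index is split into 3 shards. A search request is sent to every primary shard (or to one of its replica shards), so total and successful are both 3
hits	The search results
hits.total	Number of matching documents
hits.max_score	Highest _score among the matching documents. The score measures how relevant a document is to the search: the better the match, the higher the score
hits.hits	Details of each matching document. The outer hits object carries summary data such as total; the inner hits array carries the per-document data
_index	Index the document belongs to
_type	Type of the document (_doc here; mapping types are deprecated since Elasticsearch 7.0)
_id	ID of the document
_source	The document body, i.e. the stored JSON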

Passing parameters

Parameters go in the URL query string, as in an ordinary HTTP request.

GET /news/_search?q=title:人才&sort=publishTime:desc
{
  "took" : 48,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "title" : "芯片人才争夺",
          "content" : "芯片人才争夺“生猛”,需要大量人才",
          "categoryName" : "科技",
          "view_count" : 60,
          "publishTime" : "2022-04-19 12:00:00"
        },
        "sort" : [
          1650369600000
        ]
      }
    ]
  }
}
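
The `sort` value in this response is the `publishTime` of the document expressed as milliseconds since the Unix epoch. Elasticsearch interprets a date string without a time zone as UTC, so the number can be reproduced with a few lines of Python. This is only a sketch for checking the value; `to_epoch_millis` is a name chosen here, not part of any Elasticsearch client.

```python
from datetime import datetime, timezone


def to_epoch_millis(text: str) -> int:
    """Parse 'yyyy-MM-dd HH:mm:ss' as UTC and return milliseconds since the epoch."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


print(to_epoch_millis("2022-04-19 12:00:00"))  # 1650369600000
```

Because the results are sorted by `publishTime` rather than by relevance, Elasticsearch does not compute scores, which is why `_score` and `max_score` are null.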

Paginated search

GET /news/_search?from=1&size=2
{
  "took" : 45,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "徙的鸟死于城市灯光，气象雷达如何拯救它们",
          "content" : "城市灯光会吸引迁徙的鸟类，并产生致死的后果。发表在4月21日《科学》杂志上的一篇文章，讲述了美国的鸟类学家如何利用气象雷达和建模，降低人造灯光影响下的鸟类死亡率",
          "categoryName" : "科技",
          "view_count" : 70,
          "publishTime" : "2022-04-26 12:00:00"
        }
      },
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "title" : "季度动力电池装机量排行榜",
          "content" : "在汽车电动化时代，中国领跑全球；而在汽车动力电池领域，中国的宁德时代继续领跑全球",
          "categoryName" : "汽车",
          "view_count" : 50,
          "publishTime" : "2022-04-26 12:00:00"
        }
      }
    ]
  }
}
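
`from` is the number of hits to skip and `size` is the number of hits to return, so the request above skips one hit and returns the next two. Applications usually think in page numbers instead; a minimal sketch of the conversion, assuming 1-based pages (`page_params` is a hypothetical helper):

```python
def page_params(page: int, size: int) -> dict:
    """Translate a 1-based page number into Elasticsearch `from`/`size` values."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


print(page_params(1, 2))  # {'from': 0, 'size': 2}
print(page_params(2, 2))  # {'from': 2, 'size': 2}
```

By default `from + size` may not exceed 10000 (the `index.max_result_window` setting), so this style of paging is only suitable for the first pages of a result set.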

Getting started with Query DSL

Basic query keywords

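Keyword	Description
match_all	Matches every document. It is the default when no query is specified
match	Full-text search or exact lookup. On an exact-value field (a number, date, boolean, or keyword string, called a not_analyzed string in older versions) it matches the given value exactly
range	Finds numbers or dates within a given interval: gt (greater than), gte (greater than or equal), lt (less than), lte (less than or equal)
term	Exact-value match
terms	Like term, but accepts several values to match
exists	Finds documents that have a value for the given field
missing	Finds documents that have no value for the given field. This query was removed in Elasticsearch 5.0; use exists inside a bool must_not clause instead
must	bool clause: a document must match these conditions to be included
must_not	bool clause: a document must not match these conditions to be included
should	bool clause: matching any of these statements raises _score and otherwise has no effect. Mainly used to adjust each document's relevance score
filter	bool clause: these statements do not contribute to the score; they only include or exclude documents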

Match all documents

POST localhost:9200/news/_search
{
    "query":{
        "match_all":{

        }
    }
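}

-- Output (this response reports 1 shard and 1 hit and lacks categoryName and view_count, so it appears to have been captured on an earlier version of the index, before the four documents above were inserted)
{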
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1,
        "hits": [
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "title": "芯片人才争夺",
                    "content": "芯片人才争夺“生猛”,需要大量人才",
                    "publishTime": "2022-04-19 12:00:00"
                }
            }
        ]
    }
}

Search with a condition

GET /news/_search
{
  "query": {
    "match": {
      "title": "人才"
    }
  }
}

-- Output
{
  "took" : 30,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "title" : "芯片人才争夺",
          "content" : "芯片人才争夺“生猛”,需要大量人才",
          "categoryName" : "科技",
          "view_count" : 60,
          "publishTime" : "2022-04-19 12:00:00"
        }
      }
    ]
  }
}

Sorted search

GET /news/_search
{
  "query": {
    "match": {
      "title": "人才"
    }
  },
  "sort": [
    {
      "view_count": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 2
}


-- Output
{
  "took" : 45,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "title" : "芯片人才争夺",
          "content" : "芯片人才争夺“生猛”,需要大量人才",
          "categoryName" : "科技",
          "view_count" : 60,
          "publishTime" : "2022-04-19 12:00:00"
        },
        "sort" : [
          60
        ]
      }
    ]
  }
}

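term query (no analysis)

The value is looked up as a single term and is not analyzed. This is the difference from match, whose query text is analyzed before matching. The search below returns no hits because the analyzed title field holds individual tokens rather than the whole string 芯片人才.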

GET /news/_search
{
  "query": {
    "term": {
      "title": {
        "value": "芯片人才"
      }
    }
  }
}

-- Output
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
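
The empty result can be illustrated with a toy model of the inverted index in Python. The token set and the `toy_analyze` helper below are assumptions made for illustration; the tokens Elasticsearch actually stores depend on the IK analyzer and its dictionary.

```python
# Assumed tokens produced at index time for the title "芯片人才争夺".
indexed_terms = {"芯片", "人才", "争夺"}


def toy_analyze(text: str) -> list:
    """Stand-in for the search analyzer: splits the text into two-character chunks."""
    return [text[i:i + 2] for i in range(0, len(text), 2)]


def term_query(indexed: set, value: str) -> bool:
    """term: the value is looked up as one term, with no analysis."""
    return value in indexed


def match_query(indexed: set, text: str) -> bool:
    """match: the text is analyzed first; by default any resulting term may match."""
    return any(token in indexed for token in toy_analyze(text))


print(term_query(indexed_terms, "芯片人才"))   # False: no such single term in the index
print(term_query(indexed_terms, "人才"))       # True: this exact term was indexed
print(match_query(indexed_terms, "芯片人才"))  # True: analyzed into 芯片 and 人才
```

For exact lookups on whole values, term is best used on keyword fields such as categoryName, where the stored term is the original string.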

match_phrase

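// The query text is analyzed into terms (here presumably "芯片" and "人才"); a document matches only if its title contains all of them, adjacent and in the same order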
GET /news/_search
{
  "query": {
    "match_phrase": {
      "title": "芯片人才"
    }
  }
}
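
What separates match_phrase from match is the position check: the analyzed query terms have to appear next to each other and in the same order (the default slop is 0). A small Python sketch of that check, using an assumed tokenization of the title:

```python
def phrase_matches(doc_tokens: list, query_tokens: list) -> bool:
    """True if query_tokens occur in doc_tokens consecutively and in order."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))


title_tokens = ["芯片", "人才", "争夺"]  # assumed tokens for "芯片人才争夺"

print(phrase_matches(title_tokens, ["芯片", "人才"]))  # True: adjacent, same order
print(phrase_matches(title_tokens, ["人才", "芯片"]))  # False: wrong order
print(phrase_matches(title_tokens, ["芯片", "争夺"]))  # False: not adjacent
```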

bool compound queries

Using the bool query:
A bool query corresponds to Lucene's BooleanQuery. It consists of one or more clauses, each with a specific type.

must

Returned documents must satisfy every must clause, and the clauses contribute to the score.

GET /news/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
          "title": "人才"
          }
        },
        {
          "match": {
            "content": "人才"
          }
        },{
          "range": {
            "view_count": {
              "gte": 50,
              "lte": 60
            }
          }
        }
      ]
    }
  }
}


-- Output
{
  "took" : 17,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.683245,
    "hits" : [
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.683245,
        "_source" : {
          "title" : "芯片人才争夺",
          "content" : "芯片人才争夺“生猛”,需要大量人才",
          "categoryName" : "科技",
          "view_count" : 60,
          "publishTime" : "2022-04-19 12:00:00"
        }
      }
    ]
  }
}

filter

Returned documents must satisfy the filter clause, but unlike must it does not contribute to the score.
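
A typical pattern is to keep the full-text part of a search in must, so it is scored, and to put exact conditions in filter. The sketch below only builds the request body in Python; it uses the field names of the article index and a helper name (`search_body`) chosen for illustration. The printed JSON could be sent as the body of GET /news/_search.

```python
import json


def search_body(text: str, category: str, min_views: int) -> dict:
    """Scored full-text match on title, narrowed by unscored filter clauses."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [
                    {"term": {"categoryName": category}},
                    {"range": {"view_count": {"gte": min_views}}},
                ],
            }
        }
    }


print(json.dumps(search_body("人才", "科技", 50), ensure_ascii=False, indent=2))
```

With this body the score of a hit comes from the title match alone; the category and view-count conditions only decide whether a document is eligible.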

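should

Returned documents may satisfy the should clauses. If a bool query has no must or filter clause but has one or more should clauses, a document is returned as long as it satisfies at least one of them. The minimum_should_match parameter sets how many should clauses must match; it defaults to 1 when there is no must or filter clause and to 0 otherwise.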

GET /news/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "人才"
          }
        },
        {
         "match": {
            "title": "城市"
          }
        }
      ]
    }
  }
}

-- Output
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.9140557,
    "hits" : [
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.9140557,
        "_source" : {
          "title" : "徙的鸟死于城市灯光，气象雷达如何拯救它们",
          "content" : "城市灯光会吸引迁徙的鸟类，并产生致死的后果。发表在4月21日《科学》杂志上的一篇文章，讲述了美国的鸟类学家如何利用气象雷达和建模，降低人造灯光影响下的鸟类死亡率",
          "categoryName" : "科技",
          "view_count" : 70,
          "publishTime" : "2022-04-26 12:00:00"
        }
      },
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "title" : "芯片人才争夺",
          "content" : "芯片人才争夺“生猛”,需要大量人才",
          "categoryName" : "科技",
          "view_count" : 60,
          "publishTime" : "2022-04-19 12:00:00"
        }
      }
    ]
  }
}
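
Document 3 is returned although its title matches only one of the two should clauses. The rule behind this can be sketched in Python as follows: minimum_should_match defaults to 1 when the bool query has no must or filter clause and to 0 otherwise. The function below is an illustrative model of that rule, not Elasticsearch code.

```python
def should_requirement_met(clause_results: list,
                           has_must_or_filter: bool,
                           minimum_should_match=None) -> bool:
    """Check only the should-clause requirement of a bool query."""
    if minimum_should_match is None:
        # Default: 1 when should is the only positive clause type, otherwise 0.
        minimum_should_match = 0 if has_must_or_filter else 1
    return sum(clause_results) >= minimum_should_match


# Document 3: title contains 城市 but not 人才.
print(should_requirement_met([False, True], has_must_or_filter=False))   # True
# A document matching neither clause is dropped when should stands alone...
print(should_requirement_met([False, False], has_must_or_filter=False))  # False
# ...but not when a must or filter clause is also present.
print(should_requirement_met([False, False], has_must_or_filter=True))   # True
# Requiring both clauses excludes document 3.
print(should_requirement_met([False, True], False, minimum_should_match=2))  # False
```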

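must_not

Returned documents must not satisfy the conditions in must_not. These clauses run in filter context, so they do not affect the score.
If a query has a must or filter clause as well as should clauses, the should clauses are optional and only influence the score, unless minimum_should_match says otherwise.
Older versions of the bool query also supported a disable_coord option to turn off the coordination factor; it was removed in Elasticsearch 6.0.
The bool query takes a more-matches-is-better approach: the scores of all matching must and should clauses are added together to give the document's final score.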
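
One common use of must_not is to find documents that lack a field, by wrapping an exists query; this is the current replacement for the old missing query. The Python sketch below builds such a request body and mimics the rule locally on made-up documents. `missing_field_query` and `has_value` are hypothetical helpers, and the sample documents are not from the article index.

```python
def missing_field_query(field: str) -> dict:
    """Request body that selects documents with no indexed value for `field`."""
    return {"query": {"bool": {"must_not": [{"exists": {"field": field}}]}}}


def has_value(doc: dict, field: str) -> bool:
    """Local model of exists: null and empty arrays count as 'no value'."""
    value = doc.get(field)
    return value is not None and value != []


docs = [
    {"title": "a", "categoryName": "科技"},
    {"title": "b"},
    {"title": "c", "categoryName": None},
]

print(missing_field_query("categoryName"))
print([d["title"] for d in docs if not has_value(d, "categoryName")])  # ['b', 'c']
```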

Query DSL syntax

POST localhost:9200/news/_search
{
    "query":{
        "bool":{
            "must":{
                "match":{
                    "title":"人才"
                }
            }
        }
    }
}

{
    "took": 56,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.5753642,
        "hits": [
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.5753642,
                "_source": {
                    "title": "芯片人才争夺",
                    "content": "芯片人才争夺“生猛”,需要大量人才",
                    "publishTime": "2022-04-19 12:00:00"
                }
            }
        ]
    }
}

Shorthand form:

POST  localhost:9200/news/_search
{
  "query":{
    "match":{
      "title":"人才"
    }
  }
}

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.5753642,
        "hits": [
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.5753642,
                "_source": {
                    "title": "芯片人才争夺",
                    "content": "芯片人才争夺“生猛”,需要大量人才",
                    "publishTime": "2022-04-19 12:00:00"
                }
            }
        ]
    }
}

POST localhost:9200/news/_search
{
    "query":{
        "bool":{
            "must":{
                "multi_match":{
                    "query":"人才",
                    "fields":[
                        "title",
                        "content"
                    ]
                }
            }
        }
    }
}

{
    "took": 62,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.7911257,
        "hits": [
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.7911257,
                "_source": {
                    "title": "芯片人才争夺",
                    "content": "芯片人才争夺“生猛”,需要大量人才",
                    "publishTime": "2022-04-19 12:00:00"
                }
            }
        ]
    }
}

fuzzy query

Returns documents that contain terms similar to the search term, where similarity is measured by Levenshtein edit distance.

POST  localhost:9200/news/_search
{
  "query":{
    "bool":{
      "must":{
       "fuzzy":{
         "content":{"value":"心片"}
        }
      }
    }
  }
}

Output:
{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}
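
The search finds nothing even though 心片 is only one character away from 芯片. The likely reason is the default fuzziness, AUTO, which allows no edits for terms of one or two characters, one edit for terms of three to five characters, and two edits for longer terms. The Python sketch below implements plain Levenshtein distance and that AUTO rule; note that Elasticsearch additionally counts a transposition of two adjacent characters as a single edit by default, which this sketch does not model.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Number of edits allowed by the default fuzziness AUTO."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


print(levenshtein("心片", "芯片"))       # 1
print(auto_fuzziness("心片"))            # 0, so the term has to match exactly
print(levenshtein("kitten", "sitting"))  # 3
```

Setting "fuzziness": 1 explicitly in the fuzzy query would allow the single substitution, provided 芯片 is actually a term in the content field.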

Query plan (_validate/query with explain)

POST localhost:9200/news/_validate/query?explain
{
  "query":{
    "bool":{
      "must":{
       "fuzzy":{
         "title":{"value":"芯片"}
        }
      }
    }
  }
}


Output:
{
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "valid": true,
    "explanations": [
        {
            "index": "article",
            "valid": true,
            "explanation": "+title:芯片~0"
        }
    ]
}

Combining search with aggregation: count the documents in each category

POST localhost:9200/news/_search
{
    "size":0,
    "query":{
        "match_all":{

        }
    },
    "aggs":{
        "popular_colors":{
            "terms":{
                "field":"categoryName"
            }
        }
    }
}

Output:
{
    "took": 65,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 4,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "popular_colors": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "科技",
                    "doc_count": 3
                },
                {
                    "key": "汽车",
                    "doc_count": 1
                }
            ]
        }
    }
}
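
A terms aggregation groups documents by the exact value of a field and counts each group, ordered by count from highest to lowest by default. The same bucketing can be reproduced locally in Python on the categoryName values of the four sample documents; this is an illustration of the result shape, not of how Elasticsearch computes it.

```python
from collections import Counter

# categoryName of documents 1 to 4, as inserted earlier.
categories = ["科技", "科技", "科技", "汽车"]

buckets = [{"key": key, "doc_count": count}
           for key, count in Counter(categories).most_common()]
print(buckets)
# [{'key': '科技', 'doc_count': 3}, {'key': '汽车', 'doc_count': 1}]
```

The aggregation works here because categoryName is a keyword field. A text field such as title cannot be used this way unless fielddata is enabled on it.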

Customizing your own dynamic mapping template

PUT localhost:9200/myindex
{
    "mappings": {
        "dynamic_templates": [
            {
                "en": {
                    "match": "*_en",
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "text",
                        "analyzer": "english"
                    }
                }
            }
        ]
    }
}

Insert data

PUT localhost:9200/myindex/_doc/1
{
  "title":"this is my first template"
}


PUT localhost:9200/myindex/_doc/2
{
  "title_en":"this is my first template"
}

Search for the stop word "is"

Only document 1 matches: the template maps title_en to the english analyzer, which drops stop words such as "is", while title falls back to the default mapping and keeps every word.

GET localhost:9200/myindex/_search?q=is

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "myindex",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.2876821,
                "_source": {
                    "title": "this is my first template"
                }
            }
        ]
    }
}

Search for the keyword "template"

GET localhost:9200/myindex/_search?q=template
{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "myindex",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.2876821,
                "_source": {
                    "title": "this is my first template"
                }
            },
            {
                "_index": "myindex",
                "_type": "_doc",
                "_id": "2",
                "_score": 0.2876821,
                "_source": {
                    "title_en": "this is my first template"
                }
            }
        ]
    }
}
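
The two searches behave differently because of how the dynamic template picks fields: its match value is a simple wildcard pattern applied to the field name, and only fields that match get the english analyzer. The Python sketch below models that selection and the stop-word removal. It is a simplification: the stop-word set is a small subset chosen for illustration, and the stemming that the english analyzer also performs is left out.

```python
from fnmatch import fnmatchcase

STOP_WORDS = {"this", "is"}  # small subset, for illustration only


def field_uses_english(field_name: str, pattern: str = "*_en") -> bool:
    """Does the dynamic template's wildcard pattern match this field name?"""
    return fnmatchcase(field_name, pattern)


def indexed_terms(field_name: str, text: str) -> list:
    """Lowercase and split on whitespace; drop stop words for templated fields."""
    tokens = text.lower().split()
    if field_uses_english(field_name):
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


print(field_uses_english("title"))     # False
print(field_uses_english("title_en"))  # True
print(indexed_terms("title", "this is my first template"))
# ['this', 'is', 'my', 'first', 'template']
print(indexed_terms("title_en", "this is my first template"))
# ['my', 'first', 'template']
```

In this model "is" survives only in the title field, so only document 1 can match a search for it, while "template" is present in both fields.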

Reference: jianshu.com/p/50dbd7252d0a
