ELK Advanced Search, Part 5: Index Management

Creating an index


Syntax for creating an index

PUT /index
{
    "settings": { 
       "index":{
          "number_of_shards":"3",
          "number_of_replicas":"2"
        }
    },
    "mappings": {
       "dynamic":false,
       "properties" : {
            "field1" : { "type" : "text" },
            "field2" : {"type"  : "integer" }
        }
    },
    "aliases": {
        "otherName": {}
    }
}

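Creating a news index

Create a news index named article with 3 primary shards, 2 replicas per primary, and the alias news. The ik_smart and ik_max_word analyzers used in the mapping come from the IK analysis plugin, which must be installed on the cluster first.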

PUT  /article
{
    "settings":{
        "number_of_shards":3,
        "number_of_replicas":2
    },
    "mappings":{
        "dynamic":false,
        "properties":{
            "title":{
                "type":"text",
                "analyzer":"ik_smart",
                "search_analyzer":"ik_max_word"
            },
            "content":{
                "type":"text",
                "analyzer":"ik_smart",
                "search_analyzer":"ik_max_word"
            },
            "categoryName":{
                "type":"keyword"
            },
            "view_count":{
              "type": "integer"
            },
            "publishTime":{
                "type":"date",
                "format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
            }
        }
    },
    "aliases":{
        "news":{

        }
    }
}
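
As a quick sanity check on these settings, the shard arithmetic can be sketched in Python. The helper names are made up for illustration and are not part of any Elasticsearch client; the sketch assumes the usual allocation rule that a replica is never placed on the same node as its primary.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Every primary shard has number_of_replicas additional copies."""
    return number_of_shards * (1 + number_of_replicas)


def min_nodes_for_all_copies(number_of_replicas: int) -> int:
    """A primary and each of its replicas need a node of their own."""
    return number_of_replicas + 1


# Settings of the article index: 3 primaries, 2 replicas each.
print(total_shard_copies(3, 2))
print(min_nodes_for_all_copies(2))
```

On a single-node test cluster the replicas of article therefore stay unassigned, and cluster health reports yellow rather than green.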

Inserting four documents with Kibana

PUT /article/_doc/1
{
    "title":"芯片人才争夺",
    "content":"芯片人才争夺“生猛”,需要大量人才",
    "categoryName":"科技",
    "view_count" : 60,
    "publishTime":"2022-04-19 12:00:00"
}
PUT /article/_doc/2
{
    "title":"2021年我国数字阅读用户规模破5亿 人均电子阅读11.58本",
    "content":"2021年,我国数字阅读用户规模为5.06亿,相比2020年增长了2.49%;人均阅读量电子阅读11.58本,有声阅读7.08本。在首届全民阅读大会数字阅读分论坛暨第八届数字阅读年会上,中国音像与数字出版协会发布发布了《2021年度中国数字阅读报告》,展现了过去一年中国数字阅读行业发展情况与特点",
    "categoryName":"科技",
    "view_count" : 80,
    "publishTime":"2022-04-26 12:00:00"
}
PUT /article/_doc/3
{
    "title":"徙的鸟死于城市灯光,气象雷达如何拯救它们",
    "content":"城市灯光会吸引迁徙的鸟类,并产生致死的后果。发表在4月21日《科学》杂志上的一篇文章,讲述了美国的鸟类学家如何利用气象雷达和建模,降低人造灯光影响下的鸟类死亡率",
    "categoryName":"科技",
    "view_count":70,
    "publishTime":"2022-04-26 12:00:00"
}
PUT /article/_doc/4
{
    "title":"季度动力电池装机量排行榜",
    "content":"在汽车电动化时代,中国领跑全球;而在汽车动力电池领域,中国的宁德时代继续领跑全球",
    "categoryName":"汽车",
    "view_count" : 50,
    "publishTime":"2022-04-26 12:00:00"
}

Basic search syntax

  • GET /<target>/_search

  • GET /_search

  • POST /<target>/_search

  • POST /_search

Getting a document by ID

GET /article/_doc/1

-- Output
{
  "_index" : "article",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "芯片人才争夺",
    "content" : "芯片人才争夺“生猛”,需要大量人才",
    "categoryName" : "科技",
    "view_count" : 60,
    "publishTime" : "2022-04-19 12:00:00"
  }
}

Querying through the alias

GET /news/_doc/1

-- Output
{
  "_index" : "article",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "芯片人才争夺",
    "content" : "芯片人才争夺“生猛”,需要大量人才",
    "categoryName" : "科技",
    "view_count" : 60,
    "publishTime" : "2022-04-19 12:00:00"
  }
}

Searching all documents without conditions

GET /news/_search

{
  "took" : 60,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "title" : "2021年我国数字阅读用户规模破5亿 人均电子阅读11.58本",
          "content" : "2021年,我国数字阅读用户规模为5.06亿,相比2020年增长了2.49%;人均阅读量电子阅读11.58本,有声阅读7.08本。在首届全民阅读大会数字阅读分论坛暨第八届数字阅读年会上,中国音像与数字出版协会发布发布了《2021年度中国数字阅读报告》,展现了过去一年中国数字阅读行业发展情况与特点",
          "categoryName" : "科技",
          "view_count" : 80,
          "publishTime" : "2022-04-26 12:00:00"
        }
      },
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "徙的鸟死于城市灯光,气象雷达如何拯救它们",
          "content" : "城市灯光会吸引迁徙的鸟类,并产生致死的后果。发表在4月21日《科学》杂志上的一篇文章,讲述了美国的鸟类学家如何利用气象雷达和建模,降低人造灯光影响下的鸟类死亡率",
          "categoryName" : "科技",
          "view_count" : 70,
          "publishTime" : "2022-04-26 12:00:00"
        }
      },
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "title" : "季度动力电池装机量排行榜",
          "content" : "在汽车电动化时代,中国领跑全球;而在汽车动力电池领域,中国的宁德时代继续领跑全球",
          "categoryName" : "汽车",
          "view_count" : 50,
          "publishTime" : "2022-04-26 12:00:00"
        }
      },
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "芯片人才争夺",
          "content" : "芯片人才争夺“生猛”,需要大量人才",
          "categoryName" : "科技",
          "view_count" : 60,
          "publishTime" : "2022-04-19 12:00:00"
        }
      }
    ]
  }
}
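
Field descriptions

took: how long the search took, in milliseconds
timed_out: whether the request timed out; false here
_shards: the index is split into 3 shards. A search request is sent to one copy of every shard (the primary or one of its replicas), so total and successful are both 3
hits: the search results
hits.total: the number of matching documents
hits.max_score: the highest _score among the matching documents. _score measures how relevant a document is to the search: the more relevant the document, the higher the score
hits.hits: the array of matching documents with their details. The outer hits object carries summary data such as total; the inner hits array carries the data for each document, which is why the two are nested
_index: the index the document belongs to
_type: the mapping type of the document (_doc here)
_id: the ID of the document
_source: the document content, that is, the JSON that was stored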
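
To make the nesting of the two hits levels concrete, here is a small Python sketch that reads the fields described above out of a response. The response dict is a trimmed copy of the output shown earlier, and summarize is a hypothetical helper, not a client-library function.

```python
# Trimmed copy of the search response above (only the fields used below).
response = {
    "took": 60,
    "timed_out": False,
    "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 4, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {"_index": "article", "_id": "2", "_score": 1.0, "_source": {"view_count": 80}},
            {"_index": "article", "_id": "3", "_score": 1.0, "_source": {"view_count": 70}},
            {"_index": "article", "_id": "4", "_score": 1.0, "_source": {"view_count": 50}},
            {"_index": "article", "_id": "1", "_score": 1.0, "_source": {"view_count": 60}},
        ],
    },
}


def summarize(resp: dict) -> dict:
    """Pull the summary data from the outer hits and the IDs from the inner hits."""
    outer = resp["hits"]
    shards = resp["_shards"]
    return {
        "total": outer["total"]["value"],
        "ids": [hit["_id"] for hit in outer["hits"]],
        "all_shards_ok": shards["successful"] == shards["total"],
    }


print(summarize(response))
```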

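Passing parameters

Search options can be passed in the URL query string, as in any HTTP request. In the example below, q=title:人才 searches the title field and sort=publishTime:desc sorts the results by publish time, newest first.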

GET /news/_search?q=title:人才&sort=publishTime:desc
{
  "took" : 48,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "title" : "芯片人才争夺",
          "content" : "芯片人才争夺“生猛”,需要大量人才",
          "categoryName" : "科技",
          "view_count" : 60,
          "publishTime" : "2022-04-19 12:00:00"
        },
        "sort" : [
          1650369600000
        ]
      }
    ]
  }
}
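
Because the results are sorted by a field rather than by relevance, _score is null and each hit carries a sort array instead. For a date field the sort value is the date in milliseconds since the epoch, and Elasticsearch treats a date without a time zone as UTC. A minimal Python sketch of that conversion follows; to_epoch_millis is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_epoch_millis(text: str) -> int:
    """Convert a 'yyyy-MM-dd HH:mm:ss' string, read as UTC, to epoch milliseconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


# publishTime of document 1; compare with the value in the sort array above.
print(to_epoch_millis("2022-04-19 12:00:00"))
```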

Pagination

GET /news/_search?from=1&size=2
{
  "took" : 45,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "徙的鸟死于城市灯光,气象雷达如何拯救它们",
          "content" : "城市灯光会吸引迁徙的鸟类,并产生致死的后果。发表在4月21日《科学》杂志上的一篇文章,讲述了美国的鸟类学家如何利用气象雷达和建模,降低人造灯光影响下的鸟类死亡率",
          "categoryName" : "科技",
          "view_count" : 70,
          "publishTime" : "2022-04-26 12:00:00"
        }
      },
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "title" : "季度动力电池装机量排行榜",
          "content" : "在汽车电动化时代,中国领跑全球;而在汽车动力电池领域,中国的宁德时代继续领跑全球",
          "categoryName" : "汽车",
          "view_count" : 50,
          "publishTime" : "2022-04-26 12:00:00"
        }
      }
    ]
  }
}
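
The from parameter is a zero-based offset into the result list and size is the number of hits to return, so from=1&size=2 skips the first hit and returns the next two. The Python sketch below mimics that slicing on the hit order of the earlier unconditioned search; paginate and offset_for_page are hypothetical helper names.

```python
def paginate(hits: list, from_: int = 0, size: int = 10) -> list:
    """Return the window of hits that from/size would select."""
    return hits[from_:from_ + size]


def offset_for_page(page_number: int, size: int) -> int:
    """Translate a 1-based page number into a from offset."""
    return (page_number - 1) * size


all_ids = ["2", "3", "4", "1"]          # hit order of the earlier full search
print(paginate(all_ids, from_=1, size=2))
print(offset_for_page(3, 10))
```

Note that from + size may not exceed index.max_result_window, which is 10000 by default; deeper paging is usually done with search_after instead.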

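Getting started with Query DSL

Basic query keywords

Keyword: description
match_all: matches every document; it is the default when no query is specified
match: the standard full-text query; on an exact-value field (a number, date, boolean, or keyword field) it matches the given value exactly
range: finds numbers or dates that fall within a range; gt = greater than, gte = greater than or equal to, lt = less than, lte = less than or equal to
term: matches an exact value; the query value is not analyzed
terms: like term, but accepts several values and matches any of them
exists: finds documents that have a value for the given field
missing: finds documents that have no value for the given field; this query was removed in Elasticsearch 5.0, so use exists inside a bool must_not clause instead
must: bool clause; documents must match these conditions to be included
must_not: bool clause; documents must not match these conditions to be included
should: bool clause; a document that matches any of these clauses gets a higher _score, otherwise nothing changes. Mainly used to adjust the relevance score of each document
filter: bool clause; these clauses do not contribute to the score and only include or exclude documents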

Querying all documents

POST localhost:9200/news/_search
{
    "query":{
        "match_all":{

        }
    }
}

-- Output (from a run in which the index had a single shard and held only document 1)

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1,
        "hits": [
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "title": "芯片人才争夺",
                    "content": "芯片人才争夺“生猛”,需要大量人才",
                    "publishTime": "2022-04-19 12:00:00"
                }
            }
        ]
    }
}

Querying with a condition

GET /news/_search
{
  "query": {
    "match": {
      "title": "人才"
    }
  }
}

-- Output
{
  "took" : 30,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "title" : "芯片人才争夺",
          "content" : "芯片人才争夺“生猛”,需要大量人才",
          "categoryName" : "科技",
          "view_count" : 60,
          "publishTime" : "2022-04-19 12:00:00"
        }
      }
    ]
  }
}

Sorted query

GET /news/_search
{
  "query": {
    "match": {
      "title": "人才"
    }
  },
  "sort": [
    {
      "view_count": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 2
}


-- Output
{
  "took" : 45,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "title" : "芯片人才争夺",
          "content" : "芯片人才争夺“生猛”,需要大量人才",
          "categoryName" : "科技",
          "view_count" : 60,
          "publishTime" : "2022-04-19 12:00:00"
        },
        "sort" : [
          60
        ]
      }
    ]
  }
}

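term: querying without analysis

A term query looks up its value as one unanalyzed term, whereas a match query analyzes the value first and then searches for the resulting tokens. Because title is a text field whose content was split into tokens at index time, the whole string 芯片人才 does not exist as a term in the index, so the query below returns no hits.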

GET /news/_search
{
  "query": {
    "term": {
      "title": {
        "value": "芯片人才"
      }
    }
  }
}

-- Output
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
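
The difference between term and match can be imitated in a few lines of Python. This is a toy model only: VOCAB is a hypothetical three-word dictionary and toy_analyze a greedy longest-match segmenter standing in for the IK analyzer, whose real tokenization may differ.

```python
# Hypothetical mini dictionary standing in for the IK plugin's real one.
VOCAB = {"芯片", "人才", "争夺"}


def toy_analyze(text: str) -> list:
    """Greedy longest-match segmentation against VOCAB (toy analyzer)."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            i += 1  # skip characters the dictionary does not know
    return tokens


# Terms stored in the toy inverted index for the title of document 1.
indexed_terms = set(toy_analyze("芯片人才争夺"))


def term_query(value: str) -> bool:
    """term: the value is looked up as is, without analysis."""
    return value in indexed_terms


def match_query(text: str) -> bool:
    """match: the text is analyzed first; any matching token is enough (OR)."""
    return any(token in indexed_terms for token in toy_analyze(text))


print(term_query("芯片人才"), match_query("芯片人才"))
```

In this model the term lookup for 芯片人才 fails because only 芯片, 人才, and 争夺 are stored, while the match query succeeds after splitting the text into 芯片 and 人才.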

match_phrase

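// match_phrase analyzes the query text into terms (for example 芯片 and 人才) and returns only documents whose title contains all of those terms next to each other and in the same order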
GET /news/_search
{
  "query": {
    "match_phrase": {
      "title": "芯片人才"
    }
  }
}
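
What sets match_phrase apart from match is the position check: the analyzed terms have to appear consecutively and in order. A minimal Python sketch of that check on token lists follows; phrase_matches is a hypothetical helper, the token lists are assumed analyzer output, and the slop parameter that relaxes the adjacency requirement is not modeled.

```python
def phrase_matches(doc_tokens: list, query_tokens: list) -> bool:
    """True if query_tokens occur in doc_tokens as one contiguous, ordered run."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


title_tokens = ["芯片", "人才", "争夺"]   # assumed tokens of the title of document 1

print(phrase_matches(title_tokens, ["芯片", "人才"]))   # adjacent and in order
print(phrase_matches(title_tokens, ["人才", "芯片"]))   # wrong order
print(phrase_matches(title_tokens, ["芯片", "争夺"]))   # not adjacent
```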

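bool: compound queries with multiple conditions

Using the bool query:
The bool query corresponds to BooleanQuery in Lucene. It consists of one or more clauses, and each clause has a specific type.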

must

Returned documents must satisfy every must clause, and these clauses contribute to the score. The example below combines two match clauses with a range clause on view_count.

GET /news/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
          "title": "人才"
          }
        },
        {
          "match": {
            "content": "人才"
          }
        },{
          "range": {
            "view_count": {
              "gte": 50,
              "lte": 60
            }
          }
        }
      ]
    }
  }
}


-- Output
{
  "took" : 17,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.683245,
    "hits" : [
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.683245,
        "_source" : {
          "title" : "芯片人才争夺",
          "content" : "芯片人才争夺“生猛”,需要大量人才",
          "categoryName" : "科技",
          "view_count" : 60,
          "publishTime" : "2022-04-19 12:00:00"
        }
      }
    ]
  }
}

filter

Returned documents must satisfy the filter clauses, but unlike must, filter clauses do not contribute to the score.
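
The article gives no request for filter, so here is a hedged sketch of one. It builds a request body in Python in which the match clause is scored and the category and view-count conditions only filter. build_filtered_query is a hypothetical helper; the field names come from the article index mapping above.

```python
import json


def build_filtered_query(text: str, category: str, min_views: int) -> dict:
    """bool query: must is scored, filter only includes or excludes documents."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [
                    {"term": {"categoryName": category}},
                    {"range": {"view_count": {"gte": min_views}}},
                ],
            }
        }
    }


body = build_filtered_query("人才", "科技", 50)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted as the body of GET /news/_search. Because the two conditions sit in filter, the _score of a hit depends only on the match clause.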

should

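Returned documents may match the should clauses, and every matching clause raises the score. If a bool query has one or more should clauses and no must or filter clause, a document has to match at least one of them to be returned. The minimum_should_match parameter sets how many should clauses must match; it defaults to 1 in that case and to 0 otherwise.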

GET /news/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "人才"
          }
        },
        {
         "match": {
            "title": "城市"
          }
        }
      ]
    }
  }
}

-- Output
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.9140557,
    "hits" : [
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.9140557,
        "_source" : {
          "title" : "徙的鸟死于城市灯光,气象雷达如何拯救它们",
          "content" : "城市灯光会吸引迁徙的鸟类,并产生致死的后果。发表在4月21日《科学》杂志上的一篇文章,讲述了美国的鸟类学家如何利用气象雷达和建模,降低人造灯光影响下的鸟类死亡率",
          "categoryName" : "科技",
          "view_count" : 70,
          "publishTime" : "2022-04-26 12:00:00"
        }
      },
      {
        "_index" : "article",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "title" : "芯片人才争夺",
          "content" : "芯片人才争夺“生猛”,需要大量人才",
          "categoryName" : "科技",
          "view_count" : 60,
          "publishTime" : "2022-04-19 12:00:00"
        }
      }
    ]
  }
}
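
The default for minimum_should_match depends on the other clauses in the bool query. The Python sketch below models that rule as documented for Elasticsearch 7.x. Both function names are hypothetical, and only integer values are modeled, not the percentage forms that minimum_should_match also accepts.

```python
from typing import Optional


def default_minimum_should_match(should_count: int, has_must_or_filter: bool) -> int:
    """1 when the query consists of should clauses only, otherwise 0."""
    if should_count == 0 or has_must_or_filter:
        return 0
    return 1


def should_part_satisfied(
    matched_should: int,
    should_count: int,
    has_must_or_filter: bool,
    minimum_should_match: Optional[int] = None,
) -> bool:
    """Does a document match enough should clauses to be returned?"""
    if minimum_should_match is None:
        minimum_should_match = default_minimum_should_match(
            should_count, has_must_or_filter
        )
    return matched_should >= minimum_should_match


# The query above: two should clauses and nothing else.
print(should_part_satisfied(1, 2, has_must_or_filter=False))  # one match is enough
print(should_part_satisfied(0, 2, has_must_or_filter=False))  # no match, not returned
```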

must_not

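Returned documents must not match any must_not clause. These clauses run in filter context, so they do not affect the score.
In Elasticsearch 7.x, when a bool query also contains a must or filter clause, should clauses are optional by default (minimum_should_match defaults to 0); set minimum_should_match explicitly if at least one of them has to match.
Older versions of the bool query accepted a disable_coord option to turn off the coordination factor; it was removed in Elasticsearch 6.0.
The bool query takes a more-matches-is-better approach: the scores of all matching must and should clauses are added together to produce the final _score.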

Query DSL syntax

POST localhost:9200/news/_search
{
    "query":{
        "bool":{
            "must":{
                "match":{
                    "title":"人才"
                }
            }
        }
    }
}

{
    "took": 56,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.5753642,
        "hits": [
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.5753642,
                "_source": {
                    "title": "芯片人才争夺",
                    "content": "芯片人才争夺“生猛”,需要大量人才",
                    "publishTime": "2022-04-19 12:00:00"
                }
            }
        ]
    }
}

Shorthand form:

POST  localhost:9200/news/_search
{
  "query":{
    "match":{
      "title":"人才"
    }
  }
}

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.5753642,
        "hits": [
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.5753642,
                "_source": {
                    "title": "芯片人才争夺",
                    "content": "芯片人才争夺“生猛”,需要大量人才",
                    "publishTime": "2022-04-19 12:00:00"
                }
            }
        ]
    }
}

multi_match: searching several fields at once

POST localhost:9200/news/_search
{
    "query":{
        "bool":{
            "must":{
                "multi_match":{
                    "query":"人才",
                    "fields":[
                        "title",
                        "content"
                    ]
                }
            }
        }
    }
}

{
    "took": 62,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.7911257,
        "hits": [
            {
                "_index": "article",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.7911257,
                "_source": {
                    "title": "芯片人才争夺",
                    "content": "芯片人才争夺“生猛”,需要大量人才",
                    "publishTime": "2022-04-19 12:00:00"
                }
            }
        ]
    }
}

fuzzy query

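Returns documents that contain terms similar to the search term, where similarity is measured by Levenshtein edit distance. With the default fuzziness of AUTO, terms of one or two characters must match exactly, which is why the two-character term 心片 in the example below finds nothing.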

POST  localhost:9200/news/_search
{
  "query":{
    "bool":{
      "must":{
       "fuzzy":{
         "content":{"value":"心片"}
        }
      }
    }
  }
}

Output:
{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}
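
To see why this query finds nothing, it helps to compute the numbers by hand. The Python sketch below implements plain Levenshtein distance and the length thresholds of fuzziness AUTO (0 to 2 characters: no edits, 3 to 5: one edit, longer: two edits). It is an illustration rather than what Elasticsearch runs internally: by default the fuzzy query also counts a transposition of two adjacent characters as a single edit, which is not modeled here.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    """Maximum edits that fuzziness AUTO allows for a term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


query_term, indexed_term = "心片", "芯片"
distance = levenshtein(query_term, indexed_term)
print(distance, auto_fuzziness(query_term), distance <= auto_fuzziness(query_term))
```

The two terms are one substitution apart, but a two-character term is allowed zero edits under AUTO, so the lookup fails. Setting "fuzziness": 1 explicitly in the fuzzy clause would allow the edit.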

Query plan (validate API with explain)

POST localhost:9200/news/_validate/query?explain
{
  "query":{
    "bool":{
      "must":{
       "fuzzy":{
         "title":{"value":"芯片"}
        }
      }
    }
  }
}


Output:
{
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "valid": true,
    "explanations": [
        {
            "index": "article",
            "valid": true,
            "explanation": "+title:芯片~0"
        }
    ]
}

Combining search with aggregation: counting documents per category

POST localhost:9200/news/_search
{
    "size":0,
    "query":{
        "match_all":{

        }
    },
    "aggs":{
        "popular_colors":{
            "terms":{
                "field":"categoryName"
            }
        }
    }
}

Output:
{
    "took": 65,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 4,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "popular_colors": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "科技",
                    "doc_count": 3
                },
                {
                    "key": "汽车",
                    "doc_count": 1
                }
            ]
        }
    }
}
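
A terms aggregation is essentially a group-by-and-count over the field values. The Python sketch below reproduces the buckets above from the categoryName values of the four sample documents; terms_aggregation is a hypothetical helper that runs in memory and does not contact a cluster.

```python
from collections import Counter

# categoryName of the four documents inserted earlier.
docs = [
    {"_id": "1", "categoryName": "科技"},
    {"_id": "2", "categoryName": "科技"},
    {"_id": "3", "categoryName": "科技"},
    {"_id": "4", "categoryName": "汽车"},
]


def terms_aggregation(documents: list, field: str, size: int = 10) -> list:
    """Count documents per field value, largest bucket first (top `size` buckets)."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]


print(terms_aggregation(docs, "categoryName"))
```

The request sets "size": 0 because only the buckets are of interest, so the hits array in the response stays empty.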
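
Customizing your own dynamic mapping template

The template below maps every new string field whose name ends in _en to a text field that uses the english analyzer. Other string fields fall back to the default dynamic mapping, which uses the standard analyzer.

PUT localhost:9200/myindex
{
    "mappings": {
        "dynamic_templates": [
            {
                "en": {
                    "match": "*_en",
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "text",
                        "analyzer": "english"
                    }
                }
            }
        ]
    }
}

Inserting data

PUT localhost:9200/myindex/_doc/1
{
  "title":"this is my first template"
}


PUT localhost:9200/myindex/_doc/2
{
  "title_en":"this is my first template"
}

Searching for the stop word is

Only document 1 is returned: the english analyzer drops the stop word is from title_en, while the standard analyzer keeps it in title.

GET localhost:9200/myindex/_search?q=is

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "myindex",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.2876821,
                "_source": {
                    "title": "this is my first template"
                }
            }
        ]
    }
}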

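Searching for the keyword template

Both documents are returned, because template is not a stop word in either analyzer.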

GET localhost:9200/myindex/_search?q=template
{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "myindex",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.2876821,
                "_source": {
                    "title": "this is my first template"
                }
            },
            {
                "_index": "myindex",
                "_type": "_doc",
                "_id": "2",
                "_score": 0.2876821,
                "_source": {
                    "title_en": "this is my first template"
                }
            }
        ]
    }
}
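
The two searches can be replayed with a toy model in Python. Everything here is a simplification under stated assumptions: analyzer_for imitates the *_en dynamic template with a glob match, toy_analyze lowercases and splits on whitespace, and ENGLISH_STOP_WORDS is a short list modeled on Lucene's English defaults. The stemming that the real english analyzer also performs is left out.

```python
from fnmatch import fnmatchcase

# Short stop list modeled on Lucene's English defaults (assumption).
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def analyzer_for(field_name: str) -> str:
    """Imitate the dynamic template: fields ending in _en get the english analyzer."""
    return "english" if fnmatchcase(field_name, "*_en") else "standard"


def toy_analyze(text: str, analyzer: str) -> list:
    """Lowercase and split; the english variant also removes stop words."""
    tokens = text.lower().split()
    if analyzer == "english":
        tokens = [t for t in tokens if t not in ENGLISH_STOP_WORDS]
    return tokens


def matching_ids(documents: dict, word: str) -> list:
    """IDs of documents in which any field contains the word after analysis."""
    ids = []
    for doc_id, source in documents.items():
        if any(word in toy_analyze(text, analyzer_for(field))
               for field, text in source.items()):
            ids.append(doc_id)
    return ids


documents = {
    "1": {"title": "this is my first template"},
    "2": {"title_en": "this is my first template"},
}

print(matching_ids(documents, "is"))        # only the standard-analyzed title keeps it
print(matching_ids(documents, "template"))  # kept by both analyzers
```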

Reference: jianshu.com/p/50dbd7252d0a
