Elastic Certified Engineer Review Notes: Practice Questions Explained: Searching Data (2)


死敌wen


Article link: https://blog.csdn.net/weixin_40601534/article/details/121766569


EXAM OBJECTIVE: QUERIES

GOAL: Create search queries for terms, numbers, dates, fuzzy, and compound queries

REQUIRED SETUP:

Suggested docker-compose file: 1e1k_base_cluster.yml

1. a running Elasticsearch cluster with at least one node and a Kibana instance,
2. add the "Sample web logs" and "Sample flight data" to Kibana
3. Run the next queries on the kibana_sample_data_logs index

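Initialization

Bring up the cluster: docker-compose -f 1e1k_base_cluster.yml up -d --build
Add the data:
1. In Kibana, click the Kibana icon in the top-left corner to return to the home page
2. Click Add Data to Kibana in the top section on the right
3. Click the rightmost tab, Sample data
4. Click Add data on the first and third sample data sets

Verify the data: GET _cat/indices

If kibana_sample_data_flights and kibana_sample_data_logs appear, the data was added successfully.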

green  open kibana_sample_data_flights   TxLrY4R4RB2wRcNsw5bQ9Q 1 0 13059 0  6.5mb  6.5mb
green  open kibana_sample_data_ecommerce -BmN-n3MRgOdtIrINDeufw 1 0  4675 0  4.9mb  4.9mb
green  open kibana_sample_data_logs      CfNuYq1kTvelLLVG0T6biA 1 0 14074 0 11.8mb 11.8mb

Question 1: Building search queries

1. Filter documents with the response field greater than or equal to 400 and less than 500
2. As above, but add a second filter for documents with the referer field matching "http://twitter.com/success/guion-bluford"
3. Filter documents with the referer field that starts with "http://twitter.com/success"
4. Filter documents with the request field that starts with "/people"
5. Filter documents with the memory field containing any indexed value
6. (opposite of above) Filter documents with the memory field not containing any indexed value
7. Search for documents with the agent field containing the string "Windows" and the url field containing the string "name:john"
8. As above, but also filter documents with the phpmemory field containing any indexed value
9. Search for documents that have either the response field greater than or equal to 400 or the tags field having the string "error"
10. Search for documents with the tags field that does not contain any of the following strings: "warning", "error", "info"
11. Filter documents with the timestamp field containing a date between today and one week ago

Question 1: Solutions

Filter documents whose response is in [400, 500)

POST kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "response": {
            "gte": 400,
            "lt": 500
          }
        }
      }
    }
  }
}

Result count

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 801,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
    ]
  }
}
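
The task wording asks for values greater than or equal to 400 and strictly less than 500, that is, the half-open interval [400, 500). As a minimal sketch of how the four `range` operators treat their boundaries, the hypothetical Python helper `in_range` below mimics them for plain numbers. It is an illustration only, not Elasticsearch code, and the status codes are made-up sample values.

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Mimic the boundary semantics of the range query operators (illustrative only)."""
    if gte is not None and not value >= gte:
        return False
    if gt is not None and not value > gt:
        return False
    if lte is not None and not value <= lte:
        return False
    if lt is not None and not value < lt:
        return False
    return True


# Status codes chosen for illustration; they are not taken from the sample index.
codes = [200, 400, 404, 499, 500, 503]

half_open = [c for c in codes if in_range(c, gte=400, lt=500)]   # [400, 500)
closed = [c for c in codes if in_range(c, gte=400, lte=500)]     # [400, 500]

print(half_open)  # [400, 404, 499]
print(closed)     # [400, 404, 499, 500]
```

The two variants only differ for a document whose response is exactly 500, so on a data set without that value both return the same count.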

Add a second filter to the previous query: the referer field must match "http://twitter.com/success/guion-bluford"

POST kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "response": {
              "gte": 400,
              "lt": 500
            }
          }
        },
        {
          "match": {
            "referer": "http://twitter.com/success/guion-bluford"
          }
        }
      ]
    }
  }
}

Result count

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
    ]
  }
}

Filter documents whose referer starts with "http://twitter.com/success"

POST kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "filter": {
        "prefix": {
          "referer": "http://twitter.com/success"
        }
      }
    }
  }
}

Result count

{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3584,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
    ]
  }
}

Filter documents whose request starts with "/people"

POST kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "filter": {
        "prefix": {
          "request.keyword": "/people"
        }
      }
    }
  }
}

Result count

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 452,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
    ]
  }
}

Filter documents that have an indexed value for memory

POST kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "filter": {
        "exists": {
          "field": "memory"
        }
      }
    }
  }
}

Result count

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 552,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
    ]
  }
}

Filter documents that have no indexed value for memory

POST kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "memory"
        }
      }
    }
  }
}

Result count

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 0.0,
    "hits" : [
    ]
  }
}
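
The `"relation" : "gte"` in this response means the reported total is only a lower bound: by default Elasticsearch 7.x stops counting hits accurately at 10,000. A minimal sketch, assuming an exact count is wanted: adding `track_total_hits: true` to the request body asks for the exact total. The Python snippet below only builds and prints that body so it can be pasted into the Kibana console; it does not contact a cluster.

```python
import json

# Same must_not/exists query as above, plus track_total_hits for an exact total.
body = {
    "track_total_hits": True,
    "size": 0,  # only the total is of interest here, so return no documents
    "query": {
        "bool": {
            "must_not": {
                "exists": {"field": "memory"}
            }
        }
    },
}

print("POST kibana_sample_data_logs/_search")
print(json.dumps(body, indent=2))
```

Going by the figures shown earlier (14074 documents in the index, 552 of them with a memory value), the exact total would be expected to be 14074 - 552 = 13522, although this depends on the version of the sample data.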

Search for documents whose agent contains "Windows" and whose url contains "name:john"

POST kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "agent": "Windows"
          }
        },
        {
          "match": {
            "url": "name:john"
          }
        }
      ]
    }
  }
}

Result count

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 7.5268917,
    "hits" : [
    ]
  }
}

As above, but also keep only documents that have an indexed value for phpmemory

POST kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "agent": "Windows"
          }
        },
        {
          "match": {
            "url": "name:john"
          }
        }
      ],
      "filter": {
        "exists": {
          "field": "phpmemory"
        }
      }
    }
  }
}

Result count

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 7.5268917,
    "hits" : [
    ]
  }
}

Search for documents whose response is greater than or equal to 400 or whose tags contain "error"

POST kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "range": {
            "response": {
              "gte": 400
            }
          }
        },
        {
          "match": {
            "tags": "error"
          }
        }
      ]
    }
  }
}

Result count

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2003,
      "relation" : "eq"
    },
    "max_score" : 3.8313324,
    "hits" : [
    ]
  }
}
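
This query behaves as an OR because a bool query that has should clauses but no must or filter clauses requires at least one should clause to match (its default `minimum_should_match` is 1). Once a must or filter clause is present, that default drops to 0 and the should clauses only influence scoring. The hypothetical helper below sketches that rule for plain integer values only; it is an illustration, not part of any Elasticsearch client.

```python
def effective_minimum_should_match(bool_query):
    """Return the minimum_should_match a bool query ends up with (integer form only)."""
    if "minimum_should_match" in bool_query:
        return bool_query["minimum_should_match"]
    has_should = bool(bool_query.get("should"))
    has_must_or_filter = bool(bool_query.get("must")) or bool(bool_query.get("filter"))
    # Default: 1 when there are only should clauses, otherwise 0.
    return 1 if has_should and not has_must_or_filter else 0


should_clauses = [
    {"range": {"response": {"gte": 400}}},
    {"match": {"tags": "error"}},
]

should_only = {"should": should_clauses}
with_filter = {"filter": {"exists": {"field": "memory"}}, "should": should_clauses}
explicit = {**with_filter, "minimum_should_match": 1}

print(effective_minimum_should_match(should_only))  # 1
print(effective_minimum_should_match(with_filter))  # 0
print(effective_minimum_should_match(explicit))     # 1
```

So if this OR condition is ever combined with a filter clause, setting `"minimum_should_match": 1` explicitly keeps the OR semantics.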

Search for documents whose tags contain none of "warning", "error", "info"

POST kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "tags": "warning"
          }
        },
        {
          "match": {
            "tags": "error"
          }
        },
        {
          "match": {
            "tags": "info"
          }
        }
      ]
    }
  }
}

Or

POST kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "terms": {
            "tags": ["warning", "error", "info"]
          }
        }
      ]
    }
  }
}

Result count

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2927,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
    ]
  }
}

Filter documents whose timestamp falls between one week ago and today

POST kibana_sample_data_logs/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-7d/d",
        "lte": "now/d"
      }
    }
  }
}

Result count

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1840,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
    ]
  }
}
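
How the `/d` rounding resolves depends on the operator: with `gte` the value is rounded down to the start of the day, while with `lte` it is rounded up to the last millisecond of the day. A minimal sketch of the resulting window, assuming UTC (the default when no `time_zone` is given) and using a hypothetical helper name `resolve_window` and an arbitrary example instant:

```python
from datetime import datetime, timedelta, timezone


def resolve_window(now):
    """Approximate how gte: now-7d/d and lte: now/d resolve, in UTC."""
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    lower = start_of_today - timedelta(days=7)                              # gte: rounded down
    upper = start_of_today + timedelta(days=1) - timedelta(milliseconds=1)  # lte: rounded up
    return lower, upper


# Fixed example instant so the output is reproducible.
now = datetime(2022, 1, 12, 16, 31, 35, tzinfo=timezone.utc)
lower, upper = resolve_window(now)

print(lower.isoformat())  # 2022-01-05T00:00:00+00:00
print(upper.isoformat())  # 2022-01-12T23:59:59.999000+00:00
```

With the rounding, the window spans eight calendar days rather than exactly 7 x 24 hours; dropping `/d` (that is, `now-7d` to `now`) gives a sliding seven-day window instead.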

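Question 1: Notes on the solutions

This question mainly tests combining several search conditions logically with bool and its filter, must, should, and must_not clauses, as well as the use of the prefix, exists, match, range, and terms queries.
- One likely pitfall: prefix is a term-level query, so the prefix is not analyzed and is compared against the indexed terms as they are. On a keyword field that is the whole original string; on an analyzed text field it is each individual token, so a prefix such as "/people" finds nothing there. That is why the solutions above use prefix: referer in one case and prefix: request.keyword in the other.
- In production you may need to distinguish a field that is missing from a field whose value is empty. exists covers the first case (it matches any document with an indexed value for the field, and an empty string still counts as a value). Checking for a non-empty value may require a script that tests the length of the value, or an approach such as null_value.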
```
POST kibana_sample_data_logs/_search
{
  "query": {
    "bool": {
      "must": {
        "script": {
          "script": {
            "source": "doc['message.keyword'].size() != 0 && doc['message.keyword'].value.length() > 0"
          }
        }
      }
    }
  }
}
```
1. Reference links: bool, prefix, exists, match, range, terms
2. Page path for bool: Query DSL => Compound queries => Boolean
3. Page path for prefix: Query DSL => Term-level queries => Prefix
4. Page path for exists: Query DSL => Term-level queries => Exists
5. Page path for match: Query DSL => Full text queries => Match
6. Page path for range: Query DSL => Term-level queries => Range
7. Page path for terms: Query DSL => Term-level queries => Terms
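
The prefix pitfall can be reproduced without a cluster. The sketch below assumes, as the solutions above suggest, that request is an analyzed text field with a request.keyword sub-field. It uses a deliberately crude stand-in for the analyzer (lowercase, split on "/"); the real standard analyzer tokenizes differently, but the point is the same: no individual token starts with "/", whereas the untouched keyword value does. The sample path and both helper names are made up for illustration.

```python
def crude_tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on '/'."""
    return [t for t in text.lower().split("/") if t]


def prefix_matches(indexed_terms, prefix):
    """A prefix query compares the unanalyzed prefix against each indexed term."""
    return any(term.startswith(prefix) for term in indexed_terms)


request = "/people/type:astronauts/name:john-smith"  # hypothetical sample value

text_terms = crude_tokens(request)  # roughly what a text field would index
keyword_terms = [request]           # a keyword field indexes the whole string as one term

print(prefix_matches(text_terms, "/people"))     # False
print(prefix_matches(keyword_terms, "/people"))  # True
print(prefix_matches(text_terms, "people"))      # True
```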

Question 2: Fuzzy matching

Run the next queries on the kibana_sample_data_flights index
1. Filter documents with either the OriginCityName or the DestCityName fields matching the string "Sydney"
2. As above, but allow inexact fuzzy matching, with a maximum allowed "Levenshtein Edit Distance" set to 2. Test that the query strings "Sydney", "Sidney" and "Sidnei" always return the same number of results

Question 2: Solutions

Filter documents whose OriginCityName or DestCityName matches "Sydney"

POST kibana_sample_data_flights/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "OriginCityName": "Sydney"
          }
        },
        {
          "match": {
            "DestCityName": "Sydney"
          }
        }
      ]
    }
  }
}

Add fuzzy matching and set the edit distance

Sydney

POST kibana_sample_data_flights/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "fuzzy": {
            "OriginCityName": {
              "value": "Sydney",
              "fuzziness": "2"
            }
          }
        },
        {
          "fuzzy": {
            "DestCityName": {
              "value": "Sydney",
              "fuzziness": "2"
            }
          }
        }
      ]
    }
  }
}

Result count

{
  "took" : 200,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 405,
      "relation" : "eq"
    },
    "max_score" : 8.344088,
    "hits" : [
    ]
  }
}

Sidney

POST kibana_sample_data_flights/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "fuzzy": {
            "OriginCityName": {
              "value": "Sidney",
              "fuzziness": "2"
            }
          }
        },
        {
          "fuzzy": {
            "DestCityName": {
              "value": "Sidney",
              "fuzziness": "2"
            }
          }
        }
      ]
    }
  }
}

Result count

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 405,
      "relation" : "eq"
    },
    "max_score" : 6.9534063,
    "hits" : [
    ]
  }
}

Sidnei

POST kibana_sample_data_flights/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "fuzzy": {
            "OriginCityName": {
              "value": "Sidnei",
              "fuzziness": "2"
            }
          }
        },
        {
          "fuzzy": {
            "DestCityName": {
              "value": "Sidnei",
              "fuzziness": "2"
            }
          }
        }
      ]
    }
  }
}

Result count

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 405,
      "relation" : "eq"
    },
    "max_score" : 5.5627246,
    "hits" : [
    ]
  }
}

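Question 2: Notes on the solutions

This question mainly tests the use of fuzzy.
- With fuzzy, Elasticsearch generates variations of the search term within the allowed edit distance, so that likely misspellings can still match (similar to spelling correction).
- fuzzy lets you limit the edit distance and how far the term may be rewritten (for example through prefix_length or max_expansions), which balances recall against precision.
1. Reference link
2. Page path: Query DSL => Term-level queries => Fuzzy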
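
To see why all three query strings reach the same documents with a fuzziness of 2, it helps to compute the edit distances by hand. The sketch below is a textbook Levenshtein implementation in Python, written for illustration; it is not how Elasticsearch computes matches internally. By default the fuzzy query also counts a transposition of two adjacent characters as a single edit, which makes no difference for these three strings.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


for query in ["Sydney", "Sidney", "Sidnei"]:
    print(query, levenshtein(query, "Sydney"))
# Sydney 0
# Sidney 1
# Sidnei 2
```

All three distances are at most 2, so each query string can still reach the indexed value "Sydney". The decreasing max_score values in the responses above are consistent with the larger distances scoring lower.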