[ECE] Mock Exam 1

This post walks through a series of ECE-style index and query tasks: reindexing task2 into new_task2 so that a match query for "the" no longer returns data; rebuilding task3 with an added fieldg whose value is the concatenation of the other fields; aggregating earthquake data by month; snapshots; cross-cluster search; multi_match queries over several fields; and aggregating on runtime fields. It ends with a flight-delay analysis showing how to find, per month, the carrier and destination country with the longest average delay.
  1. There is an index task2 with a field field2; a match query for "the" currently returns many documents. Reindex task2 into a new index named new_task2 so that the same match query for "the" returns no documents.

Text analysis› Token filter reference› stop

DELETE /task2
DELETE /new_task2
PUT task2
{
  "settings": {
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "field2":{
        "type": "text"
      }
    }
  }
}

PUT task2/_doc/1
{
  "field2":"the school"
}

PUT /new_task2
{
  "settings": {
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "stop"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "field2": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

POST /_reindex
{
  "source": {
    "index": "task2"
  },
  "dest": {
    "index": "new_task2"
  }
}

GET /new_task2/_search
{
  "query": {
    "match": {
      "field2": "the"
    }
  }
}
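
To confirm that the rebuilt index really drops "the" at index time, the custom analyzer can be checked with the _analyze API (a quick sanity check, not part of the task):

GET /new_task2/_analyze
{
  "analyzer": "my_analyzer",
  "text": "the school"
}
// Expected: only the token "school" is returned, because the stop filter removes "the".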
  2. There is an index task3 with fields fielda, fieldb, fieldc, and fielde. Reindex task3 into a new index with an additional field fieldg, whose value is the concatenation of the values of fielda, fieldb, fieldc, and fielde.

Ingest pipelines

Ingest pipelines› Ingest processor reference

DELETE task3
PUT task3
{
  "mappings": {
    "properties": {
      "fielda":{
        "type": "keyword"
      },
      "fieldb":{
        "type": "keyword"
      },
      "fieldc":{
        "type": "keyword"
      },
      "fielde":{
        "type": "keyword"
      }
    }
  }
}

POST task3/_doc/1
{
  "fielda":"aa",
  "fieldb":"bb",
  "fieldc":"cc",
  "fielde":"dd"
}
// _simulate can be used to test a pipeline before it is applied
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "lowercase": {
          "field": "my-keyword-field"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "my-keyword-field": "FOO"
      }
    },
    {
      "_source": {
        "my-keyword-field": "BAR"
      }
    }
  ]
}
PUT task3_new
{
  "mappings": {
    "properties": {
      "fielda":{
        "type": "keyword"
      },
      "fieldb":{
        "type": "keyword"
      },
      "fieldc":{
        "type": "keyword"
      },
      "fielde":{
        "type": "keyword"
      },
      "fieldg":{
        "type": "keyword"
      }
    }
  }
}

PUT _ingest/pipeline/my_exam1_pipeline
{
  "processors": [
    {
      "script": {
        "source": "ctx.fieldg = ctx.fielda + ctx.fieldb + ctx.fieldc + ctx.fielde"
      }
    }
  ]
}
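
Before reindexing, the stored pipeline can also be tested directly with the per-pipeline _simulate endpoint, reusing the sample document above (a quick check, not required by the task). An alternative to the script processor would be a set processor whose value is the Mustache template "{{fielda}}{{fieldb}}{{fieldc}}{{fielde}}".

POST _ingest/pipeline/my_exam1_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "fielda": "aa",
        "fieldb": "bb",
        "fieldc": "cc",
        "fielde": "dd"
      }
    }
  ]
}
// The response should show fieldg with the value "aabbccdd".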

POST /_reindex
{
  "source": {
    "index": "task3"
  },
  "dest": {
    "index": "task3_new",
    "pipeline": "my_exam1_pipeline"
  }
}

GET task3_new/_search
  3. Earthquake index: keep only data from 2012 (the exam's date format was dd/MM/yyyyTHH:mm:ss; the mock data below uses yyyy-MM-dd HH:mm:ss), bucket it by month, and compute the maximum magnitude and maximum depth within each bucket.

Aggregations› Bucket aggregations

DELETE /earthquakes2
PUT earthquakes2
{
  "settings": {
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "timestamp":{
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "magnitude":{
        "type": "float"
      },
	  "type":{
	    "type":"integer"
	  },
	  "depth":{
	    "type":"float"
	  }
    }
  }
}

POST earthquakes2/_bulk
{"index":{"_id":1}}
{"timestamp":"2012-01-01 12:12:12", "magnitude":4.56, "type":1, "depth":10}
{"index":{"_id":2}}
{"timestamp":"2012-01-01 15:12:12", "magnitude":6.46, "type":2, "depth":11}
{"index":{"_id":3}}
{"timestamp":"2012-02-02 13:12:12", "magnitude":4, "type":2, "depth":5}
{"index":{"_id":4}}
{"timestamp":"2012-03-02 13:12:12", "magnitude":6, "type":3, "depth":8}
{"index":{"_id":5}}
{"timestamp":"1967-03-02 13:12:12", "magnitude":6, "type":2, "depth":6}

POST /earthquakes2/_search
{
  "size": 0,
  "aggs": {
    "my_filter": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "2012-01-01 00:00:00",
            "lte": "2013-01-01 00:00:00"
          }
        }
      },
      "aggs": {
        "bucket_month": {
          "date_histogram": {
            "field": "timestamp",
            "calendar_interval": "month"
          },
          "aggs": {
            "max_magnitude": {
              "max": {
                "field": "magnitude"
              }
            },
            "max_depth":{
              "max": {
                "field": "depth"
              }
            }
          }
        }
      }
    }
  }
}
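
An equivalent approach is to filter the 2012 documents in the query section and keep only the date_histogram under aggs (a sketch using the same index and mapping as above):

POST /earthquakes2/_search
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": {
        "gte": "2012-01-01 00:00:00",
        "lt": "2013-01-01 00:00:00"
      }
    }
  },
  "aggs": {
    "bucket_month": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_magnitude": {
          "max": {
            "field": "magnitude"
          }
        },
        "max_depth": {
          "max": {
            "field": "depth"
          }
        }
      }
    }
  }
}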
  4. Register a snapshot repository, then create a snapshot of a specified index in that repository.

Snapshot and restore› Register a snapshot repository

PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/usr/share/elasticsearch/snapshot",
    "max_snapshot_bytes_per_sec": "20mb",
    "max_restore_bytes_per_sec": "20mb"
  }
}

// Snapshot the index
PUT /_snapshot/my_backup/snapshot_test_2023-01-03
{
  "indices": ["test"],
  "ignore_unavailable": true,
  "include_global_state": false
}
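
Note that for an fs repository the location must sit under a path listed in path.repo in elasticsearch.yml, otherwise registering the repository fails. Appending ?wait_for_completion=true to the snapshot PUT makes the call block until the snapshot finishes. The repository and snapshot can then be verified (assuming the names used above):

GET /_snapshot/my_backup
GET /_snapshot/my_backup/snapshot_test_2023-01-03
GET /_snapshot/my_backup/snapshot_test_2023-01-03/_status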

  5. Cross-cluster search.
    1) Configure the remote cluster settings.
    2) Query the remote data as clustername:indexname.

Set up Elasticsearch› Remote clusters

PUT /_cluster/settings
{
  "persistent": {
    "cluster": {
      "remote":{
        "cluster_one":{
          "seeds":[
            "192.168.0.11:9300"
            ]
        }
      }
    }
  }
}

POST /cluster_one:employees/_search
{
  "query": {
    "match_all": {}
  }
}
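
The connection can be verified with the remote info API, and local and remote indices can be searched in one request (a sketch assuming a local employees index also exists):

GET /_remote/info

GET /employees,cluster_one:employees/_search
{
  "query": {
    "match_all": {}
  }
}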
  6. multi_match query. Search for "fire" in fields a, b, c, and d, with a boost of 2 on field d; the final score should be the sum of the scores of all matching fields.

Query DSL› Full text queries

POST task5/_bulk
{"index":{"_id":1}}
{"a":"fire", "b":"fired", "c":"fox", "d":"box"}
POST task5/_search
{
  "query": {
    "multi_match": {
      "query": "fire",
      "type": "most_fields", 
      "fields": ["a","b","c","d^2"]
    }
  }
}
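
The most_fields type is what makes the final score the sum of the per-field scores; the default best_fields type would instead take the score of the single best-matching field. To inspect the boosted contribution of field d, the same query can be re-run with explain enabled (a verification step, not part of the task):

POST task5/_search
{
  "explain": true,
  "query": {
    "multi_match": {
      "query": "fire",
      "type": "most_fields",
      "fields": ["a","b","c","d^2"]
    }
  }
}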
  7. Runtime fields: aggregate on a runtime field, following the documentation.

    In the index task6, create a runtime field whose value is A - B, where A and B are existing fields. Then create a range aggregation with three buckets: less than 0, 0 to 100, and 100 and above, returning 0 documents (size: 0).

Mapping› Runtime fields

PUT task6/_bulk
{"index":{"_id":1}}
{"A":100, "B":2}
{"index":{"_id":2}}
{"A":120, "B":2}
{"index":{"_id":3}}
{"A":120, "B":25}
{"index":{"_id":4}}
{"A":21, "B":25}

PUT task6/_mapping
{
  "runtime":{
    "C":{
      "type":"long",
      "script":{
        "source":"emit(doc['A'].value - doc['B'].value)"
      }
    }
  }
}


POST task6/_search
{
  "size": 0,
  "aggs": {
    "range_c": {
      "range": {
        "field": "C",
        "ranges": [
          {
            "to":0
          },
          {
            "from": 0,
            "to": 100
          },
          {
            "from": 100
          }
        ]
      }
    }
  }
}
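
The runtime field can also be defined only for the duration of a single search, without changing the mapping, by using runtime_mappings in the request body (an equivalent sketch):

POST task6/_search
{
  "size": 0,
  "runtime_mappings": {
    "C": {
      "type": "long",
      "script": {
        "source": "emit(doc['A'].value - doc['B'].value)"
      }
    }
  },
  "aggs": {
    "range_c": {
      "range": {
        "field": "C",
        "ranges": [
          { "to": 0 },
          { "from": 0, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}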
  8. A mix of search templates, querying, highlighting, and sorting. Create a search template that satisfies the following:
    For field A, search using the parameter search_string.
    In the results, highlight the content of field A, wrapping the matches in pre/post tags (the original tags were lost when the post was rendered; the template below uses <em> and </em>).
    Sort the results by field B.
    Search the test index with search_string set to "test".

Search your data

DELETE test_search_temp
PUT test_search_temp
{
  "settings":{
    "number_of_replicas":0,
    "number_of_shards":1
  },
  "mappings":{
    "properties": {
      "A":{
        "type":"text"
      },
      "B":{
        "type":"integer"
      }
    }
  }
}

POST test_search_temp/_bulk
{"index":{"_id":1}}
{"A":"I love test", "B":1}
{"index":{"_id":2}}
{"A":"I hate test", "B":2}


PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "A": "{{search_string}}"
        }
      },
      "highlight": {
        "fields": {
          "A": {
            "pre_tags": [
              "<em>"
            ],
            "post_tags": [
              "</em>"
            ]
          }
        }
      },
      "sort": [
        {
          "B": {
            "order": "desc"
          }
        }
      ]
    }
  }
}

GET _scripts/my-search-template
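
Before executing the template, it can be previewed with the render API to check that the parameter is substituted as expected:

POST _render/template
{
  "id": "my-search-template",
  "params": {
    "search_string": "test"
  }
}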


GET test_search_temp/_search/template
{
  "id": "my-search-template",
  "params": {
    "search_string": "test"
  }
}

  9. Given monthly flight data, find the name of the carrier with the longest average delay in each month.
    ● By destination country, compute the average ticket price and find the country with the highest average. (The exam data and exact wording are not available, so a similar exercise is done against the Kibana sample flight data.)
    ● Find the destination country with the longest average delay in each month.
    ● Approach: first bucket by month (timestamp), then bucket by destination country (DestCountry), then compute the average delay per destination country; finally, because max_bucket is a sibling pipeline aggregation, place it next to the bucket_DestCountry buckets so that, within each month, it picks out the country with the longest average delay (see the sketch after the query below).
    ● Result: each month has one destination country with the maximum average delay.

GET /kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "bucket_DestCountry": {
      "terms": {
        "field": "DestCountry",
        "size": 10
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "AvgTicketPrice"
          }
        }
      }
    },
    "max_price_country":{
      "max_bucket": {
        "buckets_path": "bucket_DestCountry>avg_price"
      }
    }
  }
}
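
The per-month version described in the approach above follows the same pattern, with max_bucket placed inside each month bucket as a sibling of the country buckets. A sketch against the Kibana sample flight data, assuming the FlightDelayMin field holds the delay in minutes (swap DestCountry for Carrier to answer the original carrier question):

GET /kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "bucket_month": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "bucket_DestCountry": {
          "terms": {
            "field": "DestCountry",
            "size": 10
          },
          "aggs": {
            "avg_delay": {
              "avg": {
                "field": "FlightDelayMin"
              }
            }
          }
        },
        "max_delay_country": {
          "max_bucket": {
            "buckets_path": "bucket_DestCountry>avg_delay"
          }
        }
      }
    }
  }
}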
