DSL: Common Elasticsearch Query Syntax (CSDN blog)

Article link: https://blog.csdn.net/weixin_47478177/article/details/129732723

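This article shows how to create, view, and delete indices in Elasticsearch, and how to add, query, and update documents. It covers DSL queries in detail, including full-text, exact (term-level), geo, and compound queries, as well as sorting. It also covers data aggregation, custom analyzers, and an index for autocomplete.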


1. Create an index

PUT /323test
{
  "mappings": {
    "properties": {
      "info": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "email": {
        "type": "keyword",
        "index": false
      },
      "name": {
        "properties": {
          "firstName": {
            "type": "keyword"
          },
          "lastName": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
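Object fields such as name are not stored as a nested structure in the index; Elasticsearch indexes their sub-fields under dotted paths such as name.firstName. The Python sketch below (standard library only; flatten_properties is a hypothetical helper, not an Elasticsearch API) walks the mapping above and lists the resulting field paths and types.

```python
def flatten_properties(properties, prefix=""):
    """Return {dotted_path: type} for every leaf field in a mapping's properties."""
    fields = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: recurse into its sub-fields with a dotted prefix
            fields.update(flatten_properties(spec["properties"], path + "."))
        else:
            fields[path] = spec["type"]
    return fields


# The mapping from the PUT /323test request above, as a Python dict
mapping = {
    "info": {"type": "text", "analyzer": "ik_smart"},
    "email": {"type": "keyword", "index": False},
    "name": {
        "properties": {
            "firstName": {"type": "keyword"},
            "lastName": {"type": "keyword"},
        }
    },
}

print(flatten_properties(mapping))
# {'info': 'text', 'email': 'keyword', 'name.firstName': 'keyword', 'name.lastName': 'keyword'}
```

This is why a query on a sub-field uses the dotted path, for example a term query on name.firstName.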

2. View an index; delete an index

GET /323test

DELETE /323test

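3. Modify an index mapping (you can only add new fields; existing fields cannot be changed. To change one, delete the index and create it again)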

PUT /323test/_mapping
{
  "properties": {
    "age": {
      "type": "integer"
    }
  }
}
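The rule can be expressed as a check: a mapping update may introduce new fields but must leave every existing field's type unchanged. The sketch below is hypothetical (check_mapping_update is not an Elasticsearch API, and it only compares top-level field types); Elasticsearch performs its own, more detailed validation and rejects a conflicting update with an error.

```python
def check_mapping_update(existing, update):
    """Return the new field names; raise ValueError if an existing field's type would change."""
    new_fields = []
    for field, spec in update.items():
        if field not in existing:
            new_fields.append(field)  # adding a field is allowed
        elif existing[field].get("type") != spec.get("type"):
            raise ValueError(
                f"cannot change [{field}] from {existing[field].get('type')} to {spec.get('type')}"
            )
    return new_fields


existing = {"info": {"type": "text"}, "email": {"type": "keyword"}}
print(check_mapping_update(existing, {"age": {"type": "integer"}}))  # ['age']
try:
    check_mapping_update(existing, {"email": {"type": "text"}})
except ValueError as err:
    print(err)  # cannot change [email] from keyword to text
```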

4. Add a document

POST /323test/_doc/1
{
  "age": 24,
  "email": "717166157@qq.com",
  "info": "啊哈哈哈,开心开心,我爱es搜索引擎,我爱java,我爱喝咖啡",
  "name": {
    "firstName": "林灿",
    "lastName": "何"
  }
}

5. Get a document by ID; delete a document by ID

GET /323test/_doc/1
DELETE /323test/_doc/1

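6. Full document replacement (the existing document is deleted and the new one is indexed in its place; if the ID does not exist yet, the document is simply created)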

PUT /323test/_doc/1
{
  "age": 24,
  "email": "717166157@163.com",
  "info": "啊哈哈哈,开心开心,我爱es搜索引擎,我爱java,我爱喝咖啡",
  "name": {
    "firstName": "林灿",
    "lastName": "何"
  }
}

7. Partial update of document fields (fields that do not exist yet are added)

POST /323test/_update/1
{
  "doc": {
    "email": "717166157@qq.com"
  }
}
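The difference between sections 6 and 7 can be modelled in Python: a full replacement (PUT /323test/_doc/1) discards the old source, while _update merges the partial doc into it, overwriting matching fields, adding missing ones, and merging object fields recursively. The two functions below are an illustrative model written for this post, not Elasticsearch code.

```python
import copy


def full_replace(old_source, new_source):
    """PUT /index/_doc/id: the stored source becomes exactly the new body."""
    return copy.deepcopy(new_source)


def partial_update(old_source, partial_doc):
    """POST /index/_update/id with "doc": merge fields into the existing source."""
    merged = copy.deepcopy(old_source)
    for key, value in partial_doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Object fields are merged recursively rather than overwritten
            merged[key] = partial_update(merged[key], value)
        else:
            # Scalars overwrite existing values; unknown fields are added
            merged[key] = copy.deepcopy(value)
    return merged


doc = {"age": 24, "email": "717166157@163.com", "name": {"firstName": "林灿", "lastName": "何"}}
print(partial_update(doc, {"email": "717166157@qq.com"}))  # age and name are kept
print(full_replace(doc, {"email": "717166157@qq.com"}))    # {'email': '717166157@qq.com'}
```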

8. Example: create a hotel index

# Create the hotel mapping (index)
PUT /hotel
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "copy_to": "all"
      },
      "address": {
        "type": "keyword",
        "index": false
      },
      "price": {
        "type": "integer"
      },
      "score": {
        "type": "integer"
      },
      "brand": {
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        "type": "keyword"
      },
      "starName": {
        "type": "keyword"
      },
      "business": {
        "type": "keyword",
        "copy_to": "all"
      },
      "location": {
        "type": "geo_point"
      },
      "pic": {
        "type": "keyword",
        "index": false
      },
      "all": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}
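copy_to copies a field's value into another field at index time, so the all field above can be searched as one field that contains name, brand, and business. The copied values are indexed in all but are not added to _source. The sketch below models only that copying step; apply_copy_to is a hypothetical helper and the sample hotel is made up.

```python
def apply_copy_to(mapping_properties, source):
    """Return {target_field: [values]} for fields whose mapping declares copy_to."""
    copied = {}
    for field, spec in mapping_properties.items():
        targets = spec.get("copy_to", [])
        if isinstance(targets, str):
            targets = [targets]  # copy_to accepts a single field name or a list
        for target in targets:
            if field in source:
                copied.setdefault(target, []).append(source[field])
    return copied


properties = {
    "name": {"type": "text", "copy_to": "all"},
    "brand": {"type": "keyword", "copy_to": "all"},
    "business": {"type": "keyword", "copy_to": "all"},
    "city": {"type": "keyword"},
}
# Made-up sample document
hotel = {"name": "如家酒店(外滩店)", "brand": "如家", "business": "外滩地区", "city": "上海"}
print(apply_copy_to(properties, hotel))
# {'all': ['如家酒店(外滩店)', '如家', '外滩地区']}
```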

9. DSL query syntax

9.1. DSL query types and basic syntax

# View all documents (no body: returns the first 10 hits by default)
GET /hotel/_search

# Query all documents with match_all
GET /hotel/_search
{
  "query": {
    "match_all": {}
  }
}

9.2. Full-text queries (the user's input is analyzed into terms; commonly used for search boxes)

# match query (recommended)
GET /hotel/_search
{
  "query": {
    "match": {
      "all":"外滩 如家"
    }
  }
}

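# multi_match query (searches several fields; the more fields, the slower the query, so copying them into one field with copy_to and using match is usually preferred)
GET /hotel/_search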
{
  "query": {
    "multi_match": {
      "query":"外滩如家",
      "fields":["brand","business","name"]
    }
  }
}
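A match query analyzes the input into terms and, with the default operator OR, returns documents containing at least one of those terms; more matching terms generally mean a higher score. The Python sketch below models only the matching rule. A whitespace split stands in for the ik analyzer, which is an assumption: ik would segment 外滩如家 into words on its own, while this toy tokenizer needs the space.

```python
def analyze(text):
    """Toy analyzer: split on whitespace (stand-in for ik_smart / ik_max_word)."""
    return set(text.split())


def match(docs, field, query, operator="or"):
    """Return ids of docs whose analyzed field shares terms with the analyzed query."""
    query_terms = analyze(query)
    hits = []
    for doc_id, doc in docs.items():
        common = query_terms & analyze(doc.get(field, ""))
        if (operator == "or" and common) or (operator == "and" and common == query_terms):
            hits.append(doc_id)
    return hits


docs = {  # made-up sample data
    1: {"all": "如家 外滩"},
    2: {"all": "汉庭 外滩"},
    3: {"all": "如家 虹桥"},
    4: {"all": "7天 浦东"},
}
print(match(docs, "all", "外滩 如家"))         # [1, 2, 3]
print(match(docs, "all", "外滩 如家", "and"))  # [1]
```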

9.3. Term-level (exact) queries (usually used on keyword, numeric, date, and boolean fields, so the search input is not analyzed)

term: query by an exact term value

# term query
GET /hotel/_search
{
  "query": {
    "term": {
      "city":{
        "value":"上海"
      }
    }
  }
}

range: query by a range of values

# range query
GET /hotel/_search
{
  "query": {
    "range": {
      "price":{
        "gte":100,
        "lte":200
      }
    }
  }
}
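Term-level queries skip analysis: term compares the stored value exactly, and range with gte/lte keeps values within inclusive bounds (gt/lt exclude the bound). An in-memory sketch with made-up hotel data:

```python
def term(docs, field, value):
    """Exact, unanalyzed equality on a keyword-like field."""
    return [d["id"] for d in docs if d.get(field) == value]


def range_query(docs, field, gte=None, lte=None, gt=None, lt=None):
    """Keep docs whose numeric field lies within all of the given bounds."""
    def within(v):
        return (
            (gte is None or v >= gte)
            and (lte is None or v <= lte)
            and (gt is None or v > gt)
            and (lt is None or v < lt)
        )
    return [d["id"] for d in docs if field in d and within(d[field])]


hotels = [
    {"id": "1", "city": "上海", "price": 100},
    {"id": "2", "city": "上海市", "price": 250},
    {"id": "3", "city": "北京", "price": 200},
]
print(term(hotels, "city", "上海"))                    # ['1']  ("上海市" is a different term)
print(range_query(hotels, "price", gte=100, lte=200))  # ['1', '3']
```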

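9.4. Geo queries (query by latitude and longitude)

geo_distance: find all documents within a given distance of a center point (the string form "31.21,121.5" is "lat,lon")

GET /hotel/_search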
{
  "query": {
    "geo_distance": {
      "distance": "15km",
      "location": "31.21,121.5"
    }
  }
}
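geo_distance keeps documents whose location lies within the given distance of the center. A sketch using the haversine great-circle formula with a mean Earth radius of 6,371 km; this is an approximation, Elasticsearch does its own arc-distance computation, and results can differ slightly right at the boundary. The sample coordinates are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_distance(docs, center, distance_km):
    """Return ids of docs whose location is within distance_km of center (lat, lon)."""
    lat, lon = center
    return [
        d["id"] for d in docs
        if haversine_km(lat, lon, d["location"]["lat"], d["location"]["lon"]) <= distance_km
    ]


hotels = [  # invented coordinates
    {"id": "near", "location": {"lat": 31.25, "lon": 121.49}},
    {"id": "far", "location": {"lat": 31.03, "lon": 121.61}},
]
print(geo_distance(hotels, (31.21, 121.5), 15))  # ['near']
```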

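9.5. Compound queries

9.5.1. Modifying a document's relevance score

A function_score query lets you modify a document's relevance score (query score) and then sort by the new score. In the example below, hotels of the brand 如家 that match "外滩" have a weight of 10 added to their original score (boost_mode sum).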

GET /hotel/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "all": "外滩"
        }
      },
      "functions": [
        {
          "filter": {
            "term": {
              "brand": "如家"
            }
          },
          "weight": 10
        }
      ],
      "boost_mode": "sum"
    }
  }
}
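boost_mode decides how the function score is combined with the original query score; the default is multiply, and the example above uses sum. For a hotel that matches the brand filter, the single function above yields its weight, 10. The sketch covers only this combination step: combine_score is a hypothetical helper, and how Elasticsearch computes the function score itself (including score_mode across several functions) is not modelled.

```python
def combine_score(query_score, function_score, boost_mode="multiply"):
    """Combine a query score with a function score according to boost_mode."""
    modes = {
        "multiply": lambda q, f: q * f,
        "sum": lambda q, f: q + f,
        "replace": lambda q, f: f,
        "avg": lambda q, f: (q + f) / 2,
        "max": max,
        "min": min,
    }
    return modes[boost_mode](query_score, function_score)


# A 如家 hotel with a (made-up) query score of 2.5 and weight 10:
print(combine_score(2.5, 10, "sum"))      # 12.5
print(combine_score(2.5, 10))             # 25.0 (default multiply)
print(combine_score(2.5, 10, "replace"))  # 10
```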

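9.5.2. Boolean query (a combination of one or more query clauses)

Clauses can be combined in the following ways:

must: every clause must match; similar to AND

should: clauses may match; similar to OR

must_not: must not match; does not contribute to the score; similar to NOT

filter: must match; does not contribute to the score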
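These rules can be modelled in a few lines of Python: a document must satisfy every must and filter clause and no must_not clause; should clauses are optional when must or filter is present, otherwise at least one of them must match; only must and should add to the score. In this sketch each clause is a hypothetical Python function returning a score, or None when the document does not match (real scoring is BM25-based and not modelled), and the parameter is named filters to avoid shadowing Python's built-in filter.

```python
def bool_query(doc, must=(), should=(), must_not=(), filters=()):
    """Return the doc's score under bool semantics, or None if the doc does not match.

    Each clause is a function doc -> float (its score contribution) or None (no match).
    """
    score = 0.0
    for clause in must:
        s = clause(doc)
        if s is None:
            return None
        score += s  # must clauses count toward the score
    for clause in filters:
        if clause(doc) is None:
            return None  # filter must match but adds nothing to the score
    for clause in must_not:
        if clause(doc) is not None:
            return None  # any must_not match excludes the doc; no scoring
    should_scores = [s for s in (c(doc) for c in should) if s is not None]
    if should and not must and not filters and not should_scores:
        return None  # without must/filter, at least one should clause must match
    return score + sum(should_scores)


def name_contains(word):
    """Hypothetical clause: scores 1.0 when the name contains `word`."""
    return lambda d: 1.0 if word in d["name"] else None


def price_above(limit):
    """Hypothetical clause: matches (score 0.0) when price > limit."""
    return lambda d: 0.0 if d["price"] > limit else None


hotels = [  # made-up sample data
    {"name": "如家酒店", "price": 300},
    {"name": "如家精选", "price": 500},
    {"name": "汉庭酒店", "price": 200},
]
for h in hotels:
    print(h["name"], bool_query(h, must=[name_contains("如家")], must_not=[price_above(400)]))
# 如家酒店 1.0
# 如家精选 None
# 汉庭酒店 None
```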

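Requirement: find hotels whose name contains “如家”, whose price is no higher than 400, and that lie within 10 km of the coordinate 31.21,121.5.

Implementation with a bool query: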
GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "如家"
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "price": {
              "gt": 400
            }
          }
        }
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "10km",
            "location": {
              "lat": 31.21,
              "lon": 121.5
            }
          }
        }
      ]
    }
  }
}

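10. Sorting

Elasticsearch supports sorting search results; by default, hits are sorted by relevance score (_score). Sortable field types include keyword, numeric, geo_point, and date fields.

Example: sort hotels by user rating (score) in descending order, and hotels with the same rating by price in ascending order.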

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "score": "desc"
    },
    {
      "price": "asc"
    }
  ]
}
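Sort clauses are applied in order: a later clause only decides between hits that tie on all earlier ones. Python tuple keys give the same ordering; the hotel data below is made up.

```python
hotels = [
    {"id": "a", "score": 45, "price": 300},
    {"id": "b", "score": 48, "price": 500},
    {"id": "c", "score": 45, "price": 200},
]
# score descending (negate the number), then price ascending for equal scores
ordered = sorted(hotels, key=lambda h: (-h["score"], h["price"]))
print([h["id"] for h in ordered])  # ['b', 'c', 'a']
```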

Example: sort hotels by their distance from your location, in ascending order.

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 31.034661,
          "lon": 121.612282
        },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}
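_geo_distance sorting computes each hit's distance from the given point and orders by it; with "unit": "km", the value in each hit's sort array is that distance in kilometres. A self-contained sketch using the haversine formula (approximate, mean Earth radius 6,371 km; hotel coordinates invented):

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in kilometres (mean Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


me = (31.034661, 121.612282)
hotels = {  # invented coordinates (lat, lon)
    "h1": (31.21, 121.5),
    "h2": (31.04, 121.61),
    "h3": (31.10, 121.70),
}
# Attach the distance as the sort value, then order ascending
ranked = sorted(
    ((hid, round(haversine_km(*me, *loc), 2)) for hid, loc in hotels.items()),
    key=lambda pair: pair[1],
)
print([hid for hid, _ in ranked])  # ['h2', 'h3', 'h1']
```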

11. Pagination (from is the offset of the first hit, size the number of hits to return)

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": "asc"
    }
  ],
  "from": 30,
  "size": 20
}
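from is zero-based, so the request above returns hits 31 to 50. For page-numbered UIs, from is usually (page - 1) * size. Deep paging is limited: from + size may not exceed the index.max_result_window setting, which defaults to 10,000; search_after is the usual alternative beyond that. page_params below is a hypothetical helper:

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_params(page, size):
    """Translate a 1-based page number into the from/size fields of a search body."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    start = (page - 1) * size
    if start + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window; use search_after")
    return {"from": start, "size": size}


print(page_params(1, 20))  # {'from': 0, 'size': 20}
print(page_params(3, 20))  # {'from': 40, 'size': 20}
```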

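12. Highlighting

By default, Elasticsearch highlights a field only if it is the field being searched (require_field_match defaults to true). This query searches the all field but highlights name and brand, so require_field_match must be set to false.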

GET /hotel/_search
{
  "query": {
    "match": {
      "all": "如家"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "require_field_match": false,
        "pre_tags": "<em>",
        "post_tags": "</em>"
      },
      "brand": {
        "require_field_match": false,
        "pre_tags": "<em>",
        "post_tags": "</em>"
      }
    }
  }
}
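Highlighted fragments come back separately from _source, in each hit's highlight object as a list of fragments per field, so a client typically swaps them into the source before display. A sketch operating on a dict shaped like a search hit; the hit below is hand-written sample data, not real output, and apply_highlight is a hypothetical helper.

```python
def apply_highlight(hit):
    """Return a copy of hit["_source"] with highlighted fields replaced by their first fragment."""
    source = dict(hit["_source"])
    for field, fragments in hit.get("highlight", {}).items():
        if fragments:
            source[field] = fragments[0]
    return source


hit = {
    "_source": {"name": "如家酒店(外滩店)", "brand": "如家", "price": 300},
    "highlight": {"name": ["<em>如家</em>酒店(外滩店)"], "brand": ["<em>如家</em>"]},
}
print(apply_highlight(hit))
```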

13. Combined syntax for processing search results (query, pagination, sorting, and highlighting in one request)

GET /hotel/_search
{
  "query": {
    "match": {
      "all": "如家"
    }
  },
  "from": 0,
  "size": 20,
  "sort": [
    {
      "price": "asc"
    },
    {
      "_geo_distance": {
        "location": {
          "lat": 31.034661,
          "lon": 121.612282
        },
        "order": "asc",
        "unit": "km"
      }
    }
  ],
  "highlight": {
    "fields": {
      "name": {
        "require_field_match": false,
        "pre_tags": "<em>",
        "post_tags": "</em>"
      },
      "brand": {
        "require_field_match": false,
        "pre_tags": "<em>",
        "post_tags": "</em>"
      }
    }
  }
}

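14. Data aggregation: bucket aggregations in DSL

14.1. Basic terms aggregation ("size": 0 returns no document hits, only the aggregation result; the size inside terms caps the number of buckets)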

GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 10
      }
    }
  }
}
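A terms aggregation groups documents by the exact value of a keyword field and counts each group; by default buckets are ordered by document count, descending, and size caps how many are returned. The in-memory sketch below mimics that output shape with Counter. It breaks ties alphabetically by key and ignores the approximations Elasticsearch makes when counting across shards; the brand data is made up.

```python
from collections import Counter


def terms_agg(docs, field, size=10, order="desc"):
    """Return [{'key': value, 'doc_count': n}, ...] like a terms aggregation's buckets."""
    counts = Counter(d[field] for d in docs if field in d)
    sign = -1 if order == "desc" else 1
    ranked = sorted(counts.items(), key=lambda kv: (sign * kv[1], kv[0]))
    return [{"key": k, "doc_count": n} for k, n in ranked[:size]]


hotels = [{"brand": b} for b in ["如家", "汉庭", "如家", "7天", "如家", "汉庭"]]
print(terms_agg(hotels, "brand", size=2))
# [{'key': '如家', 'doc_count': 3}, {'key': '汉庭', 'doc_count': 2}]
```

Limiting the aggregation scope with a query corresponds to filtering docs before calling terms_agg.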

14.2. Custom bucket order (by default, buckets are sorted by document count in descending order)

GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 10,
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}

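14.3. Limiting the aggregation scope

By default, a bucket aggregation runs over every document in the index. With hundreds of millions of documents this consumes a great deal of memory, so you can limit which documents are aggregated by adding a query condition: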

GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 100,
        "lte": 300
      }
    }
  }, 
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 10
      }
    }
  }
}

15. Data aggregation: metric aggregations in DSL

# Nested metric aggregation: stats on score within each brand bucket
GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 10
      },
      "aggs": {
        "scoreAgg": {
          "stats": {
            "field": "score"
          }
        }
      }
    }
  }
}

Example: sort brands by the average score of their hotels, in descending order

GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 10,
        "order": {
          "scoreAgg.avg": "desc"
        }
      },
      "aggs": {
        "scoreAgg": {
          "stats": {
            "field": "score"
          }
        }
      }
    }
  }
}
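The nested stats aggregation computes count, min, max, avg, and sum of score inside each brand bucket, and the order clause sorts buckets by the avg value of that sub-aggregation. An in-memory sketch of the same grouping and ordering (brand_stats is a hypothetical helper; the hotel scores are made up):

```python
from collections import defaultdict


def brand_stats(docs, order_by="avg"):
    """Group scores by brand, compute stats per bucket, sort buckets by one stat descending."""
    groups = defaultdict(list)
    for d in docs:
        groups[d["brand"]].append(d["score"])
    buckets = []
    for brand, scores in groups.items():
        stats = {
            "count": len(scores),
            "min": min(scores),
            "max": max(scores),
            "avg": sum(scores) / len(scores),
            "sum": sum(scores),
        }
        buckets.append({"key": brand, "doc_count": len(scores), "stats": stats})
    return sorted(buckets, key=lambda b: b["stats"][order_by], reverse=True)


hotels = [
    {"brand": "如家", "score": 40},
    {"brand": "如家", "score": 44},
    {"brand": "汉庭", "score": 45},
    {"brand": "7天", "score": 38},
]
for b in brand_stats(hotels):
    print(b["key"], b["stats"]["avg"])
# 汉庭 45.0
# 如家 42.0
# 7天 38.0
```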

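16. Custom analyzers

When creating an index, you can configure a custom analyzer in settings:

A pinyin analyzer suits index time (building the inverted index) but should not be used at search time. Search with ik_smart or another analyzer instead: if the custom analyzer were also applied to the query, a search for 狮子 (lion) would also return 虱子 (louse), because both words have the same pinyin, shizi.

"analyzer": "my_analyzer" // use the custom analyzer at index time

"search_analyzer": "ik_smart" // use the ik_smart analyzer at search time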

// This custom analyzer exists only in this index
PUT /test
{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "ik_max_word",
                    "filter": "py"
                }
            },
            "filter": {
                "py": {
                    "type": "pinyin",
                    "keep_full_pinyin": false,
                    "keep_joined_full_pinyin": true,
                    "keep_original": true,
                    "limit_first_letter_length": 16,
                    "remove_duplicated_term": true,
                    "none_chinese_pinyin_tokenize": false
                }
            }
        }
    },
    "mappings":{
        "properties":{
            "name":{
                "type":"text",
                // Use the custom analyzer at index time
                "analyzer":"my_analyzer",
                // Use ik_smart at search time (with the custom analyzer, a search for 狮子 would also return 虱子)
                "search_analyzer":"ik_smart"
            }
        }
    }
}
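A toy illustration of the 狮子/虱子 problem: both words share the pinyin shizi, so a pinyin-analyzed query produces a term that matches both documents. The three-entry PINYIN table is a hand-written assumption standing in for the pinyin filter, which emits more (first letters, per-character pinyin, and so on); the index and search functions are hypothetical.

```python
# Hypothetical, hand-written pinyin table (tones dropped) standing in for the pinyin filter
PINYIN = {"狮": "shi", "虱": "shi", "子": "zi"}


def to_joined_pinyin(word):
    """Convert a word to joined full pinyin, e.g. 狮子 -> shizi."""
    return "".join(PINYIN[ch] for ch in word)


def pinyin_terms(word):
    """Terms a pinyin analyzer might emit: the original word (keep_original) plus its joined pinyin."""
    return {word, to_joined_pinyin(word)}


# Index time: both documents are analyzed with the pinyin-based analyzer
index = {word: pinyin_terms(word) for word in ["狮子", "虱子"]}


def search(query_terms):
    """Return indexed words sharing at least one term with the query."""
    return [word for word, terms in index.items() if terms & query_terms]


print(search(pinyin_terms("狮子")))  # pinyin at search time: ['狮子', '虱子']
print(search({"狮子"}))             # query kept as 狮子 (ik_smart-style): ['狮子']
print(search({"shizi"}))            # typing pinyin still finds both: ['狮子', '虱子']
```

The last line shows why pinyin is still worth indexing: a user typing shizi gets pinyin matches, while a user typing Chinese characters gets only the exact word.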

17. Autocomplete index

// Create an index with a completion field for autocomplete
PUT /test2
{
  "mappings": {
    "properties": {
      "title":{
        "type": "completion"
      }
    }
  }
}

// Autocomplete query (completion suggester)
POST /test2/_search
{
  "suggest": {
    "title_suggest": {
      "text": "s", // prefix typed by the user
      "completion": {
        "field": "title", // completion field
        "skip_duplicates": true, // skip duplicate suggestions
        "size": 10 // return the top 10 suggestions
      }
    }
  }
}
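The completion suggester returns indexed inputs that start with the typed prefix. Elasticsearch builds an in-memory FST for this and ranks suggestions by each input's weight; the sketch below swaps in a sorted list with binary search and alphabetical order, so it shows only the prefix lookup, skip_duplicates, and size. Lowercasing approximates the completion field's default simple analyzer; the titles are made up and both functions are hypothetical.

```python
import bisect


def build_index(inputs):
    """Sort (normalized, original) pairs; lowercasing approximates the default simple analyzer."""
    return sorted((text.lower(), text) for text in inputs)


def complete(index, prefix, size=10, skip_duplicates=True):
    """Return up to `size` original inputs whose normalized form starts with `prefix`."""
    prefix = prefix.lower()
    start = bisect.bisect_left(index, (prefix,))  # first entry >= prefix
    results = []
    for key, original in index[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later entry can share the prefix
        if skip_duplicates and original in results:
            continue
        results.append(original)
        if len(results) == size:
            break
    return results


titles = ["Sony", "Switch", "Switch", "samsung", "Apple", "Huawei"]  # made-up inputs
index = build_index(titles)
print(complete(index, "s"))  # ['samsung', 'Sony', 'Switch']
```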

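18. Example: create the hotel index (with pinyin analysis and autocomplete)

This is the hotel index from section 8 again, now with custom analyzers and a suggestion field for autocomplete. If the earlier hotel index already exists, delete it first (DELETE /hotel), since an index with the same name cannot be created twice.

// Hotel data index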
PUT /hotel
{
  "settings": {
    "analysis": {
      "analyzer": {
        "text_anlyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        },
        "completion_analyzer": {
          "tokenizer": "keyword",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id":{
        "type": "keyword"
      },
      "name":{
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart",
        "copy_to": "all"
      },
      "address":{
        "type": "keyword",
        "index": false
      },
      "price":{
        "type": "integer"
      },
      "score":{
        "type": "integer"
      },
      "brand":{
        "type": "keyword",
        "copy_to": "all"
      },
      "city":{
        "type": "keyword"
      },
      "starName":{
        "type": "keyword"
      },
      "business":{
        "type": "keyword",
        "copy_to": "all"
      },
      "location":{
        "type": "geo_point"
      },
      "pic":{
        "type": "keyword",
        "index": false
      },
      "all":{
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart"
      },
      "suggestion":{
        "type": "completion",
        "analyzer": "completion_analyzer"
      }
    }
  }
}