Elasticsearch Middleware Detailed Tutorial: Advanced

Mredust

Last modified: 2024-05-15 21:43:30

Category: Notes. Tags: elasticsearch, search engine, java, spring boot, docker, middleware, full-text search

First published: 2024-05-15 20:51:13

Article link: https://blog.csdn.net/Mredust/article/details/138923062

Contents

Data aggregation
Autocomplete
Data synchronization

Elasticsearch Middleware Detailed Tutorial: Basics

Elasticsearch Middleware Detailed Tutorial: Intermediate

Elasticsearch Middleware Detailed Tutorial: Hands-on Practice

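Data aggregation

Bucket aggregation: similar to GROUP BY in MySQL
Metric aggregation
- max
- min
- avg
- stats: computes max, min, avg, sum, and so on in one pass
Pipeline aggregation: aggregates again on top of the results of the two types above

Bucket aggregation

query: narrows the set of documents that the aggregation runs over; optional.
size: setting it to 0 leaves the document hits out of the response so that only the aggregation results are returned; optional.
order: buckets are sorted by _count in descending order by default; this can be customized.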

GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "lte": 200
      }
    }
  }, 
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 10,
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}
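
To make the shape of the result concrete, here is a minimal Python sketch of reading the buckets that a terms aggregation returns. The helper name `bucket_counts` and the sample response are assumptions made for illustration (the brand names and counts are invented); only the `aggregations` / `buckets` / `key` / `doc_count` layout comes from Elasticsearch itself.

```python
def bucket_counts(response, agg_name="brandAgg"):
    """Return (key, doc_count) pairs from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Hypothetical response, trimmed to the fields used here. The brand names
# and counts are invented for illustration only.
sample_response = {
    "hits": {"hits": []},  # empty because the request sets "size": 0
    "aggregations": {
        "brandAgg": {
            "buckets": [
                {"key": "BrandA", "doc_count": 1},
                {"key": "BrandB", "doc_count": 3},
            ]
        }
    },
}

for brand, count in bucket_counts(sample_response):
    print(brand, count)
```

The buckets appear in ascending `_count` order here because the request above asks for `"_count": "asc"`; without `order` the largest bucket would come first.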

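Metric aggregation

stats can be swapped for another metric aggregation type such as max, min, or avg, in which case the result is a single value, much like MAX, MIN, and so on in MySQL.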

GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "scoreAgg": {
      "stats": {
        "field": "score"
      }
    }
  }
}
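
The difference between stats and a single-value metric can be sketched in Python as follows. The function names, the default aggregation name, and both sample responses are assumptions for illustration, and the numbers are invented; the sketch only relies on stats returning count, min, max, avg, and sum, while a single metric returns one `value`.

```python
SINGLE_VALUE_METRICS = {"max", "min", "avg", "sum"}


def build_metric_request(metric, field, name="scoreAgg"):
    """Build a request body that runs one metric aggregation on a field."""
    if metric != "stats" and metric not in SINGLE_VALUE_METRICS:
        raise ValueError(f"unsupported metric in this sketch: {metric}")
    return {"size": 0, "aggs": {name: {metric: {"field": field}}}}


def read_metric(response, metric, name="scoreAgg"):
    """Return a dict for stats, or the single number for other metrics."""
    result = response["aggregations"][name]
    if metric == "stats":
        return {key: result[key] for key in ("count", "min", "max", "avg", "sum")}
    return result["value"]


# Hypothetical responses with invented numbers.
stats_response = {
    "aggregations": {
        "scoreAgg": {"count": 3, "min": 40.0, "max": 48.0, "avg": 44.0, "sum": 132.0}
    }
}
avg_response = {"aggregations": {"scoreAgg": {"value": 44.0}}}

print(build_metric_request("stats", "score"))
print(read_metric(stats_response, "stats"))
print(read_metric(avg_response, "avg"))
```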

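Pipeline aggregation

Add a nested aggs block next to terms inside the bucket aggregation, so that a further aggregation runs within each bucket. (In Elasticsearch's own terminology, a metric nested inside a bucket aggregation like this is a sub-aggregation; the aggregations it calls pipeline aggregations, such as avg_bucket, read the output of other aggregations through buckets_path.)
To sort the buckets by a metric result, replace the key in order with the path to that metric, e.g. "scoreAgg.avg": "asc"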

GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "lte": 200
      }
    }
  }, 
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 10,
        "order": {
          "_count": "asc"
        }
      },
      "aggs": {
        "scoreAgg": {
          "stats": {
            "field": "score"
          }
        }
      }
    }
  }
}
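
The note about ordering by a metric can be sketched in Python like this. The builder and reader function names and the sample response are assumptions for illustration (brand names and scores are invented); the request mirrors the one above, except that `order` points at `scoreAgg.avg` instead of `_count`.

```python
def build_brand_score_request(max_price, order_path="scoreAgg.avg", direction="desc"):
    """Group by brand, compute score stats per brand, and sort by a metric path."""
    return {
        "query": {"range": {"price": {"lte": max_price}}},
        "size": 0,
        "aggs": {
            "brandAgg": {
                "terms": {
                    "field": "brand",
                    "size": 10,
                    # The path is <nested aggregation name>.<metric name>.
                    "order": {order_path: direction},
                },
                # The nested aggregation runs once inside every brand bucket.
                "aggs": {"scoreAgg": {"stats": {"field": "score"}}},
            }
        },
    }


def avg_score_per_brand(response):
    """Map each bucket key to the avg value of its nested stats aggregation."""
    buckets = response["aggregations"]["brandAgg"]["buckets"]
    return {bucket["key"]: bucket["scoreAgg"]["avg"] for bucket in buckets}


# Hypothetical response with invented values, trimmed to the fields used.
sample_response = {
    "aggregations": {
        "brandAgg": {
            "buckets": [
                {"key": "BrandA", "doc_count": 2, "scoreAgg": {"avg": 46.0}},
                {"key": "BrandB", "doc_count": 5, "scoreAgg": {"avg": 41.5}},
            ]
        }
    }
}

print(build_brand_score_request(200)["aggs"]["brandAgg"]["terms"]["order"])
print(avg_score_per_brand(sample_response))
```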

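Autocomplete

Concept introduced: analyzers

Autocomplete suggests completions for the keyword being typed. The analyzer that supports it has to be configured when the index is created.

Plugin analyzers
- IK analyzer (GitHub download link)
  - ik_smart
  - ik_max_word
- Pinyin analyzer (GitHub download link)
Custom analyzer configuration items
- tokenizer
- character filter
- filter
Usage

Plugins

IK analyzer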

ik_smart

POST /_analyze
{
  "text": ["不会编程的小白"],
  "analyzer": "ik_smart"
}

ik_max_word

POST /_analyze
{
  "text": ["不会编程的小白"],
  "analyzer": "ik_max_word"
}

Pinyin analyzer

POST /_analyze
{
  "text": ["不会编程的小白"],
  "analyzer": "pinyin"
}
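
For readers who want to call `_analyze` from a script rather than from the console, the three requests above could be prepared roughly as follows. This is a sketch under assumptions: `BASE_URL` points at a hypothetical local node, the helper name is invented, and the requests are only built here, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local Elasticsearch node


def build_analyze_request(text, analyzer, index=None):
    """Prepare a POST to _analyze; pass an index name to use an analyzer defined on that index."""
    path = f"/{index}/_analyze" if index else "/_analyze"
    payload = json.dumps({"text": [text], "analyzer": analyzer}, ensure_ascii=False)
    return Request(
        BASE_URL + path,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


requests_to_send = [
    build_analyze_request("不会编程的小白", name)
    for name in ("ik_smart", "ik_max_word", "pinyin")
]
for request in requests_to_send:
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

Passing one of these objects to `urllib.request.urlopen` would send it; the JSON that comes back lists the produced terms under `tokens`.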

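Custom analyzer

Template
tokenizer: ik_max_word, which splits the text at the finest granularity and so produces the largest number of terms
filter: specifies a filter named py that further processes the tokens
py:
- type: sets the filter type to pinyin.
- keep_full_pinyin: set to false, so the full pinyin of each individual character is not kept.
- keep_joined_full_pinyin: set to true, so the joined full pinyin is kept (for example, "你好" becomes "nihao").
- keep_original: set to true, so the original Chinese term is kept.
- limit_first_letter_length: limits the first-letter pinyin term to 16 characters.
- remove_duplicated_term: set to true, so duplicate terms are removed.
- none_chinese_pinyin_tokenize: set to false, so non-Chinese characters are not split into pinyin terms.

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false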
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "introduction": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_smart"
      }
    }
  }
}

Using the custom analyzer

POST /test/_analyze
{
  "text": ["不会编程的小白"],
  "analyzer": "my_analyzer"
}

Usage

Build the mapping with the field type set to completion.

"mappings": {
  "properties": {
    "introduction": {
      "type": "completion",
      "analyzer": "my_analyzer",
      "search_analyzer": "ik_smart"
    }
  }
}

Query
- Template

{
  "suggest": {
    "YOUR_SUGGESTION": {
      "text": "YOUR TEXT",
      "term": {
        "FIELD": "MESSAGE"
      }
    }
  }
}

{
  "suggest": {
    "title_suggest": {
      "text": "s", # keyword to complete
      "completion": {
        "field": "title", # field to search
        "skip_duplicates": true, # skip duplicate suggestions
        "size": 10
      }
    }
  }
}
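
The same completion query can be assembled from parameters, which is handy when the prefix comes from user input. A minimal Python sketch, where the function name and its defaults are assumptions for illustration:

```python
import json


def build_completion_suggest(prefix, field, name="title_suggest", size=10):
    """Build a suggest body that completes `prefix` against a completion field."""
    return {
        "suggest": {
            name: {
                "text": prefix,  # what the user has typed so far
                "completion": {
                    "field": field,  # must be mapped with type completion
                    "skip_duplicates": True,
                    "size": size,
                },
            }
        }
    }


body = build_completion_suggest("s", "title")
print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` turns Python's `True` into the JSON `true` that the query expects.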

Rebuild the mapping
Insert mock data
Query

GET /hotel/_search
{
  "suggest": {
    "suggestions": {
      "text": "h",
      "completion": {
        "field": "suggestion",
        "skip_duplicates": true,
        "size": 10
      }
    }
  }
}
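
Reading the suggestions out of the response might look like the following Python sketch. The helper name and the sample response are assumptions for illustration, and the suggested texts are invented; the sketch relies only on the `suggest` / named entry / `options` / `text` layout of a completion suggester response.

```python
def suggestion_texts(response, name="suggestions"):
    """Collect the text of every option returned for a named suggester."""
    texts = []
    for entry in response["suggest"][name]:
        for option in entry["options"]:
            texts.append(option["text"])
    return texts


# Hypothetical response for the prefix "h", trimmed to the fields used.
# The option texts are invented for illustration.
sample_response = {
    "suggest": {
        "suggestions": [
            {
                "text": "h",
                "offset": 0,
                "length": 1,
                "options": [
                    {"text": "hanting", "_score": 1.0},
                    {"text": "hilton", "_score": 1.0},
                ],
            }
        ]
    }
}

print(suggestion_texts(sample_response))
```

The `name` argument has to match the key used under `suggest` in the request, which is `suggestions` in the query above.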