Elasticsearch + Kibana (and IK)

咫尺的梦想007

Last modified: 2024-04-11 00:05:30

Tags: elasticsearch, search engine, big data

First published: 2024-04-06 15:14:29

Article link: https://blog.csdn.net/ylmiraclelife/article/details/137184813

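This article covers Elasticsearch index management in detail: mapping settings, the IK analyzer (e.g. ik_max_word), document operations (add, delete, update), and the various query types, including full-text search, exact queries, geo queries, and aggregations. It also covers using the RestClient and processing search results.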

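Notes:

When searching, Elasticsearch first analyzes (tokenizes) the search text and then searches by the resulting terms.
Elasticsearch is document-oriented: each document is serialized to JSON and stored in Elasticsearch.
The workflow is: create an index, constrain its documents with a mapping, then add documents; one index holds many documents.
MySQL vs. Elasticsearch: MySQL has transactions to keep data safe, while Elasticsearch is built for search.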
Docs

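1. Index

1. Mapping:

A mapping defines the constraints on the documents in an index. Common mapping properties:

type: the data type of the field
String:
1. text: analyzed (tokenized) text
2. keyword: exact value that is not tokenized (e.g. country, brand)
Numeric:
1. long, integer, short, byte, double, float
Boolean: boolean
Date: date
Object: object
analyzer:
1. The analyzer to use; only text fields are analyzed (ik_smart, ik_max_word)
index:
1. Whether to index the field (inverted index); defaults to true
properties:
1. The sub-fields of a field
Elasticsearch and Kibana listen on ports 9200 and 5601 respectively.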

2. IK analyzer

docker exec -it es ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.12.1/elasticsearch-analysis-ik-7.12.1.zip

 POST /_analyze
 {
   "text":"我是在太原的晴天欧里给特别优秀的人加油",
   "analyzer":"ik_max_word"
 }

The analyzer can be ik_smart (coarsest-grained split) or ik_max_word (finest-grained split).
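The same _analyze call can also be issued from Java. The sketch below uses only the JDK's built-in java.net.http classes (not the RestHighLevelClient covered later) and only builds the request without sending it. The class name, the helper names, and the hand-rolled JSON escaping are assumptions made for illustration; the host is the one used later in this article.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds (but does not send) a POST /_analyze request. Illustrative sketch only. */
final class AnalyzeRequestSketch {

    // Wraps a value in double quotes, escaping backslashes and double quotes.
    // Control characters are not handled; a real JSON library would be safer.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Produces {"text":"...","analyzer":"..."}
    static String analyzeBody(String text, String analyzer) {
        return "{\"text\":" + quote(text) + ",\"analyzer\":" + quote(analyzer) + "}";
    }

    // baseUrl is the Elasticsearch address, e.g. http://192.168.150.101:9200
    static HttpRequest analyzeRequest(String baseUrl, String text, String analyzer) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(analyzeBody(text, analyzer)))
                .build();
    }

    public static void main(String[] args) {
        String text = "我是在太原的晴天";
        HttpRequest request = analyzeRequest("http://192.168.150.101:9200", text, "ik_smart");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(analyzeBody(text, "ik_smart"));
    }
}
```

To actually send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and read the tokens from the JSON response.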

Extending the IK analyzer dictionary

Add the following to the IKAnalyzer.cfg.xml configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- Configure your own extension dictionary here, plus a stop-word dictionary below -->
        <entry key="ext_dict">ext.dic</entry>
        <entry key="ext_stopwords">stopwords.dic</entry>
</properties>

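Create a file named ext.dic in the IK analyzer's config directory, with one word per line. You can copy an existing dictionary file from the config directory and edit it. Restart Elasticsearch for the new dictionary to take effect.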

3. Index operations:

(1) Create an index

PUT /heima
{
  "mappings": {
    "properties": {
      "info":{
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "email":{
        "type": "keyword",
        "index": false
      }
    }
  }
}

index defaults to true; when set to false the field is not indexed and cannot be searched (useful for fields such as a password).

(2) Get and delete an index

GET /heima
DELETE /heima

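(3) Modify an index

Elasticsearch does not allow changing the mapping of an existing field (the request fails with a 400 error), but new fields can be added to the mapping:

PUT /{index_name}/_mapping
{
  "properties": {
    "new_field_name":{
      "type": "integer"
    }
  }
}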

4. Document operations

Add, delete, and get a document

These map to POST, DELETE, and GET on /heima/_doc/1 respectively.

POST /heima/_doc/1
{
    "info": "黑马程序员Java讲师",
    "email": "zy@itcast.cn",
    "name": {
        "firstName": "云",
        "lastName": "赵"
    }
}

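(Specify an id when adding a document; otherwise Elasticsearch generates a random one. The _doc segment of the path is required.)

Update a document (PUT)

There are two kinds of update: full and partial.

A full update deletes the old document and then indexes the new one; if no document with that id exists, it is equivalent to adding a new document.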

PUT /heima/_doc/1
{
    "info": "黑马程序员高级Java讲师",
    "email": "zy@itcast.cn",
    "name": {
        "firstName": "云",
        "lastName": "赵"
    }
}

A partial update changes only the specified fields of the document with the given id.

POST /heima/_update/1
{
  "doc": {
    "email": "ZhaoYun@itcast.cn"
  }
}

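5. Elasticsearch client (RestClient)

To run a full-text search across several fields, define a single "all" field and copy each analyzed field into it with copy_to.

1. Setup steps

// 1. Add the dependency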
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>


// 2. Pin the client version so it matches the server version
<properties>
      <maven.compiler.source>11</maven.compiler.source>
      <maven.compiler.target>11</maven.compiler.target>
      <elasticsearch.version>7.12.1</elasticsearch.version>
  </properties>

// 3. Initialize the client
RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
        HttpHost.create("http://192.168.150.101:9200")
));

2. Create an index

void testCreateIndex() throws IOException {
    // 1. Create the request object
    CreateIndexRequest request = new CreateIndexRequest("items");
    // 2. Set the request body (the mapping JSON defined below)
    request.source(MAPPING_TEMPLATE, XContentType.JSON);
    // 3. Send the request
    client.indices().create(request, RequestOptions.DEFAULT);
}

static final String MAPPING_TEMPLATE = "{\n" +
            "  \"mappings\": {\n" +
            "    \"properties\": {\n" +
            "      \"id\": {\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"name\":{\n" +
            "        \"type\": \"text\",\n" +
            "        \"analyzer\": \"ik_max_word\"\n" +
            "      },\n" +
            "      \"price\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"stock\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"image\":{\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"index\": false\n" +
            "      },\n" +
            "      \"category\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"brand\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"sold\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"commentCount\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"isAD\":{\n" +
            "        \"type\": \"boolean\"\n" +
            "      },\n" +
            "      \"updateTime\":{\n" +
            "        \"type\": \"date\"\n" +
            "      }\n" +
            "    }\n" +
            "  }\n" +
            "}";

3. Delete an index

// 1. Create the request object
DeleteIndexRequest request = new DeleteIndexRequest("items");
// 2. Send the request
client.indices().delete(request, RequestOptions.DEFAULT);

4. Document operations: add a document

POST /{index_name}/_doc/1
{
    "name": "Jack",
    "age": 21
}

2. Distributed search engine

1. Query types

1. Match all: match_all

GET /{index_name}/_search
{
  "query": {
    "query_type": {
      // ... query conditions
    }
  }
}

GET /items/_search
{
  "query": {
    "match_all": {
      
    }
  }
}

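2. Full-text search

// The user's input is analyzed and the resulting terms are looked up in the inverted index. The first form (match) is recommended.
GET /{index_name}/_search
{
  "query": {
    "match": {
      "field_name": "TEXT" // usually an analyzed text field
    }
  }
}


GET /{index_name}/_search
{
  "query": {
    "multi_match": {
      "query": "search text",
      "fields": ["field1", "field2"]
    }
  }
}
// The more fields a query covers, the worse it performs; prefer copy_to into a single field.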
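For readers calling Elasticsearch from Java, here is a minimal sketch of how the two request bodies above could be assembled as JSON strings using only the JDK. The class and method names are hypothetical, the field names name and brand are borrowed from the items mapping earlier in this article, and the escaping helper is deliberately simplistic.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Builds match and multi_match request bodies as JSON strings. Illustrative sketch only. */
final class FullTextQuerySketch {

    // Wraps a value in double quotes, escaping backslashes and double quotes.
    // Control characters are not handled; a real JSON library would be safer.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // {"query":{"match":{"<field>":"<text>"}}}
    static String match(String field, String text) {
        return "{\"query\":{\"match\":{" + quote(field) + ":" + quote(text) + "}}}";
    }

    // {"query":{"multi_match":{"query":"<text>","fields":["f1","f2"]}}}
    static String multiMatch(String text, List<String> fields) {
        String fieldArray = fields.stream()
                .map(FullTextQuerySketch::quote)
                .collect(Collectors.joining(",", "[", "]"));
        return "{\"query\":{\"multi_match\":{\"query\":" + quote(text)
                + ",\"fields\":" + fieldArray + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(match("name", "脱脂牛奶"));
        System.out.println(multiMatch("脱脂牛奶", List.of("name", "brand")));
    }
}
```

Either string can be sent as the body of GET or POST /{index_name}/_search.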

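3. Exact queries

As the name implies, these are term-level queries: the search input is not analyzed but treated as a single term and matched exactly against the field's value. They are therefore recommended for keyword, numeric, date, and boolean fields.

term is an exact-term query, for example on a hotel name
GET /{index_name}/_search
{
  "query": {
    "term": {
      "field_name": {
        "value": "search value"
      }
    }
  }
}
range is a range query, for example on a price range
GET /{index_name}/_search
{
  "query": {
    "range": {
      "field_name": {
        "gte": {min_value},
        "lte": {max_value}
      }
    }
  }
}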
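A small Java sketch of the term and range bodies, again built as plain strings with the JDK only. The class and method names are made up for this example, and the fields brand and price are taken from the items mapping shown earlier; the range helper assumes integer bounds.

```java
/** Builds term and range request bodies as JSON strings. Illustrative sketch only. */
final class ExactQuerySketch {

    // Wraps a value in double quotes, escaping backslashes and double quotes.
    // Control characters are not handled; a real JSON library would be safer.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // {"query":{"term":{"<field>":{"value":"<value>"}}}}
    static String term(String field, String value) {
        return "{\"query\":{\"term\":{" + quote(field)
                + ":{\"value\":" + quote(value) + "}}}}";
    }

    // {"query":{"range":{"<field>":{"gte":<min>,"lte":<max>}}}}, both bounds inclusive
    static String range(String field, long min, long max) {
        return "{\"query\":{\"range\":{" + quote(field)
                + ":{\"gte\":" + min + ",\"lte\":" + max + "}}}}";
    }

    public static void main(String[] args) {
        System.out.println(term("brand", "Apple"));
        System.out.println(range("price", 10000, 30000));
    }
}
```

Because the term value is not analyzed, it must equal the stored keyword exactly; a partial match such as half of a brand name returns nothing.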

4. Geo queries

For example, finding nearby hotels.
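The article gives no request for this case, so the following is only a sketch of what a "nearby" search could look like with the geo_distance query: it matches documents whose geo_point field lies within a given distance of a center point. The field name location is an assumption (the mappings in this article do not define a geo_point field), as are the class and method names.

```java
/** Builds a geo_distance request body as a JSON string. Illustrative sketch only. */
final class GeoQuerySketch {

    // Wraps a value in double quotes, escaping backslashes and double quotes.
    // Control characters are not handled; a real JSON library would be safer.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // {"query":{"geo_distance":{"distance":"<n>km","<field>":{"lat":<lat>,"lon":<lon>}}}}
    // field must be mapped as geo_point for this query to work.
    static String geoDistance(String field, double lat, double lon, int distanceKm) {
        return "{\"query\":{\"geo_distance\":{\"distance\":\"" + distanceKm + "km\","
                + quote(field) + ":{\"lat\":" + lat + ",\"lon\":" + lon + "}}}}";
    }

    public static void main(String[] args) {
        // Hypothetical search: everything within 15 km of the given point.
        System.out.println(geoDistance("location", 31.21, 121.5, 15));
    }
}
```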

2. Processing search results

Sorting

Pagination

Highlighting
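All three are options in the search request body: sort orders the hits, from and size select a page, and highlight marks the matched terms. The sketch below combines them in one body built with plain Java strings. The class and method names are hypothetical, the 1-based page number is a convention chosen for this example, and sorting by price assumes the items mapping from earlier.

```java
/** Builds a search body with sorting, pagination and highlighting. Illustrative sketch only. */
final class SearchResultSketch {

    // Wraps a value in double quotes, escaping backslashes and double quotes.
    // Control characters are not handled; a real JSON library would be safer.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // "from" is the offset of the first hit; pages are counted from 1 here.
    static int from(int page, int size) {
        return (page - 1) * size;
    }

    // match query on one field, sorted by price ascending, paged, with the same field highlighted
    static String body(String field, String text, int page, int size) {
        return "{"
                + "\"query\":{\"match\":{" + quote(field) + ":" + quote(text) + "}},"
                + "\"from\":" + from(page, size) + ","
                + "\"size\":" + size + ","
                + "\"sort\":[{\"price\":\"asc\"}],"
                + "\"highlight\":{\"fields\":{" + quote(field) + ":{}}}"
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(body("name", "脱脂牛奶", 3, 20));
    }
}
```

By default the highlighted fragments wrap matched terms in <em> tags, and from + size may not exceed 10000 unless index.max_result_window is raised.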

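3. Aggregations

Concept
1. Aggregations compute statistics and analytics over document data.
Types:
1. Bucket aggregations (bucket): group documents
  1. terms aggregation: groups by field value, similar to GROUP BY in MySQL;
  2. date histogram: groups by date interval, e.g. one bucket per week or per month;
2. Metric aggregations (metric)
  1. compute values such as the maximum, minimum, and average;
  2. avg;
  3. max;
  4. min;
  5. stats: computes min, max, and avg (plus sum and count) in one request;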
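To show how the two kinds combine, here is a sketch of a request body that groups documents with a terms aggregation and computes stats inside each bucket. It is built as a plain Java string; the aggregation names by_bucket and metric_stats, like the class and method names, are arbitrary labels chosen for this example, and brand and price come from the items mapping earlier.

```java
/** Builds a terms aggregation with a nested stats aggregation. Illustrative sketch only. */
final class AggregationSketch {

    // Wraps a value in double quotes, escaping backslashes and double quotes.
    // Control characters are not handled; a real JSON library would be safer.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // "size":0 at the top level suppresses the document hits so only aggregation results return.
    // maxBuckets limits how many groups the terms aggregation returns.
    static String termsWithStats(String bucketField, String metricField, int maxBuckets) {
        return "{\"size\":0,\"aggs\":{\"by_bucket\":{"
                + "\"terms\":{\"field\":" + quote(bucketField) + ",\"size\":" + maxBuckets + "},"
                + "\"aggs\":{\"metric_stats\":{\"stats\":{\"field\":" + quote(metricField) + "}}}"
                + "}}}";
    }

    public static void main(String[] args) {
        // Per-brand price statistics for up to 10 brands.
        System.out.println(termsWithStats("brand", "price", 10));
    }
}
```

The bucket field should be a keyword (or numeric, date, boolean) field; terms aggregations on analyzed text fields are rejected by default.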