Elasticsearch Basics: Study Notes

java1224

Published 2024-12-13 16:56:48

Tags: elasticsearch, big data, search engine, java

Article link: https://blog.csdn.net/java1224/article/details/144454956

I. Overview

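Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface. Elasticsearch is written in Java and is one of the most popular enterprise search engines today. It was originally released as open source under the Apache License; since version 7.11 it has been distributed under the Elastic License and the SSPL instead (an AGPLv3 option was added in 2024).
Elasticsearch 8 requires at least JDK 17.
Elasticsearch vs. Solr, in summary:
- ES works almost out of the box and is very simple to set up; installing Solr is somewhat more involved.
- Solr relies on ZooKeeper for distributed coordination, while Elasticsearch has its own built-in cluster coordination.
- Solr supports more data formats, such as JSON, XML, and CSV, whereas Elasticsearch documents are JSON.
- Solr is a strong solution for traditional search applications, but Elasticsearch is better suited to newer, real-time search applications.
Many Internet applications today require real-time search, which is why we chose Elasticsearch.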

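II. Inverted Index (*)

Under the hood, Elasticsearch performs search with an inverted index.
Level one, the search flow: tokenize → match → rank.

Step 1, tokenize: the query text is split into terms by an analyzer.

Step 2, match: each term is looked up in the inverted index to find the documents that contain it.

Step 3, rank: the matching documents are sorted by relevance score.

Level two, the index structure

At indexing time, terms are extracted from each document's content and an index is built on those terms, with the structure term → documents containing that term. It is called "inverted" because it maps from terms to documents instead of from documents to their contents.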
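To make the two levels concrete, here is a minimal in-memory sketch in Java. It only illustrates the idea and is not how Lucene actually stores its index: it assumes a toy analyzer that lowercases and splits on whitespace (a Chinese setup would use IK instead), and it ranks documents by how many query terms they contain rather than by Lucene's BM25 relevance score. The class name `InvertedIndex` and the sample documents are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: each term maps to the set of ids of the documents containing it.
class InvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Assumed toy analyzer: lowercase, then split on whitespace (not IK, not Lucene's analyzers).
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Indexing: extract the terms of a document and record its id under each term.
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Searching: 1) analyze the query, 2) look up every term, 3) rank by matched-term count.
    List<Integer> search(String query) {
        Map<Integer, Integer> matchedTerms = new HashMap<>();
        for (String term : analyze(query)) {
            for (int id : postings.getOrDefault(term, Set.of())) {
                matchedTerms.merge(id, 1, Integer::sum);
            }
        }
        List<Integer> ranked = new ArrayList<>(matchedTerms.keySet());
        // More matched terms first; ties broken by ascending document id.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(matchedTerms.get(b), matchedTerms.get(a));
            return byCount != 0 ? byCount : Integer.compare(a, b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, "Xiaomi phone");
        index.add(2, "Huawei smart phone");
        index.add(3, "Huawei laptop");
        System.out.println(index.search("Huawei phone")); // [2, 1, 3]
    }
}
```

Document 2 contains both query terms, so it ranks first; documents 1 and 3 each contain one.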

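III. Core Concepts

ES (MySQL equivalent in parentheses): index (database) → type, fixed as _doc (table) → document in JSON (row) → field (column)

In MySQL

you create a database, then create tables in it (a table contains columns and rows).

In ES

you create an index, then a type inside the index (the equivalent of a MySQL table); each type holds many documents, and each document has fields.

Index

The first step when working with ES is to create an index, the equivalent of a MySQL database.

Type

A type lives inside an index and is the equivalent of a MySQL table.

We are using ES 8.x, which does not let you create your own types (mapping types were deprecated in 7.x and removed in 8.x). Only the default type is available, and its name is fixed: _doc.

Put simply: an ES index contains exactly one type (table).

Document

The type in an index holds many documents, each in JSON format.

Field (Field)

Each document contains many fields, and each field holds a concrete value.

Mapping (Mapping)

A mapping defines how data is handled and constrains it with rules, for example a field's data type, its analyzer, or a null_value substituted when the field is null.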

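IV. Basic ES Operations

1. Using analyzers

ES ships with built-in analyzers, but they handle Chinese poorly: the standard analyzer simply splits Chinese text into single characters. For Chinese word segmentation we therefore use a third-party plugin, the IK analyzer.

The IK analyzer offers two modes: ik_smart (coarsest-grained segmentation) and ik_max_word (finest-grained segmentation, which emits as many words as possible).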
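The single-character behaviour can be approximated in a few lines of Java. This is a hypothetical helper, not Elasticsearch code: it assumes the standard analyzer emits each Han character as its own token and keeps runs of Latin letters and digits together as lowercased words, which is close to, but not exactly, the Unicode word-segmentation rules Lucene uses. IK's dictionary-based segmentation is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Approximation of how the built-in standard analyzer treats Chinese text:
// every Han character becomes its own token, runs of Latin letters/digits stay whole.
class StandardLikeTokenizer {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, tokens);                            // close any pending Latin/digit word
                tokens.add(new String(Character.toChars(cp)));  // one token per Han character
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens);                            // spaces and punctuation separate words
            }
        });
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("华为Mate60手机")); // [华, 为, mate60, 手, 机]
    }
}
```

A search for 手机 analyzed this way only ever matches the separate characters 手 and 机, which is why word-level segmentation from IK is needed for Chinese.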

2. Index operations

# Create an index
PUT /my_index
# List all indices
GET /_cat/indices?v
# View a single index
GET /my_test
# Delete an index
DELETE /my_test

3. Basic document operations

# Create (index) a document with id 1
PUT /my_test/_doc/1
{
  "title": "小米手机",
  "category": "小米",
  "images": "http://www.gulixueyuan.com/xm.jpg",
  "price": 3999
}
# Get a document by id
GET /my_test/_doc/1
# Search all documents in the index
GET /my_test/_search
# Update a document (PUT replaces the whole document)
PUT /my_test/_doc/1
{
  "title": "小米手机",
  "category": "小米",
  "images": "http://www.gulixueyuan.com/xm.jpg",
  "price": 4500
}
# Update specific fields only (partial update)
POST /my_test/_update/1
{
  "doc": {
    "price": 4500
  }
}
# Delete a document
DELETE /my_test/_doc/1
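The difference between the two update styles above can be modelled with plain maps. This is a simplified sketch with hypothetical method names: it treats the document source as a flat `Map`, whereas Elasticsearch also merges inner objects recursively during a partial update.

```java
import java.util.HashMap;
import java.util.Map;

// Models PUT /my_test/_doc/1 (full replacement) versus
// POST /my_test/_update/1 with {"doc": {...}} (partial update) on a flat document.
class DocumentUpdates {
    // PUT: the request body becomes the whole new document; the stored version is ignored,
    // so any field missing from the body disappears.
    static Map<String, Object> replace(Map<String, Object> stored, Map<String, Object> body) {
        return new HashMap<>(body);
    }

    // _update with "doc": fields in the partial doc are added or overwritten, all others are kept.
    // (Elasticsearch also merges inner objects recursively; this flat sketch does not.)
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new HashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = Map.of("title", "小米手机", "category", "小米", "price", 3999);
        // title and category survive, price becomes 4500
        System.out.println(partialUpdate(stored, Map.of("price", 4500)));
        // only price is left
        System.out.println(replace(stored, Map.of("price", 4500)));
    }
}
```

This is why the full PUT example above resends every field even though only the price changes.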

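4. Mapping

When you create a database table you define column names, types, lengths, constraints, and so on.
An ES index is the same: it needs to know which fields the type has and what constraints each field carries. This is called the mapping.
In ES, only text fields are analyzed (split into terms); keyword fields are not analyzed and are indexed as a single term.
View a mapping: GET /my_test/_mapping

(1) Dynamic mapping

-- field types are detected automatically from the data of the documents you add, and the mapping is created from them

(2) Static mapping

-- field types, constraints, and similar settings are defined when the index is created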

PUT /my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "index": true,
        "store": true,
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"   # analyze with IK: ik_max_word when indexing, ik_smart when searching
      },
      "category": {
        "type": "keyword",         # keyword: the value is not analyzed
        "index": true,
        "store": true
      },
      "images": {
        "type": "keyword",
        "index": true,
        "store": true
      },
      "price": {
        "type": "integer",
        "index": true,
        "store": true
      }
    }
  }
}
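To see what static mapping protects against, here is a rough Java sketch of the default dynamic-mapping rules for a field seen for the first time: strings become `text` (Elasticsearch also adds a `keyword` sub-field), whole numbers become `long`, floating-point numbers become `float`, booleans become `boolean`, and JSON objects become `object`. It ignores date detection, arrays, and nulls, and the class name is hypothetical. Under these rules `title` would get the standard analyzer instead of IK, and `price` would be a `long` rather than the `integer` chosen above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified default dynamic-mapping rules (date detection, arrays and nulls not modelled).
class DynamicMappingSketch {
    static String inferType(Object value) {
        if (value instanceof String) return "text";          // ES also adds a "keyword" sub-field
        if (value instanceof Integer || value instanceof Long) return "long";
        if (value instanceof Float || value instanceof Double) return "float";
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Map) return "object";
        throw new IllegalArgumentException("not covered by this sketch: " + value);
    }

    // Infer a field -> type mapping for every top-level field of a document.
    static Map<String, String> inferMapping(Map<String, Object> doc) {
        Map<String, String> mapping = new LinkedHashMap<>();
        doc.forEach((field, value) -> mapping.put(field, inferType(value)));
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "小米手机");
        doc.put("price", 3999);
        System.out.println(inferMapping(doc)); // {title=text, price=long}
    }
}
```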

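V. Advanced DSL Queries

Query DSL overview: DSL stands for Domain Specific Language; Elasticsearch provides a JSON-based DSL for defining queries.

What the DSL is for: it is used only in ES, to query the JSON documents stored there.

When creating an index, prefer static mapping, because it lets you choose the analyzer; with dynamic mapping, string fields get the default standard analyzer rather than IK.

A term query does not analyze its search value. If the field it targets is an analyzed text field, the index stores only the individual tokens, so a term query for the whole phrase usually returns nothing.

1. Query all documents: match_all

# Query all documents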
POST /my_index/_search
{
  "query": {
    "match_all": {}
  }
}

2. Full-text query: match

# Match query (the query text is analyzed first)
POST /my_index/_search
{
  "query": {
    "match": {
      "title": "华为智能手机"
    }
  }
}

3. Multi-field match: multi_match

# Match the query text against several fields
POST /my_index/_search
{
  "query": {
    "multi_match": {
      "query": "华为智能手机",
      "fields": ["title","category"]
    }
  }
}

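4. Exact query: term

A term query does not analyze the value it searches for; it looks for that exact value as a single term in the index. If the target field is analyzed, its indexed terms are fragments of the original text, so a term query for the full phrase finds no match and the result is empty.
Use term on keyword fields (or on numbers and dates), and match on analyzed text fields.

# Exact term query
## the search value is not analyzed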
POST /my_index/_search
{
  "query": {
    "term": {
      "title": {
        "value": "华为手机"
      }
    }
  }
}
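Why the term query above can come back empty is easiest to see by comparing what each query looks up. The sketch below assumes, purely for illustration, that the title 华为手机 was indexed as the two tokens 华为 and 手机; the tokens IK actually produces depend on its dictionary. It also models only match's default `or` operator, and the class name is hypothetical.

```java
import java.util.List;
import java.util.Set;

// Why a term query on an analyzed text field can return nothing.
// ASSUMPTION: the title "华为手机" was indexed as the tokens {"华为", "手机"}.
class TermVsMatch {
    static final Set<String> INDEXED_TOKENS = Set.of("华为", "手机");

    // term: the value is NOT analyzed, so it must equal one indexed token exactly.
    static boolean termMatches(String value) {
        return INDEXED_TOKENS.contains(value);
    }

    // match: the query text is analyzed first (tokens passed in here are assumed analyzer output);
    // with the default "or" operator, any token found in the index is enough.
    static boolean matchMatches(List<String> analyzedQueryTokens) {
        return analyzedQueryTokens.stream().anyMatch(INDEXED_TOKENS::contains);
    }

    public static void main(String[] args) {
        System.out.println(termMatches("华为手机"));               // false: no such single token
        System.out.println(termMatches("华为"));                   // true
        System.out.println(matchMatches(List.of("华为", "手机")));  // true
    }
}
```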

5. Range query: range

Range queries use range.

gte: greater than or equal to
lte: less than or equal to
gt: greater than
lt: less than

# Range query
POST /my_index/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 5500,
        "lte": 6000
      }
    }
  }
}
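The four range operators behave like ordinary comparisons. This hypothetical Java predicate mirrors them, treating a `null` bound as absent, the same way an omitted bound leaves that side of a range query open.

```java
// gte / gt / lte / lt as a plain predicate; a null bound means "no limit on that side".
class RangeSketch {
    static boolean inRange(long value, Long gte, Long gt, Long lte, Long lt) {
        if (gte != null && value < gte) return false;  // gte: value >= bound
        if (gt != null && value <= gt) return false;   // gt:  value >  bound
        if (lte != null && value > lte) return false;  // lte: value <= bound
        if (lt != null && value >= lt) return false;   // lt:  value <  bound
        return true;
    }

    public static void main(String[] args) {
        // "price": { "gte": 5500, "lte": 6000 }
        System.out.println(inRange(5500, 5500L, null, 6000L, null)); // true  (lower bound included)
        System.out.println(inRange(6001, 5500L, null, 6000L, null)); // false (above the upper bound)
    }
}
```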

6. Choosing the returned fields: _source

# Return only the listed fields
POST /my_index/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 5500,
        "lte": 6000
      }
    }
  },
  "_source": ["title","category"]
}

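7. Compound query: bool

In ES, compound queries are built with the bool keyword.

The clauses of a bool query combine with AND, OR, or NOT logic:

must: every clause must match (AND); matching clauses contribute to the score.
should: at least one clause must match (OR). If the query also has must or filter clauses, should clauses become optional and only raise the score.
must_not: the document must match none of the clauses (NOT).
filter: same effect as must, but it does not compute a score, so it is somewhat more efficient.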
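The clause rules can be summarised as a small Java model. It is a sketch, not Elasticsearch's implementation: clauses are plain `Predicate`s, the `Product` record is hypothetical, and the "score" simply counts matched must and should clauses instead of using real relevance scoring. It does reflect two details worth remembering: should clauses are only required when there is no must or filter clause (the default `minimum_should_match`), and filter clauses never add to the score.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.function.Predicate;

// Toy model of bool query clause semantics; the "score" only counts matched must/should clauses.
class BoolQuerySketch {
    record Product(String title, int price) {}

    // Returns the toy score when the product matches, or empty when it is excluded.
    static OptionalInt evaluate(Product p,
                                List<Predicate<Product>> must,
                                List<Predicate<Product>> should,
                                List<Predicate<Product>> mustNot,
                                List<Predicate<Product>> filter) {
        // must and filter: every clause is required (AND)
        if (!must.stream().allMatch(c -> c.test(p))) return OptionalInt.empty();
        if (!filter.stream().allMatch(c -> c.test(p))) return OptionalInt.empty();
        // must_not: matching ANY of these clauses excludes the document
        if (mustNot.stream().anyMatch(c -> c.test(p))) return OptionalInt.empty();
        int shouldHits = (int) should.stream().filter(c -> c.test(p)).count();
        // Without must/filter clauses, at least one should clause is required (OR);
        // otherwise should clauses are optional and only raise the score.
        if (must.isEmpty() && filter.isEmpty() && !should.isEmpty() && shouldHits == 0) {
            return OptionalInt.empty();
        }
        // filter clauses never contribute to the score
        return OptionalInt.of(must.size() + shouldHits);
    }

    public static void main(String[] args) {
        Product mate = new Product("华为手机", 5999);
        Product mi = new Product("小米手机", 3999);
        Predicate<Product> huawei = p -> p.title().contains("华为");
        Predicate<Product> price5000to6000 = p -> p.price() >= 5000 && p.price() <= 6000;

        System.out.println(evaluate(mate, List.of(huawei, price5000to6000), List.of(), List.of(), List.of())); // OptionalInt[2]
        System.out.println(evaluate(mi, List.of(), List.of(huawei, price5000to6000), List.of(), List.of()));   // OptionalInt.empty
    }
}
```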

must (AND)

## must: AND
POST /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "华为"
          }
        },
        {
          "range": {
            "price": {
              "gte": 5000,
              "lte": 6000
            }
          }
        }
      ]
    }
  }
}

should (OR)

# should: OR
POST /my_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "华为"
          }
        },
        {
          "range": {
            "price": {
              "gte": 2000,
              "lte": 6000
            }
          }
        }
      ]
    }
  }
}

must_not (exclude documents that match any clause)

POST /my_index/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "title": "华为"
          }
        },
        {
          "range": {
            "price": {
              "gte": 3000,
              "lte": 5000
            }
          }
        }
      ]
    }
  }
}

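filter (more efficient, no scoring)

filter has the same effect as must: its clauses are also combined with AND.
The difference is that filter does not compute a relevance score during the query, which makes it more efficient (and frequently used filters can be cached).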

POST /my_index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "title": "华为"
          }
        }
      ]
    }
  }
}

8. Aggregations

max

# Aggregation: max
POST /my_index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "max_price": {
      "max": {
        "field": "price"
      }
    }
  }
}

min

POST /my_index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "min_price": {
      "min": {
        "field": "price"
      }
    }
  }
}

avg

POST /my_index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0, 
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}

sum

POST /my_index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0, 
  "aggs": {
    "sum_price": {
      "sum": {
        "field": "price"
      }
    }
  }
}

stats

POST /my_index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0, 
  "aggs": {
    "stats_price": {
      "stats": {
        "field": "price"
      }
    }
  }
}
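Locally, the same five numbers that stats returns (count, min, max, avg, sum) can be produced in one pass with Java's `IntSummaryStatistics`; max, min, avg, and sum each return one of them. This is only an analogy computed in memory over a made-up list of prices, not a call to Elasticsearch, and the class name is hypothetical.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Locale;

// In-memory analogue of the stats aggregation over a "price" field.
class StatsAggSketch {
    static IntSummaryStatistics stats(List<Integer> prices) {
        return prices.stream().mapToInt(Integer::intValue).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics s = stats(List.of(3999, 5999, 6999));
        // Locale.ROOT keeps the decimal separator stable across machines.
        System.out.printf(Locale.ROOT, "count=%d min=%d max=%d avg=%.1f sum=%d%n",
                s.getCount(), s.getMin(), s.getMax(), s.getAverage(), s.getSum());
        // count=3 min=3999 max=6999 avg=5665.7 sum=16997
    }
}
```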

9. Grouping: terms

# Aggregation: group by category
POST /my_index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "groupby_category": {
      "terms": {
        "field": "category",
        "size": 10
      }
    }
  }
}
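A terms aggregation puts documents into one bucket per distinct value, counts them, and by default returns the buckets with the highest doc_count, up to `size`. Below is a hypothetical in-memory version over a list of category values. It breaks ties by key, and unlike a real multi-shard index, where the counts can be approximate, its counts are exact.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Local model of a terms aggregation on a keyword field.
class TermsAggSketch {
    static List<Map.Entry<String, Long>> terms(List<String> values, int size) {
        // One bucket per distinct value with its document count (TreeMap keeps keys sorted).
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(v -> v, TreeMap::new, Collectors.counting()));
        // Largest doc_count first; the stable sort keeps equal counts in key order.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> categories = List.of("华为", "小米", "华为", "苹果", "华为", "小米");
        System.out.println(terms(categories, 2)); // [华为=3, 小米=2]
    }
}
```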

10. Sorting: sort

# Sort by price, ascending
POST /my_index/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ]
}

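11. Pagination: from + size

For comparison, in MySQL:
# first LIMIT argument: start offset = (current page - 1) * page size
# second LIMIT argument: number of rows per page
select * from users limit 0,3

Pagination in ES uses two properties: from and size.

from: offset of the first hit to return, starting at 0 (the default). from = (pageNum - 1) * size
size: number of hits per page

# Pagination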
POST /my_index/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 2
}
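The from formula above can be wrapped in a small helper. One extra limit is worth knowing: by default Elasticsearch rejects requests where from + size exceeds the `index.max_result_window` setting (10,000), so deep paging needs another approach such as `search_after`. The class and method names here are hypothetical.

```java
// from = (pageNum - 1) * size, checked against the default index.max_result_window.
class PaginationSketch {
    // Default value of the index.max_result_window index setting.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    static int from(int pageNum, int size) {
        if (pageNum < 1 || size < 1) {
            throw new IllegalArgumentException("pageNum and size must be at least 1");
        }
        long from = (long) (pageNum - 1) * size;   // long avoids int overflow on huge page numbers
        if (from + size > DEFAULT_MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds " + DEFAULT_MAX_RESULT_WINDOW);
        }
        return (int) from;
    }

    public static void main(String[] args) {
        System.out.println(from(1, 2)); // 0  (like LIMIT 0,2)
        System.out.println(from(3, 2)); // 4  (like LIMIT 4,2)
    }
}
```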

12. Highlighting: highlight

# Highlight the matched terms
POST /my_index/_search
{
  "query": {
    "match": {
      "title": "华为手机"
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    },
    "pre_tags": ["<font color='#e4393c'>"],
    "post_tags": ["</font>"]
  }
}

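VI. The nested Type

When creating a mapping, fields can use many types, such as text and long. Besides these common types there is a special type: nested.
By default, a field holding complex data, such as an array of objects [{...},{...}], is mapped with the object type.

The special nested type indexes each object in the array as a separate hidden document.

PUT my_comment_index
{
  "mappings": {
    "properties": {
      "comments": {
        "type": "nested"   // comments holds an array of objects, so map it explicitly as nested
      }
    }
  }
}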

PUT my_comment_index/_doc/1
{
  "title": "狂人日记",
  "body": "《狂人日记》是一篇象征性和寓意很强的小说，当时，鲁迅对中国国民精神的麻木愚昧颇感痛切。",
  "comments": [
    {
      "name": "张三",
      "age": 34,
      "rating": 8,
      "comment": "非常棒的文章",
      "commented_on": "30 Nov 2023"
    },
    {
      "name": "李四",
      "age": 38,
      "rating": 9,
      "comment": "文章非常好",
      "commented_on": "25 Nov 2022"
    },
    {
      "name": "王五",
      "age": 33,
      "rating": 7,
      "comment": "手动点赞",
      "commented_on": "20 Nov 2021"
    }
  ]
}

GET /my_comment_index/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "comments.name": "李四"
              }
            },
            {
              "match": {
                "comments.age": 34
              }
            }
          ]
        }
      }
    }
  }
}

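Result without nested: if comments is left as the default object type, running the same two conditions as a plain bool query (without the nested wrapper) surprisingly returns this document, even though no single comment is by 李四 aged 34.

Why: with the object type, the document is stored internally as flattened arrays: { "title": [ 狂人日记 ], "body": [ 《狂人日记》是一篇象征性和寓意很强的小说，当时... ], "comments.name": [ 张三, 李四, 王五 ], "comments.comment": [ 非常棒的文章, 文章非常好, 手动点赞 ], "comments.age": [ 33, 34, 38 ], "comments.rating": [ 7, 8, 9 ] }

As you can see, the association between comments.name and comments.age is lost. That is why the document matches a query for 李四 together with 34.

With comments mapped as nested, as above, the nested query returns no documents. Why?

A nested field indexes each object in the array as a separate hidden document, so each nested object can be queried independently of the others. The internal representation of the document is:

{ { "comments.name": [ 张三 ], "comments.comment": [ 非常棒的文章 ], "comments.age": [ 34 ], "comments.rating": [ 8 ] }, { "comments.name": [ 李四 ], "comments.comment": [ 文章非常好 ], "comments.age": [ 38 ], "comments.rating": [ 9 ] }, { "comments.name": [ 王五 ], "comments.comment": [ 手动点赞 ], "comments.age": [ 33 ], "comments.rating": [ 7 ] }, { "title": [ 狂人日记 ], "body": [ 《狂人日记》是一篇象征性和寓意很强的小说，当时，鲁迅对中国... ] } }

Each inner object is stored as its own hidden document, which preserves the relationship between its fields.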
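The difference can be reproduced without Elasticsearch. This Java sketch uses the three comments from the sample document (only name and age) and two hypothetical methods: one that, like the object type, only remembers all names and all ages of the document, and one that, like nested, requires both conditions to hold inside the same comment.

```java
import java.util.List;

// Object mapping (flattened arrays) versus nested mapping (one hidden document per comment).
class NestedVsObject {
    record Comment(String name, int age) {}

    // object type: the document only knows "some name is X" and "some age is Y"
    static boolean objectMatch(List<Comment> comments, String name, int age) {
        boolean nameFound = comments.stream().anyMatch(c -> c.name().equals(name));
        boolean ageFound = comments.stream().anyMatch(c -> c.age() == age);
        return nameFound && ageFound;
    }

    // nested type: both conditions must hold inside the SAME comment
    static boolean nestedMatch(List<Comment> comments, String name, int age) {
        return comments.stream().anyMatch(c -> c.name().equals(name) && c.age() == age);
    }

    public static void main(String[] args) {
        List<Comment> comments = List.of(
                new Comment("张三", 34), new Comment("李四", 38), new Comment("王五", 33));
        System.out.println(objectMatch(comments, "李四", 34)); // true  (李四 and 34 come from different comments)
        System.out.println(nestedMatch(comments, "李四", 34)); // false (no single comment has both)
    }
}
```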

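VII. Working with ES from the Java API

There are two ways to work with ES from Java:

Option 1: use the official Elasticsearch Java API client directly

Option 2: integrate ES with Spring Boot (Spring Data Elasticsearch)

The native client suits complex queries;
Spring Data suits simple ES operations.

(1) Using the native Java API client

Step 1: create a Spring Boot project and add the dependencies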

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.0.5</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>co.elastic.clients</groupId>
            <artifactId>elasticsearch-java</artifactId>
            <version>8.5.3</version>
        </dependency>

        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.12.3</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

Step 2: create the main class and initialize an ElasticsearchClient bean

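import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.RestClient;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class EsApplication {

    public static void main(String[] args) {
        SpringApplication.run(EsApplication.class, args);
    }

    @Bean
    public ElasticsearchClient buildElasticsearchClient() {
        // Basic authentication for the built-in "elastic" user
        BasicCredentialsProvider credsProv = new BasicCredentialsProvider();
        credsProv.setCredentials(
                AuthScope.ANY, new UsernamePasswordCredentials("elastic", "111111")
        );

        // Low-level REST client pointing at the ES node
        RestClient restClient = RestClient
                .builder(HttpHost.create("http://192.168.6.128:9200"))
                .setHttpClientConfigCallback(hc -> hc
                        .setDefaultCredentialsProvider(credsProv)
                )
                .build();

        // Create the transport with a Jackson mapper
        ElasticsearchTransport transport = new RestClientTransport(
                restClient, new JacksonJsonpMapper());

        // And create the API client
        return new ElasticsearchClient(transport);
    }
}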

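Step 3: write a test method

In the fluent builder, options at the same level are chained as method calls, and each nested level is written as a lambda expression.

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch.core.SearchRequest;
import co.elastic.clients.elasticsearch.core.SearchResponse;
import co.elastic.clients.elasticsearch.core.search.Hit;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

@SpringBootTest
public class ElasticsearchDemoApplicationTests {

    // Inject the ElasticsearchClient bean defined in EsApplication
    @Autowired
    private ElasticsearchClient elasticsearchClient;

    @Test
    public void find() throws IOException {

        // Build the SearchRequest: a match query on title against my_index
        SearchRequest.Builder request = new SearchRequest.Builder();

        request.index("my_index").query(
                f -> f.match(
                        f1 -> f1.field("title").query("华为")));

        // elasticsearchClient.search(request, class the _source is deserialized into)
        SearchResponse<Object> response =
                elasticsearchClient.search(request.build(), Object.class);

        // Equivalent DSL:
        // POST /my_index/_search
        // {
        //   "query": {
        //     "match": {
        //       "title": "华为智能手机"
        //     }
        //   }
        // }
        // The same request written with the lambda overload of search():
        // SearchResponse<Object> response = elasticsearchClient.search(
        //         s -> s.index("my_index").query(
        //                 f -> f.match(
        //                         f1 -> f1.field("title").query("华为智能手机")
        //                 )),
        //         Object.class);

        // Print the _source of every hit
        for (Hit<Object> hit : response.hits().hits()) {
            System.out.println(hit.source());
        }
    }
}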

(2) Integrating ES with Spring Data

Spring has a module called Spring Data for working with data stores, including MySQL, Redis, ES, and more.

Step 1: create a Spring Boot project and add the dependencies

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.0.5</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>

        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.12.3</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
    </dependencies>

Step 2: create the main class

@SpringBootApplication
public class EsDataApplication {

    public static void main(String[] args) {
        SpringApplication.run(EsDataApplication.class,args);
    }
}

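Step 3: create the configuration file (application.yml). Only the spring.elasticsearch entries matter for this demo; the mybatis-plus, feign, sentinel, and openfeign settings appear to be carried over from a larger project and are not needed here.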

server:
  port: 8502
mybatis-plus:
  configuration:
    log-impl: org.apache.ibatis.logging.stdout.StdOutImpl # print SQL logs to stdout
feign:
  sentinel:
    enabled: true
spring:
  main:
    allow-bean-definition-overriding: true # allow a bean definition to override an existing one with the same name
  cloud:
    sentinel:
      transport:
        dashboard: localhost:8080
    openfeign:
      lazy-attributes-resolution: true
      client:
        config:
          default:
            connectTimeout: 30000
            readTimeout: 30000
            loggerLevel: basic
  elasticsearch:
    uris: http://192.168.6.128:9200
    username: elastic
    password: 111111
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8

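Step 4: create the entity class

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.DateFormat;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

import java.io.Serializable;
import java.util.Date;

@Data
@Document(indexName = "user") // index name
@JsonIgnoreProperties(ignoreUnknown = true)
public class User implements Serializable {

    @Id
    private Long id;

    // In ES only text fields are analyzed; keyword fields are not. analyzer = "ik_max_word" segments the value with IK.
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String username;

    @Field(type = FieldType.Keyword)
    private String address;

    @Field(type = FieldType.Long)
    private Long age;

    @Field(type = FieldType.Date, format = DateFormat.date_time, pattern = "yyyy-MM-dd HH:mm:ss")
    private Date createTime;
}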

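Step 5: create an interface that extends ElasticsearchRepository

import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

// ElasticsearchRepository<entity class, id type>
public interface UserRepository extends ElasticsearchRepository<User, Long> {
}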

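Step 6: test methods

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.util.Date;

@SpringBootTest
public class EsApplicationTest {

    @Autowired
    private UserRepository userRepository;

    @Autowired
    private ElasticsearchClient elasticsearchClient;

    // Create / update: save() indexes the document, overwriting any existing document with the same id
    @Test
    public void add() {
        User user = new User();
        user.setId(1L);
        user.setUsername("lucyatguigu");
        user.setAddress("China");
        user.setAge(20L);
        user.setCreateTime(new Date());

        userRepository.save(user);
    }

    // Delete (by the entity's id)
    @Test
    public void delete() {
        User user = new User();
        user.setId(1L);
        userRepository.delete(user);
    }
}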