Elasticsearch: From First Encounter to Getting Acquainted

飞鱼同学

Last modified: 2022-06-01 22:52:55

Category: 笔记&Bug (Notes & Bugs). Tags: elasticsearch, java, search engine

First published: 2022-06-01 22:39:35

Article link: https://blog.csdn.net/qq_42487459/article/details/125079211

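This article walks through installing Elasticsearch, including the visualization tools elasticsearch-head and Kibana and the IK analyzer. It also covers REST-style operations such as index and document management, as well as Spring Boot integration, and is aimed at Java developers who want to get started quickly.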


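Note: these are my notes on the ElasticSearch course by 狂神说Java on Bilibili: https://www.bilibili.com/video/BV17a4y1x7zq

The article is fairly long; if you are short on time, use Ctrl+F to search for what you need

1. Elasticsearch Overview

Official site: https://www.elastic.co/cn/downloads/elasticsearch

Elasticsearch (ES for short) is an open-source, highly scalable, distributed full-text search engine. It stores and retrieves data in near real time, and it scales well: a cluster can grow to hundreds of servers and handle petabytes of data. ES is written in Java and uses Lucene at its core for all indexing and search functionality, but its goal is to hide Lucene's complexity behind a simple RESTful API so that full-text search becomes easy.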

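2. Installing Elasticsearch

Requirements:

JDK 8 at minimum
When developing in Java, the Elasticsearch version must match the version of the Elasticsearch jars your Java project depends on! (Also make sure the Java environment itself is set up correctly.)

The following is a Windows installation example
1. Install
Download: https://www.elastic.co/cn/downloads/

Past releases: https://www.elastic.co/cn/downloads/past-releases/

Just unzip the archive (try to keep all Elasticsearch-related tools under one directory)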
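Directory layout:

bin startup scripts
config configuration files
    log4j2 logging configuration file
    jvm.options JVM settings (the default heap is 1 GB; adjust it yourself if memory is insufficient)
    elasticsearch.yml Elasticsearch configuration file! Default port 9200! CORS settings go here!
lib
    dependency jars
modules functional modules
plugins plugins
    IK analyzer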

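2. Start
Be sure to check that your Java environment is configured correctly
Visit localhost:9200; if a JSON response describing the node (name, cluster name, version) comes back, the startup succeeded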

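Ⅰ Installing the elasticsearch-head visualization UI

Prerequisite:

Node.js must be installed

1. Download
https://github.com/mobz/elasticsearch-head

2. Install
Just unzip the archive (try to keep all Elasticsearch-related tools under one directory)

3. Start

cd elasticsearch-head
# install dependencies
npm install
# start
npm run start
# then open
http://localhost:9100/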

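4. Access
There is a cross-origin problem (two pages can interact only when they share the same origin)

Same origin: the protocol, host, and port are all identical

What is cross-origin, and how to solve it: https://blog.csdn.net/qq_38128179/article/details/84956552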
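
To make the same-origin rule concrete, here is a minimal Java sketch. SameOriginCheck and its methods are hypothetical names of my own, not part of Elasticsearch or elasticsearch-head, and real browsers apply more rules than this. It only compares protocol, host, and port, which is why the head page on port 9100 counts as a different origin from Elasticsearch on port 9200.

```java
import java.net.URI;

// Hypothetical helper for illustration only.
final class SameOriginCheck {

    // Two URLs share an origin only if scheme, host, and port all match.
    static boolean sameOrigin(String a, String b) {
        URI x = URI.create(a);
        URI y = URI.create(b);
        return x.getScheme().equalsIgnoreCase(y.getScheme())
                && x.getHost().equalsIgnoreCase(y.getHost())
                && effectivePort(x) == effectivePort(y);
    }

    // Fill in the scheme's default port when the URL does not state one.
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    public static void main(String[] args) {
        // elasticsearch-head (9100) calling Elasticsearch (9200): ports differ
        System.out.println(sameOrigin("http://localhost:9100/", "http://localhost:9200/"));
        // Same scheme, host, and port; only the path differs
        System.out.println(sameOrigin("http://localhost:9200/a", "http://localhost:9200/b"));
    }
}
```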
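Enable CORS (add the following to elasticsearch.yml under config in the Elasticsearch directory)

# enable CORS
http.cors.enabled: true
# allow requests from any origin
http.cors.allow-origin: "*"

Restart Elasticsearch
We use head only as a tool for visualizing data; all later queries are run in Kibana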

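Ⅱ Installing Kibana

Kibana is an open-source analytics and visualization platform for Elasticsearch, used to search, view, and interact with data stored in Elasticsearch indices. With Kibana you can run advanced data analysis and present it in a variety of charts, which makes large volumes of data easier to understand. It is simple to operate: the browser-based interface lets you quickly build dashboards that show Elasticsearch query results in real time. Setting up Kibana is very simple. No coding or extra infrastructure is needed; within minutes you can install Kibana and start monitoring Elasticsearch indices.
1. Download
Note: the Kibana version must match the Elasticsearch version
Download:
https://www.elastic.co/cn/downloads/

Past releases:
https://www.elastic.co/cn/downloads/past-releases/

2. Install

Just unzip the archive (try to keep all Elasticsearch-related tools under one directory)

3. Start
Run bin/kibana.bat in the Kibana directory
4. Access
Kibana listens on http://localhost:5601 by default

Dev Tools: the console where the queries in the rest of this article are run

5. Switching Kibana to Chinese
Open <Kibana directory>/config/kibana.yml in an editor and add

i18n.locale: "zh-CN"

Restart Kibana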

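Ⅲ Installing the IK analyzer

IK analyzer: an analyzer (tokenizer) for Chinese

Tokenization splits a passage of Chinese (or other text) into individual keywords. When we search, our input is tokenized, the data in the database or index is tokenized as well, and the two are then matched against each other. Without the IK analyzer, the default handling of Chinese treats every character as a separate word: for example, "我爱狂神" is split into "我", "爱", "狂", "神", which is clearly not what we want. Installing the Chinese analyzer IK solves this problem.

IK provides two tokenization modes: ik_smart and ik_max_word. ik_smart produces the fewest splits, while ik_max_word produces the finest-grained splits!

1. Download

The version must match the Elasticsearch version

Download: https://github.com/medcl/elasticsearch-analysis-ik/releases

2. Install

Just unzip it, but into a folder named ik under the Elasticsearch plugins directory (create the ik folder yourself)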

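3. Restart Elasticsearch

The startup log shows that the IK analyzer has been loaded

You can list the installed plugins with <Elasticsearch install directory>/bin/elasticsearch-plugin list

4. Test with Kibana

ik_smart: fewest splits
ik_max_word: finest-grained splitting (exhausts every combination the dictionary allows)

5. Add custom words to the extension dictionary

Register the dictionary file in <Elasticsearch directory>/plugins/ik/config/IKAnalyzer.cfg.xml (the ext_dict entry)

Create the dictionary file and add your words to it

Restart the service and test

With that, the software is installed; the following sections go into detail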

3. REST Style

Basic commands
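
In REST style, every operation is an HTTP method applied to a resource path. A rough sketch, assuming Java 11+ (for java.net.http) and a node at http://localhost:9200; RestRequestSketch and its methods are names I made up for illustration, and the requests are only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper for illustration; requests are built but never sent.
final class RestRequestSketch {
    // Assumed address of a local node (9200 is the default HTTP port).
    static final String BASE = "http://localhost:9200";

    // PUT /{index}/_doc/{id}: create or replace the document with this id
    static HttpRequest putDocument(String index, String id, String json) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index + "/_doc/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // GET /{index}/_doc/{id}: fetch a document by id
    static HttpRequest getDocument(String index, String id) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index + "/_doc/" + id))
                .GET()
                .build();
    }

    // DELETE /{index}: delete a whole index
    static HttpRequest deleteIndex(String index) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index))
                .DELETE()
                .build();
    }
}
```

To actually send one of these, pass it to java.net.http.HttpClient.send together with a body handler such as HttpResponse.BodyHandlers.ofString().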

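4. Elasticsearch in Detail

Elasticsearch is document-oriented. Compare it side by side with a relational database: everything is JSON!
An Elasticsearch cluster can contain multiple indices (databases), each index can contain multiple types (tables), each type contains multiple documents (rows), and each document contains multiple fields (columns).

1. Physical design:
Behind the scenes, Elasticsearch splits each index into multiple shards, and each shard can be migrated between the servers in the cluster.

A single node is already a cluster! That is, a started Elasticsearch service is a cluster by default, and the default cluster name is elasticsearch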

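2. Inverted index (the underlying structure of a Lucene index)

Put simply, the index is built as (keyword, list of zero or more documents containing it) pairs, so the documents that contain a keyword can be looked up directly from the keyword without scanning every document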
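
The idea can be sketched in a few lines of Java. This is a toy of my own (the class name InvertedIndexSketch is hypothetical), assuming whitespace-separated, lowercased words stand in for a real analyzer; Lucene's actual structures are far more elaborate.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: term -> ids of the documents that contain it.
final class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Split the text into lowercase terms and record the document id under each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // One map lookup returns the matching documents; no document is scanned.
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "study every day");
        index.add(2, "study Elasticsearch");
        System.out.println(index.lookup("study"));
        System.out.println(index.lookup("elasticsearch"));
    }
}
```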

5. Testing

Ⅰ Index Operations

1 Create an index and add a document

PUT /test1/type1/1
{
  "name" : "飞鱼",
  "age" : 18
}

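2 Specify field types (using PUT)
This is similar to creating a database schema (it creates the index together with its field types); you can also think of it as defining rules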

PUT /test2
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age":{
        "type": "long"
      },
      "birthday":{
        "type": "date"
      }
    }
  }
}

3 Get the rules that were created

GET test2

4 Get default information
_doc is the default type. Types are being phased out in future versions, so a default type takes their place

PUT /test3/_doc/1
{
  "name": "飞鱼",
  "age": 18,
  "birth": "1999-10-10"
}
GET test3

If the field types of a document are not specified, Elasticsearch assigns default field types for us
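
Roughly, the default types follow from the JSON values. The sketch below is my own simplification (DynamicMappingSketch is a hypothetical name): it assumes only ISO dates such as 1999-10-10 are detected as dates, whereas the real date detection accepts more formats, and in Elasticsearch a plain string actually becomes a text field with a keyword sub-field.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Simplified guess at the field type Elasticsearch picks for an unmapped value.
final class DynamicMappingSketch {

    static String inferType(Object value) {
        if (value instanceof Boolean) {
            return "boolean";
        }
        if (value instanceof Integer || value instanceof Long) {
            return "long";   // whole numbers default to long
        }
        if (value instanceof Float || value instanceof Double) {
            return "float";  // floating-point numbers default to float
        }
        if (value instanceof String) {
            try {
                LocalDate.parse((String) value); // ISO date only: an assumption of this sketch
                return "date";
            } catch (DateTimeParseException e) {
                return "text"; // in Elasticsearch: text plus a keyword sub-field
            }
        }
        return "object";
    }

    public static void main(String[] args) {
        System.out.println(inferType(18));
        System.out.println(inferType("1999-10-10"));
        System.out.println(inferType("feiyu"));
    }
}
```

Running GET test3 after the request above is the way to check what Elasticsearch really chose.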

Extension: GET _cat/... returns a lot of information about the current state of Elasticsearch!

GET _cat/indices
GET _cat/aliases
GET _cat/allocation
GET _cat/count
GET _cat/fielddata
GET _cat/health
GET _cat/master
GET _cat/nodeattrs
GET _cat/nodes
GET _cat/pending_tasks
GET _cat/plugins
GET _cat/recovery
GET _cat/repositories
GET _cat/segments
GET _cat/shards
GET _cat/snapshots
GET _cat/tasks
GET _cat/templates
GET _cat/thread_pool

Ⅱ Document Operations

1 Update

POST /test3/_doc/1/_update
{
  "doc":{
    "name" : "modified via POST",
    "age" : 2
  }
}
GET /test3/_doc/1

2 Delete

GET /test1
DELETE /test1

3 Query (simple condition)

GET /test3/_doc/_search?q=name:飞鱼

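4 Complex queries

match: matching (the query text is run through the analyzer first, then searched)
_source: selects which fields are returned
sort: sorting
from, size: pagination

  // match query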
  GET /blog/user/_search
  {
    "query":{
      "match":{
        "name":"飞"
      }
    }
    ,
    "_source": ["name","desc"]
    ,
    "sort": [
      {
        "age": {
          "order": "asc"
        }
      }
    ]
    ,
    "from": 0
    ,
    "size": 1
  }
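
from is a zero-based offset, not a page number. A small sketch of the conversion, assuming 1-based page numbers in the calling code (PagingSketch is a hypothetical helper, not an Elasticsearch API):

```java
// Hypothetical helper: turns a 1-based page number into the "from" offset.
final class PagingSketch {

    static int from(int page, int size) {
        if (page < 1 || size < 0) {
            throw new IllegalArgumentException("page must be >= 1 and size >= 0");
        }
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        // Page 3 with 10 hits per page starts at offset 20
        System.out.println("\"from\": " + from(3, 10) + ", \"size\": 10");
    }
}
```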

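5 Multi-condition queries (bool)

must: every clause must match (like AND)
should: at least one clause should match (like OR)
must_not: no clause may match (like NOT)
filter: filters the results without affecting the score

// bool multi-condition query
// other options: boost, minimum_should_match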
GET /blog/user/_search
{
  "query":{
    "bool": {
      "must": [
        {
          "match":{
            "age":3
          }
        },
        {
          "match": {
            "name": "飞"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "gte": 1,
            "lte": 3
          }
        }
      }
    }
  }
}
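
The way the clauses combine can be modelled with plain predicates. This is only a sketch of the matching logic under my own assumptions (BoolQuerySketch and Doc are hypothetical names, scoring is ignored, and should is treated as mandatory only when there is no must or filter clause, which is my understanding of the default minimum_should_match):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Models which documents a bool query keeps; relevance scoring is not modelled.
final class BoolQuerySketch {

    static final class Doc {
        final String name;
        final int age;

        Doc(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    static List<Doc> search(List<Doc> docs,
                            List<Predicate<Doc>> must,
                            List<Predicate<Doc>> should,
                            List<Predicate<Doc>> mustNot,
                            List<Predicate<Doc>> filter) {
        // should only becomes mandatory when no must or filter clause is present
        boolean shouldRequired = must.isEmpty() && filter.isEmpty() && !should.isEmpty();
        return docs.stream()
                .filter(d -> must.stream().allMatch(p -> p.test(d)))      // AND
                .filter(d -> filter.stream().allMatch(p -> p.test(d)))    // AND, no scoring
                .filter(d -> mustNot.stream().noneMatch(p -> p.test(d)))  // NOT
                .filter(d -> !shouldRequired || should.stream().anyMatch(p -> p.test(d))) // OR
                .collect(Collectors.toList());
    }

    // Mirrors the query above: age is 3, name contains "fei", and 1 <= age <= 3.
    static List<String> example() {
        List<Doc> docs = Arrays.asList(
                new Doc("feiyu", 3), new Doc("feiyu", 9), new Doc("other", 3));
        List<Predicate<Doc>> must = Arrays.asList(
                d -> d.age == 3,
                d -> d.name.contains("fei"));
        List<Predicate<Doc>> filter =
                Collections.singletonList(d -> d.age >= 1 && d.age <= 3);
        return search(docs, must, Collections.emptyList(), Collections.emptyList(), filter)
                .stream()
                .map(d -> d.name + ":" + d.age)
                .collect(Collectors.toList());
    }
}
```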

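6 Matching arrays

Apparently this cannot be combined with other fields
Several keywords can be searched at once (separated by spaces); a document that matches any one of them counts as a hit
match runs the query text through the analyzer (analyze first, then query)
Searching by words

// Matching arrays; apparently cannot be combined with other fields
// Several keywords can be searched at once (separated by spaces)
// match runs the query text through the analyzer (analyze first, then query)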
GET /blog/user/_search
{
  "query":{
    "match":{
      "desc":"年龄 牛 大"
    }
  }
}

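7 Exact queries

term looks up the given term directly in the inverted index
Suitable for number, date, and keyword fields; not suitable for text

// Exact query: the value is not analyzed and must match one complete indexed term
// term performs an exact lookup of the given term in the inverted index
GET /blog/user/_search
{
  "query":{
    "term":{
      "desc":"年"
    }
  }
}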

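8 text and keyword

①. text

Analyzed (tokenized); supports full-text search, fuzzy queries, and exact queries; does not support aggregations or sorting by default;
There is no limit on the length of a text field, so it suits large fields;
②. keyword
Not analyzed; the value is indexed as-is; supports fuzzy queries, exact matching, aggregations, and sorting.
A keyword value can be at most 32766 bytes in UTF-8 encoding. ignore_above sets the maximum number of characters to index; values longer than that are not indexed and cannot be found by an exact term match.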

// Test whether keyword and text fields are analyzed
// Define the field types
PUT /test
{
  "mappings": {
    "properties": {
      "text":{
        "type":"text"
      },
      "keyword":{
        "type":"keyword"
      }
    }
  }
}
// Index a document
PUT /test/_doc/1
{
  "text":"测试keyword和text是否支持分词",
  "keyword":"测试keyword和text是否支持分词"
}
// text is analyzed, keyword is not
// Found: the text field was tokenized
GET /test/_doc/_search
{
  "query":{
   "match":{
      "text":"测试"
   }
  }
}
// Not found: only the full value "测试keyword和text是否支持分词" matches the keyword field
GET /test/_doc/_search
{
  "query":{
   "match":{
      "keyword":"测试"
   }
  }
}
// keyword analyzer: no tokenization, yields 测试liu
GET _analyze
{
  "analyzer": "keyword",
  "text": ["测试liu"]
}
// standard analyzer: yields 测, 试, liu
GET _analyze
{
  "analyzer": "standard",
  "text": ["测试liu"]
}
// ik_max_word: yields 测试, liu
GET _analyze
{
  "analyzer":"ik_max_word",
  "text": ["测试liu"]
}
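
Why the keyword field only matches the full value can be shown with a toy model. This is a sketch under my own assumptions (TextVsKeywordSketch is a hypothetical name, and lowercasing plus whitespace splitting stands in for a real analyzer): a text field indexes each token, a keyword field indexes the whole value as one term, and a term lookup compares against indexed terms as-is.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of what ends up in the index for a text field versus a keyword field.
final class TextVsKeywordSketch {

    // text: the value is analyzed and each resulting token is indexed.
    static Set<String> indexAsText(String value) {
        String[] tokens = value.toLowerCase(Locale.ROOT).trim().split("\\s+");
        return new LinkedHashSet<>(Arrays.asList(tokens));
    }

    // keyword: the whole value is indexed as a single term.
    static Set<String> indexAsKeyword(String value) {
        return Collections.singleton(value);
    }

    // A term lookup compares against the indexed terms without analyzing the input.
    static boolean termMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> text = indexAsText("Spring Boot notes");
        Set<String> keyword = indexAsKeyword("Spring Boot notes");
        System.out.println(termMatches(text, "boot"));                 // token lookup in text
        System.out.println(termMatches(keyword, "boot"));              // partial value in keyword
        System.out.println(termMatches(keyword, "Spring Boot notes")); // full value in keyword
    }
}
```

The same model also hints at why term is a poor fit for text fields: the indexed tokens are the analyzer's output, so the original spelling "Spring" is no longer in the index.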

9 Highlight queries

// highlight query
GET blog/user/_search
{
  "query": {
    "match": {
      "name":"流"
    }
  }
  ,
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}
// custom pre and post tags
GET blog/user/_search
{
  "query": {
    "match": {
      "name":"流"
    }
  }
  ,
  "highlight": {
    "pre_tags": "<p class='key' style='color:red'>",
    "post_tags": "</p>", 
    "fields": {
      "name": {}
    }
  }
}
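
Conceptually, highlighting wraps each matched term in a pre tag and a post tag (Elasticsearch uses <em> and </em> when none are given). A minimal sketch of that idea, assuming a plain substring match; HighlightSketch is a hypothetical helper, and the real highlighter works on analyzed terms and returns fragments.

```java
// Hypothetical helper: wraps every occurrence of a term in the given tags.
final class HighlightSketch {

    static String highlight(String fieldValue, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return fieldValue; // nothing to highlight
        }
        return fieldValue.replace(term, preTag + term + postTag);
    }

    public static void main(String[] args) {
        System.out.println(highlight("feiyu notes", "feiyu", "<em>", "</em>"));
        System.out.println(highlight("feiyu notes", "feiyu",
                "<p class='key' style='color:red'>", "</p>"));
    }
}
```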

6. Spring Boot Integration

Ⅰ Basic Environment

1 Align the versions

<properties>
    <java.version>1.8</java.version>
    <!-- keep the client version in line with the installed Elasticsearch -->
    <elasticsearch.version>7.6.1</elasticsearch.version>
</properties>

Add the Elasticsearch dependency

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>


<!-- also add fastjson and lombok up front -->
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.70</version>
</dependency>
<!-- Lombok also needs the IDE plugin installed -->
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>

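2 Create the configuration class

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchConfig {
    // Register the REST high-level client as a bean
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("127.0.0.1", 9200, "http")
                )
        );
    }
}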

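3 Create the entity class

import java.io.Serializable;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
@AllArgsConstructor
public class User implements Serializable {
    private static final long serialVersionUID = -3843548915035470817L;
    private String name;
    private Integer age;
}

4 Testing
Everything below runs in the test package
Inject RestHighLevelClient

@Autowired
public RestHighLevelClient restHighLevelClient;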

Ⅱ Index Operations

1 Creating an index

// Test creating an index; equivalent request: PUT liuyou_index
@Test
public void testCreateIndex() throws IOException {
    CreateIndexRequest request = new CreateIndexRequest("liuyou_index");
    CreateIndexResponse response = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(response.isAcknowledged());// whether the index was created
    System.out.println(response);// inspect the response object
    restHighLevelClient.close();
}

2 Getting an index and checking whether it exists

// Test getting an index and checking whether it exists
@Test
public void testIndexIsExists() throws IOException {
    GetIndexRequest request = new GetIndexRequest("index");
    boolean exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
    System.out.println(exists);// whether the index exists
    restHighLevelClient.close();
}

3 Deleting an index

// Test deleting an index
@Test
public void testDeleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("liuyou_index");
    AcknowledgedResponse response = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);
    System.out.println(response.isAcknowledged());// whether the index was deleted
    restHighLevelClient.close();
}

Ⅲ Document Operations

1 Adding a document

// Test adding a document (create the User entity class and add the fastjson dependency first)
@Test
public void testAddDocument() throws IOException {
    // Create a User object
    User liuyou = new User("liuyou", 18);
    // Create the request
    IndexRequest request = new IndexRequest("liuyou_index");
    // Equivalent request: PUT /liuyou_index/_doc/1
    request.id("1");// set the document id
    request.timeout(TimeValue.timeValueMillis(1000));// or request.timeout("1s")
    // Put our data into the request
    request.source(JSON.toJSONString(liuyou), XContentType.JSON);
    // Send the request through the client and get the response
    IndexResponse response = restHighLevelClient.index(request, RequestOptions.DEFAULT);
    System.out.println(response.status());// status of the indexing operation: CREATED
    System.out.println(response);// response content: IndexResponse[index=liuyou_index,type=_doc,id=1,version=1,result=created,seqNo=0,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]
}

2 Getting a document

// Test getting a document
@Test
public void testGetDocument() throws IOException {
    GetRequest request = new GetRequest("liuyou_index","1");
    GetResponse response = restHighLevelClient.get(request, RequestOptions.DEFAULT);
    System.out.println(response.getSourceAsString());// print the document content
    System.out.println(response);// the full response has the same content as the console command returns
    restHighLevelClient.close();
}

3 Getting a document and checking whether it exists

// Check whether a document exists; equivalent request: GET /liuyou_index/_doc/1
@Test
public void testDocumentIsExists() throws IOException {
    GetRequest request = new GetRequest("liuyou_index", "1");
    // Do not fetch the _source content, only check for existence
    request.fetchSourceContext(new FetchSourceContext(false));
    request.storedFields("_none_");
    boolean exists = restHighLevelClient.exists(request, RequestOptions.DEFAULT);
    System.out.println(exists);
}

4 Updating a document

// Test updating a document
@Test
public void testUpdateDocument() throws IOException {
    UpdateRequest request = new UpdateRequest("liuyou_index", "1");
    User user = new User("lmk",11);
    request.doc(JSON.toJSONString(user),XContentType.JSON);
    UpdateResponse response = restHighLevelClient.update(request, RequestOptions.DEFAULT);
    System.out.println(response.status()); // OK
    restHighLevelClient.close();
}

5 Deleting a document

// Test deleting a document
@Test
public void testDeleteDocument() throws IOException {
    DeleteRequest request = new DeleteRequest("liuyou_index", "1");
    request.timeout("1s");
    DeleteResponse response = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
    System.out.println(response.status());// OK
}

6 Searching documents

// Search
// SearchRequest: the search request
// SearchSourceBuilder: builds the search conditions
// HighlightBuilder: highlighting
// TermQueryBuilder: exact (term) query
// MatchAllQueryBuilder
// xxxQueryBuilder ...
@Test
public void testSearch() throws IOException {
    // 1. Create the search request object
    SearchRequest searchRequest = new SearchRequest();
    // 2. Build the search conditions
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // (1) Query conditions, created with the QueryBuilders utility class
    // Exact (term) query
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "liuyou");
    //        // Match-all query
    //        MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
    // (2) Optional extras (see the fields of SearchSourceBuilder)
    // Enable highlighting
    searchSourceBuilder.highlighter(new HighlightBuilder());
    //        // Pagination
    //        searchSourceBuilder.from();
    //        searchSourceBuilder.size();
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // (3) Apply the query conditions
    searchSourceBuilder.query(termQueryBuilder);
    // 3. Add the conditions to the request
    searchRequest.source(searchSourceBuilder);
    // 4. Send the search request through the client
    SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    // 5. Inspect the results
    SearchHits hits = search.getHits();
    System.out.println(JSON.toJSONString(hits));
    System.out.println("=======================");
    for (SearchHit documentFields : hits.getHits()) {
        System.out.println(documentFields.getSourceAsMap());
    }
}

7 Note
None of the operations above can add data in bulk; moreover, calling source() several times on one request keeps only the last value

// The APIs above cannot add data in bulk (only the last source is kept)
@Test
public void test() throws IOException {
    IndexRequest request = new IndexRequest("bulk");// without an id, a random id is generated automatically
    request.source(JSON.toJSONString(new User("liu",1)),XContentType.JSON);
    request.source(JSON.toJSONString(new User("min",2)),XContentType.JSON);
    request.source(JSON.toJSONString(new User("kai",3)),XContentType.JSON);
    IndexResponse index = restHighLevelClient.index(request, RequestOptions.DEFAULT);
    System.out.println(index.status());// created
}

8 Adding data in bulk

// In real projects, data is usually inserted in bulk
@Test
public void testBulk() throws IOException {
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.timeout("10s");
    ArrayList<User> users = new ArrayList<>();
    users.add(new User("liuyou-1",1));
    users.add(new User("liuyou-2",2));
    users.add(new User("liuyou-3",3));
    users.add(new User("liuyou-4",4));
    users.add(new User("liuyou-5",5));
    users.add(new User("liuyou-6",6));
    // Add one index request per user to the bulk request
    for (int i = 0; i < users.size(); i++) {
        bulkRequest.add(
                // the document data goes here
                new IndexRequest("bulk")
                        .id(""+(i + 1)) // if no id is set, a random id is generated automatically
                        .source(JSON.toJSONString(users.get(i)),XContentType.JSON)
        );
    }
    BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
    System.out.println(bulk.status());// ok
}
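
For reference, the body that the _bulk REST endpoint expects is newline-delimited JSON: an action line followed by the document source, each terminated by a newline. A minimal sketch that builds such a body by hand, mirroring the loop above; BulkBodySketch is a hypothetical helper of my own, not what BulkRequest does internally, and it assumes the index name and documents need no extra escaping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the newline-delimited body for POST /_bulk.
final class BulkBodySketch {

    static String bulkBody(String index, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < jsonDocs.size(); i++) {
            // Action line: index the next document under id i + 1
            body.append("{\"index\":{\"_index\":\"").append(index)
                    .append("\",\"_id\":\"").append(i + 1).append("\"}}\n");
            // Source line: the document itself, on a single line
            body.append(jsonDocs.get(i)).append('\n');
        }
        return body.toString(); // the body must end with a newline
    }

    public static void main(String[] args) {
        System.out.print(bulkBody("bulk", Arrays.asList(
                "{\"name\":\"liuyou-1\",\"age\":1}",
                "{\"name\":\"liuyou-2\",\"age\":2}")));
    }
}
```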