Elasticsearch, a Search and Analytics Engine

陈慌拥啊

Article link: https://blog.csdn.net/c4013/article/details/118404638

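Our company's current project needs Elasticsearch, and since most of us are fresh graduates, our team lead asked us to learn it ahead of time. I have been collecting material and working through it, mostly from the 狂神说 and 尚硅谷 courses; thanks to both for sharing their resources. Let's get started.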

What is Elasticsearch?

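Some background first: Lucene is a library for full-text search, written in Java by the American engineer Doug Cutting. Its goal is to add full-text retrieval to small and medium-sized applications, and because it is easy to use and open source, it is very popular with programmers. Lucene is an information-retrieval toolkit, not a complete search engine system. It provides the index structures, the tools for reading and writing indexes, relevance scoring, sorting and so on, so when you use Lucene directly you still have to build the rest of the search system yourself: data acquisition, parsing, tokenization and the like.
Why introduce Lucene? Because Elasticsearch, which we are about to learn, is built on top of this toolkit and wraps and extends it.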

The Elastic Stack

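The Elastic Stack consists of Elasticsearch, Kibana, Beats and Logstash (also known as the ELK Stack). It can securely and reliably take in data from any source in any format, then search, analyze and visualize it in real time. Elasticsearch, ES for short, is an open-source, highly scalable, distributed full-text search engine and the core of the whole Elastic Stack. It stores and retrieves data in near real time, and it scales well: a cluster can grow to hundreds of servers and handle petabytes of data. ES is written in Java and uses Lucene at its core for all indexing and searching, but its aim is to hide Lucene's complexity behind a simple RESTful API so that full-text search becomes easy.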

Installing and using the tools ES needs

I use the Windows builds here; the latest version at the time of writing is 7.13.2.

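Tool	Purpose
elasticsearch-7.13.2-windows-x86_64	Elasticsearch distribution (server and bundled tools)
elasticsearch-analysis-ik-7.13.2	Chinese analyzer (tokenizer) plugin
elasticsearch-head-master	Graphical client plugin
kibana-7.13.2-windows-x86_64	Visualization platform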

Download from Baidu Netdisk: https://pan.baidu.com/s/1Z_khsfCwb8dPL3CJoqAYzg (extraction code: cily)

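Installing and running ES:

Unzip the downloaded archive and double-click elasticsearch.bat in the bin directory to start it. The startup log is printed to the console.
Then open http://localhost:9200 in a browser. If it returns a JSON document describing the node (name, cluster name, version and so on), the installation succeeded.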

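Installing elasticsearch-head, a graphical client for ES

Note: this requires a Node.js environment.
Unzip elasticsearch-head-master.zip, then install the dependencies and start it from the extracted directory: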

cnpm install
npm run start

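This starts a local web server on port 9100 that serves elasticsearch-head. Open http://localhost:9100 to test it.
Look at the cluster health value shown on the page. Green means all primary and replica shards are allocated. Yellow means all primary shards are allocated but some replicas are not, which is typical for a single node. Red means at least one primary shard is unallocated.
The first time, the page may show "cluster health: not connected". The ES process and the head client run on different ports, so the browser treats the requests as cross-origin and blocks them. To fix this, enable CORS in the ES configuration.
Add the following to elasticsearch.yml and restart ES (allowing every origin with "*" is only appropriate for local development):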

http.cors.enabled: true
http.cors.allow-origin: "*"

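Installing Kibana, the visualization platform

Unzip the downloaded archive and double-click kibana.bat in the bin directory to start it (ES must be running as well).
Opening localhost:5601 now shows the English interface. To switch to Chinese, add the following to the configuration file kibana.yml: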

i18n.locale: "zh-CN"

Restart Kibana and open the page again; the interface is now in Chinese.

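ES core concepts

Elasticsearch is a document-oriented database: one record is one document. A comparison with a relational database makes this easier to understand:

ES	MySQL
Index	Database
(no equivalent)	Table
Document	Row
Field	Column

The Table row has no ES counterpart here because the Type concept, which used to play that role, is deprecated in 7.x and removed in 8.0.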

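The inverted index in ES

Elasticsearch uses a structure called an inverted index, provided by Lucene underneath. It is well suited to fast full-text search. An inverted index consists of a list of all the distinct terms that appear in the documents and, for each term, a list of the documents that contain it.
As an example, suppose we search blog posts by tag. The inverted index then maps each tag to the IDs of the posts that carry it.
To find the posts tagged python, looking up the inverted index is much faster than scanning all the original data: we only look at the tag column and read off the matching post IDs.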
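
The tag lookup described above can be sketched in a few lines of Java. This is only a toy model of the idea, not how Lucene stores its index; the class name TagInvertedIndex and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index (hypothetical): tag -> ids of the posts that carry the tag. */
final class TagInvertedIndex {
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    /** Registers a post under each of its tags. */
    void addPost(int postId, String... tags) {
        for (String tag : tags) {
            List<Integer> ids = postings.computeIfAbsent(tag, k -> new ArrayList<>());
            if (!ids.contains(postId)) {
                ids.add(postId);
            }
        }
    }

    /** Returns the ids of the posts with this tag, in insertion order; empty if the tag is unknown. */
    List<Integer> search(String tag) {
        return postings.getOrDefault(tag, Collections.emptyList());
    }
}
```

A search is a single map lookup, no matter how many posts exist; that is the whole point of keeping the index "inverted" from terms to documents.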

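Basic ES operations

The IK analyzer plugin

What is the IK analyzer?
Tokenization means splitting a piece of Chinese (or other) text into individual keywords. When we search, our query text is tokenized, the data in the database or index is tokenized too, and the two are then matched. By default, Chinese text is split so that every character becomes its own term: "我爱慌拥" becomes "我", "爱", "慌", "拥". That is clearly not what we want, so we install the Chinese analyzer IK to solve the problem.

IK provides two analyzers: ik_smart and ik_max_word. ik_smart produces the coarsest split, ik_max_word the most fine-grained one.

Installing the IK analyzer
Unzip the download and copy the directory into the plugins directory under the Elasticsearch root, then restart ES.

Testing the IK analyzer in Kibana

ik_max_word: fine-grained; it exhausts all the possible words in a sentence.

ik_smart: coarse-grained; it prefers the longest matching words.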
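
To make the coarse versus fine distinction concrete, here is a toy dictionary segmenter in Java. It is not IK's actual algorithm: the coarse mode uses plain forward maximum matching and the fine mode simply emits every dictionary word found at every offset. The class ToySegmenter, its dictionary and the English sample string are all assumptions for illustration; with IK the dictionary would hold Chinese words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy dictionary-based segmenter (hypothetical, not IK's implementation). */
final class ToySegmenter {
    private final Set<String> dictionary;
    private final int maxWordLength;

    ToySegmenter(String... words) {
        dictionary = new HashSet<>(Arrays.asList(words));
        int max = 1;
        for (String w : words) {
            max = Math.max(max, w.length());
        }
        maxWordLength = max;
    }

    /** Coarse split: at each offset take the longest dictionary word (forward maximum matching). */
    List<String> coarse(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            // Shrink the candidate until it is a dictionary word or a single character.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    /** Fine split: emit every dictionary word found at every offset. */
    List<String> fine(String text) {
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            int limit = Math.min(text.length(), start + maxWordLength);
            for (int end = start + 1; end <= limit; end++) {
                String candidate = text.substring(start, end);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }
}
```

With the dictionary new, york, newyork, time, times, the coarse mode splits "newyorktimes" into two long tokens, while the fine mode also returns the shorter words nested inside them.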

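Custom dictionary: making the system recognize the words we want
(1) Go to the elasticsearch/plugins/ik/config directory.
(2) Create a file my.dic and add your words to it, one per line.
(3) Edit IKAnalyzer.cfg.xml (in the ik/config directory):

<properties>
  <comment>IK Analyzer extension configuration</comment>
  <!-- Configure your own extension dictionary here -->
  <entry key="ext_dict">my.dic</entry>
  <!-- Configure your own extension stop-word dictionary here -->
  <entry key="ext_stopwords"></entry>
</properties>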

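REST style

REST is an architectural style for software, not a standard; it only provides a set of design principles and constraints. It is mainly used for software in which clients interact with servers. Software designed in this style can be simpler, more clearly layered, and easier to equip with mechanisms such as caching.
Basic REST commands:

Method	URL	Description
PUT	localhost:9200/index/type/id	Create a document (with the given id)
POST	localhost:9200/index/type	Create a document (with a generated id)
POST	localhost:9200/index/type/id/_update	Update a document
DELETE	localhost:9200/index/type/id	Delete a document
POST	localhost:9200/index/type/_search	Query all data
GET	localhost:9200/index/type/id	Get a document by id

In 7.x the type name is always _doc. The forms that carry the type in front of _update or _search, such as /index/_doc/id/_update and /index/_doc/_search, still work in 7.13 but are deprecated in favor of /index/_update/id and /index/_search.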
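
The commands in the table are ordinary HTTP requests, so they can be built from any HTTP client. Below is a minimal sketch with the JDK's java.net.http API (Java 11 or later). The class EsRequests, its method names and the base URL are assumptions for illustration; the sketch only builds the requests and does not send them.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds (but does not send) ES document requests. */
final class EsRequests {
    // Assumption: a local single-node ES without security, as in this article.
    private static final String BASE = "http://localhost:9200";

    /** PUT /{index}/_doc/{id}: create or overwrite a document with a chosen id. */
    static HttpRequest createDocument(String index, String id, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/" + index + "/_doc/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    /** GET /{index}/_doc/{id}: fetch a document by id. */
    static HttpRequest getDocument(String index, String id) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/" + index + "/_doc/" + id))
                .GET()
                .build();
    }

    /** DELETE /{index}/_doc/{id}: delete a document by id. */
    static HttpRequest deleteDocument(String index, String id) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/" + index + "/_doc/" + id))
                .DELETE()
                .build();
    }
}
```

To actually run one of them against a live node, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).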

Data types

String types: text, keyword
Numeric types: long, integer, short, byte, double, float, half_float, scaled_float
Date type: date
Boolean type: boolean
Binary type: binary
…

Create, read, update and delete

Creating data:

// The command:
// PUT creates a document at /index/type/id
PUT /user/_doc/1007
{
  "name":"慌拥",
  "sex":"男",
  "age":15
}

Response:

#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.13/security-minimal-setup.html to enable security.
{
  "_index" : "user",     // index
  "_type" : "_doc",      // type
  "_id" : "1007",        // id
  "_version" : 1,        // version
  "result" : "created",  // outcome of the operation
  "_shards" : {          // shard information
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 17,
  "_primary_term" : 8
}

Listing the indices in ES

GET _cat/indices?v

Deleting an index

DELETE /user

{
  "acknowledged" : true  # the index was deleted
}

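Viewing an index

Note: when a PUT command with a document id is executed, the document is created if it does not exist yet and overwritten if it does.

Updating data with POST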

POST /user/_doc/1006/_update
{
 "doc":{
   "name":"胡适之",
   "age":18
 }
}

Response:

{
 "_index" : "user",
 "_type" : "_doc",
 "_id" : "1006",
 "_version" : 2,
 "result" : "updated",
 "_shards" : {
   "total" : 2,
   "successful" : 1,
   "failed" : 0
 },
 "_seq_no" : 19,
 "_primary_term" : 8
}

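Conditional queries:

GET /user/_doc/_search?q=name:李白
The _search?q=name:李白 query returns the documents whose name field matches 李白.

Building queries:

The response contains hits, each with a _score. The better a document matches the query according to the scoring algorithm, the higher its score.

Querying everything:

An empty match_all object means there are no query conditions.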
GET /user/_doc/_search
{
  "query":{
    "match_all":{}
   }
}

Returning only selected fields:

GET /user/_doc/_search
{
  "query":{
    "match_all":{ }
  },
  "_source":["name", "age"]
}

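Sorting

GET /user/_doc/_search
{
  "query":{
    "match_all":{}
  },
  "sort":[
    {
      "age":{
        "order":"desc"  // "asc" for ascending, "desc" for descending
      }
    }
  ]
}
Note: you can only sort on sortable fields, for example:
1. numbers
2. dates
3. IDs
Keyword fields can be sorted as well; analyzed text fields cannot be sorted by default.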

Pagination

GET /user/_doc/_search
{
  "query":{
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ],
  "from": 0, # offset of the first hit to return (0-based)
  "size": 1  # number of hits to return
}
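
User interfaces usually think in page numbers rather than offsets, so the from value has to be computed. A minimal sketch in Java, assuming 1-based page numbers; the class Paging and its methods are hypothetical helpers, not part of any ES client.

```java
/** Hypothetical helper that turns a 1-based page number into from/size. */
final class Paging {
    /** Offset of the first hit on the given page: the value for "from". */
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    /** Builds a match_all search body for the given page. */
    static String searchBody(int page, int size) {
        return "{\"query\":{\"match_all\":{}},\"from\":" + from(page, size)
                + ",\"size\":" + size + "}";
    }
}
```

Keep in mind that from + size is limited by the index setting index.max_result_window (10000 by default), so this style of paging is meant for the first pages of a result, not for walking through an entire index.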

Filtering with filter

A range condition is expressed with range.
Find the documents whose name matches XX and whose age is greater than 20:

GET /user/_doc/_search
{
	"query":{
		"bool": {
			"must": [
				{
					"match": {
						"name": "XX"
					}
				}
			],
			"filter": {
				"range": {
					"age": {
						"gt": 20
					}
				}
			}
		}
	}
}

gt: greater than
gte: greater than or equal to
lt: less than
lte: less than or equal to
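
The four operators can be combined in one range clause, and a value passes only if it satisfies every bound that is set. The following Java sketch models just those semantics on the client side; the class AgeRange is hypothetical and has nothing to do with how ES evaluates the filter internally.

```java
/** Hypothetical model of a range clause: every bound that is set must hold. */
final class AgeRange {
    private Long gt;
    private Long gte;
    private Long lt;
    private Long lte;

    AgeRange gt(long v) { gt = v; return this; }
    AgeRange gte(long v) { gte = v; return this; }
    AgeRange lt(long v) { lt = v; return this; }
    AgeRange lte(long v) { lte = v; return this; }

    /** True if the value satisfies all configured bounds; unset bounds are ignored. */
    boolean matches(long value) {
        return (gt == null || value > gt)
                && (gte == null || value >= gte)
                && (lt == null || value < lt)
                && (lte == null || value <= lte);
    }
}
```

For example, new AgeRange().gt(20) rejects 20 but accepts 21, which mirrors the "gt": 20 filter in the query above.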

Highlighting

GET /user/_doc/_search
{
	"query":{
		"match": {
			"name": "杜甫"
		}
	},
	"highlight" :{
		"fields": {
			"name":{}
		}
	}
}

In the response, ES has wrapped the matched text in em tags. The tags can be customized:

GET /user/_doc/_search
{
	"query":{
		"match": {
			"name": "杜甫"
		}
	},
	"highlight" :{
		"pre_tags":"<b class = 'key' style='color:red'>",
		"post_tags":"</b>",
		"fields": {
			"name":{}
		}
	}
}
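
The effect of pre_tags and post_tags can be imitated with a plain string replacement. This Java sketch is only a rough client-side analogy: ES highlights the analyzed terms that matched the query, whereas the hypothetical Highlighter below wraps literal substrings.

```java
/** Hypothetical, simplified stand-in for ES highlighting: wraps literal matches. */
final class Highlighter {
    /** Wraps every literal occurrence of term in text with the given tags. */
    static String highlight(String text, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return text; // nothing to highlight
        }
        return text.replace(term, preTag + term + postTag);
    }
}
```

Calling it with the tags from the request above produces the same kind of markup that ES returns in the highlight section of each hit.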

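Creating indexes and documents through the API

Finding the documentation

1. Open the official documentation index: https://www.elastic.co/guide/index.html
2. Find ES Clients (the client API documentation).
3. Choose the Java REST client.

4. The REST client comes in High Level and Low Level variants; go straight to Getting started under High Level.

5. Find the Maven dependency.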

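Building the project

1. Create a Spring Boot (2.5.2) project and add the web dependency.

2. Configure the ES dependency:

    <properties>
        <java.version>1.8</java.version>
        <!-- Override the ES version so that it matches the local installation -->
        <elasticsearch.version>7.13.2</elasticsearch.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <!-- other dependencies (web, test, ...) omitted -->
    </dependencies>

The ES version managed by Spring Boot may differ from the one you are running, so the version is overridden here to avoid unnecessary errors.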

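3. Construct the RestHighLevelClient object

// Build the client
RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("localhost", 9200, "http")));

// Close the client when it is no longer needed
client.close();

4. Write a configuration class that exposes the client as a bean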

package com.chen.esjd.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * @ProjectName:es-api
 * @author:陈慌拥
 * @date:2021/6/29 17:59
 * Description:
 */
@Configuration
public class ElasticsearchConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("127.0.0.1", 9200, "http")));
        return client;
    }
}

5. API tests

package com.chen;

import com.alibaba.fastjson.JSON;
import com.chen.pojo.User;
import com.chen.util.ESConst;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

@SpringBootTest
class EsApiApplicationTests {

    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient client;

    /**
     * Create an index.
     */
    @Test
    void testCreateIndex() throws IOException {
        // 1. Build the create-index request
        CreateIndexRequest request = new CreateIndexRequest("chen_index");
        // 2. Execute the request with the client
        CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);

        System.out.println("response = " + response);
    }


    /**
     * Look up an index.
     * Here the request is only used to check whether the index exists.
     */
    @Test
    void testExistIndex() throws IOException {
        GetIndexRequest request = new GetIndexRequest("chen_index");
        boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println("exists = " + exists);
    }

    /**
     * Delete an index.
     */
    @Test
    void testDeleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("chen_index");
        AcknowledgedResponse response = client.indices().delete(request, RequestOptions.DEFAULT);
        System.out.println(response.isAcknowledged());
    }


    /**
     * Add a document.
     */
    @Test
    void testAddDoc() throws IOException {
        User user = new User("阿里巴巴", 3);

        IndexRequest request = new IndexRequest("chen_index");

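        // Equivalent to: PUT /chen_index/_doc/1
        request.id("1");
        // The timeout can be given as a TimeValue or as a string such as "1s"; one form is enough
        request.timeout(TimeValue.timeValueSeconds(1));

        // Put the data into the request as JSON
        request.source(JSON.toJSONString(user), XContentType.JSON);

        // Send the request and get the response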
        IndexResponse response = client.index(request, RequestOptions.DEFAULT);

        System.out.println("response.toString() = " + response.toString());
        System.out.println("状态 = " + response.status());

    }

    /**
     * Check whether a document exists: GET /index/_doc/id
     */
    @Test
    void testIndexExists() throws IOException {
        GetRequest request = new GetRequest("chen_index", "1");
        // Do not fetch _source or stored fields; the existence check is cheaper that way
        request.fetchSourceContext(new FetchSourceContext(false));
        request.storedFields("_none_");

        boolean exists = client.exists(request, RequestOptions.DEFAULT);
        System.out.println("exists = " + exists);
    }


    /**
     * Get a document.
     */
    @Test
    void testGetDoc() throws IOException {
        GetRequest request = new GetRequest("chen_index", "1");

        GetResponse response = client.get(request, RequestOptions.DEFAULT);
        System.out.println(response.getSourceAsString());

    }

    /**
     * Update a document.
     */
    @Test
    void testUpdateDoc() throws IOException {
        UpdateRequest request = new UpdateRequest("chen_index", "1");
        request.timeout("1s");

        User user = new User("菜鸟", 15);
        request.doc(JSON.toJSONString(user), XContentType.JSON);

        UpdateResponse response = client.update(request, RequestOptions.DEFAULT);
        System.out.println(response.status());

    }


    /**
     * Delete a document.
     */
    @Test
    void testDeleteDoc() throws IOException {
        DeleteRequest request = new DeleteRequest("chen_index", "1");
        request.timeout("1s");

        DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);
        System.out.println(response.status());
    }

    /**
     * Insert documents in bulk.
     */
    @Test
    void testBulkDoc() throws IOException {

        BulkRequest request = new BulkRequest();
        request.timeout("10s");

        ArrayList<User> list = new ArrayList<>();
        list.add(new User("a" , 5));
        list.add(new User("b" , 5));
        list.add(new User("c" , 5));
        list.add(new User("d" , 5));
        list.add(new User("e" , 5));
        list.add(new User("f" , 5));

        for (int i = 0; i < list.size(); i++) {
            request.add(
                    new IndexRequest("chen_index")
                            .id("" + (i + 1)) // set the id; if omitted, ES generates one
                            .source(JSON.toJSONString(list.get(i)), XContentType.JSON));
        }
        BulkResponse response = client.bulk(request, RequestOptions.DEFAULT);
        System.out.println(response.hasFailures()); // false means every item succeeded
    }


    /**
     * Search.
     * SearchRequest: the search request
     * SearchSourceBuilder: builds the search conditions
     * HighlightBuilder: highlighting
     * ...
     */
    @Test
    void testSearch() throws IOException {

        SearchRequest request = new SearchRequest(ESConst.ES_INDEX); // prefer a constant over a hard-coded index name

        // Build the search conditions
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

        // Query condition, built with the QueryBuilders helper
        TermQueryBuilder query = QueryBuilders.termQuery("name", "a");

        sourceBuilder.query(query);

        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        request.source(sourceBuilder);

        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        System.out.println(JSON.toJSONString(response.getHits()));

        for (SearchHit hit : response.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());
        }
    }
}
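
The test class imports com.chen.pojo.User and com.chen.util.ESConst, but the article does not show them. The following is a guess at what minimal versions could look like, inferred only from how they are used above (a two-argument constructor and an ES_INDEX constant). The field names name and age and the value "chen_index" are assumptions. In the real project each class would be public and live in its own file and package.

```java
/** Hypothetical stand-in for com.chen.pojo.User, which the article does not show. */
class User {
    private String name; // assumption: the tests search on a "name" field
    private int age;     // assumption: the second constructor argument is an age

    /** No-arg constructor so that JSON libraries can deserialize the class. */
    public User() {
    }

    public User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Getters and setters are what JSON serializers such as fastjson read and write.
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}

/** Hypothetical stand-in for com.chen.util.ESConst. */
final class ESConst {
    // Assumption: the search test targets the same index as the other tests.
    static final String ES_INDEX = "chen_index";

    private ESConst() {
        // constants only, no instances
    }
}
```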