Elasticsearch

我好帅啊~

Last modified: 2023-02-23 10:51:54


Column: Search Engine | Article tags: elasticsearch

First published: 2022-10-24 13:34:35

Article link: https://blog.csdn.net/m0_56808407/article/details/125977927


Contents

I. What is Elasticsearch
II. Installing Elasticsearch and Kibana
III. Index operations
IV. Document operations
V. Operating on indices with RestClient
Deploying an ES cluster

I. What is Elasticsearch

1. Introduction

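Elasticsearch is a powerful open-source search engine that helps us quickly find what we need in massive amounts of data.
Combined with Kibana, Logstash, and Beats, it forms the Elastic Stack (ELK), which is widely used for log analysis, real-time monitoring, and similar scenarios.
Elasticsearch is the core of the Elastic Stack, responsible for storing, searching, and analyzing data.
Lucene is a search-engine library written in Java and a top-level project of the Apache Software Foundation (see its official website).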
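Advantages of Lucene:

Easy to extend
High performance (based on an inverted index)
Drawbacks of Lucene:

Usable only from Java
Steep learning curve
No support for horizontal scaling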
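Compared with Lucene, Elasticsearch (see its official website) has the following advantages:

Distributed, supports horizontal scaling
Provides a RESTful API that can be called from any language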

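2. Forward index and inverted index

Forward index

Traditional databases (such as MySQL) use a forward index, for example an index created on the id column of the table below (tb_goods):
(Figure not preserved: the tb_goods table.)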

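Inverted index

Elasticsearch uses an inverted index:
Document content is split into terms, an index is built on the terms, and each term records which documents contain it. At query time, ES first finds the matching document IDs by term and then fetches those documents.
Document: each piece of data is a document
Term: a word obtained by analyzing (tokenizing) the content of a document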

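Relationship between terms and documents (figure not preserved): each term maps to the IDs of the documents that contain it.
Search flow at query time (figure not preserved):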
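To make the search flow concrete, here is a toy inverted index in Java. It is a minimal sketch, not how Elasticsearch or Lucene actually store data: lowercasing plus whitespace splitting stands in for a real analyzer, and the sample documents are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: term -> sorted set of IDs of documents containing the term.
// Lowercasing + whitespace splitting stands in for a real analyzer (an assumption).
class InvertedIndexDemo {

    // Split text into terms the same way for indexing and for searching
    static String[] analyze(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    // Indexing: analyze each document and record its ID under every term
    static Map<String, SortedSet<Integer>> build(Map<Integer, String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            for (String term : analyze(doc.getValue())) {
                if (term.isEmpty()) continue;
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    // Searching: analyze the query, look up each term, and union the matching doc IDs.
    // A real engine would then fetch the documents by ID and rank them by score.
    static SortedSet<Integer> search(Map<String, SortedSet<Integer>> index, String query) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (String term : analyze(query)) {
            hits.addAll(index.getOrDefault(term, Collections.emptySortedSet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample documents
        Map<Integer, String> docs = new TreeMap<>();
        docs.put(1, "Xiaomi phone");
        docs.put(2, "Xiaomi bracelet");
        docs.put(3, "Huawei phone");

        Map<String, SortedSet<Integer>> index = build(docs);
        System.out.println(index);                         // {bracelet=[2], huawei=[3], phone=[1, 3], xiaomi=[1, 2]}
        System.out.println(search(index, "Huawei phone")); // [1, 3]
    }
}
```

The lookup goes from term to document IDs, the reverse of a forward index, which is why a query only touches the postings of its own terms.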

Documents

Elasticsearch is document-oriented: a document can be a product record from a database or an order.
Document data is serialized to JSON and stored in Elasticsearch.

Index

Index: a collection of documents of the same type
Mapping: the field constraints for the documents in an index, similar to a table's schema

Concept comparison

(Figure not preserved: a side-by-side comparison of MySQL and Elasticsearch concepts.)

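3. Architecture

MySQL: good at transactional operations, ensuring data safety and consistency
Elasticsearch: good at searching, analyzing, and computing over massive amounts of data
MySQL and Elasticsearch complement each other.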

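II. Installing Elasticsearch and Kibana

1. Deploying a single-node ES

1.1. Create a network

Because we will also deploy a Kibana container, the ES and Kibana containers need to be able to reach each other. First, create a network: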

	docker network create es-net

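1.2. Load the images

We use the Elasticsearch 7.12.1 image. It is very large (close to 1 GB), so rather than pulling it, you can download the image archives in advance,
copy them to the virtual machine, and load them:

# Load the ES image
docker load -i es.tar
# Load the Kibana image
docker load -i kibana.tar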

1.3. Run

Run the following docker command to deploy a single-node ES:

docker run -d \
    --name myes \
    -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
    -e "discovery.type=single-node" \
    -v es-data:/usr/share/elasticsearch/data \
    -v es-plugins:/usr/share/elasticsearch/plugins \
    --privileged \
    --network es-net \
    -p 9200:9200 \
    -p 9300:9300 \
elasticsearch:7.12.1

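-e "ES_JAVA_OPTS=-Xms512m -Xmx512m": JVM heap size
-e "discovery.type=single-node": run mode (single node)
-v es-data:/usr/share/elasticsearch/data: mount a volume for the data directory
-v es-plugins:/usr/share/elasticsearch/plugins: mount a volume for the plugins directory
--privileged: grant the container extended privileges (here, for access to the mounted volumes)
--network es-net: join the network named es-net
-p 9200:9200: port mapping for the HTTP port that users access
-p 9300:9300: port mapping for the transport port used for node-to-node communication
Open <VM IP>:9200 in a browser to see Elasticsearch's response.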

2. Deploying Kibana

Kibana provides a visual interface for Elasticsearch, which makes it easier to learn.

2.1. Deploy Kibana

Run the following docker command to deploy Kibana:

docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://myes:9200 \
--network=es-net \
-p 5601:5601  \
kibana:7.12.1

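--network=es-net: join the es-net network, the same network as Elasticsearch
-e ELASTICSEARCH_HOSTS=http://myes:9200: the Elasticsearch address; because Kibana is on the same network as Elasticsearch, it can reach Elasticsearch by container name
-p 5601:5601: port mapping

Kibana usually takes a while to start. You can follow its logs with: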

docker logs -f kibana

2.2. DevTools

Kibana provides a DevTools console:

Open xxx.xxx.xxx.xxx:5601 (the port mapped when Kibana was installed with docker).
In this console you can write DSL to operate Elasticsearch, with auto-completion for DSL statements.

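3. Installing the IK analyzer

The default analyzer cannot segment Chinese text properly: it splits it into single characters. So we install the IK analyzer to segment Chinese.

3.1. Install the IK plugin online (slower)

# Enter the container
docker exec -it myes /bin/bash
# Download and install online
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.12.1/elasticsearch-analysis-ik-7.12.1.zip
# Exit
exit
# Restart the container
docker restart myes
# Follow the ES logs
docker logs -f myes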

3.2. Test

The IK analyzer has two modes:

ik_max_word: finest-grained segmentation

GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "无情啊无情的程序员"
}

Result:

{
  "tokens" : [
    {
      "token" : "无情",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "啊",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "无情",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "的",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "程序员",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "程序",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "员",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "CN_CHAR",
      "position" : 6
    }
  ]
}

ik_smart: coarsest segmentation (fewest tokens)

GET /_analyze
{
  "analyzer": "ik_smart",
  "text": "无情啊无情的程序员"
}

Result:

{
  "tokens" : [
    {
      "token" : "无情",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "啊",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "无情",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "的",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "程序员",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}
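The difference between the two modes can be illustrated with a simplified dictionary-based segmenter. This is a hypothetical sketch of the general idea, not IK's actual algorithm: `smart` uses forward maximum matching (always take the longest dictionary word), while `maxWord` emits every dictionary word it finds plus any single characters no word covers. The tiny dictionary is an assumption chosen to fit the example sentence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified dictionary-based Chinese segmentation, for illustration only.
// NOT IK's actual algorithm: it only mimics the idea behind ik_smart and ik_max_word.
class SegmentDemo {

    // ik_smart-like: forward maximum matching. At each position take the longest
    // dictionary word (up to maxLen chars); fall back to a single character.
    static List<String> smart(String text, Set<String> dict, int maxLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(maxLen, text.length() - i);
            while (len > 1 && !dict.contains(text.substring(i, i + len))) {
                len--;
            }
            tokens.add(text.substring(i, i + len));
            i += len;
        }
        return tokens;
    }

    // ik_max_word-like: emit every dictionary word (2+ chars) starting at every position,
    // longest first, plus single characters that no dictionary word covers.
    static List<String> maxWord(String text, Set<String> dict, int maxLen) {
        int n = text.length();
        boolean[] covered = new boolean[n];
        List<List<String>> wordsAt = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<String> found = new ArrayList<>();
            for (int len = Math.min(maxLen, n - i); len >= 2; len--) {
                String word = text.substring(i, i + len);
                if (dict.contains(word)) {
                    found.add(word);
                    Arrays.fill(covered, i, i + len, true);
                }
            }
            wordsAt.add(found);
        }
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!covered[i]) {
                tokens.add(text.substring(i, i + 1));
            }
            tokens.addAll(wordsAt.get(i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical tiny dictionary chosen for the example sentence
        Set<String> dict = new HashSet<>(Arrays.asList("无情", "程序", "程序员"));
        String text = "无情啊无情的程序员";
        System.out.println(smart(text, dict, 3));   // [无情, 啊, 无情, 的, 程序员]
        System.out.println(maxWord(text, dict, 3)); // [无情, 啊, 无情, 的, 程序员, 程序]

        // Adding a word mimics an extension dictionary entry (ext.dic)
        dict.add("哈拉少");
        System.out.println(smart("无情且哈拉少的程序员", dict, 3)); // [无情, 且, 哈拉少, 的, 程序员]
    }
}
```

With this dictionary, `smart` reproduces the ik_smart tokens shown above, while `maxWord` stops at 程序员 and 程序; the real ik_max_word output additionally contains 员, which this simplified rule does not emit.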

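3.3. Extension dictionary

As the internet evolves, new words are coined all the time, and many of them are not in the existing vocabulary.
The vocabulary therefore needs regular updates, and the IK analyzer supports extension dictionaries for this.

Open IKAnalyzer.cfg.xml in the config directory of the IK analyzer (create it if it does not exist).
Add the following to it (the file must be UTF-8 encoded; do not edit it with Windows Notepad):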

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- Extension dictionary: ext_dict -->
	<entry key="ext_dict">ext.dic</entry>
	<!-- Extension stopword dictionary: ext_stopwords -->
	<entry key="ext_stopwords">stop.dic</entry>
	<!-- Remote extension dictionary: remote_ext_dict -->
	<!-- <entry key="remote_ext_dict">words_location</entry> -->
	<!-- Remote extension stopword dictionary: remote_ext_stopwords -->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

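Create an ext.dic file in the same directory as IKAnalyzer.cfg.xml (you can copy an existing dictionary file from the config directory and edit it), with one word per line: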

无情
哈拉少

Restart Elasticsearch:

docker restart myes
# View the logs
docker logs -f myes

Test:

POST /_analyze
{
  "text": "无情且哈拉少的程序员", 
  "analyzer": "ik_smart"
}
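The `<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">` line in IKAnalyzer.cfg.xml indicates the standard `java.util.Properties` XML format, so the JDK can read such a file with `Properties.loadFromXML`. The sketch below (class name and sample string are illustrative) is a quick way to check which keys are actually active; the commented-out remote entries are not loaded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Reads an IKAnalyzer.cfg.xml-style document with the JDK's Properties XML support.
class IkConfigDemo {

    static Properties parse(String xml) {
        Properties props = new Properties();
        // Encode as UTF-8, matching the encoding declared in the file
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            props.loadFromXML(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties>\n"
                + "  <comment>IK Analyzer extension configuration</comment>\n"
                + "  <entry key=\"ext_dict\">ext.dic</entry>\n"
                + "  <entry key=\"ext_stopwords\">stop.dic</entry>\n"
                + "  <!-- <entry key=\"remote_ext_dict\">words_location</entry> -->\n"
                + "</properties>\n";
        Properties props = parse(xml);
        System.out.println(props.getProperty("ext_dict"));        // ext.dic
        System.out.println(props.getProperty("remote_ext_dict")); // null (commented out)
    }
}
```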

III. Index operations

Mapping properties

A mapping defines constraints on the documents in an index. Common mapping properties include:

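type: the field's data type
1. String: text (analyzable text), keyword (exact values, e.g. brand, country, IP address, email address)
2. Numeric: long, integer, short, byte, double, float
3. Boolean: boolean
4. Date: date
5. Object: object
ES supports two geo data types:
1. geo_point: a point defined by latitude and longitude, e.g. "38.897676,-77.03653" (the string form is "lat,lon")
2. geo_shape: a complex geometry made of multiple geo_points, e.g. a line: "LINESTRING(-77.03653 38.897676,-77.009051 38.889939)" (WKT, "lon lat" order)

For example, a document with fields of several of these types: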

{
  "age": 21,
  "weight": 52.1,
  "isMarried": false,
  "info": "这是一句没有任何作用的话",
  "email": "zy@itcast.cn",
  "score": [99.1,99.5,98],
  "name": {
    "firstName": "云",
    "lastName": "赵"
  }
}

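index: whether to index the field; defaults to true
analyzer: which analyzer to use
properties: the field's sub-fields
copy_to: copy this field's value into a target field (use case: searching several fields at once without losing performance)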

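Creating an index

ES manipulates indices and documents through RESTful requests, with the request content written in DSL. The DSL for creating an index with a mapping is: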

# Create an index
PUT /index_name
{
  "mappings": {
    "properties": {
      "field1": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "field2": {
        "type": "keyword",
        "index": false
      },
      "field3": {
        "type": "object",
        "properties": {
          "subfield1": {
            "type": "keyword"
          },
          "subfield2": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
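Because these are plain RESTful requests, any HTTP client can send them. As a sketch, the JDK's `java.net.http` API (Java 11+) can build the `PUT /index_name` request. The index name `hotel`, the field names, and the `localhost:9200` address are hypothetical, and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) the HTTP request behind "PUT /index_name" with a mapping body.
// Requires Java 11+ for java.net.http. Index name, fields, and address are hypothetical.
class CreateIndexHttpDemo {

    static HttpRequest createIndexRequest(String baseUrl, String indexName, String mappingJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + indexName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(mappingJson))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical mapping: an ik_smart-analyzed text field and a non-indexed keyword field
        String mapping = "{\"mappings\":{\"properties\":{"
                + "\"name\":{\"type\":\"text\",\"analyzer\":\"ik_smart\"},"
                + "\"pic\":{\"type\":\"keyword\",\"index\":false}}}}";
        HttpRequest request = createIndexRequest("http://localhost:9200", "hotel", mapping);
        System.out.println(request.method() + " " + request.uri()); // PUT http://localhost:9200/hotel

        // Sending it requires a running ES node, e.g.:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```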

Viewing and deleting an index

View an index:

GET /index_name

Delete an index:

DELETE /index_name

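Modifying an index

Once an index and its mapping have been created, existing fields cannot be modified, but new fields can be added.
A new field must not reuse an existing field name: redefining an existing field is treated as modifying the existing mapping, and ES returns an error.

PUT /index_name/_mapping
{
  "properties": {
    "new_field": {
      "type": "text",
      "analyzer": "ik_smart"
    }
  }
}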

IV. Document operations

Adding a document

POST /index_name/_doc/doc_id
{
  "field1": "value1",
  "field2": "value2",
  "field3": {
     "subfield1": "value3",
     "subfield2": "value4"
  }
}

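Querying documents

Get a document by ID

GET /index_name/_doc/doc_id

Query DSL

ES provides a JSON-based DSL for defining queries. Common query types are:

Match all:

Returns all data; generally used for testing. For example:

match_all

	GET /index_name/_search
	{
	  "query": {
	    "match_all": {}
	  }
	}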

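Full-text queries:

The user's input is split into terms by an analyzer, and the terms are then matched against the inverted index. For example:

match

	GET /index_name/_search
	{
	  "query": {
	    "match": {
	      "field_name": "query text"
	    }
	  }
	}

multi_match

	GET /index_name/_search
	{
	  "query": {
	    "multi_match": {
	      "query": "query text",
	      "fields": ["field1", "field2", "field3"]
	    }
	  }
	}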

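Term-level (exact) queries:

Find data by exact terms; usually used on keyword, numeric, date, and boolean fields. For example:

ids
range
term

	GET /index_name/_search
	{
	  "query": {
	    "term": {
	      "field_name": {
	        "value": "exact value"
	      }
	    }
	  }
	}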
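The practical difference between full-text and term-level queries is whether the query input is analyzed. The toy model below (lowercasing plus whitespace splitting stands in for a real analyzer, and the documents are hypothetical) shows why a `term` query on an analyzed `text` field can miss documents that `match` finds, which is why `term` is usually reserved for keyword, numeric, date, and boolean fields.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of "match" vs "term" on a text field.
// Analyzer assumption: lowercase + whitespace split (stands in for a real analyzer).
class MatchVsTermDemo {

    static String[] analyze(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    // Indexing a text field stores the analyzed (lowercased) terms
    static Map<String, Set<Integer>> index(Map<Integer, String> docs) {
        Map<String, Set<Integer>> idx = new TreeMap<>();
        for (Map.Entry<Integer, String> d : docs.entrySet()) {
            for (String term : analyze(d.getValue())) {
                if (!term.isEmpty()) {
                    idx.computeIfAbsent(term, t -> new TreeSet<>()).add(d.getKey());
                }
            }
        }
        return idx;
    }

    // match: analyze the query text, then union the postings of its terms (default operator OR)
    static Set<Integer> match(Map<String, Set<Integer>> idx, String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(query)) {
            hits.addAll(idx.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    // term: use the value verbatim as a single term, no analysis
    static Set<Integer> term(Map<String, Set<Integer>> idx, String value) {
        return new TreeSet<>(idx.getOrDefault(value, Collections.emptySet()));
    }

    public static void main(String[] args) {
        // Hypothetical documents
        Map<Integer, String> docs = new TreeMap<>();
        docs.put(1, "Shanghai Hotel");
        docs.put(2, "Beijing Hotel");
        Map<String, Set<Integer>> idx = index(docs);

        System.out.println(match(idx, "Shanghai HOTEL")); // [1, 2]
        System.out.println(term(idx, "Shanghai"));        // [] because the indexed term is "shanghai"
        System.out.println(term(idx, "shanghai"));        // [1]
    }
}
```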

Geo queries:

Query by latitude and longitude. For example:

geo_distance
geo_bounding_box
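A `geo_distance` query keeps documents whose `geo_point` lies within a given distance of a center point. The sketch below approximates that decision with the haversine formula on a sphere of radius 6371 km; this is an assumption for illustration, and Elasticsearch's own distance calculation may give slightly different values. The sample points are the two endpoints of the LINESTRING example above, rewritten in (lat, lon) order.

```java
// Approximates the geo_distance decision with the haversine formula (spherical Earth).
class GeoDistanceDemo {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance in kilometres between two (lat, lon) points given in degrees
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // A document "matches" if its point is within `km` of the center
    static boolean withinDistance(double lat, double lon,
                                  double centerLat, double centerLon, double km) {
        return distanceKm(lat, lon, centerLat, centerLon) <= km;
    }

    public static void main(String[] args) {
        double d = distanceKm(38.897676, -77.03653, 38.889939, -77.009051);
        System.out.printf("%.2f km%n", d);
        System.out.println(withinDistance(38.889939, -77.009051, 38.897676, -77.03653, 3.0));
    }
}
```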

Compound queries:

Compound queries combine the queries above into a single query. For example:

bool
function_score
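Ignoring relevance scoring, the filtering effect of a `bool` query's clauses can be modelled as set operations on the IDs of matching documents. This is a simplified sketch: `must` and `filter` intersect, `must_not` subtracts, and `should` only restricts results when there are no `must` or `filter` clauses (the default `minimum_should_match` behaviour). In ES, `must` and `should` also contribute to the score while `filter` and `must_not` do not, which this model leaves out. The clause results in `main` are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Filtering semantics of a bool query as set operations on matching doc IDs (scoring ignored).
class BoolQueryDemo {

    static Set<Integer> bool(Set<Integer> allDocs,
                             List<Set<Integer>> must,
                             List<Set<Integer>> filter,
                             List<Set<Integer>> should,
                             List<Set<Integer>> mustNot) {
        Set<Integer> result = new TreeSet<>(allDocs);
        for (Set<Integer> clause : must) result.retainAll(clause);    // every must clause has to match
        for (Set<Integer> clause : filter) result.retainAll(clause);  // same, but no scoring in ES
        for (Set<Integer> clause : mustNot) result.removeAll(clause); // excluded if any must_not matches
        // With no must/filter clauses, at least one should clause has to match
        // (default minimum_should_match = 1); otherwise should only affects scoring.
        if (must.isEmpty() && filter.isEmpty() && !should.isEmpty()) {
            Set<Integer> anyShould = new TreeSet<>();
            for (Set<Integer> clause : should) anyShould.addAll(clause);
            result.retainAll(anyShould);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical doc IDs matched by each clause
        Set<Integer> all = Set.of(1, 2, 3, 4, 5);
        Set<Integer> hits = bool(all,
                List.of(Set.of(1, 2, 3, 4)), // must: e.g. a match on the name
                List.of(Set.of(2, 3, 4, 5)), // filter: e.g. a price range
                List.of(),                   // should
                List.of(Set.of(4)));         // must_not
        System.out.println(hits); // [2, 3]
    }
}
```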

Deleting a document

Syntax:

DELETE /index_name/_doc/doc_id

Example:

DELETE /myindex/_doc/1

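Updating a document

Full update

A full update deletes the old document and then adds the new one.
If no document exists for the given ID, there is nothing to delete, but the new document is still added.

PUT /index_name/_doc/doc_id
{
  "field1": "value1",
  "field2": "value2",
  "field3": {
     "subfield1": "value3",
     "subfield2": "value4"
  }
}

Partial update (incremental: only the specified fields change)

POST /index_name/_update/doc_id
{
  "doc": {
    "field": "value"
  }
}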
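The difference between the two update styles can be modelled with an in-memory map standing in for an index. This is a hypothetical sketch, not the ES implementation: a full update replaces the whole source, so fields you omit disappear, while a partial update merges only the given fields and fails if the document does not exist (ES reports a document_missing_exception unless upsert options are used).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// In-memory stand-in for an index (hypothetical), contrasting
// PUT /index/_doc/id (full update) with POST /index/_update/id (partial update).
class UpdateSemanticsDemo {
    // doc ID -> document source fields
    final Map<String, Map<String, Object>> docs = new HashMap<>();

    // Full update: replaces the whole source (creates the document if the ID is new)
    void put(String id, Map<String, Object> source) {
        docs.put(id, new HashMap<>(source));
    }

    // Partial update: merges only the given fields into the existing source
    void update(String id, Map<String, Object> partialDoc) {
        Map<String, Object> existing = docs.get(id);
        if (existing == null) {
            // ES answers with document_missing_exception here unless upsert is used
            throw new NoSuchElementException("document missing: " + id);
        }
        existing.putAll(partialDoc);
    }

    public static void main(String[] args) {
        UpdateSemanticsDemo index = new UpdateSemanticsDemo();
        Map<String, Object> hotel = new HashMap<>();
        hotel.put("name", "Hotel A");
        hotel.put("city", "Shanghai");
        index.put("1", hotel);

        Map<String, Object> change = new HashMap<>();
        change.put("city", "Beijing");

        index.update("1", change);
        System.out.println(index.docs.get("1").get("name")); // Hotel A (kept by the partial update)

        index.put("1", change);
        System.out.println(index.docs.get("1").get("name")); // null (dropped by the full update)
    }
}
```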

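V. Operating on indices with RestClient

Elastic provides official ES clients for many languages. These clients essentially assemble DSL statements and send them to ES over HTTP.
See the official Elasticsearch client documentation, including the Java client documentation.

Initializing the Java API Client

Add the ES and Jackson dependencies: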

    <!--elasticsearch-->
    <dependency>
      <groupId>co.elastic.clients</groupId>
      <artifactId>elasticsearch-java</artifactId>
      <version>8.3.3</version>
    </dependency>
    <!--jackson-->
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.12.3</version>
    </dependency>

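Create the API client

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

    // Shared by the examples below
    private ElasticsearchClient client;

    void setClient() {
        // Create the low-level REST client
        RestClient restClient = RestClient.builder(
                new HttpHost("localhost", 9200)).build();

        // Create the transport with the Jackson JSON mapper
        ElasticsearchTransport transport = new RestClientTransport(
                restClient, new JacksonJsonpMapper());

        // Create the API client and store it in the field
        client = new ElasticsearchClient(transport);
    }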

Java API Client

Index operations

Create an index

    void testCreatIndex() throws IOException {
        // Index name
        String indexName = "";
        // Request body as JSON, e.g. {"mappings": {"properties": {...}}}
        String mappings = "";
        StringReader stringReader = new StringReader(mappings);
        CreateIndexRequest request = CreateIndexRequest.of(
                c -> c.index(indexName).withJson(stringReader));
        client.indices().create(request);
    }

Check whether an index exists

    void testExistsIndex() throws IOException {
        // Index name
        String indexName = "";
        BooleanResponse exists = client.indices().exists(c -> c.index(indexName));
        System.out.println(exists.value());
    }

Delete an index

    void testDeleteIndex() throws IOException {
        // Index name
        String indexName = "";
        client.indices().delete(c -> c.index(indexName));
    }

Document operations

Add documents

Single insert

    void testCreatDocuments() throws IOException {
        // Fetch the data to write to ES (hotelService is the project's service layer)
        Hotel byId = hotelService.getById(39141L);
        // Build the IndexRequest
        IndexRequest.Builder<Hotel> builder = new IndexRequest.Builder<>();
        // Index name
        String indexName = "hotel";
        // Target index
        builder.index(indexName);
        // ID of the document to create
        builder.id(byId.getId().toString());
        // Document content
        builder.document(byId);
        // Build the request
        IndexRequest<Hotel> build = builder.build();
        // Execute
        IndexResponse index = client.index(build);
    }

Bulk insert

    void testBathCreatDocuments() throws IOException {
        // Fetch the data to write to ES (first 10 rows)
        List<Hotel> list = hotelService.list(new LambdaQueryWrapper<Hotel>().last("limit 10"));
        // Index name
        String indexName = "hotel";
        // Use a BulkRequest to insert in batch
        BulkRequest.Builder br = new BulkRequest.Builder();
        for (Hotel hotel : list) {
            br.operations(op -> op
                    .index(idx -> idx
                            .index(indexName)
                            .id(hotel.getId().toString())
                            .document(hotel)
                    )
            );
        }
        // Execute the bulk insert
        BulkResponse result = client.bulk(br.build());
        // A bulk request can partially fail: errors() is true if any item failed
        if (result.errors()) {
            System.out.println("Some bulk items failed");
        }
    }
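Under the hood, a bulk request is sent to the `_bulk` endpoint as newline-delimited JSON: one action line followed by one source line per document, with a newline at the end of the body. The sketch below builds such a body by hand for index actions, which can help when reading bulk errors or testing with curl. The class name and sample documents are hypothetical, and the sources are assumed to be already-serialized JSON (no escaping is done).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hand-built _bulk body (NDJSON) for "index" actions: action line + source line per document.
// Sources are assumed to be pre-serialized JSON; no escaping is done (hypothetical helper).
class BulkBodyDemo {

    static String bulkIndexBody(String indexName, Map<String, String> idToSourceJson) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : idToSourceJson.entrySet()) {
            // Action/metadata line
            body.append("{\"index\":{\"_index\":\"").append(indexName)
                .append("\",\"_id\":\"").append(e.getKey()).append("\"}}\n");
            // Document source line
            body.append(e.getValue()).append('\n');
        }
        // Every line, including the last, ends with '\n', as the bulk API requires
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"name\":\"Hotel A\",\"price\":300}");
        docs.put("2", "{\"name\":\"Hotel B\",\"price\":450}");
        // Would be sent as: POST /_bulk with Content-Type: application/x-ndjson
        System.out.print(bulkIndexBody("hotel", docs));
    }
}
```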

Query documents

Get a document by ID

    void testGetDocuments() throws IOException {
        // Index name
        String indexName = "hotel";
        // Execute
        GetResponse<Hotel> hotelDocGetResponse = client.get(g -> g
                        .index(indexName)
                        // Document ID
                        .id(String.valueOf(39141)),
                Hotel.class
        );
        System.out.println(hotelDocGetResponse.source());
    }

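Search by document content (single condition)

    @Test
    void testSearchDocuments() throws IOException {
        // Index name
        String indexName = "hotel";
        // Execute a match query on the "business" field
        SearchResponse<Hotel> response = client.search(g -> g
                        .index(indexName)
                        .query(q -> q
                                .match(t -> t.field("business").query("长风公园地区"))),
                Hotel.class
        );
        // Print the source of each hit (Hit: co.elastic.clients.elasticsearch.core.search.Hit)
        for (Hit<Hotel> hit : response.hits().hits()) {
            System.out.println(hit.source());
        }
    }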

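Search by document content (multiple conditions)

    void testMultiSearchDocuments() throws IOException {
        String searchText = "上海西藏大厦万怡酒店";
        double maxPrice = 500;
        String indexName = "hotel";

        // Define the query conditions
        // Match the search text against the "all" field (e.g. a copy_to target)
        Query byName = MatchQuery.of(m -> m
                .field("all")
                .query(searchText)
        )._toQuery();

        // Range query: the price must not exceed maxPrice
        Query byMaxPrice = RangeQuery.of(r -> r
                .field("price")
                .lte(JsonData.of(maxPrice))
        )._toQuery();

        // Combine the name and price conditions in a bool query
        SearchResponse<HotelDoc> response = client.search(s -> s
                        .index(indexName)
                        .query(q -> q
                                .bool(b -> b
                                        .must(byName)
                                        .must(byMaxPrice)
                                )
                        ),
                HotelDoc.class
        );
        // Print the source of each hit
        for (Hit<HotelDoc> hit : response.hits().hits()) {
            System.out.println(hit.source());
        }
    }

Update a document
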
    void testUpdateDocuments() throws IOException {
        // Load the record from the database
        Hotel hotel = hotelService.getById(39141L);
        // Build the IndexRequest
        IndexRequest.Builder<Hotel> builder = new IndexRequest.Builder<>();
        // Index name
        String indexName = "hotel";
        // Change the content: set whichever fields need new values.
        // Indexing with an existing ID replaces the whole document (a full update).
        hotel.setAddress("中国北京");
        IndexRequest<Hotel> build = builder
                .index(indexName)
                .id(hotel.getId().toString())
                .document(hotel).build();
        // Execute
        IndexResponse index = client.index(build);
    }

Delete a document

    void testDeleteDocuments() throws IOException {
        // Index name
        String indexName = "hotel";
        client.delete(x ->
                x.index(indexName)
                        // ID of the document to delete
                        .id(String.valueOf(39141)));
    }

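Deploying an ES cluster

You can deploy an ES cluster directly with docker-compose, but your Linux VM needs at least 4 GB of memory.

First, write a docker-compose file with the following content: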

version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.12.1
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.12.1
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data02:/usr/share/elasticsearch/data
    networks:
      - elastic
  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.12.1
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data03:/usr/share/elasticsearch/data
    networks:
      - elastic

volumes:
  data01:
    driver: local
  data02:
    driver: local
  data03:
    driver: local

networks:
  elastic:
    driver: bridge

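Then run docker-compose to bring up the cluster. On a Linux host, Elasticsearch also requires the kernel setting vm.max_map_count to be at least 262144 (for example, run sysctl -w vm.max_map_count=262144 as root beforehand):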

docker-compose up
