Elasticsearch stores data as documents, serialized to JSON. Inverted-index concepts:
Document: each record of data is one document
Term: a word obtained by splitting a document's text by meaning
Field: a field in the JSON document
Index: a collection of documents of the same type
Mapping: constraints on the documents in an index, e.g. field names and types
Division of labor: the database handles transactional operations; es handles search, analysis, and computation over massive data.
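The inverted-index idea above can be sketched in a few lines of plain Java (a toy illustration, not es's actual data structure; the analyzer is replaced by a whitespace split): each term maps to the set of ids of the documents that contain it.

```java
import java.util.*;

public class InvertedIndex {
    // term -> ids of documents containing that term
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // a real engine would run an analyzer here; we just split on whitespace
    public void addDocument(int docId, String text) {
        for (String term : text.split("\\s+")) {
            index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // look up the posting list for one term
    public Set<Integer> search(String term) {
        return index.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.addDocument(1, "hotel near the river");
        idx.addDocument(2, "cheap hotel downtown");
        System.out.println(idx.search("hotel")); // [1, 2]
        System.out.println(idx.search("river")); // [1]
    }
}
```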
Installing Elasticsearch
We will also deploy a kibana container later, so the es and kibana containers need to reach each other. First create a network:
docker network create es-net
Upload es.tar to the virtual machine, then load the image:
# load the image
docker load -i es.tar
Do the same for the kibana tar package.
Run the docker command to deploy a single-node es:
docker run -d \
--name es \
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
-e "discovery.type=single-node" \
-v es-data:/usr/share/elasticsearch/data \
-v es-plugins:/usr/share/elasticsearch/plugins \
--privileged \
--network es-net \
-p 9200:9200 \
-p 9300:9300 \
elasticsearch:7.12.1
Command explanation:
- -e "cluster.name=es-docker-cluster": set the cluster name
- -e "http.host=0.0.0.0": listen address; allows access from outside
- -e "ES_JAVA_OPTS=-Xms512m -Xmx512m": JVM heap size
- -e "discovery.type=single-node": single-node mode, not a cluster
- -v es-data:/usr/share/elasticsearch/data: named volume bound to es's data directory
- -v es-logs:/usr/share/elasticsearch/logs: named volume bound to es's log directory
- -v es-plugins:/usr/share/elasticsearch/plugins: named volume bound to es's plugins directory
- --privileged: grant access to the volumes
- --network es-net: join the network named es-net
- -p 9200:9200: port mapping (9200 is the HTTP port; 9300 is the inter-node transport port)
To verify, open http://your-ip:9200 in a browser.
Deploying kibana
Run the docker command to deploy kibana:
docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
--network=es-net \
-p 5601:5601 \
kibana:7.12.1
- --network es-net: join the es-net network, the same one as elasticsearch
- -e ELASTICSEARCH_HOSTS=http://es:9200: the elasticsearch address; kibana is on the same network, so the container name es resolves directly
- -p 5601:5601: port mapping
kibana is usually slow to start; wait a while, and follow its logs with:
docker logs -f kibana
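If preferred, the two docker run commands above can be written as one compose file (a sketch reusing the image tags, network, and volume names from the text; start it with docker compose up -d):

```yaml
version: "3"
services:
  es:
    image: elasticsearch:7.12.1
    environment:
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - discovery.type=single-node
    volumes:
      - es-data:/usr/share/elasticsearch/data
      - es-plugins:/usr/share/elasticsearch/plugins
    privileged: true
    networks:
      - es-net
    ports:
      - "9200:9200"
      - "9300:9300"
  kibana:
    image: kibana:7.12.1
    environment:
      - ELASTICSEARCH_HOSTS=http://es:9200
    networks:
      - es-net
    ports:
      - "5601:5601"
volumes:
  es-data:
  es-plugins:
networks:
  es-net:
```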
ik analyzer
Online install:
# enter the container (named es above)
docker exec -it es /bin/bash
# download and install the plugin online
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.12.1/elasticsearch-analysis-ik-7.12.1.zip
# exit the container
exit
# restart the container
docker restart es
Offline install (local load)
Installing offline requires the location of elasticsearch's plugins directory. Since we mounted it as a named volume, inspect the volume to find its host path:
docker volume inspect es-plugins
Result:
[
{
"CreatedAt": "2022-05-06T10:06:34+08:00",
"Driver": "local",
"Labels": null,
"Mountpoint": "/var/lib/docker/volumes/es-plugins/_data",
"Name": "es-plugins",
"Options": null,
"Scope": "local"
}
]
So the plugins directory is mounted at: /var/lib/docker/volumes/es-plugins/_data.
Upload the ik folder into that _data directory, then restart the container:
# restart the container
docker restart es
# follow the es logs
docker logs -f es
Testing:
The IK analyzer has two modes:
- ik_smart: fewest splits; smart, coarse-grained segmentation
- ik_max_word: most splits; fine-grained segmentation
In the directory /var/lib/docker/volumes/es-plugins/_data/ik/config, the file IKAnalyzer.cfg.xml configures the extension-dictionary and stopword files; list the terms to add or to stop in those files.
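For reference, a minimal IKAnalyzer.cfg.xml might look like the sketch below; ext.dic and stopword.dic are file names you choose (one term per line), placed in the same config directory:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- extension dictionary: extra terms IK should recognize -->
    <entry key="ext_dict">ext.dic</entry>
    <!-- stopword dictionary: terms IK should drop -->
    <entry key="ext_stopwords">stopword.dic</entry>
</properties>
```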
Index operations
Mapping properties:
type: the field's data type
strings: text (analyzed), keyword (exact value, not analyzed)
numbers: long, integer, short, byte, double, float
boolean: boolean
date: date
object: object
index: whether to build an index for the field; default true
analyzer: which analyzer to use
properties: sub-fields of this field
Create an index:
#create an index
PUT /index-name
{
"mappings":{
"properties":{
"info":{
"type":"text",
"analyzer":"ik_smart"
},
"email":{
"type":"keyword",
"index":false
},
"name":{
"type":"object",
"properties":{
"firstname":{
"type":"keyword"
},
"lastName":{
"type":"keyword"
}
}
}
}
}
}
#get an index
GET /index-name
#delete an index
DELETE /index-name
eg: DELETE /zz
#add a new field to an existing index
PUT /index-name/_mapping
{
"properties":{
"age":{
"type":"long"
}
}
}
eg:
PUT /zz/_mapping
{
"properties":{
"age":{
"type":"long"
}
}
}
Adding documents
DSL syntax:
#add a document
POST /index-name/_doc/doc-id
{
  "field1": "value1",
  "field2": "value2",
  "field3": {
    "subfield1": "value3",
    "subfield2": "value4"
  }
}
eg:
POST /zz/_doc/1
{
"info":"码农的历程",
"email":"23651@12",
"name":{
"firstName":"云",
"lastName":"赵"
}
}
#get a document
GET /index-name/_doc/doc-id
eg: GET /zz/_doc/1
#delete a document
DELETE /index-name/_doc/doc-id
eg: DELETE /zz/_doc/1
Updating documents
Method 1: full replacement; deletes the old document and adds a new one
PUT /index-name/_doc/doc-id
{
  "field1": "value1",
  "field2": "value2"
}
eg:
PUT /zz/_doc/1
{
"info":"码农的33历程",
"email":"23651@12",
"name":{
"firstName":"云",
"lastName":"赵"
}
}
Method 2: incremental update; changes only the specified fields
POST /index-name/_update/doc-id
{
  "doc": {
    "field-name": "new value"
  }
}
eg:
POST /zz/_update/1
{
"doc":{
"email":"3652621@ddd"
}
}
RestClient: index operations
1. Add the elasticsearch rest-high-level-client dependency, pinning the version to match the server:
<properties>
<java.version>1.8</java.version>
<elasticsearch.version>7.12.1</elasticsearch.version>
</properties>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
</dependency>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-client</artifactId>
</dependency>
2. Create the client:
package cn.itcast.hotel;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
/**
 * Tests for index operations with RestHighLevelClient.
 */
public class HotelIndexTest {

    private RestHighLevelClient client;

    @BeforeEach
    void setUp() throws Exception {
        // address of the es instance deployed above
        this.client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://192.168.216.132:9200")));
    }

    @AfterEach
    void tearDown() throws Exception {
        this.client.close();
    }

    @Test
    void testInit() {
        System.out.println(client);
    }
}
Creating an index:
@Test
void createHotelIndex() throws IOException {
    // 1. create the request object; "hotel" is the index name
    CreateIndexRequest request = new CreateIndexRequest("hotel");
    // 2. request body: the mapping DSL (MAPPING_TEMPLATE is a String constant holding it)
    request.source(MAPPING_TEMPLATE, XContentType.JSON);
    // 3. send the request
    client.indices().create(request, RequestOptions.DEFAULT);
}
Deleting an index:
@Test
void testDeleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("hotel");
    // send the request
    client.indices().delete(request, RequestOptions.DEFAULT);
}
Checking whether an index exists:
@Test
void testExistsIndex() throws IOException {
    GetIndexRequest request = new GetIndexRequest("hotel");
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    System.err.println(exists ? "index exists" : "index does not exist");
}
Indexing a document:
@Test
void testAddDocument() throws Exception {
    // load the record from the database, then convert it to the es document shape
    Hotel hotel = hotelService.getById(61083L);
    HotelDoc hotelDoc = new HotelDoc(hotel);
    IndexRequest request = new IndexRequest("hotel").id(hotel.getId().toString());
    request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
    client.index(request, RequestOptions.DEFAULT);
}
Getting a document:
@Test
void testGetDocument() throws Exception {
    GetRequest request = new GetRequest("hotel", "61083");
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    String sourceAsString = response.getSourceAsString();
    // deserialize the _source JSON back into a HotelDoc
    HotelDoc hotelDoc = JSON.parseObject(sourceAsString, HotelDoc.class);
    System.out.println(hotelDoc);
}
Updating a document:
@Test
void testUpdateDocument() throws Exception {
    UpdateRequest request = new UpdateRequest("hotel", "61083");
    // partial update: only the listed fields change
    request.doc(
            "price", "666"
    );
    client.update(request, RequestOptions.DEFAULT);
}
Deleting a document:
@Test
void testDeleteDocument() throws Exception {
    client.delete(new DeleteRequest("hotel", "61083"), RequestOptions.DEFAULT);
}
DSL query syntax
match_all: return all documents; mainly used for testing
Full-text queries: analyze the user's input, then match the resulting terms against the inverted index, e.g. match, multi_match
Term-level (exact) queries: match exact values without analysis, e.g. ids, range, term
Geo queries: query by latitude/longitude, e.g. geo_distance, geo_bounding_box
Compound queries: combine the conditions above into one query, e.g. bool, function_score
Basic query syntax:
GET /hotel/_search
{
"query": {
"QUERY_TYPE": {
"QUERY_FIELD": "VALUE"
}
}
}
#match: single-field full-text query
GET /hotel/_search
{
"query": {
"match": {
"FIELD": "VALUE"
}
}
}
#multi_match: query multiple fields
GET /hotel/_search
{
"query": {
"multi_match": {
"query": "如家",
"fields": ["brand","name","business"]
}
}
}
#term: exact query
GET /hotel/_search
{
"query": {
"term": {
"city": {
"value": "上海"
}
}
}
}
#range: range query
GET /hotel/_search
{
"query": {
"range": {
"price": {
"gte": 1000,
"lte": 2000
}
}
}
}
#geo_bounding_box: match points inside a rectangle
GET /hotel/_search
{
"query": {
"geo_bounding_box": {
"location": {
"top_left": {
"lat":31.1,
"lon":121.5
},
"bottom_right":{
"lat":30.9,
"lon":121.7
}
}
}
}
}
#geo_distance: match points within a distance of a center
GET /hotel/_search
{
"query": {
"geo_distance":{
"distance":"2km",
"location": "31.21,121.5"
}
}
}
#compound queries
Relevance scoring in ES
A function_score query has three parts: a filter condition, a scoring function, and a boost mode (how the function score combines with the query score).
TF (term frequency) = times the term appears in the document / total terms in the document
IDF (inverse document frequency) = log(total documents / documents containing the term)
TF-IDF score = TF × IDF
BM25, the default algorithm in current es versions, refines TF-IDF: its score saturates as term frequency grows instead of increasing without bound.
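The formulas above, sketched in plain Java (a toy scorer; es's BM25 additionally normalizes by document length and saturates tf):

```java
import java.util.*;

public class TfIdfScorer {
    /** tf = occurrences of the term in the doc / total terms in the doc */
    static double tf(String term, List<String> docTerms) {
        long hits = docTerms.stream().filter(term::equals).count();
        return (double) hits / docTerms.size();
    }

    /** idf = log(total docs / docs containing the term); toy version, assumes the term occurs somewhere */
    static double idf(String term, List<List<String>> corpus) {
        long containing = corpus.stream().filter(d -> d.contains(term)).count();
        return Math.log((double) corpus.size() / containing);
    }

    /** tf-idf score of one term for one document */
    static double score(String term, List<String> doc, List<List<String>> corpus) {
        return tf(term, doc) * idf(term, corpus);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("如家", "酒店", "外滩"),
                Arrays.asList("酒店", "上海"));
        // "如家" appears in 1 of 2 docs -> idf = ln 2; tf in doc 0 = 1/3
        System.out.println(score("如家", corpus.get(0), corpus));
    }
}
```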
FunctionScoreQuery can modify a document's relevance score. Available functions:
- script_score: custom scoring via a script you write
- weight: a fixed score when a condition matches
- random_score: random scoring
- field_value_factor: use a field's value as a scoring factor
- decay functions: gauss, linear, exp (the closer, the better the score)
#example: boost "如家" hotels by a weight of 10, added to the query score
GET /hotel/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "all": "外滩"
        }
      },
      "functions": [
        {
          "filter": {
            "term": {
              "brand": "如家"
            }
          },
          "weight": 10
        }
      ],
      "boost_mode": "sum"
    }
  }
}
Compound bool query clauses:
(1) must: every sub-query must match; like AND
(2) should: sub-queries may match; like OR
(3) must_not: must not match; does not affect the score; like NOT
(4) filter: must match; does not affect the score
GET /hotel/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "如家"
}
}
],
"must_not": [
{
"range": {
"price": {
"gt": 400
}
}
}
],
"filter": [
{
"geo_distance": {
"distance": "10km",
"location": {
"lat": 31.21,
"lon": 121.5
}
}
}
]
}
}
}
Sorting:
#regular sorting: by relevance score descending, then price ascending
GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "price": {
        "order": "asc"
      }
    }
  ]
}
Sorting by distance from a given point:
GET /hotel/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"_geo_distance": {
"location": {
"lat": 31.03,
"lon": 121.61228
},
"order": "asc",
"unit": "km"
}
}
]
}
es returns 10 results by default. Control paging with from (the offset of the first document) and size (the number of documents to return):
GET /hotel/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"price": {
"order": "asc"
}
}
],
"from": 10,
"size": 10
}
Because es is distributed, deep paging is expensive: the result window (from + size) is capped at 10000 by default. Alternatives:
search after: requires a sort; each page is fetched starting from the sort values of the previous page's last hit.
scroll: takes a snapshot of the sorted data and keeps it in memory; no longer recommended for deep paging.
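A search after request might look like the sketch below; the id tiebreaker field and the values in search_after are hypothetical and would come from the last hit of the previous page:

```
GET /hotel/_search
{
  "query": { "match_all": {} },
  "size": 10,
  "sort": [
    { "price": "asc" },
    { "id": "asc" }
  ],
  "search_after": [300, "60398"]
}
```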
Highlighting
Emphasize the search keywords in the results: es wraps the matching terms in tags (by default <em>), and the page styles those tags with css.
#by default the highlight field must be the same field the query searches ("name" here);
#if they differ, set require_field_match to false
GET /hotel/_search
{
  "query": {
    "match": {
      "name": "如家"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "require_field_match": "true"
      }
    }
  }
}
RestClient: querying documents
match_all query:
@Test
void testMatchAll() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.matchAllQuery());
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    SearchHits searchHits = response.getHits();
    long total = searchHits.getTotalHits().value;
    System.out.println("found " + total + " hits");
    SearchHit[] hits = searchHits.getHits();
    Arrays.stream(hits).map(SearchHit::getSourceAsString).forEach(System.out::println);
}
Paging and sorting:
@Test
void testPageAndSort() throws IOException {
    int page = 1, size = 5;
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.matchAllQuery());
    request.source().sort("price", SortOrder.ASC); // sort by price ascending
    request.source().from((page - 1) * size).size(size); // offset and page size
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    SearchHits searchHits = response.getHits();
    long total = searchHits.getTotalHits().value;
    System.out.println("found " + total + " hits");
    SearchHit[] hits = searchHits.getHits();
    Arrays.stream(hits).map(SearchHit::getSourceAsString).forEach(System.out::println);
}
Highlighting:
@Test
void testHighlight() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.matchQuery("name", "如家"));
    // highlight the name field; requireFieldMatch(false) lets it differ from the queried field
    request.source().highlighter(new HighlightBuilder().field("name").requireFieldMatch(false));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    handleResponse(response);
}

private void handleResponse(SearchResponse response) {
    SearchHits searchHits = response.getHits();
    long total = searchHits.getTotalHits().value;
    System.out.println("found " + total + " hits");
    for (SearchHit hit : searchHits.getHits()) {
        String json = hit.getSourceAsString();
        HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        if (!CollectionUtils.isEmpty(highlightFields)) {
            // replace the raw name with the highlighted fragment
            HighlightField highlightField = highlightFields.get("name");
            String name = highlightField.getFragments()[0].toString();
            hotelDoc.setName(name);
        }
        System.out.println("hotelDoc: " + hotelDoc);
    }
}