Elasticsearch (ES) Key Concepts

小鱼做了就会

Category: development frameworks and plugins. Tags: elasticsearch, search engine, big data

Article link: https://blog.csdn.net/qq_41884396/article/details/125382466

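This article covers the core concepts of ES (Elasticsearch): the inverted index, download and installation, and the IK analyzer, plus basic operations such as creating indices and working with documents. It then goes through the ES query types (term, terms, match, and others) and how to integrate ES with Java, and finishes with more advanced features such as aggregations and geo (latitude/longitude) search.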

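1. Introduction to ES

ES (Elasticsearch) is a search engine framework written in Java and built on Lucene. It provides distributed full-text search and exposes a unified RESTful web interface.

Lucene: the low-level search engine library that ES is built on.

Distributed: emphasizes ES's ability to scale horizontally.

Full-text search: text is split into terms, the terms are collected in a term dictionary, and at search time the query keyword is looked up in that dictionary to find the matching content (an inverted index).

RESTful web interface: you only need to send an HTTP request; the operation performed depends on the request method and the parameters it carries.

Widely used: Wikipedia, GitHub, Goldman Sachs

1.2 Inverted index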

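The stored data is tokenized in some way, and the resulting terms are kept in a separate term dictionary.
When a user runs a query, the query keywords are tokenized as well and matched against the term dictionary, which yields the ids of the matching documents.
The ids are then used to fetch the actual data from where it is stored.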
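
The three steps above can be sketched in plain Java. This is only a toy model, not how Lucene stores its index: the class name, the method names, and the lowercase/whitespace analyzer are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each term to the ids of the documents containing it.
class InvertedIndexSketch {
    // Term dictionary: term -> ids of documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // Stand-in for the place where the original documents are stored
    private final Map<Integer, String> store = new HashMap<>();

    // Hypothetical analyzer: lowercases the text and splits it on whitespace.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().trim().split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Step 1: tokenize the document and record its id under every term.
    void add(int id, String text) {
        store.put(id, text);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Step 2: tokenize the query and collect the ids of documents matching any term.
    Set<Integer> searchIds(String query) {
        Set<Integer> ids = new TreeSet<>();
        for (String term : analyze(query)) {
            ids.addAll(postings.getOrDefault(term, new TreeSet<>()));
        }
        return ids;
    }

    // Step 3: use the ids to fetch the stored documents.
    List<String> search(String query) {
        List<String> docs = new ArrayList<>();
        for (int id : searchIds(query)) docs.add(store.get(id));
        return docs;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Elasticsearch is built on Lucene");
        index.add(2, "Kibana visualizes data");
        index.add(3, "Lucene is a search library");
        System.out.println(index.searchIds("lucene")); // [1, 3]
        System.out.println(index.search("kibana"));    // [Kibana visualizes data]
    }
}
```

The lookup never scans the documents themselves; it only touches the term dictionary and then fetches by id, which is the point of the structure.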

2. Downloading and installing ES and Kibana

Download Elasticsearch from https://mirrors.huaweicloud.com/elasticsearch/ and Kibana from https://mirrors.huaweicloud.com/kibana/ ; the two versions should match.
Note: on Windows it is best to put the files directly under a drive root rather than in F:\Program Files, because folders under F:\Program Files are read-only by default.
Using Kibana: [two screenshots, images unavailable]

3. The IK analyzer

Download the IK analyzer from https://github.com/medcl/elasticsearch-analysis-ik/releases ; its version must match the Elasticsearch version.
Unzip the downloaded archive and copy its contents into the plugins/ik folder under the ES installation directory, then restart ES so the plugin is loaded.

Without the IK analyzer (the default analyzer is used):

POST _analyze
{
  "text": "我是中国人"
}

Result:
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "中",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "国",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "人",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    }
  ]
}

With the IK analyzer:

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "我是中国人"
}

Result:

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中国人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "中国",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "国人",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}

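4. Basic ES operations

4.1 ES structure

4.1.1 Index, shards, and replicas

An ES service can hold multiple indices.
Each index is split into 5 primary shards by default (the ES 6.x default; from 7.0 on the default is 1).
Each primary shard has one replica shard by default.
Replica shards provide redundancy and can also serve search requests, which spreads the load when search traffic is heavy.
A replica shard must be placed on a different server (node) than its primary shard.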

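4.1.3 Type

An index can contain multiple types.
PS: this depends on the version: ES 5.x allows several types per index, ES 6.x allows only one, and from ES 7.x on types are deprecated.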

4.1.4 Document

A type can hold many documents; each document is comparable to a row in a MySQL table.

4.1.5 Field

A document can contain multiple fields, much like a row in a MySQL table has multiple columns.

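4.2 RESTful syntax for working with ES

GET requests:
http://ip:port/index : get information about the index
http://ip:port/index/type/doc_id : get the specified document
POST requests:
http://ip:port/index/type/_search : search documents; the query conditions can be passed as a JSON string in the request body
http://ip:port/index/type/doc_id/_update : update a document; the changes are passed as a JSON string in the request body
PUT requests:
http://ip:port/index : create an index; the index settings are given in the request body
http://ip:port/index/type/_mappings : define how the fields of the index's documents are stored (the mapping)
DELETE requests:
http://ip:port/index : delete the index
http://ip:port/index/type/doc_id : delete the specified document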
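
As an illustration of "method plus path plus optional JSON body", the sketch below builds a few of these requests with the JDK's java.net.http API (Java 11+). It only constructs the requests and does not send them. The class name, the helper methods, and the 127.0.0.1:9200 address are assumptions for the example.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) HTTP requests that follow the URL patterns above.
class EsRestRequestSketch {
    // Assumed address of a local ES node; adjust ip and port as needed.
    static final String BASE = "http://127.0.0.1:9200";

    // Builds a request that carries a JSON body, e.g. PUT /index with settings.
    static HttpRequest withJson(String method, String path, String json) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .header("Content-Type", "application/json")
                .method(method, HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Builds a request without a body, e.g. GET /index or DELETE /index.
    static HttpRequest noBody(String method, String path) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .method(method, HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest create = withJson("PUT", "/person",
                "{\"settings\":{\"number_of_shards\":5,\"number_of_replicas\":1}}");
        HttpRequest info = noBody("GET", "/person");
        HttpRequest drop = noBody("DELETE", "/person");
        for (HttpRequest r : new HttpRequest[] {create, info, drop}) {
            System.out.println(r.method() + " " + r.uri());
        }
    }
}
```

To actually run a request against a live node, it would be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).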

4.3 Creating an index

4.3.1 Creating an index

# Create an index
# number_of_shards: number of primary shards
# number_of_replicas: number of replicas per primary shard

PUT /person
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
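
Since number_of_replicas counts copies per primary shard, the total number of shard copies the cluster has to host follows from a one-line calculation. A small sketch (the class and method names are made up for illustration):

```java
// Counts the shard copies an index needs, given its two settings.
class ShardCountSketch {
    // Every primary shard gets numberOfReplicas extra copies,
    // so the total is primaries * (1 + replicas).
    static int totalShardCopies(int numberOfShards, int numberOfReplicas) {
        return numberOfShards * (1 + numberOfReplicas);
    }

    public static void main(String[] args) {
        // Settings of the /person index: 5 primaries, 1 replica each.
        System.out.println(totalShardCopies(5, 1)); // 10
    }
}
```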

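4.3.2 Field types available in ES

String types:
  text: generally used for full-text search; the field value is analyzed (tokenized)
  keyword: the field value is not analyzed
Numeric types:
  long:
  integer:
  short:
  byte:
  double:
  float:
  half_float: half-precision (16-bit) floating point, half the width of float
  scaled_float: a floating-point value expressed as a long plus a scaling factor, e.g. long 345 with scaling_factor 100 -> 3.45
Date type:
  date: a specific format can be declared for the date values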
    PUT my_index
    {
      "mappings": {
        "_doc": {
          "properties": {
            "date": {
              "type":   "date",
              "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
            }
          }
        }
      }
    }
Boolean type:
  boolean: holds true or false
Binary type:
  binary: currently accepts a Base64-encoded string
Range types (the value is a range, given with gte, lte, gt, lt, rather than a single value):
  integer_range:
  float_range:
  long_range:
  double_range:
  date_range:
  ip_range:

    PUT range_index
    {
      "settings": {
        "number_of_shards": 2
      },
      "mappings": {
        "_doc": {
          "properties": {
            "expected_attendees": {
              "type": "integer_range"
            },
            "time_frame": {
              "type": "date_range", 
              "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
            }
          }
        }
      }
    }

    PUT range_index/_doc/1?refresh
    {
      "expected_attendees" : { 
        "gte" : 10,
        "lte" : 20
      },
      "time_frame" : { 
        "gte" : "2015-10-31 12:00:00", 
        "lte" : "2015-11-01"
      }
    }
Geo type:
  geo_point: stores a latitude/longitude pair
IP type:
  ip: stores IPv4 and IPv6 addresses
For other data types, see the official documentation.
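
The scaled_float entry in the list above (long 345 with scaling factor 100 -> 3.45) can be made concrete with a small sketch. This is not ES code, only the arithmetic; the class and method names are made up for illustration.

```java
// Illustrates how a scaled_float value can be held as a long plus a scaling factor.
class ScaledFloatSketch {
    // Encode: multiply by the scaling factor and round to the nearest long.
    static long encode(double value, double scalingFactor) {
        return Math.round(value * scalingFactor);
    }

    // Decode: divide the stored long by the scaling factor.
    static double decode(long stored, double scalingFactor) {
        return stored / scalingFactor;
    }

    public static void main(String[] args) {
        long stored = encode(3.45, 100);
        System.out.println(stored);              // 345
        System.out.println(decode(stored, 100)); // 3.45
    }
}
```

Anything finer than the scaling factor is lost: with a factor of 100, 3.456 is also stored as 346 and reads back as 3.46.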

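4.3.3 Creating an index with an explicit mapping

# Create an index and specify the field types
# settings: number_of_shards is the shard count, number_of_replicas the replica count
# mappings: "novel" is the type; "properties" lists the fields stored in each document
# name: "type" is the field type, "analyzer" selects the analyzer,
#       "index": true lets the field be used as a query condition,
#       "store": false means the field is not stored separately
# on-sale: "format" lists the accepted date formats
PUT /book
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "novel": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "ik_max_word",
          "index": true,
          "store": false
        },
        "author": {
          "type": "keyword"
        },
        "count": {
          "type": "long"
        },
        "on-sale": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        },
        "descr": {
          "type": "text",
          "analyzer": "ik_max_word"
        }
      }
    }
  }
}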
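
The date format string "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" lists alternatives separated by ||, and each one is tried in turn until a value parses. The sketch below imitates that fallback with java.time; it is not ES's own parser, the class name is made up, and values without a time zone are treated as UTC.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Imitates the "format A || format B || epoch_millis" fallback for date values.
class DateFormatFallbackSketch {
    private static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Tries each format in order and returns the matching instant in UTC.
    static Instant parse(String value) {
        try {
            return LocalDateTime.parse(value, DATE_TIME).toInstant(ZoneOffset.UTC);
        } catch (DateTimeParseException ignored) {
            // not "yyyy-MM-dd HH:mm:ss", try the next format
        }
        try {
            return LocalDate.parse(value, DATE).atStartOfDay().toInstant(ZoneOffset.UTC);
        } catch (DateTimeParseException ignored) {
            // not "yyyy-MM-dd", fall through to epoch_millis
        }
        // Last alternative: milliseconds since the epoch.
        // Throws NumberFormatException if the value is not a number either.
        return Instant.ofEpochMilli(Long.parseLong(value));
    }

    public static void main(String[] args) {
        System.out.println(parse("2015-10-31 12:00:00")); // 2015-10-31T12:00:00Z
        System.out.println(parse("2015-11-01"));          // 2015-11-01T00:00:00Z
        System.out.println(parse("0"));                   // 1970-01-01T00:00:00Z
    }
}
```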

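4.4 Viewing index information

# View an index
# Option 1: the Management page in Kibana
# Option 2:
GET /person

4.5 Deleting an index

# Delete an index
# Option 1: the Management page in Kibana
# Option 2:
DELETE /person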

4.6 Creating documents

# Add a document
# The id is generated automatically
POST /book/novel
{
  "name":"盘龙",
  "author":"我吃西红柿",
  "count":100000,
  "on-sale":"2001-01-01",
  "descr":"大小的血睛鬃毛……"
}

# Specify the id manually
PUT /book/novel/1
{
  "name":"红楼梦",
  "author":"曹雪芹",
  "count":10000000,
  "on-sale":"2501-01-01",
  "descr":"中国古代章回体长篇小说，中国古典四大名著之一"
}

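4.7 Updating documents

4.7.1 Overwriting update

# Index a document under an existing id; the whole document is replaced
PUT /book/novel/1
{
  "name":"红楼梦",
  "author":"曹雪芹",
  "count":1000444,
  "on-sale":"2501-01-01",
  "descr":"中国古代章回体长篇小说，中国古典四大名著之一"
}

4.7.2 Partial update with doc

# Update a document using doc; only the fields listed under "doc" are changed
POST /book/novel/1/_update
{
  "doc": {
    "count": 566666
  }
}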

4.8 Deleting documents

# Delete a document by id
DELETE /book/novel/3mEnk3MBaSKoGN4T2olw

5. Integrating ES with Java

5.1 Maven dependencies

 <!-- https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>6.5.4</version>
</dependency>

<!-- Elasticsearch high-level REST client -->
<!--https://mvnrepository.com/artifact/org.elasticsearch.client/elasticsearch-rest-high-level-client -->
<dependency>
     <groupId>org.elasticsearch.client</groupId>
     <artifactId>elasticsearch-rest-high-level-client</artifactId>
     <version>6.5.4</version>
</dependency>

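5.2 Creating the client connection

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;

public class EsClient {

    public static RestHighLevelClient getClient() {
        // Create the HttpHost
        HttpHost httpHost = new HttpHost("127.0.0.1", 9200, "http");

        // Create the RestClientBuilder
        RestClientBuilder builder = RestClient.builder(httpHost);

        // Create the RestHighLevelClient
        RestHighLevelClient client = new RestHighLevelClient(builder);

        return client;
    }
}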

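5.3 ES operations in Java

Shared setup:

public class EsOperation {
    // Not declared in the original; assumed to be Jackson's
    // com.fasterxml.jackson.databind.ObjectMapper, since the methods below
    // call mapper.writeValueAsString(...)
    ObjectMapper mapper = new ObjectMapper();
    RestHighLevelClient client = EsClient.getClient();
    String index = "person";
    String type = "man";
}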

5.3.1 Creating an index

public void createIndex() throws Exception {
    // 1. Prepare the index settings
    Settings.Builder settings = Settings.builder()
            .put("number_of_shards", 2)
            .put("number_of_replicas", 1);

    // 2. Prepare the index mapping
    XContentBuilder mappings = JsonXContent.contentBuilder()
            .startObject()
                .startObject("properties")
                    .startObject("name")
                        .field("type", "text")
                    .endObject()
                    .startObject("age")
                        .field("type", "integer")
                    .endObject()
                    .startObject("birthday")
                        .field("type", "date")
                        .field("format", "yyyy-MM-dd")
                    .endObject()
                .endObject()
            .endObject();
    // 3. Wrap the settings and the mapping in a request object
    CreateIndexRequest request = new CreateIndexRequest(index)
            .settings(settings)
            .mapping(type, mappings);
    // 4. Send the request to ES through the client
    CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);

    System.out.println("response:" + response.toString());
}

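5.3.2 Checking whether an index exists

public void existTest() throws IOException {
    // 1. Prepare the request object; with the 6.5.4 client from 5.1
    //    the index name is set through indices(...)
    GetIndexRequest request = new GetIndexRequest();
    request.indices(index);

    // 2. Run the request through the client
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    // 3. Print the result
    System.out.println(exists);
}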

5.3.3 Deleting an index

public void testDelete() throws IOException {
    // 1. Build the request
    DeleteIndexRequest request = new DeleteIndexRequest(index);

    // 2. Run the request through the client
    AcknowledgedResponse delete = client.indices().delete(request, RequestOptions.DEFAULT);
    // 3. Print the result
    System.out.println(delete.isAcknowledged());
}

5.3.4 Adding documents

public void createDocTest() throws IOException {
    // 1. Prepare the JSON data
    Person person = new Person(1, "张三", 33, new Date());
    String json = mapper.writeValueAsString(person);
    // 2. Create the request object (the id is specified manually)
    IndexRequest request = new IndexRequest(index, type, person.getId().toString());
    request.source(json, XContentType.JSON);
    // 3. Run the request through the client to create the document
    IndexResponse response = client.index(request, RequestOptions.DEFAULT);
    // 4. Print the result
    System.out.println(response.getResult().toString());
}
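
The method above relies on a Person class that the article does not show. A minimal sketch of what it might look like follows; the field names and types are inferred from the constructor call, from person.getId(), and from the name, age, and birthday properties in the mapping of 5.3.1, so treat them as assumptions.

```java
import java.util.Date;

// Hypothetical Person class matching the calls new Person(id, name, age, birthday)
// and person.getId().toString() used in the document examples.
class Person {
    private Integer id;
    private String name;
    private Integer age;
    private Date birthday;

    // No-argument constructor, commonly required by JSON mappers
    public Person() {
    }

    public Person(Integer id, String name, Integer age, Date birthday) {
        this.id = id;
        this.name = name;
        this.age = age;
        this.birthday = birthday;
    }

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public Integer getAge() { return age; }
    public void setAge(Integer age) { this.age = age; }

    public Date getBirthday() { return birthday; }
    public void setBirthday(Date birthday) { this.birthday = birthday; }
}
```

Because the mapping in 5.3.1 declares birthday with the format yyyy-MM-dd, the JSON written for birthday has to use that format as well. With Jackson this is commonly done by annotating the field with @JsonFormat(pattern = "yyyy-MM-dd").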

Bulk insert

public void bulkCreateDoc() throws Exception {
    // 1. Prepare several JSON objects
    Person p1 = new Person(1, "张三", 23, new Date());
    Person p2 = new Person(2, "里斯", 24, new Date());
    Person p3 = new Person(3, "王武", 24, new Date());

    String json1 = mapper.writeValueAsString(p1);
    String json2 = mapper.writeValueAsString(p2);
    String json3 = mapper.writeValueAsString(p3);

    // 2. Create the bulk request
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.add(new IndexRequest(index, type, p1.getId().toString()).source(json1, XContentType.JSON))
            .add(new IndexRequest(index, type, p2.getId().toString()).source(json2, XContentType.JSON))
            .add(new IndexRequest(index, type, p3.getId().toString()).source(json3, XContentType.JSON));

    // 3. Run the request through the client
    BulkResponse responses = client.bulk(bulkRequest, RequestOptions.DEFAULT);

    // 4. Print the outcome of each item (printing the array itself only shows its reference)
    for (BulkItemResponse item : responses.getItems()) {
        System.out.println(item.getId() + " failed=" + item.isFailed());
    }
}

5.3.5 Updating a document

public void updateDocTest() throws Exception {
    // 1. Create a Map with the fields to update
    Map<String, Object> doc = new HashMap<>();
    doc.put("name", "张三三");

    // 2. Create the request and put the doc into it
    UpdateRequest request = new UpdateRequest(index, type, "1");
    request.doc(doc);

    // 3. Run the request through the client
    UpdateResponse response = client.update(request, RequestOptions.DEFAULT);
    // 4. Print the update result
    System.out.println(response.getResult());
}

5.3.6 Deleting a document

public void deleteDocTest() throws Exception {
    // 1. Build the delete request
    DeleteRequest request = new DeleteRequest(index, type, "1");

    // 2. Run the request through the client
    DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);
    // 3. Print the result
    System.out.println(response.getResult().toString());
}

Bulk delete

public void bulkDelete() throws Exception {
    // 1. Create the bulk request
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.add(new DeleteRequest(index, type, "1"));
    bulkRequest.add(new DeleteRequest(index, type, "2"));
    bulkRequest.add(new DeleteRequest(index, type, "3"));
    // 2. Run the request
    BulkResponse re = client.bulk(bulkRequest, RequestOptions.DEFAULT);
    // 3. Print the result: true if any of the deletes failed
    System.out.println(re.hasFailures());
}

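6. ES queries

6.1 term and terms queries

6.1.1 term query

A term query is an exact match: the search keyword is not analyzed before the search; it is looked up as-is among the terms indexed for the documents.

# term query
# from and size work like SQL "LIMIT from, size"
POST /sms-logs-index/sms-logs-type/_search
{
  "from": 0,
  "size": 5,
  "query": {
    "term": {
      "province": {
        "value": "北京"
      }
    }
  }
}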
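
The from and size values in the request play the role of the offset and the row count in SQL's LIMIT. A small sketch of the usual page-number conversion (the class and method names are made up for illustration):

```java
// Converts a 1-based page number into the "from" offset of a search request.
class PaginationSketch {
    // from = (page - 1) * size, like the offset in SQL "LIMIT offset, count"
    static int from(int page, int size) {
        if (page < 1 || size < 0) {
            throw new IllegalArgumentException("page must be >= 1 and size >= 0");
        }
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        System.out.println(from(1, 5)); // 0, the first page as in the request above
        System.out.println(from(3, 5)); // 10
    }
}
```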

public void termSearchTest() throws IOException {
    // 1. Create the request object
    //    (for this query, index and type must be "sms-logs-index" and "sms-logs-type")
    SearchRequest request = new SearchRequest(index);
    request.types(type);

    // 2. Build the query conditions
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.from(0);
    builder.size(5);
    builder.query(QueryBuilders.termQuery("province", "北京"));

    request.source(builder);

    // 3. Run the query
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Print the hits
    for (SearchHit hit : response.getHits().getHits()) {
        Map<String, Object> sourceAsMap = hit.getSourceAsMap();
        System.out.println(sourceAsMap);
    }
}
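
Because the search keyword is not analyzed, a term query behaves differently on keyword and text fields. The toy sketch below shows the effect in plain Java, without the ES client. The class, its methods, and the lowercase/whitespace analyzer are made up for illustration; real analyzers differ, for instance the default analyzer splits Chinese text into single characters, as the _analyze result in section 3 shows.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model of how a term query sees a keyword field versus a text field.
class TermQuerySketch {
    // keyword field: the whole value is indexed as a single, unchanged term
    static Set<String> indexAsKeyword(String value) {
        Set<String> terms = new HashSet<>();
        terms.add(value);
        return terms;
    }

    // text field: the value is analyzed first
    // (hypothetical analyzer: lowercase, then split on whitespace)
    static Set<String> indexAsText(String value) {
        return new HashSet<>(Arrays.asList(value.toLowerCase().trim().split("\\s+")));
    }

    // term query: the search keyword is NOT analyzed; it must equal an indexed term
    static boolean termMatches(Set<String> indexedTerms, String keyword) {
        return indexedTerms.contains(keyword);
    }

    public static void main(String[] args) {
        System.out.println(termMatches(indexAsKeyword("New York"), "New York")); // true
        System.out.println(termMatches(indexAsText("New York"), "New York"));    // false
        System.out.println(termMatches(indexAsText("New York"), "york"));        // true
    }
}
```

The second call returns false because the text field only holds the analyzed terms "new" and "york", while the unanalyzed keyword "New York" equals neither.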

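6.1.2 terms query

A terms query works the same way as a term query: the search keywords are not analyzed before the search and are looked up as-is among the indexed terms.
terms: matches one field against several values.

For example:

term : where province = 北京
terms: where province = 北京 or province = 湖北 (similar to IN in MySQL)
It can also be used on a text field; the keywords are still not analyzed when they are looked up.

# terms query
POST /sms-logs-index/sms-logs-type/_search
{
  "query": {
    "terms": {
      "province": [
        "北京",
        "晋城"
      ]
    }
  }
}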
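
The IN-like behaviour of terms amounts to taking the union of the document ids found for each value. A toy sketch in plain Java, where the posting lists, ids, and names are invented for illustration:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a terms query: the union of the posting lists of all given values.
class TermsQuerySketch {
    // postings: exact term -> ids of the documents whose field holds that term
    static Set<Integer> terms(Map<String, Set<Integer>> postings, String... values) {
        Set<Integer> ids = new TreeSet<>();
        for (String value : values) {
            Set<Integer> hit = postings.get(value);
            if (hit != null) ids.addAll(hit); // a value with no documents adds nothing
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical posting lists for a province field
        Map<String, Set<Integer>> province = Map.of(
                "beijing", Set.of(1, 4),
                "jincheng", Set.of(2),
                "hubei", Set.of(3));
        System.out.println(terms(province, "beijing", "jincheng")); // [1, 2, 4]
    }
}
```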

public void termsSearchTest() throws IOException {
        // 1. Create the request object
        //    (for this query, index and type must be "sms-logs-index" and "sms-logs-type")
        SearchRequest request = new SearchRequest(index);
        request.types(type);

        // 2. Build the query conditions
        SearchSourceBuilder builder = new SearchSourceBuilder();
        builder.query(QueryBuilders.termsQuery("province", "北京", "晋城"));
        request.source(builder);

        // 3. Run the query
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        //