IK Analyzer
The IK analyzer breaks a Chinese sentence into individual terms so that matching operations can be performed on them.
For Chinese text, the IK analyzer is the recommended choice.
- It provides two segmentation algorithms: ik_smart (coarsest-grained, fewest splits) and ik_max_word (finest-grained, most splits)
Testing ik_smart (coarsest-grained segmentation)
GET _analyze
{
"analyzer": "ik_smart",
"text": "我来自宝鸡文理学院"
}
Result:
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "来自",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "宝鸡",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "文理学院",
"start_offset" : 5,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 3
}
]
}
Testing ik_max_word (finest-grained segmentation)
GET _analyze
{
"analyzer": "ik_max_word",
"text": "我来自宝鸡文理学院"
}
Result:
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "来自",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "宝鸡",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "文理学院",
"start_offset" : 5,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "文理",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "理学院",
"start_offset" : 6,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "理学",
"start_offset" : 6,
"end_offset" : 8,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "学院",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 7
}
]
}
REST Style
Basic REST commands
method | url | description |
---|---|---|
PUT | localhost:9200/index_name/type_name/doc_id | create a document (with a specified document id) |
POST | localhost:9200/index_name/type_name | create a document (with a random document id) |
POST | localhost:9200/index_name/type_name/doc_id/_update | update a document |
DELETE | localhost:9200/index_name/type_name/doc_id | delete a document |
GET | localhost:9200/index_name/type_name/doc_id | get a document (by document id) |
POST | localhost:9200/index_name/type_name/_search | query all documents |
Basic index operations
Create an index
PUT /index_name/type_name/doc_id    (the type segment is gradually being deprecated in newer versions)
{
  "field_name": "field_value"
}
PUT /text1/type1/1
{
"name": "李永康",
"age": 18
}
Create an index (the "database") and its fields
PUT /index_name
{
  "mappings": {
    "properties": {
      "field_name": {
        "type": "field_type"
      }
    }
  }
}
Create a text2 index where the name field is text, the address field is text, and the age field is long.
If a field's type is not specified, ES will assign one for us automatically!
PUT /text2
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"address": {
"type": "text"
},
"age": {
"type": "long"
}
}
}
}
Get index information
GET index_name
{
"text1" : {
"aliases" : { },
"mappings" : {
"properties" : {
"age" : {
"type" : "long"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1633788361458",
"number_of_shards" : "1",
"number_of_replicas" : "1",
"uuid" : "Qjhtls6BSh6pBmcw4XvCug",
"version" : {
"created" : "7060199"
},
"provided_name" : "text1"
}
}
}
}
Update a document
Method 1: overwrite
Overwriting simply means PUTting a document with the same id again; the new body replaces the previous one entirely.
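For example, a minimal sketch reusing the text1/type1/1 document created above (the new age value is illustrative); note that any field omitted from the new body is lost:
PUT /text1/type1/1
{
  "name": "李永康",
  "age": 20
}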
Method 2: partial update (_update)
POST /index_name/type_name/doc_id/_update
{
  "doc": {
    "field_to_update": "new value"
  }
}
Change the name field of document 1 (type type1) in the text1 index to "李永康12138":
POST text1/type1/1/_update
{
"doc":{
"name":"李永康12138"
}
}
Delete an index
DELETE /index_name/type_name/doc_id
Depending on how much of the path you supply, DELETE removes either a whole index or a single document.
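For example, a quick sketch using the index created above:
# delete the entire text1 index
DELETE /text1

# delete only document 1 from text1
DELETE /text1/type1/1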
Document operations
Add a document
PUT /index_name/type_name/doc_id    (the type segment is gradually being deprecated in newer versions)
{
  "field_name": "field_value"
}
PUT /text1/type1/1
{
"name": "李永康",
"age": 18
}
Get a document
GET index_name/type_name/doc_id
GET text1/type1/2
Simple query
GET index_name/_search?q=field_name:value
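For example, a quick sketch against the text1 index used above (the q parameter uses query_string syntax):
GET text1/_search?q=name:李永康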
Complex queries
match query
A match query is analyzed: the query text is run through the analyzer first and then matched against the analyzed terms of the documents (which were themselves analyzed at index time).
Find the documents whose name field contains "穿越":
GET text1/_search
{
"query": {
"match": {
"name": "穿越"
}
}
}
Result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,//查询数量
"relation" : "eq"//查询条件 eq
},
"max_score" : 1.3097506,//最大分数(权重)
"hits" : [
{
"_index" : "text1",
"_type" : "type1",
"_id" : "2",
"_score" : 1.3097506,
"_source" : {
"name" : "穿越火线",
"age" : 13
}
},
{
"_index" : "text1",
"_type" : "type1",
"_id" : "4",
"_score" : 1.179499,
"_source" : {
"name" : "穿越火线HD",
"age" : 13
}
}
]
}
}
Returning only specified fields with "_source": ["XXX"]
GET text1/_search
{
"query": {
"match": {
"name": "穿越"
}
},
"_source": ["name"]
}
Only the name field of each hit is returned (to return several fields, just list them all inside the []).
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.3097506,
"hits" : [
{
"_index" : "text1",
"_type" : "type1",
"_id" : "2",
"_score" : 1.3097506,
"_source" : {
"name" : "穿越火线"
}
},
{
"_index" : "text1",
"_type" : "type1",
"_id" : "4",
"_score" : 1.179499,
"_source" : {
"name" : "穿越火线HD"
}
}
]
}
}
sort (sorting the results)
GET text1/_search
{
"query": {
"match": {
"name": "穿越"
}
},
"sort": [
{
"age": { //根据age字段进行排序
"order": "asc" //asc升序 desc降序
}
}
]
}
Paginated queries
- from: the offset of the first result to return
- size: how many results to return per page
GET text1/_search
{
"query": {
"match": {
"name": "穿越"
}
},
"sort": [
{
"age": {
"order": "asc"
}
}
],
"from": 0,
"size": 2
}
Boolean (bool) queries
must query
Every clause must match. Equivalent to an AND query (e.g. where name = XXX and age = 18).
GET text1/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "穿越"
}
},
{
"match": {
"age": "18"
}
}
]
}
}
}
should query
Equivalent to an OR: a document matches if at least one clause matches. The syntax mirrors must, as sketched below.
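A minimal sketch (same text1 data as earlier): documents matching either clause are returned, and matching more clauses raises the score:
GET text1/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "穿越"
          }
        },
        {
          "match": {
            "age": "18"
          }
        }
      ]
    }
  }
}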
must_not query
Equivalent to != (NOT): documents matching these clauses are excluded. The syntax again mirrors must, as shown below.
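A minimal sketch: return the documents whose age is not 18:
GET text1/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "age": "18"
          }
        }
      ]
    }
  }
}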
filter query
Find documents where age > 15.
- gt / gte: greater than / greater than or equal to
- lt / lte: less than / less than or equal to
GET text1/_search
{
"query": {
"bool": {
"filter": {
"range": {
"age": {
"gt": 15
}
}
}
}
}
}
Multiple conditions can be combined to form a range:
GET text1/_search
{
"query": {
"bool": {
"filter": {
"range": {
"age": {
"gte": 5,
"lte": 18
}
}
}
}
}
}
Matching multiple terms
GET text2/_search
{
"query": {
"match": {
"tags": "女 唱"
}
}
}
Note: tags is an array. Separate multiple search terms with spaces; any document that matches at least one term is returned, and the more terms it matches, the higher its score.
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0304216,
"hits" : [
{
"_index" : "text2",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0304216,
"_source" : {
"name" : "万维网索王",
"tags" : [
"为歌",
"和牛",
"女"
]
}
},
{
"_index" : "text2",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.4589591,
"_source" : {
"name" : "李永康大魔王",
"tags" : [
"唱歌",
"跳舞",
"宅男"
]
}
},
{
"_index" : "text2",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.4589591,
"_source" : {
"name" : "最终搜索王",
"tags" : [
"唱歌",
"和牛",
"直男"
]
}
}
]
}
}
Exact-match queries (term)
A term query performs an exact lookup of the term in the inverted index.
Two behaviours with respect to analysis:
- term: queries the exact value directly, without analyzing it
- match: runs the query text through the analyzer first, then searches with the resulting terms
Two field types that behave differently:
- keyword
  - this type is not analyzed; the whole value is indexed as a single term
- text
  - this type is analyzed before indexing
Highlighting
POST /text1/_search
{
"query": {
"term": {
"address": "宝"
}
},"highlight": {
"fields": {
"address": {}
}
}
}
{
"took" : 89,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.9227538,
"hits" : [
{
"_index" : "text1",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.9227538,
"_source" : {
"name" : "李永康",
"address" : "宝光路44号"
},
"highlight" : {
"address" : [
"<em>宝</em>光路44号"
]
}
}
]
}
}
Custom highlight tags can also be defined:
- pre_tags: the opening (prefix) tag
- post_tags: the closing (suffix) tag
POST /text1/_search
{
"query": {
"term": {
"address": "宝"
}
},"highlight": {
"pre_tags": "<p class='key'>",
"post_tags": "</p>",
"fields": {
"address": {}
}
}
}
term/match vs. keyword/text
- term query on a keyword field.
  term does not analyze the query value, and a keyword field is not analyzed either, so the value must match the stored keyword exactly and completely.
- term query on a text field.
  Because a text field is analyzed while term is not, the term value must exactly equal one of the tokens the text field was split into.
- match query on a keyword field.
  The match query text is analyzed, but the keyword field is not, so the query must still equal the complete keyword value to match.
- match query on a text field.
  Both the query text and the field are analyzed; the document matches as long as the two token sets share at least one term. A small sketch of these cases follows the list.
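A minimal sketch of the term cases, using a hypothetical term_demo index (the index, field names, and values here are illustrative, not from the notes above):
PUT /term_demo
{
  "mappings": {
    "properties": {
      "code": { "type": "keyword" },
      "desc": { "type": "text" }
    }
  }
}

PUT /term_demo/_doc/1
{
  "code": "ABC-001",
  "desc": "hello world"
}

# hits: the keyword value is indexed as the single exact term "ABC-001"
GET term_demo/_search
{
  "query": { "term": { "code": "ABC-001" } }
}

# no hit: the text field was analyzed into "hello" and "world", never "hello world"
GET term_demo/_search
{
  "query": { "term": { "desc": "hello world" } }
}

# hits: "hello" is one of the analyzed tokens
GET term_demo/_search
{
  "query": { "term": { "desc": "hello" } }
}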
Integrating with Spring Boot
Import the dependency
The Elasticsearch client version must match the version of the Elasticsearch server you are running (see the properties sketch after the dependency below).
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
Write the client configuration class
@Configuration
public class ElasticSearchClientConfig {
@Bean
public RestHighLevelClient restHighLevelClient() {
RestHighLevelClient client = new RestHighLevelClient(
RestClient.builder(new HttpHost("127.0.0.1", 9200, "http"))
);
return client;
}
}
Index-related APIs
Create an index
@Test
void createIndex() throws IOException {
CreateIndexRequest kang_index = new CreateIndexRequest("kang_index");
CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(kang_index, RequestOptions.DEFAULT);
System.out.println(createIndexResponse.index());
}
Check whether an index exists
@Test
void ExistIndex() throws IOException {
GetIndexRequest getIndexRequest = new GetIndexRequest("kang_index");
boolean exists = restHighLevelClient.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
System.out.println(exists);
}
Delete an index
@Test
void delIndex() throws IOException {
DeleteIndexRequest kang_index = new DeleteIndexRequest("kang_index");
AcknowledgedResponse delete = restHighLevelClient.indices().delete(kang_index, RequestOptions.DEFAULT);
System.out.println(delete.isAcknowledged());
}
Document-related APIs
Add a document
@Test
void createdDocument() throws IOException {
User user = new User("李永康", 19);
IndexRequest request = new IndexRequest("kang_index");
request.id("2");
request.timeout("1s");
request.source(JSON.toJSONString(user), XContentType.JSON);
IndexResponse index = restHighLevelClient.index(request, RequestOptions.DEFAULT);
System.out.println(index.status()); //CREATED
}
Check whether a document exists
@Test
void testExistDocument() throws IOException {
// build a GET request for document 1 in kang_index; only its existence is checked
GetRequest request = new GetRequest("kang_index", "1");
boolean exist = restHighLevelClient.exists(request, RequestOptions.DEFAULT);
System.out.println("测试文档是否存在-----" + exist);
}
Get a document
@Test
void testGetDocument() throws IOException {
// fetch the specified document
GetRequest request = new GetRequest("kang_index", "1");
GetResponse documentFields = restHighLevelClient.get(request, RequestOptions.DEFAULT);
System.out.println(documentFields.getSourceAsString());
}
Update a document
@Test
void testUpdateDocument() throws IOException {
UpdateRequest updateRequest = new UpdateRequest("kang_index", "1");
User user = new User("赵丽颖", 32);
updateRequest.doc(JSON.toJSONString(user), XContentType.JSON);
UpdateResponse update = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
System.out.println(update.status());
}
Delete a document
@Test
void testDeleteDocument() throws IOException {
DeleteRequest deleteIndexRequest = new DeleteRequest("kang_index", "2");
DeleteResponse delete = restHighLevelClient.delete(deleteIndexRequest, RequestOptions.DEFAULT);
System.out.println(delete.status());
}
Bulk-add documents
// test bulk insertion
@Test
void testBulkAddRequest() throws IOException {
ArrayList<User> users = new ArrayList<>();
users.add(new User("lyk1", 18));
users.add(new User("lyk2", 18));
users.add(new User("lyk3", 18));
users.add(new User("lyk4", 18));
BulkRequest bulkRequest = new BulkRequest();
for (int i = 0; i < users.size(); i++) {
bulkRequest.add(
new IndexRequest("kang_index").id("" + (i + 1))
.source(JSON.toJSONString(users.get(i)), XContentType.JSON)
);
}
BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
System.out.println(bulk.status());
}
Search for documents
@Test
void testQuery() throws IOException {
SearchRequest searchRequest = new SearchRequest("kang_index");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // build the search criteria
MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("name", "lyk"); // match query on the name field
searchSourceBuilder.query(matchQueryBuilder);
searchRequest.source(searchSourceBuilder);
SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
for (SearchHit documentFields : search.getHits().getHits()) {
System.out.println("测试查询文档--遍历参数--" + documentFields.getSourceAsMap());
}
}