Elasticsearch Study Notes

This article covers using the IK analyzer in Elasticsearch, including tests of ik_smart and ik_max_word. It walks through RESTful operations such as creating indices, updating documents, deleting indices, and the main query types, including the differences between match, term, and keyword. It also covers integrating Elasticsearch with Spring Boot, from adding the dependency to writing the configuration class.

IK Analyzer

The IK analyzer breaks a Chinese sentence into individual keywords so that matching can be performed on them.

For Chinese text, the IK analyzer is recommended.

  • Two tokenization algorithms: ik_smart (fewest cuts) and ik_max_word (finest-grained splitting)

Testing ik_smart (fewest cuts)

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "我来自宝鸡文理学院"
}

result

{
    "tokens" : [
        {
            "token" : "我",
            "start_offset" : 0,
            "end_offset" : 1,
            "type" : "CN_CHAR",
            "position" : 0
        },
        {
            "token" : "来自",
            "start_offset" : 1,
            "end_offset" : 3,
            "type" : "CN_WORD",
            "position" : 1
        },
        {
            "token" : "宝鸡",
            "start_offset" : 3,
            "end_offset" : 5,
            "type" : "CN_WORD",
            "position" : 2
        },
        {
            "token" : "文理学院",
            "start_offset" : 5,
            "end_offset" : 9,
            "type" : "CN_WORD",
            "position" : 3
        }
    ]
}

Testing ik_max_word (finest-grained splitting)

GET _analyze
{
    "analyzer": "ik_max_word",
    "text": "我来自宝鸡文理学院"
}

result

{
    "tokens" : [
        {
            "token" : "我",
            "start_offset" : 0,
            "end_offset" : 1,
            "type" : "CN_CHAR",
            "position" : 0
        },
        {
            "token" : "来自",
            "start_offset" : 1,
            "end_offset" : 3,
            "type" : "CN_WORD",
            "position" : 1
        },
        {
            "token" : "宝鸡",
            "start_offset" : 3,
            "end_offset" : 5,
            "type" : "CN_WORD",
            "position" : 2
        },
        {
            "token" : "文理学院",
            "start_offset" : 5,
            "end_offset" : 9,
            "type" : "CN_WORD",
            "position" : 3
        },
        {
            "token" : "文理",
            "start_offset" : 5,
            "end_offset" : 7,
            "type" : "CN_WORD",
            "position" : 4
        },
        {
            "token" : "理学院",
            "start_offset" : 6,
            "end_offset" : 9,
            "type" : "CN_WORD",
            "position" : 5
        },
        {
            "token" : "理学",
            "start_offset" : 6,
            "end_offset" : 8,
            "type" : "CN_WORD",
            "position" : 6
        },
        {
            "token" : "学院",
            "start_offset" : 7,
            "end_offset" : 9,
            "type" : "CN_WORD",
            "position" : 7
        }
    ]
}

RESTful API

Basic REST commands

method    url                                                  description
PUT       localhost:9200/index_name/type_name/doc_id           create a document (with a specified document id)
POST      localhost:9200/index_name/type_name                  create a document (with a random document id)
POST      localhost:9200/index_name/type_name/doc_id/_update   update a document
DELETE    localhost:9200/index_name/type_name/doc_id           delete a document
GET       localhost:9200/index_name/type_name/doc_id           get a document by its id
POST      localhost:9200/index_name/type_name/_search          query all documents

Basic index operations

Create an index

PUT /index_name/type_name/doc_id    (type names are being phased out in newer versions)
{
  "field_name": "field_value"
}

PUT /text1/type1/1
{
  "name": "李永康",
  "age": 18
}

Create an index (comparable to a database) with its field mappings

PUT /index_name
{
  "mappings": {
    "properties": {
      "field_name": {
        "type": "field_type"
      }
    }
  }
}

Create a text2 index where the name field is text, the address field is text, and the age field is long.

If you do not specify field types, ES will infer them for you (dynamic mapping); see the sketch after the example below.

PUT /text2
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "address": {
        "type": "text"
      },
      "age": {
        "type": "long"
      }
    }
  }
}
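
To see dynamic mapping in action, index a document into an index that does not exist yet and then inspect the mapping ES generated. A minimal sketch (text3 is just an illustrative index name):

PUT /text3/_doc/1
{
  "name": "李永康",
  "age": 18
}

GET /text3/_mapping

ES should infer name as text (with a keyword sub-field) and age as long, just like the text1 mapping shown in the next section.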

Get index information

GET index_name

For example, GET text1 returns:
{
  "text1" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1633788361458",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "Qjhtls6BSh6pBmcw4XvCug",
        "version" : {
          "created" : "7060199"
        },
        "provided_name" : "text1"
      }
    }
  }
}

Update a document

Method 1: overwrite

Overwriting simply means PUTting a document with the same id again; the new request body completely replaces the old document.
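
For example, PUTting document 1 again replaces it entirely; any field left out of the new body is dropped, because this is a full overwrite rather than a partial update:

PUT /text1/type1/1
{
  "name": "李永康12138",
  "age": 18
}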

Method 2: partial update

POST /index_name/type_name/doc_id/_update
{
  "doc": {
    "field_to_change": "new value"
  }
}

Change the name field of document 1 (type type1) in the text1 index to "李永康12138":

POST text1/type1/1/_update
{
  "doc":{
    "name":"李永康12138"
  }
}

Delete an index

DELETE /index_name/type_name/doc_id

Depending on how much of the path you provide, you can delete either a single document or an entire index.
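
For example, against the text1 index used above, the first request deletes a single document and the second deletes the whole index:

DELETE /text1/type1/1

DELETE /text1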

Document operations

Add a document

PUT /index_name/type_name/doc_id    (type names are being phased out in newer versions)
{
  "field_name": "field_value"
}

PUT /text1/type1/1
{
  "name": "李永康",
  "age": 18
}

Get document information

GET index_name/type_name/doc_id
GET text1/type1/2

Simple query

GET index_name/_search?q=field_name:value
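
For example, against the text1 index created earlier:

GET text1/_search?q=name:李永康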

Complex queries

match query

A match query goes through the analyzer: the query string is analyzed first, and the resulting terms are matched against the analyzed document fields.

Find documents whose name field contains "穿越":

GET text1/_search
{
  "query": {
    "match": {
      "name": "穿越"
    }
  }
}

Result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,//查询数量
      "relation" : "eq"//查询条件 eq
    },
    "max_score" : 1.3097506,//最大分数(权重)
    "hits" : [
      {
        "_index" : "text1",
        "_type" : "type1",
        "_id" : "2",
        "_score" : 1.3097506,
        "_source" : {
          "name" : "穿越火线",
          "age" : 13
        }
      },
      {
        "_index" : "text1",
        "_type" : "type1",
        "_id" : "4",
        "_score" : 1.179499,
        "_source" : {
          "name" : "穿越火线HD",
          "age" : 13
        }
      }
    ]
  }
}

Return only specified fields with "_source"

GET text1/_search
{
  "query": {
    "match": {
      "name": "穿越"
    }
  },
  "_source": ["name"]
}

Only the name field of each hit is returned (to return multiple fields, list them all inside the []).

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.3097506,
    "hits" : [
      {
        "_index" : "text1",
        "_type" : "type1",
        "_id" : "2",
        "_score" : 1.3097506,
        "_source" : {
          "name" : "穿越火线"
        }
      },
      {
        "_index" : "text1",
        "_type" : "type1",
        "_id" : "4",
        "_score" : 1.179499,
        "_source" : {
          "name" : "穿越火线HD"
        }
      }
    ]
  }
}

Sorting (sort)

GET text1/_search
{
  "query": {
    "match": {
      "name": "穿越"
    }
  },
  "sort": [
    {
      "age": {          //根据age字段进行排序
        "order": "asc"  //asc升序 desc降序
      }
    }
  ]
}

Pagination

  • from: offset of the first result to return
  • size: number of results per page

GET text1/_search
{
  "query": {
    "match": {
      "name": "穿越"
    }
  },
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ], 
  "from": 0,
  "size": 2
}

Boolean (bool) queries

must query

All clauses must match. Equivalent to an AND query (where name = XXX and age = 18).

GET text1/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "穿越"
          }
        },
        {
          "match": {
            "age": "18"
          }
        }
      ]
    }
  }
}

should query

Equivalent to an OR operation; the query structure is the same as the must example above (see the sketch below).
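
A should version of the previous query, a minimal sketch that matches documents where either condition holds:

GET text1/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "穿越"
          }
        },
        {
          "match": {
            "age": "18"
          }
        }
      ]
    }
  }
}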

must_not query

Equivalent to != (must not match); same structure again (see the sketch below).
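
A must_not sketch that excludes all documents whose age matches 18:

GET text1/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "age": "18"
          }
        }
      ]
    }
  }
}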

filter query

Find documents with age > 15:

  • gt / gte: greater than / greater than or equal to
  • lt / lte: less than / less than or equal to
GET text1/_search
{
    "query": {
        "bool": {
            "filter": {
                "range": {
                    "age": {
                        "gt": 15
                    }
                }
            }
        }
    }
}

Multiple conditions can be combined to form a range:

GET text1/_search
{
    "query": {
        "bool": {
            "filter": {
                "range": {
                    "age": {
                        "gte": 5,
                        "lte": 18
                    }
                }
            }
        }
    }
}

Matching multiple search terms

GET text2/_search
{
  "query": {
    "match": {
      "tags": "女 唱"
    }
  }
}

Note: tags is an array field. Separate multiple search terms with spaces; any document matching at least one term is returned, and the more terms a document matches, the higher its score.

{
    "took" : 1,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
            "value" : 3,
            "relation" : "eq"
        },
        "max_score" : 1.0304216,
        "hits" : [
            {
                "_index" : "text2",
                "_type" : "_doc",
                "_id" : "3",
                "_score" : 1.0304216,
                "_source" : {
                    "name" : "万维网索王",
                    "tags" : [
                        "为歌",
                        "和牛",
                        "女"
                    ]
                }
            },
            {
                "_index" : "text2",
                "_type" : "_doc",
                "_id" : "1",
                "_score" : 0.4589591,
                "_source" : {
                    "name" : "李永康大魔王",
                    "tags" : [
                        "唱歌",
                        "跳舞",
                        "宅男"
                    ]
                }
            },
            {
                "_index" : "text2",
                "_type" : "_doc",
                "_id" : "2",
                "_score" : 0.4589591,
                "_source" : {
                    "name" : "最终搜索王",
                    "tags" : [
                        "唱歌",
                        "和牛",
                        "直男"
                    ]
                }
            }
        ]
    }
}

Exact-match queries (term)

A term query looks up the exact term in the inverted index; the query value is not analyzed.

Two behaviours with respect to analysis:

  • term: queries the exact value directly, without analysis
  • match: the query string is analyzed first, and the resulting terms are used for the search

Two field types also behave differently:

  • keyword
    • this type is not analyzed; the whole value is indexed as a single term
  • text
    • this type is analyzed before indexing

Highlighting

POST /text1/_search
{
    "query": {
        "term": {
            "address": "宝"
        }
    },"highlight": {
        "fields": {
            "address": {}
        }
    }
}

Result:

{
    "took" : 89,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
            "value" : 1,
            "relation" : "eq"
        },
        "max_score" : 0.9227538,
        "hits" : [
            {
                "_index" : "text1",
                "_type" : "_doc",
                "_id" : "1",
                "_score" : 0.9227538,
                "_source" : {
                    "name" : "李永康",
                    "address" : "宝光路44号"
                },
                "highlight" : {
                    "address" : [
                        "<em>宝</em>光路44号"
                    ]
                }
            }
        ]
    }
}

You can also customize the highlight markup with your own tags:

  • pre_tags: the opening tag placed before each highlighted fragment
  • post_tags: the closing tag placed after it
POST /text1/_search
{
    "query": {
        "term": {
            "address": "宝"
        }
    },"highlight": {
        "pre_tags": "<p class='key'>",
        "post_tags": "</p>", 
        "fields": {
            "address": {}
        }
    }
}

Differences between term and keyword

  • term query on a keyword field

Neither the term query nor the keyword field is analyzed, so the query value must match the stored value exactly.

  • term query on a text field

The text field is analyzed but the term query is not, so the query value must be exactly one of the tokens produced when the field was analyzed.

  • match query on a keyword field

The match query string is analyzed, but the keyword field is not, so the query only matches when it equals the whole keyword value.

  • match query on a text field

Both the query string and the text field are analyzed; the document matches as long as the two token sets share at least one term.
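
A quick way to see the difference, using the text1 mapping shown earlier (name is a text field with an automatically created name.keyword sub-field, analyzed by the default standard analyzer, which splits Chinese into single characters). A sketch against the document {"name": "穿越火线"} indexed above:

GET text1/_search
{
  "query": { "term": { "name": "穿" } }              // should match: "穿" is one analyzed token of "穿越火线"
}

GET text1/_search
{
  "query": { "term": { "name.keyword": "穿" } }      // should return no hits: keyword stores the whole string
}

GET text1/_search
{
  "query": { "term": { "name.keyword": "穿越火线" } } // should match: exact match on the full keyword value
}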

Integrating with Spring Boot

Add the dependency

The Elasticsearch client version must match the version of the Elasticsearch server you are running.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
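
Spring Boot manages the client version through the elasticsearch.version Maven property, so if the managed version differs from your server you can override it in the pom. A sketch, assuming the 7.6.1 server suggested by the index settings shown earlier (adjust to your own version):

<properties>
    <!-- override the client version managed by Spring Boot so it matches the ES server -->
    <elasticsearch.version>7.6.1</elasticsearch.version>
</properties>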

Write the configuration class

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchClientConfig {
    // register the high-level REST client as a Spring bean
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1", 9200, "http"))
        );
        return client;
    }
}
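
The test methods below assume the client is injected into a Spring Boot test class, roughly like this (a sketch; the class name is illustrative):

import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
class ElasticsearchApplicationTests {

    // the RestHighLevelClient bean defined in the configuration class above
    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // ... the index and document tests below go inside this class ...
}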

Index APIs

Create an index

@Test
void createIndex() throws IOException {
    CreateIndexRequest kang_index = new CreateIndexRequest("kang_index");
    CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(kang_index, RequestOptions.DEFAULT);
    System.out.println(createIndexResponse.index());
}

Check whether an index exists

@Test
void ExistIndex() throws IOException {
    GetIndexRequest getIndexRequest = new GetIndexRequest("kang_index");
    boolean exists = restHighLevelClient.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
    System.out.println(exists);
}

Delete an index

@Test
void delIndex() throws IOException {
    DeleteIndexRequest kang_index = new DeleteIndexRequest("kang_index");
    AcknowledgedResponse delete = restHighLevelClient.indices().delete(kang_index, RequestOptions.DEFAULT);
    System.out.println(delete.isAcknowledged());
}

Document APIs

Add a document

@Test
void createdDocument() throws IOException {
    User user = new User("李永康", 19);
    IndexRequest request = new IndexRequest("kang_index");
    request.id("2");
    request.timeout("1s");
    request.source(JSON.toJSONString(user), XContentType.JSON);
    IndexResponse index = restHighLevelClient.index(request, RequestOptions.DEFAULT);
    System.out.println(index.status()); //CREATED
}

Check whether a document exists

@Test
void testExistDocument() throws IOException {
    // check whether document 1 exists in kang_index
    GetRequest request = new GetRequest("kang_index", "1");
    boolean exist = restHighLevelClient.exists(request, RequestOptions.DEFAULT);
    System.out.println("测试文档是否存在-----" + exist);
}

Get a document

@Test
void testGetDocument() throws IOException {
    // fetch the specified document
    GetRequest request = new GetRequest("kang_index", "1");
    GetResponse documentFields = restHighLevelClient.get(request, RequestOptions.DEFAULT);
    System.out.println(documentFields.getSourceAsString());
}

Update a document

@Test
void testUpdateDocument() throws IOException {
    UpdateRequest updateRequest = new UpdateRequest("kang_index", "1");
    User user = new User("赵丽颖", 32);
    updateRequest.doc(JSON.toJSONString(user), XContentType.JSON);
    UpdateResponse update = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
    System.out.println(update.status());
}

Delete a document

@Test
void testDeleteDocument() throws IOException {
    DeleteRequest deleteIndexRequest = new DeleteRequest("kang_index", "2");
    DeleteResponse delete = restHighLevelClient.delete(deleteIndexRequest, RequestOptions.DEFAULT);
    System.out.println(delete.status());
}

Bulk-add documents

// test bulk insertion
@Test
void testBulkAddRequest() throws IOException {
    ArrayList<User> users = new ArrayList<>();
    users.add(new User("lyk1", 18));
    users.add(new User("lyk2", 18));
    users.add(new User("lyk3", 18));
    users.add(new User("lyk4", 18));

    BulkRequest bulkRequest = new BulkRequest();

    // add one IndexRequest per user; document ids are assigned as 1..4
    for (int i = 0; i < users.size(); i++) {
        bulkRequest.add(
                new IndexRequest("kang_index").id("" + (i + 1))
                        .source(JSON.toJSONString(users.get(i)), XContentType.JSON)
        );
    }

    BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
    System.out.println(bulk.status());
}
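
bulk() does not throw when individual operations fail, so it is worth checking the response right after the call. A small sketch:

// BulkResponse reports per-item failures instead of throwing an exception
if (bulk.hasFailures()) {
    System.out.println(bulk.buildFailureMessage());
}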

Search documents

@Test
void testQuery() throws IOException {
    SearchRequest searchRequest = new SearchRequest("kang_index");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); // build the search conditions
    MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("name", "lyk"); // match query on the name field
    searchSourceBuilder.query(matchQueryBuilder);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    for (SearchHit documentFields : search.getHits().getHits()) {
        System.out.println("测试查询文档--遍历参数--" + documentFields.getSourceAsMap());
    }
}
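
The sort, from, and size options shown in the DSL sections earlier map onto SearchSourceBuilder as well. A sketch (SortOrder comes from org.elasticsearch.search.sort.SortOrder):

// sort ascending by age and return the first page of two hits
searchSourceBuilder.sort("age", SortOrder.ASC);
searchSourceBuilder.from(0);
searchSourceBuilder.size(2);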