Implementing Tokenized Fuzzy Search with ElasticSearch

I Tokenized fuzzy search in ES

1 What is tokenization?

In ES we search by a keyword and get back the documents related to that keyword. How does ES find them? It splits text into terms, and the component that splits a sentence into terms is called an analyzer (tokenizer).
By default ES handles English well, because English words are already separated by spaces, but its out-of-the-box handling of Chinese is poor; the sketch after the install steps below shows the difference an ik analyzer makes.

2 The ik analyzer

2.1 Installing the ik analyzer

##1. Copy the downloaded zip archive into the plugins directory of ES
##2. Create an ik directory there
##3. Unzip the archive inside the ik directory
[root@hadoop plugins]# mkdir ik
[root@hadoop plugins]# yum -y install unzip
[root@hadoop plugins]# mv elasticsearch-analysis-ik-6.5.3.zip ik/
[root@hadoop plugins]# cd ik
[root@hadoop ik]# unzip elasticsearch-analysis-ik-6.5.3.zip && rm -f elasticsearch-analysis-ik-6.5.3.zip

##4. In a fully distributed cluster, copy the plugin to every node
##5. Restart ES
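To confirm that the plugin was actually picked up after the restart, one option is to run the same sentence through both the built-in standard analyzer and ik_max_word via the _analyze API. The class below is a hedged sketch, not part of the original setup: it borrows the cluster name zxy and the masked address from the Java client code in Part II, and the sample sentence 中华人民共和国 is only an illustration.

```java
package com.bigdata.es;

import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

import java.net.InetAddress;
import java.net.UnknownHostException;

public class Demo0_AnalyzerCheck {
    public static void main(String[] args) throws UnknownHostException {
        // Same cluster name and (masked) address as the client examples in Part II
        Settings settings = Settings.builder().put("cluster.name", "zxy").build();
        TransportClient client = new PreBuiltTransportClient(settings);
        client.addTransportAddresses(
                new TransportAddress(InetAddress.getByName("***.***.***.**"), 9300)
        );

        // The built-in standard analyzer breaks Chinese into single characters:
        // 中 / 华 / 人 / 民 / 共 / 和 / 国
        printTokens(client, "standard", "中华人民共和国");

        // ik_max_word is only available after the plugin install and restart;
        // it should now return whole words such as 中华人民共和国 / 中华人民 / 中华 / 人民共和国 / ...
        printTokens(client, "ik_max_word", "中华人民共和国");

        client.close();
    }

    private static void printTokens(TransportClient client, String analyzer, String text) {
        // Bare _analyze call (no index needed) with the given analyzer
        AnalyzeResponse response = client.admin().indices()
                .prepareAnalyze(text)
                .setAnalyzer(analyzer)
                .get();
        for (AnalyzeResponse.AnalyzeToken token : response.getTokens()) {
            System.out.println(analyzer + " -> " + token.getTerm());
        }
    }
}
```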

2.2 Fuzzy search

  • Create the index

    curl -HContent-Type:application/json -XPUT 'http://hadoop:9200/chinese?pretty' -d '
    {
      "settings":{
        "number_of_shards":3,
        "number_of_replicas":1,
        "analysis":{
          "analyzer":{
            "ik":{
              "tokenizer":"ik_max_word"
            }
          }
        }
      },
      "mappings":{
        "test":{
          "properties":{
            "content":{
              "type":"text",
              "analyzer":"ik_max_word",
              "search_analyzer":"ik_max_word"
            }
          }
        }
      }
    }'

  • Index some test documents

    curl -HContent-Type:application/json -XPUT 'http://hadoop:9200/chinese/test/7?pretty' -d '
    {
      "content":"SSM框架简要介绍_Mr.zhou_Zxy-CSDN博客_简要介绍ssm框架"
    }'

    curl -HContent-Type:application/json -XPUT 'http://hadoop:9200/chinese/test/8?pretty' -d '
    {
      "content":"Mr.zhou_Zxy-CSDN博客"
    }'

    curl -HContent-Type:application/json -XPUT 'http://hadoop:9200/chinese/test/10?pretty' -d '
    {
      "content":"大数据之布隆过滤器学习_Mr.zhou_Zxy-CSDN博客"
    }'

  • Test the query

    curl -HContent-Type:application/json -XGET 'http://hadoop:9200/chinese/_search?pretty' -d '
    {
      "query":{
        "match":{
          "content":"Zxy"
        }
      }
    }'

II The ES Java API

1 Import the dependencies

<!-- ElasticSearch -->
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>6.5.3</version>
</dependency>

<!-- fastjson -->
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.71</version>
</dependency>

<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <version>1.18.8</version>
</dependency>

2 Connect to ES and implement CRUD

package com.bigdata.es;

import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

import java.net.InetAddress;
import java.net.UnknownHostException;

public class Demo1_QuickStart {
    public static void main(String[] args) throws UnknownHostException {
        //1. Build the client object
        Settings settings = Settings.builder()
                .put("cluster.name", "zxy")
                .build();
        TransportClient client = new PreBuiltTransportClient(settings);
        //2. Connect to the cluster
        client.addTransportAddresses(
                new TransportAddress(InetAddress.getByName("***.***.***.**"), 9300)
        );

        //3. Get a document by id
        GetResponse getResponse = client.prepareGet("zxy", "doc", "1").get();
        String sourceAsString = getResponse.getSourceAsString();
        System.out.println(sourceAsString);

        //4. Insert a document
        String json = "{\"username\":\"Mr.zhou\"}";
        IndexResponse indexResponse = client.prepareIndex("zxy", "doc", "5").setSource(json, XContentType.JSON).get();
        System.out.println(indexResponse.getIndex());
        System.out.println(indexResponse.getId());
        System.out.println(indexResponse.getType());

        //5. Delete a document
        DeleteResponse deleteResponse = client.prepareDelete("zxy", "doc", "5").get();
        System.out.println(deleteResponse.getResult());

        client.close();
    }
}
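The heading above promises full CRUD, but the demo only covers get, index, and delete. As a hedged sketch of the missing update step (document id 1 and the username field are illustrative assumptions), the following lines could be added inside the main method above; they also need import org.elasticsearch.action.update.UpdateResponse.

```java
// Partial update: merges the given JSON into the existing document with id 1
UpdateResponse updateResponse = client.prepareUpdate("zxy", "doc", "1")
        .setDoc("{\"username\":\"Mr.zhou\"}", XContentType.JSON)
        .get();
System.out.println(updateResponse.getResult()); // e.g. UPDATED
```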

III Fuzzy search in Java code

package com.bigdata.es;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

import java.net.InetAddress;
import java.net.UnknownHostException;

public class Demo2_Search {
    public static void main(String[] args) throws UnknownHostException {
        //1. Build the client object
        Settings settings = Settings.builder()
                .put("cluster.name", "zxy")
                .build();
        TransportClient client = new PreBuiltTransportClient(settings);
        //2. Connect to the cluster
        client.addTransportAddresses(
                new TransportAddress(InetAddress.getByName("***.***.***.**"), 9300)
        );

        //3. Fuzzy (match) query
        /**
         * 1. SearchType
         * DFS_QUERY_THEN_FETCH: first collects term/document frequencies from all shards so scoring is computed globally, then queries and fetches
         * QUERY_THEN_FETCH: the default; queries every shard, then fetches only the top-ranked documents
         * QUERY_AND_FETCH (deprecated)
         *
         * 2. QueryBuilders
         * MatchAllQueryBuilder: select * from xxx
         * MatchQueryBuilder: select * from xxx where name like xxx
         * CommonTermsQueryBuilder: select * from xxx where name = xxx
         */
        SearchResponse searchResponse = client.prepareSearch("chinese")
                .setSearchType(SearchType.QUERY_THEN_FETCH) // how the search is executed across shards
                .setQuery(QueryBuilders.matchQuery("content", "Zxy"))
                .get();

        //4. Print the results
        SearchHits hits = searchResponse.getHits(); // the collection of hits
        long totalHits = hits.totalHits; // total number of hits
        float maxScore = hits.getMaxScore(); // highest relevance score
        System.out.println("totalHits :" + totalHits);
        System.out.println("maxScore :" + maxScore);
        SearchHit[] searchHits = hits.getHits(); // the individual hit documents
        for (SearchHit hit : searchHits) {
            System.out.println("index :" + hit.getIndex());
            System.out.println("type :" + hit.getType());
            System.out.println("docId :" + hit.getId());
            System.out.println("content :" + hit.getSourceAsString());
        }

        client.close();
    }
}
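The comment block in Demo2_Search mentions MatchAllQueryBuilder and CommonTermsQueryBuilder without demonstrating them. The class below is a hedged illustration, not part of the original article, of how they could be swapped into the same search call; it reuses the cluster name, the masked address, and the chinese index from the examples above.

```java
package com.bigdata.es;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

import java.net.InetAddress;
import java.net.UnknownHostException;

public class Demo3_OtherQueries {
    public static void main(String[] args) throws UnknownHostException {
        Settings settings = Settings.builder().put("cluster.name", "zxy").build();
        TransportClient client = new PreBuiltTransportClient(settings);
        client.addTransportAddresses(
                new TransportAddress(InetAddress.getByName("***.***.***.**"), 9300)
        );

        // matchAllQuery(): matches every document, roughly "select * from chinese"
        SearchResponse allDocs = client.prepareSearch("chinese")
                .setQuery(QueryBuilders.matchAllQuery())
                .get();
        System.out.println("matchAll totalHits: " + allDocs.getHits().totalHits);

        // commonTermsQuery(): like matchQuery, but gives very frequent terms less weight
        SearchResponse common = client.prepareSearch("chinese")
                .setQuery(QueryBuilders.commonTermsQuery("content", "Zxy"))
                .get();
        System.out.println("commonTerms totalHits: " + common.getHits().totalHits);

        client.close();
    }
}
```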
IV Fuzzy and similarity queries with RestHighLevelClient

RestHighLevelClient is the official Java client provided by Elasticsearch, and it can also be used for tokenized, fuzzy, and similarity queries: tokenized queries use the match query, fuzzy queries use the fuzzy query, and similarity queries use the more_like_this query.

For example, a tokenized (match) query:

```java
SearchRequest searchRequest = new SearchRequest("index");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("field", "query");
searchSourceBuilder.query(matchQueryBuilder);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
```

A fuzzy query:

```java
SearchRequest searchRequest = new SearchRequest("index");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
FuzzyQueryBuilder fuzzyQueryBuilder = QueryBuilders.fuzzyQuery("field", "query");
searchSourceBuilder.query(fuzzyQueryBuilder);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
```

A more_like_this (similarity) query:

```java
SearchRequest searchRequest = new SearchRequest("index");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
MoreLikeThisQueryBuilder moreLikeThisQueryBuilder = QueryBuilders.moreLikeThisQuery(new String[]{"field1", "field2"}, new String[]{"like_text1", "like_text2"}, null);
searchSourceBuilder.query(moreLikeThisQueryBuilder);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
```
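The snippets above assume a restHighLevelClient that has already been built. A minimal, hedged sketch of constructing and closing one (the hostname hadoop and port 9200 mirror the earlier curl examples and are assumptions):

```java
// Requires the elasticsearch-rest-high-level-client dependency, plus imports for
// org.apache.http.HttpHost, org.elasticsearch.client.RestClient and
// org.elasticsearch.client.RestHighLevelClient.
RestHighLevelClient restHighLevelClient = new RestHighLevelClient(
        RestClient.builder(new HttpHost("hadoop", 9200, "http")));

// ... run the match / fuzzy / more_like_this searches shown above ...

restHighLevelClient.close(); // the client holds HTTP connections, so close it when done
```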