Integrating Spring Boot with Elasticsearch 5.x and the IK analyzer for full-text search

I'll cover this in three parts:

Part one: setting up the Elasticsearch environment on Windows, plus supporting plugins

Part two: integrating Spring Boot with Elasticsearch (with some tokenization ability)

Part three: integrating Spring Boot with Elasticsearch and the IK analyzer for all-field search (full tokenization)

(My second article, 《springboot2.x 整合 elasticsearch 创建索引的方式》, is more practical: it drops the Postman-based way of creating indexes and includes the project's GitHub address.)

Part One: Installing and deploying Elasticsearch and elasticsearch-head on Windows

Here we install and deploy Elasticsearch 5.5.3 directly on Windows.

The official Elasticsearch download page: https://www.elastic.co/cn/downloads/elasticsearch

The steps:

From the download page, download the zip archive of the matching version.

Unzip it to a path on disk; the path must not contain Chinese characters.

Enter the bin directory and double-click elasticsearch.bat to start Elasticsearch. You can also start it from a console: type elasticsearch.bat and press Enter.

Once startup finishes, open localhost:9200 in a browser. A response like the one below means Elasticsearch 5.5.3 installed and started successfully.
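For reference, the root endpoint returns JSON shaped roughly like this (the exact values vary by machine; name and cluster_name will reflect the configuration we set further below):

{
  "name" : "node-es",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "5.5.3",
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}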

Next, configure Elasticsearch 5.5.3 so that elasticsearch-head can access it. elasticsearch-head is a plugin that visualizes the data in Elasticsearch.

Using the elasticsearch-head plugin on Windows requires a Node.js environment.

Install Node.js. Download: https://nodejs.org/en/download/

Install grunt:

  • grunt is a handy build tool that can package, minify, test, run tasks, and so on; the head plugin in 5.x is started through grunt, so grunt must be installed.
  • Note: switch to the Node.js install directory [mine is C:\Program Files\nodejs] and run: npm install -g grunt-cli
  • -g means a global install
  • Check the version number with: grunt --version

elasticsearch-head download page: https://github.com/mobz/elasticsearch-head

Likewise, download the zip package and unzip it to a chosen path, again with no Chinese characters in it. (I keep both elasticsearch and elasticsearch-head on my E: drive.)

①: Edit the elasticsearch-head configuration: in Gruntfile.js under elasticsearch-head-master, add hostname: '0.0.0.0' to the connect options:

connect: {
    server: {
        options: {
            hostname: '0.0.0.0',
            port: 9100,
            base: '.',
            keepalive: true
        }
    }
}

②: Edit Elasticsearch's configuration: in Elasticsearch's config directory, add the following settings to elasticsearch.yml.

cluster.name: elasticsearch
node.name: node-es

network.host: 127.0.0.1
http.port: 9200
transport.tcp.port: 9300

# new parameters so that the head plugin can access ES
http.cors.enabled: true
http.cors.allow-origin: "*"

All seven configuration lines above are additions to elasticsearch.yml; I have deliberately split them into three groups. The values in the first group appear directly in the JSON that Elasticsearch returns at startup (compare the sample response earlier). The last two, http.cors.enabled: true and http.cors.allow-origin: "*", are what allow the head plugin to connect, and must not be left out.

Restart Elasticsearch. Then, in the elasticsearch-head-master directory, run grunt server to start the head plugin.

Then open localhost:9100 in the browser.

At this point, installation and configuration of Elasticsearch and elasticsearch-head are complete.

For this setup I also recommend the following write-up; it is very well done and my configuration follows it: https://www.cnblogs.com/puke/p/7145687.html

 

Part Two: Integrating Spring Boot with Elasticsearch

A look at the Spring Boot documentation shows three ways to integrate with Elasticsearch:

  1. Connecting to Elasticsearch by REST clients
  2. Connecting to Elasticsearch by Using Jest
  3. Connecting to Elasticsearch by Using Spring Data

Here I use the third option, Spring Data Elasticsearch.

Create a new Spring Boot project; my Spring Boot version is 2.0.0.

My pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <!-- You can change your Spring Boot version right here; my project was generated with 2.1.6, and I changed it to 2.0.0 -->
        <version>2.0.0.RELEASE</version>     
        <relativePath/>
    </parent>
    <groupId>com.gosuncn</groupId>
    <artifactId>esdemo</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>esdemo</name>
    <description>Demo project for Spring Boot</description>

    <properties>
        <java.version>1.8</java.version>
        <elasticsearch.version>5.5.3</elasticsearch.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>transport</artifactId>
            <version>5.5.3</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

</project>

Then check the version of the Elasticsearch jars that Spring Boot pulled in.

Spring Data Elasticsearch must be version-matched to the Elasticsearch server; mismatched versions cannot connect at all.

Observant readers may have noticed that my Elasticsearch already contains a local_library index. We'll use that index as the example.

First, a word on how the local_library index was created: submitting a PUT request with Postman creates the local_library index together with its book type.
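Assuming Elasticsearch is running on the default localhost:9200, the Postman request target is:

PUT localhost:9200/local_library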

The request body, which defines a book type under the local_library index, is as follows:

{
    "mappings":{
        "book":{
            "properties":{
                "book_id":{
                    "type":"long",
                    "fields":{
                        "keyword":{
                            "type":"keyword"
                        }
                    }
                },
                "book_code":{
                    "type":"text",
                    "fields":{
                        "keyword":{
                            "type":"keyword"
                        }
                    }
                },
                "book_name":{
                    "type":"text",
                    "fields":{
                        "keyword":{
                            "type":"keyword"
                        }
                    }
                },
                "book_price":{
                    "type":"integer",
                    "fields":{
                        "keyword":{
                            "type":"keyword"
                        }
                    }
                },
                "book_author":{
                    "type":"text",
                    "fields":{
                        "keyword":{
                            "type":"keyword"
                        }
                    }
                },
                "book_desc":{
                    "type":"text",
                    "fields":{
                        "keyword":{
                            "type":"keyword"
                        }
                    }
                }
            }
        }
    }
}
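To double-check that the mapping took effect, you can read it back with a standard GET (in Postman or the browser):

GET localhost:9200/local_library/_mapping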

You can view the result in elasticsearch-head.

Now that the index and type exist, we can start inserting data. There are two ways: the first with Postman, the second from a test class in the Spring Boot project.

①: An example of inserting data with Postman:
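A request along these lines indexes a single document (ES 5.x accepts PUT /index/type/id; the sample values here are borrowed from the query results shown later in this article):

PUT localhost:9200/local_library/book/1

{
    "book_id": 1,
    "book_code": "A0001",
    "book_name": "琴帝",
    "book_price": 180,
    "book_author": "唐家三少",
    "book_desc": "最好看的玄幻小说!"
}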

The inserted documents then show up in elasticsearch-head's data view.

②: Inserting data through an entity class (covering both inserts and queries)

Create a Library entity class (note: the entity uses the Lombok plugin):

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;

@Data
@NoArgsConstructor
@AllArgsConstructor
@Document(indexName = "local_library", type = "book")
public class Library {
    /**
     *     index: whether the field is indexed (analyzed)
     *     analyzer: analyzer used at index time
     *          ik_max_word
     *          ik_smart
     *     searchAnalyzer: analyzer used at search time
     *     store: whether the field is stored
     *     type: data type
     */

    @Id
    private Integer book_id;
    private String book_code;
    private String book_name;
    private Integer book_price;
    private String book_author;
    private String book_desc;

}

Then write a LibraryRepository (the ID generic is Integer, matching the entity's book_id):

@Repository
public interface LibraryRepository extends ElasticsearchRepository<Library, Integer> {
}

Then, in the test class, insert some data:

import com.gosuncn.esdemo.domin.Library;
import com.gosuncn.esdemo.repository.LibraryRepository;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.QueryStringQueryBuilder;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Sort;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.test.context.junit4.SpringRunner;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

@RunWith(SpringRunner.class)
@SpringBootTest
public class EsdemoApplicationTests {

    @Autowired
    ElasticsearchTemplate elasticsearchTemplate;
    @Autowired
    LibraryRepository libraryRepository;

    /**
     * Insert data
     */
    @Test
    public void testInsert(){
        libraryRepository.save(new Library(42, "A00042", "明史简述", 59, "吴晗", "吴晗背景uniworsity厉害"));
        libraryRepository.save(new Library(43, "A00043", "傅雷家书", 99, "傅聪", "都是NB,class大家u"));
        libraryRepository.save(new Library(24, "A00942", "时间简史", 169, "霍金", "教授宇宙大爆发的59年历史"));
        libraryRepository.save(new Library(25, "A00925", "我的前半生", 39, "方舟89子", "都是生活,每晚9点"));
        libraryRepository.save(new Library(29, "A00029", "围9城", 139, "钱钟书", "你想出城?不存在的"));
    }
}

That is the second way to insert data.

 

With data in place, we can run queries. The query code:

    // All-field query, no paging
    @Test
    public void testSearch(){
        try {
            String searchStr = "三西阿";
            QueryStringQueryBuilder builder = new QueryStringQueryBuilder(searchStr);
            Iterable<Library> search = libraryRepository.search(builder);
            Iterator<Library> iterator = search.iterator();
            while (iterator.hasNext()){
                System.out.println("--> 数据:"+iterator.next());
            }
        }catch (Exception e){
            System.out.println("---> 异常信息: "+e);
        }
    }
    // All-field query, with paging
    @Test
    public void testSearchByPage(){
        try {
            String searchStr = "三西阿";
            QueryStringQueryBuilder builder = new QueryStringQueryBuilder(searchStr);
            Iterable<Library> search = libraryRepository.search(builder, PageRequest.of(0,2));
            Iterator<Library> iterator = search.iterator();
            while (iterator.hasNext()){
                System.out.println("--> 数据:"+iterator.next());
            }
        }catch (Exception e){
            System.out.println("---> 异常信息: "+e);
        }
    }

Running the first test prints:

--> 数据:Library{book_id=3, book_code='A0003', book_name='西游记', book_price=99, book_author='吴承恩', book_desc='冒险小说!'}
--> 数据:Library{book_id=6, book_code='A0006', book_name='欢乐颂', book_price=399, book_author='阿耐', book_desc='都是生活真实隐射!'}
--> 数据:Library{book_id=7, book_code='A0007', book_name='都挺好', book_price=299, book_author='阿耐', book_desc='折射现实家庭矛盾!'}
--> 数据:Library{book_id=5, book_code='A0005', book_name='三国演义', book_price=198, book_author='罗贯中', book_desc='三国霸王游戏!'}
--> 数据:Library{book_id=18, book_code='A00018', book_name='编译原理', book_price=79, book_author='赵建华', book_desc='三十多小时,98.5'}
--> 数据:Library{book_id=1, book_code='A0001', book_name='琴帝', book_price=180, book_author='唐家三少', book_desc='最好看的玄幻小说!'}
--> 数据:Library{book_id=12, book_code='A00012', book_name='三重门', book_price=69, book_author='韩寒', book_desc='这是一个批评现实的真实故事'}
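As an aside, the autowired ElasticsearchTemplate can express the same search through the NativeSearchQueryBuilder that is already imported above; a minimal sketch (not part of the original tests, and it additionally needs an import for NativeSearchQuery):

    // Sketch: the same all-field query, paged, run through ElasticsearchTemplate.
    // Extra import needed: org.springframework.data.elasticsearch.core.query.NativeSearchQuery
    @Test
    public void testSearchByTemplate(){
        NativeSearchQuery query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.queryStringQuery("三西阿"))   // same query string as above
                .withPageable(PageRequest.of(0, 10))                    // first page, up to 10 hits
                .build();
        List<Library> books = elasticsearchTemplate.queryForList(query, Library.class);
        books.forEach(book -> System.out.println("--> 数据:" + book));
    }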

Part Three: Using the IK analyzer

Everything so far was the easy part; this is where the real full-text search work begins.

3.1: Why use the IK analyzer

The IK analyzer splits a query string into individual terms. For example, with the query “我爱中华人民共和国”, analyzing with ik_max_word produces: 【“我”, “爱”, “中华人民共和国”, “中华人民”, “中华”, “华人”, “人民共和国”, “人民”, “共和国”, “共和”, “国”】. Concretely (using Postman):
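On 5.x the analyzer and text can be passed straight in the URL, so the request is simply:

GET http://localhost:9200/_analyze?analyzer=ik_max_word&pretty=true&text=我爱中华人民共和国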

The response body:

{
    "tokens": [
        {
            "token": "我",
            "start_offset": 0,
            "end_offset": 1,
            "type": "CN_CHAR",
            "position": 0
        },
        {
            "token": "爱",
            "start_offset": 1,
            "end_offset": 2,
            "type": "CN_CHAR",
            "position": 1
        },
        {
            "token": "中华人民共和国",
            "start_offset": 2,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "中华人民",
            "start_offset": 2,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "中华",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "华人",
            "start_offset": 3,
            "end_offset": 5,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "人民共和国",
            "start_offset": 4,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 6
        },
        {
            "token": "人民",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 7
        },
        {
            "token": "共和国",
            "start_offset": 6,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 8
        },
        {
            "token": "共和",
            "start_offset": 6,
            "end_offset": 8,
            "type": "CN_WORD",
            "position": 9
        },
        {
            "token": "国",
            "start_offset": 8,
            "end_offset": 9,
            "type": "CN_CHAR",
            "position": 10
        }
    ]
}

A note on the difference between ik_max_word and ik_smart:

ik_max_word splits the text at the finest possible granularity.
ik_smart splits the text at the coarsest granularity.

ik_max_word and ik_smart are two distinct analyzers.
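For comparison, running the same sentence through ik_smart should return just the coarse tokens 【“我”, “爱”, “中华人民共和国”】:

GET http://localhost:9200/_analyze?analyzer=ik_smart&pretty=true&text=我爱中华人民共和国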

3.2 Installing the IK analyzer

Downloading and installing the IK analyzer (note: the IK analyzer version must match the Elasticsearch version exactly):

GitHub download page: https://github.com/medcl/elasticsearch-analysis-ik/releases

Locate the 5.5.3 release.

Download the elasticsearch-analysis-ik-5.5.3.zip package, unzip it, and open the unpacked folder.

Next, copy all of its files into an ik folder that you create yourself under Elasticsearch's plugins directory. (Why create a dedicated ik folder? Over time you will add many more plugins to Elasticsearch, and every one of them lives under plugins; giving each plugin its own folder keeps the directory layout clean.)
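Alternatively, the IK README notes that from v5.5.1 onward the plugin can be installed with Elasticsearch's own plugin tool, roughly like this (run from the Elasticsearch home directory; the URL is the release asset for 5.5.3):

bin\elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.5.3/elasticsearch-analysis-ik-5.5.3.zip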

Then restart Elasticsearch.

If the startup log shows the ik plugin being loaded, the plugin installed successfully.

To verify: once Elasticsearch has started, enter this directly in the browser: http://localhost:9200/_analyze?analyzer=ik_max_word&pretty=true&text=我爱中华人民共和国

Note:

If your IK plugin is a 6.x version, you can only test through Postman, and the text to analyze must go in the request body, as in the example below:
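A sketch of the 6.x-style request, with the text moved into the JSON body:

POST localhost:9200/_analyze

{
  "analyzer": "ik_max_word",
  "text": "我爱中华人民共和国"
}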

If you instead append the text to the URL, e.g.: http://localhost:9200/_analyze?analyzer=ik_max_word&pretty=true&text=大胆爱,不用地阿达女,加

an exception is thrown:

{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "request body or source parameter is required"
      }
    ],
    "type": "parse_exception",
    "reason": "request body or source parameter is required"
  },
  "status": 400
}

3.3 A custom analyzer

Neither ik_max_word nor ik_smart actually gives the effect I want. Say the query is “cla好好学习,15成功”.

In the browser: http://localhost:9200/_analyze?analyzer=ik_max_word&pretty=true&text=cla好好学习,15成功

 

I want the query split into 【"c", "l", "a", "好", "好", "学", "习", ",", "1", "5", "成", "功"】: single letters, single characters, single digits, the smallest possible granularity. ik_max_word cannot deliver that, so we define our own analyzer.

We use NGram to define one. Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/analysis-ngram-tokenizer.html

That page is important reading; if the English is hard going, a browser translation plugin will help.

One key sentence: "They are useful for querying languages that don’t use spaces or that have long compound words, like German."

Let's use the official example directly. The input text is: "Quick Fox".

POST _analyze
{
  "tokenizer": "ngram",
  "text": "Quick Fox"
}

The tokenized result is:

[ Q, Qu, u, ui, i, ic, c, ck, k, "k ", " ", " F", F, Fo, o, ox, x ]

ngram accepts the following three parameters:

①: min_gram: the minimum gram length; defaults to 1.

②: max_gram: the maximum gram length; defaults to 2.

③: token_chars: the character classes allowed to appear inside a token; Elasticsearch splits on any character not in the listed classes. Five classes are available: letter, digit, whitespace, punctuation, symbol.

With those settings covered, on to the example: we define a custom analyzer named myanalyzer and create a my_user index with a user type.

PUT localhost:9200/my_user/

{
    "settings": {
        "analysis": {
            "analyzer": {
                "myanalyzer": {
                    "tokenizer": "mytokenizer"
                }
            },
            "tokenizer": {
                "mytokenizer": {
                    "type": "ngram",
                    "min_gram": 1,
                    "max_gram": 2,
                    "token_chars": [
                        "letter",
                        "digit",
                        "whitespace",
                        "punctuation",
                        "symbol"
                    ]
                }
            }
        }
    },
    "mappings":{
        "user":{
            "properties":{
                "id":{
                    "type":"long",
                    "store": true,
                    "index": false,
                    "fields":{
                        "keyword":{
                            "type":"keyword"
                        }
                    }
                },
                "username":{
                    "type":"text",
                    "store": true,
                    "index": true,
                    "analyzer": "myanalyzer",
                    "fields":{
                        "keyword":{
                            "type":"keyword"
                        }
                    }
                },
                "password":{
                    "type":"text",
                    "store": true,
                    "index": true,
                    "analyzer": "myanalyzer",
                    "fields":{
                        "keyword":{
                            "type":"keyword"
                        }
                    }
                },
                "age":{
                    "type":"integer",
                    "fields":{
                        "keyword":{
                            "type":"keyword"
                        }
                    }
                },
                "ip":{
                    "type":"text",
                    "store": true,
                    "index": true,
                    "analyzer": "myanalyzer",
                    "fields":{
                        "keyword":{
                            "type":"keyword"
                        }
                    }
                }
            }
        }
    }
}


Or you can use this form:

POST localhost:9200/my_user/_analyze?analyzer=myanalyzer&pretty=true

{
  "text": "2 Quick Fo18陈xes姥爷."
}

Next, use Postman to exercise the my_user index (type user) we just built:

POST localhost:9200/my_user/_analyze

{
  "analyzer": "myanalyzer",
  "text": "2 Quick Fo18陈xes姥爷."
}


// Result
{
    "tokens": [
        {
            "token": "2",
            "start_offset": 0,
            "end_offset": 1,
            "type": "word",
            "position": 0
        },
        {
            "token": "2 ",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 1
        },
        {
            "token": " ",
            "start_offset": 1,
            "end_offset": 2,
            "type": "word",
            "position": 2
        },
        {
            "token": " Q",
            "start_offset": 1,
            "end_offset": 3,
            "type": "word",
            "position": 3
        },
        {
            "token": "Q",
            "start_offset": 2,
            "end_offset": 3,
            "type": "word",
            "position": 4
        },
        {
            "token": "Qu",
            "start_offset": 2,
            "end_offset": 4,
            "type": "word",
            "position": 5
        },
        {
            "token": "u",
            "start_offset": 3,
            "end_offset": 4,
            "type": "word",
            "position": 6
        },
        {
            "token": "ui",
            "start_offset": 3,
            "end_offset": 5,
            "type": "word",
            "position": 7
        },
        {
            "token": "i",
            "start_offset": 4,
            "end_offset": 5,
            "type": "word",
            "position": 8
        },
        {
            "token": "ic",
            "start_offset": 4,
            "end_offset": 6,
            "type": "word",
            "position": 9
        },
        {
            "token": "c",
            "start_offset": 5,
            "end_offset": 6,
            "type": "word",
            "position": 10
        },
        {
            "token": "ck",
            "start_offset": 5,
            "end_offset": 7,
            "type": "word",
            "position": 11
        },
        {
            "token": "k",
            "start_offset": 6,
            "end_offset": 7,
            "type": "word",
            "position": 12
        },
        {
            "token": "k ",
            "start_offset": 6,
            "end_offset": 8,
            "type": "word",
            "position": 13
        },
        {
            "token": " ",
            "start_offset": 7,
            "end_offset": 8,
            "type": "word",
            "position": 14
        },
        {
            "token": " F",
            "start_offset": 7,
            "end_offset": 9,
            "type": "word",
            "position": 15
        },
        {
            "token": "F",
            "start_offset": 8,
            "end_offset": 9,
            "type": "word",
            "position": 16
        },
        {
            "token": "Fo",
            "start_offset": 8,
            "end_offset": 10,
            "type": "word",
            "position": 17
        },
        {
            "token": "o",
            "start_offset": 9,
            "end_offset": 10,
            "type": "word",
            "position": 18
        },
        {
            "token": "o1",
            "start_offset": 9,
            "end_offset": 11,
            "type": "word",
            "position": 19
        },
        {
            "token": "1",
            "start_offset": 10,
            "end_offset": 11,
            "type": "word",
            "position": 20
        },
        {
            "token": "18",
            "start_offset": 10,
            "end_offset": 12,
            "type": "word",
            "position": 21
        },
        {
            "token": "8",
            "start_offset": 11,
            "end_offset": 12,
            "type": "word",
            "position": 22
        },
        {
            "token": "8陈",
            "start_offset": 11,
            "end_offset": 13,
            "type": "word",
            "position": 23
        },
        {
            "token": "陈",
            "start_offset": 12,
            "end_offset": 13,
            "type": "word",
            "position": 24
        },
        {
            "token": "陈x",
            "start_offset": 12,
            "end_offset": 14,
            "type": "word",
            "position": 25
        },
        {
            "token": "x",
            "start_offset": 13,
            "end_offset": 14,
            "type": "word",
            "position": 26
        },
        {
            "token": "xe",
            "start_offset": 13,
            "end_offset": 15,
            "type": "word",
            "position": 27
        },
        {
            "token": "e",
            "start_offset": 14,
            "end_offset": 15,
            "type": "word",
            "position": 28
        },
        {
            "token": "es",
            "start_offset": 14,
            "end_offset": 16,
            "type": "word",
            "position": 29
        },
        {
            "token": "s",
            "start_offset": 15,
            "end_offset": 16,
            "type": "word",
            "position": 30
        },
        {
            "token": "s姥",
            "start_offset": 15,
            "end_offset": 17,
            "type": "word",
            "position": 31
        },
        {
            "token": "姥",
            "start_offset": 16,
            "end_offset": 17,
            "type": "word",
            "position": 32
        },
        {
            "token": "姥爷",
            "start_offset": 16,
            "end_offset": 18,
            "type": "word",
            "position": 33
        },
        {
            "token": "爷",
            "start_offset": 17,
            "end_offset": 18,
            "type": "word",
            "position": 34
        },
        {
            "token": "爷.",
            "start_offset": 17,
            "end_offset": 19,
            "type": "word",
            "position": 35
        },
        {
            "token": ".",
            "start_offset": 18,
            "end_offset": 19,
            "type": "word",
            "position": 36
        }
    ]
}

The result: tokenization succeeded.

3.4 Rebuilding the Spring Boot side to use the myanalyzer analyzer for full-text search

Create a User entity (corresponding to the my_user index and its user type in Elasticsearch):

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Data
@NoArgsConstructor
@AllArgsConstructor
@Document(indexName = "my_user", type = "user")
public class User {
    /**
     *     index: whether the field is indexed (analyzed)
     *     analyzer: analyzer used at index time
     *     searchAnalyzer: analyzer used at search time
     *     store: whether the field is stored
     *     type: data type
     */

    @Id
    @Field(store = true, index = false, type = FieldType.Integer)
    private Integer id;
    // text fields use the custom myanalyzer, matching the my_user mapping created above
    @Field(store = true, index = true, type = FieldType.Text, analyzer = "myanalyzer", searchAnalyzer = "myanalyzer")
    private String username;
    @Field(store = true, index = true, type = FieldType.Text, analyzer = "myanalyzer", searchAnalyzer = "myanalyzer")
    private String password;
    // an integer field takes no analyzer
    @Field(store = true, index = true, type = FieldType.Integer)
    private Integer age;
    @Field(store = true, index = true, type = FieldType.Text, analyzer = "myanalyzer", searchAnalyzer = "myanalyzer")
    private String ip;

}

Create a UserRepository interface (again with Integer as the ID type, to match the entity):

@Repository
public interface UserRepository extends ElasticsearchRepository<User, Integer> {
}
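As a side note, Spring Data can also derive queries from repository method names; a hypothetical finder you could add inside this interface (not used in this article; needs java.util.List imported) would be:

    // Hypothetical derived query: full-text match against the username field
    List<User> findByUsername(String username);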

Test it in a test class (inserting data):

@RunWith(SpringRunner.class)
@SpringBootTest
public class UserTest {

    @Autowired
    UserRepository userRepository;

    // Insert data
    @Test
    public void testInsert(){
        userRepository.save(new User(3, "高新兴", "gao45", 18, "我登录的ip地址是:127.145.0.11"));
        userRepository.save(new User(4, "神州@数码", "shen18", 18, "我登录的ip地址是:127.124.0.11"));
        userRepository.save(new User(6, "西南大学", "xida", 18, "我登录的ip地址是:127.126.0.11"));
        userRepository.save(new User(7, "北京大学", "beida", 18, "我记录的ip地址是:127.127.0.11"));
        userRepository.save(new User(8, "姚#明", "yao210", 18, "我登录的@#%ip地址是:127.248.0.11"));
        userRepository.save(new User(9, "邓紫棋", "dengml", 18, "我使用的ip地址是:127.249.0.11"));
        userRepository.save(new User(10, "李荣浩", "li06", 18, "我使用的@ip地址是:127.234.0.11"));
        userRepository.save(new User(11, "陈奕迅", "19ch8en", 18, "我登录的ip地址是:127.219.0.11"));
        userRepository.save(new User(12, "周杰伦", "xiayu2014", 18, "我登录的ip地址是:127.0.0.11"));
        userRepository.save(new User(13, "林俊杰", "zho99", 18, "我登录,的ip地址是:127.111.0.11"));
        userRepository.save(new User(137, "林薇因", "zho99", 18, "我登录,的ip地址是:127.111.0.11"));
    }
}

After the inserts, the data in Elasticsearch looks like this:

Querying the data:

    @Test
    public void testQueryByStr(){
        try {
            String searchStr = "陈夏天u马立,@45";
            QueryStringQueryBuilder builder = new QueryStringQueryBuilder(searchStr);

            //  The key line is the one below
            builder.analyzer("myanalyzer").field("username").field("password").field("ip");
            Iterable<User> search = userRepository.search(builder);
            Iterator<User> iterator = search.iterator();
            while (iterator.hasNext()){
                System.out.println("---> 匹配数据: "+iterator.next());
            }
        }catch (Exception e){
            System.out.println("---> 异常信息 "+e);
        }
    }

The query results:

---> 匹配数据: User(id=33, username=陈%喜华, password=gao45, age=18, ip=我登录的ip地址是:127.145.0.11)
---> 匹配数据: User(id=3, username=高新兴, password=gao45, age=18, ip=我登录的ip地址是:127.145.0.11)
---> 匹配数据: User(id=35, username=马@,#立志, password=ling009, age=18, ip=我记录的ip地址是:127.125.0.11)
---> 匹配数据: User(id=48, username=郭才莹, password=yao210, age=18, ip=我登录的,@#%ip地址是:127.248.0.11)
---> 匹配数据: User(id=126, username=夏雨, password=xiayu2014, age=18, ip=我登录的ip地址是:127.0.0.11)
---> 匹配数据: User(id=12, username=周杰伦, password=xiayu2014, age=18, ip=我登录的ip地址是:127.0.0.11)
---> 匹配数据: User(id=115, username=朱&@#%夏宇, password=19ch8en, age=18, ip=我登录的ip地址是:127.219.0.11)
---> 匹配数据: User(id=8, username=姚#明, password=yao210, age=18, ip=我登录的@#%ip地址是:127.248.0.11)
---> 匹配数据: User(id=10, username=李荣浩, password=li06, age=18, ip=我使用的@ip地址是:127.234.0.11)
---> 匹配数据: User(id=104, username=黄小群, password=li06, age=18, ip=我使用的@ip地址是:127.234.0.11)
---> 匹配数据: User(id=36, username=陈耀鹏, password=xida, age=18, ip=我登录的ip地址是:127.126.0.11)
---> 匹配数据: User(id=11, username=陈奕迅, password=19ch8en, age=18, ip=我登录的ip地址是:127.219.0.11)
---> 匹配数据: User(id=137, username=林薇因, password=zho99, age=18, ip=我登录,的ip地址是:127.111.0.11)
---> 匹配数据: User(id=13, username=林俊杰, password=zho99, age=18, ip=我登录,的ip地址是:127.111.0.11)
---> 匹配数据: User(id=4, username=神州@数码, password=shen18, age=18, ip=我登录的ip地址是:127.124.0.11)
---> 匹配数据: User(id=5, username=岭南师范, password=ling009, age=18, ip=我记录的ip地址是:127.125.0.11)
---> 匹配数据: User(id=9, username=邓紫棋, password=dengml, age=18, ip=我使用的ip地址是:127.249.0.11)
---> 匹配数据: User(id=34, username=钟楚莹, password=shen18, age=18, ip=我登录的ip地址是:127.124.0.11)
---> 匹配数据: User(id=49, username=黄群, password=dengml, age=18, ip=我使用的ip地址是:127.249.0.11)

With that, full-text search with Spring Boot, Elasticsearch, and the IK analyzer is complete.

I'll stop here for now. The article isn't written perfectly, but I hope you can follow it. If anything is unclear, leave a comment below and I'll update it promptly.
