Day 4: The Distributed Search Engine ElasticSearch (Viewing Elasticsearch Tokenization Results)

Article link: https://blog.csdn.net/weixin_31690347/article/details/113069255

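This article introduces ElasticSearch's features, deployment and startup, calling its REST API with Postman, and installing and using the Head plugin. It covers installing and testing the IK analyzer and shows how to synchronize a database with ElasticSearch through Logstash. It also walks through developing a search microservice, including creating the module, adding articles to the index, and searching articles, and closes with common ElasticSearch interview questions.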


Previous article:

Gavin: Day 3, The Document Database MongoDB (zhuanlan.zhihu.com)

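I. Introduction to ElasticSearch

1. What is ElasticSearch

Elasticsearch is a real-time distributed search and analytics engine built for processing large volumes of data quickly. It is a search server based on Lucene that provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is written in Java and was released as open source under the Apache License, and it is a popular enterprise search engine. It was designed for cloud environments and aims to deliver real-time search while being stable, reliable, fast, and easy to install and use.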

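2. ElasticSearch features

(1) It can run as a large distributed cluster (hundreds of servers) handling petabytes of data for large companies, and it can also run on a single machine.

(2) It combines full-text search, data analytics, and distributed technology in one product, which is what makes ES distinctive.

(3) It works out of the box and is simple to deploy.

(4) It supports full-text search, synonym handling, relevance ranking, complex data analysis, and near-real-time processing of massive data sets.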

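3. ElasticSearch architecture

Elasticsearch's logical structure can be compared with that of a MySQL database: an index roughly corresponds to a database, a type to a table, a document to a row, and a field to a column.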

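II. Getting started with ElasticSearch

1. Deploying and starting ElasticSearch

Installation guide: https://zhuanlan.zhihu.com/p/134104899

Once it is running, visit: http://spark2.x:9200/

2. Calling the REST API with Postman

Create an index

For example, to create an index named articleindex01, submit a PUT request.

Request URL: http://spark2.x:9200/articleindex01/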
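As a rough sketch of the same call outside Postman, the request could be built with the HTTP client that ships with Java 11. The class name, the method name, and the localhost base URL are assumptions made for illustration; substitute your own host (spark2.x in this article).

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper for illustration; not part of the original project.
final class CreateIndexSketch {

    // Builds (but does not send) the PUT request that creates an index.
    static HttpRequest createIndex(String baseUrl, String indexName) {
        URI uri = URI.create(baseUrl + "/" + indexName + "/");
        return HttpRequest.newBuilder(uri)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = createIndex("http://localhost:9200", "articleindex01");
        // Prints: PUT http://localhost:9200/articleindex01/
        System.out.println(request.method() + " " + request.uri());
        // To send it against a running node:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Building the request separately from sending it keeps the sketch runnable without a live Elasticsearch node.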

Create a document

Submit a POST request to: http://spark2.x:9200/articleindex01/article01

The JSON returned:

{
    "_index": "articleindex01",
    "_type": "article01 ",
    "_id": "gKkw5XQBjtst7OKrbi13",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}
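The article does not show the request body that produced this response. A minimal sketch of such a request, assuming a document with the title and content fields that appear in the later search results, might look like the following. The class and method names are hypothetical, and the hand-rolled JSON escaping only covers quotes and backslashes.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper for illustration; not part of the original project.
final class CreateDocumentSketch {

    // Escapes backslashes and double quotes only; enough for this illustration.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Assumed document shape: {"title": ..., "content": ...}
    static String articleJson(String title, String content) {
        return "{\"title\":" + quote(title) + ",\"content\":" + quote(content) + "}";
    }

    // Builds (but does not send) a POST that lets Elasticsearch generate the ID.
    static HttpRequest createDocument(String baseUrl, String index, String type, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/" + type))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        String json = articleJson("Hello", "First document");
        HttpRequest request = createDocument("http://localhost:9200", "articleindex01", "article01", json);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(json);
    }
}
```

The Content-Type header is set explicitly because Elasticsearch 6.x rejects request bodies that do not declare one.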

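Query all documents

To query all data of a given type in a given index, send a GET request:

http://spark2.x:9200/articleindex01/article01/_search

Update a document

Submit a PUT request to the following address:

http://spark2.x:9200/articleindex01/article01/fakb5XQBjtst7OKr-y3A

Another case

If the ID in the address does not exist, a new document is created.

Submit a PUT request to the following address: http://spark2.x:9200/articleindex01/article01/1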
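The create-or-update behaviour of PUT with an explicit ID can be pictured with a small in-memory model. This is only a toy illustration under simplifying assumptions (it ignores deletes, shards, and concurrency), and all names in it are hypothetical; it mirrors the result and _version fields seen in the responses.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of "PUT /index/type/{id}"; not how Elasticsearch is implemented.
final class PutByIdSketch {
    private final Map<String, String> sources = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    // Stores the document and reports whether it was created or updated.
    String put(String id, String source) {
        boolean existed = sources.containsKey(id);
        sources.put(id, source);
        // First write yields version 1; each later write adds 1.
        int version = versions.merge(id, 1, Integer::sum);
        return (existed ? "updated" : "created") + " v" + version;
    }

    public static void main(String[] args) {
        PutByIdSketch index = new PutByIdSketch();
        System.out.println(index.put("1", "{\"title\":\"first\"}"));  // created v1
        System.out.println(index.put("1", "{\"title\":\"second\"}")); // updated v2
    }
}
```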

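Basic match query

To query by a single field, send a GET request to the following address:

http://192.168.184.134:9200/articleindex01/article01/_search?q=title:十次方课程好给力

The request above searches by title and returns the following result: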

{
    "took":10,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":2.0649285,
        "hits":[
            {
                "_index":"articleindex",
                "_type":"article",
                "_id":"1",
                "_score":2.0649285,
                "_source":{
                    "title":"十次方课程好给力",
                    "content":"知识点很多"
                }
            }
        ]
    }
}
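A browser or Postman quietly percent-encodes the Chinese text in the q parameter. When the URL is assembled in code, that step has to be done explicitly. The sketch below shows one way to do it; the class name, method name, and base URL are assumptions for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original project.
final class SearchUrlSketch {

    // Builds a URI-search URL of the form .../_search?q=field:text, percent-encoded as UTF-8.
    static String searchUrl(String baseUrl, String index, String type, String field, String text) {
        String q = URLEncoder.encode(field + ":" + text, StandardCharsets.UTF_8);
        return baseUrl + "/" + index + "/" + type + "/_search?q=" + q;
    }

    public static void main(String[] args) {
        // The colon is encoded as %3A and each Chinese character as its UTF-8 bytes.
        System.out.println(searchUrl("http://localhost:9200", "articleindex01", "article01",
                "title", "十次方课程好给力"));
    }
}
```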

Wildcard query

Use * to stand for any sequence of characters:

http://spark2.x:9200/articleindex01/article01/_search?q=title:*s*
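To make the meaning of * concrete, here is a small sketch that checks a single term against such a pattern by translating it to a regular expression. It is an illustration only: Elasticsearch applies wildcards to indexed terms rather than to the raw field text, and the names below are hypothetical.

```java
import java.util.regex.Pattern;

// Toy wildcard matcher for illustration; not Elasticsearch's implementation.
final class WildcardSketch {

    // Returns true when term matches the pattern, where * matches any run of characters.
    static boolean matches(String wildcard, String term) {
        String[] literals = wildcard.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each * becomes "any characters"
            }
            regex.append(Pattern.quote(literals[i])); // everything else is literal
        }
        return Pattern.matches(regex.toString(), term);
    }

    public static void main(String[] args) {
        System.out.println(matches("*s*", "elasticsearch")); // true
        System.out.println(matches("*s*", "java"));          // false
    }
}
```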

Delete a document

To delete a document by ID, for example the document with ID 1, submit a DELETE request:

http://spark2.x:9200/articleindex01/article01/1

The result returned:

{
    "found":true,
    "_index":"articleindex",
    "_type":"article",
    "_id":"1",
    "_version":2,
    "result":"deleted",
    "_shards":{
        "total":2,
        "successful":1,
        "failed":0
    }
}

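Query all documents again to confirm that the record is gone.

3. Installing and using the Head plugin

(1) Installing the Head plugin

Driving Elasticsearch only through REST requests is tedious and not very user-friendly. Day-to-day administration is usually done through a graphical interface, and the most commonly used one is the Head plugin.

Download the Head plugin: https://github.com/mobz/elasticsearch-head

(2) Extract it to any directory

Keep it separate from the elasticsearch installation directory.

(3) Install Node.js, then install cnpm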

D:\java\nojs\elasticsearch-head-master>npm install -g cnpm --registry=https://registry.npm.taobao.org

npm WARN deprecated har-validator@5.1.5: this library is no longer supported
C:\Users\mrman\AppData\Roaming\npm\cnpm -> C:\Users\mrman\AppData\Roaming\npm\node_modules\cnpm\bin\cnpm
+ cnpm@6.1.1
added 685 packages in 47.634s

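(4) Install Node.js

Download: https://pan.baidu.com/s/11fOQxZVAcRfSP2OnKXUZAg

Extraction code: 72o3

Click Next through the installer.

Install grunt as a global command. Grunt is a Node.js-based project build tool that automatically runs the tasks you configure. Run the following in a cmd window:

npm install -g grunt-cli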

(5) Install the dependencies

D:\java\nojs\elasticsearch-head-master>cnpm install

(6) Change into the head directory and start Head by running the following at the command prompt

D:\java\nojs\elasticsearch-head-master>grunt server

Then visit: http://localhost:9100

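Fixing cross-origin access to ES

Head is served from a different origin (port 9100) than Elasticsearch (port 9200), so the elasticsearch configuration must be changed to allow cross-origin requests.

Edit the elasticsearch configuration file, elasticsearch.yml, in the elasticsearch-6.3.1/config directory and add the following two lines:

[root@spark2 config]# pwd
/usr/local/etc/hadoop/module/elasticsearch-6.3.1/config

[root@spark2 config]# vim elasticsearch.yml

# add the following two lines
http.cors.enabled: true
http.cors.allow-origin: "*"

Restart elasticsearch.

Head can then connect to the cluster.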

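Using the Head plugin

(1) Create an index

Create or update a document

In the Any Request tab (复合查询 in the Chinese UI), enter the address and the content and choose PUT as the method. Then open the Browser tab (数据浏览) and click the name of the index to query; the document information appears in the right-hand pane.

Add data

Click the document information:

Browse data

Update a document

In the Any Request tab, enter the address and the content and choose PUT as the method:

http://spark2.x:9200/tensquare_elasticsearch/article/3/

Open the Browser tab and click the name of the index to query; the document information appears in the right-hand pane.

Click the document information:

Delete a document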

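4. Using the IK analyzer

(1) What is the IK analyzer

IK is a relatively simple Chinese word-segmentation analyzer developed in China. Although the original developer has not maintained it since 2012, it remains popular in production use. This section shows how to use the IK Chinese analyzer.

(2) Installing the IK analyzer

Download: https://github.com/medcl/elasticsearch-analysis-ik/releases. Pick the release that matches your Elasticsearch version (the course material uses 5.6.8; the installation shown here uses 6.3.1).

Link: https://pan.baidu.com/s/1RjNkvL70Ak2ZA9J8UUY61A

Extraction code: ep4p

First extract the archive and rename the extracted elasticsearch folder to ik.
Copy the ik folder into the elasticsearch/plugins directory.
Restart Elasticsearch to load the IK analyzer.

The download, elasticsearch-analysis-ik-6.3.1.zip, is a zip archive, so take care when extracting it:

// extract with "jar xvf <archive>"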
[root@spark2 plugins]# jar xvf elasticsearch-analysis-ik-6.3.1.zip 
  created: config/
 inflated: config/extra_single_word_low_freq.dic
 inflated: config/quantifier.dic
 inflated: config/extra_stopword.dic
 inflated: config/extra_main.dic
 inflated: config/preposition.dic
 inflated: config/extra_single_word_full.dic
 inflated: config/extra_single_word.dic
 inflated: config/IKAnalyzer.cfg.xml
 inflated: config/surname.dic
 inflated: config/main.dic
 inflated: config/suffix.dic
 inflated: config/stopword.dic
 inflated: plugin-descriptor.properties
 inflated: plugin-security.policy
 inflated: elasticsearch-analysis-ik-6.3.1.jar
 inflated: httpclient-4.5.2.jar
 inflated: httpcore-4.4.4.jar
 inflated: commons-logging-1.2.jar
 inflated: commons-codec-1.9.jar

// extraction finished
[root@spark2 plugins]# ll
total 5828
-rw-r--r--. 1 root root  263965 May  6  2018 commons-codec-1.9.jar
-rw-r--r--. 1 root root   61829 May  6  2018 commons-logging-1.2.jar
drwxr-xr-x. 2 root root    4096 May  6  2018 config
-rw-r--r--. 1 root root   52399 Jul 18  2018 elasticsearch-analysis-ik-6.3.1.jar
-rw-r--r--. 1 root root 4502496 Oct  2 09:53 elasticsearch-analysis-ik-6.3.1.zip
-rw-r--r--. 1 root root  736658 May  6  2018 httpclient-4.5.2.jar
-rw-r--r--. 1 root root  326724 May  6  2018 httpcore-4.4.4.jar
-rw-r--r--. 1 root root    1805 Jul 18  2018 plugin-descriptor.properties
-rw-r--r--. 1 root root     125 Jul 18  2018 plugin-security.policy
[root@spark2 plugins]#

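Make sure the extracted files sit in an ik folder under elasticsearch/plugins, then restart Elasticsearch to load the IK analyzer.

Testing the IK analyzer

IK provides two analyzers, ik_smart and ik_max_word: ik_smart produces the coarsest segmentation (the fewest tokens), while ik_max_word produces the finest-grained segmentation.

Let's try each in turn.

(1) Coarsest segmentation: enter the following address in the browser's address bar

http://spark2.x:9200/_analyze?analyzer=ik_smart&pretty=true&text=我是程序员

The output is: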

{
    "tokens":[
        {
            "token":"我",
            "start_offset":0,
            "end_offset":1,
            "type":"CN_CHAR",
            "position":0
        },
        {
            "token":"是",
            "start_offset":1,
            "end_offset":2,
            "type":"CN_CHAR",
            "position":1
        },
        {
            "token":"程序员",
            "start_offset":2,
            "end_offset":5,
            "type":"CN_WORD",
            "position":2
        }
    ]
}

(2) Finest-grained segmentation: enter the following address in the browser's address bar

http://spark2.x:9200/_analyze?analyzer=ik_max_word&pretty=true&text=我是程序员

The output is:

{
    "tokens":[
        {
            "token":"我",
            "start_offset":0,
            "end_offset":1,
            "type":"CN_CHAR",
            "position":0
        },
        {
            "token":"是",
            "start_offset":1,
            "end_offset":2,
            "type":"CN_CHAR",
            "position":1
        },
        {
            "token":"程序员",
            "start_offset":2,
            "end_offset":5,
            "type":"CN_WORD",
            "position":2
        },
        {
            "token":"程序",
            "start_offset":2,
            "end_offset":4,
            "type":"CN_WORD",
            "position":3
        },
        {
            "token":"员",
            "start_offset":4,
            "end_offset":5,
            "type":"CN_CHAR",
            "position":4
        }
    ]
}
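The query-string form of _analyze used above comes from the 5.x course material. On Elasticsearch 6.x, which is what this article installs, the analyzer and text are expected in a JSON request body instead, so the browser URLs may be rejected. A minimal sketch of the body-based request follows; the class and method names are hypothetical, and the JSON is assembled naively on the assumption that the text contains no quotes or backslashes.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper for illustration; not part of the original project.
final class AnalyzeRequestSketch {

    // Naive JSON assembly: assumes analyzer and text contain no quotes or backslashes.
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + analyzer + "\",\"text\":\"" + text + "\"}";
    }

    // Builds (but does not send) a POST to the _analyze endpoint.
    static HttpRequest analyze(String baseUrl, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze?pretty=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(analyzeBody(analyzer, text)))
                .build();
    }

    public static void main(String[] args) {
        // Compare the token lists returned for the two analyzers.
        System.out.println(analyzeBody("ik_smart", "我是程序员"));
        System.out.println(analyzeBody("ik_max_word", "我是程序员"));
    }
}
```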

III. Developing the search microservice

1. Create the tensquare_search module

(1) Create the tensquare_search module and add the dependencies to pom.xml

tensquare_parent2020\tensquare_search\pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>tensquare_parent2020</artifactId>
        <groupId>org.study</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>tensquare_search</artifactId>

    <dependencies>
        <dependency>
            <groupId>org.springframework.data</groupId>
            <artifactId>spring-data-elasticsearch</artifactId>
        </dependency>

        <dependency>
            <groupId>org.study</groupId>
            <artifactId>tensquare_common</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>
    </dependencies>

</project>

(2) application.yml

tensquare_parent2020\tensquare_search\src\main\resources\application.yml

server:
  port: 9007
spring:
  application:
    name: tensquare-search
  data:
    elasticsearch:
      cluster-nodes: spark2.x:9300

(3) Create the package com.tensquare.search and add the startup class to it

tensquare_parent2020\tensquare_search\src\main\java\com\tensquare\search\SearchApplication.java

package com.tensquare.search;

import com.study.util.IdWorker;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class SearchApplication {
    public static void main(String[] args) {
        SpringApplication.run(SearchApplication.class,args);
    }

    @Bean
    public IdWorker idWorker(){
        return new IdWorker(1,1);
    }
}

2. Adding articles (saving articles to the index)

(1) Create the entity class (create the package com.tensquare.search.pojo and add the class to it)

tensquare_parent2020\tensquare_search\src\main\java\com\tensquare\search\pojo\Article.java

package com.tensquare.search.pojo;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;

import java.io.Serializable;


@Document(indexName="tensquare_article",type="article")
public class Article implements Serializable {

    @Id
    private String id;

    // Indexed: whether the field can be searched
    // Analyzed: whether a search matches individual tokens or the whole value
    // Stored: whether the field is kept so it can be displayed on the page
    @Field(index=true,analyzer = "ik_max_word",searchAnalyzer = "ik_max_word")
    private String title;

    @Field(index=true,analyzer = "ik_max_word",searchAnalyzer = "ik_max_word")
    private String content;

    private String state; // review status

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }

    public String getState() {
        return state;
    }

    public void setState(String state) {
        this.state = state;
    }
}

(2) Create the data access interface (create the package com.tensquare.search.dao and add the interface to it)

tensquare_parent2020\tensquare_search\src\main\java\com\tensquare\search\dao\ArticleDao.java

package com.tensquare.search.dao;

import com.tensquare.search.pojo.Article;

import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

/**
 * Data access interface for articles; used to save articles to the index
 */
public interface ArticleDao extends ElasticsearchRepository<Article,String> {
}

(3) Create the service class (create the package com.tensquare.search.service and add the class to it)

tensquare_parent2020\tensquare_search\src\main\java\com\tensquare\search\service\ArticleService.java

package com.tensquare.search.service;

import com.tensquare.search.dao.ArticleDao;
import com.tensquare.search.pojo.Article;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;


@Service
public class ArticleService {

    @Autowired
    private ArticleDao articleDao;

    /**
     * Saves an article to the index.
     * @param article the article to index
     */
    public void save(Article article){
        articleDao.save(article);
    }
}

(4) Create the controller class (create the package com.tensquare.search.controller and add the class to it)

tensquare_parent2020\tensquare_search\src\main\java\com\tensquare\search\controller\ArticleController.java

package com.tensquare.search.controller;

import com.study.entity.Result;
import com.study.entity.StatusCode;
import com.tensquare.search.pojo.Article;
import com.tensquare.search.service.ArticleService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;


@RestController
@RequestMapping("/article")
@CrossOrigin
public class ArticleController {
    @Autowired
    private ArticleService articleService;

    /**
     * Saves an article to the index.
     * @param article the article to index
     * @return the operation result
     */
    @RequestMapping(method= RequestMethod.POST)
    public Result save(@RequestBody Article article){
        articleService.save(article);
        return new Result(true, StatusCode.OK,"Added successfully");
    }
}

(5) Start the main program SearchApplication

(6) Test with Postman

3. Searching articles in the index

(1) Controller layer

tensquare_parent2020\tensquare_search\src\main\java\com\tensquare\search\controller\ArticleController.java

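    // Add this method to ArticleController. It also needs an import for
    // org.springframework.data.domain.Page and for the project's PageResult class.
    /**
     * Searches articles in the index.
     * @param key  search keyword
     * @param page page number (1-based)
     * @param size page size
     * @return one page of matching articles
     */
    @RequestMapping(value="/{key}/{page}/{size}",method=RequestMethod.GET)
    public Result findByKey(@PathVariable String key,@PathVariable int page,@PathVariable int size){
        // The parameter name must match the {key} path variable exactly (case-sensitive)
        Page<Article> pageData = articleService.findByKey(key,page,size);
        return new Result(true,StatusCode.OK,"Query succeeded",new PageResult<Article>(pageData.getTotalElements(),pageData.getContent()));
    }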
}

(2) Service layer

tensquare_parent2020\tensquare_search\src\main\java\com\tensquare\search\service\ArticleService.java

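    // Add this method to ArticleService. It also needs imports for Page, Pageable
    // and PageRequest from org.springframework.data.domain.
    /**
     * Searches articles in the index.
     * @param key  search keyword, matched against both title and content
     * @param page page number (1-based)
     * @param size page size
     * @return one page of matching articles
     */
    public Page<Article> findByKey(String key, int page, int size) {
        // Spring Data pages are zero-based, so shift the 1-based page number
        Pageable pageable = PageRequest.of(page - 1, size);
        return articleDao.findByTitleOrContentLike(key, key, pageable);
    }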
}
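The page - 1 in the service method converts the 1-based page number in the URL into the zero-based index that Spring Data expects. The arithmetic can be sketched on its own as below; the class and method names are hypothetical, and the explicit check is an addition (without it, a request for page 0 would reach PageRequest.of with a negative index, which it rejects).

```java
// Hypothetical helper for illustration; not part of the original project.
final class PagingSketch {

    // Converts a 1-based page number from the URL into a zero-based page index.
    static int zeroBasedPage(int page) {
        if (page < 1) {
            throw new IllegalArgumentException("page numbers start at 1");
        }
        return page - 1;
    }

    // Offset of the first hit on the given 1-based page.
    static long firstHit(int page, int size) {
        return (long) zeroBasedPage(page) * size;
    }

    public static void main(String[] args) {
        System.out.println(zeroBasedPage(1)); // 0
        System.out.println(firstHit(3, 10));  // 20
    }
}
```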

(3) DAO layer

tensquare_parent2020\tensquare_search\src\main\java\com\tensquare\search\dao\ArticleDao.java

package com.tensquare.search.dao;
import com.tensquare.search.pojo.Article;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

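/**
 * Data access interface for articles
 */
public interface ArticleDao extends ElasticsearchRepository<Article,String> {

    /**
     * Searches articles in the index.
     * @param title    keyword to match against the title
     * @param content  keyword to match against the article body
     * @param pageable paging information
     * @return one page of matching articles
     */
    public Page<Article> findByTitleOrContentLike(String title, String content, Pageable pageable);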
}
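No query is written by hand here: Spring Data derives it from the method name. After the findBy prefix, the name is read as property conditions joined by And or Or, each optionally followed by a keyword such as Like. The toy parser below only illustrates that naming convention for Or-joined names like this one; it is not Spring Data's actual parser, and its names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration of the derived-query naming convention; not Spring Data's parser.
final class DerivedQueryNameSketch {

    // Splits "findByAOrB..." into its Or-joined conditions.
    static List<String> orParts(String methodName) {
        if (!methodName.startsWith("findBy")) {
            throw new IllegalArgumentException("expected a findBy... method name");
        }
        String criteria = methodName.substring("findBy".length());
        // Split on "Or" only where it sits between two camel-case words.
        return Arrays.asList(criteria.split("(?<=[a-z])Or(?=[A-Z])"));
    }

    public static void main(String[] args) {
        // Prints: [Title, ContentLike]
        System.out.println(orParts("findByTitleOrContentLike"));
    }
}
```

Each resulting part maps onto one method argument in order, which is why the service passes the keyword twice.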

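(4) Start the main program SearchApplication

(5) Test with Postman

4. Installing Logstash

Download: https://pan.baidu.com/s/1cNp6QC13dAtwyN8AhNhPZg

Extraction code: nbma

Logstash

(1) What is Logstash

Logstash is a lightweight framework for collecting and processing logs. It gathers scattered, heterogeneous logs, applies custom processing, and forwards the result to a chosen destination such as a server or a file.

(2) Installing and testing Logstash

Extract the archive, then run the following (the logstash launcher is in the bin directory):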

logstash  -e 'input { stdin { } } output { stdout {} }'

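If text typed on the console is echoed back as an event, Logstash is working.

Parameters:

stdin: the input stream, here keyboard input
stdout: the output stream, here output to the display
Command-line options:
-e: execute the configuration string given on the command line
--config or -f: use a configuration file; the argument is the full path of a file or of a directory (for example /etc/logstash.d/, in which case logstash reads every *.conf text file under /etc/logstash.d/, concatenates them in memory into one complete configuration, and then runs it)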
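The directory behaviour of -f can be pictured with a small in-memory model: keep only the *.conf files and join their contents into one configuration. This is a sketch under assumptions, not Logstash's code; in particular, joining the files in file-name order is an assumption here, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of merging a config directory; not how Logstash is implemented.
final class ConfMergeSketch {

    // Joins the contents of all *.conf entries, ordered by file name (assumed order).
    static String merge(Map<String, String> contentsByFileName) {
        StringBuilder merged = new StringBuilder();
        // TreeMap iterates its keys in sorted order.
        for (Map.Entry<String, String> entry : new TreeMap<>(contentsByFileName).entrySet()) {
            if (entry.getKey().endsWith(".conf")) {
                merged.append(entry.getValue()).append('\n');
            }
        }
        return merged.toString();
    }

    public static void main(String[] args) {
        System.out.print(merge(Map.of(
                "20-output.conf", "output { stdout {} }",
                "10-input.conf", "input { stdin { } }",
                "notes.txt", "ignored")));
    }
}
```

Numeric prefixes such as 10- and 20- are a common way to keep the pieces in a predictable order.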