Kuangshen Shuo ElasticSearch Notes

RainHey

Last modified: 2022-03-17 13:11:48

First published: 2022-02-17 00:28:05

Article link: https://blog.csdn.net/Voctorial/article/details/122951131

Contents

Introduction
ES
ELK
Kibana
IK Analyzer (ES plugin)
RESTful style
Basic index operations
Basic document operations
Integrating ES with Spring Boot
Hands-on project

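Introduction

Elasticsearch is an open-source, highly scalable, distributed full-text search engine that stores and retrieves data in near real time. It scales well, up to hundreds of servers and petabytes of data. ES is written in Java and uses Lucene at its core for all indexing and search functionality. Lucene is an information-retrieval toolkit shipped as a jar and is widely regarded as one of the most advanced, best-performing, and most full-featured search engine libraries; ElasticSearch wraps and extends it. The goal of ES is to hide Lucene's complexity behind a simple RESTful API and so make full-text search easy.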

ES

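Install ES on a virtual machine
ES directory structure

bin startup scripts
config configuration files
    log4j2.properties logging configuration
    jvm.options JVM settings (the heap defaults to 1 GB at startup; adjust it if memory is insufficient)
    elasticsearch.yml Elasticsearch configuration file; the default port is 9200
lib
    dependency jars
modules feature modules
plugins plugins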

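elasticsearch-head, a visual UI for ES, is installed on the local machine. So that it can reach ES, add the following to the ES configuration file elasticsearch.yml:

# enable cross-origin requests (CORS)
http.cors.enabled: true
# allow requests from any origin
http.cors.allow-origin: "*"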

ELK

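ELK is an acronym formed from the initials of three open-source projects: Elasticsearch, Logstash, and Kibana.

Elasticsearch is a distributed, near-real-time search platform built on Lucene and accessed through a RESTful API.
Logstash is the central data-flow engine of ELK. It collects data in different formats from different sources (files, data stores, message queues), filters it, and outputs it to different destinations (files, MQ, redis, elasticsearch, kafka, and so on).
Kibana presents Elasticsearch data in a friendly web UI and provides real-time analysis.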

Kibana

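Kibana is an open-source analytics and visualization platform for ElasticSearch, used to search, view, and interact with data stored in Elasticsearch indices. With Kibana you can run advanced data analysis and present the results in a variety of charts, which makes large volumes of data easier to understand. It is simple to operate: the browser-based UI lets you quickly build dashboards that show Elasticsearch query results in real time. Setup is simple as well; no coding or extra infrastructure is needed, and within minutes Kibana can be installed and monitoring Elasticsearch indices.

Download it locally, set the ES address in the configuration file, and it works out of the box: run the .bat file.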

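IK Analyzer (ES plugin)

Tokenization splits a piece of Chinese (or other) text into individual keywords. At search time the query text is tokenized, the data in the database or index is tokenized as well, and the two sets of tokens are matched against each other. The default analyzer treats every Chinese character as a separate word, which does not meet our needs, so we install the Chinese analyzer IK to solve the problem.

IK provides two analysis modes: ik_smart, which produces the coarsest split (fewest tokens), and ik_max_word, which produces the finest-grained split.

Test:

Install the IK analyzer plugin: download it from GitHub and unzip it into the plugins directory of ES
Test: start ES and Kibana (remember to turn off the firewall on the ES server), then view the tokenization result in Kibana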
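
The same tokenization check can be driven from Java instead of Kibana. The sketch below only builds (and does not send) a request for the Elasticsearch _analyze API, once per IK mode. The base URL http://localhost:9200, the class name IkAnalyzeRequest, and the sample text are assumptions made for illustration, and the plugin must be installed for the analyzer names to resolve.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class IkAnalyzeRequest {

    // Minimal JSON string escaping (backslash and double quote only); enough for this sketch.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Body of an _analyze call: which analyzer to run and the text to split.
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    // Builds the HTTP request without sending it; baseUrl is an assumption.
    static HttpRequest analyzeRequest(String baseUrl, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(analyzeBody(analyzer, text)))
                .build();
    }

    public static void main(String[] args) {
        String text = "中文分词器"; // sample text, chosen only for illustration
        for (String analyzer : new String[] {"ik_smart", "ik_max_word"}) {
            HttpRequest request = analyzeRequest("http://localhost:9200", analyzer, text);
            System.out.println(request.method() + " " + request.uri());
            System.out.println(analyzeBody(analyzer, text));
        }
    }
}
```

To actually run the analysis, the built request could be passed to java.net.http.HttpClient.send; comparing the two responses shows how much finer ik_max_word splits the text than ik_smart.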

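Adding custom words to the extension dictionary
In the elasticsearch directory, under /plugins/ik/config/, create a .dic dictionary file such as my.dic and add the custom words to it. Then open config/IKAnalyzer.cfg.xml and put my.dic inside the first entry element.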

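RESTful style

REST is an architectural style for software, not a standard; it only provides a set of design principles and constraints. It is mainly used for software in which clients and servers interact. Software designed in this style can be simpler, more clearly layered, and easier to equip with mechanisms such as caching.

Basic commands

method	URL	Description
PUT (create, update)	localhost:9200/index_name/type_name/doc_id	Create a document (with the given document id)
POST (create)	localhost:9200/index_name/type_name	Create a document (with a random document id)
POST (update)	localhost:9200/index_name/type_name/doc_id/_update	Update a document
DELETE (delete)	localhost:9200/index_name/type_name/doc_id	Delete a document
GET (query)	localhost:9200/index_name/type_name/doc_id	Get a document by its id
POST (query)	localhost:9200/index_name/type_name/_search	Query all data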
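
On ES 7, where type names are deprecated, the same operations are usually written with _doc in place of the type name. The sketch below spells out the method and path for each row of the table in that typeless form. The class and method names are made up for this example, and rainhey_index is only a sample index name.

```java
public class EsRestPaths {

    // Create or overwrite a document with a known id.
    static String createWithId(String index, String id) {
        return "PUT /" + index + "/_doc/" + id;
    }

    // Create a document and let ES generate the id.
    static String createAutoId(String index) {
        return "POST /" + index + "/_doc";
    }

    // Partially update a document.
    static String update(String index, String id) {
        return "POST /" + index + "/_update/" + id;
    }

    // Delete a document by id.
    static String delete(String index, String id) {
        return "DELETE /" + index + "/_doc/" + id;
    }

    // Fetch a document by id.
    static String getById(String index, String id) {
        return "GET /" + index + "/_doc/" + id;
    }

    // Search the whole index.
    static String search(String index) {
        return "POST /" + index + "/_search";
    }

    public static void main(String[] args) {
        String index = "rainhey_index"; // sample index name
        System.out.println(createWithId(index, "1"));
        System.out.println(createAutoId(index));
        System.out.println(update(index, "1"));
        System.out.println(delete(index, "1"));
        System.out.println(getById(index, "1"));
        System.out.println(search(index));
    }
}
```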

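Basic index operations

Create

PUT /index_name/type_name/doc_id
{
	request body
}

The type name is deprecated as of ES 7; _doc can be used in its place.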
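Field data types

String types
text, keyword
text: analyzed and used for full-text search; supports fuzzy and exact queries but, by default, not aggregation or sorting. A text field has no maximum length, so it suits large fields.
keyword: not analyzed, indexed as a single term; supports fuzzy and exact matching as well as aggregation and sorting. A keyword value can be at most 32766 bytes of UTF-8. ignore_above sets the maximum length to index; longer values are not indexed and cannot be found with an exact term query.
Numeric types
long, integer, short, byte, double, float, half_float, scaled_float
Date type
date
Boolean type
boolean
Binary type
binary
and so on…

Specifying field types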
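
As a rough illustration of specifying field types, the sketch below assembles the body of a create-index request with explicit mappings. The index name test_index, the field names, and the MappingBody helper are assumptions for illustration; the mappings/properties layout is the typeless form used by ES 7.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class MappingBody {

    // Turns field -> type pairs into the JSON body of a create-index request.
    static String build(Map<String, String> fieldTypes) {
        StringJoiner props = new StringJoiner(",");
        for (Map.Entry<String, String> e : fieldTypes.entrySet()) {
            props.add("\"" + e.getKey() + "\":{\"type\":\"" + e.getValue() + "\"}");
        }
        return "{\"mappings\":{\"properties\":{" + props + "}}}";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "text");
        fields.put("age", "long");
        fields.put("birthday", "date");

        System.out.println("PUT /test_index"); // hypothetical index name
        System.out.println(build(fields));
    }
}
```

Fields that are left out of the mapping still get a type later, through dynamic mapping, when the first document containing them is indexed.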
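Getting index information

Default information

If no type is specified for a document field, ElasticSearch assigns a default type to it.
GET _cat/... returns a lot of information about the current state of ElasticSearch: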

GET _cat/indices
GET _cat/aliases
GET _cat/allocation
GET _cat/count
GET _cat/fielddata
GET _cat/health
GET _cat/master
GET _cat/nodeattrs
GET _cat/nodes
GET _cat/pending_tasks
GET _cat/plugins
GET _cat/recovery

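Update

1. Overwrite directly with PUT

The version (_version) increases by 1
Any field left out of the request body is lost after the update

2. POST

_version stays unchanged only when the update changes nothing (a no-op); an update that modifies the document increases it as well
Note the doc object: it holds the fields to modify
Fields that are not listed are kept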
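
The difference between the two update styles can be mimicked with plain maps. This is only a model of the semantics, not client code: put and postUpdate are hypothetical helpers, and the shallow merge ignores that a real partial update also merges nested objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UpdateSemantics {

    // PUT with a full body: the stored document is replaced by the request body.
    static Map<String, Object> put(Map<String, Object> stored, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    // POST _update with {"doc": {...}}: the partial doc is merged into the stored document.
    static Map<String, Object> postUpdate(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "rainhey");
        stored.put("age", 3);

        Map<String, Object> change = new LinkedHashMap<>();
        change.put("age", 4);

        System.out.println(put(stored, change));        // {age=4}, name is gone
        System.out.println(postUpdate(stored, change)); // {name=rainhey, age=4}
    }
}
```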

Delete

Send a DELETE request; the URL decides whether a whole index (DELETE /index_name) or a single document is deleted.

Basic document operations

Simple query

GET rainhey/user/_search?q=name:lisi

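Complex query

hits: contains the index and document information, the total number of results, the matching documents themselves (which can be iterated over), and a score that tells which document fits the query best

query: the query
_source: the fields to return
sort: sorting
from, size: paging
highlight: highlights every matched term of the query text in the result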

GET rainhey/user/_search
{
  "query": {
    "match": {
      "name": "饶先生"
    }
  },
  "highlight": {
    "pre_tags": "<p class='key' style='color:red'>",
    "post_tags": "</p>",   // custom highlight tags
    "fields": {
      "name": {}
    }
  },
  "_source": ["age", "name"],
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 3
}
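
Note that from is the zero-based offset of the first hit, not a page number. A minimal sketch of converting a 1-based page number into from; the Paging class is a hypothetical helper for illustration.

```java
public class Paging {

    // Converts a 1-based page number into the zero-based "from" offset.
    static int from(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1); // treat zero or negative pages as page 1
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        int size = 3;
        for (int page = 1; page <= 3; page++) {
            // page 1 -> from=0, page 2 -> from=3, page 3 -> from=6
            System.out.println("page " + page + " -> from=" + from(page, size) + ", size=" + size);
        }
    }
}
```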

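Multi-condition query

must is equivalent to AND
should is equivalent to OR
must_not is equivalent to NOT: a document that matches any of its clauses is excluded
filter filters the results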
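
How the four clause types combine can be modeled with plain predicates. The sketch below is a simplified model that ignores scoring; it assumes the default behavior that should clauses only become mandatory when there is no must or filter clause. The class name, the two sample conditions, and the sample document are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class BoolSemantics {

    // Two sample conditions standing in for real query clauses.
    static final Predicate<Map<String, Object>> NAME_IS_LISI = d -> "lisi".equals(d.get("name"));
    static final Predicate<Map<String, Object>> AGE_IS_30 = d -> Integer.valueOf(30).equals(d.get("age"));

    static boolean matches(Map<String, Object> doc,
                           List<Predicate<Map<String, Object>>> must,
                           List<Predicate<Map<String, Object>>> should,
                           List<Predicate<Map<String, Object>>> mustNot,
                           List<Predicate<Map<String, Object>>> filter) {
        boolean allMust = must.stream().allMatch(p -> p.test(doc));          // AND
        boolean allFilter = filter.stream().allMatch(p -> p.test(doc));      // AND, without scoring
        boolean noneMustNot = mustNot.stream().noneMatch(p -> p.test(doc));  // NOT
        // should (OR) is only required when there is no must or filter clause
        boolean shouldRequired = !should.isEmpty() && must.isEmpty() && filter.isEmpty();
        boolean shouldOk = !shouldRequired || should.stream().anyMatch(p -> p.test(doc));
        return allMust && allFilter && noneMustNot && shouldOk;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("name", "lisi", "age", 20);
        List<Predicate<Map<String, Object>>> none = List.of();
        List<Predicate<Map<String, Object>>> both = List.of(NAME_IS_LISI, AGE_IS_30);

        System.out.println(matches(doc, both, none, none, none)); // false: age is not 30
        System.out.println(matches(doc, none, both, none, none)); // true: the name matches
        System.out.println(matches(doc, List.of(NAME_IS_LISI), none, List.of(AGE_IS_30), none)); // true
    }
}
```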

Matching array fields
Several keywords can be searched at once (separated by spaces)
match runs the query text through the analyzer

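Exact query

match: runs the query text through the analyzer and returns every document that contains at least one of the resulting tokens
term: does not analyze the query text and returns the documents that contain the exact term

term looks the given term up directly in the inverted index
It suits number, date, and keyword fields, not text

The text and keyword field types
text: analyzed; the value is split into tokens before it is stored in the index
keyword: not analyzed; the value is stored in the index as a single term

Differences between match, term, text, and keyword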
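
A toy model makes the four-way difference concrete. The sketch below is not how Elasticsearch is implemented: analyze is a stand-in analyzer that lowercases and splits on whitespace, and all class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class MatchVsTerm {

    // Stand-in for an analyzer: lowercase, then split on whitespace.
    static Set<String> analyze(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    // Terms that end up in the inverted index for one field value.
    static Set<String> indexedTerms(String value, boolean isTextField) {
        return isTextField ? analyze(value) : Set.of(value);
    }

    // term query: the input is looked up as-is, without analysis.
    static boolean term(Set<String> indexed, String query) {
        return indexed.contains(query);
    }

    // match query: on a text field the input is analyzed first; any shared token is a hit.
    static boolean match(Set<String> indexed, String query, boolean isTextField) {
        Set<String> tokens = isTextField ? analyze(query) : Set.of(query);
        for (String token : tokens) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> text = indexedTerms("Hello World", true);     // tokens: hello, world
        Set<String> keyword = indexedTerms("Hello World", false); // single term: Hello World

        System.out.println(term(text, "Hello World"));        // false: no such token in the text index
        System.out.println(term(text, "hello"));              // true
        System.out.println(term(keyword, "Hello World"));     // true: exact value
        System.out.println(match(text, "HELLO there", true)); // true: shares the token hello
    }
}
```

This is why term suits keyword, number, and date fields: the value that was indexed is the value being looked up, while a text field only contains the analyzed tokens.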

Integrating ES with Spring Boot

Import dependencies

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.70</version>
        </dependency>

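Configuration class

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EsConfig {

    // Registers the high-level REST client as a bean; host and port point at the ES server
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("192.168.0.100", 9200, "http")
                )
        );
    }
}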

Tests

@SpringBootTest
class EsApiApplicationTests {
    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient restHighLevelClient;

    // Test creating an index
    @Test
    void testCreateIndex() throws IOException {
        // Build the create-index request
        CreateIndexRequest request = new CreateIndexRequest("rainhey_index");
        // Execute the request through the client
        CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
        System.out.println(createIndexResponse);
    }
    // Test whether an index exists
    @Test
    void testExistIndex() throws IOException {
        GetIndexRequest request = new GetIndexRequest("rainhey_index");
        boolean exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(exists);
    }
    // Test deleting an index
    @Test
    void testDeleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("rainhey_index");
        AcknowledgedResponse delete = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);
        System.out.println(delete.isAcknowledged());   // prints whether the deletion was acknowledged
    }
    // Test creating a document
    @Test
    void testAddDocument() throws IOException {
        // Create the object
        User user = new User("rainhey", 3);
        // Create the request
        IndexRequest request = new IndexRequest("rainhey_index");
        // Rules: document id and timeout
        request.id("1");
        request.timeout(TimeValue.timeValueSeconds(1));
        // equivalent: request.timeout("1s");
        // Put the data into the request as JSON
        request.source(JSON.toJSONString(user), XContentType.JSON);
        // Send the request and read the response
        IndexResponse indexResponse = restHighLevelClient.index(request, RequestOptions.DEFAULT);
        System.out.println(indexResponse.toString());
        System.out.println(indexResponse.status());
    }
    // Check whether a document exists: GET /index/_doc/1
    @Test
    public void testIsExist() throws IOException {
        GetRequest request = new GetRequest("rainhey_index","1");
        // Do not fetch the _source content
        request.fetchSourceContext(new FetchSourceContext(false));
        request.storedFields("_none_");
        boolean exists = restHighLevelClient.exists(request, RequestOptions.DEFAULT);
        System.out.println(exists);
    }
    // Get a document
    @Test
    public void testGetDocument() throws IOException {
        GetRequest request = new GetRequest("rainhey_index","1");
        GetResponse documentFields = restHighLevelClient.get(request, RequestOptions.DEFAULT);
        System.out.println(documentFields.getSourceAsString());
        System.out.println(documentFields);
    }
    // Update a document
    @Test
    public void testUpdateDocument() throws IOException {
        UpdateRequest request = new UpdateRequest("rainhey_index", "1");
        User user = new User("rao",11);
        request.doc(JSON.toJSONString(user),XContentType.JSON);
        UpdateResponse response = restHighLevelClient.update(request, RequestOptions.DEFAULT);
        System.out.println(response.status()); // OK
    }
    // Delete a document
    @Test
    public void testDeleteDocument() throws IOException {
        DeleteRequest request = new DeleteRequest("rainhey_index", "1");
        request.timeout("1s");
        DeleteResponse response = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
        System.out.println(response.status());// OK
    }
    // Bulk-import data
    @Test
    public void testBulk() throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("10s");
        ArrayList<User> users = new ArrayList<>();
        users.add(new User("liuyou-1",1));
        users.add(new User("liuyou-2",2));
        users.add(new User("liuyou-3",3));
        users.add(new User("liuyou-4",4));
        users.add(new User("liuyou-5",5));
        users.add(new User("liuyou-6",6));
        for (int i = 0; i < users.size(); i++) {
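            // For bulk updates or deletes, add the corresponding request type here instead
            bulkRequest.add(
                    // The document data
                    new IndexRequest("bulk")
                            .id(""+(i + 1)) // if no id is set, a random id is generated automatically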
                            .source(JSON.toJSONString(users.get(i)),XContentType.JSON)
            );
        }
        BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        System.out.println(bulk.status());// ok
    }
    // Search
    @Test
    public void testSearch() throws IOException {
        SearchRequest request = new SearchRequest("rainhey_index");
        // Build the search conditions
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Exact (term) query
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "rainhey");
        searchSourceBuilder.query(termQueryBuilder);
        // Paging
        //searchSourceBuilder.from();
        //searchSourceBuilder.size();

        // Highlighting
        //searchSourceBuilder.highlighter(new HighlightBuilder());

        // Attach the conditions to the request
        request.source(searchSourceBuilder);

        // Execute the request
        SearchResponse searchResponse = restHighLevelClient.search(request, RequestOptions.DEFAULT);
        SearchHits hits = searchResponse.getHits();
        System.out.println(JSON.toJSONString(hits));
        for (SearchHit hit : hits.getHits()) {
            System.out.println(hit.getSourceAsMap());
        }
    }
}

Hands-on project

Code repository

Import dependencies

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-thymeleaf</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-configuration-processor</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.70</version>
        </dependency>

        <!-- jsoup: HTML parsing -->
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.10.2</version>
        </dependency>

Configuration

server.port=9090
spring.thymeleaf.cache=false

Write a crawler to fetch the data

@Component
public class HtmlPraseUtil {
    public List<Content> ParseJD(String keyword) throws Exception {
        // Build the search URL from the keyword
        String url = "https://search.jd.com/Search?keyword=" + keyword;
        // Parse the page (timeout in milliseconds); the returned Document resembles the browser's JS Document object
        Document document = Jsoup.parse(new URL(url), 150000);
        // Most DOM methods familiar from JS are also available on document
        Element j_goodsList = document.getElementById("J_goodsList");
        Elements elements = j_goodsList.getElementsByTag("li");

        ArrayList<Content> goodsList = new ArrayList<>();

        // Each li element is one product: extract its image, price, and title
        for (Element element : elements) {
            String img = element.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = element.getElementsByClass("p-price").eq(0).text();
            String title = element.getElementsByClass("p-name").eq(0).text();

            Content content = new Content();
            content.setImg(img);
            content.setPrice(price);
            content.setTitle(title);

            goodsList.add(content);
        }
        return goodsList;
    }
}

Service layer

@Service
public class ContentService {

    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient restHighLevelClient;

    // 1. Parse the data and put it into ES
    public boolean praseContent(String keyword) throws Exception{
        List<Content> contents = new HtmlPraseUtil().ParseJD(keyword);
        // Put the crawled data into ES with one bulk request
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("2m");

        for (int i = 0; i < contents.size(); i++) {
            bulkRequest.add(new IndexRequest("jd_goods").source(JSON.toJSONString(contents.get(i)), XContentType.JSON));
        }
        BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        return !bulk.hasFailures();
    }

    // 2. Search the stored data
    public List<Map<String, Object>> searchPage(String keyword,int pageNo,int pageSize) throws IOException {
        if(pageNo<=1){
            pageNo=1;
        }

        // Conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // Paging: from is the zero-based offset of the first hit, not the page number
        searchSourceBuilder.from((pageNo - 1) * pageSize);
        searchSourceBuilder.size(pageSize);
        // Exact (term) match
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keyword);
        searchSourceBuilder.query(termQueryBuilder);
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        // Highlighting
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.requireFieldMatch(false); // also highlight fields the query itself did not match on
        highlightBuilder.preTags("<span style='color:red'>");
        highlightBuilder.postTags("</span>");
        searchSourceBuilder.highlighter(highlightBuilder);

        // Execute the search
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Parse the results
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit documentFields:searchResponse.getHits().getHits()) {
            //System.out.println(documentFields);
            /*{
                "_index" : "jd_goods",
                    "_type" : "_doc",
                    "_id" : "cFe-An8BOOSr3pLIOBvt",
                    "_score" : 0.49738416,
                    "_source" : {
                "img" : "//img11.360buyimg.com/n1/s200x200_jfs/t1/100039/39/5175/451039/5deb5e3fE6a203f6d/968099d48e403389.png",
                        "price" : "￥106.00",
                        "title" : "Java核心技术 卷I 基础知识（原书第11版） Core Java 卷I (全新第11版)"
            },
                "highlight" : {
                "title" : [
                "<span style='color:red'>Java</span>核心技术 卷I 基础知识（原书第11版） Core <span style='color:red'>Java</span> 卷I (全新第11版)"
    ]
            }
            }*/
            Map<String, HighlightField> highlightFields = documentFields.getHighlightFields();
            //System.out.println(highlightFields);
            /*{title=[title], fragments[[<span style='color:red'>Java</span>核心技术 卷I 基础知识（原书第11版） Core <span style='color:red'>Java</span> 卷I (全新第11版)]]} */
            HighlightField title = highlightFields.get("title"); // the highlighted field
/*            System.out.println(title);
            [title], fragments[[<span style='color:red'>Java</span>核心技术 卷I 基础知识（原书第11版） Core <span style='color:red'>Java</span> 卷I (全新第11版)]]*/
            Map<String, Object> sourceAsMap = documentFields.getSourceAsMap(); // the original, unhighlighted source
            /*System.out.println(sourceAsMap);
            {img=//img11.360buyimg.com/n1/s200x200_jfs/t1/100039/39/5175/451039/5deb5e3fE6a203f6d/968099d48e403389.png, price=￥106.00, title=Java核心技术 卷I 基础知识（原书第11版） Core Java 卷I (全新第11版)}*/
            if (title != null) {
                Text[] fragments = title.fragments();
                String n_title = "";
                for (Text fragment : fragments) {
                    n_title += fragment;
                }
                sourceAsMap.put("title", n_title);
            }
            list.add(sourceAsMap);
        }
        return list;
    }
}

Controller

@RestController
public class ContentController {
    @Autowired
    private ContentService contentService;

    @GetMapping("/prase/{keyword}")
    public boolean prase(@PathVariable String keyword) throws Exception{
        return contentService.praseContent(keyword);
    }

    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable String keyword,@PathVariable int pageNo,@PathVariable int pageSize) throws Exception{
       return contentService.searchPage(keyword, pageNo, pageSize);
    }
}

@Controller
public class IndexController {
    @GetMapping({"/","/index"})
    public String index(){
        return "index";
    }
}

Front-end and back-end interaction
Put the vue and axios js files into the js folder under resource, then adapt the front-end template code
Test

Kuangshen elasticsearch video link
