Building a Real-Time Classical Chinese Poetry Search with Spring Boot + Elasticsearch 7.6.1 + Jsoup + Vue + Docker

YoungJ5788

Original article: https://blog.csdn.net/zhaoyajie1011/article/details/108552579

Service installation

Download and install Elasticsearch

Install elasticsearch:7.6.1 with Docker

docker pull elasticsearch:7.6.1

mkdir -p /Users/szcl/mydata/elasticsearch/config

mkdir -p /Users/szcl/mydata/elasticsearch/data

echo "http.host: 0.0.0.0" >> /Users/szcl/mydata/elasticsearch/config/elasticsearch.yml

chmod -R 777 /Users/szcl/mydata/elasticsearch/

docker run --name elasticsearch -p 9200:9200 -p 9300:9300  -e "discovery.type=single-node" -e ES_JAVA_OPTS="-Xms64m -Xmx128m" -v /Users/szcl/mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml -v /Users/szcl/mydata/elasticsearch/data:/usr/share/elasticsearch/data -v /Users/szcl/mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins -d elasticsearch:7.6.1

Check whether the container started successfully

docker ps -a

If it did not start, inspect the logs with the following command (replace the container ID with your own):

docker logs -f b016c22606e1

Open port 9200 on the server in a browser:

Install the elasticsearch-head plugin

docker pull mobz/elasticsearch-head:5

docker run -d -p 9100:9100 docker.io/mobz/elasticsearch-head:5

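Once it is running, open port 9100 on the server.

On a fresh install, elasticsearch-head may be refused by Elasticsearch because of cross-origin (CORS) restrictions. The configuration has to be changed, and there are two ways to do it:

Edit the Elasticsearch config file mounted from the host directly

cd /Users/szcl/mydata/elasticsearch/config

vim elasticsearch.yml

Add the following to the config: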

http.cors.enabled: true
http.cors.allow-origin: "*"

Restart the container

docker restart b016c22606e1

Or enter the container and edit the config there

docker exec -it b016c22606e1 /bin/bash

cd ./config

vim elasticsearch.yml

Add the following to the config:

http.cors.enabled: true
http.cors.allow-origin: "*"

Restart the container

docker restart b016c22606e1

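Create an index

Clicking OK in elasticsearch-head does nothing, so check the browser console.

The request returns a 406 error; open it to see the details.

It turns out that the x-www-form-urlencoded content type is not supported.

Solution:

Enter the head container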
```
docker exec -it 62c5c56241ae /bin/bash
```
Go to the _site folder
Edit vendor.js
```
vim vendor.js
```
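If vim is not available in the container, use one of the following:
- Copy the file from the container to the host and edit it there
  
  Reference: https://blog.csdn.net/zhaoyajie1011/article/details/98610002
- Install vim inside the container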
```
apt-get update
apt-get install vim
```

Changes to make

contentType: "application/x-www-form-urlencoded"
change to:
contentType: "application/json;charset=UTF-8"

var inspectData = s.contentType === "application/x-www-form-urlencoded"
change to:
var inspectData = s.contentType === "application/json;charset=UTF-8"

Restart the container

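The index can now be created. However, the head plugin is mainly a data viewer and is not well suited to complex queries, so for querying it is better to install the more capable Kibana.

Install Kibana

Install with Docker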
```
docker pull kibana:7.6.1
```

Start the container (replace IP with the address of the Elasticsearch host)

docker run --name kibana -e ELASTICSEARCH_HOSTS=http://IP:9200 -p 5601:5601 -d kibana:7.6.1

Change the configuration

Here the file is copied from the container to the host and edited there

docker cp 970f63f0babb:/usr/share/kibana/config/kibana.yml /mydata/kibana/config/

Edit it directly on the host

vim kibana.yml

Change the following settings:

server.name: kibana
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: [ "http://IP:9200" ]
i18n.locale: "zh-CN"
xpack.monitoring.ui.container.elasticsearch.enabled: true

Copy the edited config back into the container

docker cp /mydata/kibana/config/kibana.yml 970f63f0babb:/usr/share/kibana/config/

Restart the container
```
docker restart 970f63f0babb
```
Open port 5601 in a browser

Install the IK analyzer

Enter the elasticsearch container
```
docker exec -it 98d725e6291e /bin/bash
```

Install the plugin

elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.1/elasticsearch-analysis-ik-7.6.1.zip

Restart all containers

Test the analyzer

Open the Kibana console at http://localhost:5601/
Find Dev Tools in the sidebar

Test ik_max_word (finest-grained segmentation)

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "中国共产党"
}

Test ik_smart (coarsest segmentation, fewest tokens)

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "中国共产党"
}

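Custom dictionary
- For example, when tokenizing "我爱赵亚杰", both ik_smart and ik_max_word split the name 赵亚杰 into single characters
- This is where a custom dictionary comes in. Enter the container and locate the IK analyzer configuration
```
docker exec -it 98d725e6291e /bin/bash

cd config/analysis-ik/

vi IKAnalyzer.cfg.xml 
```
  Register your own dictionary in <entry key="ext_dict"></entry>
```
<entry key="ext_dict">my.dic</entry>
```
  Save, then create the my.dic dictionary
```
vi my.dic
```
  Add the word 赵亚杰 to my.dic and save
- Restart the elasticsearch container
- Test the custom dictionary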
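
To make the effect of a custom dictionary concrete, here is a toy sketch in plain Java. It uses simple forward maximum matching, which is an assumption chosen for illustration and is not the IK analyzer's actual algorithm; the class name `ToySegmenter` is hypothetical. It only shows why a word that is missing from the dictionary falls apart into single characters, and stays whole once it is added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy forward-maximum-matching segmenter; NOT the IK analyzer's real algorithm.
class ToySegmenter {
    static List<String> segment(String text, Set<String> dict, int maxWordLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLen);
            // Try the longest candidate first, shrinking until a dictionary word is found.
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            // If nothing matched, this is a single character.
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "我爱赵亚杰";
        Set<String> emptyDict = new HashSet<>();
        Set<String> customDict = new HashSet<>(Arrays.asList("赵亚杰"));
        System.out.println(segment(text, emptyDict, 4));  // name split into single characters
        System.out.println(segment(text, customDict, 4)); // name kept as one token
    }
}
```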

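Basic Elasticsearch operations

Operation reference

| Operation | Method | URL |
| --- | --- | --- |
| Create a document (with a given document ID) | PUT | localhost:9200/index_name/type_name/document_id |
| Create a document (with a random document ID) | POST | localhost:9200/index_name/type_name |
| Update a document | POST | localhost:9200/index_name/type_name/document_id/_update |
| Delete a document | DELETE | localhost:9200/index_name/type_name/document_id |
| Get a document (by document ID) | GET | localhost:9200/index_name/type_name/document_id |
| Query all data | POST | localhost:9200/index_name/type_name/_search |

Common operations

Check cluster health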

GET _cat/health

1599732945 10:15:45 elasticsearch yellow 1 1 5 5 0 0 2 0 - 71.4%
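
The health line is easier to read once its columns are named. The sketch below splits the line shown above in plain Java; the class name `CatHealthLine` is hypothetical, and the column order in the comment is the one `_cat/health` prints when headers are not requested. A `yellow` status on a single-node setup like this one usually just means that replica shards cannot be assigned.

```java
import java.time.Instant;

// Illustrative helper (hypothetical name) that reads one line of `GET _cat/health` output.
class CatHealthLine {
    // Column order without headers:
    // epoch timestamp cluster status node.total node.data shards pri relo init unassign
    // pending_tasks max_task_wait_time active_shards_percent
    static String[] columns(String line) {
        return line.trim().split("\\s+");
    }

    static String status(String line) {
        return columns(line)[3];
    }

    static Instant time(String line) {
        // The first column is seconds since the Unix epoch.
        return Instant.ofEpochSecond(Long.parseLong(columns(line)[0]));
    }

    public static void main(String[] args) {
        String line = "1599732945 10:15:45 elasticsearch yellow 1 1 5 5 0 0 2 0 - 71.4%";
        System.out.println("cluster = " + columns(line)[2]);
        System.out.println("status  = " + status(line));
        System.out.println("time    = " + time(line));
    }
}
```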

List the indices through _cat

GET _cat/indices

yellow open poem                     tWco8rUWQCS1YuMtkrCl4A 1 1  1 0  5.1kb  5.1kb
green  open .kibana_task_manager_1   1SxsVdvgSZOOQ3X9wKXJzQ 1 0  2 1 16.2kb 16.2kb
yellow open poem2                    xWMF79GYTaKco1Ljo2SmrA 1 1  0 0   283b   283b
green  open .apm-agent-configuration UGRU7tD0Tj-bOnmo-nfZrw 1 0  0 0   283b   283b
green  open .kibana_1                r65DwNYWSha1v7AW5v62QQ 1 0 20 6   48kb   48kb

…

_cat exposes many other kinds of information as well

Create an index

Default field types

PUT /poem/poem/1
{
  "title": "相思",
  "author": "王维",
  "content": "红豆生南国，春来发几枝。愿君多采撷，此物最相思。"
}

Run it

View the index with the elasticsearch-head plugin

View the document contents in the data browser

Explicit field types (defining the index mapping)

PUT /poem2
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "date": {
        "type": "date"
      },
      "content": {
        "type": "text"
      }
    }
  }
}

View it with the head plugin

Queries

Basic query

GET /poem/_doc/1
or
GET /poem/poem/1

This queries the poem index. _doc is the default type (types are deprecated in Elasticsearch 7.x and removed in 8.x), and 1 is the document ID.

{
  "_index" : "poem",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "相思",
    "author" : "王维",
    "content" : "红豆生南国，春来发几枝。愿君多采撷，此物最相思。"
  }
}

Query by condition

Documents whose content contains "一":

GET /poem/_search?q=content:一

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.74386525,
    "hits" : [
      {
        "_index" : "poem",
        "_type" : "poem",
        "_id" : "2",
        "_score" : 0.74386525,
        "_source" : {
          "title" : "登鹳雀楼",
          "author" : "王之涣",
          "content" : "白日依山尽，黄河入海流。欲穷千里目，更上一层楼。"
        }
      },
      {
        "_index" : "poem",
        "_type" : "poem",
        "_id" : "3",
        "_score" : 0.6489038,
        "_source" : {
          "title" : "九月九日忆山东兄弟",
          "author" : "王维",
          "content" : "独在异乡为异客，每逢佳节倍思亲。遥知兄弟登高处，遍插茱萸少一人。"
        }
      }
    ]
  }
}

Whether this behaves like a partial match depends on the field type defined in the index mapping: a text field is analyzed (tokenized), while a keyword field is not.
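
The difference between the two field types can be sketched without Elasticsearch at all. The toy model below assumes one token per Chinese character with punctuation dropped, which is roughly what the default standard analyzer produces for Chinese text; the class and method names are hypothetical, and this is an illustration rather than how Elasticsearch is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of analyzed (text) versus non-analyzed (keyword) matching.
class TextVsKeyword {
    // Rough stand-in for the standard analyzer on Chinese text:
    // one token per character, punctuation dropped.
    static List<String> tokens(String value) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                out.add(String.valueOf(c));
            }
        }
        return out;
    }

    // text field: the query term only has to equal one of the tokens.
    static boolean matchesAsText(String fieldValue, String query) {
        return tokens(fieldValue).contains(query);
    }

    // keyword field: the whole stored value has to equal the query.
    static boolean matchesAsKeyword(String fieldValue, String query) {
        return fieldValue.equals(query);
    }

    public static void main(String[] args) {
        String content = "白日依山尽，黄河入海流。欲穷千里目，更上一层楼。";
        System.out.println(matchesAsText(content, "一"));    // one token matches
        System.out.println(matchesAsKeyword(content, "一")); // whole value differs
    }
}
```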

Return only specific fields

GET /poem/poem/_search
{
  "query": {
    "match": {
      "content": "一"
    }
  },
  "_source": ["title", "content"]
}

A match query runs the query text through the analyzer

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.74386525,
    "hits" : [
      {
        "_index" : "poem",
        "_type" : "poem",
        "_id" : "2",
        "_score" : 0.74386525,
        "_source" : {
          "title" : "登鹳雀楼",
          "content" : "白日依山尽，黄河入海流。欲穷千里目，更上一层楼。"
        }
      },
      {
        "_index" : "poem",
        "_type" : "poem",
        "_id" : "3",
        "_score" : 0.6489038,
        "_source" : {
          "title" : "九月九日忆山东兄弟",
          "content" : "独在异乡为异客，每逢佳节倍思亲。遥知兄弟登高处，遍插茱萸少一人。"
        }
      }
    ]
  }
}

Sorting

GET /poem/poem/_search
{
  "query": {
    "match": {
      "content": "一"
    }
  },
  "_source": ["title", "content","date"],
  "sort": [
    {
      "date": {
        "order": "asc"
      }
    }
  ]
}

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "poem",
        "_type" : "poem",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "date" : "2020-09-10",
          "title" : "登鹳雀楼",
          "content" : "白日依山尽，黄河入海流。欲穷千里目，更上一层楼。"
        },
        "sort" : [
          1599696000000
        ]
      },
      {
        "_index" : "poem",
        "_type" : "poem",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "date" : "2020-09-11",
          "title" : "九月九日忆山东兄弟",
          "content" : "独在异乡为异客，每逢佳节倍思亲。遥知兄弟登高处，遍插茱萸少一人。"
        },
        "sort" : [
          1599782400000
        ]
      }
    ]
  }
}
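
The numbers in the `sort` arrays above are the date values expressed as milliseconds since the Unix epoch in UTC, which is how Elasticsearch represents date fields internally. A small plain-Java check (the class name `DateSortValue` is hypothetical) reproduces them from the `date` strings:

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Converts an ISO date such as "2020-09-10" to epoch milliseconds at midnight UTC.
class DateSortValue {
    static long epochMillis(String isoDate) {
        return LocalDate.parse(isoDate)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(epochMillis("2020-09-10")); // 1599696000000
        System.out.println(epochMillis("2020-09-11")); // 1599782400000
    }
}
```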

Pagination

GET /poem/poem/_search
{
  "query": {
    "match": {
      "content": "一"
    }
  },
  "_source": ["title", "content","date"],
  "sort": [
    {
      "date": {
        "order": "asc"
      }
    }
  ],
  "from": 0,
  "size": 1
}

from: the zero-based offset of the first result to return;

size: the number of results to return
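
Because from is an offset and not a page number, an application that thinks in pages has to convert. A minimal sketch, assuming 1-based page numbers (the class name `PageOffset` is hypothetical):

```java
// Converts a 1-based page number into the zero-based "from" offset.
class PageOffset {
    static int from(int pageNo, int pageSize) {
        if (pageNo < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNo and pageSize must be >= 1");
        }
        return (pageNo - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(from(1, 1));  // first page starts at offset 0
        System.out.println(from(2, 1));  // second page of size 1 starts at offset 1
        System.out.println(from(3, 20)); // third page of size 20 starts at offset 40
    }
}
```

Note that by default Elasticsearch rejects requests where from + size exceeds 10000 (the index.max_result_window setting), so deep pagination needs a different approach.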

Multi-condition queries

GET /poem/poem/_search
{
  "query": {
    "bool": {
      "must": [
        {
         "match": {
           "author": "王维"
         }
        },
        {
          "match": {
            "date": "2020-09-11"
          }
        }
      ]
    }
  }
}
or
GET /poem/poem/_search
{
  "query": {
    "bool": {
      "should": [
        {
         "match": {
           "author": "王维"
         }
        },
        {
          "match": {
            "date": "2020-09-12"
          }
        }
      ]
    }
  }
}

must is roughly the equivalent of AND in MySQL

must_not is roughly the equivalent of NOT in MySQL

should is roughly the equivalent of OR in MySQL
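
The AND / OR / NOT analogy can be spelled out with plain Java predicates. This is a sketch under simplifying assumptions: each clause is reduced to an exact comparison (a real match query on an analyzed text field is looser), the three sample documents are illustrative, and all names are hypothetical. It also encodes one detail the analogy hides: should clauses are only required to match when there is no must clause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Toy model of bool query semantics; clauses are simplified to exact comparisons.
class BoolQuerySketch {
    static final class Doc {
        final int id;
        final String author;
        final String date;
        Doc(int id, String author, String date) {
            this.id = id;
            this.author = author;
            this.date = date;
        }
    }

    static Predicate<Doc> authorIs(String author) { return d -> d.author.equals(author); }
    static Predicate<Doc> dateIs(String date) { return d -> d.date.equals(date); }

    static List<Integer> search(List<Doc> docs, List<Predicate<Doc>> must,
                                List<Predicate<Doc>> should, List<Predicate<Doc>> mustNot) {
        List<Integer> ids = new ArrayList<>();
        for (Doc d : docs) {
            boolean allMust = must.stream().allMatch(p -> p.test(d));
            boolean noneMustNot = mustNot.stream().noneMatch(p -> p.test(d));
            // With no must clause, at least one should clause has to match.
            boolean shouldOk = should.isEmpty() || !must.isEmpty()
                    || should.stream().anyMatch(p -> p.test(d));
            if (allMust && noneMustNot && shouldOk) {
                ids.add(d.id);
            }
        }
        return ids;
    }

    // Illustrative sample data.
    static List<Doc> sample() {
        return Arrays.asList(
                new Doc(1, "王维", "2020-09-10"),
                new Doc(2, "王之涣", "2020-09-10"),
                new Doc(3, "王维", "2020-09-11"));
    }

    public static void main(String[] args) {
        List<Predicate<Doc>> none = Collections.emptyList();
        // must: author AND date
        System.out.println(search(sample(), Arrays.asList(authorIs("王维"), dateIs("2020-09-11")), none, none));
        // should: author OR date
        System.out.println(search(sample(), none, Arrays.asList(authorIs("王维"), dateIs("2020-09-12")), none));
        // must + must_not: date AND NOT author
        System.out.println(search(sample(), Arrays.asList(dateIs("2020-09-10")), none, Arrays.asList(authorIs("王维"))));
    }
}
```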

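Matching several terms in one query: separate the terms with spaces (by default a match query returns documents that contain any of the terms)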

GET /poem/poem/_search
{
  "query": {
    "match": {
      "content": "三 一"
    }
  }
}

Range query

GET /poem/poem/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "author": "王维"
          }
        }
      ],
      "filter": {
        "range": {
          "index": {
            "gte": 1,
            "lt": 3
          }
        }
      }
    }
  }
}

gt: greater than; gte: greater than or equal to; lt: less than; lte: less than or equal to
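
As a quick check of which boundary values are included, the four operators can be written as a plain Java predicate. This is only a sketch of the comparison rules, with a hypothetical class name; each bound is optional, just as in the JSON body.

```java
// Mirrors the semantics of gt / gte / lt / lte in a range clause.
class RangeFilter {
    // Pass null for a bound that is not set.
    static boolean inRange(long value, Long gt, Long gte, Long lt, Long lte) {
        if (gt != null && value <= gt) return false;
        if (gte != null && value < gte) return false;
        if (lt != null && value >= lt) return false;
        if (lte != null && value > lte) return false;
        return true;
    }

    public static void main(String[] args) {
        // Same bounds as the query above: "gte": 1, "lt": 3
        for (long v = 0; v <= 3; v++) {
            System.out.println(v + " -> " + inRange(v, null, 1L, 3L, null));
        }
    }
}
```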

Highlighting

GET /poem/poem/_search
{
  "query": {
    "match": {
      "content": "一"
    }
  },
  "highlight": {
    "pre_tags": "<span style='color: red'>",
    "post_tags": "</span>",
    "fields": {
      "content": {}
    }
  }
}

Highlighting is requested with the highlight key
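
Conceptually, the highlighter wraps each matched term in pre_tags and post_tags and returns the result in a separate highlight section of each hit, leaving _source untouched. The sketch below imitates only the wrapping step with a plain string replacement; the real highlighter works on analyzed tokens and returns fragments, so treat this as an illustration with a hypothetical class name.

```java
// Imitates the visible effect of pre_tags / post_tags on a single matched term.
class HighlightSketch {
    static String highlight(String content, String term, String preTag, String postTag) {
        return content.replace(term, preTag + term + postTag);
    }

    public static void main(String[] args) {
        String line = "欲穷千里目，更上一层楼。";
        System.out.println(highlight(line, "一", "<span style='color: red'>", "</span>"));
    }
}
```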

Update

POST /poem/_doc/1/_update
{
	"doc": {
		"date": "2020-09-10"
	}
}

{
  "_index" : "poem",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "相思",
    "author" : "王维",
    "content" : "红豆生南国，春来发几枝。愿君多采撷，此物最相思。",
    "date" : "2020-09-10"
  }
}

_version is incremented on every modification

Delete

DELETE /poem2/_doc/1    (delete a specific document)
or
DELETE /poem2           (delete the whole index)

{
  "_index" : "poem2",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "result" : "not_found",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}

{
  "acknowledged" : true
}

List all indices with GET _cat/indices

yellow open poem                     tWco8rUWQCS1YuMtkrCl4A 1 1  1 0 12.2kb 12.2kb
green  open .kibana_task_manager_1   1SxsVdvgSZOOQ3X9wKXJzQ 1 0  2 1 16.2kb 16.2kb
green  open .apm-agent-configuration UGRU7tD0Tj-bOnmo-nfZrw 1 0  0 0   283b   283b
green  open .kibana_1                r65DwNYWSha1v7AW5v62QQ 1 0 23 3 73.7kb 73.7kb

poem2 has been deleted

Integrating Elasticsearch with Spring Boot

Official documentation: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.6/java-rest-high-document-index.html

Add the Maven dependency

<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

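If no version is specified, the Elasticsearch client version pulled in by Spring Boot may differ from the server version actually in use, so pin it in the properties: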

<properties>
	<java.version>1.8</java.version>
	<elasticsearch.version>7.6.1</elasticsearch.version>
</properties>

Create the Elasticsearch configuration class

package com.youngj.es.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Elasticsearch client configuration
 * @author YoungJ
 */
@Configuration
public class ElasticSearchClientConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("127.0.0.1", 9200, "http")));
        return client;
    }
}

Testing the API

Create the test class

package com.youngj.es.api;

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

@SpringBootTest
class EsApiApplicationTests {
	private static final String INDEX = "youngj_poem";
	@Autowired
	private RestHighLevelClient restHighLevelClient;

	@Test
	void contextLoads() {
	}
}

Create an index

@Test
void testCreateIndex() throws IOException {
	CreateIndexRequest request = new CreateIndexRequest(INDEX);
	CreateIndexResponse indexResponse = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
	System.out.println(indexResponse);
}

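Check whether an index exists

/**
 * Checks whether the index exists.
 * Requires: import org.elasticsearch.client.indices.GetIndexRequest;
 * @throws IOException
 */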
@Test
void getIndex() throws IOException {
	GetIndexRequest request = new GetIndexRequest(INDEX);
	boolean exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
	System.out.println(exists);
}

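Delete an index

/**
 * Deletes the index.
 * Requires: import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
 *           import org.elasticsearch.action.support.master.AcknowledgedResponse;
 * @throws IOException
 */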
@Test
void delIndex() throws IOException {
	DeleteIndexRequest request = new DeleteIndexRequest(INDEX);
	AcknowledgedResponse response = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);
	System.out.println(response.isAcknowledged());
}

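Create a document

/**
 * Creates a document.
 * Poem is a simple class with title, author and content fields (not shown in the original);
 * JSON is assumed to be fastjson's com.alibaba.fastjson.JSON.
 * @throws IOException
 */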
@Test
void addDoc() throws IOException {
	IndexRequest request = new IndexRequest(INDEX);
	request.id("1");
	request.timeout(TimeValue.timeValueSeconds(1));
	request.source(JSON.toJSONString(new Poem("行宫", "元稹", "寥落古行宫，宫花寂寞红。白头宫女在，闲坐说玄宗。")), XContentType.JSON);
	IndexResponse indexResponse = restHighLevelClient.index(request, RequestOptions.DEFAULT);
	System.out.println(indexResponse);
	System.out.println(indexResponse.status());
}

IndexResponse[index=youngj_poem,type=_doc,id=1,version=1,result=created,seqNo=0,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]
CREATED

Create documents in bulk

/**
 * Creates documents in bulk.
 * @throws IOException
 */
@Test
void addBatchDoc() throws IOException {
	BulkRequest request = new BulkRequest(INDEX);
	request.timeout(TimeValue.timeValueSeconds(10));

	List<Poem> list = new ArrayList<>();
	list.add(new Poem("行宫", "元稹", "寥落古行宫，宫花寂寞红。白头宫女在，闲坐说玄宗。"));
	list.add(new Poem("新嫁娘词", "王建", "三日入厨下，洗手作羹汤。未谙姑食性，先遣小姑尝。"));
	list.add(new Poem("相思", "王维", "红豆生南国，春来发几枝。愿君多采撷，此物最相思。"));
	list.add(new Poem("杂诗三首·其二", "王维", "君自故乡来，应知故乡事。来日绮窗前，寒梅著花未？"));
	list.add(new Poem("鹿柴", "王维", "空山不见人，但闻人语响。返景入深林，复照青苔上。"));
	list.add(new Poem("芙蓉楼送辛渐", "王昌龄", "寒雨连江夜入吴，平明送客楚山孤。洛阳亲友如相问，一片冰心在玉壶。"));
	list.add(new Poem("江雪", "柳宗元", "千山鸟飞绝，万径人踪灭。孤舟蓑笠翁，独钓寒江雪。"));

	for (int i = 0; i < list.size(); i++) {
		request.add(new IndexRequest(INDEX)
				.id((i+2)+"")
				.source(JSON.toJSONString(list.get(i)), XContentType.JSON)
		);
	}
	BulkResponse bulk = restHighLevelClient.bulk(request, RequestOptions.DEFAULT);
	System.out.println(bulk.status());
	System.out.println(bulk.hasFailures());
}

Check whether a document exists

/**
 * Checks whether the document exists.
 * @throws IOException
 */
@Test
void chkDocExist() throws IOException {
	GetRequest request = new GetRequest(INDEX);
	request.id("1");
	boolean exists = restHighLevelClient.exists(request, RequestOptions.DEFAULT);
	System.out.println(exists);
}

Get a document

/**
 * Gets a document.
 * @throws IOException
 */
@Test
void getDoc() throws IOException {
	GetRequest request = new GetRequest(INDEX);
	request.id("1");
	GetResponse documentFields = restHighLevelClient.get(request, RequestOptions.DEFAULT);
	System.out.println(JSON.toJSONString(documentFields.getSource()));
}

Result:

{"author":"元稹","title":"行宫","content":"寥落古行宫，宫花寂寞红。白头宫女在，闲坐说玄宗。"}

Update a document

/**
 * Updates a document.
 * @throws IOException
 */
@Test
void updateDoc() throws IOException {
	UpdateRequest request = new UpdateRequest(INDEX, "1");
	request.timeout(TimeValue.timeValueSeconds(1));
	Poem poem = new Poem("登鹳雀楼", "王之涣", "白日依山尽，黄河入海流。欲穷千里目，更上一层楼。");
	request.doc(JSON.toJSONString(poem), XContentType.JSON);
	UpdateResponse updateResponse = restHighLevelClient.update(request, RequestOptions.DEFAULT);
	System.out.println(JSON.toJSONString(updateResponse.status()));
	System.out.println(updateResponse.getGetResult());
}

Delete a document

/**
 * Deletes a document.
 * @throws IOException
 */
@Test
void delDoc() throws IOException {
	DeleteRequest request = new DeleteRequest(INDEX, "2");
	DeleteResponse deleteResponse = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
	System.out.println(deleteResponse.status());
}

Search

/**
 * Searches the index.
 * @throws IOException
 */
@Test
void search() throws IOException {
	SearchRequest request = new SearchRequest(INDEX);
	SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
	MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("content", "三");
	SearchSourceBuilder query = sourceBuilder.query(matchQueryBuilder);
	request.source(query);
	SearchResponse search = restHighLevelClient.search(request, RequestOptions.DEFAULT);
	System.out.println(search.status());
	System.out.println(JSON.toJSONString(search));
}

QueryBuilders is used to build the query conditions

Crawling web pages with Jsoup and writing the data to Elasticsearch

Add the Maven dependency

<dependency>
	<groupId>org.jsoup</groupId>
	<artifactId>jsoup</artifactId>
	<version>1.10.2</version>
</dependency>

Create an HTML parsing utility class

public class HtmlParseUtil {
}

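Analyzing the page

Inspect the elements, find the main section that lists the Tang poems, and identify the corresponding HTML tags.

The element with class="sons" contains 7 divs, which correspond to five-character quatrains (五言绝句), seven-character quatrains (七言绝句), and so on.

Expanding the first div (five-character quatrains) shows that each span corresponds to a poem title and wraps an a tag that links to the poem page.

Approach:

Loop over every div with class="typecont", take the link from the a tag under each span, request it, and extract the poem.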

public void parseHtml() throws Exception {
    // Outermost (listing page) URL
    String wrapUrl = "https://www.gushiwen.org/gushi/tangshi.aspx";
    // Jsoup.parse fetches the URL and parses the HTML into a Document, which can be queried much like the DOM in JavaScript
    Document document = Jsoup.parse(new URL(wrapUrl), 50000);
    Elements elements = document.getElementsByClass("typecont");
    for (int i = 0; i < elements.size(); i++) {
        Element element = elements.get(i);
        Elements spans = element.getElementsByTag("span");
        for (int j = 0; j < spans.size(); j++) {
            Element span = spans.get(j);
            String src = span.getElementsByTag("a").eq(0).attr("href");
            String title = span.getElementsByTag("a").eq(0).text();

            System.out.println("title: " + title + ", src: " + src);
        }
    }
}

public static void main(String[] args) throws Exception {
    new HtmlParseUtil().parseHtml();
}

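With the links in hand, open one and analyze the HTML of the poem page.

Inspecting the elements shows that, inside class="cont":

the h1 holds the title,

the second a tag inside the p tag holds the author,

the element whose id is "contson" followed by the ID taken from the URL holds the poem text.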

public void parseHtml() throws Exception {
    // Outermost (listing page) URL
    String wrapUrl = "https://www.gushiwen.org/gushi/tangshi.aspx";
    // Jsoup.parse fetches the URL and parses the HTML into a Document, which can be queried much like the DOM in JavaScript
    Document document = Jsoup.parse(new URL(wrapUrl), 50000);
    Elements elements = document.getElementsByClass("typecont");
    for (int i = 0; i < elements.size(); i++) {
        Element element = elements.get(i);
        Elements spans = element.getElementsByTag("span");
        for (int j = 0; j < spans.size(); j++) {
            Element span = spans.get(j);
            String src = span.getElementsByTag("a").eq(0).attr("href");

            // Request each URL to get the poem page
            Document sonDoc = Jsoup.parse(new URL(src), 50000);
            // Extract the ID from the URL; it is needed below to locate the poem text
            String id = src.substring(src.indexOf("_")+1, src.indexOf(".aspx"));
            Element body = sonDoc.getElementById("sonsyuanwen");
            Element cont = body.getElementsByClass("cont").get(0);
            String title = cont.getElementsByTag("h1").eq(0).text();
            String author = cont.getElementsByTag("p").get(0).getElementsByTag("a").eq(1).text();
            String content = cont.getElementById("contson" + id).text();

            System.out.println("title: " + title + ", author: " + author + ", content: " + content);
        }
    }
}
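
The ID extraction in the loop above relies on the link containing an underscore before the ID and ending in .aspx. That step can be checked in isolation, as in the sketch below. The URL in `main` is a made-up placeholder with the same shape, not a real link from the site, and the added guard is an assumption about how to fail when a link does not follow the pattern.

```java
// Pulls the part between the first "_" and ".aspx" out of a poem link.
class PoemUrlId {
    static String extractId(String src) {
        int start = src.indexOf('_');
        int end = src.indexOf(".aspx");
        if (start < 0 || end < 0 || end <= start) {
            throw new IllegalArgumentException("Unexpected link format: " + src);
        }
        return src.substring(start + 1, end);
    }

    public static void main(String[] args) {
        // Hypothetical link used only to show the shape the code expects.
        String src = "https://example.com/shiwenv_abc123.aspx";
        String id = extractId(src);
        System.out.println(id);             // abc123
        System.out.println("contson" + id); // element id looked up in the poem page
    }
}
```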

Write the crawled data to Elasticsearch

public List<Poem> parseHtml() throws Exception {
    String wrapUrl = "https://www.gushiwen.org/gushi/tangshi.aspx";
    Document document = Jsoup.parse(new URL(wrapUrl), 50000);
    Elements elements = document.getElementsByClass("typecont");
    List<Poem> poems = new ArrayList<>();
    for (int i = 0; i < elements.size(); i++) {
        Element element = elements.get(i);
        Elements spans = element.getElementsByTag("span");
        for (int j = 0; j < spans.size(); j++) {
            Element span = spans.get(j);
            String src = span.getElementsByTag("a").eq(0).attr("href");
            Document sonDoc = Jsoup.parse(new URL(src), 50000);
            String id = src.substring(src.indexOf("_")+1, src.indexOf(".aspx"));
            Element body = sonDoc.getElementById("sonsyuanwen");
            Element cont = body.getElementsByClass("cont").get(0);
            String title = cont.getElementsByTag("h1").eq(0).text();
            String author = cont.getElementsByTag("p").get(0).getElementsByTag("a").eq(1).text();
            String content = cont.getElementById("contson" + id).text();

            poems.add(new Poem(title, author, content));
        }
    }
    return poems;
}

Use the Elasticsearch bulk API to write the data to Elasticsearch

@Test
void insertHtmlParser() throws Exception {
	BulkRequest request = new BulkRequest(INDEX);
	request.timeout(TimeValue.timeValueSeconds(100));
	List<Poem> poems = new HtmlParseUtil().parseHtml();
	for (Poem poem : poems) {
		request.add(new IndexRequest(INDEX)
				.source(JSON.toJSONString(poem), XContentType.JSON)
		);
	}
	BulkResponse bulk = restHighLevelClient.bulk(request, RequestOptions.DEFAULT);
	System.out.println(bulk.hasFailures());
}

Building the search front end with Vue.js

Include the JavaScript files

axios.min.js (HTTP requests)
vue.min.js

Write the page

<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">

<head>
    <meta charset="utf-8"/>
    <title>Classical Poetry Search</title>
    <link rel="stylesheet" th:href="@{/static/css/style.css}"/>

</head>

<body class="pg">
<div class="page" id="app">
    <div id="mallPage" class=" mallist tmall- page-not-market ">

        <div id="header" class=" header-list-app">
            <div class="headerLayout">
                <div class="headerCon ">
                    <div class="header-extra">

                        <!-- Search -->
                        <div id="mallSearch" class="mall-search">
                            <form name="searchTop" class="mallSearch-form clearfix">
                                <fieldset>
                                    <legend>Search</legend>
                                    <div class="mallSearch-input clearfix">
                                        <div class="s-combobox" id="s-combobox-685">
                                            <div class="s-combobox-input-wrap">
                                                <input v-model="keyword" type="text" autocomplete="off" value="dd"
                                                       id="mq"
                                                       class="s-combobox-input" aria-haspopup="true">
                                            </div>
                                        </div>
                                        <button @click.prevent="searchKey" type="submit" id="searchbtn">Search</button>
                                    </div>
                                </fieldset>
                            </form>
                        </div>
                    </div>
                </div>
            </div>
        </div>

        <div id="content">
            <div class="main">

                <div class="view">
                    <div class="product" v-for="result in results">
                        <div class="product-iWrap">
                            <div style="text-align: center">
                                {{result.title}}
                            </div>
                            <div style="text-align: center">
                                {{result.author}}
                            </div>
                            <div style="text-align: left" v-html="result.content">
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</div>

<script th:src="@{/static/js/vue.min.js}"></script>
<script th:src="@{/static/js/axios.min.js}"></script>
<script>
    new Vue({
        el: '#app',
        data: {
            keyword: '',
            results: []
        },
        methods: {
            searchKey() {
                var keyword = this.keyword;
                console.log(keyword)
                axios.get("/search/"+keyword+"/1/100").then(res => {
                    console.log(res.data)
                    this.results = res.data;
                });
            }
        }
    });
</script>

</body>
</html>

Write the back end

IndexController

package com.youngj.es.controller;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;

/**
 * Serves the search page.
 *
 * @author YoungJ
 * @date 2020-09-12 15:15
 */
@Controller
public class IndexController {

    @GetMapping({"/", "/index"})
    public String index() {
        return "index";
    }
}

SearchController

package com.youngj.es.controller;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/**
 * Search endpoint: runs a highlighted query against Elasticsearch.
 *
 * @author YoungJ
 * @date 2020-09-12 15:46
 */
@RestController
public class SearchController {

    private static final String INDEX = "poem";

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,
                         @PathVariable("pageNo") int pageNo,
                         @PathVariable("pageSize") int pageSize) throws Exception {
        SearchRequest searchRequest = new SearchRequest(INDEX);

        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

        HighlightBuilder highlightBuilder = new HighlightBuilder()
                .requireFieldMatch(false)
                .field("content")
                .preTags("<span style='color: red'>")
                .postTags("</span>");

        sourceBuilder.highlighter(highlightBuilder);

        // Pagination: "from" is a zero-based offset, so convert the 1-based page number
        sourceBuilder.from((pageNo - 1) * pageSize);
        sourceBuilder.size(pageSize);

        TermQueryBuilder termQueryBuilder = new TermQueryBuilder("content", keyword);
        sourceBuilder.query(termQueryBuilder);
        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        searchRequest.source(sourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        List<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            HighlightField content = highlightFields.get("content");
            if (content != null) {
                Text[] fragments = content.getFragments();
                String newCon = "";
                for (Text text : fragments) {
                    newCon += text;
                }
                sourceAsMap.put("content", newCon);
            }
            list.add(sourceAsMap);
        }
        return list;
    }
}
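
The loop at the end of the controller replaces the stored content with its highlighted version when a highlight is available, and otherwise returns the source unchanged. That merging step can be isolated in plain Java, as sketched below. The names are hypothetical, the fragments are passed as strings rather than Elasticsearch Text objects, and the sketch copies the map instead of modifying it in place, which is a design choice of this example rather than what the controller does.

```java
import java.util.HashMap;
import java.util.Map;

// Merges highlight fragments for one field into a copy of the hit's source map.
class HighlightMerge {
    static Map<String, Object> merge(Map<String, Object> source, String field, String[] fragments) {
        Map<String, Object> result = new HashMap<>(source);
        if (fragments != null && fragments.length > 0) {
            // Concatenate all fragments, as the controller does with "+=".
            result.put(field, String.join("", fragments));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("title", "登鹳雀楼");
        source.put("content", "白日依山尽，黄河入海流。");
        String[] fragments = {"<span style='color: red'>白</span>日依山尽，黄河入海流。"};
        System.out.println(merge(source, "content", fragments).get("content"));
        System.out.println(merge(source, "content", null).get("content")); // falls back to the original
    }
}
```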

Final result

Searching for poems that contain "白"

Searching for poems that contain "夜"
