Installing the services
Download and install Elasticsearch
Install elasticsearch:7.6.1 with Docker
docker pull elasticsearch:7.6.1
mkdir -p /Users/szcl/mydata/elasticsearch/config
mkdir -p /Users/szcl/mydata/elasticsearch/data
echo "http.host: 0.0.0.0" >> /Users/szcl/mydata/elasticsearch/config/elasticsearch.yml
chmod -R 777 /Users/szcl/mydata/elasticsearch/
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e ES_JAVA_OPTS="-Xms64m -Xmx128m" -v /Users/szcl/mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml -v /Users/szcl/mydata/elasticsearch/data:/usr/share/elasticsearch/data -v /Users/szcl/mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins -d elasticsearch:7.6.1
Check whether the container started successfully:
docker ps -a
If it did not start, inspect the logs with:
docker logs -f b016c22606e1
Then access port 9200 on the server:
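A quick way to check from the host is to request port 9200 directly; a running node answers with a JSON block containing the node name and version (this assumes the port mapping from the `docker run` command above):

```shell
curl http://localhost:9200
```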
Install the elasticsearch-head plugin
docker pull mobz/elasticsearch-head:5
docker run -d -p 9100:9100 docker.io/mobz/elasticsearch-head:5
After it starts, open the page in a browser.
On a fresh install you may run into a cross-origin (CORS) access-denied problem; the Elasticsearch configuration needs to be changed, and there are two ways to do it:
- Edit the mounted Elasticsearch config directly
cd /Users/szcl/mydata/elasticsearch/config
vim elasticsearch.yml
Add to the configuration:
http.cors.enabled: true
http.cors.allow-origin: "*"
Restart the container:
docker restart b016c22606e1
- Modify the config inside the container
docker exec -it b016c22606e1 /bin/bash
cd ./config
vim elasticsearch.yml
Add to the configuration:
http.cors.enabled: true
http.cors.allow-origin: "*"
Restart the container:
docker restart b016c22606e1
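Whichever way you edit it, you can confirm the CORS settings took effect by sending a request with an Origin header and checking for the Access-Control-Allow-Origin response header (host and port assumed from the setup above):

```shell
curl -s -i -H "Origin: http://localhost:9100" http://localhost:9200 | grep -i access-control
```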
Creating an index in head
Clicking OK does nothing, so check the browser console.
It returns a 406 error; open the request to see the details.
The server does not accept the Content-Type x-www-form-urlencoded.
Fix:
- Enter the head container
docker exec -it 62c5c56241ae /bin/bash
- Go into the _site folder
cd _site
- Edit vendor.js
vim vendor.js
- Alternatively, copy the file out of the container to the host, edit it there, and copy it back
Reference: https://blog.csdn.net/zhaoyajie1011/article/details/98610002
- vim is not available in the container by default; install it first
apt-get update
apt-get install vim
- Change the following content:
contentType: "application/x-www-form-urlencoded"
becomes:
contentType: "application/json;charset=UTF-8"
and
var inspectData = s.contentType === "application/x-www-form-urlencoded"
becomes:
var inspectData = s.contentType === "application/json;charset=UTF-8"
- Restart the container
Creating the index now succeeds. However, the head plugin is mainly for browsing data and is not well suited to complex queries; for querying it is better to install the more powerful Kibana.
Install Kibana
- Install with Docker
docker pull kibana:7.6.1
- Start the image
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://IP:9200 -p 5601:5601 -d kibana:7.6.1
- Modify the configuration
Here I copy the file out of the container to the host for editing:
docker cp 970f63f0babb:/usr/share/kibana/config/kibana.yml /mydata/kibana/config/
Edit it directly on the host:
vim kibana.yml
Modify the following:
server.name: kibana
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: [ "http://IP:9200" ]
i18n.locale: "zh-CN"
xpack.monitoring.ui.container.elasticsearch.enabled: true
Copy the modified config back into the container:
docker cp /mydata/kibana/config/kibana.yml 970f63f0babb:/usr/share/kibana/config/
- Restart the container
docker restart 970f63f0babb
- Open port 5601 in a browser
Install the ik analyzer
- Enter the Elasticsearch container
docker exec -it 98d725e6291e /bin/bash
- Install the plugin
elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.1/elasticsearch-analysis-ik-7.6.1.zip
- Restart all containers
- Test the segmentation:
- Open the Kibana console at http://localhost:5601/
- Find Dev Tools in the sidebar
- Test ik_max_word (finest-grained segmentation)
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "中国共产党"
}
- Test ik_smart (coarsest segmentation)
POST _analyze
{
  "analyzer": "ik_smart",
  "text": "中国共产党"
}
Custom dictionary
- For example, when segmenting "我爱赵亚杰", both ik_smart and ik_max_word split the name 赵亚杰 into single characters.
- In that case a custom dictionary is needed. Enter the container and find the ik analyzer configuration:
docker exec -it 98d725e6291e /bin/bash
cd config/analysis-ik/
vi IKAnalyzer.cfg.xml
In
<entry key="ext_dict"></entry>
configure your own dictionary: <entry key="ext_dict">my.dic</entry>
Save, then create the my.dic dictionary:
vi my.dic
Enter the word
赵亚杰
in my.dic and save.
- Restart the Elasticsearch container
- Test the custom segmentation
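After the restart, the custom word can be checked in Kibana's Dev Tools; with my.dic loaded, 赵亚杰 should come back as a single token rather than three separate characters (a sketch):

```
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "我爱赵亚杰"
}
```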
ElasticSearch basic operations
Operation reference
Operation | Method | URL |
---|---|---|
Create a document (specified ID) | PUT | localhost:9200/indexName/typeName/docId |
Create a document (random ID) | POST | localhost:9200/indexName/typeName |
Update a document | POST | localhost:9200/indexName/typeName/docId/_update |
Delete a document | DELETE | localhost:9200/indexName/typeName/docId |
Get a document by ID | GET | localhost:9200/indexName/typeName/docId |
Query all documents | POST | localhost:9200/indexName/typeName/_search |
Common operations
- Check cluster health
GET _cat/health
1599732945 10:15:45 elasticsearch yellow 1 1 5 5 0 0 2 0 - 71.4%
- List indices (one of the many _cat endpoints)
GET _cat/indices
yellow open poem tWco8rUWQCS1YuMtkrCl4A 1 1 1 0 5.1kb 5.1kb
green open .kibana_task_manager_1 1SxsVdvgSZOOQ3X9wKXJzQ 1 0 2 1 16.2kb 16.2kb
yellow open poem2 xWMF79GYTaKco1Ljo2SmrA 1 1 0 0 283b 283b
green open .apm-agent-configuration UGRU7tD0Tj-bOnmo-nfZrw 1 0 0 0 283b 283b
green open .kibana_1 r65DwNYWSha1v7AW5v62QQ 1 0 20 6 48kb 48kb
…
Many kinds of information can be viewed through the _cat APIs.
Create an index
With default field types:
PUT /poem/poem/1
{
"title": "相思",
"author": "王维",
"content": "红豆生南国,春来发几枝。愿君多采撷,此物最相思。"
}
Execute it, then view the index with the elasticsearch-head plugin and browse the document content.
Specify field types (define the index mapping):
PUT /poem2
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"date": {
"type": "date"
},
"content": {
"type": "text"
}
}
}
}
View it with the head plugin.
Queries
Simple query
GET /poem/_doc/1
or
GET /poem/poem/1
This queries the index poem for the document with ID 1; _doc is the default type (types will be removed in Elasticsearch 8.x).
{
"_index" : "poem",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"title" : "相思",
"author" : "王维",
"content" : "红豆生南国,春来发几枝。愿君多采撷,此物最相思。"
}
}
Conditional query
Documents whose content contains "一":
GET /poem/_search?q=content:一
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.74386525,
"hits" : [
{
"_index" : "poem",
"_type" : "poem",
"_id" : "2",
"_score" : 0.74386525,
"_source" : {
"title" : "登鹳雀楼",
"author" : "王之涣",
"content" : "白日依山尽,黄河入海流。欲穷千里目,更上一层楼。"
}
},
{
"_index" : "poem",
"_type" : "poem",
"_id" : "3",
"_score" : 0.6489038,
"_source" : {
"title" : "九月九日忆山东兄弟",
"author" : "王维",
"content" : "独在异乡为异客,每逢佳节倍思亲。遥知兄弟登高处,遍插茱萸少一人。"
}
}
]
}
}
Whether this behaves like a fuzzy match depends on the field type defined in the index: a text field is analyzed into terms, while a keyword field is not analyzed.
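As a sketch of the difference, a hypothetical index poem3 (not used elsewhere in this article) could mark author as keyword, so it matches only as a whole, while content stays analyzed text:

```
PUT /poem3
{
  "mappings": {
    "properties": {
      "author":  { "type": "keyword" },
      "content": { "type": "text" }
    }
  }
}
```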
Return only specified fields
GET /poem/poem/_search
{
"query": {
"match": {
"content": "一"
}
},
"_source": ["title", "content"]
}
match runs the query text through the analyzer.
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.74386525,
"hits" : [
{
"_index" : "poem",
"_type" : "poem",
"_id" : "2",
"_score" : 0.74386525,
"_source" : {
"title" : "登鹳雀楼",
"content" : "白日依山尽,黄河入海流。欲穷千里目,更上一层楼。"
}
},
{
"_index" : "poem",
"_type" : "poem",
"_id" : "3",
"_score" : 0.6489038,
"_source" : {
"title" : "九月九日忆山东兄弟",
"content" : "独在异乡为异客,每逢佳节倍思亲。遥知兄弟登高处,遍插茱萸少一人。"
}
}
]
}
}
Sorting
GET /poem/poem/_search
{
"query": {
"match": {
"content": "一"
}
},
"_source": ["title", "content","date"],
"sort": [
{
"date": {
"order": "asc"
}
}
]
}
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "poem",
"_type" : "poem",
"_id" : "2",
"_score" : null,
"_source" : {
"date" : "2020-09-10",
"title" : "登鹳雀楼",
"content" : "白日依山尽,黄河入海流。欲穷千里目,更上一层楼。"
},
"sort" : [
1599696000000
]
},
{
"_index" : "poem",
"_type" : "poem",
"_id" : "3",
"_score" : null,
"_source" : {
"date" : "2020-09-11",
"title" : "九月九日忆山东兄弟",
"content" : "独在异乡为异客,每逢佳节倍思亲。遥知兄弟登高处,遍插茱萸少一人。"
},
"sort" : [
1599782400000
]
}
]
}
}
Pagination
GET /poem/poem/_search
{
"query": {
"match": {
"content": "一"
}
},
"_source": ["title", "content","date"],
"sort": [
{
"date": {
"order": "asc"
}
}
],
"from": 0,
"size": 1
}
from: the offset to start returning results from;
size: the number of results to return.
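Note that from is an offset, not a page number; when a 1-based page number drives the query, the offset is (pageNo - 1) * pageSize. A minimal sketch of that conversion (the helper name fromOffset is my own):

```java
public class PageUtil {
    /** Convert a 1-based page number and page size to the "from" offset ES expects. */
    static int fromOffset(int pageNo, int pageSize) {
        if (pageNo < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNo and pageSize must be >= 1");
        }
        return (pageNo - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(1, 10)); // page 1 starts at offset 0
        System.out.println(fromOffset(3, 10)); // page 3 starts at offset 20
    }
}
```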
Multi-condition query
GET /poem/poem/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"author": "王维"
}
},
{
"match": {
"date": "2020-09-11"
}
}
]
}
}
}
or
GET /poem/poem/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"author": "王维"
}
},
{
"match": {
"date": "2020-09-12"
}
}
]
}
}
}
must is equivalent to AND in MySQL;
must_not is equivalent to NOT;
should is equivalent to OR.
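must_not is the only one of the three without an example above; a sketch that finds poems by 王维 while excluding a given date:

```
GET /poem/poem/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "author": "王维" } }
      ],
      "must_not": [
        { "match": { "date": "2020-09-11" } }
      ]
    }
  }
}
```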
To match several terms in one field, separate them with spaces:
GET /poem/poem/_search
{
"query": {
"match": {
"content": "三 一"
}
}
}
Range query
GET /poem/poem/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"author": "王维"
}
}
],
"filter": {
"range": {
"index": {
"gte": 1,
"lt": 3
}
}
}
}
}
}
gt: greater than; gte: greater than or equal to; lt: less than; lte: less than or equal to.
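The range example above filters on an index field, which the poem mappings shown earlier do not actually define; the same operators work on the date field, e.g.:

```
GET /poem/poem/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "date": {
            "gte": "2020-09-10",
            "lte": "2020-09-11"
          }
        }
      }
    }
  }
}
```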
Highlighting
GET /poem/poem/_search
{
"query": {
"match": {
"content": "一"
}
},
"highlight": {
"pre_tags": "<span style='color: red'>",
"post_tags": "</span>",
"fields": {
"content": {}
}
}
}
The highlight keyword wraps matched terms in the given tags.
Updating
POST /poem/_doc/1/_update
{
"doc": {
"date": "2020-09-10"
}
}
{
"_index" : "poem",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"_seq_no" : 1,
"_primary_term" : 1,
"found" : true,
"_source" : {
"title" : "相思",
"author" : "王维",
"content" : "红豆生南国,春来发几枝。愿君多采撷,此物最相思。",
"date" : "2020-09-10"
}
}
_version is incremented on every update.
Deleting
DELETE /poem2/_doc/1 (delete the specified document)
or
DELETE /poem2 (delete the index)
{
"_index" : "poem2",
"_type" : "_doc",
"_id" : "1",
"_version" : 3,
"result" : "not_found",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 3,
"_primary_term" : 1
}
{
"acknowledged" : true
}
List all indices with
GET _cat/indices
yellow open poem tWco8rUWQCS1YuMtkrCl4A 1 1 1 0 12.2kb 12.2kb
green open .kibana_task_manager_1 1SxsVdvgSZOOQ3X9wKXJzQ 1 0 2 1 16.2kb 16.2kb
green open .apm-agent-configuration UGRU7tD0Tj-bOnmo-nfZrw 1 0 0 0 283b 283b
green open .kibana_1 r65DwNYWSha1v7AW5v62QQ 1 0 23 3 73.7kb 73.7kb
poem2 has been deleted.
Integrating ES with Spring Boot
Add the Maven dependency
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
Pin the Elasticsearch version explicitly; otherwise the version pulled in may not match the version actually in use:
<properties>
<java.version>1.8</java.version>
<elasticsearch.version>7.6.1</elasticsearch.version>
</properties>
Create the Elasticsearch configuration class
package com.youngj.es.config;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
/**
 * Elasticsearch client configuration
 * @author YoungJ
 */
@Configuration
public class ElasticSearchClientConfig {
@Bean
public RestHighLevelClient restHighLevelClient() {
RestHighLevelClient client = new RestHighLevelClient(
RestClient.builder(
new HttpHost("127.0.0.1", 9200, "http")));
return client;
}
}
Testing the client APIs
Create a test class
package com.youngj.es.api;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import java.io.IOException;
@SpringBootTest
class EsApiApplicationTests {
private static final String INDEX = "youngj_poem";
@Autowired
private RestHighLevelClient restHighLevelClient;
@Test
void contextLoads() {
}
}
Create an index
@Test
void testCreateIndex() throws IOException {
CreateIndexRequest request = new CreateIndexRequest(INDEX);
CreateIndexResponse indexResponse = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
System.out.println(indexResponse);
}
Check whether an index exists
/**
 * Check whether the index exists
 * @throws IOException
 */
@Test
void getIndex() throws IOException {
GetIndexRequest request = new GetIndexRequest(INDEX);
boolean exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
System.out.println(exists);
}
Delete an index
/**
 * Delete the index
 * @throws IOException
 */
@Test
void delIndex() throws IOException {
DeleteIndexRequest request = new DeleteIndexRequest(INDEX);
AcknowledgedResponse response = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);
System.out.println(response.isAcknowledged());
}
Create a document
/**
 * Create a document
 * @throws IOException
 */
@Test
void addDoc() throws IOException {
IndexRequest request = new IndexRequest(INDEX);
request.id("1");
request.timeout(TimeValue.timeValueSeconds(1));
request.source(JSON.toJSONString(new Poem("行宫", "元稹", "寥落古行宫,宫花寂寞红。白头宫女在,闲坐说玄宗。")), XContentType.JSON);
IndexResponse indexResponse = restHighLevelClient.index(request, RequestOptions.DEFAULT);
System.out.println(indexResponse);
System.out.println(indexResponse.status());
}
IndexResponse[index=youngj_poem,type=_doc,id=1,version=1,result=created,seqNo=0,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]
CREATED
Create documents in bulk
/**
 * Create documents in bulk
 * @throws IOException
 */
@Test
void addBatchDoc() throws IOException {
BulkRequest request = new BulkRequest(INDEX);
request.timeout(TimeValue.timeValueSeconds(10));
List<Poem> list = new ArrayList<>();
list.add(new Poem("行宫", "元稹", "寥落古行宫,宫花寂寞红。白头宫女在,闲坐说玄宗。"));
list.add(new Poem("新嫁娘词", "王建", "三日入厨下,洗手作羹汤。未谙姑食性,先遣小姑尝。"));
list.add(new Poem("相思", "王维", "红豆生南国,春来发几枝。愿君多采撷,此物最相思。"));
list.add(new Poem("杂诗三首·其二", "王维", "君自故乡来,应知故乡事。来日绮窗前,寒梅著花未?"));
list.add(new Poem("鹿柴", "王维", "空山不见人,但闻人语响。返景入深林,复照青苔上。"));
list.add(new Poem("芙蓉楼送辛渐", "王昌龄", "寒雨连江夜入吴,平明送客楚山孤。洛阳亲友如相问,一片冰心在玉壶。"));
list.add(new Poem("江雪", "柳宗元", "千山鸟飞绝,万径人踪灭。孤舟蓑笠翁,独钓寒江雪。"));
for (int i = 0; i < list.size(); i++) {
request.add(new IndexRequest(INDEX)
.id((i+2)+"")
.source(JSON.toJSONString(list.get(i)), XContentType.JSON)
);
}
BulkResponse bulk = restHighLevelClient.bulk(request, RequestOptions.DEFAULT);
System.out.println(bulk.status());
System.out.println(bulk.hasFailures());
}
Check whether a document exists
/**
 * Check whether the document exists
 * @throws IOException
 */
@Test
void chkDocExist() throws IOException {
GetRequest request = new GetRequest(INDEX);
request.id("1");
boolean exists = restHighLevelClient.exists(request, RequestOptions.DEFAULT);
System.out.println(exists);
}
Get a document
/**
 * Get the document
 * @throws IOException
 */
@Test
void getDoc() throws IOException {
GetRequest request = new GetRequest(INDEX);
request.id("1");
GetResponse documentFields = restHighLevelClient.get(request, RequestOptions.DEFAULT);
System.out.println(JSON.toJSONString(documentFields.getSource()));
}
Result:
{"author":"元稹","title":"行宫","content":"寥落古行宫,宫花寂寞红。白头宫女在,闲坐说玄宗。"}
Update a document
/**
 * Update the document
 * @throws IOException
 */
@Test
void updateDoc() throws IOException {
UpdateRequest request = new UpdateRequest(INDEX, "1");
request.timeout(TimeValue.timeValueSeconds(1));
Poem poem = new Poem("登鹳雀楼", "王之涣", "白日依山尽,黄河入海流。欲穷千里目,更上一层楼。");
request.doc(JSON.toJSONString(poem), XContentType.JSON);
UpdateResponse updateResponse = restHighLevelClient.update(request, RequestOptions.DEFAULT);
System.out.println(JSON.toJSONString(updateResponse.status()));
System.out.println(updateResponse.getGetResult());
}
Delete a document
/**
 * Delete the document
 * @throws IOException
 */
@Test
void delDoc() throws IOException {
DeleteRequest request = new DeleteRequest(INDEX, "2");
DeleteResponse deleteResponse = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
System.out.println(deleteResponse.status());
}
Search
/**
 * Search
 * @throws IOException
 */
@Test
void search() throws IOException {
SearchRequest request = new SearchRequest(INDEX);
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("content", "三");
SearchSourceBuilder query = sourceBuilder.query(matchQueryBuilder);
request.source(query);
SearchResponse search = restHighLevelClient.search(request, RequestOptions.DEFAULT);
System.out.println(search.status());
System.out.println(JSON.toJSONString(search));
}
QueryBuilders builds the query conditions.
Crawling page data with Jsoup and writing it to ES
Add the Maven dependency
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.10.2</version>
</dependency>
Create an HTML-parsing utility class
public class HtmlParseUtil {
}
Analyze the page
Inspect the elements, find the main body of the Tang poems, and locate the corresponding HTML tags.
Under the tag with class="sons" there are seven divs, corresponding to five-character quatrains, seven-character quatrains, and so on.
Opening the first div (the five-character quatrains), each span contains a poem title inside an a tag that links to the poem body.
Approach:
Loop over every div with class="typecont", take the links from the a tags under the spans, and request each one to get the poem body.
public void parseHtml() throws Exception {
// Top-level URL
String wrapUrl = "https://www.gushiwen.org/gushi/tangshi.aspx";
// Jsoup.parse turns the HTML into a Document object whose methods can be used much like the DOM in JS
Document document = Jsoup.parse(new URL(wrapUrl), 50000);
Elements elements = document.getElementsByClass("typecont");
for (int i = 0; i < elements.size(); i++) {
Element element = elements.get(i);
Elements spans = element.getElementsByTag("span");
for (int j = 0; j < spans.size(); j++) {
Element span = spans.get(j);
String src = span.getElementsByTag("a").eq(0).attr("href");
String title = span.getElementsByTag("a").eq(0).text();
System.out.println("title: " + title + ", src: " + src);
}
}
}
public static void main(String[] args) throws Exception {
new HtmlParseUtil().parseHtml();
}
After collecting the links, open one and analyze the poem-body HTML.
Inspecting the elements shows that inside the element with class="cont":
the h1 holds the title,
the second a tag inside the first p tag holds the author,
and the div whose id is "contson" plus the ID taken from the URL holds the poem text.
public void parseHtml() throws Exception {
// Top-level URL
String wrapUrl = "https://www.gushiwen.org/gushi/tangshi.aspx";
// Jsoup.parse turns the HTML into a Document object whose methods can be used much like the DOM in JS
Document document = Jsoup.parse(new URL(wrapUrl), 50000);
Elements elements = document.getElementsByClass("typecont");
for (int i = 0; i < elements.size(); i++) {
Element element = elements.get(i);
Elements spans = element.getElementsByTag("span");
for (int j = 0; j < spans.size(); j++) {
Element span = spans.get(j);
String src = span.getElementsByTag("a").eq(0).attr("href");
// Request each URL to get the poem body
Document sonDoc = Jsoup.parse(new URL(src), 50000);
// Extract the ID from the URL; it is needed to locate the poem body below
String id = src.substring(src.indexOf("_")+1, src.indexOf(".aspx"));
Element body = sonDoc.getElementById("sonsyuanwen");
Element cont = body.getElementsByClass("cont").get(0);
String title = cont.getElementsByTag("h1").eq(0).text();
String author = cont.getElementsByTag("p").get(0).getElementsByTag("a").eq(1).text();
String content = cont.getElementById("contson" + id).text();
System.out.println("title: " + title + ", author: " + author + ", content: " + content);
}
}
}
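The substring call that pulls the ID out of the link can be sanity-checked in isolation; it takes everything between the first "_" and ".aspx" (the URL below is a made-up example of the shiwenv_<id>.aspx pattern, and the class/method names are mine):

```java
public class IdExtract {
    /** Take the part between the first "_" and ".aspx", mirroring the parser above. */
    static String extractId(String src) {
        return src.substring(src.indexOf("_") + 1, src.indexOf(".aspx"));
    }

    public static void main(String[] args) {
        // prints 45c396367f59
        System.out.println(extractId("https://so.gushiwen.org/shiwenv_45c396367f59.aspx"));
    }
}
```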
Writing the crawled data to ES
public List<Poem> parseHtml() throws Exception {
String wrapUrl = "https://www.gushiwen.org/gushi/tangshi.aspx";
Document document = Jsoup.parse(new URL(wrapUrl), 50000);
Elements elements = document.getElementsByClass("typecont");
List<Poem> poems = new ArrayList<>();
for (int i = 0; i < elements.size(); i++) {
Element element = elements.get(i);
Elements spans = element.getElementsByTag("span");
for (int j = 0; j < spans.size(); j++) {
Element span = spans.get(j);
String src = span.getElementsByTag("a").eq(0).attr("href");
Document sonDoc = Jsoup.parse(new URL(src), 50000);
String id = src.substring(src.indexOf("_")+1, src.indexOf(".aspx"));
Element body = sonDoc.getElementById("sonsyuanwen");
Element cont = body.getElementsByClass("cont").get(0);
String title = cont.getElementsByTag("h1").eq(0).text();
String author = cont.getElementsByTag("p").get(0).getElementsByTag("a").eq(1).text();
String content = cont.getElementById("contson" + id).text();
poems.add(new Poem(title, author, content));
}
}
return poems;
}
Use the ES bulk insert API to write the data:
@Test
void insertHtmlParser() throws Exception {
BulkRequest request = new BulkRequest(INDEX);
request.timeout(TimeValue.timeValueSeconds(100));
List<Poem> poems = new HtmlParseUtil().parseHtml();
for (Poem poem : poems) {
request.add(new IndexRequest(INDEX)
.source(JSON.toJSONString(poem), XContentType.JSON)
);
}
BulkResponse bulk = restHighLevelClient.bulk(request, RequestOptions.DEFAULT);
System.out.println(bulk.hasFailures());
}
Building the search front end with Vue.js
Include the JS libraries
- axios.min.js (HTTP requests)
- vue.min.js
Write the page
<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">
<head>
<meta charset="utf-8"/>
<title>古诗词搜索</title>
<link rel="stylesheet" th:href="@{/static/css/style.css}"/>
</head>
<body class="pg">
<div class="page" id="app">
<div id="mallPage" class=" mallist tmall- page-not-market ">
<div id="header" class=" header-list-app">
<div class="headerLayout">
<div class="headerCon ">
<div class="header-extra">
<!--Search-->
<div id="mallSearch" class="mall-search">
<form name="searchTop" class="mallSearch-form clearfix">
<fieldset>
<legend>搜索</legend>
<div class="mallSearch-input clearfix">
<div class="s-combobox" id="s-combobox-685">
<div class="s-combobox-input-wrap">
<input v-model="keyword" type="text" autocomplete="off" value="dd"
id="mq"
class="s-combobox-input" aria-haspopup="true">
</div>
</div>
<button @click.prevent="searchKey" type="submit" id="searchbtn">搜索</button>
</div>
</fieldset>
</form>
</div>
</div>
</div>
</div>
</div>
<div id="content">
<div class="main">
<div class="view">
<div class="product" v-for="result in results">
<div class="product-iWrap">
<div style="text-align: center">
{{result.title}}
</div>
<div style="text-align: center">
{{result.author}}
</div>
<div style="text-align: left" v-html="result.content">
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<script th:src="@{/static/js/vue.min.js}"></script>
<script th:src="@{/static/js/axios.min.js}"></script>
<script>
new Vue({
el: '#app',
data: {
keyword: '',
results: []
},
methods: {
searchKey() {
var keyword = this.keyword;
console.log(keyword)
axios.get("/search/"+keyword+"/1/100").then(res => {
console.log(res.data)
this.results = res.data;
});
}
}
});
</script>
</body>
</html>
Back-end code
IndexController
package com.youngj.es.controller;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;
/**
* description:
*
* @author YoungJ
* @date 2020-09-12 15:15
*/
@Controller
public class IndexController {
@GetMapping({"/", "/index"})
public String index() {
return "index";
}
}
SearchController
package com.youngj.es.controller;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
/**
* description:
*
* @author YoungJ
* @date 2020-09-12 15:46
*/
@RestController
public class SearchController {
private static final String INDEX = "poem";
@Autowired
private RestHighLevelClient restHighLevelClient;
@GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,
@PathVariable("pageNo") int pageNo,
@PathVariable("pageSize") int pageSize) throws Exception {
SearchRequest searchRequest = new SearchRequest(INDEX);
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
HighlightBuilder highlightBuilder = new HighlightBuilder()
.requireFieldMatch(false)
.field("content")
.preTags("<span style='color: red'>")
.postTags("</span>");
sourceBuilder.highlighter(highlightBuilder);
// Pagination: convert the 1-based page number to an offset
sourceBuilder.from((pageNo - 1) * pageSize);
sourceBuilder.size(pageSize);
TermQueryBuilder termQueryBuilder = new TermQueryBuilder("content", keyword);
sourceBuilder.query(termQueryBuilder);
sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
List<Map<String, Object>> list = new ArrayList<>();
for (SearchHit hit : searchResponse.getHits().getHits()) {
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
Map<String, HighlightField> highlightFields = hit.getHighlightFields();
HighlightField content = highlightFields.get("content");
if (content != null) {
Text[] fragments = content.getFragments();
String newCon = "";
for (Text text : fragments) {
newCon += text;
}
sourceAsMap.put("content", newCon);
}
list.add(sourceAsMap);
}
return list;
}
}
Final result
Search for poems containing "白"
Search for poems containing "夜"