I. Software Environment
Software | Version | Notes |
---|---|---|
Spring Boot | 2.7.2 | For Spring Boot 3.x, Elasticsearch 8.x is recommended |
Elasticsearch | 7.17.4 | Elasticsearch 7.x can run on JDK 8; Elasticsearch 8.x requires JDK 11+ |
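II. Installing Elasticsearch
Download: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.4-linux-x86_64.tar.gz
Upload the archive to /usr/local/
cd /usr/local/
# Extract the archive
tar -xvf elasticsearch-7.17.4-linux-x86_64.tar.gz
Edit the configuration file /usr/local/elasticsearch-7.17.4/config/elasticsearch.yml
Note: in YAML, the colon after each key must be followed by a space.
# Data path; create the directory first if it does not exist
path.data: /usr/local/elasticsearch-7.17.4/data
# Log path
path.logs: /usr/local/elasticsearch-7.17.4/logs
# Append the following at the bottom of the file to enable password authentication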
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
Adjust the JVM heap settings in /usr/local/elasticsearch-7.17.4/config/jvm.options to suit your workload.
-Xms512m
-Xmx512m
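JDK compatibility: this release expects JDK 11 by default, while the system here is configured with JDK 8, which causes a conflict at startup. The following adjustment works around it.
Edit /usr/local/elasticsearch-7.17.4/bin/elasticsearch-env and comment out the section marked by the red box.
Elasticsearch cannot be started as root, so grant ownership of the installation to a dedicated user.
# Replace <user> with the non-root account that will run Elasticsearch
chown -R <user>:<user> /usr/local/elasticsearch-7.17.4
# Start Elasticsearch as a daemon; switch to the non-root user first
/usr/local/elasticsearch-7.17.4/bin/elasticsearch -d
# Set the passwords of the built-in users; Elasticsearch must have been started first
/usr/local/elasticsearch-7.17.4/bin/elasticsearch-setup-passwords interactive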
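Once security is enabled, every HTTP request to Elasticsearch has to carry credentials. With HTTP Basic authentication these travel in the Authorization header that the CORS settings above allow. The following is a minimal sketch of how that header value is formed, using only the JDK; the class name BasicAuthHeader and the sample password are illustrative assumptions, not part of this setup.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the value of an HTTP Basic "Authorization" header.
final class BasicAuthHeader {

    // Returns "Basic " followed by base64("username:password").
    static String basicAuth(String username, String password) {
        String pair = username + ":" + password;
        byte[] bytes = pair.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // "changeme" is a placeholder; use the password chosen during setup.
        System.out.println("Authorization: " + basicAuth("elastic", "changeme"));
    }
}
```

Basic authentication only encodes the password, it does not encrypt it, so it should be combined with HTTPS outside a trusted network.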
III. Installing Kibana
Download: https://artifacts.elastic.co/downloads/kibana/kibana-7.17.4-linux-x86_64.tar.gz
Upload the archive to /usr/local/
cd /usr/local/
# Extract the archive
tar -zxvf kibana-7.17.4-linux-x86_64.tar.gz
Edit the configuration file /usr/local/kibana-7.17.4-linux-x86_64/config/kibana.yml
# Port
server.port: 5601
# Bind address; 0.0.0.0 accepts connections on all network interfaces
server.host: "0.0.0.0"
# Elasticsearch account used by Kibana
elasticsearch.username: "kibana_system"
elasticsearch.password: "<password>"
# Use the Chinese UI
i18n.locale: "zh-CN"
Like Elasticsearch, Kibana cannot be started as root, so grant ownership of the installation to a dedicated user.
# Replace <user> with the non-root account that will run Kibana
chown -R <user>:<user> /usr/local/kibana-7.17.4-linux-x86_64
# Start in the foreground
/usr/local/kibana-7.17.4-linux-x86_64/bin/kibana
# Start in the background
nohup /usr/local/kibana-7.17.4-linux-x86_64/bin/kibana &
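IV. IK Chinese Analyzer
Download (pick the release that matches your Elasticsearch version):
https://github.com/infinilabs/analysis-ik/releases
In the plugins directory of the Elasticsearch installation, create an ik directory, for example /usr/local/elasticsearch-7.17.4/plugins/ik, and extract the downloaded archive into it.
Restart Elasticsearch to load the plugin.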
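V. Integrating Elasticsearch with Spring Boot
As of Elasticsearch 7.15, the high-level client RestHighLevelClient is deprecated. Elastic introduced a new Java client, the Elasticsearch Java API Client, which is the officially recommended client for Elasticsearch 8.0 and later.
This article uses the Elasticsearch Java API Client directly, which makes a later upgrade to 8.x easier.
Add the following to pom.xml: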
<dependency>
<groupId>co.elastic.clients</groupId>
<artifactId>elasticsearch-java</artifactId>
<version>7.17.24</version>
</dependency>
Configuration file:
spring.elasticsearch.uris=http://localhost:9200
spring.elasticsearch.username=elastic
spring.elasticsearch.password=*******
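Configuration class ElasticsearchConfig:
import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.RestClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticsearchConfig {
    @Value("${spring.elasticsearch.uris}")
    private String uris;
    @Value("${spring.elasticsearch.username}")
    private String username;
    @Value("${spring.elasticsearch.password}")
    private String password;

    @Bean
    public ElasticsearchClient elasticsearchClient() {
        // Credentials for HTTP Basic authentication
        BasicCredentialsProvider credsProv = new BasicCredentialsProvider();
        credsProv.setCredentials(
                AuthScope.ANY, new UsernamePasswordCredentials(username, password)
        );
        // Low-level REST client pointing at a single node
        RestClient restClient = RestClient.builder(HttpHost.create(uris))
                .setHttpClientConfigCallback(hc -> hc.setDefaultCredentialsProvider(credsProv))
                .build();
        // For multiple nodes, see:
        /*
        RestClient restClient = RestClient.builder(
                new HttpHost("192.168.1.10", 9200),
                new HttpHost("192.168.1.11", 9200),
                new HttpHost("192.168.1.12", 9200)).build();
        */
        // Transport with a Jackson mapper for JSON (de)serialization
        ElasticsearchTransport transport = new RestClientTransport(
                restClient, new JacksonJsonpMapper());
        return new ElasticsearchClient(transport);
    }
}
Inject ElasticsearchClient into the service class and use it directly from then on.
@Autowired
private ElasticsearchClient esClient;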
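The configuration class passes spring.elasticsearch.uris to HttpHost.create, which accepts a single address. If the property ever holds several comma-separated addresses, it has to be split first. Below is a minimal sketch using only the JDK; UriListParser is a hypothetical helper, and the conversion to HttpHost is shown only as a comment because it needs the REST client on the classpath.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a comma-separated list of node addresses.
final class UriListParser {

    // Parses "http://a:9200, http://b:9200" into individual URIs, skipping blanks.
    static List<URI> parse(String uris) {
        List<URI> result = new ArrayList<>();
        for (String part : uris.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(URI.create(trimmed));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (URI uri : parse("http://192.168.1.10:9200, http://192.168.1.11:9200")) {
            // With the REST client available, each entry could become:
            // new HttpHost(uri.getHost(), uri.getPort(), uri.getScheme())
            System.out.println(uri.getScheme() + " " + uri.getHost() + " " + uri.getPort());
        }
    }
}
```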
VI. Index Operations
1. Check whether an index exists
HTTP request
HEAD /index-name
Java API
BooleanResponse existsResponse = esClient.indices()
        .exists(builder -> builder.index("index-name"));
if (existsResponse.value()) {
    // The index exists
} else {
    // The index does not exist
}
2. Create an index
HTTP request
PUT /index-name
{
  // Use ik_max_word as the default analyzer
  "settings": {
    "index": {
      "analysis.analyzer.default.type": "ik_max_word"
    }
  },
  "mappings": {
    "properties": {
      "field1": {
        "type": "keyword" // keyword fields are not analyzed
      },
      "field2": {
        "type": "text" // text fields are analyzed
      },
      "field3": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}
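A date field with an explicit format only accepts values written in exactly that pattern, so the application has to format timestamps accordingly before indexing. The sketch below shows one way to do this with java.time for the yyyy-MM-dd HH:mm:ss pattern used in the mapping above; the class name EsDateFormat is an illustrative assumption.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: keeps the Java-side pattern in sync with the mapping's "format".
final class EsDateFormat {

    // Must match the "format" declared for the date field in the index mapping.
    static final DateTimeFormatter ES_DATE_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Formats a timestamp the way the mapping expects it.
    static String format(LocalDateTime value) {
        return value.format(ES_DATE_TIME);
    }

    // Parses a value read back from a document source.
    static LocalDateTime parse(String value) {
        return LocalDateTime.parse(value, ES_DATE_TIME);
    }

    public static void main(String[] args) {
        System.out.println(format(LocalDateTime.of(2024, 1, 5, 9, 3, 7)));
    }
}
```

Keeping the pattern in a single constant avoids the mapping and the application drifting apart.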
Java API
// Option 1: define the mapping with the typed builders
TypeMapping typeMapping = new TypeMapping.Builder()
        .properties("integer_field", p -> p.integer(i -> i))
        .properties("keyword_field", p -> p.keyword(k -> k))
        .properties("text_field", p -> p.text(t -> t))
        .properties("date_field", p -> p.date(d -> d.format("yyyy-MM-dd")))
        .properties("datetime_field", p -> p.date(d -> d.format("yyyy-MM-dd HH:mm:ss")))
        .build();
esClient.indices().create(new CreateIndexRequest.Builder()
        .index("index-name")
        .mappings(typeMapping)
        .build());
// Option 2: create the index from a JSON definition
// (the text block below requires Java 15 or later)
String mappings = """
        {
          "mappings" : {
            "properties" : {
              "integer_field" : {
                "type" : "integer"
              },
              "keyword_field" : {
                "type" : "keyword"
              },
              "text_field" : {
                "type" : "text"
              },
              "date_field" : {
                "type" : "date",
                "index" : false,
                "format" : "yyyy-MM-dd"
              },
              "datetime_field" : {
                "type" : "date",
                "index" : false,
                "format" : "yyyy-MM-dd HH:mm:ss"
              }
            }
          }
        }
        """;
esClient.indices().create(new CreateIndexRequest.Builder()
        .index("index-name")
        .withJson(new StringReader(mappings))
        .build());
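Text blocks delimited by triple quotes are only available from Java 15, while this setup runs on JDK 8. As a sketch under that assumption, the same kind of JSON definition can be assembled with String.join; the class name MappingJson and the field names are placeholders.

```java
// Hypothetical helper: builds a mapping definition without text blocks (Java 8 compatible).
final class MappingJson {

    // Joins the JSON lines with newlines; the result can be wrapped in a StringReader.
    static String mappings() {
        return String.join("\n",
                "{",
                "  \"mappings\": {",
                "    \"properties\": {",
                "      \"keyword_field\": { \"type\": \"keyword\" },",
                "      \"text_field\": { \"type\": \"text\" },",
                "      \"date_field\": { \"type\": \"date\", \"format\": \"yyyy-MM-dd\" }",
                "    }",
                "  }",
                "}");
    }

    public static void main(String[] args) {
        System.out.println(mappings());
    }
}
```

For larger definitions, loading the JSON from a classpath resource is usually easier to maintain than an inline string.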
3. Retrieve an index's mapping
HTTP request
GET /index-name/_mapping
Java API
GetMappingResponse response = esClient.indices()
        .getMapping(builder -> builder.index("index-name"));
IndexMappingRecord indexMappingRecord = response.get("index-name");
TypeMapping typeMapping = indexMappingRecord.mappings();
Map<String, Property> properties = typeMapping.properties();
// IndexMapping is this project's own DTO (field name, type, format)
List<IndexMapping> mappings = new ArrayList<>();
for (Map.Entry<String, Property> entry : properties.entrySet()) {
    Property property = entry.getValue();
    IndexMapping mapping_item = new IndexMapping();
    // Field name
    mapping_item.setField_name(entry.getKey());
    // Field type as it appears in the mapping JSON, e.g. "keyword", "text", "date"
    mapping_item.setField_type(property._kind().jsonValue());
    // Custom format, read from the typed variant instead of parsing toString() output
    if (property.isDate() && property.date().format() != null) {
        mapping_item.setField_format(property.date().format());
    }
    mappings.add(mapping_item);
}
4. Add a field to an existing mapping
HTTP request
PUT /index-name/_mapping
{
  "properties": {
    "new_field": {
      "type": "keyword"
    }
  }
}
Java API
// mappings is a JSONObject holding the mapping to add, shaped like the HTTP request body above; its construction is omitted here
PutMappingResponse response = esClient.indices()
        .putMapping(new PutMappingRequest.Builder()
                .index("index-name")
                .withJson(new StringReader(mappings.toString()))
                .build());
// Whether the request was acknowledged
boolean acknowledged = response.acknowledged();
5. Delete an index
HTTP request
DELETE /index-name
Java API
DeleteIndexResponse response = esClient.indices()
        .delete(builder -> builder.index("index-name"));
// Whether the request was acknowledged
boolean acknowledged = response.acknowledged();
VII. Document Operations
1. Add a document
HTTP request
POST /index-name/_doc/doc-id
{
  "field1": "value",
  "field2": "value"
}
Java API
// Index a document
// json is a JSONObject holding the document content
IndexRequest<Object> request = new IndexRequest.Builder<>()
        .index("index-name")
        .id("doc-id")
        .document(json)
        .build();
IndexResponse response = esClient.index(request);
2. Update a document
HTTP request
POST /index-name/_update/doc-id
{
  "doc": {
    "field_to_change_1": "new value",
    "field_to_change_2": "new value"
  }
}
Java API
// Collect the fields to change in a Map; fields not listed keep their current values
Map<String, Object> updateMap = new HashMap<>();
UpdateRequest<Object, Object> updateRequest = new UpdateRequest.Builder<>()
        .index("index-name")
        .id("doc-id")
        .doc(updateMap)
        .build();
UpdateResponse<Object> updateResponse = esClient.update(updateRequest, Object.class);
3. Get a document by id
HTTP request
GET /index-name/_doc/doc-id
Java API
GetRequest getRequest = new GetRequest.Builder()
        .index("index-name")
        .id("doc-id")
        .build();
GetResponse<Object> response = esClient.get(getRequest, Object.class);
if (response.found()) {
    return response.source();
} else {
    // MyException and ResultEnum are this project's own classes
    throw new MyException(ResultEnum.DATA_IS_EXIST.getCode(), "Data does not exist");
}
4. Delete documents
HTTP request
DELETE /index-name/_doc/doc-id
Java API
// Supports deleting several documents at once; String[] id_arr holds the ids to delete
List<BulkOperation> bulkOperations = new ArrayList<>();
for (String del_id : id_arr) {
    bulkOperations.add(new BulkOperation.Builder().delete(
            d -> d.id(del_id).index("index-name")).build());
}
BulkResponse bulkResponse = esClient.bulk(e -> e.index("index-name").operations(bulkOperations));
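When the id list is long, sending everything in one bulk request can produce a very large request body. One common approach is to split the ids into fixed-size batches and issue one bulk call per batch. The helper below is a minimal sketch of the splitting step only; IdBatcher and the batch size are illustrative assumptions, and each batch would then be turned into delete operations as shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a list of document ids into batches of at most batchSize.
final class IdBatcher {

    static List<List<String>> partition(List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // Copy the view so each batch is independent of the source list
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // prints [[a, b], [c, d], [e]]
        System.out.println(partition(Arrays.asList("a", "b", "c", "d", "e"), 2));
    }
}
```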
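5. Search documents (with highlighting)
HTTP request
GET /index-name/_search
{
  "query": {
    "bool": {
      // Conditions that must match
      "must": [
        // Exact match
        {"term": { "field_name": "field value" }},
        // Wildcard match
        {"query_string": {
          "default_field": "field_name",
          "query": "*text to match*"
        }},
        // Range query: gt (greater than), gte (greater than or equal), lt (less than), lte (less than or equal)
        {"range": {
          "field_name": {
            "gte": "min value",
            "lte": "max value"
          }
        }}
      ],
      // Conditions that must not match
      "must_not": [
        // Exact match
        {"term": { "field_name": "field value" }},
        // Wildcard match
        {"query_string": {
          "default_field": "field_name",
          "query": "*text to match*"
        }}
      ]
    }
  },
  "highlight": {
    "boundary_scanner_locale": "zh_CN",
    "fields": {
      "highlight_field_1": {
        "pre_tags": [
          "<span color='red'>"
        ],
        "post_tags": [
          "</span>"
        ]
      },
      // Uses the default <em> tag
      "highlight_field_2": {}
    }
  },
  // Sort order
  "sort": [
    {
      // Sort by relevance score
      "_score": {
        "order": "desc"
      }
    },
    {
      "field_name": {
        "order": "desc"
      }
    }
  ],
  // Offset of the first hit to return, starting at 0
  "from": 0,
  // Number of hits to return
  "size": 10
}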
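The query_string query treats characters such as +, -, :, * and parentheses as operators, so user input wrapped in asterisks for a wildcard match should be escaped first. The sketch below assumes the reserved-character list from the Elasticsearch query_string documentation; the angle brackets cannot be escaped and are removed instead. QueryStringEscaper is a hypothetical helper name.

```java
// Hypothetical helper: prepares user input for a "*...*" query_string wildcard match.
final class QueryStringEscaper {

    // Characters that are escaped with a backslash in query_string syntax.
    private static final String RESERVED = "+-=&|!(){}[]^\"~*?:\\/";

    // Escapes reserved characters; '<' and '>' are dropped because they cannot be escaped.
    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 8);
        for (char c : input.toCharArray()) {
            if (c == '<' || c == '>') {
                continue;
            }
            if (RESERVED.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Wraps the escaped input in wildcards for a "contains" style match.
    static String wildcard(String input) {
        return "*" + escape(input) + "*";
    }

    public static void main(String[] args) {
        System.out.println(wildcard("C++ (2024)"));
    }
}
```

The escaped value still has to be JSON-encoded when it is placed into the request body, for example by building the body with a JSON library instead of string concatenation.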
Java API
// queryJson holds the query as JSON, shaped like the HTTP request body above
SearchRequest searchRequest = new SearchRequest.Builder()
        .index("index-name")
        .withJson(new StringReader(queryJson.toString()))
        .highlight(highlightBuilder -> highlightBuilder
                .preTags("<span color='red'>")
                .postTags("</span>")
                .fields("highlight_field_1", highlightFieldBuilder -> highlightFieldBuilder)
                .fields("highlight_field_2", highlightFieldBuilder -> highlightFieldBuilder))
        .build();
SearchResponse<Object> response = esClient.search(searchRequest, Object.class);
List<Hit<Object>> hits = response.hits().hits();
// The document id is added to each row; drop that line if it is not needed
List<Map<String, Object>> data_list = hits.stream().map(p -> {
    Map<String, Object> map = new HashMap<>();
    map.put("document_id", p.id());
    map.putAll((Map<String, Object>) p.source());
    map.put("highlight", p.highlight());
    return map;
}).collect(Collectors.toList());
// data_list holds the rows of the current page.
// The total number of matching documents is (int) response.hits().total().value()
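The from and size parameters, together with the total hit count, are all that is needed for page-based navigation. The sketch below shows the arithmetic, assuming 1-based page numbers and the default index.max_result_window of 10000, beyond which from + size requests are rejected. PageMath is a hypothetical helper name.

```java
// Hypothetical helper: converts page numbers to "from" offsets and hit counts to page counts.
final class PageMath {

    // Default value of index.max_result_window; adjust if the index overrides it.
    static final int MAX_RESULT_WINDOW = 10_000;

    // Offset of the first hit on a 1-based page.
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be at least 1");
        }
        long from = (long) (page - 1) * size;
        if (from + size > MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds the result window");
        }
        return (int) from;
    }

    // Number of pages needed to show totalHits rows, size rows per page.
    static long totalPages(long totalHits, int size) {
        return (totalHits + size - 1) / size;
    }

    public static void main(String[] args) {
        System.out.println("from=" + from(3, 10) + ", pages=" + totalPages(95, 10));
    }
}
```

For result sets deeper than the window, search_after is the usual alternative to from and size.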
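VIII. Parsing Documents with Ingest Attachment
1. Install the ingest attachment plugin
Download (replace the version number with the one matching your Elasticsearch version):
Installation:
Change to the Elasticsearch root directory and run
# Linux
./bin/elasticsearch-plugin install file:///path/to/ingest-attachment-7.17.4.zip
# Windows
./bin/elasticsearch-plugin install file:///C:/path/to/ingest-attachment-7.17.4.zip
Restart Elasticsearch, then define the text extraction pipeline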
PUT /_ingest/pipeline/attachment
{
"description": "Extract attachment information",
"processors": [
{
"attachment": {
"field": "content",
"ignore_missing": true
}
},
{
"remove": {
"field": "content"
}
}
]
}
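The attachment processor is configured to read from the content field, so a document written to Elasticsearch must carry the file content in content, encoded as base64. Supported file formats include txt, Word, Excel, PPT, and PDF.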
2. Convert a file to base64
// File path; replace with your own
String filePath = "E:/xxx/xxx.pdf";
// Read the file's bytes
byte[] fileBytes = Files.readAllBytes(Paths.get(filePath));
// Encode the bytes as a base64 string
String base64_str = Base64.getEncoder().encodeToString(fileBytes);
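Base64 enlarges the payload by about one third, and the whole encoded string travels in a single request body, which Elasticsearch limits through http.max_content_length (100mb by default). As a rough sketch, the encoded size can be estimated from the file size before reading the file at all; Base64Size is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: predicts the length of a padded base64 encoding.
final class Base64Size {

    // Every 3 raw bytes become 4 characters; the last group is padded to 4.
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    // Same encoder as in the snippet above, exposed for comparison.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        byte[] raw = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(encode(raw) + " has length " + encodedLength(raw.length));
    }
}
```

Comparing the estimate against the configured limit allows oversized files to be rejected early instead of failing at request time.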
3. Parse a document
/**
 * Simulates the pipeline; this is a dry run and no document is actually indexed.
 */
Map<String, Object> source = new HashMap<>();
source.put("content", "base64 encoding of the file");
// "attachment" is the pipeline defined above
SimulateResponse response = esClient.ingest().simulate(builder -> builder
        .id("attachment")
        .docs(documentBuilder -> documentBuilder
                .index("index-name")
                .id("doc-id")
                .source(JsonData.of(source))));
log.info("response={}", response);
4. Use the pipeline when indexing a document
Map<String, Object> source = new HashMap<>();
source.put("content", "base64 encoding of the file");
IndexRequest<Object> request = new IndexRequest.Builder<>()
        .index("index-name")
        .id("doc-id")
        .document(JsonData.of(source))
        // Run the attachment pipeline on this document before it is stored
        .pipeline("attachment")
        .build();
IndexResponse response = esClient.index(request);
logger.info(response.toString());