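Installing Elasticsearch
Elasticsearch is a very powerful search engine. Official download page: https://www.elastic.co/cn/downloads/elasticsearch
Pay particular attention to the version when downloading:
- If you integrate Elasticsearch through Spring Boot, Elasticsearch versions below 8.x must match your Spring Boot version. The version mapping is shown below:
- For 8.x and later, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/8.2/installation.html
Installation on Windows and Linux is essentially the same: download and unpack the archive, then run the elasticsearch script in the bin directory (add -d to run it in the background).
To set a username and password:
1. Edit elasticsearch.yml, then restart ES
# Enable the ES security features; once enabled, requests require a username and password (encrypting traffic additionally requires the SSL settings)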
xpack.security.enabled: true
2. Set the passwords for the built-in users
[elsearch@HN-82 bin]$ ./elasticsearch-setup-passwords interactive
Initiating the setup of passwords for reserved users elastic,apm_system,kibana,logstash_system,beats_system,remote_monitoring_user.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y
Enter password for [elastic]:
Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [apm_system]:
Reenter password for [apm_system]:
Enter password for [kibana]:
Reenter password for [kibana]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Enter password for [remote_monitoring_user]:
Reenter password for [remote_monitoring_user]:
Changed password for user [apm_system]
Changed password for user [kibana]
Changed password for user [logstash_system]
Changed password for user [beats_system]
Changed password for user [remote_monitoring_user]
Changed password for user [elastic]
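Setup is complete. Next, install the IK plugin or go straight to testing.
To test:
On Windows, use ElasticSearch Head for simple checks.
On Linux, use the command: curl --user elastic:password -X GET localhost:9200
If the command returns the node information (name, cluster name, version), the deployment on Linux succeeded.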
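The same check can be prepared from Java. The sketch below is a minimal illustration, assuming Java 11+ and the same placeholder credentials as the curl command; the class and method names (`EsPingSketch`, `basicAuthHeader`, `rootRequest`) are made up for this example. It only builds the request, since sending it requires a running node.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: mirrors `curl --user elastic:password -X GET localhost:9200`.
final class EsPingSketch {

    // Builds the Authorization header value that HTTP Basic authentication uses.
    static String basicAuthHeader(String username, String password) {
        String pair = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    // Builds (but does not send) a GET request against the ES root endpoint.
    static HttpRequest rootRequest(String baseUrl, String username, String password) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl))
                .header("Authorization", basicAuthHeader(username, password))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // The URL and credentials are placeholders, not values from a real cluster.
        HttpRequest request = rootRequest("http://localhost:9200", "elastic", "password");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and inspect the status code and body.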
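Enabling remote access
By default ES cannot be reached remotely via the host IP; remote access has to be enabled.
1. Edit config/elasticsearch.yml in the ES installation directory (for example, set network.host to 0.0.0.0 so that ES binds to a non-loopback address)
vim elasticsearch.yml
2. Restart the ES service
./elasticsearch
Startup fails with the following errors: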
bootstrap check failure [1] of [4]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]
bootstrap check failure [2] of [4]: max number of threads [3802] for user [chenyn] is too low, increase to at least [4096]
bootstrap check failure [3] of [4]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
bootstrap check failure [4] of [4]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers
3.1 Fixing error 1
$ vim /etc/security/limits.conf
Append the following at the end of the file:
* soft nofile 65536
* hard nofile 65536
* soft nproc 4096
* hard nproc 4096
Log out and back in, then check that the settings took effect:
ulimit -Hn
ulimit -Sn
ulimit -Hu
ulimit -Su
3.2 Fixing error 2
# Go to the limits.d directory and edit the configuration file.
$ vim /etc/security/limits.d/20-nproc.conf
Change it to:
<user that starts ES> soft nproc 4096
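3.3 Fixing error 3
Edit sysctl.conf and add the line for your system (the value must be at least 262144; keep comments on their own line):
$ vim /etc/sysctl.conf
# CentOS 7
vm.max_map_count=655360
# Ubuntu
vm.max_map_count=262144
Run the following command to apply the change: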
$ sysctl -p
3.4 Fixing error 4
Edit elasticsearch.yml:
$ vim config/elasticsearch.yml
cluster.initial_master_nodes: ["node-1"]
4. If a password is set
Edit elasticsearch.yml:
$ vim config/elasticsearch.yml
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
Restart the ES service and open it in a browser.
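IK Chinese analysis plugin
Chinese word segmentation requires this plugin: the analyzer built into ES splits Chinese text into single characters rather than words.
Repository: https://github.com/medcl/elasticsearch-analysis-ik/tree/master
Again, pay attention to the version when downloading; the plugin version must match the ES version:
To install, unpack the download, create an ik folder under plugins in elasticsearch-7.15.2, and put the unpacked IK files into it.
Test: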
GET /_analyze
{
"analyzer": "ik_max_word",
"text": "我在测试"
}
ik_smart produces the coarsest segmentation (fewest tokens); ik_max_word produces the finest-grained segmentation.
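To compare the two analyzers on the same text, it can help to generate the request bodies programmatically. The following is a small sketch, not part of the original setup: `AnalyzeBodySketch` and its methods are hypothetical names, and the escaping covers only quotes, backslashes and common whitespace escapes.

```java
// Hypothetical helper that builds the JSON body for the _analyze endpoint.
final class AnalyzeBodySketch {

    // Escapes quotes, backslashes and common whitespace characters for a JSON string.
    // Other control characters are not handled in this sketch.
    static String jsonEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Produces a body such as {"analyzer":"ik_smart","text":"..."}.
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + jsonEscape(analyzer)
                + "\",\"text\":\"" + jsonEscape(text) + "\"}";
    }

    public static void main(String[] args) {
        // Send each body to /_analyze and compare the returned tokens.
        System.out.println(analyzeBody("ik_smart", "我在测试"));
        System.out.println(analyzeBody("ik_max_word", "我在测试"));
    }
}
```

Each printed body can be pasted into Kibana or ElasticSearch Head, or sent with any HTTP client, to see how the token lists differ.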
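Extension dictionary
Words or phrases such as internet slang can be added as custom entries.
1. Open the config directory of the IK analyzer
2. In IKAnalyzer.cfg.xml, add the extension dictionary entry (the ext_dict key, pointing at the dictionary file, e.g. ext.dic):
3. Create ext.dic; you can copy one of the existing dictionary files in the config directory and modify it
4. Restart Elasticsearch
Stop-word dictionary
Words we want ignored during search, such as sensitive religious or political terms, can be added as custom entries.
This works the same way as the extension dictionary, so it is not covered in detail.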
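Basic Elasticsearch operations
You can work in the browser with Kibana, or install ElasticSearch Head.
The basic operations below are demonstrated with ElasticSearch Head:
First install the ElasticSearch Head extension in Google Chrome and open it, as shown below:
Then connect; after logging in it is ready to use:
Beyond viewing basic information, we also need more complex queries, for example checking custom IK segmentation:
With the IK plugin and 生产工艺 ("production process") defined as an extension word, the text is kept as the single token 生产工艺.
With the IK plugin but no custom word, it is split into 生产 and 工艺.
Without the IK plugin, it is split into 生, 产, 工, 艺.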
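The effect of the dictionary on these three outcomes can be illustrated with a toy segmenter. The sketch below uses greedy forward maximum matching, which is a deliberately simplified stand-in and not IK's actual algorithm; the class name `DictionarySegmenterSketch` and the sample dictionaries are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy illustration only: greedy forward maximum matching, not the IK implementation.
final class DictionarySegmenterSketch {

    // At each position take the longest dictionary word starting there;
    // if none matches, fall back to a single character.
    static List<String> segment(String text, Set<String> dictionary) {
        int maxLen = 1;
        for (String word : dictionary) {
            maxLen = Math.max(maxLen, word.length());
        }
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxLen);
            // Shrink the candidate until it is a dictionary word or one character long.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "生产工艺";
        // No dictionary: every character becomes its own token.
        System.out.println(segment(text, Set.of()));
        // Base dictionary without the custom word.
        System.out.println(segment(text, Set.of("生产", "工艺")));
        // Dictionary extended with the custom word.
        System.out.println(segment(text, Set.of("生产", "工艺", "生产工艺")));
    }
}
```

The three calls reproduce the three cases above: single characters, two words, and the whole phrase as one token once it is in the dictionary.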
Integrating Elasticsearch with Spring Boot
- First add the dependency in Maven.
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
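- Then configure the connection in YAML (the configuration class reads elasticsearch.url, elasticsearch.port, elasticsearch.username and elasticsearch.password).
- Create the configuration class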
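import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;

@Configuration
public class ElasticSearchConfig extends AbstractElasticsearchConfiguration {
    @Value("${elasticsearch.url}")
    private String url;
    @Value("${elasticsearch.port}")
    private Integer port;
    @Value("${elasticsearch.username}")
    private String username;
    @Value("${elasticsearch.password}")
    private String password;

    @Override
    public RestHighLevelClient elasticsearchClient() {
        // Basic-auth credentials used for every request to ES
        final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(username, password));
        return new RestHighLevelClient(RestClient.builder(new HttpHost(url, port, "http"))
                .setHttpClientConfigCallback(httpClientBuilder -> {
                    httpClientBuilder.disableAuthCaching();
                    return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                }));
    }
}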
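- Define the entity class. Some sources say the IK analyzer can be configured on the entity fields via annotations, but that did not work for me, so all field settings go into the mapping file.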
@Document(indexName = "guide_change")
@Mapping(mappingPath = "es/mapping.json")
public class GuideChangeExtral implements Serializable {
private Long id;
private String name;
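// Fields that need analysis and full-text search would carry this annotation.
// Text type: analyzed, supports full-text, fuzzy and exact queries, but not aggregations or sorting; it has no length limit, so it suits large fields.
// analyzer / searchAnalyzer: the analyzers used at index time and at search time (both IK here). IK provides two analyzers: ik_smart (coarsest) and ik_max_word (finest-grained).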
// @Field(type = FieldType.Text, analyzer = "ik", searchAnalyzer = "ik")
private String content;
public Long getId() {
return id;
}
public void setId(Long id) {
this.id = id;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getContent() {
return content;
}
public void setContent(String content) {
this.content = content;
}
}
The mapping.json file:
{
"properties": {
"id": {
"type": "long"
},
"name": {
"type": "text",
"analyzer": "ik_smart",
"search_analyzer": "ik_smart"
},
"content": {
"type": "text",
"analyzer": "ik_smart",
"search_analyzer": "ik_smart"
}
}
}
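Now the API can be called directly.
Note that the xxxMapper you use must extend ElasticsearchRepository.
The code below shows only the Controller layer; it relies on several service-layer classes, and showing them all would take too many files.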
/**
* ES module
* @author haowu
* @since 2022.08.15
*/
@EntityName("ES模块")
@RestController
@Valid
public class ESController implements ESApi {
@Resource
ESFacade esFacade;
@Resource
GuideChangeFacade guideChangeFacade;
@Resource
private ElasticsearchRestTemplate elasticsearchTemplate;
@Resource
private CommonMapper<EsWeightDict> esWeightDictMapper;
/**
* Create the index
* @return
*/
@RequestMapping(value = "/es/create/index", method = RequestMethod.GET)
public ResultDto createEsIndex() {
boolean index = elasticsearchTemplate.indexOps(GuideChangeExtral.class).create();
System.out.println("Index created: " + index);
return ResultFactory.getSuccessResult();
}
/**
* Delete the index
* @return
*/
@RequestMapping(value = "/es/delete/index", method = RequestMethod.GET)
public ResultDto deleteEsIndex() {
boolean deleteIndex = elasticsearchTemplate.indexOps(GuideChangeExtral.class).delete();
System.out.println("Index deleted: " + deleteIndex);
return ResultFactory.getSuccessResult();
}
/**
* Check whether the index exists
* @return
*/
@RequestMapping(value = "/es/exist/index", method = RequestMethod.GET)
public ResultDto<Boolean> existEsIndex() {
boolean existsIndex = elasticsearchTemplate.indexOps(GuideChangeExtral.class).exists();
System.out.println("Index exists: " + existsIndex);
return ResultFactory.getSuccessResult(existsIndex);
}
/**
* Add a document (note: if the index does not exist yet, it is created automatically when the document is saved)
* @param guideChangeExtral the content to add
* @author: haowu
*/
@RequestMapping(value = "/es/save/document", method = RequestMethod.POST)
public ResultDto<Long> saveDoc(@RequestBody GuideChangeExtral guideChangeExtral) {
Long id = Long.valueOf(System.currentTimeMillis());
guideChangeExtral.setId(id);
esFacade.save(guideChangeExtral);
return ResultFactory.getSuccessResult(id);
}
/**
* Sync documents (note: synced from the guide_change table in MySQL)
* @author: haowu
*/
@RequestMapping(value = "/es/sync/document", method = RequestMethod.GET)
public ResultDto<Long> syncDoc() {
GuideChangeQueryDto queryDto = new GuideChangeQueryDto();
queryDto.setPageSize(Integer.MAX_VALUE);
List<GuideChangeESDto> changeList = guideChangeFacade.queryListToES(queryDto);
List<GuideChangeExtral> changes = new ArrayList<>();
if (ObjectUtil.isNotEmpty(changeList)){
changeList.stream().forEach(change->{
GuideChangeExtral guideChangeExtral = new GuideChangeExtral();
guideChangeExtral.setId(change.getId());
guideChangeExtral.setName(change.getChangeClassify());
guideChangeExtral.setContent(change.getChangeInstructions());
guideChangeExtral.setSituation(change.getChangeSituation());
guideChangeExtral.setType(change.getChangeType());
guideChangeExtral.setLevel(change.getChangeLevel());
guideChangeExtral.setSource(ChangeSourceEnum.DS.getCode());
changes.add(guideChangeExtral);
});
}
esFacade.saveAll(changes);
return ResultFactory.getSuccessResult();
}
/**
* Update a document
* @param guideChangeExtral the content to update
* @author: haowu
*/
@RequestMapping(value = "/es/update/document", method = RequestMethod.POST)
public ResultDto updateDoc(@RequestBody GuideChangeExtral guideChangeExtral) {
esFacade.save(guideChangeExtral);
return ResultFactory.getSuccessResult();
}
/**
* Query documents by keyword
* @param keyword
* @return
*/
@RequestMapping(value = "/es/query/document/by/name", method = RequestMethod.GET)
public ListResultDto<GuideChangeExtral> queryDocByName(String keyword) {
List<GuideChangeExtral> result = esFacade.getListByKeyword(keyword);
return ResultFactory.getListResult(result);
}
/**
* Query all documents
* @return
*/
@RequestMapping(value = "/es/query/document/all", method = RequestMethod.GET)
public ListResultDto<GuideChangeExtral> queryDocAll() {
Iterator<GuideChangeExtral> result = esFacade.getListAll().iterator();
List<GuideChangeExtral> list = new ArrayList<>();
if (ObjectUtil.isNotEmpty(result)){
while (result.hasNext()){
list.add(result.next());
}
}
return ResultFactory.getListResult(list);
}
/**
* Check whether a document exists
* @param id
* @return
*/
@RequestMapping(value = "/es/exist/document", method = RequestMethod.GET)
public ResultDto<Boolean> existDoc(Long id) {
return ResultFactory.getSuccessResult( esFacade.existsById(id));
}
/**
* Query a document by id
* @param id
* @return
*/
@RequestMapping(value = "/es/query/document/by/id", method = RequestMethod.GET)
public ResultDto<GuideChangeExtral> queryDocById(Long id) {
Optional<GuideChangeExtral> optionalStu = esFacade.findById(id);
return optionalStu.isPresent()?ResultFactory.getSuccessResult(optionalStu.get()):ResultFactory.getSuccessResult();
}
/**
* Delete a document by id
* @param id
* @return
*/
@RequestMapping(value = "/es/delete/document", method = RequestMethod.POST)
public ResultDto deleteDoc(Long id) {
esFacade.deleteById(id);
return ResultFactory.getSuccessResult();
}
/**
* Conditional query (searches in content)
* @param keyword search keyword
* @author: haowu
*/
@RequestMapping(value = "/es/list/keyword", method = RequestMethod.GET)
public ListResultDto<GuideChangeExtral> getListByKeyword(String keyword) {
return ResultFactory.getListResult(esFacade.getListByKeyword(keyword));
}
/**
* Complex query
* @param keyword search keyword
* @author: haowu
*/
@RequestMapping(value = "/es/list/keyword/extral", method = RequestMethod.GET)
public ListResultDto<GuideChangeSimpleDto> getListByKeywordExtral(@RequestParam String keyword) {
// Fields to search
DisMaxQueryBuilder disMaxQueryBuilder = QueryBuilders.disMaxQuery();
MultiMatchQueryBuilder contentAnalyzer = QueryBuilders.multiMatchQuery(keyword,"content", "name").analyzer("ik_smart");
disMaxQueryBuilder.add(contentAnalyzer);
// Loads the local weight file into the database; this has already been done once, so it is commented out
/* List<EsWeightDict> dictList = new ArrayList<>();
try(BufferedReader reader =new BufferedReader(new FileReader(this.getClass().getClassLoader().getResource("es/query-dict.txt").getPath()))) {
String words;
while ((words = reader.readLine())!=null){
if(StrUtil.isEmpty(words)){
continue;
}
String[] wordArr = words.split(",");
EsWeightDict weightDict = new EsWeightDict();
if(wordArr.length > 1){
weightDict.setWeightItemName(wordArr[0]);
weightDict.setWeightItemCode(Float.valueOf(wordArr[1]));
}else{
weightDict.setWeightItemName(wordArr[0]);
weightDict.setWeightItemCode(2f);
}
dictList.add(weightDict);
}
} catch (IOException e) {
throw new RuntimeException(e);
}
esWeightDictMapper.batchAdd(dictList,true);*/
// Add a boosted clause for every weighted dictionary word contained in the keyword
List<EsWeightDict> esWeightDicts = esWeightDictMapper.queryAll(TasDataAuth.allAuth(), new EsWeightDictQueryDto(), EsWeightDict.class);
esWeightDicts.stream().forEach(dict->{
if (keyword.contains(dict.getWeightItemName())){
disMaxQueryBuilder.add(QueryBuilders.multiMatchQuery(dict.getWeightItemName(),"content","name").analyzer("ik_smart")
.field("name",dict.getWeightItemCode())
.field("content",dict.getWeightItemCode()));
}
});
// Build the query with highlighting
NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
.withQuery(disMaxQueryBuilder)
// Fields to highlight, passed in one call so that neither field replaces the other
.withHighlightFields(new HighlightBuilder.Field("content"), new HighlightBuilder.Field("name"))
.withHighlightBuilder(new HighlightBuilder().preTags("<span style='color:blue'>").postTags("</span>"))
// Pagination
.withPageable(PageRequest.of(0, 5))
.build();
SearchHits<GuideChangeExtral> search = elasticsearchTemplate.search(searchQuery, GuideChangeExtral.class);
// Get the hits returned by the query
List<SearchHit<GuideChangeExtral>> searchHits = search.getSearchHits();
// Collection of DTOs to return
List<GuideChangeSimpleDto> changeExtrals = new ArrayList<>();
// Process each hit
for (SearchHit<GuideChangeExtral> searchHit : searchHits) {
// Highlighted fragments
Map<String, List<String>> highlightFields = searchHit.getHighlightFields();
// Replace content and name with their highlighted versions when available
searchHit.getContent().setContent(highlightFields.get("content") == null ? searchHit.getContent().getContent() : highlightFields.get("content").get(0));
searchHit.getContent().setName(highlightFields.get("name") == null ? searchHit.getContent().getName() : highlightFields.get("name").get(0));
GuideChangeSimpleDto simpleDto = TasCommonUtils.convert(searchHit.getContent(), GuideChangeSimpleDto.class);
simpleDto.setScore(searchHit.getScore());
// Keep only hits whose score exceeds the threshold
float scoreNormal = 1.5f;
if (scoreNormal<simpleDto.getScore()){
changeExtrals.add(simpleDto);
}
}
TasCommonUtils.translate(changeExtrals);
return ResultFactory.getListResult(changeExtrals);
}
}
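The weighting logic in getListByKeywordExtral (the commented-out loader plus the loop over esWeightDicts) can be isolated from Spring and Elasticsearch. The sketch below is a plain-Java approximation under stated assumptions: the file format is taken to be one `word,weight` entry per line with a default weight of 2 when the weight is missing, as in the commented-out code, and the class `WeightDictSketch` and its methods are hypothetical names. Unlike the original loader, it also trims whitespace.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the weight dictionary handling in the controller.
final class WeightDictSketch {

    static final float DEFAULT_WEIGHT = 2f;

    // Parses lines of the form "word,weight"; a line without a weight gets DEFAULT_WEIGHT.
    static Map<String, Float> parse(List<String> lines) {
        Map<String, Float> weights = new LinkedHashMap<>();
        for (String line : lines) {
            if (line == null || line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.trim().split(",");
            if (parts.length == 0 || parts[0].trim().isEmpty()) {
                continue; // skip lines without a word
            }
            float weight = parts.length > 1 ? Float.parseFloat(parts[1].trim()) : DEFAULT_WEIGHT;
            weights.put(parts[0].trim(), weight);
        }
        return weights;
    }

    // Keeps only the dictionary words that occur in the search keyword;
    // each one would become a boosted multi-match clause in the dis_max query.
    static Map<String, Float> boostsFor(String keyword, Map<String, Float> weights) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (Map.Entry<String, Float> entry : weights.entrySet()) {
            if (keyword.contains(entry.getKey())) {
                boosts.put(entry.getKey(), entry.getValue());
            }
        }
        return boosts;
    }

    public static void main(String[] args) {
        // Sample lines are invented for illustration.
        Map<String, Float> weights = parse(List.of("生产工艺,3", "", "设备"));
        System.out.println(boostsFor("生产工艺变更", weights));
    }
}
```

Keeping this logic in a small pure function makes it easy to unit-test the boosts before they are turned into query clauses.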
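Term frequency statistics
Compute term frequencies for a single document.
// index, indexType and id are assumed to be defined by the caller
TermVectorsRequest request = new TermVectorsRequest(index, indexType, id);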
request.setFields("content");
request.setFieldStatistics(true);
request.setTermStatistics(true);
request.setPositions(true);
request.setOffsets(true);
request.setPayloads(false);
Map<String, Integer> filterSettings = new HashMap<>();
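filterSettings.put("max_num_terms", 10); // number of terms for the word cloud
filterSettings.put("min_term_freq", 2); // minimum frequency of the term in the current document
filterSettings.put("max_term_freq", 100);
filterSettings.put("min_doc_freq", 1); // minimum number of documents in the index that contain the term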
filterSettings.put("max_doc_freq", 100);
filterSettings.put("min_word_length", 2);
filterSettings.put("max_word_length", 10);
request.setFilterSettings(filterSettings);
TermVectorsResponse response = elasticsearchTemplate.getClient().termvectors(request, RequestOptions.DEFAULT);
List<TermVectorsResponse.TermVector> termVectorList = response.getTermVectorsList();
for (TermVectorsResponse.TermVector termVector : termVectorList) {
String fieldName = termVector.getFieldName();
TermVectorsResponse.TermVector.FieldStatistics fieldStatistics = termVector.getFieldStatistics();
List<TermVectorsResponse.TermVector.Term> terms = termVector.getTerms();
for (TermVectorsResponse.TermVector.Term term : terms) {
//+ "--" + term.getTokens()
System.out.println("----term:" + term.getTerm() + " -DocFreq:" + term.getDocFreq() + " -TermFreq:" + term.getTermFreq());
//term.getTokens().forEach(s -> System.out.println("----" + s.));
}
}
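To make the meaning of the filter settings concrete, here is a simplified in-memory version of the same idea. It is only a sketch under assumptions: the tokens are taken as already produced by an analyzer, `TermFrequencySketch` is a hypothetical name, and terms are ranked by raw frequency, whereas the term vectors filter in Elasticsearch, as far as I know, ranks by tf-idf score.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of min_term_freq, min/max_word_length and max_num_terms.
final class TermFrequencySketch {

    // Counts tokens, keeps terms with at least minTermFreq occurrences and a length
    // within [minWordLength, maxWordLength], and returns at most maxNumTerms of them,
    // most frequent first (ties broken by term for a stable order).
    static List<Map.Entry<String, Integer>> topTerms(List<String> tokens, int minTermFreq,
            int minWordLength, int maxWordLength, int maxNumTerms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            int length = entry.getKey().length();
            if (entry.getValue() >= minTermFreq && length >= minWordLength && length <= maxWordLength) {
                kept.add(Map.entry(entry.getKey(), entry.getValue()));
            }
        }
        kept.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return kept.size() > maxNumTerms ? new ArrayList<>(kept.subList(0, maxNumTerms)) : kept;
    }

    public static void main(String[] args) {
        // Sample tokens are invented for illustration.
        List<String> tokens = List.of("变更", "工艺", "变更", "的", "工艺", "变更", "设备");
        for (Map.Entry<String, Integer> term : topTerms(tokens, 2, 2, 10, 10)) {
            System.out.println(term.getKey() + " -TermFreq:" + term.getValue());
        }
    }
}
```

With the sample tokens, only the terms that occur at least twice and are at least two characters long survive, which mirrors what min_term_freq and min_word_length do in the request above.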