Building a real-time search engine with spring-data-elasticsearch 3.0.0 + Elasticsearch 5.5.0 + ik + synonym

阳阳雨季

Tags: elasticsearch5.5.0 spring-data-elastics

Article link: https://blog.csdn.net/sunliyang1992/article/details/78615403

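Because of a recent staffing change, I took over a project based on Elasticsearch 5.5.0. The project itself is simple: build a synonym dictionary and make the data searchable. Still, I ran into quite a few problems while setting things up on my own, so I am writing them down here. Most of the Elasticsearch material online is also fairly dated and did not solve some of these problems, so I am recording the issues I hit along with how I resolved them.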

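First, installing Elasticsearch 5.5.0. You can download it directly from the official website. The download page lists the latest version, so to get 5.5.0, right-click one of the package formats (I used the tar package), copy its link, and change the version number in the link to the version you need.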

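Once the download finishes, extract the archive and start it (screenshots would take some effort, so I am skipping them for now and may add them later). A few defaults in config/elasticsearch.yml need to be changed first. Mine is a single node; for a cluster, look up the relevant material (I may write a separate post on clusters when I have time):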

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: mya
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /Users/haomaiche/Downloads/elasticsearch-5.5.0/data
#
# Path to log files:
#
path.logs: /Users/haomaiche/Downloads/elasticsearch-5.5.0/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes: 3
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

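All this does is set the cluster name and the paths for the index data and the logs (node.name is left commented out). (Tip: from 5.x on, mappings and index settings are no longer configured in this file; they are set through the REST API.)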

Next, download the plugins.

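1) Analyzer: the best-known analyzer for Chinese text is the ik analyzer. Download the release that matches your Elasticsearch version, extract it, and copy its contents into an ik folder under the Elasticsearch plugins directory (you need to create the ik folder yourself). Then restart Elasticsearch and send a request to the node to check whether the analyzer has taken effect.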
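
The check request can also be built in code. The following is a minimal sketch, assuming a node on localhost:9200 and the standard `_analyze` endpoint with a JSON body. The class name AnalyzeRequest and its helper methods are made up for illustration, and the sample text is the escaped form of 别克 中国. The sketch only builds and prints the request, which can then be sent with curl or Postman.

```java
import java.net.URI;

public class AnalyzeRequest {

    /** Builds the _analyze endpoint URL for a node (host and port are assumptions). */
    public static URI analyzeUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/_analyze");
    }

    /** Builds the JSON body naming the analyzer to test and the text to tokenize. */
    public static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    /** Escapes backslashes and double quotes so the value stays valid inside a JSON string. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Sample text written with unicode escapes to keep this source file ASCII-only.
        String sample = "\u522b\u514b \u4e2d\u56fd";
        System.out.println("POST " + analyzeUri("localhost", 9200));
        System.out.println(analyzeBody("ik_smart", sample));
    }
}
```

If the plugin is loaded, the response should list the tokens that ik_smart produced for the text; an error response suggests the analyzer was not registered.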

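2) Synonyms: I cover two synonym plugins here, one file-based and one database-based.

2.1) File-based: the synonym file can be loaded in two ways, remotely or locally. The plugin is called elasticsearch-analysis-dynamic-synonym. Because it only supports Elasticsearch up to 5.2, download the source and change the Elasticsearch version in pom.xml before packaging it.

After extracting the package, put it in a folder of its own under the plugins directory.

2.2) Database-based: this plugin manages synonyms in a database and is called elasticsearch-dynamic-synonym.

As above, change the Elasticsearch version before packaging.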

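That covers the configuration on the Elasticsearch server side. If you only need to search existing data and do not depend on any application code, you can skip the rest. From here on I take notes at the code level. (Note: there is already a plugin that integrates Elasticsearch with MySQL, called elasticsearch-jdbc. It imports database records into Elasticsearch for analysis and indexing, which is different from the synonym maintenance described above; its usage is easy to find online.)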

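Combining Elasticsearch + spring-data-elasticsearch + Spring:

Elasticsearch is at version 5.5.0, so spring-data-elasticsearch needs to be 3.0.0 to be compatible, and Spring needs to be 5.0. One small problem when I upgraded Spring from 4.* to 5.0 was that my JUnit tests stopped working; JUnit has to be upgraded to 4.12.

Next is the code (the synonyms use the file-based approach):

1. Entity class: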

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Mapping;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName= "symentry", type = "city-type")
@Setting(settingPath = "/config/hmc-setting.json")
@Mapping(mappingPath = "/config/symentry-mapping-json")
public class Symentry {
	
	@Id
	private String id;
	
	public String getId() {
		return id;
	}

	public void setId(String id) {
		this.id = id;
	}

	@Field(type = FieldType.text, index = true, analyzer="ik_smart", searchAnalyzer="ik_smart", store = true)
	private String text;

	public String getText() {
		return text;
	}

	public void setText(String text) {
		this.text = text;
	}
	
}

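The entity class declares the settings and mapping files, as well as the analyzer and whether each field is indexed.

hmc-setting.json (note the synonyms_path values, which are the file paths you configure):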

{
	"index" : {
	    "analysis" : {
	        "analyzer" : {
	            "synonym" : {
	                "tokenizer" : "ik_smart",
	                "filter" : ["remote_synonym","local_synonym"]
 	           }
	        },
	        "filter" : {
	            "remote_synonym" : {
	                "type" : "dynamic_synonym",
	                "synonyms_path" : "",
	                "interval": 30
	            },
	            "local_synonym" : {
	                "type" : "dynamic_synonym",
	                "synonyms_path" : "synonym.txt"
	            }
	        }
	    }
	}
}

symentry-mapping-json (note the top-level key city-type, which is the type name you configured):

{
  "city-type": {
    "_all": {
      "enabled": true
    },
    "properties": {
      "text":{
        "type":"text",
        "analyzer":"synonym",
        "search_analyzer":"synonym"
      }
    }
  }
}

Create the repository interface for the entity (it has to be registered in the configuration file so that Spring can inject it):

import org.springframework.data.elasticsearch.repository.ElasticsearchCrudRepository;

import com.sly.facade.entity.Symentry;

public interface SymentryRepository   extends ElasticsearchCrudRepository<Symentry,String>{

}

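The Spring and Elasticsearch configuration files are spring-elasticsearch.xml and elasticsearch-infrastructure.xml (the base-package attribute is the package of the classes to inject; why a class has to be injected is explained further down). spring-elasticsearch.xml comes first: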

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:elasticsearch="http://www.springframework.org/schema/data/elasticsearch"
       xsi:schemaLocation="http://www.springframework.org/schema/data/elasticsearch 
http://www.springframework.org/schema/data/elasticsearch/spring-elasticsearch-1.0.xsd
		http://www.springframework.org/schema/beans 
http://www.springframework.org/schema/beans/spring-beans-3.1.xsd">

    <import resource="elasticsearch-infrastructure.xml"/>

    <bean name="elasticsearchTemplate"
          class="org.springframework.data.elasticsearch.core.ElasticsearchTemplate">
        <constructor-arg name="client" ref="client"/>
    </bean>
    
    <elasticsearch:repositories base-package="com.*.repositories"/>

</beans>

elasticsearch-infrastructure.xml:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:elasticsearch="http://www.springframework.org/schema/data/elasticsearch"
       xsi:schemaLocation="http://www.springframework.org/schema/data/elasticsearch 
http://www.springframework.org/schema/data/elasticsearch/spring-elasticsearch.xsd
		http://www.springframework.org/schema/beans 
http://www.springframework.org/schema/beans/spring-beans.xsd">


    <!-- ip4 -->
    <elasticsearch:transport-client id="client" 
cluster-name="${elasticsearch.cluster.name}" cluster-nodes="${elasticsearch.esNodes}" />

    <!-- ip6 -->
    <!--<elasticsearch:transport-client id="client" cluster-name="elasticsearch" cluster-nodes="[::1]:9300" />-->

</beans>

The configuration is basically complete. Next comes the test:

import static org.elasticsearch.index.query.QueryBuilders.matchQuery;

import java.util.List;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.SearchQuery;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

import com.sly.facade.entity.City;
import com.sly.facade.entity.Symentry;
import com.sly.facade.repositories.CityRepository;
import com.sly.facade.repositories.SymentryRepository;

/**
 * Description: direct-sales test
 * ClassName: EleTest
 * @author sly
 * @date 2016-07-19 11:38:20
 */
// The Spring runner is needed for @Autowired to work in a JUnit 4 test.
// The classpath location below is an assumption; point it at your own spring-elasticsearch.xml.
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration("classpath:spring-elasticsearch.xml")
public class EleTest {

	@Autowired
	ElasticsearchTemplate elasticsearchTemplate;

	@Autowired
	SymentryRepository symentryRepository;

	@Autowired
	CityRepository cityRepository;

	// Recreates the symentry index with its settings and mapping, then saves one document.
	@Test
	public void addS() throws Exception {
		elasticsearchTemplate.deleteIndex(Symentry.class);
		elasticsearchTemplate.createIndex(Symentry.class);
		elasticsearchTemplate.putMapping(Symentry.class);
		elasticsearchTemplate.refresh(Symentry.class);
		Symentry city = new Symentry();
		city.setId("1345l");
		city.setText("别克 中国");
		symentryRepository.save(city);
	}

	// Recreates the City index (the original recreated the Symentry index here by mistake), then saves one document.
	@Test
	public void addC() throws Exception {
		elasticsearchTemplate.deleteIndex(City.class);
		elasticsearchTemplate.createIndex(City.class);
		elasticsearchTemplate.putMapping(City.class);
		elasticsearchTemplate.refresh(City.class);
		City c = new City();
		c.setId("qwerf");
		c.setName("你好");
		c.setDescription("我热爱我的祖国，别克都不要买");
		cityRepository.save(c);
	}

	// Full-text match on the description field of the City documents.
	@Test
	public void searC() throws Exception {
		String d = "最近有关别克车的销量不错";
		SearchQuery searchQuery = new NativeSearchQueryBuilder().withIndices("provincea")
				.withTypes("city").withQuery(matchQuery("description", d)).build();

		List<City> cities = elasticsearchTemplate.queryForList(searchQuery, City.class);
		for (City city : cities) {
			System.err.println(city);
		}
	}

	// Searches with the synonym "bieke"; it should also match documents containing 别克.
	@Test
	public void search() throws Exception {
		String d = "bieke，我是谁";
		SearchQuery searchQuery = new NativeSearchQueryBuilder().withIndices("symentry")
				.withTypes("city-type").withQuery(matchQuery("text", d)).build();

		List<Symentry> symentries = elasticsearchTemplate.queryForList(searchQuery, Symentry.class);
		for (Symentry symentry : symentries) {
			System.err.println(symentry);
		}
	}
}

After running the index creation, check that the settings and mapping took effect by visiting http://localhost:9200/symentry. The response looks like this:

{
  "symentry": {
    "aliases": {},
    "mappings": {
      "city-type": {
        "properties": {
          "id": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "text": {
            "type": "text",
            "store": true,
            "analyzer": "ik_smart"
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "symentry",
        "creation_date": "1511429248956",
        "analysis": {
          "filter": {
            "remote_synonym": {
              "type": "dynamic_synonym",
              "synonyms_path": "",
              "interval": "30"
            },
            "local_synonym": {
              "type": "dynamic_synonym",
              "synonyms_path": "synonym.txt"
            }
          },
          "analyzer": {
            "synonym": {
              "filter": [
                "remote_synonym",
                "local_synonym"
              ],
              "tokenizer": "ik_smart"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "cU8K12CZRgWnkxTXsvXnGw",
        "version": {
          "created": "5050099"
        }
      }
    }
  }
}

For example, if your synonym file contains 别克,bieke, then searching for bieke also finds documents containing 别克. With that, synonym search is done.
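
To make the 别克,bieke example concrete, here is a toy model of equivalent-synonym expansion. It is a sketch under the assumption that synonym.txt holds one comma-separated group per line, as in the example above; it is not the plugin's code, and the class name SynonymSketch and its methods are made up for illustration. The brand name is written with unicode escapes to keep the source ASCII-only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SynonymSketch {

    /** Parses comma-separated synonym lines into a lookup from each term to its whole group. */
    public static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> lookup = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and lines treated as comments (an assumption of this sketch).
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            Set<String> group = new LinkedHashSet<>();
            for (String term : trimmed.split(",")) {
                String cleaned = term.trim();
                if (!cleaned.isEmpty()) {
                    group.add(cleaned);
                }
            }
            // Every term in a group maps to all terms of that group, itself included.
            for (String term : group) {
                lookup.computeIfAbsent(term, key -> new LinkedHashSet<>()).addAll(group);
            }
        }
        return lookup;
    }

    /** Returns the synonym group of a token, or just the token when it has no synonyms. */
    public static Set<String> expand(Map<String, Set<String>> lookup, String token) {
        Set<String> self = new LinkedHashSet<>();
        self.add(token);
        return lookup.getOrDefault(token, self);
    }

    public static void main(String[] args) {
        // "\u522b\u514b,bieke" is the escaped form of the synonym line from the article.
        Map<String, Set<String>> lookup = parse(Arrays.asList("\u522b\u514b,bieke"));
        System.out.println(expand(lookup, "bieke"));
        System.out.println(expand(lookup, "\u4e2d\u56fd"));
    }
}
```

Because the same analyzer runs at index time and at search time, a query token and an indexed token from the same group expand to the same set of terms, which is why the two spellings match each other.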
Finally, a few small points.
First: the index configuration can also be applied with curl or Postman, for example:
curl -XPUT "http://127.0.0.1:9200/provincea" -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "ik_smart",
            "filter": ["remote_synonym", "local_synonym"]
          }
        },
        "filter": {
          "remote_synonym": {
            "type": "dynamic_synonym",
            "synonyms_path": "",
            "interval": 30
          },
          "local_synonym": {
            "type": "dynamic_synonym",
            "synonyms_path": "synonym.txt"
          }
        }
      }
    }
  }
}'

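Second: Spring offers two ways to work with Elasticsearch. The first is to extend the ElasticsearchCrudRepository interface, which covers simple create, read, update, and delete operations. For anything more complex, such as paging or highlighting, you need the ElasticsearchTemplate class.
There may be other problems I have not recorded. If you run into any, leave a comment and we can discuss them.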
