Building a real-time search engine with spring-data-elasticsearch 3.0.0 + elasticsearch 5.5.0 + ik + synonym

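Because of a recent staffing change, I took over a project built on elasticsearch 5.5.0. The project itself is simple: maintain a synonym dictionary and search the data. Setting it up myself, however, I ran into quite a few problems, so I am writing them down here. Most of what you find online about elasticsearch is also fairly old and did not solve some of these problems, so I am recording the issues I hit and how I fixed them.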

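First, installing elasticsearch 5.5.0. You can download it directly from the official site (I used the official elasticsearch download page). That page links to the latest version. To get 5.5.0, right-click one of the package formats (I downloaded the tar), copy the link, and change the version number in it to download the version you need.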

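Once the download finishes, unpack it and start it (screenshots are a bit of a hassle, so I am skipping them for now and will add them when I have time). The default configuration needs a few changes. Mine is a single node; for a cluster, look up the relevant material, and I may write a separate post on clustering later: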

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: mya
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /Users/haomaiche/Downloads/elasticsearch-5.5.0/data
#
# Path to log files:
#
path.logs: /Users/haomaiche/Downloads/elasticsearch-5.5.0/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes: 3
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

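This just sets the cluster name and the paths for the index data and the logs. (Tip: from 5.* onward, mappings and settings are no longer configured in this file; they are configured through the REST API.)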

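Next, download the plugins.

1) Analyzer: the best analyzer for Chinese is the ik analyzer. Download link: ik analyzer download page. Download the release that matches your ES version, unpack it, and copy its contents into an ik folder under the elasticsearch plugins directory (you need to create the ik folder yourself). Restart ES, then check whether it took effect by visiting: access URL

2) Synonyms: I cover two synonym plugins here, one file-based and one database-based.

2.1) File-based: this approach supports two sources, a remote file and a local file. The plugin is called elasticsearch-analysis-dynamic-synonym. Because the plugin only supports ES up to 5.2, download the source and change the ES version in pom.xml before packaging it. Download link

After unpacking, put it into its own folder under the plugins directory.

2.2) Database-based synonym management: the plugin is called elasticsearch-dynamic-synonym. Download link: download link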

When packaging this one you also need to change the ES version.
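Once the plugins are in place and the node has been restarted, one way to check that the ik analyzer from step 1 is loaded is to call Elasticsearch's `_analyze` API with `ik_smart`. The sketch below is only an illustration: it assumes Java 11+ (for `java.net.http`), a node listening on `http://localhost:9200`, and an arbitrary sample phrase. The class and method names are hypothetical and not part of any plugin.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: asks a local node to tokenize a phrase with ik_smart.
final class IkAnalyzeCheck {

    // Escapes backslashes and double quotes so the value fits inside a JSON string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the JSON body expected by the _analyze API.
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    // Builds (but does not send) a POST request to <baseUrl>/_analyze.
    static HttpRequest analyzeRequest(String baseUrl, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(analyzeBody(analyzer, text)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed address of the single local node; adjust to your setup.
        HttpRequest request = analyzeRequest("http://localhost:9200", "ik_smart", "中华人民共和国");
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

If ik is installed, the response body should list the tokens produced by `ik_smart`. An error about an unknown analyzer suggests the plugin was not loaded.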

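That covers the configuration on the ES server side. If you do not depend on any application code and only want to search existing data, you can skip the rest. From here on I take notes at the code level. (Tip: there is already a plugin that integrates ES with MySQL, called elasticsearch-jdbc. It is about bringing database records into ES for analysis, not about maintaining the synonyms described above. You can look up its usage online.)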

Combining ES + spring-data-elasticsearch + Spring:

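ES is version 5.5.0, so spring-data-elasticsearch has to be 3.0.0 to be compatible, and Spring itself has to be 5.0.x. While upgrading Spring from 4.* to 5.0 I hit one small problem: my JUnit tests stopped working. It turned out that JUnit had to be upgraded to 4.12.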

Next comes the code (the synonyms here use the file-based plugin):

1. Entity class:

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Mapping;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName= "symentry", type = "city-type")
@Setting(settingPath = "/config/hmc-setting.json")
@Mapping(mappingPath = "/config/symentry-mapping-json")
public class Symentry {
	
	@Id
	private String id;
	
	public String getId() {
		return id;
	}

	public void setId(String id) {
		this.id = id;
	}

	@Field(type = FieldType.text, index = true, analyzer="ik_smart", searchAnalyzer="ik_smart", store = true)
	private String text;

	public String getText() {
		return text;
	}

	public void setText(String text) {
		this.text = text;
	}
	
}

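The entity class needs to declare the setting and mapping files, as well as the analyzer and whether the field is indexed.
hmc-setting.json (pay attention to the synonyms_path values, which are the synonym file paths you configure):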


{
	"index" : {
	    "analysis" : {
	        "analyzer" : {
	            "synonym" : {
	                "tokenizer" : "ik_smart",
	                "filter" : ["remote_synonym","local_synonym"]
 	           }
	        },
	        "filter" : {
	            "remote_synonym" : {
	                "type" : "dynamic_synonym",
	                "synonyms_path" : "",
	                "interval": 30
	            },
	            "local_synonym" : {
	                "type" : "dynamic_synonym",
	                "synonyms_path" : "synonym.txt"
	            }
	        }
	    }
	}
}

symentry-mapping-json (pay attention to the type name, city-type here, which must be the type name you configured):
{
  "city-type": {
    "_all": {
      "enabled": true
    },
    "properties": {
      "text":{
        "type":"text",
        "analyzer":"synonym",
        "search_analyzer":"synonym"
      }
    }
  }
}

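Create the repository for the entity class (Spring has to be configured to inject this interface; see the configuration files further down):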

import org.springframework.data.elasticsearch.repository.ElasticsearchCrudRepository;

import com.sly.facade.entity.Symentry;

public interface SymentryRepository   extends ElasticsearchCrudRepository<Symentry,String>{

}


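The Spring and ES configuration files are spring-elasticsearch.xml followed by elasticsearch-infrastructure.xml. Pay attention to base-package, which is the package containing the repository to inject; why a repository needs to be injected is covered further down.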

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:elasticsearch="http://www.springframework.org/schema/data/elasticsearch"
       xsi:schemaLocation="http://www.springframework.org/schema/data/elasticsearch 
http://www.springframework.org/schema/data/elasticsearch/spring-elasticsearch-1.0.xsd
		http://www.springframework.org/schema/beans 
http://www.springframework.org/schema/beans/spring-beans-3.1.xsd">

    <import resource="elasticsearch-infrastructure.xml"/>

    <bean name="elasticsearchTemplate"
          class="org.springframework.data.elasticsearch.core.ElasticsearchTemplate">
        <constructor-arg name="client" ref="client"/>
    </bean>
    
    <elasticsearch:repositories base-package="com.*.repositories"/>

</beans>

elasticsearch-infrastructure.xml:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:elasticsearch="http://www.springframework.org/schema/data/elasticsearch"
       xsi:schemaLocation="http://www.springframework.org/schema/data/elasticsearch 
http://www.springframework.org/schema/data/elasticsearch/spring-elasticsearch.xsd
		http://www.springframework.org/schema/beans 
http://www.springframework.org/schema/beans/spring-beans.xsd">


    <!-- ip4 -->
    <elasticsearch:transport-client id="client" 
cluster-name="${elasticsearch.cluster.name}" cluster-nodes="${elasticsearch.esNodes}" />

    <!-- ip6 -->
    <!--<elasticsearch:transport-client id="client" cluster-name="elasticsearch" cluster-nodes="[::1]:9300" />-->

</beans>


The configuration is basically done. Next is the test class:
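import static org.elasticsearch.index.query.QueryBuilders.matchQuery;
import java.util.List;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.SearchQuery;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import com.sly.facade.entity.City;
import com.sly.facade.entity.Symentry;
import com.sly.facade.repositories.CityRepository;
import com.sly.facade.repositories.SymentryRepository;

/**
 * 
 * <b>Description:</b> direct-sales test <br/>
 * <b>ClassName:</b> sysCitytest <br/>
 * <b>@author:</b> sly <br/>
 * <b>@date:</b> July 19, 2016, 11:38:20 AM <br/>
 * <b>@version: </b>  <br/>
 */
// @Autowired fields are only populated when the test runs inside a Spring context.
// The classpath location below is an assumption; point it at wherever
// spring-elasticsearch.xml actually lives in your project.
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration("classpath:spring-elasticsearch.xml")
public class EleTest {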

	@Autowired
	ElasticsearchTemplate elasticsearchTemplate;
	@Autowired
	SymentryRepository symentryRepository;

	@Autowired
	CityRepository cityRepository;
	@Test
	public void addS() throws Exception{
		
		elasticsearchTemplate.deleteIndex(Symentry.class);
		elasticsearchTemplate.createIndex(Symentry.class);
		elasticsearchTemplate.putMapping(Symentry.class);
		elasticsearchTemplate.refresh(Symentry.class);
		Symentry city = new Symentry();
		city.setId("1345l");
		city.setText("别克 中国");
		symentryRepository.save(city);
		
	}
	
	@Test
	public void addC() throws Exception{
		// Recreate the index for City, the entity saved by this test.
		elasticsearchTemplate.deleteIndex(City.class);
		elasticsearchTemplate.createIndex(City.class);
		elasticsearchTemplate.putMapping(City.class);
		elasticsearchTemplate.refresh(City.class);
		City c = new City();
		c.setId("qwerf");
		c.setName("你好");
		c.setDescription("我热爱我的祖国,别克都不要买");
		
		cityRepository.save(c);
	}
	
	@Test
	public void searC() throws Exception{
		
		String d = "最近有关别克车的销量不错";
		SearchQuery searchQuery = new NativeSearchQueryBuilder().withIndices("provincea")
        	    .withTypes("city").withQuery(matchQuery("description", d)).build();  
		
		List<City> sellingCarInfos = elasticsearchTemplate.queryForList(searchQuery, City.class);
		for (City symentry : sellingCarInfos) {
			System.err.println(symentry);
		}
	}
	
	@Test
	public void search() throws Exception{

		String d = "bieke,我是谁";
		SearchQuery searchQuery = new NativeSearchQueryBuilder().withIndices("symentry")
        	    .withTypes("city-type").withQuery(matchQuery("text", d)).build();  
		
		List<Symentry> sellingCarInfos = elasticsearchTemplate.queryForList(searchQuery, Symentry.class);
		for (Symentry symentry : sellingCarInfos) {
			System.err.println(symentry);
		}
	}
}


After running the creation test, check whether the setting and mapping took effect by visiting http://localhost:9200/symentry. The response looks like this:

{
  "symentry": {
    "aliases": {},
    "mappings": {
      "city-type": {
        "properties": {
          "id": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "text": {
            "type": "text",
            "store": true,
            "analyzer": "ik_smart"
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "symentry",
        "creation_date": "1511429248956",
        "analysis": {
          "filter": {
            "remote_synonym": {
              "type": "dynamic_synonym",
              "synonyms_path": "",
              "interval": "30"
            },
            "local_synonym": {
              "type": "dynamic_synonym",
              "synonyms_path": "synonym.txt"
            }
          },
          "analyzer": {
            "synonym": {
              "filter": [
                "remote_synonym",
                "local_synonym"
              ],
              "tokenizer": "ik_smart"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "cU8K12CZRgWnkxTXsvXnGw",
        "version": {
          "created": "5050099"
        }
      }
    }
  }
}


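For example, if your synonym file contains the entry 别克,bieke, then a search for bieke also finds documents containing 别克. With that, synonym search is done.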
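To make the effect of an entry such as 别克,bieke more concrete, here is a toy model of how a comma-separated synonym line can be read as a group of equivalent terms. It is only a sketch under the assumption that each line lists interchangeable terms separated by commas; it does not reproduce the plugin's actual parsing or token filter, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy illustration of comma-separated synonym groups; not the plugin's implementation.
final class SynonymLineDemo {

    // Maps every term to the full group of terms listed on the same line.
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> groups = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and comment lines starting with '#'.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            Set<String> group = new LinkedHashSet<>();
            for (String term : trimmed.split(",")) {
                String t = term.trim();
                if (!t.isEmpty()) {
                    group.add(t);
                }
            }
            for (String term : group) {
                groups.computeIfAbsent(term, k -> new LinkedHashSet<>()).addAll(group);
            }
        }
        return groups;
    }

    // Returns the synonym group for a token, or just the token if it has no synonyms.
    static Set<String> expand(Map<String, Set<String>> groups, String token) {
        return groups.getOrDefault(token, Collections.singleton(token));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> groups = parse(Arrays.asList("别克,bieke"));
        System.out.println(expand(groups, "bieke"));
        System.out.println(expand(groups, "中国"));
    }
}
```

In this model a query token bieke expands to the same group as 别克, which is why both forms match the same documents, while a term with no entry expands only to itself.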
Finally, a few small points.
First: the settings can actually also be applied with curl or Postman, for example:
curl -XPUT "http://127.0.0.1:9200/provincea" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "ik_smart",
            "filter": ["remote_synonym", "local_synonym"]
          }
        },
        "filter": {
          "remote_synonym": {
            "type": "dynamic_synonym",
            "synonyms_path": "",
            "interval": 30
          },
          "local_synonym": {
            "type": "dynamic_synonym",
            "synonyms_path": "synonym.txt"
          }
        }
      }
    }
  }
}'


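Second: Spring offers two ways to work with ES. The first is to extend the ElasticsearchCrudRepository interface, which covers simple create, read, update, and delete operations. For anything more complex, such as paging or highlighting, you need the elasticsearchTemplate template class.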
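At the level of the ES search API, paging and highlighting come down to the `from`, `size`, and `highlight` parts of the request body. The sketch below only builds such a body as a string, so that it is clear what a paged, highlighted match query on the `text` field asks for. It is not the Spring Data API: the class name, method name, and zero-based page convention are assumptions made for the example.

```java
// Hypothetical helper: builds a _search body with a match query, paging, and highlighting.
final class PagedHighlightSearchBody {

    // Escapes backslashes and double quotes so the value fits inside a JSON string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // page is zero-based; "from" is the offset of the first hit on that page.
    static String build(String field, String text, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        int from = page * pageSize;
        String f = escape(field);
        return "{\"from\":" + from
                + ",\"size\":" + pageSize
                + ",\"query\":{\"match\":{\"" + f + "\":\"" + escape(text) + "\"}}"
                + ",\"highlight\":{\"fields\":{\"" + f + "\":{}}}}";
    }

    public static void main(String[] args) {
        // First page of 10 hits for "bieke" on the text field, with highlighting.
        System.out.println(build("text", "bieke", 0, 10));
    }
}
```

A body like this could be sent to the `_search` endpoint of the symentry index, for example with curl or Postman. In the Spring setup described here, the equivalent request would go through elasticsearchTemplate.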
There may be other problems I have not recorded here. If you run into any, leave a comment and we can discuss them.




