es java search engine: Elasticsearch Java Chinese Searcher (Part 2)

Latest recommended article published on 2023-03-06 16:39:43

钢盅郭子

Tags: es, java, search engine

Article link: https://blog.csdn.net/weixin_31006689/article/details/114168735
Elasticsearch Java API Explained, V1.0

Connecting to the Cluster

As an Elasticsearch Node

Instantiating a node-based client is the simplest way to obtain a Client. The Client can then execute Elasticsearch operations.

import static org.elasticsearch.node.NodeBuilder.*;

// on startup
Node node = nodeBuilder().node();
Client client = node.client();

// on shutdown
node.close();

When you start a node, it joins an Elasticsearch cluster. You can join a different cluster either by setting cluster.name or by calling the clusterName method explicitly.

You can define cluster.name in the /src/main/resources/elasticsearch.yml file of your project. As long as elasticsearch.yml is on the classpath, it is used when the node starts.

cluster.name: yourclustername

Or in Java:

Node node = nodeBuilder().clusterName("yourclustername").node();
Client client = node.client();

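The benefit of using this Client is that operations are automatically routed to the node(s) on which they need to be executed, without a double hop. For example, an index operation is executed directly on the shard where the document will end up.

When you start a node, the most important decision is whether it should hold data. In most cases we only need clients, with no shards allocated to them. This is easily done by setting node.data to false or node.client to true.

import static org.elasticsearch.node.NodeBuilder.*;

// on startup
Node node = nodeBuilder().client(true).node();
Client client = node.client();

// on shutdown
node.close();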

Transport Client

The TransportClient connects remotely to an Elasticsearch cluster using the transport module. It does not join the cluster; it simply takes one or more initial transport addresses and communicates with them in round-robin fashion.

// on startup
Client client = new TransportClient()
        .addTransportAddress(new InetSocketTransportAddress("host1", 9300))
        .addTransportAddress(new InetSocketTransportAddress("host2", 9300));

// on shutdown
client.close();
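The round-robin behaviour described above can be pictured with a small, library-free sketch. The class name RoundRobinAddresses and its method next() are hypothetical and are not part of the Elasticsearch API; the sketch only illustrates cycling through a fixed list of addresses and makes no claim about how TransportClient is implemented internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: cycles through a fixed list of "host:port" strings.
final class RoundRobinAddresses {
    private final List<String> addresses;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinAddresses(List<String> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("at least one address is required");
        }
        this.addresses = addresses;
    }

    // Returns the next address, wrapping around at the end of the list.
    String next() {
        int index = Math.floorMod(counter.getAndIncrement(), addresses.size());
        return addresses.get(index);
    }

    public static void main(String[] args) {
        RoundRobinAddresses rr =
                new RoundRobinAddresses(Arrays.asList("host1:9300", "host2:9300"));
        for (int i = 0; i < 4; i++) {
            System.out.println(rr.next());
        }
    }
}
```

Each call to next() hands out the following address in the list, so requests are spread evenly over the configured hosts.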

Note that if your cluster name is not the default "elasticsearch", you have to set the cluster name.

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "myClusterName").build();
Client client = new TransportClient(settings);
// Add transport addresses and do something with the client...

You can also set this in the elasticsearch.yml file.

The client can sniff the rest of the cluster and add those nodes to its list of machines to use. To enable this, set client.transport.sniff to true.

Settings settings = ImmutableSettings.settingsBuilder()
        .put("client.transport.sniff", true).build();
TransportClient client = new TransportClient(settings);

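Other transport client settings include:

Parameter                                  Description
client.transport.ignore_cluster_name       Set to true to ignore cluster name validation of connected nodes
client.transport.ping_timeout              Time to wait for a ping response from a node; defaults to 5s
client.transport.nodes_sampler_interval    How often to sample/ping the nodes; defaults to 5s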

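Java Index API

The index API allows you to index a typed JSON document into a specific index and make it searchable.

Generating the JSON Document

There are several ways to generate a JSON document:

Manually, using a byte[] or a String

Using a Map that is automatically converted to its JSON equivalent

Using a third-party library such as Jackson to serialize your beans

Using the built-in helper XContentFactory.jsonBuilder()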

Doing It Manually

Note that dates must be encoded according to the Date Format.

String json = "{" +
        "\"user\":\"kimchy\"," +
        "\"postDate\":\"2013-01-30\"," +
        "\"message\":\"trying out Elasticsearch\"" +
        "}";

Using a Map

Map<String, Object> json = new HashMap<String, Object>();
json.put("user", "kimchy");
json.put("postDate", new Date());
json.put("message", "trying out Elasticsearch");
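In the manual variant the date is a string you format yourself, whereas the Map variant passes a Date object and leaves the formatting to the library. The manual formatting step could look like the sketch below, assuming the field should hold a plain yyyy-MM-dd date as in the hand-written example. The class ManualJson and the method tweetJson are hypothetical names, and the naive concatenation does not escape special characters in the values.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for building the sample tweet document by hand.
final class ManualJson {
    // Formats the date as ISO yyyy-MM-dd and concatenates the JSON text.
    // Assumption: the values contain no characters that need JSON escaping.
    static String tweetJson(String user, LocalDate postDate, String message) {
        String date = postDate.format(DateTimeFormatter.ISO_LOCAL_DATE);
        return "{"
                + "\"user\":\"" + user + "\","
                + "\"postDate\":\"" + date + "\","
                + "\"message\":\"" + message + "\""
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(
                tweetJson("kimchy", LocalDate.of(2013, 1, 30), "trying out Elasticsearch"));
    }
}
```

For anything beyond a fixed, trusted set of values, one of the other approaches (Map, Jackson, or the built-in builder) is likely the safer choice, since they take care of escaping.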

Serializing Your Beans

Elasticsearch already uses Jackson, shaded under org.elasticsearch.common.jackson. You can add your own Jackson version to your pom.xml file:

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.1.3</version>
</dependency>

You can then serialize your beans to JSON:

import com.fasterxml.jackson.databind.*;

// instantiate a JSON mapper
ObjectMapper mapper = new ObjectMapper(); // create once, reuse

// generate JSON
String json = mapper.writeValueAsString(yourbeaninstance);

Using Elasticsearch Helpers

Elasticsearch provides built-in helpers to convert data to JSON.

import static org.elasticsearch.common.xcontent.XContentFactory.*;

XContentBuilder builder = jsonBuilder()
        .startObject()
            .field("user", "kimchy")
            .field("postDate", new Date())
            .field("message", "trying out Elasticsearch")
        .endObject();

Note that you can also add arrays with the startArray(String) and endArray() methods. In addition, the field method accepts many object types: you can pass numbers, dates, and even other XContentBuilder objects directly.

You can view the generated JSON with the string() method:

String json = builder.string();

Indexing a Document

The following example indexes a JSON document into an index named "twitter", under the type "tweet", with an id of 1:

import static org.elasticsearch.common.xcontent.XContentFactory.*;

IndexResponse response = client.prepareIndex("twitter", "tweet", "1")
        .setSource(jsonBuilder()
                .startObject()
                    .field("user", "kimchy")
                    .field("postDate", new Date())
                    .field("message", "trying out Elasticsearch")
                .endObject()
        )
        .execute()
        .actionGet();

You can also index without providing an id:

String json = "{" +
        "\"user\":\"kimchy\"," +
        "\"postDate\":\"2013-01-30\"," +
        "\"message\":\"trying out Elasticsearch\"" +
        "}";

IndexResponse response = client.prepareIndex("twitter", "tweet")
        .setSource(json)
        .execute()
        .actionGet();

The IndexResponse object gives you a report of the operation:

// Index name
String _index = response.getIndex();
// Type name
String _type = response.getType();
// Document ID (generated or not)
String _id = response.getId();
// Version (if it's the first time you index this document, you will get: 1)
long _version = response.getVersion();

If you use percolation while indexing, the IndexResponse also gives you the percolators that matched:

IndexResponse response = client.prepareIndex("twitter", "tweet", "1")
        .setSource(json)
        .execute()
        .actionGet();

List<String> matches = response.matches();

Java Get API

The get API allows you to get a typed JSON document from an index by its id, as in the following example:

GetResponse response = client.prepareGet("twitter", "tweet", "1")
        .execute()
        .actionGet();

Operation Threading

The get API lets you set the threading model used when the operation is actually executed on the same node (that is, when the API is executed on a shard allocated on the same server).

By default, operationThreaded is set to true, which means the operation is executed on a different thread. The following example sets it to false:

GetResponse response = client.prepareGet("twitter", "tweet", "1")
        .setOperationThreaded(false)
        .execute()
        .actionGet();

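Delete API

The delete API allows you to delete a typed JSON document from a specific index by its id, as in the following example:

DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
        .execute()
        .actionGet();

Operation Threading

By default, operationThreaded is set to true, which means the operation is executed on a different thread. The following example sets it to false:

DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
        .setOperationThreaded(false)
        .execute()
        .actionGet();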

Update API

You can create an UpdateRequest and send it to the client:

UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("type");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
        .startObject()
            .field("gender", "male")
        .endObject());
client.update(updateRequest).get();

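Or you can use the prepareUpdate() method:

// update with a script
client.prepareUpdate("ttl", "doc", "1")
        .setScript("ctx._source.gender = \"male\"", ScriptService.ScriptType.INLINE)
        .get();

// update by merging a partial document
client.prepareUpdate("ttl", "doc", "1")
        .setDoc(jsonBuilder()
                .startObject()
                    .field("gender", "male")
                .endObject())
        .get();

The first call updates the document with a script; the second call updates it by merging a partial doc into the existing document.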

The Java API also supports upsert. If the document does not exist yet, the content of the upsert element is used to index a new document:

IndexRequest indexRequest = new IndexRequest("index", "type", "1")
        .source(jsonBuilder()
                .startObject()
                    .field("name", "Joe Smith")
                    .field("gender", "male")
                .endObject());

UpdateRequest updateRequest = new UpdateRequest("index", "type", "1")
        .doc(jsonBuilder()
                .startObject()
                    .field("gender", "male")
                .endObject())
        .upsert(indexRequest);

client.update(updateRequest).get();

If the document index/type/1 already exists (here with the name "Joe Dalton"), the partial doc is merged into it, and after the update the document is:

{"name" : "Joe Dalton","gender": "male"}

Otherwise, the upsert document is indexed and the document is:

{"name" : "Joe Smith","gender": "male"}
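The two outcomes can be reproduced with a small in-memory model. This is only a sketch of the merge-or-insert semantics, not of the Elasticsearch implementation: the class UpsertModel, its store map, and its put and update methods are hypothetical, and the partial doc is assumed to be merged shallowly, field by field.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of update-with-upsert semantics.
final class UpsertModel {
    // Stand-in for an index: document id -> source fields.
    private final Map<String, Map<String, Object>> store = new HashMap<>();

    void put(String id, Map<String, Object> source) {
        store.put(id, new LinkedHashMap<>(source));
    }

    // If the document exists, merge the partial doc into it;
    // otherwise store the upsert document unchanged.
    Map<String, Object> update(String id, Map<String, Object> doc, Map<String, Object> upsert) {
        Map<String, Object> existing = store.get(id);
        if (existing == null) {
            store.put(id, new LinkedHashMap<>(upsert));
        } else {
            existing.putAll(doc);
        }
        return store.get(id);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("gender", "male");
        Map<String, Object> upsert = new LinkedHashMap<>();
        upsert.put("name", "Joe Smith");
        upsert.put("gender", "male");

        // Case 1: no document with id 1 yet, so the upsert document is stored.
        UpsertModel empty = new UpsertModel();
        System.out.println(empty.update("1", doc, upsert));

        // Case 2: the document already exists, so only "gender" is merged in.
        UpsertModel filled = new UpsertModel();
        Map<String, Object> joe = new LinkedHashMap<>();
        joe.put("name", "Joe Dalton");
        filled.put("1", joe);
        System.out.println(filled.update("1", doc, upsert));
    }
}
```

The key point is that the upsert document is used only when nothing is stored under the id; when a document exists, the upsert content is ignored and only the fields of the partial doc are applied.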

Bulk API

The bulk API allows you to index and delete several documents in a single request. Here is a sample usage:

import static org.elasticsearch.common.xcontent.XContentFactory.*;

BulkRequestBuilder bulkRequest = client.prepareBulk();

// either use client#prepare, or use Requests# to directly build index/delete requests
bulkRequest.add(client.prepareIndex("twitter", "tweet", "1")
        .setSource(jsonBuilder()
                .startObject()
                    .field("user", "kimchy")
                    .field("postDate", new Date())
                    .field("message", "trying out Elasticsearch")
                .endObject()
        )
);

bulkRequest.add(client.prepareIndex("twitter", "tweet", "2")
        .setSource(jsonBuilder()
                .startObject()
                    .field("user", "kimchy")
                    .field("postDate", new Date())
                    .field("message", "another post")
                .endObject()
        )
);

BulkResponse bulkResponse = bulkRequest.execute().actionGet();
if (bulkResponse.hasFailures()) {
    // process failures by iterating through each bulk response item
}
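When many documents have to be indexed, one common approach is to split them into fixed-size batches and send one bulk request per batch. The sketch below shows only the splitting step in plain Java. The class BulkBatches and the method partition are hypothetical names, the batch size is an assumption to be tuned for your own documents, and nothing here calls the Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a list into batches for separate bulk requests.
final class BulkBatches {
    // Returns consecutive batches of at most batchSize elements each.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("1", "2", "3", "4", "5");
        for (List<String> batch : partition(ids, 2)) {
            // In real code, each batch would be added to its own bulk request here.
            System.out.println(batch);
        }
    }
}
```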

Search API

The search API allows you to execute a search query and get back the hits that match it. It can be executed across one or more indices and across one or more types. The query can be built with either the Java query API or the Java filter API. The body of the search request is built using the SearchSourceBuilder.

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilders;

SearchResponse response = client.prepareSearch("index1", "index2")
        .setTypes("type1", "type2")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(QueryBuilders.termQuery("multi", "test"))                // Query
        .setPostFilter(FilterBuilders.rangeFilter("age").from(12).to(18)) // Filter
        .setFrom(0).setSize(60).setExplain(true)
        .execute()
        .actionGet();

Note that all parameters are optional. Here is the smallest search call you can write:

// MatchAll on the whole cluster with all default options
SearchResponse response = client.prepareSearch().execute().actionGet();

Search Pattern (Java Class)

SearchRequestBuilder reqBuilder = client.prepareSearch(App.ESProp.INDEX_NAME)
        .setTypes("task_info")
        .setSearchType(SearchType.DEFAULT)
        .setExplain(true);

QueryStringQueryBuilder queryString = QueryBuilders.queryString("中华");
queryString.field("taskContent");
queryString.minimumShouldMatch("1");

reqBuilder.setQuery(QueryBuilders.boolQuery().should(queryString))
        .setExplain(true);

SearchResponse resp = reqBuilder.execute().actionGet();
SearchHit[] hits = resp.getHits().getHits();

// collect the _source of every hit
List<Map<String, Object>> results = new ArrayList<Map<String, Object>>();
for (SearchHit hit : hits) {
    results.add(hit.getSource());
}

System.out.println("result ---->>>>");
for (int i = 0; i < results.size(); i++) {
    System.out.println(results.get(i));
}

The example above contains a simple query. A few personal notes on it follow.

Basic Query Builder

// fetch everything
SearchResponse response = client.prepareSearch().execute().actionGet();

// search for documents in the indices index1 and index2
SearchRequestBuilder searchRequestBuilder = client.prepareSearch("index1", "index2");
// a search can span not only several indices (index1, index2) but also several types (type1, type2, comparable to tables)
searchRequestBuilder.setTypes("type1", "type2");
// set the search type
searchRequestBuilder.setSearchType(SearchType.DFS_QUERY_THEN_FETCH);
// set paging: start at hit 0 and return 10 hits
searchRequestBuilder.setFrom(0).setSize(10);
// sort by crawl date, newest first
searchRequestBuilder.addSort("crawlDate", SortOrder.DESC);
// include an explanation of how the score of each hit was computed
searchRequestBuilder.setExplain(true);
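Note that setFrom takes a zero-based hit offset rather than a page number, so a page number from a user interface has to be converted first. A minimal sketch, assuming 1-based page numbers; the class Paging and the method fromOffset are hypothetical names.

```java
// Hypothetical helper for turning a page number into a "from" offset.
final class Paging {
    // Converts a 1-based page number into the zero-based offset of its first hit.
    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be at least 1");
        }
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        int size = 10;
        for (int page = 1; page <= 3; page++) {
            System.out.println("page " + page + " starts at offset " + fromOffset(page, size));
        }
    }
}
```

The result would then be passed along the lines of setFrom(Paging.fromOffset(page, size)).setSize(size).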

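searchRequestBuilder.setSearchType sets the search type. The main search types are:

QUERY_THEN_FETCH: The query is executed against all shards, but only enough information is returned, not the document content. The results are then sorted and ranked, and based on that only the relevant shards are asked for the actual documents. Because only those documents are fetched, the number of returned hits is exactly the specified size. This is handy when the index has many shards (shards here means shard id groups, not replicas).

QUERY_AND_FETCH: The most naive (and possibly the fastest) implementation: the query is simply executed on all relevant shards and the results are returned. Each shard returns size hits, so the caller actually receives size hits from every shard.

DFS_QUERY_THEN_FETCH: Same as QUERY_THEN_FETCH, except for an initial scatter phase that computes the distributed term frequencies for more accurate scoring.

DFS_QUERY_AND_FETCH: Same as QUERY_AND_FETCH, except for an initial scatter phase that computes the distributed term frequencies for more accurate scoring.

SCAN: Scans the results by executing the search without any sorting. Scrolling of the result set starts automatically.

COUNT: Returns only the number of matching hits; facets are still executed.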
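The difference in the number of hits between QUERY_AND_FETCH and QUERY_THEN_FETCH can be illustrated with a toy model in which every shard is just a list of scores. This is a simplified sketch and not Elasticsearch code: the class ShardMergeSketch and its methods are hypothetical, and real shards return document ids and sort values rather than bare scores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of how per-shard results are combined.
final class ShardMergeSketch {
    // Each shard returns its own top `size` scores, best first.
    static List<Double> topOfShard(List<Double> shardScores, int size) {
        List<Double> sorted = new ArrayList<>(shardScores);
        sorted.sort(Collections.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(size, sorted.size())));
    }

    // QUERY_AND_FETCH-like: the top `size` hits of every shard are all returned.
    static List<Double> queryAndFetch(List<List<Double>> shards, int size) {
        List<Double> all = new ArrayList<>();
        for (List<Double> shard : shards) {
            all.addAll(topOfShard(shard, size));
        }
        all.sort(Collections.reverseOrder());
        return all;
    }

    // QUERY_THEN_FETCH-like: per-shard tops are merged, then only the
    // global top `size` hits are kept (and would be fetched).
    static List<Double> queryThenFetch(List<List<Double>> shards, int size) {
        List<Double> merged = queryAndFetch(shards, size);
        return new ArrayList<>(merged.subList(0, Math.min(size, merged.size())));
    }

    public static void main(String[] args) {
        List<List<Double>> shards = Arrays.asList(
                Arrays.asList(0.9, 0.2, 0.5),
                Arrays.asList(0.8, 0.7, 0.1));
        System.out.println(queryAndFetch(shards, 2).size());  // size hits per shard
        System.out.println(queryThenFetch(shards, 2).size()); // exactly size hits
    }
}
```

With two shards and size set to 2, the first variant yields four hits and the second yields two, which matches the descriptions above.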

Match Query (see the linked documentation for a detailed explanation)

// "name" is the field; "kimchy elasticsearch" is the text to search for
QueryBuilder qb = QueryBuilders.matchQuery("name", "kimchy elasticsearch");

MultiMatch Query (see the linked documentation for a detailed explanation)

QueryBuilder qb = QueryBuilders.multiMatchQuery(
        "kimchy elasticsearch", // the text you are looking for
        "user", "message"       // the fields you query on
);

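Building a Query String Query

// build the query string query; content is wrapped in double quotes
QueryStringQueryBuilder queryString = QueryBuilders.queryString("\"" + content + "\"");
// set the field to match against
queryString.field(k);

termQuery

Exact matching: the search text is not analyzed, so no tokenization takes place.

Should

As I understand it, a should query is by default split into several termQuery clauses, and the minimumShouldMatch parameter controls how many of them have to match.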
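The effect of minimumShouldMatch can be sketched without Elasticsearch by counting how many optional terms occur in a document. The class ShouldMatchSketch and the method matches are hypothetical, documents are reduced to a set of already-analyzed tokens, and only an integer minimum is modelled, although Elasticsearch also accepts other forms such as percentages.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of "at least N of the optional clauses must match".
final class ShouldMatchSketch {
    // Returns true when at least minimumShouldMatch of the optional terms
    // occur in the document's tokens.
    static boolean matches(Set<String> docTokens, List<String> shouldTerms, int minimumShouldMatch) {
        int matched = 0;
        for (String term : shouldTerms) {
            if (docTokens.contains(term)) {
                matched++;
            }
        }
        return matched >= minimumShouldMatch;
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("trying", "out", "elasticsearch"));
        List<String> terms = Arrays.asList("elasticsearch", "kimchy");
        System.out.println(matches(doc, terms, 1));
        System.out.println(matches(doc, terms, 2));
    }
}
```

Raising the minimum makes the query stricter: a document that contains only one of two optional terms passes with a minimum of 1 but is rejected with a minimum of 2.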

A Brief Introduction to Spring ES Operations

Connecting the ES Client

@Bean
public ElasticsearchTemplate elasticsearchTemplate() {
    return new ElasticsearchTemplate(client());
}

@Bean
public Client client() {
    Settings settings = ImmutableSettings.settingsBuilder()
            .put("cluster.name", "elasticsearch")
            .put("client.transport.ping_timeout", "3s")
            .build();
    TransportClient client = new TransportClient(settings);
    TransportAddress address = new InetSocketTransportAddress("120.24.165.15", 9300);
    client.addTransportAddress(address);
    return client;
}

@Bean
public ElasticsearchActionService elasticsearchService() {
    ElasticsearchActionService elasticsearchService = new ElasticsearchActionService();
    elasticsearchService.init(elasticsearchTemplate());
    return elasticsearchService;
}

Initializing the Index (Library)

Initialize the document store, create the index, and add data in bulk.

private ElasticsearchTemplate elasticsearchTemplate;

@Autowired
private Client esClient;

public void init(ElasticsearchTemplate clzz) {
    elasticsearchTemplate = clzz;
    // create the index only if it does not exist yet
    if (!elasticsearchTemplate.indexExists(App.ESProp.INDEX_NAME)) {
        elasticsearchTemplate.createIndex(App.ESProp.INDEX_NAME);
    }
    elasticsearchTemplate.putMapping(TaskInfo.class);
    elasticsearchTemplate.putMapping(NewsInfo.class);
}

/**
 * Adds or updates document information.
 *
 * @author 高国藩
 * @date May 12, 2017, 3:16:27 PM
 * @param taskInfoList the tasks to index
 * @return true once the bulk index call has returned
 */
public boolean update(List<TaskInfo> taskInfoList) {
    List<IndexQuery> queries = new ArrayList<IndexQuery>();
    for (TaskInfo taskInfo : taskInfoList) {
        IndexQuery indexQuery = new IndexQueryBuilder()
                .withId(taskInfo.getTaskId())
                .withObject(taskInfo)
                .build();
        queries.add(indexQuery);
    }
    elasticsearchTemplate.bulkIndex(queries);
    return true;
}

Initializing the Mapping with Annotations (class)

package com.sk.system.es;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldIndex;
import org.springframework.data.elasticsearch.annotations.FieldType;

import com.sk.browser.config.App;

/**
 * store: whether the field is stored.
 * FieldIndex.not_analyzed: the field is not tokenized.
 * indexAnalyzer = "ik": the field is tokenized with the IK analyzer.
 */
//@Document(indexName = APP.ESProp.INDEX_NAME, type = APP.ESProp.TYPE_TASK_INFO, indexStoreType = APP.ESProp.INDEX_STORE_TYPE, shards = APP.ESProp.SHARDS, replicas = APP.ESProp.REPLICAS, refreshInterval = APP.ESProp.REFRESH_INTERVAL)
@Document(indexName = App.ESProp.INDEX_NAME, type = App.ESProp.TYPE_TASK_INFO)
public class TaskInfo {

    @Id // marks the ID; it is used as the document ID
    @Field(index = FieldIndex.not_analyzed, store = true)
    private String taskId;

    @Field(type = FieldType.Integer, index = FieldIndex.not_analyzed, store = true)
    private Integer userId;

    @Field(type = FieldType.String, indexAnalyzer = "ik", searchAnalyzer = "ik", store = true)
    private String taskContent;

    @Field(type = FieldType.String, indexAnalyzer = "ik", searchAnalyzer = "ik", store = true)
    private String taskArea;

    @Field(type = FieldType.String, indexAnalyzer = "ik", searchAnalyzer = "ik", store = true)
    private String taskTags;

    @Field(type = FieldType.Integer, index = FieldIndex.not_analyzed, store = true)
    private Integer taskState;

    @Field(type = FieldType.String, index = FieldIndex.not_analyzed, store = true)
    private String updateTime;

    @Field(type = FieldType.String, indexAnalyzer = "ik", searchAnalyzer = "ik", store = true)
    private String userNickName;

    public TaskInfo() {
    }

    public TaskInfo(String taskId, Integer userId, String taskContent,
            String taskArea, String taskTags, Integer taskState,
            String updateTime, String userNickName) {
        this.taskId = taskId;
        this.userId = userId;
        this.taskContent = taskContent;
        this.taskArea = taskArea;
        this.taskTags = taskTags;
        this.taskState = taskState;
        this.updateTime = updateTime;
        this.userNickName = userNickName;
    }

    public String getTaskId() { return taskId; }
    public void setTaskId(String taskId) { this.taskId = taskId; }

    public Integer getUserId() { return userId; }
    public void setUserId(Integer userId) { this.userId = userId; }

    public String getTaskContent() { return taskContent; }
    public void setTaskContent(String taskContent) { this.taskContent = taskContent; }

    public String getTaskArea() { return taskArea; }
    public void setTaskArea(String taskArea) { this.taskArea = taskArea; }

    public String getTaskTags() { return taskTags; }
    public void setTaskTags(String taskTags) { this.taskTags = taskTags; }

    public Integer getTaskState() { return taskState; }
    public void setTaskState(Integer taskState) { this.taskState = taskState; }

    public String getUpdateTime() { return updateTime; }
    public void setUpdateTime(String updateTime) { this.updateTime = updateTime; }

    public String getUserNickName() { return userNickName; }
    public void setUserNickName(String userNickName) { this.userNickName = userNickName; }

    @Override
    public String toString() {
        return "TaskInfo [taskId=" + taskId + ", userId=" + userId
                + ", taskContent=" + taskContent + ", taskArea=" + taskArea
                + ", taskState=" + taskState + ", updateTime=" + updateTime
                + ", userNickName=" + userNickName + "]";
    }
}
