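First, set up an Elasticsearch cluster. I built one with three nodes, following the official documentation for the configuration. Reference: [https://www.elastic.co/guide/en/elasticsearch/client/java-api/2.3/index.html]
- Create a Maven project and add the dependency to pom.xml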
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>${es.version}</version>
</dependency>
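The dependency refers to ${es.version}, so that property has to be defined in the same POM (or a parent). A minimal sketch, assuming version 2.3.0; that number is my assumption, picked only because the linked documentation is for 2.3, so use whatever version your cluster actually runs:

```xml
<properties>
    <!-- Assumed version: should match the Elasticsearch version of the cluster -->
    <es.version>2.3.0</es.version>
</properties>
```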
- Connect to the remote server
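import java.net.InetAddress;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
// on startup: cluster.name must match the name of the cluster
Settings settings = Settings.settingsBuilder()
        .put("cluster.name", "myClusterName").build();
Client client = TransportClient.builder().settings(settings).build()
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("host1"), 9300));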
// on shutdown
client.close();
- Scroll
The following is quoted from the official Elasticsearch documentation:
While a search request returns a single “page” of results, the scroll
API can be used to retrieve large numbers of results (or even all
results) from a single search request, in much the same way as you
would use a cursor on a traditional database.

Scrolling is not intended for real time user requests, but rather for
processing large amounts of data, e.g. in order to reindex the
contents of one index into a new index with a different configuration.
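A scroll query can be used to retrieve large numbers of documents from Elasticsearch efficiently, without paying the penalty of deep pagination.
Scrolling lets us run an initial search and then keep pulling batches of results, somewhat like a cursor in a traditional database.
A scroll query takes a snapshot of the data at a point in time. Any changes made to the index after the initial search are ignored. It does this by keeping the old data files around, so the result is as if it preserved a view of the index as it was when the scroll started.
The cost of deep pagination comes from sorting the result set globally; without global sorting, returning results is cheap. To get this benefit, sort the scroll query by the _doc field. This tells Elasticsearch to simply return the next batch of results from every shard that still has results to return.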
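To make the cost argument concrete, here is a toy model in plain Java. It is not Elasticsearch code: the class and method names are hypothetical, and the numbers (5 shards, batches of 100) are illustrative assumptions. It only counts how many entries have to be collected for one page with from/size paging, compared with an upper bound for one scroll batch.

```java
/** Toy cost model, not Elasticsearch code: all numbers are illustrative assumptions. */
final class DeepPagingCost {

    /** Entries the coordinating node collects and sorts for one from/size page. */
    static long entriesForFromSizePage(int shards, int from, int size) {
        // Each shard must return its own top (from + size) candidates.
        return (long) shards * ((long) from + size);
    }

    /** Upper bound on entries handled for one scroll batch, however deep we are. */
    static long entriesForScrollBatch(int shards, int size) {
        // Each shard resumes where it stopped and offers at most `size` hits.
        return (long) shards * size;
    }

    public static void main(String[] args) {
        int shards = 5;
        int size = 100;
        for (int from : new int[] {0, 1_000, 10_000}) {
            System.out.println("from=" + from
                    + " from/size=" + entriesForFromSizePage(shards, from, size)
                    + " scroll=" + entriesForScrollBatch(shards, size));
        }
    }
}
```

The from/size figure grows with the page offset, while the scroll bound stays the same for every batch, which is the point the paragraph above makes.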
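To enable scrolling, set the scroll parameter on the search request to the expiry time we want for the scroll. The expiry time is refreshed on every scroll request, so it only needs to be long enough to process the current batch of results, not all of the documents that match the query. This timeout matters because keeping the scroll window open consumes resources, and we want to release them as soon as they are no longer needed. Setting the timeout lets Elasticsearch free those resources automatically after a short period of inactivity.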
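The refresh behaviour can be sketched with a small model in plain Java. This is not how Elasticsearch implements it; ScrollKeepAlive is a hypothetical class, and it assumes the same keep-alive value is sent with every request (in the real API each scroll request carries its own value). Times are plain millisecond counters supplied by the caller.

```java
/** Toy model of a scroll keep-alive, not Elasticsearch code. */
final class ScrollKeepAlive {
    private final long keepAliveMillis;
    private long expiresAtMillis;

    ScrollKeepAlive(long keepAliveMillis, long nowMillis) {
        this.keepAliveMillis = keepAliveMillis;
        this.expiresAtMillis = nowMillis + keepAliveMillis;
    }

    /** Called for every scroll request: pushes the expiry forward if still alive. */
    boolean refresh(long nowMillis) {
        if (isExpired(nowMillis)) {
            return false; // the context is gone, the scroll has to be restarted
        }
        expiresAtMillis = nowMillis + keepAliveMillis;
        return true;
    }

    boolean isExpired(long nowMillis) {
        return nowMillis >= expiresAtMillis;
    }

    public static void main(String[] args) {
        ScrollKeepAlive scroll = new ScrollKeepAlive(60_000L, 0L);
        System.out.println("refresh at 50s: " + scroll.refresh(50_000L));
        System.out.println("expired at 100s: " + scroll.isExpired(100_000L));
    }
}
```

With a 60 second keep-alive, a batch fetched 50 seconds in moves the deadline to 110 seconds, so the total scroll can run far longer than 60 seconds as long as each batch is handled in time.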
- Use Scrolls in Java
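import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.search.sort.SortParseElement;
import static org.elasticsearch.index.query.QueryBuilders.*;
QueryBuilder qb = termQuery("multi", "test");
SearchResponse scrollResp = client.prepareSearch("test") // index name; the original passed an undefined variable named test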
.addSort(SortParseElement.DOC_FIELD_NAME, SortOrder.ASC)
.setScroll(new TimeValue(60000))
.setQuery(qb)
.setSize(100).execute().actionGet(); // at most 100 hits are returned for each scroll batch
//Scroll until no hits are returned
while (true) {
for (SearchHit hit : scrollResp.getHits().getHits()) {
//Handle the hit...
}
scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
//Break condition: No hits are returned
if (scrollResp.getHits().getHits().length == 0) {
break;
}
}
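The shape of that loop (process a batch, ask for the next one with the latest scroll id, stop on an empty batch) can be tried out without a cluster. The sketch below is plain Java with a fake in-memory backend; Page, drain and fakePage are hypothetical names of mine, and the fake scroll id is simply the offset of the next batch. Unlike the loop above, it checks for an empty batch before processing, so an empty first response is handled too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Stand-alone illustration of the scroll loop pattern, with a fake backend. */
final class ScrollDrain {

    /** One batch of hits plus the id needed to request the next batch. */
    static final class Page {
        final String scrollId;
        final List<String> hits;

        Page(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    /** Collects every hit, following scroll ids until an empty batch comes back. */
    static List<String> drain(Page first, Function<String, Page> nextPage) {
        List<String> all = new ArrayList<>();
        Page page = first;
        while (!page.hits.isEmpty()) {
            all.addAll(page.hits);                 // handle the hits
            page = nextPage.apply(page.scrollId);  // always use the latest id
        }
        return all;
    }

    /** Fake backend: returns docs[offset, offset + size) and the next offset as id. */
    static Page fakePage(List<String> docs, int offset, int size) {
        int start = Math.min(offset, docs.size());
        int end = Math.min(start + size, docs.size());
        return new Page(String.valueOf(end), new ArrayList<>(docs.subList(start, end)));
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a", "b", "c", "d", "e");
        List<String> out = drain(fakePage(docs, 0, 2),
                id -> fakePage(docs, Integer.parseInt(id), 2));
        System.out.println(out);
    }
}
```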
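I also found a good blog post, 使用scroll实现Elasticsearch数据遍历和深度分页 (using scroll to traverse Elasticsearch data and do deep pagination). It covers how to use scroll, how it works, deep pagination, and scroll-scan (a scroll that does not sort). [http://lxwei.github.io/posts/使用scroll实现Elasticsearch数据遍历和深度分页.html]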
- Query DSL
Elasticsearch provides a full Java Query DSL in a similar manner to the REST Query DSL. The factory for query builders is QueryBuilders. Once your query is ready, you can use the Search API.
To use QueryBuilders just import them in your class:
import static org.elasticsearch.index.query.QueryBuilders.*;
Note that you can easily print (aka debug) the JSON generated for a query by calling the toString() method on the QueryBuilder object.
The QueryBuilder can then be used with any API that accepts a query, such as count and search.
There is quite a lot to cover in the Query DSL, so I will continue in the next post.