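Continuing from the previous post, this part covers more of the TransportClient Java API.
4. Aggregations
Aggregations in Elasticsearch searches are similar to GROUP BY combined with aggregate functions such as COUNT(), MAX, MIN and AVG in a relational database such as MySQL.
Aggregations involve two key concepts:
1. Bucket: groups the data, similar to GROUP BY;
2. Metric: computes statistics over the grouped data (for example max, min, avg).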
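The two concepts map closely onto a plain in-memory grouping. The following is only a minimal sketch and does not use any Elasticsearch API: the `Person` class, its fields and the sample data are hypothetical, and Java streams stand in for the cluster. `groupingBy` plays the role of the bucket and `averagingInt` the role of the metric.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory illustration of "bucket + metric"; no Elasticsearch API is used.
final class BucketMetricSketch {

    // Hypothetical document type, invented for this example.
    static final class Person {
        final String country;
        final int children;

        Person(String country, int children) {
            this.country = country;
            this.children = children;
        }
    }

    // Bucket: one group per country. Metric: average of "children" inside each group.
    static Map<String, Double> avgChildrenByCountry(List<Person> people) {
        return people.stream().collect(Collectors.groupingBy(
                (Person p) -> p.country,
                Collectors.averagingInt((Person p) -> p.children)));
    }

    // Small fixed data set so the result is easy to check by hand.
    static List<Person> sample() {
        return Arrays.asList(
                new Person("CN", 1),
                new Person("CN", 3),
                new Person("US", 0),
                new Person("US", 1));
    }
}
```

Calling `BucketMetricSketch.avgChildrenByCountry(BucketMetricSketch.sample())` yields one entry per country, which is the in-memory counterpart of `SELECT country, AVG(children) ... GROUP BY country`.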
4.1 Structuring aggregations
For example, here is a three-level aggregation (aggregations can be nested), composed of:
- Terms aggregation (bucket)
- Date Histogram aggregation (bucket)
- Average aggregation (metric)
SearchResponse sr = client.prepareSearch()
    .addAggregation(AggregationBuilders
        .terms("by_country")
        .field("country")
        .subAggregation(AggregationBuilders
            .dateHistogram("by_year")
            .field("dateOfBirth")
            .dateHistogramInterval(DateHistogramInterval.YEAR)
            .subAggregation(AggregationBuilders
                .avg("avg_children")
                .field("children"))
        )
    )
    .execute().actionGet();
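To make the shape of such a nested result easier to picture, here is a small in-memory analogue. It is a sketch under stated assumptions, not Elasticsearch code: the `Person` class, the `birthYear` field (standing in for the yearly date histogram on `dateOfBirth`) and the sample data are all hypothetical. Each level of `groupingBy` corresponds to one bucket level, and the innermost collector is the metric.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory analogue of terms -> date histogram -> avg; not Elasticsearch API.
final class NestedAggregationSketch {

    // Hypothetical document type; birthYear stands in for a yearly date histogram key.
    static final class Person {
        final String country;
        final int birthYear;
        final int children;

        Person(String country, int birthYear, int children) {
            this.country = country;
            this.birthYear = birthYear;
            this.children = children;
        }
    }

    // Outer bucket: country. Inner bucket: birth year. Metric: average number of children.
    static Map<String, Map<Integer, Double>> avgChildrenByCountryAndYear(List<Person> people) {
        return people.stream().collect(Collectors.groupingBy(
                (Person p) -> p.country,
                Collectors.groupingBy(
                        (Person p) -> p.birthYear,
                        Collectors.averagingInt((Person p) -> p.children))));
    }

    // Small fixed data set so the result is easy to check by hand.
    static List<Person> sample() {
        return Arrays.asList(
                new Person("CN", 1990, 1),
                new Person("CN", 1990, 3),
                new Person("CN", 1985, 0),
                new Person("US", 1990, 2));
    }
}
```

The returned map mirrors the response structure: every country bucket contains its own year buckets, and every year bucket carries a single average value.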
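An aggregation is either a metric aggregation or a bucket aggregation.
4.2 Metrics aggregations
The examples are all fairly simple. Look up the aggregation that fits your business requirements when you need it; the official documentation and the API reference give more detail.
Metrics aggregations include:
- Stats Aggregation: a multi-value metrics aggregation that returns min, max, sum, count and avg.
Taking Stats Aggregation as an example: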
/**
 * metrics_aggregations
 */
public void metricAgg() {
    StatsAggregationBuilder aggregation =
            AggregationBuilders
                    /* aggregation name */
                    .stats("agg")
                    .field("height");
    SearchResponse searchResponse = client.prepareSearch()
            .addAggregation(aggregation)
            .execute()
            .actionGet();
    Stats agg = searchResponse.getAggregations().get("agg");
    double min = agg.getMin();
    double max = agg.getMax();
    double avg = agg.getAvg();
    double sum = agg.getSum();
    long count = agg.getCount();
}
- Extended Stats Aggregation: an extension of Stats Aggregation that adds sum_of_squares (sum of squares), variance, std_deviation (standard deviation) and std_deviation_bounds (standard deviation bounds).
- Value Count Aggregation
- Percentile Aggregation
- Percentile Ranks Aggregation
- Cardinality Aggregation
- Geo Bounds Aggregation
- Top Hits Aggregation
- Scripted Metric Aggregation
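The extra values reported by the extended stats aggregation can all be derived from count, sum and sum of squares. The sketch below shows that arithmetic in plain Java. It is not the Elasticsearch implementation: the class name is hypothetical, and it assumes population variance (dividing by the count) and bounds of the form avg ± sigma × std_deviation.

```java
// Hypothetical stand-alone sketch of the extended stats arithmetic; not Elasticsearch API.
final class ExtendedStatsSketch {
    final long count;
    final double sum;
    final double sumOfSquares;
    final double avg;
    final double variance;
    final double stdDeviation;

    private ExtendedStatsSketch(long count, double sum, double sumOfSquares) {
        this.count = count;
        this.sum = sum;
        this.sumOfSquares = sumOfSquares;
        this.avg = sum / count;
        // Assumption: population variance, E[x^2] - (E[x])^2, clamped against rounding below zero.
        this.variance = Math.max(0.0, sumOfSquares / count - avg * avg);
        this.stdDeviation = Math.sqrt(variance);
    }

    // Accumulates count, sum and sum of squares in a single pass over the values.
    static ExtendedStatsSketch of(double... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (double v : values) {
            sum += v;
            sumOfSquares += v * v;
        }
        return new ExtendedStatsSketch(values.length, sum, sumOfSquares);
    }

    // Upper bound: avg + sigma * stdDeviation.
    double upperBound(double sigma) {
        return avg + sigma * stdDeviation;
    }

    // Lower bound: avg - sigma * stdDeviation.
    double lowerBound(double sigma) {
        return avg - sigma * stdDeviation;
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the sum is 40 and the sum of squares is 232, so the average is 5, the variance is 232/8 - 25 = 4 and the standard deviation is 2.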
4.3 Bucket aggregations
The official documentation has many examples; most of them are methods of the AggregationBuilders class.
Here is an example for the Filters Aggregation:
/**
 * 4.3 Bucket aggregations
 */
public void bucketAgg() {
    AggregationBuilder aggregation =
            AggregationBuilders
                    .filters("agg",
                            new FiltersAggregator.KeyedFilter("men", QueryBuilders.termQuery("gender", "male")),
                            new FiltersAggregator.KeyedFilter("women", QueryBuilders.termQuery("gender", "female")));
    SearchResponse searchResponse = client.prepareSearch()
            .addAggregation(aggregation)
            .execute()
            .actionGet();
    // Read the aggregation back from the response by its name
    Filters agg = searchResponse.getAggregations().get("agg");
    // For each bucket
    for (Filters.Bucket entry : agg.getBuckets()) {
        // Bucket key ("men" or "women")
        String key = entry.getKeyAsString();
        // Number of documents in the bucket
        long docCount = entry.getDocCount();
        System.out.println("Key: " + key + ", docCount: " + docCount);
    }
}
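Conceptually, a keyed filters aggregation evaluates every named filter against the documents and reports one count per key. The following plain-Java sketch illustrates that idea with predicates. It is not Elasticsearch code, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory analogue of a keyed filters aggregation; not Elasticsearch API.
final class KeyedFiltersSketch {

    // For every named filter, counts the documents that satisfy it.
    // Key order follows the iteration order of the filters map.
    static <T> Map<String, Long> countByFilter(List<T> docs, Map<String, Predicate<T>> filters) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Predicate<T>> e : filters.entrySet()) {
            long matching = docs.stream().filter(e.getValue()).count();
            counts.put(e.getKey(), matching);
        }
        return counts;
    }
}
```

Because each filter is evaluated independently, a document may be counted in several buckets, and a document that matches no filter is not counted at all.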
- Missing Aggregation
- Nested Aggregation
- Reverse Nested Aggregation
- Children Aggregation
- Terms Aggregation
- Order
- Significant Terms Aggregation
- Range Aggregation
- Date Range Aggregation
- Ip Range Aggregation
- Histogram Aggregation
- Date Histogram Aggregation
- Geo Distance Aggregation
- Geo Hash Grid Aggregation
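Note:
Aggregation queries involve heavy computation. In a relational database such as MySQL, running many aggregations consumes a lot of compute resources and degrades the basic insert, delete, update and select operations.
Likewise, when the data volume is large, it is advisable not to run heavy computations on the Elasticsearch cluster.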
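5. Query DSL
The methods below are almost all static methods of the org.elasticsearch.index.query.QueryBuilders class. Using them is straightforward; see the API reference, the official documentation or the source code for details.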
QueryBuilder qb = matchAllQuery();
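High-level full-text queries are typically used for full-text search on text fields (for example, the body of an email). They know how the queried field is analyzed and apply the field's analyzer or search_analyzer to the query string before executing.
- match query: the standard query for full-text search, including fuzzy matching and phrase or proximity queries.
- multi_match query: the multi-field version of the match query.
- common terms query: a specialized query that gives more weight to uncommon words.
- query_string query: supports the compact Lucene query string syntax, allowing AND | OR | NOT conditions and multi-field search in a single query string. For expert users only.
- simple_query_string query: a simpler, more robust version of the query string syntax, suitable for exposing directly to users.
- term query: matches a single exact term in a field.
- terms query: matches any of several exact terms in a field.
- range query: matches values within a range.
- exists query: finds documents in which the specified field contains any non-null value.
- prefix query: matches terms that start with a given prefix.
- wildcard query: ? matches a single character and * matches any sequence of characters.
- regexp query: matches terms against a regular expression.
- fuzzy query: fuzzy matching based on Levenshtein edit distance.
- type query: matches documents of a given type.
- ids query: matches documents by a set of IDs.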
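Since the fuzzy query is described in terms of Levenshtein edit distance, a short reference implementation may help to see what "distance" means here. This is the classic textbook algorithm in plain Java, offered as a sketch: the class name is hypothetical, it is not how Elasticsearch evaluates fuzzy queries internally, and options of the real query (for example how transpositions are counted) are not modelled.

```java
// Classic Levenshtein edit distance; a stand-alone sketch, not Elasticsearch code.
final class LevenshteinSketch {

    // Minimum number of single-character insertions, deletions and substitutions
    // needed to turn a into b. Uses two rolling rows instead of a full matrix.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With this definition, "kitten" and "sitting" are three edits apart, and swapping two adjacent characters counts as two edits.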
6. Java API Administration
Elasticsearch provides a Java API for administrative tasks.
To use it, call admin() on the client to obtain an AdminClient.
AdminClient adminClient = client.admin();
6.1 Indices Administration
IndicesAdminClient indicesAdminClient = client.admin().indices();
/* Steps 1 to 3 are alternative ways to create the index; use only one of them,
   because creating "twitter" a second time fails when the index already exists. */
/* 1. Create an index */
indicesAdminClient.prepareCreate("twitter").get();
/* 2. Create an index with settings */
indicesAdminClient.prepareCreate("twitter")
    .setSettings(Settings.builder()
        .put("index.number_of_shards", 3)
        .put("index.number_of_replicas", 2)
    )
    .get();
/* 3. Create an index with a mapping */
indicesAdminClient.prepareCreate("twitter")
    .addMapping("tweet", "{\n" +
        " \"tweet\": {\n" +
        " \"properties\": {\n" +
        " \"message\": {\n" +
        " \"type\": \"string\"\n" +
        " }\n" +
        " }\n" +
        " }\n" +
        " }")
    .get();
/* Put a mapping on an index that already exists */
indicesAdminClient.preparePutMapping("twitter")
    .setType("user")
    .setSource("{\n" +
        " \"properties\": {\n" +
        " \"user_name\": {\n" +
        " \"type\": \"string\"\n" +
        " }\n" +
        " }\n" +
        "}")
    .get();
/* 4. Refresh */
/* Refresh all indices */
indicesAdminClient.prepareRefresh().get();
/* Refresh a single index */
indicesAdminClient
    .prepareRefresh("twitter")
    .get();
/* Refresh several indices */
indicesAdminClient
    .prepareRefresh("twitter", "company")
    .get();
/* 5. Get settings */
GetSettingsResponse response = client.admin().indices()
    .prepareGetSettings("company", "employee").get();
for (ObjectObjectCursor<String, Settings> cursor : response.getIndexToSettings()) {
    String index = cursor.key;
    Settings settings = cursor.value;
    Integer shards = settings.getAsInt("index.number_of_shards", null);
    Integer replicas = settings.getAsInt("index.number_of_replicas", null);
}
/* 6. Update index settings */
indicesAdminClient.prepareUpdateSettings("twitter")
    .setSettings(Settings.builder()
        .put("index.number_of_replicas", 0)
    )
    .get();
6.2 Cluster Administration
The cluster health API returns the health status of the cluster and also gives technical information about each index in it.
ClusterAdminClient clusterAdminClient = client.admin().cluster();
/* 1. Cluster health */
/* Health of the whole cluster, covering all indices */
ClusterHealthResponse healths = clusterAdminClient.prepareHealth().get();
String clusterName = healths.getClusterName();
int numberOfDataNodes = healths.getNumberOfDataNodes();
int numberOfNodes = healths.getNumberOfNodes();
/* Iterate over the health of each index */
for (ClusterIndexHealth health : healths.getIndices().values()) {
    String index = health.getIndex();
    int numberOfShards = health.getNumberOfShards();
    int numberOfReplicas = health.getNumberOfReplicas();
    /* Status of this index */
    ClusterHealthStatus status = health.getStatus();
}
The health API can also be used to wait until the whole cluster or a given index reaches a specific status:
/* 2. Wait for status */
clusterAdminClient.prepareHealth()
    .setWaitForYellowStatus()
    .get();
clusterAdminClient.prepareHealth("company")
    .setWaitForGreenStatus()
    .get();
clusterAdminClient.prepareHealth("employee")
    .setWaitForGreenStatus()
    .setTimeout(TimeValue.timeValueSeconds(2))
    .get();
If the index does not reach the expected status, the call does not fail by itself. Check the result explicitly and throw an exception:
/* 3. Handling an unexpected status */
ClusterHealthResponse response = clusterAdminClient.prepareHealth("company")
    .setWaitForGreenStatus()
    .get();
ClusterHealthStatus status = response.getIndices().get("company").getStatus();
if (!status.equals(ClusterHealthStatus.GREEN)) {
    throw new RuntimeException("Index is in " + status + " state");
}
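The wait-then-check pattern above can be pictured as a polling loop with a deadline followed by an explicit status check. The sketch below models that in plain Java so it can be run without a cluster. Everything in it is hypothetical: the `Health` enum, the `probe` supplier that stands in for a health request, and the polling interval. It also assumes that a better status than the requested one (for example green when yellow was requested) counts as success.

```java
import java.util.function.Supplier;

// Hypothetical model of "wait for status, then check"; not Elasticsearch API.
final class WaitForStatusSketch {

    // Declared from best to worst, so a smaller ordinal means a healthier state.
    enum Health { GREEN, YELLOW, RED }

    // True when the actual status is the wanted one or better.
    static boolean atLeast(Health actual, Health wanted) {
        return actual.ordinal() <= wanted.ordinal();
    }

    // Polls the probe until the wanted status is reached or the timeout elapses,
    // and returns the last status that was observed. It never throws on timeout.
    static Health waitFor(Supplier<Health> probe, Health wanted, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        Health last = probe.get();
        while (!atLeast(last, wanted) && deadline - System.nanoTime() > 0) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop waiting
                break;
            }
            last = probe.get();
        }
        return last;
    }

    // The explicit check the caller has to perform after waiting.
    static void requireStatus(Health actual, Health wanted) {
        if (!atLeast(actual, wanted)) {
            throw new IllegalStateException("Index is in " + actual + " state");
        }
    }
}
```

The point of separating `waitFor` from `requireStatus` is the same as in the snippet above: waiting only returns what was observed, and turning an unexpected status into a failure is the caller's decision.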