ElasticSearch Application Development (6): TransportClient Java API (2)

腊八粥2018

Article link: https://blog.csdn.net/HuoqilinHeiqiji/article/details/88663946

Continuing from the previous post, this article goes on with the TransportClient Java API.

4.Aggregations

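Aggregations in Elasticsearch searches play much the same role as the aggregate constructs of a relational database such as MySQL: COUNT(), GROUP BY, MAX, MIN, AVG and so on.

Aggregations rest on two key concepts:

1. Bucket: groups the data, much like GROUP BY;

2. Metric: computes statistics (for example max, min, avg) over the data in each group.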
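To make the bucket/metric split concrete, here is a minimal plain-Java sketch that does not touch Elasticsearch at all. The `Person` class and its fields are hypothetical; the grouping step stands in for a bucket and the average stands in for a metric.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class BucketMetricSketch {

    /** Hypothetical document with only the two fields this sketch needs. */
    static final class Person {
        private final String country;
        private final int children;

        Person(String country, int children) {
            this.country = country;
            this.children = children;
        }

        String getCountry() { return country; }
        int getChildren() { return children; }
    }

    /** Bucket step: group by country. Metric step: average "children" per group. */
    static Map<String, Double> avgChildrenByCountry(List<Person> people) {
        return people.stream().collect(Collectors.groupingBy(
                Person::getCountry,                              // bucket key, like GROUP BY country
                TreeMap::new,                                    // sorted keys for stable output
                Collectors.averagingInt(Person::getChildren)));  // metric, like AVG(children)
    }

    /** Small fixed data set used by main. */
    static Map<String, Double> demo() {
        List<Person> people = Arrays.asList(
                new Person("CN", 1), new Person("CN", 2), new Person("US", 3));
        return avgChildrenByCountry(people);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In Elasticsearch the same two steps run on the cluster rather than in application memory, which is the point of using aggregations.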

4.1 Structuring aggregations

For example, here is a three-level aggregation (aggregations can be nested) made up of:

Terms aggregation (bucket)
Date Histogram aggregation (bucket)
Average aggregation (metric)

SearchResponse sr = client.prepareSearch()
				.addAggregation(AggregationBuilders
						.terms("by_country")
						.field("country")
						.subAggregation(AggregationBuilders
								.dateHistogram("by_year")
								.field("dateOfBirth")
								.dateHistogramInterval(DateHistogramInterval.YEAR)
								.subAggregation(AggregationBuilders
										.avg("avg_children")
										.field("children"))
						)
				)
				.execute().actionGet();

Aggregations can be metric aggregations or bucket aggregations.
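The shape of the result produced by the three-level aggregation above (country, then year, then average number of children) can be pictured as a map of maps. The following is a plain-Java sketch under that assumption. The `Person` class is hypothetical, its field names only mirror those used in the request, and `birthYear` stands in for the yearly bucket that the date histogram derives from `dateOfBirth`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class NestedAggregationSketch {

    /** Hypothetical document; birthYear plays the role of the yearly date histogram key. */
    static final class Person {
        final String country;
        final int birthYear;
        final int children;

        Person(String country, int birthYear, int children) {
            this.country = country;
            this.birthYear = birthYear;
            this.children = children;
        }
    }

    /** country bucket -> year bucket -> avg(children) metric. */
    static Map<String, Map<Integer, Double>> avgChildrenByCountryAndYear(List<Person> people) {
        // First pass: accumulate {sum, count} per (country, year) pair.
        Map<String, Map<Integer, long[]>> acc = new TreeMap<>();
        for (Person p : people) {
            long[] sumAndCount = acc
                    .computeIfAbsent(p.country, c -> new TreeMap<>())
                    .computeIfAbsent(p.birthYear, y -> new long[2]);
            sumAndCount[0] += p.children;
            sumAndCount[1]++;
        }
        // Second pass: turn each {sum, count} into an average.
        Map<String, Map<Integer, Double>> result = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, long[]>> byCountry : acc.entrySet()) {
            Map<Integer, Double> byYear = new TreeMap<>();
            for (Map.Entry<Integer, long[]> e : byCountry.getValue().entrySet()) {
                byYear.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
            }
            result.put(byCountry.getKey(), byYear);
        }
        return result;
    }

    /** Small fixed data set used by main. */
    static Map<String, Map<Integer, Double>> demo() {
        return avgChildrenByCountryAndYear(Arrays.asList(
                new Person("CN", 1980, 1),
                new Person("CN", 1980, 3),
                new Person("CN", 1990, 2),
                new Person("US", 1980, 4)));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```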

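4.2 Metrics aggregations

The examples are all fairly simple. Pick the aggregation that fits your business requirement; the official API documentation covers each one in more detail.

Metrics aggregations include, among others, the Stats Aggregation, a multi-value metrics aggregation that returns min, max, sum, count and avg.

The Stats Aggregation is used as the example here: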

	/**
	 * Metrics aggregation: stats over the "height" field
	 */
	public void metricAgg() {

		StatsAggregationBuilder aggregation =
				AggregationBuilders
						/* aggregation name, used below to look up the result */
						.stats("agg")
						.field("height");

		SearchResponse searchResponse = client.prepareSearch()
				.addAggregation(aggregation)
				.execute()
				.actionGet();

		Stats agg = searchResponse.getAggregations().get("agg");
		double min = agg.getMin();
		double max = agg.getMax();
		double avg = agg.getAvg();
		double sum = agg.getSum();
		long count = agg.getCount();

	}

Extended Stats Aggregation

An extension of the Stats Aggregation that additionally returns sum_of_squares (sum of squares), variance, std_deviation (standard deviation) and std_deviation_bounds (standard deviation bounds).
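How the extra values relate to one another is easier to see in code. The following is a plain-Java sketch, not the Elasticsearch implementation. It assumes the population formulas (variance = sum_of_squares / count - avg²) and bounds of avg ± sigma × std_deviation, and the class and method names are made up for illustration.

```java
final class ExtendedStatsSketch {
    final long count;
    final double sum;
    final double sumOfSquares;

    private ExtendedStatsSketch(long count, double sum, double sumOfSquares) {
        this.count = count;
        this.sum = sum;
        this.sumOfSquares = sumOfSquares;
    }

    /** Single pass over the values, keeping only count, sum and sum of squares. */
    static ExtendedStatsSketch of(double... values) {
        double s = 0.0;
        double sq = 0.0;
        for (double v : values) {
            s += v;
            sq += v * v;
        }
        return new ExtendedStatsSketch(values.length, s, sq);
    }

    double avg() {
        return sum / count; // NaN when there are no values
    }

    /** Population variance E[x^2] - (E[x])^2, clamped at 0 to absorb rounding error. */
    double variance() {
        double mean = avg();
        return Math.max(0.0, sumOfSquares / count - mean * mean);
    }

    double stdDeviation() {
        return Math.sqrt(variance());
    }

    /** Upper bound: avg + sigma * std_deviation (assumed formula). */
    double upperBound(double sigma) {
        return avg() + sigma * stdDeviation();
    }

    /** Lower bound: avg - sigma * std_deviation (assumed formula). */
    double lowerBound(double sigma) {
        return avg() - sigma * stdDeviation();
    }

    public static void main(String[] args) {
        ExtendedStatsSketch s = ExtendedStatsSketch.of(2, 4, 4, 4, 5, 5, 7, 9);
        System.out.println("variance=" + s.variance() + ", std_deviation=" + s.stdDeviation());
        System.out.println("bounds=[" + s.lowerBound(2) + ", " + s.upperBound(2) + "]");
    }
}
```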

4.3 Bucket aggregations

The official site has many examples; most of them are methods of the AggregationBuilders class.

Here is an example for the Filters Aggregation:

/**
 * 4.3 Bucket aggregations
 */
public void bucketAgg() {
	AggregationBuilder aggregation =
			AggregationBuilders
					.filters("agg",
							new FiltersAggregator.KeyedFilter("men", QueryBuilders.termQuery("gender", "male")),
							new FiltersAggregator.KeyedFilter("women", QueryBuilders.termQuery("gender", "female")));


	SearchResponse searchResponse = client.prepareSearch()
			.addAggregation(aggregation)
			.execute()
			.actionGet();


	// searchResponse is here your SearchResponse object
	Filters agg = searchResponse.getAggregations().get("agg");

	// For each entry
	for (Filters.Bucket entry : agg.getBuckets()) {
		// bucket key
		String key = entry.getKeyAsString();
		// Doc count
		long docCount = entry.getDocCount();
		System.out.println("Key: " + key + ", docCount: " + docCount);
	}
}
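What the Filters Aggregation computes can be sketched in plain Java: every named filter defines one bucket, and each bucket counts the documents matching its filter. Everything below is hypothetical illustration code (documents are simple maps, and `term` mimics an exact-value term filter); it does not call Elasticsearch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class FiltersBucketSketch {

    /** Exact-match filter on one field, standing in for a term query. */
    static Predicate<Map<String, String>> term(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    /** One bucket per named filter; a document may land in several buckets or in none. */
    static Map<String, Long> countPerFilter(List<Map<String, String>> docs,
                                            Map<String, Predicate<Map<String, String>>> filters) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Predicate<Map<String, String>>> filter : filters.entrySet()) {
            long matches = 0;
            for (Map<String, String> doc : docs) {
                if (filter.getValue().test(doc)) {
                    matches++;
                }
            }
            counts.put(filter.getKey(), matches);
        }
        return counts;
    }

    /** Builds a document with an optional "gender" field. */
    private static Map<String, String> doc(String gender) {
        Map<String, String> d = new HashMap<>();
        if (gender != null) {
            d.put("gender", gender);
        }
        return d;
    }

    /** Small fixed data set used by main: two men, one woman, one document without gender. */
    static Map<String, Long> demo() {
        List<Map<String, String>> docs =
                Arrays.asList(doc("male"), doc("female"), doc("male"), doc(null));
        Map<String, Predicate<Map<String, String>>> filters = new LinkedHashMap<>();
        filters.put("men", term("gender", "male"));
        filters.put("women", term("gender", "female"));
        return countPerFilter(docs, filters);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```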

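Note:

Aggregation searches involve a lot of computation. In a relational database such as MySQL, running many aggregations consumes substantial compute resources and can degrade the basic insert, delete, update and select operations.

Likewise, when the data volume is large, it is advisable not to run heavy computations on the Elasticsearch cluster.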

5.Query DSL

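The methods below are, for the most part, static methods of the org.elasticsearch.index.query.QueryBuilders class. They are straightforward to use; see the API reference, the official site or the source code for details.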

Match All Query

 QueryBuilder qb = QueryBuilders.matchAllQuery();

Full text queries

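The high-level full text queries are usually used for full text search on text fields, such as the body of an email. They understand how the queried field is analyzed and apply that field's analyzer (or search_analyzer) to the query string before executing.

match query

The standard query for full text search, including fuzzy matching and phrase or proximity queries.

multi_match query

The multi-field version of the match query.

common_terms query

A more specialized query that gives more preference to uncommon words.

query_string query

Supports the compact Lucene query string syntax, allowing AND | OR | NOT conditions and multi-field search within a single query string. For expert users only.

simple_query_string

A simpler, more robust version of the query_string syntax, suitable for exposing directly to users.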

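Term level queries

term query

Finds documents whose field contains the exact term specified.

terms query

Finds documents whose field contains any of the exact terms specified.

range query

Finds documents whose field value falls within the specified range.

exists query

Finds documents in which the specified field contains any non-null value.

prefix query

Finds documents whose field contains terms starting with the specified prefix.

wildcard query

Wildcard matching: (?) stands for exactly one character and (*) for any sequence of characters, including none.

regexp query

Regular expression matching.

fuzzy query

Fuzzy matching; similarity is measured by Levenshtein edit distance.

type query

Finds documents of the specified type.

ids query

Finds documents by a set of IDs.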
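Two of the term level queries above rest on small algorithms that are easy to show in isolation. The following is a plain-Java sketch, not the Lucene implementation: `wildcardMatches` translates the (?) and (*) wildcards into a regular expression and matches the whole term, and `levenshtein` computes the classic edit distance named for the fuzzy query. Both method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class TermLevelMatchSketch {

    /** Whole-term wildcard match: '?' is exactly one character, '*' is any run of characters. */
    static boolean wildcardMatches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                // Quote everything else so regex metacharacters stay literal.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(term).matches();
    }

    /** Classic Levenshtein distance: minimum number of single-character inserts, deletes and substitutions. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or keep
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(wildcardMatches("ki*y", "kimchy"));
        System.out.println(levenshtein("kitten", "sitting"));
    }
}
```

A fuzzy match then amounts to accepting terms whose distance from the query term stays within a small maximum.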

6.Java API Administration

Elasticsearch provides a Java API for administrative tasks.

To use it, call admin() on the client to obtain an AdminClient:

AdminClient adminClient = client.admin();

6.1 Indices Administration

	IndicesAdminClient indicesAdminClient = client.admin().indices();

	/*1. Create an index*/
	indicesAdminClient.prepareCreate("twitter").get();

	/*2. Create an index with settings (an alternative to step 1: creating "twitter" again while it exists fails)*/
	indicesAdminClient.prepareCreate("twitter")
			.setSettings(Settings.builder()
					.put("index.number_of_shards", 3)
					.put("index.number_of_replicas", 2)
			)
			.get();

	/*3. Create an index with a mapping (again an alternative to steps 1 and 2)*/
	/*"text" is used here because the legacy "string" field type was split into text/keyword in Elasticsearch 5.0*/
	indicesAdminClient.prepareCreate("twitter")
			.addMapping("tweet", "{\n" +
					"    \"tweet\": {\n" +
					"      \"properties\": {\n" +
					"        \"message\": {\n" +
					"          \"type\": \"text\"\n" +
					"        }\n" +
					"      }\n" +
					"    }\n" +
					"  }")
			.get();

	/*Add a mapping to an index that already exists*/
	indicesAdminClient.preparePutMapping("twitter")
			.setType("user")
			.setSource("{\n" +
					"  \"properties\": {\n" +
					"    \"user_name\": {\n" +
					"      \"type\": \"text\"\n" +
					"    }\n" +
					"  }\n" +
					"}")
			.get();

	/*4. Refresh*/
	/*Refresh all indices*/
	indicesAdminClient.prepareRefresh().get();
	/*Refresh a single index*/
	indicesAdminClient
			.prepareRefresh("twitter")
			.get();
	/*Refresh several indices*/
	indicesAdminClient
			.prepareRefresh("twitter", "company")
			.get();

	/*5.Get Settings*/
	GetSettingsResponse response = client.admin().indices()
			.prepareGetSettings("company", "employee").get();
	for (ObjectObjectCursor<String, Settings> cursor : response.getIndexToSettings()) {
		String index = cursor.key;
		Settings settings = cursor.value;
		Integer shards = settings.getAsInt("index.number_of_shards", null);
		Integer replicas = settings.getAsInt("index.number_of_replicas", null);
	}

	/*6.Update Indices Settings*/
	indicesAdminClient.prepareUpdateSettings("twitter")
			.setSettings(Settings.builder()
					.put("index.number_of_replicas", 0)
			)
			.get();

6.2 Cluster Administration

The cluster health API returns the health status of the cluster and also reports technical information for each index in the cluster.


	ClusterAdminClient clusterAdminClient = client.admin().cluster();

	/*1. Cluster health*/

	/*Health of the whole cluster, covering all indices*/
	ClusterHealthResponse healths = clusterAdminClient.prepareHealth().get();
	String clusterName = healths.getClusterName();
	int numberOfDataNodes = healths.getNumberOfDataNodes();
	int numberOfNodes = healths.getNumberOfNodes();


	/*Iterate over the health of each index*/
	for (ClusterIndexHealth health : healths.getIndices().values()) {
		String index = health.getIndex();
		int numberOfShards = health.getNumberOfShards();
		int numberOfReplicas = health.getNumberOfReplicas();
		/*Health status of this index*/
		ClusterHealthStatus status = health.getStatus();
	}

The health API can also be used to wait until the cluster, or a given index, reaches a specific status.


	/*2.Wait for status*/
	clusterAdminClient.prepareHealth()
			.setWaitForYellowStatus()
			.get();

	clusterAdminClient.prepareHealth("company")
			.setWaitForGreenStatus()
			.get();

	clusterAdminClient.prepareHealth("employee")
			.setWaitForGreenStatus()
			.setTimeout(TimeValue.timeValueSeconds(2))
			.get();

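If the index does not reach the expected status and you want to fail in that case, you have to interpret the result explicitly and throw the exception yourself:

	/*3. Error handling*/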
	ClusterHealthResponse response = clusterAdminClient.prepareHealth("company")
			.setWaitForGreenStatus()
			.get();

	ClusterHealthStatus status = response.getIndices().get("company").getStatus();
	if (!status.equals(ClusterHealthStatus.GREEN)) {
		throw new RuntimeException("Index is in " + status + " state");
	}
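The "wait for a status, give up after a timeout, then check the result yourself" pattern can be sketched without Elasticsearch. Everything in the following block is hypothetical: the `Status` enum, the `Result` class and the `probe` supplier only stand in for the real health call. The point is that a timeout produces a normal return value with a flag set, so the caller still has to decide whether to fail.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class WaitForStatusSketch {

    /** Hypothetical health levels, declared from worst to best so compareTo orders them. */
    enum Status { RED, YELLOW, GREEN }

    /** Outcome of a wait: the last status seen and whether the timeout expired first. */
    static final class Result {
        final Status status;
        final boolean timedOut;

        Result(Status status, boolean timedOut) {
            this.status = status;
            this.timedOut = timedOut;
        }
    }

    /** Polls the probe until the status is at least as good as wanted, or the timeout expires. */
    static Result waitFor(Supplier<Status> probe, Status wanted, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            Status current = probe.get();
            if (current.compareTo(wanted) >= 0) {
                return new Result(current, false);
            }
            if (System.nanoTime() - deadline >= 0) {
                return new Result(current, true); // no exception: the caller must check
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return new Result(current, true);
            }
        }
    }

    public static void main(String[] args) {
        // A probe that never improves, so the wait ends by timeout.
        Result result = waitFor(() -> Status.YELLOW, Status.GREEN, 50, 5);
        if (result.timedOut || result.status != Status.GREEN) {
            System.out.println("Index is in " + result.status + " state");
        }
    }
}
```

With the real client, the equivalent check reads the status from the response as in the snippet above; ClusterHealthResponse also exposes isTimedOut() for the timeout case.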