elasticsearch-java应用-对数据进行聚合分析

3 篇文章 0 订阅
2 篇文章 0 订阅

【问题描述】:

需求:对员工信息进行聚合分析

1.先对员工按照国家country分组,然后再按入职年限join_date进行分组,分组后在计算组内平均薪资

员工数据如下:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1,
    "hits": [
      {
        "_index": "company",
        "_type": "employee",
        "_id": "5",
        "_score": 1,
        "_source": {
          "name": "mike",
          "age": 37,
          "position": "finance manager",
          "country": "usa",
          "join_date": "2015-01-01",
          "salary": 15000
        }
      },
      {
        "_index": "company",
        "_type": "employee",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "marry",
          "age": 35,
          "position": "technique manager",
          "country": "china",
          "join_date": "2017-01-01",
          "salary": 12000
        }
      },
      {
        "_index": "company",
        "_type": "employee",
        "_id": "4",
        "_score": 1,
        "_source": {
          "name": "jen",
          "age": 25,
          "position": "junior finance",
          "country": "usa",
          "join_date": "2016-01-01",
          "salary": 7000
        }
      },
      {
        "_index": "company",
        "_type": "employee",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "jack",
          "age": 27,
          "position": "technique",
          "country": "china",
          "join_date": "2017-01-01",
          "salary": 10000
        }
      },
      {
        "_index": "company",
        "_type": "employee",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "tom",
          "age": 32,
          "position": "senior technique software",
          "country": "china",
          "join_date": "2016-01-01",
          "salary": 11000
        }
      }
    ]
  }
}
java 学习视频中使用的是5.2版本,代码如下


《图1》

no modules loaded
loaded plugin [org.elasticsearch.index.reindex.ReindexPlugin]
loaded plugin [org.elasticsearch.join.ParentJoinPlugin]
loaded plugin [org.elasticsearch.percolator.PercolatorPlugin]
loaded plugin [org.elasticsearch.script.mustache.MustachePlugin]
loaded plugin [org.elasticsearch.transport.Netty4Plugin]
Exception in thread "main" Failed to execute phase [query], all shards failed; shardFailures {[yWbGC_XwQsyJXKZD3kIDgQ][company][0]: RemoteTransportException[[yWbGC_X][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: IllegalArgumentException[Fielddata is disabled on text fields by default. Set fielddata=true on [country] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.]; }{[yWbGC_XwQsyJXKZD3kIDgQ][company][1]: RemoteTransportException[[yWbGC_X][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: IllegalArgumentException[Fielddata is disabled on text fields by default. Set fielddata=true on [country] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.]; }{[yWbGC_XwQsyJXKZD3kIDgQ][company][2]: RemoteTransportException[[yWbGC_X][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: IllegalArgumentException[Fielddata is disabled on text fields by default. Set fielddata=true on [country] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.]; }{[yWbGC_XwQsyJXKZD3kIDgQ][company][3]: RemoteTransportException[[yWbGC_X][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: IllegalArgumentException[Fielddata is disabled on text fields by default. Set fielddata=true on [country] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.]; }{[yWbGC_XwQsyJXKZD3kIDgQ][company][4]: RemoteTransportException[[yWbGC_X][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: IllegalArgumentException[Fielddata is disabled on text fields by default. Set fielddata=true on [country] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.]; }
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:274)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:132)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:243)
	at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:107)
	at org.elasticsearch.action.search.InitialSearchPhase.access$100(InitialSearchPhase.java:49)
	at org.elasticsearch.action.search.InitialSearchPhase$2.lambda$onFailure$1(InitialSearchPhase.java:217)
	at org.elasticsearch.action.search.InitialSearchPhase.maybeFork(InitialSearchPhase.java:171)
	at org.elasticsearch.action.search.InitialSearchPhase.access$000(InitialSearchPhase.java:49)
	at org.elasticsearch.action.search.InitialSearchPhase$2.onFailure(InitialSearchPhase.java:217)
	at org.elasticsearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:73)
	at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:51)
	at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:527)
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1098)
	at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1191)
	at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1175)
	at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:66)
	at org.elasticsearch.action.search.SearchTransportService$6$1.onFailure(SearchTransportService.java:385)
	at org.elasticsearch.search.SearchService$2.onFailure(SearchService.java:324)
	at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:318)
	at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:312)
	at org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1002)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: NotSerializableExceptionWrapper[: Fielddata is disabled on text fields by default. Set fielddata=true on [country] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.]; nested: IllegalArgumentException[Fielddata is disabled on text fields by default. Set fielddata=true on [country] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.];
	at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:619)
	at org.elasticsearch.action.search.SearchPhaseExecutionException.guessRootCauses(SearchPhaseExecutionException.java:170)
	at org.elasticsearch.action.search.SearchPhaseExecutionException.getCause(SearchPhaseExecutionException.java:111)
	at org.elasticsearch.ElasticsearchException.writeTo(ElasticsearchException.java:286)
	at org.elasticsearch.action.search.SearchPhaseExecutionException.writeTo(SearchPhaseExecutionException.java:61)
	at org.elasticsearch.common.io.stream.StreamOutput.writeException(StreamOutput.java:864)
	at org.elasticsearch.ElasticsearchException.writeTo(ElasticsearchException.java:286)
	at org.elasticsearch.transport.ActionTransportException.writeTo(ActionTransportException.java:59)
	at org.elasticsearch.common.io.stream.StreamOutput.writeException(StreamOutput.java:864)
	at org.elasticsearch.transport.TcpTransport.sendErrorResponse(TcpTransport.java:1151)
	at org.elasticsearch.transport.TcpTransportChannel.sendResponse(TcpTransportChannel.java:71)
	at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:66)
	at org.elasticsearch.action.support.HandledTransportAction$TransportHandler$1.onFailure(HandledTransportAction.java:92)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.raisePhaseFailure(AbstractSearchAsyncAction.java:222)
	... 28 more
Caused by: java.lang.IllegalArgumentException: Fielddata is disabled on text fields by default. Set fielddata=true on [country] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.
	at org.elasticsearch.index.mapper.TextFieldMapper$TextFieldType.fielddataBuilder(TextFieldMapper.java:301)
	at org.elasticsearch.index.fielddata.IndexFieldDataService.getForField(IndexFieldDataService.java:115)
	at org.elasticsearch.index.query.QueryShardContext.getForField(QueryShardContext.java:165)
	at org.elasticsearch.search.aggregations.support.ValuesSourceConfig.resolve(ValuesSourceConfig.java:96)
	at org.elasticsearch.search.aggregations.support.ValuesSourceAggregationBuilder.resolveConfig(ValuesSourceAggregationBuilder.java:294)
	at org.elasticsearch.search.aggregations.support.ValuesSourceAggregationBuilder.doBuild(ValuesSourceAggregationBuilder.java:287)
	at org.elasticsearch.search.aggregations.support.ValuesSourceAggregationBuilder.doBuild(ValuesSourceAggregationBuilder.java:36)
	at org.elasticsearch.search.aggregations.AbstractAggregationBuilder.build(AbstractAggregationBuilder.java:132)
	at org.elasticsearch.search.aggregations.AggregatorFactories$Builder.build(AggregatorFactories.java:329)
	at org.elasticsearch.search.SearchService.parseSource(SearchService.java:749)
	at org.elasticsearch.search.SearchService.createContext(SearchService.java:558)
	at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:534)
	at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:330)
	at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:316)
	... 9 more

Process finished with exit code 1

《日志1》


【定为分析】:

1.报错信息如《图1》,考虑原因:a.导错包,b.es版本不兼容

a.比对后导包正常

b.查看es的jar包源码如下:

elasticsearch-5.2.2.jar如下图



而elasticsearch-6.0.1.jar如下图:


getBuckets方法在6.x版本中获取的集合对象为Bucket的子类,使用强转将类型转化;

while(groupByCountryBucketInterator.hasNext()) {
            StringTerms.Bucket groupByCountryBucket = groupByCountryBucketInterator.next();
            System.out.println(groupByCountryBucket.getKey()+":"+groupByCountryBucket.getDocCount());

            Histogram groupByJoinDate = (Histogram)groupByCountryBucket.getAggregations().asMap().get("group_by_join_date");
//            Iterator<org.elasticsearch.search.aggregations.bucket.histogram.Histogram.Bucket> groupByJoinDateBucketIterator = groupByJoinDate.getBuckets().iterator();
            Iterator<Histogram.Bucket> groupByJoinDateBucketIterator = (Iterator<Histogram.Bucket>)groupByJoinDate.getBuckets().iterator();
            while(groupByJoinDateBucketIterator.hasNext()) {
                Histogram.Bucket groupByJoinDateBucket = groupByJoinDateBucketIterator.next();
                System.out.println(groupByJoinDateBucket.getKey() + ":" +groupByJoinDateBucket.getDocCount());

                Avg avg = (Avg) groupByJoinDateBucket.getAggregations().asMap().get("avg_salary");
                System.out.println(avg.getValue());
            }
        }

b.2 编译通过后,运行程序,报错日志如【问题描述】中的《日志1》:

text类型的数据会按照倒排索引的方式进行分词,官方文档解释该字段会被加载到堆中,消耗大量的堆空间,导致用户体验延迟,所以默认情况下禁用fielddata字段,默认值为false;

但是对此文本字段上的脚本进行排序、聚合或访问值时会出现:Fielddata is disabled on text fields by default 异常

即若使用聚合,需设置此字段的fielddata值为true;

在5.2版本的es中mapping创建后,不能修改_mapping的fielddata值。在6.x版本中可以进行修改,方式如下:



【问题根因】:

1.es的jar包版本不一样,api差异;

2.es版本差异;

【解决方案】:

getBuckets方法在6.x版本中获取的集合对象为Bucket的子类,使用强转将类型转化;

while(groupByCountryBucketInterator.hasNext()) {
            StringTerms.Bucket groupByCountryBucket = groupByCountryBucketInterator.next();
            System.out.println(groupByCountryBucket.getKey()+":"+groupByCountryBucket.getDocCount());

            Histogram groupByJoinDate = (Histogram)groupByCountryBucket.getAggregations().asMap().get("group_by_join_date");
//            Iterator<org.elasticsearch.search.aggregations.bucket.histogram.Histogram.Bucket> groupByJoinDateBucketIterator = groupByJoinDate.getBuckets().iterator();
            Iterator<Histogram.Bucket> groupByJoinDateBucketIterator = (Iterator<Histogram.Bucket>)groupByJoinDate.getBuckets().iterator();
            while(groupByJoinDateBucketIterator.hasNext()) {
                Histogram.Bucket groupByJoinDateBucket = groupByJoinDateBucketIterator.next();
                System.out.println(groupByJoinDateBucket.getKey() + ":" +groupByJoinDateBucket.getDocCount());

                Avg avg = (Avg) groupByJoinDateBucket.getAggregations().asMap().get("avg_salary");
                System.out.println(avg.getValue());
            }
        }

设置country字段的fielddata值为true;



【归纳总结】:

es版本之间的差异较大,在语法及应用时需根据当前版本随机应变,了解各个版本为解决哪些问题做了哪些修改;


  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值