Group-by accumulation of Chinese fields: MR (map-reduce) in Elasticsearch

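This article describes how to use Scripted Metric Aggregation in Elasticsearch 2.2.0 for complex aggregations, using scripts to compute a user liveness (activity) score. The author walks through the whole process of configuring Groovy scripts, creating the index, inserting data, and running the query scripts, points out the incomplete documentation encountered in practice, and offers suggestions for debugging the scripts.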

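Elasticsearch is something you both love and hate: it is fast and feature-rich, yet you keep running into problems when using it (mostly because the documentation is so poor).
This post records how to use Scripted Metric Aggregation (that is, the awesome map-reduce) in ES 2.2.0.
There is official API documentation, but it is not detailed, and after reading it you still cannot get much done. There is also the Java API documentation.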

Below is the complete hands-on content and code (sensitive details removed). The goal is to derive a liveness score from behavior logs.

  • Add the following settings to elasticsearch.yml to enable Groovy scripts

    script.engine.groovy.file.aggs: true
    script.engine.groovy.file.mapping: true
    script.engine.groovy.file.search: true
    script.engine.groovy.file.update: true
    script.engine.groovy.file.plugin: true
    script.engine.groovy.indexed.aggs: true
    script.engine.groovy.indexed.mapping: false
    script.engine.groovy.indexed.search: true
    script.engine.groovy.indexed.update: false
    script.engine.groovy.indexed.plugin: false
    script.engine.groovy.inline.aggs: true
    script.engine.groovy.inline.mapping: true
    script.engine.groovy.inline.search: true
    script.engine.groovy.inline.update: true
    script.engine.groovy.inline.plugin: true

  • This is the data stored in ES

    curl -XPUT "http://10.1.200.34:9200/behavior-2017.02/candidate/AVoG9St-6pLzqkumYcIr" -d '{
      "businessLine": "platform",
      "createTime": "2017-02-04T02:30:14.000Z",
      "latitude": 0,
      "longitude": 0,
      "name": "c_login",
      "network": "unknown",
      "ownerId": 6403128,
      "ownerType": "candidate",
      "params": { "positionId": "2112620" },
      "uuid": "a36556ed286348aeb970e0ba1cda1447"
    }'

    curl -XPUT "http://10.1.200.34:9200/behavior-2017.02/candidate/AVoG9SgJP_y-H6mvM9g8" -d '{
      "businessLine": "platform",
      "clientIp": "*3.1*8.113.*6",
      "createTime": "2017-02-04T02:30:13.683Z",
      "latitude": 0,
      "longitude": 0,
      "name": "c_login",
      "network": "unknown",
      "ownerId": 6403118,
      "ownerType": "candidate",
      "terminal": "pc",
      "userAgent": "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36",
      "uuid": "0d9b007e3d624180aefd57e7df0b656c"
    }'

    curl -XPUT "http://10.1.200.34:9200/behavior-2017.02/candidate/AVoG9SgJP_y-H6mvM9g1" -d '{
      "businessLine": "platform",
      "clientIp": "*23.*8.*3.1*",
      "createTime": "2017-02-04T02:30:13.683Z",
      "latitude": 0,
      "longitude": 0,
      "name": "c_register",
      "network": "unknown",
      "ownerId": 6403127,
      "ownerType": "candidate",
      "terminal": "pc",
      "userAgent": "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36",
      "uuid": "0d9b007e3d624180aefd57e7df0b656c"
    }'

  • This is the query. It groups by ownerId and aggregates the behavior data for each group.

    curl -XGET "http://10.1.200.34:9200/behavior*/_search/?pretty" -d '{
      "aggs": {
        "group_by_ownerId": {
          "terms": { "field": "ownerId" },
          "aggs": {
            "livenessScore": {
              "scripted_metric": {
                "init_script": { "file": "user-liveness-score-init" },
                "map_script": { "file": "user-liveness-score-map" },
                "combine_script": { "file": "user-liveness-score-combine" },
                "reduce_script": { "file": "user-liveness-score-reduce" },
                "params": {
                  "_agg": { "resumeScore": { "6403127": 60 } }
                }
              }
            }
          }
        }
      },
      "query": { "filtered": { "query": { "match_all": {} } } },
      "fields": [ "ownerId" ]
    }'

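  • The scripts and where they are stored

```shell
[root@jiqi001 scripts]# pwd
/apps/elasticsearch/config/scripts
[root@jiqi001 scripts]# ls
user-liveness-score-combine.groovy  user-liveness-score-init.groovy  user-liveness-score-map.groovy  user-liveness-score-reduce.groovy
```

user-liveness-score-init.groovy:

```groovy
// Initialise the per-shard counters.
_agg.loginScoreInWeek = 0;
_agg.loginScoreInMonth = 0;
_agg.registerScoreInWeek = 0;
_agg.registerScoreInMonth = 0;
```

user-liveness-score-map.groovy:

```groovy
// Age of the behavior in days; .value yields the date field as epoch milliseconds.
xDaysBefore = Math.round((new Date().getTime() - doc.createTime.value) / 1000 / 60 / 60 / 24);
behaviorName = doc.name.value;
// resumeScore is passed in through params._agg, keyed by ownerId as a string.
resumeScore = _agg.resumeScore.get(String.valueOf(doc.ownerId.value));

if (behaviorName.equals("c_login")) {
    if (xDaysBefore <= 7) {
        // Logins in the last week: 2 points each, up to 4.
        if (_agg.loginScoreInWeek < 4) {
            _agg.loginScoreInWeek += 2;
        }
    } else if (7 < xDaysBefore && xDaysBefore <= 30) {
        // Logins in the last month: 1 point each, up to 2.
        if (_agg.loginScoreInMonth < 2) {
            _agg.loginScoreInMonth += 1;
        }
    }
} else if (behaviorName.equals("c_register")) {
    // Registration only counts when the resume score is above 30.
    if (resumeScore != null && resumeScore > 30) {
        if (xDaysBefore <= 7) {
            if (_agg.registerScoreInWeek == 0) {
                _agg.registerScoreInWeek = 5;
            }
        } else if (7 < xDaysBefore && xDaysBefore <= 30) {
            if (_agg.registerScoreInMonth == 0) {
                _agg.registerScoreInMonth = 3;
            }
        }
    }
};
```

user-liveness-score-combine.groovy:

```groovy
_agg
```

user-liveness-score-reduce.groovy:

```groovy
double score = 0;
loginScoreInWeek = 0;
loginScoreInMonth = 0;
registerScoreInWeek = 0;
registerScoreInMonth = 0;

// _aggs holds one combined result per shard; sum them up to each cap.
for (a in _aggs) {
    if (loginScoreInWeek < 4) {
        loginScoreInWeek += a.get("loginScoreInWeek");
    };
    if (loginScoreInMonth < 2) {
        loginScoreInMonth += a.get("loginScoreInMonth");
    };
    if (registerScoreInWeek < 5) {
        registerScoreInWeek += a.get("registerScoreInWeek");
    };
    if (registerScoreInMonth < 3) {
        registerScoreInMonth += a.get("registerScoreInMonth");
    };
};

// Clamp each component, since the last addition may overshoot its cap.
if (loginScoreInWeek > 4) {
    loginScoreInWeek = 4;
};
if (loginScoreInMonth > 2) {
    loginScoreInMonth = 2;
};
if (registerScoreInWeek > 5) {
    registerScoreInWeek = 5;
};
if (registerScoreInMonth > 3) {
    registerScoreInMonth = 3;
};

score += loginScoreInWeek;
score += loginScoreInMonth;
score += registerScoreInWeek;
score += registerScoreInMonth;
return score;
```

  • Java code

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermsQueryBuilder;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptService;
import org.elasticsearch.search.aggregations.Aggregation;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.LongTerms;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.metrics.scripted.InternalScriptedMetric;
import org.joda.time.DateTime;

// The original listing shows raw types; the type parameters below are inferred
// from usage. eSClient is a field of the enclosing class (not shown).
private Map<String, Double> getUsersLiveScore(Map<String, Integer> userAndResumeScore)
        throws InterruptedException, ExecutionException {
    Map<String, Double> userAndLivenessScore = new HashMap<>();

    // Script parameters have to be nested under "_agg" (see Tips).
    Map<String, Object> param = new HashMap<>();
    param.put("resumeScore", userAndResumeScore);
    Map<String, Object> params = new HashMap<>();
    params.put("_agg", param);

    Client client = eSClient.getClient();
    AggregationBuilder aggregation = AggregationBuilders.terms("group_by_ownerId")
            .field("ownerId")
            .subAggregation(
                    AggregationBuilders.scriptedMetric("livenessScore")
                            .params(params)
                            .initScript(new Script("user-liveness-score-init", ScriptService.ScriptType.FILE, "groovy", null))
                            .mapScript(new Script("user-liveness-score-map", ScriptService.ScriptType.FILE, "groovy", null))
                            .combineScript(new Script("user-liveness-score-combine", ScriptService.ScriptType.FILE, "groovy", null))
                            .reduceScript(new Script("user-liveness-score-reduce", ScriptService.ScriptType.FILE, "groovy", null))
            );

    // Only the requested users, and only behavior from the last 30 days.
    TermsQueryBuilder ownerId = QueryBuilders.termsQuery("ownerId", userAndResumeScore.keySet());
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.must(ownerId);
    boolQueryBuilder.must(QueryBuilders.rangeQuery("createTime").gte(DateTime.now().plusDays(-30).toDate()));

    SearchResponse response = client.prepareSearch("behavior-*")
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .setQuery(boolQueryBuilder)
            .addAggregation(aggregation)
            .setFrom(0)
            .setSize(3000)
            .addField("ownerId")
            .execute()
            .get();

    // One terms bucket per ownerId; each bucket carries the reduced score.
    for (Aggregation agg : response.getAggregations()) {
        List<Terms.Bucket> buckets = ((LongTerms) agg).getBuckets();
        for (Terms.Bucket bucket : buckets) {
            String userId = bucket.getKeyAsString();
            for (Aggregation agg2 : bucket.getAggregations()) {
                double score = (double) ((InternalScriptedMetric) agg2).aggregation();
                userAndLivenessScore.put(userId, score);
            }
        }
    }
    return userAndLivenessScore;
}
```

  • Tips

1. Parameters passed in through params must be placed inside the _agg variable.
2. You can use "combine_script": "_agg;" and "reduce_script": "_aggs;" to debug the scripts.
3. The combine results are grouped per shard; they are not a combination over the whole query result. For example, if a query against one index finds results on 5 shards, an array of length 5 is returned.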
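
To make tip 3 more concrete, here is a minimal plain-Java sketch of how the four scripts might cooperate for a single ownerId bucket. It does not use Elasticsearch at all: the class `ScriptedMetricSketch`, the `Event` type, and the helpers `runShard` and `demoScore` are hypothetical names made up for illustration, and the shard layout is an assumption. The scoring rules mirror the Groovy map and reduce scripts above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ScriptedMetricSketch {

    /** Hypothetical stand-in for one behavior document (name + age in days). */
    static final class Event {
        final String name;
        final long daysBefore;

        Event(String name, long daysBefore) {
            this.name = name;
            this.daysBefore = daysBefore;
        }
    }

    // Mirrors init_script: a fresh set of counters for each shard.
    static Map<String, Integer> init() {
        Map<String, Integer> agg = new HashMap<>();
        agg.put("loginScoreInWeek", 0);
        agg.put("loginScoreInMonth", 0);
        agg.put("registerScoreInWeek", 0);
        agg.put("registerScoreInMonth", 0);
        return agg;
    }

    // Mirrors map_script: called once per document, mutating the shard state.
    static void map(Map<String, Integer> agg, Event e, Integer resumeScore) {
        boolean inWeek = e.daysBefore <= 7;
        boolean inMonth = e.daysBefore > 7 && e.daysBefore <= 30;
        if ("c_login".equals(e.name)) {
            if (inWeek && agg.get("loginScoreInWeek") < 4) {
                agg.merge("loginScoreInWeek", 2, Integer::sum);
            } else if (inMonth && agg.get("loginScoreInMonth") < 2) {
                agg.merge("loginScoreInMonth", 1, Integer::sum);
            }
        } else if ("c_register".equals(e.name) && resumeScore != null && resumeScore > 30) {
            if (inWeek && agg.get("registerScoreInWeek") == 0) {
                agg.put("registerScoreInWeek", 5);
            } else if (inMonth && agg.get("registerScoreInMonth") == 0) {
                agg.put("registerScoreInMonth", 3);
            }
        }
    }

    // init + map over one shard's documents; combine_script just returns _agg.
    static Map<String, Integer> runShard(List<Event> docs, Integer resumeScore) {
        Map<String, Integer> agg = init();
        for (Event e : docs) {
            map(agg, e, resumeScore);
        }
        return agg;
    }

    // Mirrors reduce_script: one entry per shard, summed and capped.
    // Sum-then-cap gives the same result as the Groovy version for non-negative values.
    static double reduce(List<Map<String, Integer>> aggs) {
        int loginWeek = 0, loginMonth = 0, registerWeek = 0, registerMonth = 0;
        for (Map<String, Integer> a : aggs) {
            loginWeek += a.get("loginScoreInWeek");
            loginMonth += a.get("loginScoreInMonth");
            registerWeek += a.get("registerScoreInWeek");
            registerMonth += a.get("registerScoreInMonth");
        }
        return Math.min(loginWeek, 4) + Math.min(loginMonth, 2)
                + Math.min(registerWeek, 5) + Math.min(registerMonth, 3);
    }

    // Assumed layout: the same user's documents are spread over two shards.
    static double demoScore() {
        List<Event> shard1 = Arrays.asList(new Event("c_login", 0), new Event("c_login", 2));
        List<Event> shard2 = Arrays.asList(
                new Event("c_login", 1), new Event("c_register", 3), new Event("c_login", 20));
        List<Map<String, Integer>> aggs = new ArrayList<>();
        aggs.add(runShard(shard1, 60));
        aggs.add(runShard(shard2, 60));
        return reduce(aggs); // aggs has length 2: one combined state per shard
    }

    public static void main(String[] args) {
        System.out.println(demoScore());
    }
}
```

In this sketch each shard caps the weekly login score on its own (4 on the first shard, 2 on the second), so the reduce step has to cap the sum again; that is why the reduce script repeats the limits that the map script already applies.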