Background: a recent project required Druid's quantile aggregation queries. I won't go into what a quantile is here; that's easy to look up. Since I had never used the feature, and the histogram support is reportedly still among Druid's experimental features (so probably not fully mature), I decided to try it out myself and see what the results look like.
Use case: the approximate-histograms extension, combined with the quantile/quantiles post-aggregators, lets you query e.g. the 0.95/0.98/0.99 quantiles of page load time.
We are only a business-side consumer of Druid, so the service itself isn't managed by our team. Per the official docs, the extension has to be added first. The company runs Druid 0.9.2, while the latest release at the time of writing is 0.12.3. So the first step is:
Adding the extension
How to add it: check that druid-histogram exists under the {DRUID}/extensions directory.
Then druid-histogram needs to be added to the extension load list:
druid.extensions.loadList=["druid-histogram",.....]
The nodes must be restarted to pick up the newly added extension:
- On the query side, restart the historical and broker nodes.
- On the ingestion side, restart the overlord node.
With the extension loaded on the service side, the next step is data ingestion.
Following the official approximate-histograms docs: http://druid.io/docs/latest/development/extensions-core/approximate-histograms.html
Data ingestion
{ ...... "metricsSpec": [ ...... { "name": "pageLoad", "type": "longSum", "fieldName": "pageLoad" }, { "type" : "approxHistogramFold", "name" : "his_pageLoad", "fieldName" : "pageLoad", "resolution" : 50, "numBuckets" : 7, "lowerLimit" : 0.0, "upperLimit" : 10000000.0 } ...... ] ...... }
Here I only use the pageLoad field for the experiment, to compare Druid's sum versus quantile calculations over the same data.
Note the resolution, numBuckets, lowerLimit and upperLimit parameters; see the official docs for their exact meaning (roughly: resolution is the number of centroids the sketch stores, trading accuracy for speed; numBuckets is how many buckets the output histogram is rendered into; lowerLimit/upperLimit bound the approximated value range). I won't go into more detail here, and my settings above were honestly picked on a whim. Next come the queries:
Query scripts
Script 1: the sum:
{ "queryType":"timeseries", "dataSource":{ "type":"table", "name":"bpm_page_view" }, "context":{ "priority":7, "timeout":3000, "queryId":"f7d75164-2d53-44fe-8978-10742e102c3d" }, "intervals":{ "type":"LegacySegmentSpec", "intervals":[ "2018-11-26T15:26:10.773+08:00/2018-11-26T15:56:10.773+08:00" ] }, "descending":false, "filter":{ "type":"and", "fields":[ { "type":"selector", "dimension":"appCode", "value":"ec269367bf854639a56cb1618a097c38", "extractionFn":null } ] }, "granularity":{ "type":"duration", "duration":60000, "origin":"1970-01-01T08:00:00.000+08:00" }, "aggregations":[ { "type":"filtered", "aggregator":{ "type":"longSum", "name":"pageLoad", "fieldName":"pageLoad" }, "filter":{ "type":"and", "fields":[ { "type":"selector", "dimension":"terminal", "value":"IOS", "extractionFn":null } ] } } ], "postAggregations":null }
Script 2: the quantiles (here the 90th, 95th and 99th percentiles; the post-aggregator output is named 响应时间, i.e. "response time"):
{ "queryType":"timeseries", "dataSource":{ "type":"table", "name":"bpm_page_view" }, "context":{ "priority":7, "timeout":3000, "queryId":"f7d75164-2d53-44fe-8978-10742e102c3d" }, "intervals":{ "type":"LegacySegmentSpec", "intervals":[ "2018-11-26T15:26:10.773+08:00/2018-11-26T15:56:10.773+08:00" ] }, "descending":false, "filter":{ "type":"and", "fields":[ { "type":"selector", "dimension":"appCode", "value":"ec269367bf854639a56cb1618a097c38", "extractionFn":null } ] }, "granularity":{ "type":"duration", "duration":60000, "origin":"1970-01-01T08:00:00.000+08:00" }, "aggregations":[ { "type":"filtered", "aggregator":{ "type": "approxHistogramFold", "name": "his_pageLoad", "fieldName": "his_pageLoad", "resolution" : null, "numBuckets" : null }, "filter":{ "type":"and", "fields":[ { "type":"selector", "dimension":"terminal", "value":"IOS", "extractionFn":null } ] } } ], "postAggregations":[ { "type" : "quantiles", "name" : "响应时间", "fieldName" : "his_pageLoad","probabilities" : [0.9,0.95,0.99] } ] }
Analyzing the aggregation results
I send data to Kafka and let Druid ingest it. The Kafka producer code:
package com.suning.ctbpm;

// Note: this uses the legacy (pre-0.9) Scala-client producer API,
// matching the Kafka version we had on hand.
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

import java.util.Properties;

public class KafkaProducerSimple {
    public static void main(String[] args) {
        String topic = "xxxx";                        // placeholder topic name
        Properties props = new Properties();
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("metadata.broker.list", "xxxxxxx"); // placeholder broker list
        props.put("request.required.acks", "1");
        Producer<String, String> producer = new Producer<>(new ProducerConfig(props));
        String msg;
        for (int i = 1; i <= 200; i++) {
            int j = i;
            //if (i == 10) {
            //    j = 11;
            //}
            // Every message in a run shares the same logTime so Druid folds
            // the batch into one aggregation point; only pageLoad (= j) varies.
            msg = "{\n" +
                    "  \"access\":\"IE_10_0\",\n" +
                    "  \"apdexSign\":100,\n" +
                    "  \"appCode\":\"ec269367bf854639a56cb1618a097c38\",\n" +
                    "  \"area\":\"some-district\",\n" +
                    "  \"blankScreen\":11,\n" +
                    "  \"browser\":\"IE\",\n" +
                    "  \"browserVersion\":\"IE_10\",\n" +
                    "  \"cache\":30,\n" +
                    "  \"city\":\"some-city\",\n" +
                    "  \"country\":\"zh_CN\",\n" +
                    "  \"dns\":11,\n" +
                    "  \"domParser\":211,\n" +
                    "  \"domain\":\"xxx.xxx.com\",\n" +
                    "  \"firstAction\":110,\n" +
                    "  \"firstPacket\":44,\n" +
                    "  \"firstPaint\":20,\n" +
                    "  \"htmlLoad\":187,\n" +
                    "  \"ip\":\"10.200.181.61\",\n" +
                    "  \"keyPageCode\":[\n" +
                    "\n" +
                    "  ],\n" +
                    "  \"logTime\":1543221571000,\n" +
                    "  \"net\":116,\n" +
                    "  \"operator\":\"unknown\",\n" +
                    "  \"os\":\"iOS 10 (iPhone)\",\n" +
                    "  \"pageLoad\":" + j + ",\n" +
                    "  \"pageRef\":\"http://xxx.xxx.com/broadcast/matchBefore.html\",\n" +
                    "  \"pageRender\":769,\n" +
                    "  \"processing\":765,\n" +
                    "  \"province\":\"some-province\",\n" +
                    "  \"redirect\":10,\n" +
                    "  \"request\":44,\n" +
                    "  \"resourceLoad\":558,\n" +
                    "  \"response\":101,\n" +
                    "  \"restPacket\":101,\n" +
                    "  \"slowPageSign\":10,\n" +
                    "  \"ssl\":10,\n" +
                    "  \"stalled\":10,\n" +
                    "  \"tcp\":42,\n" +
                    "  \"terminal\":\"IOS\",\n" +
                    "  \"ua\":\"Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Mobile/14E304;PPTVSports\",\n" +
                    "  \"unload\":10,\n" +
                    "  \"version\":\"V1.0.7\",\n" +
                    "  \"visitId\":\"f7f2-7f8c760d\"\n" +
                    "}";
            KeyedMessage<String, String> record = new KeyedMessage<>(topic, msg);
            producer.send(record);
        }
        producer.close();
    }
}
The msg payload here is fabricated to match our own ingestion spec's business fields. Pay attention to the logTime field: to make the results easy to observe, every message in a run gets the same logTime so Druid aggregates them onto a single point, and the intervals in the query scripts must cover that logTime.
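To see which minute bucket a given logTime lands in, a quick throwaway conversion helps (1543233754000 is the value used in the first run below):

import java.time.Instant;
import java.time.ZoneId;

public class LogTimeCheck {
    public static void main(String[] args) {
        long logTime = 1543233754000L; // logTime used in the first run below
        // Convert epoch millis to the +08:00 zone used in the query intervals
        System.out.println(Instant.ofEpochMilli(logTime).atZone(ZoneId.of("Asia/Shanghai")));
        // -> 2018-11-26T20:02:34+08:00[Asia/Shanghai], i.e. 12:02:34 UTC, which is
        //    why the data shows up in the 2018-11-26T12:02:00.000Z minute bucket
    }
}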
First run: set logTime in msg to 1543233754000 and run the for loop 10 times, i.e. send ten messages with pageLoad from 1 to 10 to Kafka.
Run script 1 (with "intervals":["2018-11-26T19:34:03.205+08:00/2018-11-26T20:04:03.205+08:00"]):
[ { "timestamp": "2018-11-26T11:34:00.000Z", "result": { "pageLoad": 0 } }, ...... { "timestamp": "2018-11-26T12:01:00.000Z", "result": { "pageLoad": 0 } }, { "timestamp": "2018-11-26T12:02:00.000Z", "result": { "pageLoad": 55 } }, { "timestamp": "2018-11-26T12:03:00.000Z", "result": { "pageLoad": 0 } }, { "timestamp": "2018-11-26T12:04:00.000Z", "result": { "pageLoad": 0 } } ]
Run script 2 (with "intervals":["2018-11-26T19:34:03.205+08:00/2018-11-26T20:04:03.205+08:00"]):
[ { "timestamp": "2018-11-26T11:34:00.000Z", "result": { "his_pageLoad": { "breaks": [ "Infinity", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "-Infinity" ], "counts": [ "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN" ] }, "响应时间": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ "NaN", "NaN", "NaN" ], "min": "Infinity", "max": "-Infinity" } } }, ...... { "timestamp": "2018-11-26T12:02:00.000Z", "result": { "his_pageLoad": { "breaks": [ -0.5, 1, 2.5, 4, 5.5, 7, 8.5, 10 ], "counts": [ 1, 1, 2, 1, 2, 1, 2 ] }, "响应时间": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 9, 9.5, 9.9 ], "min": 1, "max": 10 } } }, ...... { "timestamp": "2018-11-26T12:04:00.000Z", "result": { "his_pageLoad": { "breaks": [ "Infinity", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "-Infinity" ], "counts": [ "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN" ] }, "响应时间": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ "NaN", "NaN", "NaN" ], "min": "Infinity", "max": "-Infinity" } } } ]
Comparing the two outputs: Druid aggregated my ten rows onto the point 2018-11-26T12:02:00.000Z (the data's logTime is 2018-11-26 20:02:34 local time). The ten values sum to 1+2+...+10=55, and TP90 is 9, TP95 is 9.5, TP99 is 9.9.
Second run: set logTime in msg to 1543234941000 and loop 10 times again, but this time set j to 7 when i=9, so the values are (1, 2, 3, 4, 5, 6, 7, 8, 7, 10). This is to verify that the values are sorted before the quantiles are taken.
Run script 1 (with "intervals":["2018-11-26T19:53:12.89+08:00/2018-11-26T20:23:12.891+08:00"]):
[ { "timestamp": "2018-11-26T11:53:00.000Z", "result": { "pageLoad": 0 } },
......
{
"timestamp": "2018-11-26T12:02:00.000Z",
"result": {
"pageLoad": 55
}
},
...... { "timestamp": "2018-11-26T12:22:00.000Z", "result": { "pageLoad": 53 } }, { "timestamp": "2018-11-26T12:23:00.000Z", "result": { "pageLoad": 0 } } ]
Run script 2 (with "intervals":["2018-11-26T19:53:12.89+08:00/2018-11-26T20:23:12.891+08:00"]):
[ { "timestamp": "2018-11-26T11:53:00.000Z", "result": { "his_pageLoad": { "breaks": [ "Infinity", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "-Infinity" ], "counts": [ "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN" ] }, "响应时间": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ "NaN", "NaN", "NaN" ], "min": "Infinity", "max": "-Infinity" } } }, ...... { "timestamp": "2018-11-26T12:02:00.000Z", "result": { "his_pageLoad": { "breaks": [ -0.5, 1, 2.5, 4, 5.5, 7, 8.5, 10 ], "counts": [ 1, 1, 2, 1, 2, 1, 2 ] }, "响应时间": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 9, 9.5, 9.9 ], "min": 1, "max": 10 } } }, ...... { "timestamp": "2018-11-26T12:22:00.000Z", "result": { "his_pageLoad": { "breaks": [ -0.5, 1, 2.5, 4, 5.5, 7, 8.5, 10 ], "counts": [ 1, 1, 2, 1, 3, 1, 1 ] }, "响应时间": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 8, 9, 9.799999 ], "min": 1, "max": 10 } } }, ...... ]
Comparing the two outputs: Druid aggregated these ten rows onto the point 2018-11-26T12:22:00.000Z (logTime is 2018-11-26 20:22:21 local time). The values sum to 53 (that's 55 with the 9 replaced by a 7), and TP90 is 8, TP95 is 9, TP99 is 9.799999. TP90 is 8 precisely because the data is sorted first: the 9th value in sorted order is 8. A naive cross-check of these numbers is sketched below.
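The following sketch interpolates at the (1-indexed) position p·n over the sorted values. This is just my reading of the behaviour inferred from the results, not Druid's documented algorithm, but it reproduces the quantiles of both runs (and of the 100- and 200-value runs below):

import java.util.Arrays;

public class QuantileCheck {
    // Naive percentile: linear interpolation at (1-indexed) position p * n
    // over the sorted values. An assumption inferred from the query results,
    // not Druid's documented algorithm.
    static double quantile(double[] sorted, double p) {
        double pos = p * sorted.length;              // e.g. 0.95 * 10 = 9.5
        int lower = (int) Math.floor(pos);
        if (lower < 1) return sorted[0];
        if (lower >= sorted.length) return sorted[sorted.length - 1];
        double frac = pos - lower;                   // fractional part, e.g. 0.5
        return sorted[lower - 1] + frac * (sorted[lower] - sorted[lower - 1]);
    }

    public static void main(String[] args) {
        double[] run1 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        double[] run2 = {1, 2, 3, 4, 5, 6, 7, 8, 7, 10};
        Arrays.sort(run2);                           // (1,2,3,4,5,6,7,7,8,10)
        for (double p : new double[]{0.9, 0.95, 0.99}) {
            System.out.printf("TP%.0f: run1 = %s, run2 = %s%n",
                    p * 100, quantile(run1, p), quantile(run2, p));
        }
        // expected: TP90 -> 9 and 8, TP95 -> 9.5 and 9,
        //           TP99 -> ~9.9 and ~9.8 (Druid reports 9.799999)
    }
}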
Let's keep going.
Third run: set logTime in msg to 1543235747000 and loop 100 times, i.e. send 100 messages with pageLoad from 1 to 100 to Kafka.
Run script 1 (with "intervals":["2018-11-26T20:07:05.311+08:00/2018-11-26T20:37:05.311+08:00"]):
[ ...... { "timestamp": "2018-11-26T12:22:00.000Z", "result": { "pageLoad": 53 } }, ...... { "timestamp": "2018-11-26T12:35:00.000Z", "result": { "pageLoad": 5050 } }, ...... ]
Run script 2 (with "intervals":["2018-11-26T20:07:05.311+08:00/2018-11-26T20:37:05.311+08:00"]):
[ ...... { "timestamp": "2018-11-26T12:22:00.000Z", "result": { "his_pageLoad": { "breaks": [ -0.5, 1, 2.5, 4, 5.5, 7, 8.5, 10 ], "counts": [ 1, 1, 2, 1, 3, 1, 1 ] }, "响应时间": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 8, 9, 9.799999 ], "min": 1, "max": 10 } } }, ...... { "timestamp": "2018-11-26T12:35:00.000Z", "result": { "his_pageLoad": { "breaks": [ -15.5, 1, 17.5, 34, 50.5, 67, 83.5, 100 ], "counts": [ 1, 16, 17, 16, 17, 16, 17 ] }, "响应时间": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 90, 95, 99 ], "min": 1, "max": 100 } } }, ...... ]
I'll leave the detailed analysis to you (in short: the sum is 1+2+...+100=5050, and TP90/TP95/TP99 come out as exactly 90, 95 and 99, which the interpolation sketch above also reproduces). On we go.
Fourth run: set logTime in msg to 1543236350000 and loop 200 times, i.e. send 200 messages with pageLoad from 1 to 200 to Kafka.
Run script 1 (with "intervals":["2018-11-26T20:16:40.524+08:00/2018-11-26T20:46:40.524+08:00"]):
[ ...... { "timestamp": "2018-11-26T12:22:00.000Z", "result": { "pageLoad": 53 } }, ...... { "timestamp": "2018-11-26T12:35:00.000Z", "result": { "pageLoad": 5050 } }, ...... { "timestamp": "2018-11-26T12:45:00.000Z", "result": { "pageLoad": 20100 } }, { "timestamp": "2018-11-26T12:46:00.000Z", "result": { "pageLoad": 0 } } ]
Run script 2 (with "intervals":["2018-11-26T20:16:40.524+08:00/2018-11-26T20:46:40.524+08:00"]):
[ ...... { "timestamp": "2018-11-26T12:22:00.000Z", "result": { "his_pageLoad": { "breaks": [ -0.5, 1, 2.5, 4, 5.5, 7, 8.5, 10 ], "counts": [ 1, 1, 2, 1, 3, 1, 1 ] }, "响应时间": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 8, 9, 9.799999 ], "min": 1, "max": 10 } } }, ...... { "timestamp": "2018-11-26T12:35:00.000Z", "result": { "his_pageLoad": { "breaks": [ -15.5, 1, 17.5, 34, 50.5, 67, 83.5, 100 ], "counts": [ 1, 16, 17, 16, 17, 16, 17 ] }, "响应时间": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 90, 95, 99 ], "min": 1, "max": 100 } } }, ...... { "timestamp": "2018-11-26T12:45:00.000Z", "result": { "his_pageLoad": { "breaks": [ -32.16666793823242, 1, 34.16666793823242, 67.33333587646484, 100.5, 133.6666717529297, 166.83334350585938, 200 ], "counts": [ 1, 33, 33, 33, 33, 33, 34 ] }, "响应时间": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 180, 190, 198 ], "min": 1, "max": 200 } } }, { "timestamp": "2018-11-26T12:46:00.000Z", "result": { "his_pageLoad": { "breaks": [ "Infinity", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "-Infinity" ], "counts": [ "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN" ] }, "响应时间": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ "NaN", "NaN", "NaN" ], "min": "Infinity", "max": "-Infinity" } } } ]
Again, largely self-explanatory: the sum is 1+2+...+200=20100, and TP90/TP95/TP99 are 180, 190 and 198, i.e. 0.9×200, 0.95×200 and 0.99×200.
That wraps up this hands-on test of Druid's histogram quantiles. Comparing the runs, it appears that when Druid aggregates the data onto a single point, it effectively sorts the values in ascending order and takes the value at the TP percentile position (interpolating between neighbours where the position is fractional) as that point's quantile.
Time to call it a day...
If you repost this, please include a link to the original (shameless traffic plug, haha): https://www.cnblogs.com/wynjauu/articles/10022863.html