最近在做的项目有个功能是kafka流量监控,目的是通过不同的维度(月度、日度、每时)监控kafka生产量和消费量!
在项目中用了3个表储存这些信息,分别是月度表(kafka_t_flowrate_mm),日度表(kafka_t_flowrate_dd),小时表(kafka_t_flowrate_hh)。
项目中需要根据集群名称、消费者名称、主题名称,时间周期去动态查询,前面三个条件都比较好解决,时间周期有点小麻烦,而且如果那个月主题没有消费就不产生数据,查询出来的结果必然漏掉那个月。
这里我想了一个办法,先轮询出要查询的所有月,再关联月表,如果哪个月没有数据,必然关联后数据是空,再把为空的字段替换成0即可。
下面是实现方式,月度、日度、每时都有些许差异:
月度:
/*初版*/
select range_mm,avg(output),avg(input)from kafka_t_flowrate_mm where cluster_id=3 and consumer_name='testconsumer3' and topic_name='test3'
and range_mm between '201803' and '201809'
group by range_mm
/*查询月区间的所有月份*/
SELECT TO_CHAR(ADD_MONTHS(TO_DATE('201803', 'yyyyMM'), ROWNUM - 1), 'yyyyMM') as monthlist FROM DUAL
CONNECT BY ROWNUM <=months_between(to_date('201809', 'yyyyMM'),to_date('201803', 'yyyyMM')) + 1
/*组合后的完整版*/
select b.monthlist as time_list,a.range_mm as time_range,nvl(a.output,0) as output,nvl(a.input,0) as input from (select range_mm,avg(output) as output,avg(input) as input from kafka_t_flowrate_mm where cluster_id=3 and consumer_name='testconsumer3' and topic_name='test3'
and range_mm between '201803' and '201809' group by range_mm) a right join (SELECT TO_CHAR(ADD_MONTHS(TO_DATE('201803', 'yyyyMM'), ROWNUM - 1), 'yyyyMM') as monthlist FROM DUAL
CONNECT BY ROWNUM <=months_between(to_date('201809', 'yyyyMM'),to_date('201803', 'yyyyMM')) + 1) b on a.range_mm=b.monthlist order by b.monthlist
日度:
/*初版*/
select range_dd,avg(output),avg(input)from kafka_t_flowrate_dd where cluster_id=3 and consumer_name='testconsumer3' and topic_name='test3'
and range_dd between '20180730' and '20180810'group by range_dd
/*查询日区间的所有日期*/
select ROWNUM,to_char(to_date('20180730','yyyymmdd')+ROWNUM-1,'yyyymmdd') as daylist from dual
connect by ROWNUM <=to_date('20180810','yyyymmdd')-to_date('20180730','yyyymmdd')+1
/*组合后完整版*/
select b.daylist as time_list,a.range_dd as time_range,nvl(a.output,0) as output,nvl(a.input,0) as input from (select range_dd,avg(output) as output,avg(input)as input from kafka_t_flowrate_dd
where cluster_id=3 and consumer_name='testconsumer3' and topic_name='test3' and range_dd between '20180730' and '20180810'
group by range_dd) a right join (select to_char(to_date('20180730','yyyymmdd')+ROWNUM-1,'yyyymmdd') as daylist from dual
connect by ROWNUM <=to_date('20180810','yyyymmdd')-to_date('20180730','yyyymmdd')+1) b on a.range_dd=b.daylist order by b.daylist
每时:
/*初版*/
select range_hh,avg(output),avg(input)from kafka_t_flowrate_hh where cluster_id=3 and consumer_name='testconsumer3' and topic_name='test3'
and range_hh between '2018082120' and '2018082210'group by range_hh
/*查询小时区间的所有小时*/
select ROWNUM,to_char(to_date('2018082120','yyyymmddhh24')+(ROWNUM-1)/24,'yyyymmddhh24') as dat from dual
connect by ROWNUM <=round(to_number(to_date('2018082210','yyyymmdd hh24')-to_date('2018082120','yyyymmdd hh24'))*24+1)
/*组合后完整版*/
select b.hourlist as time_list,a.range_hh as time_range,nvl(a.output,0) as output,nvl(a.input,0) as input from
(select range_hh,avg(output) as output,avg(input)as input from kafka_t_flowrate_hh where cluster_id=3 and consumer_name='testconsumer3' and topic_name='test3'
and range_hh between '2018082120' and '2018082210'group by range_hh) a right join
(select ROWNUM,to_char(to_date('2018082120','yyyymmddhh24')+(ROWNUM-1)/24,'yyyymmddhh24') as hourlist from dual
connect by ROWNUM <=round(to_number(to_date('2018082210','yyyymmdd hh24')-to_date('2018082120','yyyymmdd hh24'))*24+1)) b
on a.range_hh=b.hourlist order by b.hourlist
这里的第二个sql语句查询时间区间里面所有的小时里碰到一个小bug,这里用的是oralce11g。之前ROWNUM <=round(to_number(to_date('2018082210','yyyymmdd hh24')-to_date('2018082120','yyyymmdd hh24'))*24+1))这里没有用round函数包起来,在测试数据的时候大于4个小时的区间是正常的,如果时间区间是3个小时或者以内就会少一条数据,但是把to_number替换成3这种静态数据又可以正常查出3条数据,这里始终没想明白原因,但是有大牛帮助找到办法用round函数解决,不然只能通过其他方式去解决这个问题。
这里查出来之后的字段分别为echarts的x轴显示的时间、增量、生产量,echarts的数据组装部分同echarts官网,这里就不展示了