Doris物化视图加布隆过滤器提升10倍查询速度

文章讲述了在Doris数据库中遇到大规模数据查询性能瓶颈的问题,通过对SQL进行分析发现列存模型对大量字段排序影响较大。通过建立物化视图和分步查询,以及考虑使用布隆过滤器,成功将查询时间从65秒缩短到0.128秒,有效解决了查询速度问题。
摘要由CSDN通过智能技术生成


1.背景

设备状态历史表,按天分区是Duplicate 模型,页面上有明细查询。但是查一个月数据翻页翻到17万页,每页10条数据,就会卡住不动。

doris配置:测试环境 3BE ,32g内存,4线程

2.sql分析

字段比较多,下面是列表分页执行的Doris-sql,时间范围是一个月

		
				 select 
					id,
        epid,
        epname,
        matrix_no as matrixNo,
        cmu_no as cmuNo,
        esc_no as escNo,
        producer,
        report_time as reportTime,
        inner_temp as innerTemp,
        cls1_daily_cha_nrg as cls1DailyChaNrg,
        cls1_daily_dis_nrg as cls1DailyDisNrg,
        cls1_month_cha_nrg as cls1MonthChaNrg,
        cls1_month_dis_nrg as cls1MonthDisNrg,
        cls1_year_cha_nrg as cls1YearChaNrg,
        cls1_year_dis_nrg as cls1YearDisNrg,
        cls1_total_cha_nrg as cls1TotalChaNrg,
        cls1_total_dis_nrg as cls1TotalDisNrg,
        dcdc_running1 as dcdcRunning1,
        dcdc_bus_vol1 as dcdcBusVol1,
        dcdc_bus_curr1 as dcdcBusCurr1,
        dcdc_vol1 as dcdcVol1,
        dcdc_curr1 as dcdcCurr1,
        dcdc_insul1 as dcdcInsul1,
        cls1_pack_num as cls1PackNum,
        cls1_dev_status as cls1DevStatus,
        cls1_vol as cls1Vol,
        cls1_curr as cls1Curr,
        cls1_soc as cls1Soc,
        cls1_soh as cls1Soh,
        cls1_soe as cls1Soe,
        cls1_dod as cls1Dod,
        cls1_io_power as cls1IoPower,
        cls1_enable_cha_nrg as cls1EnableChaNrg,
        cls1_enable_dis_nrg as cls1EnableDisNrg,
        cls1_pack_max_temp as cls1PackMaxTemp,
        cls1_pack_max_temp_pos as cls1PackMaxTempPos,
        cls1_pack_min_temp as cls1PackMinTemp,
        cls1_pack_min_temp_pos as cls1PackMinTempPos,
        cls1_pack_min_vol as cls1PackMinVol,
        cls1_pack_min_vol_pos as cls1PackMinVolPos,
        cls1_pack_max_vol as cls1PackMaxVol,
        cls1_pack_max_vol_pos as cls1PackMaxVolPos,
        cls2_daily_cha_nrg as cls2DailyChaNrg,
        cls2_daily_dis_nrg as cls2DailyDisNrg,
        cls2_month_cha_nrg as cls2MonthChaNrg,
        cls2_month_dis_nrg as cls2MonthDisNrg,
        cls2_year_cha_nrg as cls2YearChaNrg,
        cls2_year_dis_nrg as cls2YearDisNrg,
        cls2_total_cha_nrg as cls2TotalChaNrg,
        cls2_total_dis_nrg as cls2TotalDisNrg,
        dcdc_running2 as dcdcRunning2,
        dcdc_bus_vol2 as dcdcBusVol2,
        dcdc_bus_curr2 as dcdcBusCurr2,
        dcdc_vol2 as dcdcVol2,
        dcdc_curr2 as dcdcCurr2,
        dcdc_insul2 as dcdcInsul2,
        cls2_pack_num as cls2PackNum,
        cls2_dev_status as cls2DevStatus,
        cls2_vol as cls2Vol,
        cls2_curr as cls2Curr,
        cls2_soc as cls2Soc,
        cls2_soh as cls2Soh,
        cls2_soe as cls2Soe,
        cls2_dod as cls2Dod,
        cls2_io_power as cls2IoPower,
        cls2_enable_cha_nrg as cls2EnableChaNrg,
        cls2_enable_dis_nrg as cls2EnableDisNrg,
        cls2_pack_max_temp as cls2PackMaxTemp,
        cls2_pack_max_temp_pos as cls2PackMaxTempPos,
        cls2_pack_min_temp as cls2PackMinTemp,
        cls2_pack_min_temp_pos as cls2PackMinTempPos,
        cls2_pack_min_vol as cls2PackMinVol,
        cls2_pack_min_vol_pos as cls2PackMinVolPos,
        cls2_pack_max_vol as cls2PackMaxVol,
        cls2_pack_max_vol_pos as cls2PackMaxVolPos
        from t_sacu_esc_state_his	 
        where   partition_date >= '2024-03-12 00:00:00' and partition_date<='2024-04-10 23:59:59' 
				and  report_time between '2024-03-12 00:00:00' and  '2024-04-10 23:59:59'
				order by report_time desc
				limit 10 offset 1733800;

查询了65秒

3.问题原因分析

上图sql查询了70多个字段,doris是列存查这么多字段对其排序性能影响比较大。改为只查询三个字段,只需要4.7秒

4.解决方案

4.1 建立物化视图

  1.   建立一个包含report_time,partition_date,id的
create materialized view mv_1 as
select partition_date,report_time,id from t_sacu_esc_state_his order by partition_date desc,report_time desc,id desc;

     2. 查看是否建立完成

SHOW ALTER TABLE ROLLUP FROM isoc; 

 4.2改写sql

改成分两次查询,第一次查询出分区编号与id,第二次根据分区编号加id查

查询id+分区编号只花费了一秒

 select partition_date,
       id				
        from t_sacu_esc_state_his
        where  
				  report_time between '2024-03-12 00:00:00' and  '2024-04-10 23:59:59'
				order by report_time 
				limit 10 offset 1733800;

查看是否命中物化视图

看到命中了mv_1这个物化视图

根据id加分区编号查询其他详情数据

					select  *		
        from t_sacu_esc_state_his
        where   partition_date = '2024-04-06'  and id in('_20240411105704563103300','_20240411105704573152300','_20240411105704563103000','_20240411105704573152500',
				'_20240411105704563102800','_20240411105704573153100','_20240411105704573153300','_20240411105704563102500','_20240411105704573153900','_20240411113802866110040')

只花费了0.128s

4.3 给id字段增加布隆过滤器

  • 6
    点赞
  • 6
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值