pg_qualstats-1.0.x's suggestions are not accurate, so use pg_qualstats-2.0.x

 


pg_qualstats is a PostgreSQL extension keeping statistics on predicates found in WHERE statements and JOIN clauses.

pg_qualstats is able to analyze which quals (predicates) are executed most often on your database. The PoWA project makes use of this to provide advanced index suggestions.

It also helps you identify correlated columns by showing which columns are most frequently queried together.
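For reference, a minimal setup looks like the sketch below. It is based on the pg_qualstats documentation; the pg_qualstats_pretty view and its execution_count column are assumptions about the installed release, so check the README of the version you actually use.

-- postgresql.conf: the extension uses shared memory, so it must be preloaded
-- and PostgreSQL restarted afterwards.
--   shared_preload_libraries = 'pg_qualstats'

-- In the target database:
CREATE EXTENSION pg_qualstats;

-- Raw predicate statistics are exposed in the pg_qualstats view;
-- pg_qualstats_pretty resolves relation and operator names.
SELECT *
FROM pg_qualstats_pretty
ORDER BY execution_count DESC
LIMIT 10;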

I installed pg_qualstats-1.0.9 (the latest 1.0.x release) on one of my PostgreSQL instances. After PostgreSQL had been running for several hours, the extension gave me some suggestions in pg_qualstats_indexes_ddl. Here are some of them:

 nspname |  relname  |     attnames      | idxtype | execution_count |                                              ddl
---------+-----------+-------------------+---------+-----------------+-------------------------------------------------------------------------------------------------
 public  | tb_device | {dev_addr,status} | btree   |             949 | CREATE INDEX idx_tb_device_dev_addr_status ON public.tb_device USING btree (dev_addr, status)
 public  | tb_device | {dev_addr,status} | hash    |             949 | CREATE INDEX idx_tb_device_dev_addr_status ON public.tb_device USING hash (dev_addr, status)
 public  | tb_device | {dev_addr,status} | spgist  |             949 | CREATE INDEX idx_tb_device_dev_addr_status ON public.tb_device USING spgist (dev_addr, status)
 public  | tb_device | {status}          | brin    |             184 | CREATE INDEX idx_tb_device_status ON public.tb_device USING brin (status)
 public  | tb_device | {status}          | btree   |             184 | CREATE INDEX idx_tb_device_status ON public.tb_device USING btree (status)
 public  | tb_device | {status}          | hash    |             184 | CREATE INDEX idx_tb_device_status ON public.tb_device USING hash (status)

As we can see, these suggestions are not of much value:
1. It suggested creating several different kinds of indexes on the same group of columns.
2. It suggested creating an index on a low-cardinality column, for example tb_device.status (its cardinality can be checked with the query shown after this list).
3. It suggested creating a composite SP-GiST index on (dev_addr, status), while SP-GiST does not support multicolumn indexes.
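The cardinality in point 2 is easy to confirm from the planner statistics. A small sketch, assuming tb_device has already been ANALYZEd (pg_stats.n_distinct reports a negative value when the estimate is a fraction of the row count):

-- Check how many distinct values the suggested columns actually have.
SELECT attname, n_distinct
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'tb_device'
  AND attname IN ('dev_addr', 'status');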

So pg_qualstats-1.0.9 was not much help here. But after I uninstalled pg_qualstats-1.0.9 and installed pg_qualstats-2.0.2, these problems no longer appeared: pg_qualstats-2.0.2 gave me usable suggestions. So use pg_qualstats-2.0.2 or a later version.
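In the 2.0.x series the suggestions come from a global index advisor function instead of a view. A minimal invocation, following the example in the pg_qualstats 2.0 README (the min_filter parameter and the 'indexes' JSON key are taken from that README, so verify them against the version you install):

-- Ask the advisor for index suggestions, ignoring quals filtering
-- fewer than 50 rows on average.
SELECT v
FROM json_array_elements(
       pg_qualstats_index_advisor(min_filter => 50)->'indexes') v
ORDER BY v::text COLLATE "C";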
 

 
