组合索引怎么应该怎么选取引导列?

有这样一个SQL select count(*) from t1,t2 where t1.id=t2.id and t1.owner='SCOTT'; id列选择性很高,owner选择性很低 要优化它很简单,只需要在t1表上建立一个组合索引(owner,id),在t2表上建立一个索引(id) 现在要讨论的是我们应该怎么建立组合索引,哪一列(owner,id)应该放在最前面? 现在来做个实验 SQL> desc t1 Name Type Nullable Default Comments ----------- ------------- -------- ------- -------- ID NUMBER Y OBJECT_NAME VARCHAR2(128) Y OWNER VARCHAR2(30) Y SQL> desc t2 Name Type Nullable Default Comments --------- ----------- -------- ------- -------- ID NUMBER Y STATUS VARCHAR2(7) Y TEMPORARY VARCHAR2(1) Y SQL> create index inx_id on t2(id); Index created. SQL> create index inx_id_owner on t1(id,owner); Index created. SQL> select count(*) from t1,t2 where t1.id=t2.id and t1.owner='SCOTT'; Elapsed: 00:00:00.02 Execution Plan ---------------------------------------------------------- Plan hash value: 2432674005 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 16 | 88 (2)| 00:00:02 | | 1 | SORT AGGREGATE | | 1 | 16 | | | |* 2 | HASH JOIN | | 2416 | 38656 | 88 (2)| 00:00:02 | |* 3 | INDEX FAST FULL SCAN| INX_ID_OWNER | 2416 | 26576 | 50 (0)| 00:00:01 | | 4 | INDEX FAST FULL SCAN| INX_ID | 72475 | 353K| 37 (0)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."ID"="T2"."ID") 3 - filter("T1"."OWNER"='SCOTT') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 392 consistent gets 0 physical reads 0 redo size 422 bytes sent via SQL*Net to client 420 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed SQL> create index inx_owner_id on t1(owner,id); Index created. SQL> select count(*) from t1,t2 where t1.id=t2.id and t1.owner='SCOTT'; Elapsed: 00:00:00.03 Execution Plan ---------------------------------------------------------- Plan hash value: 277464349 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 16 | 47 (3)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 16 | | | |* 2 | HASH JOIN | | 2416 | 38656 | 47 (3)| 00:00:01 | |* 3 | INDEX RANGE SCAN | INX_OWNER_ID | 2416 | 26576 | 9 (0)| 00:00:01 | | 4 | INDEX FAST FULL SCAN| INX_ID | 72475 | 353K| 37 (0)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."ID"="T2"."ID") 3 - access("T1"."OWNER"='SCOTT') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 169 consistent gets 0 physical reads 0 redo size 422 bytes sent via SQL*Net to client 420 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed 如果OWNER作为引导列,逻辑读由以前的392变成现在的169,并且由以前的 index fast full scan 变成index range scan 如果强制指定走索引 inx_id_owner SQL> select /*+ index(t1 inx_id_owner) */ count(*) from t1,t2 where t1.id=t2.id and t1.owner='SCOTT'; Elapsed: 00:00:00.03 Execution Plan ---------------------------------------------------------- Plan hash value: 3161475902 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 16 | 259 (1)| 00:00:04 | | 1 | SORT AGGREGATE | | 1 | 16 | | | |* 2 | HASH JOIN | | 2416 | 38656 | 259 (1)| 00:00:04 | |* 3 | INDEX FULL SCAN | INX_ID_OWNER | 2416 | 26576 | 221 (1)| 00:00:03 | | 4 | INDEX FAST FULL SCAN| INX_ID | 72475 | 353K| 37 (0)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."ID"="T2"."ID") 3 - access("T1"."OWNER"='SCOTT') filter("T1"."OWNER"='SCOTT') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 387 consistent gets 0 physical reads 0 redo size 422 bytes sent via SQL*Net to client 420 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed 依然要387个逻辑读。 为什么要以owner为引导列?因为ID是join列,并且ID列上面没过滤条件,如果以ID列作为引导列,由于没过滤条件 那么CBO只能走 index full scan,或者index fast full scan,因为引导列没过滤条件,走不了index range scan, 最多走index skip scan,不过index skip scan代价过高,因为index skip scan要求 引导列选择性很低,但是ID这里选择性很高 SQL> select /*+ index_ss(t1 inx_id_owner) */ count(*) from t1,t2 where t1.id=t2.id and t1.owner='SYS'; Elapsed: 00:00:00.10 Execution Plan ---------------------------------------------------------- Plan hash value: 3493079762 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 16 | 72529 (1)| 00:14:31 | | 1 | SORT AGGREGATE | | 1 | 16 | | | |* 2 | HASH JOIN | | 2416 | 38656 | 72529 (1)| 00:14:31 | |* 3 | INDEX SKIP SCAN | INX_ID_OWNER | 2416 | 26576 | 72491 (1)| 00:14:30 | | 4 | INDEX FAST FULL SCAN| INX_ID | 72475 | 353K| 37 (0)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."ID"="T2"."ID") 3 - access("T1"."OWNER"='SYS') filter("T1"."OWNER"='SYS') Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 387 consistent gets 0 physical reads 0 redo size 424 bytes sent via SQL*Net to client 420 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed 如果owner列作为引导列,那么优化器就可以选择index range scan,这样相比index full scan, index fast full scan 肯定要少扫描很多leaf block,逻辑读就会相对较少。 其实到这里,是否可以总结一下建立组合索引的原则呢? 引导列要选择过滤条件的列作为引导列,比如 where a.xxx='xxx' 或者 a.xxx> 或者 a.xxx< 引导列的选择性越高越好,因为选择性越高,扫描的leaf block就越少,效率就越高 尽量把join列放到组合索引最后面 这里选择以owner列作为引导列,由于owner选择性很低,所以测试索引压缩对于性能的提升 SQL> analyze index inx_owner_id validate structure; Index analyzed. SQL> select height, 2 blocks, 3 lf_blks, 4 br_blks, 5 OPT_CMPR_COUNT, 6 OPT_CMPR_PCTSAVE 7 from index_stats 8 where name = 'INX_OWNER_ID'; HEIGHT BLOCKS LF_BLKS BR_BLKS OPT_CMPR_COUNT OPT_CMPR_PCTSAVE ---------- ---------- ---------- ---------- -------------- ---------------- 2 256 219 1 1 26 SQL> drop index inx_owner_id; Index dropped SQL> create index inx_owner_id on t1(owner,id) compress 1; Index created SQL> analyze index inx_owner_id validate structure; Index analyzed SQL> SQL> select height, 2 blocks, 3 lf_blks, 4 br_blks, 5 OPT_CMPR_COUNT, 6 OPT_CMPR_PCTSAVE 7 from index_stats 8 where name = 'INX_OWNER_ID'; HEIGHT BLOCKS LF_BLKS BR_BLKS OPT_CMPR_COUNT OPT_CMPR_PCTSAVE ---------- ---------- ---------- ---------- -------------- ---------------- 2 256 161 1 1 0 索引压缩之后,Leaf block 由原来的219降低到161个,节约了58个block 现在再来看一看执行计划+统计信息 SQL> select count(*) from t1,t2 where t1.id=t2.id and t1.owner='SCOTT'; Elapsed: 00:00:00.03 Execution Plan ---------------------------------------------------------- Plan hash value: 277464349 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 16 | 45 (3)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 16 | | | |* 2 | HASH JOIN | | 2416 | 38656 | 45 (3)| 00:00:01 | |* 3 | INDEX RANGE SCAN | INX_OWNER_ID | 2416 | 26576 | 7 (0)| 00:00:01 | | 4 | INDEX FAST FULL SCAN| INX_ID | 72475 | 353K| 37 (0)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."ID"="T2"."ID") 3 - access("T1"."OWNER"='SCOTT') Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 169 consistent gets 0 physical reads 0 redo size 422 bytes sent via SQL*Net to client 420 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed 由此可见,索引压缩之后,逻辑读并没有下降,还是169,但是索引的leaf blcok显著减少了,这样减少了存储空间,能降低物理IO
使用优化算法,以优化VMD算法的惩罚因子惩罚因子 (α) 和分解层数 (K)。 1、将量子粒子群优化(QPSO)算法与变分模态分解(VMD)算法结合 VMD算法背景: VMD算法是一种自适应信号分解算法,主要用于分解信号为不同频率带宽的模态。 VMD的关键参数包括: 惩罚因子 α:控制带宽的限制。 分解层数 K:决定分解出的模态数。 QPSO算法背景: 量子粒子群优化(QPSO)是一种基于粒子群优化(PSO)的一种改进算法,通过量子行为模型增强全局搜索能力。 QPSO通过粒子的量子行为使其在搜索空间中不受位置限制,从而提高算法的收敛速度与全局优化能力。 任务: 使用QPSO优化VMD中的惩罚因子 α 和分解层数 K,以获得信号分解的最佳效果。 计划: 定义适应度函数:适应度函数根据VMD分解的效果来定义,通常使用重构信号的误差(例如均方误差、交叉熵等)来衡量分解的质量。 初始化QPSO粒子:定义粒子的位置和速度,表示 α 和 K 两个参数。初始化时需要在一个合理的范围内为每个粒子分配初始位置。 执行VMD分解:对每一组 α 和 K 参数,运行VMD算法分解信号。 更新QPSO粒子:使用QPSO算法更新粒子的状态,根据适应度函数调整粒子的搜索方向和位置。 迭代求解:重复QPSO的粒子更新步骤,直到满足终止条件(如适应度函数达到设定阈值,或最大迭代次数)。 输出优化结果:最终,QPSO算法会返回一个优化的 α 和 K,从而使VMD分解效果最佳。 2、将极光粒子(PLO)算法与变分模态分解(VMD)算法结合 PLO的优点与适用性 强大的全局搜索能力:PLO通过模拟极光粒子的运动,能够更高效地探索复杂的多峰优化问题,避免陷入局部最优。 鲁棒性强:PLO在面对高维、多模态问题时有较好的适应性,因此适合海上风电时间序这种非线性、多噪声的数据。 应用场景:PLO适合用于优化VMD参数(α 和 K),并将其用于风电时间序的预测任务。 进一步优化的建议 a. 实现更细致的PLO更新策略,优化极光粒子的运动模型。 b. 将PLO优化后的VMD应用于真实的海上风电数据,结合LSTM或XGBoost等模型进行风电功率预测。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值