PostgreSQL in 的优化器处理以及如何优化
在使用数据库的过程中,经常会遇到需要匹配多个值的情况。
通常的写法包括:
-- select * from table where id = any(array);
-- select * from table where id in (values);
-- select * from table where id=x or id=x or ....;
-- select * from table where id in (query);
-- select * from table where id in ( values query );
-- select * from table t1 join (query or values query) t2 on t1.id=t2.id;
每种写法会产生多种执行计划的可能,如下:
-- select * from table where id = any(array);
优化器可以使用index scan, bitmap scan, seq scan.
-- select * from table where id in (values);
优化器可以使用index scan, bitmap scan, seq scan.
-- select * from table where id=x or id=x or ....;
优化器可以使用bitmap scan + BitmapOr, seq scan.
-- select * from table where id in (query);
优化器可以使用join (merge,hash,nest).
-- select * from table where id in ( values query );
优化器可以使用join (merge,hash,nest).
-- select * from table t1 join (query or values query) t2 on t1.id=t2.id;
优化器可以使用join (merge,hash,nest).
SQL优化策略是尽量减少CPU的运算以及page的扫描数量。
下面针对每种SQL,看看对应的可能的执行计划有什么差别,(使用开关来控制执行计划的选择, 如set enable_indexscan=off)
支持的开关如下:
enable_bitmapscan enable_hashjoin enable_indexscan enable_mergejoin enable_seqscan enable_tidscan
enable_hashagg enable_indexonlyscan enable_material enable_nestloop enable_sort
开始测试,使用auto_explain输出执行计划:
load 'auto_explain';
set auto_explain.log_analyze =true;
set auto_explain.log_buffers =true;
set auto_explain.log_nested_statements=true;
set auto_explain.log_timing=true;
set auto_explain.log_triggers=true;
set auto_explain.log_verbose=true;
set auto_explain.log_min_duration=0;
set client_min_messages ='log';
set work_mem='8GB';
测试SQL写法1:
-- select * from table where id = any(array);
do language plpgsql
$$
declare
v_id int[];
begin
select array_agg(trunc(random()*100000)) into v_id from generate_series(1,200) t(id);
perform * from t_in_test where id = any (v_id);
end;
$$
;
优化器选择1 (index scan):
离散扫描,适合小的扫描集。
LOG: duration: 2.312 ms plan:
Query Text: SELECT * from t_in_test where id = any (v_id)
Index Scan using t_in_test_pkey on public.t_in_test (cost=0.43..895.50 rows=200 width=37) (actual time=0.039..2.266 rows=200 loops=1)
Output: id, info
Index Cond: (t_in_test.id = ANY ('{50836,73414,41071,45604,...省略部分...,76236}'::integer[]))
Buffers: shared hit=776
CONTEXT: SQL statement "SELECT * from t_in_test where id = any (v_id)"
PL/pgSQL function inline_code_block line 6 at PERFORM
优化器选择2 (bitmap scan):
比index scan多了Recheck的开销,以及按照ctid排序的开销。
适合大的扫描集,排序的目的是减少离散扫描,还可以用到块设备的prefetch。
LOG: duration: 1.602 ms plan:
Query Text: SELECT * from t_in_test where id = any (v_id)
Bitmap Heap Scan on public.t_in_test (cost=888.55..1711.16 rows=200 width=37) (actual time=0.880..1.563 rows=200 loops=1)
Output: id, info
Recheck Cond: (t_in_test.id = ANY ('{32635,31123,6282,59640,...省略部分...,87705}'::integer[]))
Heap Blocks: exact=184
Buffers: shared hit=784
-> Bitmap Index Scan on t_in_test_pkey (cost=0.00..888.50 rows=200 width=0) (actual time=0.846..0.846 rows=200 loops=1)
Index Cond: (t_in_test.id = ANY ('{32635,31123,6282,59640,...省略部分...,87705}'::integer[]))
Buffers: shared hit=600
CONTEXT: SQL statement "SELECT * from t_in_test where id = any (v_id)"
PL/pgSQL function inline_code_block line 6 at PERFORM
优化器选择3 (seq scan):
适合非常庞大的扫描集。
LOG: duration: 19940.394 ms plan:
Query Text: SELECT * from t_in_test where id = any (v_id)
Seq Scan on public.t_in_test (cost=0.00..2683354.80 rows=200 width=37) (actual time=4.237..19940.330 rows=199 loops=1)
Output: id, info
Filter: (t_in_test.id = ANY ('{45867,72450,95153,86233,63073,11016,56010,47158,...省略部分...,90444}'::integer[]))
Rows Removed by Filter: 9999801
Buffers: shared hit=83334
CONTEXT: SQL statement "SELECT * from t_in_test where id = any (v_id)"
PL/pgSQL function inline_code_block line 6 at PERFORM
测试SQL写法2:
-- select * from table where id in (values);
do language plpgsql
$$
declare
v_where text;
begin
select string_agg(id::text,',') into v_where from (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t;
execute 'select * from t_in_test where id in ('||v_where||')';
end;
$$
;
优化器选择1 (index scan):
LOG: duration: 0.919 ms plan:
Query Text: select * from t_in_test where id in (8826,2038,72163,29843,76886,37893,5279,64308,...省略部分...,48126,44868)
Index Scan using t_in_test_pkey on public.t_in_test (cost=0.43..895.50 rows=200 width=37) (actual time=0.017..0.894 rows=200 loops=1)
Output: id, info
Index Cond: (t_in_test.id = ANY ('{8826,2038,72163,29843,76886,37893,5279,64308,7370,80216,...省略部分...,48126,44868}'::integer[]))
Buffers: shared hit=779
CONTEXT: SQL statement "select * from t_in_test where id in (8826,2038,72163,29843,76886,37893,5279,64308,7370,80216,...省略部分...,73366,48126,44868)"
PL/pgSQL function inline_code_block line 6 at EXECUTE
优化器选择2 (bitmap scan):
LOG: duration: 1.012 ms plan:
Query Text: select * from t_in_test where id in (17424,80517,35148,38245,93