http://blog.csdn.net/jiangshouzhuang/article/details/51792580
A while ago a friend asked me about a data-skew problem in Greenplum (GP): querying the gp_toolkit.gp_skew_coefficients view took around 20-30 minutes to return a result. I walked him through the cause and suggested other ways to check for data skew.
He was on the latest release at the time:
Greenplum Version: 'postgres (Greenplum Database) 4.3.8.2 build 1'
Many people check data distribution with a simple query like this:
```
my_db_safe=# select gp_segment_id,count(1) from person_info group by 1;
 gp_segment_id | count
---------------+--------
            24 | 470585
             0 | 470323
            17 | 469175
            18 | 468792
            34 | 470394
             3 | 469311
            16 | 469157
            32 | 470151
            14 | 470724
            29 | 470236
             5 | 469697
            25 | 470528
             1 | 468895
            11 | 470279
            37 | 469776
            21 | 470287
             2 | 469164
            26 | 470661
             7 | 470072
            31 | 469813
            12 | 470014
            22 | 470453
            27 | 470984
            33 | 470499
            38 | 470528
            13 | 470828
            28 | 470931
             4 | 469175
             9 | 470253
            10 | 470922
             8 | 470305
            23 | 470118
            39 | 470619
            20 | 470565
            36 | 470086
            15 | 471373
            35 | 469284
            19 | 468486
             6 | 471063
            30 | 468198
(40 rows)
```
But this method is too simplistic: it only tells you whether storage is skewed, and says nothing about whether data processing will be skewed. It also covers few dimensions and is not very intuitive.
Greenplum later provided tools such as gp_toolkit.gp_skew_coefficients for this kind of check.
First, let's look at the definition of the gp_toolkit.gp_skew_coefficients view:
```
my_db_safe=# \d+ gp_toolkit.gp_skew_coefficients
            View "gp_toolkit.gp_skew_coefficients"
    Column    |  Type   | Modifiers | Storage | Description
--------------+---------+-----------+---------+-------------
 skcoid       | oid     |           | plain   |
 skcnamespace | name    |           | plain   |
 skcrelname   | name    |           | plain   |
 skccoeff     | numeric |           | main    |
View definition:
 SELECT skew.skewoid AS skcoid, pgn.nspname AS skcnamespace, pgc.relname AS skcrelname, skew.skewval AS skccoeff
   FROM gp_toolkit.__gp_skew_coefficients() skew(skewoid, skewval)
   JOIN pg_class pgc ON skew.skewoid = pgc.oid
   JOIN pg_namespace pgn ON pgc.relnamespace = pgn.oid;
```
When we use gp_toolkit.gp_skew_coefficients to check a table for skew, the view works from the table's row counts, so the larger the table, the longer the check takes.
```
my_db_safe=# select * from gp_toolkit.gp_skew_coefficients;
 skcoid | skcnamespace |    skcrelname     |          skccoeff
--------+--------------+-------------------+-----------------------------
  46126 | my_schema    | test_bigtable     | 0.0004627596152847200000000
  54316 | os           | ao_schedule       | 323.443016025055973672000
  54436 | os           | ao_queue          | 165.29109905152748000
  54480 | os           | ao_ext_connection | 632.455532033675866400000
  21225 | public       | test_backup       | 0.0016942981301548301720000
  46073 | my_schema    | hellogp           | 0
  54830 | os           | ao_custom_sql     | 0
  63391 | os           | ao_variables      | 557.00665189601665000
  54357 | os           | ao_job            | 289.53638837796634000
```
The skccoeff column is the standard deviation of the per-segment row counts measured against their mean: the lower the value, the more evenly the data is stored; a high value means storage is unevenly distributed and you should reconsider whether the distribution key is a good choice.
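The values above are consistent with skccoeff being a coefficient of variation: the sample standard deviation of per-segment row counts divided by their mean, expressed as a percentage. A minimal sketch under that assumption (the formula is inferred from the output above, not taken from the view's source):

```python
import statistics


def skew_coefficient(counts):
    """Coefficient of variation of per-segment row counts, in percent.

    counts must cover every primary segment, with 0 for segments
    that hold no rows of the table.
    """
    mean = statistics.mean(counts)
    if mean == 0:
        return 0.0
    return statistics.stdev(counts) / mean * 100


# os.ao_schedule: 5 rows on 4 of the 40 segments (counts 1, 1, 1, 2)
counts = [1, 1, 1, 2] + [0] * 36
print(skew_coefficient(counts))  # ≈ 323.443, matching skccoeff above
```

Running this on the os.ao_schedule row counts reproduces the skccoeff value shown above to several decimal places, which is why a tiny table spread over a few segments scores so high.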
The other view, gp_toolkit.gp_skew_idle_fractions, computes the percentage of the system that sits idle during a table scan, helping you quickly judge whether a poorly chosen distribution key is skewing data processing.
```
my_db_safe=# select * from gp_toolkit.gp_skew_idle_fractions;
 sifoid | sifnamespace |    sifrelname     |        siffraction
--------+--------------+-------------------+----------------------------
  46126 | my_schema    | test_bigtable     | 0.000007679941018052981353
  54316 | os           | ao_schedule       | 0.93750000000000000000
  54436 | os           | ao_queue          | 0.88472222222222222222
  54480 | os           | ao_ext_connection | 0.97500000000000000000
  21225 | public       | test_backup       | 0.000033598871077931781492
  46073 | my_schema    | hellogp           | 0
  54830 | os           | ao_custom_sql     | 0
  63391 | os           | ao_variables      | 0.97142857142857142857
  54357 | os           | ao_job            | 0.93125000000000000000
```
The siffraction column is the fraction of the system that is idle during a table scan; for example, 0.1 indicates 10% skew.
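The siffraction values above line up with a simple model: a scan on an evenly distributed table would take time proportional to the average per-segment row count, but in reality it takes as long as the busiest segment, so the idle fraction is (max − avg) / max. A sketch under that assumption (40 primary segments, as in this cluster):

```python
def idle_fraction(counts, num_segments):
    """Approximate siffraction: the share of segment capacity idle
    while the busiest segment finishes a scan, (max - avg) / max."""
    # Pad with zeros for segments that hold no rows of the table
    full = counts + [0] * (num_segments - len(counts))
    avg = sum(full) / num_segments
    mx = max(full)
    return (mx - avg) / mx if mx else 0.0


# os.ao_schedule: 5 rows on 4 of 40 segments (1, 1, 1, 2)
print(idle_fraction([1, 1, 1, 2], 40))  # 0.9375, matching siffraction above
```

The same formula reproduces os.ao_variables' 0.9714285714... (7 rows on one segment, 1 on another, 38 empty), so the model fits the examples in this article, though the view's internal implementation may differ.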
Combining the results of the two views, some tables look heavily skewed, ao_schedule for instance, but in reality they simply hold too few rows: a handful of rows can only land on a few segments, leaving every other segment empty. For example:
```
my_db_safe=# select gp_segment_id,count(1) from os.ao_schedule group by 1;
 gp_segment_id | count
---------------+-------
            21 | 1
            30 | 1
             8 | 1
            36 | 2
(4 rows)
```
As you can see, os.ao_schedule holds only 5 rows, so data skew has to be judged from several angles at once.
This article now introduces a more efficient alternative to those two slow views for finding skewed tables.
How the solution works: it also uses a view, but one that examines the size of each table's files on each segment, and it reports only tables where at least one segment is more than 20% larger than the expected even share. The result is a tool that reports table skew quickly.
Run the following SQL to create the function:
```sql
CREATE OR REPLACE FUNCTION my_func_for_files_skew()
RETURNS void AS
$$
DECLARE
    v_function_name text := 'my_create_func_for_files_skew';
    v_location_id int;
    v_sql text;
    v_db_oid text;
    v_number_segments numeric;
    v_skew_amount numeric;
BEGIN
    -- OID of the current database; its data files live under base/<oid>
    v_location_id := 1000;
    SELECT oid INTO v_db_oid
      FROM pg_database
     WHERE datname = current_database();

    -- Drop the old objects, if any
    v_location_id := 2000;
    v_sql := 'DROP VIEW IF EXISTS my_file_skew_view';
    v_location_id := 2100;
    EXECUTE v_sql;

    v_location_id := 2200;
    v_sql := 'DROP EXTERNAL TABLE IF EXISTS my_db_files_web_tbl';
    v_location_id := 2300;
    EXECUTE v_sql;

    -- External web table that runs ls -l in every segment's data
    -- directory, returning one row per data file per segment
    v_location_id := 3000;
    v_sql := 'CREATE EXTERNAL WEB TABLE my_db_files_web_tbl ' ||
             '(segment_id int, relfilenode text, filename text, size numeric) ' ||
             'execute E''ls -l $GP_SEG_DATADIR/base/' || v_db_oid ||
             ' | grep gpadmin | ' ||
             E'awk {''''print ENVIRON["GP_SEGMENT_ID"] "\\t" $9 "\\t" ' ||
             'ENVIRON["GP_SEG_DATADIR"] "/' || v_db_oid ||
             E'/" $9 "\\t" $5''''}'' on all ' || 'format ''text''';
    v_location_id := 3100;
    EXECUTE v_sql;

    -- Number of primary segments in the cluster
    v_location_id := 4000;
    SELECT count(*) INTO v_number_segments
      FROM gp_segment_configuration
     WHERE preferred_role = 'p'
       AND content >= 0;

    -- Report a table when one segment holds >20% more than an even share
    v_location_id := 4100;
    v_skew_amount := 1.2*(1/v_number_segments);

    v_location_id := 4200;
    v_sql := 'CREATE OR REPLACE VIEW my_file_skew_view AS ' ||
             'SELECT schema_name, ' ||
             'table_name, ' ||
             'max(size)/sum(size) as largest_segment_percentage, ' ||
             'sum(size) as total_size ' ||
             'FROM ( ' ||
             'SELECT n.nspname AS schema_name, ' ||
             '       c.relname AS table_name, ' ||
             '       sum(db.size) as size ' ||
             '  FROM my_db_files_web_tbl db ' ||
             '  JOIN pg_class c ON ' ||
             '       split_part(db.relfilenode, ''.'', 1) = c.relfilenode ' ||
             '  JOIN pg_namespace n ON c.relnamespace = n.oid ' ||
             ' WHERE c.relkind = ''r'' ' ||
             ' GROUP BY n.nspname, c.relname, db.segment_id ' ||
             ') as sub ' ||
             'GROUP BY schema_name, table_name ' ||
             'HAVING sum(size) > 0 and max(size)/sum(size) > ' ||
             v_skew_amount::text || ' ' ||
             'ORDER BY largest_segment_percentage DESC, schema_name, ' ||
             'table_name';
    v_location_id := 4300;
    EXECUTE v_sql;

EXCEPTION
    WHEN OTHERS THEN
        RAISE EXCEPTION '(%:%:%)', v_function_name, v_location_id, sqlerrm;
END;
$$ language plpgsql;
```
Then execute the function to create the objects:
```
my_db_safe=# select my_func_for_files_skew();
NOTICE:  view "my_file_skew_view" does not exist, skipping
CONTEXT:  SQL statement "DROP VIEW IF EXISTS my_file_skew_view"
PL/pgSQL function "my_func_for_files_skew" line 22 at execute statement
NOTICE:  table "my_db_files_web_tbl" does not exist, skipping
CONTEXT:  SQL statement "DROP EXTERNAL TABLE IF EXISTS my_db_files_web_tbl"
PL/pgSQL function "my_func_for_files_skew" line 29 at execute statement
 my_func_for_files_skew
------------------------

(1 row)
```
Now we can query the view for the skewed tables:
```
my_db_safe=# select * from my_file_skew_view ;
 schema_name |      table_name       | largest_segment_percentage | total_size
-------------+-----------------------+----------------------------+------------
 os          | ao_variables          | 0.87500000000000000000     |        448
 my_schema   | test                  | 0.50000000000000000000     |        192
 os          | ao_queue              | 0.22579365079365079365     |      20160
 os          | ao_schedule           | 0.39534883720930232558     |        344
 os          | ao_job                | 0.35305343511450381679     |       8384
 pg_catalog  | pg_attribute_encoding | 0.03067484662576687117     |    5341184
 os          | ao_ext_connection     | 1.00000000000000000000     |        120
(8 rows)

my_db_safe=#
```
We can also sort the output by degree of skew:
```
my_db_safe=# select * from my_file_skew_view order by largest_segment_percentage desc;
 schema_name |      table_name       | largest_segment_percentage | total_size
-------------+-----------------------+----------------------------+------------
 os          | ao_ext_connection     | 1.00000000000000000000     |        120
 os          | ao_variables          | 0.87500000000000000000     |        448
 my_schema   | test                  | 0.50000000000000000000     |        192
 os          | ao_schedule           | 0.39534883720930232558     |        344
 os          | ao_job                | 0.35305343511450381679     |       8384
 os          | ao_queue              | 0.22579365079365079365     |      20160
 pg_catalog  | pg_attribute_encoding | 0.03067484662576687117     |    5341184
```
In the results, the column to watch is largest_segment_percentage: the closer it is to 1, the more of the table's data sits on a single segment compared with the rest of the cluster. For example, os.ao_variables has a largest_segment_percentage of 0.875, meaning 87.5% of its data is on one segment.
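The view's arithmetic is easy to replay: for each table it takes max(per-segment file size) / sum(per-segment file size) and reports the table when that ratio exceeds 1.2 × (1 / number of segments), i.e. 20% above an even share. A small sketch; the 56/392 byte split is not in the output above but is inferred from os.ao_variables' total_size of 448 and ratio of 0.875:

```python
def largest_segment_percentage(sizes):
    """Share of a table's total file size held by its largest segment."""
    return max(sizes) / sum(sizes)


def is_skewed(sizes, num_segments):
    """Flag a table when one segment exceeds its even share by >20%,
    mirroring the function's v_skew_amount := 1.2 * (1 / num_segments)."""
    return largest_segment_percentage(sizes) > 1.2 * (1 / num_segments)


# os.ao_variables: files on two segments, inferred as 56 and 392 bytes
sizes = [56, 392]
print(largest_segment_percentage(sizes))  # 0.875
print(is_skewed(sizes, 40))               # True: 0.875 > 1.2 * (1/40) = 0.03
```

This also explains why pg_attribute_encoding, at 0.0306748..., barely clears the 0.03 threshold on this 40-segment cluster.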
We can verify this:
```
my_db_safe=# select gp_segment_id,count(1) from os.ao_variables group by 1;
 gp_segment_id | count
---------------+-------
            32 | 1
            35 | 7
(2 rows)
```
Sure enough, 7 of the table's 8 rows sit on the segment with gp_segment_id 35, i.e. 87.5%.
If you know Greenplum well, you will have spotted one problem with this tool: table bloat.
When DML deletes rows, the freed space is not immediately returned to the operating system, so the computed sizes may include that dead space.
I recommend running VACUUM on the tables you want to measure (or rebuilding them with CTAS) before checking file skew.
One more note: if you want to measure the skew of a single table, just modify the function to take a parameter for the table name or OID.