数据库快速更新全表数据

最新推荐文章于 2024-05-16 14:53:21 发布

kmblack1

最新推荐文章于 2024-05-16 14:53:21 发布

阅读量1.1w

点赞数 3

分类专栏： postgresql 文章标签：数据库 postgresql 软件数据算法

本文链接：https://blog.csdn.net/kmblack1/article/details/78377188

版权

postgresql 专栏收录该内容

95 篇文章 13 订阅

订阅专栏

备注:以下更新的算法适合所有数据库,示例采用PostgreSQL,其它数据库只用修改语法即可.

题外话:软件思想很重要,不要局限于某种语言\工具\数据库,思想才是最重要的.有时候一个灵光一闪的想法说不定就是一个好的软件.

开始谈正事,在update全表数据时,常规写法下大家经常抱怨更新太慢,语句如下:

update 表 set 字段1=0,字段n=n

实际上这里由于在update前,数据库需要读取整个表,然后才开始update,所以上面的语句可以分成两步看

select * from 表
update 表 set 字段1=select.字段,字段n=select.字段n where 主键=select.主键

select是一个全表扫描,update慢的原因主要就是这里了.
知道了问题所在,现在要想办法避开全表扫描,使用PRIMARY KEY扫描,并且不一次更新完,而是分批次更新,直到全表数据更新完成.

示例采用PostgreSQL数据库

1.创建测试表

drop table if exists test;
create table test(
	objectid bigint not null,	
	name text not null,	
	describe jsonb,
	flag integer default(2) not null, /*演示要更新的字段*/
	constraint pk_test_objectid primary key (objectid) with (fillfactor=60)
) with (fillfactor=60);

经常更新的表建议设置fillfactor系数以提升Update和delete性能,默认为90.

2.创建随机生成中文字符函数

drop function if exists gen_random_zh(int,int);
create or replace function gen_random_zh(int,int)
	returns text
as $$
	select string_agg(chr((random()*(20901-19968)+19968 )::integer) , '')  from generate_series(1,(random()*($2-$1)+$1)::integer);
$$ language sql;

注意:这个函数是sql语言函数

3.导入数据函数

--$1为起始编号,每调用一次写入1000000条记录
drop function if exists ins_test(bigint);
create or replace function ins_test(bigint)
  returns void
as $$
	declare	
	begin
	
		for i in 1..1000 loop
			insert into test(objectid,name)
				select( 
					(id + (i-1)*1000) 
					+ ($1 - 1)
			)as objectid,gen_random_zh(8,32) as name from generate_series(1,1000) as id;
			--raise notice  '%', i;
		end loop;
	end;
$$ language plpgsql;

注意:这个函数是plpgsql语言函数

4.导入1千万数据

导入1000万测试数据,为提高效率,开10个进程导入,每个写入100万.
根据机器配置不同,每次执行时间约20秒至几分钟不等.

select ins_test(1);
select ins_test(1000001);
select ins_test(2000001);
select ins_test(3000001);
select ins_test(4000001);
select ins_test(5000001);
select ins_test(6000001);
select ins_test(7000001);
select ins_test(8000001);
select ins_test(9000001);

为节约时间全部导入完成后用以下命令统计表信息

analyze test;

5.更新数据

5.1常规方法

没有测试,肯定很慢

update test set flag=1;
analyze test;

5.2分批层逐条更新

do $$
	declare 
		v_currentid bigint;
		v_rec record;
		v_index integer;
		v_isok boolean;
	begin
		v_currentid := 0; v_index := 0;	
		loop
			v_isok := false;
			v_index := v_index + 1;			
			for v_rec in select objectid from test where objectid>v_currentid order by objectid limit 1000 loop
				update test set flag=1 where objectid=v_rec.objectid;
				v_isok:=true;
				v_currentid := v_rec.objectid;
			end loop;			
			--raise notice  '%,%', v_index,v_currentid;
			if( false = v_isok ) then
				return;
			end if;
		end loop;
	end;
$$;
--Time: 320023.745 ms
analyze test;

--第二个版本
do $$
	declare 
		v_currentid bigint;
		v_rec record;
	begin
		v_currentid := -1; 
		loop
			for v_rec in (with cte as(
				select objectid from test where objectid>v_currentid order by objectid limit 1000
			)select array_agg(objectid) as ids from cte) loop
				update test set flag=1 where objectid=any(v_rec.ids);
				v_currentid := v_rec.ids[1000];
			end loop;			
			--raise notice  '%', v_currentid;
			if( v_currentid is null ) then
				return;
			end if;
		end loop;
	end;
$$;

6.快速在两个表之间导数据

do $$
	declare 
		v_currentid bigint;
	begin
		v_currentid := 0;
		loop
			with cte as(
				insert into enterprise_mores(objectid,bank,registered,number)
					select objectid,bank,registered,number from enterprises where objectid>v_currentid order by objectid limit 1000
					returning objectid
			) select max(objectid) into v_currentid from cte;
			raise notice  '%', v_currentid;
			if( v_currentid is null ) then
				return;
			end if;
		end loop;
	end;
$$;

7.其它

更加复杂的方法是多次执行分批次更新,就如同第3节中的方法一样.
如果你有更好的方法请发邮件给我81855841@qq.com
如果你要监控过程,请删除raise之前的注释.
####请记住:大数据处理分片处理是最基本的思想,再此之上结合需求和算法从而高效的完成工作任务.

kmblack1

关注

3
点赞
踩
8

收藏

觉得还不错? 一键收藏
打赏
1
评论
数据库快速更新全表数据

备注:以下更新的算法适合所有数据库,示例采用PostgreSQL,其它数据库只用修改语法即可.题外话:软件思想很重要,不要局限于某种语言\工具\数据库,思想才是最重要的.有时候一个灵光一闪的想法说不定就是一个好的软件.开始谈正事,在update全表数据时,常规写法下大家经常抱怨更新太慢,语句如下:update 表 set 字段1=0,字段n=n实际上这里由于在update前,数据库需要读取整个表,然
复制链接

扫一扫