SQL批量更新时性能优化场景

seeyouwell

已于 2024-09-26 09:21:08 修改

阅读量217

点赞数 1

文章标签： oracle 数据库 sql 性能优化

于 2024-09-26 09:19:47 首次发布

本文链接：https://blog.csdn.net/seeyouwell/article/details/142546556

版权

一、问题背景

原始需求是这样的，100个计数器硬件设备每隔5秒给服务器上报计数信息，服务端收到消息后先将上报的元数据存储起来，再每隔10s批量对之前上报的所有的元数据按时间段进行批量累加统计更新（题外话，为什么要每隔10s处理一次，而不是上报一次就累加一次？是为了不频繁操作数据库）

经过优化之后，批量更新SQL从58109ms耗时下降到了124ms。

二、实现方案

在看实现方案前，我们先看一下表的设计，其余的业务相关字段大家可以先忽略，主要就是要对相应时间段的finish_count做累加计算。当前表aps_produce_statistics中有73w条数据。

CREATE TABLE `aps_produce_statistics` (
  `produce_id` bigint(20) NOT NULL COMMENT '生产任务id',
  `line_id` bigint(20) NOT NULL COMMENT '产线id',
  `shift` tinyint(2) NOT NULL COMMENT '班次 0-白班 1-夜班',
  `time_segment` tinyint(2) NOT NULL COMMENT '所在时间段，1-10',
  `plan_count` int(10) DEFAULT NULL COMMENT '时间段内的计划数',
  `finish_count` int(10) NOT NULL COMMENT '完成量',
  `plan_id` bigint(20) NOT NULL COMMENT '计划id',
  `order_id` bigint(20) NOT NULL COMMENT '订单id',
  `produce_time` date DEFAULT NULL COMMENT '生产时间'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

2.1 方案一

将多个update语句通过<foreach/> 连接起来，生成的SQL就是以；分割的多个update语句。这样的效率是最差的，每执行一个update就要进行全表扫描，如果循环200次，那么耗时就是58109ms。


<update id="updateProduceTimeCount">
	<foreach collection="list" item="item" separator=";">
		update
			aps_produce_statistics
		set
			finish_count = finish_count + #{item.finishCount}
		WHERE
			  produce_id = #{item.produceId} and time_segment = #{item.timeSegment}
	</foreach>
</update>

update
	aps_produce_statistics
set
	finish_count = finish_count + 2
WHERE
	  produce_id = 1 and time_segment = 1;
update
	aps_produce_statistics
set
	finish_count = finish_count + 3
WHERE
	  produce_id = 1 and time_segment = 2;

2.2 方案2

方案1每次都要进行全表扫描，且进行了200次循环，共扫描了200*73w=14600w的数据，速度是可想而知的，所以这次我们给表加上联合索引，联合索引的字段就是where后的两个条件子段produce_id + time_segment。加了索引之后的建表语句如下:

CREATE TABLE `aps_produce_statistics` (
  `produce_id` bigint(20) NOT NULL COMMENT '生产任务id',
  `line_id` bigint(20) NOT NULL COMMENT '产线id',
  `shift` tinyint(2) NOT NULL COMMENT '班次 0-白班 1-夜班',
  `time_segment` tinyint(2) NOT NULL COMMENT '所在时间段，1-10',
  `plan_count` int(10) DEFAULT NULL COMMENT '时间段内的计划数',
  `finish_count` int(10) NOT NULL COMMENT '完成量',
  `plan_id` bigint(20) NOT NULL COMMENT '计划id',
  `order_id` bigint(20) NOT NULL COMMENT '订单id',
  `produce_time` date DEFAULT NULL COMMENT '生产时间',
  KEY `proId_timeseg` (`produce_id`,`time_segment`) COMMENT '生产任务id加时间段的联合索引'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

SQL还是像方案1一样，这次耗时下降到了 4819ms。

2.3 方案3

虽然加了索引之后，性能提升了不少，但还是需要将近5s，显然还是要继续优化的。

思路就是 1. 先创建一张临时表，

2. 然后将要更新的数据批量插入到这张表里面，

3. 接着联合aps_produce_statistics和临时表进行批量update，

只执行一次update指令，减少了扫描的行数。最终耗时下降到了124ms。

<update id="updateProduceTimeCount">
	create temporary table statistics_tmp(produce_id bigint(20), time_segment tinyint(2), count int(10), KEY `proId_timeseg` (`produce_id`,`time_segment`));
	insert into statistics_tmp values
	<foreach collection="list" item="item" separator="," close=";">
		(#{item.produceId}, #{item.timeSegment}, #{item.finishCount})
	</foreach>
	update aps_produce_statistics st, statistics_tmp temp set st.finish_count=st.finish_count + temp.count where st.produce_id=temp.produce_id and st.time_segment = temp. time_segment;
</update>