KingbaseES例程之快速删除表数据

最新推荐文章于 2023-12-21 21:08:54 发布

Kingbase 研究院

最新推荐文章于 2023-12-21 21:08:54 发布

阅读量1k

点赞数

文章标签：数据库

本文链接：https://blog.csdn.net/lyu1026/article/details/126555695

版权

概述

快速删除表中的数据

delete语句删除数据
表中的数据被删除了，但是这个数据在硬盘上的真实存储空间不会被释放。
这种删除缺点是：删除效率比较低。
这种删除优点是：支持删除部分数据，支持回滚。
truncate语句删除数据
这种删除效率比较高，表被一次截断，物理删除。
这种删除缺点：不支持删除部分数据。
这种删除优点：快速，支持回滚。

案例：删除大表数据，但保留少量数据

一张表有100万条数据，分为1000组信息，仅保留每组的最后一条数据，如何快速删除其它99万余条数据？

方法一：删除每组非最大值的数据

explain  (analyse,buffers )
delete
from test10
where (c1,id) not in (select c1,max(id) from test10 group by c1)
returning *;


Delete on test10  (cost=36508.94..56943.94 rows=500000 width=6) (actual time=221.183..1732.834 rows=998999 loops=1)
  Buffers: shared hit=2012980
  ->  Seq Scan on test10  (cost=36508.94..56943.94 rows=500000 width=6) (actual time=221.128..583.449 rows=998999 loops=1)
        Filter: (NOT (hashed SubPlan 1))
        Rows Removed by Filter: 1001
        Buffers: shared hit=9547
        SubPlan 1
          ->  GroupAggregate  (cost=0.42..36506.44 rows=1001 width=8) (actual time=0.067..219.780 rows=1001 loops=1)
                Group Key: test10_1.c1
                Buffers: shared hit=4112
                ->  Index Only Scan using idx01 on test10 test10_1  (cost=0.42..31496.42 rows=1000000 width=8) (actual time=0.010..126.628 rows=1000000 loops=1)
                      Heap Fetches: 0
                      Buffers: shared hit=4112
Planning Time: 0.120 ms
Execution Time: 1799.063 ms

方法二：CTE获取每组最新行，删除每组非CTE的数据

explain  (analyse,buffers )
with recursive cte as (
        (select c1, ctid from test10 order by c1, id desc limit 1)
        union all
        (select test10.c1, test10.CTID
         from cte,
              lateral ( select CTID, c1
                        from test10
                        where cte.c1 < test10.c1
                        order by test10.c1, test10.id desc
                        limit 1) test10
        ))
delete from test10
where not exists (select  1 from cte where cte.ctid = test10.ctid )
returning *
;
Delete on test10  (cost=62.30..28121.41 rows=999899 width=36) (actual time=10.799..1627.548 rows=998999 loops=1)
  Buffers: shared hit=2013025
  CTE cte
    ->  Recursive Union  (cost=0.42..59.02 rows=101 width=10) (actual time=0.012..9.888 rows=1001 loops=1)
          Buffers: shared hit=4157
"          ->  Subquery Scan on ""*SELECT* 1""  (cost=0.42..0.49 rows=1 width=10) (actual time=0.010..0.013 rows=1 loops=1)"
                Buffers: shared hit=4
                ->  Limit  (cost=0.42..0.48 rows=1 width=14) (actual time=0.010..0.011 rows=1 loops=1)
                      Buffers: shared hit=4
                      ->  Index Scan using idx02 on test10 test10_1  (cost=0.42..54240.28 rows=1000000 width=14) (actual time=0.010..0.010 rows=1 loops=1)
                            Buffers: shared hit=4
          ->  Nested Loop  (cost=0.42..5.65 rows=10 width=10) (actual time=0.009..0.009 rows=1 loops=1001)
                Buffers: shared hit=4153
                ->  WorkTable Scan on cte cte_1  (cost=0.00..0.20 rows=10 width=4) (actual time=0.000..0.000 rows=1 loops=1001)
                ->  Limit  (cost=0.42..0.53 rows=1 width=14) (actual time=0.009..0.009 rows=1 loops=1001)
                      Buffers: shared hit=4153
                      ->  Index Scan using idx02 on test10 test10_2  (cost=0.42..33409.58 rows=333333 width=14) (actual time=0.009..0.009 rows=1 loops=1001)
                            Index Cond: (c1 > cte_1.c1)
                            Buffers: shared hit=4153
  ->  Hash Anti Join  (cost=3.28..28062.39 rows=999899 width=36) (actual time=10.727..422.146 rows=998999 loops=1)
        Hash Cond: (test10.ctid = cte.ctid)
        Buffers: shared hit=9592
        ->  Seq Scan on test10  (cost=0.00..15435.00 rows=1000000 width=6) (actual time=0.005..141.828 rows=1000000 loops=1)
              Buffers: shared hit=5435
        ->  Hash  (cost=2.02..2.02 rows=101 width=36) (actual time=10.713..10.714 rows=1001 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 77kB
              Buffers: shared hit=4157
              ->  CTE Scan on cte  (cost=0.00..2.02 rows=101 width=36) (actual time=0.049..10.400 rows=1001 loops=1)
                    Buffers: shared hit=4157
Planning Time: 0.201 ms
Execution Time: 1691.687 ms

方法三：数组变量与truncate组合，支持事务回滚

do
$$
    declare
        v_rec test10[];
    begin
        v_rec := array(
                with recursive cte as (
                        (select id, c1, c2 from test10 order by c1, id desc limit 1)
                        union all
                        (select test10.id, test10.c1, test10.c2
                         from cte,
                              lateral ( select test10.id, test10.c1, test10.c2
                                        from test10
                                        where cte.c1 < test10.c1
                                        order by test10.c1, test10.id desc
                                        limit 1) test10
                        ))
                select (id, c1, c2)
                from cte);
        truncate test10;

        insert into test10
        select (t).*
        from (select unnest(v_rec) t) t;
        commit;
    exception
        when others then
            rollback;
    end;
$$
;

ANONYMOUS BLOCK
Time: 99.299 ms

TRUNCATE与DML操作的组合，实现通过少量数据的DML操作，实现DELETE大部分数据操作，可以减少执行时长。由于truncate支持事务回滚，可以在发生异常时回滚事务，或主动回滚事务，保证数据的完整性。

Kingbase 研究院

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
KingbaseES例程之快速删除表数据

概述快速删除表中的数据delete语句删除数据表中的数据被删除了，但是这个数据在硬盘上的真实存储空间不会被释放。这种删除缺点是：删除效率比较低。这种删除优点是：支持删除部分数据，支持回滚。truncate语句删除数据这种删除效率比较高，表被一次截断，物理删除。这种删除缺点：不支持删除部分数据。这种删除优点：快速，支持回滚。案例：删除大表数据，但保留少量数据一张表...
复制链接

扫一扫