删除分区表TB_city表中2017-03重复记录中的异常记录
TB_city(是按月分区,对每个分区大概6个G,5000万条数据)
city mobile plat_flag month_time
济南 13791135001 1 2017-03
青岛 13791135001 1 2017-03
烟台 18509230617 2 2017-03
日照 13891335067 2 2016-02
德州 13891335067 2 2016-02
济南 13791135001 2 2017-03
青岛 13791135001 2 2017-03
TB_AREA
city phone_no
济南 137911
青岛 158012
日照 138913
德州 150201
drop table TB_city;
drop table TB_AREA;
create table TB_city partition by range(month_time)
( partition t_p1 values less than ('2016-02') tablespace users,
partition t_p2 values less than ('2016-03') tablespace users,
partition t_p3 values less than ('2017-03') tablespace users,
partition t_pmax values less than (maxvalue) tablespace users )
as
select '济南' city,'13791135001' mobile,'1' plat_flag,'2017-03' month_time from dual
union all
select '青岛' city,'13791135001' mobile,'1' plat_flag,'2017-03' month_time from dual
union all
select '烟台' city,'18509230617' mobile,'2' plat_flag,'2017-03' month_time from dual
union all
select '日照' city, '13891335067' mobile,'2' plat_flag,'2016-02' month_time from dual
union all
select '德州' city, '13891335067' mobile,'2' plat_flag,'2016-02' month_time from dual
union all
select '济南' city,'13791135001' mobile,'1' plat_flag,'2014-02' month_time from dual
union all
select '青岛' city,'13791135001' mobile,'1' plat_flag,'2014-02' month_time from dual;
create table TB_AREA
as
select '济南' city,'137911' phone_no from dual
union all
select '青岛' city,'158012' phone_no from dual
union all
select '日照' city,'138913' phone_no from dual
union all
select '德州' city,'150201' phone_no from dual;
之前开发者写的SQL
这个sql是每个月月初执行一次,用于删除上个月的异常记录,
每次后自行的时间大概十个小时左右
explain plan for
delete from TB_city ti
where exists (select t.mobile,t.plat_flag
from TB_city t
where ti.mobile = t.mobile
and t.plat_flag = ti.plat_flag
and month_time = '2017-03'
group by t.mobile, t.plat_flag
having count(*) > 1)
and exists (select 1
from TB_AREA ta
where ta.phone_no = substr(ti.mobile, 1, 6)
and ta.city <> ti.city);
select * from table(dbms_xplan.display);
分析: 删除条件:
1、group by t.mobile,t.plat_flag having count(*)>1 的记录,找到青岛,济南,13791135001这条记录
2、条件1中的重复记录与字典表关联,city不相等的记录,找到青岛13791135001这条记录,删除时同时会将之前分区的满足青岛13791135001的数据删除
针对以上的SQL,经过测试,并看执行计划,缺少一个过滤条件,在最后增加where ti.month_time = '2017-03'
加了外部的限定条件,就是单独扫描指定分区的,因为分区表的优势就在于 分区裁剪不然,Oracle就会从2014年的分区开始,挨个扫描
查看执行计划:
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
---------------
Plan hash value: 110916608
-----------------------------------------------------------------------------------------------
-------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart|
Pstop |
-----------------------------------------------------------------------------------------------
-------
| 0 | DELETE STATEMENT | | 1 | 36 | 22 (0)| 00:00:01 | |
|
| 1 | DELETE | TB_CITY | | | | | |
|
|* 2 | FILTER | | | | | | |
|
|* 3 | HASH JOIN SEMI | | 1 | 36 | 15 (0)| 00:00:01 | |
|
| 4 | PARTITION RANGE ALL | | 5 | 110 | 12 (0)| 00:00:01 | 1 |
4 |
| 5 | TABLE ACCESS FULL | TB_CITY | 5 | 110 | 12 (0)| 00:00:01 | 1 |
4 |
| 6 | TABLE ACCESS FULL | TB_AREA | 4 | 56 | 3 (0)| 00:00:01 | |
|
|* 7 | FILTER | | | | | | |
|
| 8 | SORT GROUP BY NOSORT | | 1 | 25 | 7 (0)| 00:00:01 | |
|
| 9 | PARTITION RANGE SINGLE| | 1 | 25 | 7 (0)| 00:00:01 | 4 |
4 |
|* 10 | TABLE ACCESS FULL | TB_CITY | 1 | 25 | 7 (0)| 00:00:01 | 4 |
4 |
-----------------------------------------------------------------------------------------------
-------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter( EXISTS (SELECT 0 FROM "TB_CITY" "T" WHERE "T"."MOBILE"=:B1 AND
"T"."PLAT_FLAG"=:B2 AND "MONTH_TIME"='2017-03' GROUP BY "T"."MOBILE","T"."PLAT_FL
AG" HAVING
COUNT(*)>1))
3 - access("TA"."PHONE_NO"=SUBSTR("TI"."MOBILE",1,6))
filter("TA"."CITY"<>"TI"."CITY")
7 - filter(COUNT(*)>1)
10 - filter("T"."MOBILE"=:B1 AND "T"."PLAT_FLAG"=:B2 AND "MONTH_TIME"='2017-03')
Note
-----
- dynamic sampling used for this statement (level=2)
已选择32行。
存在两点问题:
①:PARTITION RANGE ALL 的意思是扫描所有分区,但是这里我们只需要扫描3月份这个分区就可以了
id=2,FILTER,oracle数据库中只有一个节点的filter主要是起一个过滤的作用比如id=7这个地方,
再看id=2这里,是走了FILTER,就是说3456返回多少唯一结果集,那么78910就会被执行多少次
②缺少时间的过滤条件,导致将3月之前的分区的重复的错误数据删除了,但是这里是要求每月执行一次的,不用管之前分区的数据。
由于每个月返回的数据大概是5000条数据,相对与5千万条数据是比较少的所以这里可以通过两种方式进行改写
① 使用in来改写exists并加过滤条件
delete/*+ gather_plan_statistics */ from TB_city where(city,mobile)in
(select s.city,s.mobile
from (select c.city,
c.mobile,
count(*) over(partition by c.mobile, c.plat_flag) rn
from TB_city c
where c.month_time = '2017-03') s
inner join TB_AREA a
on (substr(s.mobile, 1, 6) = a.phone_no)
where rn > 1
and a.city <> s.city)
and month_time = '2017-03'
;
执行计划
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | | 0 |00:00:00.01 | 13 | | | |
| 1 | DELETE | TB_CITY | 1 | | 0 |00:00:00.01 | 13 | | | |
|* 2 | HASH JOIN SEMI | | 1 | 5 | 2 |00:00:00.01 | 9 | 1087K| 1087K| 644K (0)|
| 3 | PARTITION RANGE SINGLE | | 1 | 5 | 5 |00:00:00.01 | 3 | | | |
|* 4 | TABLE ACCESS FULL | TB_CITY | 1 | 5 | 5 |00:00:00.01 | 3 | | | |
| 5 | VIEW | VW_NSO_1 | 1 | 4 | 2 |00:00:00.01 | 6 | | | |
|* 6 | HASH JOIN | | 1 | 4 | 2 |00:00:00.01 | 6 | 1599K| 1599K| 1067K (0)|
| 7 | TABLE ACCESS FULL | TB_AREA | 1 | 4 | 4 |00:00:00.01 | 3 | | | |
|* 8 | VIEW | | 1 | 5 | 4 |00:00:00.01 | 3 | | | |
| 9 | WINDOW SORT | | 1 | 5 | 5 |00:00:00.01 | 3 | 73728 | 73728 | |
| 10 | PARTITION RANGE SINGLE| | 1 | 5 | 5 |00:00:00.01 | 3 | | | |
|* 11 | TABLE ACCESS FULL | TB_CITY | 1 | 5 | 5 |00:00:00.01 | 3 | | | |
------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CITY"="CITY" AND "MOBILE"="MOBILE")
4 - filter("MONTH_TIME"='2017-03')
6 - access("A"."PHONE_NO"=SUBSTR("S"."MOBILE",1,6))
filter("A"."CITY"<>"S"."CITY")
8 - filter("S"."RN">1)
11 - filter("C"."MONTH_TIME"='2017-03')
Note
-----
- dynamic sampling used for this statement (level=2)
②删除的话使用这里使用NL还是比较快的
delete /*+ gather_plan_statistics */ /*+ qb_name(outer) */ /*+ leading(s@inner) use_nl(t@outer)*/
from TB_city t
where rowid in (select /*+ qb_name(inner) */
s.rowid
from (select /*+ full(c) */
c.city,
c.mobile,
count(*) over(partition by c.mobile, c.plat_flag) rn
from TB_city c
where c.month_time = '2017-03') s
inner join TB_AREA a
on (substr(s.mobile, 1, 6) = a.phone_no)
where rn > 1
and a.city <> s.city);
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | | 0 |00:00:00.01 | 11 | | | |
| 1 | DELETE | TB_CITY | 1 | | 0 |00:00:00.01 | 11 | | | |
| 2 | NESTED LOOPS | | 1 | 1 | 2 |00:00:00.01 | 7 | | | |
| 3 | VIEW | VW_NSO_1 | 1 | 4 | 2 |00:00:00.01 | 6 | | | |
| 4 | SORT UNIQUE | | 1 | 1 | 2 |00:00:00.01 | 6 | 73728 | 73728 | |
|* 5 | HASH JOIN | | 1 | 4 | 2 |00:00:00.01 | 6 | 1599K| 1599K| 1057K (0)|
| 6 | TABLE ACCESS FULL | TB_AREA | 1 | 4 | 4 |00:00:00.01 | 3 | | | |
|* 7 | VIEW | | 1 | 5 | 4 |00:00:00.01 | 3 | | | |
| 8 | WINDOW SORT | | 1 | 5 | 5 |00:00:00.01 | 3 | 73728 | 73728 | |
| 9 | PARTITION RANGE SINGLE| | 1 | 5 | 5 |00:00:00.01 | 3 | | | |
|* 10 | TABLE ACCESS FULL | TB_CITY | 1 | 5 | 5 |00:00:00.01 | 3 | | | |
| 11 | TABLE ACCESS BY USER ROWID | TB_CITY | 2 | 1 | 2 |00:00:00.01 | 1 | | | |
-------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("A"."PHONE_NO"=SUBSTR("S"."MOBILE",1,6))
filter("A"."CITY"<>"S"."CITY")
7 - filter("S"."RN">1)
10 - filter("C"."MONTH_TIME"='2017-03')
Note
-----
- dynamic sampling used for this statement (level=2)