我有—-的查询要求.试图用CONNECT BY解决它,但似乎无法获得我需要的结果.
表格(简体):
create table CSS.USER_DESC (
USER_ID VARCHAR2(30) not null,
NEW_USER_ID VARCHAR2(30),
GLOBAL_HR_ID CHAR(8)
)
-- USER_ID is the primary key
-- NEW_USER_ID is a self-referencing key
-- GLOBAL_HR_ID is an ID field from another system
用户数据有两个来源(数据馈送)…更新信息时,我必须注意其中两个错误.
场景:
>为用户提供了新的用户ID …相应地设置了旧记录并停用了该记录(通常是全职承包商的重命名)
>用户离开并稍后返回. HR无法向我们发送旧的用户ID,因此我们可以连接帐户.
>系统搞砸了,并且没有在旧记录上设置新的用户ID.
>数据可能以其他一百种方式出现问题
我需要知道以下用户是同一用户,而且我不能依赖名称或其他字段…它们在匹配记录之间有所不同:
ROOTUSER NUMROOTS NODELEVEL ISLEAF USER_ID NEW_USER_ID GLOBAL_HR_ID USERTYPE LAST_NAME FIRST_NAME
-----------------------------------------------------------------------------------------------------------------------------
EX0T1100 2 1 0 EX0T1100 EX000005 CONTRACTOR VON DER HAAVEN VERONICA
EX0T1100 2 2 1 EX000005 00126121 EMPLOYEE HAAVEN, VON DER VERONICA
GL110456 1 1 1 GL110456 00126121 EMPLOYEE VONDERHAAVEN VERONICA
EXOT1100和EX000005通过NEW_USER_ID字段正确连接.重命名是在没有全局HR ID之前发生的,因此EX0T1100没有. EX000005被赋予了新的用户ID“ GL110456”,并且两者仅通过具有相同的全局HR ID进行连接.
清理数据不是一种选择.
到目前为止的查询:
select connect_by_root cud.user_id RootUser,
count(connect_by_root cud.user_id) over (partition by connect_by_root cud.user_id) NumRoots,
level NodeLevel, connect_by_isleaf IsLeaf, --connect_by_iscycle IsCycle,
cud.user_id, cud.new_user_id, cud.global_hr_id,
cud.user_type_code UserType, ccud.last_name, cud.first_name
from css.user_desc cud
where cud.user_id in ('EX000005','EX0T1100','GL110456')
-- Using this so I don't get sub-users in my list of root users...
-- It complicates the matches with GLOBAL_HR_ID, however
start with cud.user_id not in (select cudsub.new_user_id
from css.user_desc cudsub
where cudsub.new_user_id is not null)
connect by nocycle (prior new_user_id = user_id);
我已经尝试了各种CONNECT BY子句,但是没有一个是正确的:
-- As a multiple CONNECT BY
connect by nocycle (prior global_hr_id = global_hr_id)
connect by nocycle (prior new_user_id = user_id)
-- As a compound CONNECT BY
connect by nocycle ((prior new_user_id = user_id)
or (prior global_hr_id = global_hr_id
and user_id != prior user_Id))
UNIONing两个CONNECT BY查询不起作用…我没有得到水平.
这是我想看到的…我可以使用必须区分并用作子查询的结果集.我也对ROOTUSER列中的三个用户ID中的任何一个都还好…我只需要知道他们是相同的用户即可.
ROOTUSER NUMROOTS NODELEVEL ISLEAF USER_ID NEW_USER_ID GLOBAL_HR_ID USERTYPE LAST_NAME FIRST_NAME
-----------------------------------------------------------------------------------------------------------------------------
EX0T1100 3 1 0 EX0T1100 EX000005 CONTRACTOR VON DER HAAVEN VERONICA
EX0T1100 3 2 1 EX000005 00126121 EMPLOYEE HAAVEN, VON DER VERONICA
EX0T1100 3 (2 or 3) 1 GL110456 00126121 EMPLOYEE VONDERHAAVEN VERONICA
有想法吗?
更新资料
尼古拉斯(Nicholas),您的代码看起来非常像正确的轨道……目前,当global_hr_id为null时,(由global_hr_id分区的)lead(user_id)会产生错误命中.例如:
USER_ID NEW_USER_ID CHAINNEWUSER GLOBAL_HR_ID LAST_NAME FIRST_NAME
FP004468 FP004469 AARON TIMOTHY
FP004469 FOONG KOK WAH
我经常想将空值视为分区中的单独记录,但是我从未找到一种使忽略空值起作用的方法.这做了我想要的:
decode(global_hr_id,null,null,lead(cud.user_id ignore nulls) over (partition by global_hr_id order by user_id)
…但是必须有更好的方法.我还无法针对完整的用户数据(约40,000个用户)完成查询. global_hr_id和new_user_id均已建立索引.
更新资料
查询会在大约750秒后返回……虽然很长,但是可以管理.它返回93k记录,因为我没有从根中过滤掉2级匹配的好方法-您从global_hr_id开始为null,但是不幸的是,并非总是如此.我将不得不考虑如何过滤掉它们.
我之前曾尝试添加更复杂的开头子句,但我发现它们分别运行< 1秒...在一起,耗时90分钟最佳答案
您仅提供了一个用户的数据样本.多一点会更好.无论如何,让我们来看看这样的事情.
SQL> with user_desc(USER_ID, NEW_USER_ID, GLOBAL_HR_ID)as(
2 select 'EX0T1100', 'EX000005', null from dual union all
3 select 'EX000005', null, 00126121 from dual union all
4 select 'GL110456', null, 00126121 from dual
5 )
6 select connect_by_root(user_id) rootuser
7 , count(connect_by_root(user_id)) over(partition by connect_by_root(user_id)) numroot
8 , level nodlevel
9 , connect_by_isleaf
10 , user_id
11 , new_user_id
12 , global_hr_id
13 from (select user_id
14 , coalesce(new_user_id, usr) new_user_id1
15 , new_user_id
16 , global_hr_id
17 from ( select user_id
18 , new_user_id
19 , global_hr_id
20 , decode(global_hr_id,null,null,lead(user_id) over (partition by global_hr_id order by user_id)) usr
21 from user_desc
22 )
23 )
24 start with global_hr_id is null
25 connect by prior new_user_id1 = user_id
26 ;
结果:
ROOTUSER NUMROOT NODLEVEL CONNECT_BY_ISLEAF USER_ID NEW_USER_ID GLOBAL_HR_ID
-------- ---------- ---------- ----------------- -------- ----------- ------------
EX0T1100 3 1 0 EX0T1100 EX000005
EX0T1100 3 2 0 EX000005 126121
EX0T1100 3 3 1 GL110456 126121