从邮箱中找出存在相同前缀名不同后缀名,认为两邮箱实体存在关系,并指定一种前缀相同的邮箱指向其余相同的前缀邮箱地址
例如:
AR_NAME_START AR_NAME
CouwetIzotofo CouwetIzotofo@112.com
CouwetIzotofo CouwetIzotofo@163.com
CouwetIzotofo CouwetIzotofo@o2.pl
54561545 54561545@qq.com
54561545 54561545@123.com
HUH9329 HUH9329@GMAIL.COM
HUH9329 HUH9329@qq.COM
youxiangyan youxiangyan@111.com
youxiangyan youxiangyan@126.com
youxiangyan youxiangyan@gmail.com
转换变成一下结果
CouwetIzotofo@112.com CouwetIzotofo@163.com
CouwetIzotofo@112.com CouwetIzotofo@o2.pl
54561545@qq.com 54561545@123.com
HUH9329@GMAIL.COM HUH9329@qq.COM
youxiangyan@111.com youxiangyan@126.com
youxiangyan@111.com youxiangyan@gmail.com
思路:
1.按照@切分取出前缀,找到相同前缀的邮箱关系
2.按照邮箱前缀划分,取出出现次数大于1的全部数据
3.取出任意一类邮箱,并让指向其余同类邮箱
SQL
1.按照@切分取出前缀,找到相同前缀的邮箱关系
select
*,split(AR_NAME, '@')[0] as AR_NAME_START
from
a
2.按照邮箱前缀划分,取出出现次数大于1的全部数据
SELECT t.AR_NAME_START,
AR_NAME
FROM a t
JOIN ( SELECT s.AR_NAME_START
FROM a s
GROUP BY s.AR_NAME_START
HAVING COUNT(1) > 1
) d
ON d.AR_NAME_START = t.AR_NAME_START
3.取出任意一类邮箱,并让指向其余同类邮箱
select
aaa.AR_NAME as src_ar_name,c.AR_NAME as des_ar_name
from
(select *,row_number()over(partition by AR_NAME_START order by AR_NAME_START) rn
from a)as aaa
join
(
select
bbb.*
from
(select *,row_number()over(partition by AR_NAME_START order by AR_NAME_START) rn
from a) as bbb
where bbb.rn > 1
) as c
on aaa.AR_NAME_START = c.AR_NAME_START
where aaa.rn =1