Oracle 行列转换实例 列转行报表

简单说明:

本博文不面向Oracle行列转换零基础的筒子,这是一个高级列转行之后的报表实例:
有张表存在冗余数据
  id1,url1
  id2,url1&url2&url3&url3&url2...
  id3,url1&url2&url3&url3&url2...
  id4,url1&url4&url5&url5...
现在要根据url进行记录滤重,将重复出现的url对应的id查出来拼接在一起
url是不确定的,可能有上万个url甚至更多,一条记录中的url可能存在重复,相邻的记录之间的url可能是完全重复的

准备测试数据:

create table id_urls(id number,urls varchar2(255));
set define off
insert into ID_URLS values(1,'URL1');
insert into ID_URLS values(2,'URL1');
insert into ID_URLS values(3,'URL1&URL2&URL3');
insert into ID_URLS values(4,'URL1&URL2&URL3');
insert into ID_URLS values(5,'URL2&URL3&URL2');
insert into ID_URLS values(6,'URL2&URL3&URL2');
insert into ID_URLS values(7,'URL6&URL7&URL8&URL9&URL0');
insert into ID_URLS values(8,'URL10');
commit;
set define on

查询推导过程:

1° 将URL替换为空,查看该字段最多有几个URL

select regexp_replace(urls, '[^&]+', '') from ID_URLS;
select length(regexp_replace(urls, '[^&]+', '')) from ID_URLS;
select max(length(regexp_replace(urls, '[^&]+', ''))) + 1 as max_len from ID_URLS;
-- 最大数量至少要加1,如果URL数量最多的记录以分隔符&为行尾,加1会导致最大URL个数比实际个数大1
-- 求出来的最大URL个数是做记录笛卡尔积的依据
-- 比如当前记录只有一个URL,最大URL个数为10个,那么该行记录会笛卡尔积10次
-- 然后将10次笛卡尔积的结果集根据&符切分出第1组到第10组,会切出9组空记录,滤空即可解决
-- 如果求出的最大URL个数比实际小的话,则会造成切分时丢失数据
-- 比如当前记录有10个URL,但是求出的最大URL个数为5,那么该记录只会笛卡尔积5次
-- 然后将5次笛卡尔积的结果集根据&符切分出第1组到第5组,就会丢失5组数据
-- 因此求出的最大URL个数比实际个数大是可以的,小是不行的
-- 只有判断表中所有记录包含或者不包含以&分隔符结尾的情况才能得出正好相等的最大URL个数

2° 如果要将所有的URL做列传行的还原,至少需要每条记录重复max_len次

with max_length as
 (select max(length(regexp_replace(urls, '[^&]+', ''))) + 1 as max_len from ID_URLS)
select level as lvl from dual connect by level <= (select max_len from max_length);

3° 每条记录都重复max_len次,列传行切出来所有的URL

with max_length as
  (select max(length(regexp_replace(urls, '[^&]+', ''))) + 1  as max_len from ID_URLS),
repeat as
  (select level as lvl from dual connect by level <= (select max_len from max_length))
select id,regexp_substr(urls,'[^&]+',1,lvl) as single_url from ID_URLS,repeat order by id;

4° 因为并不是所有的记录的URL都有max_len个,所以存在URL为空的记录,过滤掉

with max_length as
  (select max(length(regexp_replace(urls, '[^&]+', ''))) + 1  as max_len from ID_URLS),
repeat as
  (select level as lvl from dual connect by level <= (select max_len from max_length)),
cut_url as
  (select id,regexp_substr(urls,'[^&]+',1,lvl) as single_url from ID_URLS,repeat)
select id, single_url from cut_url where single_url is not null order  by id;

5° 因为同一条记录的URL有可能存在重复,因此需要滤重

with max_length as
  (select max(length(regexp_replace(urls, '[^&]+', ''))) + 1  as max_len from ID_URLS),
repeat as
  (select level as lvl from dual connect by level <= (select max_len from max_length)),
cut_url as
  (select id,regexp_substr(urls,'[^&]+',1,lvl) as single_url from ID_URLS,repeat)
select distinct id, single_url from cut_url where single_url is not null order  by id;

6° 查看一下有多少URL重复,每个URL重复了多少次

with max_length as
  (select max(length(regexp_replace(urls, '[^&]+', ''))) + 1  as max_len from ID_URLS),
repeat as
  (select level as lvl from dual connect by level <= (select max_len from max_length)),
cut_url as
  (select id,regexp_substr(urls,'[^&]+',1,lvl) as single_url from ID_URLS,repeat),
distinct_record as
  (select distinct id, single_url from cut_url where single_url is not null order  by id)
select single_url,count(*) from distinct_record where single_url is not null group by single_url having count(*)>1;

7° 将重复的URL对应的ID拼接在一起,丢到需求人脸上

with max_length as
  (select max(length(regexp_replace(urls, '[^&]+', ''))) + 1  as max_len from ID_URLS),
repeat as
  (select level as lvl from dual connect by level <= (select max_len from max_length)),
cut_url as
  (select id,regexp_substr(urls,'[^&]+',1,lvl) as single_url from ID_URLS,repeat),
distinct_record as
  (select distinct id, single_url from cut_url where single_url is not null order  by id)
select single_url,to_char(wm_concat(id)) from distinct_record where single_url is not null group by single_url having count(*)>1;

收尾:

需求产生原因:

这是一个真实的生产需求,该表存的URL是图片URL
用户前台页面上传图片,一次上传的图片可能是多个
图片上传到第三方,第三方将上传的图片存储的URL返回给后台
因为网络原因或者第三方的BUG,造成多次将用户的一次上传行为返回给后台
且返回信息中的图片URL存在重复
后台未做处理直接将这些图片URL入库,造成了本次需求的产生

解决办法:

后台将收到的图片URL存入Redis,一个图片URL存一个KEY,一分钟超时
当收到图片URL时检索Redis中的当前一分钟内的图片URL,如果存在则不再重复入库
如果不存在,新建该图片URL的KEY,设置一分钟超时,入库

题外话:

从解决办法上可以明显看出锅到底是谁的
该难度的SQL业务开发无法写出(他们少点BUG就谢天谢地了)
普通的维护向DBA无法写出,专司BI报表的DBA或SQL开发能够写出
但是作为DBA,应当追求熟练揉捏手中数据的能力

[TOC]

评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值