日常工作中,我们常常需要开发报表或者统计一些数据的百分比、合计,这时候,下面的几个函数可以很快地解决问题 。它们不仅sql语句少,而且性能更好。下图即为我们这次要统计的结果展示。
常用的函数如下:case when...then...else...end, regexp_like, ratio_to_report(score) OVER(), rollup, grouping,本文只是展示在sql中如何用这些函数巧妙地统计出这种效果,具体用法可自查阅资料了解更多细节。
首先,需要你统计的东西,类似这样子:
本人的表由于数据需要转换,所以需要使用下面的sql语句进行处理。如有不同,请跳过这一段。
SELECT region, SUM(CASE region
WHEN 'Africa' THEN 1
WHEN 'Asia' THEN 1
WHEN 'Europe' THEN 1
WHEN 'NorthAmerica' THEN 1
WHEN 'Oceania' THEN 1
WHEN 'SouthAmerica' THEN 1
WHEN 'unkonwn' THEN 1
ELSE 0
END) AS score
FROM (
SELECT CASE
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[A-C]') THEN 'Africa'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[J-R]') THEN 'Asia'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[S-Z]') THEN 'Europe'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[1-5]') THEN 'NorthAmerica'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[6-7]') THEN 'Oceania'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[8-9]') THEN 'SouthAmerica'
ELSE 'unkonwn'
END AS region
FROM test
)
GROUP BY region
接下来,你统计的数据展示可以再次转换为这样,即这一步可以求百分比:
本人的数据需要转换,所以需要使用下面的sql语句进行处理。其实就是在这一步用ratio_to_report(aa.score) OVER()处理。如有不同,请跳过这一段。
SELECT aa.region, aa.score
, round(ratio_to_report(aa.score) OVER (), 4) AS percents
FROM (
SELECT region, SUM(CASE region
WHEN 'Africa' THEN 1
WHEN 'Asia' THEN 1
WHEN 'Europe' THEN 1
WHEN 'NorthAmerica' THEN 1
WHEN 'Oceania' THEN 1
WHEN 'SouthAmerica' THEN 1
WHEN 'unkonwn' THEN 1
ELSE 0
END) AS score
FROM (
SELECT CASE
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[A-C]') THEN 'Africa'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[J-R]') THEN 'Asia'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[S-Z]') THEN 'Europe'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[1-5]') THEN 'NorthAmerica'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[6-7]') THEN 'Oceania'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[8-9]') THEN 'SouthAmerica'
ELSE 'unkonwn'
END AS region
FROM test
)
GROUP BY region
) aa
ORDER BY aa.region
接下来,就需要我们分组合计了,这个时候,rollup派上用场了。
使用了rollup后,我们的数据展示变成了:
下面的sql语句就是为了展示rollup使用后的效果,并非最后的效果,可以跳过这一段:
SELECT region, SUM(score) AS score, SUM(percents) AS percents
FROM (
SELECT aa.region, aa.score
, round(ratio_to_report(aa.score) OVER (), 4) AS percents
FROM (
SELECT region, SUM(CASE region
WHEN 'Africa' THEN 1
WHEN 'Asia' THEN 1
WHEN 'Europe' THEN 1
WHEN 'NorthAmerica' THEN 1
WHEN 'Oceania' THEN 1
WHEN 'SouthAmerica' THEN 1
WHEN 'unkonwn' THEN 1
ELSE 0
END) AS score
FROM (
SELECT CASE
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[A-C]') THEN 'Africa'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[J-R]') THEN 'Asia'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[S-Z]') THEN 'Europe'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[1-5]') THEN 'NorthAmerica'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[6-7]') THEN 'Oceania'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[8-9]') THEN 'SouthAmerica'
ELSE 'unkonwn'
END AS region
FROM test
)
GROUP BY region
) aa
ORDER BY aa.region
) t_test
GROUP BY region WITH ROLLUP;
当我们使用了rollup后,我们发现,最下面的合计这一行,要进行合计的分类字段region最下面的值是为空的,所以,配合grouping这个函数使用,即将可以达到我们最后想要的效果。
grouping函数可以接受一列,返回0或者1。如果列值为空,那么grouping()返回1;如果列值非空,那么返回0。grouping只能在使用rollup或cube的查询中使用。当需要在返回空值的地方显示某个值时,grouping()就非常有用。
下图是最终的sql语句:
SELECT CASE
WHEN grouping(region) = 1 THEN '合计'
ELSE region
END AS region, SUM(score) AS score, SUM(percents) AS percents
FROM (
SELECT aa.region, aa.score
, round(ratio_to_report(aa.score) OVER (), 4) AS percents
FROM (
SELECT region, SUM(CASE region
WHEN 'Africa' THEN 1
WHEN 'Asia' THEN 1
WHEN 'Europe' THEN 1
WHEN 'NorthAmerica' THEN 1
WHEN 'Oceania' THEN 1
WHEN 'SouthAmerica' THEN 1
WHEN 'unkonwn' THEN 1
ELSE 0
END) AS score
FROM (
SELECT CASE
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[A-C]') THEN 'Africa'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[J-R]') THEN 'Asia'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[S-Z]') THEN 'Europe'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[1-5]') THEN 'NorthAmerica'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[6-7]') THEN 'Oceania'
WHEN regexp_like(substr(TRANSDAY, 0, 1), '[8-9]') THEN 'SouthAmerica'
ELSE 'unkonwn'
END AS region
FROM test
)
GROUP BY region
) aa
ORDER BY aa.region
) t_test
GROUP BY region WITH ROLLUP;
以上就是我们数据最后的展示效果。sql语句可以进一步优化,由于时间问题,后续有时间进一步优化。