面试的时候一直被问到union all 和union 的区别。一直停留在书面上的理解,今天碰到需求,刚好符合union all的用法,于是便有了今天的这篇文章。
具体业务是把两张表的集合合并到一张表里,然后向报表写入数据。这里采用存储过程的方式实现,使用Spring quartz 实现定时调用。由于用的是mysql数据库,所以下面的事例也是遵循mysql语法。具体代码如下:
DELIMITER $$
USE `mydb` $$
DROP PROCEDURE IF EXISTS `proc_summary_report` $$
CREATE PROCEDURE `proc_summary_report` ()
BEGIN
DECLARE error INT DEFAULT 0 ;
DECLARE last_date DATE ;
DECLARE update_timestamp DATE ;
DECLARE CONTINUE HANDLER FOR SQLEXCEPTION SET error = 1 ;
-- begin transaction
START TRANSACTION ;
-- set auto commit is 0;
SET autocommit = 0 ;
SELECT IFNULL(MAX(static_date), '1970-01-01') INTO last_date FROM t_report_summary ;
-- search max data in dataset
SELECT MAX(static_date) INTO update_timestamp
FROM (
(SELECT rt.static_date,0 AS totalIncome,SUM(rt.income) AS not_direct_total_income,0 AS direct_income,SUM(CASE WHEN rt.ssp_id = 8 THEN rt.income ELSE 0 END) AS 'lenovo_income',
SUM(CASE WHEN rt.ssp_id != 1 AND rt.ssp_id != 8 THEN rt.income ELSE 0 END) AS 'other_income'
FROM t_report_ssp_byday rt GROUP BY rt.static_date
)UNION ALL
(SELECT t.static_date,0 AS totalIncome,0 AS not_direct_total_income,SUM(cost) AS 'direct_income',0 AS 'lenovoIncome',0 AS 'otherIncome'
FROM t_static_ad_byday t WHERE ssp_id = 1 GROUP BY t.static_date
)
) st;
-- begin process insert operation
IF(update_timestamp > last_date)
THEN
INSERT INTO t_report_summary (
static_date,
total_income,
direct_income,
lenovo_income,
others_income
)
SELECT
static_date,
SUM((not_direct_total_income + direct_income)) total_income,
direct_income,
lenovo_income,
other_income
FROM
(
(SELECT rt.static_date,
0 AS totalIncome,
SUM(rt.income) AS not_direct_total_income,
0 AS direct_income,
SUM(CASE WHEN rt.ssp_id = 8 THEN rt.income ELSE 0 END) AS 'lenovo_income',
SUM(CASE WHEN rt.ssp_id != 1 AND rt.ssp_id != 8 THEN rt.income ELSE 0 END) AS 'other_income'
FROM t_report_ssp_byday rt WHERE rt.static_date > last_date GROUP BY rt.static_date
)
UNION ALL
(SELECT t.static_date,
0 AS totalIncome,
0 AS not_direct_total_income,
SUM(cost) AS 'direct_income',
0 AS 'lenovoIncome',
0 AS 'otherIncome'
FROM t_static_ad_byday t WHERE ssp_id = 1 AND t.static_date > last_date GROUP BY t.static_date
)
) st GROUP BY static_date;
IF error = 1
THEN ROLLBACK ;
ELSE COMMIT ;
END IF ;
END IF ;
END $$
DELIMITER ;
这样就实现两张表数据集的合并及计算
下面补充一下union的用法:在数据库中,UNION和UNION ALL关键字都是将两个结果集合并为一个。UNION在进行表链接后会筛选掉重复的记录,所以在表链接后会对所产生的结果集进行排序运算,删除重复的记录再返回结果,但实际大部分应用中是不会产生重复的记录。
UNION ALL只是简单的将两个结果合并后就返回。这样,如果返回的两个结果集中有重复的数据,那么返回的结果集就会包含重复的数据了。
从效率上说,UNION ALL 要比UNION快很多,所以,如果可以确认合并的两个结果集中不包含重复的数据的话,那么就使用UNION ALL。