在数据库中,表与表之间的关联,通过JOIN连接。可以理解为“横向关联”,如果是多个大表,“横向关联”,效率比较慢;
“纵向关联”:UNION每个表,再GROUPBY去重,得到“关联”的效果。“纵向关联”效率比“横向关联”强很多。
举例:T1,T2,T3,T4,T5,每个表有5000万条数据。
“横向关联”: JOIN关联,实际是5000万*5000万*5000万*5000万*5000万,实际是在笛卡尔集基础上,再过滤条件。【Hadoop:超过几个小时】
“纵向关联”:5000万+5000万+5000万+5000万+5000万。【Hadoop:30分内】
“纵向关联”示例代码:
INSERT OVERWRITE TABLET_DM_SCORE
SELECT
'20160606',
CUST_NO,
ID_NUMBER,
NAME,
MAX(SCORE1) AS SCORE1,
MAX(SCORE2) AS SCORE2,
MAX(SCORE3) AS SCORE3,
MAX(SCORE4) AS SCORE4,
MAX(SCORE5) AS SCORE5,
0 AS SCORE6,
0 AS SCORE7,
(MAX(SCORE1) + MAX(SCORE2) + MAX(SCORE3) + MAX(SCORE4) + MAX(SCORE5)) ASSCORE_ALL,
MAX(ASSET_MAVG_SUM) AS ASSET_MAVG_SUM
FROM
(SELECTCUST_NO, CERT_ID AS ID_NUMBER, CUST_NAME AS NAME,
-90000 AS SCORE1, -90000 AS SCORE2, -90000 AS SCORE3, -90000 AS SCORE4, -90000AS SCORE5, NVL(ASSET_MAVG_SUM,0) AS ASSET_MAVG_SUM
FROM T_GET_WAIVER_ALL_CUST_BPH_INFO T0
UNION ALL
SELECTCUST_NO, ID_NUMBER, NAME, SCORE AS SCORE1, -90000 AS SCORE2, -90000 AS SCORE3,-90000 AS SCORE4, -90000 AS SCORE5, -90000 AS ASSET_MAVG_SUM
FROM T_DM_ABILITY_SCORET1
UNION ALL
SELECTCUST_NO, ID_NUMBER, NAME, -90000 AS SCORE1, SCORE AS SCORE2, -90000 AS SCORE3,-90000 AS SCORE4, -90000 AS SCORE5, -90000 AS ASSET_MAVG_SUM
FROM T_DM_ACTIVETY_SCORET2
UNION ALL
SELECTCUST_NO, ID_NUMBER, NAME, -90000 AS SCORE1, -90000 AS SCORE2, SCORE AS SCORE3,-90000 AS SCORE4, -90000 AS SCORE5, -90000 AS ASSET_MAVG_SUM
FROM T_DM_BEHAVIOR_SCORET3
UNION ALL
SELECTCUST_NO, ID_NUMBER, NAME, -90000 AS SCORE1, -90000 AS SCORE2, -90000 AS SCORE3,SCORE AS SCORE4, -90000 AS SCORE5, -90000 AS ASSET_MAVG_SUM
FROM T_DM_CREDIT_SCORET4
UNION ALL
SELECTCUST_NO, ID_NUMBER, NAME, -90000 AS SCORE1, -90000 AS SCORE2, -90000 AS SCORE3,-90000 AS SCORE4, SCORE AS SCORE5, -90000 AS ASSET_MAVG_SUM
FROM T_DM_IDENTITY_SCORET5
)TA
GROUPBY CUST_NO, ID_NUMBER, NAME;
“纵向关联”理解:第2步骤为UNION合并表,第3步GROUP BY去重,达到“关联”的目的。
注:不适用一对多的关系表关联