join
join_table:
table_reference [INNER] JOIN table_factor [join_condition]
| table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference join_condition
| table_reference LEFT SEMI JOIN table_reference join_condition
| table_reference CROSS JOIN table_reference [join_condition] (as of Hive 0.10)
table_reference:
table_factor
| join_table
table_factor:
tbl_name [alias]
| table_subquery alias
| ( table_references )
join_condition:
ON expression
普通join。只支持等值Join!
SELECT a.* FROM a JOIN b ON (a.id = b.id)
SELECT a.* FROM a JOIN b ON (a.id = b.id AND a.department = b.department)
SELECT a.* FROM a LEFT OUTER JOIN b ON (a.id <> b.id)
多表join
SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key2)
LEFT SEMI JOIN
SELECT a.key, a.value
FROM a
WHERE a.key in
(SELECT b.key
FROM B);
SELECT a.key, a.val
FROM a LEFT SEMI JOIN b ON (a.key = b.key)
LEFT SEMI JOIN关键字前面的表为主表,返回主表的KEY也在副表中的记录
LEFT JOIN左连接,左表所有row,都有,如果跟右表没关联上,显示右表字段为空
RIGHT JOIN 右连接,右表所有row,都有,如果跟左表没关联上,显示左表字段为空
FULL JOIN 左右两张表的row都有,没有关联上的字段为空
如果join的table有分区,在on里面指定分区信息
SELECT page_views.*
FROM page_views JOIN dim_users
ON (page_views.user_id = dim_users.id AND page_views.date >= '2008-03-01' AND page_views.date <= '2008-03-31')
union 求并集、需要有相同的表结构。
SELECT key FROM (SELECT key FROM src ORDER BY key LIMIT 10)subq1
UNION
SELECT key FROM (SELECT key FROM src1 ORDER BY key LIMIT 10)subq2