Hive提供了类SQL语法的功能,可通过它来检索Hadoop存储数据,查询操作是基于MapReduce来完成的
2.通过where语句过滤查询条件
3.通过group by语句将查询结果进行分组
4.执行join查询操作
使用写有Join操作的查询语句时有一条原则:应该将条目少的表/子查询放在Join操作符的左边。原因是在Join操作的Reduce阶段,位于Join操作符左边的表的内容会被加载进内存,将条目少的表放在左边,可以有效减少发生OOM错误的几率。
inner join:
SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);
outer join:
SELECT sales.*, things.* FROM sales LEFT OUTER JOIN things ON (sales.id = things.id);
SELECT sales.*, things.* FROM sales RIGHT OUTER JOIN things ON (sales.id = things.id);
SELECT sales.*, things.* FROM sales FULL OUTER JOIN things ON (sales.id = things.id);
semi join:等同于in函数
SELECT * FROM things LEFT SEMI JOIN sales ON (sales.id = things.id);
相当于sql语句:SELECT * FROM things WHERE things.id IN (SELECT id from sales);
map join:
Join操作在map阶段完成,不再需要reduce操作,因此map join不能结合RIGHT OUTER JOIN和FULL OUTER JOIN使用(需要reduce进行聚合)
SELECT /*+ MAPJOIN(things) */ sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);
*前提:在join字段,join左边的记录集合(sales)是join右边(things)的子集*,例如:
sales things
1 1
2 2
3 3
Hive功能
1.通过select语句查询指定column的数据2.通过where语句过滤查询条件
3.通过group by语句将查询结果进行分组
4.执行join查询操作
使用写有Join操作的查询语句时有一条原则:应该将条目少的表/子查询放在Join操作符的左边。原因是在Join操作的Reduce阶段,位于Join操作符左边的表的内容会被加载进内存,将条目少的表放在左边,可以有效减少发生OOM错误的几率。
inner join:
SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);
outer join:
SELECT sales.*, things.* FROM sales LEFT OUTER JOIN things ON (sales.id = things.id);
SELECT sales.*, things.* FROM sales RIGHT OUTER JOIN things ON (sales.id = things.id);
SELECT sales.*, things.* FROM sales FULL OUTER JOIN things ON (sales.id = things.id);
semi join:等同于in函数
SELECT * FROM things LEFT SEMI JOIN sales ON (sales.id = things.id);
相当于sql语句:SELECT * FROM things WHERE things.id IN (SELECT id from sales);
map join:
Join操作在map阶段完成,不再需要reduce操作,因此map join不能结合RIGHT OUTER JOIN和FULL OUTER JOIN使用(需要reduce进行聚合)
SELECT /*+ MAPJOIN(things) */ sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);
*前提:在join字段,join左边的记录集合(sales)是join右边(things)的子集*,例如:
sales things
1 1
2 2
3 3