HIVE查询

查询的一些例子:
1.query
hive> SELECT name, subordinates[0] FROM employees;
John Doe Mary Smith
Mary Smith Bill King
Todd Jones NULL
2.expression
hive> SELECT upper(name), salary, deductions["Federal Taxes"],
round(salary * (1 - deductions["Federal Taxes"])) FROM employees;
3.expression
SELECT count(*), avg(salary) FROM employees;
4.distinct
SELECT count(DISTINCT symbol) FROM stocks;
5.limit
hive> SELECT upper(name), salary, deductions["Federal Taxes"],
>round(salary * (1 - deductions["Federal Taxes"])) FROM employees
> LIMIT 2;
JOHN DOE 100000.0 0.2 80000
MARY SMITH 80000.0 0.2 64000
6.列名 别名
SELECT upper(name), salary, deductions["Federal Taxes"] as fed_taxes,
>round(salary * (1 - deductions["Federal Taxes"])) as
salary_minus_fed_taxes
> FROM employees LIMIT 2;
7.嵌套select (不可有having在内部select)
hive> FROM (

> SELECT upper(name), salary, deductions["Federal Taxes"] as fed_taxes,
>round(salary * (1 - deductions["Federal Taxes"])) as
salary_minus_fed_taxes
> FROM employees
> ) e
> SELECT e.name, e.salary_minus_fed_taxes
> WHERE e.salary_minus_fed_taxes > 70000;
JOHN DOE 100000.0 0.2 80000
8. case when then
hive> SELECT name, salary,
> CASE
> WHEN salary < 50000.0 THEN 'low'
> WHEN salary >= 50000.0 AND salary < 70000.0 THEN 'middle'
> WHEN salary >= 70000.0 AND salary < 100000.0 THEN 'high'
> ELSE 'very high'
> END AS bracket FROM employees;
John Doe 100000.0 very high
Mary Smith 80000.0 high
Todd Jones 70000.0 high
Bill King 60000.0 middle
Boss Man 200000.0 very high
9.hive不适用map reduce
SELECT * FROM employees;
SELECT * FROM employees
WHERE country = 'US' AND state = 'CA'
LIMIT 100;
10.使用like和rlick
like是如同sql语句
hive> SELECT name, address.street FROM employees WHERE address.street LIKE
'%Chi%';
rlick可以使用如同java的正则
hive> SELECT name, address.street
> FROM employees WHERE address.street RLIKE '.*(Chicago|Ontario).*';
Mary Smith 100 Ontario St.
Todd Jones 200 Chicago Ave.
11 group by语句
hive> SELECT year(ymd), avg(price_close) FROM stocks
> WHERE exchange = 'NASDAQ' AND symbol = 'AAPL'
> GROUP BY year(ymd);
1984 25.578625440597534
12 having语句
hive> SELECT year(ymd), avg(price_close) FROM stocks
> WHERE exchange = 'NASDAQ' AND symbol = 'AAPL'
> GROUP BY year(ymd)
> HAVING avg(price_close) > 50.0;
1987 53.88968399108163
1991 52.49553383386182
13 join inner(建议将最大的table放在最后)
hive> SELECT a.ymd, a.price_close, b.price_close
> FROM stocks a JOIN stocks b ON a.ymd = b.ymd
> WHERE a.symbol = 'AAPL' AND b.symbol = 'IBM';
(注意,像比较这样的join条件是不允许的,条件中也不能用or)
SELECT a.ymd, a.price_close, b.price_close
FROM stocks a JOIN stocks b
ON a.ymd <= b.ymd
WHERE a.symbol = 'AAPL' AND b.symbol = 'IBM';
14.left outer join
hive> SELECT s.ymd, s.symbol, s.price_close, d.dividend
> FROM stocks s LEFT OUTER JOIN dividends d ON s.ymd = d.ymd AND s.symbol =
d.symbol
> WHERE s.symbol = 'AAPL';
...
1987-05-01 AAPL 80.0 NULL
1987-05-04 AAPL 79.75 NULL
1987-05-05 AAPL 80.25 NULL
不同类型比较
不同类型的数字float double做比较,要注意0.2float大于0.2double
可以cat(0.2 as float)
order by and sort by
hive的order by是全部数据的排序,在一个reduce中处理排序,默认升序。效率比较低,通常跟limit一起用
可以用hive.mapred.mode=strict来强制,order后跟着limit
sort by是在每个reduce中,进行排序,
(是否在一个reduce中,由groupcomparator决定,如果没有就是key的compare,在hive中UDAF在控制mapper的key输出到reducer上,普通的compact是hash分布到key上,或是自定义compact的hash均匀分布ketama算法)
distribut by是让相同的key归到同一个reducer中,这样sort可以进行reducer中的排序

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值