文章标题

Hive

基于Hadoop的一个数据仓库工具,构建于hadoop的hdfs和mapred之上,用于管理和查询结构化/非结构化数据的数据仓库。可以将结构化的数据文件映射为一张数据库表,并提供类SQL查询功能。

1.建表
create table city(
province_code INT,
province_name string,
city_code INT,
city_name string
)
row FORMAT delimited
fields terminated by ‘,’
lines terminated by ‘\n’;
2.查看表信息
show create table city;
3.查看表内容
select * from city limit 10;
4.最大10个
select * from city order by city_code desc limit 10;
5.省份里市最多的排序。
select province_name,count(city_name) as sum from city group by province_name order by sum limit 10;
6.去重
select count(distinct province_name) from city;

select count(*) from (select province_name from city group by province_name) a;
7.只有一个市的省份
select province_name,sum from (
select province_name,count(*) as sum
from city group by province_name) a where sum=1;

select province_name,count(*) as sum
from city group by province_name having sum=1;
8.内表与外表
外表
create external table city_ex
(province_code int,
province_name string,
city_code int,
city_name string
)
row format delimited fields terminated by ‘,’
lines terminated by ‘\n’
location’/user/hdfs/lyy/city’;
内表:
CREATE TABLE user(
uid INT,
city_code INT,
model string,
access string
)
row FORMAT delimited
fields terminated by ‘,’
lines terminated by ‘\n’;
将文件导入:
load data local inpath ‘/home/bigdata/hive/user.txt’ into table user;

求比例(除以总数):
select sum(if(access=’2G’,1,0))/count(1) from user;

按条件查询:
select
case
when uid % 10 in (0, 1, 2, 3) then ‘0-3’
when uid % 10 in (4, 5, 6, 7) then ‘4-7’
else ‘8-9’
end as interval,
count(*) as cnt
from user
group by
case
when uid % 10 in (0, 1, 2, 3) then ‘0-3’
when uid % 10 in (4, 5, 6, 7) then ‘4-7’
else ‘8-9’
end;

集合:
去重:select collect_set(access) from user;
不去重:select collect_list(access) from user;
表的关联方式:
select user.uid, user.city_code, city.city_name
from
(select * from user where uid <= 100) user
full join
(select * from city where province_code <= 30) city
on (user.city_code = city.city_code)
limit 20;

分组topN;
select access,city_code,uid
from
(
select uid,access,city_code,
row_number()over (partition by access order by city_code desc)as
row_num
from user
)a
where row_num=1;

select p_date,
sum(cnt) over(order by p_date asc rows between unbounded preceding and current row)
from(
select p_date,count(*)as cnt
from user_daily
where p_date between ‘2017-09-01’ and ‘2017-09-30’
group by p_date
)a;
unbounded preceding

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值