文章标题

1:建立数据库//create database lu;show databases,use lu;导入代码。
cli命令行模式。
CREATE TABLE city(
province_code INT,
province_name string,
city_code INT,
city_name string
)
row FORMAT delimited
fields terminated by ’,’
lines terminated by ‘\n’;

load data local inpath ‘/home/bigdata/hive/city.txt’ into table city;
select * from city;//查询是否导入进去。
select * from city limit 10;

45,广西,450500,北海
41,河南,411600,周口
62,甘肃,620500,天水

create database lu;
select province_name,count(1) as cnt
from city group by province_name
order by cnt desc limit 10;
select * from city order by city_code desc limit 10;取最多的10个省份。order by 通过…的排序。group by:将查询结果按后面的字段分组。
select province_name,count(city_name) as cnt from city group by province_name order by cnt desc limit 10;省里面有多少个市。
select count(distinct province_name)from city;去重统计,统计有多少个省。缺点:数据操作都在reduce里面执行,数据量大的时候会非常慢。
select count(*) from(
select province_name from city group by province_name//多层嵌套查询。
)a;//a:临时表,查询的时候方便而建立。优势在于数据量大的时候容易操作。
方法一:select province_name,count(city_name) as cnt from city group by province_name having cnt=1;只有一个市的省。
方法二:select province_name,cnt from(
select province_name,count(1) as cnt from city group by province_name
)a
where cnt=1
;用子嵌套来实现。
外表:
CREATE EXTERNAL TABLE city_ex(
province_code int,
province_name string,
city_code int,
city_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
LINES TERMINATED BY ‘\n’
LOCATION ‘/user/hdfs/tanqi/city’;
内表:
CREATE TABLE city_in(
province_code int,
province_name string,
city_code int,
city_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
LINES TERMINATED BY ‘\n’
LOCATION ‘/user/hdfs/tanqi/city’;
分区:
CREATE TABLE user(
uid INT,
city_code INT,
model string,
access string
)
row FORMAT delimited
fields terminated by ‘,’
lines terminated by ‘\n’;

load data local inpath ‘/home/bigdata/tanqi/hive/user.txt’ into table user;
分区表:
CREATE TABLE user_daily(
uid int,
city_code int,
model string,
access string
)
partitioned by (p_date string);
INSERT OVERWRITE TABLE user_daily PARTITION (p_date=’2017-09-01’)
SELECT * FROM user;
set hive.exec.dynamic.partition.mode=nonstrict;
动态分区
INSERT OVERWRITE TABLE user_daily PARTITION (p_date)
SELECT *,’2017-09-02’ FROM user
UNION ALL
SELECT *,’2017-09-03’ FROM user
ALTER TABLE user_daily PARTITION (p_date=’2017-01-01’) RENAME TO PARTITION (p_date=’20170101’);
总结:
今天内容有点多,感觉学起来较为吃力,一分心就不懂了,今后还得多加联系。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值