hive 基础(1)

$hive -help

Usage: hive [--config confdir] [-hiveconf x=y]* [-i <init-filename>]*[<-f filename>|<-e query-string>] [-S]

-u username
-p [password]
-i <filename> init Sql file
-e 'quoted query string' Sql from command line
-f <filename> Sql from files
-S Silent mode in interactive shell

-e and -f cannot be specified together. In the absence of these
options, interactive shell is started

-f 可以把命令写在文件里
-S 只输出最终结果


官方文档内容

建表
可以整表注释也可对单字段注释。partition为分区,并不属于数据部分,决定了存储目录结构。cluster为了优化性能,预先排列,方便快速查询。ASCII 001(ctrl-A) 默认字段分隔,换行为默认行分隔。


CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
friends ARRAY<BIGINT>, properties MAP<STRING, STRING>
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '1'
COLLECTION ITEMS TERMINATED BY '2'
MAP KEYS TERMINATED BY '3'
STORED AS SEQUENCEFILE;




查看表和分区信息
加extended可以查看详细信息,里面有location:hdfs://可以看到文件存储路径

SHOW TABLES;
SHOW TABLES 'page.*';
SHOW PARTITIONS page_view;
DESCRIBE page_view;
DESCRIBE EXTENDED page_view;
DESCRIBE EXTENDED page_view PARTITION (ds='2008-08-08');



将文件中数据导入hive表
当文件中数据和表结构不一样时,先按文件格式建表,将文件put到表的路径下。
用select语句选出所需信息,插入到最终表。External关键字表示和文件独立。把表drop掉数据文件还在。

CREATE EXTERNAL TABLE page_view_stg(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User',
country STRING COMMENT 'country of origination')
COMMENT 'This is the staging page view table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '44' LINES TERMINATED BY '12'
STORED AS TEXTFILE
LOCATION '/user/data/staging/page_view';

hadoop dfs -put /tmp/pv_2008-06-08.txt /user/data/staging/page_view

FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country='US')
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip
WHERE pvs.country = 'US';




当文件格式和表格式一致,直接导入。

本地路径形式,加local关键字

LOAD DATA LOCAL INPATH {{/tmp/pv_2008-06-08_us.txt}} INTO TABLE page_view PARTITION(date='2008-06-08', country='US')


分布式集群路径

LOAD DATA INPATH '/user/data/pv_2008-06-08_us.txt' INTO TABLE page_view PARTITION(date='2008-06-08', country='US')


Join操作

tb1 数据
id province
1 neimenggu
2 zhejiang
3 jiangsu
4 shandong
5 sichuan

tb2 数据
id capital
2 hangzhou
3 nanjing
4 jinan
5 chengdu
6 kunming

select a.id,a.capital,b.province from tb1 a left outer join tb2 b on (a.id = b.id);

1 neimenggu NULL
2 zhejiang hangzhou
3 jiangsu nanjing
4 shandong jinan
5 sichuan chengdu

select a.id,a.capital,b.province from tb1 a right outer join tb2 b on (a.id = b.id);

2 zhejiang hangzhou
3 jiangsu nanjing
4 shandong jinan
5 sichuan chengdu
NULL NULL kunming

select a.id,a.capital,b.province from tb1 a full outer join tb2 b on (a.id = b.id);

1 neimenggu NULL
2 zhejiang hangzhou
3 jiangsu nanjing
4 shandong jinan
5 sichuan chengdu
NULL NULL kunming

select a.id,a.capital,b.province from tb1 a join tb2 b on (a.id = b.id);

2 zhejiang hangzhou
3 jiangsu nanjing
4 shandong jinan
5 sichuan chengdu

select a.id,a.capital from tb1 a left semi join tb2 b on (a.id = b.id);

2 zhejiang
3 jiangsu
4 shandong
5 sichuan

总结:left outer join 左侧所有会出现 right outer join 右侧所有会出现
full outer join 两侧所有会出现 join 两侧共有的会出现
left semi join 在右侧出现过的,但只能select左侧属性,上述两侧属性均可select
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值