Hive Operation Notes

1. View a database's description and location

hive> describe database database_name; 

2. Create a table
Create the script /opt/createtable.sql:

create table IF NOT EXISTS lining_test.emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Run the script:

hive>source /opt/createtable.sql;

Load the data:

load data local inpath '/opt/test/emp.txt' overwrite into table emp;

Query:

use lining_test;select * from emp;

Add a column:

use lining_test;alter table emp add columns(year string);

3. Partitioned table
Create the script /opt/createparttable.sql:

create table lining_test.emp_part(
empno int,
empname string,
job string,
mgr int,
hiredate string,
salary double,
comm double,
deptno int)
partitioned by (year string, month string)
row format delimited
fields terminated by '\t'; 

Run the script:

hive>source /opt/createparttable.sql;

Load the data:

load data local inpath '/opt/emp2.txt' into table lining_test.emp_part partition(year='2018',month='8');
load data local inpath '/opt/emp3.txt' into table lining_test.emp_part partition(year='2018',month='3');
load data local inpath '/opt/emp4.txt' into table lining_test.emp_part partition(year='2017',month='12');

Query:

use lining_test;select * from emp_part;
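Filtering on the partition columns prunes partitions, so only the matching partition directories are scanned. A minimal sketch using the partitions loaded above:

use lining_test;select * from emp_part where year='2018' and month='8';
show partitions lining_test.emp_part;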

Show the table schema:

describe lining_test.emp_part;

4. Run Hive directly from the command line

-e runs a quoted SQL string; -f runs the SQL contained in a file (the path below is just an example, with the file holding, say, select * from db_hive01.emp):

hive -e "use lining_test;ALTER TABLE emp CHANGE payyear month string;"
hive -f /opt/test/query.sql

5. Export query results

insert overwrite local directory '/opt/test/'  
row format delimited fields terminated by '\t'
select * from emp_part; 
hive -e 'select * from lining_test.emp_part' >> ./emp_export.txt

6. Export to HDFS

insert overwrite directory '/export_data'  
select * from lining_test.emp; 

Exporting with LOCAL is noticeably slower than exporting without it.

7. Show all functions:

hive>show functions;

Show how to use a function:

hive>describe function fun_name;

Commonly used functions:

select upper(empname) from emp;
select unix_timestamp(trackTime) from bflog limit 3 ;
select year(hiredate) from emp ;
select month(hiredate) from emp ;
select hour(hiredate) from emp ;
select substr(hiredate,1,4) from emp ;
select split(hiredate,'-')[1] from emp ;
select reverse(hiredate) from emp ;
select concat(empno,'-',empname) from emp ;

case when condition1 then result1
     when condition2 then result2
     else default_result
end
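A small worked example against the emp table above (the salary thresholds are made up purely for illustration):

select ename,
       case when sal < 1000 then 'low'
            when sal < 3000 then 'medium'
            else 'high'
       end as sal_level
from emp;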

Custom UDF:

add jar /opt/test/mylower.jar ;
CREATE TEMPORARY FUNCTION mylower AS 'org.gh.hadoop.hive.MyLower'; 
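Once registered, the temporary function can be called like any built-in function (assuming the jar and class above exist):

select mylower(ename) from emp;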

8. Inner join:
hive> SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);

9. See how many MapReduce jobs Hive will use for a query
hive> EXPLAIN SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);

10. Left outer join:
hive> SELECT sales.*, things.* FROM sales LEFT OUTER JOIN things ON (sales.id = things.id);

11. Map join: Hive can load the smaller table into each mapper's memory to perform the join
hive> SELECT /*+ MAPJOIN(things) */ sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);

12. INSERT into a table

INSERT OVERWRITE TABLE stations_by_year SELECT year, COUNT(DISTINCT station) FROM records2 GROUP BY year;

13. Create a view:

CREATE VIEW valid_records AS SELECT * FROM records2 WHERE temperature !=9999;
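The view is then queried like an ordinary table, for example:

select count(*) from valid_records;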

14. Use a regular expression in Hive to select all columns whose names start with price
(1) hive> select `price.*` from product;    (regex column names use backticks; newer Hive versions also require set hive.support.quoted.identifiers=none;)
(2) Use LIKE or RLIKE to match values against a pattern (see the sketch below)
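RLIKE matches a column value against a Java regular expression. A quick sketch against the emp table above (the pattern is only for illustration):

select ename from emp where ename rlike '^S.*';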

15. ORDER BY for a global sort

insert overwrite local directory '/opt/test/local'  
row format delimited fields terminated by '\t'
select * from emp order by empno; 

16. SORT BY and DISTRIBUTE BY
DISTRIBUTE BY is similar to the partitioner in MapReduce: it splits rows across reducers. Combined with SORT BY, each reducer sorts its own data, so the output is sorted within each reducer but not globally. DISTRIBUTE BY must appear before SORT BY, and mapreduce.job.reduces must be set; otherwise the exported result still ends up in a single file.

set mapreduce.job.reduces=3;
insert overwrite local directory '/opt/test/local'  
row format delimited fields terminated by '\t'
select * from emp distribute by deptno sort by empno; 

When the DISTRIBUTE BY and SORT BY columns are the same, simply use CLUSTER BY, as in the sketch below.
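A minimal sketch mirroring the export above, using deptno for both distribution and sorting:

set mapreduce.job.reduces=3;
insert overwrite local directory '/opt/test/local'
row format delimited fields terminated by '\t'
select * from emp cluster by deptno;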

17. Aggregate functions
Aggregation performance can be improved by enabling map-side aggregation via the hive.map.aggr property:

hive> set hive.map.aggr=true;
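With map-side aggregation enabled, a GROUP BY query such as the following pre-aggregates in the mappers before the shuffle (a simple example against the emp table above):

select deptno, avg(sal) from emp group by deptno;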

18. Word count

select word, count(*)
from (
  select explode(split(sentence, ' ')) as word
  from article
) t
group by word;