Hive Operation Notes

1. View a database's description and location

hive> describe database database_name; 

2. Create a table
Create the script /opt/createtable.sql:

create table IF NOT EXISTS lining_test.emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Run the script:

hive>source /opt/createtable.sql;

Load the data:

load data local inpath '/opt/test/emp.txt' overwrite into table emp;

Query:

use lining_test;select * from emp;

Add a column:

use lining_test;alter table emp add columns(year string);

3. Partitioned table
Create the script /opt/createparttable.sql:

create table lining_test.emp_part(
empno int,
empname string,
job string,
mgr int,
hiredate string,
salary double,
comm double,
deptno int)
partitioned by (year string, month string)
row format delimited
fields terminated by '\t'; 

Run the script:

hive>source /opt/createparttable.sql;

Load the data:

load data local inpath '/opt/emp2.txt' into table lining_test.emp_part partition(year='2018',month='8');
load data local inpath '/opt/emp3.txt' into table lining_test.emp_part partition(year='2018',month='3');
load data local inpath '/opt/emp4.txt' into table lining_test.emp_part partition(year='2017',month='12');

Query:

use lining_test;select * from emp_part;
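Filtering on the partition columns prunes partitions, so only the matching partition directories are scanned. A minimal sketch using the partitions loaded above:

use lining_test;select * from emp_part where year='2018' and month='8';
show partitions lining_test.emp_part;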

Show the table schema:

describe lining_test.emp_part;

4. Run Hive directly from the command line

-e runs a quoted SQL string; -f runs the SQL contained in a file (the path below is just an example, with the file holding, say, select * from db_hive01.emp):

hive -e "use lining_test;ALTER TABLE emp CHANGE payyear month string;"
hive -f /opt/test/query.sql

5. Export query results

insert overwrite local directory '/opt/test/'  
row format delimited fields terminated by '\t'
select * from emp_part; 
hive -e 'select * from lining_test.emp_part' >> ./emp_export.txt

6. Export to HDFS

insert overwrite directory '/export_data'  
select * from lining_test.emp; 

Exporting with LOCAL is noticeably slower than exporting without it.

7. Show all functions:

hive>show functions;

Show how to use a function:

hive>describe function fun_name;

Commonly used functions:

select upper(empname) from emp;
select unix_timestamp(trackTime) from bflog limit 3 ;
select year(hiredate) from emp ;
select month(hiredate) from emp ;
select hour(hiredate) from emp ;
select substr(hiredate,1,4) from emp ;
select split(hiredate,'-')[1] from emp ;
select reverse(hiredate) from emp ;
select concat(empno,'-',empname) from emp ;

case when condition1 then result1
     when condition2 then result2
     else default_result
end
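A small worked example against the emp table above (the salary thresholds are made up purely for illustration):

select ename,
       case when sal < 1000 then 'low'
            when sal < 3000 then 'medium'
            else 'high'
       end as sal_level
from emp;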

Custom UDF:

add jar /opt/test/mylower.jar ;
CREATE TEMPORARY FUNCTION mylower AS 'org.gh.hadoop.hive.MyLower'; 
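Once registered, the temporary function can be called like any built-in function (assuming the jar and class above exist):

select mylower(ename) from emp;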

8. Inner join:
hive> SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);

9. See how many MapReduce jobs Hive will use for a query
hive> EXPLAIN SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);

10. Left outer join:
hive> SELECT sales.*, things.* FROM sales LEFT OUTER JOIN things ON (sales.id = things.id);

11. Map join: Hive can load the smaller table into each mapper's memory to perform the join
hive> SELECT /*+ MAPJOIN(things) */ sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);

12. INSERT into a table

INSERT OVERWRITE TABLE stations_by_year SELECT year, COUNT(DISTINCT station) FROM records2 GROUP BY year;

13. Create a view:

CREATE VIEW valid_records AS SELECT * FROM records2 WHERE temperature !=9999;
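The view is then queried like an ordinary table, for example:

select count(*) from valid_records;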

14. Use a regular expression in Hive to select all columns whose names start with price
(1) hive> select `price.*` from product;    (regex column names use backticks; newer Hive versions also require set hive.support.quoted.identifiers=none;)
(2) Use LIKE or RLIKE to match values against a pattern (see the sketch below)
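RLIKE matches a column value against a Java regular expression. A quick sketch against the emp table above (the pattern is only for illustration):

select ename from emp where ename rlike '^S.*';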

15. ORDER BY for a global sort

insert overwrite local directory '/opt/test/local'  
row format delimited fields terminated by '\t'
select * from emp order by empno; 

16. SORT BY and DISTRIBUTE BY
DISTRIBUTE BY is similar to the partitioner in MapReduce: it splits rows across reducers. Combined with SORT BY, each reducer sorts its own data, so the output is sorted within each reducer but not globally. DISTRIBUTE BY must appear before SORT BY, and mapreduce.job.reduces must be set; otherwise the exported result still ends up in a single file.

set mapreduce.job.reduces=3;
insert overwrite local directory '/opt/test/local'  
row format delimited fields terminated by '\t'
select * from emp distribute by deptno sort by empno; 

When the DISTRIBUTE BY and SORT BY columns are the same, simply use CLUSTER BY, as in the sketch below.
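A minimal sketch mirroring the export above, using deptno for both distribution and sorting:

set mapreduce.job.reduces=3;
insert overwrite local directory '/opt/test/local'
row format delimited fields terminated by '\t'
select * from emp cluster by deptno;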

17. Aggregate functions
Aggregation performance can be improved by enabling map-side aggregation via the hive.map.aggr property:

hive> set hive.map.aggr=true;
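With map-side aggregation enabled, a GROUP BY query such as the following pre-aggregates in the mappers before the shuffle (a simple example against the emp table above):

select deptno, avg(sal) from emp group by deptno;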

18. Word count

select word, count(*)
from (
  select explode(split(sentence, ' ')) as word
  from article
) t
group by word;