Basic Hive Usage

1. Basic Operations

1) Show databases

show databases;

2) Switch database

use mongdb;

3) Show tables

show tables;

4) Create a table

create table movie_world(id int, name string);

5) Insert data

insert into movie_world values(1, "1942");

6) Drop a table (removes the table definition)

drop table movie_world;

7) Delete table data

delete from movie_world;
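
Note: Hive only supports DELETE on transactional (ACID) tables; on a plain table like this one the statement is rejected. Clearing all rows is usually done with TRUNCATE instead, as in this sketch:

truncate table movie_world;  -- removes all rows but keeps the table definition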

8) Prepare data

1,Alex
2,Amy
3,Mia

9) Create the table for the file import

create table student (id int, name string) row format delimited fields terminated by "\t"; // for some reason the \t delimiter gave me trouble here; with a space or comma delimiter instead, the rows inserted and read back fine

10) Load data (note: the session below loads into student3 rather than the student table created above)

 load data local inpath  "/root/student_4.txt" into table student3;

11) Change the MySQL password

UPDATE user SET Password = password('123456') WHERE User = 'root';
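
This statement targets MySQL 5.6 or earlier, where the hash lives in the Password column of the mysql.user table, and the change only takes effect after the privilege tables are reloaded. A sketch, assuming such a version (MySQL 5.7+ uses authentication_string or ALTER USER instead):

use mysql;
UPDATE user SET Password = PASSWORD('123456') WHERE User = 'root';
FLUSH PRIVILEGES;  -- reload the privilege tables so the new password takes effect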

12) Hive data types

Java type    Hive type    Size / meaning
byte         TINYINT      1 byte
short        SMALLINT     2 bytes
int          INT          4 bytes
long         BIGINT       8 bytes
float        FLOAT        single-precision floating point
double       DOUBLE       double-precision floating point
string       STRING       variable-length character string
             TIMESTAMP    timestamp
             BINARY       byte array
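
A sketch exercising several of these types (the table and column names are made up for illustration):

create table if not exists type_demo (
    tiny_col   TINYINT,    -- 1-byte integer
    small_col  SMALLINT,   -- 2-byte integer
    int_col    INT,        -- 4-byte integer
    big_col    BIGINT,     -- 8-byte integer
    float_col  FLOAT,      -- single-precision floating point
    double_col DOUBLE,     -- double-precision floating point
    str_col    STRING,     -- variable-length string
    ts_col     TIMESTAMP,  -- timestamp
    bin_col    BINARY      -- byte array
);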

13) Database operations

Standard form:

create database if not exists hive_db;

14) Create a database at a specified HDFS location

create database hive_db location '/root'; // if LOCATION is omitted, the database is created under /user/hive/warehouse
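
LOCATION can be combined with COMMENT and DBPROPERTIES in a single statement; a sketch (the name, path, and property are illustrative):

create database if not exists hive_db2
comment 'demo database'
location '/user/hive/hive_db2'
with dbproperties ('creator' = 'root');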

15) Describe a database

 desc database test_01;
OK
test_01         hdfs://192.168.8.128:9000/test  root    USER
Time taken: 0.011 seconds, Fetched: 1 row(s)

16) Add descriptive properties
hive> alter database test_01 set dbproperties("dataname"="first test");
OK
Time taken: 0.058 seconds
hive> desc database test_01;
OK
test_01         hdfs://192.168.8.128:9000/test  root    USER
Time taken: 0.013 seconds, Fetched: 1 row(s)

17) Extended description
hive> desc database extended test_01;
OK
test_01         hdfs://192.168.8.128:9000/test  root    USER    {dataname=first test}
Time taken: 0.01 seconds, Fetched: 1 row(s)

18) List databases

-》show databases;

-》wildcard matching

show databases like 'db*';

19) Drop a database

drop database test_01;  // errors if the database does not exist

drop database if exists test_01; // no error even if it does not exist
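
Both forms also fail if the database still contains tables; adding CASCADE drops those tables first (use with care):

drop database if exists test_01 cascade;  -- also drops any tables inside the database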

2. Table Operations

hive> select *from student3;
OK
1       Alex
2       Amy
3       Mia
Time taken: 1.817 seconds, Fetched: 3 row(s)
hive> select *from student3 where name = 'Alex';
OK
1       Alex
Time taken: 0.513 seconds, Fetched: 1 row(s)
hive> select count(*) from student3;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20181221232124_d56e8c6a-2319-445c-921f-81c4670f42bb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1545394147917_0001, Tracking URL = http://Master:8088/proxy/application_1545394147917_0001/
Kill Command = /opt/hadoop/hadoop-2.6.5/bin/hadoop job  -kill job_1545394147917_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-12-21 23:22:28,764 Stage-1 map = 0%,  reduce = 0%
2018-12-21 23:23:05,060 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.67 sec
2018-12-21 23:23:11,398 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.67 sec (a query with aggregation has to go through the full MapReduce flow)
MapReduce Total cumulative CPU time: 5 seconds 810 msec
Ended Job = job_1545394147917_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.81 sec   HDFS Read: 7681 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 810 msec
OK
3
Time taken: 109.159 seconds, Fetched: 1 row(s)

 

20) Inserting one table's data into another table (CTAS)

create table if not student_5 as select * from student3 where name = 'Amy';
FAILED: ParseException line 1:20 missing KW_EXISTS at 'student_5' near '<EOF>'
hive> create table if not exists student_5 as select * from student3 where name = 'Amy';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20181221232652_0c868120-6b8e-47b3-94de-41c4510d488a
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1545394147917_0002, Tracking URL = http://Master:8088/proxy/application_1545394147917_0002/
Kill Command = /opt/hadoop/hadoop-2.6.5/bin/hadoop job  -kill job_1545394147917_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-12-21 23:27:23,314 Stage-1 map = 0%,  reduce = 0%
2018-12-21 23:27:56,371 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 19.44 sec (only a map phase is needed)
MapReduce Total cumulative CPU time: 19 seconds 440 msec
Ended Job = job_1545394147917_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/.hive-staging_hive_2018-12-21_23-26-52_474_4470609876393742486-1/-ext-10002
Moving data to directory hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/student_5
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 19.44 sec   HDFS Read: 4090 HDFS Write: 78 SUCCESS
Total MapReduce CPU Time Spent: 19 seconds 440 msec
OK
Time taken: 65.47 seconds
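
A related form copies only the schema of an existing table, with no MapReduce job and no data; a sketch (the new table name is illustrative):

create table if not exists student_like like student3;  -- empty table with the same schema as student3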

21) Managed tables (internal tables)

 desc formatted student3;
OK
# col_name              data_type               comment             
                 
id                      int                                         
name                    string                                      
                 
# Detailed Table Information             
Database:               mongdb                   
Owner:                  root                     
CreateTime:             Fri Dec 21 20:58:02 CST 2018     
LastAccessTime:         UNKNOWN                  
Retention:              0                        
Location:               hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/student3         
Table Type:             MANAGED_TABLE        

Summary: when a managed table is dropped, its data in HDFS is deleted as well, so managed tables are not well suited for data sharing.

22) External tables

create external table if not exists doctor (id int, name string) 
    > row format 
    > delimited fields
    > terminated by ' ';
OK
Time taken: 3.927 seconds
hive> load data local inpath "/root/student_4.txt" into table doctor;
Loading data to table default.doctor
OK
Time taken: 1.256 seconds
hive> select * from doctor;
OK
1       Alex
2       Amy
3       Mia
Time taken: 1.363 seconds, Fetched: 3 row(s)
hive> desc formatted doctor;
OK
# col_name              data_type               comment             
                 
id                      int                                         
name                    string                                      
                 
# Detailed Table Information             
Database:               default                  
Owner:                  root                     
CreateTime:             Sat Dec 22 09:51:51 CST 2018     
LastAccessTime:         UNKNOWN                  
Retention:              0                        
Location:               hdfs://192.168.8.128:9000/user/hive/warehouse/doctor     
Table Type:             EXTERNAL_TABLE           
Table Parameters:                
        EXTERNAL                TRUE                
        numFiles                1                   
        numRows                 0                   
        rawDataSize             0                   
        totalSize               19                  
        transient_lastDdlTime   1545443594          
                 
# Storage Information            
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe       
InputFormat:            org.apache.hadoop.mapred.TextInputFormat         
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat       
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:             
        field.delim                                 
        serialization.format                        
Time taken: 0.047 seconds, Fetched: 32 row(s)
hive> drop table doctor;
OK
Time taken: 1.176 seconds
hive> select * from doctor;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'doctor'
Summary: when an external table is dropped, the data in HDFS remains, so it can be shared; if an identical table is created again, the association with the existing data is established automatically.

The managed table demo01 has already been deleted, while the external table demo02 still exists.
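
In practice an external table usually points at an explicit HDFS directory so that several tables or tools can share the same files; a sketch (the path is hypothetical):

create external table if not exists doctor (id int, name string)
row format delimited fields terminated by ' '
location '/data/shared/doctor';  -- existing HDFS directory; Hive reads the files in place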

23) Partitioned tables

1-》Create

create table if not exists program(id int, age int, name string)

partitioned by (day string)

row format delimited fields terminated by ' ';
OK
Time taken: 0.052 seconds

2-》Load data

load data local inpath "/root/student_5" into table program partition(day="1222");
FAILED: SemanticException Line 1:23 Invalid path '"/root/student_5"': No files matching path file:/root/student_5
hive> load data local inpath "/root/student_5.txt" into table program partition(day="1222");
Loading data to table default.program partition (day=1222)
OK
Time taken: 0.448 seconds
hive> load data local inpath "/root/student_4.txt" into table program partition(day="1223");
Loading data to table default.program partition (day=1223)
OK
Time taken: 0.397 seconds
hive> select * from program;
OK
1       NULL    NULL    1222
2       NULL    NULL    1222
3       NULL    NULL    1222
1       NULL    NULL    1223
2       NULL    NULL    1223
3       NULL    NULL    1223
Time taken: 0.111 seconds, Fetched: 6 row(s)
hive> load data local inpath "/root/student_6.txt" into table program partition(day="1224");
Loading data to table default.program partition (day=1224)
OK
Time taken: 0.412 seconds

3-》Query

select * from program;
OK
1       NULL    NULL    1222
2       NULL    NULL    1222
3       NULL    NULL    1222
1       NULL    NULL    1223
2       NULL    NULL    1223
3       NULL    NULL    1223
1       35      Alex    1224
2       3       Amy     1224
3       1       Mia     1224
Time taken: 0.124 seconds, Fetched: 9 row(s)
hive> select * from program where day = '1224';
OK
1       35      Alex    1224
2       3       Amy     1224
3       1       Mia     1224
Time taken: 0.426 seconds, Fetched: 3 row(s)
hive> desc program;
OK
id                      int                                         
age                     int                                         
name                    string                                      
day                     string                                      
                 
# Partition Information          
# col_name              data_type               comment             
                 
day                     string                      

4-》Add a partition on its own

alter table program add partition (day = "1225"); // two ways to get a partition: it is created implicitly when data is loaded into it, or it can be created in advance like this
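
Several partitions can also be added in one statement; a sketch (the day values are illustrative):

alter table program add partition (day = '1230') partition (day = '1231');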

5-》Drop a partition

alter table program drop partition (day='1222'); // removes the partition
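
show partitions confirms the result of the add and the drop:

show partitions program;  -- lists the partitions that remain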

Altering tables

1-》Rename a table

 alter table program rename to project;
OK
Time taken: 0.136 seconds
hive> select * from program;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'program'
hive> select * from project;
OK
1       NULL    NULL    1223
2       NULL    NULL    1223
3       NULL    NULL    1223
1       35      Alex    1224
2       3       Amy     1224
3       1       Mia     1224

2-》Add columns

alter table project add columns (school string);
OK
Time taken: 0.067 seconds
hive> select * from project;
OK
1       NULL    NULL    NULL    1223
2       NULL    NULL    NULL    1223
3       NULL    NULL    NULL    1223
1       35      Alex    NULL    1224
2       3       Amy     NULL    1224
3       1       Mia     NULL    1224
Time taken: 0.123 seconds, Fetched: 6 row(s)
hive> desc project;
OK
id                      int                                         
age                     int                                         
name                    string                                      
school                  string                                      
day                     string                                      
                 
# Partition Information          
# col_name              data_type               comment             
                 
day                     string                       

3-》Change a column

alter table project change <old_column> <new_column> <type>;

alter table project change id number int;
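
CHANGE can also attach a comment to the column (and reposition it with FIRST/AFTER, which changes metadata only, not the stored data); a sketch renaming the column back:

alter table project change column number id int comment 'student id';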

4-》Replace columns (equivalent to rebuilding the table's column list)

desc t1;
OK
tid                     int                                         
tname                   string                                      
age                     int                                         
Time taken: 0.028 seconds, Fetched: 3 row(s)
hive> alter table t1 repl
replace           replace(          replicate_rows(   
hive> alter table t1 replace (name string);
FAILED: ParseException line 1:23 missing COLUMNS at '(' near '<EOF>'
hive> alter table t1 replace col(name string);
collect_list   collect_set    collection     columns        
hive> alter table t1 replace columns(name string);
OK
Time taken: 0.043 seconds
hive> desc t1;
OK
name                    string                                      
Time taken: 0.032 seconds, Fetched: 1 row(s)

DML Operations

 

-》Load from the local filesystem

load data local inpath '/root/student_6.txt' into table project;

-》Load from HDFS (this moves the file, like a cut-and-paste, into the table's directory)

load data inpath '/root/student_6.txt' into table project; --- appends to existing data

-》Overwrite data

load data inpath '/root/student_6.txt' overwrite into table project;    --- overwrites existing data

-》Insert data

insert into table t2 values(88,'Andy',18);  --- runs a (map-only) MapReduce job

-》Insert into a partition

 insert into table project partition(day='1226') values (8, 18,'Andy','qinghua');
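
When many partitions must be filled from an existing table, a dynamic partition insert avoids writing one statement per partition; a sketch (staging_project is a hypothetical source table whose last selected column feeds the day partition):

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;  -- allow inserts with no static partition value
insert into table project partition(day)
select id, age, name, school, day from staging_project;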

-》Attach existing data when creating a table

create table ...

location '<HDFS path of the data>';

-》Export query results to the local filesystem (a field delimiter can also be specified; see the sketch after this item)

insert overwrite local directory '/root/test/1223' select * from project where name='Amy';

 hive -e 'select * from project ' >/root/student000000000000002.txt   

dfs -get <HDFS source path> <local destination>
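
By default insert overwrite ... directory writes fields separated by the ^A control character; a sketch specifying a readable delimiter instead:

insert overwrite local directory '/root/test/1223'
row format delimited fields terminated by '\t'
select * from project where name = 'Amy';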

-》Create a new table from an existing table (CTAS)

create table if not exists student6 as select * from student;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20190330171827_a7c5bd01-17af-4e34-944a-f4e9c0b04e05
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1553915032892_0001, Tracking URL = http://Master:8088/proxy/application_1553915032892_0001/
Kill Command = /opt/hadoop/hadoop-2.6.5/bin/hadoop job  -kill job_1553915032892_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-30 17:19:29,920 Stage-1 map = 0%,  reduce = 0%
2019-03-30 17:20:01,520 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.57 sec
MapReduce Total cumulative CPU time: 2 seconds 570 msec
Ended Job = job_1553915032892_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/.hive-staging_hive_2019-03-30_17-18-27_412_9053636091525904584-1/-ext-10002
Moving data to directory hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/student6
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.57 sec   HDFS Read: 3643 HDFS Write: 104 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 570 msec
OK
student.id      student.name
Time taken: 95.97 seconds
hive (mongdb)> select * from student6;
OK
student6.id     student6.name
4       Tonny
1       Alex
2       Amy
3       Mia
NULL    NULL
Time taken: 0.104 seconds, Fetched: 5 row(s)

》Error handling:

1. Exception in thread "main" java.lang.NoSuchMethodError: com.ibm.icu.impl.ICUBinary.getRequiredData(Ljava/lang/String;)Ljava/nio/ByteBuffer;

This problem is caused by the Phoenix deployment:

step1: add the three jars phoenix-xxx-hbase-xxx-hive.jar, phoenix-xxx-hbase-xxx-client.jar, and phoenix-core-xxx-HBase-xxx.jar to the Hive environment (this change must also be applied on the client machines)

step2: the icu4j jar bundled with Phoenix conflicts with the one under hive/lib, so starting the Hive service throws the following exception:
Exception in thread "main" java.lang.NoSuchMethodError: com.ibm.icu.impl.ICUBinary.getRequiredData(Ljava/lang/String;)Ljava/nio/ByteBuffer;
Solution: remove icu4j-4.8.1.jar from hive/lib

Reference link:

https://www.jianshu.com/p/90e423f0722c 

2. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException)

Solution: hdfs dfsadmin -safemode leave

 

 

 
