Basic operations
1) Show databases
show databases;
2) Switch database
use mongdb;
3) Show tables
show tables;
4) Create a table
create table movie_world(id int, name string);
5) Insert data
insert into movie_world values(1, "1942");
6) Drop a table (removes both structure and data)
drop table movie_world;
7) Delete table data (keeps the structure)
delete from movie_world;
8) Prepare data
1,Alex
2,Amy
3,Mia
9) Create a table for file import
create table student (id int, name string) row format delimited fields terminated by "\t"; // not sure why my "\t" misbehaved; most likely the editor pasted spaces instead of a real tab, since with a genuine tab the data loads normally
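Since pasted tabs are a common cause of the problem noted above, the data file can be generated with printf, which always emits a real tab character (the path and contents here are illustrative, not the original student file):

```shell
# Generate a genuinely tab-delimited file; printf's \t is always a real tab,
# unlike pasting into an editor, which may substitute spaces.
printf '1\tAlex\n2\tAmy\n3\tMia\n' > /tmp/student_demo.txt

# cat -A prints tabs as ^I, so the delimiter can be verified by eye
cat -A /tmp/student_demo.txt
```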
9) Load data
load data local inpath "/root/student_4.txt" into table student3;
10) Change the MySQL password (metastore database)
UPDATE user SET Password = password('123456') WHERE User = 'root'; -- works on MySQL 5.6 and earlier; 5.7+ replaced the Password column with authentication_string and recommends ALTER USER instead
11) Hive data types
Java type | Hive type | Size
byte | TINYINT | 1 byte
short | SMALLINT | 2 bytes
int | INT | 4 bytes
long | BIGINT | 8 bytes
float | FLOAT | single-precision float
double | DOUBLE | double-precision float
String | STRING | variable-length string
- | TIMESTAMP | timestamp
byte[] | BINARY | byte array
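As a sketch, the types in the table above can be exercised together in one DDL statement (the table and column names below are illustrative, not from the notes):

```sql
create table if not exists type_demo (
  tiny_col   tinyint,    -- 1 byte
  small_col  smallint,   -- 2 bytes
  int_col    int,        -- 4 bytes
  big_col    bigint,     -- 8 bytes
  float_col  float,      -- single precision
  double_col double,     -- double precision
  str_col    string,     -- variable length
  ts_col     timestamp,
  bin_col    binary      -- byte array
);
```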
13) Database operations
Standard form:
create database if not exists hive_db;
14) Create a database at a specified HDFS location
create database hive_db location '/root'; -- without a location, the database is created under /user/hive/warehouse
15) Describe a database
desc database test_01;
OK
test_01 hdfs://192.168.8.128:9000/test root USER
Time taken: 0.011 seconds, Fetched: 1 row(s)
16) Add a description (database properties)
hive> alter database test_01 set dbproperties("dataname"="first test");
OK
Time taken: 0.058 seconds
hive> desc database test_01;
OK
test_01 hdfs://192.168.8.128:9000/test root USER
Time taken: 0.013 seconds, Fetched: 1 row(s)
17) Extended description
hive> desc database extended test_01;
OK
test_01 hdfs://192.168.8.128:9000/test root USER {dataname=first test}
Time taken: 0.01 seconds, Fetched: 1 row(s)
18) Query databases
-> show databases;
-> wildcard query
show databases like 'db*';
19) Drop a database
drop database test_01; // errors if the database does not exist (e.g. it was already dropped)
drop database if exists test_01; // never errors
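drop database also fails when the database still contains tables; adding cascade makes the cleanup unconditional (the database name below is illustrative):

```sql
create database if not exists demo_db;    -- no error if it already exists
-- without cascade, dropping a database that still holds tables fails
drop database if exists demo_db cascade;
```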
2. Table operations
hive> select *from student3;
OK
1 Alex
2 Amy
3 Mia
Time taken: 1.817 seconds, Fetched: 3 row(s)
hive> select *from student3 where name = 'Alex';
OK
1 Alex
Time taken: 0.513 seconds, Fetched: 1 row(s)
hive> select count(*) from student3;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20181221232124_d56e8c6a-2319-445c-921f-81c4670f42bb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1545394147917_0001, Tracking URL = http://Master:8088/proxy/application_1545394147917_0001/
Kill Command = /opt/hadoop/hadoop-2.6.5/bin/hadoop job -kill job_1545394147917_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-12-21 23:22:28,764 Stage-1 map = 0%, reduce = 0%
2018-12-21 23:23:05,060 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.67 sec
2018-12-21 23:23:11,398 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.67 sec (a query with aggregation must run the full MapReduce flow)
MapReduce Total cumulative CPU time: 5 seconds 810 msec
Ended Job = job_1545394147917_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.81 sec HDFS Read: 7681 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 810 msec
OK
3
Time taken: 109.159 seconds, Fetched: 1 row(s)
20) How to insert one table's data into another (create table as select)
create table if not student_5 as select * from student3 where name = 'Amy';
FAILED: ParseException line 1:20 missing KW_EXISTS at 'student_5' near '<EOF>'
hive> create table if not exists student_5 as select * from student3 where name = 'Amy';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20181221232652_0c868120-6b8e-47b3-94de-41c4510d488a
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1545394147917_0002, Tracking URL = http://Master:8088/proxy/application_1545394147917_0002/
Kill Command = /opt/hadoop/hadoop-2.6.5/bin/hadoop job -kill job_1545394147917_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-12-21 23:27:23,314 Stage-1 map = 0%, reduce = 0%
2018-12-21 23:27:56,371 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 19.44 sec (a map-only job is enough here)
MapReduce Total cumulative CPU time: 19 seconds 440 msec
Ended Job = job_1545394147917_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/.hive-staging_hive_2018-12-21_23-26-52_474_4470609876393742486-1/-ext-10002
Moving data to directory hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/student_5
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 19.44 sec HDFS Read: 4090 HDFS Write: 78 SUCCESS
Total MapReduce CPU Time Spent: 19 seconds 440 msec
OK
Time taken: 65.47 seconds
21) Managed tables (internal tables)
desc formatted student3;
OK
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: mongdb
Owner: root
CreateTime: Fri Dec 21 20:58:02 CST 2018
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/student3
Table Type: MANAGED_TABLE
Summary: when a managed table is dropped, its data in HDFS is deleted too, so managed tables are not suitable for data sharing.
22) External tables
create external table if not exists doctor (id int, name string)
> row format
> delimited fields
> terminated by ' ';
OK
Time taken: 3.927 seconds
hive> load data local inpath "/root/student_4.txt" into table doctor;
Loading data to table default.doctor
OK
Time taken: 1.256 seconds
hive> select * from doctor;
OK
1 Alex
2 Amy
3 Mia
Time taken: 1.363 seconds, Fetched: 3 row(s)
hive> desc formatted doctor;
OK
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: default
Owner: root
CreateTime: Sat Dec 22 09:51:51 CST 2018
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://192.168.8.128:9000/user/hive/warehouse/doctor
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
numFiles 1
numRows 0
rawDataSize 0
totalSize 19
transient_lastDdlTime 1545443594
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim
serialization.format
Time taken: 0.047 seconds, Fetched: 32 row(s)
hive> drop table doctor;
OK
Time taken: 1.176 seconds
hive> select * from doctor;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'doctor'
Summary: when an external table is dropped, its data in HDFS remains, so external tables suit data sharing; recreating a table with the same definition automatically re-associates it with the existing data.
(The managed table demo01 was deleted; the external table demo02 still exists.)
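The re-association behavior can be sketched explicitly by giving the external table a fixed location (the paths and the doctor2 name below are illustrative):

```sql
create external table if not exists doctor2 (id int, name string)
row format delimited fields terminated by ' '
location '/user/hive/warehouse/doctor2';   -- explicit HDFS location

drop table doctor2;   -- removes only the metastore entry; the HDFS files stay

-- recreating the same definition re-attaches to the surviving files,
-- so a select afterwards returns the old data
create external table if not exists doctor2 (id int, name string)
row format delimited fields terminated by ' '
location '/user/hive/warehouse/doctor2';
```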
23) Partitioned tables
1-> Create
create table if not exists program(id int, age int, name string)
partitioned by (day string)
row format delimited fields terminated by ' ';
OK
Time taken: 0.052 seconds
2-> Load data
load data local inpath "/root/student_5" into table program partition(day="1222");
FAILED: SemanticException Line 1:23 Invalid path '"/root/student_5"': No files matching path file:/root/student_5
hive> load data local inpath "/root/student_5.txt" into table program partition(day="1222");
Loading data to table default.program partition (day=1222)
OK
Time taken: 0.448 seconds
hive> load data local inpath "/root/student_4.txt" into table program partition(day="1223");
Loading data to table default.program partition (day=1223)
OK
Time taken: 0.397 seconds
hive> select * from program;
OK
1 NULL NULL 1222
2 NULL NULL 1222
3 NULL NULL 1222
1 NULL NULL 1223
2 NULL NULL 1223
3 NULL NULL 1223
Time taken: 0.111 seconds, Fetched: 6 row(s)
hive> load data local inpath "/root/student_6.txt" into table program partition(day="1224");
Loading data to table default.program partition (day=1224)
OK
Time taken: 0.412 seconds
3-> Query
select * from program;
OK
1 NULL NULL 1222
2 NULL NULL 1222
3 NULL NULL 1222
1 NULL NULL 1223
2 NULL NULL 1223
3 NULL NULL 1223
1 35 Alex 1224
2 3 Amy 1224
3 1 Mia 1224
Time taken: 0.124 seconds, Fetched: 9 row(s)
hive> select * from program where day = '1224';
OK
1 35 Alex 1224
2 3 Amy 1224
3 1 Mia 1224
Time taken: 0.426 seconds, Fetched: 3 row(s)
hive> desc program;
OK
id int
age int
name string
day string
# Partition Information
# col_name data_type comment
day string
4-> Add a partition separately
alter table program add partition (day = "1225"); // two ways: create the partition when loading data, or create it in advance like this
5-> Drop a partition
alter table program drop partition (day='1222');
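Partitions can also be listed, which helps verify the add/drop above (a sketch against the program table; the dates are illustrative):

```sql
alter table program add partition (day = '1225');
show partitions program;          -- lists the existing day=... partitions
alter table program drop partition (day = '1225');
```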
Alter tables
1-> Rename a table
alter table program rename to project;
OK
Time taken: 0.136 seconds
hive> select * from program;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'program'
hive> select * from project;
OK
1 NULL NULL 1223
2 NULL NULL 1223
3 NULL NULL 1223
1 35 Alex 1224
2 3 Amy 1224
3 1 Mia 1224
2-> Add a column
alter table project add columns (school string);
OK
Time taken: 0.067 seconds
hive> select * from project;
OK
1 NULL NULL NULL 1223
2 NULL NULL NULL 1223
3 NULL NULL NULL 1223
1 35 Alex NULL 1224
2 3 Amy NULL 1224
3 1 Mia NULL 1224
Time taken: 0.123 seconds, Fetched: 6 row(s)
hive> desc project;
OK
id int
age int
name string
school string
day string
# Partition Information
# col_name data_type comment
day string
3-> Change a column (rename / retype)
alter table project change old_column new_column type;
alter table project change id number int;
4-> Replace columns (equivalent to rebuilding the table schema)
desc t1;
OK
tid int
tname string
age int
Time taken: 0.028 seconds, Fetched: 3 row(s)
hive> alter table t1 replace (name string);
FAILED: ParseException line 1:23 missing COLUMNS at '(' near '<EOF>'
hive> alter table t1 replace columns(name string);
OK
Time taken: 0.043 seconds
hive> desc t1;
OK
name string
Time taken: 0.032 seconds, Fetched: 1 row(s)
DML operations
-> Load from the local filesystem
load data local inpath '/root/student_6.txt' into table project;
-> Load from HDFS (the file is moved, i.e. cut, into the table directory)
load data inpath '/root/student_6.txt' into table project; -- appends
-> Overwrite data
load data inpath '/root/student_6.txt' overwrite into table project; -- overwrites
-> Insert data
insert into table t2 values(88,'Andy',18); -- runs a map job
-> Insert into a partition
insert into table project partition(day='1226') values (8, 18,'Andy','qinghua');
-> Provide data at table creation time by pointing at an existing HDFS path
create table ...
location '<HDFS data path>';
-> Export query results to the local filesystem
insert overwrite local directory '/root/test/1223' select * from project where name='Amy';
hive -e 'select * from project' > /root/student000000000000002.txt
dfs -get <source path in HDFS> <local destination>
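By default, insert overwrite local directory writes fields separated by the ^A control character; since Hive 0.11 a row format can be added so the export is human-readable (the output path below is illustrative):

```sql
insert overwrite local directory '/tmp/project_export'
row format delimited fields terminated by ','
select * from project where name = 'Amy';
```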
-> Create a new table from an existing one (CTAS)
create table if not exists student6 as select * from student;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20190330171827_a7c5bd01-17af-4e34-944a-f4e9c0b04e05
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1553915032892_0001, Tracking URL = http://Master:8088/proxy/application_1553915032892_0001/
Kill Command = /opt/hadoop/hadoop-2.6.5/bin/hadoop job -kill job_1553915032892_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-30 17:19:29,920 Stage-1 map = 0%, reduce = 0%
2019-03-30 17:20:01,520 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.57 sec
MapReduce Total cumulative CPU time: 2 seconds 570 msec
Ended Job = job_1553915032892_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/.hive-staging_hive_2019-03-30_17-18-27_412_9053636091525904584-1/-ext-10002
Moving data to directory hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/student6
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.57 sec HDFS Read: 3643 HDFS Write: 104 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 570 msec
OK
student.id student.name
Time taken: 95.97 seconds
hive (mongdb)> select * from student6;
OK
student6.id student6.name
4 Tonny
1 Alex
2 Amy
3 Mia
NULL NULL
Time taken: 0.104 seconds, Fetched: 5 row(s)
> Error handling:
1. Exception in thread "main" java.lang.NoSuchMethodError: com.ibm.icu.impl.ICUBinary.getRequiredData(Ljava/lang/String;)Lj
Caused by the Phoenix deployment. Steps taken:
step 1: add the three jars phoenix-xxx-hbase-xxx-hive.jar, phoenix-xxx-hbase-xxx-client.jar and phoenix-core-xxx-HBase-xxx.jar to the Hive environment (this change must also be synced to the clients)
step 2: the icu4j jar inside the Phoenix jars conflicts with the one in hive/lib, so starting the Hive service throws:
Exception in thread "main" java.lang.NoSuchMethodError: com.ibm.icu.impl.ICUBinary.getRequiredData(Ljava/lang/String;)Ljava/nio/ByteBuffer;
Solution: remove icu4j-4.8.1.jar from hive/lib
Reference:
https://www.jianshu.com/p/90e423f0722c
2. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeExcept
Solution: hdfs dfsadmin -safemode leave (takes the NameNode out of safe mode)