Basic operations
1) Show databases
show databases;
2) Switch database
use mongdb;
3) Show tables
show tables;
4) Create a table
create table movie_world(id int, name string);
5) Insert data
insert into movie_world values(1, "1942");
6) Drop a table (removes both structure and data)
drop table movie_world;
7) Delete table data (keeps the structure)
delete from movie_world;
8) Prepare data
1,Alex
2,Amy
3,Mia
9) Create a table for file import
create table student (id int, name string) row format delimited fields terminated by "\t"; // not sure why my "\t" misbehaved; most likely the editor pasted spaces instead of a real tab, since with a genuine tab the data loads normally
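Since pasted tabs are a common cause of the problem noted above, the data file can be generated with printf, which always emits a real tab character (the path and contents here are illustrative, not the original student file):

```shell
# Generate a genuinely tab-delimited file; printf's \t is always a real tab,
# unlike pasting into an editor, which may substitute spaces.
printf '1\tAlex\n2\tAmy\n3\tMia\n' > /tmp/student_demo.txt

# cat -A prints tabs as ^I, so the delimiter can be verified by eye
cat -A /tmp/student_demo.txt
```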
9) Load data
load data local inpath "/root/student_4.txt" into table student3;
10) Change the MySQL password (metastore database)
UPDATE user SET Password = password('123456') WHERE User = 'root'; -- works on MySQL 5.6 and earlier; 5.7+ replaced the Password column with authentication_string and recommends ALTER USER instead
11) Hive data types
Java type | Hive type | Size
byte | TINYINT | 1 byte
short | SMALLINT | 2 bytes
int | INT | 4 bytes
long | BIGINT | 8 bytes
float | FLOAT | single-precision float
double | DOUBLE | double-precision float
String | STRING | variable-length string
- | TIMESTAMP | timestamp
byte[] | BINARY | byte array
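As a sketch, the types in the table above can be exercised together in one DDL statement (the table and column names below are illustrative, not from the notes):

```sql
create table if not exists type_demo (
  tiny_col   tinyint,    -- 1 byte
  small_col  smallint,   -- 2 bytes
  int_col    int,        -- 4 bytes
  big_col    bigint,     -- 8 bytes
  float_col  float,      -- single precision
  double_col double,     -- double precision
  str_col    string,     -- variable length
  ts_col     timestamp,
  bin_col    binary      -- byte array
);
```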
13) Database operations
Standard form:
create database if not exists hive_db;
14) Create a database at a specified HDFS location
create database hive_db location '/root'; -- without a location, the database is created under /user/hive/warehouse
15) Describe a database
desc database test_01;
OK
test_01 hdfs://192.168.8.128:9000/test root USER
Time taken: 0.011 seconds, Fetched: 1 row(s)
16) Add a description (database properties)
hive> alter database test_01 set dbproperties("dataname"="first test");
OK
Time taken: 0.058 seconds
hive> desc database test_01;
OK
test_01 hdfs://192.168.8.128:9000/test root USER
Time taken: 0.013 seconds, Fetched: 1 row(s)
17) Extended description
hive> desc database extended test_01;
OK
test_01 hdfs://192.168.8.128:9000/test root USER {dataname=first test}
Time taken: 0.01 seconds, Fetched: 1 row(s)
18) Query databases
-> show databases;
-> wildcard query
show databases like 'db*';
19) Drop a database
drop database test_01; // errors if the database does not exist (e.g. it was already dropped)
drop database if exists test_01; // never errors
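drop database also fails when the database still contains tables; adding cascade makes the cleanup unconditional (the database name below is illustrative):

```sql
create database if not exists demo_db;    -- no error if it already exists
-- without cascade, dropping a database that still holds tables fails
drop database if exists demo_db cascade;
```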
2. Table operations
hive> select *from student3;
OK
1 Alex
2 Amy
3 Mia
Time taken: 1.817 seconds, Fetched: 3 row(s)
hive> select *from student3 where name = 'Alex';
OK
1 Alex
Time taken: 0.513 seconds, Fetched: 1 row(s)
hive> select count(*) from student3;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20181221232124_d56e8c6a-2319-445c-921f-81c4670f42bb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1545394147917_0001, Tracking URL = http://Master:8088/proxy/application_1545394147917_0001/
Kill Command = /opt/hadoop/hadoop-2.6.5/bin/hadoop job -kill job_1545394147917_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-12-21 23:22:28,764 Stage-1 map = 0%, reduce = 0%
2018-12-21 23:23:05,060 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.67 sec
2018-12-21 23:23:11,398 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.67 sec (a query with aggregation must run the full MapReduce flow)
MapReduce Total cumulative CPU time: 5 seconds 810 msec
Ended Job = job_1545394147917_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.81 sec HDFS Read: 7681 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 810 msec
OK
3
Time taken: 109.159 seconds, Fetched: 1 row(s)
20) How to insert one table's data into another (create table as select)
create table if not student_5 as select * from student3 where name = 'Amy';
FAILED: ParseException line 1:20 missing KW_EXISTS at 'student_5' near '<EOF>'
hive> create table if not exists student_5 as select * from student3 where name = 'Amy';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20181221232652_0c868120-6b8e-47b3-94de-41c4510d488a
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1545394147917_0002, Tracking URL = http://Master:8088/proxy/application_1545394147917_0002/
Kill Command = /opt/hadoop/hadoop-2.6.5/bin/hadoop job -kill job_1545394147917_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-12-21 23:27:23,314 Stage-1 map = 0%, reduce = 0%
2018-12-21 23:27:56,371 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 19.44 sec (a map-only job is enough here)
MapReduce Total cumulative CPU time: 19 seconds 440 msec
Ended Job = job_1545394147917_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/.hive-staging_hive_2018-12-21_23-26-52_474_4470609876393742486-1/-ext-10002
Moving data to directory hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/student_5
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 19.44 sec HDFS Read: 4090 HDFS Write: 78 SUCCESS
Total MapReduce CPU Time Spent: 19 seconds 440 msec
OK
Time taken: 65.47 seconds
21) Managed tables (internal tables)
desc formatted student3;
OK
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: mongdb
Owner: root
CreateTime: Fri Dec 21 20:58:02 CST 2018
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/student3
Table Type: MANAGED_TABLE
Summary: when a managed table is dropped, its data in HDFS is deleted too, so managed tables are not suitable for data sharing.
22) External tables
create external table if not exists doctor (id int, name string)
> row format
> delimited fields
> terminated by ' ';
OK
Time taken: 3.927 seconds
hive> load data local inpath "/root/student_4.txt" into table doctor;
Loading data to table default.doctor
OK
Time taken: 1.256 seconds
hive> select * from doctor;
OK
1 Alex
2 Amy
3 Mia
Time taken: 1.363 seconds, Fetched: 3 row(s)
hive> desc formatted doctor;
OK
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: default
Owner: root
CreateTime: Sat Dec 22 09:51:51 CST 2018
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://192.168.8.128:9000/user/hive/warehouse/doctor
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
numFiles 1
numRows 0
rawDataSize 0
totalSize 19
transient_lastDdlTime 1545443594
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim
serialization.format
Time taken: 0.047 seconds, Fetched: 32 row(s)
hive> drop table doctor;
OK
Time taken: 1.176 seconds
hive> select * from doctor;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'doctor'
Summary: when an external table is dropped, its data in HDFS remains, so external tables suit data sharing; recreating a table with the same definition automatically re-associates it with the existing data.
(The managed table demo01 was deleted; the external table demo02 still exists.)
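The re-association behavior can be sketched explicitly by giving the external table a fixed location (the paths and the doctor2 name below are illustrative):

```sql
create external table if not exists doctor2 (id int, name string)
row format delimited fields terminated by ' '
location '/user/hive/warehouse/doctor2';   -- explicit HDFS location

drop table doctor2;   -- removes only the metastore entry; the HDFS files stay

-- recreating the same definition re-attaches to the surviving files,
-- so a select afterwards returns the old data
create external table if not exists doctor2 (id int, name string)
row format delimited fields terminated by ' '
location '/user/hive/warehouse/doctor2';
```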
23) Partitioned tables
1-> Create
create table if not exists program(id int, age int, name string)
partitioned by (day string)
row format delimited fields terminated by ' ';
OK
Time taken: 0.052 seconds
2-> Load data
load data local inpath "/root/student_5" into table program partition(day="1222");
FAILED: SemanticException Line 1:23 Invalid path '"/root/student_5"': No files matching path file:/root/student_5
hive> load data local inpath "/root/student_5.txt" into table program partition(day="1222");
Loading data to table default.program partition (day=1222)
OK
Time taken: 0.448 seconds
hive> load data local inpath "/root/student_4.txt" into table program partition(day="1223");
Loading data to table default.program partition (day=1223)
OK
Time taken: 0.397 seconds
hive> select * from program;
OK
1 NULL NULL 1222
2 NULL NULL 1222
3 NULL NULL 1222
1 NULL NULL 1223
2 NULL NULL 1223
3 NULL NULL 1223
Time taken: 0.111 seconds, Fetched: 6 row(s)
hive> load data local inpath "/root/student_6.txt" into table program partition(day="1224");
Loading data to table default.program partition (day=1224)
OK
Time taken: 0.412 seconds
3-> Query
select * from program;
OK
1 NULL NULL 1222
2 NULL NULL 1222
3 NULL NULL 1222
1 NULL NULL 1223
2 NULL NULL 1223
3 NULL NULL 1223
1 35 Alex 1224
2 3 Amy 1224
3 1 Mia 1224
Time taken: 0.124 seconds, Fetched: 9 row(s)
hive> select * from program where day = '1224';
OK
1 35 Alex 1224
2 3 Amy 1224
3 1 Mia 1224
Time taken: 0.426 seconds, Fetched: 3 row(s)
hive> desc program;
OK
id int
age int
name string
day string
# Partition Information
# col_name data_type comment
day string
4-> Add a partition separately
alter table program add partition (day = "1225"); // two ways: create the partition when loading data, or create it in advance like this
5-> Drop a partition
alter table program drop partition (day='1222');
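Partitions can also be listed, which helps verify the add/drop above (a sketch against the program table; the dates are illustrative):

```sql
alter table program add partition (day = '1225');
show partitions program;          -- lists the existing day=... partitions
alter table program drop partition (day = '1225');
```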
Alter tables
1-> Rename a table
alter table program rename to project;
OK
Time taken: 0.136 seconds
hive> select * from program;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'program'
hive> select * from project;
OK
1 NULL NULL 1223
2 NULL NULL 1223
3 NULL NULL 1223
1 35 Alex 1224
2 3 Amy 1224
3 1 Mia 1224
2-> Add a column
alter table project add columns (school string);
OK
Time taken: 0.067 seconds
hive> select * from project;
OK
1 NULL NULL NULL 1223
2 NULL NULL NULL 1223
3 NULL NULL NULL 1223
1 35 Alex NULL 1224
2 3 Amy NULL 1224
3 1 Mia NULL 1224
Time taken: 0.123 seconds, Fetched: 6 row(s)
hive> desc project;
OK
id int
age int
name string
school string
day string
# Partition Information
# col_name data_type comment
day string
3-> Change a column (rename / retype)
alter table project change old_column new_column type;
alter table project change id number int;
4-> Replace columns (equivalent to rebuilding the table schema)
desc t1;
OK
tid int
tname string
age int
Time taken: 0.028 seconds, Fetched: 3 row(s)
hive> alter table t1 replace (name string);
FAILED: ParseException line 1:23 missing COLUMNS at '(' near '<EOF>'
hive> alter table t1 replace columns(name string);
OK
Time taken: 0.043 seconds
hive> desc t1;
OK
name string
Time taken: 0.032 seconds, Fetched: 1 row(s)
DML operations
-> Load from the local filesystem
load data local inpath '/root/student_6.txt' into table project;
-> Load from HDFS (the file is moved, i.e. cut, into the table directory)
load data inpath '/root/student_6.txt' into table project; -- appends
-> Overwrite data
load data inpath '/root/student_6.txt' overwrite into table project; -- overwrites
-> Insert data
insert into table t2 values(88,'Andy',18); -- runs a map job
-> Insert into a partition
insert into table project partition(day='1226') values (8, 18,'Andy','qinghua');
-> Provide data at table creation time by pointing at an existing HDFS path
create table ...
location '<HDFS data path>';
-> Export query results to the local filesystem
insert overwrite local directory '/root/test/1223' select * from project where name='Amy';
hive -e 'select * from project' > /root/student000000000000002.txt
dfs -get <source path in HDFS> <local destination>
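By default, insert overwrite local directory writes fields separated by the ^A control character; since Hive 0.11 a row format can be added so the export is human-readable (the output path below is illustrative):

```sql
insert overwrite local directory '/tmp/project_export'
row format delimited fields terminated by ','
select * from project where name = 'Amy';
```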
-> Create a new table from an existing one (CTAS)
create table if not exists student6 as select * from student;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20190330171827_a7c5bd01-17af-4e34-944a-f4e9c0b04e05
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1553915032892_0001, Tracking URL = http://Master:8088/proxy/application_1553915032892_0001/
Kill Command = /opt/hadoop/hadoop-2.6.5/bin/hadoop job -kill job_1553915032892_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-30 17:19:29,920 Stage-1 map = 0%, reduce = 0%
2019-03-30 17:20:01,520 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.57 sec
MapReduce Total cumulative CPU time: 2 seconds 570 msec
Ended Job = job_1553915032892_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/.hive-staging_hive_2019-03-30_17-18-27_412_9053636091525904584-1/-ext-10002
Moving data to directory hdfs://192.168.8.128:9000/user/hive/warehouse/mongdb.db/student6
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.57 sec HDFS Read: 3643 HDFS Write: 104 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 570 msec
OK
student.id student.name
Time taken: 95.97 seconds
hive (mongdb)> select * from student6;
OK
student6.id student6.name
4 Tonny
1 Alex
2 Amy
3 Mia
NULL NULL
Time taken: 0.104 seconds, Fetched: 5 row(s)
> Error handling:
1. Exception in thread "main" java.lang.NoSuchMethodError: com.ibm.icu.impl.ICUBinary.getRequiredData(Ljava/lang/String;)Lj
Caused by the Phoenix deployment. Steps taken:
step 1: add the three jars phoenix-xxx-hbase-xxx-hive.jar, phoenix-xxx-hbase-xxx-client.jar and phoenix-core-xxx-HBase-xxx.jar to the Hive environment (this change must also be synced to the clients)
step 2: the icu4j jar inside the Phoenix jars conflicts with the one in hive/lib, so starting the Hive service throws:
Exception in thread "main" java.lang.NoSuchMethodError: com.ibm.icu.impl.ICUBinary.getRequiredData(Ljava/lang/String;)Ljava/nio/ByteBuffer;
Solution: remove icu4j-4.8.1.jar from hive/lib
Reference:
https://www.jianshu.com/p/90e423f0722c
2. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeExcept
Solution: hdfs dfsadmin -safemode leave (takes the NameNode out of safe mode)