Databases
Select a database: USE database_name;
Create a database: CREATE DATABASE database_name;
Delete a database: DROP DATABASE database_name;  -- note: a database can only be dropped when it contains no tables/data!
Hive does not currently support renaming a database.
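A minimal session illustrating the lifecycle above (the database name `demo_db` is made up for illustration). Hive also accepts `IF NOT EXISTS` / `IF EXISTS` guards, and `DROP DATABASE ... CASCADE` will drop a database even when it still contains tables:

```sql
-- Create the database only if it does not already exist
CREATE DATABASE IF NOT EXISTS demo_db;

-- Switch the current session to it
USE demo_db;

-- A plain DROP fails while the database still contains tables;
-- CASCADE drops the contained tables as well
DROP DATABASE IF EXISTS demo_db CASCADE;
```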
1. You can use the ALTER DATABASE command to set key-value pairs in a database's
DBPROPERTIES, describing properties of that database.
E.g.:
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);
2. The same command can also change the database's owner or permissions:
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;
3. All other database metadata is immutable, however, including the database name and the directory where the database is stored.
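A concrete instance of the two syntax templates above (the property names and the `hive_admin` role are illustrative, not from the original notes):

```sql
-- Attach descriptive key-value metadata to the database
ALTER DATABASE demo_db SET DBPROPERTIES ('edited-by' = 'root', 'purpose' = 'testing');

-- Transfer ownership to a role (hive_admin is a made-up role name)
ALTER DATABASE demo_db SET OWNER ROLE hive_admin;

-- The properties appear under "Parameters" in the extended description
DESCRIBE DATABASE EXTENDED demo_db;
```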
2) List the tables in a database: show tables;
Table columns: desc table_name;
Full table details: desc formatted table_name;
Apache Hive Load Data
Ways to import data:
- Load data from the local file system into a Hive table
- Load data from HDFS into a Hive table
- Query data from another table and insert the results into a Hive table
- Create a table and populate it, at creation time, with the results of a query against another table.
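The walkthrough below demonstrates the first and fourth methods; for completeness, the second and third look roughly like this (the path `/tmp/b.txt` and table name `nsf_2` are illustrative):

```sql
-- From HDFS: note the absence of LOCAL; the source file is MOVED into
-- the table's warehouse directory, not copied
LOAD DATA INPATH '/tmp/b.txt' INTO TABLE nsf;

-- From another table: the target table must already exist
INSERT INTO TABLE nsf_2 SELECT id, name, tel FROM nsf;
-- Use INSERT OVERWRITE TABLE instead to replace the existing contents
```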
Local file system
⊙ Create the table
Database: default
Table name: nsf
File format: textfile
hive> create table nsf (id string, name string, tel string)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;
OK
Time taken: 0.139 seconds
- Load the data
hive> load data local inpath '/root/workspace/a.txt' into table nsf;
Loading data to table default.nsf
OK
Time taken: 1.792 seconds
hive> select * from nsf
> ;
OK
1 abc 18812341234
2 aaa 18943214312
3 a2d 15525256868
4 asd 18511990099
5 aac 16656568989
Time taken: 1.613 seconds, Fetched: 5 row(s)
Inspect the table's metadata and storage location
hive> desc formatted nsf;
OK
# col_name data_type comment
id string
name string
tel string
# Detailed Table Information
Database: default
Owner: root
CreateTime: Tue Aug 27 06:49:27 GMT 2019
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://hadoop-maste:9000/home/hive/warehouse/nsf
Table Type: MANAGED_TABLE
Table Parameters:
numFiles 1
numRows 0
rawDataSize 0
totalSize 140
transient_lastDdlTime 1566888697
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.12 seconds, Fetched: 32 row(s)
Create a table by querying another table's data and inserting the results into the new table at creation time (CTAS):
hive> create table nsf_1 as select * from nsf limit 2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20190827065758_55500fa4-8601-462e-a00b-966b3dd660a9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1566877322709_0001, Tracking URL = http://hadoop-maste:8088/proxy/application_1566877322709_0001/
Kill Command = /usr/local/hadoop-2.7.5/bin/hadoop job -kill job_1566877322709_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-08-27 06:58:11,580 Stage-1 map = 0%, reduce = 0%
2019-08-27 06:58:20,294 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.92 sec
2019-08-27 06:58:26,650 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.75 sec
MapReduce Total cumulative CPU time: 4 seconds 750 msec
Ended Job = job_1566877322709_0001
Moving data to directory hdfs://hadoop-maste:9000/home/hive/warehouse/nsf_1
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.75 sec HDFS Read: 7472 HDFS Write: 137 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 750 msec
OK
Time taken: 30.637 seconds
External tables
- Without partitions
CREATE EXTERNAL TABLE USER
(id string, age string, sex string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/hadoop/warespace/tmp/user';
- With partitions
CREATE EXTERNAL TABLE USER
(pid string, age string, sex string) partitioned by (day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/hadoop/warespace/tmp/user';
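For the partitioned variant, Hive will not see data under the table's location until each partition is registered in the metastore (the `day` value below is illustrative):

```sql
-- Register one partition explicitly; its location defaults to
-- <table location>/day=2019-08-27
ALTER TABLE USER ADD PARTITION (day='2019-08-27');

-- Or, if partition directories already exist under the table location
-- in day=... layout, discover them all at once:
MSCK REPAIR TABLE USER;
```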
Temporary tables (visible only to the current session; dropped automatically when the session ends)
create temporary table seed as
select hash(id) as pid
from seed_orig;
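A sketch of using the temporary table above within the same session (the query is illustrative; `seed_orig` comes from the original notes). No cleanup is required, since the table disappears when the session ends:

```sql
-- The temporary table is queryable like any other table in this session
SELECT pid FROM seed LIMIT 10;

-- It can also be dropped early if desired
DROP TABLE seed;
```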