Hive database operations, table operations, and data import and export commands

lds_include

Article link: https://blog.csdn.net/lds_include/article/details/88735721

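Creating a database

Definition

Creating a database essentially creates a directory on HDFS. Use comment to add a description of the database, enclosed in quotes. Database properties go after the description, introduced by with dbproperties: the properties are placed in parentheses, each property name and value is quoted and joined by an equals sign, and multiple properties are separated by commas.

Example

-- Create a database named myhive with a description and properties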
create database myhive comment 'this is myhive db'
with dbproperties ('author'='luodesong','date'='2018-4-21')
;

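-- View the properties
describe database extended myhive;

-- Add a new property to the existing database
alter database myhive set dbproperties ('id'='1');

-- Switch to the database
use myhive;

-- Drop the database (if it still contains tables, append cascade)
drop database myhive;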

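Creating a table

Definition

A table is created in the current database by default (default is Hive's default database). Creating a table essentially also creates a directory on HDFS.

Example

Using array and loading local data.

-- Create table t_array, mapped to the data file array.txt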
create table if not exists t_array(
id int comment 'this is id',
score array<tinyint>
)
comment 'this is my table'
row format delimited fields terminated by ','
collection items terminated by '|'
tblproperties ('id'='11','author'='luodesong')
;
-- Load the local file array.txt
load data local inpath '/testdata/array.txt' into table t_array;
-- Query the data in the table
select * from t_array;
-- Query the first score for id=1
select score[0] from t_array where id=1;
-- Query the number of scores for id=2
select size(score) from t_array where id=2;
-- Count the total number of rows
select count(*) from t_array;
-- Append the local file array1.txt to the table
load data local inpath '/testdata/array1.txt' into table t_array;
-- Append the local file test.txt to the table
load data local inpath '/testdata/test.txt' into table t_array;
-- Load the local file array.txt into t_array, overwriting the existing data
load data local inpath '/testdata/array.txt' overwrite into table t_array;
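
To make the delimiters concrete, here is a minimal sketch of how one line of array.txt would be parsed. The sample line is an assumption made for illustration; this article does not show the actual file contents.

```sql
-- Hypothetical line in /testdata/array.txt (not from the original data):
--   1,90|85|77
-- fields terminated by ','            -> id = 1, score = 90|85|77
-- collection items terminated by '|'  -> score = [90,85,77]

-- Array elements are indexed from 0; size() returns the element count
select id, score[0] as first_score, size(score) as score_count
from t_array
where id = 1;
```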

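Using map, viewing a table's create statement, and specifying the data location when creating a table

-- Create table t_map, mapped to the data file map.txt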
create table if not exists t_map(
id int,
score map<string,int>
)
row format delimited fields terminated by ','
collection items terminated by '|'
map keys terminated by ':'
stored as textfile
;
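-- Load data from HDFS; map.txt is moved from its original HDFS location
load data inpath '/testdata/map.txt' into table t_map;
-- Query the math score for id=1
select score['math'] from t_map where id=1;
-- Query how many subjects each person took
select size(score) from t_map;
-- View the table's create statement
show create table t_map;
-- Output of show create table:
CREATE TABLE `t_map`(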
  `id` int, 
  `score` map<string,int>)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
  COLLECTION ITEMS TERMINATED BY '|' 
  MAP KEYS TERMINATED BY ':' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://linux5:8020/user/hive/warehouse/t_map'
;
-- Specify the data location when creating the table
create table if not exists t_map2(
id int,
score map<string,int>
)
row format delimited fields terminated by ','
collection items terminated by '|'
map keys terminated by ':'
stored as textfile
location '/test'
;
-- Drop the table
drop table t_map2;
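
The three delimiters declared for t_map each split the data at a different level. The sketch below shows how a single line would be parsed; the sample line is hypothetical, since the contents of map.txt are not shown in this article.

```sql
-- Hypothetical line in map.txt (not from the original data):
--   1,math:90|english:85
-- fields terminated by ','            -> id = 1, score = math:90|english:85
-- collection items terminated by '|'  -> entries math:90 and english:85
-- map keys terminated by ':'          -> {"math":90,"english":85}

-- Look up a value by key, list the keys, and count the entries
select id, score['math'] as math_score, map_keys(score) as subjects, size(score) as subject_count
from t_map
where id = 1;
```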

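Using struct, creating external tables, and the difference between internal and external tables

-- Create table t_struct, mapped to the data file struct.txt (an external table, created with the external keyword and an explicit data location)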
create external table if not exists t_struct(
id int,
grade struct<score:int,desc:string,point:string>
)
row format delimited fields terminated by ','
collection items terminated by '|'
location '/external'
;
-- Query rows with score > 90
select * from t_struct where grade.score>90;
-- Create external table t_struct1
create external table if not exists t_struct1(
id int,
grade struct<score:int,desc:string,point:string>
)
row format delimited fields terminated by ','
collection items terminated by '|'
;
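-- Append data with insert into
insert into table t_struct1 select * from t_struct;
-- Drop the table: only the metadata is deleted; the data files remain on HDFS
drop table t_struct;

Note: the difference between external and internal tables is what happens on drop. When an external table is dropped, only the metadata is deleted and the data files remain on HDFS. When an internal table is dropped, the table definition and the data stored on HDFS are deleted together.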
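
One way to check which kind of table you are dealing with, and to confirm that dropping an external table leaves its files in place, is sketched below. It assumes the statements above have already been run from the Hive CLI, which accepts dfs commands directly.

```sql
-- Table Type in the output reads EXTERNAL_TABLE for an external table
-- and MANAGED_TABLE for an internal (managed) table
describe formatted t_struct1;

-- t_struct was dropped above, but because it was external,
-- its data directory should still be listed on HDFS
dfs -ls /external;
```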

Modifying table properties

Example: first create a table

-- Create table log2
CREATE external TABLE log2(
id             string COMMENT 'this is id column',
phonenumber     bigint,
mac             string,
ip               string,
url              string,
status1             string,
status2          string,
up           int,
down           int,
code            int,
dt         String
)
-- Add a description
COMMENT 'this is log table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\n'
stored as textfile;
-- Load data
load data local inpath '/home/data.log.txt' into table log2;

Renaming a table: rename to

alter table old_name rename to new_name

alter table log2 rename to log4;

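Renaming a column: change column

alter table table_name change column old_col new_col col_type [comment 'description'];

-- Rename a column
alter table log4 change column ip myip String;
-- Rename a column and add a column comment
alter table log4 change column myip ip String comment 'this is myip' ;
-- Use the after keyword to place the changed column after another column (the name can stay the same)
alter table log4 change column ip ip String comment 'this is myip' after code;
-- Use the first keyword to move the changed column to the first position
-- (newer Hive versions may reject the string-to-int change unless hive.metastore.disallow.incompatible.col.type.changes is false)
alter table log4 change column ip myip int comment 'this is myip' first;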

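Adding columns: add columns

-- Add columns with add columns followed by parentheses containing the new columns and their comments; separate multiple columns with commas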
alter table log4 add columns(
x int comment 'this x',
y int
);

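Dropping columns:

-- Drop columns with replace columns: the parenthesized list is the new, complete column list (the columns to keep); any column not listed is dropped
-- Keeps only x and y and drops every other column
alter table log4 replace columns(x int,y int);
-- Keeps the original columns and drops x and y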
alter table log4 replace columns(
myip int,
id string, 
phonenumber bigint,
mac string,
url string,
status1 string,
status2 string,
up int,
down int,  
code int,
dt string
);

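Converting between internal and external tables:

-- Convert an internal table to an external table
alter table log4 set tblproperties(
'EXTERNAL' = 'TRUE'
);
-- Convert an external table back to an internal table
alter table log4 set tblproperties(
'EXTERNAL' = 'false'
);
alter table log4 set tblproperties(
'EXTERNAL' = 'FALSE'
);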

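Importing data

Adding data to an existing Hive table

Note: load copies the data file into the table's directory (if the source file is on HDFS, it is moved rather than copied).

Loading data from the local file system

-- load from the local file system copies the data into the table's HDFS directory
-- Append
load data local inpath 'local_data_path' into table tablename;
-- Overwrite
load data local inpath 'local_data_path' overwrite into table tablename;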

Loading data from HDFS

-- load from HDFS moves the data into the table's HDFS directory
-- Append
load data inpath 'hdfs_data_path' into table tablename;
-- Overwrite
load data inpath 'hdfs_data_path' overwrite into table tablename;
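
The move behavior can be observed directly from the Hive CLI. The following is a rough sketch only: the staging path, the file name, and the table demo_load are hypothetical, and the last command assumes the default warehouse directory.

```sql
-- Hypothetical table used only for this illustration
create table if not exists demo_load(line string);

-- Before the load, the source file is listed under the staging path
dfs -ls /staging/demo;

load data inpath '/staging/demo/data.txt' into table demo_load;

-- After the load, the file is no longer listed here because it was moved
dfs -ls /staging/demo;

-- It now sits in the table's directory (assuming the default warehouse location)
dfs -ls /user/hive/warehouse/demo_load;
```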

Loading data from a query on another table

-- Insert data into a table with a query
-- Append
insert into table table1 select * from table2;
-- Overwrite
insert overwrite table table1 select * from table2;
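
The difference between appending and overwriting shows up in the row counts. Here is a small sketch, assuming two hypothetical tables, demo_src and demo_dst, that do not exist yet (so both start empty).

```sql
-- Hypothetical demo tables, not part of the original article
create table if not exists demo_src(id int, name string);
create table if not exists demo_dst(id int, name string);

-- insert ... values requires Hive 0.14 or later
insert into table demo_src values (1, 'a'), (2, 'b');

-- Append twice: demo_dst now holds 4 rows
insert into table demo_dst select * from demo_src;
insert into table demo_dst select * from demo_src;
select count(*) from demo_dst;

-- Overwrite: the existing rows are replaced, leaving 2 rows
insert overwrite table demo_dst select * from demo_src;
select count(*) from demo_dst;
```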

Exporting data

1 Exporting query results to the local file system

Example:

insert overwrite local directory '/data/export/dept'
select * from dept_partition2;

2 Exporting formatted query results to the local file system

Example:

insert overwrite local directory '/data/export/dept2'
row format delimited fields terminated by '\t'
select * from dept_partition2;

3 Exporting query results to HDFS

Example:

insert overwrite directory '/data/export/dept2'
row format delimited fields terminated by '\t'
select * from dept_partition2;

4 Exporting with the hive shell command (using the hive -e option)

Example:

[hadoop@hadoop02 apache-hive-1.2.1-bin]$ bin/hive -e 'select * from test3.dept_partition2;' >> /data/dept3.txt

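Internal and external tables

Internal table: when you create a table in Hive, Hive manages the data by default. That is, Hive moves the data into its warehouse directory.

External table: the user controls the creation and deletion of the data. The location of the external data must be specified when the table is created. With the EXTERNAL keyword, Hive knows that it does not manage the data, so it does not move the data into its own warehouse directory. In fact, it does not even check whether the external location exists when the table is defined. This is a very important feature, because it means you can defer creating the data until after the table has been created.

Difference: when an internal table is dropped, the table is deleted together with both its metadata and its data. When an external table is dropped, Hive leaves the data untouched and deletes only the metadata, not the data files themselves.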