1. Data Types
1.1 Numeric types
Type | Size | Notes |
---|---|---|
TINYINT | 1 byte | Signed integer |
SMALLINT | 2 bytes | Signed integer |
INT | 4 bytes | Signed integer |
BIGINT | 8 bytes | Signed integer |
FLOAT | 4 bytes | Signed single-precision floating point |
DOUBLE | 8 bytes | Signed double-precision floating point |
DECIMAL | – | Exact, arbitrary-precision signed decimal number |
1.2 Date and time types
Type | Size | Notes |
---|---|---|
TIMESTAMP | – | Timestamp in the format yyyy-mm-dd hh:mm:ss[.f…] |
DATE | – | Date in the format YYYY-MM-DD |
INTERVAL | – | – |
1.3 String types
Type | Size | Notes |
---|---|---|
STRING | – | String |
VARCHAR | 1 – 65535 characters | Variable-length string |
CHAR | up to 255 characters | Fixed-length string |
1.4 Miscellaneous types
Type | Size | Notes |
---|---|---|
BOOLEAN | – | Boolean, TRUE/FALSE |
BINARY | – | Byte sequence |
1.5 Complex types
Type | Size | Notes |
---|---|---|
ARRAY | – | Array of same-typed elements, 0-based index; ARRAY<data_type> |
MAP | – | Key/value map; MAP<primitive_type, data_type> |
STRUCT | – | Struct; STRUCT<col_name : data_type [COMMENT col_comment], …> |
UNIONTYPE | – | Tagged union; UNIONTYPE<data_type, data_type, …> |
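DECIMAL accepts an optional precision and scale. A minimal sketch (the table name decimal_demo is made up for illustration):

```sql
-- DECIMAL(p, s): at most p total digits, s of them after the decimal point.
-- Hive defaults to DECIMAL(10,0); the maximum precision is 38.
CREATE TABLE decimal_demo (amount DECIMAL(16,2));

-- Casting rounds to the declared scale:
SELECT CAST('3.14159' AS DECIMAL(10,2));  -- 3.14
```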
1.5.1 Testing the complex types
1.5.1.1 Create the table
drop table if exists complex_table;
create table complex_table(
    c_int int,
    c_array array<string>,
    c_map map<string, string>,
    c_struct struct<name:string, age:int>
)
row format delimited
fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
stored as textfile;
1.5.1.2 Insert data
insert into complex_table values(
1,
array("array_value1","array_value2"),
str_to_map("key1:value1,key2:value2"),
named_struct("name","Alice", "age",18)
);
1.5.1.3 Query the data
1.5.1.3.1 array
- Syntax: A[n]
Operand types: A is an array, n is an int.
Returns the nth element of array A. Indexing starts at 0; an index beyond the last element returns NULL.
hive> select c_array[0], c_array[1], c_array[2] from complex_table;
OK
array_value1 array_value2 NULL
- size() returns the number of elements in an array.
hive> select size(c_array) from complex_table;
OK
2
- array_contains() tests whether an array contains a given element:
array_contains(array, value)
It returns true or false.
hive> select array_contains(c_array,'array_value1') v1, array_contains(c_array,'array_value3') from complex_table;
OK
true false
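Beyond indexed access, explode() together with LATERAL VIEW flattens an array into one row per element. A sketch against the complex_table above (the aliases t and elem are arbitrary):

```sql
SELECT c_int, elem
FROM complex_table
LATERAL VIEW explode(c_array) t AS elem;
-- 1  array_value1
-- 1  array_value2
```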
1.5.1.3.2 map
- Element access
Syntax: M[key]
Operand types: M is a map; key must match the map's key type.
Returns the value stored under key in map M, or NULL if the key is absent.
hive> select c_map['key1'], c_map['key3'] from complex_table;
OK
value1 NULL
- Get the keys and the values of a map:
map_keys()
map_values()
hive> select map_keys(c_map), map_values(c_map) from complex_table;
OK
["key1","key2"] ["value1","value2"]
- size() returns the number of key/value pairs in a map.
hive> select size(c_map) from complex_table;
OK
2
- To test whether a map contains a given key or value, combine array_contains() with map_keys() or map_values():
array_contains(map_keys(col), key)
array_contains(map_values(col), value)
hive> select
array_contains(map_keys(c_map), 'key1'),
array_contains(map_keys(c_map), 'key3'),
array_contains(map_values(c_map), 'value1'),
array_contains(map_values(c_map), 'value3')
from complex_table;
OK
true false true false
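explode() also works on maps, producing one row per key/value pair. A sketch against the complex_table above (the aliases t, k, and v are arbitrary):

```sql
SELECT c_int, k, v
FROM complex_table
LATERAL VIEW explode(c_map) t AS k, v;
-- 1  key1  value1
-- 1  key2  value2
```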
1.5.1.3.3 struct
Syntax: S.x
Operand type: S is a struct.
Returns field x of struct S.
hive> select c_struct.name, c_struct.age from complex_table;
OK
Alice 18
1.5.2 Testing uniontype
1.5.2.1 Create the table
create table union_testnew(
foo uniontype<int, double, string, array<string>, map<string, string>>
)
row format delimited
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
stored as textfile;
1.5.2.2 Prepare the data
Each line starts with the 0-based tag of the union branch, followed by the value. ^D denotes a literal Ctrl-D (\004) character.
vim union_test.log
0,1
1,3.0
2,world
3,wade:tom:polly
4,k1^Dv1:k2^Dv2
1.5.2.3 Load the data
hive> load data local inpath './union_test.log' overwrite into table union_testnew;
1.5.2.4 Query the data
hive> select * from union_testnew;
OK
union_testnew.foo
{0:1}
{1:3.0}
{2:"world"}
{3:["wade","tom","polly"]}
{4:{"k1":"v1","k2":"v2"}}
Time taken: 0.225 seconds, Fetched: 5 row(s)
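A union value can also be produced inside a query with the create_union() UDF, which takes the tag followed by one candidate value per branch; a hedged sketch (the literal values here are made up):

```sql
-- Returns the branch whose 0-based position matches the tag (here 0, the int).
SELECT create_union(0, 123, CAST(3.0 AS DOUBLE), 'world') FROM union_testnew LIMIT 1;
```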
Hive official documentation: LanguageManual Types
2. Creating Tables
All of the following operations run in the mydb database.
CREATE DATABASE IF NOT EXISTS mydb;
use mydb;
2.1 CREATE TABLE
Specify the file format and the field delimiters.
CREATE TABLE IF NOT EXISTS employees (
name STRING,
salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
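When the data should be copied along with the schema, CREATE TABLE AS SELECT (CTAS) is an option; a sketch (the target table name employees_copy is made up):

```sql
-- CTAS derives the new table's schema from the SELECT and copies the rows.
-- Note: a CTAS target cannot be an external or partitioned table.
CREATE TABLE employees_copy AS
SELECT name, salary FROM employees;
```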
2.2 CREATE TABLE LIKE
Creates a new table with the same schema as an existing one (the schema only, no data).
CREATE TABLE IF NOT EXISTS mydb.employee2 LIKE mydb.employees;
SHOW TABLES
- List all tables in the current database.
hive> SHOW TABLES;
OK
employee2
employees
Time taken: 0.063 seconds, Fetched: 2 row(s)
- List all tables in a specific database.
hive> SHOW TABLES IN mydb;
OK
employee2
employees
Time taken: 0.05 seconds, Fetched: 2 row(s)
- A pattern may be used; it follows regular-expression semantics rather than shell globs: '.' matches any single character and '*' matches zero or more occurrences of the preceding pattern.
hive> SHOW TABLES 'empl.*';
OK
employee2
employees
Time taken: 0.049 seconds, Fetched: 2 row(s)
DESC – show information about a table
DESC
Plain DESC lists only the table's columns.
hive> DESC employees;
OK
name string
salary float
subordinates array<string>
deductions map<string,float>
address struct<street:string,city:string,state:string,zip:int>
Time taken: 0.078 seconds, Fetched: 5 row(s)
DESC EXTENDED
With the EXTENDED keyword, detailed table information is shown: the database the table belongs to, the owner, the creation time, the SerDe information, and statistics (number of files, total size, row count, and so on). The details are packed onto a single line, which is hard to read; the FORMATTED keyword prints them in a more readable layout.
hive> DESC EXTENDED employees;
OK
name string
salary float
subordinates array<string>
deductions map<string,float>
address struct<street:string,city:string,state:string,zip:int>
Detailed Table Information Table(tableName:employees, dbName:mydb, owner:houzhizhen, createTime:1635924848, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:salary, type:float, comment:null), FieldSchema(name:subordinates, type:array<string>, comment:null), FieldSchema(name:deductions, type:map<string,float>, comment:null), FieldSchema(name:address, type:struct<street:string,city:string,state:string,zip:int>, comment:null)], location:file:/user/hive/warehouse/mydb.db/employees, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{mapkey.delim=, collection.delim=, serialization.format=, line.delim=\n, field.delim=}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE={\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"address\":\"true\",\"deductions\":\"true\",\"name\":\"true\",\"salary\":\"true\",\"subordinates\":\"true\"}}, numFiles=0, transient_lastDdlTime=1635924848, bucketing_version=2}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER)
Time taken: 0.178 seconds, Fetched: 7 row(s)
DESC FORMATTED
With the FORMATTED keyword, the output is much easier to read.
hive> DESC FORMATTED employees;
OK
# col_name data_type comment
name string
salary float
subordinates array<string>
deductions map<string,float>
address struct<street:string,city:string,state:string,zip:int>
# Detailed Table Information
Database: mydb
OwnerType: USER
Owner: houzhizhen
CreateTime: Wed Nov 03 15:34:08 CST 2021
LastAccessTime: UNKNOWN
Retention: 0
Location: file:/user/hive/warehouse/mydb.db/employees
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"address\":\"true\",\"deductions\":\"true\",\"name\":\"true\",\"salary\":\"true\",\"subordinates\":\"true\"}}
bucketing_version 2
numFiles 0
numRows 0
rawDataSize 0
totalSize 0
transient_lastDdlTime 1635924848
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
collection.delim \u0002
field.delim \u0001
line.delim \n
mapkey.delim \u0003
serialization.format \u0001
Time taken: 0.099 seconds, Fetched: 39 row(s)
Managed Tables and External Tables
Managed Tables
Tables managed by Hive, created without the EXTERNAL keyword. Dropping a managed table removes both the metadata and the underlying data files.
External Tables
The counterpart of managed tables: Hive does not manage the data. Dropping an external table removes only the metadata; the data files are left in place.
CREATE EXTERNAL TABLE IF NOT EXISTS stocks (
c_exchange STRING,
symbol STRING,
ymd STRING,
price_open FLOAT,
price_high FLOAT,
price_low FLOAT,
price_close FLOAT,
volume INT,
price_adj_close FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stocks';
Converting an external table to a managed table:
alter table stocks set tblproperties('EXTERNAL'='FALSE');
Deleting the data when an external table is dropped is supported only in Hive 4.0 and later (HIVE-19753):
alter table table_name set tblproperties('external.table.purge'='TRUE');
Partitioned Managed Tables
A partitioned table is created by adding the PARTITIONED BY clause.
CREATE TABLE employees (
name STRING,
salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (country STRING, state STRING);
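Rows are written into a specific partition with a partition spec in the INSERT statement; a minimal sketch, assuming a hypothetical unpartitioned staging table staging_employees with matching columns:

```sql
-- Static partition insert: the partition spec names the target directly.
INSERT INTO TABLE employees PARTITION (country='US', state='CA')
SELECT name, salary, subordinates, deductions, address
FROM staging_employees;
```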
SHOW PARTITIONS
List all partitions of a table.
hive> SHOW PARTITIONS employees;
OK
country=US/state=AB
country=US/state=AC
country=US/state=CA
SHOW PARTITIONS with a partial partition spec
Restrict the output with a partial partition specification.
hive> SHOW PARTITIONS employees partition(country='US');
OK
country=US/state=AB
country=US/state=AC
country=US/state=CA
DESC PARTITION
Show the column information of a partition.
hive> DESC employees PARTITION(country='US',state='AB');
OK
name string
salary float
subordinates array<string>
deductions map<string,float>
address struct<street:string,city:string,state:string,zip:int>
country string
state string
# Partition Information
# col_name data_type comment
country string
state string
Time taken: 0.162 seconds, Fetched: 12 row(s)
DESC EXTENDED PARTITION
Like DESC EXTENDED on a table, this prints the partition's detailed information.
hive> DESC EXTENDED employees PARTITION(country='US',state='AB');
OK
name string
salary float
subordinates array<string>
deductions map<string,float>
address struct<street:string,city:string,state:string,zip:int>
country string
state string
# Partition Information
# col_name data_type comment
country string
state string
Detailed Partition Information Partition(values:[US, AB], dbName:mydb, tableName:employees, createTime:1635928495, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:salary, type:float, comment:null), FieldSchema(name:subordinates, type:array<string>, comment:null), FieldSchema(name:deductions, type:map<string,float>, comment:null), FieldSchema(name:address, type:struct<street:string,city:string,state:string,zip:int>, comment:null), FieldSchema(name:country, type:string, comment:null), FieldSchema(name:state, type:string, comment:null)], location:file:/user/hive/warehouse/mydb.db/employees/country=US/state=AB, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:{transient_lastDdlTime=1635928495, totalSize=1, numRows=0, rawDataSize=0, numFiles=1}, catName:hive)
Time taken: 0.192 seconds, Fetched: 14 row(s)
DESC FORMATTED PARTITION
Output similar to DESC FORMATTED for a table.
ALTER TABLE – modify a table
PARTITION – partition-related changes
ADD PARTITION – add a partition
ALTER TABLE employees ADD IF NOT EXISTS PARTITION(country='US',state='AD') LOCATION 'file:/user/hive/warehouse/mydb.db/employees/country=US/state=AD';
SET LOCATION OF PARTITION – set a partition's location
ALTER TABLE employees PARTITION(country='US',state='AB') SET LOCATION 'file:/user/hive/warehouse/mydb.db/employees/country=US/state=AC';
DROP PARTITION – drop a partition
ALTER TABLE employees DROP IF EXISTS PARTITION(country='US',state='AB');
CHANGE COLUMNS – modify columns
hive> DESC employees;
OK
name string
salary float
subordinates array<string>
deductions map<string,float>
address struct<street:string,city:string,state:string,zip:int>
country string
state string
# Partition Information
# col_name data_type comment
country string
state string
Time taken: 0.069 seconds, Fetched: 12 row(s)
CHANGE COLUMN NAME AND TYPE – rename a column and change its type
ALTER TABLE employees CHANGE COLUMN name full_name VARCHAR(255);
hive> DESC employees;
OK
full_name varchar(255)
salary float
subordinates array<string>
deductions map<string,float>
address struct<street:string,city:string,state:string,zip:int>
country string
state string
# Partition Information
# col_name data_type comment
country string
state string
Time taken: 0.056 seconds, Fetched: 12 row(s)
CHANGE COLUMN POSITION – move a column
AFTER places the column after the specified column:
hive> ALTER TABLE employees CHANGE COLUMN full_name name String COMMENT 'comment' AFTER salary;
OK
Time taken: 0.119 seconds
FIRST moves the column to the first position:
hive> ALTER TABLE employees CHANGE COLUMN name name String COMMENT 'comment' FIRST;
OK
Time taken: 0.119 seconds
Adding Columns – add columns
Adds the new columns after the existing columns.
ALTER TABLE employees ADD COLUMNS (
age INT,
sex boolean);
Replacing Columns – drop or replace all columns
Removes all the existing columns and replaces them with the new columns specified.
ALTER TABLE employees REPLACE COLUMNS (
`name` string COMMENT 'comment',
`salary` varchar(255),
`subordinates1` array<string>,
`deductions` map<string,float>,
`address` struct<street:string,city:string,state:string,zip:int>,
`age` int,
`sex` boolean,
married boolean);
If this fails because the new columns no longer match the existing data layout, the only remedy is to drop and recreate the table.
Alter Table Properties – modify table properties
ALTER TABLE employees SET TBLPROPERTIES(
'notes'=' The employees of the whole company');
Alter Storage Properties – modify storage properties
Suppose table employees has 3 partitions.
hive> show partitions employees;
OK
country=US/state=AC
country=US/state=AD
country=US/state=CA
Set the table's default file format:
alter table employees set fileformat textfile;
The file format can be TEXTFILE, ORC, PARQUET, or SEQUENCEFILE. For a partitioned table, only newly created partitions pick up the table's default format; existing partitions keep theirs.
Set a partition's file format:
ALTER TABLE employees PARTITION(country='US',state='AC') SET FILEFORMAT SEQUENCEFILE;
SET SERDE – set the serializer/deserializer
ALTER TABLE table_name SET SERDE 'com.example.JSONSerDe'
WITH SERDEPROPERTIES(
'prop1' = 'value1',
'prop2' = 'value2'
);
SET SERDEPROPERTIES – set properties used by the SerDe
ALTER TABLE table_name
SET SERDEPROPERTIES(
'prop1' = 'value1',
'prop2' = 'value2'
);
CLUSTERED BY, SORTED BY, BUCKETS – set bucketing
ALTER TABLE employees CLUSTERED BY (name)
SORTED BY (age)
INTO 48 BUCKETS;
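One payoff of bucketing is cheap sampling: TABLESAMPLE can scan a single bucket instead of the whole table. A sketch against the bucketed employees table above:

```sql
-- Reads roughly 1/48 of the data: only the rows hashed into bucket 1 by name.
SELECT * FROM employees TABLESAMPLE (BUCKET 1 OUT OF 48 ON name);
```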
Modify the table comment:
ALTER TABLE test_t SET TBLPROPERTIES ('comment' = '中文表-comment');
Modify a column comment (note that the column name is written twice):
ALTER TABLE test_t CHANGE c1 c1 double COMMENT '中文字段-c1-comment';
TOUCH – re-fire the hooks that run when a partition is added
The ALTER TABLE ... TOUCH statement is used to trigger these hooks.
ALTER TABLE employees TOUCH PARTITION(country='US',state='AC');
NO_DROP (disabled in Hive 3)
ALTER TABLE employees PARTITION(country='US',state='AC') ENABLE NO_DROP;
OFFLINE (disabled in Hive 3)
ALTER TABLE employees PARTITION(country='US',state='AC') ENABLE OFFLINE;
ARCHIVE and UNARCHIVE (disabled in Hive 3)
ARCHIVE packs a partition's files into a Hadoop Archive (HAR) file.
ALTER TABLE employees ARCHIVE PARTITION(country='US',state='AC');