Summary of Hive Table Commands

1. Data Types

1.1 Numeric types

Type      Length    Notes
TINYINT   1 byte    signed integer
SMALLINT  2 bytes   signed integer
INT       4 bytes   signed integer
BIGINT    8 bytes   signed integer
FLOAT     4 bytes   signed single-precision floating point
DOUBLE    8 bytes   signed double-precision floating point
DECIMAL   -         exact numeric with configurable precision and scale
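As a minimal sketch of how DECIMAL is declared (the table and column names here are hypothetical), precision and scale can be given explicitly:

```sql
-- Hypothetical table: amount holds up to 10 digits total,
-- 2 of them after the decimal point.
CREATE TABLE IF NOT EXISTS payments (
  id     INT,
  amount DECIMAL(10, 2)
);
```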

1.2 Date and time types

Type       Notes
TIMESTAMP  timestamp, format: yyyy-mm-dd hh:mm:ss[.f...]
DATE       date, format: YYYY-MM-DD
INTERVAL   time interval
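As a quick illustration of these types, Hive's built-in CURRENT_DATE and CURRENT_TIMESTAMP return DATE and TIMESTAMP values, and string literals in the formats above can be cast:

```sql
-- Returns the current date, the current timestamp,
-- and a string literal cast to DATE.
SELECT CURRENT_DATE, CURRENT_TIMESTAMP, CAST('2021-11-03' AS DATE);
```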

1.3 String types

Type     Length                  Notes
STRING   -                       variable-length string
VARCHAR  1 - 65535 characters    variable-length string with a maximum length
CHAR     up to 255 characters    fixed-length string

1.4 Miscellaneous types

Type     Notes
BOOLEAN  boolean, TRUE/FALSE
BINARY   byte sequence

1.5 Complex types

Type       Notes
ARRAY      array of same-typed elements, 0-based index: ARRAY<data_type>
MAP        key-value map: MAP<primitive_type, data_type>
STRUCT     struct: STRUCT<col_name : data_type [COMMENT col_comment], ...>
UNIONTYPE  union: UNIONTYPE<data_type, data_type, ...>
1.5.1 Testing the complex types
1.5.1.1 Creating the table
drop table if exists complex_table;
create table complex_table(
    c_int int,
    c_array array<string>,
    c_map map<string, string>,
    c_struct struct<name:string,age:int>
)
row format delimited
FIELDS TERMINATED BY '\t'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
stored as textfile;
1.5.1.2 Inserting data
insert into complex_table values(
  1, 
  array("array_value1","array_value2"), 
  str_to_map("key1:value1,key2:value2"),
  named_struct("name","Alice", "age",18)
);
1.5.1.3 Querying data
1.5.1.3.1 array
  • Syntax: A[n]
    Operand types: A is an array, n is an int
    Description: returns the n-th element of array A. Indexing starts at 0; an index past the end of the array returns NULL.
hive> select c_array[0], c_array[1], c_array[2] from complex_table ;
OK
array_value1	array_value2	NULL
  • The size() function returns the number of elements in the array.
hive> select size(c_array) from complex_table;
OK
2
  • The array_contains() function tests whether the array contains a given element.

array_contains(array, value)
returns true or false

hive> select array_contains(c_array,'array_value1') v1, array_contains(c_array,'array_value3') from complex_table;
OK
true	false
1.5.1.3.2 map type
  • Element access

Syntax: M[key]
Operand types: M is a map, key matches the map's key type
Description: returns the value in map M for the given key; a missing key returns NULL.

hive> select c_map['key1'], c_map['key3'] from complex_table;
OK
value1	NULL
  • Getting the keys and values of a map

map_keys()
map_values()

hive> select map_keys(c_map), map_values(c_map) from complex_table;
OK
["key1","key2"]	["value1","value2"]

  • The size() function returns the number of key-value pairs in the map.

hive> select size(c_map) from complex_table;
OK
2
  • Testing whether a map contains a given key or value

array_contains(map_keys(column), key)
array_contains(map_values(column), value)

hive> select 
          array_contains(map_keys(c_map), 'key1'),
          array_contains(map_keys(c_map), 'key3'),
          array_contains(map_values(c_map), 'value1'),
          array_contains(map_values(c_map), 'value3')
     from complex_table;
OK
true	false	true	false
1.5.1.3.3 struct type

Syntax: S.x
Operand types: S is a struct
Description: returns field x of struct S.

hive> select c_struct.name, c_struct.age from complex_table;
OK
Alice	18
1.5.2 union test
1.5.2.1 Creating the table
create table union_testnew(
  foo uniontype<int, double, string, array<string>, map<string, string>>
)
row format delimited
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
stored as textfile;
1.5.2.2 Preparing the data
vim union_test.log 
0,1
1,3.0
2,world
3,wade:tom:polly
4,k1^Dv1:k2^Dv2
1.5.2.3 Loading the data
hive> load data local inpath './union_test.log' overwrite into table union_testnew;
1.5.2.4 Querying the data
hive> select * from union_testnew;
OK
union_testnew.foo
{0:1}
{1:3.0}
{2:"world"}
{3:["wade","tom","polly"]}
{4:{"k1":"v1","k2":"v2"}}
Time taken: 0.225 seconds, Fetched: 5 row(s)

Hive official documentation: LanguageManual Types

2 Creating Tables

All operations below are performed in the mydb database.

CREATE DATABASE IF NOT EXISTS mydb;
use mydb;

2.1 CREATE TABLE

Specify the file format and the delimiters.

CREATE TABLE IF NOT EXISTS employees (
 name STRING,
 salary FLOAT,
 subordinates ARRAY<STRING>,
 deductions MAP<STRING, FLOAT>,
 address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

2.2 CREATE TABLE LIKE

Creates a new table with the same structure as an existing one.

CREATE TABLE IF NOT EXISTS mydb.employee2 LIKE mydb.employees;

SHOW TABLES

  • Lists all tables in the current database.
hive> SHOW TABLES;
OK
employee2
employees
Time taken: 0.063 seconds, Fetched: 2 row(s)
  • Lists all tables in a specified database.
hive> SHOW TABLES IN mydb;
OK
employee2
employees
Time taken: 0.05 seconds, Fetched: 2 row(s)
  • Patterns are supported: '.' matches any single character and '*' matches zero or more characters.
hive> SHOW TABLES 'empl.*';
OK
employee2
employees
Time taken: 0.049 seconds, Fetched: 2 row(s)

DESC table – showing table information

DESC

Plain DESC lists only the table's columns.

hive> DESC employees;
OK
name                	string              	                    
salary              	float               	                    
subordinates        	array<string>       	                    
deductions          	map<string,float>   	                    
address             	struct<street:string,city:string,state:string,zip:int>	                    
Time taken: 0.078 seconds, Fetched: 5 row(s)

DESC EXTENDED

With the EXTENDED keyword, detailed table information is shown as well: the database the table resides in, the owner, creation time, serialization info, and statistics (file count, total size, row count, etc.). The details are printed on a single line and are hard to read; the FORMATTED keyword produces much more readable output.

hive> DESC EXTENDED employees;
OK
name                	string              	                    
salary              	float               	                    
subordinates        	array<string>       	                    
deductions          	map<string,float>   	                    
address             	struct<street:string,city:string,state:string,zip:int>	                    
	 	 
Detailed Table Information	Table(tableName:employees, dbName:mydb, owner:houzhizhen, createTime:1635924848, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:salary, type:float, comment:null), FieldSchema(name:subordinates, type:array<string>, comment:null), FieldSchema(name:deductions, type:map<string,float>, comment:null), FieldSchema(name:address, type:struct<street:string,city:string,state:string,zip:int>, comment:null)], location:file:/user/hive/warehouse/mydb.db/employees, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{mapkey.delim=, collection.delim=, serialization.format=, line.delim=\n, field.delim=}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE={\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"address\":\"true\",\"deductions\":\"true\",\"name\":\"true\",\"salary\":\"true\",\"subordinates\":\"true\"}}, numFiles=0, transient_lastDdlTime=1635924848, bucketing_version=2}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER)	
Time taken: 0.178 seconds, Fetched: 7 row(s)

DESC FORMATTED

With the FORMATTED keyword, the output is easy to read.

hive> DESC FORMATTED employees;
OK
# col_name            	data_type           	comment             
name                	string              	                    
salary              	float               	                    
subordinates        	array<string>       	                    
deductions          	map<string,float>   	                    
address             	struct<street:string,city:string,state:string,zip:int>	                    
	 	 
# Detailed Table Information	 	 
Database:           	mydb                	 
OwnerType:          	USER                	 
Owner:              	houzhizhen          	 
CreateTime:         	Wed Nov 03 15:34:08 CST 2021	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	file:/user/hive/warehouse/mydb.db/employees	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"address\":\"true\",\"deductions\":\"true\",\"name\":\"true\",\"salary\":\"true\",\"subordinates\":\"true\"}}
	bucketing_version   	2                   
	numFiles            	0                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	0                   
	transient_lastDdlTime	1635924848          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	collection.delim    	\u0002              
	field.delim         	\u0001              
	line.delim          	\n                  
	mapkey.delim        	\u0003              
	serialization.format	\u0001              
Time taken: 0.099 seconds, Fetched: 39 row(s)

Managed Tables and External Tables

Managed Tables

Tables managed by Hive, created without the EXTERNAL keyword. Dropping the table deletes both the metadata and the underlying files.

External Tables

The counterpart of managed tables: Hive does not manage the data. Dropping the table deletes only the metadata; the underlying files are kept.

CREATE EXTERNAL TABLE IF NOT EXISTS stocks (
c_exchange string,
symbol STRING,
ymd STRING,
price_open FLOAT,
price_high FLOAT,
price_low FLOAT,
price_close FLOAT,
volume INT,
price_adj_close FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stocks';

Converting an external table to a managed table

alter table stocks set tblproperties('EXTERNAL'='FALSE');

Deleting the data when dropping an external table

Supported only in Hive 4.0 and later: HIVE-19753

alter table table_name set tblproperties('external.table.purge'='TRUE');

Partitioned Managed Tables

Partitioned tables; add the PARTITIONED BY clause when creating the table.

CREATE TABLE employees (
name STRING,
salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (country STRING, state STRING);
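Partitions like the ones listed below can be created implicitly by inserting with a static partition spec. A sketch with arbitrary example values (depending on the Hive version, complex-typed columns may require INSERT ... SELECT rather than INSERT ... VALUES):

```sql
-- INSERT with a static partition spec; the partition
-- country=US/state=CA is created if it does not exist yet.
INSERT INTO employees PARTITION (country = 'US', state = 'CA')
SELECT 'Alice', 8000.0,
       array('Bob'),
       map('tax', cast(0.2 AS float)),
       named_struct('street', '1 Main St', 'city', 'LA',
                    'state', 'CA', 'zip', 90001);
```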

SHOW PARTITIONS

Lists all partitions of a table.

hive> SHOW PARTITIONS employees;
OK
country=US/state=AB
country=US/state=AC
country=US/state=CA

SHOW PARTITIONS with a partial partition spec

Filters the listing by fixing one of the partition keys.

hive> SHOW PARTITIONS employees partition(country='US');
OK
country=US/state=AB
country=US/state=AC
country=US/state=CA
DESC PARTITION

Shows the column information of a partition.

hive> DESC employees PARTITION(country='US',state='AB');
OK
name                	string              	                    
salary              	float               	                    
subordinates        	array<string>       	                    
deductions          	map<string,float>   	                    
address             	struct<street:string,city:string,state:string,zip:int>	                    
country             	string              	                    
state               	string              	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
country             	string              	                    
state               	string              	                    
Time taken: 0.162 seconds, Fetched: 12 row(s)
DESC EXTENDED PARTITION

Like DESC EXTENDED on a table, but prints the detailed information of a partition.

hive> DESC EXTENDED employees PARTITION(country='US',state='AB');
OK
name                	string              	                    
salary              	float               	                    
subordinates        	array<string>       	                    
deductions          	map<string,float>   	                    
address             	struct<street:string,city:string,state:string,zip:int>	                    
country             	string              	                    
state               	string              	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
country             	string              	                    
state               	string              	                    
	 	 
Detailed Partition Information	Partition(values:[US, AB], dbName:mydb, tableName:employees, createTime:1635928495, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:salary, type:float, comment:null), FieldSchema(name:subordinates, type:array<string>, comment:null), FieldSchema(name:deductions, type:map<string,float>, comment:null), FieldSchema(name:address, type:struct<street:string,city:string,state:string,zip:int>, comment:null), FieldSchema(name:country, type:string, comment:null), FieldSchema(name:state, type:string, comment:null)], location:file:/user/hive/warehouse/mydb.db/employees/country=US/state=AB, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:{transient_lastDdlTime=1635928495, totalSize=1, numRows=0, rawDataSize=0, numFiles=1}, catName:hive)	
Time taken: 0.192 seconds, Fetched: 14 row(s)
DESC FORMATTED PARTITION

The output is similar to DESC FORMATTED on a table.
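A sketch of the invocation, reusing the partition from the examples above:

```sql
-- Prints the partition's columns, parameters, and storage information
-- in the readable multi-section layout shown earlier for tables.
DESC FORMATTED employees PARTITION (country = 'US', state = 'AB');
```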

ALTER TABLE – modifying a table

PARTITION – partition-related changes

ADD PARTITION – adding a partition
ALTER TABLE employees ADD IF NOT EXISTS PARTITION(country='US',state='AD') LOCATION 'file:/user/hive/warehouse/mydb.db/employees/country=US/state=AD';
SET LOCATION OF PARTITION – setting a partition's location
ALTER TABLE employees PARTITION(country='US',state='AB') SET LOCATION 'file:/user/hive/warehouse/mydb.db/employees/country=US/state=AC';
DROP PARTITION – dropping a partition
ALTER TABLE employees DROP IF EXISTS PARTITION(country='US',state='AB');

CHANGE COLUMNS – modifying columns

hive> DESC employees;
OK
name                	string              	                    
salary              	float               	                    
subordinates        	array<string>       	                    
deductions          	map<string,float>   	                    
address             	struct<street:string,city:string,state:string,zip:int>	                    
country             	string              	                    
state               	string              	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
country             	string              	                    
state               	string              	                    
Time taken: 0.069 seconds, Fetched: 12 row(s)
CHANGE COLUMN NAME AND TYPE – renaming a column and changing its type
ALTER TABLE employees CHANGE COLUMN name full_name VARCHAR(255);
hive> DESC employees;
OK
full_name           	varchar(255)        	                    
salary              	float               	                    
subordinates        	array<string>       	                    
deductions          	map<string,float>   	                    
address             	struct<street:string,city:string,state:string,zip:int>	                    
country             	string              	                    
state               	string              	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
country             	string              	                    
state               	string              	                    
Time taken: 0.056 seconds, Fetched: 12 row(s)
CHANGE COLUMN POSITION – moving a column
SET name after salary – place the column after a given column
hive> ALTER TABLE employees CHANGE COLUMN full_name name String COMMENT 'comment'  AFTER salary;
OK
Time taken: 0.119 seconds
SET name first – move the column to the first position
hive> ALTER TABLE employees CHANGE COLUMN name name String COMMENT 'comment'  FIRST;
OK
Time taken: 0.119 seconds

Adding Columns

Add new columns to the end of the existing columns.

ALTER TABLE employees ADD COLUMNS (
     age INT,
     sex boolean);

Replacing Columns – removing or replacing all columns

Removes all the existing columns and replaces them with the new columns specified.

ALTER TABLE employees REPLACE COLUMNS (
`name` string COMMENT 'comment', 
  `salary` varchar(255), 
  `subordinates1` array<string>, 
  `deductions` map<string,float>, 
  `address` struct<street:string,city:string,state:string,zip:int>, 
  `age` int, 
  `sex` boolean,
  married boolean);

If this fails because the new schema is incompatible with the existing data layout, the only option is to drop and recreate the table.

Alter Table Properties – modifying table properties

ALTER TABLE employees SET TBLPROPERTIES(
'notes'=' The employees of the whole company');

Alter Storage Properties – modifying storage properties

For example, suppose table employees has 3 partitions.

hive> show partitions employees;
OK
country=US/state=AC
country=US/state=AD
country=US/state=CA
Setting the table's file format
alter table employees set fileformat textfile;

Supported file formats include TEXTFILE, ORC, PARQUET, and SEQUENCEFILE. For a partitioned table, only newly created partitions pick up the table's new default format; existing partitions are unchanged.

Setting a partition's file format
ALTER TABLE employees PARTITION(country='US',state='AC') SET FILEFORMAT SEQUENCEFILE;
SET SERDE – setting the serializer/deserializer
ALTER TABLE table_name SET SERDE 'com.example.JSONSerDe'
WITH SERDEPROPERTIES(
  'prop1' = 'value1',
  'prop2' = 'value2'
);
SET SERDEPROPERTIES – setting properties used by the SerDe
ALTER TABLE table_name
SET SERDEPROPERTIES(
  'prop1' = 'value1',
  'prop2' = 'value2'
);
CLUSTERED BY, SORTED BY, BUCKETS – setting the clustering, sort order, and bucket count
ALTER TABLE employees CLUSTERED BY (name)
SORTED BY (age)
INTO 48 BUCKETS;
Changing the table's comment
ALTER TABLE test_t SET TBLPROPERTIES ('comment' = '中文表-comment');
Changing a column's comment (note that the column name is written twice)
ALTER TABLE test_t CHANGE c1 c1 double  COMMENT '中文字段-c1-comment';
TOUCH – re-fires the add-partition hooks

The ALTER TABLE ... TOUCH statement is used to trigger these hooks.

ALTER TABLE employees TOUCH PARTITION(country='US',state='AC');

NODROP (Disabled in Hive 3)

ALTER TABLE employees PARTITION(country='US',state='AC') ENABLE NO_DROP;

OFFLINE (Disabled in Hive 3)

ALTER TABLE employees PARTITION(country='US',state='AC') ENABLE OFFLINE;
ARCHIVE and UNARCHIVE (Disabled in Hive 3)

ARCHIVE moves a partition's files into a Hadoop Archive (HAR) file.

ALTER TABLE employees ARCHIVE PARTITION(country='US',state='AC');