Basic Hive Usage (1)

# Hive DDL: Data Definition

1. Show Databases

hive (dyhtest)> show databases;
OK
database_name
default
dyhtest
Time taken: 0.022 seconds, Fetched: 2 row(s)

--- Filter the listed databases with a wildcard pattern
hive (dyhtest)> show databases like 'db_hive*';
OK
database_name
db_hive
db_hive_1
Time taken: 0.034 seconds, Fetched: 2 row(s)
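
Note: the pattern supports * for any sequence of characters and | for alternatives; a minimal sketch, assuming the databases above exist:

-- Match anything starting with db_hive, or exactly dyhtest
show databases like 'db_hive*|dyhtest';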

2. Create a Database

Syntax:

CREATE DATABASE [IF NOT EXISTS] database_name -- create only if the database does not already exist
[COMMENT database_comment] -- database comment
[LOCATION hdfs_path] -- optional explicit HDFS path
[WITH DBPROPERTIES (property_name=property_value, ...)]; -- attach property key-value pairs to the database

Note:
If no location is specified, the default is used:

--- When you create a database, its default storage path on HDFS is /user/hive/warehouse/*.db.

hive (dyhtest)> desc database db_hive;
OK
db_name	comment	location	owner_name	owner_type	parameters
db_hive		hdfs://hadoop102:9820/user/hive/warehouse/db_hive.db	atdyh	USER	
Time taken: 0.027 seconds, Fetched: 1 row(s)
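
To override the default, specify LOCATION explicitly when creating the database; a minimal sketch (the path /db_hive2.db is hypothetical):

-- Create a database at an explicit HDFS path instead of the warehouse default
create database if not exists db_hive2
location '/db_hive2.db';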

-- Create a database
hive (dyhtest)>  create database if not exists mydb   
              > comment "my first db"
              > with dbproperties("createtime"="2021-04-24");
OK
Time taken: 0.077 seconds

-- Verify that the database was created
hive (dyhtest)> show databases;
OK
database_name
db_hive
db_hive_1
default
dyhtest
mydb
Time taken: 0.021 seconds, Fetched: 5 row(s)

Note:
To avoid an error when the target database already exists, add an if not exists clause.

-- Fails: the database already exists
hive (dyhtest)> create database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database db_hive already exists

--- With if not exists, no error is raised
hive (dyhtest)> create database if not exists  db_hive;
OK
Time taken: 0.018 seconds
hive (dyhtest)> show databases;
OK
database_name
db_hive
db_hive_1
default
dyhtest
Time taken: 0.024 seconds, Fetched: 4 row(s)

3. View Database Details

  • Show database information
hive (dyhtest)> desc database db_hive;
OK
db_name	comment	location	owner_name	owner_type	parameters
db_hive		hdfs://hadoop102:9820/user/hive/warehouse/db_hive.db	atdyh	USER	
Time taken: 0.027 seconds, Fetched: 1 row(s)

  • Show detailed database information with the extended keyword
hive (dyhtest)> desc database extended mydb ; 
OK
db_name	comment	location	owner_name	owner_type	parameters
mydb	my first db	hdfs://hadoop102:9820/user/hive/warehouse/mydb.db	atdyh	USER	{createtime=2021-04-24}
Time taken: 0.033 seconds, Fetched: 1 row(s)

Note: with the extended keyword, the properties attached at creation time (dbproperties) are displayed as well.

  • Switch the current database
hive (dyhtest)> use db_hive;
OK
Time taken: 0.026 seconds
-- The current database switches from dyhtest to db_hive
hive (db_hive)> 
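
If the prompt does not display the current database (the (dyhtest) prefix is controlled by hive.cli.print.current.db), the current_database() function reports it; a small sketch:

-- Show which database is currently active
select current_database();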

Note:
1. Hive's commands for creating and inspecting databases really just read and write the metastore database.

-- Connect to MySQL
[atdyh@hadoop102 ~]$ mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 83
Server version: 5.7.28 MySQL Community Server (GPL)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

-- List the databases on the MySQL server
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| metastore          |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.01 sec)

-- Switch to the metastore database
mysql> use metastore;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
--- List the tables in the metastore database
mysql> show tables;
+-------------------------------+
| Tables_in_metastore           |
+-------------------------------+
| AUX_TABLE                     |
| BUCKETING_COLS                |
| CDS                           |
| COLUMNS_V2                    |
| COMPACTION_QUEUE              |
| COMPLETED_COMPACTIONS         |
| COMPLETED_TXN_COMPONENTS      |
| CTLGS                         |
| DATABASE_PARAMS               |
| DBS                           |
| DB_PRIVS                      |
| DELEGATION_TOKENS             |
| FUNCS                         |
| FUNC_RU                       |
| GLOBAL_PRIVS                  |
| HIVE_LOCKS                    |
| IDXS                          |
| INDEX_PARAMS                  |
| I_SCHEMA                      |
| KEY_CONSTRAINTS               |
| MASTER_KEYS                   |
| MATERIALIZATION_REBUILD_LOCKS |
| METASTORE_DB_PROPERTIES       |
| MIN_HISTORY_LEVEL             |
| MV_CREATION_METADATA          |
| MV_TABLES_USED                |
| NEXT_COMPACTION_QUEUE_ID      |
| NEXT_LOCK_ID                  |
| NEXT_TXN_ID                   |
| NEXT_WRITE_ID                 |
| NOTIFICATION_LOG              |
| NOTIFICATION_SEQUENCE         |
| NUCLEUS_TABLES                |
| PARTITIONS                    |
| PARTITION_EVENTS              |
| PARTITION_KEYS                |
| PARTITION_KEY_VALS            |
| PARTITION_PARAMS              |
| PART_COL_PRIVS                |
| PART_COL_STATS                |
| PART_PRIVS                    |
| REPL_TXN_MAP                  |
| ROLES                         |
| ROLE_MAP                      |
| RUNTIME_STATS                 |
| SCHEMA_VERSION                |
| SDS                           |
| SD_PARAMS                     |
| SEQUENCE_TABLE                |
| SERDES                        |
| SERDE_PARAMS                  |
| SKEWED_COL_NAMES              |
| SKEWED_COL_VALUE_LOC_MAP      |
| SKEWED_STRING_LIST            |
| SKEWED_STRING_LIST_VALUES     |
| SKEWED_VALUES                 |
| SORT_COLS                     |
| TABLE_PARAMS                  |
| TAB_COL_STATS                 |
| TBLS                          |
| TBL_COL_PRIVS                 |
| TBL_PRIVS                     |
| TXNS                          |
| TXN_COMPONENTS                |
| TXN_TO_WRITE_ID               |
| TYPES                         |
| TYPE_FIELDS                   |
| VERSION                       |
| WM_MAPPING                    |
| WM_POOL                       |
| WM_POOL_TO_TRIGGER            |
| WM_RESOURCEPLAN               |
| WM_TRIGGER                    |
| WRITE_SET                     |
+-------------------------------+
74 rows in set (0.00 sec)
--- Inspect DBS, the table that stores database metadata
mysql> show create table DBS;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| DBS   | CREATE TABLE `DBS` (
  `DB_ID` bigint(20) NOT NULL,
  `DESC` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
  `DB_LOCATION_URI` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
  `NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
  `OWNER_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
  `OWNER_TYPE` varchar(10) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
  `CTLG_NAME` varchar(256) NOT NULL DEFAULT 'hive',
  PRIMARY KEY (`DB_ID`),
  UNIQUE KEY `UNIQUE_DATABASE` (`NAME`,`CTLG_NAME`),
  KEY `CTLG_FK1` (`CTLG_NAME`),
  CONSTRAINT `CTLG_FK1` FOREIGN KEY (`CTLG_NAME`) REFERENCES `CTLGS` (`NAME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |

--- View the table contents: every row describes one Hive database
		-- columns: id, database comment, location, database name, owner name, owner type, catalog
mysql> select * from DBS;
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
| DB_ID | DESC                  | DB_LOCATION_URI                                        | NAME      | OWNER_NAME | OWNER_TYPE | CTLG_NAME |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
|     1 | Default Hive database | hdfs://hadoop102:9820/user/hive/warehouse              | default   | public     | ROLE       | hive      |
|     6 | NULL                  | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db   | dyhtest   | atdyh      | USER       | hive      |
|    11 | NULL                  | hdfs://hadoop102:9820/user/hive/warehouse/db_hive.db   | db_hive   | atdyh      | USER       | hive      |
|    12 | NULL                  | hdfs://hadoop102:9820/user/hive/warehouse/db_hive_1.db | db_hive_1 | atdyh      | USER       | hive      |
|    13 | my first db           | hdfs://hadoop102:9820/user/hive/warehouse/mydb.db      | mydb      | atdyh      | USER       | hive      |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
5 rows in set (0.00 sec)

4. Alter a Database

You can use the ALTER DATABASE command to set key-value pairs in a database's DBPROPERTIES, describing additional attributes of that database.

hive (db_hive)> alter database mydb set dbproperties("createtime"="2020-04-24","author"="wyh");
OK
Time taken: 0.098 seconds
-- Verify that the change took effect
hive (db_hive)> desc database extended mydb ; 
OK
db_name	comment	location	owner_name	owner_type	parameters
mydb	my first db	hdfs://hadoop102:9820/user/hive/warehouse/mydb.db	atdyh	USER	{createtime=2020-04-24, author=wyh}
Time taken: 0.034 seconds, Fetched: 1 row(s)

Under the hood, this operation modifies the metadata, so the change can be seen in the metastore (MySQL):


mysql> select * from DBS;
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
| DB_ID | DESC                  | DB_LOCATION_URI                                        | NAME      | OWNER_NAME | OWNER_TYPE | CTLG_NAME |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
|     1 | Default Hive database | hdfs://hadoop102:9820/user/hive/warehouse              | default   | public     | ROLE       | hive      |
|     6 | NULL                  | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db   | dyhtest   | atdyh      | USER       | hive      |
|    11 | NULL                  | hdfs://hadoop102:9820/user/hive/warehouse/db_hive.db   | db_hive   | atdyh      | USER       | hive      |
|    12 | NULL                  | hdfs://hadoop102:9820/user/hive/warehouse/db_hive_1.db | db_hive_1 | atdyh      | USER       | hive      |
|    13 | my first db           | hdfs://hadoop102:9820/user/hive/warehouse/mydb.db      | mydb      | atdyh      | USER       | hive      |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+-----------+
5 rows in set (0.00 sec)

-- The modified properties
mysql> select * from DATABASE_PARAMS;
+-------+------------+-------------+
| DB_ID | PARAM_KEY  | PARAM_VALUE |
+-------+------------+-------------+
|    13 | author     | wyh         |
|    13 | createtime | 2020-04-24  |
+-------+------------+-------------+
2 rows in set (0.00 sec)
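
DATABASE_PARAMS links back to DBS through DB_ID, so one join lists every database together with its properties; a sketch to run inside the metastore database:

-- Each Hive database with its DBPROPERTIES (NULL where none were set)
select d.NAME, d.DB_LOCATION_URI, p.PARAM_KEY, p.PARAM_VALUE
from DBS d
left join DATABASE_PARAMS p on d.DB_ID = p.DB_ID;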

5. Drop a Database

  • If the database to drop may not exist, use if exists to guard against an error
hive (dyhtest)> drop database if  exists db_hive_1;
OK
Time taken: 0.026 seconds
  • If the database is not empty, add the cascade keyword to force the drop
-- db_hive is not empty
hive (db_hive)> show tables;
OK
tab_name
mytbl
Time taken: 0.032 seconds, Fetched: 1 row(s)
hive (db_hive)> drop database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database db_hive is not empty. One or more tables exist.)
-- Cascading drop
hive (db_hive)> drop database db_hive cascade ; 
OK
Time taken: 0.427 seconds
-- Switch back to dyhtest
hive (db_hive)> use dyhtest;
OK
Time taken: 0.027 seconds
-- db_hive is gone
hive (dyhtest)> show databases;
OK
database_name
default
dyhtest
mydb
Time taken: 0.019 seconds, Fetched: 3 row(s)
  • Drop an empty database
hive (dyhtest)> drop database db_hive_1;
OK
Time taken: 0.259 seconds

6. Create Tables

  • Table creation syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name   -- EXTERNAL: create an external table
[(col_name data_type [COMMENT col_comment], ...)]  -- column name, column type, column comment, ...
[COMMENT table_comment] -- table comment
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] -- partition columns (name and type) for a partitioned table
[CLUSTERED BY (col_name, col_name, ...) -- bucketing columns for a bucketed table
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]  -- number of buckets
[ROW FORMAT delimited fields terminated by ... ] -- delimiter between the fields of a row
[collection items terminated by  ... ] -- delimiter between elements of a collection
[map keys terminated by ... ] -- delimiter between a map's keys and values
[STORED AS file_format] -- file storage format, textfile by default
[LOCATION hdfs_path] -- the table's corresponding path on HDFS
[TBLPROPERTIES (property_name=property_value, ...)] -- table properties
[AS select_statement] -- create the table from the result of a query
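
The PARTITIONED BY and CLUSTERED BY clauses are not exercised by the examples below; a minimal sketch of each (the tables dept_par and stu_buck are hypothetical):

-- Partitioned table: each value of `day` becomes its own subdirectory on HDFS
create table if not exists dept_par(
  deptno int,
  dname string
)
partitioned by (day string)
row format delimited fields terminated by ',';

-- Bucketed table: rows are hashed on id into 4 bucket files
create table if not exists stu_buck(
  id int,
  name string
)
clustered by (id) into 4 buckets
row format delimited fields terminated by ',';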
  • Create a table
-- Create a table
hive (dyhtest)> create table if not exists test2(
              >   id int comment "this's id ",
              >   name string  comment "this 's  name"
              > )
              > comment "测试用"
              > row format delimited fields terminated by ','
              > STORED as textfile 
              > TBLPROPERTIES("createtime"="2022-04-24") ;
OK
Time taken: 0.299 seconds
-- Verify that the table was created
hive (dyhtest)> desc test2;
OK
col_name	data_type	comment
id                  	int                 	this's id           
name                	string              	this 's  name       
Time taken: 0.055 seconds, Fetched: 2 row(s)


  • View tables
    show tables;
    desc test2;
    desc formatted test2;
hive (dyhtest)> desc test2;
OK
col_name	data_type	comment
id                  	int                 	this's id           
name                	string              	this 's  name       
Time taken: 0.055 seconds, Fetched: 2 row(s)
hive (dyhtest)> desc formatted test2;
OK
col_name	data_type	comment
# col_name            	data_type           	comment             
id                  	int                 	this's id           
name                	string              	this 's  name       
	 	 
# Detailed Table Information	 	 
Database:           	dyhtest             	 
OwnerType:          	USER                	 
Owner:              	atdyh               	 
CreateTime:         	Sun Jun 19 15:38:29 CST 2022	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test2	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"id\":\"true\",\"name\":\"true\"}}
	bucketing_version   	2                   
	comment             	???                 
	createtime          	2022-04-24          
	numFiles            	0                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	0                   
	transient_lastDdlTime	1655624309          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	,                   
	serialization.format	,                   
Time taken: 0.163 seconds, Fetched: 35 row(s)

Note: viewing a table likewise reads metadata under the hood:
1. First look in TBLS and find the table's SD_ID.

mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | OWNER_TYPE | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+-------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
|      1 |  1654416053 |     6 |                0 | atdyh | USER       |         0 |     1 | mytbl    | MANAGED_TABLE | NULL               | NULL               |                    |
|      6 |  1654430751 |     6 |                0 | atdyh | USER       |         0 |     6 | test1    | MANAGED_TABLE | NULL               | NULL               |                    |
|      8 |  1654432371 |     6 |                0 | atdyh | USER       |         0 |     8 | test     | MANAGED_TABLE | NULL               | NULL               |                    |
|     12 |  1655624309 |     6 |                0 | atdyh | USER       |         0 |    12 | test2    | MANAGED_TABLE | NULL               | NULL               |                    |
+--------+-------------+-------+------------------+-------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
4 rows in set (0.01 sec)

2. Then look in SDS, which holds the table's MapReduce input/output formats, location, and other storage details.

mysql> select * from SDS;
+-------+-------+------------------------------------------+---------------+---------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
| SD_ID | CD_ID | INPUT_FORMAT                             | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION                                                   | NUM_BUCKETS | OUTPUT_FORMAT                                              | SERDE_ID |
+-------+-------+------------------------------------------+---------------+---------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
|     1 |     1 | org.apache.hadoop.mapred.TextInputFormat |               |                           | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/mytbl |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        1 |
|     6 |     6 | org.apache.hadoop.mapred.TextInputFormat |               |                           | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test1 |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        6 |
|     8 |     8 | org.apache.hadoop.mapred.TextInputFormat |               |                           | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test  |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        8 |
|    12 |    12 | org.apache.hadoop.mapred.TextInputFormat |               |                           | hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test2 |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |       12 |
+-------+-------+------------------------------------------+---------------+---------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
4 rows in set (0.00 sec)
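
Steps 1 and 2 can also be combined: TBLS.SD_ID references SDS, so a single join maps each table to its storage information; a sketch to run inside the metastore database:

-- Each Hive table with its HDFS location and input format
select t.TBL_NAME, s.LOCATION, s.INPUT_FORMAT
from TBLS t
join SDS s on t.SD_ID = s.SD_ID;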
  • DML - loading data
    There are several ways to load data.
    1. The load statement
    load data local inpath '<directory/file>' into table <table_name>;
    For example:
    load data local inpath '/opt/module/hive-3.1.2/datas/testdata.txt' into table test2;
--- Prepare the data
[atdyh@hadoop102 datas]$ sudo vim testdata.txt

1001,zhangsan
1002,lisi
1003,wangwu
-- Load the data
hive (dyhtest)> load data local inpath '/opt/module/hive-3.1.2/datas/testdata.txt' into table test2;
Loading data to table dyhtest.test2
OK
Time taken: 0.538 seconds
hive (dyhtest)> select * from test2;
OK
test2.id	test2.name
1001	zhangsan
1002	lisi
1003	wangwu
Time taken: 0.133 seconds, Fetched: 3 row(s)
                     

The test2 directory on HDFS now contains the file we just loaded with load (screenshot of the HDFS directory listing omitted).
This points to the next approach: put a prepared data file directly into the table's directory.
2. Upload the data file directly to the table's HDFS directory

-- Upload the prepared data file to HDFS
[atdyh@hadoop102 datas]$ hadoop fs -put testdata.txt  /user/hive/warehouse/dyhtest.db/test2
2022-06-19 16:44:22,471 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[atdyh@hadoop102 datas]$ 

-- Query the data
hive (dyhtest)> select * from test2;
OK
test2.id	test2.name
1001	zhangsan
1002	lisi
1003	wangwu
Time taken: 0.197 seconds, Fetched: 3 row(s)

Taking this one step further, we can specify a location when creating the table, so the data is queryable as soon as the table exists.
3. Upload the data to HDFS first, then specify that location when creating the table

-- Create a directory on HDFS
[atdyh@hadoop102 datas]$ hadoop fs -mkdir /mydata
-- Upload the data
[atdyh@hadoop102 datas]$ hadoop fs -put testdata.txt  /mydata
2022-06-19 16:51:10,787 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[atdyh@hadoop102 datas]$ 

-- Create the table
hive (dyhtest)>  create table if not exists test3(
              >   id int ,
              >   name string 
              > )
              > row format delimited fields terminated by ','
              > location "/mydata" ;
OK
Time taken: 0.14 seconds

-- Query the data
hive (dyhtest)> select * from test3;
OK
test3.id	test3.name
1001	zhangsan
1002	lisi
1003	wangwu
Time taken: 0.144 seconds, Fetched: 3 row(s)
hive (dyhtest)> 
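
Two variants of load are worth knowing: without local, the source path is on HDFS and the file is moved (not copied) into the table's directory; with overwrite, the table's existing files are replaced instead of appended to. A sketch reusing the paths from above:

-- Load from HDFS (no `local`): the source file is moved into the table directory
load data inpath '/mydata/testdata.txt' into table test2;

-- Overwrite instead of append: existing files under test2 are replaced first
load data local inpath '/opt/module/hive-3.1.2/datas/testdata.txt' overwrite into table test2;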

  • Table types
    1. Managed (internal) tables: created without the external keyword
hive (dyhtest)> create table if not exists test4(
              >   id int ,
              >   name string 
              > )
              > row format delimited fields terminated by ',' ;

hive (dyhtest)> desc formatted test4;
OK
col_name	data_type	comment
# col_name            	data_type           	comment             
id                  	int                 	                    
name                	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	dyhtest             	 
OwnerType:          	USER                	 
Owner:              	atdyh               	 
CreateTime:         	Sun Jun 19 17:15:42 CST 2022	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test4	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	bucketing_version   	2                   
	numFiles            	1                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	36                  
	transient_lastDdlTime	1655630433          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	,                   
	serialization.format	,                   
Time taken: 0.07 seconds, Fetched: 32 row(s)


desc formatted shows test4's type:
Table Type: MANAGED_TABLE
2. External tables: created with the external keyword

hive (dyhtest)> create external table if not exists test5(
              >   id int ,
              >   name string 
              > )
              > row format delimited fields terminated by ',' ;
hive (dyhtest)> desc formatted test5;
OK
col_name	data_type	comment
# col_name            	data_type           	comment             
id                  	int                 	                    
name                	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	dyhtest             	 
OwnerType:          	USER                	 
Owner:              	atdyh               	 
CreateTime:         	Sun Jun 19 17:15:55 CST 2022	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test5	 
Table Type:         	EXTERNAL_TABLE      	 
Table Parameters:	 	 
	EXTERNAL            	TRUE                
	bucketing_version   	2                   
	numFiles            	1                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	36                  
	transient_lastDdlTime	1655630436          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	,                   
	serialization.format	,                   
Time taken: 0.07 seconds, Fetched: 33 row(s)
              

desc formatted shows test5's type:
Table Type: EXTERNAL_TABLE

  3. Converting between managed and external tables
    a. Managed table to external table
    Syntax:
    alter table <table_name> set tblproperties('EXTERNAL' = 'TRUE');
    The key and value inside the parentheses must be uppercase.
hive (dyhtest)> alter table test4 set tblproperties('EXTERNAL' = 'TRUE');
OK
Time taken: 0.108 seconds
hive (dyhtest)> desc formatted test4;
OK
col_name	data_type	comment
# col_name            	data_type           	comment             
id                  	int                 	                    
name                	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	dyhtest             	 
OwnerType:          	USER                	 
Owner:              	atdyh               	 
CreateTime:         	Sun Jun 19 17:15:42 CST 2022	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test4	 
Table Type:         	EXTERNAL_TABLE      	 
Table Parameters:	 	 
	EXTERNAL            	TRUE                
	bucketing_version   	2                   
	last_modified_by    	atdyh               
	last_modified_time  	1655630844          
	numFiles            	1                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	36                  
	transient_lastDdlTime	1655630844          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	,                   
	serialization.format	,                   
Time taken: 0.075 seconds, Fetched: 35 row(s)

test4 has been converted from a managed table to an external table:
Table Type: EXTERNAL_TABLE

b. External table to managed table

hive (dyhtest)> alter table test5 set tblproperties ('EXTERNAL'='FALSE');
OK
Time taken: 0.094 seconds
hive (dyhtest)> desc formatted test5;
OK
col_name	data_type	comment
# col_name            	data_type           	comment             
id                  	int                 	                    
name                	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	dyhtest             	 
OwnerType:          	USER                	 
Owner:              	atdyh               	 
CreateTime:         	Sun Jun 19 17:15:55 CST 2022	 
LastAccessTime:     	UNKNOWN             	 
Retention:          	0                   	 
Location:           	hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/test5	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	EXTERNAL            	FALSE               
	bucketing_version   	2                   
	last_modified_by    	atdyh               
	last_modified_time  	1655631058          
	numFiles            	1                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	36                  
	transient_lastDdlTime	1655631058          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	,                   
	serialization.format	,                   
Time taken: 0.063 seconds, Fetched: 35 row(s)

test5 has been converted from an external table to a managed table:
Table Type: MANAGED_TABLE

Note:
1. To delete a table together with its data:
a. External table: convert it to a managed table first (dropping an external table removes only the metadata and leaves the data files on HDFS), then drop it:
alter table test5 set tblproperties('EXTERNAL'='FALSE');
drop table test5;
b. Managed table: drop it directly:
drop table test5;
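
The difference can be verified from the Hive CLI itself, which accepts dfs commands; a small sketch, assuming test5 is still an external table:

-- Dropping an external table removes only its metadata
drop table test5;
-- ...but its data files remain on HDFS
dfs -ls /user/hive/warehouse/dyhtest.db/test5;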

  • Altering tables
    1. Prepare the data
[atdyh@hadoop102 datas]$ cat emptest.txt 
1001    zhangsan	10000.1
1002	lisi	10000.2
1003	wangwu	10000.3
[atdyh@hadoop102 datas]$ 

2. Create the table and load the data

-- Create the table
hive (dyhtest)> create table emp(
              >   id int , 
              >   name string, 
              >   salary double  
              > ) 
              > row format delimited fields terminated by '\t';  
OK
Time taken: 0.451 seconds
hive (dyhtest)> show tables;
OK
tab_name
emp
mytbl
test
test1
test2
test3
test4
test5
Time taken: 0.061 seconds, Fetched: 8 row(s)

-- Load the data
hive (dyhtest)> load data local inpath '/opt/module/hive-3.1.2/datas/emptest.txt' into table emp;
Loading data to table dyhtest.emp
OK
Time taken: 0.394 seconds

3. Rename the table
Syntax:
alter table <old_table_name> rename to <new_table_name>;

hive (dyhtest)> alter table emp rename to emptest;
OK
Time taken: 0.224 seconds
hive (dyhtest)> show tables;
OK
tab_name
emptest
mytbl
test
test1
test2
test3
test4
test5
Time taken: 0.045 seconds, Fetched: 8 row(s)

4. Column operations
Syntax:
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
alter table <table_name> change [column] <old_col_name> <new_col_name> <col_type>;
a. Rename a column

-- Rename the column
hive (dyhtest)>  alter table emptest change column salary sal double ;
OK
Time taken: 0.167 seconds
-- View the result
hive (dyhtest)> show  create table emptest;
OK
createtab_stmt
CREATE TABLE `emptest`(
  `id` int, 
  `name` string, 
  `sal` double)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
WITH SERDEPROPERTIES ( 
  'field.delim'='\t', 
  'serialization.format'='\t') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hadoop102:9820/user/hive/warehouse/dyhtest.db/emptest'
TBLPROPERTIES (
  'bucketing_version'='2', 
  'last_modified_by'='atdyh', 
  'last_modified_time'='1655645129', 
  'transient_lastDdlTime'='1655645129')
Time taken: 0.052 seconds, Fetched: 20 row(s)
		

Note:
1. When a column change also changes the type, the new type must be at least as wide as (implicitly convertible from) the old one. For example, a double column cannot be changed to float; Hive will refuse the change.
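
This check is governed by hive.metastore.disallow.incompatible.col.type.changes (true by default in recent Hive releases): widening conversions are accepted, narrowing ones rejected. A sketch against emptest (not part of the session above):

-- Widening int -> bigint is an allowed implicit conversion
alter table emptest change column id id bigint;

-- Narrowing double -> float would be rejected under the default setting
-- alter table emptest change column sal sal float;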

b. Adding and replacing columns
Syntax:
ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], …)

alter table <table_name> add|replace columns (<col_name> <type> [comment '<comment>']);

-- Add columns
hive (dyhtest)>  alter table emptest add columns (addr string, deptno int );
OK
Time taken: 0.132 seconds
-- Verify
hive (dyhtest)> select * from emptest;
OK
emptest.id	emptest.name	emptest.sal	emptest.addr	emptest.deptno
NULL	10000.1	NULL	NULL	NULL
1002	lisi	10000.2	NULL	NULL
1003	wangwu	10000.3	NULL	NULL
Time taken: 0.274 seconds, Fetched: 3 row(s)

Note: in the first row, id is NULL and name reads 10000.1 because the first line of emptest.txt separates 1001 and zhangsan with spaces rather than a tab, so "1001    zhangsan" fails to parse as an int and 10000.1 lands in the name column.

Replacing columns

-- Replace all columns
hive (dyhtest)> alter table emptest replace columns (empid int, empname string);
OK
Time taken: 0.114 seconds
-- View the data
hive (dyhtest)> select * from emptest;
OK
emptest.empid	emptest.empname
NULL	10000.1
1002	lisi
1003	wangwu
Time taken: 0.149 seconds, Fetched: 3 row(s)

Note: ADD appends new columns after the existing ones (but before any partition columns), while REPLACE replaces the table's entire column list. Both operations rewrite only metadata; the data files are untouched, which is why, after the REPLACE above, the first two fields of each row simply appear under the new column names.
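
Because Hive has no DROP COLUMN, REPLACE is the usual workaround for removing a column: list every column except the unwanted one. A sketch (this would leave emptest with only empid):

-- "Drop" empname by replacing the column list without it
alter table emptest replace columns (empid int);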
