hive sql详细学习

hive sql详细学习

1、准备数据

emp.txt

7369	SMITH	CLERK	7902	1980-12-17	800.00		20
7499	ALLEN	SALESMAN	7698	1981-2-20	1600.00	300.00	30
7521	WARD	SALESMAN	7698	1981-2-22	1250.00	500.00	30
7566	JONES	MANAGER	7839	1981-4-2	2975.00		20
7654	MARTIN	SALESMAN	7698	1981-9-28	1250.00	1400.00	30
7698	BLAKE	MANAGER	7839	1981-5-1	2850.00		30
7782	CLARK	MANAGER	7839	1981-6-9	2450.00		10
7788	SCOTT	ANALYST	7566	1987-4-19	3000.00		20
7839	KING	PRESIDENT		1981-11-17	5000.00		10
7844	TURNER	SALESMAN	7698	1981-9-8	1500.00	0.00	30
7876	ADAMS	CLERK	7788	1987-5-23	1100.00		20
7900	JAMES	CLERK	7698	1981-12-3	950.00		30
7902	FORD	ANALYST	7566	1981-12-3	3000.00		20
7934	MILLER	CLERK	7782	1982-1-23	1300.00		10


dept.txt

10	ACCOUNTING	NEW YORK
20	RESEARCH	DALLAS
30	SALES	CHICAGO
40	OPERATIONS	BOSTON


2、创建表emp

drop table if exists default.emp;
create table default.emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
deptno int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

hive> create table default.emp(
    > empno int,
    > ename string,
    > job string,
    > mgr int,
    > hiredate string,
    > sal double,
    > deptno int
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 3.152 seconds


2、加载数据

load data local inpath '/opt/hive-0.13.1/emp.txt' overwrite into table default.emp;
hive> load data local inpath '/opt/hive-0.13.1/emp.txt' overwrite into table default.emp ;
Copying data from file:/opt/hive-0.13.1/emp.txt
Copying file: file:/opt/hive-0.13.1/emp.txt
Loading data to table default.emp
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://cluster/user/hive/warehouse/emp
Table default.emp stats: [numFiles=1, numRows=0, totalSize=656, rawDataSize=0]
OK
Time taken: 7.242 seconds


3、创建表dept

hive> create table default.dept(
    > deptno int,
    > dname string,
    > loc string 
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 1.566 seconds


4、加载数据

hive> load data local inpath '/opt/hive-0.13.1/dept.txt' overwrite into table default.emp;
Copying data from file:/opt/hive-0.13.1/dept.txt
Copying file: file:/opt/hive-0.13.1/dept.txt
Loading data to table default.emp
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://cluster/user/hive/warehouse/emp
Table default.emp stats: [numFiles=1, numRows=0, totalSize=79, rawDataSize=0]
OK
Time taken: 2.526 seconds


5、可以查看hive中创建的表

hive> dfs -ls /user/hive/warehouse/                                                        
    > ;
Found 8 items
drwxr-xr-x   - root supergroup          0 2015-10-18 05:04 /user/hive/warehouse/db_hive.db
drwxr-xr-x   - root supergroup          0 2015-10-19 05:56 /user/hive/warehouse/dept
drwxr-xr-x   - root supergroup          0 2015-10-19 05:56 /user/hive/warehouse/emp
drwxr-xr-x   - root supergroup          0 2015-10-17 23:50 /user/hive/warehouse/hello.db
drwxr-xr-x   - root supergroup          0 2015-10-17 23:48 /user/hive/warehouse/student
drwxr-xr-x   - root supergroup          0 2015-10-18 08:34 /user/hive/warehouse/weblog
drwxr-xr-x   - root supergroup          0 2015-10-18 08:44 /user/hive/warehouse/weblog_20150923
drwxr-xr-x   - root supergroup          0 2015-10-18 08:56 /user/hive/warehouse/weblog_comm


6、可以查看mysql中hive元数据学习

mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME        | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
|      1 |  1445150880 |     1 |                0 | root  |         0 |     1 | student         | MANAGED_TABLE | NULL               | NULL               |
|      2 |  1445169777 |     3 |                0 | root  |         0 |     2 | student         | MANAGED_TABLE | NULL               | NULL               |
|      6 |  1445182413 |     1 |                0 | root  |         0 |     6 | weblog          | MANAGED_TABLE | NULL               | NULL               |
|      7 |  1445183058 |     1 |                0 | root  |         0 |     7 | weblog_20150923 | MANAGED_TABLE | NULL               | NULL               |
|      8 |  1445183801 |     1 |                0 | root  |         0 |     8 | weblog_comm     | MANAGED_TABLE | NULL               | NULL               |
|      9 |  1445258345 |     1 |                0 | root  |         0 |     9 | emp             | MANAGED_TABLE | NULL               | NULL               |
|     10 |  1445258771 |     1 |                0 | root  |         0 |    10 | dept            | MANAGED_TABLE | NULL               | NULL               |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
7 rows in set (0.00 sec)


7、如果删除表 则删除hdfs上面数据及mysql里面的元数据信息

hive> drop table if exists default.weblog_comm;
OK
Time taken: 9.869 seconds



hive> dfs -ls /user/hive/warehouse/;
Found 7 items
drwxr-xr-x   - root supergroup          0 2015-10-18 05:04 /user/hive/warehouse/db_hive.db
drwxr-xr-x   - root supergroup          0 2015-10-19 05:56 /user/hive/warehouse/dept
drwxr-xr-x   - root supergroup          0 2015-10-19 05:56 /user/hive/warehouse/emp
drwxr-xr-x   - root supergroup          0 2015-10-17 23:50 /user/hive/warehouse/hello.db
drwxr-xr-x   - root supergroup          0 2015-10-17 23:48 /user/hive/warehouse/student
drwxr-xr-x   - root supergroup          0 2015-10-18 08:34 /user/hive/warehouse/weblog
drwxr-xr-x   - root supergroup          0 2015-10-18 08:44 /user/hive/warehouse/weblog_20150923



mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME        | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
|      1 |  1445150880 |     1 |                0 | root  |         0 |     1 | student         | MANAGED_TABLE | NULL               | NULL               |
|      2 |  1445169777 |     3 |                0 | root  |         0 |     2 | student         | MANAGED_TABLE | NULL               | NULL               |
|      6 |  1445182413 |     1 |                0 | root  |         0 |     6 | weblog          | MANAGED_TABLE | NULL               | NULL               |
|      7 |  1445183058 |     1 |                0 | root  |         0 |     7 | weblog_20150923 | MANAGED_TABLE | NULL               | NULL               |
|      9 |  1445258345 |     1 |                0 | root  |         0 |     9 | emp             | MANAGED_TABLE | NULL               | NULL               |
|     10 |  1445258771 |     1 |                0 | root  |         0 |    10 | dept            | MANAGED_TABLE | NULL               | NULL               |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
6 rows in set (0.00 sec)



8、hive中表的类型 外部表管理表

前面说的都是管理表,现在讲创建外部表

* MANAGED_TABLE   (管理表)
* EXTERNAL_TABLE(外部表)

先创建数据库:

hive> create database if not exists db_hive_0927;
OK
Time taken: 0.792 seconds

hive> drop table if exists db_hive_0927.dept_external ;
OK
Time taken: 0.092 seconds


hive> create EXTERNAL table db_hive_0927.dept_external(
    > deptno int,
    > dname string,
    > loc string
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;
OK
Time taken: 0.866 seconds

9、查看表类型(外部表)

hive> desc formatted db_hive_0927.dept_external;
OK
# col_name            	data_type           	comment             
	 	 
deptno              	int                 	                    
dname               	string              	                    
loc                 	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	db_hive_0927        	 
Owner:              	root                	 
CreateTime:         	Mon Oct 19 06:16:47 PDT 2015	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://cluster/user/hive/warehouse/db_hive_0927.db/dept_external	 
Table Type:         	EXTERNAL_TABLE      	 
Table Parameters:	 	 
<span style="color:#ff0000;">	EXTERNAL            	TRUE      </span>          
	transient_lastDdlTime	1445260607          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	\t                  
	serialization.format	\t                  
Time taken: 0.384 seconds, Fetched: 30 row(s)


10、查看表类型(管理表)比如前面创建的表


hive> desc formatted default.dept;              
OK
# col_name            	data_type           	comment             
	 	 
deptno              	int                 	                    
dname               	string              	                    
loc                 	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	default             	 
Owner:              	root                	 
CreateTime:         	Mon Oct 19 05:46:11 PDT 2015	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://cluster/user/hive/warehouse/dept	 
<span style="color:#ff0000;">Table Type:         	MANAGED_TABLE  </span>     	 
Table Parameters:	 	 
	COLUMN_STATS_ACCURATE	true                
	numFiles            	1                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	79                  
	transient_lastDdlTime	1445259404          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	\t                  
	serialization.format	\t                  
Time taken: 0.383 seconds, Fetched: 34 row(s)


11、下面介绍外部表与管理表有什么区别呢?

先加载一些数据到外部表及查看:

hive> load data local inpath '/opt/hive-0.13.1/dept.txt' overwrite into table db_hive_0927.dept_external;
Copying data from file:/opt/hive-0.13.1/dept.txt
Copying file: file:/opt/hive-0.13.1/dept.txt
Loading data to table db_hive_0927.dept_external
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://cluster/user/hive/warehouse/db_hive_0927.db/dept_external
Table db_hive_0927.dept_external stats: [numFiles=1, numRows=0, totalSize=79, rawDataSize=0]
OK
Time taken: 2.687 seconds


hive> select * from db_hive_0927.dept_external;
OK
10	ACCOUNTING	NEW YORK
20	RESEARCH	DALLAS
30	SALES	CHICAGO
40	OPERATIONS	BOSTON
Time taken: 0.405 seconds, Fetched: 4 row(s)



hive> select * from default.dept;              
OK
10	ACCOUNTING	NEW YORK
20	RESEARCH	DALLAS
30	SALES	CHICAGO
40	OPERATIONS	BOSTON
Time taken: 0.37 seconds, Fetched: 4 row(s)


下面展示:结论是

删除外部表,元数据被删除,hdfs上面数据没有被删除

删除管理,元数据被删除,hdfs上面数据也被删除

先看元数据信息:

mysql> select * from TBLS;


+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+----------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME        | TBL_TYPE       | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+----------------+--------------------+--------------------+
|      1 |  1445150880 |     1 |                0 | root  |         0 |     1 | student         | MANAGED_TABLE  | NULL               | NULL               |
|      2 |  1445169777 |     3 |                0 | root  |         0 |     2 | student         | MANAGED_TABLE  | NULL               | NULL               |
|      6 |  1445182413 |     1 |                0 | root  |         0 |     6 | weblog          | MANAGED_TABLE  | NULL               | NULL               |
|      7 |  1445183058 |     1 |                0 | root  |         0 |     7 | weblog_20150923 | MANAGED_TABLE  | NULL               | NULL               |
|      9 |  1445258345 |     1 |                0 | root  |         0 |     9 | emp             | MANAGED_TABLE  | NULL               | NULL               |
|     10 |  1445258771 |     1 |                0 | root  |         0 |    10 | dept            | MANAGED_TABLE  | NULL               | NULL               |
|     11 |  1445260607 |     6 |                0 | root  |         0 |    11 | dept_external   | EXTERNAL_TABLE | NULL               | NULL               |

+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+----------------+--------------------+--------------------+
7 rows in set (0.00 sec)

hdfs上面数据:

hive> dfs -ls /user/hive/warehouse/db_hive_0927.db;
Found 1 items
drwxr-xr-x   - root supergroup          0 2015-10-19 06:34 /user/hive/warehouse/db_hive_0927.db/dept_external



hive> dfs -ls /user/hive/warehouse/dept;
Found 1 items
-rw-r--r--   2 root supergroup         79 2015-10-19 05:56 /user/hive/warehouse/dept/dept.txt


下面分别删除表:

hive>  drop table default.dept ;
OK
Time taken: 2.27 seconds
hive> drop table db_hive_0927.dept_external ;
OK
Time taken: 0.479 seconds


查看元数据信息:

mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME        | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
|      1 |  1445150880 |     1 |                0 | root  |         0 |     1 | student         | MANAGED_TABLE | NULL               | NULL               |
|      2 |  1445169777 |     3 |                0 | root  |         0 |     2 | student         | MANAGED_TABLE | NULL               | NULL               |
|      6 |  1445182413 |     1 |                0 | root  |         0 |     6 | weblog          | MANAGED_TABLE | NULL               | NULL               |
|      7 |  1445183058 |     1 |                0 | root  |         0 |     7 | weblog_20150923 | MANAGED_TABLE | NULL               | NULL               |
|      9 |  1445258345 |     1 |                0 | root  |         0 |     9 | emp             | MANAGED_TABLE | NULL               | NULL               |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
5 rows in set (0.01 sec)


上面元数据信息都被删除了!!

查看hdfs数据:

hive> dfs -ls /user/hive/warehouse/;    

Found 7 items
drwxr-xr-x   - root supergroup          0 2015-10-18 05:04 /user/hive/warehouse/db_hive.db
drwxr-xr-x   - root supergroup          0 2015-10-19 06:34 /user/hive/warehouse/db_hive_0927.db
drwxr-xr-x   - root supergroup          0 2015-10-19 05:56 /user/hive/warehouse/emp
drwxr-xr-x   - root supergroup          0 2015-10-17 23:50 /user/hive/warehouse/hello.db
drwxr-xr-x   - root supergroup          0 2015-10-17 23:48 /user/hive/warehouse/student
drwxr-xr-x   - root supergroup          0 2015-10-18 08:34 /user/hive/warehouse/weblog
drwxr-xr-x   - root supergroup          0 2015-10-18 08:44 /user/hive/warehouse/weblog_20150923

没有了管理表的数据

hive> dfs -ls /user/hive/warehouse/db_hive_0927.db;
Found 1 items
drwxr-xr-x   - root supergroup          0 2015-10-19 06:34 /user/hive/warehouse/db_hive_0927.db/dept_external


外部表hdfs上面数据还在!


12、最后说一下,创建一个外部表的时候可以指定它的位置

drop table if exists db_hive_0927.xiaoming ;
create EXTERNAL table db_hive_0927.xiaoming(
deptno int,
dname string,
loc string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/db_hive_0927.db/xiaoming';

hive> drop table if exists db_hive_0927.xiaoming ;
OK
Time taken: 0.173 seconds
hive> create EXTERNAL table db_hive_0927.xiaoming(
    > deptno int,
    > dname string,
    > loc string
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > LOCATION '/user/hive/warehouse/db_hive_0927.db/xiaoming';
OK
Time taken: 0.396 seconds



hive> dfs -ls /user/hive/warehouse/db_hive_0927.db/;        
Found 3 items
drwxr-xr-x   - root supergroup          0 2015-10-19 06:58 /user/hive/warehouse/db_hive_0927.db/aa
drwxr-xr-x   - root supergroup          0 2015-10-19 06:34 /user/hive/warehouse/db_hive_0927.db/dept_external
<span style="color:#ff0000;">drwxr-xr-x   - root supergroup          0 2015-10-19 07:00 /user/hive/warehouse/db_hive_0927.db/xiaoming</span>







  • 1
    点赞
  • 8
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
好的,那我可以为您提供一些学习Hive SQL的资源和建议。 1. 官方文档 首先,您可以参考Hive官方文档中关于Hive SQL的部分。官方文档提供了很详细的介绍和示例,可以帮助您快速入门。 链接:https://cwiki.apache.org/confluence/display/Hive/LanguageManual 2. 外部资源 除了官方文档,还有一些外部资源可以帮助您学习Hive SQL。以下是一些推荐的资源: - Hive Tutorial:这是一份比较全面的Hive教程,包括Hive SQL的基本语法、DDL、DML、函数等内容。链接:https://www.tutorialspoint.com/hive/index.htm - Hive Cheat Sheet:这是一份Hive SQL的速查表,可以帮助您快速查找Hive SQL的语法和用法。链接:https://www.edureka.co/blog/wp-content/uploads/2019/07/Hive-Cheat-Sheet.pdf - Hive Cookbook:这是一份Hive SQL的实战指南,通过一些常见的场景和问题,来帮助您深入理解Hive SQL的使用和优化。链接:https://www.oreilly.com/library/view/hive-cookbook/9781449328714/ 3. 实践项目 最后,我建议您通过实践项目来学习Hive SQL。通过实际操作,您可以更深入地了解Hive SQL的使用和优化技巧。以下是一些可以参考的实践项目: - Kaggle竞赛:Kaggle上有很多与Hive SQL相关的数据分析和数据挖掘竞赛,您可以参加这些竞赛来练习Hive SQL的使用。 - Github项目:Github上也有很多Hive SQL的开源项目,您可以参考这些项目来学习Hive SQL的最佳实践和优化技巧。 希望以上资源对您有所帮助,祝您学习愉快!

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值