hive sql详细学习
1、准备数据
emp.txt
7369 SMITH CLERK 7902 1980-12-17 800.00 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30
7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30
7566 JONES MANAGER 7839 1981-4-2 2975.00 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30
7782 CLARK MANAGER 7839 1981-6-9 2450.00 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20
7839 KING PRESIDENT 1981-11-17 5000.00 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30
7876 ADAMS CLERK 7788 1987-5-23 1100.00 20
7900 JAMES CLERK 7698 1981-12-3 950.00 30
7902 FORD ANALYST 7566 1981-12-3 3000.00 20
7934 MILLER CLERK 7782 1982-1-23 1300.00 10
dept.txt
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
2、创建表emp
drop table if exists default.emp;
create table default.emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
deptno int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
hive> create table default.emp(
> empno int,
> ename string,
> job string,
> mgr int,
> hiredate string,
> sal double,
> deptno int
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 3.152 seconds
2、加载数据
load data local inpath '/opt/hive-0.13.1/emp.txt' overwrite into table default.emp;
hive> load data local inpath '/opt/hive-0.13.1/emp.txt' overwrite into table default.emp ;
Copying data from file:/opt/hive-0.13.1/emp.txt
Copying file: file:/opt/hive-0.13.1/emp.txt
Loading data to table default.emp
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://cluster/user/hive/warehouse/emp
Table default.emp stats: [numFiles=1, numRows=0, totalSize=656, rawDataSize=0]
OK
Time taken: 7.242 seconds
3、创建表dept
hive> create table default.dept(
> deptno int,
> dname string,
> loc string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 1.566 seconds
4、加载数据
hive> load data local inpath '/opt/hive-0.13.1/dept.txt' overwrite into table default.emp;
Copying data from file:/opt/hive-0.13.1/dept.txt
Copying file: file:/opt/hive-0.13.1/dept.txt
Loading data to table default.emp
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://cluster/user/hive/warehouse/emp
Table default.emp stats: [numFiles=1, numRows=0, totalSize=79, rawDataSize=0]
OK
Time taken: 2.526 seconds
5、可以查看hive中创建的表
hive> dfs -ls /user/hive/warehouse/
> ;
Found 8 items
drwxr-xr-x - root supergroup 0 2015-10-18 05:04 /user/hive/warehouse/db_hive.db
drwxr-xr-x - root supergroup 0 2015-10-19 05:56 /user/hive/warehouse/dept
drwxr-xr-x - root supergroup 0 2015-10-19 05:56 /user/hive/warehouse/emp
drwxr-xr-x - root supergroup 0 2015-10-17 23:50 /user/hive/warehouse/hello.db
drwxr-xr-x - root supergroup 0 2015-10-17 23:48 /user/hive/warehouse/student
drwxr-xr-x - root supergroup 0 2015-10-18 08:34 /user/hive/warehouse/weblog
drwxr-xr-x - root supergroup 0 2015-10-18 08:44 /user/hive/warehouse/weblog_20150923
drwxr-xr-x - root supergroup 0 2015-10-18 08:56 /user/hive/warehouse/weblog_comm
6、可以查看mysql中hive元数据学习
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
| 1 | 1445150880 | 1 | 0 | root | 0 | 1 | student | MANAGED_TABLE | NULL | NULL |
| 2 | 1445169777 | 3 | 0 | root | 0 | 2 | student | MANAGED_TABLE | NULL | NULL |
| 6 | 1445182413 | 1 | 0 | root | 0 | 6 | weblog | MANAGED_TABLE | NULL | NULL |
| 7 | 1445183058 | 1 | 0 | root | 0 | 7 | weblog_20150923 | MANAGED_TABLE | NULL | NULL |
| 8 | 1445183801 | 1 | 0 | root | 0 | 8 | weblog_comm | MANAGED_TABLE | NULL | NULL |
| 9 | 1445258345 | 1 | 0 | root | 0 | 9 | emp | MANAGED_TABLE | NULL | NULL |
| 10 | 1445258771 | 1 | 0 | root | 0 | 10 | dept | MANAGED_TABLE | NULL | NULL |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
7 rows in set (0.00 sec)
7、如果删除表 则删除hdfs上面数据及mysql里面的元数据信息
hive> drop table if exists default.weblog_comm;
OK
Time taken: 9.869 seconds
hive> dfs -ls /user/hive/warehouse/;
Found 7 items
drwxr-xr-x - root supergroup 0 2015-10-18 05:04 /user/hive/warehouse/db_hive.db
drwxr-xr-x - root supergroup 0 2015-10-19 05:56 /user/hive/warehouse/dept
drwxr-xr-x - root supergroup 0 2015-10-19 05:56 /user/hive/warehouse/emp
drwxr-xr-x - root supergroup 0 2015-10-17 23:50 /user/hive/warehouse/hello.db
drwxr-xr-x - root supergroup 0 2015-10-17 23:48 /user/hive/warehouse/student
drwxr-xr-x - root supergroup 0 2015-10-18 08:34 /user/hive/warehouse/weblog
drwxr-xr-x - root supergroup 0 2015-10-18 08:44 /user/hive/warehouse/weblog_20150923
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
| 1 | 1445150880 | 1 | 0 | root | 0 | 1 | student | MANAGED_TABLE | NULL | NULL |
| 2 | 1445169777 | 3 | 0 | root | 0 | 2 | student | MANAGED_TABLE | NULL | NULL |
| 6 | 1445182413 | 1 | 0 | root | 0 | 6 | weblog | MANAGED_TABLE | NULL | NULL |
| 7 | 1445183058 | 1 | 0 | root | 0 | 7 | weblog_20150923 | MANAGED_TABLE | NULL | NULL |
| 9 | 1445258345 | 1 | 0 | root | 0 | 9 | emp | MANAGED_TABLE | NULL | NULL |
| 10 | 1445258771 | 1 | 0 | root | 0 | 10 | dept | MANAGED_TABLE | NULL | NULL |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
6 rows in set (0.00 sec)
8、hive中表的类型 外部表管理表
前面说的都是管理表,现在讲创建外部表
* MANAGED_TABLE (管理表)
* EXTERNAL_TABLE(外部表)
先创建数据库:
hive> create database if not exists db_hive_0927;
OK
Time taken: 0.792 seconds
hive> drop table if exists db_hive_0927.dept_external ;
OK
Time taken: 0.092 seconds
hive> create EXTERNAL table db_hive_0927.dept_external(
> deptno int,
> dname string,
> loc string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;
OK
Time taken: 0.866 seconds
9、查看表类型(外部表)
hive> desc formatted db_hive_0927.dept_external;
OK
# col_name data_type comment
deptno int
dname string
loc string
# Detailed Table Information
Database: db_hive_0927
Owner: root
CreateTime: Mon Oct 19 06:16:47 PDT 2015
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://cluster/user/hive/warehouse/db_hive_0927.db/dept_external
Table Type: EXTERNAL_TABLE
Table Parameters:
<span style="color:#ff0000;"> EXTERNAL TRUE </span>
transient_lastDdlTime 1445260607
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.384 seconds, Fetched: 30 row(s)
10、查看表类型(管理表)比如前面创建的表
hive> desc formatted default.dept;
OK
# col_name data_type comment
deptno int
dname string
loc string
# Detailed Table Information
Database: default
Owner: root
CreateTime: Mon Oct 19 05:46:11 PDT 2015
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://cluster/user/hive/warehouse/dept
<span style="color:#ff0000;">Table Type: MANAGED_TABLE </span>
Table Parameters:
COLUMN_STATS_ACCURATE true
numFiles 1
numRows 0
rawDataSize 0
totalSize 79
transient_lastDdlTime 1445259404
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.383 seconds, Fetched: 34 row(s)
11、下面介绍外部表与管理表有什么区别呢?
先加载一些数据到外部表及查看:
hive> load data local inpath '/opt/hive-0.13.1/dept.txt' overwrite into table db_hive_0927.dept_external;
Copying data from file:/opt/hive-0.13.1/dept.txt
Copying file: file:/opt/hive-0.13.1/dept.txt
Loading data to table db_hive_0927.dept_external
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://cluster/user/hive/warehouse/db_hive_0927.db/dept_external
Table db_hive_0927.dept_external stats: [numFiles=1, numRows=0, totalSize=79, rawDataSize=0]
OK
Time taken: 2.687 seconds
hive> select * from db_hive_0927.dept_external;
OK
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
Time taken: 0.405 seconds, Fetched: 4 row(s)
hive> select * from default.dept;
OK
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
Time taken: 0.37 seconds, Fetched: 4 row(s)
下面展示:结论是
删除外部表,元数据被删除,hdfs上面数据没有被删除
删除管理,元数据被删除,hdfs上面数据也被删除
先看元数据信息:
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+----------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+----------------+--------------------+--------------------+
| 1 | 1445150880 | 1 | 0 | root | 0 | 1 | student | MANAGED_TABLE | NULL | NULL |
| 2 | 1445169777 | 3 | 0 | root | 0 | 2 | student | MANAGED_TABLE | NULL | NULL |
| 6 | 1445182413 | 1 | 0 | root | 0 | 6 | weblog | MANAGED_TABLE | NULL | NULL |
| 7 | 1445183058 | 1 | 0 | root | 0 | 7 | weblog_20150923 | MANAGED_TABLE | NULL | NULL |
| 9 | 1445258345 | 1 | 0 | root | 0 | 9 | emp | MANAGED_TABLE | NULL | NULL |
| 10 | 1445258771 | 1 | 0 | root | 0 | 10 | dept | MANAGED_TABLE | NULL | NULL |
| 11 | 1445260607 | 6 | 0 | root | 0 | 11 | dept_external | EXTERNAL_TABLE | NULL | NULL |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+----------------+--------------------+--------------------+
7 rows in set (0.00 sec)
hdfs上面数据:
hive> dfs -ls /user/hive/warehouse/db_hive_0927.db;
Found 1 items
drwxr-xr-x - root supergroup 0 2015-10-19 06:34 /user/hive/warehouse/db_hive_0927.db/dept_external
hive> dfs -ls /user/hive/warehouse/dept;
Found 1 items
-rw-r--r-- 2 root supergroup 79 2015-10-19 05:56 /user/hive/warehouse/dept/dept.txt
下面分别删除表:
hive> drop table default.dept ;
OK
Time taken: 2.27 seconds
hive> drop table db_hive_0927.dept_external ;
OK
Time taken: 0.479 seconds
查看元数据信息:
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
| 1 | 1445150880 | 1 | 0 | root | 0 | 1 | student | MANAGED_TABLE | NULL | NULL |
| 2 | 1445169777 | 3 | 0 | root | 0 | 2 | student | MANAGED_TABLE | NULL | NULL |
| 6 | 1445182413 | 1 | 0 | root | 0 | 6 | weblog | MANAGED_TABLE | NULL | NULL |
| 7 | 1445183058 | 1 | 0 | root | 0 | 7 | weblog_20150923 | MANAGED_TABLE | NULL | NULL |
| 9 | 1445258345 | 1 | 0 | root | 0 | 9 | emp | MANAGED_TABLE | NULL | NULL |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------------+---------------+--------------------+--------------------+
5 rows in set (0.01 sec)
上面元数据信息都被删除了!!
查看hdfs数据:
hive> dfs -ls /user/hive/warehouse/;
Found 7 items
drwxr-xr-x - root supergroup 0 2015-10-18 05:04 /user/hive/warehouse/db_hive.db
drwxr-xr-x - root supergroup 0 2015-10-19 06:34 /user/hive/warehouse/db_hive_0927.db
drwxr-xr-x - root supergroup 0 2015-10-19 05:56 /user/hive/warehouse/emp
drwxr-xr-x - root supergroup 0 2015-10-17 23:50 /user/hive/warehouse/hello.db
drwxr-xr-x - root supergroup 0 2015-10-17 23:48 /user/hive/warehouse/student
drwxr-xr-x - root supergroup 0 2015-10-18 08:34 /user/hive/warehouse/weblog
drwxr-xr-x - root supergroup 0 2015-10-18 08:44 /user/hive/warehouse/weblog_20150923
没有了管理表的数据
hive> dfs -ls /user/hive/warehouse/db_hive_0927.db;
Found 1 items
drwxr-xr-x - root supergroup 0 2015-10-19 06:34 /user/hive/warehouse/db_hive_0927.db/dept_external
外部表hdfs上面数据还在!
12、最后说一下,创建一个外部表的时候可以指定它的位置
drop table if exists db_hive_0927.xiaoming ;
create EXTERNAL table db_hive_0927.xiaoming(
deptno int,
dname string,
loc string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/db_hive_0927.db/xiaoming';
hive> drop table if exists db_hive_0927.xiaoming ;
OK
Time taken: 0.173 seconds
hive> create EXTERNAL table db_hive_0927.xiaoming(
> deptno int,
> dname string,
> loc string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LOCATION '/user/hive/warehouse/db_hive_0927.db/xiaoming';
OK
Time taken: 0.396 seconds
hive> dfs -ls /user/hive/warehouse/db_hive_0927.db/;
Found 3 items
drwxr-xr-x - root supergroup 0 2015-10-19 06:58 /user/hive/warehouse/db_hive_0927.db/aa
drwxr-xr-x - root supergroup 0 2015-10-19 06:34 /user/hive/warehouse/db_hive_0927.db/dept_external
<span style="color:#ff0000;">drwxr-xr-x - root supergroup 0 2015-10-19 07:00 /user/hive/warehouse/db_hive_0927.db/xiaoming</span>