1. CREATE TABLE syntax:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
Example:
-- creating a managed (internal) table is omitted here
-- create an external table
create external table t_sz_ext(id int,name string)
row format delimited fields terminated by '\t'
stored as textfile
location '/class03';
View the table structure:
0: jdbc:hive2://localhost:10000> desc formatted t_sz_ext;
+-------------------------------+-------------------------------------------------------------+-----------------------+--+
| col_name | data_type | comment |
+-------------------------------+-------------------------------------------------------------+-----------------------+--+
| # col_name | data_type | comment |
| | NULL | NULL |
| id | int | |
| name | string | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | shizhan03 | NULL |
| Owner: | hadoop | NULL |
| CreateTime: | Tue Sep 04 13:32:49 CST 2018 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://langzi01:9000/class03 | NULL |
| Table Type: | EXTERNAL_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | EXTERNAL | TRUE |
| | transient_lastDdlTime | 1536039169 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | field.delim | \t |
| | serialization.format | \t |
+-------------------------------+-------------------------------------------------------------+-----------------------+--+
29 rows selected (0.167 seconds)
Load data:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO
TABLE tablename [PARTITION (partcol1=val1, partcol2=val2, ...)]
Notes:
1. The LOAD operation is just a plain copy/move: it moves the data file to the location corresponding to the Hive table.
2. filepath can be:
a relative path, e.g. project/data1
an absolute path, e.g. /user/hive/project/data1
Example:
load data local inpath '/home/hadoop/ext.txt' into table t_sz_ext;
0: jdbc:hive2://localhost:10000> load data local inpath '/home/hadoop/ext.txt' into table t_sz_ext;
INFO : Loading data to table shizhan03.t_sz_ext from file:/home/hadoop/ext.txt
INFO : Table shizhan03.t_sz_ext stats: [numFiles=0, totalSize=0]
No rows affected (0.807 seconds)
The contents of /home/hadoop/ext.txt:
1 zhangsan
2 lisi
3 wangwu
4 zhaoliu
5 zhouqi
View the data:
0: jdbc:hive2://localhost:10000> select * from t_sz_ext;
+--------------+----------------+--+
| t_sz_ext.id | t_sz_ext.name |
+--------------+----------------+--+
| 1 | zhangsan |
| 2 | lisi |
| 3 | wangwu |
| 4 | zhaoliu |
| 5 | zhouqi |
+--------------+----------------+--+
5 rows selected (0.724 seconds)
- Difference between external and internal (managed) tables:
- After an external table is dropped, the data uploaded to HDFS still exists
- After an internal table is dropped, the data uploaded to HDFS is deleted along with it
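The difference can be sketched as follows. This is a hypothetical session, not part of the walkthrough (running it would destroy the tables used later); the behavior of each DROP is standard Hive semantics:

```sql
-- Dropping the external table removes only its metadata;
-- the files under /class03 on HDFS are left untouched.
drop table t_sz_ext;
-- dfs -ls /class03;   -- ext.txt would still be listed

-- Dropping a managed (internal) table removes both the metadata
-- and the data files under its warehouse directory.
drop table t_sz01;
-- dfs -ls /user/hive/warehouse/shizhan03.db/t_sz01;  -- directory is gone
```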
View the data on HDFS, using the dfs command:
0: jdbc:hive2://localhost:10000> dfs -ls /class03;
+--------------------------------------------------------------------------------+--+
| DFS Output |
+--------------------------------------------------------------------------------+--+
| Found 1 items |
| -rwxr-xr-x 3 hadoop supergroup 46 2018-09-04 13:46 /class03/ext.txt |
+--------------------------------------------------------------------------------+--+
2 rows selected (0.025 seconds)
View the internal table's data:
0: jdbc:hive2://localhost:10000> dfs -ls /user/hive/warehouse/shizhan03.db/t_sz01;
+-----------------------------------------------------------------------------------------------------------------+--+
| DFS Output |
+-----------------------------------------------------------------------------------------------------------------+--+
| Found 1 items |
| -rw-r--r-- 3 hadoop supergroup 46 2018-09-04 10:20 /user/hive/warehouse/shizhan03.db/t_sz01/data.txt |
+-----------------------------------------------------------------------------------------------------------------+--+
Create a partitioned table:
create table t_sz_part(id int,name string)
partitioned by (country string)
row format delimited
fields terminated by ',';
Load data:
load data local inpath '/home/hadoop/data.txt' into table t_sz_part partition(country='China');
load data local inpath '/home/hadoop/data.txt.Japan' into table t_sz_part partition(country='Japan');
0: jdbc:hive2://localhost:10000> load data local inpath '/home/hadoop/data.txt' into table t_sz_part partition(country='China');
INFO : Loading data to table shizhan03.t_sz_part partition (country=China) from file:/home/hadoop/data.txt
INFO : Partition shizhan03.t_sz_part{country=China} stats: [numFiles=1, numRows=0, totalSize=46, rawDataSize=0]
No rows affected (1.548 seconds)
Note:
* On HDFS, a partition is simply a directory created under the table's directory. Here the directory is named country=China: country is the partition column specified at table creation, and China is the partition value specified when loading the data.
* Each time data is loaded into a new partition, HDFS gains one more directory corresponding to that partition.
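This directory layout can be checked from Beeline with the dfs command. A sketch; the exact listing format follows the earlier dfs outputs, but the expected entries shown in comments are assumed, not captured output:

```sql
dfs -ls /user/hive/warehouse/shizhan03.db/t_sz_part;
-- expect one subdirectory per loaded partition, e.g.:
--   .../t_sz_part/country=China
--   .../t_sz_part/country=Japan
```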
Query the partitioned table:
0: jdbc:hive2://localhost:10000> select * from t_sz_part;
+---------------+-----------------+--------------------+--+
| t_sz_part.id | t_sz_part.name | t_sz_part.country |
+---------------+-----------------+--------------------+--+
| 1 | zhangsan | China |
| 2 | lisi | China |
| 3 | wangwu | China |
| 4 | zhaoliu | China |
| 5 | zhouqi | China |
| 1 | anbei | Japan |
| 2 | xiaoquan | Japan |
| 3 | xiaoze | Japan |
| 4 | jingkong | Japan |
| 5 | songxia | Japan |
+---------------+-----------------+--------------------+--+
10 rows selected (0.25 seconds)
Query a specific partition:
0: jdbc:hive2://localhost:10000> select * from t_sz_part where country = 'China';
+---------------+-----------------+--------------------+--+
| t_sz_part.id | t_sz_part.name | t_sz_part.country |
+---------------+-----------------+--------------------+--+
| 1 | zhangsan | China |
| 2 | lisi | China |
| 3 | wangwu | China |
| 4 | zhaoliu | China |
| 5 | zhouqi | China |
+---------------+-----------------+--------------------+--+
5 rows selected (0.435 seconds)
Grouped query:
select count(1) from t_sz_part where country = 'China' group by (name='zhangsan');
0: jdbc:hive2://localhost:10000> select count(1) from t_sz_part where country = 'China' group by (name='zhangsan');
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:1
INFO : Submitting tokens for job: job_local1969129417_0001
INFO : The url to track the job: http://localhost:8080/
INFO : Job running in-process (local Hadoop)
INFO : 2018-09-05 20:13:21,371 Stage-1 map = 100%, reduce = 100%
INFO : Ended Job = job_local1969129417_0001
+------+--+
| _c0 |
+------+--+
| 4 |
| 1 |
+------+--+
2 rows selected (2.411 seconds)
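Note that the GROUP BY key above is the boolean expression (name='zhangsan'), so the two result rows are the counts of the non-matching group (4 rows) and the matching group (1 row). A more typical grouped query groups by the column itself; a sketch against the same table:

```sql
select name, count(1)
from t_sz_part
where country = 'China'
group by name;
-- one result row per distinct name
```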
Add/drop partitions:
ALTER TABLE table_name ADD [IF NOT EXISTS] partition_spec [LOCATION 'location1']
partition_spec [LOCATION 'location2'] ...
partition_spec:
: PARTITION (partition_col=partition_col_value, partition_col=partition_col_value, ...)
ALTER TABLE table_name DROP partition_spec, partition_spec, ...
Example:
0: jdbc:hive2://localhost:10000> alter table t_sz_part add partition (country='American');
Show partitions:
0: jdbc:hive2://localhost:10000> show partitions t_sz_part;
+-------------------+--+
| partition |
+-------------------+--+
| country=American |
| country=China |
| country=Japan |
+-------------------+--+
3 rows selected (0.175 seconds)
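The DROP form shown above can be used to remove the empty partition just added. A sketch (on a managed table, dropping a partition also deletes its data directory on HDFS):

```sql
alter table t_sz_part drop partition (country='American');
show partitions t_sz_part;
-- country=American should no longer be listed
```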