1.hive删除EXTERNAL外表
外表不能按一般步骤drop,否则得手动去hdfs rm -r xxx文件,并且还可能遇到因为文件太大而不能删除等问题;
- 两个步骤:
-
ALTER TABLE xxx SET TBLPROPERTIES(‘EXTERNAL’=‘False’);
-
drop table xxx;
2.查看内部表外部表:
- desc formatted dept 查看Table Type:
> desc formatted dept;
OK
col_name data_type comment
# col_name data_type comment
deptno int
dname string
loc int
# Detailed Table Information
Database: default
Owner: kfk
CreateTime: Sat Aug 31 15:24:21 CST 2019
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://hadoop102:9000/user/hive/warehouse/dept
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE true
EXTERNAL TRUE
numFiles 1
totalSize 68
transient_lastDdlTime 1567236276
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
serialization.format ,
Time taken: 0.053 seconds, Fetched: 33 row(s)
3.hive分区表
- 分区其实分的是文件夹
hive (default)> load data local inpath '/opt/module/data/dept.txt' into table dept_partition partition(month='2019-08') ;
Loading data to table default.dept_partition partition (month=2019-08)
Partition default.dept_partition{month=2019-08} stats: [numFiles=1, numRows=0, totalSize=68, rawDataSize=0]
OK
Time taken: 0.319 seconds
4.hive查询多个月
- 会生成新的列 dept_partition.month
hive (default)> select * from dept_partition;
OK
dept_partition.deptno dept_partition.dname dept_partition.loc dept_partition.month
10 ACCOUNTING 1700 2019-07
20 RESEARCH 1800 2019-07
30 SALES 1900 2019-07
40 OPERATIONS 1700 2019-07
10 ACCOUNTING 1700 2019-08
20 RESEARCH 1800 2019-08
30 SALES 1900 2019-08
40 OPERATIONS 1700 2019-08
Time taken: 0.064 seconds, Fetched: 8 row(s)
5.hive如何避免全表扫描
- from之后先走where过滤