Hive 分区表

9 篇文章 0 订阅

Partition 分区表

在 Hive 中,表的每一个分区对应表下的相应目录,所有分区的数据都是存储在对应的目录中;

可以按照一定维度数据进行分区存放,查询时大幅增加查询效率,同时也可以查询跨多个分区的整数集;

新建表

hive> CREATE TABLE t_partition (year string, month int, num int)

        partitioned by (dt string, country string)

        ROW FORMAT DELIMITED

        FIELDS TERMINATED BY ',';

查看表结构

hive> desc t_partition;

OK                                                                                                                                                                   
year                    string                                                                                                                                       
month                   int                                                                                                                                          
num                     int                                                                                                                                          
dt                      string                                                                                                                                       
country                 string                                                                                                                                       
                                                                                                                                                                     
# Partition Information                                                                                                                                              
# col_name              data_type               comment                                                                                                              
dt                      string                                                                                                                                       
country                 string                                                                                                                                       
Time taken: 0.406 seconds, Fetched: 10 row(s)   

此时在 hive 默认仓库下创建了空目录:

$ hdfs dfs -ls /user/hive/warehouse/

drwxr-xr-x   - hadoop supergroup          0 2022-09-19 15:19 /user/hive/warehouse/t_partition   

待导入数据文件

$ cat t_partition1.txt

2019,02,11                                                                                                                                                           
2020,04,13                                                                                                                                                           
2020,09,12 

$ cat t_partition2.txt

2018,03,15                                                                                                                                                           
2021,10,01                                                                                                                                                           
2021,08,21 

导入数据

hive> load data local inpath '/home/hadoop/test_data/t_partition1.txt' into table t_partition partition(dt='2021-09-18', country='US');

Loading data to table default.t_partition partition (dt=2021-09-18, country=US)                                                                                      
OK     
Time taken: 1.847 seconds

hive> load data local inpath '/home/hadoop/test_data/t_partition2.txt' into table t_partition partition(dt='2021-09-19',country='CN');

Loading data to table default.t_partition partition (dt=2021-09-19, country=CN)                                                                                      
OK                                                                                                                                                                   
Time taken: 0.516 seconds    

查看数据

hive> select * from t_partition;

OK                                                                                                                                                                   
2019    2       11      2021-09-18      US                                                                                                                           
2020    4       13      2021-09-18      US                                                                                                                           
2020    9       12      2021-09-18      US                                                                                                                           
2018    3       15      2021-09-19      CN                                                                                                                           
2021    10      1       2021-09-19      CN                                                                                                                           
2021    8       21      2021-09-19      CN                                                                                                                           
Time taken: 0.228 seconds, Fetched: 6 row(s) 

查看表分区

hive> show partitions t_partition;

OK                                                                                                                                                                   
dt=2021-09-18/country=US                                                                                                                                             
dt=2021-09-19/country=CN                                                                                                                                             
Time taken: 0.272 seconds, Fetched: 2 row(s)  

查看表中数据的文件路径

hive> select INPUT__FILE__NAME from t_partiton;

OK                                                                                                                                                                   
hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt                                                                           
hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt                                                                           
hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt                                                                           
hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt                                                                           
hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt                                                                           
hdfs://mycluster/user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt                                                                           
Time taken: 0.274 seconds, Fetched: 6 row(s)   

查看 hdfs 中存放的文件结构

$ hdfs dfs -ls -R /user/hive/warehouse/t_partition/

drwxr-xr-x   - hadoop supergroup          0 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-18                                                          
drwxr-xr-x   - hadoop supergroup          0 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-18/country=US                                               
-rw-r--r--   3 hadoop supergroup         33 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt                              
drwxr-xr-x   - hadoop supergroup          0 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-19                                                          
drwxr-xr-x   - hadoop supergroup          0 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-19/country=CN                                               
-rw-r--r--   3 hadoop supergroup         33 2022-09-19 15:37 /user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt

文件内容

$ hdfs dfs -cat /user/hive/warehouse/t_partition/dt=2021-09-18/country=US/t_partition1.txt

2022-09-19 16:10:15,043 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false                           
2019,02,11                                                                                                                                                           
2020,04,13                                                                                                                                                           
2020,09,12  

$ hdfs dfs -cat /user/hive/warehouse/t_partition/dt=2021-09-19/country=CN/t_partition2.txt

2022-09-19 16:12:29,363 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false                           
2018,03,15                                                                                                                                                           
2021,10,01                                                                                                                                                           
2021,08,21  

  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值