静态分区表和动态分区表

最新推荐文章于 2024-08-01 08:58:56 发布

逗点儿

最新推荐文章于 2024-08-01 08:58:56 发布

阅读量2.5k

点赞数

分类专栏： hive 文章标签：数据

本文链接：https://blog.csdn.net/weixin_39216383/article/details/79018633

版权

hive 专栏收录该内容

13 篇文章 1 订阅

订阅专栏

在Hive中处理数据时，当处理的一张表的数据量过大的时候，每次查询都是遍历整张表，显然对于计算机来说，是负担比较重的。所以我们可不可以针对数据进行分类，查询时只遍历该分类中的数据，这样就能有效的解决问题。所以就会Hive在表的架构下，就会有分区的这个概念，就是为了满足此需求。

静态分区表

静态分区表的创建

create table order_partition(
number string,
time string
)
partitioned by (event_month string)//这是分区表的标志
row format delimited fields terminated by "\t";

导入数据到静态分区表

load data local inpath "/home/hadoop/data/date"
overwrite into table order_partition
partition (event_month='2014.5')

查看静态分区表

hive> show partitions order_partition;
OK
event_month=2015.5
event_month=2015.6
Time taken: 8.08 seconds, Fetched: 2 row(s)
hadoop:hadoop:/home/hadoop:>hadoop fs -ls /user/hive/warehouse/order_partition
18/01/09 09:10:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-09 06:12 /user/hive/warehouse/order_partition/event_month=2015.5
-rw-r--r--   1 hadoop supergroup         93 2018-01-09 06:28 /user/hive/warehouse/order_partition/event_month=2015.6
drwxr-xr-x   - hadoop supergroup          0 2018-01-09 07:02 /user/hive/warehouse/order_partition/event_month=2015.7
mysql> show tables
    -> ;
+---------------------------+
| Tables_in_hivedb          |
+---------------------------+
| bucketing_cols            |
| cds                       |
| columns_v2                |
| database_params           |
| db_privs                  |
| dbs                       |
| func_ru                   |
| funcs                     |
| global_privs              |
| idxs                      |
| index_params              |
| part_col_privs            |
| part_col_stats            |
| part_privs                |
| partition_key_vals        |
| partition_keys            |
| partition_params          |
| partitions                |
| roles                     |
| sd_params                 |
| sds                       |
| sequence_table            |
| serde_params              |
| serdes                    |
| skewed_col_names          |
| skewed_col_value_loc_map  |
| skewed_string_list        |
| skewed_string_list_values |
| skewed_values             |
| sort_cols                 |
| tab_col_stats             |
| table_params              |
| tbl_col_privs             |
| tbl_privs                 |
| tbls                      |
| version                   |
+---------------------------+
36 rows in set (0.07 sec)

mysql> select * from partitions; 
+---------+-------------+------------------+---------------------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME                 | SD_ID | TBL_ID |
+---------+-------------+------------------+---------------------------+-------+--------+
|       1 |  1515294650 |                0 | month=2014-05             |    23 |     22 |
|       2 |  1515297190 |                0 | month=2015.5              |    25 |     23 |
|       6 |  1515496331 |                0 | event_month=2015.5        |    52 |     51 |
|       7 |  1515498310 |                0 | event_month=2015.6        |    53 |     51 |
|       8 |  1515499813 |                0 | event_month=2014.8/step=1 |    55 |     52 |
|      11 |  1515502691 |                0 | deptno=20                 |    61 |     54 |
|      12 |  1515502692 |                0 | deptno=10                 |    62 |     54 |
|      13 |  1515502693 |                0 | deptno=30                 |    63 |     54 |
+---------+-------------+------------------+---------------------------+-------+--------+
8 rows in set (0.04 sec)

动态分区表
我们已经会创建静态分区表了，但是大家有没有发现一个问题，从上一个数据来看，按照时间来分区，一个月为一个分区，如果两三年就需要建立二三十个分区。这样是不是太费事了，所以我们需要一种可以以某个字段为条件，自动建立分区表的方式，这就是动态分区表，语法与静态类似。

动态分区表的创建

create table emp_dy_partition(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double
)
partitioned by (deptno int)
row format delimited fields terminated by "\t";//以deptno字段为条件

导入数据到动态分区表

 insert into table emp_dy_partition partition(deptno)
 select empno,ename,job,mgr,hiredate,sal,comm,deptno from emp;
 //限制最后一个字段应为分区的条件

查看动态分区表

hive> show partitions emp_dy_partition;
OK
deptno=10
deptno=20
deptno=30
Time taken: 0.225 seconds, Fetched: 3 row(s)
hadoop:hadoop:/home/hadoop:>hadoop fs -ls /user/hive/warehouse/emp_dy_partition

18/01/09 09:21:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-09 07:58 /user/hive/warehouse/emp_dy_partition/deptno=10
drwxr-xr-x   - hadoop supergroup          0 2018-01-09 07:58 /user/hive/warehouse/emp_dy_partition/deptno=20
drwxr-xr-x   - hadoop supergroup          0 2018-01-09 07:58 /user/hive/warehouse/emp_dy_partition/deptno=30
mysql> select * from partitions; 
+---------+-------------+------------------+---------------------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME                 | SD_ID | TBL_ID |
+---------+-------------+------------------+---------------------------+-------+--------+
|       1 |  1515294650 |                0 | month=2014-05             |    23 |     22 |
|       2 |  1515297190 |                0 | month=2015.5              |    25 |     23 |
|       6 |  1515496331 |                0 | event_month=2015.5        |    52 |     51 |
|       7 |  1515498310 |                0 | event_month=2015.6        |    53 |     51 |
|       8 |  1515499813 |                0 | event_month=2014.8/step=1 |    55 |     52 |
|      11 |  1515502691 |                0 | deptno=20                 |    61 |     54 |
|      12 |  1515502692 |                0 | deptno=10                 |    62 |     54 |
|      13 |  1515502693 |                0 | deptno=30                 |    63 |     54 |
+---------+-------------+------------------+---------------------------+-------+--------+
8 rows in set (0.04 sec)

静态和动态望灵活运用！

若泽大数据交流群：671914634