静态分区表和动态分区表

在Hive中处理数据时,当处理的一张表的数据量过大的时候,每次查询都是遍历整张表,显然对于计算机来说,是负担比较重的。所以我们可不可以针对数据进行分类,查询时只遍历该分类中的数据,这样就能有效的解决问题。所以就会Hive在表的架构下,就会有分区的这个概念,就是为了满足此需求。

  • 静态分区表

    静态分区表的创建

create table order_partition(
number string,
time string
)
partitioned by (event_month string)//这是分区表的标志
row format delimited fields terminated by "\t";

导入数据到静态分区表

load data local inpath "/home/hadoop/data/date"
overwrite into table order_partition
partition (event_month='2014.5')

查看静态分区表

hive> show partitions order_partition;
OK
event_month=2015.5
event_month=2015.6
Time taken: 8.08 seconds, Fetched: 2 row(s)
hadoop:hadoop:/home/hadoop:>hadoop fs -ls /user/hive/warehouse/order_partition
18/01/09 09:10:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-09 06:12 /user/hive/warehouse/order_partition/event_month=2015.5
-rw-r--r--   1 hadoop supergroup         93 2018-01-09 06:28 /user/hive/warehouse/order_partition/event_month=2015.6
drwxr-xr-x   - hadoop supergroup          0 2018-01-09 07:02 /user/hive/warehouse/order_partition/event_month=2015.7
mysql> show tables
    -> ;
+---------------------------+
| Tables_in_hivedb          |
+---------------------------+
| bucketing_cols            |
| cds                       |
| columns_v2                |
| database_params           |
| db_privs                  |
| dbs                       |
| func_ru                   |
| funcs                     |
| global_privs              |
| idxs                      |
| index_params              |
| part_col_privs            |
| part_col_stats            |
| part_privs                |
| partition_key_vals        |
| partition_keys            |
| partition_params          |
| partitions                |
| roles                     |
| sd_params                 |
| sds                       |
| sequence_table            |
| serde_params              |
| serdes                    |
| skewed_col_names          |
| skewed_col_value_loc_map  |
| skewed_string_list        |
| skewed_string_list_values |
| skewed_values             |
| sort_cols                 |
| tab_col_stats             |
| table_params              |
| tbl_col_privs             |
| tbl_privs                 |
| tbls                      |
| version                   |
+---------------------------+
36 rows in set (0.07 sec)

mysql> select * from partitions; 
+---------+-------------+------------------+---------------------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME                 | SD_ID | TBL_ID |
+---------+-------------+------------------+---------------------------+-------+--------+
|       1 |  1515294650 |                0 | month=2014-05             |    23 |     22 |
|       2 |  1515297190 |                0 | month=2015.5              |    25 |     23 |
|       6 |  1515496331 |                0 | event_month=2015.5        |    52 |     51 |
|       7 |  1515498310 |                0 | event_month=2015.6        |    53 |     51 |
|       8 |  1515499813 |                0 | event_month=2014.8/step=1 |    55 |     52 |
|      11 |  1515502691 |                0 | deptno=20                 |    61 |     54 |
|      12 |  1515502692 |                0 | deptno=10                 |    62 |     54 |
|      13 |  1515502693 |                0 | deptno=30                 |    63 |     54 |
+---------+-------------+------------------+---------------------------+-------+--------+
8 rows in set (0.04 sec)
  • 动态分区表
    我们已经会创建静态分区表了,但是大家有没有发现一个问题,从上一个数据来看,按照时间来分区,一个月为一个分区,如果两三年就需要建立二三十个分区。这样是不是太费事了,所以我们需要一种可以以某个字段为条件,自动建立分区表的方式,这就是动态分区表,语法与静态类似。

    动态分区表的创建

create table emp_dy_partition(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double
)
partitioned by (deptno int)
row format delimited fields terminated by "\t";//以deptno字段为条件

导入数据到动态分区表

 insert into table emp_dy_partition partition(deptno)
 select empno,ename,job,mgr,hiredate,sal,comm,deptno from emp;
 //限制最后一个字段应为分区的条件

查看动态分区表

hive> show partitions emp_dy_partition;
OK
deptno=10
deptno=20
deptno=30
Time taken: 0.225 seconds, Fetched: 3 row(s)
hadoop:hadoop:/home/hadoop:>hadoop fs -ls /user/hive/warehouse/emp_dy_partition

18/01/09 09:21:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-09 07:58 /user/hive/warehouse/emp_dy_partition/deptno=10
drwxr-xr-x   - hadoop supergroup          0 2018-01-09 07:58 /user/hive/warehouse/emp_dy_partition/deptno=20
drwxr-xr-x   - hadoop supergroup          0 2018-01-09 07:58 /user/hive/warehouse/emp_dy_partition/deptno=30
mysql> select * from partitions; 
+---------+-------------+------------------+---------------------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME                 | SD_ID | TBL_ID |
+---------+-------------+------------------+---------------------------+-------+--------+
|       1 |  1515294650 |                0 | month=2014-05             |    23 |     22 |
|       2 |  1515297190 |                0 | month=2015.5              |    25 |     23 |
|       6 |  1515496331 |                0 | event_month=2015.5        |    52 |     51 |
|       7 |  1515498310 |                0 | event_month=2015.6        |    53 |     51 |
|       8 |  1515499813 |                0 | event_month=2014.8/step=1 |    55 |     52 |
|      11 |  1515502691 |                0 | deptno=20                 |    61 |     54 |
|      12 |  1515502692 |                0 | deptno=10                 |    62 |     54 |
|      13 |  1515502693 |                0 | deptno=30                 |    63 |     54 |
+---------+-------------+------------------+---------------------------+-------+--------+
8 rows in set (0.04 sec)

静态和动态望灵活运用!

若泽大数据交流群:671914634

评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值