在Hive中处理数据时,当处理的一张表的数据量过大的时候,每次查询都是遍历整张表,显然对于计算机来说,是负担比较重的。所以我们可不可以针对数据进行分类,查询时只遍历该分类中的数据,这样就能有效的解决问题。所以就会Hive在表的架构下,就会有分区的这个概念,就是为了满足此需求。
静态分区表
静态分区表的创建
create table order_partition(
number string,
time string
)
partitioned by (event_month string)//这是分区表的标志
row format delimited fields terminated by "\t";
导入数据到静态分区表
load data local inpath "/home/hadoop/data/date"
overwrite into table order_partition
partition (event_month='2014.5')
查看静态分区表
hive> show partitions order_partition;
OK
event_month=2015.5
event_month=2015.6
Time taken: 8.08 seconds, Fetched: 2 row(s)
hadoop:hadoop:/home/hadoop:>hadoop fs -ls /user/hive/warehouse/order_partition
18/01/09 09:10:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxr-xr-x - hadoop supergroup 0 2018-01-09 06:12 /user/hive/warehouse/order_partition/event_month=2015.5
-rw-r--r-- 1 hadoop supergroup 93 2018-01-09 06:28 /user/hive/warehouse/order_partition/event_month=2015.6
drwxr-xr-x - hadoop supergroup 0 2018-01-09 07:02 /user/hive/warehouse/order_partition/event_month=2015.7
mysql> show tables
-> ;
+---------------------------+
| Tables_in_hivedb |
+---------------------------+
| bucketing_cols |
| cds |
| columns_v2 |
| database_params |
| db_privs |
| dbs |
| func_ru |
| funcs |
| global_privs |
| idxs |
| index_params |
| part_col_privs |
| part_col_stats |
| part_privs |
| partition_key_vals |
| partition_keys |
| partition_params |
| partitions |
| roles |
| sd_params |
| sds |
| sequence_table |
| serde_params |
| serdes |
| skewed_col_names |
| skewed_col_value_loc_map |
| skewed_string_list |
| skewed_string_list_values |
| skewed_values |
| sort_cols |
| tab_col_stats |
| table_params |
| tbl_col_privs |
| tbl_privs |
| tbls |
| version |
+---------------------------+
36 rows in set (0.07 sec)
mysql> select * from partitions;
+---------+-------------+------------------+---------------------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+---------------------------+-------+--------+
| 1 | 1515294650 | 0 | month=2014-05 | 23 | 22 |
| 2 | 1515297190 | 0 | month=2015.5 | 25 | 23 |
| 6 | 1515496331 | 0 | event_month=2015.5 | 52 | 51 |
| 7 | 1515498310 | 0 | event_month=2015.6 | 53 | 51 |
| 8 | 1515499813 | 0 | event_month=2014.8/step=1 | 55 | 52 |
| 11 | 1515502691 | 0 | deptno=20 | 61 | 54 |
| 12 | 1515502692 | 0 | deptno=10 | 62 | 54 |
| 13 | 1515502693 | 0 | deptno=30 | 63 | 54 |
+---------+-------------+------------------+---------------------------+-------+--------+
8 rows in set (0.04 sec)
动态分区表
我们已经会创建静态分区表了,但是大家有没有发现一个问题,从上一个数据来看,按照时间来分区,一个月为一个分区,如果两三年就需要建立二三十个分区。这样是不是太费事了,所以我们需要一种可以以某个字段为条件,自动建立分区表的方式,这就是动态分区表,语法与静态类似。动态分区表的创建
create table emp_dy_partition(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double
)
partitioned by (deptno int)
row format delimited fields terminated by "\t";//以deptno字段为条件
导入数据到动态分区表
insert into table emp_dy_partition partition(deptno)
select empno,ename,job,mgr,hiredate,sal,comm,deptno from emp;
//限制最后一个字段应为分区的条件
查看动态分区表
hive> show partitions emp_dy_partition;
OK
deptno=10
deptno=20
deptno=30
Time taken: 0.225 seconds, Fetched: 3 row(s)
hadoop:hadoop:/home/hadoop:>hadoop fs -ls /user/hive/warehouse/emp_dy_partition
18/01/09 09:21:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxr-xr-x - hadoop supergroup 0 2018-01-09 07:58 /user/hive/warehouse/emp_dy_partition/deptno=10
drwxr-xr-x - hadoop supergroup 0 2018-01-09 07:58 /user/hive/warehouse/emp_dy_partition/deptno=20
drwxr-xr-x - hadoop supergroup 0 2018-01-09 07:58 /user/hive/warehouse/emp_dy_partition/deptno=30
mysql> select * from partitions;
+---------+-------------+------------------+---------------------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+---------------------------+-------+--------+
| 1 | 1515294650 | 0 | month=2014-05 | 23 | 22 |
| 2 | 1515297190 | 0 | month=2015.5 | 25 | 23 |
| 6 | 1515496331 | 0 | event_month=2015.5 | 52 | 51 |
| 7 | 1515498310 | 0 | event_month=2015.6 | 53 | 51 |
| 8 | 1515499813 | 0 | event_month=2014.8/step=1 | 55 | 52 |
| 11 | 1515502691 | 0 | deptno=20 | 61 | 54 |
| 12 | 1515502692 | 0 | deptno=10 | 62 | 54 |
| 13 | 1515502693 | 0 | deptno=30 | 63 | 54 |
+---------+-------------+------------------+---------------------------+-------+--------+
8 rows in set (0.04 sec)
静态和动态望灵活运用!
若泽大数据交流群:671914634