1 Partitioned Tables
A partitioned table corresponds to a separate directory on the HDFS file system, and that directory holds all of the partition's data files. A Hive partition is simply a subdirectory: a large dataset is split into smaller datasets according to business needs.
- At query time, an expression in the WHERE clause selects only the partitions the query needs, which greatly improves query efficiency.
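On HDFS, each partition value becomes its own key=value subdirectory under the table directory. A rough sketch of the layout for a table partitioned by month (hypothetical table and file names, default warehouse location):

/user/hive/warehouse/some_table/month=201901/data_1.txt
/user/hive/warehouse/some_table/month=201902/data_2.txt

A query that filters on month only needs to scan the matching subdirectories.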
1.1 Creating a Partitioned Table
create external table if not exists default.emp_partition(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
partitioned by (month string)
row format delimited fields terminated by '\t';
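Note that the partition column month is not stored inside the data files; it exists only as a directory name, but it can be queried like a regular column. Describing the table lists it under the partition information:

hive (default)> desc emp_partition;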
1.2 Loading Data into a Partitioned Table
load data local inpath '/home/hadoop/emp.txt' into table default.emp_partition
partition (month='201901');
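The load places emp.txt under a month=201901 subdirectory of the table. This can be verified from the Hive CLI (assuming the default warehouse location):

hive (default)> dfs -ls /user/hive/warehouse/emp_partition/month=201901;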
1.3 Querying a Partitioned Table
select * from emp_partition where month='201901';
1.3.1 Partition Query Example
select count(distinct ename) from emp_partition where month='201810'
union
select count(distinct ename) from emp_partition where month='201811'
union
select count(distinct ename) from emp_partition where month='201812';
A query like this can be saved to a SQL file and executed from the shell:
bin/hive -f xx.sql
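An equivalent sketch that covers the same partitions in a single statement (the IN filter on the partition column should still prune down to the three month directories):

select month, count(distinct ename) from emp_partition
where month in ('201810', '201811', '201812')
group by month;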
1.4 Multi-Level Partitions
create external table if not exists default.emp_partition2(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
partitioned by (month string,day string)
row format delimited fields terminated by '\t';
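Loading data into a multi-level partitioned table must specify a value for every partition key (a sketch, reusing the emp.txt file from 1.2):

load data local inpath '/home/hadoop/emp.txt' into table default.emp_partition2
partition (month='201901', day='1');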
select * from emp_partition2 where month='201901' and day='1';
2 Notes on Partitioned Tables
2.1 Non-Partitioned Table
- Create a non-partitioned table
create table if not exists default.dept_nopart(
deptno int,
dname string,
loc string
) row format delimited fields terminated by '\t';
- Upload data directly into the table directory
hive (default)> dfs -put /home/hadoop/dept.txt /user/hive/warehouse/dept_nopart;
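Because dept_nopart is not partitioned, the uploaded file is queryable right away; no extra metadata step is needed:

hive (default)> select * from dept_nopart;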
2.2 Partitioned Table
- Create a partitioned table
create table if not exists default.dept_part(
deptno int,
dname string,
loc string
)
partitioned by (day string)
row format delimited fields terminated by '\t';
- Create the partition directory manually
hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_part/day=20190117;
- Upload data into the partition directory
hive (default)> dfs -put /home/hadoop/dept.txt /user/hive/warehouse/dept_part/day=20190117;
- Querying the table returns no data (see the query below)
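For example:

hive (default)> select * from dept_part where day='20190117';

No rows come back: the directory exists on HDFS, but the metastore does not yet know about the day=20190117 partition.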
2.3 Checking Hive Partition Metadata in MySQL
- Log in to MySQL
mysql -uroot -proot
mysql> use hivemetastore;
mysql> show tables;
+---------------------------+
| Tables_in_hivemetastore |
+---------------------------+
| BUCKETING_COLS |
| CDS |
| COLUMNS_V2 |
| DATABASE_PARAMS |
| DBS |
| DB_PRIVS |
| FUNCS |
| FUNC_RU |
| GLOBAL_PRIVS |
| IDXS |
| INDEX_PARAMS |
| PARTITIONS |
| PARTITION_KEYS |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS |
| PART_COL_PRIVS |
| PART_COL_STATS |
| PART_PRIVS |
| ROLES |
| SDS |
| SD_PARAMS |
| SEQUENCE_TABLE |
| SERDES |
| SERDE_PARAMS |
| SKEWED_COL_NAMES |
| SKEWED_COL_VALUE_LOC_MAP |
| SKEWED_STRING_LIST |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES |
| SORT_COLS |
| TABLE_PARAMS |
| TAB_COL_STATS |
| TBLS |
| TBL_COL_PRIVS |
| TBL_PRIVS |
| VERSION |
+---------------------------+
36 rows in set (0.00 sec)
- The partition we just created by hand is not recorded in any of these tables (the PARTITIONS table has no entry for it)
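A direct look at the partition metadata (a sketch, assuming the hivemetastore database shown above):

mysql> select * from PARTITIONS;

The manually created day=20190117 partition does not appear in the result.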
2.4 Repairing the Partitioned Table
2.4.1 Method 1 (common)
hive (default)> msck repair table dept_part;
OK
Partitions not in metastore: dept_part:day=20190117
Repair: Added partition to metastore dept_part:day=20190117
Time taken: 0.103 seconds, Fetched: 2 row(s)
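After the repair, the partition is registered in the metastore and the earlier query returns data:

hive (default)> select * from dept_part where day='20190117';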
2.4.2 Method 2 (less common)
- Create another partition directory
hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_part/day=20190118;
- Upload data
hive (default)> dfs -put /home/hadoop/dept.txt /user/hive/warehouse/dept_part/day=20190118;
- Register the partition in the metastore
alter table dept_part add partition(day='20190118');
2.5 Viewing Partitions
hive (default)> show partitions dept_part;
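With the two partitions added above, the output should look roughly like:

day=20190117
day=20190118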