The Concept of Predicate Pushdown
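Predicate Pushdown (PPD)
: In short, it means applying filter conditions as early as possible, provided the result is not affected. Once a predicate is pushed down, the filter runs on the map side, which reduces the map output, lowers the amount of data transferred across the cluster, saves cluster resources, and improves job performance.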
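The idea can be illustrated with a minimal sketch in plain Python. It is not how Hive implements PPD; the toy tables `orders` and `users` and the helper `join` are hypothetical names introduced only for this illustration. It shows that, for an inner join, filtering before the join gives the same result as filtering afterwards while feeding fewer rows into the join.

```python
# Hypothetical toy data: orders are (user_id, amount), users are (user_id, country).
orders = [(1, 10), (2, 20), (3, 30), (1, 40)]
users = [(1, "CN"), (2, "US"), (3, "CN")]


def join(left, right):
    """Naive inner join on the first column of each row."""
    return [(l[0], l[1], r[1]) for l in left for r in right if l[0] == r[0]]


# Filter applied after the join: all 4 order rows are fed into the join.
late = [row for row in join(orders, users) if row[1] > 15]

# Filter pushed down below the join: only the surviving order rows are joined.
pushed_input = [o for o in orders if o[1] > 15]
early = join(pushed_input, users)

print(late == early)                   # same result either way
print(len(orders), len(pushed_input))  # rows fed into the join: before vs. after pushdown
```

In a distributed job, the rows removed early are rows that never have to be written out by the map side or sent over the network.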
PPD Configuration
PPD
Control parameter: hive.optimize.ppd
- Default Value: true
- Added In: Hive 0.4.0
Related Definitions
- Preserved Row table
The table in an Outer Join that must return all rows.
For left outer joins this is the Left table, for right outer joins it is the Right table, and for full outer joins both tables are Preserved Row tables.
- Null Supplying table
This is the table that has nulls filled in for its columns in unmatched rows.
In the non-full outer join case, this is the other table in the Join. For full outer joins both tables are also Null Supplying tables.
- During Join predicate
A predicate that is in the JOIN ON clause.
For example, in ‘R1 join R2 on R1.x = 5’ the predicate ‘R1.x = 5’ is a During Join predicate.
- After Join predicate
A predicate that is in the WHERE clause.
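The difference between a During Join predicate and an After Join predicate is easiest to see on an outer join. The sketch below uses Python's built-in `sqlite3` module rather than Hive, on the assumption that standard SQL outer join semantics apply in both. The tables `r1` and `r2` and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r1 (x INTEGER);
    CREATE TABLE r2 (x INTEGER);
    INSERT INTO r1 VALUES (5), (6);
    INSERT INTO r2 VALUES (5), (6);
""")

# In a left outer join, r1 is the Preserved Row table
# and r2 is the Null Supplying table.

# During Join predicate: r1.x = 5 sits in the ON clause.
# Every r1 row is still returned; rows that fail the ON condition get NULLs for r2.
during_join = conn.execute(
    "SELECT r1.x, r2.x FROM r1 LEFT OUTER JOIN r2 "
    "ON r1.x = r2.x AND r1.x = 5 ORDER BY r1.x"
).fetchall()

# After Join predicate: r1.x = 5 sits in the WHERE clause.
# It filters the joined rows, so the r1 row with x = 6 is dropped.
after_join = conn.execute(
    "SELECT r1.x, r2.x FROM r1 LEFT OUTER JOIN r2 "
    "ON r1.x = r2.x WHERE r1.x = 5 ORDER BY r1.x"
).fetchall()

print(during_join)  # [(5, 5), (6, None)]
print(after_join)   # [(5, 5)]
```

Because the two placements return different rows, an optimizer cannot treat them interchangeably when deciding whether a predicate may be evaluated before the join.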
PPD Rules:
The rules can be stated logically as follows:
- During Join predicates cannot be pushed past Preserved Row tabl