Learning how PostgreSQL's hash join works

weixin_33962923

Article tags: database

Getting started

An article by momjian, a well-known figure in the PostgreSQL community, gives the following pseudocode for a hash join:

for (j = 0; j < length(inner); j++)
    hash_key = hash(inner[j]);
    append(hash_store[hash_key], inner[j]);
for (i = 0; i < length(outer); i++)
    hash_key = hash(outer[i]);
    for (j = 0; j < length(hash_store[hash_key]); j++)
        if (outer[i] == hash_store[hash_key][j])
            output(outer[i], inner[j]);

To make it easier to follow, here it is again with my own comments added:

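// Build the hash table from the inner table (kept in memory)
for (j = 0; j < length(inner); j++)
{
    hash_key = hash(inner[j]);
    append(hash_store[hash_key], inner[j]);
}

// Iterate over every element of the outer table
for (i = 0; i < length(outer); i++)
{
    // Hash the current outer element to get its hash_key
    hash_key = hash(outer[i]);

    // Probe the hash table with that hash_key (assuming the key exists in the hash table).
    // The loop runs to length(hash_store[hash_key]) because, once the hash table
    // has been built, several elements may be stored under one key.
    // For example: hash_key 100 holds a, c, e; hash_key 200 holds d; hash_key 300 holds f.
    // So the loop below walks only the elements stored under the hash_key just computed.
    for (j = 0; j < length(hash_store[hash_key]); j++)
    {
        // On a match, output one result row.
        // j indexes the bucket, so the matching inner row is hash_store[hash_key][j];
        // the quoted pseudocode writes inner[j] here.
        if (outer[i] == hash_store[hash_key][j])
            output(outer[i], hash_store[hash_key][j]);
    }
}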
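
The build and probe phases above can be tried out with a small runnable sketch. This is only an illustration of the pseudocode, not PostgreSQL's actual implementation: the function name `hash_join`, the fixed `nbuckets` value, and the sample rows are all assumptions made for this example, and NULL handling is ignored.

```python
def hash_join(outer, inner, key, nbuckets=8):
    """Equi-join two lists of dict rows on `key` (hypothetical helper)."""
    # Build phase: place every inner row into a bucket chosen by its hash.
    hash_store = [[] for _ in range(nbuckets)]
    for inner_row in inner:
        hash_store[hash(inner_row[key]) % nbuckets].append(inner_row)

    # Probe phase: hash each outer row and scan only the matching bucket.
    result = []
    for outer_row in outer:
        bucket = hash_store[hash(outer_row[key]) % nbuckets]
        for inner_row in bucket:
            # Different keys can share a bucket, so compare the real values.
            if outer_row[key] == inner_row[key]:
                result.append((outer_row, inner_row))
    return result


# Made-up sample data: deptno 1, 9 and 17 all land in the same bucket
# when nbuckets is 8, which exercises the inner comparison loop.
employees = [
    {"name": "alice", "deptno": 1},
    {"name": "bob", "deptno": 2},
    {"name": "carol", "deptno": 9},
    {"name": "dave", "deptno": 3},
]
departments = [
    {"deptno": 1, "deptname": "sales"},
    {"deptno": 9, "deptname": "ops"},
    {"deptno": 17, "deptname": "lab"},
]

pairs = [(o["name"], i["deptname"])
         for o, i in hash_join(employees, departments, "deptno")]
print(pairs)
```

Because three department keys collide in one bucket, the equality check inside the probe loop is what keeps alice from being paired with "ops" or "lab".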

[Author: 技术者高健@博客园, mail: luckyjackgao@gmail.com]

Let's try it in practice:

postgres=# \d employee
          Table "public.employee"
 Column |         Type          | Modifiers 
--------+-----------------------+-----------
 id     | integer               | 
 name   | character varying(20) | 
 deptno | integer               | 
 age    | integer               | 
Indexes:
    "idx_id_dept" btree (id, deptno)

postgres=# \d deptment
           Table "public.deptment"
  Column  |         Type          | Modifiers 
----------+-----------------------+-----------
 deptno   | integer               | 
 deptname | character varying(20) | 

postgres=# 

postgres=# select count(*) from employee;
 count 
-------
  1000
(1 row)

postgres=# select count(*) from deptment;
 count 
-------
   102
(1 row)

postgres=#

Execution plan:

postgres=# explain select a.name, b.deptname from employee a, deptment b where a.deptno=b.deptno;
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Hash Join  (cost=3.29..34.05 rows=1000 width=14)
   Hash Cond: (a.deptno = b.deptno)
   ->  Seq Scan on employee a  (cost=0.00..17.00 rows=1000 width=10)
   ->  Hash  (cost=2.02..2.02 rows=102 width=12)
         ->  Seq Scan on deptment b  (cost=0.00..2.02 rows=102 width=12)
(5 rows)

postgres=#
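
In this plan the Hash node sits on top of the scan of deptment (102 rows), so deptment is the side that gets hashed, while employee (1000 rows) is scanned sequentially and used to probe the hash table. The sketch below mimics that shape in Python. The row contents are invented for illustration (the real table data is not shown in this article), and the assumption that every employee has a deptno between 1 and 102 is mine.

```python
from collections import defaultdict

# Hypothetical rows shaped like the two tables: (deptno, deptname) and
# (id, name, deptno, age). Only the row counts come from the article.
deptment = [(deptno, f"dept{deptno}") for deptno in range(1, 103)]
employee = [(i, f"emp{i}", (i % 102) + 1, 20 + i % 40) for i in range(1, 1001)]

# "Hash" node: build the in-memory hash table from the smaller table.
hash_table = defaultdict(list)
for deptno, deptname in deptment:
    hash_table[deptno].append(deptname)

# "Hash Join" node: scan employee once and probe with a.deptno = b.deptno.
rows = []
for _id, name, deptno, _age in employee:
    # .get() avoids creating empty entries for keys that have no match.
    for deptname in hash_table.get(deptno, []):
        rows.append((name, deptname))

print(len(hash_table), len(rows))
```

With this made-up distribution each employee matches exactly one department, so the join yields 1000 rows, which happens to line up with the planner's rows=1000 estimate; real data could of course give a different count.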