A Brief Look at the Differences Between Merge Join and Hash Join

Merge Joins

Sort merge joins can be used to join rows from two independent sources. Hash joins generally perform better than sort merge joins. On the other hand, sort merge joins can perform better than hash joins if both of the following conditions exist:

  • The row sources are sorted already.
  • A sort operation does not have to be done.

However, if a sort merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.

Sort merge joins are useful when the join condition between two tables is an inequality condition (but not a nonequality) like <, <=, >, or >=. Sort merge joins perform better than nested loop joins for large data sets. You cannot use hash joins unless there is an equality condition.
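To illustrate why sorted inputs help with an inequality condition, here is a minimal Python sketch of a join on left < right. It is only an illustration, not Oracle's implementation; the function name sort_merge_less_than and the use of bare key values instead of full rows are assumptions made for brevity. Because both inputs are sorted, the position of the first matching right key only ever moves forward.

```python
def sort_merge_less_than(left_keys, right_keys):
    """Return all (l, r) pairs with l < r (hypothetical helper, illustration only)."""
    left_sorted = sorted(left_keys)
    right_sorted = sorted(right_keys)
    pairs = []
    start = 0  # index of the first right key strictly greater than the current left key
    for l in left_sorted:
        # Left keys arrive in ascending order, so `start` never has to move backward.
        while start < len(right_sorted) and right_sorted[start] <= l:
            start += 1
        # Every right key from `start` onward satisfies l < r.
        for r in right_sorted[start:]:
            pairs.append((l, r))
    return pairs
```

A hash table gives no such ordering, since equal hash values say nothing about which key is larger, which is consistent with hash joins requiring an equality condition.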

In a merge join, there is no concept of a driving table. The join consists of two steps:

  1. Sort join operation: Both the inputs are sorted on the join key.
  2. Merge join operation: The sorted lists are merged together.

If the input is already sorted by the join column, then a sort join operation is not performed for that row source.
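The two steps can be sketched in Python as follows. This is a simplified model under stated assumptions, not Oracle's implementation: rows are plain dicts, and the function name sort_merge_join and the flags left_sorted and right_sorted are hypothetical, introduced only to show how the sort step can be skipped for an input that is already ordered on the join key.

```python
from operator import itemgetter


def sort_merge_join(left, right, key, left_sorted=False, right_sorted=False):
    """Equi-join two lists of dicts on `key` (hypothetical helper, illustration only)."""
    get_key = itemgetter(key)

    # Step 1: sort join operation, skipped for an input already in join-key order.
    if not left_sorted:
        left = sorted(left, key=get_key)
    if not right_sorted:
        right = sorted(right, key=get_key)

    # Step 2: merge join operation, advancing through both sorted lists together.
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = get_key(left[i]), get_key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and get_key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row carrying this key with the whole run.
            while i < len(left) and get_key(left[i]) == lk:
                for r in right[j:j_end]:
                    joined.append({**left[i], **r})
                i += 1
            j = j_end
    return joined
```

Neither input plays the role of a driving table here: both lists are consumed in step, which matches the description above.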

The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:

  • The join condition between two tables is not an equi-join.
  • OPTIMIZER_MODE is set to RULE.
  • HASH_JOIN_ENABLED is false.
  • Because of sorts already required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.
  • The optimizer thinks that the cost of a hash join is higher, based on the settings of HASH_AREA_SIZE and SORT_AREA_SIZE.

To advise the optimizer to use a sort merge join, apply the USE_MERGE hint. You might also need to give hints to force an access path.

There are situations where it is better to override the optimizer with the USE_MERGE hint. For example, the optimizer can choose a full scan on a table and avoid a sort operation in a query. However, there is an increased cost because a large table is accessed through an index and single block reads, as opposed to faster access through a full table scan.

Hash Joins

Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.

This method is best used when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.
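The build and probe phases for the case where the smaller input fits in memory can be sketched in Python like this. It is a minimal model, not Oracle's implementation: rows are plain dicts, and the function name hash_join and its parameter names are assumptions made for illustration.

```python
from collections import defaultdict


def hash_join(small, large, key):
    """In-memory equi-join of two lists of dicts (hypothetical helper, illustration only)."""
    # Build phase: one pass over the smaller input, hashing rows on the join key.
    table = defaultdict(list)
    for row in small:
        table[row[key]].append(row)

    # Probe phase: one pass over the larger input, looking up each join key.
    joined = []
    for row in large:
        for match in table.get(row[key], ()):
            joined.append({**match, **row})
    return joined
```

Each input is read exactly once, which corresponds to the single read pass mentioned above.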

However, if the hash table grows too big to fit into memory, then the optimizer breaks it up into different partitions. As the partitions exceed allocated memory, parts are written to temporary segments on disk. Larger temporary extent sizes lead to improved I/O when writing the partitions to disk; the recommended temporary extent is about 1 MB. Temporary extent size is specified by INITIAL and NEXT for permanent tablespaces and by UNIFORM SIZE for temporary tablespaces.

After the hash table is complete, the following processes occur:

  1. The second, larger table is scanned.
  2. It is broken up into partitions like the smaller table.
  3. The partitions are written to disk.

When the hash table build is complete, it is possible that an entire hash table partition is resident in memory. Then, you do not need to build the corresponding partition for the second (larger) table. When that table is scanned, rows that hash to the resident hash table partition can be joined and returned immediately.

Each hash table partition is then read into memory, and the following processes occur:

  1. The corresponding partition for the second table is scanned.
  2. The hash table is probed to return the joined rows.

This process is repeated for the rest of the partitions. The cost can increase to two read passes over the data and one write pass over the data.
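The partition-by-partition processing can be sketched in Python as below. This is a simplified model under stated assumptions, not Oracle's implementation: the in-memory lists small_parts and large_parts stand in for partitions written to temporary segments on disk, every partition is treated as spilled (none is kept resident), and the function name partitioned_hash_join and the num_partitions parameter are hypothetical.

```python
from collections import defaultdict


def partitioned_hash_join(small, large, key, num_partitions=4):
    """Equi-join by hash partitions (hypothetical helper, illustration only)."""
    # First pass: split both inputs by a hash of the join key, so that rows
    # with equal keys always land in the same partition number.
    # These lists model partitions written to disk.
    small_parts = [[] for _ in range(num_partitions)]
    large_parts = [[] for _ in range(num_partitions)]
    for row in small:
        small_parts[hash(row[key]) % num_partitions].append(row)
    for row in large:
        large_parts[hash(row[key]) % num_partitions].append(row)

    # Second pass: read each pair of partitions back and join it on its own.
    joined = []
    for build_rows, probe_rows in zip(small_parts, large_parts):
        table = defaultdict(list)  # hash table for this partition only
        for row in build_rows:
            table[row[key]].append(row)
        for row in probe_rows:
            for match in table.get(row[key], ()):
                joined.append({**match, **row})
    return joined
```

In this model each row is read once to partition it, written once, and read once more to join it, which mirrors the two read passes and one write pass described above.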

If the hash table does not fit in memory, parts of it might need to be swapped in and out, depending on the rows retrieved from the second table. Performance for this scenario can be extremely poor.

The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions is true:

  • A large amount of data needs to be joined.
  • A large fraction of the table needs to be joined.


SELECT o.customer_id, l.unit_price * l.quantity
  FROM orders o, order_items l
 WHERE l.order_id = o.order_id;

Apply the USE_HASH hint to advise the optimizer to use a hash join when joining two tables together. If you are having trouble getting the optimizer to use hash joins, investigate the values for the HASH_AREA_SIZE and HASH_JOIN_ENABLED parameters.
