FOR ALL ENTRIES vs DB2 JOIN

http://it.toolbox.com/blogs/sap-on-db2/for-all-entries-vs-db2-join-8912

 

 

All abap programers and most of the dba's that support abap programmers are familiar with the abap clause "for all entries". Most of the web pages I visited recently, discuss 3 major drawbacks of the "for all entries" clause:

1. duplicate rows are automatically removed
2. if the itab used in the clause is empty , all the rows in the source table will be selected .
3. performance degradation when using the clause on big tables.

In this post I'd like to shed some light on the third issue. Specifically i'll discuss the use of the "for all entries" clause as a means to join tables in the abap code instead of in db2.

Say for example you have the following abap code:
Select * from mara
For all entries in itab
Where matnr = itab-matnr.

If the actual source of the material list (represented here by itab) is actually another database table, like:
select matnr from mseg
into corresponding fields of table itab
where ….

Then you could have used one sql statement that joins both tables.
Select t1.*
From mara t1, mseg t2
Where t1.matnr = t2.matnr
And T2…..

So what are the drawbacks of using the "for all entires" instead of a join ?

At run time , in order to fulfill the "for all entries " request, the abap engine will generate several sql statements (for detailed information on this refer to note 48230). Regardless of which method the engine uses (union all, "or" or "in" predicates) If the itab is bigger then a few records, the abap engine will break the itab into parts, and rerun an sql statement several times in a loop. This rerun of the same sql statement , each time with different host values, is a source of resource waste because it may lead to re-reading of data pages.

returing to the above example , lets say that our itab contains 500 records and that the abap engine will be forced to run the following sql statement 50 times with a list of 10 values each time.
Select * from mara
Where matnr in ( ...)

Db2 will be able to perform this sql statement cheaply all 50 times, using one of sap standard indexes that contain the matnr column. But in actuality, if you consider the wider picture (all 50 executions of the statement), you will see that some of the data pages, especially the root and middle-tire index pages have been re-read each execution.

Even though db2 has mechanisms like buffer pools and sequential detection to try to minimize the i/o cost of such cases, those mechanisms can only minimize the actual i/o operations , not the cpu cost of re-reading them once they are in memory. Had you coded the join, db2 would have known that you actually need 500 rows from mara, it would have been able to use other access methods, and potentially consume less getpages i/o and cpu.

In other words , when you use the "for all entries " clause instead of coding a join , you are depriving the database of important information needed to select the best access path for your application. Moreover, you are depriving your DBA of the same vital information. When the DBA monitors & tunes the system, he (or she) is less likely to recognize this kind of resource waste. The DBA will see a simple statement that uses an index , he is less likely to realize that this statement is executed in a loop unnecessarily.

In conclusion I suggest to "think twice" before using the "for all entries" clause and to evaluate the use of database views as a means to:
a. simplify sql
b. simplify abap code
c. get around open sql limitations.

Omer Brandis
DB2 DBA & SAP Basis professional (and all around nice guy)
omerb@srl.co.il


 

 

Related White Papers

9 Comments

  Really nice statement. All you've written is true. But you compare ABAP coding with native SQL. What's you intention? Should an ABAP developer use more native SQL? You should show an example using ABAP and the ABAP join clause. In my opinion the major problem of ABAP seems to be that it is not ANSI SQL compliant. It's not a good idead to compare ANSI SQL with ABAP. But you are right: We could make thinks pretty much better if SAP would support ANSI SQL.
Dear JD, In this post, i didn't mean to compare ABAP open sql with DB2 NATIVE SQL, I meant to compare the abap "for all entries" clause with a "simple" database join, i.e. a join that is executed completly at the database level. This simple join can be writen in abap using the regular open sql syntax (no need for native sql syntax), all you have to do is write a simple sql join in your abap program. In conclusion, in this post i tried to shed some light on one of the [performance related ] draw backs of using the "for all entries" clause. I don't mean to say that open sql should be more ansi sql complient, nor that, open sql or that the "for all entries" clasue is "bad" and shouldn't be used ,nor that developers should code more native sql and less open sql. I hope this has helped clearify this blog entry (if not , please feel free to post more questions) Omer.
Hello Omer, very useful article. I have a question concerning the performance of such statement:

SELECT f1 f2
FROM db_table
FOR ALL ENTRIES IN itab
WHERE f3 = itab-f3
INTO...

The itab-f3 has about 5 000 different values and it is not possible to use join. Is it better to use SELECT SINGLE in a loop or there is a better way to achieve good performance?

BR
Peter
Hello peter,

if you are 100% sure , you can't code a database join, then :

Theoretically , the "for all entries" clause has an advantage over a loop of select singel by primary key. This is becuase the abap interpreter can group together several rows in the itab into one select (see note 48230) so it could get the requested result using less conversations with the database.

The down side is that by doing so, the where predicates will not be a simple equation predicates, they may become "in" predicates and so on (see note 48230). This change in the sql may lead to a more costly access path.

So, unfortunatly, you will probably be forced to write and test both options and see which costs less.

* by costs i mean total getpages or "in db2 time" for the whole workload, and not transaction elapsed time.)

** as a rule of thumb I would lean to select single by primary key, over "for all entries" ... but as i said, its depends on several factors....

good luck.
omer
Thanks Omer,

I should really do some more tests. Unfortunately test results are totally different on different system and I do not know what will be the system of the customer.

Anyway, once again thank you for the advices!
Peter
Omer - very intersting reading. I wish I had seen it before I wrote:

https://weblogs.sdn.sap.com/pub/wlg/6050

I took a more practical rather than theoretical approach and came to basically the same conclusion.

(However, I found in a few cases, when selecting a large number of records (say around 100,000), FOR ALL ENTRIES actually outperformed the JOIN.)

Rob
Dear Omer

Use sub query instead of "for all entries", if possible, bcos Network load is considerably less when used large db. Please check DWDM transaction for more info.

Regards
Vijayendran S
Hi,

I'd like to point out that there IS a way to ensure that duplicate entries are not eliminated, in "FOR ALL ENTRIES" - in the FROM clause of your select-statements, ensure that the entire primary key of the database table is selected; this will automatically ensure that all entries, even duplicates, get picked up.

Omar ? thanks for an interesting article. However have I seen when you have large number of records you need to do performance tests on which of the alternatives to use. Have seen huge differences in performance for tables with over 10 million records, especially where several tables have been JOINed together. However is that another but related question on how many table you should JOIN together.

Deepak ? thanks for your easy tip. Really helpful to have in mind.

Regards
Anders

NOTE 48230
Summary

Symptom
Performance problems with the open SQL statement "SELECT ... FOR ALL ENTRIES ...".

Other terms
FOR_ALL_ENTRIES

Reason and Prerequisites
The open SQL statement "SELECT ... FOR ALL ENTRIES ..." is an ABAP-specific enhancement of the SQL standard. This variant of the SELECT statement allows the ABAP programmer to join an internal program table with one or several database tables. (For a detailed description of that statement type please refer to the corresponding ABAP documentation.)
Since there is no analogous statement in the SQL standard, the open SQL statement has to be mapped from the database interface of the ABAP environment to one or several semantically equivalent SELECT statements which can be processed by the DB platform. Several profile parameters allow a definition of how the database interface should carry out this mapping with regard to the database. This note describes the parameters that can be used to control the "SELECT ... FOR ALL ENTRIES" statement and their effect.

Solution
The parameters mentioned in this note have considerable effects on most of the critical database commands and influence the performance of the whole system to a great extent. For this reason, before changing the parameters described in this note, a detailed problem analysis by experienced SAP consultants or the support team is required. Please note in particular that changing the parameters may often solve a local performance problem but it may also cause a still bigger problem to occur at another place. For this reason, prior to changing the profile parameters - which has a global effect on all statements - you should check first whether the performance problem might be caused by one or two positions in the corresponding application which can be corrected by a local change of the critical SQL statements.

The following profile parameters are available:

rsdb/prefer_join (ab Release 7.0)
           If you set this parameter to "1" the SELECT ... FOR ALL ENTRIES is implemented using a join. Note that this variant is only supported by the DB6 (DB2 UDB) and MS SQL Server database platforms.

rsdb/prefer_union_all
           You can override this parameter using rsdb/prefer_join = 1. The following remarks relate to rsdb/prefer_join = 0.


           Setting this parameter to "1" generates a linking of entire statements with UNION; setting it to "0" generates an OR link of conditions in the WHERE clause. Each of the linked partial statements/conditions represents an entry of the input table [itab].

For Example:
The open SQL statement

SELECT ... FOR ALL ENTRIES IN itab WHERE f = itab-f.

is mapped to an SQL statement which is consistent with the standard:

rsdb/prefer_union_all            = 0
=>
SELECT ... WHERE f = itab[1]-f
          OR    f = itab[2]-f
          ...
           OR    f = itab[N]-f

rsdb/prefer_union_all            = 1
=>
          SELECT ... WHERE f = itab[1]-f
UNION ALL SELECT ... WHERE f = itab[2]-f
....
UNION ALL SELECT ... WHERE f = itab[N]-f

Where N is the number of rows in itab, and itab[i]-f is the value of
field f in the i-th table row.


rsdb/prefer_in_itab_opt
           If this parameter is set to "1", a statement where only one field in the WHERE clause depends on the converted internal table is reflected by a statement with an IN clause. However, this is only possible if the field reference and the WHERE condition are simple enough: in essential the field reference must be a not negated EQ condition.

For Example:
If parameter rsdb/prefer_in_itab_opt is set to "1", the open SQL

SELECT ... FOR ALL ENTRIES IN itab WHERE f = itab-f.

is mapped to the following SQL statement:

SELECT ... WHERE f IN (itab[1]-f, itab[2]-f, ..., itab[N]-f)


rsdb/max_blocking_factor
           This parameter specifies an upper limit for the number of entries taken in from [itab] to be processed in one statement. This means that if the internal table specified in the FOR ALL ENTRIES clause contains more than rsdb/max_blocking_factor rows, the open SQL statement is split into several statements for the database the results of which are collected in the DB interface and then returned as an overall result to the ABAP program. For an internal table with N rows

N / "rsdb/max_blocking_factor" + 1

individual SELECT statements are issued for the database. However, this parameter has no effect on the translation to IN (...) (for prefer_in_itab_opt).

           Additionally the technical maximum blocking factor is calculated for each statement, so no limits of the database system are exceeded. If the limit of the blocking factor is lower than max_blocking_factor, the limit is used implicitely.

rsdb/max_in_blocking_factor
           This parameter, analogously to rsdb/max_blocking_factor, gives the upper limit for the number of entries to be processed from [itab] if the concrete statement is reflected on an IN clause (see prefer_in_itab_opt).

           Analogously to rsdb/max_blocking_factor also the limit of the blocking factor is used instead of rsdb/max_in_blocking_factor, if otherwise the upper limits of the database system would be exceeded.

rsdb/prefer_fix_blocking
           If the number of entries in [itab] is not divisible by max_blocking_factor, less entries (conditions) are allocated to the last statement which has been generated for processing the FOR ALL ENTRIES statement. The result is a new statement.
If the same FOR ALL ENTRIES statement is executed very frequently with a different number of entries in the input table [itab], different statements are created up to the maximum of max_blocking_factor statements.
This can be avoided by the above profile parameter.
If this parameter is set to "1", at most two statements of different length are generated. This is achieved by repeating the last value in the input table as if [itab] has been padded to the blocking factor ([itab] is not really modified).

rsdb/min_blocking_factor
           If this parameter is set to a value larger than "0" AND if rsdb/prefer_fix_blocking is set, 2 different blocking factors are used: a smaller (min_blocking_factor) and a larger factor (max_blocking_factor).
However, the min_blocking_factor is only used if there are only a few entries in [itab]: A little simplified, if the following applies: "Entries [itab] < max_blocking_factor / 2"

rsdb/min_in_blocking_factor
           This parameter works in conjunction with rsdb/min_blocking_factor, for the case that the addition FOR ALL ENTRIES has been implemented with an IN clause (see prefer_in_itab_opt).


Control over FOR ALL ENTRIES Hints
Under the heading Database Interface Hints, Note 129385 describes the options you have for influencing the database interface by entering hints. The hints are evaluated in the database interface itself and are not passed on to the database.

Starting with kernel Release 4.6B all the above mentioned FOR ALL ENTRIES parameters can be set via such a hint for a single statement. In the example:
  SELECT * FROM [..] FOR ALL ENTRIES IN [..] WHERE [..]
  %_HINTS ORACLE '&prefer_in_itab_opt 1&&prefer_fix_blocking -1&'.
This way, the boolean parameter 'prefer_in_itab_opt' is explictly set and the boolean parameter 'prefer_fix_blocking' is set to its default value.

FOR ALL ENTRIES hints, like hints are generally only used as a a corrective device in emergency situations; Note 129385 goes into this. The hints described here should only be used with careful consideration.

  • 0
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值