Today I was asked this question. We have two cases with code blocks A, B and C. These code blocks don't share any resources except an iterator (int i).
Please give 3 possible reasons why case 1 could be faster than case 2, and 3 possible reasons why case 2 could be faster than case 1:
Case 1:
    for (i=0; i<N; ++i){
        A;
        B;
        C;
    }
Case 2:
    for (i=0; i<N; ++i){
        A;
    }
    for (i=0; i<N; ++i){
        B;
    }
    for (i=0; i<N; ++i){
        C;
    }
Depending on the circumstances, code that performs poorly in one situation can perform well in another.
These are only reasons why each case could be faster; of course, it depends on what exactly A, B and C are.
Case 1:
- only a single occurrence of the loop prologue/epilogue (less code to run)
- better scheduling of the code generated for A, B and C (more parallelism)
- may factor out common code (there is no dependency on outputs, but A, B and C may read the same inputs)
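To make the last Case 1 point concrete, here is a minimal sketch in C. The functions `fused` and `split`, and the bodies chosen for A, B and C, are hypothetical stand-ins of mine and not part of the original question. All three blocks read the same input but write to separate outputs, so the fused loop can load in[i] once per iteration, while the split version walks the input three times. Whether the difference is measurable depends on the compiler and the hardware.

```c
#include <stddef.h>

/* Case 1 (hypothetical blocks): A, B and C share the read of in[i]. */
void fused(const int *in, int *a, int *b, int *c, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        int x = in[i];   /* loaded once, reused by all three blocks */
        a[i] = x + 1;    /* A */
        b[i] = x * 2;    /* B */
        c[i] = x * x;    /* C */
    }
}

/* Case 2: the same work, but in[] is traversed three times. */
void split(const int *in, int *a, int *b, int *c, size_t n) {
    for (size_t i = 0; i < n; ++i) a[i] = in[i] + 1;      /* A */
    for (size_t i = 0; i < n; ++i) b[i] = in[i] * 2;      /* B */
    for (size_t i = 0; i < n; ++i) c[i] = in[i] * in[i];  /* C */
}
```

Both functions produce identical outputs; only the number of passes over the input differs.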
Case 2:
- lower register pressure in each loop (avoids register spilling)
- the compiler is more likely to unroll a loop (when A, B or C is trivial)
- each loop body is more likely to fit entirely in the instruction cache (useful when N is large)
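As a rough illustration of the Case 2 points, here is a sketch in C where each block touches only its own array. The function `case2_split` and the block bodies are hypothetical, chosen only for illustration. With A isolated in its own loop, it becomes a plain fill that a compiler may find easy to unroll or vectorize, and each loop keeps only a few values live at a time. Whether a given compiler actually does so depends on the compiler and its flags.

```c
#include <stddef.h>

/* Case 2 (hypothetical blocks): each loop works on one array only. */
void case2_split(int *a, int *b, int *c, size_t n) {
    /* A: trivial fill, a simple candidate for unrolling */
    for (size_t i = 0; i < n; ++i) a[i] = 0;
    /* B: depends only on the index */
    for (size_t i = 0; i < n; ++i) b[i] = (int)(i * i);
    /* C: depends only on the index */
    for (size_t i = 0; i < n; ++i) c[i] = (int)(i % 3);
}
```

Comparing the assembly a compiler emits for this and for the single-loop form is one way to see which of the listed effects actually applies on a particular target.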
The second highly-voted answer:
Assuming no dependencies between A, B and C, I would guess that Case 2 would normally be faster than Case 1 because of:
- Data locality
- Code locality
- Branch prediction
However, if the code blocks are very short, then theoretically the extra loop overhead in Case 2 might dominate. Note also @James Kanze's answer, which is another reason why Case 1 could be faster.
Of course, if there are truly no dependencies, then the compiler is free to transform Case 1 into Case 2, and vice versa.
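The "no dependencies" condition matters here. As a sketch of what goes wrong otherwise, the hypothetical functions `fused_dep` and `split_dep` below (my own example, not from the original answers) use blocks where B reads an element that A writes in a different iteration. In that situation the two cases are no longer interchangeable, so a compiler could not legally swap one for the other.

```c
#include <stddef.h>

/* Hypothetical blocks with a cross-iteration dependency:
   A writes a[i]; B reads a[n-1-i], which A may not have written yet.
   The caller is assumed to pass a zero-initialized a[]. */
void fused_dep(int *a, int *b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        a[i] = (int)i + 1;     /* A */
        b[i] = a[n - 1 - i];   /* B: sees old values while i < n/2 */
    }
}

/* Split form: every a[i] is written before B reads anything. */
void split_dep(int *a, int *b, size_t n) {
    for (size_t i = 0; i < n; ++i) a[i] = (int)i + 1;     /* A */
    for (size_t i = 0; i < n; ++i) b[i] = a[n - 1 - i];   /* B */
}
```

With n = 4 and a[] starting at zero, the fused form leaves b = {0, 0, 2, 1}, while the split form leaves b = {4, 3, 2, 1}.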
Reference:
http://programmers.stackexchange.com/questions/64132/interesting-interview-question