Operation | Clock Cycles |
---|---|
Simple register-register op (ADD, OR) | 1 |
Memory write | 1 |
Bypass delay: switch between integer and floating-point units | 0 - 3 |
"Right" branch of "if" | 1 - 2 |
Floating-point/vector addition | 1 - 3 |
Multiplication (integer/float/vector) | 1 - 7 |
Return error and check | 1 - 7 |
L1 read | 3 - 4 |
TLB miss | 7 - 21 |
L2 read | 10 - 12 |
"Wrong" branch of "if" (branch misprediction) | 10 - 20 |
Floating-point division | 10 - 40 |
128-bit vector division | 10 - 70 |
Atomics/CAS | 15 - 30 |
C function direct call | 15 - 30 |
Integer division | 15 - 40 |
C function indirect call | 20 - 50 |
C++ virtual function call | 30 - 60 |
L3 read | 30 - 70 |
Main RAM read | 100 - 150 |
NUMA: different-socket atomics/CAS (guesstimate) | 100 - 300 |
NUMA: different-socket L3 read | 100 - 300 |
Allocation+Deallocation pair (small objects) | 200 - 500 |
NUMA: different-socket main RAM read | 300 - 500 |
Kernel call | 1000 - 1500 |
Thread context switch (direct costs) | 2000 |
C++ Exception thrown+caught | 5000 - 10000 |
Thread context switch (total costs, including cache invalidation) | 10000 - 1 million |
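
To get a feel for these numbers in wall-clock terms, a cycle count can be converted to nanoseconds by dividing by the clock frequency in GHz. The sketch below is only an illustration: the helper name `cycles_to_ns` is hypothetical, and the 3.0 GHz clock is an assumed example value rather than a figure from the table.

```python
# Hypothetical helper: convert a cycle-count range to nanoseconds.
# The 3.0 GHz default clock is an assumed example value, not from the table.

def cycles_to_ns(cycles_lo, cycles_hi, clock_ghz=3.0):
    """Return (lo_ns, hi_ns) for a cycle range at the given clock frequency."""
    # At f GHz the CPU completes f cycles per nanosecond, so ns = cycles / f.
    return cycles_lo / clock_ghz, cycles_hi / clock_ghz


# A few rows copied from the table: (operation, low cycles, high cycles).
ROWS = [
    ("L1 read", 3, 4),
    ("Main RAM read", 100, 150),
    ("Kernel call", 1000, 1500),
]

for name, lo, hi in ROWS:
    lo_ns, hi_ns = cycles_to_ns(lo, hi)
    print(f"{name}: {lo_ns:.1f} - {hi_ns:.1f} ns")

# Prints:
# L1 read: 1.0 - 1.3 ns
# Main RAM read: 33.3 - 50.0 ns
# Kernel call: 333.3 - 500.0 ns
```

At a different clock speed the same cycle counts map to different times, so pass the actual frequency of the target CPU as `clock_ghz` when using a conversion like this.
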
Infographics: Operation Costs in CPU Clock Cycles - IT Hare on Soft.ware
http://ithare.com/wp-content/uploads/part101_infographics_v08.png