Code snippet
//cuda-sim.cc line 855
//in void ptx_instruction::pre_decode()
...
// map an explicit PTX cache qualifier (.ca, .cg, ...) to the simulator's cache_op
switch( m_cache_option ) {
   case CA_OPTION: cache_op = CACHE_ALL; break;
   case CG_OPTION: cache_op = CACHE_GLOBAL; break;
   case CS_OPTION: cache_op = CACHE_STREAMING; break;
   case LU_OPTION: cache_op = CACHE_LAST_USE; break;
   case CV_OPTION: cache_op = CACHE_VOLATILE; break;
   case WB_OPTION: cache_op = CACHE_WRITE_BACK; break;
   case WT_OPTION: cache_op = CACHE_WRITE_THROUGH; break;
   default:
      // no qualifier given: fall back to a per-opcode default
      if( m_opcode == LD_OP || m_opcode == LDU_OP )
         cache_op = CACHE_ALL;          // ld / ldu -> .ca
      else if( m_opcode == ST_OP )
         cache_op = CACHE_WRITE_BACK;   // st -> .wb
      else if( m_opcode == ATOM_OP )
         cache_op = CACHE_GLOBAL;       // atom -> .cg
      break;
}
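To make the fallback logic easy to test in isolation, here is a minimal standalone sketch of the same decision. It assumes simplified stand-in enums: the names follow the snippet above, but `NO_OPTION`, `OTHER_OP`, `CACHE_UNDEFINED`, `resolve_cache_op` and `check_defaults` are hypothetical and are not GPGPU-Sim's real definitions. The logic is the same as in the switch: an explicit qualifier wins, otherwise the opcode picks the default.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the simulator's enums. The real
// GPGPU-Sim definitions contain more values and may be ordered differently.
enum Opcode { LD_OP, LDU_OP, ST_OP, ATOM_OP, OTHER_OP };
enum CacheOption { NO_OPTION, CA_OPTION, CG_OPTION, CS_OPTION, LU_OPTION,
                   CV_OPTION, WB_OPTION, WT_OPTION };
enum CacheOp { CACHE_UNDEFINED, CACHE_ALL, CACHE_GLOBAL, CACHE_STREAMING,
               CACHE_LAST_USE, CACHE_VOLATILE, CACHE_WRITE_BACK,
               CACHE_WRITE_THROUGH };

// Same decision as the switch in pre_decode(): an explicit qualifier wins;
// otherwise the opcode selects the default.
CacheOp resolve_cache_op(Opcode opcode, CacheOption option) {
  switch (option) {
    case CA_OPTION: return CACHE_ALL;
    case CG_OPTION: return CACHE_GLOBAL;
    case CS_OPTION: return CACHE_STREAMING;
    case LU_OPTION: return CACHE_LAST_USE;
    case CV_OPTION: return CACHE_VOLATILE;
    case WB_OPTION: return CACHE_WRITE_BACK;
    case WT_OPTION: return CACHE_WRITE_THROUGH;
    default: break;  // no qualifier: use the per-opcode defaults below
  }
  if (opcode == LD_OP || opcode == LDU_OP) return CACHE_ALL;  // ld.ca
  if (opcode == ST_OP) return CACHE_WRITE_BACK;               // st.wb
  if (opcode == ATOM_OP) return CACHE_GLOBAL;                 // atom -> .cg
  // Assumption: other opcodes get an "unset" value here; the original code
  // simply leaves cache_op unchanged in this case.
  return CACHE_UNDEFINED;
}

// Checks the defaults described in the analysis.
void check_defaults() {
  assert(resolve_cache_op(LD_OP, NO_OPTION) == CACHE_ALL);
  assert(resolve_cache_op(LDU_OP, NO_OPTION) == CACHE_ALL);
  assert(resolve_cache_op(ST_OP, NO_OPTION) == CACHE_WRITE_BACK);
  assert(resolve_cache_op(ATOM_OP, NO_OPTION) == CACHE_GLOBAL);
  // An explicit qualifier overrides the default, e.g. ld.cg.
  assert(resolve_cache_op(LD_OP, CG_OPTION) == CACHE_GLOBAL);
}
```

Returning the result instead of assigning to a member keeps the mapping a pure function, so it can be checked without constructing a `ptx_instruction`.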
Analysis
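1. It is easy to see that when an instruction carries no explicit cache qualifier, the default policy is ld.ca for loads (ld and ldu both map to CACHE_ALL) and st.wb for stores (CACHE_WRITE_BACK); atom defaults to CACHE_GLOBAL (.cg).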
2. These defaults are also the ones described in NVIDIA's PTX ISA manual:
Cache Operators
PTX ISA version 2.0 introduced optional cache operators on load and store instructions.
The cache operators require a target architecture of sm_20 or higher.
Cache operators on load or store instructions are treated as performance hints only.
The use of a cache operator on an ld or st instruction does not change the memory consistency behavior of the program.
For sm_20 and higher, the cache operators have the following definitions and behavior.
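The quoted rules (cache operators need sm_20 or higher, and they apply to ld and st) can be captured in a small legality check. This is a minimal sketch, assuming the per-instruction operator lists from the PTX ISA tables that follow the quoted passage: .ca/.cg/.cs/.lu/.cv for ld and .wb/.cg/.cs/.wt for st. `MemInstr`, `CacheQualifier`, `is_valid_cache_qualifier` and `check_rules` are hypothetical names, not part of GPGPU-Sim or any NVIDIA API. Because the operators are only performance hints, the check says nothing about what a qualifier actually does in hardware or in the simulator.

```cpp
#include <cassert>

// Hypothetical helper types; not part of GPGPU-Sim or any NVIDIA API.
enum class MemInstr { Load, Store };
enum class CacheQualifier { CA, CG, CS, LU, CV, WB, WT };

// Returns true if `q` is a cache operator PTX defines for `instr` and the
// target (e.g. 20 for sm_20) is new enough to accept cache operators at all.
bool is_valid_cache_qualifier(MemInstr instr, CacheQualifier q, int sm_version) {
  if (sm_version < 20) return false;  // cache operators require sm_20 or higher
  switch (q) {
    case CacheQualifier::CG:
    case CacheQualifier::CS:
      return true;                      // defined for both ld and st
    case CacheQualifier::CA:
    case CacheQualifier::LU:
    case CacheQualifier::CV:
      return instr == MemInstr::Load;   // load-only operators
    case CacheQualifier::WB:
    case CacheQualifier::WT:
      return instr == MemInstr::Store;  // store-only operators
  }
  return false;  // not reached for valid enum values
}

// A few spot checks of the rules above.
void check_rules() {
  assert(is_valid_cache_qualifier(MemInstr::Load, CacheQualifier::CA, 20));
  assert(!is_valid_cache_qualifier(MemInstr::Store, CacheQualifier::CA, 20));
  assert(!is_valid_cache_qualifier(MemInstr::Load, CacheQualifier::CG, 13));
}
```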