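1. Kernel source code
Kernel file name: vectorAddKernel.cu
// Element-wise vector addition: C[i] = A[i] + B[i], one thread per element.
__global__ void
vectorAdd(const float *A, const float *B, float *C, int numElements)
{
    // Global index of this thread in the 1-D grid.
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    // Threads past the end of the arrays do nothing.
    if (i < numElements)
    {
        C[i] = A[i] + B[i];
    }
}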
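To make the index formula and the bounds guard concrete, here is a host-only C++ sketch that replays the kernel body sequentially on the CPU. `ThreadCoord`, `vectorAddThread`, `launchOnHost`, `blocksFor`, and `guardKeepsTailUntouched` are hypothetical names introduced for illustration; they are not part of the CUDA sample, and the sequential loop does not model real GPU scheduling.

```cpp
// Host-only emulation of the vectorAdd kernel (illustrative sketch, not CUDA code).

// Stand-in for the built-in variables seen by one thread (hypothetical type).
struct ThreadCoord {
    int blockDimX;   // plays the role of blockDim.x
    int blockIdxX;   // plays the role of blockIdx.x
    int threadIdxX;  // plays the role of threadIdx.x
};

// Same index formula as the kernel.
constexpr int globalIndex(ThreadCoord t) {
    return t.blockDimX * t.blockIdxX + t.threadIdxX;
}

// Kernel body executed by one emulated thread.
constexpr void vectorAddThread(ThreadCoord t, const float* A, const float* B,
                               float* C, int numElements) {
    const int i = globalIndex(t);
    if (i < numElements) {
        C[i] = A[i] + B[i];
    }
}

// Number of blocks needed to cover numElements (ceiling division).
constexpr int blocksFor(int numElements, int blockDimX) {
    return (numElements + blockDimX - 1) / blockDimX;
}

// Runs every thread of a 1-D grid one after another.
constexpr void launchOnHost(int gridDimX, int blockDimX, const float* A,
                            const float* B, float* C, int numElements) {
    for (int b = 0; b < gridDimX; ++b) {
        for (int t = 0; t < blockDimX; ++t) {
            vectorAddThread(ThreadCoord{blockDimX, b, t}, A, B, C, numElements);
        }
    }
}

// 5 elements with 4 threads per block: 2 blocks, 8 threads, 3 of them idle.
constexpr bool guardKeepsTailUntouched() {
    const float A[5] = {1, 2, 3, 4, 5};
    const float B[5] = {10, 20, 30, 40, 50};
    float C[8] = {};  // three slack slots that must stay zero
    launchOnHost(blocksFor(5, 4), 4, A, B, C, 5);
    return C[0] == 11 && C[4] == 55 && C[5] == 0 && C[7] == 0;
}
```

As an example, 50000 elements at 256 threads per block need 196 blocks, which is 50176 threads; the `if (i < numElements)` guard is what keeps the surplus threads from touching memory past the arrays.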
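2. Compile command
$ /usr/local/cuda/bin/nvcc -cubin ./vectorAddKernel.cu
No -arch option is given, so nvcc targets its default architecture; the dump in the next section reports sm_52. The result is written to vectorAddKernel.cubin.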
3. cuobjdump -sass
Command:
$ /usr/local/cuda/bin/cuobjdump -sass vectorAddKernel.cubin
Output:
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$ /usr/local/cuda/bin/cuobjdump -sass vectorAddKernel.cubin
code for sm_52
Function : _Z9vectorAddPKfS0_Pfi
.headerflags @"EF_CUDA_SM52 EF_CUDA_PTX_SM(EF_CUDA_SM52)"
/* 0x001cfc00e22007f6 */
/*0008*/ MOV R1, c[0x0][0x20] ; /* 0x4c98078000870001 */
/*0010*/ S2R R0, SR_CTAID.X ; /* 0xf0c8000002570000 */
/*0018*/ S2R R2, SR_TID.X ; /* 0xf0c8000002170002 */
/* 0x001fd842fec20ff1 */
/*0028*/ XMAD.MRG R3, R0.reuse, c[0x0] [0x8].H1, RZ ; /* 0x4f107f8000270003 */
/*0030*/ XMAD R2, R0.reuse, c[0x0] [0x8], R2 ; /* 0x4e00010000270002 */
/*0038*/ XMAD.PSL.CBCC R0, R0.H1, R3.H1, R2 ; /* 0x5b30011800370000 */
/* 0x001ff400fd4007ed */
/*0048*/ ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT ; /* 0x4b6d038005670007 */
/*0050*/ NOP ; /* 0x50b0000000070f00 */
/*0058*/ @P0 EXIT ; /* 0xe30000000000000f */
/* 0x081fd800fea207f1 */
/*0068*/ SHL R6, R0.reuse, 0x2 ; /* 0x3848000000270006 */
/*0070*/ SHR R0, R0, 0x1e ; /* 0x3829000001e70000 */
/*0078*/ IADD R4.CC, R6.reuse, c[0x0][0x140] ; /* 0x4c10800005070604 */
/* 0x001fd800fe0207f2 */
/*0088*/ IADD.X R5, R0.reuse, c[0x0][0x144] ; /* 0x4c10080005170005 */
/*0090*/ { IADD R2.CC, R6, c[0x0][0x148] ; /* 0x4c10800005270602 */
/*0098*/ LDG.E R4, [R4] }
/* 0xeed4200000070404 */
/* 0x001fd800f62007e2 */
/*00a8*/ IADD.X R3, R0, c[0x0][0x14c] ; /* 0x4c10080005370003 */
/*00b0*/ LDG.E R2, [R2] ; /* 0xeed4200000070202 */
/*00b8*/ IADD R6.CC, R6, c[0x0][0x150] ; /* 0x4c10800005470606 */
/* 0x001fc420fe4007f7 */
/*00c8*/ IADD.X R7, R0, c[0x0][0x154] ; /* 0x4c10080005570007 */
/*00d0*/ FADD R0, R2, R4 ; /* 0x5c58000000470200 */
/*00d8*/ STG.E [R6], R0 ; /* 0xeedc200000070600 */
/* 0x001ffc00ffe007ea */
/*00e8*/ NOP ; /* 0x50b0000000070f00 */
/*00f0*/ EXIT ; /* 0xe30000000007000f */
/*00f8*/ BRA 0xf8 ; /* 0xe2400fffff87000f */
..........
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$
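The sm_52 listing computes blockIdx.x * blockDim.x + threadIdx.x with three XMAD instructions (16-bit multiplies with a 32-bit add) instead of a single IMAD. The host-side C++ sketch below emulates one plausible reading of that sequence. NVIDIA does not publish SASS semantics, so the meaning given here to the .MRG, .PSL and .CBCC modifiers is an assumption based on community reverse-engineering, as is the reading of c[0x0][0x8] as blockDim.x; all function names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint32_t lo16(std::uint32_t x) { return x & 0xffffu; }
constexpr std::uint32_t hi16(std::uint32_t x) { return x >> 16; }

// XMAD.MRG d, a, b.H1, c (assumed): low half  = low 16 bits of a.lo * b.hi + c,
//                                   high half = b.lo
constexpr std::uint32_t xmadMrg(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return lo16(lo16(a) * hi16(b) + c) | (lo16(b) << 16);
}

// XMAD d, a, b, c (assumed): a.lo * b.lo + c
constexpr std::uint32_t xmad(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return lo16(a) * lo16(b) + c;
}

// XMAD.PSL.CBCC d, a.H1, b.H1, c (assumed): ((a.hi * b.hi) << 16) + c + (b.lo << 16)
constexpr std::uint32_t xmadPslCbcc(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return ((hi16(a) * hi16(b)) << 16) + c + (lo16(b) << 16);
}

// The three-instruction sequence from the listing:
//   XMAD.MRG      R3, R0, c[0x0][0x8].H1, RZ
//   XMAD          R2, R0, c[0x0][0x8],    R2
//   XMAD.PSL.CBCC R0, R0.H1, R3.H1,       R2
constexpr std::uint32_t globalIndexXmad(std::uint32_t ctaid, std::uint32_t ntid,
                                        std::uint32_t tid) {
    const std::uint32_t r3 = xmadMrg(ctaid, ntid, 0u);
    const std::uint32_t r2 = xmad(ctaid, ntid, tid);
    return xmadPslCbcc(ctaid, r3, r2);
}
```

Under these assumptions the sequence reproduces the full 32-bit result of ctaid * ntid + tid modulo 2^32: the low-by-low product goes into R2, and the two cross products are folded in shifted left by 16 bits.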
4. cuobjdump -elf
Command:
$ /usr/local/cuda/bin/cuobjdump -elf vectorAddKernel.cubin
Output:
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$ /usr/local/cuda/bin/cuobjdump -elf vectorAddKernel.cubin
64bit elf: type=2, abi=7, sm=52, toolkit=113, flags = 0x340534
Sections:
Index Offset Size ES Align Type Flags Link Info Name
1 40 c1 0 1 STRTAB 0 0 0 .shstrtab
2 101 de 0 1 STRTAB 0 0 0 .strtab
3 1e0 78 18 8 SYMTAB 0 2 3 .symtab
4 258 30 0 4 CUDA_INFO 0 3 0 .nv.info
5 288 78 0 4 CUDA_INFO 0 3 8 .nv.info._Z9vectorAddPKfS0_Pfi
6 300 d8 8 8 CUDA_RELOCINFO 0 0 0 .nv.rel.action
7 3d8 15c 0 4 PROGBITS 2 0 8 .nv.constant0._Z9vectorAddPKfS0_Pfi
8 540 100 0 20 PROGBITS 6 3 8000004 .text._Z9vectorAddPKfS0_Pfi
.section .strtab
.section .shstrtab
.section .symtab
index value size info other shndx name
0 0 0 0 0 0 (null)
1 0 0 3 0 8 .text._Z9vectorAddPKfS0_Pfi
2 0 0 3 0 7 .nv.constant0._Z9vectorAddPKfS0_Pfi
3 0 0 3 0 6 .nv.rel.action
4 0 256 12 10 8 _Z9vectorAddPKfS0_Pfi
.nv.constant0._Z9vectorAddPKfS0_Pfi
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000
.nv.info
<0x1>
Attribute: EIATTR_REGCOUNT
Format: EIFMT_SVAL
Value: function: _Z9vectorAddPKfS0_Pfi(0x4) register count: 8
<0x2>
Attribute: EIATTR_MAX_STACK_SIZE
Format: EIFMT_SVAL
Value: 0x4 0x0
<0x3>
Attribute: EIATTR_MIN_STACK_SIZE
Format: EIFMT_SVAL
Value: function: _Z9vectorAddPKfS0_Pfi(0x4) min stack size: 0x0
<0x4>
Attribute: EIATTR_FRAME_SIZE
Format: EIFMT_SVAL
Value: function: _Z9vectorAddPKfS0_Pfi(0x4) frame size: 0x0
.nv.info._Z9vectorAddPKfS0_Pfi
<0x1>
Attribute: EIATTR_CUDA_API_VERSION
Format: EIFMT_SVAL
Value: 0x71
<0x2>
Attribute: EIATTR_SW2393858_WAR
Format: EIFMT_NVAL
<0x3>
Attribute: EIATTR_SW1850030_WAR
Format: EIFMT_NVAL
<0x4>
Attribute: EIATTR_PARAM_CBANK
Format: EIFMT_SVAL
Value: 0x2 0x1c0140
<0x5>
Attribute: EIATTR_CBANK_PARAM_SIZE
Format: EIFMT_HVAL
Value: 0x1c
<0x6>
Attribute: EIATTR_KPARAM_INFO
Format: EIFMT_SVAL
Value: Index : 0x0 Ordinal : 0x3 Offset : 0x18 Size : 0x4
Pointee's logAlignment : 0x0 Space : 0x0 cbank : 0x1f Parameter Space : CBANK
<0x7>
Attribute: EIATTR_KPARAM_INFO
Format: EIFMT_SVAL
Value: Index : 0x0 Ordinal : 0x2 Offset : 0x10 Size : 0x8
Pointee's logAlignment : 0x0 Space : 0x0 cbank : 0x1f Parameter Space : CBANK
<0x8>
Attribute: EIATTR_KPARAM_INFO
Format: EIFMT_SVAL
Value: Index : 0x0 Ordinal : 0x1 Offset : 0x8 Size : 0x8
Pointee's logAlignment : 0x0 Space : 0x0 cbank : 0x1f Parameter Space : CBANK
<0x9>
Attribute: EIATTR_KPARAM_INFO
Format: EIFMT_SVAL
Value: Index : 0x0 Ordinal : 0x0 Offset : 0x0 Size : 0x8
Pointee's logAlignment : 0x0 Space : 0x0 cbank : 0x1f Parameter Space : CBANK
<0x10>
Attribute: EIATTR_MAXREG_COUNT
Format: EIFMT_HVAL
Value: 0xff
<0x11>
Attribute: EIATTR_S2RCTAID_INSTR_OFFSETS
Format: EIFMT_SVAL
Value: 0x10
<0x12>
Attribute: EIATTR_EXIT_INSTR_OFFSETS
Format: EIFMT_SVAL
Value: 0x58 0xf0
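The EIATTR_KPARAM_INFO entries above give each kernel parameter an offset and size, and the SASS reads the parameters from c[0x0][0x140] onward. The C++ sketch below mirrors that layout with a plain struct. `VectorAddParams`, `paramBase`, `paramSize` and `cbankOffset` are hypothetical names, and reading the EIATTR_PARAM_CBANK value 0x1c0140 as "size in the upper 16 bits, constant-bank offset in the lower 16 bits" is an assumption that happens to agree with the other numbers in this dump.

```cpp
#include <cstddef>
#include <cstdint>

// Parameter block as described by the four EIATTR_KPARAM_INFO entries.
struct VectorAddParams {
    std::uint64_t A;            // ordinal 0, offset 0x00, size 8
    std::uint64_t B;            // ordinal 1, offset 0x08, size 8
    std::uint64_t C;            // ordinal 2, offset 0x10, size 8
    std::int32_t  numElements;  // ordinal 3, offset 0x18, size 4
};

// Assumed split of the EIATTR_PARAM_CBANK value.
constexpr std::uint32_t paramBase(std::uint32_t v) { return v & 0xffffu; }
constexpr std::uint32_t paramSize(std::uint32_t v) { return v >> 16; }

// Constant-bank address of a parameter, given its offset inside the block.
constexpr std::size_t cbankOffset(std::size_t paramOffset) {
    return paramBase(0x1c0140u) + paramOffset;
}
```

With this reading, `numElements` lands at c[0x0][0x158], which is the operand of the ISETP.GE.AND instruction in the sm_52 listing, and base plus size gives 0x15c, the size reported for the .nv.constant0 section.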
.nv.rel.action
Header : Base Relocation : R_CUDA_ABS55_16_34
Reloc type : Symbol Kind,Addend Shift,Source Position1,length1,Destination Position1,Source Position2,length2,Destination Position2
R_CUDA_ABS55_16_34 : EIVALUE_SYM_KIND_ADDR,2,2,8,16,10,47,34
R_CUDA_8_0 : EIVALUE_SYM_KIND_ADDR,0,0,8,0,0,0,0
R_CUDA_8_8 : EIVALUE_SYM_KIND_ADDR,0,8,8,0,0,0,0
R_CUDA_8_16 : EIVALUE_SYM_KIND_ADDR,0,16,8,0,0,0,0
R_CUDA_8_24 : EIVALUE_SYM_KIND_ADDR,0,24,8,0,0,0,0
R_CUDA_8_32 : EIVALUE_SYM_KIND_ADDR,0,32,8,0,0,0,0
R_CUDA_8_40 : EIVALUE_SYM_KIND_ADDR,0,40,8,0,0,0,0
R_CUDA_8_48 : EIVALUE_SYM_KIND_ADDR,0,48,8,0,0,0,0
R_CUDA_8_56 : EIVALUE_SYM_KIND_ADDR,0,56,8,0,0,0,0
R_CUDA_G8_0 : EIVALUE_SYM_KIND_GADDR,0,0,8,0,0,0,0
R_CUDA_G8_8 : EIVALUE_SYM_KIND_GADDR,0,8,8,0,0,0,0
R_CUDA_G8_16 : EIVALUE_SYM_KIND_GADDR,0,16,8,0,0,0,0
R_CUDA_G8_24 : EIVALUE_SYM_KIND_GADDR,0,24,8,0,0,0,0
R_CUDA_G8_32 : EIVALUE_SYM_KIND_GADDR,0,32,8,0,0,0,0
R_CUDA_G8_40 : EIVALUE_SYM_KIND_GADDR,0,40,8,0,0,0,0
R_CUDA_G8_48 : EIVALUE_SYM_KIND_GADDR,0,48,8,0,0,0,0
R_CUDA_G8_56 : EIVALUE_SYM_KIND_GADDR,0,56,8,0,0,0,0
R_CUDA_FUNC_DESC_8_0 : EIVALUE_SYM_KIND_FDESC,0,0,8,0,0,0,0
R_CUDA_FUNC_DESC_8_8 : EIVALUE_SYM_KIND_FDESC,0,8,8,0,0,0,0
R_CUDA_FUNC_DESC_8_16 : EIVALUE_SYM_KIND_FDESC,0,16,8,0,0,0,0
R_CUDA_FUNC_DESC_8_24 : EIVALUE_SYM_KIND_FDESC,0,24,8,0,0,0,0
R_CUDA_FUNC_DESC_8_32 : EIVALUE_SYM_KIND_FDESC,0,32,8,0,0,0,0
R_CUDA_FUNC_DESC_8_40 : EIVALUE_SYM_KIND_FDESC,0,40,8,0,0,0,0
R_CUDA_FUNC_DESC_8_48 : EIVALUE_SYM_KIND_FDESC,0,48,8,0,0,0,0
R_CUDA_FUNC_DESC_8_56 : EIVALUE_SYM_KIND_FDESC,0,56,8,0,0,0,0
R_CUDA_ABS20_44 : EIVALUE_SYM_KIND_ADDR,0,0,20,44,0,0,0
.text._Z9vectorAddPKfS0_Pfi
bar = 0 reg = 8 lmem=0 smem=0
0xe22007f6 0x001cfc00 0x00870001 0x4c980780
0x02570000 0xf0c80000 0x02170002 0xf0c80000
0xfec20ff1 0x001fd842 0x00270003 0x4f107f80
0x00270002 0x4e000100 0x00370000 0x5b300118
0xfd4007ed 0x001ff400 0x05670007 0x4b6d0380
0x00070f00 0x50b00000 0x0000000f 0xe3000000
0xfea207f1 0x081fd800 0x00270006 0x38480000
0x01e70000 0x38290000 0x05070604 0x4c108000
0xfe0207f2 0x001fd800 0x05170005 0x4c100800
0x05270602 0x4c108000 0x00070404 0xeed42000
0xf62007e2 0x001fd800 0x05370003 0x4c100800
0x00070202 0xeed42000 0x05470606 0x4c108000
0xfe4007f7 0x001fc420 0x05570007 0x4c100800
0x00470200 0x5c580000 0x00070600 0xeedc2000
0xffe007ea 0x001ffc00 0x00070f00 0x50b00000
0x0007000f 0xe3000000 0xff87000f 0xe2400fff
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$
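The .text dump above prints the section as 32-bit words, while the -sass listing prints 64-bit encodings. The following C++ sketch shows how the two views relate: consecutive word pairs are joined little-endian, so the second word of each pair becomes the upper half. `joinWords`, `isControlWord` and `instructionSlots` are hypothetical helpers; treating every fourth 64-bit word as a scheduling control word is an assumption drawn from the sm_52 listing, where the words at byte offsets 0x00, 0x20, 0x40, ... appear without a mnemonic.

```cpp
#include <cstdint>

// Join two consecutive 32-bit words of the dump into one 64-bit encoding.
constexpr std::uint64_t joinWords(std::uint32_t first, std::uint32_t second) {
    return (static_cast<std::uint64_t>(second) << 32) | first;
}

// Assumed sm_5x/6x layout: one control word followed by three instructions,
// i.e. a control word at every 0x20-byte boundary.
constexpr bool isControlWord(std::uint32_t byteOffset) {
    return byteOffset % 0x20u == 0u;
}

// Number of 64-bit words in the section that hold an actual instruction.
constexpr int instructionSlots(std::uint32_t sectionBytes) {
    int count = 0;
    for (std::uint32_t off = 0; off < sectionBytes; off += 8) {
        if (!isControlWord(off)) {
            ++count;
        }
    }
    return count;
}
```

For instance, the first row of the dump, 0xe22007f6 0x001cfc00 0x00870001 0x4c980780, joins into 0x001cfc00e22007f6 (the unlabeled first word of the SASS listing) and 0x4c98078000870001 (the MOV at offset 0x0008). The section is 0x100 bytes long, which under the assumption above leaves 24 instruction slots, matching the offsets 0x0008 through 0x00f8 in the listing.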
5. cuobjdump -ptx <hostBinary>
Command:
$ cd ~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL
$ /usr/local/cuda/bin/cuobjdump -ptx vectorAdd
Output:
Fatbin elf code:
================
arch = sm_35
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_37
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_50
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_52
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_60
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_61
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_70
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_75
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_80
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_86
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_35
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_37
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_50
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_52
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_60
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_61
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_70
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_75
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin elf code:
================
arch = sm_80
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
Fatbin ptx code:
================
arch = sm_86
code version = [7,3]
producer = <unknown>
host = linux
compile_size = 64bit
compressed
.version 7.3
.target sm_86
.address_size 64
.visible .entry _Z9vectorAddPKfS0_Pfi(
.param .u64 _Z9vectorAddPKfS0_Pfi_param_0,
.param .u64 _Z9vectorAddPKfS0_Pfi_param_1,
.param .u64 _Z9vectorAddPKfS0_Pfi_param_2,
.param .u32 _Z9vectorAddPKfS0_Pfi_param_3
)
{
.reg .pred %p<2>;
.reg .f32 %f<4>;
.reg .b32 %r<6>;
.reg .b64 %rd<11>;
ld.param.u64 %rd1, [_Z9vectorAddPKfS0_Pfi_param_0];
ld.param.u64 %rd2, [_Z9vectorAddPKfS0_Pfi_param_1];
ld.param.u64 %rd3, [_Z9vectorAddPKfS0_Pfi_param_2];
ld.param.u32 %r2, [_Z9vectorAddPKfS0_Pfi_param_3];
mov.u32 %r3, %ntid.x;
mov.u32 %r4, %ctaid.x;
mov.u32 %r5, %tid.x;
mad.lo.s32 %r1, %r3, %r4, %r5;
setp.ge.s32 %p1, %r1, %r2;
@%p1 bra $L__BB0_2;
cvta.to.global.u64 %rd4, %rd1;
mul.wide.s32 %rd5, %r1, 4;
add.s64 %rd6, %rd4, %rd5;
cvta.to.global.u64 %rd7, %rd2;
add.s64 %rd8, %rd7, %rd5;
ld.global.f32 %f1, [%rd8];
ld.global.f32 %f2, [%rd6];
add.f32 %f3, %f2, %f1;
cvta.to.global.u64 %rd9, %rd3;
add.s64 %rd10, %rd9, %rd5;
st.global.f32 [%rd10], %f3;
$L__BB0_2:
ret;
}
Fatbin elf code:
================
arch = sm_86
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
6. Combined: cuobjdump -ptx -sass <hostBinary>
Command:
$ cd ~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL
$ /usr/local/cuda/bin/cuobjdump -ptx -sass vectorAdd
Output:
Fatbin elf code:
================
arch = sm_35
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_35
Fatbin elf code:
================
arch = sm_37
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_37
Fatbin elf code:
================
arch = sm_50
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_50
Fatbin elf code:
================
arch = sm_52
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_52
Fatbin elf code:
================
arch = sm_60
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_60
Fatbin elf code:
================
arch = sm_61
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_61
Fatbin elf code:
================
arch = sm_70
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_70
Fatbin elf code:
================
arch = sm_75
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_75
Fatbin elf code:
================
arch = sm_80
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_80
Fatbin elf code:
================
arch = sm_86
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_86
Fatbin elf code:
================
arch = sm_35
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_35
Function : _Z9vectorAddPKfS0_Pfi
.headerflags @"EF_CUDA_SM35 EF_CUDA_PTX_SM(EF_CUDA_SM35)"
/* 0x0880b8b0a0a08cc0 */
/*0008*/ MOV R1, c[0x0][0x44]; /* 0x64c03c00089c0006 */
/*0010*/ S2R R0, SR_CTAID.X; /* 0x86400000129c0002 */
/*0018*/ S2R R3, SR_TID.X; /* 0x86400000109c000e */
/*0020*/ IMAD R0, R0, c[0x0][0x28], R3; /* 0x51080c00051c0002 */
/*0028*/ ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT; /* 0x5b681c002b1c001e */
/*0030*/ @P0 EXIT; /* 0x180000000000003c */
/*0038*/ ISCADD R4.CC, R0, c[0x0][0x140], 0x2; /* 0x60c40800281c0012 */
/* 0x08b010a0b010a0ac */
/*0048*/ MOV32I R7, 0x4; /* 0x74000000021fc01e */
/*0050*/ IMAD.HI.X R5, R0, R7, c[0x0][0x144]; /* 0x93181c00289c0016 */
/*0058*/ ISCADD R2.CC, R0, c[0x0][0x148], 0x2; /* 0x60c40800291c000a */
/*0060*/ LD.E R4, [R4]; /* 0xc4800000001c1010 */
/*0068*/ IMAD.HI.X R3, R0, R7, c[0x0][0x14c]; /* 0x93181c00299c000e */
/*0070*/ LD.E R2, [R2]; /* 0xc4800000001c0808 */
/*0078*/ ISCADD R6.CC, R0, c[0x0][0x150], 0x2; /* 0x60c408002a1c001a */
/* 0x08000000bc10a0fc */
/*0088*/ IMAD.HI.X R7, R0, R7, c[0x0][0x154]; /* 0x93181c002a9c001e */
/*0090*/ FADD R0, R2, R4; /* 0xe2c00000021c0802 */
/*0098*/ ST.E [R6], R0; /* 0xe4800000001c1800 */
/*00a0*/ EXIT; /* 0x18000000001c003c */
/*00a8*/ BRA 0xa8; /* 0x12007ffffc1c003c */
/*00b0*/ NOP; /* 0x85800000001c3c02 */
/*00b8*/ NOP; /* 0x85800000001c3c02 */
..........
Fatbin elf code:
================
arch = sm_37
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_37
Function : _Z9vectorAddPKfS0_Pfi
.headerflags @"EF_CUDA_SM37 EF_CUDA_PTX_SM(EF_CUDA_SM37)"
/* 0x0880b8b0a0a08cc0 */
/*0008*/ MOV R1, c[0x0][0x44]; /* 0x64c03c00089c0006 */
/*0010*/ S2R R0, SR_CTAID.X; /* 0x86400000129c0002 */
/*0018*/ S2R R3, SR_TID.X; /* 0x86400000109c000e */
/*0020*/ IMAD R0, R0, c[0x0][0x28], R3; /* 0x51080c00051c0002 */
/*0028*/ ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT; /* 0x5b681c002b1c001e */
/*0030*/ @P0 EXIT; /* 0x180000000000003c */
/*0038*/ ISCADD R4.CC, R0, c[0x0][0x140], 0x2; /* 0x60c40800281c0012 */
/* 0x08b010a0b010a0ac */
/*0048*/ MOV32I R7, 0x4; /* 0x74000000021fc01e */
/*0050*/ IMAD.HI.X R5, R0, R7, c[0x0][0x144]; /* 0x93181c00289c0016 */
/*0058*/ ISCADD R2.CC, R0, c[0x0][0x148], 0x2; /* 0x60c40800291c000a */
/*0060*/ LD.E R4, [R4]; /* 0xc4800000001c1010 */
/*0068*/ IMAD.HI.X R3, R0, R7, c[0x0][0x14c]; /* 0x93181c00299c000e */
/*0070*/ LD.E R2, [R2]; /* 0xc4800000001c0808 */
/*0078*/ ISCADD R6.CC, R0, c[0x0][0x150], 0x2; /* 0x60c408002a1c001a */
/* 0x08000000bc10a0fc */
/*0088*/ IMAD.HI.X R7, R0, R7, c[0x0][0x154]; /* 0x93181c002a9c001e */
/*0090*/ FADD R0, R2, R4; /* 0xe2c00000021c0802 */
/*0098*/ ST.E [R6], R0; /* 0xe4800000001c1800 */
/*00a0*/ EXIT; /* 0x18000000001c003c */
/*00a8*/ BRA 0xa8; /* 0x12007ffffc1c003c */
/*00b0*/ NOP; /* 0x85800000001c3c02 */
/*00b8*/ NOP; /* 0x85800000001c3c02 */
..........
Fatbin elf code:
================
arch = sm_50
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_50
Function : _Z9vectorAddPKfS0_Pfi
.headerflags @"EF_CUDA_SM50 EF_CUDA_PTX_SM(EF_CUDA_SM50)"
/* 0x001cfc00e22007f6 */
/*0008*/ MOV R1, c[0x0][0x20] ; /* 0x4c98078000870001 */
/*0010*/ S2R R0, SR_CTAID.X ; /* 0xf0c8000002570000 */
/*0018*/ S2R R2, SR_TID.X ; /* 0xf0c8000002170002 */
/* 0x001fd842fec20ff1 */
/*0028*/ XMAD.MRG R3, R0.reuse, c[0x0] [0x8].H1, RZ ; /* 0x4f107f8000270003 */
/*0030*/ XMAD R2, R0.reuse, c[0x0] [0x8], R2 ; /* 0x4e00010000270002 */
/*0038*/ XMAD.PSL.CBCC R0, R0.H1, R3.H1, R2 ; /* 0x5b30011800370000 */
/* 0x001ff400fd4007ed */
/*0048*/ ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT ; /* 0x4b6d038005670007 */
/*0050*/ NOP ; /* 0x50b0000000070f00 */
/*0058*/ @P0 EXIT ; /* 0xe30000000000000f */
/* 0x081fd800fea207f1 */
/*0068*/ SHL R6, R0.reuse, 0x2 ; /* 0x3848000000270006 */
/*0070*/ SHR R0, R0, 0x1e ; /* 0x3829000001e70000 */
/*0078*/ IADD R4.CC, R6.reuse, c[0x0][0x140] ; /* 0x4c10800005070604 */
/* 0x001fd800fe0207f2 */
/*0088*/ IADD.X R5, R0.reuse, c[0x0][0x144] ; /* 0x4c10080005170005 */
/*0090*/ { IADD R2.CC, R6, c[0x0][0x148] ; /* 0x4c10800005270602 */
/*0098*/ LDG.E R4, [R4] }
/* 0xeed4200000070404 */
/* 0x001fd800f62007e2 */
/*00a8*/ IADD.X R3, R0, c[0x0][0x14c] ; /* 0x4c10080005370003 */
/*00b0*/ LDG.E R2, [R2] ; /* 0xeed4200000070202 */
/*00b8*/ IADD R6.CC, R6, c[0x0][0x150] ; /* 0x4c10800005470606 */
/* 0x001fc420fe4007f7 */
/*00c8*/ IADD.X R7, R0, c[0x0][0x154] ; /* 0x4c10080005570007 */
/*00d0*/ FADD R0, R2, R4 ; /* 0x5c58000000470200 */
/*00d8*/ STG.E [R6], R0 ; /* 0xeedc200000070600 */
/* 0x001ffc00ffe007ea */
/*00e8*/ NOP ; /* 0x50b0000000070f00 */
/*00f0*/ EXIT ; /* 0xe30000000007000f */
/*00f8*/ BRA 0xf8 ; /* 0xe2400fffff87000f */
..........
Fatbin elf code:
================
arch = sm_52
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_52
Function : _Z9vectorAddPKfS0_Pfi
.headerflags @"EF_CUDA_SM52 EF_CUDA_PTX_SM(EF_CUDA_SM52)"
/* 0x001cfc00e22007f6 */
/*0008*/ MOV R1, c[0x0][0x20] ; /* 0x4c98078000870001 */
/*0010*/ S2R R0, SR_CTAID.X ; /* 0xf0c8000002570000 */
/*0018*/ S2R R2, SR_TID.X ; /* 0xf0c8000002170002 */
/* 0x001fd842fec20ff1 */
/*0028*/ XMAD.MRG R3, R0.reuse, c[0x0] [0x8].H1, RZ ; /* 0x4f107f8000270003 */
/*0030*/ XMAD R2, R0.reuse, c[0x0] [0x8], R2 ; /* 0x4e00010000270002 */
/*0038*/ XMAD.PSL.CBCC R0, R0.H1, R3.H1, R2 ; /* 0x5b30011800370000 */
/* 0x001ff400fd4007ed */
/*0048*/ ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT ; /* 0x4b6d038005670007 */
/*0050*/ NOP ; /* 0x50b0000000070f00 */
/*0058*/ @P0 EXIT ; /* 0xe30000000000000f */
/* 0x081fd800fea207f1 */
/*0068*/ SHL R6, R0.reuse, 0x2 ; /* 0x3848000000270006 */
/*0070*/ SHR R0, R0, 0x1e ; /* 0x3829000001e70000 */
/*0078*/ IADD R4.CC, R6.reuse, c[0x0][0x140] ; /* 0x4c10800005070604 */
/* 0x001fd800fe0207f2 */
/*0088*/ IADD.X R5, R0.reuse, c[0x0][0x144] ; /* 0x4c10080005170005 */
/*0090*/ { IADD R2.CC, R6, c[0x0][0x148] ; /* 0x4c10800005270602 */
/*0098*/ LDG.E R4, [R4] }
/* 0xeed4200000070404 */
/* 0x001fd800f62007e2 */
/*00a8*/ IADD.X R3, R0, c[0x0][0x14c] ; /* 0x4c10080005370003 */
/*00b0*/ LDG.E R2, [R2] ; /* 0xeed4200000070202 */
/*00b8*/ IADD R6.CC, R6, c[0x0][0x150] ; /* 0x4c10800005470606 */
/* 0x001fc420fe4007f7 */
/*00c8*/ IADD.X R7, R0, c[0x0][0x154] ; /* 0x4c10080005570007 */
/*00d0*/ FADD R0, R2, R4 ; /* 0x5c58000000470200 */
/*00d8*/ STG.E [R6], R0 ; /* 0xeedc200000070600 */
/* 0x001ffc00ffe007ea */
/*00e8*/ NOP ; /* 0x50b0000000070f00 */
/*00f0*/ EXIT ; /* 0xe30000000007000f */
/*00f8*/ BRA 0xf8 ; /* 0xe2400fffff87000f */
..........
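In the sm_52 listing above, each element address is built from 32-bit halves: SHL and SHR produce the low and high words of the byte offset i * 4, then IADD with .CC and IADD.X add the 64-bit base pointer with carry. The C++ sketch below emulates that chain on the host. `Reg64`, `elementAddress`, `toU64` and `referenceAddress` are hypothetical names, and treating SHR without a .U32 suffix as a sign-filling (arithmetic) shift is an assumption, chosen because it matches the mul.wide.s32 in the PTX.

```cpp
#include <cstdint>

// A 64-bit value held in a register pair, as in R4/R5 of the listing.
struct Reg64 {
    std::uint32_t lo;
    std::uint32_t hi;
};

constexpr Reg64 elementAddress(Reg64 base, std::int32_t i) {
    const std::uint32_t u = static_cast<std::uint32_t>(i);
    // SHL R6, R0, 0x2: low word of the byte offset i * 4.
    const std::uint32_t offLo = u << 2;
    // SHR R0, R0, 0x1e: high word of the offset, with the sign bit replicated.
    const std::uint32_t offHi = (u >> 30) | (i < 0 ? 0xfffffffcu : 0u);
    // IADD R4.CC, R6, c[0x0][0x140]: low-word add that records the carry.
    const std::uint32_t lo = offLo + base.lo;
    const std::uint32_t carry = (lo < offLo) ? 1u : 0u;
    // IADD.X R5, R0, c[0x0][0x144]: high-word add that consumes the carry.
    const std::uint32_t hi = offHi + base.hi + carry;
    return Reg64{lo, hi};
}

constexpr std::uint64_t toU64(Reg64 r) {
    return (static_cast<std::uint64_t>(r.hi) << 32) | r.lo;
}

// Straightforward 64-bit computation to compare against.
constexpr std::uint64_t referenceAddress(std::uint64_t base, std::int32_t i) {
    return base + static_cast<std::uint64_t>(static_cast<std::int64_t>(i) * 4);
}
```

The sm_70 and later listings replace this four-instruction chain with a single IMAD.WIDE per pointer, which multiplies the index by 4 and adds the 64-bit base in one step.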
Fatbin elf code:
================
arch = sm_60
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_60
Function : _Z9vectorAddPKfS0_Pfi
.headerflags @"EF_CUDA_SM60 EF_CUDA_PTX_SM(EF_CUDA_SM60)"
/* 0x001cfc00e22007f6 */
/*0008*/ MOV R1, c[0x0][0x20] ; /* 0x4c98078000870001 */
/*0010*/ S2R R0, SR_CTAID.X ; /* 0xf0c8000002570000 */
/*0018*/ S2R R2, SR_TID.X ; /* 0xf0c8000002170002 */
/* 0x001fd842fec20ff1 */
/*0028*/ XMAD.MRG R3, R0.reuse, c[0x0] [0x8].H1, RZ ; /* 0x4f107f8000270003 */
/*0030*/ XMAD R2, R0.reuse, c[0x0] [0x8], R2 ; /* 0x4e00010000270002 */
/*0038*/ XMAD.PSL.CBCC R0, R0.H1, R3.H1, R2 ; /* 0x5b30011800370000 */
/* 0x001ff400fd4007ed */
/*0048*/ ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT ; /* 0x4b6d038005670007 */
/*0050*/ NOP ; /* 0x50b0000000070f00 */
/*0058*/ @P0 EXIT ; /* 0xe30000000000000f */
/* 0x081fd800fea207f1 */
/*0068*/ SHL R6, R0.reuse, 0x2 ; /* 0x3848000000270006 */
/*0070*/ SHR R0, R0, 0x1e ; /* 0x3829000001e70000 */
/*0078*/ IADD R4.CC, R6.reuse, c[0x0][0x140] ; /* 0x4c10800005070604 */
/* 0x001fd800fe0207f2 */
/*0088*/ IADD.X R5, R0.reuse, c[0x0][0x144] ; /* 0x4c10080005170005 */
/*0090*/ { IADD R2.CC, R6, c[0x0][0x148] ; /* 0x4c10800005270602 */
/*0098*/ LDG.E R4, [R4] }
/* 0xeed4200000070404 */
/* 0x001fd800f62007e2 */
/*00a8*/ IADD.X R3, R0, c[0x0][0x14c] ; /* 0x4c10080005370003 */
/*00b0*/ LDG.E R2, [R2] ; /* 0xeed4200000070202 */
/*00b8*/ IADD R6.CC, R6, c[0x0][0x150] ; /* 0x4c10800005470606 */
/* 0x001fc420fe4007f7 */
/*00c8*/ IADD.X R7, R0, c[0x0][0x154] ; /* 0x4c10080005570007 */
/*00d0*/ FADD R0, R2, R4 ; /* 0x5c58000000470200 */
/*00d8*/ STG.E [R6], R0 ; /* 0xeedc200000070600 */
/* 0x001ffc00ffe007ea */
/*00e8*/ NOP ; /* 0x50b0000000070f00 */
/*00f0*/ EXIT ; /* 0xe30000000007000f */
/*00f8*/ BRA 0xf8 ; /* 0xe2400fffff87000f */
..........
Fatbin elf code:
================
arch = sm_61
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_61
Function : _Z9vectorAddPKfS0_Pfi
.headerflags @"EF_CUDA_SM61 EF_CUDA_PTX_SM(EF_CUDA_SM61)"
/* 0x001cfc00e22007f6 */
/*0008*/ MOV R1, c[0x0][0x20] ; /* 0x4c98078000870001 */
/*0010*/ S2R R0, SR_CTAID.X ; /* 0xf0c8000002570000 */
/*0018*/ S2R R2, SR_TID.X ; /* 0xf0c8000002170002 */
/* 0x001fd842fec20ff1 */
/*0028*/ XMAD.MRG R3, R0.reuse, c[0x0] [0x8].H1, RZ ; /* 0x4f107f8000270003 */
/*0030*/ XMAD R2, R0.reuse, c[0x0] [0x8], R2 ; /* 0x4e00010000270002 */
/*0038*/ XMAD.PSL.CBCC R0, R0.H1, R3.H1, R2 ; /* 0x5b30011800370000 */
/* 0x001ff400fd4007ed */
/*0048*/ ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT ; /* 0x4b6d038005670007 */
/*0050*/ NOP ; /* 0x50b0000000070f00 */
/*0058*/ @P0 EXIT ; /* 0xe30000000000000f */
/* 0x081fd800fea207f1 */
/*0068*/ SHL R6, R0.reuse, 0x2 ; /* 0x3848000000270006 */
/*0070*/ SHR R0, R0, 0x1e ; /* 0x3829000001e70000 */
/*0078*/ IADD R4.CC, R6.reuse, c[0x0][0x140] ; /* 0x4c10800005070604 */
/* 0x001fd800fe0207f2 */
/*0088*/ IADD.X R5, R0.reuse, c[0x0][0x144] ; /* 0x4c10080005170005 */
/*0090*/ { IADD R2.CC, R6, c[0x0][0x148] ; /* 0x4c10800005270602 */
/*0098*/ LDG.E R4, [R4] }
/* 0xeed4200000070404 */
/* 0x001fd800f62007e2 */
/*00a8*/ IADD.X R3, R0, c[0x0][0x14c] ; /* 0x4c10080005370003 */
/*00b0*/ LDG.E R2, [R2] ; /* 0xeed4200000070202 */
/*00b8*/ IADD R6.CC, R6, c[0x0][0x150] ; /* 0x4c10800005470606 */
/* 0x001fc420fe4007f7 */
/*00c8*/ IADD.X R7, R0, c[0x0][0x154] ; /* 0x4c10080005570007 */
/*00d0*/ FADD R0, R2, R4 ; /* 0x5c58000000470200 */
/*00d8*/ STG.E [R6], R0 ; /* 0xeedc200000070600 */
/* 0x001ffc00ffe007ea */
/*00e8*/ NOP ; /* 0x50b0000000070f00 */
/*00f0*/ EXIT ; /* 0xe30000000007000f */
/*00f8*/ BRA 0xf8 ; /* 0xe2400fffff87000f */
..........
Fatbin elf code:
================
arch = sm_70
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_70
Function : _Z9vectorAddPKfS0_Pfi
.headerflags @"EF_CUDA_SM70 EF_CUDA_PTX_SM(EF_CUDA_SM70)"
/*0000*/ MOV R1, c[0x0][0x28] ; /* 0x00000a0000017a02 */
/* 0x000fc40000000f00 */
/*0010*/ @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ; /* 0x000000fffffff389 */
/* 0x000fe200000e00ff */
/*0020*/ S2R R6, SR_CTAID.X ; /* 0x0000000000067919 */
/* 0x000e280000002500 */
/*0030*/ S2R R3, SR_TID.X ; /* 0x0000000000037919 */
/* 0x000e240000002100 */
/*0040*/ IMAD R6, R6, c[0x0][0x0], R3 ; /* 0x0000000006067a24 */
/* 0x001fca00078e0203 */
/*0050*/ ISETP.GE.AND P0, PT, R6, c[0x0][0x178], PT ; /* 0x00005e0006007a0c */
/* 0x000fd80003f06270 */
/*0060*/ @P0 EXIT ; /* 0x000000000000094d */
/* 0x000fea0003800000 */
/*0070*/ MOV R7, 0x4 ; /* 0x0000000400077802 */
/* 0x000fca0000000f00 */
/*0080*/ IMAD.WIDE R4, R6, R7, c[0x0][0x168] ; /* 0x00005a0006047625 */
/* 0x000fc800078e0207 */
/*0090*/ IMAD.WIDE R2, R6, R7, c[0x0][0x160] ; /* 0x0000580006027625 */
/* 0x000fc800078e0207 */
/*00a0*/ LDG.E.SYS R4, [R4] ; /* 0x0000000004047381 */
/* 0x000ea800001ee900 */
/*00b0*/ LDG.E.SYS R3, [R2] ; /* 0x0000000002037381 */
/* 0x000ea200001ee900 */
/*00c0*/ IMAD.WIDE R6, R6, R7, c[0x0][0x170] ; /* 0x00005c0006067625 */
/* 0x000fc800078e0207 */
/*00d0*/ FADD R9, R4, R3 ; /* 0x0000000304097221 */
/* 0x004fd00000000000 */
/*00e0*/ STG.E.SYS [R6], R9 ; /* 0x0000000906007386 */
/* 0x000fe2000010e900 */
/*00f0*/ EXIT ; /* 0x000000000000794d */
/* 0x000fea0003800000 */
/*0100*/ BRA 0x100; /* 0xfffffff000007947 */
/* 0x000fc0000383ffff */
/*0110*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0120*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0130*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0140*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0150*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0160*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0170*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
..........
Fatbin elf code:
================
arch = sm_75
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_75
Function : _Z9vectorAddPKfS0_Pfi
.headerflags @"EF_CUDA_SM75 EF_CUDA_PTX_SM(EF_CUDA_SM75)"
/*0000*/ MOV R1, c[0x0][0x28] ; /* 0x00000a0000017a02 */
/* 0x000fc40000000f00 */
/*0010*/ S2R R6, SR_CTAID.X ; /* 0x0000000000067919 */
/* 0x000e280000002500 */
/*0020*/ S2R R3, SR_TID.X ; /* 0x0000000000037919 */
/* 0x000e240000002100 */
/*0030*/ IMAD R6, R6, c[0x0][0x0], R3 ; /* 0x0000000006067a24 */
/* 0x001fca00078e0203 */
/*0040*/ ISETP.GE.AND P0, PT, R6, c[0x0][0x178], PT ; /* 0x00005e0006007a0c */
/* 0x000fd80003f06270 */
/*0050*/ @P0 EXIT ; /* 0x000000000000094d */
/* 0x000fea0003800000 */
/*0060*/ MOV R7, 0x4 ; /* 0x0000000400077802 */
/* 0x000fca0000000f00 */
/*0070*/ IMAD.WIDE R4, R6, R7, c[0x0][0x168] ; /* 0x00005a0006047625 */
/* 0x000fc800078e0207 */
/*0080*/ IMAD.WIDE R2, R6, R7, c[0x0][0x160] ; /* 0x0000580006027625 */
/* 0x000fc800078e0207 */
/*0090*/ LDG.E.SYS R4, [R4] ; /* 0x0000000004047381 */
/* 0x000ea800001ee900 */
/*00a0*/ LDG.E.SYS R3, [R2] ; /* 0x0000000002037381 */
/* 0x000ea200001ee900 */
/*00b0*/ IMAD.WIDE R6, R6, R7, c[0x0][0x170] ; /* 0x00005c0006067625 */
/* 0x000fc800078e0207 */
/*00c0*/ FADD R9, R4, R3 ; /* 0x0000000304097221 */
/* 0x004fd00000000000 */
/*00d0*/ STG.E.SYS [R6], R9 ; /* 0x0000000906007386 */
/* 0x000fe2000010e900 */
/*00e0*/ EXIT ; /* 0x000000000000794d */
/* 0x000fea0003800000 */
/*00f0*/ BRA 0xf0; /* 0xfffffff000007947 */
/* 0x000fc0000383ffff */
..........
Fatbin elf code:
================
arch = sm_80
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_80
Function : _Z9vectorAddPKfS0_Pfi
.headerflags @"EF_CUDA_SM80 EF_CUDA_PTX_SM(EF_CUDA_SM80)"
/*0000*/ MOV R1, c[0x0][0x28] ; /* 0x00000a0000017a02 */
/* 0x000fc40000000f00 */
/*0010*/ S2R R6, SR_CTAID.X ; /* 0x0000000000067919 */
/* 0x000e280000002500 */
/*0020*/ S2R R3, SR_TID.X ; /* 0x0000000000037919 */
/* 0x000e240000002100 */
/*0030*/ IMAD R6, R6, c[0x0][0x0], R3 ; /* 0x0000000006067a24 */
/* 0x001fca00078e0203 */
/*0040*/ ISETP.GE.AND P0, PT, R6, c[0x0][0x178], PT ; /* 0x00005e0006007a0c */
/* 0x000fda0003f06270 */
/*0050*/ @P0 EXIT ; /* 0x000000000000094d */
/* 0x000fea0003800000 */
/*0060*/ HFMA2.MMA R7, -RZ, RZ, 0, 2.384185791015625e-07 ; /* 0x00000004ff077435 */
/* 0x000fe200000001ff */
/*0070*/ ULDC.64 UR4, c[0x0][0x118] ; /* 0x0000460000047ab9 */
/* 0x000fd20000000a00 */
/*0080*/ IMAD.WIDE R4, R6, R7, c[0x0][0x168] ; /* 0x00005a0006047625 */
/* 0x000fc800078e0207 */
/*0090*/ IMAD.WIDE R2, R6.reuse, R7.reuse, c[0x0][0x160] ; /* 0x0000580006027625 */
/* 0x0c0fe400078e0207 */
/*00a0*/ LDG.E R4, [R4.64] ; /* 0x0000000404047981 */
/* 0x000ea8000c1e1900 */
/*00b0*/ LDG.E R3, [R2.64] ; /* 0x0000000402037981 */
/* 0x000ea2000c1e1900 */
/*00c0*/ IMAD.WIDE R6, R6, R7, c[0x0][0x170] ; /* 0x00005c0006067625 */
/* 0x000fc800078e0207 */
/*00d0*/ FADD R9, R4, R3 ; /* 0x0000000304097221 */
/* 0x004fca0000000000 */
/*00e0*/ STG.E [R6.64], R9 ; /* 0x0000000906007986 */
/* 0x000fe2000c101904 */
/*00f0*/ EXIT ; /* 0x000000000000794d */
/* 0x000fea0003800000 */
/*0100*/ BRA 0x100; /* 0xfffffff000007947 */
/* 0x000fc0000383ffff */
/*0110*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0120*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0130*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0140*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0150*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0160*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0170*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0180*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0190*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01a0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01b0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01c0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01d0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01e0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01f0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
..........
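The sm_80 listing above contains `HFMA2.MMA R7, -RZ, RZ, 0, 2.384185791015625e-07` where the sm_75 listing has `MOV R7, 0x4`. A plausible reading is that both put the integer 4 (the element size used by IMAD.WIDE) into R7: -RZ * RZ contributes zero, and the immediate is a pair of half-precision values whose bit pattern is 0x00000004. The C++ sketch below checks the arithmetic behind that reading; it is an interpretation of the disassembly rather than documented behavior, and `pow2`, `halfBitsToDouble` and `packHalf2` are hypothetical helpers.

```cpp
#include <cstdint>

// 2^e for small integer e, built from exact multiplications.
constexpr double pow2(int e) {
    double v = 1.0;
    for (int k = 0; k < e; ++k) v *= 2.0;
    for (int k = 0; k > e; --k) v *= 0.5;
    return v;
}

// Value of an IEEE 754 binary16 bit pattern.
// Handles zero, subnormal and normal numbers; Inf/NaN are not handled.
constexpr double halfBitsToDouble(std::uint16_t h) {
    const int sign = (h >> 15) & 0x1;
    const int exponent = (h >> 10) & 0x1f;
    const int mantissa = h & 0x3ff;
    const double magnitude = (exponent == 0)
        ? mantissa * pow2(-24)                      // subnormal: m * 2^-24
        : (1024 + mantissa) * pow2(exponent - 25);  // normal: (1 + m/1024) * 2^(e-15)
    return sign ? -magnitude : magnitude;
}

// Two half-precision values packed into one 32-bit register.
constexpr std::uint32_t packHalf2(std::uint16_t hi, std::uint16_t lo) {
    return (static_cast<std::uint32_t>(hi) << 16) | lo;
}
```

The half with bit pattern 0x0004 is the subnormal 4 * 2^-24 = 2^-22, which prints as 2.384185791015625e-07, and packing it below a zero half gives the 32-bit value 4. The same 0x00000004 is visible in the upper half of the instruction encoding 0x00000004ff077435.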
Fatbin ptx code:
================
arch = sm_86
code version = [7,3]
producer = <unknown>
host = linux
compile_size = 64bit
compressed
.version 7.3
.target sm_86
.address_size 64
.visible .entry _Z9vectorAddPKfS0_Pfi(
.param .u64 _Z9vectorAddPKfS0_Pfi_param_0,
.param .u64 _Z9vectorAddPKfS0_Pfi_param_1,
.param .u64 _Z9vectorAddPKfS0_Pfi_param_2,
.param .u32 _Z9vectorAddPKfS0_Pfi_param_3
)
{
.reg .pred %p<2>;
.reg .f32 %f<4>;
.reg .b32 %r<6>;
.reg .b64 %rd<11>;
ld.param.u64 %rd1, [_Z9vectorAddPKfS0_Pfi_param_0];
ld.param.u64 %rd2, [_Z9vectorAddPKfS0_Pfi_param_1];
ld.param.u64 %rd3, [_Z9vectorAddPKfS0_Pfi_param_2];
ld.param.u32 %r2, [_Z9vectorAddPKfS0_Pfi_param_3];
mov.u32 %r3, %ntid.x;
mov.u32 %r4, %ctaid.x;
mov.u32 %r5, %tid.x;
mad.lo.s32 %r1, %r3, %r4, %r5;
setp.ge.s32 %p1, %r1, %r2;
@%p1 bra $L__BB0_2;
cvta.to.global.u64 %rd4, %rd1;
mul.wide.s32 %rd5, %r1, 4;
add.s64 %rd6, %rd4, %rd5;
cvta.to.global.u64 %rd7, %rd2;
add.s64 %rd8, %rd7, %rd5;
ld.global.f32 %f1, [%rd8];
ld.global.f32 %f2, [%rd6];
add.f32 %f3, %f2, %f1;
cvta.to.global.u64 %rd9, %rd3;
add.s64 %rd10, %rd9, %rd5;
st.global.f32 [%rd10], %f3;
$L__BB0_2:
ret;
}
Fatbin elf code:
================
arch = sm_86
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_86
Function : _Z9vectorAddPKfS0_Pfi
.headerflags @"EF_CUDA_SM86 EF_CUDA_PTX_SM(EF_CUDA_SM86)"
/*0000*/ MOV R1, c[0x0][0x28] ; /* 0x00000a0000017a02 */
/* 0x000fc40000000f00 */
/*0010*/ S2R R6, SR_CTAID.X ; /* 0x0000000000067919 */
/* 0x000e280000002500 */
/*0020*/ S2R R3, SR_TID.X ; /* 0x0000000000037919 */
/* 0x000e240000002100 */
/*0030*/ IMAD R6, R6, c[0x0][0x0], R3 ; /* 0x0000000006067a24 */
/* 0x001fca00078e0203 */
/*0040*/ ISETP.GE.AND P0, PT, R6, c[0x0][0x178], PT ; /* 0x00005e0006007a0c */
/* 0x000fda0003f06270 */
/*0050*/ @P0 EXIT ; /* 0x000000000000094d */
/* 0x000fea0003800000 */
/*0060*/ MOV R7, 0x4 ; /* 0x0000000400077802 */
/* 0x000fe20000000f00 */
/*0070*/ ULDC.64 UR4, c[0x0][0x118] ; /* 0x0000460000047ab9 */
/* 0x000fc80000000a00 */
/*0080*/ IMAD.WIDE R4, R6, R7, c[0x0][0x168] ; /* 0x00005a0006047625 */
/* 0x000fc800078e0207 */
/*0090*/ IMAD.WIDE R2, R6.reuse, R7.reuse, c[0x0][0x160] ; /* 0x0000580006027625 */
/* 0x0c0fe400078e0207 */
/*00a0*/ LDG.E R4, [R4.64] ; /* 0x0000000404047981 */
/* 0x000ea8000c1e1900 */
/*00b0*/ LDG.E R3, [R2.64] ; /* 0x0000000402037981 */
/* 0x000ea2000c1e1900 */
/*00c0*/ IMAD.WIDE R6, R6, R7, c[0x0][0x170] ; /* 0x00005c0006067625 */
/* 0x000fe200078e0207 */
/*00d0*/ FADD R9, R4, R3 ; /* 0x0000000304097221 */
/* 0x004fca0000000000 */
/*00e0*/ STG.E [R6.64], R9 ; /* 0x0000000906007986 */
/* 0x000fe2000c101904 */
/*00f0*/ EXIT ; /* 0x000000000000794d */
/* 0x000fea0003800000 */
/*0100*/ BRA 0x100; /* 0xfffffff000007947 */
/* 0x000fc0000383ffff */
/*0110*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0120*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0130*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0140*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0150*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0160*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0170*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0180*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0190*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01a0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01b0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01c0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01d0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01e0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01f0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
..........
5. cuobjdump -lelf <hostBinary>
Command:
$ /usr/local/cuda/bin/cuobjdump -lelf vectorAdd
Result:
ELF file 1: vectorAdd.1.sm_35.cubin
ELF file 2: vectorAdd.2.sm_37.cubin
ELF file 3: vectorAdd.3.sm_50.cubin
ELF file 4: vectorAdd.4.sm_52.cubin
ELF file 5: vectorAdd.5.sm_60.cubin
ELF file 6: vectorAdd.6.sm_61.cubin
ELF file 7: vectorAdd.7.sm_70.cubin
ELF file 8: vectorAdd.8.sm_75.cubin
ELF file 9: vectorAdd.9.sm_80.cubin
ELF file 10: vectorAdd.10.sm_86.cubin
ELF file 11: vectorAdd.11.sm_35.cubin
ELF file 12: vectorAdd.12.sm_37.cubin
ELF file 13: vectorAdd.13.sm_50.cubin
ELF file 14: vectorAdd.14.sm_52.cubin
ELF file 15: vectorAdd.15.sm_60.cubin
ELF file 16: vectorAdd.16.sm_61.cubin
ELF file 17: vectorAdd.17.sm_70.cubin
ELF file 18: vectorAdd.18.sm_75.cubin
ELF file 19: vectorAdd.19.sm_80.cubin
ELF file 20: vectorAdd.20.sm_86.cubin
6. cuobjdump -xelf all <hostBinary>
Command:
$ cd ~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL
$ /usr/local/cuda/bin/cuobjdump -xelf all vectorAdd
Result:
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$ /usr/local/cuda/bin/cuobjdump -xelf all vectorAdd
Extracting ELF file 1: vectorAdd.1.sm_35.cubin
Extracting ELF file 2: vectorAdd.2.sm_37.cubin
Extracting ELF file 3: vectorAdd.3.sm_50.cubin
Extracting ELF file 4: vectorAdd.4.sm_52.cubin
Extracting ELF file 5: vectorAdd.5.sm_60.cubin
Extracting ELF file 6: vectorAdd.6.sm_61.cubin
Extracting ELF file 7: vectorAdd.7.sm_70.cubin
Extracting ELF file 8: vectorAdd.8.sm_75.cubin
Extracting ELF file 9: vectorAdd.9.sm_80.cubin
Extracting ELF file 10: vectorAdd.10.sm_86.cubin
Extracting ELF file 11: vectorAdd.11.sm_35.cubin
Extracting ELF file 12: vectorAdd.12.sm_37.cubin
Extracting ELF file 13: vectorAdd.13.sm_50.cubin
Extracting ELF file 14: vectorAdd.14.sm_52.cubin
Extracting ELF file 15: vectorAdd.15.sm_60.cubin
Extracting ELF file 16: vectorAdd.16.sm_61.cubin
Extracting ELF file 17: vectorAdd.17.sm_70.cubin
Extracting ELF file 18: vectorAdd.18.sm_75.cubin
Extracting ELF file 19: vectorAdd.19.sm_80.cubin
Extracting ELF file 20: vectorAdd.20.sm_86.cubin
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$ ls
Makefile vectorAdd.10.sm_86.cubin vectorAdd.14.sm_52.cubin vectorAdd.18.sm_75.cubin vectorAdd.2.sm_37.cubin vectorAdd.6.sm_61.cubin vectorAdd.cu
NsightEclipse.xml vectorAdd.11.sm_35.cubin vectorAdd.15.sm_60.cubin vectorAdd.19.sm_80.cubin vectorAdd.3.sm_50.cubin vectorAdd.7.sm_70.cubin vectorAddKernel.cu
readme.txt vectorAdd.12.sm_37.cubin vectorAdd.16.sm_61.cubin vectorAdd.1.sm_35.cubin vectorAdd.4.sm_52.cubin vectorAdd.8.sm_75.cubin vectorAddKernel.cubin
vectorAdd vectorAdd.13.sm_50.cubin vectorAdd.17.sm_70.cubin vectorAdd.20.sm_86.cubin vectorAdd.5.sm_60.cubin vectorAdd.9.sm_80.cubin vectorAdd.o
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$
7. Extracting a specific cubin
Command (list the embedded ELF files first, then extract one by name):
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$ /usr/local/cuda/bin/cuobjdump -lelf vectorAdd
ELF file 1: vectorAdd.1.sm_35.cubin
ELF file 2: vectorAdd.2.sm_37.cubin
ELF file 3: vectorAdd.3.sm_50.cubin
ELF file 4: vectorAdd.4.sm_52.cubin
ELF file 5: vectorAdd.5.sm_60.cubin
ELF file 6: vectorAdd.6.sm_61.cubin
ELF file 7: vectorAdd.7.sm_70.cubin
ELF file 8: vectorAdd.8.sm_75.cubin
ELF file 9: vectorAdd.9.sm_80.cubin
ELF file 10: vectorAdd.10.sm_86.cubin
ELF file 11: vectorAdd.11.sm_35.cubin
ELF file 12: vectorAdd.12.sm_37.cubin
ELF file 13: vectorAdd.13.sm_50.cubin
ELF file 14: vectorAdd.14.sm_52.cubin
ELF file 15: vectorAdd.15.sm_60.cubin
ELF file 16: vectorAdd.16.sm_61.cubin
ELF file 17: vectorAdd.17.sm_70.cubin
ELF file 18: vectorAdd.18.sm_75.cubin
ELF file 19: vectorAdd.19.sm_80.cubin
ELF file 20: vectorAdd.20.sm_86.cubin
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$
$ cd ~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL
$ /usr/local/cuda/bin/cuobjdump vectorAdd -xelf vectorAdd.19.sm_80.cubin
Extracting ELF file 19: vectorAdd.19.sm_80.cubin
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$
Result:
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$ /usr/local/cuda/bin/cuobjdump -sass vectorAdd.19.sm_80.cubin
code for sm_80
Function : _Z9vectorAddPKfS0_Pfi
.headerflags @"EF_CUDA_SM80 EF_CUDA_PTX_SM(EF_CUDA_SM80)"
/*0000*/ MOV R1, c[0x0][0x28] ; /* 0x00000a0000017a02 */
/* 0x000fc40000000f00 */
/*0010*/ S2R R6, SR_CTAID.X ; /* 0x0000000000067919 */
/* 0x000e280000002500 */
/*0020*/ S2R R3, SR_TID.X ; /* 0x0000000000037919 */
/* 0x000e240000002100 */
/*0030*/ IMAD R6, R6, c[0x0][0x0], R3 ; /* 0x0000000006067a24 */
/* 0x001fca00078e0203 */
/*0040*/ ISETP.GE.AND P0, PT, R6, c[0x0][0x178], PT ; /* 0x00005e0006007a0c */
/* 0x000fda0003f06270 */
/*0050*/ @P0 EXIT ; /* 0x000000000000094d */
/* 0x000fea0003800000 */
/*0060*/ HFMA2.MMA R7, -RZ, RZ, 0, 2.384185791015625e-07 ; /* 0x00000004ff077435 */
/* 0x000fe200000001ff */
/*0070*/ ULDC.64 UR4, c[0x0][0x118] ; /* 0x0000460000047ab9 */
/* 0x000fd20000000a00 */
/*0080*/ IMAD.WIDE R4, R6, R7, c[0x0][0x168] ; /* 0x00005a0006047625 */
/* 0x000fc800078e0207 */
/*0090*/ IMAD.WIDE R2, R6.reuse, R7.reuse, c[0x0][0x160] ; /* 0x0000580006027625 */
/* 0x0c0fe400078e0207 */
/*00a0*/ LDG.E R4, [R4.64] ; /* 0x0000000404047981 */
/* 0x000ea8000c1e1900 */
/*00b0*/ LDG.E R3, [R2.64] ; /* 0x0000000402037981 */
/* 0x000ea2000c1e1900 */
/*00c0*/ IMAD.WIDE R6, R6, R7, c[0x0][0x170] ; /* 0x00005c0006067625 */
/* 0x000fc800078e0207 */
/*00d0*/ FADD R9, R4, R3 ; /* 0x0000000304097221 */
/* 0x004fca0000000000 */
/*00e0*/ STG.E [R6.64], R9 ; /* 0x0000000906007986 */
/* 0x000fe2000c101904 */
/*00f0*/ EXIT ; /* 0x000000000000794d */
/* 0x000fea0003800000 */
/*0100*/ BRA 0x100; /* 0xfffffff000007947 */
/* 0x000fc0000383ffff */
/*0110*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0120*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0130*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0140*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0150*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0160*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0170*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0180*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*0190*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01a0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01b0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01c0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01d0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01e0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
/*01f0*/ NOP; /* 0x0000000000007918 */
/* 0x000fc00000000000 */
..........
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$
In addition to cuobjdump -sass, the ELF sections of the same cubin can be dumped with cuobjdump -elf:
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$ /usr/local/cuda/bin/cuobjdump -elf vectorAdd.19.sm_80.cubin
64bit elf: type=2, abi=7, sm=80, toolkit=113, flags = 0x500550
Sections:
Index Offset Size ES Align Type Flags Link Info Name
1 40 df 0 1 STRTAB 0 0 0 .shstrtab
2 11f fc 0 1 STRTAB 0 0 0 .strtab
3 220 90 18 8 SYMTAB 0 2 4 .symtab
4 2b0 70 0 1 PROGBITS 0 0 0 .debug_frame
5 320 30 0 4 CUDA_INFO 0 3 0 .nv.info
6 350 6c 0 4 CUDA_INFO 0 3 a .nv.info._Z9vectorAddPKfS0_Pfi
7 3c0 d8 8 8 CUDA_RELOCINFO 0 0 0 .nv.rel.action
8 498 10 10 8 REL 0 3 4 .rel.debug_frame
9 4a8 17c 0 4 PROGBITS 2 0 a .nv.constant0._Z9vectorAddPKfS0_Pfi
a 680 200 0 80 PROGBITS 6 3 c000005 .text._Z9vectorAddPKfS0_Pfi
.section .strtab
.section .shstrtab
.section .symtab
index value size info other shndx name
0 0 0 0 0 0 (null)
1 0 0 3 0 a .text._Z9vectorAddPKfS0_Pfi
2 0 0 3 0 9 .nv.constant0._Z9vectorAddPKfS0_Pfi
3 0 0 3 0 4 .debug_frame
4 0 0 3 0 7 .nv.rel.action
5 0 512 12 10 a _Z9vectorAddPKfS0_Pfi
.nv.constant0._Z9vectorAddPKfS0_Pfi
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
8. Viewing the resources used by the functions in a kernel
Command:
$ /usr/local/cuda/bin/cuobjdump -res-usage vectorAdd.19.sm_80.cubin
Result:
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$ /usr/local/cuda/bin/cuobjdump -res-usage vectorAdd.19.sm_80.cubin
Resource usage:
Common:
GLOBAL:0
Function _Z9vectorAddPKfS0_Pfi:
REG:12 STACK:0 SHARED:0 LOCAL:0 CONSTANT[0]:380 TEXTURE:0 SURFACE:0 SAMPLER:0
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$
9. Using nvprune to split out a .a static library for a specific sm_xx
Command:
$ cd ~/ex/cuobjdumpEx/libcublasEx
$ /usr/local/cuda/bin/nvprune -arch sm_70 libcublas_static.a -o libcublas_static70.a
Result:
opencl@opencl-PC:~/ex/cuobjdumpEx/libcublasEx$ ls
lelf_libcublas_static_70.a.txt libcublas.so.11.4.2.10064 libcublas_static70.174.sm_70.cubin libcublas_static70.a libcublas_static.a
opencl@opencl-PC:~/ex/cuobjdumpEx/libcublasEx$