Compiling a cubin and inspecting its SASS and PTX with cuobjdump

1. Kernel source code

Kernel file name: vectorAddKernel.cu


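// Element-wise vector addition: C[i] = A[i] + B[i] for 0 <= i < numElements.
__global__ void
vectorAdd(const float *A, const float *B, float *C, int numElements)
{
    // Global index of this thread within the 1-D grid.
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    // Threads past the end of the arrays (in the last block) do nothing.
    if (i < numElements)
    {
        C[i] = A[i] + B[i];
    }
}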
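
To make the indexing and the bounds check concrete, here is a small host-side Python model of a launch. It is only a sketch: the function name `emulate_vector_add` and the default block size of 256 are assumptions for illustration, and the loops run sequentially, which is not how a GPU executes the kernel.

```python
def emulate_vector_add(a, b, num_elements, block_dim=256):
    """Sequential CPU model of the vectorAdd kernel (illustrative only)."""
    c = [0.0] * num_elements
    # Round up so that every element is covered by some thread.
    grid_dim = (num_elements + block_dim - 1) // block_dim
    skipped = 0  # threads that fail the `i < numElements` guard
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = block_dim * block_idx + thread_idx
            if i < num_elements:
                c[i] = a[i] + b[i]
            else:
                skipped += 1
    return c, grid_dim, skipped


if __name__ == "__main__":
    n = 1000
    a = [float(k) for k in range(n)]
    b = [2.0 * k for k in range(n)]
    c, grid_dim, skipped = emulate_vector_add(a, b, n)
    print(grid_dim, skipped, c[n - 1])
```

The `skipped` counter shows why the guard is needed: whenever the element count is not a multiple of the block size, the last block contains threads whose index falls outside the arrays.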


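2. Compile command

$ /usr/local/cuda/bin/nvcc -cubin ./vectorAddKernel.cu

Because no -arch option is given, nvcc from CUDA 11.3 compiles for its default target, sm_52, which is why the dumps below report sm_52. The result is written to vectorAddKernel.cubin.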

3. cuobjdump -sass

Command:

$ /usr/local/cuda/bin/cuobjdump -sass vectorAddKernel.cubin

Output:

        code for sm_52
                Function : _Z9vectorAddPKfS0_Pfi
        .headerflags    @"EF_CUDA_SM52 EF_CUDA_PTX_SM(EF_CUDA_SM52)"
                                                                                 /* 0x001cfc00e22007f6 */
        /*0008*/                   MOV R1, c[0x0][0x20] ;                        /* 0x4c98078000870001 */
        /*0010*/                   S2R R0, SR_CTAID.X ;                          /* 0xf0c8000002570000 */
        /*0018*/                   S2R R2, SR_TID.X ;                            /* 0xf0c8000002170002 */
                                                                                 /* 0x001fd842fec20ff1 */
        /*0028*/                   XMAD.MRG R3, R0.reuse, c[0x0] [0x8].H1, RZ ;  /* 0x4f107f8000270003 */
        /*0030*/                   XMAD R2, R0.reuse, c[0x0] [0x8], R2 ;         /* 0x4e00010000270002 */
        /*0038*/                   XMAD.PSL.CBCC R0, R0.H1, R3.H1, R2 ;          /* 0x5b30011800370000 */
                                                                                 /* 0x001ff400fd4007ed */
        /*0048*/                   ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT ;  /* 0x4b6d038005670007 */
        /*0050*/                   NOP ;                                         /* 0x50b0000000070f00 */
        /*0058*/               @P0 EXIT ;                                        /* 0xe30000000000000f */
                                                                                 /* 0x081fd800fea207f1 */
        /*0068*/                   SHL R6, R0.reuse, 0x2 ;                       /* 0x3848000000270006 */
        /*0070*/                   SHR R0, R0, 0x1e ;                            /* 0x3829000001e70000 */
        /*0078*/                   IADD R4.CC, R6.reuse, c[0x0][0x140] ;         /* 0x4c10800005070604 */
                                                                                 /* 0x001fd800fe0207f2 */
        /*0088*/                   IADD.X R5, R0.reuse, c[0x0][0x144] ;          /* 0x4c10080005170005 */
        /*0090*/         {         IADD R2.CC, R6, c[0x0][0x148] ;               /* 0x4c10800005270602 */
        /*0098*/                   LDG.E R4, [R4]         }
                                                                                 /* 0xeed4200000070404 */
                                                                                 /* 0x001fd800f62007e2 */
        /*00a8*/                   IADD.X R3, R0, c[0x0][0x14c] ;                /* 0x4c10080005370003 */
        /*00b0*/                   LDG.E R2, [R2] ;                              /* 0xeed4200000070202 */
        /*00b8*/                   IADD R6.CC, R6, c[0x0][0x150] ;               /* 0x4c10800005470606 */
                                                                                 /* 0x001fc420fe4007f7 */
        /*00c8*/                   IADD.X R7, R0, c[0x0][0x154] ;                /* 0x4c10080005570007 */
        /*00d0*/                   FADD R0, R2, R4 ;                             /* 0x5c58000000470200 */
        /*00d8*/                   STG.E [R6], R0 ;                              /* 0xeedc200000070600 */
                                                                                 /* 0x001ffc00ffe007ea */
        /*00e8*/                   NOP ;                                         /* 0x50b0000000070f00 */
        /*00f0*/                   EXIT ;                                        /* 0xe30000000007000f */
        /*00f8*/                   BRA 0xf8 ;                                    /* 0xe2400fffff87000f */
                ..........
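
The three XMAD instructions at 0x0028 to 0x0038 take the place of the single 32-bit multiply-add `blockDim.x * blockIdx.x + threadIdx.x` from the source. NVIDIA does not publish SASS semantics, so the following is only a sketch of the arithmetic identity such a sequence has to implement, assuming XMAD multiplies 16-bit halves of its operands. It does not model the .MRG, .PSL, or .CBCC modifiers individually, and the function name `mad_lo32_from_16bit_halves` is made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def mad_lo32_from_16bit_halves(a, b, c):
    """Low 32 bits of a*b + c, built only from 16x16-bit products."""
    a_lo, a_hi = a & 0xFFFF, (a >> 16) & 0xFFFF
    b_lo, b_hi = b & 0xFFFF, (b >> 16) & 0xFFFF
    # The a_hi*b_hi term would be shifted left by 32 bits, so it drops out.
    # Of the cross terms only the low 16 bits survive the 16-bit shift.
    cross = (a_lo * b_hi + a_hi * b_lo) & 0xFFFF
    return (a_lo * b_lo + (cross << 16) + c) & MASK32


if __name__ == "__main__":
    ctaid_x, ntid_x, tid_x = 70000, 256, 17  # example values only
    print(mad_lo32_from_16bit_halves(ctaid_x, ntid_x, tid_x))
    print((ctaid_x * ntid_x + tid_x) & MASK32)  # direct computation for comparison
```

Two of the products need one 16x16 multiply each for the cross terms and one for the low halves, which is consistent with three XMAD instructions appearing in the listing.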


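
From 0x0068 onward the sm_52 listing computes the 64-bit addresses of `A[i]`, `B[i]`, and `C[i]` using 32-bit registers only: SHL and SHR produce the low and high words of the byte offset `i * 4`, and each IADD ... .CC / IADD.X pair adds that offset to a pointer stored in two constant-bank words. The Python model below is a sketch of that reading of the listing, not of documented instruction semantics. It assumes SHR without a .U32 suffix is an arithmetic shift, and `element_address` is a hypothetical helper name.

```python
MASK32 = 0xFFFFFFFF


def element_address(base, i):
    """64-bit address of 4-byte element i, computed in 32-bit steps."""
    off_lo = (i << 2) & MASK32    # SHL R6, R0, 0x2: low word of i * 4
    off_hi = (i >> 30) & MASK32   # SHR R0, R0, 0x1e: high word, sign-extended
    base_lo, base_hi = base & MASK32, (base >> 32) & MASK32
    total_lo = off_lo + base_lo   # IADD with .CC: add and record the carry
    carry = total_lo >> 32
    total_hi = (off_hi + base_hi + carry) & MASK32  # IADD.X: add with carry-in
    return (total_hi << 32) | (total_lo & MASK32)


if __name__ == "__main__":
    base = 0x1FFFFFFF0  # example pointer chosen so that the low word overflows
    print(hex(element_address(base, 8)), hex(base + 8 * 4))
```

The PTX for the same kernel expresses this in two instructions, `mul.wide.s32` followed by `add.s64`; the split into halves only appears at the SASS level.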

4. cuobjdump -elf

Command:

$ /usr/local/cuda/bin/cuobjdump -elf vectorAddKernel.cubin

Output:

64bit elf: type=2, abi=7, sm=52, toolkit=113, flags = 0x340534
Sections:
Index Offset   Size ES Align        Type        Flags Link     Info Name
    1     40     c1  0  1            STRTAB       0    0        0 .shstrtab
    2    101     de  0  1            STRTAB       0    0        0 .strtab
    3    1e0     78 18  8            SYMTAB       0    2        3 .symtab
    4    258     30  0  4         CUDA_INFO       0    3        0 .nv.info
    5    288     78  0  4         CUDA_INFO       0    3        8 .nv.info._Z9vectorAddPKfS0_Pfi
    6    300     d8  8  8    CUDA_RELOCINFO       0    0        0 .nv.rel.action
    7    3d8    15c  0  4          PROGBITS       2    0        8 .nv.constant0._Z9vectorAddPKfS0_Pfi
    8    540    100  0 20          PROGBITS       6    3  8000004 .text._Z9vectorAddPKfS0_Pfi
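
The Offset, Size, and ES (entry size) columns are hexadecimal, and a few relationships between them can be checked against the other dumps in this post. A short sketch, with the numbers copied from the table above (the dictionary layout and helper names are just an illustration):

```python
# (offset, size, entry_size) in bytes, copied from the section table.
sections = {
    ".symtab": (0x1E0, 0x78, 0x18),
    ".nv.constant0._Z9vectorAddPKfS0_Pfi": (0x3D8, 0x15C, 0),
    ".text._Z9vectorAddPKfS0_Pfi": (0x540, 0x100, 0),
}


def entry_count(name):
    """Number of fixed-size records in a section with a nonzero entry size."""
    _, size, entry_size = sections[name]
    return size // entry_size


def end_offset(name):
    """File offset of the first byte after the section."""
    offset, size, _ = sections[name]
    return offset + size


symtab_entries = entry_count(".symtab")
text_words = sections[".text._Z9vectorAddPKfS0_Pfi"][1] // 8           # 64-bit words
const_words = sections[".nv.constant0._Z9vectorAddPKfS0_Pfi"][1] // 4  # 32-bit words

print(symtab_entries, text_words, const_words, hex(end_offset(".symtab")))
```

The symbol count corresponds to the rows of the .symtab dump, the 64-bit word count to a SASS listing whose last word sits at 0x00f8, the 32-bit word count to the length of the .nv.constant0 dump, and the end of .symtab to the offset at which .nv.info starts.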

.section .strtab

.section .shstrtab

.section .symtab
 index           value           size      info    other  shndx    name
   0               0               0        0        0      0     (null)
   1               0               0        3        0      8     .text._Z9vectorAddPKfS0_Pfi
   2               0               0        3        0      7     .nv.constant0._Z9vectorAddPKfS0_Pfi
   3               0               0        3        0      6     .nv.rel.action
   4               0             256       12       10      8     _Z9vectorAddPKfS0_Pfi
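
The info column packs the ELF symbol binding and type into one byte (binding in the high nibble, type in the low nibble), as in any ELF symbol table. Assuming the column is printed in hexadecimal, the entries above decode as follows; the lookup tables cover only a subset of the standard ELF values.

```python
# Standard ELF constants (subset).
BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION"}


def decode_st_info(info):
    """Split an ELF st_info byte into (binding, type) names."""
    return BINDINGS[info >> 4], TYPES[info & 0xF]


print(decode_st_info(0x03))  # the three section symbols
print(decode_st_info(0x12))  # _Z9vectorAddPKfS0_Pfi
```

The size column, by contrast, appears to be decimal: the kernel symbol's 256 equals the 0x100 bytes of its .text section.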


.nv.constant0._Z9vectorAddPKfS0_Pfi
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000


.nv.info
        <0x1>
        Attribute:      EIATTR_REGCOUNT
        Format: EIFMT_SVAL
        Value:  function: _Z9vectorAddPKfS0_Pfi(0x4)    register count: 8
        <0x2>
        Attribute:      EIATTR_MAX_STACK_SIZE
        Format: EIFMT_SVAL
        Value:  0x4 0x0
        <0x3>
        Attribute:      EIATTR_MIN_STACK_SIZE
        Format: EIFMT_SVAL
        Value:  function: _Z9vectorAddPKfS0_Pfi(0x4)    min stack size: 0x0
        <0x4>
        Attribute:      EIATTR_FRAME_SIZE
        Format: EIFMT_SVAL
        Value:  function: _Z9vectorAddPKfS0_Pfi(0x4)    frame size: 0x0


.nv.info._Z9vectorAddPKfS0_Pfi
        <0x1>
        Attribute:      EIATTR_CUDA_API_VERSION
        Format: EIFMT_SVAL
        Value:  0x71
        <0x2>
        Attribute:      EIATTR_SW2393858_WAR
        Format: EIFMT_NVAL
        <0x3>
        Attribute:      EIATTR_SW1850030_WAR
        Format: EIFMT_NVAL
        <0x4>
        Attribute:      EIATTR_PARAM_CBANK
        Format: EIFMT_SVAL
        Value:  0x2 0x1c0140
        <0x5>
        Attribute:      EIATTR_CBANK_PARAM_SIZE
        Format: EIFMT_HVAL
        Value:  0x1c
        <0x6>
        Attribute:      EIATTR_KPARAM_INFO
        Format: EIFMT_SVAL
        Value:  Index : 0x0     Ordinal : 0x3   Offset  : 0x18  Size    : 0x4
                Pointee's logAlignment : 0x0    Space : 0x0     cbank : 0x1f    Parameter Space : CBANK
        <0x7>
        Attribute:      EIATTR_KPARAM_INFO
        Format: EIFMT_SVAL
        Value:  Index : 0x0     Ordinal : 0x2   Offset  : 0x10  Size    : 0x8
                Pointee's logAlignment : 0x0    Space : 0x0     cbank : 0x1f    Parameter Space : CBANK
        <0x8>
        Attribute:      EIATTR_KPARAM_INFO
        Format: EIFMT_SVAL
        Value:  Index : 0x0     Ordinal : 0x1   Offset  : 0x8   Size    : 0x8
                Pointee's logAlignment : 0x0    Space : 0x0     cbank : 0x1f    Parameter Space : CBANK
        <0x9>
        Attribute:      EIATTR_KPARAM_INFO
        Format: EIFMT_SVAL
        Value:  Index : 0x0     Ordinal : 0x0   Offset  : 0x0   Size    : 0x8
                Pointee's logAlignment : 0x0    Space : 0x0     cbank : 0x1f    Parameter Space : CBANK
        <0x10>
        Attribute:      EIATTR_MAXREG_COUNT
        Format: EIFMT_HVAL
        Value:  0xff
        <0x11>
        Attribute:      EIATTR_S2RCTAID_INSTR_OFFSETS
        Format: EIFMT_SVAL
        Value:  0x10
        <0x12>
        Attribute:      EIATTR_EXIT_INSTR_OFFSETS
        Format: EIFMT_SVAL
        Value:  0x58 0xf0
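
These attributes tie the kernel signature to the constant-bank operands seen in the SASS. Reading the EIATTR_PARAM_CBANK value 0x1c0140 as "size 0x1c at offset 0x140" is an interpretation that matches EIATTR_CBANK_PARAM_SIZE and the 0x15c-byte .nv.constant0 section, but it is not taken from NVIDIA documentation. Under that assumption, each EIATTR_KPARAM_INFO entry maps to one c[0x0][...] address:

```python
PARAM_CBANK = 0x1C0140
param_base = PARAM_CBANK & 0xFFFF  # assumed: low 16 bits are the start offset
param_size = PARAM_CBANK >> 16     # assumed: high 16 bits are the total size

# (ordinal, offset, size) from the EIATTR_KPARAM_INFO entries.
kparams = [(0, 0x00, 8), (1, 0x08, 8), (2, 0x10, 8), (3, 0x18, 4)]
names = ["A", "B", "C", "numElements"]  # from the kernel signature


def cbank_address(offset):
    """Byte address in constant bank 0 of the parameter at this offset."""
    return param_base + offset


for ordinal, offset, size in kparams:
    print(f"{names[ordinal]:12s} c[0x0][{cbank_address(offset):#x}]  {size} bytes")
```

The resulting addresses line up with the sm_52 listing: the pointer halves are read from 0x140/0x144, 0x148/0x14c, and 0x150/0x154, and the ISETP comparison against numElements uses 0x158.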


.nv.rel.action
Header : Base Relocation : R_CUDA_ABS55_16_34
Reloc type : Symbol Kind,Addend Shift,Source Position1,length1,Destination Position1,Source Position2,length2,Destination Position2
R_CUDA_ABS55_16_34 : EIVALUE_SYM_KIND_ADDR,2,2,8,16,10,47,34
R_CUDA_8_0 : EIVALUE_SYM_KIND_ADDR,0,0,8,0,0,0,0
R_CUDA_8_8 : EIVALUE_SYM_KIND_ADDR,0,8,8,0,0,0,0
R_CUDA_8_16 : EIVALUE_SYM_KIND_ADDR,0,16,8,0,0,0,0
R_CUDA_8_24 : EIVALUE_SYM_KIND_ADDR,0,24,8,0,0,0,0
R_CUDA_8_32 : EIVALUE_SYM_KIND_ADDR,0,32,8,0,0,0,0
R_CUDA_8_40 : EIVALUE_SYM_KIND_ADDR,0,40,8,0,0,0,0
R_CUDA_8_48 : EIVALUE_SYM_KIND_ADDR,0,48,8,0,0,0,0
R_CUDA_8_56 : EIVALUE_SYM_KIND_ADDR,0,56,8,0,0,0,0
R_CUDA_G8_0 : EIVALUE_SYM_KIND_GADDR,0,0,8,0,0,0,0
R_CUDA_G8_8 : EIVALUE_SYM_KIND_GADDR,0,8,8,0,0,0,0
R_CUDA_G8_16 : EIVALUE_SYM_KIND_GADDR,0,16,8,0,0,0,0
R_CUDA_G8_24 : EIVALUE_SYM_KIND_GADDR,0,24,8,0,0,0,0
R_CUDA_G8_32 : EIVALUE_SYM_KIND_GADDR,0,32,8,0,0,0,0
R_CUDA_G8_40 : EIVALUE_SYM_KIND_GADDR,0,40,8,0,0,0,0
R_CUDA_G8_48 : EIVALUE_SYM_KIND_GADDR,0,48,8,0,0,0,0
R_CUDA_G8_56 : EIVALUE_SYM_KIND_GADDR,0,56,8,0,0,0,0
R_CUDA_FUNC_DESC_8_0 : EIVALUE_SYM_KIND_FDESC,0,0,8,0,0,0,0
R_CUDA_FUNC_DESC_8_8 : EIVALUE_SYM_KIND_FDESC,0,8,8,0,0,0,0
R_CUDA_FUNC_DESC_8_16 : EIVALUE_SYM_KIND_FDESC,0,16,8,0,0,0,0
R_CUDA_FUNC_DESC_8_24 : EIVALUE_SYM_KIND_FDESC,0,24,8,0,0,0,0
R_CUDA_FUNC_DESC_8_32 : EIVALUE_SYM_KIND_FDESC,0,32,8,0,0,0,0
R_CUDA_FUNC_DESC_8_40 : EIVALUE_SYM_KIND_FDESC,0,40,8,0,0,0,0
R_CUDA_FUNC_DESC_8_48 : EIVALUE_SYM_KIND_FDESC,0,48,8,0,0,0,0
R_CUDA_FUNC_DESC_8_56 : EIVALUE_SYM_KIND_FDESC,0,56,8,0,0,0,0
R_CUDA_ABS20_44 : EIVALUE_SYM_KIND_ADDR,0,0,20,44,0,0,0


.text._Z9vectorAddPKfS0_Pfi
bar = 0 reg = 8 lmem=0  smem=0
0xe22007f6 0x001cfc00 0x00870001 0x4c980780
0x02570000 0xf0c80000 0x02170002 0xf0c80000
0xfec20ff1 0x001fd842 0x00270003 0x4f107f80
0x00270002 0x4e000100 0x00370000 0x5b300118
0xfd4007ed 0x001ff400 0x05670007 0x4b6d0380
0x00070f00 0x50b00000 0x0000000f 0xe3000000
0xfea207f1 0x081fd800 0x00270006 0x38480000
0x01e70000 0x38290000 0x05070604 0x4c108000
0xfe0207f2 0x001fd800 0x05170005 0x4c100800
0x05270602 0x4c108000 0x00070404 0xeed42000
0xf62007e2 0x001fd800 0x05370003 0x4c100800
0x00070202 0xeed42000 0x05470606 0x4c108000
0xfe4007f7 0x001fc420 0x05570007 0x4c100800
0x00470200 0x5c580000 0x00070600 0xeedc2000
0xffe007ea 0x001ffc00 0x00070f00 0x50b00000
0x0007000f 0xe3000000 0xff87000f 0xe2400fff
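
The .text dump contains the same bytes as the -sass listing, but printed as 32-bit words, so each 64-bit encoding appears as a low word followed by a high word. The sketch below rebuilds the first four 64-bit words. In the sm_52 listing every fourth word (offsets 0x00, 0x20, ...) is shown without a mnemonic, which suggests a scheduling/control word followed by three instructions; that grouping is an inference from the listing. `pair_words` is a hypothetical helper.

```python
def pair_words(words):
    """Combine consecutive (low, high) 32-bit words into 64-bit values."""
    return [(hi << 32) | lo for lo, hi in zip(words[0::2], words[1::2])]


# First two rows of the .text dump.
dump = [
    0xE22007F6, 0x001CFC00, 0x00870001, 0x4C980780,
    0x02570000, 0xF0C80000, 0x02170002, 0xF0C80000,
]

for index, value in enumerate(pair_words(dump)):
    kind = "control" if index % 4 == 0 else "instr"
    print(f"/*{index * 8:04x}*/ {kind:8s}{value:#018x}")
```

The three instruction words reproduce the encodings shown next to MOV R1, S2R R0, and S2R R2 in the -sass output.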


5. cuobjdump -ptx <hostBinary>

Here vectorAdd is the sample's host executable, which embeds a fatbin.

Command:

$ cd ~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL
$ /usr/local/cuda/bin/cuobjdump -ptx vectorAdd

Output:



Fatbin elf code:
================
arch = sm_35
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_37
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_50
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_52
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_60
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_61
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_70
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_75
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_80
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_86
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_35
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_37
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_50
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_52
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_60
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_61
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_70
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_75
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin elf code:
================
arch = sm_80
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

Fatbin ptx code:
================
arch = sm_86
code version = [7,3]
producer = <unknown>
host = linux
compile_size = 64bit
compressed








.version 7.3
.target sm_86
.address_size 64



.visible .entry _Z9vectorAddPKfS0_Pfi(
.param .u64 _Z9vectorAddPKfS0_Pfi_param_0,
.param .u64 _Z9vectorAddPKfS0_Pfi_param_1,
.param .u64 _Z9vectorAddPKfS0_Pfi_param_2,
.param .u32 _Z9vectorAddPKfS0_Pfi_param_3
)
{
.reg .pred %p<2>;
.reg .f32 %f<4>;
.reg .b32 %r<6>;
.reg .b64 %rd<11>;


ld.param.u64 %rd1, [_Z9vectorAddPKfS0_Pfi_param_0];
ld.param.u64 %rd2, [_Z9vectorAddPKfS0_Pfi_param_1];
ld.param.u64 %rd3, [_Z9vectorAddPKfS0_Pfi_param_2];
ld.param.u32 %r2, [_Z9vectorAddPKfS0_Pfi_param_3];
mov.u32 %r3, %ntid.x;
mov.u32 %r4, %ctaid.x;
mov.u32 %r5, %tid.x;
mad.lo.s32 %r1, %r3, %r4, %r5;
setp.ge.s32 %p1, %r1, %r2;
@%p1 bra $L__BB0_2;

cvta.to.global.u64 %rd4, %rd1;
mul.wide.s32 %rd5, %r1, 4;
add.s64 %rd6, %rd4, %rd5;
cvta.to.global.u64 %rd7, %rd2;
add.s64 %rd8, %rd7, %rd5;
ld.global.f32 %f1, [%rd8];
ld.global.f32 %f2, [%rd6];
add.f32 %f3, %f2, %f1;
cvta.to.global.u64 %rd9, %rd3;
add.s64 %rd10, %rd9, %rd5;
st.global.f32 [%rd10], %f3;

$L__BB0_2:
ret;

}


Fatbin elf code:
================
arch = sm_86
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
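
With this many entries it helps to summarise which architectures carry SASS (elf) and which carry PTX. The sketch below parses the header blocks of a saved listing. It relies only on the `Fatbin ... code:` and `arch = ...` lines shown above; the function name and the short sample string are illustrative.

```python
import re
from collections import defaultdict


def summarize_fatbin_listing(text):
    """Map each entry kind ('elf' or 'ptx') to its list of archs, in order."""
    summary = defaultdict(list)
    kind = None
    for line in text.splitlines():
        header = re.match(r"Fatbin (\w+) code:", line.strip())
        if header:
            kind = header.group(1)
            continue
        arch = re.match(r"arch = (\w+)", line.strip())
        if arch and kind is not None:
            summary[kind].append(arch.group(1))
            kind = None  # ignore further 'arch' lines until the next header
    return dict(summary)


sample = """\
Fatbin elf code:
================
arch = sm_80
code version = [1,7]

Fatbin ptx code:
================
arch = sm_86
code version = [7,3]

Fatbin elf code:
================
arch = sm_86
"""
print(summarize_fatbin_listing(sample))
```

Applied to the full listing above, the summary would contain PTX for sm_86 only and elf entries for every architecture from sm_35 to sm_86.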

6. Combined: cuobjdump -ptx -sass <hostBinary>

Command:

$ cd ~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL
$ /usr/local/cuda/bin/cuobjdump -ptx -sass vectorAdd

Output:



Fatbin elf code:
================
arch = sm_35
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_35

Fatbin elf code:
================
arch = sm_37
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_37

Fatbin elf code:
================
arch = sm_50
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_50

Fatbin elf code:
================
arch = sm_52
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_52

Fatbin elf code:
================
arch = sm_60
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_60

Fatbin elf code:
================
arch = sm_61
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_61

Fatbin elf code:
================
arch = sm_70
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_70

Fatbin elf code:
================
arch = sm_75
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_75

Fatbin elf code:
================
arch = sm_80
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_80

Fatbin elf code:
================
arch = sm_86
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_86

Fatbin elf code:
================
arch = sm_35
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_35
                Function : _Z9vectorAddPKfS0_Pfi
        .headerflags    @"EF_CUDA_SM35 EF_CUDA_PTX_SM(EF_CUDA_SM35)"
                                                                                /* 0x0880b8b0a0a08cc0 */
        /*0008*/                   MOV R1, c[0x0][0x44];                        /* 0x64c03c00089c0006 */
        /*0010*/                   S2R R0, SR_CTAID.X;                          /* 0x86400000129c0002 */
        /*0018*/                   S2R R3, SR_TID.X;                            /* 0x86400000109c000e */
        /*0020*/                   IMAD R0, R0, c[0x0][0x28], R3;               /* 0x51080c00051c0002 */
        /*0028*/                   ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT;  /* 0x5b681c002b1c001e */
        /*0030*/               @P0 EXIT;                                        /* 0x180000000000003c */
        /*0038*/                   ISCADD R4.CC, R0, c[0x0][0x140], 0x2;        /* 0x60c40800281c0012 */
                                                                                /* 0x08b010a0b010a0ac */
        /*0048*/                   MOV32I R7, 0x4;                              /* 0x74000000021fc01e */
        /*0050*/                   IMAD.HI.X R5, R0, R7, c[0x0][0x144];         /* 0x93181c00289c0016 */
        /*0058*/                   ISCADD R2.CC, R0, c[0x0][0x148], 0x2;        /* 0x60c40800291c000a */
        /*0060*/                   LD.E R4, [R4];                               /* 0xc4800000001c1010 */
        /*0068*/                   IMAD.HI.X R3, R0, R7, c[0x0][0x14c];         /* 0x93181c00299c000e */
        /*0070*/                   LD.E R2, [R2];                               /* 0xc4800000001c0808 */
        /*0078*/                   ISCADD R6.CC, R0, c[0x0][0x150], 0x2;        /* 0x60c408002a1c001a */
                                                                                /* 0x08000000bc10a0fc */
        /*0088*/                   IMAD.HI.X R7, R0, R7, c[0x0][0x154];         /* 0x93181c002a9c001e */
        /*0090*/                   FADD R0, R2, R4;                             /* 0xe2c00000021c0802 */
        /*0098*/                   ST.E [R6], R0;                               /* 0xe4800000001c1800 */
        /*00a0*/                   EXIT;                                        /* 0x18000000001c003c */
        /*00a8*/                   BRA 0xa8;                                    /* 0x12007ffffc1c003c */
        /*00b0*/                   NOP;                                         /* 0x85800000001c3c02 */
        /*00b8*/                   NOP;                                         /* 0x85800000001c3c02 */
                ..........



Fatbin elf code:
================
arch = sm_37
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_37
                Function : _Z9vectorAddPKfS0_Pfi
        .headerflags    @"EF_CUDA_SM37 EF_CUDA_PTX_SM(EF_CUDA_SM37)"
                                                                                /* 0x0880b8b0a0a08cc0 */
        /*0008*/                   MOV R1, c[0x0][0x44];                        /* 0x64c03c00089c0006 */
        /*0010*/                   S2R R0, SR_CTAID.X;                          /* 0x86400000129c0002 */
        /*0018*/                   S2R R3, SR_TID.X;                            /* 0x86400000109c000e */
        /*0020*/                   IMAD R0, R0, c[0x0][0x28], R3;               /* 0x51080c00051c0002 */
        /*0028*/                   ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT;  /* 0x5b681c002b1c001e */
        /*0030*/               @P0 EXIT;                                        /* 0x180000000000003c */
        /*0038*/                   ISCADD R4.CC, R0, c[0x0][0x140], 0x2;        /* 0x60c40800281c0012 */
                                                                                /* 0x08b010a0b010a0ac */
        /*0048*/                   MOV32I R7, 0x4;                              /* 0x74000000021fc01e */
        /*0050*/                   IMAD.HI.X R5, R0, R7, c[0x0][0x144];         /* 0x93181c00289c0016 */
        /*0058*/                   ISCADD R2.CC, R0, c[0x0][0x148], 0x2;        /* 0x60c40800291c000a */
        /*0060*/                   LD.E R4, [R4];                               /* 0xc4800000001c1010 */
        /*0068*/                   IMAD.HI.X R3, R0, R7, c[0x0][0x14c];         /* 0x93181c00299c000e */
        /*0070*/                   LD.E R2, [R2];                               /* 0xc4800000001c0808 */
        /*0078*/                   ISCADD R6.CC, R0, c[0x0][0x150], 0x2;        /* 0x60c408002a1c001a */
                                                                                /* 0x08000000bc10a0fc */
        /*0088*/                   IMAD.HI.X R7, R0, R7, c[0x0][0x154];         /* 0x93181c002a9c001e */
        /*0090*/                   FADD R0, R2, R4;                             /* 0xe2c00000021c0802 */
        /*0098*/                   ST.E [R6], R0;                               /* 0xe4800000001c1800 */
        /*00a0*/                   EXIT;                                        /* 0x18000000001c003c */
        /*00a8*/                   BRA 0xa8;                                    /* 0x12007ffffc1c003c */
        /*00b0*/                   NOP;                                         /* 0x85800000001c3c02 */
        /*00b8*/                   NOP;                                         /* 0x85800000001c3c02 */
                ..........



Fatbin elf code:
================
arch = sm_50
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_50
                Function : _Z9vectorAddPKfS0_Pfi
        .headerflags    @"EF_CUDA_SM50 EF_CUDA_PTX_SM(EF_CUDA_SM50)"
                                                                                 /* 0x001cfc00e22007f6 */
        /*0008*/                   MOV R1, c[0x0][0x20] ;                        /* 0x4c98078000870001 */
        /*0010*/                   S2R R0, SR_CTAID.X ;                          /* 0xf0c8000002570000 */
        /*0018*/                   S2R R2, SR_TID.X ;                            /* 0xf0c8000002170002 */
                                                                                 /* 0x001fd842fec20ff1 */
        /*0028*/                   XMAD.MRG R3, R0.reuse, c[0x0] [0x8].H1, RZ ;  /* 0x4f107f8000270003 */
        /*0030*/                   XMAD R2, R0.reuse, c[0x0] [0x8], R2 ;         /* 0x4e00010000270002 */
        /*0038*/                   XMAD.PSL.CBCC R0, R0.H1, R3.H1, R2 ;          /* 0x5b30011800370000 */
                                                                                 /* 0x001ff400fd4007ed */
        /*0048*/                   ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT ;  /* 0x4b6d038005670007 */
        /*0050*/                   NOP ;                                         /* 0x50b0000000070f00 */
        /*0058*/               @P0 EXIT ;                                        /* 0xe30000000000000f */
                                                                                 /* 0x081fd800fea207f1 */
        /*0068*/                   SHL R6, R0.reuse, 0x2 ;                       /* 0x3848000000270006 */
        /*0070*/                   SHR R0, R0, 0x1e ;                            /* 0x3829000001e70000 */
        /*0078*/                   IADD R4.CC, R6.reuse, c[0x0][0x140] ;         /* 0x4c10800005070604 */
                                                                                 /* 0x001fd800fe0207f2 */
        /*0088*/                   IADD.X R5, R0.reuse, c[0x0][0x144] ;          /* 0x4c10080005170005 */
        /*0090*/         {         IADD R2.CC, R6, c[0x0][0x148] ;               /* 0x4c10800005270602 */
        /*0098*/                   LDG.E R4, [R4]         }
                                                                                 /* 0xeed4200000070404 */
                                                                                 /* 0x001fd800f62007e2 */
        /*00a8*/                   IADD.X R3, R0, c[0x0][0x14c] ;                /* 0x4c10080005370003 */
        /*00b0*/                   LDG.E R2, [R2] ;                              /* 0xeed4200000070202 */
        /*00b8*/                   IADD R6.CC, R6, c[0x0][0x150] ;               /* 0x4c10800005470606 */
                                                                                 /* 0x001fc420fe4007f7 */
        /*00c8*/                   IADD.X R7, R0, c[0x0][0x154] ;                /* 0x4c10080005570007 */
        /*00d0*/                   FADD R0, R2, R4 ;                             /* 0x5c58000000470200 */
        /*00d8*/                   STG.E [R6], R0 ;                              /* 0xeedc200000070600 */
                                                                                 /* 0x001ffc00ffe007ea */
        /*00e8*/                   NOP ;                                         /* 0x50b0000000070f00 */
        /*00f0*/                   EXIT ;                                        /* 0xe30000000007000f */
        /*00f8*/                   BRA 0xf8 ;                                    /* 0xe2400fffff87000f */
                ..........



Fatbin elf code:
================
arch = sm_52
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_52
                Function : _Z9vectorAddPKfS0_Pfi
        .headerflags    @"EF_CUDA_SM52 EF_CUDA_PTX_SM(EF_CUDA_SM52)"
                                                                                 /* 0x001cfc00e22007f6 */
        /*0008*/                   MOV R1, c[0x0][0x20] ;                        /* 0x4c98078000870001 */
        /*0010*/                   S2R R0, SR_CTAID.X ;                          /* 0xf0c8000002570000 */
        /*0018*/                   S2R R2, SR_TID.X ;                            /* 0xf0c8000002170002 */
                                                                                 /* 0x001fd842fec20ff1 */
        /*0028*/                   XMAD.MRG R3, R0.reuse, c[0x0] [0x8].H1, RZ ;  /* 0x4f107f8000270003 */
        /*0030*/                   XMAD R2, R0.reuse, c[0x0] [0x8], R2 ;         /* 0x4e00010000270002 */
        /*0038*/                   XMAD.PSL.CBCC R0, R0.H1, R3.H1, R2 ;          /* 0x5b30011800370000 */
                                                                                 /* 0x001ff400fd4007ed */
        /*0048*/                   ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT ;  /* 0x4b6d038005670007 */
        /*0050*/                   NOP ;                                         /* 0x50b0000000070f00 */
        /*0058*/               @P0 EXIT ;                                        /* 0xe30000000000000f */
                                                                                 /* 0x081fd800fea207f1 */
        /*0068*/                   SHL R6, R0.reuse, 0x2 ;                       /* 0x3848000000270006 */
        /*0070*/                   SHR R0, R0, 0x1e ;                            /* 0x3829000001e70000 */
        /*0078*/                   IADD R4.CC, R6.reuse, c[0x0][0x140] ;         /* 0x4c10800005070604 */
                                                                                 /* 0x001fd800fe0207f2 */
        /*0088*/                   IADD.X R5, R0.reuse, c[0x0][0x144] ;          /* 0x4c10080005170005 */
        /*0090*/         {         IADD R2.CC, R6, c[0x0][0x148] ;               /* 0x4c10800005270602 */
        /*0098*/                   LDG.E R4, [R4]         }
                                                                                 /* 0xeed4200000070404 */
                                                                                 /* 0x001fd800f62007e2 */
        /*00a8*/                   IADD.X R3, R0, c[0x0][0x14c] ;                /* 0x4c10080005370003 */
        /*00b0*/                   LDG.E R2, [R2] ;                              /* 0xeed4200000070202 */
        /*00b8*/                   IADD R6.CC, R6, c[0x0][0x150] ;               /* 0x4c10800005470606 */
                                                                                 /* 0x001fc420fe4007f7 */
        /*00c8*/                   IADD.X R7, R0, c[0x0][0x154] ;                /* 0x4c10080005570007 */
        /*00d0*/                   FADD R0, R2, R4 ;                             /* 0x5c58000000470200 */
        /*00d8*/                   STG.E [R6], R0 ;                              /* 0xeedc200000070600 */
                                                                                 /* 0x001ffc00ffe007ea */
        /*00e8*/                   NOP ;                                         /* 0x50b0000000070f00 */
        /*00f0*/                   EXIT ;                                        /* 0xe30000000007000f */
        /*00f8*/                   BRA 0xf8 ;                                    /* 0xe2400fffff87000f */
                ..........



Fatbin elf code:
================
arch = sm_60
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_60
                Function : _Z9vectorAddPKfS0_Pfi
        .headerflags    @"EF_CUDA_SM60 EF_CUDA_PTX_SM(EF_CUDA_SM60)"
                                                                                 /* 0x001cfc00e22007f6 */
        /*0008*/                   MOV R1, c[0x0][0x20] ;                        /* 0x4c98078000870001 */
        /*0010*/                   S2R R0, SR_CTAID.X ;                          /* 0xf0c8000002570000 */
        /*0018*/                   S2R R2, SR_TID.X ;                            /* 0xf0c8000002170002 */
                                                                                 /* 0x001fd842fec20ff1 */
        /*0028*/                   XMAD.MRG R3, R0.reuse, c[0x0] [0x8].H1, RZ ;  /* 0x4f107f8000270003 */
        /*0030*/                   XMAD R2, R0.reuse, c[0x0] [0x8], R2 ;         /* 0x4e00010000270002 */
        /*0038*/                   XMAD.PSL.CBCC R0, R0.H1, R3.H1, R2 ;          /* 0x5b30011800370000 */
                                                                                 /* 0x001ff400fd4007ed */
        /*0048*/                   ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT ;  /* 0x4b6d038005670007 */
        /*0050*/                   NOP ;                                         /* 0x50b0000000070f00 */
        /*0058*/               @P0 EXIT ;                                        /* 0xe30000000000000f */
                                                                                 /* 0x081fd800fea207f1 */
        /*0068*/                   SHL R6, R0.reuse, 0x2 ;                       /* 0x3848000000270006 */
        /*0070*/                   SHR R0, R0, 0x1e ;                            /* 0x3829000001e70000 */
        /*0078*/                   IADD R4.CC, R6.reuse, c[0x0][0x140] ;         /* 0x4c10800005070604 */
                                                                                 /* 0x001fd800fe0207f2 */
        /*0088*/                   IADD.X R5, R0.reuse, c[0x0][0x144] ;          /* 0x4c10080005170005 */
        /*0090*/         {         IADD R2.CC, R6, c[0x0][0x148] ;               /* 0x4c10800005270602 */
        /*0098*/                   LDG.E R4, [R4]         }
                                                                                 /* 0xeed4200000070404 */
                                                                                 /* 0x001fd800f62007e2 */
        /*00a8*/                   IADD.X R3, R0, c[0x0][0x14c] ;                /* 0x4c10080005370003 */
        /*00b0*/                   LDG.E R2, [R2] ;                              /* 0xeed4200000070202 */
        /*00b8*/                   IADD R6.CC, R6, c[0x0][0x150] ;               /* 0x4c10800005470606 */
                                                                                 /* 0x001fc420fe4007f7 */
        /*00c8*/                   IADD.X R7, R0, c[0x0][0x154] ;                /* 0x4c10080005570007 */
        /*00d0*/                   FADD R0, R2, R4 ;                             /* 0x5c58000000470200 */
        /*00d8*/                   STG.E [R6], R0 ;                              /* 0xeedc200000070600 */
                                                                                 /* 0x001ffc00ffe007ea */
        /*00e8*/                   NOP ;                                         /* 0x50b0000000070f00 */
        /*00f0*/                   EXIT ;                                        /* 0xe30000000007000f */
        /*00f8*/                   BRA 0xf8 ;                                    /* 0xe2400fffff87000f */
                ..........



Fatbin elf code:
================
arch = sm_61
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_61
                Function : _Z9vectorAddPKfS0_Pfi
        .headerflags    @"EF_CUDA_SM61 EF_CUDA_PTX_SM(EF_CUDA_SM61)"
                                                                                 /* 0x001cfc00e22007f6 */
        /*0008*/                   MOV R1, c[0x0][0x20] ;                        /* 0x4c98078000870001 */
        /*0010*/                   S2R R0, SR_CTAID.X ;                          /* 0xf0c8000002570000 */
        /*0018*/                   S2R R2, SR_TID.X ;                            /* 0xf0c8000002170002 */
                                                                                 /* 0x001fd842fec20ff1 */
        /*0028*/                   XMAD.MRG R3, R0.reuse, c[0x0] [0x8].H1, RZ ;  /* 0x4f107f8000270003 */
        /*0030*/                   XMAD R2, R0.reuse, c[0x0] [0x8], R2 ;         /* 0x4e00010000270002 */
        /*0038*/                   XMAD.PSL.CBCC R0, R0.H1, R3.H1, R2 ;          /* 0x5b30011800370000 */
                                                                                 /* 0x001ff400fd4007ed */
        /*0048*/                   ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT ;  /* 0x4b6d038005670007 */
        /*0050*/                   NOP ;                                         /* 0x50b0000000070f00 */
        /*0058*/               @P0 EXIT ;                                        /* 0xe30000000000000f */
                                                                                 /* 0x081fd800fea207f1 */
        /*0068*/                   SHL R6, R0.reuse, 0x2 ;                       /* 0x3848000000270006 */
        /*0070*/                   SHR R0, R0, 0x1e ;                            /* 0x3829000001e70000 */
        /*0078*/                   IADD R4.CC, R6.reuse, c[0x0][0x140] ;         /* 0x4c10800005070604 */
                                                                                 /* 0x001fd800fe0207f2 */
        /*0088*/                   IADD.X R5, R0.reuse, c[0x0][0x144] ;          /* 0x4c10080005170005 */
        /*0090*/         {         IADD R2.CC, R6, c[0x0][0x148] ;               /* 0x4c10800005270602 */
        /*0098*/                   LDG.E R4, [R4]         }
                                                                                 /* 0xeed4200000070404 */
                                                                                 /* 0x001fd800f62007e2 */
        /*00a8*/                   IADD.X R3, R0, c[0x0][0x14c] ;                /* 0x4c10080005370003 */
        /*00b0*/                   LDG.E R2, [R2] ;                              /* 0xeed4200000070202 */
        /*00b8*/                   IADD R6.CC, R6, c[0x0][0x150] ;               /* 0x4c10800005470606 */
                                                                                 /* 0x001fc420fe4007f7 */
        /*00c8*/                   IADD.X R7, R0, c[0x0][0x154] ;                /* 0x4c10080005570007 */
        /*00d0*/                   FADD R0, R2, R4 ;                             /* 0x5c58000000470200 */
        /*00d8*/                   STG.E [R6], R0 ;                              /* 0xeedc200000070600 */
                                                                                 /* 0x001ffc00ffe007ea */
        /*00e8*/                   NOP ;                                         /* 0x50b0000000070f00 */
        /*00f0*/                   EXIT ;                                        /* 0xe30000000007000f */
        /*00f8*/                   BRA 0xf8 ;                                    /* 0xe2400fffff87000f */
                ..........



Fatbin elf code:
================
arch = sm_70
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_70
                Function : _Z9vectorAddPKfS0_Pfi
        .headerflags    @"EF_CUDA_SM70 EF_CUDA_PTX_SM(EF_CUDA_SM70)"
        /*0000*/                   MOV R1, c[0x0][0x28] ;                        /* 0x00000a0000017a02 */
                                                                                 /* 0x000fc40000000f00 */
        /*0010*/              @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ;                 /* 0x000000fffffff389 */
                                                                                 /* 0x000fe200000e00ff */
        /*0020*/                   S2R R6, SR_CTAID.X ;                          /* 0x0000000000067919 */
                                                                                 /* 0x000e280000002500 */
        /*0030*/                   S2R R3, SR_TID.X ;                            /* 0x0000000000037919 */
                                                                                 /* 0x000e240000002100 */
        /*0040*/                   IMAD R6, R6, c[0x0][0x0], R3 ;                /* 0x0000000006067a24 */
                                                                                 /* 0x001fca00078e0203 */
        /*0050*/                   ISETP.GE.AND P0, PT, R6, c[0x0][0x178], PT ;  /* 0x00005e0006007a0c */
                                                                                 /* 0x000fd80003f06270 */
        /*0060*/               @P0 EXIT ;                                        /* 0x000000000000094d */
                                                                                 /* 0x000fea0003800000 */
        /*0070*/                   MOV R7, 0x4 ;                                 /* 0x0000000400077802 */
                                                                                 /* 0x000fca0000000f00 */
        /*0080*/                   IMAD.WIDE R4, R6, R7, c[0x0][0x168] ;         /* 0x00005a0006047625 */
                                                                                 /* 0x000fc800078e0207 */
        /*0090*/                   IMAD.WIDE R2, R6, R7, c[0x0][0x160] ;         /* 0x0000580006027625 */
                                                                                 /* 0x000fc800078e0207 */
        /*00a0*/                   LDG.E.SYS R4, [R4] ;                          /* 0x0000000004047381 */
                                                                                 /* 0x000ea800001ee900 */
        /*00b0*/                   LDG.E.SYS R3, [R2] ;                          /* 0x0000000002037381 */
                                                                                 /* 0x000ea200001ee900 */
        /*00c0*/                   IMAD.WIDE R6, R6, R7, c[0x0][0x170] ;         /* 0x00005c0006067625 */
                                                                                 /* 0x000fc800078e0207 */
        /*00d0*/                   FADD R9, R4, R3 ;                             /* 0x0000000304097221 */
                                                                                 /* 0x004fd00000000000 */
        /*00e0*/                   STG.E.SYS [R6], R9 ;                          /* 0x0000000906007386 */
                                                                                 /* 0x000fe2000010e900 */
        /*00f0*/                   EXIT ;                                        /* 0x000000000000794d */
                                                                                 /* 0x000fea0003800000 */
        /*0100*/                   BRA 0x100;                                    /* 0xfffffff000007947 */
                                                                                 /* 0x000fc0000383ffff */
        /*0110*/                   NOP;                                          /* 0x0000000000007918 */
                                                                                 /* 0x000fc00000000000 */
        /*0120*/                   NOP;                                          /* 0x0000000000007918 */
                                                                                 /* 0x000fc00000000000 */
        /*0130*/                   NOP;                                          /* 0x0000000000007918 */
                                                                                 /* 0x000fc00000000000 */
        /*0140*/                   NOP;                                          /* 0x0000000000007918 */
                                                                                 /* 0x000fc00000000000 */
        /*0150*/                   NOP;                                          /* 0x0000000000007918 */
                                                                                 /* 0x000fc00000000000 */
        /*0160*/                   NOP;                                          /* 0x0000000000007918 */
                                                                                 /* 0x000fc00000000000 */
        /*0170*/                   NOP;                                          /* 0x0000000000007918 */
                                                                                 /* 0x000fc00000000000 */
                ..........



Fatbin elf code:
================
arch = sm_75
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_75
                Function : _Z9vectorAddPKfS0_Pfi
        .headerflags    @"EF_CUDA_SM75 EF_CUDA_PTX_SM(EF_CUDA_SM75)"
        /*0000*/                   MOV R1, c[0x0][0x28] ;                        /* 0x00000a0000017a02 */
                                                                                 /* 0x000fc40000000f00 */
        /*0010*/                   S2R R6, SR_CTAID.X ;                          /* 0x0000000000067919 */
                                                                                 /* 0x000e280000002500 */
        /*0020*/                   S2R R3, SR_TID.X ;                            /* 0x0000000000037919 */
                                                                                 /* 0x000e240000002100 */
        /*0030*/                   IMAD R6, R6, c[0x0][0x0], R3 ;                /* 0x0000000006067a24 */
                                                                                 /* 0x001fca00078e0203 */
        /*0040*/                   ISETP.GE.AND P0, PT, R6, c[0x0][0x178], PT ;  /* 0x00005e0006007a0c */
                                                                                 /* 0x000fd80003f06270 */
        /*0050*/               @P0 EXIT ;                                        /* 0x000000000000094d */
                                                                                 /* 0x000fea0003800000 */
        /*0060*/                   MOV R7, 0x4 ;                                 /* 0x0000000400077802 */
                                                                                 /* 0x000fca0000000f00 */
        /*0070*/                   IMAD.WIDE R4, R6, R7, c[0x0][0x168] ;         /* 0x00005a0006047625 */
                                                                                 /* 0x000fc800078e0207 */
        /*0080*/                   IMAD.WIDE R2, R6, R7, c[0x0][0x160] ;         /* 0x0000580006027625 */
                                                                                 /* 0x000fc800078e0207 */
        /*0090*/                   LDG.E.SYS R4, [R4] ;                          /* 0x0000000004047381 */
                                                                                 /* 0x000ea800001ee900 */
        /*00a0*/                   LDG.E.SYS R3, [R2] ;                          /* 0x0000000002037381 */
                                                                                 /* 0x000ea200001ee900 */
        /*00b0*/                   IMAD.WIDE R6, R6, R7, c[0x0][0x170] ;         /* 0x00005c0006067625 */
                                                                                 /* 0x000fc800078e0207 */
        /*00c0*/                   FADD R9, R4, R3 ;                             /* 0x0000000304097221 */
                                                                                 /* 0x004fd00000000000 */
        /*00d0*/                   STG.E.SYS [R6], R9 ;                          /* 0x0000000906007386 */
                                                                                 /* 0x000fe2000010e900 */
        /*00e0*/                   EXIT ;                                        /* 0x000000000000794d */
                                                                                 /* 0x000fea0003800000 */
        /*00f0*/                   BRA 0xf0;                                     /* 0xfffffff000007947 */
                                                                                 /* 0x000fc0000383ffff */
                ..........



Fatbin elf code:
================
arch = sm_80
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_80
                Function : _Z9vectorAddPKfS0_Pfi
        .headerflags    @"EF_CUDA_SM80 EF_CUDA_PTX_SM(EF_CUDA_SM80)"
        /*0000*/                   MOV R1, c[0x0][0x28] ;                             /* 0x00000a0000017a02 */
                                                                                      /* 0x000fc40000000f00 */
        /*0010*/                   S2R R6, SR_CTAID.X ;                               /* 0x0000000000067919 */
                                                                                      /* 0x000e280000002500 */
        /*0020*/                   S2R R3, SR_TID.X ;                                 /* 0x0000000000037919 */
                                                                                      /* 0x000e240000002100 */
        /*0030*/                   IMAD R6, R6, c[0x0][0x0], R3 ;                     /* 0x0000000006067a24 */
                                                                                      /* 0x001fca00078e0203 */
        /*0040*/                   ISETP.GE.AND P0, PT, R6, c[0x0][0x178], PT ;       /* 0x00005e0006007a0c */
                                                                                      /* 0x000fda0003f06270 */
        /*0050*/               @P0 EXIT ;                                             /* 0x000000000000094d */
                                                                                      /* 0x000fea0003800000 */
        /*0060*/                   HFMA2.MMA R7, -RZ, RZ, 0, 2.384185791015625e-07 ;  /* 0x00000004ff077435 */
                                                                                      /* 0x000fe200000001ff */
        /*0070*/                   ULDC.64 UR4, c[0x0][0x118] ;                       /* 0x0000460000047ab9 */
                                                                                      /* 0x000fd20000000a00 */
        /*0080*/                   IMAD.WIDE R4, R6, R7, c[0x0][0x168] ;              /* 0x00005a0006047625 */
                                                                                      /* 0x000fc800078e0207 */
        /*0090*/                   IMAD.WIDE R2, R6.reuse, R7.reuse, c[0x0][0x160] ;  /* 0x0000580006027625 */
                                                                                      /* 0x0c0fe400078e0207 */
        /*00a0*/                   LDG.E R4, [R4.64] ;                                /* 0x0000000404047981 */
                                                                                      /* 0x000ea8000c1e1900 */
        /*00b0*/                   LDG.E R3, [R2.64] ;                                /* 0x0000000402037981 */
                                                                                      /* 0x000ea2000c1e1900 */
        /*00c0*/                   IMAD.WIDE R6, R6, R7, c[0x0][0x170] ;              /* 0x00005c0006067625 */
                                                                                      /* 0x000fc800078e0207 */
        /*00d0*/                   FADD R9, R4, R3 ;                                  /* 0x0000000304097221 */
                                                                                      /* 0x004fca0000000000 */
        /*00e0*/                   STG.E [R6.64], R9 ;                                /* 0x0000000906007986 */
                                                                                      /* 0x000fe2000c101904 */
        /*00f0*/                   EXIT ;                                             /* 0x000000000000794d */
                                                                                      /* 0x000fea0003800000 */
        /*0100*/                   BRA 0x100;                                         /* 0xfffffff000007947 */
                                                                                      /* 0x000fc0000383ffff */
        /*0110*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0120*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0130*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0140*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0150*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0160*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0170*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0180*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0190*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01a0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01b0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01c0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01d0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01e0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01f0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
                ..........
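
A note on the sm_80 listing above: where the sm_75 and sm_86 listings have `MOV R7, 0x4`, this one has `HFMA2.MMA R7, -RZ, RZ, 0, 2.384185791015625e-07`. Both raw encodings carry the same 32-bit immediate, 0x00000004 (compare 0x00000004ff077435 with 0x0000000400077802). So the instruction appears to put the element size 4 into R7 through the half-precision pipe, with the disassembler printing the immediate as two FP16 values. This is an interpretation of the dump, not something cuobjdump states. The sketch below only checks the arithmetic: an FP16 subnormal with bit pattern 0x0004 has the value 4 * 2^-24. The helper name is hypothetical, and awk is assumed to be installed.

```shell
# Hypothetical helper: value of an FP16 subnormal whose mantissa field is $1
# (sign and exponent bits all zero). Such a value equals mantissa * 2^-24.
fp16_subnormal_value() {
    awk -v m="$1" 'BEGIN { printf "%.15e\n", m / 16777216 }'
}

# The low half of the immediate 0x00000004 has mantissa 4.
fp16_subnormal_value 4
# prints 2.384185791015625e-07
```

The printed value matches the second FP16 operand shown by the disassembler, which is consistent with R7 ending up with the bit pattern 0x00000004.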



Fatbin ptx code:
================
arch = sm_86
code version = [7,3]
producer = <unknown>
host = linux
compile_size = 64bit
compressed








.version 7.3
.target sm_86
.address_size 64



.visible .entry _Z9vectorAddPKfS0_Pfi(
.param .u64 _Z9vectorAddPKfS0_Pfi_param_0,
.param .u64 _Z9vectorAddPKfS0_Pfi_param_1,
.param .u64 _Z9vectorAddPKfS0_Pfi_param_2,
.param .u32 _Z9vectorAddPKfS0_Pfi_param_3
)
{
.reg .pred %p<2>;
.reg .f32 %f<4>;
.reg .b32 %r<6>;
.reg .b64 %rd<11>;


ld.param.u64 %rd1, [_Z9vectorAddPKfS0_Pfi_param_0];
ld.param.u64 %rd2, [_Z9vectorAddPKfS0_Pfi_param_1];
ld.param.u64 %rd3, [_Z9vectorAddPKfS0_Pfi_param_2];
ld.param.u32 %r2, [_Z9vectorAddPKfS0_Pfi_param_3];
mov.u32 %r3, %ntid.x;
mov.u32 %r4, %ctaid.x;
mov.u32 %r5, %tid.x;
mad.lo.s32 %r1, %r3, %r4, %r5;
setp.ge.s32 %p1, %r1, %r2;
@%p1 bra $L__BB0_2;

cvta.to.global.u64 %rd4, %rd1;
mul.wide.s32 %rd5, %r1, 4;
add.s64 %rd6, %rd4, %rd5;
cvta.to.global.u64 %rd7, %rd2;
add.s64 %rd8, %rd7, %rd5;
ld.global.f32 %f1, [%rd8];
ld.global.f32 %f2, [%rd6];
add.f32 %f3, %f2, %f1;
cvta.to.global.u64 %rd9, %rd3;
add.s64 %rd10, %rd9, %rd5;
st.global.f32 [%rd10], %f3;

$L__BB0_2:
ret;

}
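
The PTX above can be read as plain integer arithmetic: `mad.lo.s32` computes the element index from `%ntid.x`, `%ctaid.x` and `%tid.x`, and `mul.wide.s32` multiplies it by 4 to get the byte offset that is added to each of the three base pointers. Here is a minimal sketch of that arithmetic for one thread. The function name and the sample launch values are made up for illustration.

```shell
# Hypothetical helper mirroring the PTX arithmetic for one thread:
#   mad.lo.s32   : i      = ntid.x * ctaid.x + tid.x
#   mul.wide.s32 : offset = i * 4   (bytes, since sizeof(float) is 4)
thread_byte_offset() {
    ntid=$1; ctaid=$2; tid=$3
    i=$(( ntid * ctaid + tid ))
    echo "$i $(( i * 4 ))"
}

# Thread 5 of block 2, with 256 threads per block.
thread_byte_offset 256 2 5
# prints 517 2068
```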


Fatbin elf code:
================
arch = sm_86
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_86
                Function : _Z9vectorAddPKfS0_Pfi
        .headerflags    @"EF_CUDA_SM86 EF_CUDA_PTX_SM(EF_CUDA_SM86)"
        /*0000*/                   MOV R1, c[0x0][0x28] ;                             /* 0x00000a0000017a02 */
                                                                                      /* 0x000fc40000000f00 */
        /*0010*/                   S2R R6, SR_CTAID.X ;                               /* 0x0000000000067919 */
                                                                                      /* 0x000e280000002500 */
        /*0020*/                   S2R R3, SR_TID.X ;                                 /* 0x0000000000037919 */
                                                                                      /* 0x000e240000002100 */
        /*0030*/                   IMAD R6, R6, c[0x0][0x0], R3 ;                     /* 0x0000000006067a24 */
                                                                                      /* 0x001fca00078e0203 */
        /*0040*/                   ISETP.GE.AND P0, PT, R6, c[0x0][0x178], PT ;       /* 0x00005e0006007a0c */
                                                                                      /* 0x000fda0003f06270 */
        /*0050*/               @P0 EXIT ;                                             /* 0x000000000000094d */
                                                                                      /* 0x000fea0003800000 */
        /*0060*/                   MOV R7, 0x4 ;                                      /* 0x0000000400077802 */
                                                                                      /* 0x000fe20000000f00 */
        /*0070*/                   ULDC.64 UR4, c[0x0][0x118] ;                       /* 0x0000460000047ab9 */
                                                                                      /* 0x000fc80000000a00 */
        /*0080*/                   IMAD.WIDE R4, R6, R7, c[0x0][0x168] ;              /* 0x00005a0006047625 */
                                                                                      /* 0x000fc800078e0207 */
        /*0090*/                   IMAD.WIDE R2, R6.reuse, R7.reuse, c[0x0][0x160] ;  /* 0x0000580006027625 */
                                                                                      /* 0x0c0fe400078e0207 */
        /*00a0*/                   LDG.E R4, [R4.64] ;                                /* 0x0000000404047981 */
                                                                                      /* 0x000ea8000c1e1900 */
        /*00b0*/                   LDG.E R3, [R2.64] ;                                /* 0x0000000402037981 */
                                                                                      /* 0x000ea2000c1e1900 */
        /*00c0*/                   IMAD.WIDE R6, R6, R7, c[0x0][0x170] ;              /* 0x00005c0006067625 */
                                                                                      /* 0x000fe200078e0207 */
        /*00d0*/                   FADD R9, R4, R3 ;                                  /* 0x0000000304097221 */
                                                                                      /* 0x004fca0000000000 */
        /*00e0*/                   STG.E [R6.64], R9 ;                                /* 0x0000000906007986 */
                                                                                      /* 0x000fe2000c101904 */
        /*00f0*/                   EXIT ;                                             /* 0x000000000000794d */
                                                                                      /* 0x000fea0003800000 */
        /*0100*/                   BRA 0x100;                                         /* 0xfffffff000007947 */
                                                                                      /* 0x000fc0000383ffff */
        /*0110*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0120*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0130*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0140*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0150*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0160*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0170*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0180*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0190*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01a0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01b0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01c0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01d0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01e0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01f0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
                ..........
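
Because the dump of the host binary repeats the same kernel for every architecture, it can help to cut it down to one listing. The sketch below filters saved `cuobjdump -sass` output with awk. It relies only on the `code for sm_XX` and `Fatbin ...` lines visible in the output above. The function name is hypothetical.

```shell
# Hypothetical helper: read `cuobjdump -sass` output on stdin and print only
# the SASS listing for the architecture given as $1 (for example sm_80).
sass_for_arch() {
    awk -v arch="$1" '
        /^Fatbin / { keep = 0 }                              # next fatbin entry ends a listing
        $1 == "code" && $2 == "for" { keep = ($3 == arch) }  # "code for sm_XX" starts one
        keep { print }
    '
}
```

Assuming the host binary from this walkthrough, a typical call would be `/usr/local/cuda/bin/cuobjdump -sass vectorAdd | sass_for_arch sm_80`. PTX entries are skipped because they contain no `code for` line.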


5. cuobjdump -lelf <hostBinary>

Command:

$ /usr/local/cuda/bin/cuobjdump -lelf vectorAdd

Result:


ELF file    1: vectorAdd.1.sm_35.cubin
ELF file    2: vectorAdd.2.sm_37.cubin
ELF file    3: vectorAdd.3.sm_50.cubin
ELF file    4: vectorAdd.4.sm_52.cubin
ELF file    5: vectorAdd.5.sm_60.cubin
ELF file    6: vectorAdd.6.sm_61.cubin
ELF file    7: vectorAdd.7.sm_70.cubin
ELF file    8: vectorAdd.8.sm_75.cubin
ELF file    9: vectorAdd.9.sm_80.cubin
ELF file   10: vectorAdd.10.sm_86.cubin
ELF file   11: vectorAdd.11.sm_35.cubin
ELF file   12: vectorAdd.12.sm_37.cubin
ELF file   13: vectorAdd.13.sm_50.cubin
ELF file   14: vectorAdd.14.sm_52.cubin
ELF file   15: vectorAdd.15.sm_60.cubin
ELF file   16: vectorAdd.16.sm_61.cubin
ELF file   17: vectorAdd.17.sm_70.cubin
ELF file   18: vectorAdd.18.sm_75.cubin
ELF file   19: vectorAdd.19.sm_80.cubin
ELF file   20: vectorAdd.20.sm_86.cubin
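
The listing above has 20 entries: each of the ten architectures from sm_35 to sm_86 appears twice. Below is a small sketch for tallying such a listing by architecture. It assumes the `name.index.sm_XX.cubin` naming shown above, and the function name is hypothetical.

```shell
# Hypothetical helper: read `cuobjdump -lelf` output on stdin and print
# "<arch> <count>" for each architecture, sorted by architecture name.
count_cubins_per_arch() {
    awk -F'.' '/^ELF file/ { n[$(NF-1)]++ } END { for (a in n) print a, n[a] }' | sort
}
```

For example, `/usr/local/cuda/bin/cuobjdump -lelf vectorAdd | count_cubins_per_arch` should report a count of 2 for every architecture in the listing above.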

6. cuobjdump -xelf all <hostBinary>

Command:

$ cd ~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL
$ /usr/local/cuda/bin/cuobjdump -xelf all vectorAdd

Result:

opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$ /usr/local/cuda/bin/cuobjdump -xelf all vectorAdd
Extracting ELF file    1: vectorAdd.1.sm_35.cubin
Extracting ELF file    2: vectorAdd.2.sm_37.cubin
Extracting ELF file    3: vectorAdd.3.sm_50.cubin
Extracting ELF file    4: vectorAdd.4.sm_52.cubin
Extracting ELF file    5: vectorAdd.5.sm_60.cubin
Extracting ELF file    6: vectorAdd.6.sm_61.cubin
Extracting ELF file    7: vectorAdd.7.sm_70.cubin
Extracting ELF file    8: vectorAdd.8.sm_75.cubin
Extracting ELF file    9: vectorAdd.9.sm_80.cubin
Extracting ELF file   10: vectorAdd.10.sm_86.cubin
Extracting ELF file   11: vectorAdd.11.sm_35.cubin
Extracting ELF file   12: vectorAdd.12.sm_37.cubin
Extracting ELF file   13: vectorAdd.13.sm_50.cubin
Extracting ELF file   14: vectorAdd.14.sm_52.cubin
Extracting ELF file   15: vectorAdd.15.sm_60.cubin
Extracting ELF file   16: vectorAdd.16.sm_61.cubin
Extracting ELF file   17: vectorAdd.17.sm_70.cubin
Extracting ELF file   18: vectorAdd.18.sm_75.cubin
Extracting ELF file   19: vectorAdd.19.sm_80.cubin
Extracting ELF file   20: vectorAdd.20.sm_86.cubin
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$ ls
Makefile           vectorAdd.10.sm_86.cubin  vectorAdd.14.sm_52.cubin  vectorAdd.18.sm_75.cubin  vectorAdd.2.sm_37.cubin  vectorAdd.6.sm_61.cubin  vectorAdd.cu
NsightEclipse.xml  vectorAdd.11.sm_35.cubin  vectorAdd.15.sm_60.cubin  vectorAdd.19.sm_80.cubin  vectorAdd.3.sm_50.cubin  vectorAdd.7.sm_70.cubin  vectorAddKernel.cu
readme.txt         vectorAdd.12.sm_37.cubin  vectorAdd.16.sm_61.cubin  vectorAdd.1.sm_35.cubin   vectorAdd.4.sm_52.cubin  vectorAdd.8.sm_75.cubin  vectorAddKernel.cubin
vectorAdd          vectorAdd.13.sm_50.cubin  vectorAdd.17.sm_70.cubin  vectorAdd.20.sm_86.cubin  vectorAdd.5.sm_60.cubin  vectorAdd.9.sm_80.cubin  vectorAdd.o
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$

7. Extracting a specific cubin

Commands (list the embedded ELF files first, then extract one of them by name):

opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$ /usr/local/cuda/bin/cuobjdump -lelf vectorAdd
ELF file    1: vectorAdd.1.sm_35.cubin
ELF file    2: vectorAdd.2.sm_37.cubin
ELF file    3: vectorAdd.3.sm_50.cubin
ELF file    4: vectorAdd.4.sm_52.cubin
ELF file    5: vectorAdd.5.sm_60.cubin
ELF file    6: vectorAdd.6.sm_61.cubin
ELF file    7: vectorAdd.7.sm_70.cubin
ELF file    8: vectorAdd.8.sm_75.cubin
ELF file    9: vectorAdd.9.sm_80.cubin
ELF file   10: vectorAdd.10.sm_86.cubin
ELF file   11: vectorAdd.11.sm_35.cubin
ELF file   12: vectorAdd.12.sm_37.cubin
ELF file   13: vectorAdd.13.sm_50.cubin
ELF file   14: vectorAdd.14.sm_52.cubin
ELF file   15: vectorAdd.15.sm_60.cubin
ELF file   16: vectorAdd.16.sm_61.cubin
ELF file   17: vectorAdd.17.sm_70.cubin
ELF file   18: vectorAdd.18.sm_75.cubin
ELF file   19: vectorAdd.19.sm_80.cubin
ELF file   20: vectorAdd.20.sm_86.cubin
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$


$ cd ~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL
$ /usr/local/cuda/bin/cuobjdump vectorAdd -xelf vectorAdd.19.sm_80.cubin
Extracting ELF file   19: vectorAdd.19.sm_80.cubin
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$
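
Extracting by full file name works. The cuobjdump documentation describes the `-xelf` argument as a partial file name, so passing only the architecture should extract every matching cubin (here both entry 9 and entry 19). A hedged sketch under that assumption; `extract_arch` and the `CUOBJDUMP` override variable are hypothetical names introduced for illustration.

```shell
# extract_arch ARCH BINARY: extract every embedded cubin whose file name
# contains ARCH (for example sm_80) from a host binary.
# Hypothetical wrapper; CUOBJDUMP may point at a specific cuobjdump binary,
# otherwise `cuobjdump` is looked up on PATH.
extract_arch() {
    "${CUOBJDUMP:-cuobjdump}" -xelf "$1" "$2"
}
```

With the paths used in this walkthrough, it could be called as `CUOBJDUMP=/usr/local/cuda/bin/cuobjdump extract_arch sm_80 vectorAdd`.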

Result (SASS of the extracted cubin):

opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$ /usr/local/cuda/bin/cuobjdump -sass vectorAdd.19.sm_80.cubin

        code for sm_80
                Function : _Z9vectorAddPKfS0_Pfi
        .headerflags    @"EF_CUDA_SM80 EF_CUDA_PTX_SM(EF_CUDA_SM80)"
        /*0000*/                   MOV R1, c[0x0][0x28] ;                             /* 0x00000a0000017a02 */
                                                                                      /* 0x000fc40000000f00 */
        /*0010*/                   S2R R6, SR_CTAID.X ;                               /* 0x0000000000067919 */
                                                                                      /* 0x000e280000002500 */
        /*0020*/                   S2R R3, SR_TID.X ;                                 /* 0x0000000000037919 */
                                                                                      /* 0x000e240000002100 */
        /*0030*/                   IMAD R6, R6, c[0x0][0x0], R3 ;                     /* 0x0000000006067a24 */
                                                                                      /* 0x001fca00078e0203 */
        /*0040*/                   ISETP.GE.AND P0, PT, R6, c[0x0][0x178], PT ;       /* 0x00005e0006007a0c */
                                                                                      /* 0x000fda0003f06270 */
        /*0050*/               @P0 EXIT ;                                             /* 0x000000000000094d */
                                                                                      /* 0x000fea0003800000 */
        /*0060*/                   HFMA2.MMA R7, -RZ, RZ, 0, 2.384185791015625e-07 ;  /* 0x00000004ff077435 */
                                                                                      /* 0x000fe200000001ff */
        /*0070*/                   ULDC.64 UR4, c[0x0][0x118] ;                       /* 0x0000460000047ab9 */
                                                                                      /* 0x000fd20000000a00 */
        /*0080*/                   IMAD.WIDE R4, R6, R7, c[0x0][0x168] ;              /* 0x00005a0006047625 */
                                                                                      /* 0x000fc800078e0207 */
        /*0090*/                   IMAD.WIDE R2, R6.reuse, R7.reuse, c[0x0][0x160] ;  /* 0x0000580006027625 */
                                                                                      /* 0x0c0fe400078e0207 */
        /*00a0*/                   LDG.E R4, [R4.64] ;                                /* 0x0000000404047981 */
                                                                                      /* 0x000ea8000c1e1900 */
        /*00b0*/                   LDG.E R3, [R2.64] ;                                /* 0x0000000402037981 */
                                                                                      /* 0x000ea2000c1e1900 */
        /*00c0*/                   IMAD.WIDE R6, R6, R7, c[0x0][0x170] ;              /* 0x00005c0006067625 */
                                                                                      /* 0x000fc800078e0207 */
        /*00d0*/                   FADD R9, R4, R3 ;                                  /* 0x0000000304097221 */
                                                                                      /* 0x004fca0000000000 */
        /*00e0*/                   STG.E [R6.64], R9 ;                                /* 0x0000000906007986 */
                                                                                      /* 0x000fe2000c101904 */
        /*00f0*/                   EXIT ;                                             /* 0x000000000000794d */
                                                                                      /* 0x000fea0003800000 */
        /*0100*/                   BRA 0x100;                                         /* 0xfffffff000007947 */
                                                                                      /* 0x000fc0000383ffff */
        /*0110*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0120*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0130*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0140*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0150*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0160*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0170*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0180*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*0190*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01a0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01b0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01c0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01d0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01e0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
        /*01f0*/                   NOP;                                               /* 0x0000000000007918 */
                                                                                      /* 0x000fc00000000000 */
                ..........
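
Two details of the sm_80 listing are worth spelling out. The `HFMA2.MMA R7, -RZ, RZ, 0, 2.384185791015625e-07` instruction is how the compiler loads the integer 4 (the size of a `float`) into R7: the immediate in its encoding is 0x00000004, and the disassembler prints the low 16 bits as a half-precision subnormal, 4 × 2^-24. The three `IMAD.WIDE` instructions then form `base + i * 4` for `B`, `A`, and `C`. The arithmetic can be checked with a short sketch; the function names are hypothetical.

```shell
# half_subnormal M: value of an IEEE 754 half-precision subnormal whose
# mantissa bits are M, that is M * 2^-24 (2^24 = 16777216).
half_subnormal() {
    awk -v m="$1" 'BEGIN { printf "%.15e\n", m / 16777216 }'
}

# elem_offset I: byte offset of element I in a float array, the product
# that IMAD.WIDE adds to the base pointer (4 bytes per float).
elem_offset() {
    echo $(( $1 * 4 ))
}

half_subnormal 4    # prints 2.384185791015625e-07
elem_offset 10      # prints 40
```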


opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$

cuobjdump -elf ... (dump the ELF sections of the extracted cubin):

opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$ /usr/local/cuda/bin/cuobjdump -elf vectorAdd.19.sm_80.cubin

64bit elf: type=2, abi=7, sm=80, toolkit=113, flags = 0x500550
Sections:
Index Offset   Size ES Align        Type        Flags Link     Info Name
    1     40     df  0  1            STRTAB       0    0        0 .shstrtab
    2    11f     fc  0  1            STRTAB       0    0        0 .strtab
    3    220     90 18  8            SYMTAB       0    2        4 .symtab
    4    2b0     70  0  1          PROGBITS       0    0        0 .debug_frame
    5    320     30  0  4         CUDA_INFO       0    3        0 .nv.info
    6    350     6c  0  4         CUDA_INFO       0    3        a .nv.info._Z9vectorAddPKfS0_Pfi
    7    3c0     d8  8  8    CUDA_RELOCINFO       0    0        0 .nv.rel.action
    8    498     10 10  8               REL       0    3        4 .rel.debug_frame
    9    4a8    17c  0  4          PROGBITS       2    0        a .nv.constant0._Z9vectorAddPKfS0_Pfi
    a    680    200  0 80          PROGBITS       6    3  c000005 .text._Z9vectorAddPKfS0_Pfi

.section .strtab

.section .shstrtab

.section .symtab
 index           value           size      info    other  shndx    name
   0               0               0        0        0      0     (null)
   1               0               0        3        0      a     .text._Z9vectorAddPKfS0_Pfi
   2               0               0        3        0      9     .nv.constant0._Z9vectorAddPKfS0_Pfi
   3               0               0        3        0      4     .debug_frame
   4               0               0        3        0      7     .nv.rel.action
   5               0             512       12       10      a     _Z9vectorAddPKfS0_Pfi


.nv.constant0._Z9vectorAddPKfS0_Pfi
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
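
The numbers in this dump are consistent with the SASS listing: the `.text._Z9vectorAddPKfS0_Pfi` section has Size 200 (hexadecimal), the symbol table reports a size of 512 for the function, and the disassembly runs from 0x0000 to 0x01f0 in steps of 0x10. A quick arithmetic check, assuming each sm_80 instruction occupies 16 bytes as the address column suggests:

```shell
last_addr=$((0x01f0))    # address of the final NOP in the SASS listing
insn_bytes=16            # address step between consecutive instructions
text_size=$(( last_addr + insn_bytes ))

printf 'text size: 0x%x = %d bytes, %d instructions\n' \
    "$text_size" "$text_size" "$(( text_size / insn_bytes ))"
# prints: text size: 0x200 = 512 bytes, 32 instructions
```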

8. Viewing the resources used by the functions in a kernel binary

Command:

$ /usr/local/cuda/bin/cuobjdump -res-usage vectorAdd.19.sm_80.cubin

Output:

opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$ /usr/local/cuda/bin/cuobjdump -res-usage vectorAdd.19.sm_80.cubin

Resource usage:
 Common:
  GLOBAL:0
 Function _Z9vectorAddPKfS0_Pfi:
  REG:12 STACK:0 SHARED:0 LOCAL:0 CONSTANT[0]:380 TEXTURE:0 SURFACE:0 SAMPLER:0
opencl@opencl-PC:~/ex/cudasample11.3.0/samples/0_Simple/vectorAddLL$
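
The `CONSTANT[0]:380` figure matches the `.nv.constant0` section size of 17c (hexadecimal) in the ELF dump, and it can be reproduced from the SASS: the kernel parameters are read from c[0x0][0x160] (A), c[0x0][0x168] (B), c[0x0][0x170] (C), and c[0x0][0x178] (numElements), so the bank ends 4 bytes after 0x178. A minimal check of that arithmetic, assuming 8-byte pointers and a 4-byte int:

```shell
param_base=$((0x160))    # offset of the first kernel parameter (A) in the SASS
pointer_bytes=8          # A, B and C are 64-bit pointers
int_bytes=4              # numElements is a 32-bit int
const0_size=$(( param_base + 3 * pointer_bytes + int_bytes ))

printf 'constant bank 0: 0x%x = %d bytes\n' "$const0_size" "$const0_size"
# prints: constant bank 0: 0x17c = 380 bytes
```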


9. Using nvprune to produce a static library (.a) for a specific sm_xx

Command:

$ cd ~/ex/cuobjdumpEx/libcublasEx
$ /usr/local/cuda/bin/nvprune -arch sm_70 libcublas_static.a -o libcublas_static70.a

Result:

opencl@opencl-PC:~/ex/cuobjdumpEx/libcublasEx$ ls
lelf_libcublas_static_70.a.txt  libcublas.so.11.4.2.10064  libcublas_static70.174.sm_70.cubin  libcublas_static70.a  libcublas_static.a
opencl@opencl-PC:~/ex/cuobjdumpEx/libcublasEx$
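
To confirm that the pruned archive only carries sm_70 code, its ELF listing can be checked. The `ls` output above suggests that `cuobjdump -lelf` was run on the pruned library and that its entries are named like `libcublas_static70.174.sm_70.cubin`. A minimal sketch under that assumption; `only_arch` is a hypothetical helper name.

```shell
# only_arch ARCH: read `cuobjdump -lelf` output on stdin; succeed only if
# at least one ELF entry is listed and every entry is for ARCH.
# Hypothetical helper; assumes entry names end in ".<ARCH>.cubin".
only_arch() {
    awk -v arch="$1" '
        /^ELF file/ { n++; if ($0 !~ ("\\." arch "\\.cubin$")) bad++ }
        END { exit !(n > 0 && bad == 0) }
    '
}
```

A possible invocation is `/usr/local/cuda/bin/cuobjdump -lelf libcublas_static70.a | only_arch sm_70 && echo "sm_70 only"`. This checks only the ELF (cubin) entries; any embedded PTX would have to be listed separately, for example with `-lptx`.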
