分析zkwork_aleo_gpu_worker哪些部分用了GPU加速

分析zkwork_aleo_gpu_worker哪些部分用了GPU加速,涉及

  • cuda-gdb、gdb的使用
  • strip的程序gdb如何查看函数的参数
  • 如何dump一个进程的内存
  • gdb dump内存

一.参考链接

  • https://github.com/HarukaMa/aleo-prover/tree/mainnet
  • https://www.aleocn.net/211538.html
  • https://github.com/AleoNet/snarkVM/blob/mainnet-staging/ledger/puzzle/epoch/docs/spec.md
  • https://developer.aleo.org/aleo/opcodes/
  • https://medium.com/@VitalikButerin/quadratic-arithmetic-programs-from-zero-to-hero-f6d558cea649

二.相关知识

1.X86函数参数传递使用的寄存器

rdi	传递第一个参数
rsi	传递第二个参数
rdx	传递第三个参数或者第二个返回值
rcx	传递第四个参数
r8	传递第五个参数
r9	传递第六个参数
rax	临时寄存器或者第一个返回值
rsp	sp寄存器
rbp	栈帧寄存器

2.nvrtcCreateProgram的参数

nvrtcCreateProgram(&prog,         // prog
				saxpy,         // buffer
				"saxpy.cu",    // name
				0,             // numHeaders
				NULL,          // headers
				NULL));        // includeNames

三.准备运行环境,并测试

wget https://github.com/6block/zkwork_aleo_gpu_worker/releases/download/v0.1.1/aleo_prover-v0.1.1.tar.gz
tar -xf aleo_prover-v0.1.1.tar.gz
./aleo_prover --address aleo135fqyh9dfqxxvrxkhfhlhkmnudpetygjlcel8jl7q08c9hjjksys0hgcrk --pool aleo.hk.zk.work:10003
  • 输出
2024-09-12T19:08:02.582163  INFO Worker start. version(0.1.1) commit hash(d461a28)
2024-09-12T19:08:02.588771  INFO aleo_prover running on Aleo Mainnet
2024-09-12T19:08:02.830272  INFO Connected to pool, my worker id: Some(18167), aleo135fqyh9dfqxxvrxkhfhlhkmnudpetygjlcel8jl7q08c9hjjksys0hgcrk
2024-09-12T19:08:02.909575  INFO Nofify from Pool Server, job_id: 61 target: 40000000
2024-09-12T19:09:16.220635  INFO Kernel is ready for new job: 61

2024-09-12T19:10:29.514919  INFO Found a solution solution1gsnjj45fzh2cspylhgm target 57991833

+-------------------------------------------------------------------------------------------+
| 2024-09-12T19:11:16                                                                       |
|                                                                                           |
| gpu[0]: (1m - 272380    5m - N/A       15m - N/A       30m - N/A       60m - N/A     )    |
| gpu[*]: (1m - 272380    5m - N/A       15m - N/A       30m - N/A       60m - N/A     )    |
|                                                                                           |
+-------------------------------------------------------------------------------------------+
  • GPU利用率
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        Off | 00000000:03:00.0 Off |                  N/A |
|  0%   53C    P2             158W / 170W |  11803MiB / 12288MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    945348      C   ./aleo_prover                             11798MiB |
+---------------------------------------------------------------------------------------+

四.调试过程

1.查看执行了哪些cuda kernel

/usr/local/cuda/bin/cuda-gdb --args ./aleo_prover --address aleo135fqyh9dfqxxvrxkhfhlhkmnudpetygjlcel8jl7q08c9hjjksys0hgcrk --pool aleo.hk.zk.work:10003
set cuda break_on_launch application #停在cuda Kernel入口
r #运行
c #继续
c #继续
  • 输出
[New Thread 0x7fff50b6d000 (LWP 947586)]
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
0x00007ffd87600000 in generate_leaves_dbl_addr<<<(28,1,1),(512,1,1)>>> ()
(cuda-gdb) c
Continuing.
[Switching focus to CUDA kernel 1, grid 2, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 1, lane 0]
0x00007ffd877bac00 in find_proof_dbl_addr<<<(28,1,1),(384,1,1)>>> ()
(cuda-gdb) c
Continuing.
[New Thread 0x7fff385fb000 (LWP 947588)]
[Thread 0x7fff385fb000 (LWP 947588) exited]
[Switching focus to CUDA kernel 2, grid 3, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 2, lane 0]
0x00007ffd87600000 in generate_leaves_dbl_addr<<<(28,1,1),(512,1,1)>>> ()
(cuda-gdb) c
Continuing.
[Switching focus to CUDA kernel 3, grid 4, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 3, lane 0]
0x00007ffd877bac00 in find_proof_dbl_addr<<<(28,1,1),(384,1,1)>>> ()
(cuda-gdb) c
Continuing.
[New Thread 0x7fff385fb000 (LWP 947589)]
[Thread 0x7fff385fb000 (LWP 947589) exited]
[Switching focus to CUDA kernel 4, grid 5, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
0x00007ffd87600000 in generate_leaves_dbl_addr<<<(28,1,1),(512,1,1)>>> ()
(cuda-gdb) c
Continuing.
[Switching focus to CUDA kernel 5, grid 6, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 1, lane 0]
0x00007ffd877bac00 in find_proof_dbl_addr<<<(28,1,1),(384,1,1)>>> ()

2.查看cuLaunchKernel调用栈

gdb --args ./aleo_prover --address aleo135fqyh9dfqxxvrxkhfhlhkmnudpetygjlcel8jl7q08c9hjjksys0hgcrk --pool aleo.hk.zk.work:10003
b cuLaunchKernel
r
  • 输出
Thread 59 "aleo_prover" hit Breakpoint 1, 0x00007ffff6670910 in cuLaunchKernel () from /lib/x86_64-linux-gnu/libcuda.so.1
(gdb) bt
#0  0x00007ffff6670910 in cuLaunchKernel () from /lib/x86_64-linux-gnu/libcuda.so.1
#1  0x0000555556222e29 in AleoProver::FindProof(unsigned int&, unsigned long*, unsigned long*, unsigned int*, unsigned int, unsigned char const*, unsigned char const*, unsigned char const*, double, unsigned long, unsigned int, unsigned long) ()
#2  0x0000555556221943 in find_proof_dbl_addr ()
#3  0x0000555555ed78ef in aleo_cuda_proof::proof::AleoProver::find_proof_dbl_addr ()
#4  0x0000555555b0e661 in snarkvm_ledger_puzzle_epoch::synthesis::GpuPuzzle<N>::find_proof_target_dbl_addr ()
#5  0x0000555555f17094 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#6  0x0000555555b5ec8b in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
#7  0x00005555561c1c3b in std::sys::pal::unix::thread::Thread::new::thread_start ()
#8  0x00007ffff611f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#9  0x00007ffff5eed353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) c
Continuing.
[New Thread 0x7fff6035a000 (LWP 947865)]
[Thread 0x7fff6035a000 (LWP 947865) exited]

Thread 59 "aleo_prover" hit Breakpoint 1, 0x00007ffff6670910 in cuLaunchKernel () from /lib/x86_64-linux-gnu/libcuda.so.1
(gdb) bt
#0  0x00007ffff6670910 in cuLaunchKernel () from /lib/x86_64-linux-gnu/libcuda.so.1
#1  0x0000555556222d78 in AleoProver::FindProof(unsigned int&, unsigned long*, unsigned long*, unsigned int*, unsigned int, unsigned char const*, unsigned char const*, unsigned char const*, double, unsigned long, unsigned int, unsigned long) ()
#2  0x0000555556221943 in find_proof_dbl_addr ()
#3  0x0000555555ed78ef in aleo_cuda_proof::proof::AleoProver::find_proof_dbl_addr ()
#4  0x0000555555b0e661 in snarkvm_ledger_puzzle_epoch::synthesis::GpuPuzzle<N>::find_proof_target_dbl_addr ()
#5  0x0000555555f17094 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#6  0x0000555555b5ec8b in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
#7  0x00005555561c1c3b in std::sys::pal::unix::thread::Thread::new::thread_start ()
#8  0x00007ffff611f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#9  0x00007ffff5eed353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

3.gdb调试

A.gdb运行程序,在nvrtcCreateProgram下断点

gdb --args ./aleo_prover --address aleo135fqyh9dfqxxvrxkhfhlhkmnudpetygjlcel8jl7q08c9hjjksys0hgcrk --pool aleo.hk.zk.work:10003
b nvrtcCreateProgram
r
i registers
x/4s $rdx
x/32s $rsi
dump binary memory ./aleo_prover_reverse.bin $rsi $rsi+0x200000
strings aleo_prover_reverse.bin > aleo_prover_reverse.cu #导出字符串(非gdb环境下)
  • 输出
(gdb) x/4s $rdx
0x7ffdb809f6d0: "gen_leaves_dbl_addr_kernel_86"

(gdb) i registers
rax            0x7fff48ff14b0      140734418064560
rbx            0x7ffdb809fb30      140727691115312
rcx            0x2                 2
rdx            0x7ffdb809f6d0      140727691114192
rsi            0x7ffdb80eefa0      140727691440032
rdi            0x7fff48ff0a20      140734418061856
rbp            0x55555a15b0d8      0x55555a15b0d8 <guard variable for jitify2::nvrtc()::lib>
rsp            0x7fff48ff09b8      0x7fff48ff09b8
r8             0x7ffdb80506a0      140727690790560
r9             0x7ffdb809fbc0      140727691115456
r10            0x7ffdb80008d0      140727690463440
r11            0x7ffdb8000080      140727690461312
r12            0x7fff48ff0a20      140734418061856
r13            0x7ffdb809fb30      140727691115312
r14            0x7fff48ff0a70      140734418061936
r15            0x2                 2
rip            0x5555563129f0      0x5555563129f0 <nvrtcCreateProgram>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) x/32s $rsi
0x7ffdb80eefa0: "\n#ifndef ALEO_PROOF_PROGRAM_GEN_LEAVES_COMMON_CUH\n#define ALEO_PROOF_PROGRAM_GEN_LEAVES_COMMON_CUH\n\n\n#ifndef ALEO_PROOF_SRC_PROGRAM_TYPE_CUH\n#define ALEO_PROOF_SRC_PROGRAM_TYPE_CUH\n\n#include <cassert>"...
0x7ffdb80ef068: "\n#include <cstdint>\n#include <type_traits>\n\ntypedef __int128 int128_t;\ntypedef unsigned __int128 uint128_t;\n\n__host__ __device__ constexpr uint128_t create_uint128(uint32_t le0,\n", ' ' <repeats 22 times>...
0x7ffdb80ef130: ' ' <repeats 33 times>, "uint32_t le1,\n", ' ' <repeats 55 times>, "uint32_t le2,\n", ' ' <repeats 55 times>, "uint32_t le3) {\n    return (u"...
0x7ffdb80ef1f8: "int128_t)le0 | (uint128_t)le1 << 32 | (uint128_t)le2 << 64 |\n", ' ' <repeats 11 times>, "(uint128_t)le3 << 96;\n}\n\n__host__ __device__ constexpr int128_t create_int128(uint32_t le0, uint32_t le1,\n", ' ' <repeats 22 times>...
0x7ffdb80ef2c0: ' ' <repeats 31 times>, "uint32_t le2,\n", ' ' <repeats 53 times>, "uint32_t le3) {\n    return (int128_t)create_uint128(le0, le1, le2, le3);\n}\n\n__host__ __device__ conste"...
0x7ffdb80ef388: "xpr uint128_t create_uint128(uint64_t le0,\n", ' ' <repeats 55 times>, "uint64_t le1) {\n    return (uint128_t)le0 | (uint128_t)le1 << 64;\n}\n\n__host__ __device__ constexpr int"...
0x7ffdb80ef450: "128_t create_int128(uint64_t le0,\n", ' ' <repeats 53 times>, "uint64_t le1) {\n    return (int128_t)create_uint128(le0, le1);\n}\n\ntemplate <>\nstruct std::make_unsigned<uint128_t"...
0x7ffdb80ef518: "> {\n    typedef uint128_t type;\n};\n\ntemplate <>\nstruct std::make_unsigned<int128_t> {\n    typedef uint128_t type;\n};\n\ntemplate <>\nstruct std::make_signed<uint128_t> {\n    typedef int128_t type;\n};\n\nte"...
0x7ffdb80ef5e0: "mplate <>\nstruct std::make_signed<int128_t> {\n    typedef int128_t type;\n};\n\ntemplate <>\nstruct std::is_signed<uint128_t> : std::false_type {};\n\ntemplate <>\nstruct std::is_signed<int128_t> : std::true"...
0x7ffdb80ef6a8: "_type {};\n\ntemplate <>\nstruct std::is_unsigned<uint128_t> : std::true_type {};\n\ntemplate <>\nstruct std::is_unsigned<int128_t> : std::false_type {};\n\ntemplate <>\nstruct std::is_integral<uint128_t> : st"...
0x7ffdb80ef770: "d::true_type {};\n\ntemplate <>\nstruct std::is_integral<int128_t> : std::true_type {};\n\n#endif\n\n\n#ifndef ALEO_PROOF_PROGRAM_SOLUTION_ID_HASH_INPUT_PREFIX_H\n#define ALEO_PROOF_PROGRAM_SOLUTION_ID_HASH_IN"...
0x7ffdb80ef838: "PUT_PREFIX_H\n\n#include <cstdint>\nstruct SolutionIdHashInputPrefix {\n    uint32_t data[10];\n};\n\n#endif\n\n\n#ifndef PROGRAM_HASH_SHA2_256_CUH\n#define PROGRAM_HASH_SHA2_256_CUH\n\n\n\nclass Sha2With256 {\npubli"...
0x7ffdb80ef900: "c:\n    constexpr static uint32_t OUTPUT_NUM_BYTES = 32u;\n\n    template <uint32_t U32_LEN,\n", ' ' <repeats 14 times>, "typename = typename std::enable_if_t<U32_LEN >= 1 && U32_LEN<14>>\n    __host__ __device__ static"...
0x7ffdb80ef9c8: " void DoubleHashForOneBlock(\n", ' ' <repeats 26 times>, "const uint32_t data[U32_LEN],\n", ' ' <repeats 26 times>, "uint32_t hash[OUTPUT_NUM_BYTES / sizeof(uint32_t)]) {\n\n        uint32_t tmp[OUTPUT_NUM_BY"...
0x7ffdb80efa90: "TES / sizeof(uint32_t)];\n        HashForOneBlock<U32_LEN>(data, tmp);\n        HashForOneBlock<OUTPUT_NUM_BYTES / sizeof(uint32_t)>(tmp, hash);\n    }\n\n    template <uint32_t U32_LEN,\n", ' ' <repeats 14 times>, "type"...
0x7ffdb80efb58: "name = typename std::enable_if_t<U32_LEN >= 1 && U32_LEN<14>>\n    __host__ __device__ static void HashForOneBlock(\n", ' ' <repeats 26 times>, "const uint32_t data[U32_LEN],\n", ' ' <repeats 26 times>, "uin"...

既然内存中有cuda c源码,那就把该进程所有的内存全dump出来看看

B.获取aleo_prover的内存空间范围

cat /proc/`pidof aleo_prover`/maps | grep "00000000 00:00 0" | awk '{print $1}' > range.txt

C.生成gdb dump binary memory 命令

tee p.py<<-'EOF'
with open("range.txt","r") as f:
    idx=0
    for line in f.readlines():
        vals=line.strip().split("-")
        cmd=f"dump binary memory ./dump_{idx}.bin 0x{vals[0]} 0x{vals[1]}"
        print(cmd)
        idx+=1
EOF
python p.py
  • 输出
dump binary memory ./dump_0.bin 0x200000000 0x300200000
dump binary memory ./dump_1.bin 0x55555a15b000 0x55555a983000
dump binary memory ./dump_2.bin 0x7ffdb8000000 0x7ffdb8a6b000
省略

D.拷贝上面的命令到gdb中执行,将内存dump到文件

D.按顺序合并所有的dump_*.bin,导出可见字符,查看是否包含上面的Kernel

strings `ls dump*.bin  | sort -t _ -k 2 -n` > all.txt
cat all.txt | egrep -A 16 -w "generate_leaves_dbl_addr|find_proof_dbl_addr"
  • 输出
extern "C" __global__ void generate_leaves_dbl_addr(uint32_t* bool_array,
                                                    uint32_t* field_array,
                                                    uint32_t* inverse_buf,
                                                    uint32_t* inverse_ids,
                                                    SolutionIdHashInputPrefix prefix1,
                                                    SolutionIdHashInputPrefix prefix2,
                                                    uint64_t counter_start,
                                                    uint32_t num1,
                                                    uint32_t num2) {
    uint32_t tid = blockDim.x * blockIdx.x + threadIdx.x;
    uint32_t tnum = blockDim.x * gridDim.x;
    for (uint32_t i = tid; i < num1 + num2; i += tnum) {
        uint64_t cur_counter = counter_start + i;
        const SolutionIdHashInputPrefix *solution_id_hash_input_prefix_ptr;
        if (i >= num1) {
            solution_id_hash_input_prefix_ptr = &prefix2;
        } else {
--
        // .globl       find_proof_dbl_addr
.visible .entry find_proof_dbl_addr(
        .param .u64 find_proof_dbl_addr_param_0,
        .param .u64 find_proof_dbl_addr_param_1,
        .param .u64 find_proof_dbl_addr_param_2,
        .param .u64 find_proof_dbl_addr_param_3,
        .param .u64 find_proof_dbl_addr_param_4,
        .param .u32 find_proof_dbl_addr_param_5,
        .param .u32 find_proof_dbl_addr_param_6,
        .param .u64 find_proof_dbl_addr_param_7,
        .param .u32 find_proof_dbl_addr_param_8,
        .param .u32 find_proof_dbl_addr_param_9,
        .param .u64 find_proof_dbl_addr_param_10
        .local .align 4 .b8     __local_depot1[2252];
        .reg .b64       %SP;
        .reg .b64       %SPL;
        .reg .pred      %p<200>;
        .reg .b32       %r<6003>;

generate_leaves_dbl_addr是cuda源码
find_proof_dbl_addr是ptx

  • 17
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

Hi20240217

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值