WuYuFffan - CSDN Blog

[Original] Evolution-based RL algorithm

Evolution-based RL algorithm
1. What is GA (genetic algorithm).
2. What is ES (evolution strategy), and the difference between ES and GA.
3. The difference between ES and PG (policy gradient), and the benefits of ES over PG.
4. A simple introduction to the gym test suite.
5. Using ES based RL algorithm to solve some simple…

2020-09-20 17:18:35 235

[Original] 8. reduction summation

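8. reduction summation
GPU summation: add neighboring pairs of elements (reduction).
Pseudocode:
    for (int offset = 1; offset < blockDim.x; offset *= 2) {
        if (tid % (2 * offset) == 0) {
            input[tid] += input[tid + offset];
        }
        __syncthreads();  // every thread finishes this pass before the next one starts
    }
In the first iteration, offset is 1 and the stride is 2: input[0] += input[0 + 1]; input[2] += in…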
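The loop in the excerpt can be tried without a GPU. What follows is a minimal host-side C++ sketch, not the post's kernel: the name `reduce_block`, the use of `std::vector<int>`, and the inner loop over `tid` that stands in for the threads of one block are assumptions made for illustration. It also assumes the block size is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of the neighbored-pair reduction for ONE block.
// input.size() stands in for blockDim.x, and the inner loop over `tid`
// stands in for the threads of the block (assumptions of this sketch).
int reduce_block(std::vector<int> input) {
    const std::size_t block_dim = input.size();
    // The pairing scheme assumes a non-empty, power-of-two block.
    assert(block_dim > 0 && (block_dim & (block_dim - 1)) == 0);

    for (std::size_t offset = 1; offset < block_dim; offset *= 2) {
        // On a GPU all tids run this body concurrently, then hit a barrier.
        for (std::size_t tid = 0; tid < block_dim; ++tid) {
            if (tid % (2 * offset) == 0) {
                input[tid] += input[tid + offset];
            }
        }
    }
    return input[0];  // the block's partial sum ends up in element 0
}
```

For example, `reduce_block({1, 2, 3, 4, 5, 6, 7, 8})` returns 36 after three passes (offset 1, 2, 4). Running the threads one after another gives the same result as running them in parallel here, because within a pass no active thread reads an element that another active thread writes.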

2020-09-19 21:39:34 163

[Original] 7. warp divergence

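7. warp divergence
CUDA executes in SIMD fashion (more precisely, SIMT): the threads of a warp share one instruction stream, so while some threads execute one side of a branch, the threads that did not take that side are forced to wait.
    int tid = threadIdx.x;
    if (tid % 2 == 0) {
        // do something
    } else {
        // do something else
    }
While the threads with even tid execute the if branch, the threads with odd tid stall, and vice versa. So when the threads of a single warp take many different branches, the result is warp divergence, which severely reduces the program's run…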
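To make the effect concrete, the sketch below counts on the host how many distinct branch outcomes occur inside one warp for a given condition. It is an illustration only: the names `paths_in_warp`, `even_thread`, and `even_warp` are hypothetical, a warp size of 32 is assumed, and the warp-granular condition `(tid / 32) % 2 == 0` is a commonly used alternative that does not come from the truncated excerpt.

```cpp
#include <cassert>
#include <set>

constexpr int kWarpSize = 32;  // assumed warp size

// Counts the distinct branch outcomes taken by the threads of one warp:
// 1 means the warp is uniform, 2 means it diverges.
template <typename Pred>
int paths_in_warp(int warp_id, Pred takes_if_branch) {
    assert(warp_id >= 0);
    std::set<bool> outcomes;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        const int tid = warp_id * kWarpSize + lane;
        outcomes.insert(takes_if_branch(tid));
    }
    return static_cast<int>(outcomes.size());
}

// Condition from the excerpt: neighboring threads take different branches.
bool even_thread(int tid) { return tid % 2 == 0; }

// Warp-granular condition: all 32 threads of a warp agree.
bool even_warp(int tid) { return (tid / kWarpSize) % 2 == 0; }
```

With `even_thread`, `paths_in_warp` reports 2 for every warp, so each warp has to run both sides of the branch. With `even_warp` it reports 1, because the condition only changes from one warp to the next.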

2020-09-18 21:22:01 392

[Original] 6. cuda warp

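6. cuda warp
In CUDA, a thread block runs on a single streaming multiprocessor (SM). When resources are sufficient, multiple blocks can run on the same SM.
SIMT (single instruction, multiple threads): one instruction is executed by many threads (the essence of CUDA).
A thread block cannot be split across multiple SMs. When an SM cannot fit even one block (for example, when the block asks for more shared memory than is available), the kernel launch fails and the function returns a value other than cudaSuccess.
How the program structure maps onto the hardware structure:
Why do we need warps? Thread parallelism in theory versus the actual parallelism…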
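As a small illustration of how a block maps onto warps, the sketch below computes how many warps a block occupies and how many lanes of its last warp are left idle. The helper names `warps_per_block` and `idle_lanes` are hypothetical, and the warp size of 32 is an assumption that holds on current NVIDIA GPUs rather than something stated in the excerpt.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;  // assumed warp size

// Number of warps needed to hold one block (rounded up).
int warps_per_block(int threads_per_block) {
    assert(threads_per_block > 0);
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Lanes that are allocated in the last warp but have no thread to run.
int idle_lanes(int threads_per_block) {
    return warps_per_block(threads_per_block) * kWarpSize - threads_per_block;
}
```

A block of 128 threads fills 4 warps exactly, while a block of 40 threads still occupies 2 warps and leaves 24 lanes idle, which is one reason block sizes are usually chosen as multiples of the warp size.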

2020-09-18 19:33:53 462

[Original] 5. Device property query

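5. Device property query
In CUDA programming, querying device properties is essential if you want to write parallel programs that suit different compute capabilities. The table below lists properties from cuda_runtime.h that can be queried at run time:
Property              Explanation
name                  description
major / minor         compute capability, e.g. 5.2 -> 5 / 2
totalGlobalMem        total size of global memory
maxThreadsPerBlock    maximum number of threads per block
maxThreadsDim[3]      block…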

2020-09-18 15:21:13 188

[Original] 4. Timing a CUDA program

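4. Timing a CUDA program
Timing is done by taking the difference of two clock readings:
    clock_t start = clock();
    /* workload */
    clock_t end = clock();
    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
Note: choose the divisor to suit the program's actual running time, so that the result comes out in a sensible unit.
Timing the CPU:
    // summation on the CPU
    clock_t cpu_start, cpu_end;
    cpu_start = clock();
    sum_array…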
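The CPU-side pattern can be sketched as a complete function. This is an illustration under assumptions, not the post's code: `add_arrays_cpu` and `time_add_arrays_cpu` are hypothetical names, and the workload is a plain element-wise sum chosen only so there is something to time.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical CPU workload: element-wise sum of two equally sized arrays.
void add_arrays_cpu(const std::vector<int>& a, const std::vector<int>& b,
                    std::vector<int>& out) {
    assert(a.size() == b.size() && out.size() == a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
}

// Times the workload with clock() and returns the elapsed CPU time in seconds.
double time_add_arrays_cpu(const std::vector<int>& a, const std::vector<int>& b,
                           std::vector<int>& out) {
    const std::clock_t start = std::clock();
    add_arrays_cpu(a, b, out);
    const std::clock_t end = std::clock();
    // Cast before dividing so the result is not truncated to an integer.
    return static_cast<double>(end - start) / CLOCKS_PER_SEC;
}
```

Multiplying the returned value by 1000 gives milliseconds, which is often a more readable unit for short runs. When the same pattern is wrapped around a kernel launch, keep in mind that launches are asynchronous, so the host needs to wait for the device (for example with `cudaDeviceSynchronize()`) before taking the second reading.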

2020-09-18 14:43:52 153

[Original] 3. CUDA error handling

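3. CUDA error handling
Error categories:
Compile time errors: the build fails; in Visual Studio the compiler flags this kind of error as soon as the code is mistyped.
Run time errors: in ordinary C++ programming you can use exception handling to throw an exception and catch it with try/catch.
Error handling in CUDA:
cudaError cuda_function(…)
Return value: cudaSuccess if the kernel was launched s…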
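The excerpt contrasts C++ exceptions with functions that report failure through a return value. One way to bridge the two styles is sketched below in plain C++. Everything here is a hypothetical stand-in and not part of the CUDA runtime: the `Status` enum plays the role of a CUDA-style return code, and `copy_elements`, `check`, and `try_copy` are illustrative names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical status code standing in for a CUDA-style return value.
enum class Status { Success, InvalidValue };

// Hypothetical API that reports failure through its return value.
Status copy_elements(int count) {
    return count >= 0 ? Status::Success : Status::InvalidValue;
}

// Converts any non-success status into a C++ exception.
void check(Status status, const std::string& what) {
    if (status != Status::Success) {
        throw std::runtime_error(what + " failed");
    }
}

// Returns true if the call succeeded, false if an exception was caught.
bool try_copy(int count) {
    try {
        check(copy_elements(count), "copy_elements");
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

`try_copy(4)` returns true and `try_copy(-1)` returns false. The design choice is to wrap every status-returning call in one checking helper so that a failure cannot be silently ignored.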

2020-09-18 14:43:40 781

[Original] 2. CUDA example: adding two arrays

2. CUDA example: adding two arrays
[Mermaid diagram]

2020-09-18 14:43:27 283

WuYuFffan's blog

[Original] 1. CUDA memory transfer

[Original] FPGA Lab 3: encoders and decoders

[Original] FPGA Lab 2: counters

[Original] FPGA Lab 1: verifying various logic gates
