CUDA
scgillian
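First CUDA program
Writing something quick to mark the occasion. Create a new project, right-click it, and under Custom Build Rules select Cuda Runtime API Build Rule (v4.1). Then right-click the project again, open Properties, and under Linker -> Input add cudart.lib and cutil32D.lib to Additional Dependencies. Add a new item such as cudatest.cpp and rename it to cudatest.cu…
Original · 2012-02-28 13:43:43 · 4292 views · 0 comments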
Second CUDA program
CUDA C extends C by allowing the programmer to define C functions, called kernels, that, when called, are executed N times in parallel by N different CUDA threads, as opposed to only once like regular…
Original · 2012-02-28 16:14:58 · 1282 views · 0 comments
A simple parallel addition program in CUDA
#include … #define N 7 __global__ void add(int *a, int *b, int *c) { int tid = blockIdx.x; if (tid < N) c[tid] = a[tid] + b[tid]; } int main() { int arr1[N], arr2[N]; int sum[N]; for (int i = …
Original · 2012-02-29 10:28:19 · 3325 views · 0 comments
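The excerpt above stops before the host code, so here is a minimal sketch of how such a kernel is typically driven. The kernel follows the excerpt (one block per element, indexed by `blockIdx.x`); the wrapper `add_on_device` and its boolean return value are hypothetical names introduced for illustration and are not taken from the post.

```cuda
#include <cuda_runtime.h>

#define N 7

// One block per element, as in the excerpt: blockIdx.x selects the index.
__global__ void add(const int *a, const int *b, int *c) {
    int tid = blockIdx.x;
    if (tid < N) c[tid] = a[tid] + b[tid];
}

// Hypothetical host wrapper (not from the post): copies both inputs to the
// device, launches N blocks of one thread each, and copies the sums back.
// Returns true only if every CUDA call succeeded.
bool add_on_device(const int *a, const int *b, int *sum) {
    int *dev_a = nullptr, *dev_b = nullptr, *dev_c = nullptr;
    const size_t bytes = N * sizeof(int);

    bool ok = cudaMalloc((void **)&dev_a, bytes) == cudaSuccess &&
              cudaMalloc((void **)&dev_b, bytes) == cudaSuccess &&
              cudaMalloc((void **)&dev_c, bytes) == cudaSuccess &&
              cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        add<<<N, 1>>>(dev_a, dev_b, dev_c);
        // The blocking cudaMemcpy below also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(sum, dev_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree accepts a null pointer, so this is safe after a failed malloc.
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return ok;
}
```

Calling `add_on_device(arr1, arr2, sum)` from `main` with three `int[N]` arrays fills `sum` element by element. Launching one thread per block keeps the code close to the excerpt, although in practice a launch with many threads per block uses the hardware far better.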
CUDA program for computing a vector dot product (from CUDA by Example)
__syncthreads() acts as a barrier at which all threads in the block must wait before any is allowed to proceed. // Compute the dot product of two vectors #include … #define imin(a,b) (a<b?a:b) // N is the size of the input vectors const int N=33*1024; c…
Reposted · 2012-03-07 12:55:14 · 3585 views · 0 comments
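To make the role of `__syncthreads()` concrete, the following is a sketch of a block-level dot product in the style the excerpt points to. Only `N = 33*1024` and the barrier itself come from the excerpt; `threadsPerBlock = 256`, the cap of 32 blocks, and the wrapper `dot_on_device` are assumptions made for this illustration, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

const int threadsPerBlock = 256;  // assumed block size; must be a power of two

// Each block accumulates products into shared memory, then reduces them
// to a single partial sum per block.
__global__ void dot(const float *a, const float *b, float *partial, int n) {
    __shared__ float cache[threadsPerBlock];
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    int cacheIndex = threadIdx.x;

    // Grid-stride loop: each thread sums its share of the products.
    float temp = 0.0f;
    for (; tid < n; tid += blockDim.x * gridDim.x) temp += a[tid] * b[tid];
    cache[cacheIndex] = temp;

    // Barrier: every thread must have written cache[] before anyone reads it.
    __syncthreads();

    // Tree reduction; the barrier sits outside the if, so all threads reach it.
    for (int i = blockDim.x / 2; i > 0; i /= 2) {
        if (cacheIndex < i) cache[cacheIndex] += cache[cacheIndex + i];
        __syncthreads();
    }
    if (cacheIndex == 0) partial[blockIdx.x] = cache[0];
}

// Hypothetical host wrapper: launches the kernel and adds up the
// per-block partial sums on the CPU.
float dot_on_device(const float *a, const float *b, int n) {
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    if (blocks > 32) blocks = 32;  // assumed cap on the grid size
    if (blocks < 1) blocks = 1;

    float *dev_a = nullptr, *dev_b = nullptr, *dev_partial = nullptr;
    cudaMalloc((void **)&dev_a, n * sizeof(float));
    cudaMalloc((void **)&dev_b, n * sizeof(float));
    cudaMalloc((void **)&dev_partial, blocks * sizeof(float));
    cudaMemcpy(dev_a, a, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, n * sizeof(float), cudaMemcpyHostToDevice);

    dot<<<blocks, threadsPerBlock>>>(dev_a, dev_b, dev_partial, n);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dev_partial, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_partial);

    float sum = 0.0f;
    for (int i = 0; i < blocks; ++i) sum += partial[i];
    return sum;
}
```

Placing `__syncthreads()` inside the `if (cacheIndex < i)` branch would be a bug: threads that skip the branch would never reach the barrier, and the behavior of the block would be undefined.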
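The shared memory bitmap from CUDA by Example
Put glut32.lib in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\lib\Win32 (adjust to your install directory) and glut32.dll in C:\Windows\System32. Both can be obtained by downloading the archive "cuda by example [complete]" (cuda_by_example.zip) from the web. The cpu_bitmap header file is also in…
Original · 2012-03-07 18:09:33 · 2439 views · 0 comments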
CUDA shared memory
Source text from the CUDA C Programming Guide. Shared memory is on-chip, so it is much faster than local memory and global memory. To achieve high bandwidth, shared memory is divided into equally-sized memory modules, called banks, which can be a…
Translation · 2012-03-23 11:24:55 · 3788 views · 0 comments
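One common way the bank layout shows up in practice is the padded shared-memory tile. The sketch below transposes a square matrix through a tile declared as `[TILE][TILE + 1]`, so that reading a tile column touches different banks instead of hitting one bank repeatedly. It assumes 32 banks of 4-byte words and a device that allows 1024 threads per block (compute capability 2.0 or later); `TILE`, `transpose_tile`, and `transpose_on_device` are illustrative names, not taken from the post, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

const int TILE = 32;  // assumed tile width, matching an assumed 32 banks

// Transposes an n x n row-major matrix one tile at a time.
__global__ void transpose_tile(const float *in, float *out, int n) {
    // The extra column shifts each row by one bank, so the column-wise
    // read below does not make all threads of a warp hit the same bank.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) tile[threadIdx.y][threadIdx.x] = in[y * n + x];

    __syncthreads();  // the whole tile must be loaded before it is read back

    // Swap the block coordinates and read the tile transposed.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n) out[y * n + x] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical host wrapper: out receives the transpose of in.
void transpose_on_device(const float *in, float *out, int n) {
    const size_t bytes = (size_t)n * n * sizeof(float);
    float *dev_in = nullptr, *dev_out = nullptr;
    cudaMalloc((void **)&dev_in, bytes);
    cudaMalloc((void **)&dev_out, bytes);
    cudaMemcpy(dev_in, in, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    transpose_tile<<<grid, block>>>(dev_in, dev_out, n);

    cudaMemcpy(out, dev_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev_in);
    cudaFree(dev_out);
}
```

With an unpadded `tile[TILE][TILE]`, the read `tile[threadIdx.x][threadIdx.y]` would have a stride of 32 words across a warp, which under the 32-bank assumption maps every access to the same bank and serializes them.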
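CUDA study notes
In CUDA, the CPU and system memory are referred to as the host, and the GPU and its memory as the device. The __global__ qualifier tells the compiler that a function should be compiled to run on the device rather than on the host. CUDA C needed a linguistic method for marking a function as device code.
Original · 2012-03-23 15:16:24 · 835 views · 0 comments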
CUDA by Example, Chapter 6
Topics: constant memory and the use of CUDA events. #include "C://Users//XX//Desktop//CUDA//common//cpu_bitmap.h" #include … #define DIM 1024 #define INF 2e10f #define rnd(x) (x*rand()/RAND_MAX) // define the number of spheres as 20 #…
Reposted · 2012-03-09 12:33:33 · 1348 views · 0 comments
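The two topics named above can be shown without the ray tracer itself. The sketch below is a deliberately simpler stand-in: it stores 20 coefficients in `__constant__` memory (mirroring the count of 20 in the excerpt) and times one kernel launch with CUDA events. The names `coeffs`, `weighted_sum`, and `run_timed` are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

const int NUM_COEFFS = 20;  // mirrors the 20 spheres mentioned in the excerpt

// Read-only data that every thread reads in the same order is a good fit
// for constant memory, which is cached and broadcast within a warp.
__constant__ float coeffs[NUM_COEFFS];

// Stand-in kernel: out[i] = sum over k of coeffs[k] * in[i].
__global__ void weighted_sum(const float *in, float *out, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) {
        float acc = 0.0f;
        for (int k = 0; k < NUM_COEFFS; ++k) acc += coeffs[k] * in[i];
        out[i] = acc;
    }
}

// Hypothetical host wrapper: fills constant memory, runs the kernel, and
// returns the elapsed kernel time in milliseconds as measured by events.
float run_timed(const float *host_coeffs, const float *in, float *out, int n) {
    // Constant memory is written with cudaMemcpyToSymbol, not cudaMemcpy.
    cudaMemcpyToSymbol(coeffs, host_coeffs, NUM_COEFFS * sizeof(float));

    float *dev_in = nullptr, *dev_out = nullptr;
    cudaMalloc((void **)&dev_in, n * sizeof(float));
    cudaMalloc((void **)&dev_out, n * sizeof(float));
    cudaMemcpy(dev_in, in, n * sizeof(float), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    weighted_sum<<<(n + 255) / 256, 256>>>(dev_in, dev_out, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until the stop event has been reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaMemcpy(out, dev_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dev_in);
    cudaFree(dev_out);
    return ms;
}
```

Events are recorded on the GPU's own timeline, so the measured interval covers the kernel rather than host-side overhead, which is why they are usually preferred over CPU timers for this purpose.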