@CUDA/MPI
GeoAnt
OMP Notes (1)
Difference between static and dynamic schedule in OpenMP in C. Translated · 2014-06-01 10:58:00 · 1206 views · 0 comments
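To make the static/dynamic distinction concrete, here is a minimal sketch in C. The helper `busy_work` and the two `total_*` functions are hypothetical names invented for this illustration, and the chunk size of 4 is an arbitrary choice. The workload is deliberately uneven (iteration i costs about i steps), which is the situation where a dynamic schedule tends to balance better than a static one. Compiled without OpenMP support, the pragmas are ignored and both functions simply run serially.

```c
#include <assert.h>

/* Hypothetical uneven workload: iteration i performs i + 1 inner steps. */
static long busy_work(int i)
{
    long acc = 0;
    for (int k = 0; k <= i; ++k)
        acc += k % 7;
    return acc;
}

/* schedule(static): iterations are divided among threads in fixed
   blocks, decided before the loop starts executing. */
long total_static(int n)
{
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += busy_work(i);
    return total;
}

/* schedule(dynamic, 4): each thread takes the next chunk of 4
   iterations at run time whenever it finishes its previous chunk. */
long total_dynamic(int n)
{
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += busy_work(i);
    return total;
}
```

Both functions return the same value for the same `n`; only the assignment of iterations to threads differs, so any difference shows up in run time rather than in the result.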
When to use cudaDeviceSynchronize
When to call cudaDeviceSynchronize. Why do we need cudaDeviceSynchronize(); in kernels with device-printf? Although CUDA kernel launches are asynchronous, all GPU-related tasks placed in on… Original · 2014-02-25 11:29:42 · 9936 views · 0 comments
cudaFuncGetAttributes and templates
cudaFuncGetAttributes return unexpected result. How to pass the address of a template kernel function to a CUDA function? Directly casting from pointer to a template function? cudaFunc… Original · 2014-02-25 10:36:21 · 1061 views · 0 comments
GotoBLAS and LAPACK
The GotoBLAS2 library. Trouble compiling GotoBLAS2 on newer CPU. Tips for building GotoBLAS and LAPACK. Writing matrix programs with BLAS/LAPACK. LAPACK (3): solving systems of linear equations. Numerical Linear Algebra Packages on Linu… Original · 2013-08-15 12:17:09 · 755 views · 0 comments
A collection of classic CUDA computing questions
What is the canonical way to check for errors using the CUDA runtime API? Original · 2013-08-15 13:07:53 · 500 views · 0 comments
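OpenMP tutorial
http://124.202.164.12/download/38387800/53858482/3/pdf/220/50/1362281275356_306/omp-hands-on-SC08.pdf Note: OpenMP reduction performed poorly (tested on 2 cores). On a dual-core machine exposing 4 virtual cores, running 4 threads is inefficient: taking the integral computation as an example, it is slower than the serial version. With only 2 threads, however, it runs at almost twice the serial speed, so it is still… Original · 2013-08-09 20:39:47 · 1789 views · 0 comments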
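The integral computation mentioned in the note can be sketched as follows. This assumes the integral is the usual pi example from OpenMP hands-on material, pi = ∫₀¹ 4/(1+x²) dx, evaluated with the midpoint rule; the function name `integrate_pi` and its parameters are hypothetical. The `threads` argument is passed to the `num_threads` clause so that the 2-thread and 4-thread cases from the note can be timed against each other.

```c
#include <assert.h>

/* Midpoint-rule estimate of pi = integral of 4/(1+x^2) over [0, 1].
   `threads` is only a request to the OpenMP runtime; when the code is
   compiled without OpenMP the pragma is ignored and the loop is serial. */
double integrate_pi(long num_steps, int threads)
{
    assert(num_steps > 0 && threads > 0);
    const double step = 1.0 / (double)num_steps;
    double sum = 0.0;

    /* Each thread accumulates a private partial sum; the partial sums
       are combined into `sum` when the loop ends. */
    #pragma omp parallel for num_threads(threads) reduction(+:sum)
    for (long i = 0; i < num_steps; ++i) {
        const double x = ((double)i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

Calling `integrate_pi(100000000, 2)` and `integrate_pi(100000000, 4)` under a wall-clock timer is one way to reproduce the comparison; the timings themselves depend on the machine and are not claimed here.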
Cache optimization (matrix product as an example)
Degrees of Latency. The latency of data access becomes greater with each cache level. Latency of memory access is best measured in CPU clock cycles. One cycle occupies from 4 to 6 nanoseconds, dep… Original · 2013-08-21 03:47:39 · 3831 views · 0 comments
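As an illustration of why cache latency matters for the matrix product, here is a minimal sketch comparing two loop orders on row-major n×n matrices. The excerpt does not say which optimization the article uses, so loop interchange is an assumption chosen for this example, and the function names are hypothetical. In the i-j-k order the inner loop walks down a column of `b` with stride n; in the i-k-j order the inner loop walks `b` and `c` contiguously, which typically causes fewer cache misses.

```c
#include <assert.h>

/* Naive order: the inner loop reads b[k*n + j] with stride n. */
void matmul_ijk(int n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Interchanged order: the inner loop reads b and writes c
   contiguously, row by row. */
void matmul_ikj(int n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (int i = 0; i < n * n; ++i)
        c[i] = 0.0;                       /* c is accumulated into below */
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (int j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```

Both functions compute the same product; timing them on matrices too large to fit in cache is a simple way to observe the effect of access order.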
Mixed MPI-OpenMP programming
http://www.uio.no/studier/emner/matnat/ifi/INF3380/v10/undervisningsmateriale/inf3380-week14.pdf Reposted · 2013-07-02 14:54:32 · 440 views · 0 comments
error: for statement expected before ‘{’ token
https://computing.llnl.gov/tutorials/openMP/samples/C/omp_bug1fix.c Original · 2014-06-01 10:58:45 · 4779 views · 0 comments
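A common way to hit this compiler error is to put an opening brace directly after the combined `#pragma omp parallel for` directive, which must be followed immediately by a `for` statement. The sketch below assumes that is the cause here; it is not a copy of the linked sample, and `vector_add` is a hypothetical name. The fix shown is to split the directive: a `parallel` region that may contain a block, with a separate `for` directive placed right before the loop.

```c
#include <assert.h>

/* Element-wise c = a + b.
   Would not compile with OpenMP enabled:
       #pragma omp parallel for
       {            <- "for statement expected before '{' token"
           for (...) ...
       }
   Working form: open a parallel region, then apply "omp for"
   directly to the loop inside it. */
void vector_add(int n, const float *a, const float *b, float *c)
{
    assert(n >= 0);
    #pragma omp parallel
    {
        /* Per-thread setup code could go here. */
        #pragma omp for schedule(static)
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }
}
```

If no per-thread setup is needed, keeping `#pragma omp parallel for` and removing the braces so the loop follows the directive directly works just as well.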