OpenCL
juliosun
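Installing the AMD APP SDK on an Intel processor
To install an OpenCL SDK on a machine that has no graphics card, one option is Intel's own SDK. However, it requires SSE4.1 at minimum, so older processors cannot use it. The AMD SDK is an alternative: it includes an OpenCL CPU runtime and OpenCL samples. Note that starting with AMD APP SDK 2.9 the CPU runtime is no longer bundled, so only earlier versions can be used. After installing 2.8.1, there will be two…
Original · 2014-02-25 23:05:24 · 2999 views · 0 comments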
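Single- and double-precision floating-point numbers
A float is 32 bits and a double is 64 bits. The 32 bits of a float consist of 1 sign bit, 8 exponent bits and 23 mantissa bits. The 64 bits of a double consist of 1 sign bit, 11 exponent bits and 52 mantissa bits. The value range is determined by the exponent field. A float is signed, and its 8 exponent bits are stored with a bias of 127, so normal values have exponents from -126 to 127 and the largest magnitude is just under 2^128. The range is therefore about -3.4E38 to 3.4E38. Likewise, the range of a double is about -1.8E308 to 1.8E308. You can check this yourself on a calculator.
Repost · 2017-01-26 19:24:47 · 1824 views · 0 comments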
sampler usage in OpenCL
sampler_t: A type used to control how elements of a 2D or 3D image object are read by read_image{f|i|ui}. const sampler_t <sampler-name> = <value> Description: The image read fu…
Repost · 2016-06-14 23:18:18 · 2950 views · 0 comments
NVIDIA Tesla C2075 vs Tesla K10 theoretical performance
Each graphics unit has several vital theoretical parameters that affect real-world game, 3D graphics and compute performance. These are texture fillrate, pixel fillrate, memory bandwidth, along with s…
Repost · 2015-10-12 23:35:40 · 1471 views · 0 comments
Faster Parallel Reductions on Kepler
Parallel reduction is a common building block for many parallel algorithms. A presentation from 2007 by Mark Harris provided a detailed strategy for implementing parallel reductions on GPUs, but thi…
Repost · 2015-10-12 23:34:53 · 753 views · 0 comments
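OpenCL study notes 6: implementation on CPU/GPU platforms
On a CPU, running a single work-group across multiple cores causes cache-sharing problems. To mitigate them, an OpenCL thread runs each work-item of a work-group in turn. Once all work-items in that work-group have finished, it runs the next work-group from the same work queue. As a result, the threads within one work-group have no parallelism among themselves. Where possible, multiple operating-system threads allow multiple wo…
Repost · 2015-03-18 18:10:14 · 1539 views · 0 comments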
OpenCL notes 1: Memory Model
Memory model. Registers: Like a CPU register file, registers are private to each thread and can be read and written. The number of registers is limited depending on the occupancy, the kernel complexity and th…
Repost · 2015-02-16 16:53:42 · 812 views · 0 comments
OpenCL notes 3: OpenCL and CUDA
Texture arrays are called image objects in OpenCL. This feature is not necessarily supported by all devices that support OpenCL.
Repost · 2015-02-16 17:08:50 · 500 views · 0 comments
OpenCL notes 2: Optimization
1. Someone once said that if you don't care much about the performance, parallel programming is easy.
2. Many of the performance improvements are published, giving the impression that using GPU progr…
Repost · 2015-02-16 17:04:13 · 529 views · 0 comments
OpenCL notes 5: reconstruction application
1. If the back-projection and forward-projection operators are executed in separate kernels, the lack of texture cache coherency is not a problem.
2. The algorithm with a slower…
Repost · 2015-02-25 10:36:34 · 554 views · 0 comments
OpenCL notes 4: projector application
1. Ray-driven back-projection is less suitable for parallelization due to possible race conditions. In OpenCL these races can be prevented with atomic functions, at the cost of performance.
Repost · 2015-02-23 11:08:55 · 632 views · 0 comments
OpenCL image channel_order
The following table describes the mapping of the number of channels of an image element to the appropriate components in the float4, int4 or unsigned int4 vector data type for the color values returne…
Repost · 2014-03-07 21:55:22 · 1342 views · 0 comments
AMD Radeon HD5870 memory capacity and performance
Memory capacity and performance. AMD is very clear about the memory capacity and performance details in their OpenCL programming guide. The figure below showcases these hardware characteristics of t…
Repost · 2014-03-13 00:54:54 · 802 views · 0 comments
OpenCL: how to choose the work_group size
In general you can choose global_work_size as big as you want, while local_work_size is constrained by the underlying device/hardware, so all query results will tell you the possible dimensions for lo…
Repost · 2014-03-12 22:21:06 · 2647 views · 0 comments
Easiest way to test for existence of cuda-capable GPU from cmake?
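#include <cuda_runtime.h>

int main(void)
{
    int deviceCount = 0;
    /* Ask the CUDA runtime how many CUDA-capable devices are present. */
    cudaError_t e = cudaGetDeviceCount(&deviceCount);
    /* Exit status: the device count on success, -1 on failure. */
    return e == cudaSuccess ? deviceCount : -1;
}
Repost · 2019-01-31 03:30:44 · 183 views · 0 comments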