Programming Optimization
u013606170
Use the CPU Vectorization Instructions to Improve the Performance :: 2D Convolution Example
※ The full code of this post can be downloaded here. The x86 has the Streaming SIMD Extensions (SSE) instruction sets. By processing 4 data items in parallel, theoretically SSE can improve performance... Original · 2019-01-25 01:39:25 · 388 views · 0 comments
Use SSE/AVX to Improve 2D Convolution performance for the Specified Kernel Size
※ The full code of this post can be downloaded here. This post is the continuation of my previous post, which demonstrates how to use the SSE/AVX instructions to improve the performa... Original · 2019-02-02 02:54:30 · 327 views · 0 comments
Tiling Separable Convolution in CUDA
※ The code of this post is available here. When we are doing image processing on a GPU, one of the key ideas is 'tiling'. Tiling cuts the image into small rectangles and tackles those rectangles on... Original · 2019-02-23 08:51:21 · 630 views · 0 comments
Use NVIDIA® CUDA® to Improve the Performance of Computing Separable Convolution
※ The code corresponding to this post is available here. This post focuses on generic separable convolution. If you are interested in how to optimize for a specified size, you could read my next post.... Original · 2019-02-19 17:19:25 · 255 views · 0 comments
Exploit SSE/AVX Instructions to Speed up the Computation of Separable Convolution
※ The full code of this post is available here. In my previous posts, this one and this one, I discussed various SSE/AVX optimizations for efficient general 2D convolution. In this post we ... Original · 2019-03-10 20:51:45 · 353 views · 0 comments
Optimize Separable Convolution for the Specified Size on NVIDIA® CUDA®
※ The code corresponding to this post is available here. Continuing from this post, here I will show how to optimize the CUDA code of separable convolution for a specified size. Because the size is specified, t... Original · 2019-03-16 13:38:30 · 177 views · 0 comments