Parallelism and GPU Architecture
A CPU is optimized to process a single sequence of instructions. It is extremely fast, but it runs into several walls: the memory wall, the power wall, and the limits of instruction-level parallelism.
There are two ways to obtain a speedup.
Given a process that requires time T, we can use P processors to reduce the processing time to, ideally, T/P.
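The ideal T/P speedup can be stated as a one-line sketch (the function name is just for illustration):

```python
def ideal_time(t_serial, p):
    """Ideal parallel runtime: total work t_serial split evenly across p processors."""
    return t_serial / p

# A job that takes 8 seconds on one processor ideally finishes in 2 seconds on four.
print(ideal_time(8.0, 4))
```

In practice this bound is rarely reached: serial portions, communication, and load imbalance all keep the real runtime above T/P.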
- Task parallelism. Break the problem up into T >= P tasks and hand them off to the processors.
- Data parallelism. Break the input/output data into D >= P subsets and launch one thread for each piece of data.
Task Parallelism
Assign the first P tasks to the P processors --> when any processor finishes its task T_n, it moves on to task T_{P+1} --> repeat until all tasks are completed.
This has generally been the primary model for cluster computing and supercomputing.
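The scheme above maps directly onto a worker pool: the first P tasks start immediately, and whenever a worker finishes, it pulls the next pending task. A minimal sketch using Python's standard library (the task itself is a hypothetical stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(n):
    # Stand-in for real work: each "task" just squares its index.
    return n * n

tasks = range(10)  # T = 10 tasks
with ThreadPoolExecutor(max_workers=4) as pool:  # P = 4 workers
    # The pool hands out the first 4 tasks; as soon as a worker finishes
    # task T_n it picks up the next pending task, until all T are done.
    results = list(pool.map(run_task, tasks))

print(results)
```

`pool.map` returns results in task order even though tasks may finish out of order.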
Data Parallelism
Launch the first P threads on different processors --> once any thread T_n completes, launch another thread --> repeat until all threads have completed.
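In the data-parallel view, it is the data that gets split: the same operation runs on each subset. A sketch, assuming D = P subsets and a hypothetical per-chunk reduction:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # Stand-in for per-subset work: sum the chunk.
    return sum(chunk)

data = list(range(100))
P = 4
# Split the data into D = P subsets (strided split here, for simplicity).
chunks = [data[i::P] for i in range(P)]

with ThreadPoolExecutor(max_workers=P) as pool:
    # One thread per subset, all running the same operation.
    partial_sums = list(pool.map(process_chunk, chunks))

print(sum(partial_sums))
```

The partial results are combined at the end, so the answer matches the serial `sum(data)`.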
SIMD -- Single Instruction, Multiple Data
- All cores execute the same instruction, each on different data.
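A toy model of the SIMD idea in plain Python (real SIMD hardware does this in a single instruction, not a loop; this just illustrates the lockstep pattern):

```python
def simd_add(lanes_a, lanes_b):
    """Toy SIMD model: one 'add' applied in lockstep across all lanes."""
    return [a + b for a, b in zip(lanes_a, lanes_b)]

# Four lanes, one logical instruction, four different pieces of data.
print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))
```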
Guiding principles
A CPU is always faster for a serial process and for small data. On a GPU, every instruction is important: stalls can occur when `if` statements or loops with a variable number of iterations appear, because threads in a SIMD group execute in lockstep and diverging threads must be serialized.
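The divergence problem can be illustrated with two versions of the same element-wise operation (a toy model in plain Python; on real SIMD/SIMT hardware the branchy form forces lanes that take different paths to execute one after another):

```python
def relu_branchy(xs):
    # Each element may take a different path through the `if`;
    # on SIMD hardware, disagreeing lanes would serialize here.
    out = []
    for x in xs:
        if x > 0:
            out.append(x)
        else:
            out.append(0)
    return out

def relu_branchless(xs):
    # Same result, but every element runs the identical instruction
    # sequence, so there is no path for lanes to diverge on.
    return [max(x, 0) for x in xs]

print(relu_branchy([-2, -1, 0, 1, 2]))
print(relu_branchless([-2, -1, 0, 1, 2]))
```

Rewriting branches as uniform operations (selects, min/max, masking) is a common way to avoid divergence in GPU kernels.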