Superscalar -- From Wikipedia

caveman1984

Category: GPU and processor related. Tags: dependencies, parallel, performance, resources, stream, delay

Article link: https://blog.csdn.net/caveman1984/article/details/5339230

The performance improvement available from superscalar techniques is limited by two key factors:

The degree of intrinsic parallelism in the instruction stream, i.e. the limited amount of instruction-level parallelism, and
The complexity and time cost of the dispatcher and its associated dependency-checking logic.

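(Note: this is basically the same as the CoIssue technique, which also has to check the dependencies between instructions and is likewise limited by the makeup of the instruction stream.)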

Existing binary executable programs have varying degrees of intrinsic parallelism. In some cases instructions are independent of each other and can be executed simultaneously. In other cases they are inter-dependent: one instruction affects either the resources or the results of the other. The instructions a = b + c; d = e + f can run in parallel because neither result depends on the other calculation. However, the instructions a = b + c; b = e + f might not be runnable in parallel: the second instruction overwrites b, which the first instruction reads, so the outcome depends on the order in which the instructions complete as they move through the execution units.
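The two instruction pairs above can be checked mechanically. The following is a minimal Python sketch, not how any real dispatcher is built: the `Instr` record and the `hazards` and `can_issue_together` helpers are hypothetical names introduced for illustration, and each instruction is assumed to write one destination register and read a tuple of source registers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, several sources."""
    dest: str
    srcs: tuple


def hazards(first, second):
    """Return the set of hazards if `second` is issued alongside `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")  # second overwrites what first still has to read
    if first.dest == second.dest:
        found.add("WAW")  # both write the same destination
    return found


def can_issue_together(first, second):
    """Two instructions may issue in parallel only if no hazard exists."""
    return not hazards(first, second)


i1 = Instr("a", ("b", "c"))  # a = b + c
i2 = Instr("d", ("e", "f"))  # d = e + f
i3 = Instr("b", ("e", "f"))  # b = e + f

print(can_issue_together(i1, i2))  # True
print(hazards(i1, i3))             # {'WAR'}
```

In this sketch the first pair has no hazard, while the second pair is flagged as write-after-read, which matches the text: whether the pair is safe depends on the order in which the two instructions complete.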

As the number of simultaneously issued instructions increases, the cost of dependency checking grows extremely rapidly. This is exacerbated by the need to check dependencies at run time and at the CPU's clock rate. The cost includes the additional logic gates required to implement the checks, and the time delays through those gates. Research shows the gate cost in some cases may be $n^k$ gates, and the delay cost $k^2 \log n$, where $n$ is the number of instructions in the processor's instruction set, and $k$ is the number of simultaneously dispatched instructions. In mathematics, this is called a combinatorial problem involving permutations.
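To get a feel for how quickly these costs grow, the sketch below tabulates the two bounds, read as $n^k$ gates and $k^2 \log n$ delay, for a few issue widths. The instruction-set size of 64 and the use of a base-2 logarithm are assumptions made only for this illustration; the text does not fix either, and the function names are hypothetical.

```python
import math


def gate_cost(n, k):
    """Gate-count bound n^k for n instructions and issue width k."""
    return n ** k


def delay_cost(n, k):
    """Delay bound k^2 * log n; base 2 is assumed for the logarithm."""
    return k ** 2 * math.log2(n)


N_INSTRUCTIONS = 64  # hypothetical instruction-set size

for k in (1, 2, 4, 8):
    print(f"k={k}: gates={gate_cost(N_INSTRUCTIONS, k)}, "
          f"delay={delay_cost(N_INSTRUCTIONS, k)}")
```

Under these assumptions, going from an issue width of 2 to 4 multiplies the delay bound by four and the gate bound by a factor of 4096, which is why dependency checking becomes the limiting cost as the issue width grows.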