Very Long Instruction Word

EverNoob

from: https://en.wikipedia.org/wiki/Very_long_instruction_word

The traditional means to improve performance in processors include dividing instructions into substeps so the instructions can be executed partly at the same time (termed pipelining), dispatching individual instructions to be executed independently, in different parts of the processor (superscalar architectures), and even executing instructions in an order different from the program (out-of-order execution). These methods all complicate hardware (larger circuits, higher cost and energy use) because the processor must make all of the decisions internally for these methods to work. In contrast, the VLIW method depends on the programs providing all the decisions regarding which instructions to execute simultaneously and how to resolve conflicts. As a practical matter, this means that the compiler (software used to create the final programs) becomes far more complex, but the hardware is simpler than in many other means of parallelism.

==> the compiler takes longer to develop, is more error-prone, and is specific to the VLIW structure; this is the price paid for a more efficient hardware structure.

In superscalar designs, the number of execution units is invisible to the instruction set. Each instruction encodes one operation only. For most superscalar designs, the instruction width is 32 bits or fewer.

In contrast, one VLIW instruction encodes multiple operations, at least one operation for each execution unit of a device. For example, if a VLIW device has five execution units, then a VLIW instruction for the device has five operation fields, each field specifying what operation should be done on that corresponding execution unit. To accommodate these operation fields, VLIW instructions are usually at least 64 bits wide, and far wider on some architectures.
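
To make the "one field per execution unit" layout concrete, here is a minimal sketch that packs a five-slot instruction word. The field layout (16 bits per slot: a 4-bit opcode and three 4-bit register numbers) and all names are assumptions chosen for illustration, not any real instruction set.

```python
# Hypothetical 5-slot VLIW word: one 16-bit operation field per execution unit.
# Field layout (assumed): 4-bit opcode, 4-bit dst, 4-bit src1, 4-bit src2.
SLOTS = 5
FIELD_BITS = 16
NOP = (0, 0, 0, 0)  # opcode 0 is treated as "do nothing" in this sketch


def encode_op(opcode, dst, src1, src2):
    """Pack one operation into a 16-bit field."""
    for value in (opcode, dst, src1, src2):
        if not 0 <= value < 16:
            raise ValueError("each sub-field must fit in 4 bits")
    return (opcode << 12) | (dst << 8) | (src1 << 4) | src2


def decode_op(field):
    """Unpack a 16-bit field back into (opcode, dst, src1, src2)."""
    return ((field >> 12) & 0xF, (field >> 8) & 0xF, (field >> 4) & 0xF, field & 0xF)


def pack_bundle(ops):
    """Pack up to SLOTS operations into one 80-bit word; unused slots become NOPs."""
    if len(ops) > SLOTS:
        raise ValueError("more operations than execution units")
    padded = list(ops) + [NOP] * (SLOTS - len(ops))
    word = 0
    for slot, op in enumerate(padded):
        word |= encode_op(*op) << (slot * FIELD_BITS)
    return word


def unpack_bundle(word):
    """Split an instruction word back into its SLOTS operation fields."""
    mask = (1 << FIELD_BITS) - 1
    return [decode_op((word >> (slot * FIELD_BITS)) & mask) for slot in range(SLOTS)]
```

With this layout the word is 5 x 16 = 80 bits wide, and a bundle that fills only two slots still occupies all 80 bits, which is where the code-size cost of VLIW comes from.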

For example, the following is an instruction for the Super Harvard Architecture Single-Chip Computer (SHARC). In one cycle, it does a floating-point multiply, a floating-point add, and two autoincrement loads. All of this fits in one 48-bit instruction:

f12 = f0 * f4, f8 = f8 + f12, f0 = dm(i0, m3), f4 = pm(i8, m9);
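
The instruction above only makes sense if all four operations read their inputs before any of them writes a result: the add consumes the previous f12, and the multiply consumes the previously loaded f0 and f4. The sketch below models that under stated assumptions: operands are read at the start of the cycle and written at the end, dm(i0, m3) and pm(i8, m9) are treated as post-modify loads, and memory is modeled as plain Python lists. The function names are hypothetical.

```python
def run_bundle(regs, dm, pm):
    """Run one cycle of the four-operation bundle with read-before-write semantics."""
    # Phase 1: read every source from the register state at the start of the cycle.
    f0, f4, f8, f12 = regs["f0"], regs["f4"], regs["f8"], regs["f12"]
    i0, i8 = regs["i0"], regs["i8"]

    # Phase 2: compute all results and write them back together.
    new = dict(regs)
    new["f12"] = f0 * f4          # multiply uses the values loaded last cycle
    new["f8"] = f8 + f12          # add uses the OLD f12, not the one computed above
    new["f0"] = dm[i0]            # load, then post-modify the index register
    new["f4"] = pm[i8]
    new["i0"] = i0 + regs["m3"]
    new["i8"] = i8 + regs["m9"]
    return new


def dot_product(a, b):
    """Repeat the bundle as a software-pipelined multiply-accumulate loop."""
    n = len(a)
    # Two zero words of padding: the last two cycles still issue loads while draining.
    dm = list(a) + [0.0, 0.0]
    pm = list(b) + [0.0, 0.0]
    regs = {"f0": 0.0, "f4": 0.0, "f8": 0.0, "f12": 0.0,
            "i0": 0, "i8": 0, "m3": 1, "m9": 1}
    for _ in range(n + 2):        # n useful cycles plus 2 to drain the pipeline
        regs = run_bundle(regs, dm, pm)
    return regs["f8"]
```

Repeating the same bundle gives a three-stage pipeline (load, multiply, accumulate), so a dot product of n elements takes n + 2 cycles in this model.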

Since the earliest days of computer architecture,[1] some CPUs have added several arithmetic logic units (ALUs) to run in parallel. Superscalar CPUs use hardware to decide which operations can run in parallel at runtime, while VLIW CPUs use software (the compiler) to decide which operations can run in parallel in advance. Because the complexity of instruction scheduling is moved into the compiler, complexity of hardware can be reduced substantially.

==> keeping the complexity at the hardware level abstracts the hardware away from the software, which can greatly improve compatibility with multiple software environments/frameworks and foster a better application ecosystem for the hardware.
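
As a rough illustration of the compiler deciding in advance which operations run in parallel, here is a minimal greedy scheduler sketch. It assumes every operation has a latency of one cycle, every destination register is written only once (so only read-after-write dependencies matter), and all execution units are interchangeable. Real VLIW compilers use far more elaborate techniques; the function and operation names are made up.

```python
def schedule(ops, width):
    """Greedily pack operations into bundles of at most `width` operations.

    ops: list of (name, dest, sources) tuples in program order.
    Returns a list of bundles, each a list of operation names.
    """
    ready_cycle = {}  # register -> first cycle in which its value can be read
    bundles = []
    for name, dest, sources in ops:
        # Earliest legal cycle: after every producer of a source has finished.
        cycle = max((ready_cycle.get(src, 0) for src in sources), default=0)
        # Skip bundles whose slots are already full.
        while cycle < len(bundles) and len(bundles[cycle]) >= width:
            cycle += 1
        while len(bundles) <= cycle:
            bundles.append([])
        bundles[cycle].append(name)
        ready_cycle[dest] = cycle + 1  # one-cycle latency assumed
    return bundles


# Example: t5 = (a*b + c*d) + (a - c)
program = [
    ("mul1", "t1", ["a", "b"]),
    ("mul2", "t2", ["c", "d"]),
    ("add1", "t3", ["t1", "t2"]),
    ("sub1", "t4", ["a", "c"]),
    ("add2", "t5", ["t3", "t4"]),
]
```

For this program, a two-wide machine and a three-wide machine both need three bundles, because the dependency chain mul -> add1 -> add2 sets the minimum length; the wider machine only ends up with more empty slots.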

A similar problem occurs when the result of a parallelizable instruction is used as input for a branch. Most modern CPUs guess which branch will be taken even before the calculation is complete, so that they can load the instructions for the branch, or (in some architectures) even start to compute them speculatively. If the CPU guesses wrong, all of these instructions and their context need to be flushed and the correct ones loaded, which takes time.

This has led to increasingly complex instruction-dispatch logic that attempts to guess correctly, and the simplicity of the original reduced instruction set computing (RISC) designs has been eroded. VLIW lacks this logic, and thus lacks its energy use, possible design defects, and other negative aspects.

In a VLIW, the compiler uses heuristics or profile information to guess the direction of a branch. This allows it to move and preschedule operations speculatively before the branch is taken, favoring the most likely path it expects through the branch. If the branch takes an unexpected way, the compiler has already generated compensating code to discard speculative results to preserve program semantics.

==> profiling is static and cheap, and does not necessarily yield good results; with a JIT compiler, the prediction can be dynamic, like hardware-level prediction.
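
A toy sketch of compile-time speculation, under simplifying assumptions: the compiler picks the likely direction from hypothetical profile counts, hoists a side-effect-free multiply above the branch, and on the unlikely path the speculative result is simply discarded. All names here are illustrative; real compensation code can be considerably more involved.

```python
def likely_taken(taken_count, not_taken_count):
    """Static guess from profile data: predict the more frequent direction."""
    return taken_count > not_taken_count


def original(x, a, b):
    """Code as written: the multiply only runs on the x > 0 path."""
    if x > 0:
        return a * b + 1
    return a - b


def speculated(x, a, b):
    """Code after the compiler hoists the multiply above the branch."""
    t = a * b           # executed speculatively, before the branch is resolved
    if x > 0:
        return t + 1    # likely path: the result is already available
    # Unlikely path: t is dead here, so discarding it preserves the semantics.
    return a - b
```

The transformation is only safe because the hoisted multiply has no side effects and cannot fault; both versions return the same value for every input.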

Vector processor (single instruction, multiple data (SIMD)) cores can be combined with the VLIW architecture such as in the Fujitsu FR-V microprocessor, further increasing throughput and speed.

other references

Very Long Instruction Word (VLIW) Architecture - GeeksforGeeks

Advantages :

Reduces hardware complexity.
Reduces power consumption because of reduction of hardware complexity.
Since the compiler takes care of data-dependency checking and instruction scheduling, the hardware's decode and issue logic becomes a lot simpler.
Increases potential clock rate.
The compiler places each operation in the slot of the instruction packet that corresponds to its functional unit.

Disadvantages :

Complex compilers are required which are hard to design.
Increased program code size.
Requires larger memory bandwidth and register-file bandwidth.
Unscheduled events, for example a cache miss, can cause a stall that halts the entire processor.
When operation slots in a VLIW instruction are left unfilled, memory space and instruction bandwidth are wasted.
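
The cost of unfilled slots can be put into numbers with a small calculation. The sketch below assumes fixed-width bundles with no NOP compression and a hypothetical 16-bit operation field; the example schedule is made up.

```python
def slot_utilization(bundles, width):
    """Fraction of operation slots that hold a real operation."""
    total = len(bundles) * width
    used = sum(len(bundle) for bundle in bundles)
    return used / total if total else 0.0


def wasted_bits(bundles, width, field_bits):
    """Instruction bits spent on empty (NOP) slots, without NOP compression."""
    empty = len(bundles) * width - sum(len(bundle) for bundle in bundles)
    return empty * field_bits


# Example schedule on a 3-wide machine: 5 real operations in 9 slots.
example = [["mul1", "mul2", "sub1"], ["add1"], ["add2"]]
```

Here 5 of 9 slots are used, so about 44% of the fetched instruction bits (4 slots x 16 bits = 64 bits) carry no work.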

What is the VLIW Architecture?
