Exceptions and Interrupts
exception: An unscheduled event that disrupts program execution; used, for example, to detect undefined instructions.
interrupt: An exception that comes from outside of the processor.
Two methods to communicate the reason for an exception:
- A cause register: RISC-V uses the Supervisor Exception Cause Register (SCAUSE), which records the reason for the exception. If more than one exception occurs in a clock cycle, it records the highest-priority one.
- Vectored interrupts: each interrupting device is assigned a unique code, typically four to eight bits in length. When a device interrupts, it sends its code over the data bus to the processor, telling the processor which interrupt service routine to execute.
The Supervisor Exception Program Counter (SEPC) is used to save the address of the offending instruction.
Many RISC-V computers store the exception entry address in a special register named Supervisor Trap Vector (STVEC), which the OS can load with a value of its choosing.
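The roles of SEPC, SCAUSE and STVEC can be sketched with a toy model. This is only an illustration in Python, not RISC-V code: the `Cpu` class, the handler strings and the two cause-code values are assumptions made for this example.

```python
from dataclasses import dataclass

# Cause codes used in this sketch (assumed values, for illustration only).
ILLEGAL_INSTRUCTION = 2
HARDWARE_MALFUNCTION = 12


@dataclass
class Cpu:
    pc: int = 0
    sepc: int = 0    # address of the offending instruction
    scause: int = 0  # reason for the exception
    stvec: int = 0   # exception entry address, loaded by the OS


def take_exception(cpu: Cpu, cause: int) -> None:
    """Model what the hardware does when an exception is raised."""
    cpu.sepc = cpu.pc    # save the address of the offending instruction
    cpu.scause = cause   # record why the exception happened
    cpu.pc = cpu.stvec   # transfer control to the OS entry point


def os_handler(cpu: Cpu) -> str:
    """Model the OS reading SCAUSE to decide how to react."""
    actions = {
        ILLEGAL_INSTRUCTION: "report undefined instruction",
        HARDWARE_MALFUNCTION: "report hardware malfunction",
    }
    return actions.get(cpu.scause, "unknown cause")


cpu = Cpu(pc=0x1040, stvec=0x8000_0000)
take_exception(cpu, ILLEGAL_INSTRUCTION)
action = os_handler(cpu)
```

After the call, `cpu.sepc` holds the address of the faulting instruction, so the OS could return to it once the cause has been handled.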
Question: What is the difference between imprecise exceptions and precise exceptions?
Parallelism via Instructions
ILP: Instruction-Level Parallelism
multiple issue: A scheme whereby multiple instructions are launched in one clock cycle.
- Today’s high-end microprocessors attempt to issue from three to six instructions in every clock cycle. Even moderate designs will aim at a peak IPC of 2.
static multiple issue: An approach to implementing a multiple-issue processor where many decisions are made by the compiler before execution.
dynamic multiple issue: An approach to implementing a multiple-issue processor where many decisions are made during execution by the processor.
Techniques for finding and exploiting ILP:
1. speculation
speculation: An approach whereby the compiler or processor guesses the outcome of an instruction to remove it as a dependence in executing other instructions.
The recovery mechanisms for speculation:
- In the case of speculation in software, the compiler usually inserts additional instructions that check the accuracy of the speculation and provide a fix-up routine to use when the speculation is wrong.
- In hardware speculation, the processor usually buffers the speculative results until it knows they are no longer speculative. If the speculation is correct, the instructions are completed by allowing the contents of the buffers to be written to the registers or memory. If the speculation is incorrect, the hardware flushes the buffers and re-executes the correct instruction sequence.
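The hardware recovery mechanism (buffer, then commit or flush) can be sketched as follows. This is a minimal model under stated assumptions: the `SpeculativeRegisters` class, the `run` function and the register names are hypothetical, and a real processor buffers results in dedicated structures rather than a dictionary.

```python
class SpeculativeRegisters:
    """Architectural registers plus a buffer for speculative results."""

    def __init__(self, regs):
        self.regs = dict(regs)  # programmer-visible state
        self.pending = {}       # speculative results, not yet visible

    def write(self, name, value):
        self.pending[name] = value  # buffer instead of updating regs

    def read(self, name):
        # Speculative instructions see earlier speculative results.
        return self.pending.get(name, self.regs[name])

    def commit(self):
        self.regs.update(self.pending)  # speculation was correct
        self.pending.clear()

    def flush(self):
        self.pending.clear()  # speculation was wrong: discard everything


def execute_path(state, taken):
    # Two alternative instruction sequences after a branch (made up).
    delta = 10 if taken else -10
    state.write("x6", state.read("x5") + delta)


def run(predicted_taken, actually_taken):
    state = SpeculativeRegisters({"x5": 1, "x6": 0})
    execute_path(state, predicted_taken)  # speculate on the prediction
    if predicted_taken != actually_taken:
        state.flush()                         # throw away wrong results
        execute_path(state, actually_taken)   # re-execute the correct path
    state.commit()
    return state.regs
```

With a correct prediction the buffered value is simply committed; with a wrong one, `run(True, False)` flushes first, so the architectural registers never see the mispredicted result.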
Issue packet: A set of instructions issued in a given clock cycle.
Very Long Instruction Word (VLIW): A style of instruction set architecture that launches many operations that are defined to be independent in a single wide instruction, typically with many separate opcode fields.
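As a rough illustration of how a compiler might form issue packets, the sketch below pairs adjacent instructions for an assumed two-slot machine (one ALU slot, one memory slot). The tuple encoding, `can_pair` and `pack` are hypothetical helpers, and the rule is deliberately conservative: it never reorders instructions, and it assumes both instructions in a packet read their registers before either writes.

```python
# Each instruction is (slot_kind, destination, sources).
# slot_kind is "alu" or "mem"; destination is None for a store.

def can_pair(first, second):
    """True if `second` may share an issue packet with the earlier `first`."""
    kind1, dest1, _srcs1 = first
    kind2, dest2, srcs2 = second
    if kind1 == kind2:
        return False  # only one slot of each kind per packet
    if dest1 is not None and (dest1 in srcs2 or dest1 == dest2):
        return False  # second needs or overwrites first's result
    return True


def pack(instrs):
    """Greedily pack a program, in order, into two-slot issue packets."""
    packets = []
    i = 0
    while i < len(instrs):
        if i + 1 < len(instrs) and can_pair(instrs[i], instrs[i + 1]):
            packets.append((instrs[i], instrs[i + 1]))
            i += 2
        else:
            packets.append((instrs[i], None))  # second slot left empty
            i += 1
    return packets


program = [
    ("mem", "x31", ["x20"]),         # ld   x31, 0(x20)
    ("alu", "x31", ["x31", "x21"]),  # add  x31, x31, x21
    ("mem", None, ["x31", "x20"]),   # sd   x31, 0(x20)
    ("alu", "x20", ["x20"]),         # addi x20, x20, -4
]
packets = pack(program)
```

Here the dependence chain through `x31` forces the first two instructions into separate packets, and only the store and the `addi` share one, so four instructions need three packets.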
2. loop unrolling
Loop unrolling: A technique to get more performance from loops that access arrays, in which multiple copies of the loop body are made and instructions from different iterations are scheduled together.
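The transformation itself can be shown on a loop that adds a scalar to every array element. This is a sketch of what a compiler does, written in Python only for readability; the function names and the unroll factor of four are assumptions, and Python itself gains no speed from it.

```python
def add_scalar(values, s):
    """Original loop: one element per iteration."""
    out = list(values)
    for i in range(len(out)):
        out[i] += s
    return out


def add_scalar_unrolled(values, s):
    """Same loop unrolled four times, plus a cleanup loop."""
    out = list(values)
    n = len(out)
    i = 0
    # Four copies of the loop body per iteration. The four updates touch
    # different elements, so they are independent and could be scheduled
    # together; the loop overhead (increment, test) is paid once per four.
    while i + 4 <= n:
        out[i] += s
        out[i + 1] += s
        out[i + 2] += s
        out[i + 3] += s
        i += 4
    # Cleanup for lengths that are not a multiple of four.
    while i < n:
        out[i] += s
        i += 1
    return out
```

The cleanup loop is what keeps the unrolled version correct for any array length.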
Dependences are a property of programs; the presence of a dependence indicates the potential for a hazard.
Data dependences set an upper bound on how much parallelism can possibly be exploited.
Difference between true dependences and anti-dependences:
True data dependence:
Read After Write (RAW) hazard
⇑, cannot execute simultaneously.
Anti-dependence:
1.
2.
Instructions involved in a name dependence can execute simultaneously if the name used in the instructions is changed so that the instructions do not conflict. This is what register renaming does.
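A small sketch of renaming, assuming instructions are encoded as `(destination, source, source)` tuples. The `classify` and `rename` helpers and the physical register names `p0`, `p1`, ... are hypothetical; the point is only that name dependences disappear after renaming while true dependences remain.

```python
def classify(first, second):
    """Dependences of `second` on the earlier instruction `first`."""
    dest1, srcs1 = first[0], set(first[1:])
    dest2, srcs2 = second[0], set(second[1:])
    kinds = []
    if dest1 in srcs2:
        kinds.append("RAW")  # true dependence: second reads first's result
    if dest2 in srcs1:
        kinds.append("WAR")  # anti-dependence: second overwrites a source
    if dest1 == dest2:
        kinds.append("WAW")  # output dependence: same destination name
    return kinds


def rename(instrs):
    """Give every written value a fresh physical register name."""
    mapping = {}  # architectural name -> latest physical name
    renamed = []
    for count, (dest, *srcs) in enumerate(instrs):
        new_srcs = [mapping.get(s, s) for s in srcs]  # read latest values
        new_dest = f"p{count}"                        # fresh name per write
        mapping[dest] = new_dest
        renamed.append((new_dest, *new_srcs))
    return renamed


program = [
    ("x4", "x1", "x5"),  # add x4, x1, x5
    ("x1", "x2", "x3"),  # add x1, x2, x3   (writes x1, read above)
    ("x6", "x1", "x4"),  # add x6, x1, x4   (needs both results)
]
renamed = rename(program)
before = classify(program[0], program[1])
after = classify(renamed[0], renamed[1])
still_true = classify(renamed[1], renamed[2])
```

In this example the first two instructions conflict only on the name `x1`; after renaming they share no register, while the third instruction still truly depends on the second.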
3. Superscalar Processor
Superscalar: An advanced pipelining technique that enables the processor to execute more than one instruction per clock cycle by selecting them during execution.
dynamic pipeline scheduling: Hardware support for reordering the order of instruction execution to avoid stalls.
The pipeline is divided into three primary units:
- instruction fetch and issue unit
- multiple functional units
- commit unit
Each functional unit has buffers, called reservation stations, that hold the operands and the operation.
(Figure: the three primary units of a dynamically scheduled pipeline.)
out-of-order execution: A processor executes instructions in an order governed by the availability of input data and execution units, rather than by their original order in a program.
Steps of out-of-order execution:
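- When an instruction is issued, it is copied into a reservation station together with any of its operands that are already available in the register file. The instruction is buffered there until all of its operands and a functional unit are available, and then it executes.
- After the instruction executes, its result is held in the commit unit until “it is safe to release the result of an operation to programmer-visible registers and memory.”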
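The two steps above can be sketched with a tiny cycle-by-cycle model. It is a simplification under loud assumptions: all instructions are already sitting in reservation stations, there are unlimited functional units, name dependences have been removed by renaming, and the `simulate` function and the latencies are made up for illustration.

```python
def simulate(program):
    """program: list of (dest, sources, latency) in program order.

    Returns (start_order, commit_order) as lists of instruction indices.
    """
    n = len(program)
    # For each instruction, find the earlier instructions that produce
    # its source operands (its true data dependences).
    producers = []
    last_writer = {}
    for i, (dest, srcs, _latency) in enumerate(program):
        producers.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dest] = i

    finish = [None] * n  # cycle at which each result becomes available
    start_order, commit_order = [], []
    next_commit = 0
    cycle = 0
    while next_commit < n:
        # Execute: a waiting instruction starts once its operands are ready.
        for i in range(n):
            ready = all(
                finish[p] is not None and finish[p] <= cycle
                for p in producers[i]
            )
            if finish[i] is None and ready:
                finish[i] = cycle + program[i][2]
                start_order.append(i)
        # Commit: results are released strictly in program order.
        while (next_commit < n and finish[next_commit] is not None
               and finish[next_commit] <= cycle):
            commit_order.append(next_commit)
            next_commit += 1
        cycle += 1
    return start_order, commit_order


program = [
    ("x1", ["x10"], 3),         # 0: load, slow
    ("x2", ["x1"], 1),          # 1: needs the load result
    ("x3", ["x11", "x12"], 1),  # 2: independent of both
]
start_order, commit_order = simulate(program)
```

Instruction 2 starts ahead of instruction 1 because its operands are ready immediately, yet the commit order stays 0, 1, 2, which is the role of the commit unit.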
Increasing ILP faces two major bottlenecks: despite the existence of processors that issue four to six instructions per clock, very few applications can sustain more than two instructions per clock.
⇒ Within the pipeline, dependences are hard to alleviate.
⇒ Losses in the memory hierarchy limit the ability to keep the pipeline full.
The downside to the increasing exploitation of instruction-level parallelism via dynamic multiple issue and speculation is potential energy inefficiency. Now that we have collided with the power wall, we are seeing designs with multiple processors per chip where the processors are not as deeply pipelined or as aggressively speculative as their predecessors.
The belief is that while the simpler processors are not as fast as their sophisticated brethren, they deliver better performance per Joule, so that they can deliver more performance per chip when designs are constrained more by energy than they are by the number of transistors.
Pipelines of the ARM Cortex-A53 and Intel Core i7 920
- Intel fetches x86 instructions and translates them into internal RISC-V-like instructions, which Intel calls micro-operations. The micro-operations are then executed by a sophisticated, dynamically scheduled, speculative pipeline capable of sustaining an execution rate of up to six micro-operations per clock cycle.
- The Intel Core i7 uses a scheme for resolving anti-dependences and incorrect speculation that uses a reorder buffer together with register renaming.
GFLOPS (gigaflops): billions of floating-point operations per second.
- Many of the difficulties of pipelining arise because of instruction set complications.
- Widely variable instruction lengths and running times can lead to imbalance among pipeline stages and severely complicate hazard detection in a design pipelined at the instruction set level.
- Addressing modes that update registers complicate hazard detection. Other addressing modes that require multiple memory accesses substantially complicate pipeline control and make it difficult to keep the pipeline flowing smoothly.