- Superscalar processor
- A CPU that can execute instructions in multiple pipelines at the same time
- Fetch multiple instructions at a time
- Instruction-level parallelism
- Limitations
- Instruction-level parallelism is maximised by a combination of:
- Compiler-based optimisation
- Hardware techniques
- Instruction-level parallelism is limited by:
- True data dependency (write then read), also called write-read dependency or flow dependency
- Procedural dependency: instructions after a branch cannot execute until the branch is executed
- Resource conflict: two or more instructions require access to the same resource at the same time
- Output dependency (write then write)
- Antidependency (read then write), also called read-write dependency; the constraint is like a true data dependency, but reversed
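
The three register dependencies above can be told apart by comparing which registers two instructions read and write. The following is a minimal sketch, not a model of any real decoder: the `Instr` record and the `classify` function are hypothetical names introduced here, and each instruction is assumed to write exactly one register.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, some sources."""
    dest: str
    srcs: tuple


def classify(first: Instr, second: Instr) -> set:
    """Return the register dependencies of `second` on the earlier `first`."""
    deps = set()
    if first.dest in second.srcs:
        deps.add("RAW")   # true data dependency: write then read
    if second.dest in first.srcs:
        deps.add("WAR")   # antidependency: read then write
    if first.dest == second.dest:
        deps.add("WAW")   # output dependency: write then write
    return deps


i1 = Instr("r3", ("r3", "r5"))   # r3 = r3 op r5
i2 = Instr("r4", ("r3",))        # r4 = r3 + 1
i3 = Instr("r3", ("r5",))        # r3 = r5 + 1

print(sorted(classify(i1, i2)))  # ['RAW']
print(sorted(classify(i2, i3)))  # ['WAR']
print(sorted(classify(i1, i3)))  # ['WAR', 'WAW']
```

Procedural dependencies and resource conflicts are not covered by this check, because they depend on control flow and on the available functional units rather than on register names.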
- Features of superscalar
- The essence: the ability to execute instructions independently in different pipelines
- Superpipelining: many pipeline stages need less than half a clock cycle, so doubling the internal clock rate lets two such stages complete in one external clock cycle
- Superscalar instruction issue policies
- In-order issue with in-order completion: the simplest policy, but not very efficient
- Instructions stall whenever necessary to preserve the program order
- In-order issue with out-of-order completion
- Instructions execute and write their results as soon as possible
- Instruction issue is stalled by a resource conflict, a data dependency, or a procedural dependency
- Out-of-order issue with out-of-order completion
- With in-order issue, the pipeline stalls when a dependency is encountered and stays stalled until the conflict is resolved
- To avoid this, the decode pipeline is decoupled from the execution pipeline by a buffer called the instruction window
- When a functional unit becomes available, an instruction from the window can be issued to it
- Any instruction may be issued, provided that the functional unit it needs is available and no conflict or dependency blocks it
- The original instruction order is broken, but the result must still be correct
- This policy makes use of stall time: independent instructions are executed ahead of instructions that are waiting on a dependency
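
The difference between in-order and out-of-order issue can be sketched as one selection pass over the instruction window. This is a simplified illustration under stated assumptions: the `issue` function, the unit names `alu` and `mul`, and the register names are hypothetical; `ready_regs` is assumed to exclude any register still waiting to be written; and output dependencies and antidependencies are assumed to have been removed already.

```python
def issue(window, ready_regs, free_units, in_order):
    """Pick the instructions that can be issued in one cycle.

    window:     list of (name, unit, srcs) tuples in program order
    ready_regs: set of registers whose values are available
    free_units: dict mapping unit kind -> number of free units
    """
    free = dict(free_units)   # work on a copy, leave the caller's dict alone
    issued = []
    for name, unit, srcs in window:
        no_unit = free.get(unit, 0) == 0          # resource conflict
        not_ready = not set(srcs) <= ready_regs   # true data dependency
        if no_unit or not_ready:
            if in_order:
                break     # in-order: everything behind the stall waits
            continue      # out-of-order: look past the blocked instruction
        free[unit] -= 1
        issued.append(name)
    return issued


window = [
    ("I1", "alu", ["r1"]),
    ("I2", "alu", ["r9"]),   # r9 is not ready yet
    ("I3", "mul", ["r2"]),
]
ready = {"r1", "r2"}
units = {"alu": 1, "mul": 1}

print(issue(window, ready, units, in_order=True))    # ['I1']
print(issue(window, ready, units, in_order=False))   # ['I1', 'I3']
```

With out-of-order issue, `I3` uses the otherwise idle multiplier while `I2` waits, which is the "use of stall time" described above.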
- Register renaming
- Output dependencies and antidependencies occur because register contents may not reflect the correct ordering from the program
- Ways to resolve them:
- Stalling the pipeline
- Register renaming
- It is essentially a duplication of resources
- Each new value is written to a different register, allocated by the hardware
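
A minimal sketch of renaming, assuming an unlimited supply of physical registers (real hardware has a fixed pool and must recycle them). The `rename` function and the `p0`, `p1`, ... naming are hypothetical and only illustrate the idea that every write gets a fresh register.

```python
def rename(program):
    """Give every destination a fresh physical register.

    program: list of (dest, srcs) tuples using architectural names.
    Returns the same program rewritten with physical names.
    """
    mapping = {}   # architectural name -> latest physical name
    renamed = []
    for count, (dest, srcs) in enumerate(program):
        # Sources read the most recent version of each register;
        # registers never written in this fragment keep their name.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        new_dest = f"p{count}"   # fresh register for every write
        mapping[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed


program = [
    ("r3", ("r3", "r5")),   # I1: r3 = r3 op r5
    ("r4", ("r3",)),        # I2: r4 = r3 + 1
    ("r3", ("r5",)),        # I3: r3 = r5 + 1
    ("r7", ("r3", "r4")),   # I4: r7 = r3 op r4
]

for dest, srcs in rename(program):
    print(dest, srcs)
# p0 ('r3', 'r5')
# p1 ('p0',)
# p2 ('r5',)
# p3 ('p2', 'p1')
```

After renaming, I3 writes `p2` instead of `r3`, so it no longer conflicts with I1 (output dependency) or I2 (antidependency). The true data dependencies (I2 on I1, I4 on I3 and I2) remain, as they must.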
- Machine parallelism
- Three hardware techniques are used in a superscalar processor to improve performance:
- Duplication of resources
- Out-of-order issue
- Renaming
- Branch prediction
- Static branch prediction
- Predict that the branch is always taken
- Predict that the branch is never taken
- Dynamic branch prediction
- Prediction based on branch history analysis
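
One common way to use branch history is a two-bit saturating counter per branch. The notes do not say which scheme is meant, so this is an illustrative choice. The class name, the per-address dictionary, and the initial state (weakly not taken) are assumptions.

```python
class TwoBitPredictor:
    """Two-bit saturating counter per branch address.

    States 0 and 1 predict not taken; states 2 and 3 predict taken.
    """

    def __init__(self):
        self.counters = {}   # branch address -> state in 0..3

    def predict(self, addr):
        # Unseen branches start in state 1 (assumed starting point)
        return self.counters.get(addr, 1) >= 2

    def update(self, addr, taken):
        state = self.counters.get(addr, 1)
        # Move one step toward the observed outcome, saturating at 0 and 3
        self.counters[addr] = min(3, state + 1) if taken else max(0, state - 1)


def count_correct(outcomes, addr=0x40):
    """Run the predictor over a list of outcomes for a single branch."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict(addr) == taken:
            correct += 1
        predictor.update(addr, taken)
    return correct


# A loop branch: taken four times, falls through once, then taken again
history = [True] * 4 + [False] + [True] * 4
print(count_correct(history), "of", len(history))   # 7 of 9
```

The single not-taken outcome costs only one misprediction, because the counter needs two wrong outcomes in a row before it changes its prediction.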
- Superscalar implementation
- Simultaneously fetch multiple instructions
- Logic to determine true dependencies involving register values
- Mechanisms to communicate these values
- Mechanisms to initiate multiple instructions in parallel
- Resources for parallel execution of multiple instructions
- Mechanisms for committing process state in correct order
- Superscalar in Pentium II
- Reorder buffer (ROB)
- The ROB is a circular buffer that contains 40 registers
- Each buffer entry contains the following fields:
- State: the micro-op's stage (execution, completion, or retirement)
- Memory address: the address of the machine instruction that the micro-op corresponds to
- Micro-op: the actual operation
- Alias register: redirects a register reference to a different register
- Micro-ops enter the ROB in order and are then dispatched out of order, as soon as the required execution unit and data are available
- Finally, micro-ops are retired from the ROB in order
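
The in-order entry, out-of-order completion, and in-order retirement described above can be sketched with a small queue. This is a simplified model under stated assumptions: the `ReorderBuffer` class and its method names are hypothetical, entries carry only a name and a done flag rather than the full set of fields, and a `deque` stands in for the hardware circular buffer.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: allocate in order, complete in any order,
    retire in order."""

    def __init__(self, size=40):
        self.size = size
        self.entries = deque()   # oldest micro-op at the left

    def allocate(self, uop):
        """Add a micro-op in program order; False means the buffer is full."""
        if len(self.entries) >= self.size:
            return False
        self.entries.append({"uop": uop, "done": False})
        return True

    def complete(self, uop):
        """Mark a micro-op as finished, regardless of its position."""
        for entry in self.entries:
            if entry["uop"] == uop:
                entry["done"] = True
                return
        raise KeyError(uop)

    def retire(self):
        """Remove finished micro-ops from the head only, in order."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            retired.append(self.entries.popleft()["uop"])
        return retired


rob = ReorderBuffer(size=4)
for name in ("A", "B", "C"):
    rob.allocate(name)

rob.complete("C")      # C finishes first, out of order
print(rob.retire())    # []  (A is still at the head)
rob.complete("A")
print(rob.retire())    # ['A']
rob.complete("B")
print(rob.retire())    # ['B', 'C']
```

Retiring only from the head is what keeps the committed process state in program order even though execution finished out of order.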
Chapter 14 Instruction level parallelism and superscalar processors