1. Memory Hierarchy
register --> cache (normally integrated into the CPU) --> memory (DRAM) --> disk --> tape
2. Cache hit: the requested data is found in the cache.
Cache miss: the data is not found in the cache, so the processor loads it from memory. This results in extra delay, called the miss penalty.
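The effect of the miss penalty is commonly summarized as the average memory access time: AMAT = hit time + miss rate x miss penalty. A minimal sketch of that formula follows; the function name and the latencies are made up for illustration, not measurements of any real machine.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (all times in the same unit)."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
amat = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=100)
print(f"average access time: {amat:.2f} cycles")
```

Even a small miss rate dominates the average when the miss penalty is large, which is why the hierarchy in section 1 puts small, fast levels in front of the slow ones.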
3. Cache placement policies:
- Direct-mapped cache: each memory block maps to exactly one slot.
- 2-way set-associative cache: each memory block maps to one set and can go in either of the two slots in that set.
- Fully associative cache: every memory block can go in any slot, so a replacement policy such as LRU is needed when the cache is full. It has the fewest conflict misses for a given cache capacity, but requires more hardware for the additional tag comparisons. Because of the large number of comparators, it is best suited to relatively small caches.
Each cache slot holds the block data, a tag, a valid bit, and a dirty bit (the dirty bit is only needed for write-back caches).
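The three placement policies differ only in how many sets there are and how many slots (ways) each set has. The toy model below is a sketch under simplifying assumptions: it tracks tags only (no data, valid, or dirty bits), uses LRU inside each set, and the class and function names are hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache model that tracks tags only, with LRU replacement per set.

    ways=1 gives a direct-mapped cache; num_sets=1 gives a fully associative one.
    """

    def __init__(self, num_sets, ways, block_size):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # One OrderedDict per set: keys are tags, ordered oldest -> newest use.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        block = address // self.block_size  # drop the block-offset bits
        index = block % self.num_sets       # set this block maps to
        tag = block // self.num_sets        # identifies the block within the set
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)          # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)       # evict the least recently used tag
        lines[tag] = True
        return False


def count_misses(cache, addresses):
    """Run an address trace through the cache and count the misses."""
    return sum(not cache.access(a) for a in addresses)


# Addresses 0 and 64 (16-byte blocks) map to the same slot when direct mapped.
trace = [0, 64, 0, 64, 0, 64]
direct = SetAssociativeCache(num_sets=4, ways=1, block_size=16)
two_way = SetAssociativeCache(num_sets=2, ways=2, block_size=16)
fully = SetAssociativeCache(num_sets=1, ways=4, block_size=16)
for name, cache in [("direct", direct), ("2-way", two_way), ("fully", fully)]:
    print(name, count_misses(cache, trace))
```

All three caches hold four blocks in total; on this trace the direct-mapped one keeps evicting the block it needs next, while the associative ones miss only on the first touch of each block.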
4. LRU cache
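An LRU (least recently used) cache evicts the entry that has gone unused the longest once it is full. A minimal software sketch, assuming Python's `collections.OrderedDict` to keep entries in order of use; the `LRUCache` class and its method names are illustrative, not a hardware design.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity key/value cache with least-recently-used eviction."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items = OrderedDict()  # ordered oldest -> newest use

    def get(self, key, default=None):
        if key not in self.items:
            return default
        self.items.move_to_end(key)  # a read counts as a use
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" is now more recently used than "b"
cache.put("c", 3)    # cache is full, so "b" is evicted
print(list(cache.items))
```

Both `get` and `put` run in O(1) time, because `OrderedDict` combines a hash table with a doubly linked list.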
5. Cache coherence
AMD uses MOESI; the Intel Core i7 uses MESIF.
MOESI
Benefits: MOESI allows dirty cache lines to be sent directly between caches instead of being written back to a shared outer cache and then read from there. It is basically about sharing dirty data: the Owned state keeps track of which cache is responsible for writing back the dirty data.
MESIF allows a cache to Forward a copy of a clean cache line to another cache, so that other caches do not have to re-read it from memory to get another Shared copy.
MOESI vs. MESI: MOESI is almost always superior to MESI in terms of absolute performance. However, MESI requires only 2 bits per cache line to hold the state, while MOESI requires 3 bits per cache line. MOESI might be too expensive for low-energy/low-performance/small processors.
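The bit counts and the benefit of the Owned state can be sketched as follows. This is a heavily simplified model that covers only what happens to a line when another cache reads it; the `State` enum and function names are hypothetical, and real protocols have many more transitions.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    OWNED = "O"      # present in MOESI only
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


MESI = [State.MODIFIED, State.EXCLUSIVE, State.SHARED, State.INVALID]
MOESI = MESI + [State.OWNED]


def state_bits(protocol):
    """Bits needed per cache line to encode one of the protocol's states."""
    return (len(protocol) - 1).bit_length()


def on_remote_read(protocol, state):
    """Return (new_state, writes_back) when another cache reads this line."""
    if state is State.MODIFIED:
        if State.OWNED in protocol:
            return State.OWNED, False   # keep the dirty data and supply it directly
        return State.SHARED, True       # MESI: the dirty line is written back first
    if state is State.EXCLUSIVE:
        return State.SHARED, False      # clean line is now shared
    return state, False                 # O, S, and I are unchanged by a remote read


print(state_bits(MESI), state_bits(MOESI))
print(on_remote_read(MESI, State.MODIFIED))
print(on_remote_read(MOESI, State.MODIFIED))
```

In this model the only difference between the two protocols is the Modified case: MOESI defers the write-back by moving to Owned, at the cost of a fifth state and therefore a third state bit.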
6. Dynamic scheduling of instructions
- Scoreboarding: resolves RAW hazards (an instruction waits until its source operands are ready).
- Tomasulo:
Register renaming avoids WAW and WAR hazards.
Tomasulo's algorithm is a hardware algorithm for dynamic scheduling of instructions that allows out-of-order execution and enables more efficient use of multiple execution units.
Each reservation station (RS) entry holds the operation, the values of its two source operands (or the tags of the stations that will produce them), and the tag that is stored in the register file to mark the destination register as pending.
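How renaming removes WAW and WAR hazards can be sketched in a few lines. This is only the renaming idea, not the full algorithm: tags here are fresh names T0, T1, ..., whereas in Tomasulo's algorithm the tag is the reservation station that will produce the value. The `rename` function and the sample program are hypothetical.

```python
def rename(instructions):
    """Give every destination a fresh tag.

    instructions: list of (dest, src1, src2) architectural register names.
    Returns the same program with destinations and sources rewritten.
    """
    latest = {}   # architectural register -> tag holding its newest value
    renamed = []
    for number, (dest, src1, src2) in enumerate(instructions):
        # Sources read the newest tag, or the original register if never written.
        s1 = latest.get(src1, src1)
        s2 = latest.get(src2, src2)
        tag = f"T{number}"
        latest[dest] = tag
        renamed.append((tag, s1, s2))
    return renamed


program = [
    ("F0", "F2", "F4"),    # F0 = F2 op F4
    ("F2", "F0", "F6"),    # RAW on F0, WAR on F2
    ("F0", "F8", "F10"),   # WAW on F0 with the first instruction
]
for instruction in rename(program):
    print(instruction)
```

After renaming, the second instruction still depends on T0 (the true RAW dependence), but no two instructions write the same name and no instruction writes a name that an earlier one reads, so the WAW and WAR hazards are gone.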
With re-order buffer: