- Classification of parallel processor systems based on Flynn's taxonomy
- Single instruction, single data stream
-
- SISD
-
- Single processor
- Single instruction stream
- Data stored in single memory
- Uni-processor
-
- Pipeline
- Single instruction, multiple data stream
-
- SIMD
-
- Multiple instruction, single data stream
-
- MISD
-
- Never commercially implemented
- Multiple instruction, multiple data stream
-
- MIMD
-
- Organizational classification of a multiprocessor system
- Time-shared or common bus: simplest (SMP)
-
- Pros
-
- Simplicity
- Flexibility
- Reliability: failure of a device should not cause failure of the whole system
- Cons
-
- Performance limited by bus cycle time
- Each processor should have a local cache
- This leads to cache coherence problems: solved in hardware (protocol)
- Multiport memory
-
- Direct, independent access of memory modules by each processor or I/O module
- Logic required to resolve conflicts
-
- Priority assigned to each port
- Little or no modification to processors or I/O modules required
- Pros
-
- Better performance
- Can configure portions of memory as private to one or more processors
- Cons
-
- More complex control
- A write-through cache policy should be used for cache control
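The conflict-resolution logic mentioned above can be sketched as a fixed-priority arbiter: when several ports request the same memory module in one cycle, the highest-priority (here, lowest-numbered) port wins and the rest must retry. This is an illustrative sketch, not a real hardware description; the function and port-numbering scheme are assumptions.

```python
# Hypothetical sketch of fixed-priority arbitration for a multiport memory:
# each port may request one memory module per cycle; if several ports
# request the same module, the lowest-numbered (highest-priority) port wins.

def arbitrate(requests):
    """requests: dict mapping port id -> requested module id.
    Returns dict mapping module id -> winning port id."""
    winners = {}
    for port in sorted(requests):      # lower port id = higher priority
        module = requests[port]
        if module not in winners:      # first (highest-priority) requester wins
            winners[module] = port
    return winners

# Ports 0 and 2 both want module 5; port 0 wins, port 2 must retry.
print(arbitrate({0: 5, 1: 7, 2: 5}))   # {5: 0, 7: 1}
```

A real arbiter might instead rotate priorities (round-robin) to avoid starving low-priority ports.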
- Central control unit
-
- Pros
-
- Flexibility
- Simplicity of interface
- Cons
-
- Structure is complex -> the control unit can become a bottleneck
- Interconnect networks
-
- A network in which switch components are interconnected according to a certain topology and control mode
-
- Through such a network, many processors or functional units can be connected; the interconnection network is a key element in computer performance
- Types of Interconnection Networks
-
- Static interconnection networks
- Dynamic interconnection networks
- SMP: Symmetric multiprocessor
- Processors share a single memory or pool of memory
- Memory is accessed by means of a shared bus or other interconnection mechanism
- Memory access time to a given region of memory is approximately the same for each processor
- Processors share the same memory and I/O
- Pros:
-
- Greater performance: parallel work
- Availability/Fault-tolerance: failure of a single processor does not halt the system
- Incremental growth: add additional processors
- Scalable
- Existence of multiple processors is transparent to users
- NUMA: Non-uniform memory access
- Access time to different regions of memory may differ in a NUMA system
- CC-NUMA: cache-coherent NUMA
- A NUMA system without cache coherence is more or less equivalent to a cluster
- With an SMP system, as the number of processors increases, the bus may become a performance bottleneck
-
- The bus traffic increases
- Cache coherence signals further add to the burden
- So the number of processors is not infinitely scalable
-
- Typically 16~64 processors in an SMP
- With a cluster, each node has its own private main memory; applications do not see a large global memory, which limits the maximum achievable performance
- NUMA can compensate for both of the above limitations
- Cache coherence
-
- Needs a directory-based protocol
- If shared data is modified in a cache, this fact can be broadcast to the other nodes
- Pros
-
- Higher levels of parallelism than SMP, without major software changes
- The network traffic is limited, because remote accesses are not excessive
- Cons
-
- Needs a new OS to support it
- Availability concerns
- Clusters
- Collections of independent uniprocessors or SMPs
- Interconnected to form a cluster
- Communication via fixed path or network connections
- Pros
-
- Absolute scalability
- Incremental scalability
- High availability
- Superior price/performance
- Lightweight clusters
-
- Passive standby: the primary periodically sends a heartbeat to the standby
- Active secondary
- Cache coherence
- Software solutions
-
- Compiler and operating system-based
-
- The compiler analyzes source code to mark data that cannot be placed in a local cache
- The compiler may insert additional instructions to enforce cache coherence during critical periods
- Pros
-
- Overhead transferred from run time to compile time
- Design complexity transferred from hardware to software
- Cons
-
- Inefficient cache utilization
- Not transparent to compiler designers and some programmers
- Hardware solution
-
- Cache coherence protocols
-
- Dynamic recognition of potential problems at run time
- More efficient use of cache
- Transparent to programmers
- Two categories
-
- Directory protocols
-
- Collect and maintain global information about copies of data in local caches
- Directory stored in main memory
- When a processor writes to its local cache, the directory is checked to determine whether the line is present in other local caches
-
- If not, the processor simply writes its cache
- Otherwise, the controller informs all processors holding the line to invalidate their copies; after all ACKs are received, the requesting processor writes its cache
- Thereafter, if another processor tries to read that line, it misses and notifies the controller; the controller commands the processor holding the line to write it back to main memory
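The directory write flow above can be sketched in a few lines. This is an illustrative model, not a real protocol implementation: the `Directory` class, its methods, and the flat `holders` map are all assumptions, and invalidation/ACK messaging is collapsed into direct set updates.

```python
# Illustrative sketch of directory-based invalidation: the directory
# (kept in main memory) tracks which caches hold each line; on a write,
# every other holder is invalidated before the writer proceeds.

class Directory:
    def __init__(self):
        self.holders = {}   # line address -> set of cache ids holding it

    def read(self, cache_id, line):
        # A read miss fetches the line and records the cache as a holder.
        self.holders.setdefault(line, set()).add(cache_id)

    def write(self, cache_id, line):
        # Invalidate every other holder (send invalidate, await ACK),
        # then record the writer as the sole holder.
        others = self.holders.get(line, set()) - {cache_id}
        for other in others:
            self.holders[line].discard(other)
        self.holders.setdefault(line, set()).add(cache_id)
        return sorted(others)   # caches whose copies were invalidated

d = Directory()
d.read(0, 0x100)
d.read(1, 0x100)
print(d.write(0, 0x100))   # [1]: cache 1's copy was invalidated
print(d.holders[0x100])    # {0}: only the writer still holds the line
```

The write-back step described above (a later read miss forcing the modified line back to memory) would be an extra message exchange layered on the same holder bookkeeping.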
-
- Cache controller
-
- Central controller: effective in large-scale systems with complex interconnection schemes
-
- One directory stores all cache coherence information
- Distributed controller: complex
-
- Each cache has its own directory
- Snoopy protocols
-
- Updates announced to other caches by broadcast
- MESI: modified, exclusive, shared, invalid (uses 2 status bits per cache line tag)
- Designed to support cache consistency
-
- Multiprocessor
- Multi-level cache
- Status
-
- Modified:
-
- The line has been modified and is available only in this cache
- Exclusive:
-
- The line is the same as that in main memory and is not present in any other cache
- Shared:
-
- The line may also be present in other caches and is the same as in main memory
- Invalid:
-
- The line does not contain valid data; the access must go to main memory
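The four MESI states above can be summarized as a transition table for a single cache line. This is a simplified sketch covering only the common events (the function name, event names, and the assumption that a read miss with no other sharers loads in Exclusive state are all illustrative; a full protocol also distinguishes bus transactions and write-backs):

```python
# Simplified MESI state machine for one cache line, as seen by one cache.
# Events: local_read / local_write (this cache), remote_read / remote_write
# (snooped from the bus). Unlisted (state, event) pairs leave the state unchanged.

def mesi_next(state, event):
    table = {
        ('I', 'local_read'):   'E',  # no other copy found -> Exclusive (else Shared)
        ('I', 'local_write'):  'M',
        ('E', 'local_write'):  'M',  # silent upgrade: line already exclusive
        ('E', 'remote_read'):  'S',
        ('M', 'remote_read'):  'S',  # write back to memory, then share
        ('S', 'local_write'):  'M',  # broadcast invalidate to other copies first
        ('M', 'remote_write'): 'I',
        ('E', 'remote_write'): 'I',
        ('S', 'remote_write'): 'I',
    }
    return table.get((state, event), state)

state = 'I'
for ev in ['local_read', 'local_write', 'remote_read']:
    state = mesi_next(state, ev)
print(state)   # 'S': a modified line becomes shared after a remote read
```

Note how the table realizes the invariants in the list above: at most one cache can be in M or E for a line, and any remote write drives every other copy to I.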
- Vector computation
Chapter 18 Parallel Processing