Chapter 18 Parallel Processing

  1. Classification of parallel processor systems: Flynn's taxonomy
    • Single instruction, single data stream (SISD)
      • Single processor
      • Single instruction stream
      • Data stored in single memory
      • Uni-processor
        • Pipeline
    • Single instruction, multiple data stream (SIMD)
    • Multiple instruction, single data stream (MISD)
      • Never implemented
    • Multiple instruction, multiple data stream (MIMD)
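The SIMD category can be illustrated with a short pure-Python sketch: one instruction stream driving many processing elements in lockstep, each holding its own data item (the function name is illustrative, not from any real API):

```python
# Simulate SIMD: a single instruction applied in lockstep to many data
# elements.  Each list element stands in for one processing element's data.

def simd_execute(instruction, data):
    """Apply one instruction (a function) to every element of the data stream."""
    return [instruction(x) for x in data]

# One instruction (multiply by 2) over four processing elements.
result = simd_execute(lambda x: x * 2, [1, 2, 3, 4])
print(result)  # [2, 4, 6, 8]
```

Vector and array processors fall into this category: the hardware issues the multiply once, and all lanes execute it on their own operands.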
  2. Organizational classification of multiprocessor systems
    • Time-shared or common bus: the simplest organization (used in SMPs)
      • Pros
        • Simplicity
        • Flexibility
        • Reliability: failure of a single device should not cause failure of the whole system
      • Cons
        • Performance limited by bus cycle time
        • Each processor needs a local cache to reduce bus traffic
        • This leads to cache-coherence problems, usually solved in hardware by a coherence protocol
    • Multiport memory
      • Direct, independent access of memory modules by each processor or I/O module
      • Logic required to resolve access conflicts
        • e.g., by assigning fixed priorities to the ports
      • Little or no modification to processors or I/O modules required
      • Pros
        • Better performance
        • Can configure portions of memory as private to one or more processors
      • Cons
        • More complex control
        • A write-through cache policy should be used, so that main memory stays up to date for the other ports
    • Central control unit
      • Pros
        • Flexibility
        • Simplicity of interface
      • Cons
        • The control unit is complex and can become a performance bottleneck
    • Interconnect networks
      • A network in which switching elements are interconnected according to a certain topology and control mode
        • This allows many processors or functional units to be connected; the interconnect is a key element in overall performance
      • Types of Interconnection Networks
        • Static interconnection networks
        • Dynamic interconnection networks
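The write-through policy mentioned under multiport memory can be sketched with a toy model: every write updates both the local cache line and the shared memory, so memory never holds stale data (the class and its fields are illustrative):

```python
# Minimal write-through cache model: each write updates both the cache
# line and main memory, so memory always holds the latest value and other
# ports/processors never read a stale location from memory.

class WriteThroughCache:
    def __init__(self, memory):
        self.memory = memory      # shared backing store (dict: address -> value)
        self.lines = {}           # this processor's local cache lines

    def read(self, addr):
        if addr not in self.lines:        # miss: fill the line from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value          # update the cache line...
        self.memory[addr] = value         # ...and main memory immediately

memory = {0x10: 1}
cache = WriteThroughCache(memory)
cache.write(0x10, 42)
print(memory[0x10])  # 42: memory was updated on the write, not later
```

The cost of this simplicity is that every write generates memory traffic, which is exactly why it suits the multiport organization, where other ports read memory directly.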
  3. SMP: Symmetric multiprocessor
    • Share single memory or pool of memory
    • By means of shared bus or other interconnection mechanism to access memory
    • Memory access time to given area of memory is approximately the same for each processor
    • Processors share the same memory and I/O
    • Pros
      • Greater performance: parallel work
      • Availability/Fault-tolerance: failure of a single processor does not halt the system
      • Incremental growth: the system grows by adding processors
      • Scalable
      • Existence of multiple processors is transparent to users
  4. NUMA: Non-uniform memory access
    • Access time to different regions of memory may differ in a NUMA
    • CC-NUMA: cache-coherent NUMA
    • A NUMA system without cache coherence is more or less equivalent to a cluster
    • With a SMP system, as the number of processors increases, the bus may become a performance bottleneck
      • The bus traffic increases
      • Cache coherence signals further add to the burden
      • So, processors are not infinitely scalable
        • Typically 16–64 processors in an SMP
    • With a cluster, each node has its own private main memory; applications do not see a large global memory, which limits the performance they can achieve
    • NUMA can compensate for both of these limitations
    • Cache coherence
      • Need a protocol based on directory
      • If a modification to shared data is done in a cache, this fact can be broadcast to other nodes
    • Pros
      • Higher levels of parallelism than SMP, without major software changes
      • The network traffic is limited, because remote accesses are not excessive
    • Cons
      • Requires operating-system support for the non-uniform memory
      • Availability is harder to guarantee than in a cluster
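The non-uniform access times that give NUMA its name can be shown with a toy cost model (the latency numbers and node layout are purely illustrative, not from any real machine):

```python
# Toy NUMA model: an access to the CPU's own node is cheap; an access that
# must cross the interconnect to a remote node's memory costs much more.

LOCAL_LATENCY = 10    # illustrative cost (cycles) of a local memory access
REMOTE_LATENCY = 100  # illustrative cost of a remote access over the interconnect

def access_cost(cpu_node, memory_node):
    """Cost for a CPU on cpu_node to touch memory homed on memory_node."""
    return LOCAL_LATENCY if cpu_node == memory_node else REMOTE_LATENCY

print(access_cost(0, 0))  # 10  -- local access
print(access_cost(0, 1))  # 100 -- remote access: hence "non-uniform"
```

This is why the pros above hinge on remote accesses not being excessive: software and the OS try to place data on the node of the CPU that uses it.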
  5. Clusters
    • Collections of independent uniprocessors or SMPs
    • Interconnected to form a cluster
    • Communication via fixed path or network connections
    • Pros
      • Absolute scalability
      • Incremental scalability
      • High availability
      • Superior price/performance
    • Lightweight clustering methods
      • Passive standby: the primary periodically sends heartbeat messages to the standby, which takes over if the primary fails
      • Active secondary: the secondary is also used for processing work
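The passive-standby heartbeat scheme can be sketched as follows (the timeout value and class/method names are illustrative; time is a simulated tick counter rather than a real clock):

```python
# Passive standby: the primary heartbeats periodically; the standby promotes
# itself if no heartbeat arrives within a timeout window.

HEARTBEAT_TIMEOUT = 3  # illustrative: ticks of silence before failover

class Standby:
    def __init__(self):
        self.last_heartbeat = 0
        self.active = False            # becomes True after failover

    def receive_heartbeat(self, now):
        self.last_heartbeat = now      # primary is alive at this tick

    def check(self, now):
        """Promote the standby if the primary has been silent too long."""
        if now - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            self.active = True
        return self.active

standby = Standby()
standby.receive_heartbeat(now=1)
print(standby.check(now=2))   # False: heartbeat is recent, primary alive
print(standby.check(now=10))  # True: heartbeat lost, standby takes over
```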
  6. Cache coherence
    • Software solutions
      • Compiler and operating system-based
        • The compiler analyzes the source code to mark data items that must not be placed in a local cache
        • The compiler may insert additional instructions to enforce cache coherence during critical periods
      • Pros
        • Overhead transferred from run time to compile time
        • Design complexity transferred from hardware to software
      • Cons
        • Inefficient cache utilization
        • Not transparent: the burden falls on the compiler and sometimes the programmer
    • Hardware solution
      • Cache coherence protocols
          • Dynamic recognition of potential problems at run time
        • More efficient use of cache
        • Transparent to programmers
          • Two categories
          • Directory protocols
            • Collect and maintain global information about copies of data in local caches
            • Directory stored in main memory
            • When a processor writes to its local cache, the directory is checked to determine whether the line is present in other local caches
              • If not, the processor simply writes its cache
              • Otherwise, the controller tells every processor holding the line to invalidate its copy; after all ACKs are received, the requesting processor performs the write
              • Thereafter, if another processor tries to read that line, it sends a miss notification to the controller, which commands the processor holding the line to write it back to main memory
                • Cache controller
                  • Central controller: effective in large scale systems with complex interconnection schemes
                    • One directory stores all cache coherence information
                  • Distributed controller: complex
                    • Each cache keeps its own directory
                    • Snoopy protocol
          • Snoopy protocols
            • Updates announced to other caches by broadcast
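The directory-based invalidation sequence described above can be sketched with a toy model (a single central directory; the class and method names are illustrative):

```python
# Toy directory protocol: a central directory records which caches hold each
# line.  On a write, every other holder must be told to invalidate its copy
# before the writer proceeds.

class Directory:
    def __init__(self):
        self.holders = {}  # line address -> set of cache ids holding the line

    def read(self, cache_id, addr):
        """Record that cache_id now holds a copy of the line."""
        self.holders.setdefault(addr, set()).add(cache_id)

    def write(self, cache_id, addr):
        """Invalidate all other holders; record the writer as sole holder."""
        invalidated = self.holders.get(addr, set()) - {cache_id}
        self.holders[addr] = {cache_id}
        return invalidated  # the caches that must invalidate their copies

directory = Directory()
directory.read(cache_id=0, addr=0x40)  # caches 0 and 1 both read the line
directory.read(cache_id=1, addr=0x40)
print(directory.write(cache_id=0, addr=0x40))  # {1}: cache 1 must invalidate
```

In a real system the directory also waits for the ACKs before letting the write complete; that step is omitted here for brevity.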
  7. MESI: modified, exclusive, shared, invalid (uses two status bits per cache-line tag)
    • Designed for supporting cache consistency
      • Multiprocessor
      • Multi-level cache
    • Status
      • Modified:
        • The line has been modified (differs from main memory) and is present only in this cache
      • Exclusive:
        • The line is the same as that in main memory and is not present in any other cache
      • Shared:
        • The line is the same as that in main memory and may also be present in other caches
      • Invalid:
        • The line does not contain valid data; an access must go to main memory
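A minimal sketch of the four MESI states from one cache's point of view, using a handful of representative events (the event names are illustrative; a full protocol also models the bus transactions and the other caches' reactions):

```python
# Simplified MESI transitions for a single cache line, seen from one cache.
# Only a few representative local/remote events are modeled.

TRANSITIONS = {
    ("I", "local_read_miss_shared"): "S",  # another cache also holds the line
    ("I", "local_read_miss_alone"):  "E",  # no other cache holds the line
    ("E", "local_write"):            "M",  # silent upgrade: no bus traffic needed
    ("S", "local_write"):            "M",  # must first invalidate other copies
    ("M", "remote_read"):            "S",  # write back to memory, then share
    ("S", "remote_write"):           "I",  # another cache writes: invalidate
    ("E", "remote_write"):           "I",
}

def next_state(state, event):
    """Return the new state, or stay put for events not modeled here."""
    return TRANSITIONS.get((state, event), state)

state = "I"
for event in ["local_read_miss_alone", "local_write", "remote_read"]:
    state = next_state(state, event)
print(state)  # S: the line went I -> E -> M, then was shared on a remote read
```

Note how a write in state E costs nothing on the bus, while the same write in state S must announce itself so other copies can be invalidated; this distinction is the reason MESI separates the two states.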
  8. Vector computation
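Vector computation means applying a single arithmetic operation across entire vectors of operands; a vector processor does this with one vector instruction, where a scalar machine needs a loop. A minimal pure-Python sketch (the loop here stands in for the hardware):

```python
# Vector computation: one operation applied element-by-element to whole
# vectors -- C(i) = A(i) + B(i) for all i in a single conceptual step.

def vector_add(a, b):
    """Element-wise addition of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

print(vector_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [11.0, 22.0, 33.0]
```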