NUMA Collections

Simple intro:

wiki entry

Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory (memory local to another processor or memory shared between processors). The benefits of NUMA are limited to particular workloads, notably on servers where the data is often associated strongly with certain tasks or users.

A more detailed intro: 

Understanding NUMA Architecture

Hardware and Software (Linux OS) perspective on NUMA:

​​​​​What is NUMA? — The Linux Kernel documentation

 ScienceDirect Collection (see UMA and NUMA):

https://www.sciencedirect.com/topics/computer-science/non-uniform-memory-acc

2.5.1 UMA and NUMA

UMA and NUMA stand for uniform memory access and non-uniform memory access, respectively. The terms are most often used in the context of higher order memory systems and not specifically cache systems, but they are useful designations in the discussion of distributed/partitioned caches because UMA/NUMA systems illustrate the main issues of dealing with physically disjoint memories. A NUMA architecture is one in which all memory accesses have nominally the same latency; a NUMA architecture is one in which references to different addresses may have different latencies due to the nature of where the data is stored. The implication behind the deceptively simple “where the data is stored” is that an item's identity is tied to its location. For instance, in a solid-state setting, an item's physical address determines the memory partition in which it can be found. As an example, in a system of 16 memory nodes, the physical memory space could be divided into 16 regions, and the top 4 bits of the address would determine which node owns the data in question.

Figure 2.14(a) shows an organization in which a CPU's requests all go to the same memory system (which could be a cache, the DRAM system, or a disk; the details do not matter for purposes of this discussion). Thus, all memory requests have nominally the same latency—any observed variations in latency will be due to the specifics of the memory technology and not the memory organization. This organization is typical of shared cache organizations (e.g., the last-level cache of a chip-level multiprocessor) and symmetric multiprocessors.

Figure 2.14(b) shows an organization in which a CPU's memory request may be satisfied by one of many possible sub-memories. In particular, a request to any non-local memory (any memory not directly attached to the CPU making the request) is likely to exhibit a higher latency than a request to a local memory. This organization is used in many commercial settings, such as those based on AMD Opteron or Alpha 21364 (EV7) processors. The ccNUMA variant (for cache-coherent NUMA) is one in which each processor or processor cluster maintains a cache or cache system that caches requests to reduce internode traffic and average latency to non-local data.

The concepts embodied in the terms UMA and NUMA extend to cache systems and memory systems in general because the issues are those of data partitioning and placement, cost/benefit analysis, etc., which are universal. The Fully Buffered DIMM is a prime example of a non-uniform memory architecture which can be configured to behave as a uniform architecture by forcing all DIMMs in the system to stall and emulate the latency of the slowest DIMM in the system. For more detail, see Chapter 14, “The Fully Buffered DIMM Memory System.”

A deeper delve into the workload based performance of NUMA system

https://www.cse.wustl.edu/~angelee/cse539/papers/numa.pdf

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值