In a computer system, memory management is critical to overall system performance.

This article takes an in-depth look at RabbitMQ's memory use: how to monitor and analyse node memory use, a detailed breakdown of memory by category (connections, queues, message storage, and so on), and how to obtain that breakdown via CLI tools, the management UI, and the HTTP API.

Cache Hit Rate Evaluation Methods and Their Impact on System Performance

1. Definition and Importance of Cache Hit Rate

The cache hit rate is the proportion of requests whose data can be found in the cache. A high hit rate means most data accesses are served from fast storage, reducing latency and improving overall efficiency.
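Formally, hit rate = hits / (hits + misses); for example, 9,500 cache hits out of 10,000 lookups give a hit rate of 95%.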

The way the hit rate is calculated can differ across cache types:

  • Local cache: usually lives inside the application server and speeds up frequently read small objects or query results (see the lru_cache sketch after this list).

  • Distributed cache: deployed across multiple nodes, providing consistency and scalability in large cluster environments while staying efficient.

  • Database-level cache: integrated directly into a relational database management system (RDBMS) to optimise SQL execution plans and table scans.
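
For a local, in-process cache, Python's functools.lru_cache is a convenient illustration because it tracks hits and misses for you (a minimal sketch; lookup_user_profile is a hypothetical stand-in for any expensive call):

from functools import lru_cache

@lru_cache(maxsize=1024)            # in-process (local) cache for a pure function
def lookup_user_profile(user_id):
    # Stand-in for an expensive read (database query, remote call, ...)
    return {"id": user_id}

for user_id in [1, 2, 1, 3, 1]:
    lookup_user_profile(user_id)

info = lookup_user_profile.cache_info()
print(info)                                                        # hits=2, misses=3
print(f"hit rate: {info.hits / (info.hits + info.misses):.0%}")    # hit rate: 40%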

2. Measuring the Cache Hit Rate

To measure the effectiveness of different caching strategies accurately, statistics can be gathered in several ways:

  • Use built-in monitoring tools to collect information about memory use, disk I/O activity, and other resource consumption;

  • Configure logging to record the outcome of every cache lookup (the number of hits versus misses) for later analysis;

  • Implement custom counters at the application level to count, for a given time window, the total number of requests and how many of them were successfully served from the cache.

The following minimal Python sketch implements such a counter:

class CacheMetrics:
    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record_hit(self):
        self.hits += 1

    def record_miss(self):
        self.misses += 1

    @property
    def hit_rate(self):
        total_requests = self.hits + self.misses
        if not total_requests:
            return None  # no lookups recorded yet
        return self.hits / total_requests

metrics = CacheMetrics()
# Simulate cache operations...
metrics.record_hit()
metrics.record_hit()
metrics.record_miss()
if metrics.hit_rate is not None:
    print(f"Cache Hit Rate: {metrics.hit_rate:.2%}")  # Cache Hit Rate: 66.67%

3. Impact on System Performance

When it comes to how the cache hit rate affects system behaviour, the impact shows up mainly in the following areas:

  • Lower average response time: a higher hit rate reduces the need to go to external dependencies (such as remote API calls or retrieval from persistent storage), shortening the processing cycle.

  • Reduced backend load: expensive repeated computations are avoided, which is especially noticeable under large numbers of concurrent connections.

Note, however, that a poorly configured cache can create problems of its own: it may consume too much of the limited RAM and drive up GC frequency, or, without a proper invalidation policy, stale entries may linger and undermine the accuracy of decisions based on them.

Principles of Memory Use in Computer Systems

In a computer system, memory management is critical to overall performance. In the case of deep learning model inference services, inference mostly involves smaller convolution operations (e.g. 1x1, 3x3) compared with training jobs and therefore consumes fewer GPU resources.

Core Principles of Memory Use
  • Principle of locality: programs tend to repeatedly access the same memory locations or nearby ones. This behaviour takes two forms, temporal locality and spatial locality (see the sketch after this list).

  • Virtual address mapping: modern operating systems use page tables to translate between virtual and physical addresses, letting an application believe it owns a large contiguous block of memory even though the underlying pages may be scattered across different physical locations.

  • Cache hierarchy: to improve data-read efficiency, several levels of cache sit between the CPU and main memory. The L1 cache is closest to the processor core, smallest in capacity, and fastest; as the level increases, latency grows but so does capacity.
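
As a rough illustration of spatial locality (a sketch assuming NumPy is available; exact timings vary by machine), the snippet below copies the same row-major array twice, once in its natural memory order and once in column-major order, which forces a strided access pattern:

import time
import numpy as np

a = np.random.rand(4000, 4000)   # C-ordered (row-major) by default

start = time.perf_counter()
a.copy(order="C")                # walks memory sequentially
row_major_time = time.perf_counter() - start

start = time.perf_counter()
a.copy(order="F")                # strided access: poor spatial locality
column_major_time = time.perf_counter() - start

print(f"row-major copy: {row_major_time:.3f}s, column-major copy: {column_major_time:.3f}s")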

Common Memory Optimisation Strategies
  • Request batching: a common practice is to batch multiple inference requests and execute them together in order to improve GPU utilisation. This reduces the per-call overhead and makes fuller use of the hardware.

  • Compression: applying a suitable algorithm reduces the amount of data to be processed; for example, JPEG encoding, widely used in image recognition, can drastically shrink file sizes without a noticeable loss of visual quality.

  • Efficient data structure design: a well-planned internal representation saves space. For example, when building a decision tree, packing the key bits used for rule classification into a small, compact structure that can be fetched with a single memory access reduces the total number of bytes required and speeds up lookups.

The following sketch shows a trivial form of the compression idea above, bit-packing a boolean mask with NumPy:

import numpy as np

def compress_data(data_array):
    """Toy example: bit-pack the non-zero mask of an array (one bit per element)."""
    compressed = np.packbits(data_array > 0)
    return compressed

# e.g. 1,000 integer flags (8,000 bytes as int64) pack into 125 bytes
print(compress_data(np.random.randint(0, 2, size=1000)).nbytes)  # -> 125

Operators need to be able to reason about node’s memory use, both absolute and relative (“what uses most memory”). This is an important aspect of system monitoring.

RabbitMQ provides tools that report and help analyse node memory use:

• rabbitmq-diagnostics memory_breakdown
• rabbitmq-diagnostics status includes the above breakdown as a section
• Prometheus and Grafana-based monitoring makes it possible to observe memory breakdown over time
• Management UI provides the same breakdown on the node page as rabbitmq-diagnostics status
• HTTP API provides the same information as the management UI, useful for monitoring
• rabbitmq-top and rabbitmq-diagnostics observer provide a more fine-grained, top-like per-Erlang-process view

Obtaining a node memory breakdown should be the first step when reasoning about node memory use.

Note that all measurements are somewhat approximate, based on values returned by the underlying runtime or the kernel at a specific point in time, usually within a 5-second time window.

Total Memory Use Calculation Strategies

RabbitMQ can use different strategies to compute how much memory a node uses. Historically, nodes obtained this information from the runtime, reporting how much memory is used (not just allocated). This strategy, known as legacy (an alias for erlang), tends to underreport and is not recommended.

The effective strategy is configured using the vm_memory_calculation_strategy key. There are two primary options:

rss uses OS-specific means of querying the kernel to find the RSS (Resident Set Size) value of the node's OS process. This strategy is the most precise and is used by default on Linux, macOS, BSD and Solaris systems. When this strategy is used, RabbitMQ runs short-lived subprocesses once a second.

allocated is a strategy that queries runtime memory allocator information. It is usually quite close to the values reported by the rss strategy. This strategy is used by default on Windows.

The vm_memory_calculation_strategy setting also impacts memory breakdown reporting. If set to legacy (erlang) or allocated, some memory breakdown fields will not be reported. This is covered in more detail further in this guide.

The following configuration example uses the rss strategy:

vm_memory_calculation_strategy = rss

Similarly, for the allocated strategy, use:

vm_memory_calculation_strategy = allocated

To find out what strategy a node uses, see its effective configuration.
Memory Use Breakdown
How Memory Breakdown Works

Memory use breakdown reports allocated memory distribution on the target node, by category:

• Connections (further split into four categories: readers, writers, channels, other)
• Quorum queue replicas
• Classic queue master replicas
• Classic queue mirror replicas
• Message Store and Indices
• Binary heap references
• Node-local metrics (stats database)
• Internal database tables
• Plugins
• Memory allocated but not yet used
• Code (bytecode, module metadata)
• ETS (in memory key/value store) tables
• Atom tables
• Other

Generally there is no overlap between the categories (no double accounting for the same memory). Plugins and runtime versions may affect this.
Producing Memory Use Breakdown Using CLI Tools

A common way of producing memory breakdown is via rabbitmq-diagnostics memory_breakdown:
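
For example (a minimal invocation; the -q flag only suppresses informational output, and unit-related options vary by version):

rabbitmq-diagnostics -q memory_breakdown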

quorum_queue_procs: 0.4181 gb (28.8%)
binary: 0.4129 gb (28.44%)
allocated_unused: 0.1959 gb (13.49%)
connection_other: 0.1894 gb (13.05%)
plugins: 0.0373 gb (2.57%)
other_proc: 0.0325 gb (2.24%)
code: 0.0305 gb (2.1%)
quorum_ets: 0.0303 gb (2.09%)
connection_readers: 0.0222 gb (1.53%)
other_system: 0.0209 gb (1.44%)
connection_channels: 0.017 gb (1.17%)
mgmt_db: 0.017 gb (1.17%)
metrics: 0.0109 gb (0.75%)
other_ets: 0.0073 gb (0.5%)
connection_writers: 0.007 gb (0.48%)
atom: 0.0015 gb (0.11%)
mnesia: 0.0006 gb (0.04%)
msg_index: 0.0002 gb (0.01%)
queue_procs: 0.0002 gb (0.01%)
queue_slave_procs: 0.0 gb (0.0%)
reserved_unallocated: 0.0 gb (0.0%)

The report fields, their categories, and what they cover:

• total: Total amount as reported by the effective memory calculation strategy (see above)
• connection_readers (Connections): Processes responsible for connection parsing and most of the connection state. Most of their memory is attributable to TCP buffers. The more client connections a node has, the more memory will be used by this category. See the Networking guide for more information.
• connection_writers (Connections): Processes responsible for serialisation of outgoing protocol frames and writing to client connection sockets. The more client connections a node has, the more memory will be used by this category. See the Networking guide for more information.
• connection_channels (Channels): The more channels client connections use, the more memory will be used by this category.
• connection_other (Connections): Other memory related to client connections.
• quorum_queue_procs (Queues): Quorum queue processes, both currently elected leaders and followers. Memory footprint can be capped on a per-queue basis. See the Quorum Queues guide for more information.
• queue_procs (Queues): Classic queue masters, indices and messages kept in memory. The greater the number of messages enqueued, the more memory will generally be attributed to this section. However, this greatly depends on queue properties and whether messages were published as transient. See the Memory, Queues, and Lazy Queues guides for more information.
• queue_slave_procs (Queues): Classic queue mirrors, indices and messages kept in memory. Reducing the number of mirrors (replicas) or not mirroring queues with inherently transient data can reduce the amount of RAM used by mirrors. The greater the number of messages enqueued, the more memory will generally be attributed to this section. However, this greatly depends on queue properties and whether messages were published as transient. See the Memory, Queues, Mirroring, and Lazy Queues guides for more information.
• metrics (Stats DB): Node-local metrics. The more connections, channels, and queues a node hosts, the more stats there are to collect and keep. See the management plugin guide for more information.
• stats_db (Stats DB): Aggregated and pre-computed metrics, inter-node HTTP API request cache and everything else related to the stats DB. See the management plugin guide for more information.
• binary (Binaries): Runtime binary heap. Most of this section is usually message bodies and properties (metadata).
• plugins (Plugins): Plugins such as Shovel, Federation, or protocol implementations such as STOMP can accumulate messages in memory.
• allocated_unused (Preallocated Memory): Allocated by the runtime but not yet used.
• reserved_unallocated (Preallocated Memory): Allocated/reserved by the kernel but not by the runtime.
• mnesia (Internal Database): Virtual hosts, users, permissions, queue metadata and state, exchanges, bindings, runtime parameters and so on.
• quorum_ets (Internal Database): Raft implementation's WAL and other memory tables. Most of these are periodically moved to disk.
• other_ets (Internal Database): Some plugins can use ETS tables to store their state.
• code (Code): Bytecode and module metadata. This should only consume a double-digit percentage of memory on blank/empty nodes.
• other (Other): All other processes that RabbitMQ cannot categorise.
Producing Memory Use Breakdown Using Management UI

Management UI can be used to produce a memory breakdown chart. This information is available on the node metrics page that can be accessed from Overview:

[Image: Cluster node list in management UI]

On the node metrics page, scroll down to the memory breakdown buttons:

[Image: Node memory use breakdown buttons]

Memory and binary heap breakdowns can be expensive to calculate and are produced on demand when the Update button is pressed:

[Image: Node memory use breakdown chart]

It is also possible to display a breakdown of binary heap use by various things in the system (e.g. connections, queues):

[Image: Binary heap use breakdown chart]
Producing Memory Use Breakdown Using HTTP API and curl

It is possible to produce memory use breakdown over HTTP API by issuing a GET request to the /api/nodes/{node}/memory endpoint.

curl -s -u guest:guest http://127.0.0.1:15672/api/nodes/rabbit@mercurio/memory | python -m json.tool

{
    "memory": {
        "atom": 1041593,
        "binary": 5133776,
        "code": 25299059,
        "connection_channels": 1823320,
        "connection_other": 150168,
        "connection_readers": 83760,
        "connection_writers": 113112,
        "metrics": 217816,
        "mgmt_db": 266560,
        "mnesia": 93344,
        "msg_index": 48880,
        "other_ets": 2294184,
        "other_proc": 27131728,
        "other_system": 21496756,
        "plugins": 3103424,
        "queue_procs": 2957624,
        "queue_slave_procs": 0,
        "total": 89870336
    }
}

It is also possible to retrieve a relative breakdown using a GET request to the /api/nodes/{node}/memory/relative endpoint. Note that reported relative values are rounded to integers. This endpoint is intended to be used for relative comparison (identifying top contributing categories), not precise calculations.

curl -s -u guest:guest http://127.0.0.1:15672/api/nodes/rabbit@mercurio/memory/relative | python -m json.tool

{
    "memory": {
        "allocated_unused": 32,
        "atom": 1,
        "binary": 5,
        "code": 22,
        "connection_channels": 2,
        "connection_other": 1,
        "connection_readers": 1,
        "connection_writers": 1,
        "metrics": 1,
        "mgmt_db": 1,
        "mnesia": 1,
        "msg_index": 1,
        "other_ets": 2,
        "other_proc": 21,
        "other_system": 19,
        "plugins": 3,
        "queue_procs": 4,
        "queue_slave_procs": 0,
        "reserved_unallocated": 0,
        "total": 100
    }
}

Memory Breakdown Categories
Connections

This includes memory used by client connections (including Shovel and Federation links) and channels, as well as outgoing connections (Shovel and Federation upstream links). Most of the memory is usually used by TCP buffers, which on Linux autotune to about 100 kB in size by default. TCP buffer size can be reduced at the cost of a proportional decrease in connection throughput. See the Networking guide for details.
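
As a rough sketch, and assuming the rabbitmq.conf keys described in the Networking guide, buffer sizes could be lowered like this (the values here are illustrative only):

tcp_listen_options.sndbuf = 32768
tcp_listen_options.recbuf = 32768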

Channels also consume RAM. By optimising how many channels applications use, that amount can be decreased. It is possible to cap the max number of channels on a connection using the channel_max configuration setting:

channel_max = 16

Note that some libraries and tools that build on top of RabbitMQ clients may implicitly require a certain number of channels. Finding an optimal value is usually a matter of trial and error.
Queues and Messages

Memory used by queues, queue indices, queue state. Messages enqueued will in part contribute to this category.

Queues will swap their contents out to disc when under memory pressure. The exact behavior of this depends on queue properties, whether clients publish messages as persistent or transient, and persistence configuration of the node.

Message bodies do not show up here but in Binaries.
Message Store Indexes

By default the message store uses an in-memory index of all messages, including those paged out to disc. Plugins allow for replacing it with disk-based implementations.
Plugins

Memory used by plugins (apart from the Erlang client, which is counted under Connections, and the management database, which is counted separately). This category includes some per-connection memory for protocol plugins such as STOMP and MQTT, as well as messages enqueued by plugins such as Shovel and Federation.
Preallocated Memory

Memory preallocated by the runtime (VM allocators) but not yet used. This is covered in more detail below.
Internal Database

Internal database (Mnesia) tables keep an in-memory copy of all its data (even on disc nodes). Typically this will only be large when there are a large number of queues, exchanges, bindings, users or virtual hosts. Plugins can store data in the same database as well.
Management (Stats) Database

The stats database (if the management plugin is enabled). In a cluster, most stats are stored locally on the node. Cross-node requests needed to aggregate stats in a cluster can be cached. The cached data will be reported in this category.
Binaries

Memory used by shared binary data in the runtime. Most of this memory is message bodies and metadata.

With some workloads the binary data heap can be garbage collected infrequently. rabbitmqctl force_gc can be used to force collection. The following pair of commands forces collection and reports the top processes that released the most binary heap references:

rabbitmqctl eval 'recon:bin_leak(10).'

rabbitmqctl force_gc

With RabbitMQ versions that do not provide rabbitmqctl force_gc, use

rabbitmqctl eval 'recon:bin_leak(10).'

rabbitmqctl eval '[garbage_collect(P) || P <- processes()].'

Other ETS tables

Other in-memory tables besides those belonging to the stats database and internal database tables.
Code

Memory used by code (bytecode, module metadata). This section is usually fairly constant and relatively small (unless the node is entirely blank and stores no data, in which case it makes up a larger share of the total).
Atoms

Memory used by atoms. Should be fairly constant.
Per-process Analysis with rabbitmq-top

rabbitmq-top is a plugin that helps identify runtime processes (“lightweight threads”) that consume the most memory or scheduler (CPU) time.

The plugin ships with RabbitMQ. Enable it with

[sudo] rabbitmq-plugins enable rabbitmq_top

The plugin adds new administrative tabs to the management UI. One tab displays top processes by one of the metrics:

• Memory used
• Reductions (unit of scheduler/CPU consumption)
• Erlang mailbox length
• For gen_server2 processes, internal operation buffer length

[Image: Top processes in rabbitmq-top]

The second tab displays ETS (internal key/value store) tables. The tables can be sorted by the amount of memory used or the number of rows:

[Image: Top ETS tables in rabbitmq-top]
Preallocated Memory

The Erlang memory breakdown reports only the memory that is currently being used, not the memory that has been allocated for later use or reserved by the operating system. OS tools like ps can therefore report more memory used than the runtime does.

This memory consists of memory allocated but not used, as well as memory unallocated but reserved by the OS. Both values depend on the OS and Erlang VM allocator settings and can fluctuate significantly.

How the value in both sections is computed depends on the vm_memory_calculation_strategy setting. If the strategy is set to erlang, unused memory will not be reported. If the memory calculation strategy is set to allocated, memory reserved by the OS will not be reported. Therefore rss is the strategy that provides the most information from both the kernel and the runtime.

A large amount of allocated but unused memory on a long running node may be an indicator of runtime memory fragmentation. A different set of allocator settings can reduce fragmentation and increase the percentage of efficiently used memory. The right set of settings depends on the workload and message payload size distribution.

The runtime's memory allocator behavior can be tuned; please refer to the erl and erts_alloc documentation.
Memory Use Monitoring

It is recommended that production systems monitor memory usage of all cluster nodes, ideally with a breakdown, together with infrastructure-level metrics. By correlating breakdown categories with other metrics, e.g. the number of concurrent connections or enqueued messages, it becomes possible to detect problems that stem from application-specific behaviour (e.g. connection leaks or ever-growing queues without consumers).
Queue Memory
How much memory does a message use?

A message has multiple parts that use up memory:

• Payload: >= 1 byte, variable size, typically a few hundred bytes to a few hundred kilobytes
• Protocol attributes: >= 0 bytes, variable size, contains headers, priority, timestamp, reply-to, etc.
• RabbitMQ metadata: >= 720 bytes, variable size, contains exchange, routing keys, message properties, persistence, redelivery status, etc.
• RabbitMQ message ordering structure: 16 bytes

Messages with a 1KB payload will use up roughly 2KB of memory once attributes and metadata are factored in (about 1,024 bytes of payload plus at least 720 bytes of metadata, 16 bytes of ordering structure, and any protocol attributes).

Some messages can be stored on disk, but still have their metadata kept in memory.
How much memory does a queue use?

Every queue is backed by an Erlang process. If a queue is mirrored, each mirror is a separate Erlang process.

Since a queue's master is a single Erlang process, message ordering can be guaranteed. Multiple queues mean multiple Erlang processes, which each get a fair share of CPU time. This ensures that no queue can block other queues.

The memory use of a single queue can be obtained via the HTTP API:

curl -s -u guest:guest http://127.0.0.1:15672/api/queues/%2f/queue-name |
python -m json.tool

{
    "memory": 97921904,
    "message_bytes_ram": 2153429941,
    ...
}

• memory: memory used by the queue process; accounts for message metadata (at least 720 bytes per message), but not for message payloads over 64 bytes
• message_bytes_ram: memory used by the message payloads, regardless of their size

If messages are small, message metadata can use more memory than the message payload. 10,000 messages with 1 byte of payload will use 10KB of message_bytes_ram (payload) & 7MB of memory (metadata).

If message payloads are large, they will not be reflected in the queue process memory. 10,000 messages with 100 KB of payload will use 976MB of message_bytes_ram (payload) & 7MB of memory (metadata).
Why does the queue memory grow and shrink when publishing/consuming?

Erlang uses generational garbage collection for each Erlang process. Garbage collection is done per queue, independently of all other Erlang processes.

When garbage collection runs, it will copy used process memory before deallocating unused memory. This can lead to the queue process using up to twice as much memory during garbage collection, as shown here (the queue contains a lot of messages):

[Image: Queue under load memory usage]
Is queue memory growth during garbage collection a concern?

If the Erlang VM tries to allocate more memory than is available, the VM itself will either crash or be killed by the OOM killer. When the Erlang VM crashes, RabbitMQ will lose all non-persistent data.

The high memory watermark blocks publishers and prevents new messages from being enqueued. Since garbage collection can double the memory used by a queue, it is unsafe to set the high memory watermark above 0.5. The default high memory watermark is 0.4; this is safer because not all memory is used by queues. All of this is workload specific and differs across RabbitMQ deployments.
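
For reference, the default relative watermark corresponds to a rabbitmq.conf setting along these lines (shown for illustration; see the Memory Alarms guide for the authoritative configuration key):

vm_memory_high_watermark.relative = 0.4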

We recommend using many queues so that memory allocation and garbage collection are spread across many Erlang processes.

If the messages in a queue take up a lot of memory, we recommend lazy queues so that they are stored on disk as soon as possible and not kept in memory longer than is necessary.
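
As a sketch, queues matching a name pattern can be switched to lazy mode with a policy along these lines (the policy name and pattern here are placeholders; see the Lazy Queues guide):

rabbitmqctl set_policy LazyQueues "^lazy\." '{"queue-mode":"lazy"}' --apply-to queues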
Getting Help and Providing Feedback

If you have questions about the contents of this guide or any other topic related to RabbitMQ, don’t hesitate to ask them on the RabbitMQ mailing list.
Help Us Improve the Docs ❤️

If you’d like to contribute an improvement to the site, its source is available on GitHub. Simply fork the repository and submit a pull request. Thank you!