Python: Getting GPU Memory Information

二分掌柜的

Published on 2025-03-19 18:45:06

Article link: https://blog.csdn.net/flyfish1986/article/details/146372277

flyfish

Common functions

1. `torch.cuda.is_available()`

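Checks whether CUDA can be used in the current environment, that is, whether PyTorch was built with CUDA support and a usable GPU and driver are present.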

```python
import torch
if torch.cuda.is_available():
    print("CUDA is available.")
else:
    print("CUDA is not available.")
```

2. `torch.cuda.device_count()`

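Returns the number of CUDA devices visible to the current process (this respects the `CUDA_VISIBLE_DEVICES` environment variable).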

```python
import torch
device_count = torch.cuda.device_count()
print(f"Number of available CUDA devices: {device_count}")
```

3. `torch.cuda.get_device_name(device)`

Returns the name (GPU model) of the specified CUDA device.

```python
import torch
if torch.cuda.is_available():
    device_name = torch.cuda.get_device_name(0)
    print(f"Name of CUDA device 0: {device_name}")
```

4. `torch.cuda.get_device_properties(device)`

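Returns the properties of the specified CUDA device, such as `name`, `total_memory` (in bytes), the compute capability `major`/`minor`, and `multi_processor_count`.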

```python
import torch
if torch.cuda.is_available():
    properties = torch.cuda.get_device_properties(0)
    print(f"Total memory of CUDA device 0: {properties.total_memory} bytes")
```
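`total_memory` is a raw byte count, which is hard to read on modern GPUs. The sketch below converts such counts to binary units (KiB, MiB, GiB). `format_bytes` is a hypothetical helper written for illustration, not part of PyTorch; it is plain Python, so it works with any of the byte counts that `torch.cuda` returns.

```python
def format_bytes(num_bytes: int) -> str:
    """Format a byte count with binary (1024-based) units.

    Hypothetical helper for illustration; not a PyTorch API.
    """
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value drops below 1024 (or at the largest unit)
        if value < 1024 or unit == units[-1]:
            # Whole bytes are shown as an integer, larger units with two decimals
            return f"{int(value)} {unit}" if unit == "B" else f"{value:.2f} {unit}"
        value /= 1024
    return f"{num_bytes} B"  # not reached; keeps the return type explicit
```

For example, `format_bytes(torch.cuda.get_device_properties(0).total_memory)` prints the device's total memory in GiB, and a 1000 x 1000 `float32` tensor (4,000,000 bytes) is shown as `3.81 MiB`.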

5. `torch.cuda.memory_allocated(device)`

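Returns the GPU memory currently occupied by tensors on the specified device, in bytes. It only counts memory handed out by PyTorch's caching allocator in the current process.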

```python
import torch
if torch.cuda.is_available():
    allocated = torch.cuda.memory_allocated(0)
    print(f"Allocated memory on CUDA device 0: {allocated} bytes")
```

6. `torch.cuda.memory_reserved(device)`

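Returns the GPU memory currently reserved by PyTorch's caching allocator on the specified device, in bytes. This is the memory occupied by tensors plus cached blocks that are free but have not been returned to the driver, so it is always at least `memory_allocated()`. (This function was previously named `memory_cached()`.)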

```python
import torch
if torch.cuda.is_available():
    reserved = torch.cuda.memory_reserved(0)
    print(f"Reserved memory on CUDA device 0: {reserved} bytes")
```
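The gap between reserved and allocated memory is what the caching allocator keeps in reserve: free blocks that PyTorch can hand out again without another `cudaMalloc`, but which `nvidia-smi` still reports as used by the process. The sketch below splits a reserved total into those two parts, assuming both inputs come from `memory_allocated()` and `memory_reserved()` for the same device. `reserved_breakdown` is a hypothetical helper, not a PyTorch API.

```python
def reserved_breakdown(allocated: int, reserved: int) -> dict:
    """Split reserved memory into the part used by tensors and the cached free part.

    Hypothetical helper; assumes both values are byte counts for the same device.
    """
    if reserved < allocated:
        raise ValueError("reserved memory should never be smaller than allocated memory")
    cached_free = reserved - allocated
    return {
        "allocated": allocated,
        # Free blocks kept by the allocator for reuse
        "cached_free": cached_free,
        # Fraction of reserved memory that is currently idle in the cache
        "cached_fraction": cached_free / reserved if reserved else 0.0,
    }
```

Usage: `reserved_breakdown(torch.cuda.memory_allocated(0), torch.cuda.memory_reserved(0))`.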

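7. `torch.cuda.set_device(device)` (device management)

Sets the current CUDA device. On a multi-GPU machine this chooses which GPU later operations use by default, for example `.cuda()` called without an argument. The PyTorch documentation discourages this function in favor of the `torch.cuda.device` context manager and, in most cases, the `CUDA_VISIBLE_DEVICES` environment variable.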

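```python
import torch

# GPU 1 only exists on machines with at least two GPUs (indices start at 0)
if torch.cuda.device_count() > 1:
    torch.cuda.set_device(1)
    x = torch.tensor([1.0]).cuda()  # .cuda() without an argument uses the current device
    print(x.device)  # cuda:1
```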

8. `torch.cuda.current_device()` (device management)

Returns the index of the currently selected CUDA device.

```python
import torch
if torch.cuda.is_available():
    device_index = torch.cuda.current_device()
    print(f"Current CUDA device index: {device_index}")
```

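9. `torch.cuda.empty_cache()` (memory management)

Releases all unoccupied cached memory held by the caching allocator back to the driver, so that other processes (and tools such as `nvidia-smi`) see it as free. Memory still used by live tensors is not released, and the call does not increase the memory available to PyTorch itself, because cached blocks would have been reused anyway. After many allocations and frees of varying sizes, it can in some cases help reduce memory fragmentation.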

```python
import torch

# Allocate some GPU memory
x = torch.randn(1000, 1000).cuda()
del x
# Release unused cached memory back to the driver
torch.cuda.empty_cache()
```

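10. `torch.cuda.memory_stats(device)` (memory management)

Returns a dictionary of detailed statistics from the caching allocator for the specified device, including allocation and free counts and peak memory usage.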

```python
import torch

if torch.cuda.is_available():
    stats = torch.cuda.memory_stats(0)
    print(stats)
```
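Printing the whole dictionary is noisy. Its keys follow the pattern `<metric>.<pool>.<kind>`, where the pool is `all`, `large_pool` or `small_pool` and the kind is `current`, `peak`, `allocated` or `freed`; a few keys such as `num_ooms` have neither. A minimal sketch for pulling out one metric, assuming that key layout; `select_stats` is a hypothetical helper, not a PyTorch API.

```python
def select_stats(stats: dict, metric: str, pool: str = "all") -> dict:
    """Return {kind: value} for one metric/pool from a memory_stats()-style dict.

    Hypothetical helper; assumes keys of the form "<metric>.<pool>.<kind>".
    """
    prefix = f"{metric}.{pool}."
    # Keep only matching keys and strip the prefix, leaving "current", "peak", ...
    return {key[len(prefix):]: value for key, value in stats.items() if key.startswith(prefix)}
```

Usage: `select_stats(torch.cuda.memory_stats(0), "reserved_bytes")` returns the current, peak and cumulative reserved bytes across both pools.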

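11. `torch.cuda.synchronize(device=None)` (synchronization)

Blocks the calling CPU thread until all queued work on the specified CUDA device has finished. CUDA kernels are launched asynchronously, so without this call a CPU timer would only measure the kernel launch, not its execution. When measuring performance, call it before reading the timer.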

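```python
import time

import torch

x = torch.randn(1000, 1000).cuda()
y = torch.randn(1000, 1000).cuda()

# Warm-up run so one-time setup (e.g. cuBLAS initialization) is not timed
torch.matmul(x, y)
# Make sure the copies and the warm-up have finished before starting the clock
torch.cuda.synchronize()

start = time.perf_counter()
z = torch.matmul(x, y)
# Wait until the matrix multiplication has actually completed
torch.cuda.synchronize()
end = time.perf_counter()

print(f"Matrix multiplication took {end - start:.6f} seconds.")
```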
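A single measurement is noisy. The sketch below generalizes the pattern above into a reusable function: it runs a few untimed warm-up iterations, calls a caller-supplied `sync` function after the warm-up and after every timed call, and returns the median of several runs. `time_it` and its default warm-up and repeat counts are hypothetical choices for illustration, not a PyTorch API; with the default no-op `sync` it also works for plain CPU functions.

```python
import time
from statistics import median
from typing import Callable


def time_it(
    fn: Callable[[], object],
    sync: Callable[[], object] = lambda: None,
    warmup: int = 3,
    repeats: int = 10,
) -> float:
    """Return the median wall-clock time of fn() in seconds.

    Hypothetical helper; pass sync=torch.cuda.synchronize when fn launches CUDA work.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        fn()
    sync()  # drain the warm-up work before the first measurement
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        sync()  # wait for asynchronous work launched by fn to finish
        samples.append(time.perf_counter() - start)
    return median(samples)
```

Usage: `time_it(lambda: torch.matmul(x, y), sync=torch.cuda.synchronize)` returns the median time of one multiplication in seconds. The median is used instead of the mean so that an occasional slow run does not dominate the result.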

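12. `torch.cuda.Event(enable_timing=True)` (event-based timing)

`torch.cuda.Event` objects are recorded into a CUDA stream, so they mark the start and end of GPU work on the device itself. `start.elapsed_time(end)` returns the time between two recorded events in milliseconds, which measures GPU execution time more precisely than a CPU timer.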

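```python
import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

x = torch.randn(1000, 1000).cuda()
y = torch.randn(1000, 1000).cuda()

# Warm-up run so one-time setup is not included in the measurement
torch.matmul(x, y)

# Record the start event
start.record()
z = torch.matmul(x, y)
# Record the end event
end.record()

# Wait until the end event has actually completed
torch.cuda.synchronize()

# Time between the two events, in milliseconds
elapsed_time = start.elapsed_time(end)
print(f"Matrix multiplication took {elapsed_time:.4f} milliseconds.")
```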

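CUDA memory statistics

1. Memory pools

PyTorch's CUDA caching allocator manages memory in two pools:

Large pool: serves large requests (roughly 1 MB and above, per the PyTorch documentation)
Small pool: serves small requests (below roughly 1 MB)

`torch.cuda.memory_stats()` reports each statistic per pool (`large_pool`, `small_pool`) and for both combined (`all`).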
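To see which pool a request lands in, the sketch below models the allocator's size rounding under default settings. The constants are taken from PyTorch's `c10/cuda/CUDACachingAllocator.cpp` at the time of writing and are an assumption here: they are internal details that can change between releases, and options in `PYTORCH_CUDA_ALLOC_CONF` (for example `roundup_power2_divisions`) change the rounding. The third returned value is the segment size requested from `cudaMalloc`, which only happens when no cached block fits. `estimate_block` is a hypothetical helper.

```python
from typing import Tuple

# Assumed constants, modeled on PyTorch's CUDACachingAllocator.cpp (internal, may change)
MIN_BLOCK_SIZE = 512              # requests are rounded up to a multiple of 512 bytes
SMALL_SIZE = 1024 ** 2            # requests up to 1 MiB go to the small pool
SMALL_BUFFER = 2 * 1024 ** 2      # small-pool segments are 2 MiB
LARGE_BUFFER = 20 * 1024 ** 2     # requests between 1 and 10 MiB get a 20 MiB segment
MIN_LARGE_ALLOC = 10 * 1024 ** 2  # threshold for sizing segments to the request
ROUND_LARGE = 2 * 1024 ** 2       # larger segments are rounded up to 2 MiB


def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple of `multiple` using integer arithmetic."""
    return (value + multiple - 1) // multiple * multiple


def estimate_block(size: int) -> Tuple[int, str, int]:
    """Estimate (rounded block size, pool name, new segment size) for one request in bytes.

    Hypothetical model of the default allocator behavior, not a PyTorch API.
    """
    rounded = max(MIN_BLOCK_SIZE, round_up(size, MIN_BLOCK_SIZE))
    if rounded <= SMALL_SIZE:
        return rounded, "small_pool", SMALL_BUFFER
    if rounded < MIN_LARGE_ALLOC:
        return rounded, "large_pool", LARGE_BUFFER
    return rounded, "large_pool", round_up(rounded, ROUND_LARGE)
```

Under these assumptions, the 1000 x 1000 `float32` tensors used earlier (4,000,000 bytes) become 4,000,256-byte blocks in the large pool, backed by a 20 MiB segment.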

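2. Key metrics

The fields most worth watching:

| Field | Description |
|---|---|
| `allocated_bytes.all.peak` | Peak memory occupied by tensors, in bytes: the high-water mark of usage while the program runs. |
| `active_bytes.all.peak` | Peak active memory: memory in use by tensors plus blocks that tensors have freed but that are still awaiting pending work on another stream. Usually equal to or slightly above the allocated peak. |
| `reserved_bytes.all.peak` | Peak reserved memory: the highest total the allocator has obtained from the driver via `cudaMalloc`. |
| `num_ooms` | Number of out-of-memory (OOM) errors raised because the allocator could not satisfy a request. |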
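The four fields in the table can be read straight from the statistics dictionary. A minimal sketch, assuming the key names above and treating missing keys as 0; `key_metrics` is a hypothetical helper that converts the byte counts to MiB.

```python
MIB = 1024 ** 2


def key_metrics(stats: dict) -> dict:
    """Extract the peak-usage fields and OOM count from a memory_stats()-style dict.

    Hypothetical helper; byte values are converted to MiB, missing keys count as 0.
    """
    return {
        "allocated_peak_mib": stats.get("allocated_bytes.all.peak", 0) / MIB,
        "active_peak_mib": stats.get("active_bytes.all.peak", 0) / MIB,
        "reserved_peak_mib": stats.get("reserved_bytes.all.peak", 0) / MIB,
        "num_ooms": stats.get("num_ooms", 0),
    }
```

Usage: `key_metrics(torch.cuda.memory_stats(0))`. To measure the peak of a single phase, call `torch.cuda.reset_peak_memory_stats()` before it; `torch.cuda.max_memory_allocated()` returns the same value as `allocated_bytes.all.peak`.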

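3. Memory state categories

Every metric comes in four variants: `current` (the value now), `peak` (the maximum so far), `allocated` (the cumulative total of all increases) and `freed` (the cumulative total of all decreases).

Active

`active.all.allocated`: cumulative number of blocks that have become active.
`active.all.freed`: cumulative number of active blocks that have been released.
`active_bytes.all.allocated`: cumulative bytes that have become active, including memory released since (not the current amount).
`active_bytes.all.current`: bytes currently active.

Allocated

`allocated_bytes.all.allocated`: cumulative bytes allocated, including memory that has since been freed.
`allocated_bytes.all.peak`: peak allocated bytes.

Reserved

`reserved_bytes.all.allocated`: cumulative bytes the allocator has reserved from the driver over the program's lifetime (the amount reserved right now is `reserved_bytes.all.current`).
`reserved_bytes.all.peak`: peak reserved bytes.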
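Because `allocated` and `freed` are running totals, `current` should equal `allocated - freed` for any metric. The hypothetical check below makes that relationship explicit. It is a sketch under the assumption that the counters have never been reset: `torch.cuda.reset_accumulated_memory_stats()` zeroes the running totals but not `current`, so the identity no longer holds after calling it.

```python
def check_counters(stats: dict, metric: str, pool: str = "all") -> bool:
    """Check that current == allocated - freed for one metric/pool.

    Hypothetical helper; only valid if accumulated stats were never reset.
    """
    prefix = f"{metric}.{pool}."
    current = stats.get(prefix + "current", 0)
    allocated = stats.get(prefix + "allocated", 0)  # running total of increases
    freed = stats.get(prefix + "freed", 0)          # running total of decreases
    return current == allocated - freed
```

Usage: `check_counters(torch.cuda.memory_stats(0), "allocated_bytes")`.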

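4. Fragmentation and reclamation

| Field | Description |
|---|---|
| `inactive_split_bytes` | Inactive, non-releasable memory: free pieces of split blocks that cannot be returned to the driver because other parts of the same segment are still in use. A large value indicates fragmentation. |
| `oversize_allocations` | Number of allocation requests larger than the `max_split_size_mb` limit set through `PYTORCH_CUDA_ALLOC_CONF`; the allocator does not split such blocks. It stays at zero unless that limit is set. |
| `num_alloc_retries` | Number of failed `cudaMalloc` calls that forced the allocator to flush its cache and retry. A high value suggests memory pressure or severe fragmentation. |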
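These fields can be combined into a quick health check. The sketch below is a heuristic, not a PyTorch API: the hypothetical `fragmentation_report` reports the share of reserved memory stuck in inactive split blocks, together with the retry and oversize counters, assuming the `.all.current` / `.current` key suffixes described earlier. What counts as "too high" depends on the workload; if the numbers look bad, the allocator options in `PYTORCH_CUDA_ALLOC_CONF`, such as `expandable_segments:True` or `max_split_size_mb`, are the usual settings to experiment with.

```python
def fragmentation_report(stats: dict) -> dict:
    """Summarize fragmentation-related fields from a memory_stats()-style dict.

    Hypothetical heuristic; missing keys count as 0.
    """
    reserved = stats.get("reserved_bytes.all.current", 0)
    inactive_split = stats.get("inactive_split_bytes.all.current", 0)
    return {
        # Share of reserved memory trapped in free fragments of partly used segments
        "inactive_split_ratio": inactive_split / reserved if reserved else 0.0,
        "num_alloc_retries": stats.get("num_alloc_retries", 0),
        "oversize_allocations": stats.get("oversize_allocations.current", 0),
    }
```

Usage: `fragmentation_report(torch.cuda.memory_stats(0))`.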

The complete script below combines the functions above. It also needs the third-party `psutil` package for the CPU and RAM readings.

```python
import torch
import time
import psutil


def check_cuda_availability():
    """Check whether CUDA is available."""
    if torch.cuda.is_available():
        print("CUDA is available.")
    else:
        print("CUDA is not available.")


def get_device_info():
    """Print the name and total memory of every CUDA device."""
    device_count = torch.cuda.device_count()
    print(f"Number of available CUDA devices: {device_count}")
    for i in range(device_count):
        device_name = torch.cuda.get_device_name(i)
        properties = torch.cuda.get_device_properties(i)
        print(f"Device {i}: Name={device_name}, Total memory={properties.total_memory / 1024 ** 2:.2f} MB")


def set_and_get_current_device():
    """Select CUDA device 0 and print the current device index."""
    torch.cuda.set_device(0)
    device_index = torch.cuda.current_device()
    print(f"Current CUDA device index: {device_index}")


def manage_memory():
    """Allocate and free a tensor, printing allocated/reserved memory and allocator statistics."""
    x = torch.randn(1000, 1000).cuda()
    allocated = torch.cuda.memory_allocated(0)
    cached = torch.cuda.memory_reserved(0)
    print(f"Before deletion: Allocated={allocated / 1024 ** 2:.2f} MB, Cached={cached / 1024 ** 2:.2f} MB")
    del x
    torch.cuda.empty_cache()
    allocated = torch.cuda.memory_allocated(0)
    cached = torch.cuda.memory_reserved(0)
    print(f"After deletion and cache empty: Allocated={allocated / 1024 ** 2:.2f} MB, Cached={cached / 1024 ** 2:.2f} MB")
    stats = torch.cuda.memory_stats(0)
    print("Memory stats:", stats)


def measure_operation_time():
    """Measure the execution time of a CUDA operation with CUDA events."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    x = torch.randn(1000, 1000).cuda()
    y = torch.randn(1000, 1000).cuda()

    torch.matmul(x, y)  # warm-up run, not timed

    start.record()
    z = torch.matmul(x, y)
    end.record()

    torch.cuda.synchronize()
    elapsed_time = start.elapsed_time(end)  # milliseconds
    print(f"Matrix multiplication took {elapsed_time:.4f} milliseconds.")


def monitor_gpu_memory(device_id=0, interval=1):
    """Print this process's memory usage on one CUDA device every `interval` seconds (stop with Ctrl+C)."""
    if not torch.cuda.is_available():
        print("CUDA is not available; nothing to monitor.")
        return
    total = torch.cuda.get_device_properties(device_id).total_memory
    while True:
        allocated = torch.cuda.memory_allocated(device_id)
        cached = torch.cuda.memory_reserved(device_id)
        # Share of the device's total memory occupied by this process's tensors
        utilization = (allocated / total) * 100

        cpu_percent = psutil.cpu_percent()
        mem_percent = psutil.virtual_memory().percent

        print(f"Device {device_id}: "
              f"Total={total / 1024 ** 2:.2f} MB, "
              f"Allocated={allocated / 1024 ** 2:.2f} MB ({utilization:.2f}%), "
              f"Cached={cached / 1024 ** 2:.2f} MB, "
              f"CPU={cpu_percent:.2f}%, "
              f"Memory={mem_percent:.2f}%")
        time.sleep(interval)


if __name__ == "__main__":
    check_cuda_availability()
    # The remaining steps need a GPU, so only run them when CUDA is available
    if torch.cuda.is_available():
        get_device_info()
        set_and_get_current_device()
        manage_memory()
        measure_operation_time()
        # Uncomment to start live GPU memory monitoring
        # monitor_gpu_memory()
```
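`monitor_gpu_memory` only counts memory allocated by tensors in the current process. `torch.cuda.mem_get_info(device)` instead returns a `(free, total)` pair in bytes for the whole device as reported by the CUDA driver, so it also includes other processes and CUDA context overhead and is closer to what `nvidia-smi` shows. The hypothetical helper below turns that pair into used bytes and a percentage; it is plain Python and not part of PyTorch.

```python
def device_usage(free: int, total: int) -> dict:
    """Convert a (free, total) byte pair into used bytes and a usage percentage.

    Hypothetical helper; intended for the values returned by torch.cuda.mem_get_info().
    """
    used = total - free
    return {
        "used": used,
        "used_percent": 100.0 * used / total if total else 0.0,
    }
```

Usage: `device_usage(*torch.cuda.mem_get_info(0))`.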