1. Specifying which GPU(s) PyTorch uses
export CUDA_VISIBLE_DEVICES=0,1
python your_script.py
# Or set it inline for a single run (the common approach):
CUDA_VISIBLE_DEVICES=0,1 python your_script.py
import torch
# Note: torch.cuda.device(N) is a context manager; called as a bare
# statement it does nothing. To change the default device, use:
torch.cuda.set_device(0)  # or torch.cuda.set_device(1)
import os
# Must be set before the first CUDA call (ideally before importing torch):
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
import torch
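A minimal sketch of the environment-variable approach: after restricting visibility, PyTorch only sees the listed GPUs, and they are renumbered inside the process starting from cuda:0.

```python
import os

# Restrict visibility before torch initializes CUDA
# (assumption: no CUDA call has been made yet in this process)
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch

# Reports at most 2 devices now; physical GPUs 0 and 1
# appear in-process as cuda:0 and cuda:1.
print(torch.cuda.device_count())
```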
2. Specifying the device when loading a model
# Loading the model and then calling .to(device) works as expected.
device = "cuda:3"
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to(device)
# Passing device=device inside from_pretrained instead raises:
#   RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
# because the float16 weights are left on the CPU. Use device_map=device:
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map=device,  # instead of device=device
)
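The `.to(device)` pattern can be illustrated with a plain-torch toy module (a hypothetical stand-in for the pretrained model; the sketch falls back to CPU, in full precision, when cuda:3 is unavailable):

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for the pretrained one
model = nn.Linear(4, 2)

# Fall back to CPU so the sketch also runs without a fourth GPU
device = "cuda:3" if torch.cuda.device_count() > 3 else "cpu"

# Use half precision only on GPU: float16 matmul support on CPU
# is limited in older torch versions (the addmm_impl_cpu_ error above)
if device != "cpu":
    model = model.half()

# .to(device) moves every parameter and buffer of the module
model = model.to(device)

print(next(model.parameters()).device)
```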
3. device_map
First, check whether num_gpus (the number of available GPUs) is less than 2. Then check whether device_map is None. device_map defines how the model is distributed across multiple devices; if it is None, the model runs on a single device. A single-device placement can also be set explicitly, e.g. device_map=2 to put the whole model on GPU 2.
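The checks above can be sketched as a small helper (choose_device_map is a hypothetical name; the single-device default of GPU 0 is an assumption):

```python
import torch

def choose_device_map(device_map=None):
    # Sketch of the two checks described above
    num_gpus = torch.cuda.device_count()
    if num_gpus < 2:
        # Fewer than two GPUs: nothing to shard, keep the caller's value
        return device_map
    if device_map is None:
        # None means single-device; pick one GPU explicitly (assumed: GPU 0)
        return 0
    return device_map
```

With multiple GPUs one would typically pass a string such as "auto" (let the library shard the model) or an index like 2 (whole model on GPU 2); both are returned unchanged here.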