cuda--docker

Reference: https://zhuanlan.zhihu.com/p/632912924

Install the CUDA Toolkit:
https://developer.nvidia.com/cuda-toolkit-archive

Configure the environment variables (for a local, non-container install):

export PATH=/usr/local/cuda-12.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
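
After reloading the shell, a quick sanity check (assuming the CUDA 12.4 paths above):

nvcc --version   # should report the 12.4 toolkit
nvidia-smi       # should list the GPUs and the driver version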

Other shell profile additions (kubectl completion and locale):

source <(kubectl completion bash)
export LANGUAGE="en_US.UTF-8"
export LANG=en_US.UTF-8
export LC_ALL=C

Dockerfile based on the NVIDIA PyTorch image:

FROM nvcr.io/nvidia/pytorch:24.06-py3
RUN pip install vllm openai sse_starlette -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip install peft transformers datasets accelerate deepspeed tensorboard \
    fire packaging ninja openai gradio -i https://pypi.tuna.tsinghua.edu.cn/simple
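
Build and run the image with GPU access (the tag below is just an example):

docker build -t pytorch-vllm:24.06 .
docker run --rm -it --gpus all --shm-size=16g pytorch-vllm:24.06 nvidia-smi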

Fix nvidia-smi being very slow to return results by installing fabric-manager (its version must match the installed driver version):

version=535.54.03
yum -y install yum-utils  nvidia-docker2
yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
yum install -y nvidia-fabric-manager-${version}-1 nvidia-fabric-manager-devel-${version}-1
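
After installation, enable and start the service and confirm it is healthy (unit name as shipped by the nvidia-fabric-manager package):

systemctl enable nvidia-fabricmanager
systemctl start nvidia-fabricmanager
systemctl status nvidia-fabricmanager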

Install CUDA, the NVIDIA driver, and nvidia-docker2. Download sources:

cuda: https://developer.nvidia.com/cuda-toolkit-archive
nvidia driver: https://download.nvidia.com/
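
For example, the runfile matching the 535.54.03 fabric-manager version used below would be fetched like this (URL pattern assumed from download.nvidia.com's directory layout):

wget https://download.nvidia.com/XFree86/Linux-x86_64/535.54.03/NVIDIA-Linux-x86_64-535.54.03.run
sh NVIDIA-Linux-x86_64-535.54.03.run --silent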

cuda-rhel7.repo

 cat cuda-rhel7.repo 
[cuda-rhel7-x86_64]
name=cuda-rhel7-x86_64
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64
enabled=1
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/D42D0685.pub
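
With this repo in place, the driver and toolkit can also be installed straight from yum (package names per NVIDIA's RHEL7 repo instructions; pin versions if needed):

yum clean all
yum -y install nvidia-driver-latest-dkms
yum -y install cuda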

Yizhuang cluster image:

FROM nvcr.io/nvidia/pytorch:23.10-py3
RUN pip install --upgrade pip && \
    pip install --no-cache-dir --upgrade --upgrade-strategy=only-if-needed \
        vllm==0.4.3 openai sse_starlette spacy torch typer torch-tensorrt \
        torchdata torchtext torchvision weasel \
        -i https://pypi.tuna.tsinghua.edu.cn/simple
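
A typical use of this image is serving a model through vLLM's OpenAI-compatible API; the tag, model path, and port below are placeholders:

docker run --rm --gpus all -p 8000:8000 -v /data/models:/models vllm-serve:23.10 \
    python -m vllm.entrypoints.openai.api_server --model /models/<model-dir> --port 8000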

nvidia-docker2 sometimes cannot be pulled directly; download the RPMs first:

distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
yum install --downloadonly nvidia-docker2 --downloaddir=/tmp/nvidia
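
The downloaded RPMs can then be installed offline on the target host, after which Docker needs a restart to pick up the NVIDIA runtime:

yum -y localinstall /tmp/nvidia/*.rpm
systemctl restart docker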

nvidia-fabric-manager speeds up GPU access (same fix as the slow nvidia-smi issue above):

yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
yum install -y nvidia-fabric-manager-${version}-1
yum install -y nvidia-fabric-manager-devel-${version}-1 

Switch to the Aliyun CentOS mirror and install EPEL:

https://developer.aliyun.com/mirror/centos
wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
yum -y install epel-release
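
After switching repos, rebuild the yum cache:

yum clean all
yum makecache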

[base]
baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

[updates]
baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates

Kubernetes resource requests/limits for a pod scheduled through Tencent GPU Manager (tencent.com/vcuda-core is counted in hundredths of a GPU, so "800" requests 8 full GPUs):

          limits:
            cpu: "16"
            memory: 50Gi
            tencent.com/vcuda-core: "800"
            tencent.com/vcuda-memory: "32"
          requests:
            cpu: "16"
            memory: 50Gi
            tencent.com/vcuda-core: "800"
            tencent.com/vcuda-memory: "32"

Verify that PyTorch can see CUDA:

import torch

# Check whether CUDA is available
print(torch.cuda.is_available())

# Print the CUDA version PyTorch was built with
cuda_version = torch.version.cuda
print(f"CUDA version: {cuda_version}")


Pull an image for a specific CPU architecture (image name is a placeholder):

docker pull --platform linux/arm64 <image>
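
To check which architectures a multi-arch tag provides before pulling (again with a placeholder image name):

docker manifest inspect <image>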
