操作系统: ubuntu22.04
显卡: V100/RTX4090
CUDA: 12.1
驱动: 530
Python: 3.12
1.禁用nouveau驱动
vim /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
2.更新initramfs并重启系统:
sudo update-initramfs -u
sudo reboot
3.下载并安装cuda以及显卡驱动:
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
sh cuda_12.1.0_530.30.02_linux.run #命令行界面安装教程自行搜索,需要输入accept同意协议
apt-get install nvidia-driver-530 #装驱动
4.编译安装python:
wget https://www.python.org/ftp/python/3.12.0/Python-3.12.0.tgz
apt update
apt install -y build-essential libreadline-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev zlib1g-dev libffi-dev liblzma-dev python3-openssl wget libdb-dev
tar -xvf Python-3.12.0.tgz
cd Python-3.12.0
./configure --prefix=/usr/local/python3.12
make -j 60
make install
5.下载库:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip3 install vllm==0.7.1 -i https://pypi.tuna.tsinghua.edu.cn/simple #最新的0.7.2会爆显存
pip3 install modelscope -i https://pypi.tuna.tsinghua.edu.cn/simple #下载量化模型要用
6.模型下载:
vim download.py
from modelscope import snapshot_download
snapshot_download('tclf90/deepseek-r1-distill-qwen-32b-gptq-int8', cache_dir="/data/deepseek", revision='g128')
#下载的模型版本,以及保存路径
#保存执行
python3 download.py
7.启动:单张26G显存占用 0.45 23G左右 0.4没起来 1s/40token #这里用的V100显卡*2
vllm serve /data/deepseek/tclf90/deepseek-r1-distill-qwen-32b-gptq-int8/ --tensor-parallel-size 2 --max-model-len 10000 --gpu_memory_utilization=0.5 --enable-chunked-prefill
#--tensor-parallel-size 2 显卡数量
#--gpu_memory_utilization=0.5 显卡利用率,默认0.9,我这里还运行了其他模型,所以填的低
8.也可以部署openwebui来做前端:访问:ip+8080端口
pip3 install open-webui -i https://pypi.tuna.tsinghua.edu.cn/simple
ln -s /usr/local/python3.12/bin/open-webui /usr/local/bin/open-webui
open-webui serve #启动界面
9.注册管理账号:
10:配置:
然后就可以正常使用了