Ollama fails to use the GPU when running models

1. At startup, the log shows `Dynamic LLM libraries runners=[cpu]`, which indicates that Ollama is not using the GPU:

[root@test01 ~]# ollama serve
2025/02/13 15:20:46 routes.go:1187: INFO server config env="map[CUDA_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:cuda OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy:http://10.25.102.200:3128 https_proxy:http://10.25.102.200:3128 no_proxy:]"
time=2025-02-13T15:20:46.140+08:00 level=INFO source=images.go:432 msg="total blobs: 31"
time=2025-02-13T15:20:46.141+08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.

 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)
[GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
[GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
time=2025-02-13T15:20:46.142+08:00 level=INFO source=routes.go:1238 msg="Listening on 127.0.0.1:11434 (version 0.5.7)"
time=2025-02-13T15:20:46.142+08:00 level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners=[cpu]
time=2025-02-13T15:20:46.142+08:00 level=INFO source=gpu.go:226 msg="looking for compatible GPUs"

time=2025-02-13T15:20:47.373+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-53bb8550-8677-3dce-f0fd-6c764a95b2ab library=cuda variant=v12 compute=6.1 driver=12.4 name="Tesla P4" total="7.4 GiB" available="7.3 GiB"
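Before changing anything, it helps to confirm which ollama binary is actually being launched and whether a sibling lib/ollama/runners directory exists next to it; the paths below are illustrative assumptions, not guaranteed locations on your system.

```bash
# Which ollama binary is on PATH, and where does it really live?
which ollama
readlink -f "$(which ollama)"

# The point of step 2 below is that the runner libraries must sit next to the binary.
# For a binary at /usr/bin/ollama the expected location would be /usr/lib/ollama/runners
# (adjust the prefix to whatever the commands above printed).
ls /usr/lib/ollama/runners/ 2>/dev/null || echo "no runners directory next to this binary"
```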

2. Check the Ollama installation directory layout. The ollama binary and the runners must follow this structure:

        Binary:  /xxx/yyy/bin/ollama
        Runners: /xxx/yyy/lib/ollama/runners/

        Also, start Ollama with the absolute path /xxx/yyy/bin/ollama serve rather than plain ollama serve.
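A minimal sketch of restoring that layout, assuming the official Linux tarball and a /usr/local prefix (both are assumptions; substitute the archive and path you actually use):

```bash
# Re-extract the release tarball so bin/ and lib/ end up side by side under one prefix
# (file name assumed; use the archive you actually downloaded)
sudo tar -C /usr/local -xzf ollama-linux-amd64.tgz

# Verify the layout described above
ls -ld /usr/local/bin/ollama
ls /usr/local/lib/ollama/runners/   # should list cpu*, cuda_*, rocm_* runner directories

# Start the server via the absolute path instead of relying on PATH
/usr/local/bin/ollama serve
```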

        Once it starts correctly, the log loads runners="[cpu cpu_avx cpu_avx2 cuda_v11_avx cuda_v12_avx rocm_avx]":

[root@test01 ~]# /usr/local/bin/ollama serve
2025/02/13 15:34:54 routes.go:1187: INFO server config env="map[CUDA_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY:cuda OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy:http://10.25.102.200:3128 https_proxy:http://10.25.102.200:3128 no_proxy:]"
time=2025-02-13T15:34:54.956+08:00 level=INFO source=images.go:432 msg="total blobs: 31"
time=2025-02-13T15:34:54.958+08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST   /api/embed                --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] GET    /api/ps                   --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST   /v1/completions           --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST   /v1/embeddings            --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models                --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models/:model         --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
[GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
time=2025-02-13T15:34:54.960+08:00 level=INFO source=routes.go:1238 msg="Listening on 127.0.0.1:11434 (version 0.5.7)"
time=2025-02-13T15:34:54.960+08:00 level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11_avx cuda_v12_avx rocm_avx]"
time=2025-02-13T15:34:54.960+08:00 level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
time=2025-02-13T15:34:56.186+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-53bb8550-8677-3dce-f0fd-6c764a95b2ab library=cuda variant=v12 compute=6.1 driver=12.4 name="Tesla P4" total="7.4 GiB" available="7.3 GiB"
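With the runners loaded, it is worth double-checking that a model really is offloaded to the GPU once it runs; the model name below is only a placeholder for whatever you have pulled.

```bash
# Load a model (placeholder name) and send a short prompt
/usr/local/bin/ollama run deepseek-r1:7b "hello"

# In another terminal: the PROCESSOR column should report something like "100% GPU"
/usr/local/bin/ollama ps

# nvidia-smi should show the ollama runner process holding GPU memory
nvidia-smi
```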

### Configuring an Ollama DeepSeek model to use GPU acceleration

#### Environment preparation

As a minimum configuration, the CPU should support the AVX2 instruction set, with 16 GB of RAM and at least 30 GB of storage; for better performance, an NVIDIA GPU (RTX 3090 or above) with 32 GB of RAM and at least 50 GB of disk space is recommended[^1]. Windows, macOS and Linux are all supported. If you plan to use Open Web UI as the front end, set up a Docker environment first.

#### Installation and configuration

Once one of the operating systems above is installed and the hardware (in particular a suitable GPU) meets the recommended spec:

- **Install the CUDA Toolkit**: since the goal is GPU acceleration, go to the [NVIDIA website](https://developer.nvidia.com/cuda-downloads), download the CUDA release that matches your machine, and follow the installer.
- **Install the cuDNN library**: from the same site, download the [cuDNN package](https://developer.nvidia.com/rdp/cudnn-archive) that matches your CUDA version, extract it, and copy the contents of its bin/, include/ and lib/ directories into the corresponding CUDA directories.
- **Create a virtual environment**: to avoid conflicts between Python interpreters and packages across projects, create an isolated environment, for example with `conda create --name ollama_env python=3.8` (assuming Python 3.8.x), and activate it:

```bash
conda activate ollama_env
```

- **Install PyTorch and other dependencies**: with the new environment activated, install the required packages, explicitly requesting the GPU builds of torch, torchvision and torchaudio (the cu117 index serves wheels built against CUDA 11.7):

```bash
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
```

- **Clone the repository**: finally, pull down the source code with Git:

```bash
git clone https://github.com/Ollama-Org/deepseek.git
cd deepseek
```

With that, the preparation work is done and you can move on to the application itself.

#### Starting the service

Inside the freshly cloned deepseek directory, adjust the settings according to the README.md until they match your setup. Usually only a few changes are needed before you can start a test run:

```bash
python setup.py develop
export CUDA_VISIBLE_DEVICES=0   # set the visible GPU index; with multiple cards, pick the one you want
python app/main.py --gpu-acceleration true
```

That covers the whole process from an empty machine to a running service. This is only the basic setup; more advanced features are left to explore.
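The same CUDA_VISIBLE_DEVICES variable can be applied to the Ollama server itself on machines with more than one card (the server config dump at the top of this post already shows CUDA_VISIBLE_DEVICES:0); the index below is only an example.

```bash
# Expose only one GPU (index 1 here, purely as an example) to the Ollama server
export CUDA_VISIBLE_DEVICES=1
/usr/local/bin/ollama serve
```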