How to keep an Ollama model loaded in memory (VRAM) or unload it immediately

点动生态云

Published 2024-07-24 10:20:57


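Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_63782093/article/details/140652749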


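I. How does Ollama keep a model loaded in memory or unload it immediately?

By default, a model stays in memory for 5 minutes after it generates a response. This gives faster response times when you send several requests to the LLM in a row. However, you may want to free the memory before the 5 minutes are up, or keep the model in memory indefinitely. The keep_alive parameter of the /api/generate and /api/chat API endpoints controls how long the model stays in memory.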

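The keep_alive parameter can be set to:

A duration string (for example "10m" or "24h")
A number of seconds (for example 3600)
Any negative number, which keeps the model in memory indefinitely (for example -1 or "-1m")
The number 0, which unloads the model immediately after the response is generated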
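
The four cases above can be sketched as a small classifier. This is only an illustration of the rules as listed here, not Ollama's own parser: the function name classify_keep_alive and the returned phrases are made up for this example, and the regular expression handles only single-unit strings such as "10m" or "-1m".

```python
import re
from typing import Union

# Single-unit duration strings such as "10m", "24h" or "-1m".
# This is a simplification for the example, not Ollama's actual grammar.
_DURATION = re.compile(r"(-?\d+(?:\.\d+)?)([a-z]*)")


def classify_keep_alive(value: Union[int, float, str]) -> str:
    """Describe a keep_alive value according to the rules listed above."""
    if isinstance(value, str):
        match = _DURATION.fullmatch(value.strip())
        if match is None:
            raise ValueError(f"unrecognized keep_alive value: {value!r}")
        amount = float(match.group(1))
    else:
        amount = float(value)

    if amount < 0:
        return "keep loaded indefinitely"
    if amount == 0:
        return "unload right after the response"
    return "keep loaded for the given time"
```

For instance, classify_keep_alive("24h") falls into the timed case, while classify_keep_alive("-1m") falls into the indefinite case.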

For example, to preload a model and keep it in memory, run:

curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": -1}'

To unload the model and free the memory, run:

curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": 0}'
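
If you would rather send the same requests from Python, a minimal sketch using only the standard library could look like this. The helper names build_keep_alive_request and send are made up for this example, and the URL assumes the same local server address as the curl commands.

```python
import json
import urllib.request

# Same address as in the curl examples; change it if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_keep_alive_request(model, keep_alive, url=OLLAMA_URL):
    """Build a POST request whose body holds only the model name and keep_alive."""
    body = json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60.0):
    """Send the request and return the raw response text (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling send(build_keep_alive_request("llama3", -1)) corresponds to the preload command, and passing 0 instead of -1 corresponds to the unload command.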

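Alternatively, you can change how long all models stay in memory by setting the OLLAMA_KEEP_ALIVE environment variable when starting the Ollama server. OLLAMA_KEEP_ALIVE accepts the same value types as the keep_alive parameter described above.

To override the OLLAMA_KEEP_ALIVE setting for a single request, pass the keep_alive parameter to the /api/generate or /api/chat API endpoint.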
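
The resulting precedence (request parameter first, then the environment variable, then the 5-minute default) can be modelled in a few lines. This is only a sketch of the behaviour described here, not code taken from Ollama. The function name effective_keep_alive is made up, and the env mapping stands for the environment of the server process, since that is where OLLAMA_KEEP_ALIVE is read.

```python
import os
from typing import Mapping, Optional, Union

KeepAlive = Union[int, float, str]
DEFAULT_KEEP_ALIVE = "5m"  # the 5-minute default mentioned in the text


def effective_keep_alive(
    request_value: Optional[KeepAlive] = None,
    env: Optional[Mapping[str, str]] = None,
) -> KeepAlive:
    """Pick the request's keep_alive, else OLLAMA_KEEP_ALIVE, else the default."""
    if request_value is not None:
        return request_value
    env = os.environ if env is None else env
    return env.get("OLLAMA_KEEP_ALIVE", DEFAULT_KEEP_ALIVE)
```

Note that a request value of 0 still counts as set, which is why the check is against None rather than truthiness.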

II. Add the OLLAMA_KEEP_ALIVE environment variable at startup

1. Stop the Ollama container

docker stop ollama

2. Remove the Ollama container

docker rm ollama

3. Start it again with the variable set

docker run -d --gpus=all -e OLLAMA_KEEP_ALIVE=-1 --restart=always -v /home/docker/ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
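
If you repeat these three steps often, they can be wrapped in a short script. The sketch below only assembles the same docker commands as argument lists and runs them in order. The function names recreate_commands and run_all are made up for this example, and the default container name and data directory are simply the ones used in this article.

```python
import subprocess


def recreate_commands(keep_alive="-1", name="ollama", data_dir="/home/docker/ollama"):
    """Return the stop, rm and run commands from the steps above as argument lists."""
    return [
        ["docker", "stop", name],
        ["docker", "rm", name],
        [
            "docker", "run", "-d", "--gpus=all",
            "-e", f"OLLAMA_KEEP_ALIVE={keep_alive}",
            "--restart=always",
            "-v", f"{data_dir}:/root/.ollama",
            "-p", "11434:11434",
            "--name", name,
            "ollama/ollama",
        ],
    ]


def run_all(commands):
    """Run each command in order; check=True stops at the first one that fails."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling run_all(recreate_commands("24h")) would recreate the container with a 24-hour keep-alive. Because of check=True, the script stops with an error if the container does not exist yet when docker stop runs.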

III. Check that the setting took effect

docker exec -it ollama env
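
The command prints every environment variable in the container, one NAME=value pair per line, and OLLAMA_KEEP_ALIVE should be among them. To check this from a script instead of by eye, a minimal sketch could parse that output. The names parse_env_output and container_env are made up for this example. The -it flags are dropped because a script has no terminal to attach.

```python
import subprocess


def parse_env_output(text):
    """Turn the NAME=value lines printed by env into a dictionary."""
    variables = {}
    for line in text.splitlines():
        name, separator, value = line.partition("=")
        if separator:  # skip lines that contain no "="
            variables[name] = value
    return variables


def container_env(name="ollama"):
    """Run env inside the container and return its variables (needs Docker)."""
    result = subprocess.run(
        ["docker", "exec", name, "env"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_env_output(result.stdout)
```

With the container from section II running, container_env().get("OLLAMA_KEEP_ALIVE") should return the value passed with -e.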
