Solving the problem of Ollama not keeping models in memory for long

格瑞Lxf

Article link: https://blog.csdn.net/China_boy007/article/details/136445870

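1. Each time you send a request that loads the model, set the keep_alive parameter to specify how long the model should stay in memory after the request: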

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "keep_alive": "24h"
}'
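The same request can also be sent from Python. This is a minimal sketch using only the standard library. `build_generate_payload` and `generate` are hypothetical helper names rather than part of any Ollama client. The keep_alive forms listed in the comment follow the Ollama API documentation: a duration string, a number of seconds, a negative value to keep the model loaded indefinitely, or 0 to unload it right after the response.

```python
import json
import urllib.request

# Default endpoint of a local Ollama server (same as the curl example).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(model, prompt=None, keep_alive="24h", stream=False):
    """Build the JSON body for /api/generate.

    keep_alive may be a duration string ("24h", "10m"), a number of seconds,
    a negative value (keep the model loaded indefinitely) or 0 (unload it
    right after the response).
    """
    payload = {"model": model, "stream": stream, "keep_alive": keep_alive}
    if prompt is not None:
        # Without a prompt, the request only loads the model into memory.
        payload["prompt"] = prompt
    return payload


def generate(payload, url=OLLAMA_URL, timeout=600):
    """POST the payload and return the decoded JSON reply.

    Assumes stream=False, so the server answers with a single JSON object.
    Requires a running Ollama server.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `generate(build_generate_payload("llama2", "Why is the sky blue?"))` reproduces the curl call above. stream is left as False so the reply is one JSON object rather than a stream of JSON lines.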

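2. Alternatively, send a load request every 280 seconds. By default Ollama unloads a model after five minutes without requests, and a load request for a model that is already in memory takes only about 1 ms, so this approach works as well: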


import time
from datetime import datetime

import pytz
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def get_bj_time():
    """Return the current Beijing time as a formatted string."""
    beijing_tz = pytz.timezone('Asia/Shanghai')
    return datetime.now(beijing_tz).strftime("%Y-%m-%d %H:%M:%S")

while True:
    # A request with a model but no prompt only loads the model
    # (or resets its keep_alive timer if it is already in memory).
    data = {"model": "qwen:7b", "keep_alive": "5m"}
    start = time.perf_counter()
    try:
        # json= serializes the body and sets Content-Type: application/json
        response = requests.post(OLLAMA_URL, json=data, timeout=120)
        response.raise_for_status()
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
    else:
        elapsed = time.perf_counter() - start
        print(f"Load request time (ms): {elapsed * 1000:.6f}")
        print(response.text)  # response body decoded as a string
    print(f"Current Beijing time: {get_bj_time()}")
    time.sleep(280)  # wait 280 s (less than the 5-minute keep_alive) before the next request

# Measured load times:
# 7b:  first load 3.867187177 s, second load 0.766666 ms
# 14b: first load 5.180146173 s, second load 0.753414 ms
# 72b: first load 16.991763358 s, second load 1.358505 ms
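The 280-second interval is simply the 5-minute keep_alive minus a safety margin of 20 seconds, so the next request arrives before the model is unloaded. The sketch below shows that arithmetic. `parse_duration` and `refresh_interval` are hypothetical helpers, not part of Ollama or its client libraries, and the parser only assumes keep_alive strings built from h/m/s units such as "5m" or "1h30m".

```python
import re

# Seconds per unit for the duration strings handled here (hypothetical helper).
_UNITS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text):
    """Convert a duration such as "5m", "24h" or "1h30m" into seconds."""
    if text.startswith("-"):
        return -parse_duration(text[1:])
    parts = re.findall(r"(\d+(?:\.\d+)?)([hms])", text)
    # Reject anything the pattern did not fully consume (e.g. "500ms").
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(float(n) * _UNITS[u] for n, u in parts)


def refresh_interval(keep_alive="5m", margin=20):
    """Seconds to wait between keep-warm requests, or None if none are needed."""
    if isinstance(keep_alive, (int, float)):
        seconds = keep_alive  # numeric keep_alive is a number of seconds
    else:
        seconds = parse_duration(keep_alive)
    if seconds < 0:
        return None  # negative keep_alive: the model stays loaded indefinitely
    if seconds <= margin:
        # Covers keep_alive=0 (unload immediately), where a loop cannot help.
        raise ValueError("keep_alive must be longer than the safety margin")
    return seconds - margin
```

With the script's settings, `refresh_interval("5m", margin=20)` gives 280 seconds. A negative keep_alive returns None, which reflects the first method: with the model kept loaded indefinitely, the refresh loop is unnecessary.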

关注博主即可阅读全文