How do I make one-off chat calls to Ollama that keep no chat history?

玩人工智能的辣条哥

Published 2025-05-10 09:42:16

Article link: https://blog.csdn.net/weixin_42672685/article/details/147835467

Environment:

Ubuntu 20.04

ollama 0.6.4

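Problem description:

We summarise material for users. Each user has different background material, and a summary is generated once or twice and then never needed again. Many endpoints call Ollama for one or two chat turns and then switch to a different role. How do we keep those conversations from getting mixed up?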


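Solution:

To keep the conversations of different roles or sessions from mixing when using Ollama, isolate the context with one of the following methods. Ollama's chat endpoints are stateless: the model only sees the messages contained in the current request, so history exists only if the caller sends it again.

Method 1: Reset the context on every API call

On each request, send no history messages (for `/api/generate`, that also means not passing back the `context` value from an earlier response):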

import ollama

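# For every new conversation, use a brand-new messages list with no history
response = ollama.chat(
    model='llama2',
    messages=[
        {'role': 'user', 'content': 'Your role this time is a chef. Please answer...'}
    ]
)
# On the next call, build messages from scratch again; do not carry over earlier records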
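
The idea in Method 1 can be wrapped in a small helper so that callers cannot reuse an old list by accident. This is a minimal sketch, not part of the original post: `fresh_messages`, `one_shot_chat` and `fake_chat` are hypothetical names, and `chat_fn` is assumed to behave like `ollama.chat` (keyword arguments `model` and `messages`, with the reply text under `message.content`). A stand-in function replaces the real call so the sketch runs without a server.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def fresh_messages(role_prompt: str, user_input: str) -> List[Message]:
    """Build a brand-new message list holding only the current exchange."""
    return [
        {"role": "system", "content": role_prompt},
        {"role": "user", "content": user_input},
    ]


def one_shot_chat(chat_fn: Callable[..., dict], model: str,
                  role_prompt: str, user_input: str) -> str:
    """Send one independent request; nothing is stored between calls."""
    response = chat_fn(model=model,
                       messages=fresh_messages(role_prompt, user_input))
    return response["message"]["content"]


# Offline stand-in for ollama.chat (an assumption for illustration only).
def fake_chat(model: str, messages: List[Message]) -> dict:
    return {"message": {"role": "assistant",
                        "content": f"{len(messages)} messages received"}}


# Each call sees exactly two messages, whatever was asked before.
print(one_shot_chat(fake_chat, "llama2", "You are a chef.", "Suggest a dinner."))
print(one_shot_chat(fake_chat, "llama2", "You are a doctor.", "What is a fever?"))
```

With the real library installed, passing `ollama.chat` instead of `fake_chat` should work the same way, assuming a local Ollama server is running.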

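Method 2: Use the `keep_alive` parameter to control how long the model stays loaded

`keep_alive` controls how long the model stays in memory after a request; it does not store or clear chat history. Setting it to 0 unloads the model as soon as the request completes, which frees memory but forces a reload on the next request. In the Python library it is a top-level argument of `ollama.chat`, not a key inside `options`:

response = ollama.chat(
    model='llama2',
    messages=[...],
    keep_alive=0  # unload the model right after this request
)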
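
When calling the native REST endpoint (`/api/chat`) directly instead of using the library, the same distinction applies to the JSON body. The sketch below only builds the payload; `build_chat_payload` and its `unload_after` flag are hypothetical names introduced here, and the temperature value is just an example.

```python
import json
from typing import Dict, List


def build_chat_payload(model: str, messages: List[Dict[str, str]],
                       unload_after: bool = True) -> dict:
    """Build a request body for Ollama's native /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": False,
        # Sampling settings such as temperature belong under "options".
        "options": {"temperature": 0.5},
    }
    if unload_after:
        # keep_alive sits at the top level; 0 unloads the model after the reply.
        payload["keep_alive"] = 0
    return payload


payload = build_chat_payload("llama2", [{"role": "user", "content": "Hello"}])
print(json.dumps(payload, indent=2))
```

Unloading after every request trades memory for latency, so it is probably only worth it when the model is called rarely.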

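Method 3: Unload or remove the model through the REST API

With a locally deployed Ollama service, a model can be forced out of memory by sending a request that names the model and sets `keep_alive` to 0:

curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'

Deleting the model is a much heavier step and is not needed to clear chat history, because the server keeps none between chat requests. `/api/delete` must be called with the HTTP DELETE method and removes the model from disk:

curl -X DELETE http://localhost:11434/api/delete -d '{"name": "llama2"}'

After that the model has to be pulled again:

curl http://localhost:11434/api/pull -d '{"name": "llama2"}'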
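
For services that already talk to Ollama from Python, unloading a model can be done with the standard library alone. This is a sketch under assumptions: the helper names `build_unload_request` and `unload_model` are hypothetical, the base URL is the default local address, and the body is an `/api/generate` request with `keep_alive` set to 0 and no prompt. Only the request is built and printed here; `unload_model` would actually send it.

```python
import json
import urllib.request


def build_unload_request(model: str,
                         base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Prepare a POST to /api/generate that only asks Ollama to unload `model`."""
    body = json.dumps({"model": model, "keep_alive": 0}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def unload_model(model: str, base_url: str = "http://localhost:11434") -> int:
    """Send the unload request and return the HTTP status code."""
    request = build_unload_request(model, base_url)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.status


# Inspect the request without contacting a server.
request = build_unload_request("llama2")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```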

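Method 4: Keep an independent session for each role

The ollama Python library has no session object: `ollama.chat` returns a response, not something you can call `send` on. A "session" is simply the message list you maintain yourself, so give each role its own list to avoid cross-contamination:

# Example: one message list per role
chef_messages = [{'role': 'user', 'content': 'As a chef...'}]
doctor_messages = [{'role': 'user', 'content': 'As a doctor...'}]

# Role A only ever sees chef_messages
chef_reply = ollama.chat(model='llama2', messages=chef_messages)

# Role B only ever sees doctor_messages
doctor_reply = ollama.chat(model='llama2', messages=doctor_messages)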
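
If a session-like object is still convenient, it can be written in a few lines on top of the message list. The class below is a hypothetical helper, not something the ollama library provides: `RoleSession`, `reset` and `send` are names invented for this sketch, and `chat_fn` is assumed to behave like `ollama.chat`. A stand-in function is used so the example runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class RoleSession:
    """Hypothetical helper: one private message list per role."""

    def __init__(self, chat_fn: Callable[..., dict], model: str, role_prompt: str):
        self._chat_fn = chat_fn
        self._model = model
        self._role_prompt = role_prompt
        self.messages: List[Message] = []
        self.reset()

    def reset(self) -> None:
        """Drop all history and keep only the role's system prompt."""
        self.messages = [{"role": "system", "content": self._role_prompt}]

    def send(self, user_input: str) -> str:
        """Append the user turn, call the model, and record its reply."""
        self.messages.append({"role": "user", "content": user_input})
        response = self._chat_fn(model=self._model, messages=self.messages)
        reply = response["message"]["content"]
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Offline stand-in for ollama.chat (an assumption for illustration only).
def fake_chat(model: str, messages: List[Message]) -> dict:
    return {"message": {"role": "assistant",
                        "content": f"seen {len(messages)} messages"}}


chef = RoleSession(fake_chat, "llama2", "You are a chef.")
doctor = RoleSession(fake_chat, "llama2", "You are a doctor.")
print(chef.send("Suggest a dinner."))
print(chef.send("Make it vegetarian."))
print(doctor.send("What is a fever?"))  # the doctor never sees the chef's turns
```

Calling `reset()` after the one or two turns a role needs gives the "use it twice, then discard" behaviour described in the problem statement.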

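Method 5: Change the Ollama startup configuration (advanced)

Ollama's documented server settings are environment variables rather than a `~/.ollama/config.json` file. For example, `OLLAMA_KEEP_ALIVE` sets the default time a model stays loaded; on Ubuntu with systemd, such variables can be set with `systemctl edit ollama.service`. These settings affect model loading, not chat history, so check the official documentation for the supported variables before relying on them.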

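Summary

Simple case: send no history on each call (messages holds only the current exchange).
Frequent role switching: keep a separate message list per role; `keep_alive` only decides how long the model stays in memory.
Thorough cleanup: unload the model (`keep_alive` set to 0) or restart the Ollama service. Neither is required to clear chat history.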

Implementation used in this case:

import requests
import json

OLLAMA_URL = "http://192.168.28.13:11434/v1/chat/completions"

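def send_message(user_input):
    # A fresh message list is built on every call, so no history is carried over
    messages = [
        {
            "role": "system",
            "content": (
                "[Role]\n"
                "You are a senior proposal-generation expert who delivers precise, professional proposals.\n\n"
                "[Task requirements]\n"
            )
        },
        {
            "role": "user",
            "content": user_input
        }
    ]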

    payload = {
        "model": "qwen2.5-7b:latest",
        "messages": messages,
        "max_tokens": 8192,
        "temperature": 0.5,
        "stream": False
    }

    try:
        print(f"Sending request: {json.dumps(payload, ensure_ascii=False)}")
        response = requests.post(OLLAMA_URL, json=payload, timeout=30)
        print(f"Status code: {response.status_code}")
        if response.status_code != 200:
            print(f"Error response: {response.text}")
            return "The server returned an error; please check the logs."

        # The OpenAI-compatible endpoint returns the reply under choices[0].message.content
        result = response.json()
        choices = result.get("choices", [])
        if not choices:
            return "No reply content"

        content = choices[0].get("message", {}).get("content", "")
        return content

    except requests.exceptions.Timeout:
        print("Request timed out; please try again.")
        return "Request timed out; please try again."
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {str(e)}")
        return "Request failed; please check the network connection or the service status."
    except Exception as e:
        print(f"Internal error: {str(e)}")
        return "Internal error; please contact the administrator."

def collect_multiline_input():
    print("Enter your question (multiple lines allowed; finish with a line containing only END):")
    lines = []
    while True:
        line = input()
        if line.strip() == "END":
            break
        lines.append(line)
    return "\n".join(lines)

if __name__ == "__main__":
    print("Welcome to the simple chat app!")
    print('Enter a question and I will do my best to answer. Enter "exit" or "quit" to end the conversation.')
    while True:
        user_input = collect_multiline_input()
        if user_input.lower() in ["退出", "exit", "quit"]:
            print("Chat ended. Thanks for using it!")
            break
        reply = send_message(user_input)
        print("AI: ", reply)

The code above implements single, independent exchanges and keeps no chat history. In detail:

Key design points:

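Message list initialisation:
```
messages = [
    {"role": "system", "content": "..."},  # fixed system prompt
    {"role": "user", "content": user_input} # only the current input
]
```
Every request builds a new message list that contains only:
- the fixed system prompt (the role setting)
- the user's current input (no earlier turns)
Stateless service:
- no global variable, database, or file stores the conversation history
- every request is a fully independent HTTP POST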

Execution flow:

while True:
    user_input = collect_multiline_input()  # read the current input
    reply = send_message(user_input)        # send an independent request
    print(reply)                            # print it; the reply is not saved

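Typical behaviour:

First user input:
```
I scored 650. Can I get into Tsinghua?
```
→ The message list contains only the system prompt plus this question.
Second user input:
```
Can I apply for computer science with my score, then?
```
→ The model does not know about the 650 points, because the message list was rebuilt from scratch.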
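
Because the list is rebuilt each time, any fact a follow-up question depends on has to travel inside that request. One way to do this, sketched below, is to pack the user's background material into the user message on every call. The function name `build_summary_messages`, the prompt wording, and the "admissions adviser" system prompt are all assumptions made for illustration.

```python
from typing import Dict, List


def build_summary_messages(system_prompt: str, background: str,
                           question: str) -> List[Dict[str, str]]:
    """Pack everything one stateless request needs into a new message list."""
    user_content = f"Background material:\n{background}\n\nQuestion:\n{question}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


background = "The student scored 650 points."
first = build_summary_messages("You are an admissions adviser.", background,
                               "Can I get into Tsinghua?")
second = build_summary_messages("You are an admissions adviser.", background,
                                "Can I apply for computer science?")

# Both requests carry the score even though no history is kept.
print("650" in first[1]["content"], "650" in second[1]["content"])
# Each request still holds only a system message and one user message.
print(len(first), len(second))
```

This keeps requests independent per user while still giving the model the material it needs, at the cost of resending the background each time.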

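Advantages and limitations:

| Advantages | Limitations |
| --- | --- |
| 1. Simple to implement | 1. No multi-turn conversation |
| 2. Saves memory (no history to store or resend) | 2. Users must repeat information |
| 3. Avoids context contamination | 3. No coherent back-and-forth |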

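Switching to multi-turn conversation:

To keep context, the code has to change. Note that `send_message` above takes a single string and builds its own list, so it must also be changed to accept the whole message list and use it as `payload["messages"]`:

# Initialise the history outside the loop
chat_history = [
    {"role": "system", "content": "..."}
]

while True:
    user_input = collect_multiline_input()
    # Add the user input and the AI reply to the history on every turn
    chat_history.append({"role": "user", "content": user_input})
    # send_message is assumed here to take the full list instead of one string
    reply = send_message(chat_history)
    chat_history.append({"role": "assistant", "content": reply})
    print("AI: ", reply)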
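
A list-accepting variant of the request function might look like the sketch below. It is not the original author's code: `send_messages`, `chat_turn` and `fake_post` are hypothetical names, and the HTTP call is injected as `post_json` so the example runs without a server. The response shape assumed is the OpenAI-compatible one used earlier (`choices[0].message.content`). One design point worth noting: an empty reply is not appended to the history, so error cases do not pollute later turns.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def send_messages(messages: List[Message],
                  post_json: Callable[[dict], dict],
                  model: str = "qwen2.5-7b:latest") -> str:
    """Send the whole history and return the assistant's reply text."""
    payload = {"model": model, "messages": messages, "stream": False}
    result = post_json(payload)
    choices = result.get("choices", [])
    if not choices:
        return ""
    return choices[0].get("message", {}).get("content", "")


def chat_turn(history: List[Message], user_input: str,
              post_json: Callable[[dict], dict]) -> str:
    """Run one turn, recording the reply only when there is one."""
    history.append({"role": "user", "content": user_input})
    reply = send_messages(history, post_json)
    if reply:
        history.append({"role": "assistant", "content": reply})
    else:
        history.pop()  # drop the unanswered user turn so history stays consistent
    return reply


# Offline stand-in for the HTTP call, shaped like a /v1/chat/completions response.
def fake_post(payload: dict) -> dict:
    count = len(payload["messages"])
    return {"choices": [{"message": {"role": "assistant",
                                     "content": f"reply to {count} messages"}}]}


history = [{"role": "system", "content": "..."}]
print(chat_turn(history, "I scored 650.", fake_post))
print(chat_turn(history, "Can I apply for computer science?", fake_post))
print(len(history))
```

With the `requests` library, `post_json` could be something like `lambda p: requests.post(OLLAMA_URL, json=p, timeout=30).json()`, assuming the same `OLLAMA_URL` as in the implementation above.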