Creating a long-context LLM with ollama

1024点线面

Article tags: python llm ollama large-models

Article link: https://blog.csdn.net/yuand7/article/details/143800780

Writing the Modelfile

# cat Modelfile
FROM qwen2.5:7b

# set the temperature [higher is more creative, lower is more coherent]
PARAMETER temperature 0.3
PARAMETER top_p 0.8
PARAMETER repeat_penalty 1.1
PARAMETER top_k 20
PARAMETER num_ctx 131072
PARAMETER repeat_last_n 512

TEMPLATE """{{ if .Messages }}
{{- if or .System .Tools }}<|im_start|>system
{{ .System }}
{{- if .Tools }}

# Tools

You are provided with function signatures within <tools></tools> XML tags:
<tools>{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end }}<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ if .Content }}{{ .Content }}
{{- else if .ToolCalls }}<tool_call>
{{ range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{ end }}</tool_call>
{{- end }}{{ if not $last }}<|im_end|>
{{ end }}
{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{ end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ end }}
{{- end }}
{{- else }}
{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ end }}{{ .Response }}{{ if .Response }}<|im_end|>{{ end }}"""

# set the system message
SYSTEM """You are a data engineer, created by Smart Data. You are a helpful assistant."""

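The Modelfile above defines a configuration based on the Qwen2.5 model. It sets the model's sampling and context parameters and defines how inputs and outputs are formatted. The key parts are explained below.

1. Model source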

FROM qwen2.5:7b

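This line selects Qwen2.5 as the base model, in its 7B size (roughly 7 billion parameters).

2. Parameter settings

The next few lines set the main runtime parameters:

temperature: Controls the randomness of the output. Higher values give more random output, lower values give more deterministic output. It is set to 0.3 here, so the output is fairly deterministic.
top_p: The nucleus sampling threshold. The next token is sampled from the smallest set of most probable tokens whose cumulative probability reaches this value. With 0.8, the least likely tokens that make up the remaining 20% of probability mass are excluded.
repeat_penalty: The repetition penalty, used to reduce repeated words or phrases. With 1.1, tokens that have already appeared are moderately less likely to be chosen again.
top_k: Top-k sampling, a separate filter from top_p. It limits the choice of the next token to the k most probable tokens. It is set to 20 here.
num_ctx: The context window size in tokens, which determines how much prior text the model can take into account when generating. It is set to 131072, that is 128K tokens (128 × 1024); the unit is tokens, not bytes.
repeat_last_n: How far back the repetition penalty looks. With 512, the penalty is applied to tokens that appear in the last 512 tokens.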
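
To make the interaction of temperature, top_k and top_p more concrete, here is a minimal sketch of the filtering idea in plain Python. It is only an illustration: the function name `filter_candidates` and the toy logits are made up for this example, and the actual sampler inside ollama may apply these steps in a different order or with extra details.

```python
import math

def filter_candidates(logits, temperature=0.3, top_k=20, top_p=0.8):
    """Return {token: probability} after temperature, top-k and top-p filtering."""
    # Temperature: divide logits before softmax; lower values sharpen the distribution.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}

    # top_k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]

    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    kept_total = sum(p for _, p in kept)
    return {tok: p / kept_total for tok, p in kept}


# Hypothetical logits for four candidate tokens.
toy_logits = {"data": 2.0, "model": 1.5, "token": 0.5, "cat": -1.0}
print(filter_candidates(toy_logits))                   # Modelfile settings
print(filter_candidates(toy_logits, temperature=1.0))  # higher temperature
```

With the Modelfile values (temperature 0.3, top_p 0.8) only the single most likely toy token survives, while at temperature 1.0 two tokens remain. That is the sense in which a low temperature makes the output more deterministic.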

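3. Template definition

The template section defines how the different inputs are formatted into the prompt. It uses Go template syntax:

System message and tool definitions: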
```
{{- if or .System .Tools }}
...
{{- end }}
```
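If a system message or tool definitions are present, this block is included. The system message usually gives the model background information or instructions on how to behave. Tool definitions let the model call external functions or services.
Message loop: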
```
{{- range $i, $_ := .Messages }}
...
{{- end }}
```
This iterates over all messages and formats each one according to its role (user, assistant, or tool).
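
To show what the message loop produces, here is a rough Python re-implementation of the chat branch of the template. It is a sketch only: `render_chatml` is a hypothetical helper, it ignores tool definitions and tool calls, and exact whitespace may differ from what the Go template emits.

```python
def render_chatml(messages, system=""):
    """Approximate the prompt text the TEMPLATE produces for plain chat messages."""
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    for index, message in enumerate(messages):
        is_last = index == len(messages) - 1
        role, content = message["role"], message["content"]
        if role == "user":
            parts.append(f"<|im_start|>user\n{content}<|im_end|>\n")
        elif role == "assistant":
            parts.append(f"<|im_start|>assistant\n{content}")
            # A trailing assistant message is left open so the model can continue it.
            if not is_last:
                parts.append("<|im_end|>\n")
        elif role == "tool":
            # Tool results are passed back to the model inside a user turn.
            parts.append(
                f"<|im_start|>user\n<tool_response>\n{content}\n</tool_response><|im_end|>\n"
            )
        # After the final non-assistant message, open the assistant turn.
        if is_last and role != "assistant":
            parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml(
    [{"role": "user", "content": "What is a Modelfile?"}],
    system="You are a helpful assistant.",
))
```

The prompt ends with an open `<|im_start|>assistant` turn, which is where the model starts generating.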

Creating the custom model

The original model has a 32k context:

# ollama show qwen2.5:7b
Model
  architecture        qwen2
  parameters          7.6B
  context length      32768
  embedding length    3584
  quantization        Q4_K_M

System
  You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

License
  Apache License
  Version 2.0, January 2004

Create the model:

# ollama create qwen2.5_7b_128k -f Modelfile
transferring model data
using existing layer sha256:2bada8a7450
using existing layer sha256:832dd9e00a68dd
creating new layer sha256:9bebd78bf5bc92d41d5f
creating new layer sha256:91fa2d221c96837826e2
creating new layer sha256:f3599d242073531df0
creating new layer sha256:7af5a1b7098c58f9
writing manifest
success

Running the show command again shows that the Parameters section now contains our custom values:

# ollama show qwen2.5_7b_128k
Parameters
num_ctx           131072
repeat_last_n     512
repeat_penalty    1.1
temperature       0.3
top_k             20
top_p             0.8

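As a rule of thumb, one token is about 3~4 characters of English text, or about 1.5~1.8 Chinese characters. By that estimate, a claim of "200,000-character long-context support" corresponds to a context of 128k tokens.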
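
The estimate can be checked with a few lines of arithmetic. This is just the rule of thumb above turned into code; `chinese_char_range` is a name chosen for this example, and real token counts depend on the tokenizer and the text.

```python
NUM_CTX = 131072  # tokens, the num_ctx value from the Modelfile

def chinese_char_range(tokens, low=1.5, high=1.8):
    """Estimate how many Chinese characters fit in `tokens`, using 1.5~1.8 chars per token."""
    return int(tokens * low), int(tokens * high)

low_chars, high_chars = chinese_char_range(NUM_CTX)
print(f"{NUM_CTX} tokens is roughly {low_chars:,} to {high_chars:,} Chinese characters")
```

The lower bound comes out just under 200,000 characters, which matches the "200,000 characters" figure.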

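Because Qwen is a causal language model, in theory the whole sequence has only a single length limit. However, training usually involves packing, so each sequence may contain several independent pieces of text. How long the model can generate or complete therefore depends on the use case, and in that setting on the length of each document during pretraining or of each dialogue turn during post-training.

Qwen2 models can process text of 32K or 128K tokens, of which up to 8K tokens can be output.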

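Code test

import ollama

# Read the whole document that will be placed in the prompt.
with open('./file-name.md', 'r', encoding='utf-8') as file:
    file_content = file.read()

# Use the custom model so that the 128k num_ctx from the Modelfile applies.
response = ollama.generate(
    model='qwen2.5_7b_128k',
    prompt=f'File content: {file_content}\nBriefly describe what this file is about, then extract 50 keywords based on its content.',
)

# The generated text is in the 'response' field.
print(response['response'])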
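
Before sending a very large file, it can help to check roughly whether it fits in the context window. The sketch below is an assumption-heavy estimate, not a real tokenizer: `estimate_tokens` and `fits_in_context` are hypothetical helpers, the characters-per-token ratios are the conservative ends of the rule of thumb above, and 8192 tokens are reserved for the output.

```python
def estimate_tokens(text, cjk_chars_per_token=1.5, other_chars_per_token=3.0):
    """Rough token estimate from character counts; not a real tokenizer."""
    # Count characters in the main CJK Unified Ideographs range.
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    # Add 1 so the estimate never rounds down to zero.
    return int(cjk / cjk_chars_per_token + other / other_chars_per_token) + 1

def fits_in_context(text, num_ctx=131072, reserved_output=8192):
    """True if the estimated prompt size leaves `reserved_output` tokens free."""
    return estimate_tokens(text) <= num_ctx - reserved_output


sample = "你好世界" * 1000  # 4,000 Chinese characters
print(estimate_tokens(sample), fits_in_context(sample))
```

If `fits_in_context(file_content)` returns False, the file is likely too long to be used in full and may need to be split or summarized first.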