Getting Started with Ollama

橙汁啤

Last modified: 2025-01-12 10:39:50

First published: 2025-01-11 14:14:04

Article link: https://blog.csdn.net/qq_46345319/article/details/145076446

Reference: Quick Start, Ollama Chinese documentation

I. Deploying Ollama

1. Installation

Official download page: Download Ollama on Windows
After the download finishes, double-click the installer to run it.

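2. Choosing the install location

To install somewhere other than the default location, run the installer from a terminal with the /DIR option:
OllamaSetup.exe /DIR="F:\ProgramData\ollama_location"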

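II. Choosing where models are downloaded

Add the following to the system or user environment variables:
Variable: OLLAMA_MODELS, value: the directory where models should be stored
Ollama reads this variable at startup, so restart it after making the change.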
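As a rough illustration of the lookup described above, the following Python sketch returns the directory a script might expect models to live in. The fallback to ~/.ollama/models is an assumption about the per-user default and may not match every install (a Linux service install, for example, may use a different location). resolve_models_dir is a hypothetical helper written for this article, not part of Ollama.

```python
import os
from pathlib import Path


def resolve_models_dir(env=None):
    """Return the directory where Ollama is expected to store models.

    Hypothetical helper: it only mirrors the OLLAMA_MODELS convention and
    does not talk to Ollama itself.
    """
    env = os.environ if env is None else env
    custom = env.get("OLLAMA_MODELS", "").strip()
    if custom:
        # An explicit OLLAMA_MODELS value takes priority.
        return Path(custom)
    # Assumed fallback: the per-user default location, ~/.ollama/models.
    return Path.home() / ".ollama" / "models"
```

Calling resolve_models_dir() with no arguments reads the current process environment, so it reflects the variable only if it was set before the script started.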

III. Creating your own model

Hugging Face page: Qwen/Qwen2.5-Coder-7B-Instruct-GGUF at main
Download the model you want from Hugging Face, choosing a file in GGUF format.

1. Find a GGUF-format model on Hugging Face


2. Download the model file you want

3. Create Modelfile.txt

Its content is a single line: FROM followed by the path to the model file you downloaded.

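Other FROM targets

FROM also accepts the name of a model from the Ollama library (for example FROM llama3.2), or the path to another local model file, such as FROM ./ollama-model.bin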

4. Create the model (register it with Ollama)

General form: ollama create mymodel -f ./Modelfile
For example: ollama create qwen_ces -f Modelfile.txt (the path after -f can be absolute or relative)
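Steps 3 and 4 can be scripted. The sketch below is a minimal illustration, assuming Ollama is installed and on the PATH. The helper names are hypothetical, and the GGUF filename in the usage note is a placeholder rather than a file the article names.

```python
import subprocess
from pathlib import Path


def write_modelfile(gguf_path, modelfile_path):
    """Write a one-line Modelfile whose FROM points at a local GGUF file."""
    modelfile_path = Path(modelfile_path)
    modelfile_path.write_text(f"FROM {gguf_path}\n", encoding="utf-8")
    return modelfile_path


def create_command(model_name, modelfile_path):
    """Build the argument list for `ollama create <name> -f <Modelfile>`."""
    return ["ollama", "create", model_name, "-f", str(modelfile_path)]


def create_model(model_name, gguf_path, modelfile_path="Modelfile.txt"):
    """Write the Modelfile, then run `ollama create` (requires Ollama)."""
    path = write_modelfile(gguf_path, modelfile_path)
    # check=True raises CalledProcessError if ollama exits with an error.
    subprocess.run(create_command(model_name, path), check=True)
```

For example, create_model("qwen_ces", "./my-model.gguf") would write Modelfile.txt in the current directory and then register the model under the name qwen_ces.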

IV. Common Ollama commands

1. Create a model

ollama create builds a model from a Modelfile.

ollama create mymodel -f ./Modelfile

2. Pull a model

ollama pull llama3.2

3. Remove a model

ollama rm llama3.2

4. Copy a model

ollama cp llama3.2 my-model

5. Multiline input

At the interactive prompt, wrap the text in """ to enter several lines:

>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.

6. Multimodal models

ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"
The image features a yellow smiley face, which is likely the central focus of the picture.

7. Pass the prompt as an argument

$ ollama run llama3.2 "Summarize this file: $(cat README.md)"
 Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
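The same call can be made from a script instead of a shell. This is a minimal sketch, assuming Ollama is installed and the model has already been pulled. summarize_command and summarize are hypothetical helper names, and the trailing newlines are stripped to mimic what $(cat ...) does in the shell.

```python
import subprocess
from pathlib import Path


def summarize_command(model, file_path):
    """Build the argument list for `ollama run <model> "<prompt>"`."""
    # Shell command substitution drops trailing newlines, so do the same here.
    text = Path(file_path).read_text(encoding="utf-8").rstrip("\n")
    return ["ollama", "run", model, f"Summarize this file: {text}"]


def summarize(model, file_path):
    """Run the command and return the model's reply (requires Ollama)."""
    result = subprocess.run(
        summarize_command(model, file_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the prompt as one list element avoids shell quoting problems when the file contains quotes or dollar signs.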

8. Show model information

ollama show llama3.2

9. List the models on your computer

ollama list

10. List the currently loaded models

ollama ps

11. Stop a running model

ollama stop llama3.2

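V. The Modelfile (building a customized assistant)

A Modelfile is the blueprint for creating models and sharing them with Ollama. It starts from a base model and customizes it into a purpose-specific, GPT-style assistant.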

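1. Modelfile format

Instruction	Description
FROM (required)	Defines the base model to use
PARAMETER	Sets the parameters Ollama uses to run the model
TEMPLATE	The full prompt template sent to the model
SYSTEM	Specifies the system message that will be set in the template
ADAPTER	Defines the (Q)LoRA adapter to apply to the model
LICENSE	Specifies the legal license
MESSAGE	Specifies the message history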
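To show how these instructions combine into one file, here is a small sketch that renders a Modelfile from Python values. render_modelfile is a hypothetical helper, and it covers only FROM, PARAMETER, and SYSTEM. String parameters such as stop would need their own quoting, which this sketch does not handle.

```python
def render_modelfile(base, parameters=None, system=None):
    """Render Modelfile text from a base model, parameters and a system message."""
    lines = [f"FROM {base}"]
    # One PARAMETER line per setting, in the order the dict was built.
    for name, value in (parameters or {}).items():
        lines.append(f"PARAMETER {name} {value}")
    if system:
        # Triple quotes let the system message span several lines.
        lines.append(f'SYSTEM """{system}"""')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_modelfile(
        "llama3.2",
        {"temperature": 0.7, "num_ctx": 4096},
        "You are a concise coding assistant.",
    ))
```

The printed text can be saved to a file and passed to ollama create with -f.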

2. Viewing a model's Modelfile

Format: ollama show --modelfile <model>

ollama show --modelfile qwen_ces

3. Instructions

3.1 FROM (required)

FROM llama2

3.2 PARAMETER

The PARAMETER instruction defines a parameter that is applied when the model is run.

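Parameter	Description	Value type	Example usage
mirostat	Enables Mirostat sampling to control perplexity. Mirostat can effectively reduce repetition in the output; perplexity measures how uncertain the model is when predicting the next word. (Default: 0; 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)	int	mirostat 0
mirostat_eta	Controls how quickly the algorithm responds to feedback from the generated text. A lower learning rate makes adjustments slower, while a higher one makes the algorithm more responsive. (Default: 0.1)	float	mirostat_eta 0.1
mirostat_tau	Controls the balance between coherence and diversity of the output. A lower value gives more focused and coherent text, while a higher value gives more diversity. (Default: 5.0)	float	mirostat_tau 5.0
num_ctx	Sets the size of the context window used to generate the next token. (Default: 2048)	int	num_ctx 4096
repeat_last_n	Sets how far back the model looks to prevent repetition. (Default: 64; 0 = disabled, -1 = num_ctx)	int	repeat_last_n 64
repeat_penalty	Sets how strongly repetition is penalized. A higher value (e.g. 1.5) penalizes repetition more strongly, while a lower value (e.g. 0.9) is more lenient. (Default: 1.1)	float	repeat_penalty 1.1
temperature	The temperature of the model, which controls randomness and diversity. Raising it makes the output more random, which can yield more surprising and possibly more creative answers. (Default: 0.8)	float	temperature 0.7
seed	Sets the random number seed used for generation. Setting a specific value makes the model generate the same text for the same prompt. (Default: 0)	int	seed 42
stop	Sets the stop sequences. When the model encounters this pattern, it stops generating and returns. Multiple stop patterns can be set by specifying several separate stop parameters in the Modelfile.	string	stop "AI assistant:"
tfs_z	Tail-free sampling reduces the influence of less probable tokens on the output. A higher value (e.g. 2.0) reduces that influence more, while 1.0 disables the setting. (Default: 1)	float	tfs_z 1
num_predict	Maximum number of tokens to predict when generating text. (Default: 128; -1 = infinite generation, -2 = fill the context)	int	num_predict 42
top_k	Reduces the probability of generating nonsense. A higher value (e.g. 100) gives more diverse answers, while a lower value (e.g. 10) is more conservative. (Default: 40)	int	top_k 40
top_p	Works together with top-k. A higher value (e.g. 0.95) leads to more diverse text, while a lower value (e.g. 0.5) produces more focused and conservative text. (Default: 0.9)	float	top_p 0.9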
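To make temperature, top_k, and top_p more concrete, here is a toy Python sketch of how such filters can narrow the set of candidate tokens. It is an illustration only, not Ollama's actual sampler. The order in which the filters are applied and the function name filter_candidates are assumptions, and temperature must be greater than 0.

```python
import math


def filter_candidates(logits, temperature=0.8, top_k=40, top_p=0.9):
    """Return {token: probability} after temperature, top-k and top-p filtering."""
    # Temperature rescales the logits: lower values sharpen the distribution.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    # Softmax, shifted by the largest logit for numerical stability.
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    probs = sorted(((w / total, tok) for tok, w in weights.items()), reverse=True)
    # top_k: keep only the k most likely tokens.
    probs = probs[:top_k]
    # top_p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, tok in probs:
        kept.append((p, tok))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize the survivors so their probabilities sum to 1 again.
    norm = sum(p for p, _ in kept)
    return {tok: p / norm for p, tok in kept}
```

With toy logits {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0} and temperature 1.0, setting top_k=2 keeps only "a" and "b", while top_p=0.5 keeps "a" alone because it already carries more than half of the probability mass.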