How to use the Ollama API for other functions (list local models, show model information, copy a model, delete a model, pull a model from the official library, push a model to the official library, generate embeddings, list loaded models, check the Ollama version)

How to Use the Ollama API for Other Functions

Introduction

List Local Models

Show Model Information

Copy a Model

Delete a Model

Pull a Model from the Official Library

Push a Model to the Official Library

Generate Embeddings

List Loaded Models

Check the Ollama Version

Introduction

        Ollama provides a RESTful API that lets developers interact with the Ollama service over HTTP. The API covers all of Ollama's core functionality, including model management, execution, and monitoring. Earlier articles on the Ollama API covered the text generation, chat, and model creation endpoints; this article covers the remaining RESTful endpoints: listing local models, showing model information, copying a model, deleting a model, pulling a model from the library, pushing a model to the library, generating embeddings, listing running models, and checking the version.

List Local Models

Endpoint

GET /api/tags

        Lists the models available locally on the server.

Request

curl http://localhost:11434/api/tags

Response

{
    "models": [
        {
            "name": "DeepSeek-R1-32B-abliterated-Q4_K_M:latest",
            "model": "DeepSeek-R1-32B-abliterated-Q4_K_M:latest",
            "modified_at": "2025-03-06T11:27:23.0213997+08:00",
            "size": 19851336554,
            "digest": "88ec7678f999c7d92e2a62200f3951167f746a58c744ce1b8b0388dca23ac6f9",
            "details": {
                "parent_model": "",
                "format": "gguf",
                "family": "qwen2",
                "families": [
                    "qwen2"
                ],
                "parameter_size": "32.8B",
                "quantization_level": "Q4_K_M"
            }
        },
        {
            "name": "llama3.1:8b-q4_K_M",
            "model": "llama3.1:8b-q4_K_M",
            "modified_at": "2025-03-05T16:24:50.1425586+08:00",
            "size": 4920753328,
            "digest": "1474810144ea411c8da8eb741f0e99c4f1d0858fe264f917d04d603c4208b79e",
            "details": {
                "parent_model": "",
                "format": "gguf",
                "family": "llama",
                "families": [
                    "llama"
                ],
                "parameter_size": "8.0B",
                "quantization_level": "Q4_K_M"
            }
        }
    ]
}
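A client usually only needs a few of the fields above. The following is a minimal Python sketch (not part of the Ollama API) showing how the `/api/tags` response might be summarized; in practice you would fetch the JSON with `urllib.request.urlopen("http://localhost:11434/api/tags")`, and the `sample` dict below is a trimmed-down copy of the response shown above.

```python
def summarize_models(tags_response: dict) -> list[tuple[str, float, str]]:
    """Return (name, size in GiB, quantization level) for each local model."""
    summary = []
    for m in tags_response.get("models", []):
        size_gib = m["size"] / (1024 ** 3)  # "size" is reported in bytes
        summary.append((m["name"], round(size_gib, 2),
                        m["details"]["quantization_level"]))
    return summary

# Trimmed sample of the /api/tags response shown above.
sample = {
    "models": [
        {"name": "llama3.1:8b-q4_K_M", "size": 4920753328,
         "details": {"quantization_level": "Q4_K_M"}},
    ]
}
print(summarize_models(sample))  # → [('llama3.1:8b-q4_K_M', 4.58, 'Q4_K_M')]
```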

Show Model Information

Endpoint

POST /api/show

        Returns information about a model: details, Modelfile, template, parameters, license, system prompt, and so on.

Parameters

  • model: the model name
  • verbose (optional): set to true to get the full information

Request

curl http://localhost:11434/api/show -d '{
  "model": "DeepSeek-R1-32B-abliterated-Q4_K_M:latest"
}'
curl http://localhost:11434/api/show -d '{
  "model": "DeepSeek-R1-32B-abliterated-Q4_K_M:latest",
  "verbose": true
}'

Response

{
    "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this, replace FROM with:\n# FROM DeepSeek-R1-32B-abliterated-Q4_K_M:latest\n\nFROM E:\\ChatAI\\ollama\\models\\blobs\\sha256-4de296cf59ba595f8695084cd4e0215784baa1c6b02ee9fc26a250be08b7b6a3\nTEMPLATE {{ .Prompt }}\n",
    "template": "{{ .Prompt }}",
    "details": {
        "parent_model": "E:\\ChatAI\\ollama\\models\\blobs\\sha256-4de296cf59ba595f8695084cd4e0215784baa1c6b02ee9fc26a250be08b7b6a3",
        "format": "gguf",
        "family": "qwen2",
        "families": [
            "qwen2"
        ],
        "parameter_size": "32.8B",
        "quantization_level": "Q4_K_M"
    },
    "model_info": {
        "general.architecture": "qwen2",
        "general.base_model.0.name": "DeepSeek R1 Distill Qwen 32B",
        "general.base_model.0.organization": "Deepseek Ai",
        "general.base_model.0.repo_url": "https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
        "general.base_model.count": 1,
        "general.basename": "DeepSeek-R1-Distill-Qwen",
        "general.file_type": 15,
        "general.finetune": "abliterated",
        "general.parameter_count": 32763876352,
        "general.quantization_version": 2,
        "general.size_label": "32B",
        "general.tags": [
            "abliterated",
            "uncensored"
        ],
        "general.type": "model",
        "quantize.imatrix.chunks_count": 128,
        "quantize.imatrix.dataset": "/training_dir/calibration_datav3.txt",
        "quantize.imatrix.entries_count": 448,
        "quantize.imatrix.file": "/models_out/DeepSeek-R1-Distill-Qwen-32B-abliterated-GGUF/DeepSeek-R1-Distill-Qwen-32B-abliterated.imatrix",
        "qwen2.attention.head_count": 40,
        "qwen2.attention.head_count_kv": 8,
        "qwen2.attention.layer_norm_rms_epsilon": 0.00001,
        "qwen2.block_count": 64,
        "qwen2.context_length": 131072,
        "qwen2.embedding_length": 5120,
        "qwen2.feed_forward_length": 27648,
        "qwen2.rope.freq_base": 1000000,
        "tokenizer.ggml.add_bos_token": true,
        "tokenizer.ggml.add_eos_token": false,
        "tokenizer.ggml.bos_token_id": 151646,
        "tokenizer.ggml.eos_token_id": 151643,
        "tokenizer.ggml.merges": null,
        "tokenizer.ggml.model": "gpt2",
        "tokenizer.ggml.padding_token_id": 151643,
        "tokenizer.ggml.pre": "deepseek-r1-qwen",
        "tokenizer.ggml.token_type": null,
        "tokenizer.ggml.tokens": null
    },
    "modified_at": "2025-03-06T11:27:23.0213997+08:00"
}

{
    "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this, replace FROM with:\n# FROM DeepSeek-R1-32B-abliterated-Q4_K_M:latest\n\nFROM E:\\ChatAI\\ollama\\models\\blobs\\sha256-4de296cf59ba595f8695084cd4e0215784baa1c6b02ee9fc26a250be08b7b6a3\nTEMPLATE {{ .Prompt }}\n",
    "template": "{{ .Prompt }}",
    "details": {
        "parent_model": "E:\\ChatAI\\ollama\\models\\blobs\\sha256-4de296cf59ba595f8695084cd4e0215784baa1c6b02ee9fc26a250be08b7b6a3",
        "format": "gguf",
        "family": "qwen2",
        "families": [
            "qwen2"
        ],
        "parameter_size": "32.8B",
        "quantization_level": "Q4_K_M"
    },
    "model_info": {
        "general.architecture": "qwen2",
        "general.base_model.0.name": "DeepSeek R1 Distill Qwen 32B",
        "general.base_model.0.organization": "Deepseek Ai",
        "general.base_model.0.repo_url": "https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
        "general.base_model.count": 1,
        "general.basename": "DeepSeek-R1-Distill-Qwen",
        "general.file_type": 15,
        "general.finetune": "abliterated",
        "general.parameter_count": 32763876352,
        "general.quantization_version": 2,
        "general.size_label": "32B",
        "general.tags": [
            "abliterated",
            "uncensored"
        ],
        "general.type": "model",
        "quantize.imatrix.chunks_count": 128,
        "quantize.imatrix.dataset": "/training_dir/calibration_datav3.txt",
        "quantize.imatrix.entries_count": 448,
        "quantize.imatrix.file": "/models_out/DeepSeek-R1-Distill-Qwen-32B-abliterated-GGUF/DeepSeek-R1-Distill-Qwen-32B-abliterated.imatrix",
        "qwen2.attention.head_count": 40,
        "qwen2.attention.head_count_kv": 8,
        "qwen2.attention.layer_norm_rms_epsilon": 0.00001,
        "qwen2.block_count": 64,
        "qwen2.context_length": 131072,
        "qwen2.embedding_length": 5120,
        "qwen2.feed_forward_length": 27648,
        "qwen2.rope.freq_base": 1000000,
        "tokenizer.ggml.add_bos_token": true,
        "tokenizer.ggml.add_eos_token": false,
        "tokenizer.ggml.bos_token_id": 151646,
        "tokenizer.ggml.eos_token_id": 151643,
        "tokenizer.ggml.merges": [...],    # only present when verbose is true; omitted here because the data is very long
        "tokenizer.ggml.model": "gpt2",
        "tokenizer.ggml.padding_token_id": 151643,
        "tokenizer.ggml.pre": "deepseek-r1-qwen",
        "tokenizer.ggml.token_type": [...],    # only present when verbose is true; omitted here because the data is very long
        "tokenizer.ggml.tokens": [...]    # only present when verbose is true; omitted here because the data is very long
    },
    "modified_at": "2025-03-06T11:27:23.0213997+08:00"
}
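One detail worth noting in the response above is that architecture-specific keys in `model_info` are prefixed with the architecture name (e.g. `qwen2.context_length`), so a client has to read `general.architecture` first. The sketch below (my own helper, not part of the API) extracts a few commonly needed fields; `sample` is a trimmed copy of the response shown above.

```python
def show_summary(show_response: dict) -> dict:
    """Extract key fields from an /api/show response."""
    info = show_response["model_info"]
    arch = info["general.architecture"]  # prefix for architecture-specific keys
    return {
        "architecture": arch,
        "parameters": info["general.parameter_count"],
        "context_length": info[f"{arch}.context_length"],
        "quantization": show_response["details"]["quantization_level"],
    }

# Trimmed sample of the /api/show response shown above.
sample = {
    "details": {"quantization_level": "Q4_K_M"},
    "model_info": {
        "general.architecture": "qwen2",
        "general.parameter_count": 32763876352,
        "qwen2.context_length": 131072,
    },
}
print(show_summary(sample))
```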

Copy a Model

Endpoint

POST /api/copy

        Copies a model that already exists on the server under a new name. Combined with the delete endpoint, this can be used to rename a model.

Parameters

  • source: the name of the model to copy
  • destination: the name of the new copy

Request

curl http://localhost:11434/api/copy -d '{
  "source": "llama3.2:latest",
  "destination": "llama3.2-backup:latest"
}'

Response

        Returns 200 OK on success, or 404 Not Found if the source model does not exist.
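The copy-then-delete rename trick can be sketched in Python as follows. This is an illustrative helper of my own, not an Ollama client API; the `send` hook is injectable only so the logic can be exercised without a running server.

```python
import json
import urllib.request


def _http(method: str, url: str, payload: dict) -> int:
    """POST/DELETE a JSON payload and return the HTTP status code."""
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(), method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def rename_model(source: str, destination: str,
                 base: str = "http://localhost:11434", send=_http) -> None:
    # Copy first, delete second: if the copy fails, the original survives.
    send("POST", f"{base}/api/copy",
         {"source": source, "destination": destination})
    send("DELETE", f"{base}/api/delete", {"model": source})
```

Doing the copy before the delete means a failure at any step never leaves you without the model.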

Delete a Model

Endpoint

DELETE /api/delete

        Deletes a model and its data (including the local model blobs).

Parameters

  • model: the name of the model to delete

Request

curl -X DELETE http://localhost:11434/api/delete -d '{
  "model": "llama3.2:latest"
}'

Response

        Returns 200 OK on success, or 404 Not Found if the model to delete does not exist.

 

Pull a Model from the Official Library

Endpoint

POST /api/pull

        Downloads a model from the official library. Interrupted pulls are resumed from where they left off, and multiple models can be downloaded at the same time.

Parameters

  • model: the name of the model to pull
  • insecure (optional): allow an insecure connection to the library; only use this during development
  • stream (optional): set to false to disable streaming; responses stream by default

Request

curl http://localhost:11434/api/pull -d '{
  "model": "llama3.2"
}'

Response

        When streaming, the first packet has a status of pulling manifest

{
  "status": "pulling manifest"
}

        After the first packet comes a series of download responses; while a download is in progress, a packet may not include the completed field

{
    "status": "pulling dde5aa3fc5ff",
    "digest": "sha256:dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff",
    "total": 2019377376,
    "completed": 2019377376
}

        Once all files have finished downloading, the final responses are as follows

{
    "status": "verifying sha256 digest"
}
{
    "status": "writing manifest"
}
{
    "status": "success"
}
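Since mid-download packets may lack the completed field, a progress display has to handle its absence. A small sketch (my own helper, not part of the API) that turns each streamed packet into a human-readable line, using the sample packets above:

```python
def pull_progress(packet: dict) -> str:
    """Format one streaming /api/pull packet as a progress line."""
    status = packet.get("status", "")
    total = packet.get("total")
    if total is None:  # manifest/verify/success packets carry no sizes
        return status
    completed = packet.get("completed", 0)  # may be absent mid-download
    percent = 100 * completed / total
    return f"{status}: {percent:.1f}%"

print(pull_progress({"status": "pulling manifest"}))
print(pull_progress({"status": "pulling dde5aa3fc5ff",
                     "total": 2019377376, "completed": 2019377376}))
```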

Push a Model to the Official Library

Endpoint

POST /api/push

        Uploads a model you have trained locally to the official model library. Before pushing, you need to register at ollama.com and add your public key.

Parameters

  • model: the name of the model to push, in the form <namespace>/<model>:<tag>
  • insecure (optional): allow an insecure connection to the library; only use this during development
  • stream (optional): set to false to disable streaming; responses stream by default

Request

curl http://localhost:11434/api/push -d '{
  "model": "mattw/pygmalion:latest"
}'

Response

        When streaming, the first packet has a status of retrieving manifest

{ "status": "retrieving manifest" }

        After the first packet comes a series of upload responses

{
  "status": "starting upload",
  "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
  "total": 1928429856
}

        Once the upload has finished, the final responses are as follows

{"status":"pushing manifest"}
{"status":"success"}

Generate Embeddings

Endpoint

POST /api/embed

        Generates embeddings with a model. An embedding is a numeric vector representation of text; these vectors are widely used in machine learning and natural language processing, for example in text similarity, clustering, and classification tasks.

Basic parameters

  • model: the name of the model used to generate the embeddings
  • input: the text to embed; this can also be a list containing multiple texts

Advanced parameters

  • truncate: truncates the end of each input to fit the configured context length, which may drop part of the input; defaults to true. If set to false, an error is returned when an input exceeds the context length
  • options: model parameters, i.e. the parameters set in a Modelfile, such as temperature
  • keep_alive: how long the model stays loaded in memory after the request; defaults to 5 minutes

1. Single text input

Request

curl http://localhost:11434/api/embed -d '{
  "model": "DeepSeek-R1-32B-abliterated-Q4_K_M:latest",
  "input": "Why is the sky blue?"
}'

Response

{
    "model": "DeepSeek-R1-32B-abliterated-Q4_K_M:latest",
    "embeddings": [[0.00080057175,0.0028768708,-0.001027889,0.0038258387,...,0.008404303,0.008198861,0.009408009]],
    "total_duration": 50136542400,
    "load_duration": 49729266600,
    "prompt_eval_count": 6
}

2. Multiple text inputs

Request

curl http://localhost:11434/api/embed -d '{
  "model": "DeepSeek-R1-32B-abliterated-Q4_K_M:latest",
  "input": ["Why is the sky blue?", "Why is the grass green?"]
}'

Response

{
    "model": "DeepSeek-R1-32B-abliterated-Q4_K_M:latest",
    "embeddings": [[0.00080057175,0.0028768708,-0.001027889,0.0038258387,...,0.008404303,0.008198861,0.009408009],[0.0024882958,0.0042897887,0.00031655197,0.0008690393,0.0022822758,...,0.007627705,0.009500079,0.008125986]],
    "total_duration": 232338100,
    "load_duration": 7154700,
    "prompt_eval_count": 12
}
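Embedding vectors like the two returned above are typically compared with cosine similarity. A self-contained sketch (toy vectors stand in for `response["embeddings"][0]` and `[1]`, since the real vectors are thousands of dimensions long):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors in place of the real embeddings from /api/embed.
print(cosine([1.0, 0.0], [1.0, 0.0]))  # identical direction → 1.0
print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal → 0.0
```

A similarity close to 1.0 means the two input texts are semantically close; values near 0 mean they are unrelated.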

 

List Loaded Models

Endpoint

GET /api/ps

        Lists the models currently loaded into memory.

Request

curl http://localhost:11434/api/ps

Response

{
    "models": [
        {
            "name": "llama3.2-backup:latest",
            "model": "llama3.2-backup:latest",
            "size": 3972362240,
            "digest": "a80c4f17acd55265feec403c7aef86be0c25983ab279d83f3bcd3abbcb5b8b72",
            "details": {
                "parent_model": "",
                "format": "gguf",
                "family": "llama",
                "families": [
                    "llama"
                ],
                "parameter_size": "3.2B",
                "quantization_level": "Q4_K_M"
            },
            "expires_at": "2025-03-07T11:30:13.2267966+08:00",
            "size_vram": 3972362240
        }
    ]
}
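Comparing size_vram with size in this response tells you how much of each loaded model actually resides in VRAM; a fraction below 1.0 means part of the model was offloaded to system RAM. A small sketch (my own helper, not part of the API), using a trimmed copy of the response above:

```python
def vram_fraction(ps_response: dict) -> dict[str, float]:
    """Map each loaded model's name to the fraction of it held in VRAM."""
    return {m["name"]: m["size_vram"] / m["size"]
            for m in ps_response.get("models", [])}

# Trimmed sample of the /api/ps response shown above.
sample = {"models": [{"name": "llama3.2-backup:latest",
                      "size": 3972362240, "size_vram": 3972362240}]}
print(vram_fraction(sample))  # → {'llama3.2-backup:latest': 1.0}
```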

 

Check the Ollama Version

Endpoint

GET /api/version

Request

curl http://localhost:11434/api/version

Response

{
    "version": "0.5.12"
}
