Ollama can serve a large language model over an API in two ways: its native interface and an OpenAI-compatible interface.
1. Native mode
Check that Ollama is installed on this machine: press Win+R, type cmd to open a terminal, then run:
ollama -v
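Download and start the model (ollama run pulls the model automatically if it is not present yet and then opens an interactive session; ollama pull qwen:0.5b only downloads it):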
ollama run qwen:0.5b
1.1. First request method
Open Postman and create a POST request to the following URL:
http://localhost:11434/api/generate
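Send the following request body (this endpoint expects the complete prompt text in the prompt field; "你是谁" means "Who are you?"):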
{
"model":"qwen:0.5b",
"prompt":"你是谁"
}
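Because the request does not set "stream": false, the response arrives as a stream of JSON objects, each carrying one fragment of the answer. The last object has "done": true and adds the context tokens and timing statistics: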
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.2227341Z",
"response": "我是",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.2489328Z",
"response": "来自",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.2728088Z",
"response": "阿里",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.2970255Z",
"response": "云",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.3208497Z",
"response": "的",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.3444397Z",
"response": "超",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.390067Z",
"response": "大规模",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.4145786Z",
"response": "语言",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.4396235Z",
"response": "模型",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.4643017Z",
"response": ",",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.4891521Z",
"response": "我",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.515477Z",
"response": "叫",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.5407572Z",
"response": "通",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.5746175Z",
"response": "义",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.6038589Z",
"response": "千",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.6293497Z",
"response": "问",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.6552632Z",
"response": "。",
"done": false
}
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:12:55.6825481Z",
"response": "",
"done": true,
"done_reason": "stop",
"context": [
151644,
872,
198,
105043,
100165,
151645,
198,
151644,
77091,
198,
104198,
101919,
102661,
99718,
9370,
71304,
105483,
102064,
104949,
3837,
35946,
99882,
31935,
64559,
99320,
56007,
1773
],
"total_duration": 558287200,
"load_duration": 36553100,
"prompt_eval_count": 10,
"prompt_eval_duration": 59275000,
"eval_count": 18,
"eval_duration": 459349000
}
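The streamed reply above can be reassembled on the client side by reading it line by line and joining the "response" fragments. The following is a minimal sketch using only the Python standard library; the helper names join_stream and generate are illustrative assumptions, and the URL and model name are the ones used in this section.

```python
import json
import urllib.request

# Default local address used in this section.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def join_stream(lines):
    """Concatenate the "response" fragments of a streamed reply.

    `lines` is any iterable of JSON strings or bytes, one object per line.
    """
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between objects
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object carries statistics, not text
    return "".join(parts)


def generate(prompt, model="qwen:0.5b", url=OLLAMA_GENERATE_URL):
    """POST a prompt to the native generate endpoint and return the full text."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # The HTTP response object can be iterated line by line.
        return join_stream(resp)
```

With the Ollama server running locally, calling generate("你是谁") should return the whole answer as a single string.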
1.2. Second request method
Send a POST request to this URL:
http://localhost:11434/api/chat
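Send the request below. This endpoint takes the conversation as a messages list in which each entry needs only the role and content keys; Ollama builds the complete prompt from them automatically. Note that top-level temperature and max_tokens are OpenAI-style fields. The native endpoint reads sampling parameters from an options object instead (for example num_predict for the output limit), so these two most likely have no effect here: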
{
"messages": [
{
"role": "user",
"content": "你是谁"
}
],
"model": "qwen:0.5b",
"stream": false,
"temperature": 0.01,
"max_tokens": 1024
}
With "stream": false, the response is a single JSON object:
{
"model": "qwen:0.5b",
"created_at": "2024-10-29T03:17:59.3797606Z",
"message": {
"role": "assistant",
"content": "我是来自阿里云的大规模语言模型,我叫通义千问。"
},
"done_reason": "stop",
"done": true,
"total_duration": 567748800,
"load_duration": 29021200,
"prompt_eval_count": 10,
"prompt_eval_duration": 113331000,
"eval_count": 17,
"eval_duration": 423288000
}
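The same chat request can be sent from a script. The sketch below is an illustration, not the only way to do it: the function names are hypothetical, and it assumes that the native endpoint expects sampling settings inside an "options" object, with num_predict as the counterpart of max_tokens.

```python
import json
import urllib.request

# Default local address used in this section.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(user_text, model="qwen:0.5b",
                       temperature=0.01, num_predict=1024):
    """Build a non-streaming request body for the native chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,
        # Assumption: native sampling parameters live under "options".
        "options": {"temperature": temperature, "num_predict": num_predict},
    }


def chat(user_text, url=OLLAMA_CHAT_URL):
    """Send one user message and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(user_text)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # The reply text sits under message.content, as in the response above.
    return reply["message"]["content"]
```

With the server running, chat("你是谁") should return the content string shown in the response above, though the exact wording can vary between runs.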
2. OpenAI-compatible API
Send a POST request to this URL:
http://localhost:11434/v1/chat/completions
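Send the same request body as in 1.2 (temperature and max_tokens are standard fields of this OpenAI-style endpoint).
The response follows the OpenAI chat completion format: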
{
"id": "chatcmpl-289",
"object": "chat.completion",
"created": 1730172131,
"model": "qwen:0.5b",
"system_fingerprint": "fp_ollama",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "我是来自阿里云的超大规模语言模型,我叫通义千问。"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 18,
"total_tokens": 28
}
}
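Because the response follows the OpenAI layout, the reply text is found under choices[0].message.content rather than under message.content as in the native endpoint. A minimal standard-library sketch, with hypothetical helper names, could look like this:

```python
import json
import urllib.request

# OpenAI-compatible address used in this section.
OPENAI_COMPAT_URL = "http://localhost:11434/v1/chat/completions"


def extract_reply(completion):
    """Return (reply text, total token count) from a chat completion object."""
    text = completion["choices"][0]["message"]["content"]
    total = completion.get("usage", {}).get("total_tokens")
    return text, total


def chat_completion(user_text, model="qwen:0.5b", url=OPENAI_COMPAT_URL):
    """Send one user message to the OpenAI-compatible endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,
        "temperature": 0.01,
        "max_tokens": 1024,
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_reply(json.load(resp))
```

Since the request and response shapes match the OpenAI format, an existing OpenAI client can usually be pointed at this base URL instead of hand-building the request.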