ollama provides quite a few configuration options for tuning the ollama service, such as the listen address (default 127.0.0.1) and model memory management.
The following briefly covers access configuration and model memory management.

Full configuration information

The full set can be inspected in the golang source; the definitions live mainly in envconfig/config.go.

  • Default configuration
func AsMap() map[string]EnvVar {
    return map[string]EnvVar{
        "OLLAMA_DEBUG":             {"OLLAMA_DEBUG", Debug, "Show additional debug information (e.g. OLLAMA_DEBUG=1)"},
        "OLLAMA_FLASH_ATTENTION":   {"OLLAMA_FLASH_ATTENTION", FlashAttention, "Enabled flash attention"},
        "OLLAMA_HOST":              {"OLLAMA_HOST", "", "IP Address for the ollama server (default 127.0.0.1:11434)"},
        "OLLAMA_KEEP_ALIVE":        {"OLLAMA_KEEP_ALIVE", KeepAlive, "The duration that models stay loaded in memory (default \"5m\")"},
        "OLLAMA_LLM_LIBRARY":       {"OLLAMA_LLM_LIBRARY", LLMLibrary, "Set LLM library to bypass autodetection"},
        "OLLAMA_MAX_LOADED_MODELS": {"OLLAMA_MAX_LOADED_MODELS", MaxRunners, "Maximum number of loaded models (default 1)"},
        "OLLAMA_MAX_QUEUE":         {"OLLAMA_MAX_QUEUE", MaxQueuedRequests, "Maximum number of queued requests"},
        "OLLAMA_MAX_VRAM":          {"OLLAMA_MAX_VRAM", MaxVRAM, "Maximum VRAM"},
        "OLLAMA_MODELS":            {"OLLAMA_MODELS", "", "The path to the models directory"},
        "OLLAMA_NOHISTORY":         {"OLLAMA_NOHISTORY", NoHistory, "Do not preserve readline history"},
        "OLLAMA_NOPRUNE":           {"OLLAMA_NOPRUNE", NoPrune, "Do not prune model blobs on startup"},
        "OLLAMA_NUM_PARALLEL":      {"OLLAMA_NUM_PARALLEL", NumParallel, "Maximum number of parallel requests (default 1)"},
        "OLLAMA_ORIGINS":           {"OLLAMA_ORIGINS", AllowOrigins, "A comma separated list of allowed origins"},
        "OLLAMA_RUNNERS_DIR":       {"OLLAMA_RUNNERS_DIR", RunnersDir, "Location for runners"},
        "OLLAMA_TMPDIR":            {"OLLAMA_TMPDIR", TmpDir, "Location for temporary files"},
    }
}
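
The AsMap helper above makes it straightforward to dump everything the server understands. Below is a minimal sketch, assuming the ollama repository is pulled in as a Go module dependency and that EnvVar exposes Name, Value, and Description fields matching the struct literal above:

package main

import (
    "fmt"

    "github.com/ollama/ollama/envconfig"
)

func main() {
    // Print each supported variable, its current value, and its description.
    // Field names are an assumption based on the struct literal shown above.
    for _, v := range envconfig.AsMap() {
        fmt.Printf("%-26s current=%v\n    %s\n", v.Name, v.Value, v.Description)
    }
}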

Some configuration adjustments

By default the ollama API service only listens locally, which makes access from other machines inconvenient. There are several solutions, including changing the listen address directly through configuration (the bullets below) and putting a reverse proxy such as nginx in front (sketched next).
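
As an illustration of the proxy route (a sketch only, standing in for nginx; the 0.0.0.0:8080 listen address is an assumption), a few lines of Go with net/http/httputil can forward external traffic to the locally bound API:

package main

import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
)

func main() {
    // Upstream is the default local ollama endpoint.
    upstream, err := url.Parse("http://127.0.0.1:11434")
    if err != nil {
        log.Fatal(err)
    }
    proxy := httputil.NewSingleHostReverseProxy(upstream)
    // Expose the proxy on all interfaces while ollama itself stays local.
    log.Fatal(http.ListenAndServe("0.0.0.0:8080", proxy))
}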

  • Change the default listen address in the systemd unit file
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
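
After editing the unit file, run systemctl daemon-reload and then systemctl restart ollama for the change to take effect; the same applies to the other systemd edits below.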
  • Model memory
    Keeping a model loaded in memory makes inference fast. Via the API, keep_alive controls this per request: a negative value such as -1 keeps the model loaded indefinitely, 0 unloads it immediately after the response, and a duration string such as "10m" keeps it loaded for that long.
curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": -1}'
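
The same request can be issued from Go; a minimal sketch, with the model name "llama3" and the local endpoint taken from the curl example above:

package main

import (
    "bytes"
    "fmt"
    "io"
    "log"
    "net/http"
)

func main() {
    // keep_alive: -1 asks the server to keep the model resident indefinitely.
    body := []byte(`{"model": "llama3", "keep_alive": -1}`)
    resp, err := http.Post("http://localhost:11434/api/generate",
        "application/json", bytes.NewReader(body))
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    out, _ := io.ReadAll(resp.Body)
    fmt.Println(string(out))
}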

OLLAMA_KEEP_ALIVE is also available as an environment variable and changes the default for all models

[Service]
Environment="OLLAMA_KEEP_ALIVE=-1"
  • Queue configuration
    The OLLAMA_MAX_QUEUE environment variable sets the maximum number of requests allowed to wait in the queue
[Service]
Environment="OLLAMA_MAX_QUEUE=1000"

Notes

Knowing these configuration options is quite useful: it lets you make better use of resources and tune the service.

References

  • https://github.com/ollama/ollama/blob/main/docs/api.md
  • https://github.com/ollama/ollama/blob/main/docs/faq.md
  • https://github.com/ollama/ollama/blob/main/envconfig/config.go