This guide explains how to install and run the llama.cpp server using the ghcr.io/ggml-org/llama.cpp:server Docker image on a CPU-only system. This image is pre-built around the llama-server executable, which serves model inference over an HTTP API.
Prerequisites
Before starting, ensure your system meets these requirements:
- Operating System: Ubuntu 20.04/22.04 (or any Linux distribution with Docker support).
- Hardware: Any modern CPU (multi-core recommended).
- Memory: At least 16 GB of RAM (more for larger models, e.g. 8B parameters).
- Storage: 10 GB+ of free space (for the Docker image and model files).
- Internet: Required to pull the Docker image and download model files.
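With the prerequisites in place, a typical pull-and-run invocation might look like the sketch below. The model directory (`/path/to/models`) and model file name (`model.gguf`) are placeholders you should replace with your own paths; the port and flag values shown are common defaults, not requirements.

```shell
# Pull the pre-built CPU server image
docker pull ghcr.io/ggml-org/llama.cpp:server

# Run the server, mounting a local model directory into the container.
# /path/to/models and model.gguf are placeholders for your own files.
docker run -p 8080:8080 -v /path/to/models:/models \
  ghcr.io/ggml-org/llama.cpp:server \
  -m /models/model.gguf --host 0.0.0.0 --port 8080

# Once the server is up, check its health and send a test completion:
curl http://localhost:8080/health
curl --request POST --url http://localhost:8080/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'
```

Binding to `--host 0.0.0.0` makes the server reachable from outside the container; the `-p 8080:8080` mapping then exposes it on the host.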