llama.cpp

Run AI agents with the llama.cpp server directly. This is the most lightweight option for local inference: no additional software layer, just the llama.cpp HTTP server.

Setup

  1. Build llama.cpp or download a release from GitHub
  2. Download a GGUF model file
  3. Start the server:
./llama-server -m ./models/your-model.gguf --port 8090

(Use port 8090 or another port to avoid conflicts with Sinaptic® DROID+'s default port 8080)

  4. Configure in droid.yaml:
llama_cpp:
  base_url: "http://localhost:8090/v1"

No API key required.
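Before wiring the server into droid.yaml, you can sanity-check its OpenAI-compatible endpoint directly. A minimal Python sketch using only the standard library (the prompt and `model` value are placeholders; llama-server serves whichever GGUF file it loaded, regardless of the model name sent):

```python
import json
from urllib import request

BASE_URL = "http://localhost:8090/v1"  # matches --port 8090 above


def build_chat_request(base_url, prompt, max_tokens=64):
    """Build an OpenAI-style chat completion request body and URL."""
    url = f"{base_url}/chat/completions"
    payload = {
        "model": "local-model",  # placeholder; llama-server ignores this for a single loaded model
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return url, json.dumps(payload).encode()


def ask(base_url, prompt):
    """Send the request and return the assistant's reply text."""
    url, body = build_chat_request(base_url, prompt)
    req = request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:  # no API key needed
        return json.load(resp)["choices"][0]["message"]["content"]


# With llama-server running:
# print(ask(BASE_URL, "Say hello in one word."))
```

If the request succeeds, the same `base_url` will work in droid.yaml.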

Agent Config

name: "llama-agent"
model:
  provider: "llama_cpp"
  name: "local-model"
  max_tokens: 2048
  temperature: 0.7

Notes

  • llama.cpp server provides an OpenAI-compatible API endpoint.
  • This is the lowest-overhead option for local inference — ideal for embedded or edge deployments.
  • Tool use (function calling) support depends on the model and llama.cpp version.
  • For most users, Ollama or LM Studio provides a simpler experience with the same underlying inference engine.
  • If running Sinaptic® DROID+ in Docker, use host.docker.internal to connect to llama.cpp on the host.
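To make the Docker note concrete, here is a sketch of the droid.yaml override when Sinaptic® DROID+ runs in a container and llama.cpp runs on the host. The hostname follows Docker's host.docker.internal convention; on Linux you may need to start the container with `--add-host=host.docker.internal:host-gateway` for it to resolve:

```yaml
llama_cpp:
  # From inside the container, "localhost" is the container itself;
  # host.docker.internal resolves to the Docker host machine.
  base_url: "http://host.docker.internal:8090/v1"
```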