llama.cpp

Run AI agents with the llama.cpp server directly. This is the most lightweight option for local inference: no additional software layer, just the llama.cpp HTTP server.

Setup

  1. Build llama.cpp or download a release from GitHub
  2. Download a GGUF model file
  3. Start the server:
./llama-server -m ./models/your-model.gguf --port 8090

(Use port 8090 or another port to avoid conflicts with Sinaptic® DROID+'s default port 8080)

  4. Configure in droid.yaml:
llama_cpp:
  base_url: "http://localhost:8090/v1"

No API key required.
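Before wiring the server into droid.yaml, you can sanity-check its OpenAI-compatible endpoint directly. A minimal Python sketch using only the standard library (the prompt and `model` value are placeholders; llama-server serves whichever GGUF file it loaded, regardless of the model name sent):

```python
import json
from urllib import request

BASE_URL = "http://localhost:8090/v1"  # matches --port 8090 above


def build_chat_request(base_url, prompt, max_tokens=64):
    """Build an OpenAI-style chat completion request body and URL."""
    url = f"{base_url}/chat/completions"
    payload = {
        "model": "local-model",  # placeholder; llama-server ignores this for a single loaded model
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return url, json.dumps(payload).encode()


def ask(base_url, prompt):
    """Send the request and return the assistant's reply text."""
    url, body = build_chat_request(base_url, prompt)
    req = request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:  # no API key needed
        return json.load(resp)["choices"][0]["message"]["content"]


# With llama-server running:
# print(ask(BASE_URL, "Say hello in one word."))
```

If the request succeeds, the same `base_url` will work in droid.yaml.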

Agent Config

name: "llama-agent"
model:
  provider: "llama_cpp"
  name: "local-model"
  max_tokens: 2048
  temperature: 0.7

Notes

  • llama.cpp server provides an OpenAI-compatible API endpoint.
  • This is the lowest-overhead option for local inference — ideal for embedded or edge deployments.
  • Tool use (function calling) support depends on the model and llama.cpp version.
  • For most users, Ollama or LM Studio provides a simpler experience with the same underlying inference engine.
  • If running Sinaptic® DROID+ in Docker, use host.docker.internal to connect to llama.cpp on the host.
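To make the Docker note concrete, here is a sketch of the droid.yaml override when Sinaptic® DROID+ runs in a container and llama.cpp runs on the host. The hostname follows Docker's host.docker.internal convention; on Linux you may need to start the container with `--add-host=host.docker.internal:host-gateway` for it to resolve:

```yaml
llama_cpp:
  # From inside the container, "localhost" is the container itself;
  # host.docker.internal resolves to the Docker host machine.
  base_url: "http://host.docker.internal:8090/v1"
```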