llama.cpp
Run AI agents with llama.cpp server directly. This is the most lightweight option for local inference — no additional software layer, just the llama.cpp HTTP server.
Setup
- Build llama.cpp or download a release from GitHub
- Download a GGUF model file
- Start the server:
./llama-server -m ./models/your-model.gguf --port 8090
(Use port 8090 or another free port to avoid conflicts with Sinaptic® DROID+'s default port 8080; a quick way to check that the server is reachable is shown after this list.)
- Configure in droid.yaml:
llama_cpp:
  base_url: "http://localhost:8090/v1"
No API key required.
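To confirm the server and the base_url are reachable before wiring up an agent, you can send a request to the OpenAI-compatible chat completions endpoint. A minimal check (the prompt and max_tokens value are arbitrary, and llama.cpp typically serves whichever model was loaded at startup regardless of the model field):
curl http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "local-model",
        "messages": [{"role": "user", "content": "Say hello"}],
        "max_tokens": 32
      }'
A JSON response containing a choices array indicates the server is up and serving completions.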
Agent Config
name: "llama-agent"
model:
provider: "llama_cpp"
name: "local-model"
max_tokens: 2048
temperature: 0.7
Notes
- llama.cpp server provides an OpenAI-compatible API endpoint.
- This is the lowest-overhead option for local inference — ideal for embedded or edge deployments.
- Tool use (function calling) support depends on the model and llama.cpp version; a sample request is sketched after this list.
- For most users, Ollama or LM Studio provide a simpler experience with the same underlying inference engine.
- If running Sinaptic® DROID+ in Docker, use host.docker.internal to connect to llama.cpp on the host (an example config is shown after this list).
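For the tool-use note above, requests follow the standard OpenAI function-calling format. A sketch of such a request is below; the get_weather tool is hypothetical, and whether the model actually returns tool_calls depends on the model and your llama.cpp build:
curl http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }]
      }'
If function calling is supported end to end, the response contains a tool_calls entry instead of plain text content.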
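For the Docker note above, a sketch of the corresponding droid.yaml entry, assuming llama.cpp still listens on port 8090 on the host:
llama_cpp:
  base_url: "http://host.docker.internal:8090/v1"
On Linux, host.docker.internal is not available by default; adding --add-host=host.docker.internal:host-gateway to the docker run command usually makes the hostname resolve.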