- Add custom llama.cpp Dockerfile with ROCm 6.1 + gfx906 (MI50) build
- Add llama-cpp-hermes service serving Hermes 4.3 on dual MI50 GPUs
- Strip GPU devices/ROCm env from ollama service (CPU-only for embeddings)

Hermes 4.3 runs at ~19 t/s on dual MI50s with 160K context.