Compare commits

4 Commits

Author SHA1 Message Date
f4b666284a feat: add Hyperspace Pods NixOS module and enable on lazyworkhorse
Hyperspace Pods let multiple machines pool their GPUs into one private
P2P mesh AI cluster. Models are split across all connected GPUs —
e.g. two machines with 16GB VRAM each can run Qwen 3.5 32B together.

Changes:
- Add modules/nixos/services/hyperspace.nix — NixOS module that:
  * Fetches the Hyperspace CLI binary (v5.45.30) via fetchurl
  * Sets up systemd service for the agent
  * Opens firewall ports (libp2p 4001, chain 30301, API 8080)
  * Configures GPU passthrough for AMD MI50 (ROCm)
- Register module in flake.nix for lazyworkhorse
- Enable hyperspace service on lazyworkhorse (ai-worker user, port 8080)

Usage after deployment:
  hyperspace pod create "tdnde-lab"   # create pod
  hyperspace pod invite                # share invite with cyt-pi
  curl http://localhost:8080/v1/chat/completions  # OpenAI API

See skill: nixos-hyperspace-pods
2026-05-02 15:36:15 +00:00
815ca3afa6 chore: update compose submodule to traefik logging branch 2026-05-02 15:30:28 +00:00
e983775c04 docs: add merge priority order with security hardening as #1 priority
- Updated roadmap phase status (Phase 4 complete)
- Added merge priority table with PR #28 (security) at top
- Documented that security must merge before new services exposed
- Added deployment command reference
2026-05-02 15:30:28 +00:00
Robert
bcf5cadaa0 olllama template fix to remove currenttime 2026-04-30 21:54:47 -04:00
12 changed files with 290 additions and 680 deletions

@@ -13,7 +13,9 @@ None
-**Phase 1: Foundation Setup** - Establish core NixOS configuration with flakes
-**Phase 2: Docker Service Integration** - Integrate Docker Compose services
-**Phase 3: AI Assistant Integration** - Enable AI-assisted infrastructure management
- [ ] **Phase 4: Internet Access & MCP** - MCP server for web access
- **Phase 4: Internet Access & MCP** - MCP server for web access
- 🚨 **Security Hardening** - CRITICAL: Firewall, fail2ban, SSH hardening (PR #28)
- [ ] **Phase 5: TAK Server** - Research, implementation, and validation
## Phase Details
@@ -133,8 +135,25 @@ Plans:
## Progress
**Merge Priority Order** (CRITICAL - merge in this order):
| Priority | PR | Description | Status | Notes |
|----------|-----|-------------|--------|-------|
| 🚨 1 | #28 | **Security hardening** (firewall, fail2ban, SSH) | Open | **MERGE FIRST** - protects all other services |
| 2 | #22 | Matrix bridge dependency fix | Open | Blocks Hermes functionality |
| 3 | #21 | Backup network creation fix | Open | Infrastructure fix |
| 4 | #25 | Hermes voice GPU support | Open | Feature enhancement |
| 5 | #24 | uConsole CM5 host | Open | New hardware support |
| 6 | #23 | NixOS deployment infrastructure | Open | Deployment tooling |
| 7 | #1 | AI worker restricted access | Open | Legacy PR (superseded by hardening) |
**Execution Order:**
Phases execute in numeric order: 1 → 2 → 3 → 4 → 5 → 6 → 7
Phases execute in numeric order: 1 → 2 → 3 → 4 → Security → 5 → 6 → 7
**Merge vs Phase Execution:**
- PRs can merge independently (no strict phase ordering for merges)
- **EXCEPTION:** Security hardening (#28) must merge before any new services are exposed
- After security merge, deploy with: `nh os switch --flake .#lazyworkhorse`
| Phase | Milestone | Plans Complete | Status | Completed |
|-------|-----------|----------------|--------|-----------|

@@ -1,203 +0,0 @@
# AI Model Optimization Cron Job - EXECUTION PROMPT
**When this cron runs, follow these instructions exactly:**
---
## Your Role
You are an AI model optimization agent. Your task is to find the best ollama/llama.cpp configuration for maximum context size and hardware utilization.
**Hardware:**
- 2× AMD MI50 GPUs (32GB VRAM each, 64GB total)
- 128GB system RAM
- ROCm: HSA_OVERRIDE_GFX_VERSION=9.0.6, HIP_VISIBLE_DEVICES=0,1
---
## File Locations
```
STATE: /opt/data/infra/assets/ai-optimizer/state.json
RESULTS: /opt/data/infra/assets/ai-optimizer/results.csv
INFRA_REPO: /opt/data/infra
```
---
## Model Queues
### GPU Track (Coding - prioritize speed + context on GPU)
1. `devstral-small-2:24b`
2. `qwen2.5-coder:32b`
3. `codellama:34b-instruct`
### RAM Track (Knowledge - prioritize max context)
1. `qwen2.5:72b`
2. `nemotron-3-nano:30b`
3. `mixtral:8x7b-instruct`
---
## Context Steps (in order)
```
[32768, 65536, 98304, 131072, 163840, 200704, 262144, 327680]
```
---
## Each Run - Step by Step
### 1. Read State
```bash
cd /opt/data/infra
cat assets/ai-optimizer/state.json
```
### 2. Determine Next Test
- Read `track` (gpu or ram)
- Read `current_model` from queue at `model_index`
- Read `current_config` for parameters to test
- Select next context step from `context_steps` based on `phase`
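A minimal sketch of this selection step, assuming the `state.json` fields used by this job (`track`, `model_index`, `current_config`, `gpu_queue`, `ram_queue`). The function name `next_test` is hypothetical and only illustrates the lookup.

```python
import json


def next_test(state: dict) -> dict:
    """Return the model and parameters for the next benchmark run."""
    track = state["track"]                  # "gpu" or "ram"
    queue = state[f"{track}_queue"]         # gpu_queue or ram_queue
    model = queue[state["model_index"]]     # current model in that queue
    params = dict(state["current_config"])  # copy; leave the state untouched
    return {"track": track, "model": model, **params}


# Example state, trimmed to the fields this step reads.
state = json.loads("""
{
  "track": "gpu",
  "model_index": 0,
  "current_config": {"num_ctx": 32768, "num_gpu": 99, "flash_attn": true},
  "gpu_queue": ["devstral-small-2:24b", "qwen2.5-coder:32b", "codellama:34b-instruct"],
  "ram_queue": ["qwen2.5:72b", "nemotron-3-nano:30b", "mixtral:8x7b-instruct"]
}
""")
print(next_test(state))
```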
### 3. Pull Model (if needed)
```bash
docker exec ollama ollama list | grep -q "<model>" || docker exec ollama ollama pull <model>
```
### 4. Create Test Modelfile
```bash
# ${model}, ${num_ctx}, ${num_gpu} and ${flash_attn} are shell variables filled from state.json
docker exec ollama bash -c "cat <<EOF > /root/.ollama/test_${model}.modelfile
FROM ${model}
PARAMETER num_ctx ${num_ctx}
PARAMETER num_gpu ${num_gpu}
PARAMETER flash_attn ${flash_attn}
PARAMETER num_predict 4096
PARAMETER num_keep 1024
PARAMETER repeat_penalty 1.1
EOF"
docker exec ollama ollama create test-model -f /root/.ollama/test_${model}.modelfile
```
### 5. Run Benchmark
```bash
# Warm up
docker exec ollama ollama run test-model "Hello" > /dev/null
# Coding prompt
START=$(date +%s%N)
docker exec ollama ollama run test-model "Write a Python async context manager that retries a function with exponential backoff, max 5 retries, and logs each attempt using structlog. Include type hints."
END=$(date +%s%N)
# Elapsed wall time in nanoseconds; tokens/sec = generated tokens / (ELAPSED_NS / 1e9)
ELAPSED_NS=$((END - START))
```
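The tokens/sec calculation can be sketched as follows. Two assumptions are made here: the caller supplies the number of generated tokens, and, as an alternative, the rate is read from an `eval rate:` line such as `ollama run --verbose` is assumed to print. The sample text is illustrative, not captured output, and both function names are hypothetical.

```python
import re
from typing import Optional


def tokens_per_sec(start_ns: int, end_ns: int, generated_tokens: int) -> float:
    """Throughput from two `date +%s%N` timestamps and a token count."""
    elapsed_s = (end_ns - start_ns) / 1e9
    if elapsed_s <= 0:
        raise ValueError("end_ns must be later than start_ns")
    return generated_tokens / elapsed_s


def parse_eval_rate(verbose_output: str) -> Optional[float]:
    """Read the generation rate from an assumed 'eval rate: N tokens/s' line."""
    # Anchoring at line start skips the separate 'prompt eval rate:' line.
    match = re.search(r"^eval rate:\s+([\d.]+) tokens/s", verbose_output, re.MULTILINE)
    return float(match.group(1)) if match else None


# Illustrative sample only; verify the format against the installed Ollama.
sample = "prompt eval rate:     120.00 tokens/s\neval rate:            12.50 tokens/s\n"
print(tokens_per_sec(0, 2_000_000_000, 25))
print(parse_eval_rate(sample))
```

Wall-clock timing includes model load and prompt processing, so a rate reported by the server itself is usually the better number when it is available.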
### 6. Measure VRAM (if possible)
```bash
# Try host first
rocm-smi --showmeminfo vram 2>/dev/null || \
# Try via docker
docker exec --privileged ollama rocm-smi --showmeminfo vram 2>/dev/null || \
# Fallback
echo "VRAM measurement unavailable"
```
### 7. Record Results
- Parse tokens/sec from ollama output
- Record VRAM/RAM usage
- Determine if this is best config so far for this model
- Update `best_configs` if tokens/sec improved or context increased
### 8. Update State
```python
# Logic (pseudocode):
if test_successful:
    if context_step < max_reached:
        phase = "context_scaling"
        current_config.num_ctx = next_context_step
    else:
        # Move to next model
        model_index += 1
        phase = "context_scaling"
        current_config.num_ctx = context_steps[0]
else:
    # OOM or error - record last good as best
    best_configs[track][current_model] = last_good_config
    model_index += 1
    phase = "context_scaling"
    current_config.num_ctx = context_steps[0]  # next model restarts at the smallest context
```
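The update rules can be sketched as runnable Python, assuming the state is a plain dict with the `state.json` field names. The function name `advance` and the `last_good_config` argument are hypothetical, and the sketch restarts at the first context step whenever it moves to the next model.

```python
import copy
from typing import Optional


def advance(state: dict, test_successful: bool,
            last_good_config: Optional[dict] = None) -> dict:
    """Return the state for the next run; the input state is not modified."""
    new = copy.deepcopy(state)
    track = new["track"]
    queue = new[f"{track}_queue"]
    model = queue[new["model_index"]]
    steps = new["context_steps"]
    step_index = steps.index(new["current_config"]["num_ctx"])

    if test_successful and step_index + 1 < len(steps):
        # Same model, next larger context.
        new["current_config"]["num_ctx"] = steps[step_index + 1]
    else:
        if not test_successful and last_good_config is not None:
            # OOM or error: the last passing config becomes the model's best.
            new["best_configs"][track][model] = last_good_config
        # Move to the next model and restart at the smallest context.
        new["model_index"] += 1
        new["current_config"]["num_ctx"] = steps[0]
        i = new["model_index"]
        new["current_model"] = queue[i] if i < len(queue) else None
    new["phase"] = "context_scaling"
    return new


state = {
    "track": "gpu",
    "model_index": 0,
    "current_model": "devstral-small-2:24b",
    "current_config": {"num_ctx": 32768, "num_gpu": 99, "flash_attn": True},
    "best_configs": {"gpu": {}, "ram": {}},
    "gpu_queue": ["devstral-small-2:24b", "qwen2.5-coder:32b"],
    "context_steps": [32768, 65536, 98304, 131072],
}
after_ok = advance(state, True)
at_131k = {**state, "current_config": {"num_ctx": 131072, "num_gpu": 99, "flash_attn": True}}
after_oom = advance(at_131k, False, {"num_ctx": 98304, "num_gpu": 99})
print(after_ok["current_config"]["num_ctx"], after_oom["current_model"])
```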
### 9. Commit to Repo
```bash
cd /opt/data/infra
git add assets/ai-optimizer/
git commit -m "ai-optimizer: tested ${model} at ${num_ctx} ctx - ${status}"
git push origin master
```
### 10. Matrix Notification (if available)
```python
import os
if os.getenv("MATRIX_HOME_SERVER") and os.getenv("MATRIX_ACCESS_TOKEN"):
    # Send notification to Matrix room
    # Room ID from env or config
    pass
# Else: silent
```
---
## Stop Conditions
1. All models in both queues have `best_configs` recorded
2. Manual intervention needed (error in state.json `error` field)
3. No progress for 3 consecutive runs (stuck)
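A minimal sketch of these three checks, assuming the `state.json` field names. The `stalled_runs` counter is hypothetical: nothing in the state file tracks it, so the caller would have to maintain it.

```python
from typing import Optional


def stop_reason(state: dict, stalled_runs: int = 0) -> Optional[str]:
    """Return why the job should stop, or None to keep going."""
    best = state["best_configs"]
    gpu_done = all(m in best["gpu"] for m in state["gpu_queue"])
    ram_done = all(m in best["ram"] for m in state["ram_queue"])
    if gpu_done and ram_done:
        return "done"    # 1. every model has a best config
    if state.get("error"):
        return "error"   # 2. manual intervention needed
    if stalled_runs >= 3:
        return "stuck"   # 3. no progress for 3 consecutive runs
    return None


state = {
    "best_configs": {"gpu": {}, "ram": {}},
    "gpu_queue": ["devstral-small-2:24b"],
    "ram_queue": ["qwen2.5:72b"],
}
print(stop_reason(state))
print(stop_reason(state, stalled_runs=3))
```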
---
## Error Handling
If any step fails:
1. Log error to state.json: `"error": {"message": "...", "timestamp": "..."}`
2. Do NOT increment model_index (retry next run)
3. Commit state with error field
4. Exit gracefully
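Steps 1 and 2 can be sketched like this, assuming the state is a plain dict and that timestamps follow the `2026-04-28T17:00:00Z` style used in `state.json`. The function name `record_error` is hypothetical.

```python
import copy
from datetime import datetime, timezone
from typing import Optional


def record_error(state: dict, message: str, now: Optional[datetime] = None) -> dict:
    """Return a copy of the state with the error field set.

    model_index is left untouched, so the next run retries the same test.
    Pass a timezone-aware datetime for `now`; it is converted to UTC.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    new = copy.deepcopy(state)
    new["error"] = {
        "message": message,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return new


state = {"track": "gpu", "model_index": 0}
fixed_time = datetime(2026, 4, 28, 17, 0, 0, tzinfo=timezone.utc)
print(record_error(state, "model pull failed", fixed_time))
```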
---
## Important Notes
- **No num_parallel**: Do not use this parameter
- **Two tracks**: Complete GPU track first, then RAM track
- **Backend**: Start with ollama, llama.cpp testing is optional (requires uncommenting in compose.yml)
- **Host access**: Some commands need host - use docker exec or SSH if available
- **Ask before deploy**: If config changes needed in NixOS modules, show diff and wait for user confirmation before `nh os switch`
---
## Example State Transitions
**Start:**
```json
{"track": "gpu", "model_index": 0, "current_model": "devstral-small-2:24b", "current_config": {"num_ctx": 32768, ...}}
```
**After successful test at 32k:**
```json
{"track": "gpu", "model_index": 0, "current_model": "devstral-small-2:24b", "current_config": {"num_ctx": 65536, ...}}
```
**After OOM at 131k:**
```json
{
"track": "gpu",
"model_index": 1,
"current_model": "qwen2.5-coder:32b",
"best_configs": {
"gpu": {
"devstral-small-2:24b": {"num_ctx": 98304, "num_gpu": 99, "tokens_per_sec": 11.2}
}
}
}
```

@@ -1,283 +0,0 @@
# AI Model Optimization Cron Job
**Goal:** Find optimal configurations for maximum context size with full hardware utilization.
**Hardware:**
- 2× AMD MI50 GPUs (32GB VRAM each, 64GB total)
- 128GB system RAM
- ROCm: HSA_OVERRIDE_GFX_VERSION=9.0.6, HIP_VISIBLE_DEVICES=0,1
---
## Model Queue
### GPU-Optimized (Coding - prioritize speed + context on GPU)
1. `devstral-small-2:24b` - Best coding model
2. `qwen2.5-coder:32b` - Strong coder, fits on GPU+offload
3. `codellama:34b-instruct` - Legacy but solid
### RAM-Optimized (Knowledge - prioritize max context, accept slower)
1. `qwen2.5:72b` - Best knowledge, needs heavy offload
2. `nemotron-3-nano:30b` - Good general knowledge
3. `mixtral:8x7b-instruct` - MoE, efficient for knowledge
---
## Optimization Strategy
**Two separate tracks:**
### Track A: GPU-Focused (Coding)
```
Baseline: num_ctx=32768, num_gpu=99, flash_attn=true
Steps:
1. Increase context: 32k → 65k → 98k → 131k → 163k
2. At each step, verify VRAM usage < 60GB (leave headroom)
3. If OOM: reduce num_gpu until stable, record best
4. Measure tokens/sec - if < 5 tok/s, consider context too high
```
### Track B: RAM-Focused (Knowledge)
```
Baseline: num_ctx=65536, num_gpu=50, flash_attn=true
Steps:
1. Increase context: 65k → 131k → 200k → 262k → 327k
2. Allow heavy RAM offload (system RAM up to 100GB)
3. If OOM: reduce context or num_gpu
4. Speed less critical - focus on max stable context
```
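The per-track acceptance rules can be sketched as a single check, using the thresholds stated above (under 60GB VRAM and at least 5 tok/s on Track A, up to 100GB system RAM on Track B). The function name `acceptable` is hypothetical, and treating the 5 tok/s guideline as a hard cutoff is a simplification.

```python
def acceptable(track: str, tokens_per_sec: float, vram_gb: float, ram_gb: float) -> bool:
    """Decide whether a successful run stays within the track's limits."""
    if track == "gpu":
        # Track A: keep VRAM headroom and a usable generation speed.
        return vram_gb < 60.0 and tokens_per_sec >= 5.0
    if track == "ram":
        # Track B: speed is secondary; only the RAM offload budget is checked.
        return ram_gb <= 100.0
    raise ValueError(f"unknown track: {track}")


print(acceptable("gpu", 12.5, 58.2, 18.4))
print(acceptable("gpu", 4.0, 50.0, 10.0))
print(acceptable("ram", 1.0, 64.0, 90.0))
```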
---
## Backend-Specific Configs
### Ollama (Modelfile parameters)
```
PARAMETER num_ctx <value>
PARAMETER num_gpu <layers>
PARAMETER flash_attn true/false
PARAMETER num_predict 4096
PARAMETER num_keep 1024
PARAMETER repeat_penalty 1.1
```
### Llama.cpp (CLI flags)
```
--ctx-size <value>
--n-gpu-layers <layers>
--flash-attn on/off
--n-predict 4096
--batch-size 4096
--ubatch-size 512
--cache-type-k f16
--cache-type-v f16
--split-mode layer
--no-mmap
```
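One test configuration can be rendered for either backend roughly as follows. This sketch only mirrors the parameter and flag names listed above; whether the installed Ollama or llama.cpp build accepts each of them is not checked here, and both function names are hypothetical.

```python
from typing import List


def ollama_modelfile(model: str, cfg: dict) -> str:
    """Render Modelfile text from a current_config-style dict."""
    lines = [
        f"FROM {model}",
        f"PARAMETER num_ctx {cfg['num_ctx']}",
        f"PARAMETER num_gpu {cfg['num_gpu']}",
        f"PARAMETER flash_attn {str(cfg['flash_attn']).lower()}",
    ]
    return "\n".join(lines) + "\n"


def llama_cpp_args(cfg: dict) -> List[str]:
    """Render the matching llama.cpp CLI flags as an argument list."""
    return [
        "--ctx-size", str(cfg["num_ctx"]),
        "--n-gpu-layers", str(cfg["num_gpu"]),
        "--flash-attn", "on" if cfg["flash_attn"] else "off",
    ]


cfg = {"num_ctx": 65536, "num_gpu": 99, "flash_attn": True}
print(ollama_modelfile("devstral-small-2:24b", cfg))
print(llama_cpp_args(cfg))
```

Keeping both renderers driven by the same dict means a result row always describes the same settings regardless of backend.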
---
## Host Test Instructions
**The cron runs inside the hermes container. Some tests require host access:**
### 1. VRAM Monitoring (HOST)
```bash
# Run on host to check VRAM usage during/after benchmark
sudo rocm-smi --showmeminfo vram
# Or via docker exec if rocm-smi available in container
docker exec --privileged ollama rocm-smi --showmeminfo vram
```
### 2. Running Ollama Benchmarks (CONTAINER)
```bash
# Pull model
docker exec ollama ollama pull <model>
# Create custom modelfile
docker exec ollama bash -c 'cat <<EOF > /root/.ollama/test.modelfile
FROM <model>
PARAMETER num_ctx 65536
PARAMETER num_gpu 99
PARAMETER flash_attn true
EOF'
# Create model from modelfile
docker exec ollama ollama create test-model -f /root/.ollama/test.modelfile
# Run benchmark (warm model first)
docker exec ollama ollama run test-model "Write a Python async context manager with exponential backoff"
# Cleanup
docker exec ollama ollama rm test-model
```
### 3. Running Llama.cpp Benchmarks (CONTAINER - needs llama.cpp container)
```bash
# Uncomment llama_cpp_devstral in compose.yml first
# Then rebuild: sudo nh os switch --flake .#lazyworkhorse
# Test via HTTP API
curl http://localhost:8300/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "devstral-2-small-llama_cpp",
"prompt": "Write a Python function",
"max_tokens": 100
}'
```
### 4. Deploying Changes (HOST via ai-worker)
```bash
# After optimization, commit results
cd /home/ai-worker/infra
git add assets/ai-optimizer/
git commit -m "ai-optimizer: new best config for <model>"
git push
# If config changes needed in ollama_init_custom_models.nix:
# 1. Edit the file
# 2. nixpkgs-fmt .
# 3. Show diff to user
# 4. Wait for confirmation
# 5. sudo nh os switch --flake .#lazyworkhorse
```
### 5. Accessing Host from Hermes Container
```bash
# SSH to host as ai-worker (key should be mounted)
ssh -i /path/to/key ai-worker@host.docker.internal
# Or via docker socket if mounted
# (not recommended for security)
```
---
## Benchmark Prompts
### Coding (Track A)
```
"Write a Python async context manager that retries a function with exponential backoff, max 5 retries, and logs each attempt using structlog. Include type hints and error handling."
```
### Knowledge (Track B)
```
"Explain the complete memory hierarchy in modern GPUs, from registers through L1/L2 caches to VRAM, and how data moves between them during matrix multiplication. Include bandwidth considerations for each level."
```
### Measurement
- Tokens per second (generation speed)
- Time to first token (latency)
- VRAM usage (via rocm-smi)
- System RAM usage (via free -h)
- Context success (did it complete without OOM?)
---
## State File Structure
`/opt/data/infra/assets/ai-optimizer/state.json`
```json
{
"track": "gpu",
"current_model": "devstral-small-2:24b",
"model_index": 0,
"phase": "context_scaling",
"backend": "ollama",
"current_config": {
"num_ctx": 65536,
"num_gpu": 99,
"flash_attn": true
},
"best_configs": {
"gpu": {
"devstral-small-2:24b": {
"backend": "ollama",
"num_ctx": 131072,
"num_gpu": 99,
"flash_attn": true,
"tokens_per_sec": 12.5,
"vram_used_gb": 58.2,
"tested_at": "2026-04-28T17:00:00Z"
}
},
"ram": {}
},
"completed_models": [],
"gpu_queue": ["devstral-small-2:24b", "qwen2.5-coder:32b", "codellama:34b-instruct"],
"ram_queue": ["qwen2.5:72b", "nemotron-3-nano:30b", "mixtral:8x7b-instruct"]
}
```
---
## Results CSV
`/opt/data/infra/assets/ai-optimizer/results.csv`
```csv
timestamp,track,model,backend,phase,num_ctx,num_gpu,flash_attn,tokens_per_sec,vram_gb,ram_gb,status,is_best
2026-04-28T17:00:00Z,gpu,devstral-small-2:24b,ollama,context_scaling,65536,99,true,15.2,52.1,18.4,success,false
```
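Appending a row in this column order can be sketched with the standard `csv` module. The helper name `append_result` is hypothetical; the example writes to an in-memory buffer, and a real run would pass a file opened in append mode instead.

```python
import csv
import io

FIELDS = [
    "timestamp", "track", "model", "backend", "phase", "num_ctx", "num_gpu",
    "flash_attn", "tokens_per_sec", "vram_gb", "ram_gb", "status", "is_best",
]


def append_result(fh, row: dict) -> None:
    """Write one result row, adding the header first if the file is empty."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS, lineterminator="\n")
    if fh.tell() == 0:
        writer.writeheader()
    # Lowercase booleans so they match the true/false style of existing rows.
    writer.writerow({k: str(v).lower() if isinstance(v, bool) else v
                     for k, v in row.items()})


buf = io.StringIO()
append_result(buf, {
    "timestamp": "2026-04-28T17:00:00Z", "track": "gpu",
    "model": "devstral-small-2:24b", "backend": "ollama",
    "phase": "context_scaling", "num_ctx": 65536, "num_gpu": 99,
    "flash_attn": True, "tokens_per_sec": 15.2, "vram_gb": 52.1,
    "ram_gb": 18.4, "status": "success", "is_best": False,
})
print(buf.getvalue())
```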
---
## Cron Job Flow
```
1. Read state.json
2. If both queues empty → STOP (all models tested)
3. Select next model from current track queue
4. Pull model if needed (docker exec ollama ollama pull)
5. Create Modelfile / llama.cpp config with current test params
6. Run benchmark (both prompts)
7. Measure: tokens/sec, VRAM (rocm-smi), RAM (free -h)
8. If successful:
- Increase context (next step)
- Update current_config in state
9. If OOM/error:
- Record last good config as best_configs[track][model]
- Move to next model in queue
10. Update state.json
11. Append to results.csv
12. Git commit + push to /opt/data/infra
13. Send Matrix notification if available, else silent
```
---
## Matrix Notification (Optional)
```python
import os

# If matrix credentials available in environment
if os.getenv("MATRIX_HOME_SERVER") and os.getenv("MATRIX_ACCESS_TOKEN"):
    # Send completion notification
    # Room: !ai-optimizer:lazyworkhorse.net (or similar)
    pass
# Else: silent, just commit
```
---
## Files to Create
```
/opt/data/infra/assets/ai-optimizer/
├── state.json # Current progress
├── results.csv # All test results
├── best_configs.json # Final best configs (human-readable)
└── CRON_JOB_DRAFT.md # This file
```
---
## Notes
- **No num_parallel**: Removed to avoid limiting other settings
- **Two tracks**: GPU (coding/speed) vs RAM (knowledge/context)
- **Both backends**: Test ollama first, then llama.cpp if available
- **Host tests**: rocm-smi must run on host or privileged container
- **Deploy**: ai-worker has sudo for nh/nixos-rebuild, must ask user first

@@ -1 +0,0 @@
timestamp,track,model,backend,phase,num_ctx,num_gpu,flash_attn,tokens_per_sec,vram_gb,ram_gb,status,is_best

@@ -1,21 +0,0 @@
{
"track": "gpu",
"current_model": "devstral-small-2:24b",
"model_index": 0,
"phase": "context_scaling",
"backend": "ollama",
"current_config": {
"num_ctx": 32768,
"num_gpu": 99,
"flash_attn": true
},
"best_configs": {
"gpu": {},
"ram": {}
},
"completed_models": [],
"gpu_queue": ["devstral-small-2:24b", "qwen2.5-coder:32b", "codellama:34b-instruct"],
"ram_queue": ["qwen2.5:72b", "nemotron-3-nano:30b", "mixtral:8x7b-instruct"],
"context_steps": [32768, 65536, 98304, 131072, 163840, 200704, 262144, 327680],
"last_updated": "2026-04-28T17:00:00Z"
}

@@ -1,67 +0,0 @@
FROM ghcr.io/astral-sh/uv:0.11.6-python3.13-trixie@sha256:b3c543b6c4f23a5f2df22866bd7857e5d304b67a564f4feab6ac22044dde719b AS uv_source
FROM tianon/gosu:1.19-trixie@sha256:3b176695959c71e123eb390d427efc665eeb561b1540e82679c15e992006b8b9 AS gosu_source
FROM debian:13.4
# Disable Python stdout buffering to ensure logs are printed immediately
ENV PYTHONUNBUFFERED=1
# Store Playwright browsers outside the volume mount so the build-time
# install survives the /opt/data volume overlay at runtime.
ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
# Install system dependencies in one layer, clear APT cache
# tini reaps orphaned zombie processes (MCP stdio subprocesses, git, bun, etc.)
# that would otherwise accumulate when hermes runs as PID 1. See #15012.
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli tini \
curl poppler-utils imagemagick \
chromium xvfb fonts-noto-color-emoji fonts-unifont fonts-liberation fonts-ipafont-gothic fonts-wqy-zenhei fonts-tlwg-loma-otf fonts-freefont-ttf \
libasound2t64 libatk-bridge2.0-0t64 libatk1.0-0t64 libatspi2.0-0t64 libcairo2 libcups2t64 libdbus-1-3 libdrm2 libgbm1 libglib2.0-0t64 libnspr4 libnss3 libpango-1.0-0 libx11-6 libxcb1 libxcomposite1 libxdamage1 libxext6 libxfixes3 libxkbcommon0 libxrandr2 \
texlive-latex-base texlive-latex-extra texlive-fonts-recommended texlive-xetex texlive-science && \
rm -rf /var/lib/apt/lists/*
# Non-root user for runtime; UID can be overridden via HERMES_UID at runtime
RUN useradd -u 10000 -m -d /opt/data hermes
COPY --chmod=0755 --from=gosu_source /gosu /usr/local/bin/
COPY --chmod=0755 --from=uv_source /usr/local/bin/uv /usr/local/bin/uvx /usr/local/bin/
WORKDIR /opt/hermes
# ---------- Layer-cached dependency install ----------
# Copy only package manifests first so npm install + Playwright are cached
# unless the lockfiles themselves change.
COPY package.json package-lock.json ./
COPY web/package.json web/package-lock.json web/
RUN npm install --prefer-offline --no-audit && \
npx playwright install --with-deps chromium --only-shell && \
(cd web && npm install --prefer-offline --no-audit) && \
npm cache clean --force
# ---------- Source code ----------
# .dockerignore excludes node_modules, so the installs above survive.
COPY --chown=hermes:hermes . .
# Build web dashboard (Vite outputs to hermes_cli/web_dist/)
RUN cd web && npm run build
# ---------- Permissions ----------
# Make install dir world-readable so any HERMES_UID can read it at runtime.
# The venv needs to be traversable too.
USER root
RUN chmod -R a+rX /opt/hermes
# Start as root so the entrypoint can usermod/groupmod + gosu.
# If HERMES_UID is unset, the entrypoint drops to the default hermes user (10000).
# ---------- Python virtualenv ----------
RUN uv venv && \
uv pip install --no-cache-dir -e ".[all]"
# ---------- Runtime ----------
ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist
ENV HERMES_HOME=/opt/data
ENV PATH="/opt/data/.local/bin:${PATH}"
VOLUME [ "/opt/data" ]
ENTRYPOINT [ "/usr/bin/tini", "-g", "--", "/opt/hermes/docker/entrypoint.sh" ]

@@ -1,102 +0,0 @@
#!/bin/bash
# Docker/Podman entrypoint: bootstrap config files into the mounted volume, then run hermes.
set -e
HERMES_HOME="${HERMES_HOME:-/opt/data}"
INSTALL_DIR="/opt/hermes"
# --- Privilege dropping via gosu ---
# When started as root (the default for Docker, or fakeroot in rootless Podman),
# optionally remap the hermes user/group to match host-side ownership, fix volume
# permissions, then re-exec as hermes.
if [ "$(id -u)" = "0" ]; then
if [ -n "$HERMES_UID" ] && [ "$HERMES_UID" != "$(id -u hermes)" ]; then
echo "Changing hermes UID to $HERMES_UID"
usermod -u "$HERMES_UID" hermes
fi
if [ -n "$HERMES_GID" ] && [ "$HERMES_GID" != "$(id -g hermes)" ]; then
echo "Changing hermes GID to $HERMES_GID"
# -o allows non-unique GID (e.g. macOS GID 20 "staff" may already exist
# as "dialout" in the Debian-based container image)
groupmod -o -g "$HERMES_GID" hermes 2>/dev/null || true
fi
# Fix ownership of the data volume. When HERMES_UID remaps the hermes user,
# files created by previous runs (under the old UID) become inaccessible.
# Always chown -R when UID was remapped; otherwise only if top-level is wrong.
actual_hermes_uid=$(id -u hermes)
needs_chown=false
if [ -n "$HERMES_UID" ] && [ "$HERMES_UID" != "10000" ]; then
needs_chown=true
elif [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
needs_chown=true
fi
if [ "$needs_chown" = true ]; then
echo "Fixing ownership of $HERMES_HOME to hermes ($actual_hermes_uid)"
# In rootless Podman the container's "root" is mapped to an unprivileged
# host UID — chown will fail. That's fine: the volume is already owned
# by the mapped user on the host side.
chown -R hermes:hermes "$HERMES_HOME" 2>/dev/null || \
echo "Warning: chown failed (rootless container?) — continuing anyway"
fi
echo "Dropping root privileges"
exec gosu hermes "$0" "$@"
fi
# --- Running as hermes from here ---
source "${INSTALL_DIR}/.venv/bin/activate"
# Create essential directory structure. Cache and platform directories
# (cache/images, cache/audio, platforms/whatsapp, etc.) are created on
# demand by the application — don't pre-create them here so new installs
# get the consolidated layout from get_hermes_dir().
# The "home/" subdirectory is a per-profile HOME for subprocesses (git,
# ssh, gh, npm …). Without it those tools write to /root which is
# ephemeral and shared across profiles. See issue #4426.
mkdir -p "$HERMES_HOME"/{cron,sessions,logs,hooks,memories,skills,skins,plans,workspace,home}
# .env
if [ ! -f "$HERMES_HOME/.env" ]; then
cp "$INSTALL_DIR/.env.example" "$HERMES_HOME/.env"
fi
# config.yaml
if [ ! -f "$HERMES_HOME/config.yaml" ]; then
cp "$INSTALL_DIR/cli-config.yaml.example" "$HERMES_HOME/config.yaml"
fi
# Ensure the main config file remains accessible to the hermes runtime user
# even if it was edited on the host after initial ownership setup.
if [ -f "$HERMES_HOME/config.yaml" ]; then
chown hermes:hermes "$HERMES_HOME/config.yaml"
chmod 640 "$HERMES_HOME/config.yaml"
fi
# SOUL.md
if [ ! -f "$HERMES_HOME/SOUL.md" ]; then
cp "$INSTALL_DIR/docker/SOUL.md" "$HERMES_HOME/SOUL.md"
fi
# Sync bundled skills (manifest-based so user edits are preserved)
if [ -d "$INSTALL_DIR/skills" ]; then
python3 "$INSTALL_DIR/tools/skills_sync.py"
fi
# Final exec: two supported invocation patterns.
#
# docker run <image> -> exec `hermes` with no args (legacy default)
# docker run <image> chat -q "..." -> exec `hermes chat -q "..."` (legacy wrap)
# docker run <image> sleep infinity -> exec `sleep infinity` directly
# docker run <image> bash -> exec `bash` directly
#
# If the first positional arg resolves to an executable on PATH, we assume the
# caller wants to run it directly (needed by the launcher which runs long-lived
# `sleep infinity` sandbox containers — see tools/environments/docker.py).
# Otherwise we treat the args as a hermes subcommand and wrap with `hermes`,
# preserving the documented `docker run <image> <subcommand>` behavior.
if [ $# -gt 0 ] && command -v "$1" >/dev/null 2>&1; then
exec "$@"
fi
exec hermes "$@"
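
The dispatch rule at the end of this script can be illustrated in Python. This is only a sketch of the decision, not part of the entrypoint: `resolve_command` is a hypothetical name, and `shutil.which` checks PATH executables only, whereas the shell's `command -v` also resolves builtins and functions.

```python
import shutil
from typing import Callable, List, Optional


def resolve_command(args: List[str],
                    which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the argv the entrypoint would exec for the given arguments."""
    if args and which(args[0]) is not None:
        # First argument is an executable on PATH: run it directly.
        return list(args)
    # Otherwise treat the arguments as a hermes subcommand.
    return ["hermes", *args]


# Stand-in lookup so the example does not depend on the local PATH.
def fake_which(name: str) -> Optional[str]:
    return f"/usr/bin/{name}" if name in {"sleep", "bash"} else None


print(resolve_command([], fake_which))
print(resolve_command(["chat", "-q", "hi"], fake_which))
print(resolve_command(["sleep", "infinity"], fake_which))
```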

@@ -61,6 +61,7 @@
./modules/nixos/services/open_code_server.nix
./modules/nixos/services/ollama_init_custom_models.nix
./modules/nixos/services/openclaw_node.nix
./modules/nixos/services/hyperspace.nix
./users/gortium.nix
./users/ai-worker.nix
];

@@ -277,6 +277,16 @@
displayName = "lazyworkhorse-host";
};
# Hyperspace Pods — P2P mesh AI cluster (combine GPUs across machines)
services.hyperspace = {
enable = true;
user = "ai-worker";
apiPort = 8080;
profile = "auto";
openFirewall = true;
extraArgs = [ "--verbose" ];
};
# Public host ssh key (kept in sync with the private one)
environment.etc."ssh/ssh_host_ed25519_key.pub".text =
"${keys.hosts.lazyworkhorse.main}";

@@ -0,0 +1,235 @@
{ config, lib, pkgs, ... }:
with lib;
let
cfg = config.services.hyperspace;
# Hyperspace CLI release from github.com/hyperspaceai/aios-cli
# The binary bundles Node.js runtime + llama.cpp + sidecars (~914MB)
# It auto-updates via `hyperspace update` post-install
hyperspacePkg = pkgs.stdenv.mkDerivation rec {
pname = "hyperspace";
version = cfg.release;
src = pkgs.fetchurl {
url = "https://github.com/hyperspaceai/aios-cli/releases/download/v${version}/aios-cli-x86_64-unknown-linux-gnu.tar.gz";
hash = "sha256-f6fJ8t3exqtYwUD5j+WvD+Hm0oN/Eef0X+R9Rj23dE0=";
};
sourceRoot = ".";
installPhase = ''
mkdir -p $out/bin $out/lib/hyperspace
# Main CLI binary
cp aios-cli $out/bin/hyperspace
chmod +x $out/bin/hyperspace
# Sidecar binaries
for f in _aios-cli pod-raft hyperspace-*; do
[ -f "$f" ] && install -m755 "$f" $out/lib/hyperspace/ || true
done
# WASM, native modules, Python shards
cp -r *.wasm $out/lib/hyperspace/ 2>/dev/null || true
cp -r *.node $out/lib/hyperspace/ 2>/dev/null || true
mkdir -p $out/lib/hyperspace/python
cp -r python/* $out/lib/hyperspace/python/ 2>/dev/null || true
# Skills directory
mkdir -p $out/share/hyperspace
cp -r skills $out/share/hyperspace/ 2>/dev/null || true
# Set HYPERSPACE_PATH so the binary finds sidecars
wrapProgram $out/bin/hyperspace \
--set HYPERSPACE_PATH "$out/lib/hyperspace" \
--set HYPERSPACE_SKILLS_DIR "$out/share/hyperspace/skills"
'';
nativeBuildInputs = with pkgs; [ makeWrapper ];
meta = {
description = "Hyperspace CLI: P2P mesh AI inference network (Pods)";
longDescription = ''
Hyperspace Pods let multiple machines pool their GPUs into one private
AI cluster. Install the CLI, create a pod, and share an invite link; your
machines then form a P2P mesh and can run models split across all connected
GPUs. Exposes an OpenAI-compatible API for use with Cursor, Claude Code,
Aider, etc.
'';
homepage = "https://hyperspace.sh";
sourceProvenance = with lib; [ sourceTypes.binaryNativeCode ];
license = lib.licenses.unfree;
platforms = [ "x86_64-linux" ];
maintainers = [ ];
};
};
in {
options.services.hyperspace = {
enable = mkEnableOption "Hyperspace P2P AI agent (Pods)";
release = mkOption {
type = types.str;
default = "5.45.30";
description = "Hyperspace CLI release version (from GitHub releases).";
};
user = mkOption {
type = types.str;
default = "ai-worker";
description = "System user to run the Hyperspace agent.";
};
apiPort = mkOption {
type = types.port;
default = 8080;
description = "Port for the OpenAI-compatible API server.";
};
autoStart = mkOption {
type = types.bool;
default = true;
description = "Auto-start the Hyperspace agent on boot.";
};
openFirewall = mkOption {
type = types.bool;
default = true;
description = "Open firewall ports for P2P traffic (libp2p 4001, chain 30301, API).";
};
profile = mkOption {
type = types.enum [ "auto" "full" "inference" "embedding" "relay" "storage" ];
default = "auto";
description = ''
Agent profile:
- auto: auto-detect hardware
- full: all 9 capabilities
- inference: GPU inference only
- embedding: CPU embedding only
- relay: lightweight relay
- storage: storage + memory
'';
};
extraArgs = mkOption {
type = types.listOf types.str;
default = [ ];
description = "Extra arguments passed to `hyperspace start`.";
};
dataDir = mkOption {
type = types.str;
default = "/var/lib/hyperspace";
description = "Data directory for agent state (models, config, logs).";
};
};
config = mkIf cfg.enable {
# Ensure the service user exists
users.users.${cfg.user} = {
isSystemUser = true;
group = cfg.user;
home = "/home/${cfg.user}";
createHome = true;
shell = pkgs.bash;
};
users.groups.${cfg.user} = { };
# Install the hyperspace binary
environment.systemPackages = [ hyperspacePkg ];
# Data directories
systemd.tmpfiles.rules = [
"d ${cfg.dataDir} 0755 ${cfg.user} ${cfg.user} -"
"d ${cfg.dataDir}/models 0755 ${cfg.user} ${cfg.user} -"
"d ${cfg.dataDir}/data 0755 ${cfg.user} ${cfg.user} -"
];
# Systemd service: runs the Hyperspace agent as a system daemon
systemd.services.hyperspace = {
description = "Hyperspace P2P AI Agent (Pods mesh cluster)";
documentation = [ "https://hyperspace.sh" "https://github.com/hyperspaceai/aios-cli" ];
after = [ "network-online.target" ];
wants = [ "network-online.target" ];
wantedBy = mkIf cfg.autoStart [ "multi-user.target" ];
environment = {
HYPERSPACE_HOME = cfg.dataDir;
HYPERSPACE_API_PORT = toString cfg.apiPort;
HYPERSPACE_PATH = "${hyperspacePkg}/lib/hyperspace";
};
# iputils provides the ping used by the connectivity check below
path = with pkgs; [ bash curl nodejs iputils ];
script = ''
# Wait for network connectivity before starting
${pkgs.bash}/bin/bash -c '
for i in $(seq 1 30); do
ping -c 1 -W 1 8.8.8.8 >/dev/null 2>&1 && break
sleep 2
done
' || true
exec ${hyperspacePkg}/bin/hyperspace start \
--profile ${cfg.profile} \
--api-port ${toString cfg.apiPort} \
${lib.escapeShellArgs cfg.extraArgs}
'';
serviceConfig = {
Type = "exec";
User = cfg.user;
Group = cfg.user;
WorkingDirectory = cfg.dataDir;
Restart = "always";
RestartSec = 10;
TimeoutStartSec = 180;
TimeoutStopSec = 30;
KillMode = "mixed";
# File limits for network-heavy P2P agent
LimitNOFILE = 65536;
LimitNPROC = 4096;
# GPU access — AMD MI50 (ROCm) through /dev/kfd and /dev/dri
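# Each entry is one "<device> <permissions>" string
DeviceAllow = [
"/dev/kfd rw" # ROCm compute interface
"char-drm rw" # DRM device nodes under /dev/dri
];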
SupplementaryGroups = [ "video" "render" ];
# Security hardening
NoNewPrivileges = true;
ProtectSystem = "strict";
ProtectHome = true;
PrivateTmp = true;
PrivateDevices = false; # needs GPU access
ReadWritePaths = [
cfg.dataDir
"/tmp"
];
BindPaths = [
# GPU devices for AMD MI50
"/dev/kfd"
"/dev/dri"
];
};
};
# Firewall: open P2P ports for the mesh network
networking.firewall = mkIf cfg.openFirewall {
allowedTCPPorts = [
4001 # libp2p P2P (agent gossip, DHT, circuits)
30301 # Chain P2P (blockchain consensus)
cfg.apiPort # OpenAI-compatible API
];
allowedUDPPorts = [
4001 # libp2p QUIC transport
30301 # Chain UDP discovery
];
};
};
}

@@ -14,8 +14,25 @@
local base_model=$2
if ! ${pkgs.docker}/bin/docker exec ollama ollama list | grep -q "$model_name"; then
echo "$model_name not found, creating from $base_model..."
# We use a custom TEMPLATE block to strip the 'currentDate' function
# which is unsupported in Ollama 0.5.7 but present in Devstral's default manifest.
${pkgs.docker}/bin/docker exec ollama sh -c "cat <<EOF > /root/.ollama/$model_name.modelfile
FROM $base_model
TEMPLATE \"\"\"{{- if .System }}
[SYSTEM_PROMPT]
{{ .System }}
[/SYSTEM_PROMPT]
{{- end }}
{{- range .Messages }}
{{- if eq .Role \"user\" }}
[INST]
{{ .Content }}
[/INST]
{{- else if eq .Role \"assistant\" }}
{{ .Content }}
{{- end }}
{{- end }}\"\"\"
PARAMETER num_ctx 131072
PARAMETER num_predict 4096
PARAMETER num_keep 1024
@@ -26,6 +43,7 @@ PARAMETER stop \"[/INST]\"
PARAMETER stop \"</s>\"
EOF"
${pkgs.docker}/bin/docker exec ollama ollama create "$model_name" -f "/root/.ollama/$model_name.modelfile"
${pkgs.docker}/bin/docker exec ollama rm "/root/.ollama/$model_name.modelfile"
else
echo "$model_name already exists, skipping."
fi
@@ -36,6 +54,10 @@ EOF"
# Create Devstral
create_model_if_missing "devstral-small-2:24b-128k" "devstral-small-2:24b"
# create_model_if_missing "qwen2.5-coder:32b-128k" "qwen2.5-coder:32b"
# create_model_if_missing "mistral-large-planner:123b" "mistral-large:123b-instruct-v2407-q4_K_S"
'';
serviceConfig = {
Type = "oneshot";