fix: ai-worker docker-only access for ollama benchmarking

Remove infra repo bind mount and sudo access from ai-worker user.

Now ai-worker can only:
- SSH into host from Hermes container
- Run docker commands via docker group membership
- Execute ollama benchmarks via docker exec

Results saved to /opt/data/ai-optimizer/ in Hermes container.
Add restricted AI worker access with deployment capabilities
2026-04-29 19:55:19 +00:00 · 2026-04-28 15:34:38 +00:00
9 changed files with 137 additions and 290 deletions
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -13,9 +13,7 @@ None
 - ✅ **Phase 1: Foundation Setup** - Establish core NixOS configuration with flakes
 - ✅ **Phase 2: Docker Service Integration** - Integrate Docker Compose services
 - ✅ **Phase 3: AI Assistant Integration** - Enable AI-assisted infrastructure management
- ✅ **Phase 4: Internet Access & MCP** - MCP server for web access
+- [ ] **Phase 4: Internet Access & MCP** - MCP server for web access
 - 🚨 **Security Hardening** - CRITICAL: Firewall, fail2ban, SSH hardening (PR #28)
 - [ ] **Phase 5: TAK Server** - Research, implementation, and validation
 ## Phase Details
@@ -135,25 +133,8 @@ Plans:
 ## Progress
 **Merge Priority Order** (CRITICAL - merge in this order):
 | Priority | PR | Description | Status | Notes |
 |----------|-----|-------------|--------|-------|
 | 🚨 1 | #28 | **Security hardening** (firewall, fail2ban, SSH) | Open | **MERGE FIRST** - protects all other services |
 | 2 | #22 | Matrix bridge dependency fix | Open | Blocks Hermes functionality |
 | 3 | #21 | Backup network creation fix | Open | Infrastructure fix |
 | 4 | #25 | Hermes voice GPU support | Open | Feature enhancement |
 | 5 | #24 | uConsole CM5 host | Open | New hardware support |
 | 6 | #23 | NixOS deployment infrastructure | Open | Deployment tooling |
 | 7 | #1 | AI worker restricted access | Open | Legacy PR (superseded by hardening) |
 **Execution Order:**
-Phases execute in numeric order: 1 → 2 → 3 → 4 → Security → 5 → 6 → 7
+Phases execute in numeric order: 1 → 2 → 3 → 4 → 5 → 6 → 7
 **Merge vs Phase Execution:**
 - PRs can merge independently (no strict phase ordering for merges)
 - **EXCEPTION:** Security hardening (#28) must merge before any new services are exposed
 - After security merge, deploy with: `nh os switch --flake .#lazyworkhorse`
 | Phase | Milestone | Plans Complete | Status | Completed |
 |-------|-----------|----------------|--------|-----------|
--- a/assets/compose
+++ b/assets/compose
--- a/flake.nix
+++ b/flake.nix
@@ -61,7 +61,7 @@
              ./modules/nixos/services/open_code_server.nix
              ./modules/nixos/services/ollama_init_custom_models.nix
              ./modules/nixos/services/openclaw_node.nix
-              ./modules/nixos/services/hyperspace.nix
+              ./modules/nixos/security/ai-worker-restricted.nix
              ./users/gortium.nix
              ./users/ai-worker.nix
            ];
--- a/hosts/lazyworkhorse/configuration.nix
+++ b/hosts/lazyworkhorse/configuration.nix
@@ -277,16 +277,6 @@
    displayName = "lazyworkhorse-host";
  };
  # Hyperspace Pods — P2P mesh AI cluster (combine GPUs across machines)
  services.hyperspace = {
    enable = true;
    user = "ai-worker";
    apiPort = 8080;
    profile = "auto";
    openFirewall = true;
    extraArgs = [ "--verbose" ];
  };
  # Public host ssh key (kept in sync with the private one)
  environment.etc."ssh/ssh_host_ed25519_key.pub".text =
    "${keys.hosts.lazyworkhorse.main}";
--- a/modules/nixos/security/README-ai-worker.md
+++ b/modules/nixos/security/README-ai-worker.md
@@ -0,0 +1,105 @@
 # AI Worker Restricted Access
 This module provides SSH access for the AI worker (hermes-agent) to run ollama benchmarks on the host.
 ## Security Model
 The `ai-worker` user has:
 ### Filesystem Access
 - **Home directory**: `/home/ai-worker` (standard user home)
 - **No bind mounts**: Cannot access `/home/gortium/infra` or other host files
 - **Cannot access**: Any files outside standard system paths
 ### Sudo Access
 - **NONE**: ai-worker has no sudo privileges
 - Cannot run `nh`, `nixos-rebuild`, `nixpkgs-fmt`, or `nix` with elevated permissions
 ### Docker Access
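 - Member of `docker` group - can run `docker` and `docker exec` commands. Docker group membership is root-equivalent on the host (a container can bind-mount any host path), so this is the widest permission ai-worker holds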
 - Primary use: `docker exec ollama ollama ...` for benchmarking
 - Can run `docker exec --privileged ollama rocm-smi ...` for VRAM monitoring
 ## Workflow: SSH + Docker Benchmarking
 The AI worker connects from the Hermes container to the host via SSH, runs ollama benchmarks, then returns to save results.
 ### Example Workflow
 ```bash
 # From Hermes container, SSH to host
 ssh -i /path/to/ssh/key ai-worker@host.docker.internal
 # On host, run ollama benchmarks via docker
 docker exec ollama ollama pull devstral-small-2:24b
 # Create test modelfile
 docker exec ollama bash -c 'cat <<EOF > /root/.ollama/test.modelfile
 FROM devstral-small-2:24b
 PARAMETER num_ctx 65536
 PARAMETER num_gpu 99
 PARAMETER flash_attn true
 EOF'
 # Create and test model
 docker exec ollama ollama create test-model -f /root/.ollama/test.modelfile
 docker exec ollama ollama run test-model "Write a Python async function"
 # Check VRAM usage
 docker exec --privileged ollama rocm-smi --showmeminfo vram
 # Cleanup
 docker exec ollama ollama rm test-model
 # Exit SSH, return to Hermes container
 exit
 # Save results in Hermes container
 # /opt/data/ai-optimizer/state.json
 # /opt/data/ai-optimizer/results.csv
 ```
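The last step of the workflow above (saving results in the Hermes container) could be sketched as follows. This is a minimal sketch, not the project's actual tooling: `append_result` and `time_cmd` are hypothetical helper names, and the CSV columns (`timestamp,model,num_ctx,seconds`) are an assumed layout, since the real format of `results.csv` is not specified here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for recording benchmark timings as CSV rows.
# The column layout is an assumption made for illustration only.

# time_cmd CMD [ARGS...]
# Run a command, discard its stdout, and print the elapsed whole seconds.
time_cmd() {
  local start end
  start=$(date +%s)
  "$@" >/dev/null
  end=$(date +%s)
  echo $((end - start))
}

# append_result FILE MODEL NUM_CTX SECONDS
# Append one row to FILE, creating the directory and header when needed.
append_result() {
  local file=$1 model=$2 num_ctx=$3 seconds=$4
  mkdir -p "$(dirname "$file")"
  # Write the header once, when the file is missing or empty.
  if [ ! -s "$file" ]; then
    echo "timestamp,model,num_ctx,seconds" > "$file"
  fi
  printf '%s,%s,%s,%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$model" "$num_ctx" "$seconds" >> "$file"
}
```

From the Hermes container this might be used as `secs=$(time_cmd ssh ai-worker@host.docker.internal 'docker exec ollama ollama list')` followed by `append_result /opt/data/ai-optimizer/results.csv devstral-small-2:24b 65536 "$secs"`. Timing the whole SSH round trip includes connection overhead, so it is only a coarse measure.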
 ## SSH Access
 Connect as:
 ```bash
 ssh ai-worker@lazyworkhorse
 ```
 The working directory will be `/home/ai-worker`. No infra repo access.
 ## Verification
 Check ai-worker permissions:
 ```bash
 # On the host, as root or gortium:
 sudo -l -U ai-worker
 # Should report that ai-worker is not allowed to run sudo on this host
 # Check docker group membership
 groups ai-worker
 # Output should include: docker
 ```
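The group check above could also be scripted. The following is a minimal sketch under stated assumptions: `has_group` and `check_ai_worker_groups` are hypothetical helper names, and the `wheel` check assumes sudo rights would come from `wheel` membership, as is common on NixOS. It inspects group membership only, so `sudo -l -U ai-worker` remains the authoritative sudo check.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that check a space-separated group list
# (as printed by `id -nG USER`) against the security model above.

# has_group "GROUP LIST" WANTED
# Return 0 if the list contains WANTED as an exact group name.
has_group() {
  local groups_list=$1 wanted=$2 g
  for g in $groups_list; do
    [ "$g" = "$wanted" ] && return 0
  done
  return 1
}

# check_ai_worker_groups "GROUP LIST"
# Print one line per check; return non-zero if any check fails.
check_ai_worker_groups() {
  local groups_list=$1 status=0
  if has_group "$groups_list" docker; then
    echo "ok: docker group present"
  else
    echo "FAIL: docker group missing"
    status=1
  fi
  # Assumption: wheel membership is what would grant sudo on this host.
  if has_group "$groups_list" wheel; then
    echo "FAIL: wheel group present"
    status=1
  else
    echo "ok: wheel group absent"
  fi
  return "$status"
}
```

On the host this might be run as `check_ai_worker_groups "$(id -nG ai-worker)"`.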
 ## Troubleshooting
 If ai-worker cannot run docker commands:
 ```bash
 # Check docker group membership
 groups ai-worker
 # Verify ollama container is running
 docker ps | grep ollama
 # Test docker access
 sudo -u ai-worker docker exec ollama ollama list
 ```
 If SSH connection fails:
 ```bash
 # Check SSH key is authorized
 cat /home/ai-worker/.ssh/authorized_keys
 # Check SSH service
 systemctl status sshd
 ```
--- a/modules/nixos/security/ai-worker-restricted.nix
+++ b/modules/nixos/security/ai-worker-restricted.nix
@@ -0,0 +1,17 @@
 { config, pkgs, lib, ... }:
 with lib;
 {
  options.services.aiWorkerAccess = mkOption {
    type = types.bool;
    default = false;
    description = "Enable AI worker SSH access with docker group membership for ollama benchmarking";
  };
  config = mkIf config.services.aiWorkerAccess {
    # ai-worker is member of docker group - can run docker commands via SSH
    # No bind mounts, no sudo access - docker-only for ollama benchmarking
    users.groups.docker.members = [ "ai-worker" ];
  };
 }
--- a/modules/nixos/services/hyperspace.nix
+++ b/modules/nixos/services/hyperspace.nix
@@ -1,235 +0,0 @@
 { config, lib, pkgs, ... }:
 with lib;
 let
  cfg = config.services.hyperspace;
  # Hyperspace CLI release from github.com/hyperspaceai/aios-cli
  # The binary bundles Node.js runtime + llama.cpp + sidecars (~914MB)
  # It auto-updates via `hyperspace update` post-install
  hyperspacePkg = pkgs.stdenv.mkDerivation rec {
    pname = "hyperspace";
    version = cfg.release;
    src = pkgs.fetchurl {
      url = "https://github.com/hyperspaceai/aios-cli/releases/download/v${version}/aios-cli-x86_64-unknown-linux-gnu.tar.gz";
      hash = "sha256-f6fJ8t3exqtYwUD5j+WvD+Hm0oN/Eef0X+R9Rj23dE0=";
    };
    sourceRoot = ".";
    installPhase = ''
      mkdir -p $out/bin $out/lib/hyperspace
      # Main CLI binary
      cp aios-cli $out/bin/hyperspace
      chmod +x $out/bin/hyperspace
      # Sidecar binaries
      for f in _aios-cli pod-raft hyperspace-*; do
        [ -f "$f" ] && install -m755 "$f" $out/lib/hyperspace/ || true
      done
      # WASM, native modules, Python shards
      cp -r *.wasm $out/lib/hyperspace/ 2>/dev/null || true
      cp -r *.node $out/lib/hyperspace/ 2>/dev/null || true
      mkdir -p $out/lib/hyperspace/python
      cp -r python/* $out/lib/hyperspace/python/ 2>/dev/null || true
      # Skills directory
      mkdir -p $out/share/hyperspace
      cp -r skills $out/share/hyperspace/ 2>/dev/null || true
      # Set HYPERSPACE_PATH so the binary finds sidecars
      wrapProgram $out/bin/hyperspace \
        --set HYPERSPACE_PATH "$out/lib/hyperspace" \
        --set HYPERSPACE_SKILLS_DIR "$out/share/hyperspace/skills"
    '';
    nativeBuildInputs = with pkgs; [ makeWrapper ];
    meta = {
      description = "Hyperspace CLI — P2P mesh AI inference network (Pods)";
      longDescription = ''
        Hyperspace Pods let multiple machines pool their GPUs into one private
        AI cluster. Install the CLI, create a pod, share an invite link — your
        machines form a P2P mesh and can run models split across all connected
        GPUs. Exposes an OpenAI-compatible API for use with Cursor, Claude Code,
        Aider, etc.
      '';
      homepage = "https://hyperspace.sh";
      sourceProvenance = with lib; [ sourceTypes.binaryNativeCode ];
      license = lib.licenses.unfree;
      platforms = [ "x86_64-linux" ];
      maintainers = [ ];
    };
  };
 in {
  options.services.hyperspace = {
    enable = mkEnableOption "Hyperspace P2P AI agent (Pods)";
    release = mkOption {
      type = types.str;
      default = "5.45.30";
      description = "Hyperspace CLI release version (from GitHub releases).";
    };
    user = mkOption {
      type = types.str;
      default = "ai-worker";
      description = "System user to run the Hyperspace agent.";
    };
    apiPort = mkOption {
      type = types.port;
      default = 8080;
      description = "Port for the OpenAI-compatible API server.";
    };
    autoStart = mkOption {
      type = types.bool;
      default = true;
      description = "Auto-start the Hyperspace agent on boot.";
    };
    openFirewall = mkOption {
      type = types.bool;
      default = true;
      description = "Open firewall ports for P2P traffic (libp2p 4001, chain 30301, API).";
    };
    profile = mkOption {
      type = types.enum [ "auto" "full" "inference" "embedding" "relay" "storage" ];
      default = "auto";
      description = ''
        Agent profile:
        - auto: auto-detect hardware
        - full: all 9 capabilities
        - inference: GPU inference only
        - embedding: CPU embedding only
        - relay: lightweight relay
        - storage: storage + memory
      '';
    };
    extraArgs = mkOption {
      type = types.listOf types.str;
      default = [ ];
      description = "Extra arguments passed to `hyperspace start`.";
    };
    dataDir = mkOption {
      type = types.str;
      default = "/var/lib/hyperspace";
      description = "Data directory for agent state (models, config, logs).";
    };
  };
  config = mkIf cfg.enable {
    # Ensure the service user exists
    users.users.${cfg.user} = {
      isSystemUser = true;
      group = cfg.user;
      home = "/home/${cfg.user}";
      createHome = true;
      shell = pkgs.bash;
    };
    users.groups.${cfg.user} = { };
    # Install the hyperspace binary
    environment.systemPackages = [ hyperspacePkg ];
    # Data directories
    systemd.tmpfiles.rules = [
      "d ${cfg.dataDir} 0755 ${cfg.user} ${cfg.user} -"
      "d ${cfg.dataDir}/models 0755 ${cfg.user} ${cfg.user} -"
      "d ${cfg.dataDir}/data 0755 ${cfg.user} ${cfg.user} -"
    ];
    # Systemd service: runs the Hyperspace agent as a system daemon
    systemd.services.hyperspace = {
      description = "Hyperspace P2P AI Agent — Pods mesh cluster";
      documentation = [ "https://hyperspace.sh" "https://github.com/hyperspaceai/aios-cli" ];
      after = [ "network-online.target" ];
      wants = [ "network-online.target" ];
      wantedBy = mkIf cfg.autoStart [ "multi-user.target" ];
      environment = {
        HYPERSPACE_HOME = cfg.dataDir;
        HYPERSPACE_API_PORT = toString cfg.apiPort;
        HYPERSPACE_PATH = "${hyperspacePkg}/lib/hyperspace";
      };
      path = with pkgs; [ bash curl nodejs ];
      script = ''
        # Wait for network connectivity before starting
        ${pkgs.bash}/bin/bash -c '
          for i in $(seq 1 30); do
            ping -c 1 -W 1 8.8.8.8 >/dev/null 2>&1 && break
            sleep 2
          done
        ' || true
        exec ${hyperspacePkg}/bin/hyperspace start \
          --profile ${cfg.profile} \
          --api-port ${toString cfg.apiPort} \
          ${lib.escapeShellArgs cfg.extraArgs}
      '';
      serviceConfig = {
        Type = "exec";
        User = cfg.user;
        Group = cfg.user;
        WorkingDirectory = cfg.dataDir;
        Restart = "always";
        RestartSec = 10;
        TimeoutStartSec = 180;
        TimeoutStopSec = 30;
        KillMode = "mixed";
        # File limits for network-heavy P2P agent
        LimitNOFILE = 65536;
        LimitNPROC = 4096;
        # GPU access — AMD MI50 (ROCm) through /dev/kfd and /dev/dri
        DeviceAllow = [
          "/dev/kfd" "rw"
          "/dev/dri" "rw"
        ];
        SupplementaryGroups = [ "video" "render" ];
        # Security hardening
        NoNewPrivileges = true;
        ProtectSystem = "strict";
        ProtectHome = true;
        PrivateTmp = true;
        PrivateDevices = false;  # needs GPU access
        ReadWritePaths = [
          cfg.dataDir
          "/tmp"
        ];
        BindPaths = [
          # GPU devices for AMD MI50
          "/dev/kfd"
          "/dev/dri"
        ];
      };
    };
    # Firewall: open P2P ports for the mesh network
    networking.firewall = mkIf cfg.openFirewall {
      allowedTCPPorts = [
        4001    # libp2p P2P (agent gossip, DHT, circuits)
        30301   # Chain P2P (blockchain consensus)
        cfg.apiPort  # OpenAI-compatible API
      ];
      allowedUDPPorts = [
        4001    # libp2p QUIC transport
        30301   # Chain UDP discovery
      ];
    };
  };
 }
--- a/modules/nixos/services/ollama_init_custom_models.nix
+++ b/modules/nixos/services/ollama_init_custom_models.nix
@@ -14,25 +14,8 @@
        local base_model=$2
        if ! ${pkgs.docker}/bin/docker exec ollama ollama list | grep -q "$model_name"; then
          echo "$model_name not found, creating from $base_model..."
          # We use a custom TEMPLATE block to strip the 'currentDate' function 
          # which is unsupported in Ollama 0.5.7 but present in Devstral's default manifest.
          ${pkgs.docker}/bin/docker exec ollama sh -c "cat <<EOF > /root/.ollama/$model_name.modelfile
 FROM $base_model
 TEMPLATE \"\"\"{{- if .System }}
 [SYSTEM_PROMPT]
 {{ .System }}
 [/SYSTEM_PROMPT]
 {{- end }}
 {{- range .Messages }}
 {{- if eq .Role \"user\" }}
 [INST]
 {{ .Content }}
 [/INST]
 {{- else if eq .Role \"assistant\" }}
 {{ .Content }}
 {{- end }}
 {{- end }}\"\"\"
 PARAMETER num_ctx 131072
 PARAMETER num_predict 4096
 PARAMETER num_keep 1024
@@ -43,7 +26,6 @@ PARAMETER stop \"[/INST]\"
 PARAMETER stop \"</s>\"
 EOF"
          ${pkgs.docker}/bin/docker exec ollama ollama create "$model_name" -f "/root/.ollama/$model_name.modelfile"
          ${pkgs.docker}/bin/docker exec ollama rm "/root/.ollama/$model_name.modelfile"
        else
          echo "$model_name already exists, skipping."
        fi
@@ -54,10 +36,6 @@ EOF"
      # Create Devstral
      create_model_if_missing "devstral-small-2:24b-128k" "devstral-small-2:24b" 
      # create_model_if_missing "qwen2.5-coder:32b-128k" "qwen2.5-coder:32b"
      # create_model_if_missing "mistral-large-planner:123b" "mistral-large:123b-instruct-v2407-q4_K_S"
    '';
    serviceConfig = {
      Type = "oneshot";
--- a/users/ai-worker.nix
+++ b/users/ai-worker.nix
@@ -9,6 +9,17 @@
    openssh.authorizedKeys.keys = [
      keys.users.ai-worker.main
    ];
    # No password login - SSH key only
    hashedPassword = "!";
  };
  users.groups.ai-worker = {};
  # Enable restricted AI worker SSH access for ollama benchmarking
  # SECURITY: ai-worker can only:
  #   - SSH into host from Hermes container
  #   - Run docker commands (docker exec ollama ...) via docker group
  #   - NO access to infra repo (no bind mount)
  #   - NO sudo access (no nh, nixos-rebuild, nixpkgs-fmt, nix)
  # WORKFLOW: SSH from Hermes container, run docker benchmarks, return and save results to /opt/data/ai-optimizer/
  services.aiWorkerAccess = true;
 }