feat: add KVM/libvirt support for staging VM #48

Open
Hermes wants to merge 6 commits from kvm-pr into master
Collaborator

What this PR adds

Full CI/CD pipeline for NixOS infrastructure. Consolidates PR #39 (CI workflow) and PR #42 (KVM/libvirt) into one.


Changes

1. VM Infrastructure (modules/nixos/services/staging-vm.nix)

  • Enables virtualisation.libvirtd with QEMU/KVM, OVMF (UEFI), and swtpm
  • Defines default NAT network (192.168.122.0/24) with DHCP
  • Sets up libvirt storage pool at /var/lib/libvirt/images
  • Creates /var/lib/staging-vm/ for test data
  • Adds firewall rules for libvirt guests
  • pr-test-vm helper script: build, start, stop, destroy, ssh
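
For illustration only, DHCP-based IP discovery on the default network might look like the sketch below. It assumes the lease table layout printed by `virsh net-dhcp-leases` (expiry date, expiry time, MAC, protocol, address/prefix). The function name `lease_ip_for_mac` and the example MAC address are hypothetical and are not taken from the `pr-test-vm` script itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not the code shipped in pr-test-vm.
# Reads `virsh net-dhcp-leases` output on stdin and prints the IPv4
# address leased to the given MAC address (prints nothing if none matches).
lease_ip_for_mac() {
  local mac="$1"
  awk -v mac="$mac" '
    # Data rows: $1 date, $2 time, $3 MAC, $4 protocol, $5 address/prefix
    tolower($3) == tolower(mac) && $4 == "ipv4" {
      split($5, parts, "/")   # drop the prefix length, e.g. /24
      print parts[1]
      exit
    }
  '
}

# Usage on the host (requires libvirt; the MAC is a placeholder):
#   virsh net-dhcp-leases default | lease_ip_for_mac 52:54:00:aa:bb:cc
```

Matching on the MAC address rather than the hostname avoids depending on the guest having reported a hostname to the DHCP server.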

2. CI Pipeline (.gitea/workflows/build-nixos.yml)

  • Triggered on PR/push to master touching Nix files, flake.lock, secrets, hosts, or modules
  • Step 1: Build NixOS config with nh os build (compile validation)
  • Step 2: Integration test stub (placeholder for VM deployment tests)

3. Test Suite (tests/run-integration.sh)

  • Script that runs inside the staging VM
  • Checks Docker daemon, compose stack, and service health
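
A minimal sketch of the kind of retry wrapper such health checks can use, assuming a fixed delay between attempts. The function name, argument order, and the `HEALTH_URL` variable are illustrative and not taken from `tests/run-integration.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical retry helper; names and defaults are illustrative only.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example checks (assume Docker and curl exist inside the VM;
# HEALTH_URL is a hypothetical variable):
#   retry 5 2 docker info >/dev/null
#   retry 10 3 curl -fsS "$HEALTH_URL" >/dev/null
```

Returning a plain 0/1 keeps the helper usable as a CI step, since the runner only needs the exit code.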

4. Host Config (hosts/lazyworkhorse/configuration.nix)

  • Enables services.stagingVm
  • Adds ai-worker to the libvirtd group

Architectural Decisions

Environment Variable Switching

  • Refactor compose files: replace hardcoded URLs with $DOMAIN / $SITE_URL variables
  • Two env files: .env.production (DOMAIN=lazyworkhorse.net) and .env.staging (DOMAIN=staging.lazyworkhorse.net)
  • CI/test script selects the correct one on deploy
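
One possible shape for the env-file selection, as a sketch. Only the two file names come from this PR; the function name `select_env_file` and the `DEPLOY_TARGET` variable are assumptions, and the example assumes Docker Compose v2 (`docker compose --env-file`).

```bash
#!/usr/bin/env bash
# Hypothetical selector; only the two file names come from this PR.
# Prints the env file for a deploy target, or fails for unknown targets.
select_env_file() {
  case "$1" in
    production) echo ".env.production" ;;
    staging)    echo ".env.staging" ;;
    *)
      echo "unknown deploy target: $1" >&2
      return 1
      ;;
  esac
}

# Example deploy step (DEPLOY_TARGET is an assumed variable):
#   env_file="$(select_env_file "${DEPLOY_TARGET:-staging}")" || exit 1
#   docker compose --env-file "$env_file" up -d
```

Failing on an unknown target, rather than silently falling back to one of the files, avoids deploying staging with production values or the reverse.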

Staging URLs

  • Proposed: *.staging.lazyworkhorse.net subdomain
  • Traefik on the host routes *.staging.lazyworkhorse.net traffic to the staging VM's internal IP (192.168.122.x)
  • Staging VM runs its own Traefik for internal service routing
  • Tests full DNS + TLS stack end-to-end

Security

  • Recommended: WireGuard VPN for staging access
  • Keeps staging services off the public internet
  • Still tests real URLs, TLS, and DNS resolution
  • Avoids risk of default admin credentials on staging services

Staging NFS Volume

  • Create a dedicated NFS export on HoardingCow for staging VM persistent data
  • Mirrors production volume structure (DB data, config files)
  • Mounted inside the staging VM at same paths as production

Webhooks & Auto-Merge Plan

Auto-Merge

  • Gitea supports "merge when checks pass" on protected branches natively
  • After merging this PR: mark master as a protected branch and require CI checks to pass
  • Once a PR is scheduled for auto-merge, Gitea merges it when all required checks pass
  • No custom code needed; this is built into Gitea
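
Scheduling a PR for auto-merge can also be done through the Gitea REST API instead of the web UI. The sketch below assumes the merge endpoint accepts a `merge_when_checks_succeed` field, which is worth confirming against the instance's own API docs. `GITEA_URL` and `GITEA_TOKEN` are assumed environment variables, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: GITEA_URL and GITEA_TOKEN are assumed environment variables.

# Prints the JSON body asking Gitea to merge once all checks succeed.
# Optional argument: merge style (defaults to "merge").
auto_merge_payload() {
  printf '{"Do":"%s","merge_when_checks_succeed":true}\n' "${1:-merge}"
}

# Usage: schedule_auto_merge <owner> <repo> <pr-index> [merge-style]
schedule_auto_merge() {
  local owner="$1" repo="$2" index="$3" style="${4:-merge}"
  curl -fsS -X POST \
    -H "Authorization: token ${GITEA_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(auto_merge_payload "$style")" \
    "${GITEA_URL}/api/v1/repos/${owner}/${repo}/pulls/${index}/merge"
}

# Example for this PR:
#   schedule_auto_merge gortium infra 48
```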

Webhook Integration

  • Gitea webhooks can notify on: PR open/update/merge, CI completion, push events
  • Possible flows:
    • PR opened → webhook notifies Hermes → Hermes reviews/assigns
    • CI completed → webhook notifies Hermes → Hermes verifies results
    • PR merged → webhook triggers deploy to production
  • Hermes already has a webhook-subscriptions skill for handling these events

Deploy

sudo nixos-rebuild switch --flake .#lazyworkhorse

Next Steps (after merge)

  • Wire CI runner to use pr-test-vm for full VM deployment tests
  • Add per-service health checks to tests/run-integration.sh
  • Set up staging NFS volume on HoardingCow
  • Create .env.staging and refactor compose files
  • Configure *.staging.lazyworkhorse.net DNS + Traefik routing
  • Set up WireGuard VPN for staging access
  • Enable auto-merge on protected branches
Hermes added 3 commits 2026-05-16 01:14:45 +00:00
- Load kvm-intel and kvm kernel modules
- Enable libvirtd service
- Add ai-worker to libvirtd group

Requires Intel VT-x to be enabled in BIOS.
After reboot: verify /dev/kvm exists, then deploy staging VM.
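
The post-reboot check could be scripted roughly as follows. This is a sketch: the function name `kvm_device_ready` is hypothetical, and it only tests that the device node exists as a character device.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; the function name is illustrative.
# Succeeds if the given path (default /dev/kvm) is a character device.
kvm_device_ready() {
  local dev="${1:-/dev/kvm}"
  if [ -c "$dev" ]; then
    echo "ok: $dev is present"
  else
    echo "missing: $dev (is VT-x enabled and kvm-intel loaded?)" >&2
    return 1
  fi
}

# After the reboot:
#   kvm_device_ready && lsmod | grep kvm
```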
Hermes added 1 commit 2026-05-16 16:04:41 +00:00
feat: add CI workflow and integration test stub
Some checks failed
Build and test NixOS config / build (pull_request) Has been cancelled
ec3da64594
Hermes added 1 commit 2026-05-20 18:19:02 +00:00
feat: full integration test suite for staging VM
Some checks failed
Build and test NixOS config / build (pull_request) Has been cancelled
2c981578a5
Replace the stub placeholder with a comprehensive integration test
script that verifies Docker daemon, compose stack, and service
endpoint health. All configuration via environment variables with
sensible defaults.

Changes:
- tests/run-integration.sh: 5-phase test suite with color output,
  retry logic, env-var configuration, and CI-friendly exit codes
- .gitea/workflows/build-nixos.yml: update CI step to document
  pr-test-vm usage with the new test script

See also: pr-test-vm helper in modules/nixos/services/staging-vm.nix
Hermes added 1 commit 2026-05-20 18:24:46 +00:00
feat: enhance staging-vm module
Some checks failed
Build and test NixOS config / build (pull_request) Has been cancelled
0a37d27337
Improved pr-test-vm script (virt-install, DHCP IP discovery), added packages (virt-manager, libguestfs, cdrtools, gawk, etc.), better firewall rules, storage pool auto-creation, gortium in libvirtd group, fixed OVMF package reference
Some checks failed
Build and test NixOS config / build (pull_request) Has been cancelled
This pull request has changes conflicting with the target branch.
  • assets/compose

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin kvm-pr:kvm-pr
git checkout kvm-pr
Reference: gortium/infra#48