From 2c5dc3d58dc3d0d48f5c747152f1b4f01391c0f0 Mon Sep 17 00:00:00 2001 From: Hermes Date: Wed, 20 May 2026 14:29:38 -0400 Subject: [PATCH] feat: comprehensive NixOS deployment infrastructure - docs/nix-container-install.md: 474-line guide covering Determinate Systems installer, vanilla Nix, NixOS base image, architecture notes (x86_64 vs aarch64), cross-compilation, container considerations, troubleshooting - scripts/deploy.sh: 286-line deployment script with pre-flight checks, git sync, build validation (nix build --no-link), 5 actions (switch/boot/test/build/ dry-activate), color-coded logging, env-based configurability - scripts/deploy-ssh-config: SSH config for all 3 hosts with dual users for lazyworkhorse, reverse tunnel for cyt-pi, uConsole placeholder, Gitea entry Full replacements of stub files from previous commit. --- docs/nix-container-install.md | 460 +++++++++++++++++++++++++++++++++- scripts/deploy-ssh-config | 71 ++++-- scripts/deploy.sh | 302 +++++++++++++++++++--- 3 files changed, 768 insertions(+), 65 deletions(-) mode change 100644 => 100755 scripts/deploy.sh diff --git a/docs/nix-container-install.md b/docs/nix-container-install.md index f7fb8aa..9e05f19 100644 --- a/docs/nix-container-install.md +++ b/docs/nix-container-install.md @@ -1,10 +1,67 @@ # Nix Installation for Hermes Agent Container -# Add these lines to the Dockerfile to bake Nix into the container image -# --- ADD AFTER BASE IMAGE AND BEFORE USER SETUP --- +This guide covers several approaches for installing Nix in the Hermes Agent Docker +container to enable remote NixOS deployment via `nixos-rebuild`. It covers both +x86_64 (lazyworkhorse) and aarch64 (cyt-pi, uConsole) architectures. +## Table of Contents + +1. [Why Nix in a Container?](#why-nix-in-a-container) +2. [Prerequisites](#prerequisites) +3. 
[Installation Methods](#installation-methods) + - [Method A: Determinate Systems Installer](#method-a-determinate-systems-installer-recommended) + - [Method B: Vanilla Nix Installer](#method-b-vanilla-nix-installer) + - [Method C: NixOS-Based Container Image](#method-c-nixos-based-container-image) +4. [Architecture-Specific Notes](#architecture-specific-notes) + - [x86_64 (lazyworkhorse)](#x86_64-lazyworkhorse) + - [aarch64 (cyt-pi, uConsole)](#aarch64-cyt-pi-uconsole) + - [Cross-Compilation](#cross-compilation) +5. [Post-Install Configuration](#post-install-configuration) +6. [Verification](#verification) +7. [Container-Specific Considerations](#container-specific-considerations) + - [Persistence](#persistence) + - [Disk Space](#disk-space) + - [Security](#security) + - [Resource Constraints](#resource-constraints) +8. [Integration with deploy.sh](#integration-with-deploysh) +9. [Troubleshooting](#troubleshooting) +10. [References](#references) + +--- + +## Why Nix in a Container? + +The Hermes Agent container runs on an Ubuntu/Debian base. To deploy NixOS +configurations to remote hosts, we need: + +- `nix` — the Nix package manager (for building configurations) +- `nixos-rebuild` — the NixOS deployment tool +- Access to the infra repo with flake configuration + +Installing Nix inside the container avoids: +- Host-level Nix installation on the Docker host +- Cross-container volume mounts of /nix/store +- Dependencies on the host's Nix daemon (which may be a different version) + +## Prerequisites + +- Docker host running Linux (x86_64 and/or aarch64) +- Container base: Debian/Ubuntu (apt-based) +- 1-2 GB additional disk space for Nix store +- Network access to cache.nixos.org (or a local binary cache) +- Git access to the infra repository + +## Installation Methods + +### Method A: Determinate Systems Installer (Recommended) + +The Determinate Systems installer is the recommended approach. 
It is non-interactive, +sets up flakes by default, and handles multi-user installation cleanly. + +**Dockerfile additions:** + +```dockerfile # Install Nix (Determinate Systems installer) -# This provides nix, nixos-rebuild, and the Nix package manager RUN apt-get update && apt-get install -y --no-install-recommends \ curl \ xz-utils \ @@ -19,14 +76,399 @@ RUN curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/ # Configure Nix for flakes RUN mkdir -p /root/.config/nix \ - && echo 'experimental-features = nix-command flakes' > /root/.config/nix/nix.conf \ - && echo 'substituters = https://cache.nixos.org/' >> /root/.config/nix/nix.conf + && echo 'experimental-features = nix-command flakes' > /root/.config/nix/nix.conf # Add Nix to PATH for all users ENV PATH="/nix/var/nix/profiles/default/bin:$PATH" +``` -# Optional: Expose Nix daemon socket if you want to use host's Nix (less secure) -# VOLUME ["/nix/store"] -# Note: Not recommended for security - builds run in container instead +**Pros:** +- Fully non-interactive (--no-confirm) +- Enables flakes automatically +- Sets up multi-user daemon +- Auto-selects correct architecture +- Handles upgrades gracefully -# --- CONTINUE WITH EXISTENT DOCKERFILE --- +**Cons:** +- Downloads ~100 MB installer +- Requires systemd in container (works with --privileged or cgroupv2) +- Daemon mode may conflict with container exit semantics + +**Container runtime additions:** + +For the Nix daemon to work properly inside a container, you may need: +```dockerfile +# Ensure /nix is a volume for persistence +VOLUME /nix + +# Or mount tmpfs for ephemeral builds: +# docker run --tmpfs /nix:exec,size=4G ... +``` + +### Method B: Vanilla Nix Installer + +The official single-user Nix installer is lighter but requires manual flake setup. 
+ +**Dockerfile additions:** + +```dockerfile +# Install Nix (single-user, official installer) +RUN apt-get update && apt-get install -y --no-install-recommends \ + curl \ + sudo \ + xz-utils \ + && rm -rf /var/lib/apt/lists/* + +# Install Nix as root (single-user) +RUN curl -L https://nixos.org/nix/install -o /tmp/nix-install.sh \ + && chmod +x /tmp/nix-install.sh \ + && sh /tmp/nix-install.sh --no-daemon \ + && rm /tmp/nix-install.sh + +# Enable flakes +RUN mkdir -p /root/.config/nix \ + && echo 'experimental-features = nix-command flakes' > /root/.config/nix/nix.conf + +# Source Nix in shell +RUN echo '. /root/.nix-profile/etc/profile.d/nix.sh' >> /root/.bashrc +ENV PATH="/root/.nix-profile/bin:$PATH" +``` + +**Pros:** +- Smaller installer +- No daemon needed (single-user mode) +- Works in containers without systemd +- Simpler container lifecycle + +**Cons:** +- Manual flake configuration required +- Single-user only (no multi-user isolation) +- PATH must be set manually +- No automatic garbage collection + +### Method C: NixOS-Based Container Image + +For maximum isolation, use an official NixOS base image for the build stage. + +**Multi-stage Dockerfile:** + +```dockerfile +# Build stage: NixOS builder +FROM nixos/nix:latest AS builder + +COPY infra /infra +WORKDIR /infra + +# Build the configuration once +RUN nix build '.#nixosConfigurations.lazyworkhorse.config.system.build.toplevel' + +# Final stage: Hermes container +FROM ubuntu:22.04 + +# Copy the Nix closure and binary cache +COPY --from=builder /nix /nix + +# ... 
rest of Hermes setup +``` + +**Pros:** +- Purely declarative build environment +- No installation at runtime +- Easy to pin Nix version +- Good for CI/CD pipelines + +**Cons:** +- Requires multi-stage Docker build +- Larger initial image build +- Harder to update Nix version at runtime +- Overkill if Nix is only needed for `nixos-rebuild` + +--- + +## Architecture-Specific Notes + +### x86_64 (lazyworkhorse) + +The Hermes container likely runs on x86_64 hardware for the primary server. +Nix will download x86_64 binaries from cache.nixos.org by default. + +**No special configuration needed** — the standard installer works out of the box. + +If the container is running on an AMD Ryzen/EPYC or Intel Xeon, consider: +```bash +# Enable CPU-specific optimizations (optional) +echo 'extra-platforms = x86_64-v1 x86_64-v2 x86_64-v3' >> /root/.config/nix/nix.conf +``` + +### aarch64 (cyt-pi, uConsole) + +When building for aarch64 targets from an x86_64 container, you need either: +1. Remote builder (aarch64 machine does the build), or +2. QEMU-based emulation (slower but self-contained), or +3. Build directly on the aarch64 target using `--build-host` + +**For QEMU emulation in the container:** + +```dockerfile +# Enable binfmt for aarch64 emulation +RUN apt-get update && apt-get install -y --no-install-recommends \ + qemu-user-static \ + binfmt-support \ + && rm -rf /var/lib/apt/lists/* + +# Register aarch64 binfmt +RUN update-binfmts --enable qemu-aarch64 +``` + +**Container runtime (for QEMU):** +```bash +docker run --privileged --rm ... hermes-agent +# Or with specific capability: +docker run --cap-add=SYS_ADMIN --security-opt seccomp=unconfined ... 
hermes-agent +``` + +### Cross-Compilation + +For native cross-compilation (without emulation), add to your Nix configuration: + +```nix +# In your flake.nix or nix.conf +{ + nix.settings.extra-platforms = [ "aarch64-linux" "x86_64-linux" ]; + nix.settings.extra-sandbox-paths = [ ]; + boot.binfmt.emulatedSystems = [ "aarch64-linux" ]; +} +``` + +Or in `nix.conf`: +``` +extra-platforms = x86_64-linux aarch64-linux +extra-sandbox-paths = +``` + +--- + +## Post-Install Configuration + +### nix.conf for Container Usage + +Recommended `/root/.config/nix/nix.conf`: + +```ini +experimental-features = nix-command flakes +substituters = https://cache.nixos.org/ +trusted-users = root +max-jobs = auto +cores = 0 +sandbox = false +``` + +Note: `sandbox = false` is needed inside containers that lack full sandbox +support. This is safe in a single-tenant container environment. + +### PATH Setup + +Add to your Dockerfile: +```dockerfile +ENV PATH="/nix/var/nix/profiles/default/bin:/root/.nix-profile/bin:${PATH}" +``` + +### Shell Integration + +```dockerfile +RUN echo 'source /root/.nix-profile/etc/profile.d/nix.sh' >> /root/.bashrc +``` + +--- + +## Verification + +After installation, verify with: + +```bash +# Check Nix is available +nix --version + +# Check nixos-rebuild +nixos-rebuild --help | head -3 + +# Verify flakes are enabled +nix flake --help + +# Test a build (must be in infra repo) +cd /opt/data/infra +nix build --no-link '.#nixosConfigurations.lazyworkhorse.config.system.build.toplevel' + +# Check available systems +nix eval --impure --expr 'builtins.currentSystem' +``` + +--- + +## Container-Specific Considerations + +### Persistence + +The `/nix` directory should be a Docker volume to avoid re-downloading +packages on every container restart: + +```yaml +# docker-compose.yml +volumes: + - nix-store:/nix + +volumes: + nix-store: +``` + +Without persistence, every container restart requires re-downloading the +entire Nix store (~500 MB - 2 GB depending on packages 
used). + +### Disk Space + +The Nix store grows over time as old generations accumulate. Set up garbage +collection: + +```bash +# Manual GC +nix store gc + +# Remove old generations +nix-collect-garbage --delete-older-than 30d + +# Automatic GC: nix.conf has no scheduled GC; min-free/max-free only trigger +# collection during builds. For periodic cleanup, run a cron job: +# nix store gc --max 10G +``` + +In Docker, limit store growth with: +```dockerfile +# Trigger GC during builds when free disk space drops below 5 GiB +RUN mkdir -p /etc/nix && \ + echo 'min-free = 5368709120' > /etc/nix/nix.conf # Keep 5 GiB free +``` + +### Security + +Running Nix in a container introduces some security considerations: + +1. **Sandboxing:** `sandbox = false` disables build isolation. In a multi-tenant + container, this means Nix builds can affect the container filesystem. + **Mitigation:** Only build configs you trust (your own infra repo). + +2. **Network access:** The container needs outbound access to cache.nixos.org. + If using a restricted network, set up a local binary cache: + ```nix + substituters = https://cache.nixos.org/ https://nix-cache.internal/ + ``` + +3. **Privileged mode:** QEMU emulation for aarch64 builds may need `--privileged` + or `--security-opt seccomp=unconfined`. This reduces container isolation. + **Mitigation:** Use remote builders or build natively on the target. + +4. **Supply chain:** Nix derivations pin exact inputs via hashes. Verify + flake.lock is committed and reviewed. + +### Resource Constraints + +Nix builds can be memory and CPU intensive: + +```nix +# Limit build parallelism in nix.conf +max-jobs = 2 +cores = 4 + +# Or set per-build: +# nix build --max-jobs 2 --cores 4 +``` + +For containers with limited memory (< 2 GB), consider: +- Building on the target host instead (`--build-host`) +- Using the deploy script's `build` action separately + +--- + +## Integration with deploy.sh + +The deployment script at `scripts/deploy.sh` expects: + +1. **Nix installed** with flakes enabled +2.
**SSH key** at `/opt/data/home/.ssh/id_hermes_gitea` (or via the `SSH_KEY` env variable) +3. **Infra repo** checked out as the parent directory of `scripts/` +4. **Network access** to: + - `code.lazyworkhorse.net:2222` (Gitea for git operations) + - Target hosts via SSH (see deploy-ssh-config) + - `cache.nixos.org` or a local substituter + +Typical usage from Hermes: + +```bash +# Full deployment +./scripts/deploy.sh lazyworkhorse master switch + +# Build-only check (no remote deployment) +./scripts/deploy.sh cyt-pi master build + +# Dry run +./scripts/deploy.sh uConsole feat/test dry-activate + +# Override SSH key +SSH_KEY=/opt/data/home/.ssh/my-custom-key ./scripts/deploy.sh lazyworkhorse +``` + +--- + +## Troubleshooting + +### "nix: command not found" + +- Ensure Nix is installed and PATH is set: + ```bash + export PATH="/nix/var/nix/profiles/default/bin:/root/.nix-profile/bin:$PATH" + ``` +- Check installation: the `/nix/` directory should exist (`ls -la /nix/`) +- Re-source profile: `. /root/.nix-profile/etc/profile.d/nix.sh` + +### "error: unable to download ... cache.nixos.org" + +- Check network connectivity: `ping cache.nixos.org` +- Check DNS resolution from inside the container +- If behind a proxy, set `http_proxy` / `https_proxy` environment variables + +### "sandbox: cannot run build in sandbox" + +- Add `sandbox = false` to nix.conf +- Or run container with `--privileged` or `--security-opt seccomp=unconfined` + +### "aarch64-linux builds fail on x86_64" + +- QEMU binfmt not registered.
Check: `ls /proc/sys/fs/binfmt_misc/` +- Rebuild QEMU registration: `docker run --privileged --rm tonistiigi/binfmt --install all` +- Or use `--build-host` to build on the target directly + +### "nixos-rebuild fails with SSH errors" + +- Verify SSH key exists and has correct permissions: + ```bash + ls -la /opt/data/home/.ssh/id_hermes_gitea + chmod 600 /opt/data/home/.ssh/id_hermes_gitea + ``` +- Test SSH manually: `ssh -p 2424 -i /opt/data/home/.ssh/id_hermes_gitea ai-worker@lazyworkhorse.net` +- Check target host is reachable: `nc -zv lazyworkhorse.net 2424` + +### "git fetch fails from Gitea" + +- Verify GIT_SSH_COMMAND is set: `echo $GIT_SSH_COMMAND` +- Test git SSH: `ssh -T git@code.lazyworkhorse.net -p 2222` +- Check the infra repo remote: `git remote -v` + +--- + +## References + +- [Determinate Systems Nix Installer](https://github.com/DeterminateSystems/nix-installer) +- [NixOS Manual: Installation](https://nixos.org/manual/nix/stable/installation/) +- [NixOS Wiki: Flakes](https://nixos.wiki/wiki/Flakes) +- [NixOS Wiki: nixos-rebuild](https://nixos.wiki/wiki/Nixos-rebuild) +- [NixOS Wiki: Cross Compilation](https://nixos.wiki/wiki/Cross_Compilation) +- [Multi-arch Docker with QEMU](https://github.com/multiarch/qemu-user-static) diff --git a/scripts/deploy-ssh-config b/scripts/deploy-ssh-config index 91f9a0e..5cd4567 100644 --- a/scripts/deploy-ssh-config +++ b/scripts/deploy-ssh-config @@ -1,30 +1,63 @@ # Hermes Container SSH Configuration # For NixOS deployment to remote hosts +# +# Usage: +# cp scripts/deploy-ssh-config ~/.ssh/config.d/hermes-include +# Or: cat scripts/deploy-ssh-config >> ~/.ssh/config +# +# This config covers all NixOS hosts managed from the Hermes container. +# Lazyworkhorse has two users: ai-worker (primary automation) and gortium (admin). +# Cyt-pi connects via reverse SSH tunnel on port 19999. +# uConsole is a placeholder until LAN-hostname resolution is confirmed. 
+# ── Global defaults ────────────────────────────────────────────────── +Host * + ServerAliveInterval 60 + ServerAliveCountMax 3 + TCPKeepAlive yes + Compression yes + CompressionLevel 6 + ControlMaster auto + ControlPath ~/.ssh/controlmasters/%r@%h:%p + ControlPersist 10m + StrictHostKeyChecking no + UserKnownHostsFile /dev/null + +# ── Hosts ────────────────────────────────────────────────────────────── + +# Lazyworkhorse — x86_64 main server (ai-worker@lazyworkhorse.net:2424) Host lazyworkhorse + HostName lazyworkhorse.net + User ai-worker + Port 2424 + IdentityFile /opt/data/home/.ssh/id_hermes_gitea + +# Lazyworkhorse — admin access (gortium@lazyworkhorse.net:2425) +Host lazyworkhorse-admin + HostName lazyworkhorse.net + User gortium + Port 2425 + IdentityFile /opt/data/home/.ssh/id_hermes_gitea + +# Cyt-pi — aarch64 Pi Zero 2 W +# Connected via reverse SSH tunnel (gortium directs tunnel to :19999) +Host cyt-pi HostName localhost User gortium + Port 19999 IdentityFile /opt/data/home/.ssh/id_hermes_gitea - StrictHostKeyChecking no - UserKnownHostsFile /dev/null -Host cyt-pi - HostName cyt-pi.local - User thierry +# uConsole — aarch64 ClockworkPi (placeholder hostname) +# Replace uconsole.lan with actual IP/hostname when deployed +Host uConsole uconsole + HostName uconsole.lan + User gortium + Port 22 IdentityFile /opt/data/home/.ssh/id_hermes_gitea - StrictHostKeyChecking no - UserKnownHostsFile /dev/null -Host uconsole - HostName uconsole.local - User thierry +# ── Gitea host — for git operations ────────────────────────────────── +Host code + HostName code.lazyworkhorse.net + Port 2222 + User gortium IdentityFile /opt/data/home/.ssh/id_hermes_gitea - StrictHostKeyChecking no - UserKnownHostsFile /dev/null - -# Generic pattern for .local hosts -Host *.local - User thierry - IdentityFile /opt/data/home/.ssh/id_hermes_gitea - StrictHostKeyChecking no - UserKnownHostsFile /dev/null diff --git a/scripts/deploy.sh b/scripts/deploy.sh old mode 100644 new mode 
100755 index 34c6d61..11b9d33 --- a/scripts/deploy.sh +++ b/scripts/deploy.sh @@ -1,58 +1,286 @@ #!/usr/bin/env bash # NixOS Deployment Helper Script +# Remote NixOS deployment from Hermes container to target hosts. +# # Usage: ./deploy.sh <hostname> [branch] [action] -# Example: ./deploy.sh uConsole feat/test switch +# +# Actions: +# switch Activate configuration now (default) +# boot Activate on next reboot +# test Activate without switching generations +# build Build locally only, no remote activation +# dry-activate Show what would change without applying +# +# Examples: +# ./deploy.sh lazyworkhorse # deploy master/switch to lazyworkhorse +# ./deploy.sh cyt-pi feat/test boot # deploy feat/test branch, activate on boot +# ./deploy.sh uConsole master build # just build, don't deploy +# NO_BUILD_CHECK=1 ./deploy.sh uConsole # skip the pre-flight nix build +# +# Environment variables: +# SSH_USER SSH user (default: auto-detected per host) +# SSH_PORT SSH port (default: auto-detected per host) +# SSH_KEY SSH identity file +# BUILD_HOST Build flake for this host (default: same as target host) +# NO_BUILD_CHECK Set to 1 to skip local nix build before deployment -set -e +set -euo pipefail +# ── Colors ────────────────────────────────────────────────────────────── +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +CYAN='\033[0;36m' +NC='\033[0m' # No Color + +info() { echo -e "${BLUE}[INFO]${NC} $*"; } +ok() { echo -e "${GREEN}[OK]${NC} $*"; } +warn() { echo -e "${YELLOW}[WARN]${NC} $*"; } +error() { echo -e "${RED}[ERROR]${NC} $*" >&2; } +step() { echo -e "\n${CYAN}━━━ $* ━━━${NC}"; } + +# ── Cleanup trap ─────────────────────────────────────────────────────── +cleanup() { + local ec=$?
+ if [ $ec -ne 0 ]; then + error "Deployment failed with exit code $ec" + fi + exit $ec +} +trap cleanup EXIT + +# ── Usage / Help ─────────────────────────────────────────────────────── +show_usage() { + cat <<EOF +Usage: $0 <hostname> [branch] [action] + +Remote NixOS deployment from Hermes container to target hosts. + +HOSTNAME (required): + lazyworkhorse x86_64 main server + cyt-pi aarch64 Pi Zero 2 W (via reverse tunnel) + uConsole aarch64 ClockworkPi + +BRANCH (optional, default: master): + Git branch or tag to deploy. Fetched from origin. + +ACTION (optional, default: switch): + switch Activate configuration now (default) + boot Activate on next reboot + test Activate without switching generations + build Build locally only, skip remote deployment + dry-activate Show what would change without applying + +Environment variables: + SSH_USER SSH username override + SSH_PORT SSH port override + SSH_KEY SSH identity file path + BUILD_HOST Build flake hostname (default: same as HOSTNAME) + NO_BUILD_CHECK Skip local nix build validation (set to 1) + +Examples: + $0 lazyworkhorse # deploy master/switch + $0 cyt-pi feat/test boot # deploy feature branch, boot + $0 uConsole master build # just build, no remote + NO_BUILD_CHECK=1 $0 uConsole # skip build check + +EOF + exit 0 +} + +# ── Argument parsing ─────────────────────────────────────────────────── HOSTNAME="${1:-}" -BRANCH="${2:-main}" +BRANCH="${2:-master}" ACTION="${3:-switch}" +NO_BUILD_CHECK="${NO_BUILD_CHECK:-0}" -if [ -z "$HOSTNAME" ]; then - echo "Usage: $0 <hostname> [branch] [action]" - echo " hostname: lazyworkhorse, cyt-pi, uConsole" - echo " branch: git branch to deploy (default: main)" - echo " action: switch, test, boot (default: switch)" - exit 1 +if [ "$HOSTNAME" = "--help" ] || [ "$HOSTNAME" = "-h" ] || [ -z "$HOSTNAME" ]; then + show_usage fi -# Environment setup -export GIT_SSH_COMMAND="ssh -i /opt/data/home/.ssh/id_hermes_gitea -o StrictHostKeyChecking=no" +# ── Host configuration ─────────────────────────────────────────────────
+case "$HOSTNAME" in + lazyworkhorse) + DEFAULT_SSH_USER="ai-worker" + DEFAULT_SSH_PORT="2424" + ARCH="x86_64-linux" + ;; + cyt-pi) + DEFAULT_SSH_USER="gortium" + DEFAULT_SSH_PORT="19999" + ARCH="aarch64-linux" + ;; + uConsole) + DEFAULT_SSH_USER="gortium" + DEFAULT_SSH_PORT="22" + ARCH="aarch64-linux" + ;; + *) + error "Unknown host: $HOSTNAME" + echo "Supported hosts: lazyworkhorse, cyt-pi, uConsole" + exit 1 + ;; +esac + +SSH_USER="${SSH_USER:-$DEFAULT_SSH_USER}" +SSH_PORT="${SSH_PORT:-$DEFAULT_SSH_PORT}" +SSH_KEY="${SSH_KEY:-/opt/data/home/.ssh/id_hermes_gitea}" +BUILD_HOST="${BUILD_HOST:-$HOSTNAME}" + +SSH_OPTS="-p $SSH_PORT -i $SSH_KEY -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" +SSH_TARGET="${SSH_USER}@${HOSTNAME}" +export GIT_SSH_COMMAND="ssh -i $SSH_KEY -p 2222 -o StrictHostKeyChecking=no" export PATH="/nix/var/nix/profiles/default/bin:$PATH" -cd /opt/data/infra - -echo "=== NixOS Deployment ===" -echo "Host: $HOSTNAME" -echo "Branch: $BRANCH" -echo "Action: $ACTION" +# ── Banner ───────────────────────────────────────────────────────────── +echo "╔══════════════════════════════════════════════╗" +echo "║ NixOS Remote Deployment ║" +echo "╚══════════════════════════════════════════════╝" +info "Host: $HOSTNAME ($ARCH)" +info "Branch: $BRANCH" +info "Action: $ACTION" +info "SSH: ${SSH_USER}@${HOSTNAME}:${SSH_PORT}" echo "" -# Checkout branch -echo "[1/4] Checking out branch..." -git fetch origin "$BRANCH" 2>/dev/null || true -git checkout "$BRANCH" 2>/dev/null || git checkout -b "$BRANCH" +# ── Pre-flight checks ───────────────────────────────────────────────── +step "Pre-flight checks" + +# 1. Check required tools +for cmd in nix git ssh; do + if ! command -v "$cmd" &>/dev/null; then + error "Required tool not found: $cmd" + exit 1 + fi +done +ok "Required tools available (nix, git, ssh)" + +# 2. Check infra repo +INFRA_DIR="$(cd "$(dirname "$0")/.." && pwd)" +if [ ! 
-d "$INFRA_DIR/.git" ]; then + error "Not a git repository: $INFRA_DIR" + exit 1 +fi +ok "Infra repo found at $INFRA_DIR" + +# 3. Check SSH connectivity (skip for build-only actions) +if [ "$ACTION" != "build" ]; then + if ssh $SSH_OPTS -o ConnectTimeout=5 "$SSH_TARGET" "echo connected" &>/dev/null; then + ok "SSH connectivity to $HOSTNAME verified" + else + warn "Cannot reach $HOSTNAME via SSH — deployment step will fail later" + fi +fi + +# ── Git sync ─────────────────────────────────────────────────────────── +step "Git sync" + +cd "$INFRA_DIR" + +# Stash local changes if any +if ! git diff --quiet HEAD; then + warn "Local changes detected, stashing..." + git stash push -m "auto-stash before deploy $(date -Iseconds)" + STASHED=1 +else + STASHED=0 +fi + +# Fetch and checkout +git fetch origin "$BRANCH" 2>/dev/null || git fetch origin master +if git rev-parse --verify "origin/$BRANCH" &>/dev/null 2>&1; then + # Remote branch exists — fast-forward merge + git checkout -B "$BRANCH" "origin/$BRANCH" +elif git rev-parse --verify "$BRANCH" &>/dev/null 2>&1; then + # Local branch or tag + git checkout "$BRANCH" +else + error "Branch/tag not found: $BRANCH" + exit 1 +fi +ok "Checked out $BRANCH ($(git rev-parse --short HEAD))" # Update submodules -echo "[2/4] Updating submodules..." -git submodule update --init --recursive +if [ -f .gitmodules ]; then + git submodule update --init --recursive + ok "Submodules updated" +fi -# Build configuration -echo "[3/4] Building configuration..." 
-if [ "$ACTION" = "switch" ]; then - nixos-rebuild switch --flake ".#$HOSTNAME" --target-host "thierry@$HOSTNAME" --use-remote-sudo -elif [ "$ACTION" = "test" ]; then - nixos-rebuild test --flake ".#$HOSTNAME" --target-host "thierry@$HOSTNAME" --use-remote-sudo -elif [ "$ACTION" = "boot" ]; then - nixos-rebuild boot --flake ".#$HOSTNAME" --target-host "thierry@$HOSTNAME" --use-remote-sudo +# ── Build validation ────────────────────────────────────────────────── +if [ "$NO_BUILD_CHECK" != "1" ]; then + step "Build validation" + info "Building nixosConfigurations.$BUILD_HOST (no link)..." + + if nix build --no-link --print-build-logs \ + ".#nixosConfigurations.${BUILD_HOST}.config.system.build.toplevel" 2>&1; then + ok "Build succeeded for $BUILD_HOST" + else + error "Build failed for $BUILD_HOST" + exit 1 + fi else - echo "Unknown action: $ACTION" - exit 1 + warn "Build check skipped (NO_BUILD_CHECK=1)" +fi + +# ── Deployment ───────────────────────────────────────────────────────── +if [ "$ACTION" = "build" ]; then + step "Build complete (no deployment)" + info "Use one of: switch, boot, test, dry-activate to deploy" + exit 0 +fi + +step "Deployment ($ACTION)" + +# Build the nixos-rebuild command +case "$ACTION" in + switch|boot|test) + nixos-rebuild "$ACTION" \ + --flake ".#$HOSTNAME" \ + --target-host "$SSH_TARGET" \ + --build-host "localhost" \ + --use-remote-sudo \ + --max-jobs 4 + ;; + dry-activate) + nixos-rebuild dry-activate \ + --flake ".#$HOSTNAME" \ + --target-host "$SSH_TARGET" \ + --build-host "localhost" \ + --use-remote-sudo + ;; + *) + error "Unknown action: $ACTION" + echo "Valid actions: switch, boot, test, build, dry-activate" + exit 1 + ;; +esac + +# ── Check result ─────────────────────────────────────────────────────── +DEPLOY_EXIT=$? 
+if [ $DEPLOY_EXIT -eq 0 ]; then + echo "" + ok "Deployment to $HOSTNAME ($ACTION) completed successfully" + case "$ACTION" in + switch|test) + info "Configuration is now active" + ;; + boot) + info "Configuration will activate on next reboot" + ;; + dry-activate) + info "Dry-run complete — no changes applied" + ;; + esac +else + error "Deployment failed with exit code $DEPLOY_EXIT" + exit $DEPLOY_EXIT fi echo "" -echo "[4/4] Deployment complete!" -echo "Host: $HOSTNAME" -echo "Branch: $BRANCH" -echo "Time: $(date -Iseconds)" +echo "╔══════════════════════════════════════════════╗" +echo "║ Deployment Complete ║" +echo "╚══════════════════════════════════════════════╝" +info "Host: $HOSTNAME" +info "Branch: $BRANCH ($(git rev-parse --short HEAD))" +info "Action: $ACTION" +info "Time: $(date -Iseconds)"