diff --git a/assets/ai-optimizer/README.md b/assets/ai-optimizer/README.md
new file mode 100644
index 0000000..cde9392
--- /dev/null
+++ b/assets/ai-optimizer/README.md
@@ -0,0 +1,194 @@
+# AI Model Optimizer - Ollama GPU Benchmark Plan
+
+**Purpose:** Find the Ollama configurations that maximize context size and GPU utilization on AMD MI50 GPUs.
+
+**Hardware:**
+- 2x AMD MI50 GPUs (32GB VRAM each, 64GB total)
+- 128GB system RAM
+- ROCm: `HSA_OVERRIDE_GFX_VERSION=9.0.6`, `HIP_VISIBLE_DEVICES=0,1`
+
+---
+
+## File Locations
+
+```
+STATE:   /opt/data/infra/assets/ai-optimizer/state.json
+RESULTS: /opt/data/infra/assets/ai-optimizer/results.csv
+REPO:    /opt/data/infra (persistent clone)
+```
+
+---
+
+## Model Queues
+
+### GPU Track (Coding - prioritize speed + context on GPU)
+1. `deepseek-coder-v2:16b` - Best coding model, fits on GPU
+2. `qwen2.5-coder:32b` - Alternative coding model
+3. `codellama:34b-instruct` - Legacy option
+
+### RAM Track (Knowledge - prioritize max context)
+1. `qwen2.5:72b` - Large knowledge model
+2. `nemotron-3-nano:30b` - Efficient large model
+3. `mixtral:8x7b-instruct` - MoE architecture
+
+---
+
+## Context Steps (in order)
+
+```
+[32768, 65536, 98304, 131072, 163840, 200704, 262144, 327680]
+```
+
+---
+
+## Optimization Strategy
+
+### GPU Track (Coding)
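+- Start: `num_ctx=32768`, `num_gpu=99` (offload all layers to GPU), `flash_attn=true`
+- Step `num_ctx` up through the context steps until the model hits OOM or throughput drops below 5 tokens/sec
+- Record the last config that ran successfully before that limit
+- Target: the largest context that still sustains >10 tokens/sec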
+
+### RAM Track (Knowledge)
+- Start: `num_ctx=65536`, `num_gpu=50`, `flash_attn=true`
+- Allow heavy RAM offload (up to 100GB system RAM)
+- Step `num_ctx` up through the context steps until OOM
+- Speed is secondary to context size
+
+---
+
+## Prerequisites
+
+This PR adds the `ai-worker` user with docker group access. After merge:
+
+```bash
+# SSH from Hermes container to run benchmarks on the host
+ssh -i /path/to/key ai-worker@host docker exec ollama ollama list
+
+# Or if running directly on host
+docker exec ollama ollama list
+```
+
+---
+
+## Manual Testing Workflow
+
+### 1. Quick Model Test
+
+```bash
+docker exec ollama ollama run <model>:<tag> "Your prompt here"
+```
+
+### 2. Check Current State
+
+```bash
+cd /opt/data/infra
+cat assets/ai-optimizer/state.json
+```
+
+### 3. Pull Model (if needed)
+
+```bash
+docker exec ollama ollama pull <model>:<tag>
+```
+
+### 4. Create Test Modelfile
+
+```bash
+docker exec ollama bash -c "cat <<EOF > /root/.ollama/test_${model}.modelfile
+FROM ${model}
+PARAMETER num_ctx ${num_ctx}
+PARAMETER num_gpu ${num_gpu}
+# Flash attention is a server-side setting (OLLAMA_FLASH_ATTENTION=1), not a Modelfile PARAMETER
+PARAMETER num_predict 4096
+PARAMETER num_keep 1024
+PARAMETER repeat_penalty 1.1
+EOF"
+
+docker exec ollama ollama create test-model -f /root/.ollama/test_${model}.modelfile
+```
+
+### 5. Run Benchmark
+
+```bash
+# Warm up
+docker exec ollama ollama run test-model "Hello" > /dev/null
+
+# Coding prompt (--verbose prints timing stats, including the eval rate in tokens/s)
+docker exec ollama ollama run --verbose test-model "Write a Python async context manager that retries a function with exponential backoff, max 5 retries, and logs each attempt using structlog. Include type hints."
+
+# Knowledge prompt
+docker exec ollama ollama run --verbose test-model "Explain the complete memory hierarchy in modern GPUs, from registers through L1/L2 caches to VRAM, and how data moves between them during matrix multiplication."
+```
+
+### 6. Measure VRAM
+
+```bash
+# Try host first
+rocm-smi --showmeminfo vram 2>/dev/null || \
+# Try via docker
+docker exec --privileged ollama rocm-smi --showmeminfo vram 2>/dev/null || \
+echo "VRAM unavailable"
+```
+
+### 7. Record Results
+
+Update `state.json` and append to `results.csv`:
+- tokens/sec (the `eval rate` reported by `ollama run --verbose`)
+- VRAM/RAM usage
+- Whether this config is the new best
+
+### 8. Commit Changes
+
+```bash
+cd /opt/data/infra
+git add assets/ai-optimizer/
+git commit -m "ai-optimizer: tested ${model} at ${num_ctx} ctx - ${status}"
+git push
+```
+
+---
+
+## State File Structure
+
+```json
+{
+  "track": "gpu",
+  "current_model": "deepseek-coder-v2:16b",
+  "model_index": 0,
+  "phase": "context_scaling",
+  "backend": "ollama",
+  "current_config": {
+    "num_ctx": 32768,
+    "num_gpu": 99,
+    "flash_attn": true
+  },
+  "best_configs": {
+    "gpu": {},
+    "ram": {}
+  },
+  "completed_models": [],
+  "gpu_queue": ["deepseek-coder-v2:16b", "qwen2.5-coder:32b", "codellama:34b-instruct"],
+  "ram_queue": ["qwen2.5:72b", "nemotron-3-nano:30b", "mixtral:8x7b-instruct"],
+  "context_steps": [32768, 65536, 98304, 131072, 163840, 200704, 262144, 327680],
+  "last_updated": "2026-04-30T00:00:00Z"
+}
+```
+
+---
+
+## Results CSV Format
+
+```csv
+timestamp,track,model,backend,phase,num_ctx,num_gpu,flash_attn,tokens_per_sec,vram_gb,ram_gb,status,is_best
+```
+
+---
+
+## Notes
+
+- **Manual execution**: Run benchmarks when needed; there is no automated cron job
+- **Two tracks**: Complete GPU track first (coding models), then RAM track
+- **Backend**: ollama (llama.cpp optional for advanced users)
+- **Host access**: Use docker exec (or SSH via ai-worker) for rocm-smi
+- **Commit results**: Push best configs to repo for reference
diff --git a/assets/ai-optimizer/results.csv b/assets/ai-optimizer/results.csv
new file mode 100644
index 0000000..7e25194
--- /dev/null
+++ b/assets/ai-optimizer/results.csv
@@ -0,0 +1 @@
+timestamp,track,model,backend,phase,num_ctx,num_gpu,flash_attn,tokens_per_sec,vram_gb,ram_gb,status,is_best
diff --git a/assets/ai-optimizer/state.json b/assets/ai-optimizer/state.json
new file mode 100644
index 0000000..08dac90
--- /dev/null
+++ b/assets/ai-optimizer/state.json
@@ -0,0 +1,21 @@
+{
+  "track": "gpu",
+  "current_model": "deepseek-coder-v2:16b",
+  "model_index": 0,
+  "phase": "context_scaling",
+  "backend": "ollama",
+  "current_config": {
+    "num_ctx": 32768,
+    "num_gpu": 99,
+    "flash_attn": true
+  },
+  "best_configs": {
+    "gpu": {},
+    "ram": {}
+  },
+  "completed_models": [],
+  "gpu_queue": ["deepseek-coder-v2:16b", "qwen2.5-coder:32b", "codellama:34b-instruct"],
+  "ram_queue": ["qwen2.5:72b", "nemotron-3-nano:30b", "mixtral:8x7b-instruct"],
+  "context_steps": [32768, 65536, 98304, 131072, 163840, 200704, 262144, 327680],
+  "last_updated": "2026-05-09T00:00:00Z"
+}