diff --git a/README.md b/README.md
index 306126a..169804b 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ Research corpus and implementation guidance for Google Gemma 4, based on product
 | `docs/openwebui-setup.md` | How to configure Gemma 4 inside OpenWebUI — per-setting reference, two ready-to-bake Workspace Model profiles (chat + extract), and a symptom→cause troubleshooting table mapped back to GOTCHAS.md. Assumes Ollama + OpenWebUI are already running. | When setting up or debugging a Gemma 4 model in OpenWebUI, or handing the front-end config to someone else |
 | `docs/reference/bakeoff-2026-04-18.md` | CLI-coding-agent bakeoff on 3090 Ti. **Rounds 1/2 misidentified the cause; Round 3 (the correct one): `think: false` silent-stops gemma4:26b at certain multi-turn states on 32K context.** 31B and Qwen3-Coder robust to the flag. Harness at `scripts/bakeoff/` | When deciding which model to back a CLI agent with, writing a custom agent payload, or debugging a silent tool-call halt |
 | `docs/reference/mort-bakeoff-2026-04-18.md` | mort-bot-specific `think=true` vs `think=false` bakeoff on mort's actual loop shape (gemma4:26b, num_ctx=8192). **Thinking does NOT accumulate in context on Ollama 0.20.4** — strips it from serialized history. Both settings behave identically on step counts, tool counts, wall clock. Harness at `scripts/mort-bakeoff/` | When deciding mort-bot's THINK env var, or when someone claims "think=true eats context" without pinning an Ollama version |
-| `docs/reference/gpu-bakeoff-2026-04-20.md` | Cross-GPU throughput bakeoff: steel141 RTX 3090 Ti vs matt-strix (AMD Strix Halo). **3090 Ti wins decode decisively (128 tok/s on 26B MoE). Strix gets ~42% of that on ~25% of the bandwidth.** Also quantifies the MoE vs dense gap: 26B decodes ~4.7× faster than 31B on both cards. Harness at `scripts/gpu-bakeoff/` | When choosing which host to run a Gemma 4 workload on |
+| `docs/reference/gpu-bakeoff-2026-04-20.md` | Cross-GPU throughput bakeoff: steel141 RTX 3090 Ti vs strix-halo (AMD Strix Halo). **3090 Ti wins decode decisively (128 tok/s on 26B MoE). Strix gets ~42% of that on ~25% of the bandwidth.** Also quantifies the MoE vs dense gap: 26B decodes ~4.7× faster than 31B on both cards. Harness at `scripts/gpu-bakeoff/` | When choosing which host to run a Gemma 4 workload on |
 | `tooling/` | **Canonical upstream tooling** — real scripts, notebooks, model cards, and configs pulled from Google / HF / framework maintainers (147 files). Subdirs: `google-official/`, `huggingface/`, `inference-frameworks/`, `gemma-family/`, `fine-tuning/`. See `tooling/README.md` for index and findings that update the older `CORPUS_*` docs | When you need authoritative source material — model cards, chat templates, fine-tuning recipes, serving commands for vLLM / llama.cpp / MLX, or to scope a specialized sibling (ShieldGemma, EmbeddingGemma, etc.) |
 
 ## Source Projects
diff --git a/docs/reference/gpu-bakeoff-2026-04-20.md b/docs/reference/gpu-bakeoff-2026-04-20.md
index 2706d9d..4185180 100644
--- a/docs/reference/gpu-bakeoff-2026-04-20.md
+++ b/docs/reference/gpu-bakeoff-2026-04-20.md
@@ -1,7 +1,7 @@
 # GPU Bakeoff — Gemma 4 Throughput: 3090 Ti vs Strix Halo
 
 **Date:** 2026-04-20
-**Host matrix:** steel141 (RTX 3090 Ti) · matt-strix (AMD Strix Halo iGPU)
+**Host matrix:** steel141 (RTX 3090 Ti) · strix-halo (AMD Strix Halo iGPU)
 **Models:** `gemma4:26b` (MoE Q4_K_M) · `gemma4:31b-it-q4_K_M` (dense Q4_K_M)
 **Harness:** `scripts/gpu-bakeoff/harness.py`
 **Raw data:** `scripts/gpu-bakeoff/runs/`
@@ -13,7 +13,7 @@
 | GPU | 26B (MoE) decode | 31B (dense) decode | Long-prompt prefill (26B) |
 |-----|------------------|--------------------|-----------------------|
 | **RTX 3090 Ti** (steel141) | **128 tok/s** | **27 tok/s** | **23,849 tok/s** |
-| **AMD Strix Halo iGPU** (matt-strix) | 54 tok/s (42%) | 11 tok/s (39%) | 14,326 tok/s (60%) |
+| **AMD Strix Halo iGPU** (strix-halo) | 54 tok/s (42%) | 11 tok/s (39%) | 14,326 tok/s (60%) |
 
 ### Headline findings
 
@@ -34,8 +34,8 @@
 
 | Host | GPU | VRAM | Bandwidth | Compute cap | Notes |
 |------|-----|------|-----------|-------------|-------|
-| steel141 | RTX 3090 Ti | 24 GB GDDR6X | ~1008 GB/s | 8.6 (Ampere) | Seth's workstation. Also has a GTX 1660 SUPER as aux display card — not used for inference. Ollama on 127.0.0.1:11434. |
-| matt-strix | AMD Strix Halo (Radeon 890M iGPU + XDNA 2 NPU) | Shared LPDDR5X | ~256 GB/s | — | Unified memory lets it fit models a 24 GB card can't. Ollama on 100.117.155.64:11434 via Tailscale. |
+| steel141 | RTX 3090 Ti | 24 GB GDDR6X | ~1008 GB/s | 8.6 (Ampere) | Workstation. Also has a GTX 1660 SUPER as aux display card — not used for inference. Ollama on localhost. |
+| strix-halo | AMD Strix Halo (Radeon 890M iGPU + XDNA 2 NPU) | Shared LPDDR5X | ~256 GB/s | — | Unified memory lets it fit models a 24 GB card can't. Ollama accessed via Tailscale. |
 
 ---
 
@@ -151,7 +151,7 @@ and matches or slightly exceeds proportionally.
 
 1. **Strix max-model fit.** Strix can host models that wouldn't fit the
    3090 Ti. A follow-up would pull a larger model (70 B+ quantized) on
-   matt-strix and measure the Strix-only performance ceiling.
+   strix-halo and measure the Strix-only performance ceiling.
 2. **Q8 vs Q4 on Strix.** Same model, two quantizations — quality/speed
    tradeoff characterization.
 
@@ -166,7 +166,7 @@ runs/
 ├── steel141/
 │   ├── gemma4-26b/{short,long}.json
 │   └── gemma4-31b/{short,long}.json
-└── matt-strix/
+└── strix-halo/
     ├── gemma4-26b/{short,long}.json
     └── gemma4-31b/{short,long}.json
 ```
diff --git a/scripts/gpu-bakeoff/harness.py b/scripts/gpu-bakeoff/harness.py
index fd346a2..b07771d 100644
--- a/scripts/gpu-bakeoff/harness.py
+++ b/scripts/gpu-bakeoff/harness.py
@@ -5,7 +5,7 @@ three hosts:
 
   - steel141  : RTX 3090 Ti (24 GB GDDR6X, compute 8.6, ~1008 GB/s)
   - pve197    : Tesla V100-PCIE-32GB (32 GB HBM2, compute 7.0, ~900 GB/s)
-  - matt-strix: AMD Strix Halo iGPU (shared LPDDR5X, ~256 GB/s)
+  - strix-halo: AMD Strix Halo iGPU (shared LPDDR5X, ~256 GB/s)
 
 Per (host, model, prompt_length), runs 1 warmup + N measurement runs,
 records Ollama's canonical timing fields, and writes one JSON trace to
@@ -15,6 +15,13 @@ All three Ollama servers are polled via HTTP; no SSH required. All
 timings come from Ollama's own /api/generate response fields so wall-
 clock jitter between the harness and the server is excluded.
 
+Host URLs are resolved from environment variables so routable addresses
+don't live in source. steel141 defaults to localhost; the others must be set:
+
+    OLLAMA_STEEL141_URL=http://127.0.0.1:11434
+    OLLAMA_PVE197_URL=http://<lan-ip>:11434
+    OLLAMA_STRIX_URL=http://<tailscale-ip>:11434
+
 Invocation:
     python3 harness.py --host steel141 --model gemma4:26b --prompt short
     python3 harness.py all   # runs the full planned matrix
@@ -24,6 +31,7 @@ from __future__ import annotations
 
 import argparse
 import json
+import os
 import sys
 import time
 import urllib.request
@@ -31,16 +39,30 @@ from pathlib import Path
 
 
 HOSTS = {
-    "steel141":   {"url": "http://127.0.0.1:11434",       "gpu": "RTX 3090 Ti",           "vram_gb": 24},
-    "pve197":     {"url": "http://192.168.0.179:11434",   "gpu": "Tesla V100-PCIE-32GB",  "vram_gb": 32},
-    "matt-strix": {"url": "http://100.117.155.64:11434",  "gpu": "AMD Strix Halo iGPU",   "vram_gb": None},
+    "steel141":   {"url_env": "OLLAMA_STEEL141_URL", "default_url": "http://127.0.0.1:11434",
+                   "gpu": "RTX 3090 Ti",          "vram_gb": 24},
+    "pve197":     {"url_env": "OLLAMA_PVE197_URL",   "default_url": None,
+                   "gpu": "Tesla V100-PCIE-32GB", "vram_gb": 32},
+    "strix-halo": {"url_env": "OLLAMA_STRIX_URL",    "default_url": None,
+                   "gpu": "AMD Strix Halo iGPU",  "vram_gb": None},
 }
 
-# Per-host model tag mapping. matt-strix uses gemma4:31b, the others
+
+def _host_url(host: str) -> str:
+    """Return the Ollama base URL for host: env override, else default."""
+    cfg = HOSTS[host]
+    url = os.environ.get(cfg["url_env"]) or cfg["default_url"]
+    if not url:
+        msg = f"host {host!r} has no URL; set ${cfg['url_env']} in env"
+        raise RuntimeError(msg)
+    return url
+
+
+# Per-host model tag mapping. strix-halo uses gemma4:31b, the others
 # use gemma4:31b-it-q4_K_M — identical weights, different tags.
 MODEL_ALIASES = {
-    "gemma4:26b":  {"steel141": "gemma4:26b",            "pve197": "gemma4:26b",            "matt-strix": "gemma4:26b"},
-    "gemma4:31b":  {"steel141": "gemma4:31b-it-q4_K_M",  "pve197": "gemma4:31b-it-q4_K_M",  "matt-strix": "gemma4:31b"},
+    "gemma4:26b":  {"steel141": "gemma4:26b",            "pve197": "gemma4:26b",            "strix-halo": "gemma4:26b"},
+    "gemma4:31b":  {"steel141": "gemma4:31b-it-q4_K_M",  "pve197": "gemma4:31b-it-q4_K_M",  "strix-halo": "gemma4:31b"},
     # V100-only edge case — only 32 GB host has headroom for the Q8 MoE.
     "gemma4:26b-q8":  {"pve197": "gemma4:26b-a4b-it-q8_0"},
 }
@@ -151,7 +173,7 @@ def run_matrix(
         return {"host": host, "model_alias": model_alias, "skipped": "model not available on host"}
 
     prompt = PROMPTS[prompt_key]
-    url = host_cfg["url"]
+    url = _host_url(host)
 
     trace = {
         "host": host,
diff --git a/scripts/gpu-bakeoff/runs/matt-strix/gemma4-26b-q8/long.json b/scripts/gpu-bakeoff/runs/strix-halo/gemma4-26b-q8/long.json
similarity index 75%
rename from scripts/gpu-bakeoff/runs/matt-strix/gemma4-26b-q8/long.json
rename to scripts/gpu-bakeoff/runs/strix-halo/gemma4-26b-q8/long.json
index 7b0d94d..c0378ad 100644
--- a/scripts/gpu-bakeoff/runs/matt-strix/gemma4-26b-q8/long.json
+++ b/scripts/gpu-bakeoff/runs/strix-halo/gemma4-26b-q8/long.json
@@ -1,5 +1,5 @@
 {
-  "host": "matt-strix",
+  "host": "strix-halo",
   "model_alias": "gemma4:26b-q8",
   "skipped": "model not available on host"
 }
\ No newline at end of file
diff --git a/scripts/gpu-bakeoff/runs/matt-strix/gemma4-26b-q8/short.json b/scripts/gpu-bakeoff/runs/strix-halo/gemma4-26b-q8/short.json
similarity index 75%
rename from scripts/gpu-bakeoff/runs/matt-strix/gemma4-26b-q8/short.json
rename to scripts/gpu-bakeoff/runs/strix-halo/gemma4-26b-q8/short.json
index 7b0d94d..c0378ad 100644
--- a/scripts/gpu-bakeoff/runs/matt-strix/gemma4-26b-q8/short.json
+++ b/scripts/gpu-bakeoff/runs/strix-halo/gemma4-26b-q8/short.json
@@ -1,5 +1,5 @@
 {
-  "host": "matt-strix",
+  "host": "strix-halo",
   "model_alias": "gemma4:26b-q8",
   "skipped": "model not available on host"
 }
\ No newline at end of file
diff --git a/scripts/gpu-bakeoff/runs/matt-strix/gemma4-26b/long.json b/scripts/gpu-bakeoff/runs/strix-halo/gemma4-26b/long.json
similarity index 98%
rename from scripts/gpu-bakeoff/runs/matt-strix/gemma4-26b/long.json
rename to scripts/gpu-bakeoff/runs/strix-halo/gemma4-26b/long.json
index 97ecd3c..69dd030 100644
--- a/scripts/gpu-bakeoff/runs/matt-strix/gemma4-26b/long.json
+++ b/scripts/gpu-bakeoff/runs/strix-halo/gemma4-26b/long.json
@@ -1,5 +1,5 @@
 {
-  "host": "matt-strix",
+  "host": "strix-halo",
   "gpu": "AMD Strix Halo iGPU",
   "vram_gb": null,
   "model_alias": "gemma4:26b",
diff --git a/scripts/gpu-bakeoff/runs/matt-strix/gemma4-26b/short.json b/scripts/gpu-bakeoff/runs/strix-halo/gemma4-26b/short.json
similarity index 98%
rename from scripts/gpu-bakeoff/runs/matt-strix/gemma4-26b/short.json
rename to scripts/gpu-bakeoff/runs/strix-halo/gemma4-26b/short.json
index f49e2dc..48e547f 100644
--- a/scripts/gpu-bakeoff/runs/matt-strix/gemma4-26b/short.json
+++ b/scripts/gpu-bakeoff/runs/strix-halo/gemma4-26b/short.json
@@ -1,5 +1,5 @@
 {
-  "host": "matt-strix",
+  "host": "strix-halo",
   "gpu": "AMD Strix Halo iGPU",
   "vram_gb": null,
   "model_alias": "gemma4:26b",
diff --git a/scripts/gpu-bakeoff/runs/matt-strix/gemma4-31b/long.json b/scripts/gpu-bakeoff/runs/strix-halo/gemma4-31b/long.json
similarity index 98%
rename from scripts/gpu-bakeoff/runs/matt-strix/gemma4-31b/long.json
rename to scripts/gpu-bakeoff/runs/strix-halo/gemma4-31b/long.json
index f9e4faf..1f684b5 100644
--- a/scripts/gpu-bakeoff/runs/matt-strix/gemma4-31b/long.json
+++ b/scripts/gpu-bakeoff/runs/strix-halo/gemma4-31b/long.json
@@ -1,5 +1,5 @@
 {
-  "host": "matt-strix",
+  "host": "strix-halo",
   "gpu": "AMD Strix Halo iGPU",
   "vram_gb": null,
   "model_alias": "gemma4:31b",
diff --git a/scripts/gpu-bakeoff/runs/matt-strix/gemma4-31b/short.json b/scripts/gpu-bakeoff/runs/strix-halo/gemma4-31b/short.json
similarity index 98%
rename from scripts/gpu-bakeoff/runs/matt-strix/gemma4-31b/short.json
rename to scripts/gpu-bakeoff/runs/strix-halo/gemma4-31b/short.json
index 73af9f8..46c2b46 100644
--- a/scripts/gpu-bakeoff/runs/matt-strix/gemma4-31b/short.json
+++ b/scripts/gpu-bakeoff/runs/strix-halo/gemma4-31b/short.json
@@ -1,5 +1,5 @@
 {
-  "host": "matt-strix",
+  "host": "strix-halo",
   "gpu": "AMD Strix Halo iGPU",
   "vram_gb": null,
   "model_alias": "gemma4:31b",