Switch all Ollama models to gemma3n:e4b on node-197 GPU

Bake-off results: gemma3n:e4b (80.6% cmd match, 100% safety, 5.9s) outperforms qwen3-coder:30b on all metrics. Updated paper, shrink, and langgraph gateway configs. Frees steel141 for LoRA training.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 10:29:54 -04:00
parent 30aa8388e3
commit ba4a2f4262
4 changed files with 24 additions and 10 deletions
@@ -113,3 +113,13 @@ For shrink-world use port `25576` and password `REDACTED_RCON`.
 - External access requires port forwarding on router: `25565` and `25566` → `192.168.0.244`
 - Web panel accessible via Caddy at `mc.sethpc.xyz`
 - DNS: Pi-hole at `192.168.0.153`
+
+---
+
+## AI / Ollama
+
+- **Ollama instance:** `192.168.0.179:11434` (CT 105, node-197, Quadro RTX 4000 8GB)
+- **Model (message + command + tool):** `gemma3n:e4b` (6.9B, Q4_K_M, GPU-accelerated)
+- **LangGraph gateway model:** `gemma3n:e4b` (was `qwen2.5:1.5b` for tools)
+- **Previous:** `192.168.0.141:11434` (steel141), `gemma3:12b` + `qwen3-coder:30b`
+- **Changed:** 2026-03-18 after bake-off showed gemma3n:e4b outperforms all tested models
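
A quick way to confirm the switch took effect is to ask the new Ollama instance which models it has pulled. The sketch below is not part of this commit. It assumes the endpoint and model tag recorded in the notes above, uses Ollama's `GET /api/tags` listing endpoint, and the helper names `fetch_tags` and `model_is_listed` are hypothetical.

```python
import json
import urllib.request

# Values taken from the notes above; adjust if the instance moves again.
OLLAMA_BASE_URL = "http://192.168.0.179:11434"
MODEL = "gemma3n:e4b"


def model_is_listed(tags_response: dict, model: str = MODEL) -> bool:
    """Return True if `model` appears in a parsed /api/tags response body."""
    return any(m.get("name") == model for m in tags_response.get("models", []))


def fetch_tags(base_url: str = OLLAMA_BASE_URL, timeout: float = 5.0) -> dict:
    """Fetch the list of locally available models from the Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


# Usage (requires network access to the Ollama host):
#   print(model_is_listed(fetch_tags()))
```

Keeping the check (`model_is_listed`) separate from the network call (`fetch_tags`) means the matching logic can be exercised against a canned response without reaching the GPU node.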