Three-tier constraint model, mode-aware eval, boundary examples, playtest tooling

Eval harness:
- Mode-aware scoring: sudo=strict (exact match), pray/god=soft (category match, in-character, appropriate intensity)
- New metrics: cmd_category_match, appropriate_intensity, scoring_mode breakdown
- Eval defaults to steel141 (192.168.0.141); prod GPU reserved for serving

Dataset (213 examples):
- Added 31 boundary/adversarial examples (safety edges, abstention, near-boundary)
- Updated pray example reasoning: character-driven logic, not prescriptive outputs
- Tagged pray examples with scoring_mode=soft

Playtest tooling:
- whitelist.sh: add/remove/list across all 3 servers
- FRIENDS_INVITE.md + Discord version: playtester recruitment docs
- Server addresses and implementation details for both training servers

PLAN.md:
- Three-tier constraint model documented (sudo/pray/god_system)
- Success criteria split by scoring mode
- All session decisions logged

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 15:57:01 -04:00
parent 38b9a02e45
commit 9d789d2524
8 changed files with 516 additions and 82 deletions
@@ -359,6 +359,10 @@ These are ideas to explore after the core system is working. Prioritize based on
 | 2026-03-18 | Dual tool-set architecture: RCON tools + Mineflayer tools | RCON for admin ops (server-side), Mineflayer for in-game presence (client-side). Same model, different tool sets per deployment |
 | 2026-03-18 | Offline dev Paper server for training bots | Dedicated offline-mode Paper 1.21.11 on port 25568. Allows unlimited Mineflayer bots without auth, world resets, destructive testing |
 | 2026-03-18 | Extract training data from existing repair code | Every hardcoded syntax fixer in mc_aigod_paper.py encodes a wrong->correct pair. 31 seed examples extracted from 10 repair functions, prayer logs, and session history |
+| 2026-03-18 | Three-tier constraint model: sudo (loose) / pray (character-driven) / god_system (probabilistic benevolent) | Sudo: only refuse server-killing commands, do what the admin asked. Pray: God decides based on worthiness/character/mood, not a safety filter. God_system: mostly benevolent, occasionally mischievous, rarely wrathful. Constraints are a spectrum, not a binary. |
+| 2026-03-18 | Mode-aware eval scoring | Sudo scored strict (exact command match). Pray/god scored soft (command category match, in-character message, appropriate intensity). Exact match meaningless for pray — God's creative interpretation is a feature. |
+| 2026-03-18 | Validator improvements: 5 new syntax repair functions | @s→player, NBT→component enchants, strip invalid components, hallucinated effect/command repair. Deployed to paper-ai. Every repair is a negative→positive training pair. |
+| 2026-03-18 | Eval/testing on steel141 (RTX 3090 Ti), not prod RTX 4000 | All eval scripts default to 192.168.0.141:11434. Prod GPU reserved for live serving only. |

 ---

@@ -411,14 +415,19 @@ node spawn_bots.js 10           # Spawn 10 bots

 | Metric | Actual Baseline (gemma3n) | Actual Baseline (qwen3:8b) | Fine-Tuned Target |
 |--------|:-------------------------:|:--------------------------:|:-----------------:|
+| **Sudo (strict scoring)** | | | |
 | Command match (loose) | 59.2% | 73.7% | 85%+ |
 | Exact match (strict) | 10.5% | 18.4% | 40%+ |
-| Syntax correctness | 82.9% | 82.9% | 95%+ |
+| RCON success (live) | 33.1% | 34.6% | 70%+ |
 | Safety compliance | 93.4% | 92.1% | 99%+ |
+| **Pray (soft scoring)** | | | |
+| Command category match | — | — | 80%+ |
+| Has in-character message | — | — | 95%+ |
+| Appropriate intensity | — | — | 90%+ |
+| **All modes** | | | |
+| Syntax correctness | 82.9% | 82.9% | 95%+ |
 | Hallucination rate | 0% | 0% | 0% |
 | Empty response rate | 9.2% | 14.5% | <3% |
-| Troubleshoot category | 16.7% | 33.3% | 70%+ |
-| Info category | 0.0% | 66.7% | 80%+ |
 | Response latency (avg) | 6.4s | 13.5s | <5s |

 ---
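
The mode-aware scoring described in the commit message and decision log can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: the `Example` and `Prediction` classes, the `command_category` heuristic (first token of the command), and the string-valued intensity labels are all assumptions made for the example. Only the metric names `cmd_category_match`, `appropriate_intensity`, and `scoring_mode` come from the commit.

```python
from dataclasses import dataclass

# Assumption: only sudo is scored strictly; every other mode is scored softly.
STRICT_MODES = {"sudo"}


@dataclass
class Example:
    """Hypothetical eval example; field names are illustrative."""
    mode: str                # "sudo", "pray", or "god_system"
    expected_command: str    # reference command for this example
    expected_intensity: str  # assumed label, e.g. "low" / "medium" / "high"


@dataclass
class Prediction:
    """Hypothetical model output; field names are illustrative."""
    command: str
    message: str
    intensity: str


def command_category(command: str) -> str:
    """Assumed heuristic: the category is the first token, minus any leading slash."""
    tokens = command.strip().lstrip("/").split()
    return tokens[0].lower() if tokens else ""


def score(example: Example, prediction: Prediction) -> dict:
    """Score strictly for sudo, softly for pray/god_system."""
    if example.mode in STRICT_MODES:
        # Strict: the whole command must match after trimming whitespace.
        return {
            "scoring_mode": "strict",
            "exact_match": prediction.command.strip() == example.expected_command.strip(),
        }
    # Soft: same command family, a non-empty message, and a matching intensity label.
    return {
        "scoring_mode": "soft",
        "cmd_category_match": command_category(prediction.command)
        == command_category(example.expected_command),
        "has_message": bool(prediction.message.strip()),
        "appropriate_intensity": prediction.intensity == example.expected_intensity,
    }
```

Under these assumptions, a pray response that grants a different item than the reference still earns `cmd_category_match` as long as it uses the same command family, which matches the stated goal of not penalizing God's creative interpretation.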