data: first operator run — 4 issues found, 3 notes, 2 escalations
Noob playtest: 4/7 passed, 3 blocked (perms). Found: context bleed via Mind's Eye, opus slot misconfigured, player_message tag leaking, test players need perms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,9 @@
{
  "title": "Cross-session context bleed via Mind's Eye",
  "severity": "medium",
  "description": "Ask mode queried slingshooter08 instead of TestNoob after pray mode discovered that player was online. Session IDs are separate but Mind's Eye world context injection includes online player names, which the model targets instead of the requesting player.",
  "evidence": "Noob playtest test 6: ask 'how do I find diamonds'; response referenced slingshooter08's position, not TestNoob's",
  "suggested_fix": "Mind's Eye context injection should prioritize the requesting player, or clearly label other players as 'other online players' so the model doesn't confuse them with the requester.",
  "timestamp": "2026-03-28T19:30:00",
  "status": "open"
}
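The suggested fix above could look roughly like the following. This is a minimal sketch, not the actual Mind's Eye code: `WorldSnapshot`, `build_world_context`, and the position format are all hypothetical names introduced for illustration. The idea is to put the requesting player first and list everyone else under an explicit "other online players" label.

```python
from dataclasses import dataclass, field


@dataclass
class WorldSnapshot:
    """Hypothetical stand-in for whatever Mind's Eye collects about the world."""
    positions: dict[str, tuple[int, int, int]] = field(default_factory=dict)


def build_world_context(requester: str, snapshot: WorldSnapshot) -> str:
    """Render world context with the requester first and other players labeled."""
    lines = []
    if requester in snapshot.positions:
        x, y, z = snapshot.positions[requester]
        lines.append(f"Requesting player: {requester} at ({x}, {y}, {z})")
    else:
        lines.append(f"Requesting player: {requester} (position unknown)")

    # Everyone else goes under an explicit label so the model cannot
    # mistake them for the player who sent the command.
    others = sorted(name for name in snapshot.positions if name != requester)
    if others:
        lines.append("Other online players (not the requester):")
        for name in others:
            x, y, z = snapshot.positions[name]
            lines.append(f"- {name} at ({x}, {y}, {z})")
    return "\n".join(lines)
```

With TestNoob as the requester and slingshooter08 also online, the first line of the context names TestNoob, and slingshooter08 appears only under the "other online players" heading.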
@@ -0,0 +1,9 @@
{
  "title": "Opus brain slot shows ollama provider with Claude model",
  "severity": "medium",
  "description": "Gateway status shows opus slot configured as provider=ollama, model=claude-opus-4-20250514. Ollama cannot run Claude models. This may be a stale override or config error.",
  "evidence": "curl http://localhost:8500/v2/status (model_slots.opus shows provider=ollama)",
  "suggested_fix": "Check agents.yaml opus section and any in-memory overrides. Should be provider=anthropic or provider=codex.",
  "timestamp": "2026-03-28T19:30:00",
  "status": "open"
}
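A mismatch like this could be caught automatically by checking each slot's provider against its model name. The sketch below assumes the status payload exposes `model_slots` as a mapping of slot name to `provider` and `model` fields, as the evidence line suggests. The `ALLOWED_PROVIDERS` table and both function names are hypothetical; the allowed providers for Claude models are taken from the suggested fix.

```python
# Assumed rule table: model-name prefix -> providers able to serve it.
# The Claude entry mirrors the suggested fix (anthropic or codex).
ALLOWED_PROVIDERS = {
    "claude-": {"anthropic", "codex"},
}


def check_slot(slot_name: str, provider: str, model: str) -> list[str]:
    """Return a list of problems for one slot (empty if it looks valid)."""
    problems = []
    for prefix, providers in ALLOWED_PROVIDERS.items():
        if model.startswith(prefix) and provider not in providers:
            expected = ", ".join(sorted(providers))
            problems.append(
                f"slot '{slot_name}': provider={provider} cannot serve "
                f"model={model} (expected one of: {expected})"
            )
    return problems


def check_slots(model_slots: dict) -> list[str]:
    """Check every slot in a model_slots mapping and collect all problems."""
    problems = []
    for name, cfg in model_slots.items():
        problems.extend(check_slot(name, cfg.get("provider", ""), cfg.get("model", "")))
    return problems
```

Run against the slot reported here (`provider=ollama`, `model=claude-opus-4-20250514`), the check returns one problem for the `opus` slot; with `provider=anthropic` it returns none.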
@@ -0,0 +1,5 @@
# context-bleed

## 2026-03-28 19:30

Cross-session context bleed observed: ask mode queried slingshooter08 (another online player) instead of TestNoob after pray mode discovered that slingshooter08 was online. Session IDs are separate (TestNoob:ask vs TestNoob:pray), but Mind's Eye world context injection includes online player names, which the model then targets instead of the requesting player. In production this could produce answers about the wrong player.
@@ -0,0 +1,5 @@
# playtest-permissions

## 2026-03-28 19:30

Bot test players (TestNoob, TestBuilder, etc.) need permissions granted for all modes before playtesting. Use /raw to grant them: `perms_manage grant TestNoob sudo,pray,ask,raw`. Without this, 3/7 noob commands are blocked by the permission check.
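Since several test players need the same grant, the command lines can be generated rather than typed by hand. This is only a sketch: it reuses the `perms_manage grant` syntax quoted in the note, and `grant_commands` is a hypothetical helper that returns strings to paste into /raw rather than sending anything itself.

```python
# Modes listed in the note above.
MODES = ("sudo", "pray", "ask", "raw")


def grant_commands(players, modes=MODES):
    """Build one 'perms_manage grant' line per test player."""
    mode_list = ",".join(modes)
    return [f"perms_manage grant {player} {mode_list}" for player in players]


if __name__ == "__main__":
    for line in grant_commands(["TestNoob", "TestBuilder"]):
        print(line)
```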
@@ -0,0 +1,5 @@
# tag-leaking

## 2026-03-28 19:30

`<player_message>` XML tags sometimes leak into response_text for /ask mode responses. The gateway's tag stripping (_parse_player_message) should catch these but does not in all cases. Observed on the ask mode test "what does redstone do".
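One way to guard against the leak is a final pass over response_text that removes any remaining tag markers. The sketch below is not the gateway's `_parse_player_message`, whose implementation is not shown in this commit; `strip_player_message_tags` is a hypothetical post-filter, and the tolerance for whitespace, attributes, and case is an assumption about how the leaked tags might vary.

```python
import re

# Matches opening and closing player_message tags, tolerating whitespace,
# attributes, and mixed case. The inner text is left untouched.
_TAG_RE = re.compile(r"</?\s*player_message\b[^>]*>", re.IGNORECASE)


def strip_player_message_tags(text: str) -> str:
    """Remove player_message tag markers but keep the message between them."""
    return _TAG_RE.sub("", text).strip()
```

A regression test for this issue could feed the "what does redstone do" response through the filter and assert that no `<player_message` substring remains.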
@@ -0,0 +1,14 @@
## 2026-03-28 19:30 - First Run

**Gateway**: UP, v1.0.0-alpha, 10 sessions, TPS 20.0, 1 player online
**Playtest**: noob profile, 4/7 passed, 3 blocked (TestNoob lacks /sudo perms)

**Issues found**:
1. [MEDIUM] Cross-session context bleed: ask mode queried slingshooter08 instead of TestNoob after pray mode discovered that player
2. [MEDIUM] Opus brain slot shows provider=ollama with model=claude-opus-4-20250514, an invalid combination
3. [LOW] `<player_message>` tags leaking into response_text on ask mode
4. [INFO] TestNoob needs /sudo perms for full playtest coverage

**Actions**: None (Layer 1: escalated all issues)

---