claude (blind_chess) 04494fcdee docs: handoff for blind Casual check-resolution fix

Captures session state: root cause, fix, verification numbers (blind 100%
-> 17% resignation, avg ply 26 -> 90), preserved view-filter invariant,
deferred Phase 2 work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-29 06:05:21 -04:00

13 KiB


Handoff: Blind Casual check-resolution fix shipped

Session Metadata

Created: 2026-04-29 06:01:21 -04:00 (10:01:21 UTC)
Project: /home/claude/bin/blind_chess
Branch: main (commits dc7f8ad, f00164e pushed)
Session duration: ~1 hour
Live URL: https://chess.sethpc.xyz (deployed and verified)

Recent Commits (for context)

f00164e chore: gitignore tmp/ for self-play transcripts
dc7f8ad fix(bot): blind Casual no longer resigns prematurely under check
1213ec8 docs: handoff reflects final merged state
1674695 docs: AI Phase 1 shipped — context, decisions, handoff
7c18725 feat(bot): vanilla CasualBrain delegates to js-chess-engine

Handoff Chain

Continues from: 2026-04-28-191500-ai-phase-1-shipped.md — Phase 1 (Casual bot) deployed; the prior handoff predicted this exact bug as a deferred risk: "the heuristic exhausts its retry cap (5) when the bot picks a move that can't legally proceed in blind mode... Consider raising retry cap or improving heuristic if blind Casual feels broken in real play."
Supersedes: None.

Current State Summary

User reported: "casual bot is resigning prematurely." Investigation confirmed the prior handoff's prediction. Vanilla mode is rock-solid (0 resigns across 80 stress games); blind mode resigned in 100% of self-play games at an average of 26 plies. Root cause: CasualBrain.heuristicPick ignored the <own_color>_in_check moderator announcement and scored moves on capture/advance signals that are uncorrelated with check resolution. While the bot was in check, chess.js rejected every attempt that did not resolve the check, the retry budget (BotDriver.RETRY_CAP=5) ran out, and the bot resigned. The work shipped in two commits (dc7f8ad for the fix, f00164e for .gitignore), was deployed to CT 690, and was smoke-tested. Blind self-play (100 games): resigns 100% → 17%, avg ply 26 → 90. The vanilla regression check confirmed unchanged strength.

Architecture Overview

The fix preserves the spec's view-filter invariant — the brain still sees only its own pieces + announcements, no oracle access added. The data needed to detect check was already being delivered to the brain in newAnnouncements; the heuristic just wasn't reading it. This is a recurring shape worth recognizing: a bug that looks like "the AI is broken" often turns out to be "the AI ignored a signal the protocol already sends."
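A minimal sketch of how check detection from announcements might look. Only the name detectOwnCheck and the <own_color>_in_check announcement come from this handoff; the announcement type (plain strings), the Color type, and the "black_moved" sample value are assumptions made for illustration.

```typescript
// Hypothetical types: the real announcement shape is not shown in this handoff.
type Color = "white" | "black";
type ModeratorAnnouncement = string; // assumed to be plain strings such as "white_in_check"

// Returns true when the moderator has announced that `own` is in check.
// Only public announcements are consulted, so no opponent positions leak in.
function detectOwnCheck(
  own: Color,
  newAnnouncements: readonly ModeratorAnnouncement[],
): boolean {
  const marker = `${own}_in_check`;
  return newAnnouncements.some((a) => a === marker);
}

// "black_moved" is a made-up announcement used only to pad the example.
console.log(detectOwnCheck("white", ["black_moved", "white_in_check"])); // true
console.log(detectOwnCheck("white", ["black_in_check"])); // false
```

The point of the sketch is that the input is the same list a human player would see, so the view-filter invariant is untouched.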

The retry-cap raise (5 → 25) is essentially free for vanilla because chess.js verbose moves are guaranteed legal — vanilla never exercises retries. Blind needs the larger budget because pseudo-legal candidates from geometricMoves are filtered by chess.js at commit time and many fail (pinned pieces, unresolved check).
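The effect of the retry budget can be illustrated with a small sketch. RETRY_CAP and its values (5 before, 25 after) come from this handoff; playTurn, the pick and tryCommit callbacks, and the idea of passing rejected moves back to the brain are hypothetical stand-ins, not the project's real BotDriver interface.

```typescript
// Value from this handoff; it was 5 before the fix.
const RETRY_CAP = 25;

type Move = { from: string; to: string };
type Outcome =
  | { kind: "moved"; move: Move; attempts: number }
  | { kind: "resign"; attempts: number };

// `pick` stands in for the brain and `tryCommit` for commit-time legality
// validation. Both are hypothetical callbacks.
function playTurn(
  pick: (rejected: readonly Move[]) => Move | undefined,
  tryCommit: (move: Move) => boolean,
  retryCap: number = RETRY_CAP,
): Outcome {
  const rejected: Move[] = [];
  for (let attempt = 1; attempt <= retryCap; attempt++) {
    const move = pick(rejected);
    if (move === undefined) return { kind: "resign", attempts: attempt - 1 };
    if (tryCommit(move)) return { kind: "moved", move, attempts: attempt };
    rejected.push(move); // remember the failure so the brain can avoid it
  }
  return { kind: "resign", attempts: retryCap };
}

// Seven pseudo-legal candidates; only the last one (a king move) resolves check.
const candidates: Move[] = ["a2a3", "b2b3", "c2c3", "d2d3", "e2e3", "f2f3", "e1f1"].map(
  (s) => ({ from: s.slice(0, 2), to: s.slice(2) }),
);
const pickNext = (rejected: readonly Move[]): Move | undefined =>
  candidates.find((c) => !rejected.some((r) => r.from === c.from && r.to === c.to));
const resolvesCheck = (move: Move): boolean => move.from === "e1";

console.log(playTurn(pickNext, resolvesCheck, 5).kind); // resign
console.log(playTurn(pickNext, resolvesCheck).kind); // moved
```

With a cap of 5 the only legal move is never reached; with 25 it is found on the seventh attempt. In vanilla mode the first candidate is always legal, so the loop exits on attempt 1 regardless of the cap.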

The new [bot resign] log line in BotDriver.botResign() decouples observability from the fix. Phase 1 had silent resignations — operators couldn't grep journald for them, which is why the bug surfaced as a user report rather than an alert. Future regressions are now greppable: journalctl -u blind-chess | grep "bot resign".
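A sketch of what a greppable resign log line might look like. The [bot resign] prefix, the BotResignReason type name, and the errString helper name come from this handoff; the union members, the game=/reason=/detail= layout, and formatResignLog are assumptions for illustration.

```typescript
// The member names are illustrative; the handoff names the type but not its members.
type BotResignReason = "retry_cap_exhausted" | "no_candidates" | "brain_error";

// Normalizes anything thrown or passed as detail into a printable string.
function errString(e: unknown): string {
  return e instanceof Error ? e.message : String(e);
}

// Builds one line per resignation so that `grep "bot resign"` finds every case.
function formatResignLog(
  gameId: string,
  reason: BotResignReason,
  detail?: unknown,
): string {
  const suffix = detail === undefined ? "" : ` detail=${errString(detail)}`;
  return `[bot resign] game=${gameId} reason=${reason}${suffix}`;
}

// stderr ends up in journald when the process runs under systemd.
console.error(formatResignLog("g-123", "retry_cap_exhausted", "25 attempts rejected"));
```

Keeping the reason as a closed union means a new resignation path cannot be added without also deciding how it is logged.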

Critical Files

| File | Purpose | Relevance |
| --- | --- | --- |
| `packages/server/src/bot/casual-brain.ts` | Decision logic; vanilla delegates to js-chess-engine, blind uses heuristic | New `detectOwnCheck()` and `findOwnKing()` methods; `heuristicPick` takes `inCheck` parameter and applies +5000 boost to king moves |
| `packages/server/src/bot/driver.ts` | Per-game orchestrator; mutex, retry, dispatch, dispose | `RETRY_CAP` 5 → 25; `botResign()` now takes a `BotResignReason` and logs `[bot resign]` with structured detail |
| `packages/server/test/unit/bot/casual-brain.test.ts` | Unit tests | +2 tests: check-aware king bias (20-seed determinism check), and fall-through to non-king when all king moves are rejected |
| `packages/server/test/unit/bot/driver.test.ts` | Unit tests | Retry-cap test updated for new RETRY_CAP=25 |
| `scripts/selfplay.ts` | Operator CLI for evaluation | Used heavily this session: `pnpm selfplay --white casual --black casual --games 100 --mode blind --seed 100` |

Verification Results

| Check | Result |
| --- | --- |
| Blind 100-game self-play (Casual vs Casual, seed=100) | resigns 100% → 17%, avgPly 26 → 90; 42 checkmates, 41 threefolds |
| Blind 20-game self-play (seed=42, same as pre-fix benchmark) | resigns 100% → 35%, avgPly 26 → 82 |
| Vanilla 30-game self-play (Casual vs Casual, seed=42) | 0 resigns; 27 checkmates, 2 threefolds, 1 fifty-move |
| Vanilla 50-game self-play (Casual W vs Random B, seed=7) | 0 resigns; Casual wins 49/50 |
| Vanilla 50-game self-play (Random W vs Casual B, seed=7) | 0 resigns; Casual wins 49/50 |
| Test suite | 78 passing (was 75; +2 new check tests, +1 driver retry-cap test updated) |
| Live `/api/health` | `{"ok":true,"activeGames":0,"uptime":4}` |
| Live POST `/api/games` with `vsAi.brain=casual` blind mode | 200 + `joinUrl:null` |
| Live POST `/api/games` with `vsAi.brain=recon` | 503 + `ai_offline` (Phase 2 unimplemented, expected) |
| journald post-deploy | No errors/warnings |

Decisions Made

| Decision | Options Considered | Rationale |
| --- | --- | --- |
| Boost king moves in heuristic vs filter candidates by chess.js legality | (a) heuristic boost: preserves view-filter invariant; (b) chess.js pre-filter: would leak attacker info | Chose (a). Preserves "bots play through the same view filter as humans" principle from the AI spec; same information ration as a human player |
| `RETRY_CAP` 5 → 25 (single global cap) vs per-mode caps | Per-mode (5 vanilla, 25 blind) vs global 25 | Chose global. Vanilla never hits the cap, so single cap simplifies code with no regression |
| King-move boost magnitude +5000 | Smaller (e.g., +200) vs larger | +5000 is large enough to deterministically dominate all other heuristic factors plus the 0.01 random tiebreak; unit test asserts 20/20 seeds pick king moves under check |
| Add resign logging now vs defer | (a) bundled with fix; (b) separate later commit | Bundled. The handoff explicitly noted the silent-resign observability gap; fixing that gap was load-bearing for any future regression detection |
| Two commits (fix + .gitignore) vs one | One bundled commit vs split | Split. Per global homelab convention: "no batching unrelated changes"; .gitignore drift was pre-existing and orthogonal |
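The king-move boost decision above can be sketched as follows. The +5000 magnitude, the 0.01 random tiebreak, the inCheck parameter, and the 20-seed check come from this handoff; the Candidate shape, the capture bonus of 10, the explicit kingSquare argument, and the mulberry32 seeded generator are assumptions chosen to keep the example self-contained.

```typescript
type Candidate = { from: string; to: string; isCapture: boolean };

const KING_CHECK_BOOST = 5000; // magnitude from this handoff
const TIEBREAK_SCALE = 0.01; // random tiebreak range from this handoff
const CAPTURE_BONUS = 10; // hypothetical stand-in for the other heuristic factors

// Small seeded PRNG (mulberry32) so picks are reproducible per seed.
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Scores every candidate and returns the highest; king moves dominate under check.
function heuristicPick(
  candidates: readonly Candidate[],
  kingSquare: string,
  inCheck: boolean,
  rand: () => number,
): Candidate | undefined {
  let best: Candidate | undefined;
  let bestScore = -Infinity;
  for (const c of candidates) {
    let score = c.isCapture ? CAPTURE_BONUS : 0;
    if (inCheck && c.from === kingSquare) score += KING_CHECK_BOOST;
    score += rand() * TIEBREAK_SCALE;
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}

const moves: Candidate[] = [
  { from: "d4", to: "e5", isCapture: true },
  { from: "a2", to: "a3", isCapture: false },
  { from: "e1", to: "f1", isCapture: false },
  { from: "e1", to: "d1", isCapture: false },
];

let kingPicks = 0;
for (let seed = 0; seed < 20; seed++) {
  const pick = heuristicPick(moves, "e1", true, mulberry32(seed));
  if (pick?.from === "e1") kingPicks++;
}
console.log(`${kingPicks}/20 seeds picked a king move under check`); // 20/20
```

Because the boost exceeds every other factor plus the maximum tiebreak, the seed only decides which king move is chosen, never whether one is chosen.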

Immediate Next Steps

Soak the fix for a few days of real play before declaring "blind Casual is solid". Watch for:
- ssh root@192.168.0.245 'journalctl -u blind-chess | grep "bot resign"' — should be rare; legitimate forced positions only.
- User feedback on whether blind Casual still feels broken (lower bar but still possible).
- Mid-game stuck states (the retry budget is now 25, so a degenerate brain can cost up to 25 attempts per turn, 5× the old cap; this should still be sub-second).
When ready, write Phase 2 plan — docs/superpowers/plans/<DATE>-ai-player-phase-2-recon.md. Phase 2 reuses the Brain/BotDriver infrastructure unchanged; new pieces are OllamaClient, ollama-endpoints (preflight + failover), prompt, parse, ReconBrain, plus aiInfo protocol field, 'ai_unavailable' end reason, post-game reasoning reveal UI.
(Cleanup, low priority) git rm --cached packages/server/tsconfig.tsbuildinfo — file is tracked from before the *.tsbuildinfo rule was added to .gitignore. Persistent M noise in git status between any rebuilds. Not blocking.

Blockers / Open Questions

Blind Casual is now noticeably stronger but still loses to careful play. The 17% post-fix resign rate represents legitimately stuck positions (multi-piece checks with no king escape, etc.) more than blunders. A human in those positions would also struggle. If users still feel blind Casual is unbeatable-or-broken, the next lever is making the heuristic also prefer captures and adjacent-to-king moves under check (likely block targets).
Threefold draws spiked from 0% → 41% in blind self-play. Two Casual bots with the same seed/heuristic shuffle pieces and repeat positions. This is more a self-play artifact than a real-play concern; humans don't repeat. Worth watching but not actionable yet.

Deferred Items

All Phase 2 work, untouched:

ReconBrain (gemma4:26b chat agent on steel141 RTX 3090 Ti, pve197 V100 fallback)
Mid-game GPU failover, preflight, AI-unavailable end state
Persistent chat history per game; post-game reasoning reveal UI
aiInfo protocol field (model + GPU + host)
Acceptance bar: Recon wins ≥60% over 50 Recon-vs-Casual self-play games

Important Context

The view-filter invariant is preserved. No oracle access was added. The brain detects check via <own_color>_in_check in newAnnouncements, which is a public moderator announcement humans receive too. Phase 2 ReconBrain will read these same announcements — the pattern is now established.
BrainInput.fen is set ONLY in vanilla mode. Blind mode omits it so the engine path can't smuggle opponent positions past the view filter. The fix did not touch this; the security boundary holds.
Advancing the watermark only on successful dispatch is load-bearing for the fix. On retry, the brain still sees the original <own_color>_in_check announcement from the opponent's move (because lastSeenAnnouncementCount doesn't advance until success). This is what makes detectOwnCheck robust across retries.
The bot still uses the heuristic in vanilla as fallback if the engine returns a move not in the chess.js candidate list. Vanilla never exercised this path in our tests, but the new inCheck parameter is wired through it for safety.
scripts/selfplay.ts is the canonical evaluation tool. Phase 2 will extend it to support --white recon --black casual etc. The harness sets game.aiOpponent = undefined; game.status = 'active' after createGame returns — that's how it transitions out of "waiting" without a hello.
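The watermark behaviour described above can be sketched like this. Only lastSeenAnnouncementCount is a name from this handoff; the AnnouncementWatermark class, its pending and advance methods, and the string announcements are hypothetical.

```typescript
// Tracks how many announcements the brain has already acted on.
class AnnouncementWatermark {
  private lastSeenAnnouncementCount = 0;

  // Announcements not yet covered by a successful move.
  pending(all: readonly string[]): string[] {
    return all.slice(this.lastSeenAnnouncementCount);
  }

  // Call only after a move was dispatched successfully.
  advance(all: readonly string[]): void {
    this.lastSeenAnnouncementCount = all.length;
  }
}

const announcements: string[] = ["white_in_check"];
const mark = new AnnouncementWatermark();

// First attempt is rejected, so advance() is not called before the retry.
const firstTry = mark.pending(announcements);
const retry = mark.pending(announcements);
console.log(firstTry.includes("white_in_check"), retry.includes("white_in_check")); // true true

// Only a successful dispatch moves the watermark forward.
mark.advance(announcements);
console.log(mark.pending(announcements).length); // 0
```

If the watermark advanced on every attempt instead, the retry would see an empty list and the check signal would be lost after the first rejected move.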

Assumptions Made

The user was playing in blind mode when they reported premature resignation. I didn't ask, but vanilla self-play showed 0 resigns across 80 games while blind showed 100%, so blind was overwhelmingly the more likely mode. If they were actually playing vanilla, that's a different bug — though I have no evidence of one.
The +5000 king-move boost is "large enough." Verified by 20-seed determinism test; if the heuristic ever gains another factor scoring above ~5000, this assumption breaks and the test will catch it.
RETRY_CAP=25 is sufficient. 100-game blind self-play showed 17% still hit the cap — those are legitimate stuck positions, not under-budgeted retry. If real-play feedback says otherwise, raise further (each retry is microseconds for the heuristic; the cap could go to 50+ without performance concern).

Potential Gotchas

packages/server/tsconfig.tsbuildinfo shows persistent M in git status — it was tracked before *.tsbuildinfo was gitignored. Don't be alarmed; it's preexisting drift, not your work.
The pre-commit hook is detect-secrets-hook --baseline .secrets.baseline at ~/.config/git/hooks/pre-commit. If you add a new dep and pnpm-lock hashes get flagged, run detect-secrets scan > .secrets.baseline to refresh.
Server restart drops in-memory games. Acceptable for MVP per prior decisions, but be aware: any active player-vs-Casual game in flight at deploy time will lose state.
js-chess-engine declares engines: { node: '>=24' } but works on Node 22.22.2. The engines field is advisory by default. If a future Node update breaks it, pin to v1.x of the package.

Files Modified This Session

| File | Change |
| --- | --- |
| `packages/server/src/bot/casual-brain.ts` | +35 LoC: new `detectOwnCheck`, `findOwnKing`; `heuristicPick` takes `inCheck`, boosts king moves +5000 when set |
| `packages/server/src/bot/driver.ts` | `RETRY_CAP` 5 → 25; `botResign(reason, detail?)` with `console.error('[bot resign]', ...)`; `BotResignReason` union; `errString` helper |
| `packages/server/test/unit/bot/casual-brain.test.ts` | +2 tests (check-aware king preference; fall-through to non-king when king moves exhausted) |
| `packages/server/test/unit/bot/driver.test.ts` | Retry-cap test updated 5 → 25, expected calls updated |
| `.gitignore` | +`tmp/` (separate commit `f00164e`) |

Environment State

CT 690 / blind-chess.service: running. Restarted 09:54 UTC after deploy. systemctl is-active returns active.
Active processes: none session-relevant. Deploy was a normal restart of the systemd unit.
Environment variables: none added/changed.
Backups:
- Local: packages/server/src/bot/.backup/{casual-brain,driver}.ts.1777455623
- CT 690: /opt/blind-chess/.backup/server-1777456437.tar.gz
Secrets: none added; pre-commit detect-secrets hook passed both commits clean.

Live URL: https://chess.sethpc.xyz
Repo: https://git.sethpc.xyz/Seth/blind_chess (main at f00164e)
AI Phase 1 spec: docs/superpowers/specs/2026-04-28-ai-player-design.md
Phase 1 plan: docs/superpowers/plans/2026-04-28-ai-player-phase-1-casual.md
DECISIONS.md "AI / computer player" section
Project identity: CLAUDE.md
Prior handoffs: 2026-04-28-191500-ai-phase-1-shipped.md, 2026-04-28-170713-ai-player-spec.md, 2026-04-28-152000-mvp-deployed.md, 2026-04-28-104344-spec-approved-ready-for-plan.md, 2026-04-28-kickoff.md

Security Reminder: This handoff describes a behavior fix; no credentials, secrets, or sensitive endpoints are exposed in the handoff or the deployed code.
