Updates CLAUDE.md "Current State" + "Key files" to point at the new spec. Adds DECISIONS.md "AI / computer player" section (11 settled decisions). Strikes through the prior "Client-side AI / hint generation — out of scope" row with a "partially superseded" note: the reversal applies only to the human-vs-AI path. Adds 7 new Deferred/Rejected rows for AI-feature scope. Handoff at .claude/handoffs/2026-04-28-170713-ai-player-spec.md captures session state for the next pickup (writing-plans → Phase 1 implementation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Handoff: AI/computer player spec written and approved
## Session Metadata
- Created: 2026-04-28 ~17:07 UTC
- Project: `/home/claude/bin/blind_chess`
- Branch: `main`
- Repo: `git.sethpc.xyz/Seth/blind_chess`
- Recent commits: `288693f` docs(spec): add AI/computer player design spec (this session), on top of `a878dee` fix(client): wrap connect/disconnect in untrack() to break effect loop (prior work).
- Live URL: https://chess.sethpc.xyz (MVP, unaffected by this session).
## Handoff Chain
- Continues from: `2026-04-28-152000-mvp-deployed.md` (MVP deployed and live).
- Supersedes: None.
## Current State Summary
Seth invoked the workflow `handoff -> spec ai/computer player`, then closed the session with `approved write spec -> update context -> create handoff -> git commit -> close session`. This session covered the following: ran the brainstorming skill end-to-end with him for the AI/computer player feature, presented six design sections one at a time with an approval gate after each, wrote the full spec to `docs/superpowers/specs/2026-04-28-ai-player-design.md`, self-reviewed it, committed and pushed it to Gitea, and updated CLAUDE.md and DECISIONS.md to reflect the new spec.
Implementation has not started. The next session can directly invoke `superpowers:writing-plans` against the new spec to produce a step-by-step implementation plan.
## Architecture Overview (the spec)
Two AI bots, phased delivery:
- Phase 1: Casual bot. Algorithmic, in-process, ~200 LoC of TypeScript. Plays legal moves using simple heuristics (capture bias, development bonus, anti-shuffling penalty). Always available, with no external dependencies. Plays badly but quickly. Single-week scope.
- Phase 2: Recon bot (gemma4). A multi-turn chat agent backed by `gemma4:26b` running on the homelab Ollama service (steel141 RTX 3090 Ti primary, pve197 V100 fallback). It maintains a private per-game chat history that persists across turns as the bot's belief memory. Its reasoning is hidden from the human during play and revealed in a collapsible post-game panel. Multi-week scope; depends on prompt-engineering iteration.
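The Phase 1 heuristics above could be expressed as a small scoring function over the legal candidates. The following is a minimal sketch, not the spec's `CasualBrain`. The `Candidate` shape (`from`, `to`, `piece`, `isCapture`), the helper names, and the weights are all hypothetical. The sketch also assumes that capture status can be derived from what the bot is allowed to see.

```typescript
// Hypothetical candidate shape for illustration; candidates.ts defines the real one.
type PieceType = "p" | "n" | "b" | "r" | "q" | "k";
type Side = "w" | "b";

interface Candidate {
  from: string; // e.g. "g1"
  to: string; // e.g. "f3"
  piece: PieceType;
  isCapture: boolean; // assumed derivable from the bot's filtered view
}

interface LastMove {
  from: string;
  to: string;
}

// Illustrative weights, not values from the spec.
const CAPTURE_BONUS = 3;
const DEVELOPMENT_BONUS = 1;
const SHUFFLE_PENALTY = 2;

function isHomeRank(square: string, side: Side): boolean {
  return square[1] === (side === "w" ? "1" : "8");
}

function scoreCandidate(c: Candidate, side: Side, lastMove: LastMove | null): number {
  let score = 0;
  // Capture bias.
  if (c.isCapture) score += CAPTURE_BONUS;
  // Development bonus: a knight or bishop leaving its home rank.
  if ((c.piece === "n" || c.piece === "b") && isHomeRank(c.from, side) && !isHomeRank(c.to, side)) {
    score += DEVELOPMENT_BONUS;
  }
  // Anti-shuffling penalty: undoing the bot's own previous move.
  if (lastMove !== null && c.from === lastMove.to && c.to === lastMove.from) {
    score -= SHUFFLE_PENALTY;
  }
  return score;
}

// Highest score wins; ties are broken with the injected RNG so games vary.
function pickCasualMove(
  candidates: Candidate[],
  side: Side,
  lastMove: LastMove | null,
  rng: () => number = Math.random,
): Candidate {
  if (candidates.length === 0) throw new Error("no legal candidates");
  let best: Candidate[] = [];
  let bestScore = -Infinity;
  for (const c of candidates) {
    const s = scoreCandidate(c, side, lastMove);
    if (s > bestScore) {
      bestScore = s;
      best = [c];
    } else if (s === bestScore) {
      best.push(c);
    }
  }
  return best[Math.floor(rng() * best.length)];
}
```

Injecting `rng` keeps the bot's choices reproducible in unit tests while still varying play in production.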
Both bots play through the same view filter and finite-state machine that humans use. The architectural invariant from CLAUDE.md ("the view filter is the only egress for board state") applies to bots too: a bot consumes only `buildView(game, botColor)` plus moderator announcements, with no oracle access. The Recon bot genuinely plays blind chess.
Modules to be added under `packages/server/src/bot/`:
- `brain.ts`: `Brain` interface, `BrainInput` and `BrainAction` types.
- `driver.ts`: `BotDriver` class (per-game orchestration, mutex, retry cap).
- `casual-brain.ts`: `CasualBrain` class.
- `recon-brain.ts`: `ReconBrain` class (Phase 2).
- `ollama-client.ts`: `OllamaClient` interface plus production HTTP implementation (Phase 2).
- `ollama-endpoints.ts`: endpoint priority list, preflight, mid-game failover (Phase 2).
- `prompt.ts`: system prompt template and per-turn user message builder (Phase 2).
- `parse.ts`: extracts JSON from Gemma's response (Phase 2).
- `candidates.ts`: legal candidate computation (vanilla vs blind paths).
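The following sketch shows one plausible shape for the `Brain` contract in `brain.ts`, together with the kind of `StubBrain` that `BotDriver` tests could use. The field names (`view`, `announcements`, `candidates`, `attempt`) and the `BrainAction` variants are assumptions, as is the UCI-string move encoding. The spec's `brain.ts` is authoritative. The sketch is meant to show the invariant above: a brain receives only the filtered view and the announcements, never the game itself.

```typescript
// Hypothetical types; the spec's brain.ts is the authority.
type Color = "white" | "black";

interface BrainInput {
  color: Color;
  view: unknown; // result of buildView(game, botColor), treated as opaque here
  announcements: string[]; // moderator announcements visible to this color
  candidates: string[]; // pre-computed legal candidates, assumed UCI like "e2e4"
  attempt: number; // 1-based; incremented when the server rejects a proposal
}

type BrainAction =
  | { kind: "move"; uci: string; reasoning?: string }
  | { kind: "resign"; reason: string };

interface Brain {
  readonly id: "casual" | "recon" | "stub";
  decide(input: BrainInput): Promise<BrainAction>;
}

// Deterministic test double: plays the first candidate, resigns if there are none.
class StubBrain implements Brain {
  readonly id = "stub" as const;

  async decide(input: BrainInput): Promise<BrainAction> {
    const first = input.candidates[0];
    return first === undefined
      ? { kind: "resign", reason: "no candidates" }
      : { kind: "move", uci: first };
  }
}
```

Making `decide` async for every brain lets `CasualBrain` and an Ollama-backed `ReconBrain` share one driver code path.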
Tests go under `packages/server/test/unit/bot/` and `packages/server/test/integration/`. The self-play harness lives at `scripts/selfplay.ts` (an operator tool, NOT run in CI).
Protocol additions:
- `CreateGameRequest.vsAi?: { brain: 'casual' | 'recon' }`
- `EndReason` adds `'ai_unavailable'`
- The `joined` and `update` server messages add an optional `aiInfo: { model, gpu, host }`
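The following sketch renders these additions as TypeScript. Only `vsAi`, `'ai_unavailable'`, and `aiInfo` come from the spec. The existing `EndReason` members, the message interfaces, and the `describeOpponent` helper are hypothetical placeholders, so the real `protocol.ts` will look different.

```typescript
// Only vsAi, 'ai_unavailable', and aiInfo come from the spec; the rest is placeholder.
type BrainKind = "casual" | "recon";

interface CreateGameRequest {
  // ...existing fields elided...
  vsAi?: { brain: BrainKind };
}

type ExistingEndReason = "checkmate" | "resignation"; // placeholder members
type EndReason = ExistingEndReason | "ai_unavailable";

interface AiInfo {
  model: string; // e.g. "gemma4:26b"
  gpu: string; // e.g. "RTX 3090 Ti"
  host: string; // e.g. "steel141"
}

interface JoinedMessage {
  type: "joined";
  aiInfo?: AiInfo; // absent in human-vs-human games
}

interface UpdateMessage {
  type: "update";
  aiInfo?: AiInfo; // assumed to carry the new host after a failover
}

// Hypothetical helper for the opponent-slot badge.
function describeOpponent(msg: JoinedMessage | UpdateMessage): string {
  return msg.aiInfo ? `AI (${msg.aiInfo.model} on ${msg.aiInfo.host})` : "Human";
}
```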
UI additions to the existing client: a two-section landing layout, an AI badge under the opponent slot, an "AI is thinking..." indicator (with a "starting up" variant for the first move), UI system messages in the moderator panel area for game-start GPU info and failover, and a collapsible post-game reasoning reveal.
Acceptance bars:
- Phase 1 done: 100 Casual self-play games complete; Casual beats random-mover ≥80%.
- Phase 2 done: Recon wins ≥60% over 50 Recon-vs-Casual games; ≤8s/move on 3090 Ti (≤10s on V100); manual inspection of 10 reasoning logs shows Gemma using announcements as evidence.
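The following sketch shows how `scripts/selfplay.ts` could check these bars, assuming it collects win counts and per-move timings. The function names are hypothetical. Using the median as the latency statistic is also an assumption, since the spec only says "≤8s/move". Integer arithmetic keeps the win-rate check exact at the threshold: 60% of 50 games means at least 30 wins.

```typescript
interface SelfPlayResult {
  games: number;
  wins: number;
}

// wins / games >= minPercent / 100, compared in integers to avoid float edge cases.
function meetsWinRate(r: SelfPlayResult, minPercent: number): boolean {
  if (r.games <= 0) return false;
  return r.wins * 100 >= minPercent * r.games;
}

function medianMs(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const s = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Latency bar with cold start excluded: drop each game's first move.
function meetsLatencyBar(perGameMoveMs: number[][], capMs: number): boolean {
  const warm = perGameMoveMs.flatMap((game) => game.slice(1));
  return warm.length > 0 && medianMs(warm) <= capMs;
}
```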
## Critical Files (added or to be added)

| File | Status | Purpose |
|---|---|---|
| `docs/superpowers/specs/2026-04-28-ai-player-design.md` | ✅ Written and committed | Full design spec; read this first when implementing. |
| `CLAUDE.md` | ✅ Updated | "Current State" notes the spec is approved; "Key files" links to the new spec. |
| `DECISIONS.md` | ✅ Updated | New "AI / computer player" section logs the design decisions; "Deferred / Rejected" amended so the prior "Client-side AI / hint generation" rejection is partially superseded. |
| `packages/server/src/bot/` | ⏳ Not yet created | Where the new modules will live. |
| `packages/shared/src/protocol.ts` | ⏳ Not yet modified | Will add `vsAi`, `aiInfo`, `'ai_unavailable'` per the spec. |
| `scripts/selfplay.ts` | ⏳ Not yet created | Operator tool for running AI-vs-AI evaluation games. |
## Tasks Finished
- Read the prior handoff (MVP deployed) and the original design spec.
- Read `~/bin/gemma4-research/README.md`, `SYNTHESIS.md`, and `CORPUS_ollama_variants.md` for Gemma 4 implementation guidance.
- Brainstormed the feature with Seth in 6 sections (architecture, components, data flow, error handling, testing, UX), each with an approval gate.
- Pivoted on Seth's input: the AI runs on the steel141 3090 Ti (not the pve197 V100), with the pve197 V100 as fallback; bot reasoning persists across turns (a multi-turn chat agent, not a stateless oracle); no mid-game flap-back, but one-way GPU failover is allowed.
- Wrote the full spec at `docs/superpowers/specs/2026-04-28-ai-player-design.md` (674 lines, 3 appendices).
- Self-reviewed the spec (no placeholders; retry, timeout, and acceptance numbers consistent; scope clear; fixed the `peer-status` ambiguity).
- Committed and pushed: `288693f docs(spec): add AI/computer player design spec`.
- Updated `CLAUDE.md` (Current State, Key files) and `DECISIONS.md` (new "AI / computer player" section plus amended Deferred/Rejected).
- Wrote this handoff.
## Files Modified / Added

| File | Changes |
|---|---|
| (new) `docs/superpowers/specs/2026-04-28-ai-player-design.md` | 674-line design spec |
| `CLAUDE.md` | "Current State" updated; "Key files" links the new spec; "Start Here" lists both specs |
| `DECISIONS.md` | New "AI / computer player (designed 2026-04-28, not yet implemented)" section with 11 entries; Deferred/Rejected amended to partially supersede the prior "Client-side AI / hint generation" rejection; 7 new deferred/rejected rows for AI-feature scope |
| (new) `.claude/handoffs/2026-04-28-170713-ai-player-spec.md` | This handoff |
| (new, ignored) `.backup/CLAUDE.md.<ts>`, `.backup/DECISIONS.md.<ts>` | Pre-edit backups per the global safety rule |
## Decisions Made
All are recorded in the DECISIONS.md "AI / computer player" section. Highlights:
- Two-phase delivery (Casual first, Recon second).
- In-process virtual players, not external WS clients. Bots use the same view filter as humans.
- Recon is a stateful chat agent with persistent per-game memory; its reasoning is hidden during play and revealed post-game.
- Endpoint priority steel141 → pve197; one-way mid-game failover; preflight blocks game creation if both are down.
- The GPU is surfaced to the user via a persistent badge plus a game-start UI message.
- `gemma4:26b` chosen (not 31B, which is 5× slower for marginal gain; not e4b, which is too small).
- Per-move latency caps: 30s normal, 90s for the first move (covers cold start).
- Recon "done" bar: ≥60% wins over 50 Recon-vs-Casual self-play games.
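The following is a minimal sketch of the endpoint decision (priority order, preflight, one-way failover) as `ollama-endpoints.ts` might implement it. It is not the spec's code. The host URLs and the model tag come from this handoff. `GET /api/tags` is Ollama's endpoint for listing local models. The `ListModels` injection point, the function names, and the 3-second probe timeout are assumptions, chosen so the logic can be tested without a network.

```typescript
interface Endpoint {
  name: string;
  url: string;
  gpu: string;
}

// Priority order: steel141 first, pve197 as fallback.
const ENDPOINTS: Endpoint[] = [
  { name: "steel141", url: "http://192.168.0.141:11434", gpu: "RTX 3090 Ti" },
  { name: "pve197", url: "http://192.168.0.179:11434", gpu: "V100" },
];

const MODEL = "gemma4:26b";

// Returns the model names an endpoint reports, or null if it is unreachable.
type ListModels = (url: string) => Promise<string[] | null>;

// Production probe via Ollama's GET /api/tags (3s timeout is an assumption).
const listModelsViaHttp: ListModels = async (url) => {
  try {
    const res = await fetch(`${url}/api/tags`, { signal: AbortSignal.timeout(3000) });
    if (!res.ok) return null;
    const body = (await res.json()) as { models?: { name: string }[] };
    return (body.models ?? []).map((m) => m.name);
  } catch {
    return null;
  }
};

// First endpoint, in priority order, that is up and has the model.
// null means "both down", which blocks creating a Recon game.
async function preflight(endpoints: Endpoint[], listModels: ListModels): Promise<Endpoint | null> {
  for (const ep of endpoints) {
    const models = await listModels(ep.url);
    if (models?.includes(MODEL)) return ep;
  }
  return null;
}

// One-way failover: only ever move down the list, never back (no flap-back).
function nextEndpoint(endpoints: Endpoint[], current: Endpoint): Endpoint | null {
  const i = endpoints.findIndex((e) => e.name === current.name);
  return i >= 0 && i + 1 < endpoints.length ? endpoints[i + 1] : null;
}
```

In production, `preflight(ENDPOINTS, listModelsViaHttp)` would run at game creation. A mid-game failure would call `nextEndpoint` and surface the new `gpu` in `aiInfo`.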
## Immediate Next Steps
- Run the writing-plans skill against the new spec. The brainstorming skill's terminal state is invoking writing-plans; we skipped that to close the session, so the next session should pick it up by invoking `superpowers:writing-plans` against `docs/superpowers/specs/2026-04-28-ai-player-design.md`. The plan should split clearly into Phase 1 (Casual) and Phase 2 (Recon) work streams; Phase 1 is single-week scope, Phase 2 multi-week.
- Implement Phase 1 per the plan, in the order from the spec's Appendix C: scaffold `packages/server/src/bot/`, write the `Brain` interface, implement `CasualBrain` + tests, implement `BotDriver` + tests with `StubBrain`, wire up `legalCandidates` computation, add protocol changes (`vsAi`, bot registry), wire `POST /api/games`, wire the `ws.ts` observer, build the client landing page's two-section layout + thinking indicator, write integration tests, and write the self-play harness for Casual-vs-Casual.
- Deploy Phase 1 to CT 690 and run the live smoke checklist for Casual.
- Implement Phase 2 per the plan, in this order: `OllamaClient` interface + HTTP impl, endpoint preflight + failover, prompt template, JSON parser, `ReconBrain` + tests (mocked Ollama), protocol additions for `aiInfo`, the `POST /api/games` Recon path with preflight + warmup, driver retry/fallback wiring, client GPU badge + system messages + post-game reasoning reveal, integration tests, the self-play harness for Recon-vs-Casual, and prompt iteration until the 60% bar is met.
- Deploy Phase 2 to CT 690 and run the live smoke checklist for Recon (warm, cold, failover, both-down).
## Blockers / Open Questions
- Recon's actual playing strength is the central research-y unknown. LLMs play vanilla chess poorly, but Gemma's task here is different: it reasons under uncertainty and picks from a pre-computed legal candidate list rather than computing tactical depth. The 60% Recon-vs-Casual bar is a guess; we'll learn the real number from `scripts/selfplay.ts`. The spec's "Decision triggers" section (under Acceptance criteria) describes how to react if the bar is missed.
- mort-3090-scheduler GPU contention. The scheduler is supposed to yield to other GPU users, but that behavior has not been measured under Recon load. Plan: monitor steel141 GPU utilization during early Recon games; if mort jobs interfere, add explicit coordination.
- Cold-start UX on the first Recon move. 30–60s is long. The "AI is starting up..." copy mitigates this but doesn't eliminate it. If users complain, the escalation path is in the spec's Risks #2.
- Chat history grows without bound. A 32K context covers roughly 128 turns; longer games would overflow it. If this shows up in practice, add per-turn compaction (summarize older turns into a running "what I've inferred" summary). Not MVP unless triggered.
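The following sketch shows the per-game history that serves as Recon's memory. It builds request bodies for Ollama's `POST /api/chat` and flags when the history nears the 32K context, which would be the trigger for the deferred compaction work. The `ReconHistory` class is hypothetical. The 4-characters-per-token estimate is a rough heuristic, not Gemma's tokenizer. The request deliberately leaves out `think` and `format`, following the gotchas recorded in this handoff.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

const CONTEXT_TOKENS = 32_768; // the "32K context" above
const APPROX_CHARS_PER_TOKEN = 4; // rough heuristic, not a real tokenizer

class ReconHistory {
  private readonly messages: ChatMessage[];

  constructor(systemPrompt: string) {
    this.messages = [{ role: "system", content: systemPrompt }];
  }

  get length(): number {
    return this.messages.length;
  }

  // Record a completed turn once the bot's reply has been accepted.
  addTurn(userMsg: string, assistantReply: string): void {
    this.messages.push({ role: "user", content: userMsg }, { role: "assistant", content: assistantReply });
  }

  approxTokens(): number {
    const chars = this.messages.reduce((n, m) => n + m.content.length, 0);
    return Math.ceil(chars / APPROX_CHARS_PER_TOKEN);
  }

  // True once estimated usage passes the given fraction of the context window.
  nearLimit(fraction = 0.9): boolean {
    return this.approxTokens() > CONTEXT_TOKENS * fraction;
  }

  // Body for Ollama's POST /api/chat with the new user message appended.
  // No `think` (think:false breaks gemma4:26b in multi-turn tool-calling loops)
  // and no `format` (format:"json" can loop on nested schemas).
  chatRequest(userMsg: string) {
    return {
      model: "gemma4:26b",
      messages: [...this.messages, { role: "user" as const, content: userMsg }],
      stream: false,
      options: { num_ctx: CONTEXT_TOKENS },
    };
  }
}
```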
## Deferred Items
See DECISIONS.md "Deferred / Rejected" — specifically the new AI-feature rows: difficulty slider, Stockfish for vanilla AI, live token streaming, GPU flap-back, public AI vs AI spectator games, context compaction, bot rating/personalities. None block Phase 1 or Phase 2.
## Important Context
- The spec assumes `gemma4:26b` is on both steel141 and pve197, as verified via the `~/bin/CLAUDE.md` Ollama inventory at the time of writing. If either host's model inventory drifts, preflight will fall through to the other host or fail.
- steel141 runs with `OLLAMA_KEEP_ALIVE=30m`, so the first call after more than 30 minutes idle pays a 30–60s model reload. The spec's 90s first-move timeout exists specifically to absorb this. Reference: the `~/bin/CLAUDE.md` "Ollama models" section.
- The `gemma4:26b` `think: false` gotcha. Per `~/bin/gemma4-research/GOTCHAS.md`, setting `think: false` silently breaks 26B in multi-turn tool-calling loops. The spec explicitly says "do not set `think: false`" for this reason. Implementation must respect this.
- The `format: "json"` gotcha. Per `~/bin/gemma4-research/SYNTHESIS.md`, `format: "json"` causes infinite loops on nested schemas. The spec says to use client-side regex JSON extraction instead. Implementation must respect this.
- The bot has no `PlayerToken`, no WS connection, and no grace timer. This is new architectural ground. "Key principle 5" in the spec's Architecture section makes this explicit, but it's a subtle point an implementer might miss when wiring up `peer-status` for the bot's slot.
- The reasoning, held in the chat history, is the ONLY persistent state for Recon. No SQLite, no disk. A server restart drops Recon's chat history along with the rest of the game state, consistent with current MVP behavior. If SQLite is added later (deferred), the chat history would be a natural thing to persist alongside game state.
- The self-play harness needs an in-process bot adapter that bypasses the WS layer. It's documented in spec section 5.5 but not deeply specified. The cleanest implementation is to instantiate `BotDriver` directly against a `Game` and let it use the in-process commit handler, the same path the production code uses.
- The DECISIONS.md row "Client-side AI / hint generation" was previously written as fully rejected. This session partially reversed it (the entry is now struck through with a "partially superseded" note). Hint generation in the human-vs-human path remains rejected; only the human-vs-AI path was unblocked.
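The following sketch shows client-side JSON extraction of the kind `parse.ts` needs, in place of Ollama's `format: "json"`. It assumes Gemma replies with free text that contains a JSON object with a string `move` field and an optional `reasoning` field. Those field names and `extractReply` are hypothetical. The sketch tries a fenced JSON block first, then falls back to the span from the first `{` to the last `}`, and validates the result with `JSON.parse`.

```typescript
interface ParsedReply {
  move: string;
  reasoning?: string;
}

// A fenced block, with the three backticks written as `{3}.
const FENCED = /`{3}(?:json)?\s*([\s\S]*?)`{3}/;
// Fallback: from the first "{" to the last "}" in the text.
const BRACES = /\{[\s\S]*\}/;

function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

// Returns the first candidate span that parses and has a string `move`, else null.
function extractReply(raw: string): ParsedReply | null {
  const fenced = FENCED.exec(raw);
  const braces = BRACES.exec(raw);
  for (const candidate of [fenced?.[1], braces?.[0]]) {
    if (candidate === undefined) continue;
    const value = tryParse(candidate.trim());
    if (typeof value === "object" && value !== null && typeof (value as { move?: unknown }).move === "string") {
      const v = value as { move: string; reasoning?: unknown };
      return { move: v.move, reasoning: typeof v.reasoning === "string" ? v.reasoning : undefined };
    }
  }
  return null;
}
```

A `null` result would count as a failed attempt against the driver's retry cap instead of crashing the turn.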
## Assumptions Made
- Seth's "approved write spec -> update context -> create handoff -> git commit -> close session" shorthand was a workflow chain (the steps to run once the spec was approved). Did not invoke writing-plans, which would have been the brainstorming skill's terminal state.
- Two CLAUDE.md paragraphs (Current State + Key files) needed updating; the rest of CLAUDE.md is unaffected. Did not touch the project identity or operations sections.
- DECISIONS.md should organize the AI design entries as their own section ("AI / computer player") rather than mixing them into "Architecture" / "Implementation"; those existing sections describe the deployed MVP, not future-but-approved work.
- The "Deferred / Rejected" row for "Client-side AI / hint generation" should be partially struck through, not deleted. Deleting it would lose the historical record of the change of mind.
- Backup-before-edit applies to source-controlled files too (per the global rule). Created `.backup/CLAUDE.md.<ts>` and `.backup/DECISIONS.md.<ts>`. The `.backup/` directory needs to be gitignored; this has since been verified (see Potential Gotchas).
## Potential Gotchas
- `.backup/` IS gitignored (verified at the top of `.gitignore`; the first non-comment line is `.backup/`). Future sessions can keep using it freely.
- `docs/superpowers/specs/` now has TWO specs. Future readers of the `CLAUDE.md` "Start Here" section should read both. The MVP spec is the deployed reality; the AI spec is approved-but-not-built work.
- The strikethrough Markdown (`~~text~~`) on the partially superseded row in the DECISIONS.md "Deferred / Rejected" section may render unexpectedly in some viewers. The intent is "this was rejected, now partially reversed"; if the rendering is confusing in practice, switch to plain text with an explicit "PARTIALLY SUPERSEDED" prefix.
- The spec sets the driver's retry cap at 5 (for proposals rejected as `wont_help` / `illegal_move`). If Recon repeatedly proposes illegal moves in a hard position, the driver will resign the bot at attempt 6. This is a safety belt, not the expected path; if it fires regularly during testing, the prompt template needs work, not the cap.
- The spec's acceptance bar says "≤8s/move on 3090 Ti, ≤10s on V100" with cold start excluded. "Cold start excluded" means latency is measured post-warmup; the first move's latency is reported separately. If cold-start latency itself becomes a problem (sustained user complaints), spec Risks #2 has the escalation path.
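The following sketch shows the retry cap and per-move timeouts in one turn loop. It is a minimal illustration and not `BotDriver` itself. The cap of 5, the `wont_help` / `illegal_move` rejection codes, and the 30s/90s limits come from this handoff. The `ProposeMove` and `Commit` function types, the rejected-move feedback list, and the choice to let a timeout reject (leaving failover or `ai_unavailable` handling to the caller) are all assumptions.

```typescript
type Proposal = { kind: "move"; uci: string } | { kind: "resign"; reason: string };

// Hypothetical brain call; `rejected` lets the brain avoid repeating bad proposals.
type ProposeMove = (attempt: number, rejected: string[]) => Promise<Proposal>;
// Hypothetical commit hook: null means accepted, otherwise the rejection code.
type Commit = (uci: string) => "wont_help" | "illegal_move" | null;

const MAX_ATTEMPTS = 5;
const NORMAL_TIMEOUT_MS = 30_000;
const FIRST_MOVE_TIMEOUT_MS = 90_000; // absorbs the cold-start model reload

function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      (v) => {
        clearTimeout(timer);
        resolve(v);
      },
      (e) => {
        clearTimeout(timer);
        reject(e);
      },
    );
  });
}

type TurnOutcome =
  | { kind: "moved"; uci: string; attempts: number }
  | { kind: "resigned"; reason: string };

// A timeout rejects the returned promise; the caller decides what happens next.
async function playTurn(propose: ProposeMove, commit: Commit, isFirstMove: boolean): Promise<TurnOutcome> {
  const timeoutMs = isFirstMove ? FIRST_MOVE_TIMEOUT_MS : NORMAL_TIMEOUT_MS;
  const rejected: string[] = [];
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const proposal = await withTimeout(propose(attempt, rejected), timeoutMs);
    if (proposal.kind === "resign") return { kind: "resigned", reason: proposal.reason };
    const rejection = commit(proposal.uci);
    if (rejection === null) return { kind: "moved", uci: proposal.uci, attempts: attempt };
    rejected.push(`${proposal.uci}: ${rejection}`);
  }
  // All 5 attempts rejected: resign instead of asking a 6th time.
  return { kind: "resigned", reason: `retry cap (${MAX_ATTEMPTS}) exhausted` };
}
```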
## Environment State
### Tools/Services Used
- `Write`/`Edit`/`Read`/`Bash` for the spec, context, and handoff.
- `git` (commit + push) for the spec commit.
- No SSH, no Ollama calls, no client/server changes: purely documentation work this session.
### Active Processes
- `blind-chess.service` on CT 690 (192.168.0.245). Unaffected by this session. The live URL still serves the MVP at https://chess.sethpc.xyz.
### Environment Variables
- None changed this session.
## Related Resources
- Live URL: https://chess.sethpc.xyz (MVP, unaffected)
- Repo: https://git.sethpc.xyz/Seth/blind_chess
- New spec: `docs/superpowers/specs/2026-04-28-ai-player-design.md`
- MVP spec: `docs/superpowers/specs/2026-04-28-blind-chess-design.md`
- Decisions: `DECISIONS.md` (new "AI / computer player" section)
- Project identity: `CLAUDE.md` (updated)
- Original brief: `IDEA.md`
- Prior handoffs: `2026-04-28-152000-mvp-deployed.md`, `2026-04-28-104344-spec-approved-ready-for-plan.md`, `2026-04-28-kickoff.md`
- Gemma 4 implementation guidance:
  - `~/bin/gemma4-research/README.md` (index)
  - `~/bin/gemma4-research/SYNTHESIS.md` (must-read for the implementation)
  - `~/bin/gemma4-research/GOTCHAS.md` (`think: false` + `format: "json"` warnings)
  - `~/bin/gemma4-research/CORPUS_ollama_variants.md` (model selection, VRAM)
  - `~/bin/gemma4-research/docs/reference/gpu-bakeoff-2026-04-20.md` (3090 Ti vs V100 throughput)
  - `~/bin/gemma4-research/docs/reference/mort-bakeoff-2026-04-18.md` (`<think>` token serialization behavior)
- Ollama endpoints (per `~/bin/CLAUDE.md`):
  - steel141: `http://192.168.0.141:11434` (3090 Ti, primary)
  - pve197 CT 105: `http://192.168.0.179:11434` (V100, fallback)
Security Reminder: This handoff describes design only; no credentials, deploy targets, or live state changed.