claude (blind_chess) 729199097e docs: log AI-player spec approval, update context, add handoff

Updates CLAUDE.md "Current State" + "Key files" to point at the new spec.
Adds DECISIONS.md "AI / computer player" section (11 settled decisions).
Strikes through the prior "Client-side AI / hint generation — out of scope"
row with a "partially superseded" note: the reversal applies only to the
human-vs-AI path. Adds 7 new Deferred/Rejected rows for AI-feature scope.

Handoff at .claude/handoffs/2026-04-28-170713-ai-player-spec.md captures
session state for the next pickup (writing-plans → Phase 1 implementation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-28 13:12:04 -04:00


Handoff: AI/computer player spec written and approved

Session Metadata

Created: 2026-04-28 ~17:07 UTC
Project: /home/claude/bin/blind_chess
Branch: main
Repo: git.sethpc.xyz/Seth/blind_chess
Recent commits: 288693f docs(spec): add AI/computer player design spec (this session) on top of a878dee fix(client): wrap connect/disconnect in untrack() to break effect loop (prior work).
Live URL: https://chess.sethpc.xyz (MVP, unaffected by this session).

Handoff Chain

Continues from: 2026-04-28-152000-mvp-deployed.md — MVP deployed and live.
Supersedes: None.

Current State Summary

Seth opened the session with the workflow "handoff -> spec ai/computer player" and closed it with "approved write spec -> update context -> create handoff -> git commit -> close session". In between, this session ran the brainstorming skill end-to-end with him for the AI/computer player feature, presented six design sections one at a time with approval gates, wrote the full spec to docs/superpowers/specs/2026-04-28-ai-player-design.md, self-reviewed it, committed and pushed it to gitea, and updated CLAUDE.md + DECISIONS.md to reflect the new spec.

Implementation has not started. The next session can directly invoke superpowers:writing-plans against the new spec to produce a step-by-step implementation plan.

Architecture Overview (the spec)

Two AI bots, phased delivery:

Phase 1 — Casual bot. Algorithmic, in-process, ~200 LoC of TypeScript. Plays legal moves with simple heuristics (capture-bias, development bonus, anti-shuffling penalty). Always available; no external dependencies. Plays badly but quickly. Single-week scope.
Phase 2 — gemma4 recon bot. Multi-turn chat agent backed by gemma4:26b running on the homelab Ollama service (steel141 RTX 3090 Ti primary, pve197 V100 fallback). Maintains a private per-game chat history that persists across turns as the bot's belief memory. Reasoning hidden from human during play, revealed in collapsible post-game panel. Multi-week scope; depends on prompt engineering iteration.
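A minimal sketch of how the three Casual heuristics could combine into one score. The candidate shape, the `captureHint` and `developsPiece` flags, and all weights are hypothetical; how capture-bias is derived under blind rules is left to the spec.

```typescript
// Hypothetical candidate shape; the real types belong to the spec's brain.ts / candidates.ts.
interface CasualCandidate {
  from: string;           // origin square, e.g. "g1"
  to: string;             // destination square, e.g. "f3"
  captureHint: boolean;   // assumed flag: the move is expected to capture
  developsPiece: boolean; // assumed flag: a piece leaves its home square
}

interface LastMove { from: string; to: string; }

// Illustrative weights only; none of these numbers come from the spec.
const CAPTURE_BIAS = 3;
const DEVELOPMENT_BONUS = 1;
const SHUFFLE_PENALTY = 2;

function scoreCandidate(c: CasualCandidate, last: LastMove | null): number {
  let score = 0;
  if (c.captureHint) score += CAPTURE_BIAS;
  if (c.developsPiece) score += DEVELOPMENT_BONUS;
  // Anti-shuffling: penalize undoing the bot's own previous move.
  if (last !== null && c.from === last.to && c.to === last.from) {
    score -= SHUFFLE_PENALTY;
  }
  return score;
}

// Pick uniformly among the top-scoring candidates; rng is injectable for tests.
function pickMove(
  candidates: CasualCandidate[],
  last: LastMove | null,
  rng: () => number = Math.random,
): CasualCandidate {
  if (candidates.length === 0) throw new Error("no legal candidates");
  const scores = candidates.map((c) => scoreCandidate(c, last));
  const best = Math.max(...scores);
  const top = candidates.filter((_, i) => scores[i] === best);
  return top[Math.floor(rng() * top.length)]!;
}
```

Breaking ties at random keeps the bot from repeating the same line every game, and injecting `rng` keeps unit tests deterministic.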

Both bots play through the same view filter and finite-state machine that humans use. The architectural invariant from CLAUDE.md ("the view filter is the only egress for board state") applies to bots — a bot consumes only buildView(game, botColor) plus moderator announcements. No oracle access. The Recon bot is honestly playing blind chess.

Modules to be added under packages/server/src/bot/:

brain.ts — Brain interface, BrainInput, BrainAction types.
driver.ts — BotDriver class (per-game orchestration, mutex, retry cap).
casual-brain.ts — CasualBrain class.
recon-brain.ts — ReconBrain class (Phase 2).
ollama-client.ts — OllamaClient interface + production HTTP impl (Phase 2).
ollama-endpoints.ts — endpoint priority list, preflight, mid-game failover (Phase 2).
prompt.ts — system prompt template, per-turn user message builder (Phase 2).
parse.ts — extract JSON from Gemma's response (Phase 2).
candidates.ts — legal candidate computation (vanilla vs blind paths).
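For orientation, one possible shape for the `Brain`, `BrainInput`, and `BrainAction` types named above. Only the three type names come from the spec; every field name, the `decide` method, and `FirstCandidateBrain` are assumptions for illustration.

```typescript
type Color = "white" | "black";

// Hypothetical fields. The point is that the input carries only filtered data.
interface BrainInput {
  view: unknown;           // result of buildView(game, botColor); the bot sees nothing else
  announcements: string[]; // moderator announcements since the bot's last turn
  candidates: string[];    // pre-computed legal candidates
  color: Color;
}

type BrainAction =
  | { kind: "move"; move: string; reasoning?: string }
  | { kind: "resign" };

interface Brain {
  decide(input: BrainInput): Promise<BrainAction>;
}

// Trivial brain for wiring tests: plays the first candidate, resigns if there is none.
class FirstCandidateBrain implements Brain {
  async decide(input: BrainInput): Promise<BrainAction> {
    const move = input.candidates[0];
    return move === undefined ? { kind: "resign" } : { kind: "move", move };
  }
}
```

Returning a `Promise` lets the same interface cover an in-process brain and one that waits on an Ollama call.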

Tests under packages/server/test/unit/bot/ and packages/server/test/integration/. Self-play harness at scripts/selfplay.ts (operator tool, NOT in CI).

Protocol additions:

CreateGameRequest.vsAi?: { brain: 'casual' | 'recon' }
EndReason adds 'ai_unavailable'
joined and update server messages add optional aiInfo: { model, gpu, host }
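The additions above might look roughly like this in `packages/shared/src/protocol.ts`. The string field types on `aiInfo`, the `ServerMessageWithAi` stand-in, and the `aiBadge` helper are assumptions; existing fields and existing `EndReason` members are not listed in this handoff and are omitted.

```typescript
type BrainKind = "casual" | "recon";

interface CreateGameRequest {
  // Existing request fields are omitted here.
  vsAi?: { brain: BrainKind };
}

// Only the new member is shown; it would be added to the existing EndReason union.
type NewEndReason = "ai_unavailable";
const AI_UNAVAILABLE: NewEndReason = "ai_unavailable";

// Field types are assumed to be strings.
interface AiInfo {
  model: string; // e.g. "gemma4:26b"
  gpu: string;
  host: string;
}

// Stand-in for the `joined` and `update` server messages, which gain an optional aiInfo.
interface ServerMessageWithAi {
  type: "joined" | "update";
  aiInfo?: AiInfo;
}

// Hypothetical helper: text for the AI badge under the opponent slot.
function aiBadge(msg: ServerMessageWithAi): string | null {
  if (msg.aiInfo === undefined) return null;
  return `${msg.aiInfo.model} on ${msg.aiInfo.gpu} (${msg.aiInfo.host})`;
}
```

Keeping `vsAi` and `aiInfo` optional means human-vs-human games send and receive exactly the messages they do today.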

UI additions on the existing client: two-section landing layout, AI badge under opponent slot, "AI is thinking..." indicator (with first-move "starting up" variant), moderator-panel-area UI-system messages for game-start GPU info + failover, collapsible post-game reasoning reveal.

Acceptance bars:

Phase 1 done: 100 Casual self-play games complete; Casual beats random-mover ≥80%.
Phase 2 done: Recon wins ≥60% over 50 Recon-vs-Casual games; ≤8s/move on 3090 Ti (≤10s on V100); manual inspection of 10 reasoning logs shows Gemma using announcements as evidence.
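The win-rate bars reduce to a small check that the self-play harness could run on its tally. This is a sketch with hypothetical names; how draws are counted is not stated here, so `wins` means whatever the spec counts as a win.

```typescript
interface SelfPlayTally {
  games: number; // games completed
  wins: number;  // games won by the bot under evaluation
}

// Integer comparison avoids floating-point edge cases at exactly the threshold.
function meetsBar(t: SelfPlayTally, minGames: number, minWinPct: number): boolean {
  return t.games >= minGames && t.wins * 100 >= minWinPct * t.games;
}

// Phase 2 bar: at least 60% wins over 50 Recon-vs-Casual games.
const reconPasses = meetsBar({ games: 50, wins: 30 }, 50, 60);
```

With these numbers, 30 wins out of 50 is the minimum passing result.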

Critical Files (added or to be added)

| File | Status | Purpose |
|------|--------|---------|
| `docs/superpowers/specs/2026-04-28-ai-player-design.md` | ✅ Written and committed | Full design spec; read this first when implementing. |
| `CLAUDE.md` | ✅ Updated | "Current State" notes spec is approved; "Key files" links to the new spec. |
| `DECISIONS.md` | ✅ Updated | New "AI / computer player" section logs design decisions; "Deferred/Rejected" amended to partially supersede the prior "Client-side AI / hint generation" rejection. |
| `packages/server/src/bot/` | ⏳ Not yet created | Where the new modules will live. |
| `packages/shared/src/protocol.ts` | ⏳ Not yet modified | Will add `vsAi`, `aiInfo`, `'ai_unavailable'` per the spec. |
| `scripts/selfplay.ts` | ⏳ Not yet created | Operator tool for running AI-vs-AI evaluation games. |

Tasks Finished

Read prior handoff (MVP deployed) and original design spec.
Read ~/bin/gemma4-research/README.md, SYNTHESIS.md, CORPUS_ollama_variants.md for Gemma 4 implementation guidance.
Brainstormed the feature with Seth in 6 sections (architecture, components, data flow, error handling, testing, UX), each with approval gate.
Pivoted on Seth's input: AI runs on steel141 3090 Ti (not pve197 V100), pve197 V100 as fallback; bot reasoning persistent across turns (multi-turn chat agent, not stateless oracle); no mid-game flap-back but one-way GPU failover allowed.
Wrote full spec at docs/superpowers/specs/2026-04-28-ai-player-design.md (674 lines, 3 appendices).
Self-reviewed the spec (no placeholders, retry/timeout/acceptance numbers consistent, scope clear, fixed the peer-status ambiguity).
Committed and pushed: 288693f docs(spec): add AI/computer player design spec.
Updated CLAUDE.md (Current State, Key files) and DECISIONS.md (new "AI / computer player" section + amended Deferred/Rejected).
Wrote this handoff.

Files Modified / Added

| File | Changes |
|------|---------|
| (new) `docs/superpowers/specs/2026-04-28-ai-player-design.md` | 674-line design spec |
| `CLAUDE.md` | "Current State" updated; "Key files" links new spec; "Start Here" lists both specs |
| `DECISIONS.md` | New "AI / computer player (designed 2026-04-28, not yet implemented)" section with 11 entries; Deferred/Rejected amended to supersede prior "Client-side AI / hint generation" rejection (partial); 7 new deferred/rejected rows for AI-feature scope |
| (new) `.claude/handoffs/2026-04-28-170713-ai-player-spec.md` | This handoff |
| (new, ignored) `.backup/CLAUDE.md.<ts>`, `.backup/DECISIONS.md.<ts>` | Pre-edit backups per global safety rule |

Decisions Made

All in DECISIONS.md "AI / computer player" section. Highlights:

Two-phase delivery (Casual first, Recon second).
In-process virtual players, not external WS clients. Bots use same view filter as humans.
Recon is a stateful chat agent with persistent per-game memory; reasoning hidden during play, revealed post-game.
Endpoint priority steel141 → pve197; mid-game one-way failover; preflight blocks game creation if both down.
GPU surfaced to user via persistent badge + game-start UI message.
gemma4:26b chosen (not 31B — 5× slower for marginal gain; not e4b — too small).
Per-move latency caps: 30s normal, 90s first-move (covers cold-start).
Recon "done" bar: ≥60% wins over 50 Recon-vs-Casual self-play games.
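A sketch of the endpoint priority, one-way failover, and latency caps from the highlights above. The function names and the injected `probe` are hypothetical; the hosts, order, and timeout values are the ones recorded in this handoff.

```typescript
interface OllamaEndpoint { name: string; baseUrl: string; gpu: string; }

// Priority order: steel141 first, pve197 as fallback.
const ENDPOINTS: OllamaEndpoint[] = [
  { name: "steel141", baseUrl: "http://192.168.0.141:11434", gpu: "RTX 3090 Ti" },
  { name: "pve197", baseUrl: "http://192.168.0.179:11434", gpu: "V100" },
];

// The health check is injected so the selection logic can be tested without a network.
type Probe = (ep: OllamaEndpoint) => Promise<boolean>;

// Preflight: index of the first healthy endpoint at or after startIndex, or null if none.
async function pickEndpoint(probe: Probe, startIndex = 0): Promise<number | null> {
  for (let i = startIndex; i < ENDPOINTS.length; i++) {
    const ep = ENDPOINTS[i];
    if (ep !== undefined && (await probe(ep))) return i;
  }
  return null;
}

// One-way failover: only look past the current endpoint, never flap back to an earlier one.
function failoverFrom(currentIndex: number, probe: Probe): Promise<number | null> {
  return pickEndpoint(probe, currentIndex + 1);
}

// Per-move caps: 90s on the first move (covers cold start), 30s afterwards.
function moveTimeoutMs(isFirstMove: boolean): number {
  return isFirstMove ? 90000 : 30000;
}
```

A `null` from `pickEndpoint` at game creation corresponds to the "both down" case that blocks the game. A production probe could, for example, query each host's Ollama model list and confirm gemma4:26b is present; that detail is an assumption, not something this handoff specifies.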

Immediate Next Steps

Run the writing-plans skill against the new spec. The brainstorming skill's terminal state is invoking writing-plans; we skipped that to close the session, so the next session should pick it up. Command: superpowers:writing-plans against docs/superpowers/specs/2026-04-28-ai-player-design.md. The plan should split clearly into Phase 1 (Casual) and Phase 2 (Recon) work streams; Phase 1 is single-week scope, Phase 2 multi-week.
Implement Phase 1 per the plan. Order from the spec's Appendix C: scaffold packages/server/src/bot/, write the Brain interface, implement CasualBrain + tests, implement BotDriver + tests with StubBrain, wire up legalCandidates computation, add protocol changes (vsAi, bot registry), wire POST /api/games, wire ws.ts observer, build the client landing page two-section layout + thinking indicator, write integration tests, write self-play harness for Casual-vs-Casual.
Deploy Phase 1 to CT 690, run live smoke checklist for Casual.
Implement Phase 2 per the plan. Order: OllamaClient interface + HTTP impl, endpoint preflight + failover, prompt template, JSON parser, ReconBrain + tests (mocked Ollama), protocol additions for aiInfo, POST /api/games Recon path with preflight + warmup, driver retry/fallback wiring, client GPU badge + system messages + post-game reasoning reveal, integration tests, self-play harness for Recon-vs-Casual, prompt iteration until 60% bar met.
Deploy Phase 2 to CT 690, run live smoke checklist for Recon (warm, cold, failover, both-down).

Blockers / Open Questions

Recon's actual playing strength is the central research-y unknown. LLMs play vanilla chess poorly, but Gemma's task here is different — it's reasoning under uncertainty, picking from a pre-computed legal candidate list, not computing tactical depth. The 60% Recon-vs-Casual bar is a guess; we'll learn the real number from scripts/selfplay.ts. Spec's "Decision triggers" section (under Acceptance criteria) describes how to react if the bar is missed.
mort-3090-scheduler GPU contention. The scheduler is supposed to yield to other GPU users, but this has not been measured under Recon load. Plan: monitor steel141 GPU utilization during early Recon games; if mort jobs interfere, add explicit coordination.
Cold-start UX on first Recon move. 30–60s is long. The "AI is starting up..." copy mitigates but doesn't eliminate. If users complain, escalation path is in the spec's Risks #2.
Chat history grows unboundedly. 32K context covers ~128 turns; longer games would overflow. If seen in practice, add per-turn compaction (summarize older turns into running "what I've inferred" summary). Not MVP unless triggered.
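The context-overflow item can be turned into a rough headroom estimate. Both constants are assumptions: 32K is read as 32768 tokens, and 256 tokens per turn is only what "32K covers ~128 turns" implies, not a measured figure. The `margin` trigger is hypothetical.

```typescript
const CONTEXT_TOKENS = 32768;
const EST_TOKENS_PER_TURN = 256;

// Estimated number of turns left before the chat history no longer fits.
function turnsUntilOverflow(turnsPlayed: number): number {
  const capacity = Math.floor(CONTEXT_TOKENS / EST_TOKENS_PER_TURN);
  return Math.max(0, capacity - turnsPlayed);
}

// Hypothetical trigger: compact once fewer than `margin` turns of headroom remain.
function shouldCompact(turnsPlayed: number, margin = 16): boolean {
  return turnsUntilOverflow(turnsPlayed) < margin;
}
```

Logging `turnsUntilOverflow` during self-play would show whether real games ever get close enough to justify building compaction.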

Deferred Items

See DECISIONS.md "Deferred / Rejected" — specifically the new AI-feature rows: difficulty slider, Stockfish for vanilla AI, live token streaming, GPU flap-back, public AI vs AI spectator games, context compaction, bot rating/personalities. None block Phase 1 or Phase 2.

Important Context

The spec assumes gemma4:26b is on both steel141 and pve197. Verified via ~/bin/CLAUDE.md Ollama inventory at the time of writing. If either host's model inventory drifts, the preflight will fall through to the other host or fail.
steel141 OLLAMA_KEEP_ALIVE=30m — first call after >30 min idle pays a 30–60s reload cost. Spec's first-move 90s timeout exists specifically to absorb this. Reference: ~/bin/CLAUDE.md "Ollama models" section.
The gemma4:26b think: false gotcha. Per ~/bin/gemma4-research/GOTCHAS.md, setting think: false silently breaks 26B in multi-turn tool-calling loops. Spec explicitly says "do not set think: false" for this reason. Implementation must respect this.
The format: "json" gotcha. Per ~/bin/gemma4-research/SYNTHESIS.md, format: "json" causes infinite loops on nested schemas. Spec says use client-side regex JSON extraction instead. Implementation must respect this.
Bot has no PlayerToken, no WS connection, no grace timer. This is new architectural ground. Spec's Architecture section "Key principle 5" makes this explicit, but it's a subtle point that an implementer might miss when wiring up peer-status for the bot's slot.
The reasoning is the ONLY persistent state for Recon. No SQLite, no disk. Server restart drops Recon's chat history with the rest of the game state, consistent with current MVP behavior. If we add SQLite later (deferred), the chat history would be a natural thing to persist alongside game state.
Self-play harness needs an in-process bot adapter that bypasses the WS layer. It's documented in spec section 5.5 but not deeply specified. The cleanest implementation is to instantiate BotDriver directly against a Game and let it use the in-process commit handler — same path the production code uses.
The DECISIONS.md row "Client-side AI / hint generation" was previously written as fully rejected. This session partially reversed it (the entry is now strikethrough + a "partially superseded" note). The hint-generation-in-human-vs-human path remains rejected; only the human-vs-AI path was unblocked.
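For the format: "json" gotcha, a minimal sketch of client-side extraction of the kind `parse.ts` is meant to do. The reply shape (`move`, `reasoning`) and the function name are assumptions; the spec's actual schema may differ.

```typescript
// Hypothetical reply shape.
interface ReconReply {
  move: string;
  reasoning?: string;
}

// Find the outermost {...} span in free-form model output and validate it.
// Returns null instead of throwing, so the caller can treat failure as a rejected attempt.
function extractReply(text: string): ReconReply | null {
  const match = /\{[\s\S]*\}/.exec(text); // greedy: first "{" through last "}"
  if (match === null) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(match[0]);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const obj = parsed as Record<string, unknown>;
  const move = obj["move"];
  if (typeof move !== "string") return null;
  const reply: ReconReply = { move };
  const reasoning = obj["reasoning"];
  if (typeof reasoning === "string") reply.reasoning = reasoning;
  return reply;
}
```

The greedy match tolerates prose before and after the object but fails if the model emits two separate objects; in that case the function returns `null` and the attempt counts against the driver's retry cap.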

Assumptions Made

Seth's "approved write spec -> update context -> create handoff -> git commit -> close session" shorthand was a workflow chain (the next four steps after spec-approved). Did not invoke writing-plans (would have been the brainstorming skill's terminal state).
Two CLAUDE.md paragraphs (Current State + Key files) needed updating; the rest of CLAUDE.md is unaffected. Did not touch project identity or operations sections.
DECISIONS.md should organize the AI design entries as their own section ("AI / computer player") rather than mixing into "Architecture" / "Implementation" — those existing sections are about the deployed MVP, not future-but-approved work.
The "Deferred / Rejected" row for "Client-side AI / hint generation" should be struck through with a "partially superseded" note, not deleted. Deleting it would lose the historical record of the change of mind.
Backup-before-edit applies to source-controlled files too (per global rule). Created .backup/CLAUDE.md.<ts> and .backup/DECISIONS.md.<ts>. Assumed the .backup/ directory is gitignored; this was checked afterwards and confirmed.

Potential Gotchas

.backup/ IS gitignored (verified at the top of .gitignore — first non-comment line is .backup/). Future sessions can keep using it freely.
docs/superpowers/specs/ has TWO specs now. Future readers of CLAUDE.md "Start Here" should read both. The MVP spec is the deployed reality; the AI spec is approved-but-not-built work.
The strikethrough Markdown (~~text~~) in DECISIONS.md "Deferred / Rejected" for the partially-superseded row may render unexpectedly in some viewers. The intent is "this was rejected, now partially reversed" — if the rendering is confusing in practice, switch to plain text with an explicit "PARTIALLY SUPERSEDED" prefix.
Spec says retry cap is 5 for the driver (rejecting wont_help/illegal_move moves). If Recon repeatedly proposes illegal moves on a hard position, the driver will resign the bot at attempt 6. This is a safety belt, not the expected path — if it fires regularly during testing, the prompt template needs work, not the cap.
Spec acceptance bar says "≤8s/move on 3090 Ti, ≤10s on V100" with cold-start excluded. "Cold-start excluded" means we measure post-warmup; the first move's latency is reported separately. If cold-start latency itself becomes a problem (sustained complaints from users), spec Risks #2 has the escalation path.
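The retry-cap gotcha can be sketched as a small loop. This reads "cap is 5, resign at attempt 6" as five proposals allowed and a resignation in place of a sixth; that reading, and the `propose` / `commit` signatures, are assumptions rather than the spec's wording.

```typescript
type Verdict = "ok" | "wont_help" | "illegal_move";

const RETRY_CAP = 5;

type TurnResult =
  | { kind: "moved"; move: string; attempts: number }
  | { kind: "resigned" };

// propose asks the brain for a move; commit submits it through the normal move path.
async function playTurn(
  propose: (attempt: number) => Promise<string>,
  commit: (move: string) => Verdict,
): Promise<TurnResult> {
  for (let attempt = 1; attempt <= RETRY_CAP; attempt++) {
    const move = await propose(attempt);
    if (commit(move) === "ok") return { kind: "moved", move, attempts: attempt };
  }
  // Safety belt: every allowed attempt was rejected, so the bot resigns.
  return { kind: "resigned" };
}
```

Counting how often `attempts` exceeds 1 during self-play gives an early signal that the prompt template needs work, well before the resignation path fires.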

Environment State

Tools/Services Used

Write / Edit / Read / Bash for the spec, context, handoff.
git (commit + push) for the spec commit.
No SSH, no Ollama calls, no client/server changes — purely documentation work this session.

Active Processes

blind-chess.service on CT 690 (192.168.0.245). Unaffected by this session. Live URL still serves the MVP at https://chess.sethpc.xyz.

Environment Variables

None changed this session.

Live URL: https://chess.sethpc.xyz (MVP, unaffected)
Repo: https://git.sethpc.xyz/Seth/blind_chess
New spec: docs/superpowers/specs/2026-04-28-ai-player-design.md
MVP spec: docs/superpowers/specs/2026-04-28-blind-chess-design.md
Decisions: DECISIONS.md (new "AI / computer player" section)
Project identity: CLAUDE.md (updated)
Original brief: IDEA.md
Prior handoffs: 2026-04-28-152000-mvp-deployed.md, 2026-04-28-104344-spec-approved-ready-for-plan.md, 2026-04-28-kickoff.md
Gemma 4 implementation guidance:
- ~/bin/gemma4-research/README.md (index)
- ~/bin/gemma4-research/SYNTHESIS.md (must-read for the implementation)
- ~/bin/gemma4-research/GOTCHAS.md (think: false + format: "json" warnings)
- ~/bin/gemma4-research/CORPUS_ollama_variants.md (model selection, VRAM)
- ~/bin/gemma4-research/docs/reference/gpu-bakeoff-2026-04-20.md (3090 Ti vs V100 throughput)
- ~/bin/gemma4-research/docs/reference/mort-bakeoff-2026-04-18.md (<think> token serialization behavior)
Ollama endpoints (per ~/bin/CLAUDE.md):
- steel141: http://192.168.0.141:11434 (3090 Ti, primary)
- pve197 CT 105: http://192.168.0.179:11434 (V100, fallback)

Security Reminder: This handoff describes design only; no credentials, deploy targets, or live state changed.
