claude (blind_chess) 2e808008b1 docs: record table-fidelity feature batch as code-complete
- DECISIONS.md: new "Table-fidelity features" section + deferred items
  (smart-tracker rejected, highlight/phantom coupling deferred,
  abandoned-game localStorage cleanup deferred).
- CLAUDE.md: current state, test count 78->87, key files, known gaps.
- spec: record that the driver unit test covers the bot-suppression
  path in place of the considered-and-dropped ai-game-casual integration
  test (resolves a spec/implementation drift the final review flagged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:57:02 -04:00

# DECISIONS.md — blind_chess Decision Log
Project-specific decisions. For global/cross-cutting decisions, see `~/bin/DECISIONS.md`.
Format: `YYYY-MM-DD: <decision> — <why>`
## Architecture
- 2026-04-28: Node 22 + TypeScript stack — single-language top-to-bottom; `chess.js` is the de facto rules engine and lives natively here.
- 2026-04-28: pnpm workspace with three packages — `packages/server` (Fastify + ws), `packages/client` (Svelte + Vite), `packages/shared` (TS types). Shared types are the load-bearing decision: the WS protocol drift surface is high-risk and shared types catch it at compile time.
- 2026-04-28: Fastify > Express — better TypeScript ergonomics, faster, cleaner plugin model for `ws` integration.
- 2026-04-28: Svelte > React — smaller bundle, reactive stores fit the constantly-changing board state model. React is overkill for a 2-route app.
- 2026-04-28: `chess.js` for rules + custom `geometricMoves` helper — chess.js doesn't expose pseudo-legal moves; ~80 LoC pure function covers all six piece types. Lives in `packages/shared` so server and client use the same code.
- 2026-04-28: In-memory only; `Map<gameId, Game>` is the entire database — simplest possible. SQLite later if crash recovery becomes painful. Rejected: SQLite for MVP (premature given hobby-project scope).
- 2026-04-28: Single-port Node service — Fastify serves both static client and `/ws` upgrade on port 3000. No reverse proxy logic in our service; Caddy CT 600 handles TLS and routing.
- 2026-04-28: Deploy target: new LXC on node-241 — clean isolation, matches existing patterns. Behind Caddy CT 600 at `chess.sethpc.xyz`.
- 2026-04-28: No auth beyond the hashed game link — friction-minimal; appropriate for casual play. No Authentik gate. Rejected: gating with Authentik (overkill).
- 2026-04-28: 8-character `gameId` (32 bits, `^[a-z0-9]{8}$`), 24-character `PlayerToken` (144 bits) — game IDs short enough for hand-shareable links, tokens long enough to prevent guessing.
- 2026-04-28: WebSocket transport for in-game; REST POST `/api/games` for creation — keeps create flow simple (refresh-friendly, cacheable), keeps in-game traffic on a single channel.
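The ID sizes above imply particular encodings. 32 bits in 8 characters is 4 bits per character, which is hex, and every hex digit satisfies `^[a-z0-9]{8}$`. 144 bits in 24 characters is 6 bits per character, which is base64url. Below is a minimal sketch that assumes exactly those encodings via `node:crypto`. `newGameId` and `newPlayerToken` are hypothetical names, not the server's actual helpers.

```typescript
import { randomBytes } from "node:crypto";

export const GAME_ID_RE = /^[a-z0-9]{8}$/;

// Hypothetical: 4 random bytes as hex -> 8 chars, 32 bits, always matches GAME_ID_RE.
export function newGameId(): string {
  return randomBytes(4).toString("hex");
}

// Hypothetical: 18 random bytes as base64url -> 24 chars, 144 bits.
// 18 is a multiple of 3, so the encoding needs no '=' padding.
export function newPlayerToken(): string {
  return randomBytes(18).toString("base64url");
}
```

The same token generator would also fit the randomized bot-slot token, because it yields tokens with the same shape as the human ones.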
## Implementation
- 2026-04-28: Both modes (vanilla + blind) shipped day one — single engine, mode = per-player view filter. Vanilla mode is "blind mode with full reveal."
- 2026-04-28: Moderator hierarchy refined to four tiers: (1) `no_such_piece`, (2) `no_legal_moves` = pseudo-legal ∅, (3) `wont_help` = pseudo-legal ≠∅ but legal ∅ (pin OR unresolved check), (4) silent = legal moves exist. Information is strictly monotonic across tiers: each later tier reveals more than the one before it.
- 2026-04-28: Touch-move FSM — tap arms (reversible, client-side only), drag-start or destination-click commits ("touches"). Server tracks `armed: { color, from }`. `no_legal_moves` and `wont_help` checks fire only on first commit with a piece; once committed, all subsequent failed attempts are `illegal_move` with the touch staying.
- 2026-04-28: Highlighting (blind+ON) is purely geometric — function of `(piece type, position, own-piece set)`, no opponent input. Rays extend through unseen opponent pieces. Stop at own pieces. Off-board excluded. Zero opponent info leak. (Vanilla+ON shows engine-truth: legal-empty as green dot, legal-capture as red ring.)
- 2026-04-28: Game creation: creator picks side at create time (default random); single-use link (first joiner takes the open slot, then locked); no spectators in MVP; link dies with the game.
- 2026-04-28: Reconnect via opaque `PlayerToken` in browser `localStorage`, 5-minute grace window — generous for phone hiccups, short enough that abandoned games end. Grace expiry → `endReason: 'abandoned'`, opponent wins. Both-sides simultaneous expiry → game ends with `winner: undefined`.
- 2026-04-28: Pawn promotion via modal (Q/R/B/N), client must include `promotion` field in the move; moderator announces the promotion (it's tactically significant — public info).
- 2026-04-28: All draws auto-detected (stalemate, insufficient material, threefold, 50-move) — casual-play friendly; no "claim" UI.
- 2026-04-28: `Announcement` is an enum (`ModeratorText`), not a free-form string. Display strings live client-side. Tests assert against enum values.
- 2026-04-28: `update` is the single, idempotent server-to-client message that includes a filtered `view` and any new `Announcement` entries. Replaying the latest `update` produces correct render.
- 2026-04-28: Moderator-vocabulary "errors" (no_such_piece, no_legal_moves, wont_help, illegal_move) come through as `Announcement` entries on `update`, NOT as `error` messages. Errors reserved for protocol failures.
- 2026-04-28: Janitor prunes finished games after 30 min idle; active games never expire (until restart).
- 2026-04-28: Rate limiting via per-token bucket on `commit`: 10/s, burst 20 — well above human pace, well below abuse.
- 2026-04-28: Mobile-first responsive design — IDEA.md's share-a-link flow strongly implies phone use.
- 2026-04-28: Logging via Pino (Fastify default) → journald. `/api/health` for Uptime Kuma probe. No Prometheus/OpenTelemetry in MVP.
- 2026-04-28: Resign + draw-offer/accept-decline flow — standard chess UX. Resignation ends without grace; disconnect applies grace.
- 2026-04-28: Game-over screen reveals full board for both sides — post-game review is part of the experience.
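The four-tier moderator hierarchy reduces to a series of checks in which the first match wins. This sketch uses assumed inputs. `TierInput` and `moderatorTier` are hypothetical, and the two counts are presumed to come from `geometricMoves` (pseudo-legal) and chess.js (legal).

```typescript
// Hypothetical tier names mirroring the four tiers in the log.
type Tier = "no_such_piece" | "no_legal_moves" | "wont_help" | "silent";

interface TierInput {
  pieceExists: boolean;     // does the player own a piece on the chosen square?
  pseudoLegalCount: number; // geometric moves, ignoring whether the king is left in check
  legalCount: number;       // moves that also leave the player's own king safe
}

// Checked from least to most revealing; the first tier that applies is announced.
export function moderatorTier({ pieceExists, pseudoLegalCount, legalCount }: TierInput): Tier {
  if (!pieceExists) return "no_such_piece";
  if (pseudoLegalCount === 0) return "no_legal_moves";
  if (legalCount === 0) return "wont_help"; // pinned, or cannot resolve a check
  return "silent";                          // legal moves exist: say nothing
}
```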
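For sliding pieces, the blind+ON highlighting rule amounts to walking rays over the player's own pieces only. That is also why the zero-leak property is easy to audit: the opponent's position is not a parameter at all. This is an illustrative sketch that covers bishops, rooks, and queens only, whereas the real `geometricMoves` helper covers all six piece types. `sliderHighlights` and the algebraic-string square representation are assumptions.

```typescript
// Squares are algebraic strings ("a1".."h8"). The only board input is the set
// of the player's OWN occupied squares.
type Slider = "B" | "R" | "Q";

const DIAGONAL: Array<[number, number]> = [[1, 1], [1, -1], [-1, 1], [-1, -1]];
const ORTHOGONAL: Array<[number, number]> = [[1, 0], [-1, 0], [0, 1], [0, -1]];
const DIRECTIONS: Record<Slider, Array<[number, number]>> = {
  B: DIAGONAL,
  R: ORTHOGONAL,
  Q: [...DIAGONAL, ...ORTHOGONAL],
};

const toName = (file: number, rank: number): string =>
  String.fromCharCode(97 + file) + String(rank + 1);

export function sliderHighlights(piece: Slider, from: string, own: ReadonlySet<string>): string[] {
  const startFile = from.charCodeAt(0) - 97; // 'a' -> 0
  const startRank = Number(from.slice(1)) - 1; // '1' -> 0
  const out: string[] = [];
  for (const [df, dr] of DIRECTIONS[piece]) {
    let f = startFile + df;
    let r = startRank + dr;
    // Stop at the board edge or just before an own piece. A square that may hold
    // an unseen opponent piece is highlighted and the ray keeps going.
    while (f >= 0 && f < 8 && r >= 0 && r < 8 && !own.has(toName(f, r))) {
      out.push(toName(f, r));
      f += df;
      r += dr;
    }
  }
  return out;
}
```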
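The per-token rate limit on `commit` is a standard token bucket with capacity 20 (the burst), refilled continuously at 10 tokens per second. Below is a minimal sketch using a hypothetical `TokenBucket` class with an injectable clock for testing. The server would keep one bucket per `PlayerToken`, shown here as a hypothetical `allowCommit` over a `Map`.

```typescript
// Hypothetical per-PlayerToken bucket for `commit`: 10 tokens/s, burst 20.
export class TokenBucket {
  private readonly ratePerSec: number;
  private readonly capacity: number;
  private tokens: number;
  private lastMs: number;

  constructor(ratePerSec = 10, capacity = 20, nowMs = Date.now()) {
    this.ratePerSec = ratePerSec;
    this.capacity = capacity;
    this.tokens = capacity; // start full so a fresh client gets the whole burst
    this.lastMs = nowMs;
  }

  // Refill for the elapsed time, then spend one token if one is available.
  tryTake(nowMs = Date.now()): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.lastMs = nowMs;
    if (this.tokens < 1) return false; // reject this commit
    this.tokens -= 1;
    return true;
  }
}

const buckets = new Map<string, TokenBucket>();

// Look up (or lazily create) the bucket for a player's token.
export function allowCommit(playerToken: string, nowMs = Date.now()): boolean {
  let bucket = buckets.get(playerToken);
  if (!bucket) {
    bucket = new TokenBucket(10, 20, nowMs);
    buckets.set(playerToken, bucket);
  }
  return bucket.tryTake(nowMs);
}
```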
## Implementation outcomes (2026-04-28 build session)
- 2026-04-28: **Repo:** `git.sethpc.xyz/Seth/blind_chess`. Created via `gitea create blind_chess`. Default branch `main`.
- 2026-04-28: **CT:** 690 on node-241, hostname `blind-chess`, IP 192.168.0.245, Debian 12, Node 22.22.2. 2 cores / 512 MB RAM / 8 GB rootfs. Resting memory ~29 MB, plenty of headroom.
- 2026-04-28: **Chosen `chess.js` v1.4.0** — uses `Move.isEnPassant()` / `isKingsideCastle()` / `isQueensideCastle()` instead of the deprecated `flags` string. The `Move` constructor's deprecated `flags` field is intentionally not relied upon.
- 2026-04-28: **Half-move clock for the 50-move rule** is read from the fifth FEN field (index 4 after splitting on whitespace), since chess.js doesn't expose it directly. See `translator.ts:halfMoveClock`.
- 2026-04-28: **Shared package import resolution**: `packages/shared/package.json` `main` and `exports` point to `./dist/`. Source `.ts` is dev-only. Always run `pnpm --filter @blind-chess/shared build` before `pnpm --filter @blind-chess/server build` (the workspace project refs handle this when running `pnpm -r build`).
- 2026-04-28: **Client routing** is hash-based with a pathname fallback in `App.svelte` so `https://chess.sethpc.xyz/g/<id>` (the share URL) and `https://chess.sethpc.xyz/#/g/<id>` (the post-create URL) both render the game. The Fastify SPA fallback serves `index.html` on any non-matching `text/html` request.
- 2026-04-28: **Click-to-move only** — drag-and-drop deferred. Tap-arm + tap-destination is faithful to the touch-move FSM and works identically on phone and desktop.
- 2026-04-28: **WS path through Caddy**: `wss://chess.sethpc.xyz/ws?game=<id>` works without explicit `transport ws` config. Caddy's reverse_proxy handles upgrade transparently.
- 2026-04-28: **Public DNS** — relies on existing `*.sethpc.xyz` wildcard pointing at the WAN IP; no Pi-hole entry was needed. Caddy host-routes `chess.sethpc.xyz` to 192.168.0.245:3000.
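Reading the half-move clock from FEN is a split-and-index operation. What follows is a minimal sketch, not a copy of `translator.ts:halfMoveClock`; the validation and the `fiftyMoveRuleReached` helper are assumptions. The 50-move rule counts 50 moves by each side, so it triggers at 100 half-moves without a capture or pawn move.

```typescript
// FEN has six space-separated fields: placement, side to move, castling,
// en passant target, half-move clock, full-move number.
export function halfMoveClock(fen: string): number {
  const fields = fen.trim().split(/\s+/);
  const clock = Number(fields[4]); // fifth field, index 4
  if (fields.length < 6 || !Number.isInteger(clock) || clock < 0) {
    throw new Error(`malformed FEN: ${fen}`);
  }
  return clock;
}

// Hypothetical helper: 100 half-moves = 50 full moves with no capture or pawn move.
export const fiftyMoveRuleReached = (fen: string): boolean => halfMoveClock(fen) >= 100;
```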
## AI / computer player
Spec: `docs/superpowers/specs/2026-04-28-ai-player-design.md`. **Phase 1 (Casual bot) deployed 2026-04-28** — live at https://chess.sethpc.xyz "Play vs computer". Phase 2 (Recon) deferred until Phase 1 has soaked.
- 2026-04-28: **Two AI bots, phased delivery**: `CasualBrain` (Phase 1, algorithmic, in-process) ships first; `ReconBrain` (Phase 2, `gemma4:26b` chat agent) ships second. Phased to keep research uncertainty (Recon's actual playing strength) from blocking shipping anything. Rejected: combined launch, single difficulty-dial UX, throwaway Casual-as-stub.
- 2026-04-28: **Bots use the same view filter as humans**: `BotDriver` calls `buildView(game, botColor)`; bot input is filtered `BoardView` + `Announcement[]`. No oracle access. Preserves the architectural invariant: the view filter is the only egress for board state, even for in-process bots. Rejected: "easy mode" oracle access for Casual to keep it simple.
- 2026-04-28: **In-process virtual players, not external WS clients**: `BotDriver` lives in the existing Fastify server, dispatches actions through the same `commit` handler humans use. One process, no new deploy targets. Rejected: external bot processes (more operational surface, no real benefit), hybrid Casual-in-process / Recon-external (asymmetric for no gain).
- 2026-04-28: **Recon bot is a stateful chat agent, not stateless** — per-game chat history persists across turns as the bot's private memory. Each turn appends user (new view + announcements + candidates) + assistant (reasoning + move). Reasoning is hidden from the human during play, revealed in collapsible post-game panel. Rejected: stateless one-shot move-picker (loses belief-tracking across turns), revealing reasoning during play (would leak strategic intent).
- 2026-04-28: **Endpoint priority: steel141 RTX 3090 Ti primary, pve197 V100 fallback** — preflight on game creation; mid-game failover allowed once (one-way). Rationale: 3090 Ti benchmarks at ~134 tok/s on `gemma4:26b`; V100 estimated ~80 tok/s. Both have the model present. Rejected: no failover (worse UX), bidirectional flap (more complexity, no real benefit).
- 2026-04-28: **GPU shown to user** — persistent badge under AI's slot reads `"gemma4:26b · RTX 3090 Ti"` (or V100 / failed-over variant). Game-start moderator-panel UI message explicitly names the model + host. Rationale: chess.sethpc.xyz is a personal homelab site; surfacing the hardware is brand-appropriate and gives honest feedback when fallback engages. Rejected: hiding the GPU (would be opaque on slow V100 fallback).
- 2026-04-28: **`gemma4:26b` model choice** — sweet spot per gemma4-research: ~134 tok/s decode on 3090 Ti (4.7× faster than 31B), MoE 3.8B active, vision-capable (not used here). Rejected: 31B (5× slower, marginal strength gain not worth latency), e4b (too small for this task).
- 2026-04-28: **Per-move latency budget: 30s normal, 90s first-move** — first-move headroom covers cold-start (steel141 keep_alive=30m policy, ~30-60s reload after idle). Beyond 90s, treat as endpoint failure → failover. Rejected: tighter cap (false-positives on cold start), looser cap (UX death).
- 2026-04-28: **Recon "done" bar: ≥60% wins over 50 Recon-vs-Casual self-play games** — concrete, measurable acceptance bound. If Recon misses 60% but stays above 40%, expect a prompt-engineering rabbit hole; below 40% is a design signal (try 31B or feed a textual board representation). Self-play harness lives in `scripts/selfplay.ts`, not in CI. Rejected: subjective "feels okay" bar (would let a weak Recon ship), bar against humans (untestable at scale).
- 2026-04-28: **Reasoning hidden during play, revealed post-game** — Gemma's chat history is private during the game; on game end, the chat history is copied to `Game.aiThoughtsLog` and the post-game screen shows a collapsible "View gemma4's reasoning" section. Rejected: live streaming "thinking tokens" to user (leaks strategy), permanent hiding (loses showcase value of the project).
- 2026-04-28: **`vsAi` field added to `CreateGameRequest`; `aiInfo` field added to `joined`/`update` server messages; `'ai_unavailable'` added to `EndReason`** — minimal protocol surface for the feature. AI metadata is NOT in `ModeratorText` enum (kept clean). UI-system messages for game-start info and failover events are style-distinct from `Announcement` entries.
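The latency budget and the one-way failover combine into a small state machine. The bot tries the current endpoint under a 30s deadline (90s on the first move). On a timeout or error it switches to the fallback exactly once, and it never switches back. This is a sketch under stated assumptions: `AskMove`, `EndpointPicker`, and the choice to give the fallback the cold-start budget on its first call are hypothetical, not taken from the spec.

```typescript
// Hypothetical shape of one model endpoint (steel141 or pve197): prompt in, reply out.
type AskMove = (prompt: string, signal: AbortSignal) => Promise<string>;

interface Budgets { normalMs: number; firstMoveMs: number }
const DEFAULT_BUDGETS: Budgets = { normalMs: 30_000, firstMoveMs: 90_000 };

// Reject if `ask` has not answered within `ms`, and abort the in-flight request.
function withTimeout(ask: AskMove, prompt: string, ms: number): Promise<string> {
  const ctrl = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined = undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      ctrl.abort();
      reject(new Error(`no move within ${ms} ms`));
    }, ms);
  });
  return Promise.race([ask(prompt, ctrl.signal), deadline]).finally(() => clearTimeout(timer));
}

export class EndpointPicker {
  private readonly primary: AskMove;
  private readonly fallback: AskMove;
  private readonly budgets: Budgets;
  private failedOver = false;

  constructor(primary: AskMove, fallback: AskMove, budgets: Budgets = DEFAULT_BUDGETS) {
    this.primary = primary;
    this.fallback = fallback;
    this.budgets = budgets;
  }

  get onFallback(): boolean {
    return this.failedOver;
  }

  // A rejection here means both endpoints are exhausted; the caller would
  // then end the game with 'ai_unavailable'.
  async move(prompt: string, isFirstMove: boolean): Promise<string> {
    const budget = isFirstMove ? this.budgets.firstMoveMs : this.budgets.normalMs;
    if (this.failedOver) {
      // Already on the fallback: no second failover and no flap back.
      return withTimeout(this.fallback, prompt, budget);
    }
    try {
      return await withTimeout(this.primary, prompt, budget);
    } catch {
      this.failedOver = true; // one-way switch for the rest of the game
      // Assumption: the fallback may be cold, so its first call gets the cold-start budget.
      return withTimeout(this.fallback, prompt, this.budgets.firstMoveMs);
    }
  }
}
```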
### Phase 1 implementation outcomes (2026-04-28)
- 2026-04-28: **Phase 1 shipped to https://chess.sethpc.xyz.** 13 implementation tasks executed via subagent-driven development against `docs/superpowers/plans/2026-04-28-ai-player-phase-1-casual.md`. 75 tests passing (21 shared + 54 server). Live smoke checklist passed.
- 2026-04-28: **CasualBrain reversal — vanilla mode now uses `js-chess-engine` (level 2, randomness=30), not the hand-rolled scorer.** The original heuristic could only break even against a random-move baseline (7-7 in 100-game self-play; target was ≥80% wins). After the swap, Casual wins 97% as white and 96% as black vs Random, at ~5-30ms/move. This supersedes the spec's "no Stockfish" decision in spirit: `js-chess-engine` is MIT-licensed, ~400KB, has no native deps, and at level 2 plays at "Casual" strength (beats random comfortably, loses to a careful human). The original rejection of "Stockfish for strong vanilla AI" was about *strength*, not about *using a pre-made engine*. Documented and pushed; accepted as a learning.
- 2026-04-28: **Bot's BoardView is the only egress to the engine.** `BrainInput.fen` is set ONLY in vanilla mode (where the view is full reveal); blind mode omits it. Engine cannot smuggle opponent positions past the view filter — same architectural invariant the brainstorming session established for human-played blind chess.
- 2026-04-28: **Blind mode keeps the heuristic (not engine).** Architecturally Stockfish/js-chess-engine can't usefully play blind chess — they need a full board to evaluate, and giving them one would be oracle access. Building a belief-state from announcements is the Recon bot's design (Phase 2). Self-play confirmed blind heuristic completes games (avgPly=16, 0 errors, all decisive) — short games but functional.
- 2026-04-28: **Bot-slot synthetic token is randomized, not a fixed placeholder.** Using a hard-coded placeholder ("botxxxxxxxxxxxxxxxxxxxxx") would let any client knowing it claim the bot's color via `hello`. Random tokens (same shape as human tokens) close that hole. Caught in code review of Task 7.
- 2026-04-28: **`endGame` and `finalizeIfEnded` extracted from `ws.ts` to `packages/server/src/game-end.ts`.** Both `ws.ts` and `bot/driver.ts` need to set the game-finished state — duplication risk. Hoist resolves it.
## Table-fidelity features (2026-05-18)
Spec: `docs/superpowers/specs/2026-05-18-table-fidelity-features-design.md`. Plan: `docs/superpowers/plans/2026-05-18-table-fidelity-features.md`. Three features requested by Andrew Freiberg (a physical-game player); shipped to `main` 2026-05-18, 12 tasks via subagent-driven development. 87 tests passing (25 shared + 62 server).
- 2026-05-18: **All moderator announcements are `audience: 'both'`** — every move event and every attempted-move error reaches both players, faithful to the physical game where the moderator speaks aloud. A deliberate, authorised widening of the moderator channel (it makes blind mode slightly less blind — e.g. you hear "won't help you" on the opponent's turn). The `audience` field is retained (now uniformly `'both'`) as the egress-control hook in `ws.ts` / `ModeratorPanel`.
- 2026-05-18: **Bot intermediate retry-rejection announcements are popped in `BotDriver.dispatch`** — the blind Casual bot's retry search would otherwise broadcast up to 25 churn announcements per turn. Only the bot's final committed move is announced. Human probes (13 pieces, human-paced) still broadcast — that is the feature.
- 2026-05-18: **Capture tally is a server-derived per-viewer `captures` field on `joined`/`update`**, not a `ModeratorText` enum entry — the announcement vocabulary stays a pure event enum; the tally is structured data (`CaptureTally = { byYou, byOpponent }`). Must be server-side: in blind mode the capturing client can't see what it took.
- 2026-05-18: **Phantom opponent-piece layer is 100% client-local** — never sent to the server, persisted only to `localStorage` (`bc:phantoms:<gameId>`), in its own store (`phantoms.svelte.ts`) separate from the protocol store so the zero-leak property is auditable. Blind mode only. `buildView` / `geometric.ts` untouched.
- 2026-05-18: **Manual phantom model** — seeded once with the opponent's standard starting army, then fully manual: drag anywhere, drag off-board to remove, re-add from an unlimited palette, no automation. Rejected: a "smart tracker" that auto-removes on capture and tracks promotions (Seth chose the manual model).
- 2026-05-18: **Phantom manipulation is pointer-event drag-and-drop** with a tap-vs-drag threshold so a tap still makes a real move. Real chess moves stay click-to-move — the deferred drag-and-drop decision for *real* moves still stands; Feature 3's drag is phantom-only.
- 2026-05-18: **Client has no unit-test harness** (deliberate) — Feature 3's testable pure logic (`opponentStartPosition`, `deserializePhantoms`) lives in `packages/shared` and is unit-tested there; Svelte components/stores are covered by `svelte-check` typechecking plus manual verification.
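Because the capture tally is per-viewer, the server can keep one color-tagged capture list per game and project it for each recipient when it builds `joined`/`update`. This minimal sketch assumes that `CaptureTally`'s fields are lists of captured piece types, since the log gives only the field names. `CaptureEvent` and `captureTallyFor` are hypothetical.

```typescript
type Color = "w" | "b";
type PieceType = "p" | "n" | "b" | "r" | "q" | "k";

// Hypothetical server-side record of one capture: who captured, and what was taken.
interface CaptureEvent { by: Color; piece: PieceType }
interface CaptureTally { byYou: PieceType[]; byOpponent: PieceType[] }

// Project the game's capture list from one viewer's perspective.
export function captureTallyFor(viewer: Color, events: readonly CaptureEvent[]): CaptureTally {
  const tally: CaptureTally = { byYou: [], byOpponent: [] };
  for (const e of events) {
    (e.by === viewer ? tally.byYou : tally.byOpponent).push(e.piece);
  }
  return tally;
}
```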
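Persisting the phantom layer takes two pure functions and a `localStorage` key: a seed for the opponent's starting army and a tolerant parser for whatever happens to be stored. This hypothetical sketch borrows the names `opponentStartPosition` and `deserializePhantoms` from the decision above. Their signatures, the square-to-piece record shape, and `phantomKey` are all assumptions, and the real functions in `packages/shared` may differ.

```typescript
type Color = "w" | "b";
type PieceType = "p" | "n" | "b" | "r" | "q" | "k";
type Phantoms = Record<string, PieceType>; // square ("e7") -> phantom piece

const BACK_RANK: PieceType[] = ["r", "n", "b", "q", "k", "b", "n", "r"];
const FILES = "abcdefgh";
const SQUARE_RE = /^[a-h][1-8]$/;
const PIECES = new Set<string>(["p", "n", "b", "r", "q", "k"]);

export const phantomKey = (gameId: string): string => `bc:phantoms:${gameId}`;

// Seed: the opponent's standard army on the far side of the board.
export function opponentStartPosition(myColor: Color): Phantoms {
  const backRank = myColor === "w" ? 8 : 1;
  const pawnRank = myColor === "w" ? 7 : 2;
  const seed: Phantoms = {};
  BACK_RANK.forEach((piece, i) => {
    const file = FILES.charAt(i);
    seed[`${file}${backRank}`] = piece;
    seed[`${file}${pawnRank}`] = "p";
  });
  return seed;
}

// Returns null when nothing usable is stored (caller re-seeds); silently drops
// entries with an invalid square or piece instead of failing the whole layer.
export function deserializePhantoms(raw: string | null): Phantoms | null {
  if (raw === null) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return null;
  const out: Phantoms = {};
  for (const [sq, piece] of Object.entries(parsed)) {
    if (SQUARE_RE.test(sq) && typeof piece === "string" && PIECES.has(piece)) {
      out[sq] = piece as PieceType;
    }
  }
  return out;
}
```

On the client, loading would look like `deserializePhantoms(localStorage.getItem(phantomKey(gameId))) ?? opponentStartPosition(myColor)`, and each edit would write `JSON.stringify(phantoms)` back under the same key. None of this touches the protocol store.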
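The tap-vs-drag threshold can be sketched as a small pointer tracker. If the pointer moves more than a few pixels between `pointerdown` and `pointerup`, the gesture is a phantom drag; anything less stays a tap and feeds the normal click-to-move flow. The 8px threshold and the `TapOrDrag` class are assumptions, because the log does not record the real value.

```typescript
// Hypothetical threshold; the log does not give the real value.
const DRAG_THRESHOLD_PX = 8;

interface Point { x: number; y: number }

export class TapOrDrag {
  private start: Point | null = null;
  private dragging = false;

  // pointerdown: remember where the gesture began.
  down(p: Point): void {
    this.start = p;
    this.dragging = false;
  }

  // pointermove: once past the threshold, the gesture stays a drag.
  move(p: Point): boolean {
    if (this.start && !this.dragging) {
      this.dragging = Math.hypot(p.x - this.start.x, p.y - this.start.y) > DRAG_THRESHOLD_PX;
    }
    return this.dragging;
  }

  // pointerup: "tap" -> real click-to-move; "drag" -> drop the phantom.
  up(): "tap" | "drag" | null {
    if (!this.start) return null;
    const result = this.dragging ? "drag" : "tap";
    this.start = null;
    this.dragging = false;
    return result;
  }
}
```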
## Deferred / Rejected
<!-- Decisions NOT to do something are just as valuable -- prevents re-proposing rejected ideas -->
- 2026-04-28: **Tactical-advice interpretation of "won't help you"** — rejected. The phrase is a check-resolution announcement, not engine evaluation. Subjective "this move is bad" is anti-fun and out of scope.
- 2026-04-28: **Spectator mode** — deferred. Single-use links and no spectators in MVP. Revisit if there's demand.
- 2026-04-28: **Time controls (clocks)** — deferred. Untimed correspondence-style for MVP. Optional 5+0 / 10+0 / 15+10 in a follow-up if Seth wants.
- 2026-04-28: **SQLite persistence** — deferred. In-memory only for MVP. Add when crash recovery becomes painful (1-day implementation: serialize Map on `ExecStop`, deserialize on `ExecStart`).
- 2026-04-28: **End-to-end browser tests (Playwright)** — out of scope for MVP. Protocol-level integration tests cover the same drift surface for ~10× less maintenance. Manual phone+desktop testing suffices.
- 2026-04-28: **Vanilla-only or blind-only MVP** — rejected in favor of both-from-day-one. The shared engine + view-filter architecture means vanilla is essentially free.
- 2026-04-28: **Authentik gate on `chess.sethpc.xyz`** — rejected. The hashed link IS the auth; an additional gate would be friction with no security benefit (link guessing is already infeasible).
- 2026-04-28: **CI/CD automation** — deferred. Manual `pnpm -r build` + `rsync` + `systemctl restart` is fine for a hobby project. Add Gitea Actions later if deploy friction grows.
- 2026-04-28: **Move log / PGN export, post-game replay** — deferred. Announcements are persisted in-game (so the moderator-panel scrollback works); export and replay are post-MVP.
- 2026-04-28: **Public lobby / matchmaking / ratings** — out of scope. This is a private-link game, not a chess site.
- 2026-04-28: **Pre-deploy "server restarting" warning to active players** — stretch goal, not MVP. Mitigation for now: deploy during low-usage windows.
- 2026-04-28: ~~**Client-side AI / hint generation** — explicitly out of scope. Human vs. human only.~~ **Partially superseded 2026-04-28** by AI-player spec. Reversal applies *only* to the human-vs-AI path; client-side AI / hint generation in human-vs-human games remains rejected.
- 2026-04-28: **Difficulty slider for AI** — rejected. Two named buttons (Casual, Recon) only. No continuum; the two bots are architecturally different, not tuneable strengths of the same engine.
- 2026-04-28: ~~**Stockfish for vanilla-mode AI strength** — deferred. Vanilla is a side-effect, not a feature target. Revisit if users explicitly ask for strong vanilla AI.~~ **Partially superseded 2026-04-28** during Phase 1 implementation — using `js-chess-engine` (smaller, MIT, no GPL concerns) at level 2 for Casual vanilla, capped at ~30ms/move. The original rejection was about not making Casual *strong*; the engine at level 2 is genuinely casual-strength while still beating random comfortably. Stockfish itself remains rejected (GPL, 7MB+ wasm, overkill for the strength target).
- 2026-04-28: **Live token streaming during Gemma's thinking** — rejected for MVP. Static "AI is thinking..." indicator only. Streaming would leak strategic intent and adds protocol complexity.
- 2026-04-28: **Mid-game GPU flap-back** — rejected. Once failed over to V100, stays there for the rest of the game even if steel141 recovers. Simpler, more predictable, and chat-history is mid-flight.
- 2026-04-28: **AI vs AI public spectate-able games** — rejected for MVP. Self-play harness is CLI-only (`scripts/selfplay.ts`).
- 2026-04-28: **Per-turn context compaction** — deferred. Spec uses `num_ctx: 32768` which covers ~128 turns; longer games would overflow but are rare in casual play. Add running-summary compaction if seen in practice.
- 2026-04-28: **Bot rating / Elo / personalities** — out of scope. Two named buttons, no scoreboard.
- 2026-04-28: **In-game chat (player ↔ player and human ↔ Gemma)** — deferred indefinitely. Two failure modes drove the deferral: (1) blind-mode chat is a side channel that bypasses the moderator-vocabulary security boundary ("knight on c3, take it" defeats the entire view-filter architecture); (2) chatting with Gemma during play leaks the bot's belief state and undermines the post-game reasoning reveal. Resolvable but expensive (two-history split for Gemma, blind-mode mute or social-variant warnings, mobile UI real estate). Revisit only if users explicitly ask. The post-game reasoning reveal already covers most of the "see what Gemma was thinking" appeal without the leak surface.
- 2026-05-18: **Smart-tracker phantom model** (auto-remove a phantom on capture, track promotions, constrain the phantom set to the opponent's surviving army) — rejected in favour of the fully-manual model. More code and more edge cases; Seth wanted the manual ritual.
- 2026-05-18: **Highlighting interacting with phantoms** (bishop/rook rays stopping at phantom pieces) — deferred. Safe to do (phantoms carry zero real opponent info) but out of scope for v1; phantoms are a pure annotation layer that highlighting ignores.
- 2026-05-18: **Phantom-layer `localStorage` cleanup for abandoned games** — deferred. `clearForGame` only fires when the game reaches `finished` while `<Game>` is mounted; a tab closed mid-game leaves a stale `bc:phantoms:<id>` key. Each entry is a tiny JSON object; add a stale-key sweep on app start only if it ever matters.