Commit Graph

68 Commits

Author SHA1 Message Date
Seth 3510f0f571 feat(oracle): scaffold project + mineflayer spectator bot
Adds oracle-bot/ with package.json and bot.js. OracleBot connects to
the Paper 1.21.11 dev server (offline auth), auto-enters spectator mode,
exposes getPlayers/scanArea/getNearbyEntities/getWorldInfo/followPlayer,
and reconnects with exponential backoff (1s→30s max).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 04:07:06 -04:00
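
Note on 3510f0f571: the reconnect schedule "1s→30s max" can be sketched as below. This is an illustration only, written in Python rather than the bot's JavaScript; the doubling factor and the name `backoff_delays` are assumptions, since the commit gives only the first delay and the cap.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(base: float = 1.0, cap: float = 30.0,
                   factor: float = 2.0) -> Iterator[float]:
    """Yield reconnect delays in seconds, growing by `factor` and clamped at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# First seven delays under the assumed doubling factor.
print(list(islice(backoff_delays(), 7)))
# prints [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

A real client would typically reset the generator after a successful login so the next outage starts again at 1s.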
Mortdecai 924f16b9da 22-tool architecture: log.query, user.ask, journal system deployed
New tools implemented and deployed to dev gateway:
- log.query: focused event queries (chat/deaths/joins/actions), replaces 200-line dump
- user.ask: risk-scaled clarifying questions, async with tellraw
- journal.read/write: per-player files, cross-mode (God+Sudo share)

All wired into langgraph_gateway.py _execute_tool and model-driven tool loop.
Tool schemas updated (22 total). Deployed to CT 644 dev server.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 21:04:01 -04:00
Mortdecai 9c2c9a2310 1200+ distilled gold examples, journal system, redstone mastery, safety awareness
Distilled Training Data (1,203 examples):
- 341 initial gold (plugins, enchantments, builds, effects, god, errors)
- 165 buildings + pipeline (100 structures built on dev, 65 request→query→act)
- 24 safety-aware (worldborder, safe tp, intentional harm, gamemode checks)
- 17 advanced logic (decanonized items, redstone gates, iterative builds)
- 12 redstone mastery (NOT/OR/AND/XOR/RS-latch/T-flip-flop/comparator/clock)
- 7 circuit verification and diagnosis
- 1 compact comparator gates
- 10 redstone methodology (build→test→save→recall→learn from mistakes)
- 8 player journal usage
- 29 creative+uncommon+pipeline+god with full tool chains

Player Journal System:
- agent/tools/player_journal.py — per-player text files (1-10 lines)
- journal.read + journal.write tool schemas added
- Cross-mode: God and Sudo share the same journal per player
- Includes sentiment, relationship, builds, preferences, skill level

Redstone Engineering:
- agent/prompts/redstone_rules.md — baked-in wall torch, dedicated lead, repeater rules
- Learned from 4 iterations of 8-switch circuit: wall_torch on back face, not top
- T-junction bypass prevention: dedicated lead wire between merge and NOT block
- RCON limitation: can build circuits but cannot test them (lever toggle doesn't propagate)

Training Data Cleaning:
- 466 @s→@p fixes, 10 template commands removed
- 12 outdated refusals replaced with correct plugin commands
- Data de-duped across all sources

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 20:50:52 -04:00
Mortdecai d9acb653fe Fix chart labels, add version history table to README
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 15:48:35 -04:00
Mortdecai b6fbfac2ae Add README with training progress chart and bake-off results
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 15:31:39 -04:00
Mortdecai f5118505b1 0.5.0 bake-off results, knowledge lookup tools, training progress chart
Bake-off (0.5.0 vs 0.4.0):
- Overall: 46.8% vs 45.2% (+1.6 pp), 0 errors vs 2
- Enchantments: +47 pp (20% → 67%)
- EssentialsX: +60 pp (0% → 60%)
- Effects: +25 pp (0% → 25%)
- Regressions: fill_build -67%, world -20%

Knowledge Lookup Tools (4 new):
- plugin.docs_lookup: WorldGuard, WorldEdit, CoreProtect, EssentialsX, LuckPerms docs
- minecraft.changelog_lookup: version history from Minecraft Wiki
- paper.docs_lookup: Paper server-specific documentation
- Wired into gateway model-driven tool loop and exploration self-play

Exploration Self-Play:
- General (vanilla MC) and plugins focus modes
- Wiki-grounded: model researches before acting, validates through RCON
- 2,243 exploration examples generated, 150 kept after quality filtering

Training Progress Chart:
- SVG chart showing training examples and inverse loss across versions
- Added to MODEL_CARD.md for Gitea display

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 15:28:09 -04:00
Mortdecai da8f557219 GPU scheduler, 14-tool architecture, plugin deployment, event dispatcher
GPU Scheduler (gpu.sethpc.xyz):
- Live dashboard with 4 GPUs, training monitor, loss sparklines
- Preset-based job scheduler with 3 triggers (time, finish_training, cost)
- Model selection per GPU, pipeline configuration
- Tool self-play and training pipeline types
- Behind Google OAuth, live-refresh without page reload

Tool Architecture (14 tools):
- 3 new tools: world.nearby_entities, memory.read, memory.write
- 7 script.* tools: write, validate, execute, read, list, delete, schedule
- ScriptManager: full mcfunction datapack CRUD with RCON validation
- Training data: 1,430 tool examples (up from 1,159)

Plugin Deployment (paper-ai-25567):
- WorldGuard 7.0.12, CoreProtect CE 23.1, EssentialsX 2.21.2, Vault 1.7.3
- Fresh greenfield world reset
- 104 RCON-validated plugin training examples

Event Dispatcher:
- Watches server log for deaths, joins, advancements, PvP kills
- Configurable trigger probability and cooldowns per event type
- Deployed to dev server, fires god_system prompts on events
- 21 event-response training examples

Training Infrastructure:
- train_lora.py: --save-steps 50, --resume from checkpoint
- run_training.sh: stops Ollama, activates conda, restarts after
- Passwordless sudo for ollama services on steel141
- Dev server added to MCSManager with autoStart

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 03:14:45 -04:00
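
Note on da8f557219: the event dispatcher's "configurable trigger probability and cooldowns per event type" might look roughly like this. `EventTrigger`, the injected `rng`/`clock` parameters, and the example settings are all hypothetical; the commit does not show the dispatcher's code or its actual values.

```python
import random
import time
from typing import Callable


class EventTrigger:
    """Decides whether one event type should fire a prompt (hypothetical helper)."""

    def __init__(self, probability: float, cooldown_s: float,
                 rng: Callable[[], float] = random.random,
                 clock: Callable[[], float] = time.monotonic):
        self.probability = probability
        self.cooldown_s = cooldown_s
        self._rng = rng
        self._clock = clock
        self._last_fired = float("-inf")  # never fired yet

    def should_fire(self) -> bool:
        now = self._clock()
        if now - self._last_fired < self.cooldown_s:
            return False  # still cooling down
        if self._rng() >= self.probability:
            return False  # lost the dice roll
        self._last_fired = now
        return True


# Assumed per-event settings; the real values are not given in the commit.
TRIGGERS = {
    "death": EventTrigger(probability=0.5, cooldown_s=60),
    "join": EventTrigger(probability=1.0, cooldown_s=300),
}
```

Injecting the clock and random source keeps the decision logic testable without waiting for real cooldowns.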
Mortdecai 434589d098 Prompt pipeline: 1660 generates, bigger GPUs process via Mortdecai
Architecture:
- 1660 Super (qwen3.5:0.8b) generates diverse edge-case prompts
- 2080 Ti / RTX 4000 / 3090 Ti process through Mortdecai + RCON validation
- File-based queue with locking for multi-GPU coordination
- 10 prompt categories targeting known weaknesses

Categories: fill_syntax, enchantments, execute_chains, entity_targeting,
gamerules_timed, memory_commands, creative_prayers, edge_items,
multicommand, natural_language

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 00:08:48 -04:00
Mortdecai 3c1cbfce39 Shared player memory system + 39 training examples
Memory system (agent/tools/player_memory.py):
- Per-server JSON with owner tagging, cross-player references
- Location, preference, fact memory types
- Thread-safe, 50/player 500/server limits
- format_memory_context() injected into LLM prompts

Model output wired (mc_aigod_paper.py):
- memory_write processed → saves to JSON, confirms in chat
- memory_read processed → displays results in chat
- Memory context injected into prayer prompts

39 training examples:
- 7 location saves ("remember this as home")
- 7 location recalls + tp ("tp me home", cross-player)
- 5 memory queries ("what do you know about me")
- 3 memory deletes
- 4 preferences ("I prefer diamond tools")
- 4 facts ("I am building a castle")
- 4 memory-informed commands (give tools for current project)
- 5 edge cases (no memory found, server-wide, overwrite)

Seed dataset: 3,175 examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 23:37:32 -04:00
Mortdecai 8158178a56 Shared player memory system + whitelist migration to CT 650
player_memory.py:
- Per-server JSON with owner tagging, cross-player references
- write/read/delete with thread safety and limits (50/player, 500/server)
- format_memory_context() for LLM prompt injection
- handle_memory_write/read for model output processing
- MODEL_OUTPUT_SCHEMA with commands, memory_write, memory_read, revert_after

mortdecai-sites (CT 650):
- Whitelist app migrated from CT 644, RCON via LAN (192.168.0.244)
- All 4 sites verified: mortdec.ai, docs, git, minecraft

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 23:28:04 -04:00
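
Note on 8158178a56 and 3c1cbfce39: a minimal sketch of a thread-safe store with the stated limits (50 entries per player, 500 per server). The class and method names here are assumptions and JSON persistence is omitted; this is not the contents of player_memory.py.

```python
import threading
from collections import defaultdict
from typing import Any, Optional

MAX_PER_PLAYER = 50    # limit stated in the commit
MAX_PER_SERVER = 500   # limit stated in the commit


class MemoryStore:
    """In-memory stand-in for the per-server memory file (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data = defaultdict(dict)  # owner -> {key: value}

    def write(self, owner: str, key: str, value: Any) -> bool:
        with self._lock:
            entries = self._data[owner]
            total = sum(len(v) for v in self._data.values())
            is_new = key not in entries
            # Overwrites are always allowed; only new keys count against limits.
            if is_new and (len(entries) >= MAX_PER_PLAYER or total >= MAX_PER_SERVER):
                return False
            entries[key] = value
            return True

    def read(self, owner: str, key: str) -> Optional[Any]:
        # Any caller may read any owner's entry (cross-player reference).
        with self._lock:
            return self._data.get(owner, {}).get(key)

    def delete(self, owner: str, key: str) -> bool:
        with self._lock:
            return self._data.get(owner, {}).pop(key, None) is not None
```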
Mortdecai 84036d39ca revert_after in model output + 20 training examples
Model can now output revert_after (seconds) and revert_commands fields.
Python service schedules timer from model's response, not just heuristics.
Players notified of revert countdown. Revert announced when applied.

Training examples: temporary gamerules with explicit/implicit/no duration,
permanent changes (no revert), effects with built-in duration, combined reverts.

Key principle: no duration specified → default 5 min revert for safety.
"permanently"/"forever"/"always" → no revert.
Effects → built-in duration, no revert_after needed.

Seed dataset: 3,136 examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 23:25:20 -04:00
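
Note on 84036d39ca: the "key principle" above can be written as a small decision function. This is a heuristic sketch under assumptions: in the commit the model itself emits `revert_after`, whereas the function name, the regex, and the unit table below are illustrative only.

```python
import re
from typing import Optional

DEFAULT_REVERT_SECONDS = 5 * 60  # "no duration specified → default 5 min revert"
PERMANENT_WORDS = ("permanently", "forever", "always")
_DURATION = re.compile(r"(\d+)\s*(second|sec|minute|min|hour|hr)s?\b")
_UNIT_SECONDS = {"second": 1, "sec": 1, "minute": 60, "min": 60,
                 "hour": 3600, "hr": 3600}


def revert_after_seconds(request: str, is_effect: bool = False) -> Optional[int]:
    """Return seconds until revert, or None when no revert should be scheduled."""
    if is_effect:
        return None  # effects carry their own built-in duration
    lowered = request.lower()
    if any(word in lowered for word in PERMANENT_WORDS):
        return None  # explicit permanence: no revert
    match = _DURATION.search(lowered)
    if match:
        return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return DEFAULT_REVERT_SECONDS  # safety default
```

For example, `revert_after_seconds("disable mobs for 5 minutes")` and `revert_after_seconds("disable mobs")` both give 300, the first from the explicit duration and the second from the default.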
Seth 06b082bd21 0.5.0 pre-training: 9,444 examples, prod pattern fixes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:48:54 -04:00
Seth bd65f4a84c Add LICENSE, MODEL_CARD, requirements, CONTRIBUTING
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:43:21 -04:00
Seth f39809eaca Semver rename: v1-v5 → 0.1.0-0.5.0 across all files
Versioning scheme: semantic versioning (MAJOR.MINOR.PATCH)
- 0.x.0 = pre-release development
- 1.0.0 = first public/monetized release

Renamed everywhere: PLAN.md, training scripts, self-play, overnight script,
status printer, whitelist app, discord bot, all training data references.

Ollama models retagged: mortdecai-v4 → mortdecai:0.4.0
Server configs updated on all three servers.
Self-play restarted with new model name.
Entity targeting + radius-aware kill + distance scale training added.

Seed dataset: 2,503 + tool: 1,159 + self-play: 5,059 = 8,721 total examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:37:14 -04:00
Seth a03c0a8087 17 radius-aware kill examples: context determines blast radius
Radius scales with intent:
- "the zombie" → limit=1,sort=nearest,distance=..10 (surgical, risk 3)
- "all zombies near me" → distance=..30 (area clear, risk 3)
- "everything in the area" → distance=..100 (large, risk 2)
- "every mob everywhere" → no distance cap (risk 1, refuses by default)

Context-aware radius:
- "attacking me" → 15 (melee range)
- "shooting at me" → 20 (bow range)
- "this building" → 25 (structure)
- "whole city" → 500 (massive)
- "the farm" → 30 + specific animal types

Seed dataset: 2,503 examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:27:20 -04:00
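
Note on a03c0a8087: a sketch of how intent could map to a kill selector with a distance cap. The radii come from the commit; the function name, the `execute at <player>` wrapper (used so `distance` is measured from the player), and the lookup table keys are assumptions.

```python
# Context phrases and radii listed in the commit message.
CONTEXT_RADIUS = {
    "attacking me": 15,    # melee range
    "shooting at me": 20,  # bow range
    "this building": 25,
    "whole city": 500,
}


def kill_command(player: str, entity_type: str, radius: int,
                 single: bool = False) -> str:
    """Build a radius-limited kill command centred on `player` (hypothetical helper)."""
    args = [f"type={entity_type}"]
    if single:
        # "the zombie": only the nearest matching entity
        args += ["limit=1", "sort=nearest"]
    args.append(f"distance=..{radius}")
    return f"execute at {player} run kill @e[{','.join(args)}]"


print(kill_command("Steve", "zombie", 10, single=True))
print(kill_command("Steve", "skeleton", CONTEXT_RADIUS["shooting at me"]))
```

The uncapped case ("every mob everywhere") is deliberately not representable here, matching the commit's note that it is refused by default.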
Seth 634f0137bb 10 entity targeting examples: THE zombie vs ALL zombies
Teaches the model to distinguish:
- "kill the zombie" → limit=1,sort=nearest (specific target)
- "kill all zombies" → distance=..30 (area clear)
- "what mobs are nearby" → requires world.nearby_entities tool
- "target the closest enemy" → type=!player,limit=1,sort=nearest

With LangGraph tools enabled, world.nearby_entities gives the model
entity awareness before generating kill commands.

Seed dataset: 2,486 examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:25:03 -04:00
Seth 5c71976a34 22 distance scale examples: 1 block to 30 million
Scale reference baked into training:
- slightly (1-3) → close (5) → nearby (20-30) → far (500) → very far (1000)
- edge of world (29,999,900) → max tp distance
- Vertical: bedrock (-60) → diamond (-59) → sea (63) → clouds (192) → build limit (319)
- Nether 1:8 scale mechanics
- Real world: 1 block = 1 meter, mile = 1609 blocks, marathon = 42195 blocks
- World size: 60M × 60M blocks (about 3.6 billion km², roughly half the surface area of Neptune)
- Chunk = 16 blocks, region = 512 blocks

Seed dataset: 2,476 examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:23:11 -04:00
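
Note on 5c71976a34: the scale rules (Nether 1:8, chunk = 16 blocks, region = 512 blocks, 1 block = 1 meter) reduce to simple integer arithmetic. The helper names are assumptions; floor division is used so negative coordinates round toward negative infinity.

```python
from typing import Tuple

BLOCKS_PER_CHUNK = 16
BLOCKS_PER_REGION = 512
BLOCKS_PER_MILE = 1609  # 1 block = 1 meter


def overworld_to_nether(x: int, z: int) -> Tuple[int, int]:
    """Scale horizontal Overworld coordinates down 8:1 for the Nether."""
    return x // 8, z // 8


def chunk_of(coord: int) -> int:
    """Chunk index containing a block coordinate."""
    return coord // BLOCKS_PER_CHUNK


def region_of(coord: int) -> int:
    """Region index containing a block coordinate."""
    return coord // BLOCKS_PER_REGION


print(overworld_to_nether(800, -1600))  # prints (100, -200)
print(chunk_of(-1), region_of(512))     # prints -1 1
```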
Seth b6e5874a11 45 new examples: chaos events, fireball/projectile mechanics, distance concepts
Chaos events (4): multi-phase dramatic sequences, earthquake, TNT rain, zen transition
Chaos gaps (23): pray thunder/lightning, execute at @a patterns, charged creepers,
  custom NBT fuse, magma blocks, lava, obsidian, music discs, say broadcast
Distance/projectile (18): far/near/close in blocks, fireball Motion+ExplosionPower,
  dragon fireball, wither skull, arrow/trident entities, mob spawn/aggro ranges,
  explosion radius reference (creeper/TNT/wither/bed)

Gateway updated: single-call mode with full tooling on all servers.
Seed dataset: 2,454 examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:20:30 -04:00
Seth 0f043384e5 Self-play: --api-key for authenticated gateway connections
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:40:01 -04:00
Seth aa5400e31e 12 multi-step dependency training examples
Teaches command ordering and dependencies:
- Build structure THEN tp inside (not reverse)
- Apply protection BEFORE spawning hostile mobs
- Create water pool BEFORE dropping player
- Effects before gear (protection active during equip)
- Clear mobs before healing (don't waste heal)
- Cage before tp victim (prevent escape)

Key principle: reasoning explains WHY order matters.
Seed dataset: 2,409 examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 18:43:03 -04:00
Seth ead16fd429 Persistent RCON connections — fixes server crash from connection spam
Root cause: self-play opened/closed a new TCP socket for every RCON command
(hundreds/minute). Paper's RCON listener creates a thread per connection,
overwhelming the server until it stopped.

Fix: PersistentRCON class maintains a single connection per server with
auto-reconnect. Thread-safe via lock. Connection pool keyed by host:port.

Applied to:
- mc_aigod_paper.py (prod paper-ai + dev)
- mc_aigod.py (shrink-world)
- self_play.py (training data generation)
- persistent_rcon.py (shared module)

Before: ~100+ RCON connections/minute → server crash
After: 3 persistent connections total → stable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 18:24:44 -04:00
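
Note on ead16fd429: a sketch of the pattern described (one connection per server, auto-reconnect, lock for thread safety, pool keyed by host:port). It is not the contents of persistent_rcon.py: the `connect` factory and the connection's `send` method are assumed interfaces, injected so the pooling logic stands apart from any particular RCON client.

```python
import threading
from typing import Callable, Dict


class PersistentRCON:
    """Keeps one connection open and reconnects once on failure (illustrative)."""

    def __init__(self, host: str, port: int, connect: Callable):
        self.host, self.port = host, port
        self._connect = connect  # assumed: connect(host, port) -> object with .send(cmd)
        self._conn = None
        self._lock = threading.Lock()

    def command(self, cmd: str) -> str:
        with self._lock:  # one command at a time per server
            for _ in range(2):  # initial try plus one reconnect
                if self._conn is None:
                    self._conn = self._connect(self.host, self.port)
                try:
                    return self._conn.send(cmd)
                except OSError:
                    self._conn = None  # drop the dead socket and retry
            raise ConnectionError(f"RCON unreachable at {self.host}:{self.port}")


_pool: Dict[str, PersistentRCON] = {}
_pool_lock = threading.Lock()


def get_rcon(host: str, port: int, connect: Callable) -> PersistentRCON:
    """Return the shared client for host:port, creating it on first use."""
    key = f"{host}:{port}"
    with _pool_lock:
        if key not in _pool:
            _pool[key] = PersistentRCON(host, port, connect)
        return _pool[key]
```

Because every caller for the same host:port receives the same object, the number of open sockets is bounded by the number of servers rather than the number of commands.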
Seth 67179f75ad Self-play data + mortdecai-sites container + Grafana 3-GPU dashboard
Self-play: 218+ examples from overnight 3-GPU run (3090 Ti + 2080 Ti + RTX 4000)
Now running independently per GPU (no synchronization bottleneck)
50 rounds/tier, 0.1s sleep — near 100% GPU utilization

Infrastructure:
- CT 650 (mortdecai-sites) on pve112: landing page + docs + Gitea
- mortdec.ai landing page live
- docs.mortdec.ai MkDocs with Material theme
- git.mortdec.ai Gitea instance (fresh, needs admin setup)
- GPU exporter on RTX 4000 (node-197)
- Mortdecai GPU Monitoring dashboard in Grafana (all 3 GPUs)
- DNS updated via SethDDNS (GCP + Cloudflare)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 08:06:51 -04:00
Seth 25918b5b66 Self-play: 50 rounds, 0.1s sleep, max GPU utilization
Bumped from 20 rounds/tier to 50. Reduced sleep from 1s to 0.1s.
GPUs should run near 100% — Ollama queues requests internally.
mortdecai-sites container (CT 650) created on pve112.
Landing page live at mortdec.ai.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 07:36:01 -04:00
Seth dcc40a0bf8 Mortdecai v4 bake-off: 75.5% cmd match, 99.7% safety, 4.0s avg
2,397 test cases on steel141 RTX 3090 Ti:
- Command match: 75.5%
- Exact match: 22.9%
- Syntax correct: 80.5%
- Safety compliance: 99.7%
- No gratuitous tp: 98.5%
- Avg latency: 4006ms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 05:55:14 -04:00
Seth 027b835286 Session final: bakeoff fix, branding fonts, 3-GPU parallel self-play
Current running state:
- Prod: mortdecai-v4 on RTX 4000, single-call, error correction, fall protection
- Dev: Gemini 3.1 Flash Lite (preview) + 5 bots generating training data
- Bake-off: v4 running on steel141 (3090 Ti)
- Self-play: ready for overnight — 3 GPUs parallel (3090 Ti + 2080 Ti + RTX 4000)

Changes:
- Bakeoff parser: strips think blocks, handles dict/list types
- Branding fonts: Rajdhani-Bold (official), Exo2, Orbitron, Oxanium, SpaceGrotesk
- Gemini 3.1 pricing added to cost tracker

Active data collection:
- Gemini 3.1 Flash Lite bots on dev ($20 budget, ~$4/hr with 5 bots)
- Self-play overnight: 3 tiers × 3 GPUs = ~9x throughput
- Training audit logging on all servers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 00:56:45 -04:00
Seth 3580d350b4 Parallel 3-GPU self-play: all tiers run simultaneously
Each cycle runs all three tiers at the same time on different GPUs:
- Tier 1 (drills) on GPU A
- Tier 2 (self-critique) on GPU B
- Tier 3 (adversarial) on GPU C
GPU assignments rotate each cycle for even wear.
3x throughput vs sequential. RCON handles concurrent commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 00:55:24 -04:00
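
Note on 3580d350b4: the rotating tier-to-GPU assignment can be sketched as a modular shift. The tier names are from the commit; the GPU labels and function name are placeholders.

```python
from typing import Dict

TIERS = ["drills", "self-critique", "adversarial"]
GPUS = ["A", "B", "C"]  # placeholder labels for the three Ollama instances


def assignments(cycle: int) -> Dict[str, str]:
    """Map each tier to a GPU, shifting by one GPU per cycle."""
    return {tier: GPUS[(i + cycle) % len(GPUS)] for i, tier in enumerate(TIERS)}


for cycle in range(3):
    print(cycle, assignments(cycle))
```

After three cycles every tier has run once on every GPU, which is what gives the even distribution the commit mentions.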
Seth de14f4a1c8 3-GPU overnight self-play: 3090 Ti + 2080 Ti + RTX 4000
Round-robin load balancing across three Ollama instances:
- 141:11434 (RTX 3090 Ti 24GB)
- 141:11435 (RTX 2080 Ti 11GB) — new second instance
- 179:11434 (RTX 4000 16GB)

Each tier cycles to a different GPU. 3x throughput overnight.
Cycles: Tier 1 drills → Tier 2 self-critique → Tier 3 adversarial → repeat

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 00:54:29 -04:00
Seth 9ef5ab5aa4 PLAN.md complete update — v4 deployed, all session work documented
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 00:49:57 -04:00
Seth 7ae9a499fa 26 death/environment training examples, Mortdecai v4 deployed
Death mechanics training:
- Drowning (3): water trap, water breathing, emergency rescue
- Lava (3): lava pool, fire resistance, emergency rescue
- Void (2): below Y=-64, void damage explanation
- Explosion (3): TNT, charged creeper, bed in nether
- Mob proximity (5): warden, zombie range, skeleton range, mob surround, combat buffs
- Starvation (2): hunger effect, food bar mechanics
- Contact damage (3): cactus, magma blocks, berry bushes
- Lightning (2): direct strike, thunderstorm combo
- Environment awareness (3): safety check, time query, night danger

Mortdecai v4 deployed to prod (paper-ai + shrink-world)
Dev on Gemini 3.1 Flash Lite with 5 bots ($20 budget, ~5hr)
Seed dataset: 2,397 examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 00:26:50 -04:00
Seth d7138b3514 33 fall safety + suffocation training examples, fall damage test data
Fall safety (25 examples):
- Fall damage math (distance-3 = damage, 23 blocks = lethal)
- Water/slime/hay/cobweb negate or reduce fall damage
- Intent detection: "drop me" = no protection, "tp me up" = add slow_falling
- Height-specific: 4m trivial, 10m hurts, 20m+ needs protection
- Surface awareness: water safe, lava half damage + burn

Suffocation (8 examples):
- TP into solid block = suffocation (half a heart/0.5s)
- Sand/gravel crushing (gravity blocks)
- Obsidian trap, underground tp
- Safety: don't tp into blocks unintentionally

Raw fall damage test results from dev server (noisy but informative)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 00:07:36 -04:00
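
Note on d7138b3514: the fall damage rule "distance-3 = damage, 23 blocks = lethal" as arithmetic. Damage is in health points (a full health bar is 20 points); the function names are assumptions, and modifiers such as water, slime, hay, or armor enchantments are ignored.

```python
def fall_damage(blocks_fallen: float) -> float:
    """Health points lost for an unprotected fall onto a plain solid block."""
    return max(0, blocks_fallen - 3)


def is_lethal(blocks_fallen: float, health: float = 20) -> bool:
    """True when the fall removes at least `health` points."""
    return fall_damage(blocks_fallen) >= health


print(fall_damage(4), fall_damage(10), fall_damage(23))  # prints 1 7 20
print(is_lethal(22), is_lethal(23))                      # prints False True
```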
Seth 98d035439d PLAN.md complete rewrite — Mortdecai project status, TODOs, risk hierarchy
Full rewrite reflecting current state:
- Model history v1→v4, infrastructure map, API spend
- Training data breakdown (3,477 total examples)
- Active TODOs: immediate, short-term, v5, infrastructure, community
- Risk hierarchy with permanence-based levels
- Key architecture decisions log
- Success criteria: v3 actual → v4 target → v5 goal
- Single-call enabled on prod (mortdecai-v3)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 23:45:03 -04:00
Seth 4fc94170e4 Gamerule revert timers, drop/height training, revert_after field for v5
Python revert system (live on prod):
- Gamerule changes auto-revert after default timeout (5-10 min)
- User can specify duration: "disable mobs for 5 minutes"
- "permanently"/"forever" skips revert
- Setting back to default cancels pending revert
- Players notified of revert countdown

Training data (20 examples):
- 8 revert-aware gamerules with revert_after/revert_commands fields
- 12 drop/height/tp examples: intentional drops, safe tp, context-aware

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 23:42:22 -04:00
Seth edfc365c5f Dangerous effect caps: levitation 15s, wither 30s, poison 60s, nausea 30s
Validator hardcodes maximum durations for dangerous effects:
- Levitation: 15s max (player floats into sky and dies from fall)
- Wither: 30s max (drains health, can kill)
- Poison: 60s max
- Nausea: 30s max

12 training examples: levitation safety, emergency clear, duration caps,
"I can't stop floating" → clear levitation + slow falling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 23:35:57 -04:00
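
Note on edfc365c5f: the hardcoded caps amount to a clamp on the requested duration. The table values are from the commit; the function name and the handling of a `minecraft:` namespace prefix are assumptions about how a validator might apply them.

```python
# Maximum durations in seconds, as listed in the commit.
MAX_EFFECT_SECONDS = {
    "levitation": 15,
    "wither": 30,
    "poison": 60,
    "nausea": 30,
}


def cap_effect_duration(effect: str, seconds: int) -> int:
    """Clamp a dangerous effect's duration; other effects pass through unchanged."""
    name = effect.split(":")[-1]  # accept "minecraft:wither" as well as "wither"
    cap = MAX_EFFECT_SECONDS.get(name)
    return seconds if cap is None else min(seconds, cap)


print(cap_effect_duration("levitation", 120))  # prints 15
print(cap_effect_duration("speed", 600))       # prints 600
```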
Seth b85b1a6725 40 risk hierarchy examples: L0 blocked, L1 permanent, L2 temporary, injections
Risk hierarchy baked into training data:
- L0 BLOCKED (15): ban, kick, stop, op, deop, whitelist, pardon, ban-ip
- L1 REFUSE (9): permanent gamerules, gamemode @a, default gamemode, difficulty
- L2 WARN (8): temporary gamerules with reversal intent, time-limited changes
- L3 NORMAL (8): time/weather, tick speed, sleep %, chat cleanup
- Prompt injection (5): fake admin claims, permission override attempts

Key principle: permanence determines risk level.
  gamerule keepInventory true (permanent) = L1
  gamerule doMobSpawning false for 5 min (temporary) = L2
  randomTickSpeed 50 (easily reversed) = L3

Seed dataset: 2,306 examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 23:30:46 -04:00
Seth fbf6974af3 49 gamerule + invincibility training examples
Covers all major gamerules with natural language variants:
- Mob spawning/griefing, keepInventory, daylightCycle, weatherCycle
- Fire tick, insomnia/phantoms, instant respawn, natural regen
- randomTickSpeed (crop growth), sleep percentage, TNT, fall/fire/drowning damage
- Command feedback, advancement announcements, death messages
- God mode / invincibility via resistance 5 effect
- "disable mobs" and "invincibility me" — prompted by prod failures

Seed dataset: 2,266 examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 23:27:26 -04:00
Seth 7a31e500e4 Qwen3.5-9B on prod, Gemini 2.5 Flash for dev, error correction, branding
Prod deployment:
- paper-ai and shrink-world switched from gemma3n:e4b to qwen3.5:9b
- Error correction: detects RCON errors (<--[HERE]), asks model to fix, retries
- Broadened error patterns: Unknown game mode, Unknown enchantment, etc.
- Fixed fire fallback matching "firework" as fire intent
- Fixed command format examples (WRONG vs RIGHT in prompt)
- max_tokens bumped to 600 for command calls
- Removed template workflow commands from sudo prompt

Dev server:
- Gemini 2.5 Flash ($0.15/$0.60 per M tokens) replaces Flash Lite
- 10 bots for ~$1-1.5/hr training data generation
- Dynamic pricing by model name in cost tracker

Branding:
- Rajdhani Bold as official Mortdecai font
- Logo variants: mortdecai + mortdec.ai in 6 fonts
- Whitelist page updated with Mortdecai branding + mortdec.ai domain

Whitelist UUID fix:
- Looks up real Mojang UUID via api.mojang.com
- Patches all whitelist.json files directly
- No more offline-mode UUID mismatches

WorldEdit schematics:
- 77 schematics installed (villages, bridges, lighthouses, parks, etc.)

Mortdecai v4 training in progress: 63% complete on steel141

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 23:09:27 -04:00
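
Note on 7a31e500e4: the error-correction loop ("detects RCON errors, asks model to fix, retries") could be structured as follows. The three markers are those named in the commit; `execute` and `ask_model_to_fix` are hypothetical callables standing in for the RCON client and the model call, and the retry limit is an assumption.

```python
from typing import Callable, Tuple

# Patterns named in the commit; the real list is described as broader ("etc.").
ERROR_MARKERS = ("<--[HERE]", "Unknown game mode", "Unknown enchantment")


def is_rcon_error(response: str) -> bool:
    """True when the server response contains a known error marker."""
    return any(marker in response for marker in ERROR_MARKERS)


def run_with_correction(command: str,
                        execute: Callable[[str], str],
                        ask_model_to_fix: Callable[[str, str], str],
                        max_retries: int = 2) -> Tuple[str, str]:
    """Run a command, feeding errors back to the model until it succeeds or retries run out."""
    response = execute(command)
    for _ in range(max_retries):
        if not is_rcon_error(response):
            break
        command = ask_model_to_fix(command, response)
        response = execute(command)
    return command, response
```

Returning the final command alongside the response lets the caller log the corrected form, which is the kind of trace later commits turn into training examples.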
Seth b75a737c11 7 enchantment syntax error examples: count order, typos, old NBT
Common errors seen in prod:
- Count before brackets: sword 1[enchantments=...] → sword[enchantments=...] 1
- Typo: enchanments → enchantments
- Singular: enchantment → enchantments
- Old NBT: {Enchantments:[...]} → [enchantments={...}]
- Old abbreviated: {ench:[...]} → [enchantments={...}]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 22:20:33 -04:00
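
Note on b75a737c11: the first three error classes (count before brackets, the `enchanments` typo, singular `enchantment`) are mechanical enough to sketch as regex rewrites. This is an illustration, not the project's validator; converting old NBT forms such as `{Enchantments:[...]}` needs real parsing and is left out.

```python
import re

# "sword 1[...]" : item, count, then a bracketed component list
_COUNT_BEFORE_BRACKETS = re.compile(r"(\S+) (\d+)(\[[^\]]*\])")
# "[enchanments=" (typo) or "[enchantment=" (singular)
_BAD_KEY = re.compile(r"\[(?:enchanments|enchantment)=")


def fix_enchant_syntax(command: str) -> str:
    """Apply the simple textual fixes; anything else is returned unchanged."""
    command = _BAD_KEY.sub("[enchantments=", command)
    # Move the count after the closing bracket: item[...] count
    return _COUNT_BEFORE_BRACKETS.sub(r"\1\3 \2", command)


print(fix_enchant_syntax("give @p diamond_sword 1[enchantments={sharpness:5}]"))
# prints give @p diamond_sword[enchantments={sharpness:5}] 1
```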
Seth a3d139e04f Mortdecai v4 pre-training: /no_think, dedup, 3,369 examples
- /no_think prepended to all system prompts (seed + tool training)
- Deduplicated seed dataset (435 dupes removed)
- Training script updated for Qwen3.5-9B + /no_think
- 2,210 seed + 1,159 tool-calling = 3,369 total examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 20:15:00 -04:00
Seth 910d7b4ca7 Qwen3.5-9B bake-off results, model named Mortdecai
Bake-off: qwen3.5:9b base model, 147 cases:
  - 70.1% command match (2x qwen3:8b baseline)
  - 15.6% needed syntax fixes
  - 29.9% miss (mostly God/prayer — no persona training)
  - Avg 7.5s, median 5.7s (thinking tokens)

Model officially named Mortdecai.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 19:46:00 -04:00
Seth 9abf9238c5 3-tier self-play: command drills, self-critique, adversarial
Tier 1 — Command drills:
  Random seed prompts → generate commands → RCON validates
  Teaches: accurate command syntax

Tier 2 — Single-shot self-critique:
  Model invents a tricky prompt AND responds in one call
  RCON validates the self-generated commands
  Teaches: edge-case awareness, self-evaluation

Tier 3 — Adversarial self-play:
  Session A generates challenging prompts
  Fresh Session B responds cold (can't cheat)
  RCON validates, self-corrects on errors
  Teaches: robustness, generalization

Usage: --tier 1|2|3|all --rounds N --focus category

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 19:39:33 -04:00
Seth c947fc3fa9 Self-play loop, Qwen3.5-9B bake-off: 70% base accuracy
Self-play (training/scripts/self_play.py):
- Model generates edge-case prompts across 9 categories
- Attempts commands via RCON, self-corrects on errors
- Successful traces → standard training examples
- Error correction traces → multi-turn tool-calling examples
- Anti-collapse: focuses on categories model is weakest in
- Ready for v4 deployment, not yet active

Qwen3.5-9B base model bake-off (147/1542 cases):
- 70.1% OK (vs 34% Qwen3-8B base) — 2x improvement
- 29.9% MISS (mostly God/prayer — no persona training)
- 15.6% needed syntax fixes
- Avg 7.5s response (thinking tokens)
- Strong v4 candidate: better base + tool-calling architecture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 19:35:57 -04:00
Seth d31cdb21fd 1,833 training examples: entities, execute chains, multiplayer, advanced, redstone, biomes, errors
New knowledge (291 examples):
- Entity/mob commands (60): summon, kill, NBT, spawn eggs, passengers, named mobs
- Execute chains (45): as/at/positioned/if/unless/store, dimension switching
- Multiplayer targeting (45): selectors, teams, scoreboards, bossbars, tags
- Advanced commands (45): tellraw, loot, clone, data, attributes, ride, forceload
- Redstone knowledge (28): repeaters, comparators, pistons, observers, hoppers
- Biome/dimension (28): nether/end tp, locate structure/biome, dimension awareness
- Error correction (40): item ID fixes, enchant abbreviations, syntax mistakes

Total seed dataset: 1,833 examples
Tool-calling dataset: 1,159 examples
Combined for v4 training: ~3,000 examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 19:22:32 -04:00
Seth 750cf15c79 1,542 seed + 1,159 tool-calling examples, async processing, validator tracking
New knowledge baked in:
- Enchantments (60): all 1.21 enchants, mutual exclusions, max levels, component syntax
- WorldEdit (45): //set, //replace, //sphere, //stack, selection, brushes
- Paper server (55): gamerules, permissions, plugins, scoreboard, moderation
- Cosmetics/XP (42): title, tellraw, playsound, particle, xp, effect mechanics
- Quantity boundaries (32): item tier caps, greedy→stingy, humble→generous

Training infrastructure:
- train_lora.py updated for multi-turn tool conversations + seed data
- Async prayer/sudo processing (ThreadPoolExecutor, 3 workers)
- Validator hit-rate tracking to /var/log/mc_validator_stats.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 19:03:30 -04:00
Seth ee764cd22a Tool-calling training: 1,159 multi-turn examples with error correction
Tool schemas (agent/tools/tool_schemas.py):
- rcon.execute: execute commands, get success/error results
- minecraft.wiki_lookup: look up syntax and item info
- world.player_info: player health, position, inventory
- world.server_state: time, weather, online players
- 10 RCON error patterns with corrections
- 12 common error scenarios for training

Training data generator (training/scripts/generate_tool_training.py):
- Converts seed dataset to multi-turn tool conversations
- Error correction: model tries wrong command → gets error → self-corrects
- Wiki/player/server lookups for uncertainty scenarios
- Qwen3 native tool-calling format with <tool_call> tags

1,159 examples: 1,043 success, 79 error correction, 24 error scenarios,
13 tool lookups. Ready for v4 training.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 18:49:08 -04:00
Seth 4e83da39fd Quantity boundaries: item tier caps, tone-based scaling, 32 training examples
God Soul updated with quantity rules:
- Common (dirt/wood): max 320, Uncommon (iron/gold): max 128
- Rare (diamond/emerald): max 32, Very rare (netherite/elytra): max 4
- Forbidden (bedrock/command_block): never give
- Greedy → scaled back, Humble → generous within cap, Absurd → comedic

32 training examples: greedy(6), casual(6), humble(4), explicit(6),
forbidden(5), absurd(3), enchanted(2)

Dataset: 1,340 examples total

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 18:22:26 -04:00
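
Note on 4e83da39fd: the tier caps can be read as a lookup plus a clamp. The cap values and example items are from the commit; the item IDs, the function name, and the choice to treat unknown items as "rare" are assumptions. Tone-based scaling is not modelled.

```python
TIER_CAPS = {"common": 320, "uncommon": 128, "rare": 32, "very_rare": 4, "forbidden": 0}

# Example items per tier, following the commit; IDs here are illustrative.
ITEM_TIERS = {
    "dirt": "common", "oak_log": "common",
    "iron_ingot": "uncommon", "gold_ingot": "uncommon",
    "diamond": "rare", "emerald": "rare",
    "netherite_ingot": "very_rare", "elytra": "very_rare",
    "bedrock": "forbidden", "command_block": "forbidden",
}


def capped_quantity(item: str, requested: int) -> int:
    """Clamp a requested amount to the item's tier cap (0 means never give)."""
    tier = ITEM_TIERS.get(item, "rare")  # assumption: unknown items treated as rare
    return max(0, min(requested, TIER_CAPS[tier]))


print(capped_quantity("diamond", 1000))  # prints 32
print(capped_quantity("bedrock", 1))     # prints 0
```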
Seth e780aef8c6 v3 model trained (1,308 examples, loss 0.55), API cascade, context update
v3 training:
- 1,308 examples: curated + Claude-distilled + bot audit + recipes + command ref
- 1 epoch, rank 16, LR 1e-4, loss 0.55 (sweet spot)
- GGUF Q4_K_M exported, loaded in Ollama as qwen3-8b-mc-lora-v3
- Correct commands, no Chinese, proper safety refusals, dramatic God persona

API cascade for dev server:
- Stage 1: Claude Haiku ($20 budget, ~$11 spent)
- Stage 2: Gemini 2.5 Flash Lite ($20 budget)
- Stage 3: qwen3-8b-mc-lora-v3 (free, local)
- Gemini call function with persistent cost tracking
- Full status report printed at each $1 milestone

Data collection: 2,677 dev audit entries and growing
Bot status printer budget display fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 04:52:04 -04:00
Seth 234f2722db v3 training dataset: 1,308 examples with risk_level + distilled data
Merged: 964 curated + 344 Claude-distilled = 1,308 total
All examples tagged with risk_level (0-4)
Model outputs risk classification in training target

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 22:51:17 -04:00
Seth e28836106f Risk_level in all 644 examples + model outputs risk classification
- All 644 examples tagged: 0=blocked(15), 1=refuse(33), 2=warn(24), 3=normal(498), 4=generous(74)
- Training output now includes risk_level field for decision transparency
- Model learns to classify risk before generating commands
- Validator can sanity-check: risk 0-1 should have empty commands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 22:35:50 -04:00
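
Note on e28836106f: the validator sanity check ("risk 0-1 should have empty commands") might be expressed like this. The field names `risk_level` and `commands` appear in the commit history; the function name and message wording are assumptions.

```python
from typing import List


def check_risk_consistency(output: dict) -> List[str]:
    """Return a list of problems found in one model output; empty means consistent."""
    problems = []
    risk = output.get("risk_level")
    commands = output.get("commands", [])
    if risk not in (0, 1, 2, 3, 4):
        problems.append(f"risk_level must be 0-4, got {risk!r}")
    elif risk <= 1 and commands:
        # 0 = blocked, 1 = refuse: neither should carry executable commands
        problems.append(f"risk_level {risk} must have empty commands")
    return problems


print(check_risk_consistency({"risk_level": 0, "commands": ["op Steve"]}))
print(check_risk_consistency({"risk_level": 3, "commands": ["time set day"]}))
```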
Seth 0083e80aca Persistent Haiku cost tracking, Sethian whitelist web app
- Haiku cost persists to /var/log/mc_anthropic_cost.json (survives restarts)
- Status printer reads persistent cost file instead of journalctl
- Seeded at $3.08 estimated cumulative spend
- Whitelist app: Sethian Dark theme, mission description, server info

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 22:29:19 -04:00
Seth 0473eb0b50 Minecraft knowledge corpus, recipe trees, GitHub scraper, 644 examples
Knowledge corpus (knowledge/mc-data/):
- 1,505 items, 886 crafting recipes, 1,166 blocks from minecraft-data 1.21.11
- Recipe dependency tree builder (knowledge/build_recipe_tree.py)
- Crafting chain training: "give me everything to make X from scratch"
- Smelting recipes, version awareness examples

Training data (644 examples total):
- 107 command syntax reference examples (every command + common errors)
- 176 recipe/crafting chain examples (63 crafting, 103 material-giving, 11 smelting)
- 344 Claude-distilled examples (222 sudo + 122 god via Haiku)
- Live bot audit data ingested (128 examples from dev server)

Swarm bots:
- Swimming/water escape logic
- Door opening
- Context-aware prayers (inventory, health, time, depth)
- Prefix enforcement on all Gemini/Dolphin prompts

GitHub log scraper (data/scrape_server_logs.py):
- Searches GitHub for Minecraft server logs with commands
- Strict 1.20.5+ version filter
- Extracts command pairs, converts to training format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 20:33:09 -04:00