GPU Scheduler (gpu.sethpc.xyz):
- Live dashboard with 4 GPUs, training monitor, loss sparklines
- Preset-based job scheduler with 3 triggers (time, finish_training, cost)
- Model selection per GPU, pipeline configuration
- Tool self-play and training pipeline types
- Behind Google OAuth, live-refresh without page reload
Tool Architecture (14 tools):
- 3 new tools: world.nearby_entities, memory.read, memory.write
- 7 script.* tools: write, validate, execute, read, list, delete, schedule
- ScriptManager: full mcfunction datapack CRUD with RCON validation
- Training data: 1,430 tool examples (up from 1,159)
Plugin Deployment (paper-ai-25567):
- WorldGuard 7.0.12, CoreProtect CE 23.1, EssentialsX 2.21.2, Vault 1.7.3
- Fresh greenfield world reset
- 104 RCON-validated plugin training examples
Event Dispatcher:
- Watches server log for deaths, joins, advancements, PvP kills
- Configurable trigger probability and cooldowns per event type
- Deployed to dev server, fires god_system prompts on events
- 21 event-response training examples
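The per-event probability and cooldown gating described above could look roughly like the sketch below. The `EventGate` class, the rule layout, and the sample values are hypothetical and are not taken from the dispatcher's actual code.

```python
import random
import time


class EventGate:
    """Decide whether an event may fire, given a probability and a cooldown."""

    def __init__(self, rules, rng=None, clock=time.monotonic):
        # rules: {event_type: (probability, cooldown_seconds)} (hypothetical layout)
        self.rules = rules
        self.rng = rng or random.Random()
        self.clock = clock
        self.last_fired = {}

    def should_fire(self, event_type):
        if event_type not in self.rules:
            return False
        probability, cooldown = self.rules[event_type]
        now = self.clock()
        last = self.last_fired.get(event_type)
        # Still inside the cooldown window for this event type.
        if last is not None and now - last < cooldown:
            return False
        # Roll against the configured trigger probability.
        if self.rng.random() >= probability:
            return False
        self.last_fired[event_type] = now
        return True
```

Injecting the clock and the random source keeps the gate testable without waiting for real cooldowns to expire.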
Training Infrastructure:
- train_lora.py: --save-steps 50, --resume from checkpoint
- run_training.sh: stops Ollama, activates conda, restarts after
- Passwordless sudo for ollama services on steel141
- Dev server added to MCSManager with autoStart
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Model can now output revert_after (seconds) and revert_commands fields.
Python service now schedules the revert timer from the model's response rather than from heuristics alone.
Players notified of revert countdown. Revert announced when applied.
Training examples: temporary gamerules with explicit/implicit/no duration,
permanent changes (no revert), effects with built-in duration, combined reverts.
Key principle: no duration specified → default 5 min revert for safety.
"permanently"/"forever"/"always" → no revert.
Effects → built-in duration, no revert_after needed.
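A minimal sketch of how the revert rules above might be resolved in the Python service. The function name, the response layout (`commands`, `revert_after`), and the rule that only `gamerule` commands get the default timeout are assumptions made for illustration.

```python
import re

DEFAULT_REVERT_SECONDS = 300  # 5 minutes, the safety default stated above
PERMANENT_PATTERN = re.compile(r"\b(permanently|forever|always)\b", re.IGNORECASE)


def resolve_revert_seconds(prompt, response):
    """Return the revert delay in seconds, or None when nothing should revert."""
    # The player explicitly asked for a permanent change.
    if PERMANENT_PATTERN.search(prompt):
        return None
    # The model supplied an explicit duration.
    if response.get("revert_after") is not None:
        return int(response["revert_after"])
    # Assumption: only gamerule changes fall back to the default timeout.
    commands = response.get("commands", [])
    if any(c.lstrip("/").startswith("gamerule ") for c in commands):
        return DEFAULT_REVERT_SECONDS
    # Effects carry their own duration, so no timer is needed.
    return None
```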
Seed dataset: 3,136 examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Versioning scheme: semantic versioning (MAJOR.MINOR.PATCH)
- 0.x.0 = pre-release development
- 1.0.0 = first public/monetized release
Renamed everywhere: PLAN.md, training scripts, self-play, overnight script,
status printer, whitelist app, discord bot, all training data references.
Ollama models retagged: mortdecai-v4 → mortdecai:0.4.0
Server configs updated on all three servers.
Self-play restarted with new model name.
Entity targeting + radius-aware kill + distance scale training added.
Seed dataset: 2,503 + tool: 1,159 + self-play: 5,059 = 8,721 total examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Teaches the model to distinguish:
- "kill the zombie" → limit=1,sort=nearest (specific target)
- "kill all zombies" → distance=..30 (area clear)
- "what mobs are nearby" → requires world.nearby_entities tool
- "target the closest enemy" → type=!player,limit=1,sort=nearest
With LangGraph tools enabled, world.nearby_entities gives the model
entity awareness before generating kill commands.
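The selector patterns listed above can be illustrated with a small builder. `build_selector` is a hypothetical helper, not part of the project; it only assembles standard target-selector arguments.

```python
def build_selector(entity_type=None, *, exclude_type=None, max_distance=None,
                   limit=None, sort=None):
    """Assemble an @e target selector from optional arguments."""
    args = []
    if entity_type:
        args.append(f"type={entity_type}")
    if exclude_type:
        args.append(f"type=!{exclude_type}")
    if max_distance is not None:
        args.append(f"distance=..{max_distance}")
    if limit is not None:
        args.append(f"limit={limit}")
    if sort:
        args.append(f"sort={sort}")
    return "@e[" + ",".join(args) + "]" if args else "@e"


# "kill the zombie": one specific, nearest target
specific = "kill " + build_selector("zombie", limit=1, sort="nearest")
# "kill all zombies": area clear within 30 blocks
area = "kill " + build_selector("zombie", max_distance=30)
# "target the closest enemy": nearest non-player entity
closest = build_selector(exclude_type="player", limit=1, sort="nearest")
```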
Seed dataset: 2,486 examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Teaches command ordering and dependencies:
- Build structure THEN tp inside (not reverse)
- Apply protection BEFORE spawning hostile mobs
- Create water pool BEFORE dropping player
- Effects before gear (protection active during equip)
- Clear mobs before healing (don't waste heal)
- Cage before tp victim (prevent escape)
Key principle: reasoning explains WHY order matters.
Seed dataset: 2,409 examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Python revert system (live on prod):
- Gamerule changes auto-revert after default timeout (5-10 min)
- User can specify duration: "disable mobs for 5 minutes"
- "permanently"/"forever" skips revert
- Setting back to default cancels pending revert
- Players notified of revert countdown
Training data (20 examples):
- 8 revert-aware gamerules with revert_after/revert_commands fields
- 12 drop/height/tp examples: intentional drops, safe tp, context-aware
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Validator hardcodes maximum durations for dangerous effects:
- Levitation: 15s max (player floats into sky and dies from fall)
- Wither: 30s max (drains health, can kill)
- Poison: 60s max
- Nausea: 30s max
12 training examples: levitation safety, emergency clear, duration caps,
"I can't stop floating" → clear levitation + slow falling
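A sketch of how the hardcoded duration caps above might be enforced on an `effect give` command. The function name and the whitespace-based parsing are assumptions; a selector that contains spaces would need a real parser.

```python
# Maximum durations in seconds, taken from the list above.
MAX_EFFECT_SECONDS = {"levitation": 15, "wither": 30, "poison": 60, "nausea": 30}


def clamp_effect_command(command):
    """Clamp the duration of a dangerous effect; return other commands unchanged."""
    parts = command.split()
    # Expected shape: effect give <targets> <effect> [seconds] [amplifier] ...
    if len(parts) < 4 or parts[0].lstrip("/") != "effect" or parts[1] != "give":
        return command
    effect = parts[3].split(":")[-1]  # drop an optional "minecraft:" namespace
    cap = MAX_EFFECT_SECONDS.get(effect)
    if cap is None:
        return command
    if len(parts) == 4:
        # No explicit duration: pin it to the cap (a choice made for this sketch).
        parts.append(str(cap))
    else:
        try:
            seconds = int(parts[4])
        except ValueError:
            seconds = cap + 1  # non-numeric values such as "infinite" get capped
        parts[4] = str(min(seconds, cap))
    return " ".join(parts)
```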
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prod deployment:
- paper-ai and shrink-world switched from gemma3n:e4b to qwen3.5:9b
- Error correction: detects RCON errors (<--[HERE]), asks model to fix, retries
- Broadened error patterns: Unknown game mode, Unknown enchantment, etc.
- Fixed fire fallback misclassifying "firework" as fire intent
- Fixed command format examples (WRONG vs RIGHT in prompt)
- max_tokens bumped to 600 for command calls
- Removed template workflow commands from sudo prompt
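The error-correction step in the prod deployment list above (detect an RCON error, ask the model to fix it, retry) might be structured like this sketch. `execute` and `ask_model_to_fix` are hypothetical callables standing in for the RCON client and the LLM call; the marker strings come from the list above.

```python
ERROR_MARKERS = ("<--[HERE]", "Unknown game mode", "Unknown enchantment")


def looks_like_error(rcon_output):
    """True when the RCON reply contains one of the known error patterns."""
    return any(marker in rcon_output for marker in ERROR_MARKERS)


def run_with_correction(command, execute, ask_model_to_fix, max_retries=2):
    """Run a command, asking the model for a fix after each failed attempt.

    Returns (final_command, final_output, succeeded).
    """
    output = execute(command)
    for _ in range(max_retries):
        if not looks_like_error(output):
            break
        # Hand the failing command and the server's error text back to the model.
        command = ask_model_to_fix(command, output)
        output = execute(command)
    return command, output, not looks_like_error(output)
```

Bounding the retries keeps a model that repeats the same mistake from looping against the server indefinitely.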
Dev server:
- Gemini 2.5 Flash ($0.15/$0.60 per M tokens) replaces Flash Lite
- 10 bots for ~$1-1.5/hr training data generation
- Dynamic pricing by model name in cost tracker
Branding:
- Rajdhani Bold as official Mortdecai font
- Logo variants: mortdecai + mortdec.ai in 6 fonts
- Whitelist page updated with Mortdecai branding + mortdec.ai domain
Whitelist UUID fix:
- Looks up real Mojang UUID via api.mojang.com
- Patches all whitelist.json files directly
- No more offline-mode UUID mismatches
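A sketch of the UUID fix described above, assuming the public Mojang profile endpoint and the standard `whitelist.json` entry shape (`uuid`, `name`). The helper names are hypothetical, and the lookup is injectable so the patching logic can be exercised without network access.

```python
import json
import uuid
from urllib.request import urlopen

PROFILE_URL = "https://api.mojang.com/users/profiles/minecraft/{name}"


def fetch_mojang_uuid(name):
    """Look up a player's online-mode UUID and return it in dashed form."""
    # Unknown names make urlopen raise an HTTPError; callers should handle it.
    with urlopen(PROFILE_URL.format(name=name), timeout=10) as resp:
        profile = json.load(resp)
    # The API returns the id as 32 hex characters without dashes.
    return str(uuid.UUID(profile["id"]))


def patch_whitelist(entries, lookup=fetch_mojang_uuid):
    """Return whitelist entries with each uuid replaced by the looked-up value."""
    return [{**entry, "uuid": lookup(entry["name"])} for entry in entries]
```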
WorldEdit schematics:
- 77 schematics installed (villages, bridges, lighthouses, parks, etc.)
Mortdecai v4 training in progress: 63% complete on steel141
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
God Soul updated with quantity rules:
- Common (dirt/wood): max 320, Uncommon (iron/gold): max 128
- Rare (diamond/emerald): max 32, Very rare (netherite/elytra): max 4
- Forbidden (bedrock/command_block): never give
- Greedy → scaled back, Humble → generous within cap, Absurd → comedic
32 training examples: greedy(6), casual(6), humble(4), explicit(6),
forbidden(5), absurd(3), enchanted(2)
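The quantity caps above could be enforced with a lookup like the one below. The item IDs chosen for each tier and the fallback tier for unknown items are assumptions; the persona-dependent scaling (greedy, humble, absurd) is left to the model and is not covered here.

```python
# Caps per rarity tier, from the God Soul rules above.
QUANTITY_CAPS = {"common": 320, "uncommon": 128, "rare": 32, "very_rare": 4, "forbidden": 0}

# Illustrative mapping only; the real item lists are not given in the source.
ITEM_TIERS = {
    "dirt": "common", "oak_log": "common",
    "iron_ingot": "uncommon", "gold_ingot": "uncommon",
    "diamond": "rare", "emerald": "rare",
    "netherite_ingot": "very_rare", "elytra": "very_rare",
    "bedrock": "forbidden", "command_block": "forbidden",
}


def cap_quantity(item, requested):
    """Clamp a requested amount to the cap for the item's tier."""
    item_id = item.split(":")[-1]  # drop an optional "minecraft:" namespace
    # Assumption: unknown items fall into the strictest non-forbidden tier.
    tier = ITEM_TIERS.get(item_id, "very_rare")
    return max(0, min(requested, QUANTITY_CAPS[tier]))
```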
Dataset: 1,340 examples total
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v3 training:
- 1,308 examples: curated + Claude-distilled + bot audit + recipes + command ref
- 1 epoch, rank 16, LR 1e-4, loss 0.55 (sweet spot)
- GGUF Q4_K_M exported, loaded in Ollama as qwen3-8b-mc-lora-v3
- Correct commands, no stray Chinese output, proper safety refusals, dramatic God persona
API cascade for dev server:
- Stage 1: Claude Haiku ($20 budget, ~$11 spent)
- Stage 2: Gemini 2.5 Flash Lite ($20 budget)
- Stage 3: qwen3-8b-mc-lora-v3 (free, local)
- Gemini call function with persistent cost tracking
- Full status report printed at each $1 milestone
Data collection: 2,677 dev audit entries and growing
Fixed budget display in bot status printer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merged: 964 curated + 344 Claude-distilled = 1,308 total
All examples tagged with risk_level (0-4)
Model outputs risk classification in training target
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- All 644 examples tagged: 0=blocked(15), 1=refuse(33), 2=warn(24), 3=normal(498), 4=generous(74)
- Training output now includes risk_level field for decision transparency
- Model learns to classify risk before generating commands
- Validator can sanity-check: risk 0-1 should have empty commands
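The sanity check mentioned above (risk 0-1 should have empty commands) might be sketched as follows. The example layout with `risk_level` and `commands` keys is an assumption about the dataset schema.

```python
RISK_LABELS = {0: "blocked", 1: "refuse", 2: "warn", 3: "normal", 4: "generous"}


def check_risk_consistency(example):
    """Return a list of problems found in one training example."""
    problems = []
    risk = example.get("risk_level")
    if not isinstance(risk, int) or risk not in RISK_LABELS:
        problems.append(f"invalid risk_level: {risk!r}")
    elif risk <= 1 and example.get("commands"):
        # Blocked and refused requests should not carry any commands.
        problems.append(f"risk {risk} ({RISK_LABELS[risk]}) must have empty commands")
    return problems
```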
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Knowledge corpus (knowledge/mc-data/):
- 1505 items, 886 crafting recipes, 1166 blocks from minecraft-data 1.21.11
- Recipe dependency tree builder (knowledge/build_recipe_tree.py)
- Crafting chain training: "give me everything to make X from scratch"
- Smelting recipes, version awareness examples
Training data (644 examples total):
- 107 command syntax reference examples (every command + common errors)
- 176 recipe/crafting chain examples (63 crafting, 103 material-giving, 11 smelting)
- 344 Claude-distilled examples (222 sudo + 122 god via Haiku)
- Live bot audit data ingested (128 examples from dev server)
Swarm bots:
- Swimming/water escape logic
- Door opening
- Context-aware prayers (inventory, health, time, depth)
- Prefix enforcement on all Gemini/Dolphin prompts
GitHub log scraper (data/scrape_server_logs.py):
- Searches GitHub for Minecraft server logs with commands
- Strict 1.20.5+ version filter
- Extracts command pairs, converts to training format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Swarm bots (ingame/swarm_bots.js):
- 10 survival bots with generated names (SwiftWolf, DarkWolf, etc.)
- All bots wander, take damage, auto-respawn, pray when hurt
- Gemini + Dolphin(5%) + Multilingual(3%) prompt generation
- 20-60s interaction interval per bot
Distillation results:
- 222 sudo examples via Haiku ($0.28)
- 122 god examples via Haiku ($0.37) — with God Soul personality
- Total: 344 distilled, $0.65 spent of $5 budget
- RCON validation: 74.7% fully valid, 30 real errors out of ~1000 commands
validate_distilled.py:
- Executes distilled commands on live server via RCON
- Distinguishes real errors from benign (no player online)
- Tags each example with validation status
Dev server switched to Claude Haiku via Anthropic API:
- llm_provider: anthropic with $5 budget cap
- Auto-fallback to Ollama when budget exhausted
- Cost tracking with logging
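A sketch of the budget cap with automatic fallback described above. The class name and the per-million-token price parameters are hypothetical; actual prices should come from the provider's current rate card.

```python
class BudgetedProvider:
    """Track API spend and switch to a local fallback once the budget is used up."""

    def __init__(self, budget_usd, primary="anthropic", fallback="ollama"):
        self.budget_usd = budget_usd
        self.primary = primary
        self.fallback = fallback
        self.spent_usd = 0.0

    def record_call(self, input_tokens, output_tokens, usd_per_m_in, usd_per_m_out):
        """Add the cost of one call to the running total and return that cost."""
        cost = (input_tokens / 1_000_000 * usd_per_m_in
                + output_tokens / 1_000_000 * usd_per_m_out)
        self.spent_usd += cost
        return cost

    def current_provider(self):
        """Use the paid provider while budget remains, otherwise the fallback."""
        return self.primary if self.spent_usd < self.budget_usd else self.fallback
```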
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ingested 128 new examples from bot-driven data collection.
Dropped: 86 duplicates, 19 language mismatches, 10 prompt leaks, 19 empty.
Changed default epochs from 3 to 1 (previous run overfit at loss 0.10).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
data/ingest_audit.py:
- Pulls training audit logs from CT 644 (dev + prod)
- Filters: language mismatch (Chinese output for English input), system
prompt leaks, empty responses, duplicates
- Keeps multilingual examples where input/output languages match
- Converts to dataset schema, appends to seed_dataset.jsonl
- --dry-run to preview, --source dev/prod/both
Tested: 237 entries → 112 kept (16 lang mismatch, 10 prompt leak, 86 dupe, 13 empty dropped)
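The filter chain above can be sketched roughly as follows. The field names (`input`, `output`), the leak marker, the order of the checks, and the CJK-range heuristic for detecting Chinese are all assumptions, not the actual logic in data/ingest_audit.py.

```python
import re
from collections import Counter

CJK = re.compile(r"[\u4e00-\u9fff]")  # rough heuristic for Chinese text


def filter_entries(entries, leak_marker):
    """Split audit entries into kept examples and per-reason drop counts."""
    kept, dropped, seen = [], Counter(), set()
    for entry in entries:
        prompt = entry.get("input", "").strip()
        output = entry.get("output", "").strip()
        if not output:
            dropped["empty"] += 1
        elif CJK.search(output) and not CJK.search(prompt):
            # Chinese output for a non-Chinese prompt; matching pairs are kept.
            dropped["lang_mismatch"] += 1
        elif leak_marker in output:
            dropped["prompt_leak"] += 1
        elif (prompt.lower(), output) in seen:
            dropped["duplicate"] += 1
        else:
            seen.add((prompt.lower(), output))
            kept.append(entry)
    return kept, dropped
```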
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Expanded dataset from 31 to 182 examples (45 manual + 106 extracted from server logs)
- Built eval/harness.py with per-category breakdowns and baseline tracking
- Built eval/live_bakeoff.py for RCON-verified model comparison on live server
- Extracted training data from prayer logs, sudo logs, and bug reports on CT 644
- Added Reddit post draft and modmail for playtester recruitment
- Updated server context: all servers now online-mode=false + whitelist
- Updated PLAN.md with Phase 2 progress
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- IDEA.md: project scope (Minecraft ops AI assistant via qwen3-coder LoRA/SFT)
- PLAN.md: complete roadmap with prior art analysis, architecture, phased plan, dev server docs
- data/schema.json: training example JSON Schema with negative_output support
- data/processed/seed_dataset.jsonl: 31 validated examples from repair code, prayer logs, session history
- data/validate_dataset.py: schema validator with summary statistics
- ingame/: Mineflayer bot framework (test_connect, spawn_bots, aware_bots with full event logging)
- Directory structure for knowledge/, eval/, training/, agent/ (Phase 1.3+ work)