Files
Mortdecai/PLAN.md
T
Seth 98d035439d PLAN.md complete rewrite — Mortdecai project status, TODOs, risk hierarchy
Full rewrite reflecting current state:
- Model history v1→v4, infrastructure map, API spend
- Training data breakdown (3,477 total examples)
- Active TODOs: immediate, short-term, v5, infrastructure, community
- Risk hierarchy with permanence-based levels
- Key architecture decisions log
- Success criteria: v3 actual → v4 target → v5 goal
- Single-call enabled on prod (mortdecai-v3)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 23:45:03 -04:00

7.5 KiB

PLAN.md — Mortdecai Project Roadmap

Last updated: 2026-03-20 Model name: Mortdecai Domain: mortdec.ai Status legend: [ ] planned | [~] in progress | [x] done | [-] cancelled/deferred


Vision

Mortdecai is a fine-tuned 9B parameter language model for Minecraft server operations. It translates natural language to commands, controls an AI God character, self-corrects errors via RCON feedback, and improves through self-play.

It runs locally on consumer hardware with zero cloud dependencies at inference time.


Current State (2026-03-20)

Models

Model Base Examples Loss Status
v1 Qwen3-8B 233 0.10 Retired (overfit)
v2 Qwen3-8B 361 2.03 Retired
v3 Qwen3-8B 1,308 0.55 Deployed on prod
v4 Qwen3.5-9B 3,369 Training (~88%) ETA ~30 min

Infrastructure

Component Location Details
Training GPU steel141 RTX 3090 Ti (24GB) QLoRA via Unsloth
Prod inference node-197 RTX 4000 (16GB) Ollama, mortdecai-v3
Minecraft servers CT 644 on node-112 paper-ai (25567), shrink-world (25566), dev (25568), vanilla (25565)
Dev data collection CT 644 Gemini 2.5 Flash via API, 10 bots
Whitelist app CT 644 port 8099 minecraft.mortdec.ai
Caddy proxy CT 600 on node-241 mortdec.ai, minecraft.mortdec.ai
GPU monitoring Grafana on CT 300 (node-173) Prometheus + nvidia exporter on steel141
LangGraph gateway CT 644 port 8091 Disabled on prod (fresh session mode available)

API Spend

Provider Spent Budget Status
Claude Haiku $20.01 $20 Exhausted
Gemini 2.5 Flash ~$0.50 $20 Active (dev bots)

Training Data: 2,318 seed + 1,159 tool-calling = 3,477 total

Category Count
Command syntax reference 107
Crafting recipes & chains 176
Enchantments (mutual exclusions, max levels) 60
Entities/mobs (summon, kill, NBT) 60
Execute chains 45
Multiplayer (selectors, teams, scoreboards) 45
Advanced commands (tellraw, clone, data, ride) 45
WorldEdit 45
Paper server features 55
Cosmetics/XP/effects 42
Gamerules 49
Risk hierarchy (L0-L3, prompt injection) 40
Quantity boundaries 32
Dangerous effect caps 12
Revert-aware gamerules + drops 20
Error correction pairs 47
Claude-distilled outputs 344
Bot audit interactions 448+
Boundary/safety examples 95+
Tool-calling (multi-turn with RCON) 1,159

Active TODOs

Immediate (this session)

  • [~] Mortdecai v4 training completing (~30 min remaining)
  • Export v4 to GGUF, deploy to RTX 4000 as mortdecai-v4
  • Enable single-call mode on prod with v4
  • Run v4 bake-off and compare to v3/base
  • Commit and push all PaperFork changes

Short-term

  • Deploy self-play loop with v4 (3-tier: drills, self-critique, adversarial)
  • Add ground-level detection for teleport safety (query terrain before tp)
  • Build revert_after/revert_commands into v5 training format
  • Add Gemini milestone POS printing
  • Fix whitelist app UUID lookup for vanilla server path
  • Start Greenfield world on paper-ai (downloaded, needs MCSManager start)

Model improvements for v5

  • Train on Qwen3.5-9B with tool-calling format (rcon.execute, wiki_lookup, etc.)
  • Self-play generated data (run 200 rounds after v4 deploys)
  • Ingest all Gemini 2.5 Flash training data ($20 worth)
  • Add revert_after field to training output format
  • Ground-level detection training (check terrain before tp)
  • More error correction from production RCON failures
  • Enchantment count-before-bracket error correction

Infrastructure

  • Add GPU monitoring for RTX 4000 (second exporter)
  • Validator hit-rate analysis — remove fixes that fire <1%
  • Automate training pipeline: ingest → dedup → train → export → deploy
  • POS receipt for Gemini milestones
  • Consider moving to Mortdecai as Ollama model name on prod

Content & Community

  • Invite more playtesters via minecraft.mortdec.ai
  • Update mortdec.ai README with v4 results when available
  • Consider public HuggingFace release once quality is validated
  • WorldEdit schematic library expansion (77 installed, need more)

Risk Hierarchy

Commands are classified by permanence, not just danger:

Level Permanence Examples Model behavior
0 Irreversible/admin ban, kick, stop, op, deop Never execute
1 Permanent toggle gamemode @a, permanent gamerules, difficulty Refuse or execute for self only
2 Temporary/reversible gamerules with time limits, brief difficulty Allow, schedule auto-revert
3 Transient time, weather, tick speed, chat settings Execute freely
4 Generous full enchanted gear, large material stacks Execute for worthy requests

Gamerule revert system: changes auto-revert after 5-10 min unless "permanently" specified.

Dangerous effect caps (hardcoded in validator):

  • Levitation: 15s max
  • Wither: 30s max
  • Poison: 60s max
  • Nausea: 30s max

Key Architecture Decisions

Date Decision Rationale
2026-03-18 Serving: gemma3n:e4b → qwen3.5:9b → mortdecai-v3 Progressive upgrades as better models trained
2026-03-18 Fine-tuning: Qwen3-8B → Qwen3.5-9B 3.5 has 2x base accuracy (70% vs 34%), native tool-calling
2026-03-18 God Soul document Character framework adapted from Claude's soul. Defines identity, judgment, quantity boundaries
2026-03-19 API cascade: Haiku → Gemini → local Progressive fallback for dev data collection. $40 total API budget
2026-03-19 /no_think in training Prevents Qwen3 thinking tokens from consuming output budget
2026-03-19 Single-call mode One LLM call for commands + message (v3+). Two-call for older models
2026-03-19 Error correction via RCON Model tries command → RCON error → model self-corrects → retry
2026-03-19 3-tier self-play Drills, self-critique, adversarial. Model generates its own training data
2026-03-20 Gamerule revert timers State changes auto-revert. Permanence determines risk level
2026-03-20 Dangerous effect caps Validator hardcodes max durations for levitation, wither, poison, nausea
2026-03-20 Expanded safe_prefixes gamerule, particle, playsound, title, scoreboard, team, bossbar, locate, etc.
2026-03-20 Model named Mortdecai mortdec.ai domain, Rajdhani Bold font, Sethian orange branding

Dev Server

Property Value
Location CT 644 on node-112
Game port 25568
RCON port 25578
RCON password REDACTED_RCON
Data dir /opt/paper-dev-25568/
AI God Gemini 2.5 Flash via API cascade
Bots 10 swarm bots (swarm_bots.js)
Audit log /var/log/mc_training_audit_dev.jsonl

Success Criteria

Metric v3 (current) v4 (target) v5 (goal)
Command accuracy ~70% 85%+ 95%+
Safety compliance ~95% 99%+ 99.9%+
Error self-correction N/A 50%+ 80%+
Response latency 5-15s <5s <3s
Empty response rate ~10% <5% <2%
Think token leakage Yes No No

This document is updated as the project evolves. Check git history for previous versions.