Versioning scheme: semantic versioning (MAJOR.MINOR.PATCH) - 0.x.0 = pre-release development - 1.0.0 = first public/monetized release Renamed everywhere: PLAN.md, training scripts, self-play, overnight script, status printer, whitelist app, discord bot, all training data references. Ollama models retagged: mortdecai-v4 → mortdecai:0.4.0 Server configs updated on all three servers. Self-play restarted with new model name. Entity targeting + radius-aware kill + distance scale training added. Seed dataset: 2,503 + tool: 1,159 + self-play: 5,059 = 8,721 total examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
10 KiB
PLAN.md — Mortdecai Project Roadmap
Last updated: 2026-03-20 04:45 UTC Model name: Mortdecai Domain: mortdec.ai Status legend:
[ ]planned |[~]in progress |[x]done |[-]cancelled/deferred
Vision
Mortdecai is a fine-tuned 9B parameter language model for Minecraft server operations. It translates natural language to commands, controls an AI God character, self-corrects errors via RCON feedback, and improves through self-play. Runs locally on consumer hardware with zero cloud dependencies at inference time.
Current State
Models
| Model | Base | Examples | Loss | Status |
|---|---|---|---|---|
| 0.1.0 | Qwen3-8B | 233 | 0.10 | Retired (overfit) |
| 0.2.0 | Qwen3-8B | 361 | 2.03 | Retired |
| 0.3.0 | Qwen3-8B | 1,308 | 0.55 | Available on steel141 |
| 0.4.0 | Qwen3.5-9B | 3,369 | 0.20 | Deployed on prod |
Infrastructure
| Component | Location | Details |
|---|---|---|
| Training GPU | steel141 RTX 3090 Ti (24GB) | QLoRA via Unsloth 2026.3.8 |
| Prod inference | node-197 RTX 4000 (16GB) | Ollama, mortdecai:0.4.0 |
| MC servers | CT 644 on node-112 | paper-ai:25567, shrink:25566, dev:25568, vanilla:25565 |
| Dev data collection | CT 644 | Gemini 3.1 Flash Lite (preview), 5 bots |
| Whitelist app | CT 644:8099 | minecraft.mortdec.ai |
| Caddy proxy | CT 600 on node-241 | mortdec.ai, minecraft.mortdec.ai |
| GPU monitoring | Grafana CT 300 (node-173) | Prometheus + nvidia exporter on steel141 |
| Greenfield map | paper-ai | Downloaded, world swapped, needs MCSManager start |
| WorldEdit schematics | paper-ai | 77 installed in FAWE/schematics/ |
API Spend
| Provider | Spent | Budget | Status |
|---|---|---|---|
| Claude Haiku | $20.01 | $20 | Exhausted |
| Gemini (all) | ~$0.50 | $20 | Active on dev (3.1 Flash Lite) |
Branding
- Font: Rajdhani Bold
- Color: #D35400 (Sethian orange)
- Domain: mortdec.ai → Gitea repo, minecraft.mortdec.ai → whitelist page
- Public repo: https://git.sethpc.xyz/Seth/Mortdecai
Training Data: 2,397 seed + 1,159 tool-calling = 3,556 total
| Category | Count |
|---|---|
| Command syntax reference | 107 |
| Crafting recipes & chains | 176 |
| Enchantments (mutual exclusions, max levels) | 60 |
| Entities/mobs (summon, kill, NBT) | 60 |
| Execute chains | 45 |
| Multiplayer (selectors, teams, scoreboards) | 45 |
| Advanced commands (tellraw, clone, data, ride) | 45 |
| WorldEdit | 45 |
| Paper server features | 55 |
| Cosmetics/XP/effects | 42 |
| Gamerules (49) + risk hierarchy (40) | 89 |
| Quantity boundaries | 32 |
| Dangerous effect caps (levitation, wither, etc.) | 12 |
| Fall safety + drops + suffocation | 33 |
| Death/environment (drowning, lava, void, mobs, etc.) | 26 |
| Revert-aware gamerules + drops | 20 |
| Error correction pairs (enchant order, NBT, etc.) | 54 |
| Claude-distilled outputs | 344 |
| Bot audit interactions | 448+ |
| Boundary/safety/prompt injection | 95+ |
| Tool-calling (multi-turn with RCON) | 1,159 |
Completed This Session
Model & Training
- Mortdecai 0.4.0 trained: Qwen3.5-9B, 3,369 examples, loss 0.20
- 0.4.0 exported to GGUF Q4_K_M (5.3GB)
- 0.4.0 deployed to prod (RTX 4000) — paper-ai + shrink-world
- Single-call mode enabled on prod
/no_thinkin all training data to suppress thinking tokens- Qwen3.5-9B base bake-off: 70.1% accuracy (2x Qwen3-8B)
- [~] 0.4.0 bake-off running on steel141
Validator & Safety
- Error correction: detects RCON errors, asks model to fix, retries
- Broadened error patterns:
<--[HERE]universal catch kill @ablocked (players only)tp minecraft:spawn→ safe coordinates- Fire fallback won't trigger on "firework"
- Dangerous effect caps: levitation 15s, wither 30s, poison 60s, nausea 30s
- Fall protection: detects lethal tp, adds slow_falling unless intentional
- Gamerule revert timers: auto-revert after 5-10 min (configurable)
- Expanded safe_prefixes: gamerule, particle, playsound, title, scoreboard, team, bossbar, locate, etc.
- Validator hit-rate tracking to /var/log/mc_validator_stats.json
- Command format examples (RIGHT vs WRONG) in prompt
- max_tokens bumped to 600 for command calls
- Removed template workflow from sudo prompt
Infrastructure
- Ollama updated on steel141 + RTX 4000 (Qwen3.5 support)
- GPU monitoring: nvtop + Grafana dashboard on steel141
- Whitelist UUID fix: Mojang API lookup, patches all whitelist.json files
- mortdec.ai + minecraft.mortdec.ai live with SSL
- Public Mortdecai repo on Gitea with README
statuscommand: shows model name, mode, validator stats in-game- Verbose pipeline logging: token counts, speed, elapsed time, think stripping
- Greenfield world downloaded and installed on paper-ai
- 77 WorldEdit schematics installed
Training Data Added
- Gamerules (49 examples): all major gamerules with natural language
- Risk hierarchy (40): L0 blocked, L1 permanent, L2 temporary, prompt injection
- Dangerous effects (12): levitation/wither/poison caps
- Fall safety (25): height math, water/slime/hay awareness, intent detection
- Suffocation (8): tp into blocks, sand/gravel crushing
- Death/environment (26): drowning, lava, void, explosions, mobs, starvation, lightning
- Revert-aware gamerules (8): revert_after field for 0.5.0
- Drop/height (12): intentional drops, safe tp, slow_falling
- Enchantment error correction (7): count-before-bracket, typos, old NBT
Data Collection
- API cascade: Haiku ($20) → Gemini ($20) → local
- Switched dev to Gemini 3.1 Flash Lite (preview) with 5 bots
- Dynamic pricing by Gemini model name
- Async prayer/sudo processing (ThreadPoolExecutor, 3 workers)
Branding
- Model named Mortdecai
- mortdec.ai domain purchased and configured
- Rajdhani Bold as official font
- Logo variants generated (6 fonts × 2 text versions)
- Whitelist page branded with Mortdecai logo
Active TODOs
Immediate
- [~] 0.4.0 bake-off running — publish results to Gitea when complete
- Fix 0.4.0 Modelfile chat template on RTX 4000 (done, needs verification)
- Also fix on steel141's Ollama instance
Short-term (0.5.0 prep)
- Shared memory system: per-server JSON, owner-tagged, location/preference/fact types
- Player says "remember this is home" → AI writes location memory
- Other players can reference: "tp me to slingshooter08's home"
- Memory in context for location lookups, tool call for read/write
memory_writefield in model output schema- Setblock training data expansion
world.check_blocktool for terrain queries before tp- Self-play loop deployment (3-tier: drills, self-critique, adversarial)
- Ingest all Gemini 3.1 Flash Lite training data
- More error correction from production RCON failures
Model 0.5.0 Training
- Train with tool-calling format (rcon.execute, wiki_lookup, world.player_info)
revert_after/revert_commandsin output schema- Self-play generated data (200 rounds post-0.4.0)
- Memory read/write training examples
- Ground-level terrain detection training
- Fall damage math in model reasoning (not just validator)
- Setblock + block state training
- More death mechanics awareness in reasoning
Infrastructure
- GPU monitoring for RTX 4000 (second exporter)
- Validator hit-rate analysis — remove fixes that fire <1%
- Automate training pipeline: ingest → dedup → train → export → deploy
- POS receipt for Gemini milestones
- Start Greenfield world via MCSManager
Content & Community
- Invite more playtesters via minecraft.mortdec.ai
- Update mortdec.ai README with 0.4.0 bake-off results
- Consider public HuggingFace release
- WorldEdit schematic library expansion
Risk Hierarchy
Commands classified by permanence:
| Level | Permanence | Examples | Model behavior |
|---|---|---|---|
| 0 | Irreversible/admin | ban, kick, stop, op, deop, whitelist | Never execute |
| 1 | Permanent toggle | gamemode @a, permanent gamerules, difficulty | Execute for self only, refuse for @a |
| 2 | Temporary/reversible | gamerules with time limits, brief changes | Allow, schedule auto-revert |
| 3 | Transient | time, weather, tick speed, chat settings | Execute freely |
| 4 | Generous | full enchanted gear, large material stacks | Execute for worthy requests |
Gamerule revert system: Changes auto-revert after 5-10 min unless "permanently" specified. Player notified of countdown.
Dangerous effect caps (hardcoded): Levitation 15s, Wither 30s, Poison 60s, Nausea 30s.
Fall protection: Lethal tp detected → slow_falling added unless intent words present (drop, yeet, throw, kill me).
Key Decisions
| Date | Decision | Rationale |
|---|---|---|
| 03-18 | gemma3n:e4b for initial prod | Bake-off winner at 80.6% accuracy |
| 03-18 | Qwen3-8B for 0.1.0-0.3.0 training | Best syntax quality, Apache 2.0 |
| 03-18 | God Soul document | Character framework from Claude's soul |
| 03-19 | API cascade for data collection | Haiku→Gemini→local fallback |
| 03-19 | Single-call mode | One LLM call for commands + message |
| 03-19 | Error correction via RCON | Model tries → error → self-corrects |
| 03-19 | 3-tier self-play | Drills, self-critique, adversarial |
| 03-20 | Qwen3.5-9B for 0.4.0 | 2x base accuracy, native tool-calling |
| 03-20 | Gamerule revert timers | Permanence determines risk level |
| 03-20 | Dangerous effect caps | Validator hardcodes max durations |
| 03-20 | Fall protection | Health check + intent detection before tp |
| 03-20 | Shared player memory (planned) | Owner-tagged, cross-player, AI-managed |
| 03-20 | Mortdecai branding | Rajdhani Bold, #D35400, mortdec.ai |
Success Criteria
| Metric | 0.3.0 | 0.4.0 (target) | 0.5.0 (goal) |
|---|---|---|---|
| Command accuracy | ~70% | 85%+ | 95%+ |
| Safety compliance | ~95% | 99%+ | 99.9%+ |
| Error self-correction | N/A | 50%+ | 80%+ |
| Response latency | 5-15s | <5s | <3s |
| Empty response rate | ~10% | <5% | <2% |
| Think token leakage | Yes | No | No |
Updated as the project evolves. Check git history for previous versions.