Files

T

Seth 98d035439d PLAN.md complete rewrite — Mortdecai project status, TODOs, risk hierarchy

Full rewrite reflecting current state:
- Model history v1→v4, infrastructure map, API spend
- Training data breakdown (3,477 total examples)
- Active TODOs: immediate, short-term, v5, infrastructure, community
- Risk hierarchy with permanence-based levels
- Key architecture decisions log
- Success criteria: v3 actual → v4 target → v5 goal
- Single-call enabled on prod (mortdecai-v3)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-19 23:45:03 -04:00

7.5 KiB

Raw Blame History

PLAN.md — Mortdecai Project Roadmap

Last updated: 2026-03-20 Model name: Mortdecai Domain: mortdec.ai Status legend: [ ] planned | [~] in progress | [x] done | [-] cancelled/deferred

Vision

Mortdecai is a fine-tuned 9B parameter language model for Minecraft server operations. It translates natural language to commands, controls an AI God character, self-corrects errors via RCON feedback, and improves through self-play.

It runs locally on consumer hardware with zero cloud dependencies at inference time.

Current State (2026-03-20)

Models

Model	Base	Examples	Loss	Status
v1	Qwen3-8B	233	0.10	Retired (overfit)
v2	Qwen3-8B	361	2.03	Retired
v3	Qwen3-8B	1,308	0.55	Deployed on prod
v4	Qwen3.5-9B	3,369	Training (~88%)	ETA ~30 min

Infrastructure

Component	Location	Details
Training GPU	steel141 RTX 3090 Ti (24GB)	QLoRA via Unsloth
Prod inference	node-197 RTX 4000 (16GB)	Ollama, mortdecai-v3
Minecraft servers	CT 644 on node-112	paper-ai (25567), shrink-world (25566), dev (25568), vanilla (25565)
Dev data collection	CT 644	Gemini 2.5 Flash via API, 10 bots
Whitelist app	CT 644 port 8099	minecraft.mortdec.ai
Caddy proxy	CT 600 on node-241	mortdec.ai, minecraft.mortdec.ai
GPU monitoring	Grafana on CT 300 (node-173)	Prometheus + nvidia exporter on steel141
LangGraph gateway	CT 644 port 8091	Disabled on prod (fresh session mode available)

API Spend

Provider	Spent	Budget	Status
Claude Haiku	$20.01	$20	Exhausted
Gemini 2.5 Flash	~$0.50	$20	Active (dev bots)

Training Data: 2,318 seed + 1,159 tool-calling = 3,477 total

Category	Count
Command syntax reference	107
Crafting recipes & chains	176
Enchantments (mutual exclusions, max levels)	60
Entities/mobs (summon, kill, NBT)	60
Execute chains	45
Multiplayer (selectors, teams, scoreboards)	45
Advanced commands (tellraw, clone, data, ride)	45
WorldEdit	45
Paper server features	55
Cosmetics/XP/effects	42
Gamerules	49
Risk hierarchy (L0-L3, prompt injection)	40
Quantity boundaries	32
Dangerous effect caps	12
Revert-aware gamerules + drops	20
Error correction pairs	47
Claude-distilled outputs	344
Bot audit interactions	448+
Boundary/safety examples	95+
Tool-calling (multi-turn with RCON)	1,159

Active TODOs

Immediate (this session)

[~] Mortdecai v4 training completing (~30 min remaining)
Export v4 to GGUF, deploy to RTX 4000 as mortdecai-v4
Enable single-call mode on prod with v4
Run v4 bake-off and compare to v3/base
Commit and push all PaperFork changes

Short-term

Deploy self-play loop with v4 (3-tier: drills, self-critique, adversarial)
Add ground-level detection for teleport safety (query terrain before tp)
Build revert_after/revert_commands into v5 training format
Add Gemini milestone POS printing
Fix whitelist app UUID lookup for vanilla server path
Start Greenfield world on paper-ai (downloaded, needs MCSManager start)

Model improvements for v5

Train on Qwen3.5-9B with tool-calling format (rcon.execute, wiki_lookup, etc.)
Self-play generated data (run 200 rounds after v4 deploys)
Ingest all Gemini 2.5 Flash training data ($20 worth)
Add revert_after field to training output format
Ground-level detection training (check terrain before tp)
More error correction from production RCON failures
Enchantment count-before-bracket error correction

Infrastructure

Add GPU monitoring for RTX 4000 (second exporter)
Validator hit-rate analysis — remove fixes that fire <1%
Automate training pipeline: ingest → dedup → train → export → deploy
POS receipt for Gemini milestones
Consider moving to Mortdecai as Ollama model name on prod

Content & Community

Invite more playtesters via minecraft.mortdec.ai
Update mortdec.ai README with v4 results when available
Consider public HuggingFace release once quality is validated
WorldEdit schematic library expansion (77 installed, need more)

Risk Hierarchy

Commands are classified by permanence, not just danger:

Level	Permanence	Examples	Model behavior
0	Irreversible/admin	ban, kick, stop, op, deop	Never execute
1	Permanent toggle	gamemode @a, permanent gamerules, difficulty	Refuse or execute for self only
2	Temporary/reversible	gamerules with time limits, brief difficulty	Allow, schedule auto-revert
3	Transient	time, weather, tick speed, chat settings	Execute freely
4	Generous	full enchanted gear, large material stacks	Execute for worthy requests

Gamerule revert system: changes auto-revert after 5-10 min unless "permanently" specified.

Dangerous effect caps (hardcoded in validator):

Levitation: 15s max
Wither: 30s max
Poison: 60s max
Nausea: 30s max

Key Architecture Decisions

Date	Decision	Rationale
2026-03-18	Serving: gemma3n:e4b → qwen3.5:9b → mortdecai-v3	Progressive upgrades as better models trained
2026-03-18	Fine-tuning: Qwen3-8B → Qwen3.5-9B	3.5 has 2x base accuracy (70% vs 34%), native tool-calling
2026-03-18	God Soul document	Character framework adapted from Claude's soul. Defines identity, judgment, quantity boundaries
2026-03-19	API cascade: Haiku → Gemini → local	Progressive fallback for dev data collection. $40 total API budget
2026-03-19	/no_think in training	Prevents Qwen3 thinking tokens from consuming output budget
2026-03-19	Single-call mode	One LLM call for commands + message (v3+). Two-call for older models
2026-03-19	Error correction via RCON	Model tries command → RCON error → model self-corrects → retry
2026-03-19	3-tier self-play	Drills, self-critique, adversarial. Model generates its own training data
2026-03-20	Gamerule revert timers	State changes auto-revert. Permanence determines risk level
2026-03-20	Dangerous effect caps	Validator hardcodes max durations for levitation, wither, poison, nausea
2026-03-20	Expanded safe_prefixes	gamerule, particle, playsound, title, scoreboard, team, bossbar, locate, etc.
2026-03-20	Model named Mortdecai	mortdec.ai domain, Rajdhani Bold font, Sethian orange branding

Dev Server

Property	Value
Location	CT 644 on node-112
Game port	25568
RCON port	25578
RCON password	REDACTED_RCON
Data dir	/opt/paper-dev-25568/
AI God	Gemini 2.5 Flash via API cascade
Bots	10 swarm bots (swarm_bots.js)
Audit log	/var/log/mc_training_audit_dev.jsonl

Success Criteria

Metric	v3 (current)	v4 (target)	v5 (goal)
Command accuracy	~70%	85%+	95%+
Safety compliance	~95%	99%+	99.9%+
Error self-correction	N/A	50%+	80%+
Response latency	5-15s	<5s	<3s
Empty response rate	~10%	<5%	<2%
Think token leakage	Yes	No	No

This document is updated as the project evolves. Check git history for previous versions.

7.5 KiB Raw Blame History

PLAN.md — Mortdecai Project Roadmap

Vision

Current State (2026-03-20)

Models

Infrastructure

API Spend

Training Data: 2,318 seed + 1,159 tool-calling = 3,477 total

Active TODOs

Immediate (this session)

Short-term

Model improvements for v5

Infrastructure

Content & Community

Risk Hierarchy

Key Architecture Decisions

Dev Server

Success Criteria

7.5 KiB

Raw Blame History