Mortdecai/CLAUDE.md

# Mortdecai — Agent Context

> Single source of truth for AI agents working on this project.
> When docs disagree, trust this file > implementation > historical docs.

## Project Identity

Mortdecai is a fine-tuned Qwen3.5-9B (and upcoming 14B) language model for Minecraft server operations. It runs inside a Paper 1.21 server via a LangGraph-style gateway, responding to player prayers (god mode) and commands (sudo mode) using 24 tools. The model is trained via QLoRA on curated + Claude-distilled + RCON-validated data.

This is NOT a chatbot, NOT a general assistant. It is a domain-specific Minecraft operations agent.

## Current State (2026-03-22)

- **Current model:** mortdecai:0.5.0 (Qwen3.5-9B, QLoRA fine-tune)
- **Base model:** Qwen/Qwen3.5-9B (HuggingFace)
- **Known issue:** 30% empty god-mode responses (think-block token drain, no freeform text training)
- **Next version:** 0.6.0 (9B + 14B, training on rented H100)
- **Tool count:** 24 (all wired into gateway; `training.save` is dev-only behind a config toggle)

## Canonical Files (trust these)

| File | Domain | Trust |
|------|--------|-------|
| `agent/tools/tool_schemas.py` | Tool capability inventory | **Canonical** — 24 tools |
| `langgraph_gateway.py` (in Sethpc-Minecraft-PaperFork repo) | Runtime orchestrator | **Canonical** — production code |
| `mc_aigod_paper.py` (in Sethpc-Minecraft-PaperFork repo) | Server-side AI handler | **Canonical** |
| `training/scripts/validate_all_training.py` | Data quality validator | **Current** |
| `training/scripts/merge_datasets.py` | Training data merge | **Current** |
| `training/scripts/train_lora.py` | Training script | **Current** |

## Historical Files (useful but potentially stale)

| File | Note |
|------|------|
| `PLAN.md` | Reflects 0.4.0→0.5.0 era planning. Not current deployment truth. |
| `SESSION.md` | Historical session log. May reference old architecture. |
| `README.md` | Public-facing. Tool count may lag behind tool_schemas.py. |
| `MODEL_CARD.md` | Reflects 0.5.0 release. Update on each version bump. |
| `data/schema.json` | **LEGACY** — does not validate current data. Use validate_all_training.py instead. |
| `data/validate_dataset.py` | **LEGACY** — replaced by training/scripts/validate_all_training.py. |
| `agent/serve.py` | **Reference only** — not the production runtime. Production runs in PaperFork repo. |

## Runtime Architecture

```
Player chat → Paper server (mc_aigod_paper.py) → LangGraph Gateway (langgraph_gateway.py)
                                                      ↓
                                               Mortdecai model (Ollama)
                                                      ↓
                                               Tool loop (24 tools, max 50 steps)
                                                      ↓
                                               RCON execution → Minecraft server
```

- **Production entrypoint:** `mc_aigod_paper.py` watches server log, dispatches to gateway
- **Gateway:** `langgraph_gateway.py` on port 8091 (internal to CT 644)
- **Model inference:** Ollama on steel141 (3090 Ti F16 for dev, RTX 4000 Q4 for prod)
- **This repo contains:** tools, schemas, training data, training scripts, eval harness
- **The PaperFork repo contains:** runtime (gateway + server handler)
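
The tool loop in the diagram above can be pictured as a bounded dispatch loop. The following is a minimal sketch, not the code in `langgraph_gateway.py`: `run_tool_loop`, the `model_step` callable, and the dict shapes it returns are assumptions made for illustration. Only the 50-step cap comes from this document.

```python
from typing import Any, Callable, Dict, List, Optional

MAX_STEPS = 50  # step cap taken from the architecture diagram

# Hypothetical contract: a model step returns either
#   {"tool": "<name>", "args": {...}}  or  {"final": "<text>"}
ModelStep = Callable[[List[Dict[str, Any]]], Dict[str, Any]]


def run_tool_loop(
    model_step: ModelStep,
    tools: Dict[str, Callable[..., Any]],
    history: List[Dict[str, Any]],
    max_steps: int = MAX_STEPS,
) -> Optional[str]:
    """Call the model until it returns a final answer or the step cap is hit."""
    for _ in range(max_steps):
        step = model_step(history)
        if "final" in step:
            return step["final"]
        name = step["tool"]
        handler = tools.get(name)
        if handler is None:
            # Feed the error back to the model instead of crashing the loop.
            result: Any = {"error": f"unknown tool: {name}"}
        else:
            result = handler(**step.get("args", {}))
        history.append({"tool": name, "result": result})
    return None  # cap reached without a final answer
```

Returning `None` at the cap keeps a runaway tool chain from blocking the server handler; the caller decides what to tell the player.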

## 24 Tools (all deployed, all wired into gateway)

| Tool | Status | In Training Data |
|------|--------|-----------------|
| rcon.execute | Production | Yes (heavy) |
| minecraft.lookup | Production | Partial (was wiki_lookup) |
| plugin.docs_lookup | Production | Yes |
| world.player_info (+ inventory) | Production | Yes |
| world.server_state | Production | Yes |
| world.nearby_entities | Production | Yes |
| world.scan_area | Production | Needs 0.6.0 examples |
| world.redstone_trace | Production | Needs 0.6.0 examples |
| world.render | Production | Needs 0.6.0 examples |
| server.config | Production | Needs 0.6.0 examples |
| memory.read | Production | Yes |
| memory.write | Production | Yes |
| journal.read | Production | 120 multitool examples |
| journal.write | Production | 120 multitool examples |
| log.query | Production | Needs more examples |
| user.ask | Production | Needs more examples |
| script.write | Production | Yes |
| script.validate | Production | Yes |
| script.execute | Production | Yes |
| script.read | Production | Minimal |
| script.list | Production | Minimal |
| script.delete | Production | Minimal |
| script.schedule (tick/load/delay) | Production | Minimal |
| training.save | Dev only (config toggle) | Needs 0.6.0 examples |

## Data Map

- **Raw data:** `data/raw/` (individual JSONL files from various sources)
- **Processed data:** `data/processed/` (merged, filtered, validated)
- **Quarantine:** `data/quarantine/` (failed validation, some salvageable)
- **External:** `data/external/` (IGLU Microsoft Research dataset, nested git repo)

**Known data issues (from validator run):**
- 2,891 commands use `@s` (should be player name)
- 2,633 commands use enchantment syntax Paper RCON rejects
- 24,476 examples in old dict format (need conversion to messages[] chat)
- 7,647 examples have outdated system prompts (missing new tools)
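
The `@s` issue in the list above is the kind of problem that can be repaired mechanically when the requesting player's name is known. A minimal sketch under that assumption; `replace_self_selector` is a hypothetical helper, not part of `validate_all_training.py`.

```python
import re

# A bare "@s": not glued to a preceding word character or "@", and not
# followed by a word character or "[" (selectors with arguments such as
# "@s[tag=x]" are left alone for manual review).
_BARE_SELF = re.compile(r"(?<![\w@])@s(?![\w\[])")

# Commands that set an executor ("execute ... as ...") give "@s" a valid
# meaning, so they are returned unchanged.
_EXECUTE_AS = re.compile(r"\bexecute\b.*\bas\b")


def replace_self_selector(command: str, player: str) -> str:
    """Replace bare @s with an explicit player name."""
    if _EXECUTE_AS.search(command):
        return command
    # A function replacement avoids backslash interpretation in `player`.
    return _BARE_SELF.sub(lambda _match: player, command)
```

For example, `replace_self_selector("give @s minecraft:diamond 1", "Steve")` yields `give Steve minecraft:diamond 1`.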

## Ignore List

- `__pycache__/` — committed by accident, not meaningful
- `USER_NOTES_IGNORE_ME/` — private notes, not project context
- `data/external/iglu-repo/` — external dataset, read-only
- `eval/results/` — historical eval outputs, may be stale
- `data/processed/pipeline_output.jsonl` — 7,032 examples with RCON connection failures marked as success. Do NOT trust.
- `data/raw/scraped_*.jsonl` — empty files
- `web/` — admin/community tools, not model-related
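
The ignore list above could be applied mechanically when collecting files for training or agent context. A minimal sketch, assuming paths are given relative to the repo root with forward slashes; `is_ignored` and the two constant names are hypothetical and do not exist in this repo.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Directory prefixes and file globs copied from the ignore list.
IGNORED_DIRS = (
    "USER_NOTES_IGNORE_ME/",
    "data/external/iglu-repo/",
    "eval/results/",
    "web/",
)
IGNORED_GLOBS = (
    "data/processed/pipeline_output.jsonl",
    "data/raw/scraped_*.jsonl",
)


def is_ignored(rel_path: str) -> bool:
    """Return True if a repo-relative path falls under the ignore list."""
    path = PurePosixPath(rel_path)
    # __pycache__ can appear at any depth, so check path components.
    if "__pycache__" in path.parts:
        return True
    posix = path.as_posix()
    if any(posix.startswith(prefix) for prefix in IGNORED_DIRS):
        return True
    return any(fnmatchcase(posix, pattern) for pattern in IGNORED_GLOBS)
```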

## Known Problems

1. **Secrets in tracked files** — RCON passwords, API keys hardcoded. Should be env vars.
2. **README/MODEL_CARD tool count lag** (docs say 17; `tool_schemas.py` has 24). Update on release.
3. **agent/serve.py misleads** — looks like main entrypoint but isn't. Real runtime is in PaperFork repo.
4. **data/schema.json is legacy** — doesn't validate current data. Replaced by validate_all_training.py.
5. **pipeline_output.jsonl is poisoned** — connection failures marked as success.
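
For problem 1, one way to move secrets out of tracked files is to read them from the environment and fail loudly when they are missing. A minimal sketch: the `MORTDECAI_RCON_*` variable names, `RconConfig`, and `load_rcon_config` are all hypothetical and would need to match whatever the scripts actually use.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names; nothing in the repo defines these yet.
_REQUIRED = ("MORTDECAI_RCON_HOST", "MORTDECAI_RCON_PORT", "MORTDECAI_RCON_PASSWORD")


@dataclass(frozen=True)
class RconConfig:
    host: str
    port: int
    password: str


def load_rcon_config(env: Mapping[str, str] = os.environ) -> RconConfig:
    """Build the RCON config from environment variables, never from source."""
    missing = [name for name in _REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return RconConfig(
        host=env["MORTDECAI_RCON_HOST"],
        port=int(env["MORTDECAI_RCON_PORT"]),
        password=env["MORTDECAI_RCON_PASSWORD"],
    )
```

Passing `env` as a parameter keeps the function testable with a plain dict.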

## Working Rules for AI Agents

1. `agent/tools/tool_schemas.py` is the tool inventory. Not README, not PLAN.md.
2. The production runtime is NOT in this repo. It's in `Sethpc-Minecraft-PaperFork/`.
3. Paper RCON cannot give enchanted items. Use plain give + effect combos.
4. `@s` does not work via RCON (no executor context). Training data uses `@p` as a pragmatic fix, but `@p` selects nearest player — not always the requester. Prefer explicit player names in new training data.
5. Dev world is named `devworld`, not `world`. WorldGuard needs `-w devworld`.
6. When in doubt about a command, RCON-validate it on dev (192.168.0.244:25578, pass REDACTED_RCON).
7. Keep seed_dataset dominant in training mix to prevent fill_build regression.
8. The model (0.5.0) cannot produce freeform text in sudo mode — it only outputs JSON. Training 0.6.0 fixes this.
9. `data/processed/pipeline_output.jsonl` is poisoned (7,032 examples with RCON connection failures marked success). Excluded from training.
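
For rule 6, a validation script ultimately has to speak the RCON wire format. The sketch below shows only the packet framing of the Source RCON protocol, which Minecraft's RCON follows: a little-endian int32 length, request id, and type, then the body and two null bytes. `encode_packet` and `decode_packet` are hypothetical helper names; socket handling, the authentication exchange, and the server password are deliberately left out.

```python
import struct
from typing import Tuple

SERVERDATA_AUTH = 3         # login packet type
SERVERDATA_EXECCOMMAND = 2  # command packet type


def encode_packet(request_id: int, packet_type: int, body: str) -> bytes:
    """Frame one RCON packet: length, id, type, body, two null bytes."""
    payload = (
        struct.pack("<ii", request_id, packet_type)
        + body.encode("utf-8")
        + b"\x00\x00"
    )
    # The length field counts everything after itself.
    return struct.pack("<i", len(payload)) + payload


def decode_packet(data: bytes) -> Tuple[int, int, str]:
    """Parse one complete RCON packet into (request_id, type, body)."""
    (size,) = struct.unpack_from("<i", data, 0)
    request_id, packet_type = struct.unpack_from("<ii", data, 4)
    # Body sits between the 12-byte header and the two trailing nulls.
    body = data[12 : 4 + size - 2].decode("utf-8")
    return request_id, packet_type, body
```

A round trip such as `decode_packet(encode_packet(1, SERVERDATA_EXECCOMMAND, "list"))` returns `(1, 2, "list")`, which is a cheap self-check before pointing a script at the dev server.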