Add LICENSE, MODEL_CARD, requirements, CONTRIBUTING

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:43:21 -04:00
parent f39809eaca
commit bd65f4a84c
5 changed files with 290 additions and 0 deletions
@@ -0,0 +1,106 @@
+# Model Card: Mortdecai
+
+## Model Details
+
+| Field | Value |
+|-------|-------|
+| **Name** | Mortdecai |
+| **Version** | 0.4.0 |
+| **Base Model** | Qwen3.5-9B (Apache 2.0) |
+| **Adaptation** | QLoRA (4-bit base + LoRA adapters in FP16) |
+| **Parameters** | 9.4B total, 29M trainable (0.31%) |
+| **Training Hardware** | RTX 3090 Ti (24GB VRAM) |
+| **Inference Hardware** | RTX 4000 (16GB), RTX 2080 Ti (11GB), or any GPU with 6GB+ VRAM |
+| **Quantization** | Q4_K_M (5.3GB GGUF) |
+| **Context Length** | 4096 tokens (training), 262K tokens (model capability) |
+| **License** | Proprietary (adapter + training data). Base model: Apache 2.0 |
+
+## Intended Use
+
+Mortdecai is designed for **Minecraft Java Edition 1.21.x server operations**:
+
+- Translating natural language to valid Minecraft commands
+- Controlling an AI God character that responds to player prayers
+- Server administration via chat (gamerules, effects, world editing)
+- Error correction (self-corrects failed RCON commands)
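The error-correction behavior in the list above can be pictured as a retry loop that feeds each RCON error back into the next generation attempt. The following is a minimal sketch under stated assumptions, not the project's implementation: `run_with_correction`, `generate`, `run_rcon`, and `max_attempts` are hypothetical names, and the two callables stand in for the model and the RCON client.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures (not from the repository):
#   generate(request, last_error) -> command string
#   run_rcon(command) -> (succeeded, server output)
Generate = Callable[[str, Optional[str]], str]
RunRcon = Callable[[str], Tuple[bool, str]]


def run_with_correction(
    request: str,
    generate: Generate,
    run_rcon: RunRcon,
    max_attempts: int = 3,
) -> Tuple[Optional[str], List[Tuple[str, bool, str]]]:
    """Try a command, passing each RCON error to the next attempt.

    Returns the first command that succeeded (or None) and the full
    attempt history as (command, succeeded, output) tuples.
    """
    history: List[Tuple[str, bool, str]] = []
    last_error: Optional[str] = None
    for _ in range(max_attempts):
        command = generate(request, last_error)
        ok, output = run_rcon(command)
        history.append((command, ok, output))
        if ok:
            return command, history
        last_error = output  # the next attempt sees what went wrong
    return None, history
```

Capping the number of attempts keeps a persistently failing request from looping forever; the returned history could also serve as a source of wrong-to-right command pairs.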
+
+**Not intended for:**
+- General-purpose chat or reasoning
+- Other games or non-Minecraft domains
+- Safety-critical applications
+- Use without the validator safety layer
+
+## Training Data
+
+| Source | Count | Description |
+|--------|-------|-------------|
+| Hand-curated examples | 966 | Command syntax, recipes, enchantments, entities, effects |
+| Player interactions | 654 | Real prayers from live server players |
+| Sudo translations | 525 | Natural language → command pairs |
+| Tool-calling sequences | 1,159 | Multi-turn RCON execution with error correction |
+| Self-play | 5,000+ | Model-generated prompts validated via RCON |
+| API distillation | 344 | Claude Haiku gold-standard responses |
+| Error corrections | 150+ | Wrong → right command pairs |
+
+**Total: ~8,800+ examples**
+
+### Data Collection Methods
+
+1. **Manual curation** — Minecraft Wiki, command reference, recipe databases
+2. **Live server logs** — Real player interactions on Paper 1.21.x servers
+3. **Bot collection** — Mineflayer bots with Gemini/Dolphin prompt generation
+4. **API distillation** — Claude Haiku and Gemini Flash responses
+5. **Self-play** — Model generates edge cases, attempts via RCON, learns from results
+6. **RCON validation** — Every command tested against a live Minecraft server
+
+### Known Biases
+
+- Training data skews toward English (~97%), with limited multilingual coverage (~3%)
+- Command distribution favors `give` and `effect` over complex `execute` chains
+- God persona training reflects a specific dramatic character — not neutral
+- Player interaction data comes from a small group of testers (< 10 players)
+- Self-play data may overrepresent patterns the model is already good at
+
+## Evaluation
+
+### Bake-off Results (0.4.0, 2,397 test cases)
+
+| Metric | Score |
+|--------|-------|
+| Command match | 75.5% |
+| Exact match | 22.9% |
+| Syntax correct | 80.5% |
+| Safety compliance | 99.7% |
+| No gratuitous tp | 98.5% |
+| Avg latency | 4.0s |
+
+### Safety
+
+The model uses a 5-level risk hierarchy:
+
+- **Level 0 (never):** ban, kick, stop, op — hardcoded block in validator
+- **Level 1 (refuse):** permanent server state changes
+- **Level 2 (warn):** temporary/reversible changes, destructive actions
+- **Level 3 (normal):** standard gameplay commands
+- **Level 4 (generous):** full enchanted gear, large material stacks
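As an illustration of the Level 0 rule in the hierarchy above, a hardcoded block can be as simple as checking a command's root verb against a fixed set. This is a sketch under assumptions, not the project's validator: `LEVEL_0_COMMANDS`, `root_verb`, and `is_hard_blocked` are hypothetical names, and only Level 0 is modeled because the card does not spell out classification rules for the other levels.

```python
# The four verbs listed for Level 0 in the model card.
LEVEL_0_COMMANDS = frozenset({"ban", "kick", "stop", "op"})


def root_verb(command: str) -> str:
    """Return the verb that would actually run, lowercased.

    Handles a leading slash, a "minecraft:" namespace prefix, and
    "execute ... run <command>" chains (hypothetical helper).
    """
    tokens = command.strip().lstrip("/").lower().split()
    if tokens and tokens[0].split(":")[-1] == "execute" and "run" in tokens:
        # Keep only what follows the last "run" token. This errs toward
        # over-blocking, which is the safer failure mode for a validator.
        last_run = len(tokens) - 1 - tokens[::-1].index("run")
        tokens = tokens[last_run + 1:]
    return tokens[0].split(":")[-1] if tokens else ""


def is_hard_blocked(command: str) -> bool:
    """True if the command resolves to a Level 0 verb."""
    return root_verb(command) in LEVEL_0_COMMANDS
```

For example, `is_hard_blocked("/op Steve")` and `is_hard_blocked("execute as @a run kick Steve")` both return `True`, while `is_hard_blocked("give @p diamond 1")` returns `False`.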
+
+Additional safety layers:
+- Validator blocks dangerous commands even if model generates them
+- Dangerous effect duration caps (levitation 15s, wither 30s)
+- Fall protection (detects lethal teleports)
+- Gamerule auto-revert timers
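The duration caps listed above could be enforced by rewriting the seconds argument of `effect give` before the command reaches the server. The sketch below assumes the standard `effect give <targets> <effect> [<seconds>] ...` argument order and a target selector without spaces; `EFFECT_CAPS_SECONDS` and `cap_effect_duration` are hypothetical names, and only the two caps named in the card are included.

```python
# Caps taken from the model card: levitation 15s, wither 30s.
EFFECT_CAPS_SECONDS = {"levitation": 15, "wither": 30}


def cap_effect_duration(command: str) -> str:
    """Clamp the duration of capped effects in an `effect give` command.

    Commands that are not `effect give`, or whose effect has no cap,
    are returned unchanged. Assumes the target selector has no spaces.
    """
    tokens = command.strip().split()
    if len(tokens) < 4:
        return command
    verb = tokens[0].lstrip("/").lower().split(":")[-1]
    if verb != "effect" or tokens[1].lower() != "give":
        return command

    effect = tokens[3].lower().split(":")[-1]  # drop "minecraft:" if present
    cap = EFFECT_CAPS_SECONDS.get(effect)
    if cap is None:
        return command

    if len(tokens) == 4:
        # No duration given: write the cap explicitly rather than
        # relying on the game's default duration.
        tokens.append(str(cap))
    else:
        seconds = tokens[4]
        # Non-numeric values such as "infinite" are treated as over the cap.
        if not seconds.isdigit() or int(seconds) > cap:
            tokens[4] = str(cap)
    return " ".join(tokens)
```

With this sketch, `cap_effect_duration("effect give @p levitation 60 1")` yields `effect give @p levitation 15 1`, and a wither effect of 10 seconds passes through untouched.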
+
+### Limitations
+
+- Cannot determine what a player is looking at (no raycast)
+- Limited awareness of world state beyond player position
+- Enchantment syntax errors still occur (~15% need validator fixes)
+- Empty responses on ~5% of requests
+- Emits its reasoning in `<think>` blocks, which must be stripped before the response is used (Qwen3 behavior)
+- God persona can be unpredictable by design
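Stripping the `<think>` blocks mentioned in the limitations above is a small post-processing step. The sketch below is one possible approach, assuming the tags appear literally as `<think>` and `</think>` in the raw output; `strip_think` is a hypothetical helper name.

```python
import re

# Complete blocks, matched non-greedily so several blocks are handled.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
# An opening tag with no close, e.g. when generation was cut off.
_OPEN_THINK = re.compile(r"<think>.*", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think>...</think> blocks and return the remaining text."""
    text = _THINK_BLOCK.sub("", text)
    text = _OPEN_THINK.sub("", text)  # drop an unclosed block to the end
    return text.strip()
```

An empty string after stripping is a convenient signal for the empty-response case, so a caller can retry instead of sending nothing to the server.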
+
+## Environmental Impact
+
+- **Training energy:** ~84W × 4 hours ≈ 0.34 kWh per training run
+- **Inference energy:** ~54W during calls, idle otherwise
+- **All compute on consumer GPUs** — no data center resources used
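The training-energy figure above follows from average power times duration. A minimal worked version, using only the numbers stated in the card (`energy_kwh` is a hypothetical helper name):

```python
def energy_kwh(avg_watts: float, hours: float) -> float:
    """Energy in kilowatt-hours for a given average power draw and duration."""
    return avg_watts * hours / 1000.0


# Figures from the model card: ~84 W average over a 4-hour training run.
training_kwh = energy_kwh(84, 4)  # 0.336 kWh, reported as 0.34 kWh
```

The same helper applies to inference once the total time spent in calls is known; the card gives only the ~54W draw, so no inference total is computed here.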