0.5.0 bake-off results, knowledge lookup tools, training progress chart
Bake-off (0.5.0 vs 0.4.0):
- Overall: 46.8% vs 45.2% (+1.6%), 0 errors vs 2
- Enchantments: +47% (20% → 67%)
- EssentialsX: +60% (0% → 60%)
- Effects: +25% (0% → 25%)
- Regressions: fill_build -67%, world -20%

Knowledge Lookup Tools (4 new):
- plugin.docs_lookup: WorldGuard, WorldEdit, CoreProtect, EssentialsX, LuckPerms docs
- minecraft.changelog_lookup: version history from Minecraft Wiki
- paper.docs_lookup: Paper server-specific documentation
- Wired into gateway model-driven tool loop and exploration self-play

Exploration Self-Play:
- General (vanilla MC) and plugins focus modes
- Wiki-grounded: model researches before acting, validates through RCON
- 2,243 exploration examples generated, 150 kept after quality filtering

Training Progress Chart:
- SVG chart showing training examples and inverse loss across versions
- Added to MODEL_CARD.md for Gitea display

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+41 -20
@@ -1,17 +1,19 @@
 # Model Card: Mortdecai

+
+
 ## Model Details

 | Field | Value |
 |-------|-------|
 | **Name** | Mortdecai |
-| **Version** | 0.4.0 |
+| **Version** | 0.5.0 |
 | **Base Model** | Qwen3.5-9B (Apache 2.0) |
 | **Adaptation** | QLoRA (4-bit base + LoRA adapters in FP16) |
 | **Parameters** | 9.4B total, 29M trainable (0.31%) |
 | **Training Hardware** | RTX 3090 Ti (24GB VRAM) |
-| **Inference Hardware** | RTX 4000 (16GB), RTX 2080 Ti (11GB), or any GPU with 6GB+ VRAM |
-| **Quantization** | Q4_K_M (5.3GB GGUF) |
+| **Inference Hardware** | RTX 4000 (16GB), RTX 2080 Ti (11GB), GTX 1660 Super (6GB), or any GPU with 6GB+ |
+| **Quantization** | Q4_K_M (5.6GB GGUF) |
 | **Context Length** | 4096 tokens (training), 262K tokens (model capability) |
 | **License** | Proprietary (adapter + training data). Base model: Apache 2.0 |

@@ -34,15 +36,25 @@ Mortdecai is designed for **Minecraft Java Edition 1.21.x server operations**:

 | Source | Count | Description |
 |--------|-------|-------------|
-| Hand-curated examples | 966 | Command syntax, recipes, enchantments, entities, effects |
-| Player interactions | 654 | Real prayers from live server players |
-| Sudo translations | 525 | Natural language → command pairs |
-| Tool-calling sequences | 1,159 | Multi-turn RCON execution with error correction |
-| Self-play | 5,000+ | Model-generated prompts validated via RCON |
-| API distillation | 344 | Claude Haiku gold-standard responses |
-| Error corrections | 150+ | Wrong → right command pairs |
+| Hand-curated seed examples | 3,196 | Command syntax, recipes, enchantments, entities, effects, memory, events |
+| Tool-calling sequences | 1,430 | Multi-turn RCON execution with 17 tools (script, memory, wiki, plugins) |
+| IGLU build dataset | 4,656 | Natural language → block placement commands from Microsoft Research |
+| Plugin training (RCON-validated) | 104 | WorldGuard, CoreProtect, EssentialsX, LuckPerms, FAWE |
+| Exploration self-play | 150 | Wiki-grounded knowledge discovery with RCON validation |
+| Self-play (0.4.0 + 0.5.0) | 2,900+ | Model-generated prompts validated via RCON |
+| Live server audit | 8,000+ | Wolf bot + real player interactions from 3 servers |

-**Total: ~8,400+ examples**
+**Total: ~20,000+ examples across all sources**

+### Tool Architecture (17 tools)
+
+| Category | Tools |
+|----------|-------|
+| Execution | rcon.execute |
+| Knowledge | minecraft.wiki_lookup, plugin.docs_lookup, minecraft.changelog_lookup, paper.docs_lookup |
+| World Sensing | world.player_info, world.server_state, world.nearby_entities |
+| Memory | memory.read, memory.write |
+| Scripts | script.write, script.validate, script.execute, script.read, script.list, script.delete, script.schedule |
+
 ### Data Collection Methods

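The hunk above states two totals: 17 tools and roughly 20,000+ training examples. A minimal Python sketch can check both against the table rows. The `TOOLS` and `SOURCES` mappings are hypothetical transcriptions of the tables, not the project's actual registry or dataset manifest, and the counts marked "+" in the card are treated as lower bounds.

```python
# Hypothetical transcription of the "Tool Architecture" table; the real
# project may register its tools in a different structure.
TOOLS = {
    "Execution": ["rcon.execute"],
    "Knowledge": [
        "minecraft.wiki_lookup",
        "plugin.docs_lookup",
        "minecraft.changelog_lookup",
        "paper.docs_lookup",
    ],
    "World Sensing": [
        "world.player_info",
        "world.server_state",
        "world.nearby_entities",
    ],
    "Memory": ["memory.read", "memory.write"],
    "Scripts": [
        "script.write",
        "script.validate",
        "script.execute",
        "script.read",
        "script.list",
        "script.delete",
        "script.schedule",
    ],
}

# Hypothetical transcription of the 0.5.0 training-data table.
# "2,900+" and "8,000+" are entered as their stated lower bounds.
SOURCES = {
    "Hand-curated seed examples": 3196,
    "Tool-calling sequences": 1430,
    "IGLU build dataset": 4656,
    "Plugin training (RCON-validated)": 104,
    "Exploration self-play": 150,
    "Self-play (0.4.0 + 0.5.0)": 2900,
    "Live server audit": 8000,
}


def count_tools(tools: dict[str, list[str]]) -> int:
    """Count unique tool names across all categories."""
    return len({name for names in tools.values() for name in names})


def minimum_examples(sources: dict[str, int]) -> int:
    """Sum the per-source counts, giving a lower bound on the total."""
    return sum(sources.values())


if __name__ == "__main__":
    print("tools:", count_tools(TOOLS))
    print("examples (lower bound):", minimum_examples(SOURCES))
```

Under these assumptions the tally gives 17 tools and a lower bound of 20,436 examples, which is consistent with the "17 tools" heading and the "~20,000+" total.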
@@ -63,16 +75,25 @@ Mortdecai is designed for **Minecraft Java Edition 1.21.x server operations**:

 ## Evaluation

-### Bake-off Results (0.4.0, 2,397 test cases)
+### Bake-off Results (0.5.0 vs 0.4.0, 38 prompts × 12 categories)

-| Metric | Score |
-|--------|-------|
-| Command match | 75.5% |
-| Exact match | 22.9% |
-| Syntax correct | 80.5% |
-| Safety compliance | 99.7% |
-| No gratuitous tp | 98.5% |
-| Avg latency | 4.0s |
+| Metric | 0.4.0 | 0.5.0 |
+|--------|-------|-------|
+| Overall success rate | 45.2% | 46.8% |
+| Avg response time | 2.60s | 2.11s |
+| Errors (crashes) | 2 | 0 |
+| Empty responses | 0 | 0 |
+
+**Category improvements (0.5.0 vs 0.4.0):**
+
+| Category | 0.4.0 | 0.5.0 | Change |
+|----------|-------|-------|--------|
+| Enchantments | 20% | 67% | **+47%** |
+| EssentialsX | 0% | 60% | **+60%** |
+| Effects | 0% | 25% | **+25%** |
+| Basic commands | 75% | 75% | — |
+| Teleport | 100% | 100% | — |
+| Error recovery | 50% | 50% | — |

 ### Safety

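The "Change" column in the category table reads as a difference in percentage points (67% minus 20% is 47), not a relative gain. The sketch below recomputes those differences under that reading. `RESULTS`, `point_change`, and `format_change` are hypothetical names introduced for illustration and do not come from the repository.

```python
# Illustrative only: RESULTS is a hypothetical transcription of the
# bake-off tables in this diff, not a data structure from the repository.
RESULTS = {
    # name: (0.4.0 success %, 0.5.0 success %)
    "Overall": (45.2, 46.8),
    "Enchantments": (20.0, 67.0),
    "EssentialsX": (0.0, 60.0),
    "Effects": (0.0, 25.0),
    "Basic commands": (75.0, 75.0),
    "Teleport": (100.0, 100.0),
    "Error recovery": (50.0, 50.0),
}


def point_change(old: float, new: float) -> float:
    """Return the change from old to new in percentage points."""
    # Rounding to one decimal hides floating-point noise such as 1.5999...
    return round(new - old, 1)


def format_change(old: float, new: float) -> str:
    """Render a change as a signed number of points, or 'unchanged'."""
    delta = point_change(old, new)
    return "unchanged" if delta == 0 else f"{delta:+g} pts"


if __name__ == "__main__":
    for name, (old, new) in RESULTS.items():
        print(f"{name}: {old:g}% -> {new:g}% ({format_change(old, new)})")
```

Reporting the change in points keeps rows with a 0% baseline (EssentialsX, Effects) well defined, since a relative change from zero cannot be computed.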