0.5.0 bake-off results, knowledge lookup tools, training progress chart

Bake-off (0.5.0 vs 0.4.0):
- Overall: 46.8% vs 45.2% (+1.6 pp), 0 errors vs 2
- Enchantments: +47 pp (20% → 67%)
- EssentialsX: +60 pp (0% → 60%)
- Effects: +25 pp (0% → 25%)
- Regressions: fill_build -67%, world -20%
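
The gains above are differences between two pass rates, not relative changes. A minimal sketch of that arithmetic, using only the before/after figures quoted in this message (the `rates` dict and `delta_pp` helper are illustrative names, not part of the bake-off harness):

```python
# Pass rates quoted in the bake-off summary, as (0.4.0, 0.5.0) percentages.
rates = {
    "overall": (45.2, 46.8),
    "enchantments": (20.0, 67.0),
    "essentialsx": (0.0, 60.0),
    "effects": (0.0, 25.0),
}


def delta_pp(old: float, new: float) -> float:
    """Difference between two pass rates, in percentage points."""
    return round(new - old, 1)


# Hypothetical summary structure: category name -> change in percentage points.
deltas = {name: delta_pp(old, new) for name, (old, new) in rates.items()}
```

The regressions line gives no before/after values, so it is left out of the sketch.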

Knowledge Lookup Tools (3 new):
- plugin.docs_lookup: WorldGuard, WorldEdit, CoreProtect, EssentialsX, LuckPerms docs
- minecraft.changelog_lookup: version history from Minecraft Wiki
- paper.docs_lookup: Paper server-specific documentation
- Wired into gateway model-driven tool loop and exploration self-play
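
Because each lookup tool declares `required`, `enum`, and `additionalProperties: False`, a tool loop can reject malformed calls before dispatching them. The following is a sketch under assumptions, not the gateway's actual code: `validate_args` is a hypothetical helper, and `PLUGIN_DOCS_PARAMS` is a trimmed copy of the `plugin.docs_lookup` parameter schema added in this commit.

```python
from typing import Any, Dict, List

# Trimmed copy of the plugin.docs_lookup parameter schema (descriptions omitted).
PLUGIN_DOCS_PARAMS: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "plugin": {
            "type": "string",
            "enum": ["worldguard", "worldedit", "coreprotect", "essentialsx", "luckperms"],
        },
        "query": {"type": "string"},
    },
    "required": ["plugin", "query"],
    "additionalProperties": False,
}


def validate_args(params: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems, empty if args are acceptable."""
    errors: List[str] = []
    props = params.get("properties", {})
    # Every name listed under "required" must be present.
    for name in params.get("required", []):
        if name not in args:
            errors.append(f"missing required argument: {name}")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            # Unknown keys are only an error when the schema forbids them.
            if params.get("additionalProperties", True) is False:
                errors.append(f"unexpected argument: {name}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{name} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name} must be one of {spec['enum']}")
    return errors
```

A call such as `validate_args(PLUGIN_DOCS_PARAMS, {"plugin": "luckperms", "query": "group inheritance"})` returns an empty list. An unknown plugin name or a stray key produces one message per problem, which could be fed back to the model as the tool result.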

Exploration Self-Play:
- General (vanilla MC) and plugins focus modes
- Wiki-grounded: model researches before acting, validates through RCON
- 2,243 exploration examples generated, 150 kept after quality filtering
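
The research, act, validate order described above can be sketched as a single exploration step. Everything here is an assumption made for illustration: `explore_once`, `ExplorationExample`, and the four callables stand in for the real lookup tools, model call, RCON client, and quality filter, none of which are shown in this commit.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExplorationExample:
    task: str
    notes: str      # what the lookup tools returned
    command: str    # what the model decided to run
    reply: str      # what the server answered over RCON


def explore_once(
    task: str,
    lookup: Callable[[str], str],
    propose: Callable[[str, str], str],
    run_rcon: Callable[[str], str],
    keep: Callable[[ExplorationExample], bool],
) -> Optional[ExplorationExample]:
    """Hypothetical single step: research first, then act, then validate."""
    notes = lookup(task)              # 1. research before acting
    command = propose(task, notes)    # 2. pick a command grounded in the notes
    reply = run_rcon(command)         # 3. validate against the live server
    example = ExplorationExample(task, notes, command, reply)
    # 4. quality filter: only examples that pass are kept for training
    return example if keep(example) else None


# Stubbed usage; the lambdas are placeholders, not real tool or server output.
example = explore_once(
    "stub task",
    lookup=lambda task: "stub wiki notes",
    propose=lambda task, notes: "say stub command",
    run_rcon=lambda command: "ok",
    keep=lambda ex: ex.reply == "ok",
)
```

Returning `None` for rejected steps keeps the filter separate from generation, which fits the large gap between examples generated and examples kept.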

Training Progress Chart:
- SVG chart showing training examples and inverse loss across versions
- Added to MODEL_CARD.md for Gitea display

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mortdecai
2026-03-21 15:28:09 -04:00
parent da8f557219
commit f5118505b1
10 changed files with 3215 additions and 20 deletions
+89
@@ -69,6 +69,95 @@ TOOL_SCHEMAS: List[Dict[str, Any]] = [
            }
        }
    },
    {
        "name": "plugin.docs_lookup",
        "description": (
            "Look up plugin command syntax and documentation for server plugins: "
            "WorldGuard, WorldEdit/FAWE, CoreProtect, EssentialsX, LuckPerms. "
            "Use this when unsure about plugin-specific command syntax, flags, "
            "parameters, or configuration options."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "plugin": {
                    "type": "string",
                    "enum": ["worldguard", "worldedit", "coreprotect", "essentialsx", "luckperms"],
                    "description": "Which plugin to look up docs for."
                },
                "query": {
                    "type": "string",
                    "description": "What to search for (e.g. 'region flags', 'rollback syntax', 'group inheritance')."
                }
            },
            "required": ["plugin", "query"],
            "additionalProperties": False
        },
        "returns": {
            "type": "object",
            "properties": {
                "content": {"type": "string"},
                "url": {"type": "string"}
            }
        }
    },
    {
        "name": "minecraft.changelog_lookup",
        "description": (
            "Look up what changed in a specific Minecraft version. Use this to check "
            "if a feature, item, or command exists in the current version (1.21), "
            "when something was added or removed, or what changed between versions."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "version": {
                    "type": "string",
                    "description": "Version to look up (e.g. '1.21', '1.20.4', '1.19'). Omit for latest."
                },
                "query": {
                    "type": "string",
                    "description": "What to search for in the changelog (e.g. 'mace', 'copper', 'trial chambers')."
                }
            },
            "required": ["query"],
            "additionalProperties": False
        },
        "returns": {
            "type": "object",
            "properties": {
                "content": {"type": "string"},
                "version": {"type": "string"},
                "url": {"type": "string"}
            }
        }
    },
    {
        "name": "paper.docs_lookup",
        "description": (
            "Look up Paper server-specific documentation — API differences from Spigot, "
            "Paper-specific configuration, async chunk loading, and server optimization. "
            "Use when behavior differs from vanilla or Spigot."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "What to search for (e.g. 'async chunks', 'paper-world config', 'timings')."
                }
            },
            "required": ["query"],
            "additionalProperties": False
        },
        "returns": {
            "type": "object",
            "properties": {
                "content": {"type": "string"},
                "url": {"type": "string"}
            }
        }
    },
    {
        "name": "world.player_info",
        "description": (