Mortdecai

Files

T

Seth c947fc3fa9 Self-play loop, Qwen3.5-9B bake-off: 70% base accuracy

Self-play (training/scripts/self_play.py):
- Model generates edge-case prompts across 9 categories
- Attempts commands via RCON, self-corrects on errors
- Successful traces → standard training examples
- Error correction traces → multi-turn tool-calling examples
- Anti-collapse: focuses on categories model is weakest in
- Ready for v4 deployment, not yet active

Qwen3.5-9B base model bake-off (147/1542 cases):
- 70.1% OK (vs 34% Qwen3-8B base) — 2x improvement
- 29.9% MISS (mostly God/prayer — no persona training)
- 15.6% needed syntax fixes
- Avg 7.5s response (thinking tokens)
- Strong v4 candidate: better base + tool-calling architecture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-19 19:35:57 -04:00

.gitkeep

Initial project scaffold: dataset schema, 31 seed training examples, Mineflayer bot framework, and 7-phase roadmap

2026-03-18 01:51:28 -04:00

distill.py

Swarm bots, RCON validation, Haiku distillation complete

2026-03-18 19:18:19 -04:00

generate_tool_training.py

Tool-calling training: 1,159 multi-turn examples with error correction

2026-03-19 18:49:08 -04:00

self_play.py

Self-play loop, Qwen3.5-9B bake-off: 70% base accuracy