Mortdecai/eval/results/bakeoff_1773963920.json
Seth 910d7b4ca7 Qwen3.5-9B bake-off results, model named Mortdecai
Bake-off: qwen3.5:9b base model, partial run (147 of 1542 cases, killed manually):
  - 70.1% command match (about 2x the qwen3:8b base model's 34%)
  - 15.6% needed syntax fixes
  - 29.9% miss (mostly God/prayer cases; no persona training yet)
  - Latency: avg 7.5 s, median 5.7 s (includes thinking tokens)

Model officially named Mortdecai.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 19:46:00 -04:00


{
  "timestamp": 1773963920,
  "ollama_url": "http://192.168.0.141:11434",
  "note": "Partial run \u2014 147 of 1542 cases before manual kill",
  "summary": [
    {
      "model": "qwen3.5:9b",
      "n": 147,
      "cmd_match_%": 70.1,
      "syntax_fixes_%": 15.6,
      "miss_%": 29.9,
      "avg_latency_ms": 7457,
      "median_latency_ms": 5660,
      "note": "Base model, no fine-tuning. 2x better than qwen3:8b base (70% vs 34%)"
    }
  ]
}
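
The figures in the summary entry can be sanity-checked with a short script. The sketch below is not part of the bake-off tooling; the helper names `implied_counts` and `check_entry` are hypothetical, and the values are copied by hand from the file above rather than read from disk. It converts the reported percentages back into approximate case counts, confirms that match and miss rates sum to 100%, and reports how much of the 1542-case suite the partial run covered. The file does not say whether the syntax-fix cases are counted inside the matches, so the script treats that percentage independently.

```python
import json

# Values copied by hand from the summary entry above (assumption: unchanged).
RESULTS = json.loads("""
{
  "summary": [
    {"model": "qwen3.5:9b", "n": 147, "cmd_match_%": 70.1,
     "syntax_fixes_%": 15.6, "miss_%": 29.9,
     "avg_latency_ms": 7457, "median_latency_ms": 5660}
  ]
}
""")

TOTAL_CASES = 1542  # from the top-level "note" field of the results file


def implied_counts(entry):
    """Convert the reported percentages back into approximate case counts."""
    n = entry["n"]
    return {
        key: round(entry[key] * n / 100)
        for key in ("cmd_match_%", "syntax_fixes_%", "miss_%")
    }


def check_entry(entry, total_cases=TOTAL_CASES):
    """Collect a few derived figures for one summary entry."""
    return {
        "counts": implied_counts(entry),
        # Match and miss should partition the evaluated cases.
        "match_plus_miss_pct": round(entry["cmd_match_%"] + entry["miss_%"], 1),
        # Share of the full suite that ran before the manual kill.
        "coverage_pct": round(100 * entry["n"] / total_cases, 1),
        "avg_latency_s": round(entry["avg_latency_ms"] / 1000, 1),
        "median_latency_s": round(entry["median_latency_ms"] / 1000, 1),
    }


report = check_entry(RESULTS["summary"][0])
print(report)
```

With the values above, the percentages correspond to roughly 103 matches, 44 misses, and 23 syntax fixes out of 147 cases, and the run covers about 9.5% of the suite, so the headline rates may shift once the remaining cases are evaluated.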