T

Mortdecai d199c788c4 docs: training data analysis — 6 compounding failure modes identified

Root cause: 90% of system prompts exceed max_seq_len (2048 tokens)
by 2.5x, so model trained on truncated fragments with no user/assistant
content. Plus mixed paradigm (55% tool_call / 45% JSON), 6 JSON schema
variants, contaminated examples, and /no_think misuse.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-26 03:02:18 -04:00

data

docs: Mortdecai 0.6.0 model analysis — fine-tunes broken, base model rankings

2026-03-26 02:39:52 -04:00

scripts

docs: Mortdecai 0.6.0 model analysis — fine-tunes broken, base model rankings

2026-03-26 02:39:52 -04:00

analysis-report.md

docs: Mortdecai 0.6.0 model analysis — fine-tunes broken, base model rankings

2026-03-26 02:39:52 -04:00

README.md

docs: Mortdecai 0.6.0 model analysis — fine-tunes broken, base model rankings

2026-03-26 02:39:52 -04:00

training-data-analysis.md

docs: training data analysis — 6 compounding failure modes identified

2026-03-26 03:02:18 -04:00

README.md

Mortdecai Model Analysis

Analysis of Mortdecai 0.6.0 fine-tuned models vs base model candidates for the Conductor/Hand roles in Mortdecai 2.0.

Date: 2026-03-26 Conducted by: Claude Opus 4.6 (analyst role) Hardware: Matt's Strix Halo (64GB unified memory) running Ollama

Summary

Both Mortdecai 0.6.0 fine-tunes (Qwen3.5 9B and 27B) are completely broken — 0% JSON compliance across all tests. The training signal exists in the weights (proven via raw completion mode) but is inaccessible through the chat API due to chat template misalignment during training.

Base models dramatically outperform the fine-tunes. gemma3:12b and phi4:14b both achieve 100% JSON compliance with zero fine-tuning.

Files

File	Description
`analysis-report.md`	Full analysis with methodology, findings, and recommendations
`data/mortdecai-interview.txt`	Raw output from fine-tuned model interviews (8 tests each)
`data/base-model-interview.txt`	Raw output from base model comparison (6 models, 5 tests each)
`data/deep-probes.txt`	Diagnostic probes: training signal detection, chat template, identity
`scripts/model_interview.py`	Interview script for fine-tuned models
`scripts/base_model_interview.py`	Comparison script for base models
`scripts/deep_probe.py`	Deep diagnostic probe script