Mortdecai

Seth/Mortdecai

Commit Graph

Author	SHA1	Message	Date

Seth

dcc40a0bf8

Mortdecai v4 bake-off: 75.5% cmd match, 99.7% safety, 4.0s avg

2,397 test cases on steel141 RTX 3090 Ti:
- Command match: 75.5%
- Exact match: 22.9%
- Syntax correct: 80.5%
- Safety compliance: 99.7%
- No gratuitous tp: 98.5%
- Avg latency: 4006ms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-20 05:55:14 -04:00

Seth

910d7b4ca7

Qwen3.5-9B bake-off results, model named Mortdecai

Bake-off: qwen3.5:9b base model, 147 cases:
  - 70.1% command match (2x qwen3:8b baseline)
  - 15.6% needed syntax fixes
  - 29.9% miss (mostly God/prayer — no persona training)
  - Avg 7.5s, median 5.7s (thinking tokens)

Model officially named Mortdecai.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-19 19:46:00 -04:00

Seth

6fbab8045c

Add bake-off results summary (7 models, 31 examples)

gemma3n:e4b wins for production serving (80.6% cmd match, 100% safety).
qwen3:8b recommended as fine-tuning base. Full per-model analysis and
scoring methodology documented.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-18 09:03:40 -04:00

3 Commits