11 Commits

Author SHA1 Message Date
Seth af5cb4df2a Semver rename: mortdecai:0.4.0, mortdecai:0.5.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:37:36 -04:00
Seth adeda6dd84 Pre-set HSA_OVERRIDE_GFX_VERSION for Strix Halo ROCm detection
Ollama ROCm doesn't auto-detect newer AMD iGPUs (gfx1150/1151).
Setting HSA_OVERRIDE_GFX_VERSION=11.0.0 in the compose file fixes this.
Configurable via .env for other AMD chips.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 20:37:06 -04:00
Seth f3ea624269 Complete README: cost model, dual ledger, all endpoints documented
Full documentation covering:
- Quick start with automated setup
- Marginal vs dedicated billing modes
- All cost parameters with defaults
- Dual ledger architecture and tamper protection
- Reconciliation process
- All endpoints (public, authenticated, admin)
- Model update paths (remote + manual)
- Response metadata format
- Dashboard features
- GPU support (AMD ROCm + NVIDIA)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 20:00:01 -04:00
Seth 968b00890f Dual ledger: tamper-evident transaction tracking on both sides
Every inference request is recorded in a local JSONL ledger with a
SHA-256 hash of (id + tokens + duration + cost + shared_secret).

Both sides keep independent copies:
- Gateway (Matt's): writes to ledger.jsonl on every request
- Receiver (Seth's): receives callbacks, saves per-gateway ledger

Endpoints:
- GET /ledger — view transactions + total cost
- GET /reconcile — compare ledger vs stats, verify all hashes
- POST /config — adjust cost params live

ledger_receiver.py runs on Seth's server:
- POST /transaction — receive and verify gateway callbacks
- GET /summary — total cost per gateway
- GET /ledger — all transactions across gateways

If either side resets stats, the other's ledger has the full history.
If either side tampers with entries, hash verification catches it.

Tested: request → ledger write → reconcile → hash valid → zero discrepancy
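
A minimal sketch of the per-entry hash check described above, for illustration only. The commit message names the hashed fields (id, tokens, duration, cost, shared secret) but not how they are serialized, so the "|" separator, the JSON key names, and the function names entry_hash, sign_entry, and find_tampered are all assumptions rather than the gateway's actual code.

```python
import hashlib
import hmac
import json


def entry_hash(entry: dict, shared_secret: str) -> str:
    """SHA-256 over id + tokens + duration + cost + shared_secret.

    Assumption: fields are stringified and joined with "|"; the real
    ledger may serialize them differently.
    """
    fields = (entry["id"], entry["tokens"], entry["duration"],
              entry["cost"], shared_secret)
    payload = "|".join(str(f) for f in fields)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def sign_entry(entry: dict, shared_secret: str) -> dict:
    """Return a copy of the entry with its hash attached."""
    return {**entry, "hash": entry_hash(entry, shared_secret)}


def find_tampered(jsonl_lines, shared_secret: str) -> list:
    """Return the ids of ledger entries whose stored hash does not match."""
    bad = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSONL file
        entry = json.loads(line)
        expected = entry_hash(entry, shared_secret)
        # Constant-time comparison of the two hex digests.
        if not hmac.compare_digest(expected, str(entry.get("hash", ""))):
            bad.append(entry["id"])
    return bad
```

Either side could run find_tampered over its own copy of ledger.jsonl; an entry whose cost or token count was edited after the fact would show up in the returned list, provided the editor does not know the shared secret.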

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:56:10 -04:00
Seth 583c563daa Fix startup print for new config model
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:50:54 -04:00
Seth 6d3df9ae58 Full cost model: marginal power, labor, profit, live config
Cost model:
- Marginal billing: only charge for watts above idle
- Dedicated billing: charge for all uptime (optional)
- Labor rate: $/hr for operator time, manually logged
- Profit margin: percentage markup on electricity cost
- All parameters adjustable live via POST /config

Dashboard shows:
- Cost breakdown with progress bar
- Power model (idle→load for GPU and system)
- Marginal watts per inference call
- Labor hours + labor cost
- Total owed (electricity + labor + margin)
- GPU utilization, temperature, power draw
- Avg cost per request, estimated remaining requests

Endpoints:
- GET /config — view current cost config
- POST /config — update any parameter live
- GET /stats — full usage stats + cost config (auth required)
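
The cost model in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the gateway's code: the CostConfig field names, the function names, and the way dedicated billing splits uptime into idle and load time are all hypothetical. The only parts taken from the message are that marginal billing charges for watts above idle, dedicated billing charges for all uptime, labor is an hourly rate, and the margin is a percentage of electricity cost.

```python
from dataclasses import dataclass

WATT_SECONDS_PER_KWH = 3_600_000  # 1 kWh = 1000 W * 3600 s


@dataclass
class CostConfig:
    idle_watts: float         # system draw when no inference is running
    load_watts: float         # system draw during inference
    rate_per_kwh: float       # electricity price in $/kWh
    labor_rate_per_hr: float  # operator time in $/hr
    margin_pct: float         # markup on electricity cost, in percent


def marginal_electricity(cfg: CostConfig, inference_s: float) -> float:
    """Marginal billing: charge only for watts above idle during inference."""
    extra_watts = max(cfg.load_watts - cfg.idle_watts, 0.0)
    return extra_watts * inference_s / WATT_SECONDS_PER_KWH * cfg.rate_per_kwh


def dedicated_electricity(cfg: CostConfig, uptime_s: float, inference_s: float) -> float:
    """Dedicated billing: charge for all uptime (idle draw plus load draw).

    Assumption: uptime outside inference is billed at idle_watts.
    """
    idle_s = max(uptime_s - inference_s, 0.0)
    watt_seconds = cfg.idle_watts * idle_s + cfg.load_watts * inference_s
    return watt_seconds / WATT_SECONDS_PER_KWH * cfg.rate_per_kwh


def total_owed(cfg: CostConfig, electricity: float, labor_hours: float) -> float:
    """Total owed = electricity + labor + margin (margin applies to electricity)."""
    labor = labor_hours * cfg.labor_rate_per_hr
    margin = electricity * cfg.margin_pct / 100.0
    return electricity + labor + margin
```

With a 50 W idle draw and a 250 W load draw, one hour of inference under marginal billing is charged as 200 W for one hour, or 0.2 kWh, before labor and margin are added.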

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:49:14 -04:00
Seth 648b123f14 Add manual model update script
./update-model.sh [url] [name]
Downloads GGUF and loads into Ollama. No remote access needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:41:56 -04:00
Seth 0b37d7de79 Add opt-in model update endpoint + API key support
Gateway: POST /admin/update-model downloads new GGUF and reloads.
Disabled by default — requires ALLOW_MODEL_UPDATES=true in .env.
Matt controls whether remote model updates are allowed.

Self-play: --api-key flag for authenticated gateway connections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:39:50 -04:00
Seth f470f052aa Fix models mount to read-write for Modelfile creation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:35:45 -04:00
Seth df9f623943 Fully automated setup: downloads GGUF, loads model, tests inference
Setup script now:
1. Generates API key
2. Starts Docker containers
3. Downloads GGUF from mortdec.ai automatically (~5.3GB)
4. Creates Ollama model with correct chat template
5. Runs test inference
6. Prints connection details for Seth

Matt just runs ./setup.sh — no manual file copying.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:33:39 -04:00
Seth c5865feb35 Mortdecai Gateway — authenticated Ollama proxy with power metering
- API key auth on all inference endpoints
- Power/cost tracking: GPU TDP × inference time × electricity rate
- Spending cap enforcement
- Web dashboard with live stats
- Docker compose for AMD ROCm (Strix Halo) or NVIDIA
- Auto-setup script with GGUF loading
- Tested against local Ollama
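
As a rough illustration of the power metering and spending cap listed above, the sketch below applies the stated formula (GPU TDP × inference time × electricity rate) and refuses requests once a cap is reached. The names request_cost and SpendingCap, and the choice to check the cap before each request rather than after, are assumptions made for this example.

```python
def request_cost(tdp_watts: float, inference_s: float, rate_per_kwh: float) -> float:
    """Cost of one request: TDP (W) x time (s), converted to kWh, x $/kWh."""
    kwh = tdp_watts * inference_s / 3_600_000
    return kwh * rate_per_kwh


class SpendingCap:
    """Tracks cumulative spend and reports whether another request is allowed."""

    def __init__(self, cap_dollars: float):
        self.cap_dollars = cap_dollars
        self.spent = 0.0

    def allows_request(self) -> bool:
        # Hypothetical policy: block new requests once the cap is reached.
        return self.spent < self.cap_dollars

    def record(self, cost: float) -> None:
        self.spent += cost
```

A proxy could call allows_request() before forwarding a request to Ollama and record() the metered cost once the response completes.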

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:26:43 -04:00