6d3df9ae58652c731b6626a63416cf61343db421
Cost model:
- Marginal billing: only charge for watts above idle
- Dedicated billing: charge for all uptime (optional)
- Labor rate: $/hr for operator time, manually logged
- Profit margin: percentage markup on electricity cost
- All parameters adjustable live via POST /config

Dashboard shows:
- Cost breakdown with progress bar
- Power model (idle→load for GPU and system)
- Marginal watts per inference call
- Labor hours + labor cost
- Total owed (electricity + labor + margin)
- GPU utilization, temperature, power draw
- Avg cost per request, estimated remaining requests

Endpoints:
- GET /config: view current cost config
- POST /config: update any parameter live
- GET /stats: full usage stats + cost config (auth required)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
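The cost model in the commit message above can be sketched roughly as follows. This is an illustration only, not the gateway's actual code: the `CostConfig` field names and the `total_owed` function are hypothetical, and the way dedicated billing combines idle and load power is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CostConfig:
    # Field names are illustrative, not the gateway's real config keys.
    electricity_rate: float = 0.15  # $/kWh
    idle_watts: float = 0.0         # baseline draw with no inference running
    labor_rate: float = 0.0         # $/hr of operator time
    margin_pct: float = 0.0         # percentage markup on electricity cost
    dedicated: bool = False         # True: bill all uptime, not only load


def total_owed(cfg, load_watts, inference_hours, uptime_hours, labor_hours):
    """Return (electricity, labor, margin, total) in dollars."""
    # Marginal billing: only the watts above idle, only while inferring.
    marginal_watts = max(load_watts - cfg.idle_watts, 0.0)
    kwh = marginal_watts * inference_hours / 1000.0
    if cfg.dedicated:
        # ASSUMPTION: dedicated billing adds idle draw for the whole uptime.
        kwh += cfg.idle_watts * uptime_hours / 1000.0
    electricity = kwh * cfg.electricity_rate
    labor = cfg.labor_rate * labor_hours
    margin = electricity * cfg.margin_pct / 100.0
    return electricity, labor, margin, electricity + labor + margin
```

With marginal billing, an idle machine costs nothing; switching `dedicated` on charges for the idle baseline as well.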
Mortdecai Gateway
Authenticated Ollama proxy with power metering. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.
Quick Start
git clone <repo-url>
cd mortdecai-gateway
mkdir -p models
# Copy the GGUF file into models/
cp /path/to/mortdecai-v4.gguf models/
chmod +x setup.sh
./setup.sh
Dashboard: http://localhost:8434/dashboard
What It Does
Your GPU → Ollama → Gateway (auth + metering) → Port 8434 → Internet
The gateway sits in front of Ollama and:
- Authenticates requests via API key
- Tracks inference time, tokens, energy usage
- Estimates electricity cost ((GPU TDP + system overhead) × time × rate)
- Enforces a spending cap
- Provides a dashboard with live stats
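The cost estimate and spending cap from the list above could look roughly like this. It is a minimal sketch, not the gateway's implementation: the function names are hypothetical, and it assumes the system overhead wattage is added to the GPU TDP, which is consistent with the sample figures under Response Metadata.

```python
def estimate_request(duration_s, gpu_tdp_w=54.0, overhead_w=30.0, rate_per_kwh=0.15):
    """Estimate energy (Wh) and cost ($) for one inference call."""
    # ASSUMPTION: the machine draws GPU TDP plus overhead for the whole call.
    watts = gpu_tdp_w + overhead_w
    energy_wh = watts * duration_s / 3600.0      # watt-seconds to watt-hours
    cost = energy_wh / 1000.0 * rate_per_kwh     # Wh to kWh, then $/kWh
    return energy_wh, cost


def accepts_requests(total_cost, spending_cap=10.0):
    """Return True while the running total is still under the cap."""
    return total_cost < spending_cap


energy_wh, cost = estimate_request(3.42)
print(f"{energy_wh:.4f} Wh, ${cost:.6f}")
```

The default arguments mirror the example values in the Configuration section.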
Configuration
Edit .env:
API_KEY=mk_your_secret_key
GPU_TDP_WATTS=54 # Your GPU's TDP
SYSTEM_OVERHEAD_WATTS=30 # CPU/RAM draw during inference
ELECTRICITY_RATE=0.15 # $/kWh
SPENDING_CAP=10.00 # $ before gateway stops accepting requests
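For reference, a .env file in this shape can be read with a few lines of Python. This is a sketch under stated assumptions, not how the gateway itself loads its settings: `parse_env` and `load_settings` are hypothetical helpers, and the parser treats everything after `#` as a comment, so it would not handle values that contain `#`.

```python
def parse_env(text):
    """Parse KEY=value lines, dropping blank lines and trailing # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive: '#' always starts a comment
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def load_settings(text):
    """Convert the numeric settings to floats; raises KeyError if one is missing."""
    env = parse_env(text)
    return {
        "api_key": env["API_KEY"],
        "gpu_tdp_watts": float(env["GPU_TDP_WATTS"]),
        "system_overhead_watts": float(env["SYSTEM_OVERHEAD_WATTS"]),
        "electricity_rate": float(env["ELECTRICITY_RATE"]),
        "spending_cap": float(env["SPENDING_CAP"]),
    }


SAMPLE = """\
API_KEY=mk_your_secret_key
GPU_TDP_WATTS=54 # Your GPU's TDP
SYSTEM_OVERHEAD_WATTS=30 # CPU/RAM draw during inference
ELECTRICITY_RATE=0.15 # $/kWh
SPENDING_CAP=10.00 # $ before gateway stops accepting requests
"""
print(load_settings(SAMPLE))
```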
Endpoints
| Endpoint | Auth | Description |
|---|---|---|
| GET /health | No | Ollama status + loaded models |
| GET /dashboard | No | Web dashboard with live stats |
| GET /stats | Yes | JSON usage stats |
| POST /api/chat | Yes | Proxied to Ollama |
| POST /api/generate | Yes | Proxied to Ollama |
| * | Yes | Everything else proxied to Ollama |
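A client call to one of the authenticated endpoints might be built like this. Treat it as a sketch: this README does not say how the API key is transmitted, so the `Authorization: Bearer` header is an assumption, and `build_chat_request` and the model name are hypothetical. The request body follows Ollama's `/api/chat` format.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) a POST /api/chat request for the gateway."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask Ollama for a single JSON response
    }).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/chat",
        data=body,
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: the gateway may expect a different header or scheme.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# "mortdecai-v4" is a placeholder; use whatever name the model is loaded under.
req = build_chat_request("http://localhost:8434", "mk_your_secret_key",
                         "mortdecai-v4", "Hello")
# To send it: json.load(urllib.request.urlopen(req))
```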
Response Metadata
Every proxied response includes a _gateway field:
{
  "message": { "role": "assistant", "content": "..." },
  "_gateway": {
    "duration_seconds": 3.42,
    "energy_wh": 0.0798,
    "estimated_cost": 0.000012,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658
  }
}
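A client can use the `_gateway` field to keep an eye on power draw and remaining budget. The helper below is a hypothetical sketch, not part of the gateway; it derives the average power from the reported energy and duration, and its remaining-request estimate is based on the latest request only, which may differ from what the dashboard shows.

```python
def gateway_summary(response):
    """Derive a few figures from a response's _gateway metadata, if present."""
    meta = response.get("_gateway")
    if meta is None:
        return None
    duration = meta["duration_seconds"]
    cost = meta["estimated_cost"]
    return {
        # Wh back to average watts over the call
        "avg_watts": meta["energy_wh"] * 3600.0 / duration if duration > 0 else 0.0,
        # spent so far plus what is left gives the configured cap
        "implied_cap": meta["total_cost"] + meta["budget_remaining"],
        # rough guess: how many more requests like this one the budget allows
        "requests_left": int(meta["budget_remaining"] / cost) if cost > 0 else None,
    }


sample = {
    "message": {"role": "assistant", "content": "..."},
    "_gateway": {
        "duration_seconds": 3.42,
        "energy_wh": 0.0798,
        "estimated_cost": 0.000012,
        "total_cost": 0.0342,
        "budget_remaining": 9.9658,
    },
}
print(gateway_summary(sample))
```

For the sample response, the average power works out to about 84 W, matching the example GPU TDP plus system overhead from the Configuration section.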
AMD ROCm
The Docker Compose file uses ollama/ollama:rocm by default and requires ROCm drivers on the host. For Strix Halo, ensure the BIOS is set to reserved VRAM mode.
NVIDIA
Edit docker-compose.yml: uncomment the deploy section and comment out the devices section.