Seth df9f623943 Fully automated setup: downloads GGUF, loads model, tests inference
Setup script now:
1. Generates API key
2. Starts Docker containers
3. Downloads GGUF from mortdec.ai automatically (~5.3GB)
4. Creates Ollama model with correct chat template
5. Runs test inference
6. Prints connection details for Seth

Matt just runs ./setup.sh — no manual file copying.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:33:39 -04:00

Mortdecai Gateway

Authenticated Ollama proxy with power metering. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.

Quick Start

git clone <repo-url>
cd mortdecai-gateway
mkdir -p models
# Copy the GGUF file into models/
cp /path/to/mortdecai-v4.gguf models/
chmod +x setup.sh
./setup.sh

Dashboard: http://localhost:8434/dashboard

What It Does

Your GPU → Ollama → Gateway (auth + metering) → Port 8434 → Internet

The gateway sits in front of Ollama and:

  • Authenticates requests via API key
  • Tracks inference time, token counts, and energy usage
  • Estimates electricity cost as (GPU TDP + system overhead) × time × rate
  • Enforces a spending cap
  • Provides a dashboard with live stats
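The cost estimate is simple unit conversion. Below is a minimal Python sketch. It assumes the gateway charges GPU_TDP_WATTS plus SYSTEM_OVERHEAD_WATTS for the whole request duration. The function name `estimate_request` and the rounding are illustrative, not taken from the gateway source.

```python
# Hypothetical re-creation of the per-request metering arithmetic.
# Parameter names mirror the .env keys; the function itself is illustrative.

def estimate_request(duration_seconds: float,
                     gpu_tdp_watts: float = 54.0,
                     system_overhead_watts: float = 30.0,
                     electricity_rate: float = 0.15) -> dict:
    """Estimate energy (Wh) and cost ($) for one inference call."""
    # Assumption: the system draws GPU TDP + overhead for the full duration.
    watts = gpu_tdp_watts + system_overhead_watts
    # Watts x seconds = joules; divide by 3600 to get watt-hours.
    energy_wh = watts * duration_seconds / 3600.0
    # The rate is quoted per kWh, so convert Wh -> kWh before multiplying.
    cost = energy_wh / 1000.0 * electricity_rate
    return {
        "duration_seconds": duration_seconds,
        "energy_wh": round(energy_wh, 4),
        "estimated_cost": round(cost, 6),
    }


if __name__ == "__main__":
    print(estimate_request(3.42))
```

Take the sample .env values (54 W, 30 W, $0.15/kWh) and a 3.42 s request. The sketch returns 0.0798 Wh and about $0.000012, the same figures shown in the Response Metadata example.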

Configuration

Edit .env:

API_KEY=mk_your_secret_key
GPU_TDP_WATTS=54          # Your GPU's TDP, in watts
SYSTEM_OVERHEAD_WATTS=30  # CPU/RAM draw during inference, in watts
ELECTRICITY_RATE=0.15     # $/kWh
SPENDING_CAP=10.00        # $ spent before the gateway stops accepting requests
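The gateway reports total_cost and budget_remaining fields. Judging from those, SPENDING_CAP acts as a running budget: each request's estimated cost is added to a total, and new requests are refused once the total reaches the cap. Below is a hypothetical sketch of that bookkeeping. The `SpendingMeter` class and its methods are illustrative, not the gateway's code.

```python
class SpendingMeter:
    """Illustrative running budget; all names here are hypothetical."""

    def __init__(self, spending_cap: float = 10.00) -> None:
        self.spending_cap = spending_cap  # mirrors SPENDING_CAP in .env
        self.total_cost = 0.0             # accumulated estimated cost in $

    @property
    def budget_remaining(self) -> float:
        # Clamp at zero so an overshoot never reports a negative budget.
        return max(self.spending_cap - self.total_cost, 0.0)

    def can_accept(self) -> bool:
        # Checked before proxying: refuse once the cap has been reached.
        return self.total_cost < self.spending_cap

    def record(self, cost: float) -> None:
        # Called after a request finishes, with its estimated cost.
        self.total_cost += cost


if __name__ == "__main__":
    meter = SpendingMeter(spending_cap=10.00)
    meter.record(0.0342)
    print(meter.can_accept(), round(meter.budget_remaining, 4))
```

In this sketch the check runs before a request starts. The last accepted request can therefore push total_cost slightly past the cap, which is why budget_remaining is clamped at zero.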

Endpoints

Endpoint            Auth  Description
GET  /health        No    Ollama status + loaded models
GET  /dashboard     No    Web dashboard with live stats
GET  /stats         Yes   JSON usage stats
POST /api/chat      Yes   Proxied to Ollama
POST /api/generate  Yes   Proxied to Ollama
*                   Yes   Everything else proxied to Ollama
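A client calls the authenticated endpoints exactly as it would call Ollama, plus the API key. The README does not say which header carries the key, so the sketch below assumes `Authorization: Bearer <key>`. The model name is also a placeholder. The body uses Ollama's /api/chat fields (`model`, `messages`, `stream`). The sketch relies only on the standard library and builds the request without sending it.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8434"  # port used throughout this README


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming /api/chat request."""
    body = {
        "model": model,  # placeholder: use the model name setup.sh created
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask Ollama for a single JSON object
    }
    return urllib.request.Request(
        f"{GATEWAY_URL}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: bearer-style header; check what the gateway expects.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen(req)` and decode the reply with `json.load`.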

Response Metadata

Every proxied response includes a _gateway field:

{
  "message": { "role": "assistant", "content": "..." },
  "_gateway": {
    "duration_seconds": 3.42,
    "energy_wh": 0.0798,
    "estimated_cost": 0.000012,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658
  }
}
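`_gateway` is an extra top-level key, so a client that wants a plain Ollama reply can simply pop it off. Below is a small sketch for a non-streaming JSON response; the helper name is hypothetical. It also checks that total_cost and budget_remaining add back up to SPENDING_CAP (0.0342 + 9.9658 = 10.00 in the example).

```python
import json
from typing import Any, Dict, Tuple


def split_gateway_metadata(raw: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a proxied JSON body into (ollama_payload, gateway_metadata)."""
    payload = json.loads(raw)
    # pop() removes the gateway block, leaving a plain Ollama-shaped reply.
    meta = payload.pop("_gateway", {})
    return payload, meta


if __name__ == "__main__":
    raw = json.dumps({
        "message": {"role": "assistant", "content": "..."},
        "_gateway": {"total_cost": 0.0342, "budget_remaining": 9.9658},
    })
    reply, meta = split_gateway_metadata(raw)
    print(reply["message"]["content"])
    # total_cost + budget_remaining should equal SPENDING_CAP.
    print(round(meta["total_cost"] + meta["budget_remaining"], 4))
```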

AMD ROCm

The Docker Compose file uses the ollama/ollama:rocm image by default, which requires ROCm drivers on the host. On Strix Halo, make sure the BIOS is set to reserved VRAM mode.

NVIDIA

Edit docker-compose.yml: uncomment the deploy section and comment out the devices section.
