Mortdecai Gateway — authenticated Ollama proxy with power metering
- API key auth on all inference endpoints
- Power/cost tracking: GPU TDP × inference time × electricity rate
- Spending cap enforcement
- Web dashboard with live stats
- Docker compose for AMD ROCm (Strix Halo) or NVIDIA
- Auto-setup script with GGUF loading
- Tested against local Ollama

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,78 @@
# Mortdecai Gateway

Authenticated Ollama proxy with power metering. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.

## Quick Start

```bash
git clone <repo-url>
cd mortdecai-gateway
mkdir -p models
# Copy the GGUF file into models/
cp /path/to/mortdecai-v4.gguf models/
chmod +x setup.sh
./setup.sh
```

Dashboard: http://localhost:8434/dashboard

## What It Does

```
Your GPU → Ollama → Gateway (auth + metering) → Port 8434 → Internet
```

The gateway sits in front of Ollama and:
- Authenticates requests via API key
- Tracks inference time, token counts, and energy usage
- Estimates electricity cost ((GPU TDP plus system overhead) × inference time × electricity rate)
- Enforces a spending cap
- Provides a dashboard with live stats
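
The cost estimate in the list above can be sketched in a few lines. This is a minimal illustration, not the gateway's actual code: the function name `estimate_cost` and its parameter names are hypothetical, and adding the system overhead to the GPU TDP is an assumption based on the `SYSTEM_OVERHEAD_WATTS` setting. The default values are the sample values from the `.env` example.

```python
def estimate_cost(duration_seconds, gpu_tdp_watts=54.0,
                  overhead_watts=30.0, rate_per_kwh=0.15):
    """Return (energy_wh, cost) for a single inference call.

    Assumption: the billed power draw is GPU TDP plus a fixed system overhead.
    """
    total_watts = gpu_tdp_watts + overhead_watts
    # Watts x seconds gives joules; dividing by 3600 converts to watt-hours.
    energy_wh = total_watts * duration_seconds / 3600.0
    # The rate is quoted per kWh, so convert Wh to kWh before multiplying.
    cost = energy_wh / 1000.0 * rate_per_kwh
    return energy_wh, cost


energy_wh, cost = estimate_cost(3.42)
print(f"{energy_wh:.4f} Wh, ${cost:.6f}")  # prints: 0.0798 Wh, $0.000012
```

With a 3.42 second call, this reproduces the `energy_wh` and `estimated_cost` figures shown in the Response Metadata example.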

## Configuration

Edit `.env`:

```
API_KEY=mk_your_secret_key
GPU_TDP_WATTS=54              # Your GPU's TDP in watts
SYSTEM_OVERHEAD_WATTS=30      # CPU/RAM draw during inference, in watts
ELECTRICITY_RATE=0.15         # $/kWh
SPENDING_CAP=10.00            # $ limit; the gateway stops accepting requests once it is reached
```
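
To make the role of each setting concrete, here is a hedged sketch of how a service might read these variables and apply the spending cap. The `Settings` class, `load_settings`, and `cap_reached` are hypothetical names for illustration only, and the fallback defaults simply mirror the sample values above; the gateway's real loading logic may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    gpu_tdp_watts: float
    system_overhead_watts: float
    electricity_rate: float  # $/kWh
    spending_cap: float      # $


def load_settings(env=os.environ):
    """Build Settings from a mapping of environment variables.

    Assumption: missing values fall back to the sample values from `.env`.
    """
    return Settings(
        api_key=env.get("API_KEY", ""),
        gpu_tdp_watts=float(env.get("GPU_TDP_WATTS", "54")),
        system_overhead_watts=float(env.get("SYSTEM_OVERHEAD_WATTS", "30")),
        electricity_rate=float(env.get("ELECTRICITY_RATE", "0.15")),
        spending_cap=float(env.get("SPENDING_CAP", "10.00")),
    )


def cap_reached(total_cost, settings):
    """True once the accumulated cost has reached the configured cap."""
    return total_cost >= settings.spending_cap


settings = load_settings({"API_KEY": "mk_your_secret_key", "SPENDING_CAP": "10.00"})
print(cap_reached(0.0342, settings))  # prints: False
```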

## Endpoints

| Endpoint | Auth | Description |
|----------|------|-------------|
| `GET /health` | No | Ollama status + loaded models |
| `GET /dashboard` | No | Web dashboard with live stats |
| `GET /stats` | Yes | JSON usage stats |
| `POST /api/chat` | Yes | Proxied to Ollama |
| `POST /api/generate` | Yes | Proxied to Ollama |
| `*` | Yes | Everything else proxied to Ollama |
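
As an illustration of calling an authenticated endpoint, the sketch below builds a `POST /api/chat` request with Python's standard library. This README does not state how the API key is transmitted, so the `Authorization: Bearer <key>` header is an assumption; check the gateway source for the actual scheme. The helper name `build_chat_request` and the model name `mortdecai-v4` are also hypothetical. The JSON payload follows Ollama's chat API.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) a chat request for the gateway."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask Ollama for a single JSON response
    }
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the key is sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request(
    "http://localhost:8434", "mk_your_secret_key", "mortdecai-v4", "Hello"
)
# Sending the request needs a running gateway:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
```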

## Response Metadata

Every proxied response includes a `_gateway` field:

```json
{
  "message": { "role": "assistant", "content": "..." },
  "_gateway": {
    "duration_seconds": 3.42,
    "energy_wh": 0.0798,
    "estimated_cost": 0.000012,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658
  }
}
```
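
A client can use this metadata to monitor its own budget. The following is a minimal sketch, assuming the field names shown above; the helper `gateway_summary` is hypothetical, and treating `total_cost + budget_remaining` as the spending cap is an inference from the sample values, not documented behavior.

```python
import json


def gateway_summary(response_text):
    """Derive average power and budget usage from a gateway response."""
    meta = json.loads(response_text).get("_gateway", {})
    duration = meta.get("duration_seconds", 0.0)
    energy_wh = meta.get("energy_wh", 0.0)
    total = meta.get("total_cost", 0.0)
    remaining = meta.get("budget_remaining", 0.0)
    cap = total + remaining  # assumption: these two fields sum to the cap
    return {
        # Wh back to watts: multiply by 3600 s/h, divide by seconds elapsed.
        "average_watts": energy_wh * 3600.0 / duration if duration else 0.0,
        "budget_used_fraction": total / cap if cap else 0.0,
    }


sample = """{
  "message": {"role": "assistant", "content": "..."},
  "_gateway": {"duration_seconds": 3.42, "energy_wh": 0.0798,
               "estimated_cost": 0.000012, "total_cost": 0.0342,
               "budget_remaining": 9.9658}
}"""
summary = gateway_summary(sample)
print(summary)
```

For the sample response, the average power works out to about 84 W, which matches `GPU_TDP_WATTS` plus `SYSTEM_OVERHEAD_WATTS` from the configuration example.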

## AMD ROCm

The Docker Compose file uses `ollama/ollama:rocm` by default and requires ROCm drivers on the host. For Strix Halo, ensure the BIOS is set to reserved VRAM mode.

## NVIDIA

Edit `docker-compose.yml`: uncomment the `deploy` section and comment out the `devices` section.