# Mortdecai Gateway

Authenticated Ollama proxy with power metering and tamper-evident billing. Deploy it on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.

## Quick Start

```bash
git clone <repo-url>
cd mortdecai-gateway
chmod +x setup.sh
./setup.sh
```

The setup script:

1. Generates an API key
2. Starts Ollama + gateway in Docker
3. Downloads the model (~5.3 GB)
4. Loads it into Ollama
5. Runs a test inference
6. Prints connection details

Dashboard: http://localhost:8434/dashboard

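Once `setup.sh` has printed the connection details, a client can talk to the gateway as it would to a plain Ollama server, adding the API key as a bearer token. A minimal standard-library sketch; the base URL, the key, and the model name `mortdecai:0.5.0` are placeholders, and the body follows Ollama's `/api/chat` format (`model`, `messages`, `stream`):

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an authenticated, non-streaming /api/chat request for the gateway."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # non-streaming: Ollama answers with a single JSON object
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(base_url: str, api_key: str, model: str, prompt: str, timeout: float = 120.0) -> dict:
    """Send the request and return the decoded JSON response."""
    req = build_chat_request(base_url, api_key, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `chat("http://localhost:8434", key, "mortdecai:0.5.0", "Hello")["message"]["content"]` returns the model's reply, with the gateway's billing fields under the `_gateway` key of the same dict.
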
## Architecture

```
Internet → Port 8434 → Gateway (auth + metering + ledger) → Ollama → GPU
```

The gateway is the only exposed port. It proxies authenticated requests to Ollama and records every transaction in a tamper-evident ledger.

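The authentication step in front of the proxy can be pictured as a bearer-token check. A minimal sketch, assuming a single shared API key (the one `setup.sh` generates) sent as `Authorization: Bearer <key>`; the function name is hypothetical and not taken from `gateway.py`:

```python
import hmac
from typing import Optional

# Endpoints listed as public (no auth) in this README.
PUBLIC_PATHS = {"/health", "/dashboard"}


def is_authorized(path: str, authorization_header: Optional[str], api_key: str) -> bool:
    """Return True if a request to `path` may proceed past the auth layer."""
    if path in PUBLIC_PATHS:
        return True
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison, so response timing does not reveal how much of the key matched.
    return hmac.compare_digest(token.encode("utf-8"), api_key.encode("utf-8"))
```

Using `hmac.compare_digest` instead of `==` avoids an early-exit comparison on a port that is exposed to the internet.
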
## Cost Model

The gateway estimates electricity cost from **marginal power**: only the extra watts your GPU and system draw during inference above their idle power.

```
Marginal watts = (GPU load - GPU idle) + (System load - System idle)
Marginal cost  = Marginal watts / 1000 × inference hours × $/kWh
```

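As a worked example with the defaults from the configuration table: the GPU adds 54 - 15 = 39 W and the rest of the system adds 65 - 45 = 20 W, so inference draws 59 W above idle. A 3.42-second request then uses 59 × 3.42 / 3600 = 0.05605 Wh, and at $0.15/kWh that is 0.05605 / 1000 × 0.15 ≈ $0.0000084, consistent with the rounded `marginal_watts`, `energy_wh`, and `estimated_cost` values in the Response Metadata example. The function below reproduces this arithmetic with the table's defaults as parameter defaults; it is a sketch of the formula, not the code in `gateway.py`:

```python
def marginal_cost(
    duration_seconds: float,
    gpu_idle_watts: float = 15,
    gpu_load_watts: float = 54,
    system_idle_watts: float = 45,
    system_inference_watts: float = 65,
    electricity_rate: float = 0.15,  # $/kWh
) -> dict:
    """Estimate the extra energy and cost one inference request adds above idle."""
    marginal_watts = (gpu_load_watts - gpu_idle_watts) + (system_inference_watts - system_idle_watts)
    energy_wh = marginal_watts * duration_seconds / 3600  # W × s → Wh
    cost = energy_wh / 1000 * electricity_rate            # Wh → kWh, then × $/kWh
    return {"marginal_watts": marginal_watts, "energy_wh": energy_wh, "estimated_cost": cost}
```

With the defaults, `marginal_cost(3.42)["marginal_watts"]` is 59.
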
### Configuration

All parameters can be set in `.env` or adjusted live via `POST /config`:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `GPU_IDLE_WATTS` | 15 | GPU power at idle (W) |
| `GPU_LOAD_WATTS` | 54 | GPU power during inference (W) |
| `SYSTEM_IDLE_WATTS` | 45 | System power at idle, CPU/RAM/fans (W) |
| `SYSTEM_INFERENCE_WATTS` | 65 | System power during inference (W) |
| `ELECTRICITY_RATE` | 0.15 | Electricity price ($/kWh) |
| `BILLING_MODE` | marginal | `marginal` (extra watts only) or `dedicated` (all uptime) |
| `BASE_RATE_PER_HOUR` | 0.00 | Hourly rate in dedicated mode ($/hr) |
| `SPENDING_CAP` | 10.00 | Total spend ($) at which the gateway stops accepting requests |
| `LABOR_RATE_PER_HOUR` | 0.00 | Operator time for setup/maintenance ($/hr) |
| `PROFIT_MARGIN` | 0.00 | Markup multiplier (0.10 = 10%) |

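The table doubles as a schema for the cost settings. A minimal sketch of loading them from `.env`-style text and falling back to the defaults above; it assumes plain `KEY=value` lines with `#` comments, which is an assumption about the file format, not a description of how `gateway.py` reads its configuration:

```python
DEFAULTS = {
    "GPU_IDLE_WATTS": 15.0,
    "GPU_LOAD_WATTS": 54.0,
    "SYSTEM_IDLE_WATTS": 45.0,
    "SYSTEM_INFERENCE_WATTS": 65.0,
    "ELECTRICITY_RATE": 0.15,
    "BILLING_MODE": "marginal",
    "BASE_RATE_PER_HOUR": 0.00,
    "SPENDING_CAP": 10.00,
    "LABOR_RATE_PER_HOUR": 0.00,
    "PROFIT_MARGIN": 0.00,
}


def load_cost_config(env_text: str) -> dict:
    """Merge KEY=value lines over the defaults, converting numeric values to float."""
    config = dict(DEFAULTS)
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"').strip("'")
        if key not in DEFAULTS:
            continue  # ignore unrelated settings such as LEDGER_SECRET
        if key == "BILLING_MODE":
            if value not in ("marginal", "dedicated"):
                raise ValueError(f"BILLING_MODE must be 'marginal' or 'dedicated', got {value!r}")
            config[key] = value
        else:
            config[key] = float(value)
    return config
```
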
### Billing Modes

**Marginal** (default): Charges only for the extra power above idle. If the machine is on anyway (gaming, general use), you pay only for what inference adds.

**Dedicated**: Charges for full system power during uptime plus a base hourly rate. Use this mode if the machine is kept on specifically for inference.

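The two modes differ in which watts and which hours are billed. A sketch under stated assumptions, since the README does not spell these details out: "full system power" is passed in as a single wattage, labor is added as hours × `LABOR_RATE_PER_HOUR`, `PROFIT_MARGIN` multiplies the subtotal by `(1 + margin)`, and requests are refused once the running total reaches `SPENDING_CAP`. The function names and the order of operations are hypothetical:

```python
def energy_cost(watts: float, hours: float, rate_per_kwh: float) -> float:
    """Cost of drawing `watts` for `hours` at a $/kWh rate."""
    return watts / 1000 * hours * rate_per_kwh


def billed_cost(
    mode: str,
    *,
    inference_hours: float,
    uptime_hours: float,
    marginal_watts: float,
    full_system_watts: float,
    rate_per_kwh: float,
    base_rate_per_hour: float = 0.0,
    labor_hours: float = 0.0,
    labor_rate_per_hour: float = 0.0,
    profit_margin: float = 0.0,
) -> float:
    if mode == "marginal":
        # Only the extra watts, and only while inference is running.
        subtotal = energy_cost(marginal_watts, inference_hours, rate_per_kwh)
    elif mode == "dedicated":
        # All system power for the whole uptime, plus the hourly base rate.
        subtotal = energy_cost(full_system_watts, uptime_hours, rate_per_kwh)
        subtotal += base_rate_per_hour * uptime_hours
    else:
        raise ValueError(f"unknown billing mode: {mode!r}")
    subtotal += labor_hours * labor_rate_per_hour
    return subtotal * (1 + profit_margin)  # e.g. 0.10 → +10%


def accepts_requests(total_cost: float, spending_cap: float) -> bool:
    """Assumed cap rule: refuse new requests once spending reaches the cap."""
    return total_cost < spending_cap
```
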
## Dual Ledger

Every transaction is recorded in a tamper-evident ledger on **both sides**: the gateway operator's machine AND the client's server.

### How it works

```
1. Client sends inference request to gateway
2. Gateway processes request via Ollama
3. Gateway records transaction in local ledger.jsonl
4. Gateway POSTs transaction to client's callback URL
5. Client's ledger_receiver.py saves independent copy
6. Both copies include a SHA-256 hash of (id + tokens + cost + shared_secret)
```

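Steps 3 and 6 can be sketched as follows. The README specifies SHA-256 over the transaction id, token count, cost, and shared secret, but not how those fields are serialized; the `|` separator, the field order, the cost formatting, and the JSON field names below are assumptions, so these hashes will only match real `ledger.jsonl` entries if `gateway.py` happens to use the same encoding:

```python
import hashlib
import hmac
import json


def entry_hash(tx_id: str, tokens: int, cost: float, secret: str) -> str:
    """SHA-256 over id, tokens, cost, and the shared secret (serialization assumed)."""
    material = f"{tx_id}|{tokens}|{cost:.10f}|{secret}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def make_entry(tx_id: str, tokens: int, cost: float, secret: str) -> dict:
    return {"id": tx_id, "tokens": tokens, "cost": cost, "hash": entry_hash(tx_id, tokens, cost, secret)}


def verify_entry(entry: dict, secret: str) -> bool:
    expected = entry_hash(entry["id"], entry["tokens"], entry["cost"], secret)
    return hmac.compare_digest(entry["hash"], expected)


def append_to_ledger(path: str, entry: dict) -> None:
    """Append one transaction as a single JSON line (JSONL)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

Both sides run the same hash function with the same `LEDGER_SECRET`, so each can check the other's copy. A keyed construction such as `hmac.new(key, msg, hashlib.sha256)` is the more conventional way to mix a secret into a hash; the sketch follows the README's description instead.
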
### Tamper protection

| Scenario | Detection |
|----------|-----------|
| Gateway resets stats | Client's ledger has full history |
| Client denies requests happened | Gateway's ledger has full history |
| Either side edits a transaction | Hash verification fails on `/reconcile`; an edit re-hashed with the shared secret still differs from the other side's copy |
| Shared secret mismatch | All hashes show as invalid |

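The first three rows depend on comparing the two independent copies. A minimal sketch of that comparison, assuming each side can export its ledger as a list of entries with `id`, `tokens`, `cost`, and `hash` fields (hypothetical field names); it reports transactions present on only one side and transactions whose recorded values differ:

```python
from typing import Dict, List


def compare_ledgers(gateway_entries: List[dict], client_entries: List[dict]) -> Dict[str, List[str]]:
    """Diff the gateway's and the client's ledgers by transaction id."""
    gateway = {e["id"]: e for e in gateway_entries}
    client = {e["id"]: e for e in client_entries}
    fields = ("tokens", "cost", "hash")
    return {
        # Present only on the gateway: the client may be denying these requests.
        "only_on_gateway": sorted(gateway.keys() - client.keys()),
        # Present only on the client: the gateway may have dropped or reset them.
        "only_on_client": sorted(client.keys() - gateway.keys()),
        # Present on both but recorded differently: one side edited the entry.
        "mismatched": sorted(
            tx_id
            for tx_id in gateway.keys() & client.keys()
            if any(gateway[tx_id].get(f) != client[tx_id].get(f) for f in fields)
        ),
    }
```
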
### Setup

Both sides configure the same `LEDGER_SECRET` in their `.env`:

**Gateway (`.env`):**

```
LEDGER_SECRET=agreed_upon_secret_here
CALLBACK_URL=http://client_ip:8435/transaction
```

**Client (`ledger_receiver.py`):**

```bash
# Same secret as the gateway; export it so the Python process can see it
export LEDGER_SECRET=agreed_upon_secret_here
python3 ledger_receiver.py
```

### Reconciliation

```bash
# On the gateway: verify all hashes, compare ledger vs stats
curl -s http://localhost:8434/reconcile -H "Authorization: Bearer $KEY"
```

Response:

```json
{
  "ledger_entries": 142,
  "ledger_total_cost": 0.003421,
  "stats_total_cost": 0.003421,
  "discrepancy": 0.0,
  "hash_verification": {
    "total": 142,
    "valid": 142,
    "invalid": 0
  },
  "status": "OK"
}
```

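The response fields above can be derived from the ledger plus the gateway's running stats total. A self-contained sketch, assuming JSONL entries with `id`, `tokens`, `cost`, and `hash` fields, a hash of SHA-256 over `id|tokens|cost|secret` (the README names the fields but not their encoding), and a small tolerance for floating-point drift; the `"MISMATCH"` status value is invented for illustration:

```python
import hashlib
import hmac
import json


def entry_hash(tx_id: str, tokens: int, cost: float, secret: str) -> str:
    # Assumed serialization; the README only names the fields and SHA-256.
    material = f"{tx_id}|{tokens}|{cost:.10f}|{secret}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def reconcile(ledger_lines, stats_total_cost: float, secret: str, tolerance: float = 1e-9) -> dict:
    """Summarize ledger integrity in the shape of the /reconcile response."""
    entries = [json.loads(line) for line in ledger_lines if line.strip()]
    valid = sum(
        1
        for e in entries
        if hmac.compare_digest(e["hash"], entry_hash(e["id"], e["tokens"], e["cost"], secret))
    )
    ledger_total = sum(e["cost"] for e in entries)
    discrepancy = ledger_total - stats_total_cost
    ok = valid == len(entries) and abs(discrepancy) <= tolerance
    return {
        "ledger_entries": len(entries),
        "ledger_total_cost": round(ledger_total, 6),
        "stats_total_cost": round(stats_total_cost, 6),
        "discrepancy": round(discrepancy, 6),
        "hash_verification": {"total": len(entries), "valid": valid, "invalid": len(entries) - valid},
        "status": "OK" if ok else "MISMATCH",
    }
```

A wrong `secret` marks every entry invalid, which matches the "Shared secret mismatch" row of the tamper-protection table.
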
## Endpoints

### Public (no auth)

| Endpoint | Description |
|----------|-------------|
| `GET /health` | Ollama status + loaded models |
| `GET /dashboard` | Web dashboard with live stats |

### Authenticated

| Endpoint | Description |
|----------|-------------|
| `POST /api/chat` | Proxied to Ollama (inference) |
| `POST /api/generate` | Proxied to Ollama (inference) |
| `GET /stats` | Full usage stats + cost config |
| `GET /config` | View cost configuration |
| `POST /config` | Update cost parameters live |
| `GET /ledger` | View recent transactions + total cost |
| `GET /reconcile` | Verify ledger integrity |

### Admin

| Endpoint | Description |
|----------|-------------|
| `POST /admin/update-model` | Download + load new GGUF (requires `ALLOW_MODEL_UPDATES=true`) |

## Model Updates

**Remote push** (opt-in): Set `ALLOW_MODEL_UPDATES=true` in `.env`. The client can then push new model versions:

```bash
curl -X POST http://gateway:8434/admin/update-model \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://mortdec.ai/dl/v5/mortdecai-0.5.0.gguf", "name": "mortdecai:0.5.0"}'
```

**Manual update**: Run the update script:

```bash
./update-model.sh https://mortdec.ai/dl/v5/mortdecai-0.5.0.gguf mortdecai:0.5.0
```

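Loading a downloaded GGUF into Ollama is typically done with a Modelfile whose `FROM` line points at the file, followed by `ollama create <name> -f <Modelfile>`. Whether `update-model.sh` does exactly this is an assumption; the sketch below shows only those standard Ollama steps, with a hypothetical helper name and working directory, and returns the command instead of running it:

```python
from pathlib import Path
from typing import List


def prepare_ollama_import(gguf_path: str, model_name: str, workdir: str) -> List[str]:
    """Write a one-line Modelfile for the GGUF and return the `ollama create` command."""
    gguf = Path(gguf_path).resolve()
    if gguf.suffix != ".gguf":
        raise ValueError(f"expected a .gguf file, got {gguf.name}")
    modelfile = Path(workdir) / "Modelfile"
    modelfile.write_text(f"FROM {gguf}\n", encoding="utf-8")
    return ["ollama", "create", model_name, "-f", str(modelfile)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` wherever the Ollama CLI can reach the server (for example inside the Ollama container; the README does not name the container).
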
## Response Metadata

Every proxied response includes gateway metadata:

```json
{
  "message": {"role": "assistant", "content": "..."},
  "_gateway": {
    "duration_seconds": 3.42,
    "marginal_watts": 59,
    "energy_wh": 0.0561,
    "estimated_cost": 0.000008,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658,
    "billing_mode": "marginal"
  }
}
```

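In this example `budget_remaining` equals the default `SPENDING_CAP` minus `total_cost` (10.00 - 0.0342 = 9.9658), so `total_cost` appears to be the cumulative spend rather than the cost of this request. A client can read the block to track its budget; the sketch below, assuming that relationship, also estimates how many more requests fit at a given average cost (similar to the dashboard's "estimated remaining requests"). The helper name is hypothetical:

```python
import math
from typing import Optional


def summarize_gateway_metadata(response: dict, average_cost: Optional[float] = None) -> dict:
    """Pull spending figures out of a proxied response's `_gateway` block."""
    meta = response.get("_gateway")
    if meta is None:
        raise KeyError("_gateway")
    # Fall back to this request's cost when no running average is supplied.
    avg = average_cost if average_cost is not None else meta["estimated_cost"]
    remaining = meta["budget_remaining"]
    return {
        "request_cost": meta["estimated_cost"],
        "spent": meta["total_cost"],
        "remaining_budget": remaining,
        "est_requests_left": math.floor(remaining / avg) if avg > 0 else None,
    }
```
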
## Dashboard

The dashboard shows live:

- Request count, tokens, inference time
- Cost progress bar (spent vs cap)
- Average cost per request, estimated remaining requests
- Power model breakdown (idle→load for GPU and system)
- Labor hours and cost
- GPU utilization, temperature, power draw

Auto-refreshes every 10 seconds.

## GPU Support

**AMD ROCm** (default): The Docker Compose file uses `ollama/ollama:rocm`. Requires ROCm drivers on the host. For Strix Halo, set the BIOS to reserved VRAM mode.

**NVIDIA**: Edit `docker-compose.yml`: uncomment the `deploy` section and comment out the `devices` section.

## Files

| File | Purpose |
|------|---------|
| `gateway.py` | Main proxy server |
| `ledger_receiver.py` | Client-side transaction receiver |
| `docker-compose.yml` | Ollama + gateway containers |
| `Dockerfile` | Gateway container build |
| `setup.sh` | Automated first-time setup |
| `update-model.sh` | Manual model update |
| `.env.example` | Configuration template |