Compare commits: c5865feb35..master (10 commits)
| Author | SHA1 | Date |
|---|---|---|
| | af5cb4df2a | |
| | adeda6dd84 | |
| | f3ea624269 | |
| | 968b00890f | |
| | 583c563daa | |
| | 6d3df9ae58 | |
| | 648b123f14 | |
| | 0b37d7de79 | |
| | f470f052aa | |
| | df9f623943 | |
+29
-4
@@ -1,6 +1,31 @@
|
|||||||
# Mortdecai Gateway Configuration
|
# Mortdecai Gateway Configuration
|
||||||
|
# Cost parameters can also be adjusted live via POST /config
|
||||||
|
|
||||||
|
# Auth
|
||||||
API_KEY=mk_change_this_to_a_real_key
|
API_KEY=mk_change_this_to_a_real_key
|
||||||
GPU_TDP_WATTS=54
|
|
||||||
SYSTEM_OVERHEAD_WATTS=30
|
# Power model
|
||||||
ELECTRICITY_RATE=0.15
|
GPU_IDLE_WATTS=15 # GPU at idle (watts)
|
||||||
SPENDING_CAP=10.00
|
GPU_LOAD_WATTS=54 # GPU during inference (watts)
|
||||||
|
SYSTEM_IDLE_WATTS=45 # Whole system idle (watts)
|
||||||
|
SYSTEM_INFERENCE_WATTS=65 # Whole system during inference (watts)
|
||||||
|
|
||||||
|
# Billing
|
||||||
|
ELECTRICITY_RATE=0.15 # $/kWh
|
||||||
|
BILLING_MODE=marginal # "marginal" (only extra watts) or "dedicated" (all uptime)
|
||||||
|
BASE_RATE_PER_HOUR=0.00 # $/hr base (dedicated mode only)
|
||||||
|
SPENDING_CAP=10.00 # $ before gateway stops accepting
|
||||||
|
|
||||||
|
# Labor & profit
|
||||||
|
LABOR_RATE_PER_HOUR=0.00 # $/hr for setup/maintenance time
|
||||||
|
PROFIT_MARGIN=0.00 # Markup fraction added to cost (0.10 = 10%)
|
||||||
|
|
||||||
|
# Dual ledger
|
||||||
|
LEDGER_SECRET=change_me_to_a_shared_secret # Both sides must match
|
||||||
|
CALLBACK_URL= # Seth's server (e.g. http://seth_ip:8435/transaction)
|
||||||
|
|
||||||
|
# Features
|
||||||
|
ALLOW_MODEL_UPDATES=false # Allow remote model push via /admin/update-model
|
||||||
|
|
||||||
|
# AMD GPU (Strix Halo / newer chips that ROCm doesn't auto-detect)
|
||||||
|
HSA_OVERRIDE_GFX_VERSION=11.0.0
|
||||||
|
|||||||
@@ -1,78 +1,215 @@
|
|||||||
# Mortdecai Gateway
|
# Mortdecai Gateway
|
||||||
|
|
||||||
Authenticated Ollama proxy with power metering. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.
|
Authenticated Ollama proxy with power metering and tamper-evident billing. Deploy on any machine with a GPU to contribute inference compute to the Mortdecai training pipeline.
|
||||||
|
|
||||||
## Quick Start
|
## Quick Start
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
git clone <repo-url>
|
git clone <repo-url>
|
||||||
cd mortdecai-gateway
|
cd mortdecai-gateway
|
||||||
mkdir -p models
|
|
||||||
# Copy the GGUF file into models/
|
|
||||||
cp /path/to/mortdecai-v4.gguf models/
|
|
||||||
chmod +x setup.sh
|
chmod +x setup.sh
|
||||||
./setup.sh
|
./setup.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
|
The setup script:
|
||||||
|
1. Generates an API key
|
||||||
|
2. Starts Ollama + gateway in Docker
|
||||||
|
3. Downloads the model (~5.3 GB)
|
||||||
|
4. Loads it into Ollama
|
||||||
|
5. Runs a test inference
|
||||||
|
6. Prints connection details
|
||||||
|
|
||||||
Dashboard: http://localhost:8434/dashboard
|
Dashboard: http://localhost:8434/dashboard
|
||||||
|
|
||||||
## What It Does
|
## Architecture
|
||||||
|
|
||||||
```
|
```
|
||||||
Your GPU → Ollama → Gateway (auth + metering) → Port 8434 → Internet
|
Internet → Port 8434 → Gateway (auth + metering + ledger) → Ollama → GPU
|
||||||
```
|
```
|
||||||
|
|
||||||
The gateway sits in front of Ollama and:
|
The gateway is the only exposed port. It proxies authenticated requests to Ollama and records every transaction in a tamper-evident ledger.
|
||||||
- Authenticates requests via API key
|
|
||||||
- Tracks inference time, tokens, energy usage
|
|
||||||
- Estimates electricity cost (GPU TDP × time × rate)
|
|
||||||
- Enforces a spending cap
|
|
||||||
- Provides a dashboard with live stats
|
|
||||||
|
|
||||||
## Configuration
|
## Cost Model
|
||||||
|
|
||||||
Edit `.env`:
|
The gateway estimates electricity cost based on **marginal power** — only the extra watts your GPU draws during inference above its idle power.
|
||||||
|
|
||||||
```
|
```
|
||||||
API_KEY=mk_your_secret_key
|
Marginal cost = ((GPU load - GPU idle) + (System inference - System idle)) W × hours ÷ 1000 × $/kWh
|
||||||
GPU_TDP_WATTS=54 # Your GPU's TDP
|
```
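As a rough illustration of the formula above, the sketch below plugs in the default values from `.env.example` for a 3.42-second request. The function name `marginal_cost` is hypothetical; the arithmetic is meant to mirror `_calc_marginal_cost` in `gateway.py` with a zero profit margin.

```python
def marginal_cost(duration_s, gpu_idle=15, gpu_load=54,
                  sys_idle=45, sys_inference=65, rate_per_kwh=0.15):
    """Return (marginal_watts, energy_wh, cost) for one request.

    Hypothetical helper; defaults are the values from .env.example.
    """
    # Extra watts drawn above idle, for the GPU and the rest of the system
    marginal_watts = (gpu_load - gpu_idle) + (sys_inference - sys_idle)
    # Watts x seconds -> watt-hours
    energy_wh = marginal_watts * duration_s / 3600
    # Watt-hours -> kWh, then multiply by the $/kWh rate
    cost = energy_wh / 1000 * rate_per_kwh
    return marginal_watts, energy_wh, cost


watts, wh, cost = marginal_cost(3.42)
print(watts, wh, cost)
```

With the defaults this comes to 59 W of marginal draw, roughly 0.056 Wh, and a cost of about $0.000008 for the request.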
|
||||||
SYSTEM_OVERHEAD_WATTS=30 # CPU/RAM draw during inference
|
|
||||||
ELECTRICITY_RATE=0.15 # $/kWh
|
### Configuration
|
||||||
SPENDING_CAP=10.00 # $ before gateway stops accepting
|
|
||||||
|
All parameters below can be set in `.env` or adjusted live via `POST /config`:
|
||||||
|
|
||||||
|
| Parameter | Default | Description |
|
||||||
|
|-----------|---------|-------------|
|
||||||
|
| `GPU_IDLE_WATTS` | 15 | GPU power at idle |
|
||||||
|
| `GPU_LOAD_WATTS` | 54 | GPU power during inference |
|
||||||
|
| `SYSTEM_IDLE_WATTS` | 45 | System power at idle (CPU/RAM/fans) |
|
||||||
|
| `SYSTEM_INFERENCE_WATTS` | 65 | System power during inference |
|
||||||
|
| `ELECTRICITY_RATE` | 0.15 | $/kWh |
|
||||||
|
| `BILLING_MODE` | marginal | `marginal` (extra watts only) or `dedicated` (all uptime) |
|
||||||
|
| `BASE_RATE_PER_HOUR` | 0.00 | Hourly rate in dedicated mode |
|
||||||
|
| `SPENDING_CAP` | 10.00 | $ before gateway stops accepting requests |
|
||||||
|
| `LABOR_RATE_PER_HOUR` | 0.00 | $/hr for operator time (setup/maintenance) |
|
||||||
|
| `PROFIT_MARGIN` | 0.00 | Markup fraction added to cost (0.10 = 10%) |
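A minimal sketch of a live update, assuming `POST /config` accepts a JSON object whose keys are the lowercase parameter names (as in `_DEFAULT_COST_CONFIG`) and uses the same `Authorization: Bearer` header as the other authenticated endpoints. The helper name `build_config_update` and the example values are hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request


def build_config_update(base_url, api_key, **params):
    """Build (but do not send) a POST /config request. Hypothetical helper."""
    return urllib.request.Request(
        f"{base_url}/config",
        data=json.dumps(params).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_config_update(
    "http://localhost:8434", "mk_your_secret_key",
    electricity_rate=0.22, spending_cap=25.0,
)
# urllib.request.urlopen(req) would send it to a running gateway.
```

Keys that are not already present in the cost configuration appear to be ignored by the handler, so a typo in a parameter name would fail silently.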
|
||||||
|
|
||||||
|
### Billing Modes
|
||||||
|
|
||||||
|
**Marginal** (default): Only charges for the extra power above idle. If the machine is on anyway (gaming, general use), you only pay for what inference adds.
|
||||||
|
|
||||||
|
**Dedicated**: Charges for full GPU and system power during inference, plus a base hourly rate that accrues between requests. Use it if the machine is kept on specifically for inference.
|
||||||
|
|
||||||
|
## Dual Ledger
|
||||||
|
|
||||||
|
Every transaction is recorded in a tamper-evident ledger on **both sides**: the gateway operator's machine and the client's server.
|
||||||
|
|
||||||
|
### How it works
|
||||||
|
|
||||||
|
```
|
||||||
|
1. Client sends inference request to gateway
|
||||||
|
2. Gateway processes request via Ollama
|
||||||
|
3. Gateway records transaction in local ledger.jsonl
|
||||||
|
4. Gateway POSTs transaction to client's callback URL
|
||||||
|
5. Client's ledger_receiver.py saves independent copy
|
||||||
|
6. Both copies include a truncated SHA-256 hash of (id + tokens_in + tokens_out + duration + cost + shared_secret)
|
||||||
|
```
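The hashing step can be sketched as follows. This mirrors `_ledger_hash` in `gateway.py` (pipe-separated fields, first 16 hex characters of the digest); the function names and the sample entry values are made up for illustration.

```python
import hashlib


def ledger_hash(entry, secret):
    """Truncated SHA-256 over the billed fields plus the shared secret."""
    raw = (f"{entry['id']}|{entry['tokens_in']}|{entry['tokens_out']}|"
           f"{entry['duration']}|{entry['cost']}|{secret}")
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


def is_valid(entry, secret):
    """True if the stored hash matches a freshly computed one."""
    return entry.get("hash") == ledger_hash(entry, secret)


# Sample transaction (values are illustrative only)
entry = {"id": "a1b2c3d4-e5f", "tokens_in": 120, "tokens_out": 48,
         "duration": 3.42, "cost": 8.41e-06}
entry["hash"] = ledger_hash(entry, "agreed_upon_secret_here")

tampered = dict(entry, cost=0.0)  # same hash, edited cost

print(is_valid(entry, "agreed_upon_secret_here"))     # True
print(is_valid(tampered, "agreed_upon_secret_here"))  # False
print(is_valid(entry, "some_other_secret"))           # False
```

Because both parties hold the secret, the hash alone does not stop either side from recomputing it after an edit; the protection comes from each side keeping its own copy.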
|
||||||
|
|
||||||
|
### Tamper protection
|
||||||
|
|
||||||
|
| Scenario | Detection |
|
||||||
|
|----------|-----------|
|
||||||
|
| Gateway resets stats | Client's ledger has full history |
|
||||||
|
| Client denies requests happened | Gateway's ledger has full history |
|
||||||
|
| Either side edits a transaction | Hash verification fails on `/reconcile` |
|
||||||
|
| Shared secret mismatch | All hashes show as invalid |
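The first two rows depend on comparing the two ledgers against each other. A minimal sketch of that comparison, assuming the client's receiver stores entries in the same JSONL shape as the gateway's `ledger.jsonl` (the receiver's format is not shown here, and the function names are hypothetical):

```python
import json


def load_ledger(path):
    """Read a JSONL ledger into a dict keyed by transaction id."""
    entries = {}
    with open(path) as f:
        for line in f:
            if line.strip():
                entry = json.loads(line)
                entries[entry["id"]] = entry
    return entries


def compare_ledgers(gateway, client):
    """Return (missing_on_client, missing_on_gateway, cost_mismatches)."""
    missing_on_client = sorted(set(gateway) - set(client))
    missing_on_gateway = sorted(set(client) - set(gateway))
    mismatched = sorted(
        tx_id for tx_id in set(gateway) & set(client)
        if gateway[tx_id]["cost"] != client[tx_id]["cost"]
    )
    return missing_on_client, missing_on_gateway, mismatched
```

In practice each side would call `load_ledger` on its own file, exchange the results, and treat any non-empty list as a reason to investigate.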
|
||||||
|
|
||||||
|
### Setup
|
||||||
|
|
||||||
|
Both sides configure the same `LEDGER_SECRET` in their `.env`:
|
||||||
|
|
||||||
|
**Gateway (.env):**
|
||||||
|
```
|
||||||
|
LEDGER_SECRET=agreed_upon_secret_here
|
||||||
|
CALLBACK_URL=http://client_ip:8435/transaction
|
||||||
|
```
|
||||||
|
|
||||||
|
**Client (ledger_receiver.py):**
|
||||||
|
```
|
||||||
|
LEDGER_SECRET=agreed_upon_secret_here
|
||||||
|
python3 ledger_receiver.py
|
||||||
|
```
|
||||||
|
|
||||||
|
### Reconciliation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# On the gateway — verify all hashes, compare ledger vs stats
|
||||||
|
curl -s http://localhost:8434/reconcile -H "Authorization: Bearer $KEY"
|
||||||
|
```
|
||||||
|
|
||||||
|
Response:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"ledger_entries": 142,
|
||||||
|
"ledger_total_cost": 0.003421,
|
||||||
|
"stats_total_cost": 0.003421,
|
||||||
|
"discrepancy": 0.0,
|
||||||
|
"hash_verification": {
|
||||||
|
"total": 142,
|
||||||
|
"valid": 142,
|
||||||
|
"invalid": 0
|
||||||
|
},
|
||||||
|
"status": "OK"
|
||||||
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
## Endpoints
|
## Endpoints
|
||||||
|
|
||||||
| Endpoint | Auth | Description |
|
### Public (no auth)
|
||||||
|----------|------|-------------|
|
|
||||||
| `GET /health` | No | Ollama status + loaded models |
|
| Endpoint | Description |
|
||||||
| `GET /dashboard` | No | Web dashboard with live stats |
|
|----------|-------------|
|
||||||
| `GET /stats` | Yes | JSON usage stats |
|
| `GET /health` | Ollama status + loaded models |
|
||||||
| `POST /api/chat` | Yes | Proxied to Ollama |
|
| `GET /dashboard` | Web dashboard with live stats |
|
||||||
| `POST /api/generate` | Yes | Proxied to Ollama |
|
|
||||||
| `*` | Yes | Everything else proxied to Ollama |
|
### Authenticated
|
||||||
|
|
||||||
|
| Endpoint | Description |
|
||||||
|
|----------|-------------|
|
||||||
|
| `POST /api/chat` | Proxied to Ollama (inference) |
|
||||||
|
| `POST /api/generate` | Proxied to Ollama (inference) |
|
||||||
|
| `GET /stats` | Full usage stats + cost config |
|
||||||
|
| `GET /config` | View cost configuration |
|
||||||
|
| `POST /config` | Update cost parameters live |
|
||||||
|
| `GET /ledger` | View recent transactions + total cost |
|
||||||
|
| `GET /reconcile` | Verify ledger integrity |
|
||||||
|
|
||||||
|
### Admin
|
||||||
|
|
||||||
|
| Endpoint | Description |
|
||||||
|
|----------|-------------|
|
||||||
|
| `POST /admin/update-model` | Download + load new GGUF (requires `ALLOW_MODEL_UPDATES=true`) |
|
||||||
|
|
||||||
|
## Model Updates
|
||||||
|
|
||||||
|
**Remote push** (opt-in): Set `ALLOW_MODEL_UPDATES=true` in `.env`. The client can push new model versions:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -X POST http://gateway:8434/admin/update-model \
|
||||||
|
-H "Authorization: Bearer $KEY" \
|
||||||
|
-d '{"url": "https://mortdec.ai/dl/v5/mortdecai-0.5.0.gguf", "name": "mortdecai:0.5.0"}'
|
||||||
|
```
|
||||||
|
|
||||||
|
**Manual update**: Run the update script:
|
||||||
|
```bash
|
||||||
|
./update-model.sh https://mortdec.ai/dl/v5/mortdecai-0.5.0.gguf mortdecai:0.5.0
|
||||||
|
```
|
||||||
|
|
||||||
## Response Metadata
|
## Response Metadata
|
||||||
|
|
||||||
Every proxied response includes a `_gateway` field:
|
Every proxied response includes gateway metadata:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"message": { "role": "assistant", "content": "..." },
|
"message": {"role": "assistant", "content": "..."},
|
||||||
"_gateway": {
|
"_gateway": {
|
||||||
"duration_seconds": 3.42,
|
"duration_seconds": 3.42,
|
||||||
"energy_wh": 0.0798,
|
"marginal_watts": 59,
|
||||||
"estimated_cost": 0.000012,
|
"energy_wh": 0.0561,
|
||||||
|
"estimated_cost": 0.000008,
|
||||||
"total_cost": 0.0342,
|
"total_cost": 0.0342,
|
||||||
"budget_remaining": 9.9658
|
"budget_remaining": 9.9658,
|
||||||
|
"billing_mode": "marginal"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
```
|
```
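A client can use this metadata to pace itself against the spending cap. The sketch below is one possible approach, not part of the gateway; `requests_left` is a hypothetical helper, and the sample values are copied from the example response above.

```python
def requests_left(gateway_meta):
    """Estimate how many similar requests fit in the remaining budget."""
    cost = gateway_meta["estimated_cost"]
    if cost <= 0:
        return None  # cannot estimate from a zero-cost request
    return int(gateway_meta["budget_remaining"] / cost)


# "_gateway" block from the example response
meta = {
    "duration_seconds": 3.42,
    "marginal_watts": 59,
    "energy_wh": 0.0561,
    "estimated_cost": 0.000008,
    "total_cost": 0.0342,
    "budget_remaining": 9.9658,
    "billing_mode": "marginal",
}

print(requests_left(meta))
```

For the example values this is on the order of 1.2 million requests; the estimate is only as good as the assumption that future requests cost about the same as the last one.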
|
||||||
|
|
||||||
## AMD ROCm
|
## Dashboard
|
||||||
|
|
||||||
The Docker compose uses `ollama/ollama:rocm` by default. Requires ROCm drivers on the host. For Strix Halo, ensure BIOS is set to reserved VRAM mode.
|
The dashboard shows live:
|
||||||
|
- Request count, tokens, inference time
|
||||||
|
- Cost progress bar (spent vs cap)
|
||||||
|
- Average cost per request, estimated remaining requests
|
||||||
|
- Power model breakdown (idle→load for GPU and system)
|
||||||
|
- Labor hours and cost
|
||||||
|
- GPU utilization, temperature, power draw
|
||||||
|
|
||||||
## NVIDIA
|
Auto-refreshes every 10 seconds.
|
||||||
|
|
||||||
Edit `docker-compose.yml`: uncomment the `deploy` section and comment out the `devices` section.
|
## GPU Support
|
||||||
|
|
||||||
|
**AMD ROCm** (default): Docker compose uses `ollama/ollama:rocm`. Requires ROCm drivers on host. For Strix Halo, set BIOS to reserved VRAM mode.
|
||||||
|
|
||||||
|
**NVIDIA**: Edit `docker-compose.yml` — uncomment the `deploy` section, comment out the `devices` section.
|
||||||
|
|
||||||
|
## Files
|
||||||
|
|
||||||
|
| File | Purpose |
|
||||||
|
|------|---------|
|
||||||
|
| `gateway.py` | Main proxy server |
|
||||||
|
| `ledger_receiver.py` | Client-side transaction receiver |
|
||||||
|
| `docker-compose.yml` | Ollama + gateway containers |
|
||||||
|
| `Dockerfile` | Gateway container build |
|
||||||
|
| `setup.sh` | Automated first-time setup |
|
||||||
|
| `update-model.sh` | Manual model update |
|
||||||
|
| `.env.example` | Configuration template |
|
||||||
|
|||||||
@@ -28,6 +28,7 @@ services:
|
|||||||
- /dev/dri:/dev/dri
|
- /dev/dri:/dev/dri
|
||||||
environment:
|
environment:
|
||||||
- OLLAMA_HOST=0.0.0.0:11434
|
- OLLAMA_HOST=0.0.0.0:11434
|
||||||
|
- HSA_OVERRIDE_GFX_VERSION=${HSA_OVERRIDE_GFX_VERSION:-11.0.0}
|
||||||
# For NVIDIA, replace 'devices' above with:
|
# For NVIDIA, replace 'devices' above with:
|
||||||
# deploy:
|
# deploy:
|
||||||
# resources:
|
# resources:
|
||||||
|
|||||||
+332
-44
@@ -19,6 +19,8 @@ import os
|
|||||||
import time
|
import time
|
||||||
import threading
|
import threading
|
||||||
import subprocess
|
import subprocess
|
||||||
|
import hashlib
|
||||||
|
import uuid
|
||||||
from http.server import HTTPServer, BaseHTTPRequestHandler
|
from http.server import HTTPServer, BaseHTTPRequestHandler
|
||||||
from urllib.parse import urlparse, parse_qs
|
from urllib.parse import urlparse, parse_qs
|
||||||
import requests
|
import requests
|
||||||
@@ -27,11 +29,139 @@ import requests
|
|||||||
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
|
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
|
||||||
LISTEN_PORT = int(os.environ.get("GATEWAY_PORT", "8434"))
|
LISTEN_PORT = int(os.environ.get("GATEWAY_PORT", "8434"))
|
||||||
API_KEY = os.environ.get("API_KEY", "mk_mortdecai_default")
|
API_KEY = os.environ.get("API_KEY", "mk_mortdecai_default")
|
||||||
ELECTRICITY_RATE = float(os.environ.get("ELECTRICITY_RATE", "0.15")) # $/kWh
|
|
||||||
GPU_TDP_WATTS = float(os.environ.get("GPU_TDP_WATTS", "54")) # Strix Halo iGPU
|
|
||||||
SYSTEM_OVERHEAD_WATTS = float(os.environ.get("SYSTEM_OVERHEAD_WATTS", "30")) # CPU/RAM/etc idle draw during inference
|
|
||||||
SPENDING_CAP = float(os.environ.get("SPENDING_CAP", "10.00")) # $ before refusing requests
|
|
||||||
STATS_FILE = os.environ.get("STATS_FILE", "/var/lib/mortdecai-gateway/stats.json")
|
STATS_FILE = os.environ.get("STATS_FILE", "/var/lib/mortdecai-gateway/stats.json")
|
||||||
|
CONFIG_FILE = os.environ.get("CONFIG_FILE", "/var/lib/mortdecai-gateway/cost_config.json")
|
||||||
|
|
||||||
|
# Default cost config (overridden by config file or env vars)
|
||||||
|
_DEFAULT_COST_CONFIG = {
|
||||||
|
"electricity_rate": 0.15, # $/kWh
|
||||||
|
"gpu_idle_watts": 15, # GPU at idle
|
||||||
|
"gpu_load_watts": 54, # GPU during inference
|
||||||
|
"system_idle_watts": 45, # Whole system idle (CPU/RAM/fans/PSU)
|
||||||
|
"system_inference_watts": 65, # Whole system during inference
|
||||||
|
"billing_mode": "marginal", # "marginal" = only extra watts; "dedicated" = all uptime
|
||||||
|
"base_rate_per_hour": 0.00, # $/hr for keeping machine on (dedicated mode only)
|
||||||
|
"spending_cap": 10.00, # $ before refusing requests
|
||||||
|
"labor_rate_per_hour": 0.00, # $/hr for operator's time (setup, maintenance)
|
||||||
|
"profit_margin": 0.00, # multiplier (0.10 = 10% markup)
|
||||||
|
"labor_hours_logged": 0.0, # total hours spent on setup/maintenance
|
||||||
|
}
|
||||||
|
|
||||||
|
def _load_cost_config():
|
||||||
|
config = dict(_DEFAULT_COST_CONFIG)
|
||||||
|
# Override from file
|
||||||
|
try:
|
||||||
|
with open(CONFIG_FILE) as f:
|
||||||
|
config.update(json.load(f))
|
||||||
|
except (OSError, ValueError):
|
||||||
|
pass
|
||||||
|
# Override from env vars
|
||||||
|
for key in _DEFAULT_COST_CONFIG:
|
||||||
|
env_key = key.upper()
|
||||||
|
val = os.environ.get(env_key)
|
||||||
|
if val is not None:
|
||||||
|
try:
|
||||||
|
config[key] = type(_DEFAULT_COST_CONFIG[key])(val)
|
||||||
|
except (TypeError, ValueError):
|
||||||
|
pass
|
||||||
|
return config
|
||||||
|
|
||||||
|
def _save_cost_config(config):
|
||||||
|
try:
|
||||||
|
os.makedirs(os.path.dirname(CONFIG_FILE), exist_ok=True)
|
||||||
|
with open(CONFIG_FILE, "w") as f:
|
||||||
|
json.dump(config, f, indent=2)
|
||||||
|
except (OSError, TypeError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
COST_CONFIG = _load_cost_config()
|
||||||
|
|
||||||
|
# --- Dual Ledger ---
|
||||||
|
LEDGER_FILE = os.environ.get("LEDGER_FILE", "/var/lib/mortdecai-gateway/ledger.jsonl")
|
||||||
|
LEDGER_SECRET = os.environ.get("LEDGER_SECRET", "change_me_shared_secret")
|
||||||
|
CALLBACK_URL = os.environ.get("CALLBACK_URL", "") # Seth's server endpoint for transaction logging
|
||||||
|
_ledger_lock = threading.Lock()
|
||||||
|
|
||||||
|
|
||||||
|
def _ledger_hash(entry):
|
||||||
|
"""Create a verification hash from transaction data + shared secret."""
|
||||||
|
raw = f"{entry['id']}|{entry['tokens_in']}|{entry['tokens_out']}|{entry['duration']}|{entry['cost']}|{LEDGER_SECRET}"
|
||||||
|
return hashlib.sha256(raw.encode()).hexdigest()[:16]
|
||||||
|
|
||||||
|
|
||||||
|
def _ledger_write(entry):
|
||||||
|
"""Append a transaction to the local ledger."""
|
||||||
|
with _ledger_lock:
|
||||||
|
try:
|
||||||
|
os.makedirs(os.path.dirname(LEDGER_FILE), exist_ok=True)
|
||||||
|
with open(LEDGER_FILE, "a") as f:
|
||||||
|
f.write(json.dumps(entry) + "\n")
|
||||||
|
except Exception as e:
|
||||||
|
print(f"Ledger write failed: {e}")
|
||||||
|
|
||||||
|
|
||||||
|
def _ledger_callback(entry):
|
||||||
|
"""Send transaction to the client's server for cross-verification."""
|
||||||
|
if not CALLBACK_URL:
|
||||||
|
return
|
||||||
|
try:
|
||||||
|
requests.post(
|
||||||
|
CALLBACK_URL,
|
||||||
|
json=entry,
|
||||||
|
headers={"Content-Type": "application/json"},
|
||||||
|
timeout=5,
|
||||||
|
)
|
||||||
|
except requests.RequestException:
|
||||||
|
pass # Non-blocking — don't fail inference because callback is down
|
||||||
|
|
||||||
|
|
||||||
|
def _ledger_record(tokens_in, tokens_out, duration, cost, energy_wh, model):
|
||||||
|
"""Record a transaction in the ledger and notify the client."""
|
||||||
|
entry = {
|
||||||
|
"id": str(uuid.uuid4())[:12],
|
||||||
|
"timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
|
||||||
|
"tokens_in": tokens_in,
|
||||||
|
"tokens_out": tokens_out,
|
||||||
|
"duration": round(duration, 3),
|
||||||
|
"cost": round(cost, 8),
|
||||||
|
"energy_wh": round(energy_wh, 4),
|
||||||
|
"model": model,
|
||||||
|
"billing_mode": COST_CONFIG["billing_mode"],
|
||||||
|
}
|
||||||
|
entry["hash"] = _ledger_hash(entry)
|
||||||
|
|
||||||
|
_ledger_write(entry)
|
||||||
|
|
||||||
|
# Send to client in background
|
||||||
|
threading.Thread(target=_ledger_callback, args=(entry,), daemon=True).start()
|
||||||
|
|
||||||
|
return entry
|
||||||
|
|
||||||
|
|
||||||
|
def _ledger_load():
|
||||||
|
"""Load all ledger entries."""
|
||||||
|
entries = []
|
||||||
|
try:
|
||||||
|
with open(LEDGER_FILE) as f:
|
||||||
|
for line in f:
|
||||||
|
if line.strip():
|
||||||
|
entries.append(json.loads(line))
|
||||||
|
except (OSError, ValueError):
|
||||||
|
pass
|
||||||
|
return entries
|
||||||
|
|
||||||
|
|
||||||
|
def _ledger_verify(entries):
|
||||||
|
"""Verify all ledger entries against their hashes."""
|
||||||
|
results = {"total": len(entries), "valid": 0, "invalid": 0, "invalid_ids": []}
|
||||||
|
for entry in entries:
|
||||||
|
expected = _ledger_hash(entry)
|
||||||
|
if entry.get("hash") == expected:
|
||||||
|
results["valid"] += 1
|
||||||
|
else:
|
||||||
|
results["invalid"] += 1
|
||||||
|
results["invalid_ids"].append(entry.get("id", "?"))
|
||||||
|
return results
|
||||||
|
|
||||||
# --- Stats tracking ---
|
# --- Stats tracking ---
|
||||||
_stats_lock = threading.Lock()
|
_stats_lock = threading.Lock()
|
||||||
@@ -67,25 +197,52 @@ def _save_stats():
|
|||||||
pass
|
pass
|
||||||
|
|
||||||
|
|
||||||
def _track_request(tokens_in, tokens_out, duration_seconds):
|
def _calc_marginal_cost(duration_seconds):
|
||||||
"""Track a completed inference request."""
|
"""Return (watts, energy_wh, cost) for an inference call under the active billing mode."""
|
||||||
|
c = COST_CONFIG
|
||||||
|
if c["billing_mode"] == "marginal":
|
||||||
|
# Only charge for extra watts above idle
|
||||||
|
marginal_gpu = c["gpu_load_watts"] - c["gpu_idle_watts"]
|
||||||
|
marginal_system = c["system_inference_watts"] - c["system_idle_watts"]
|
||||||
|
marginal_watts = marginal_gpu + marginal_system
|
||||||
|
else:
|
||||||
|
# Dedicated: charge for full system draw during inference
|
||||||
|
marginal_watts = c["gpu_load_watts"] + c["system_inference_watts"]
|
||||||
|
|
||||||
|
energy_wh = (marginal_watts * duration_seconds) / 3600
|
||||||
|
electricity_cost = (energy_wh / 1000) * c["electricity_rate"]
|
||||||
|
# Apply profit margin
|
||||||
|
cost = electricity_cost * (1 + c["profit_margin"])
|
||||||
|
return marginal_watts, energy_wh, cost
|
||||||
|
|
||||||
|
|
||||||
|
def _track_request(tokens_in, tokens_out, duration_seconds, model="mortdecai:0.4.0"):
|
||||||
|
"""Track a completed inference request and record in ledger."""
|
||||||
|
marginal_watts, energy_wh, cost = _calc_marginal_cost(duration_seconds)
|
||||||
|
|
||||||
|
# Record in dual ledger
|
||||||
|
_ledger_record(tokens_in, tokens_out, duration_seconds, cost, energy_wh, model)
|
||||||
|
|
||||||
with _stats_lock:
|
with _stats_lock:
|
||||||
_stats["total_requests"] += 1
|
_stats["total_requests"] += 1
|
||||||
_stats["total_tokens_in"] += tokens_in
|
_stats["total_tokens_in"] += tokens_in
|
||||||
_stats["total_tokens_out"] += tokens_out
|
_stats["total_tokens_out"] += tokens_out
|
||||||
_stats["total_inference_seconds"] += duration_seconds
|
_stats["total_inference_seconds"] += duration_seconds
|
||||||
_stats["last_request_at"] = time.strftime("%Y-%m-%dT%H:%M:%SZ")
|
_stats["last_request_at"] = time.strftime("%Y-%m-%dT%H:%M:%SZ")
|
||||||
|
|
||||||
# Power calculation
|
|
||||||
# GPU draws TDP watts during inference, plus system overhead
|
|
||||||
total_watts = GPU_TDP_WATTS + SYSTEM_OVERHEAD_WATTS
|
|
||||||
energy_wh = (total_watts * duration_seconds) / 3600
|
|
||||||
cost = (energy_wh / 1000) * ELECTRICITY_RATE
|
|
||||||
|
|
||||||
_stats["total_energy_wh"] += energy_wh
|
_stats["total_energy_wh"] += energy_wh
|
||||||
_stats["total_cost"] += cost
|
_stats["total_cost"] += cost
|
||||||
|
_stats["total_marginal_watts_avg"] = (
|
||||||
|
_stats.get("total_marginal_watts_avg", marginal_watts) * 0.95 + marginal_watts * 0.05
|
||||||
|
)
|
||||||
|
|
||||||
|
# Base rate for dedicated mode
|
||||||
|
if COST_CONFIG["billing_mode"] == "dedicated" and COST_CONFIG["base_rate_per_hour"] > 0:
|
||||||
|
# Add base rate proportional to time since last request
|
||||||
|
last = _stats.get("_last_base_calc", time.time())
|
||||||
|
elapsed_hours = (time.time() - last) / 3600
|
||||||
|
_stats["total_cost"] += COST_CONFIG["base_rate_per_hour"] * elapsed_hours
|
||||||
|
_stats["_last_base_calc"] = time.time()
|
||||||
|
|
||||||
# Save every 10 requests
|
|
||||||
if _stats["total_requests"] % 10 == 0:
|
if _stats["total_requests"] % 10 == 0:
|
||||||
_save_stats()
|
_save_stats()
|
||||||
|
|
||||||
@@ -93,7 +250,7 @@ def _track_request(tokens_in, tokens_out, duration_seconds):
|
|||||||
def _check_budget():
|
def _check_budget():
|
||||||
"""Returns True if under spending cap."""
|
"""Returns True if under spending cap."""
|
||||||
with _stats_lock:
|
with _stats_lock:
|
||||||
return _stats["total_cost"] < SPENDING_CAP
|
return _stats["total_cost"] < COST_CONFIG["spending_cap"]
|
||||||
|
|
||||||
|
|
||||||
def _get_gpu_utilization():
|
def _get_gpu_utilization():
|
||||||
@@ -185,17 +342,21 @@ class GatewayHandler(BaseHTTPRequestHandler):
|
|||||||
# Track token usage from response
|
# Track token usage from response
|
||||||
tokens_in = data.get("prompt_eval_count", 0)
|
tokens_in = data.get("prompt_eval_count", 0)
|
||||||
tokens_out = data.get("eval_count", 0)
|
tokens_out = data.get("eval_count", 0)
|
||||||
|
model_name = (body or {}).get("model", "unknown")
|
||||||
if tokens_in or tokens_out:
|
if tokens_in or tokens_out:
|
||||||
_track_request(tokens_in, tokens_out, duration)
|
_track_request(tokens_in, tokens_out, duration, model_name)
|
||||||
|
|
||||||
# Add gateway metadata to response
|
# Add gateway metadata to response
|
||||||
if isinstance(data, dict):
|
if isinstance(data, dict):
|
||||||
|
mw, ewh, ecost = _calc_marginal_cost(duration)
|
||||||
data["_gateway"] = {
|
data["_gateway"] = {
|
||||||
"duration_seconds": round(duration, 2),
|
"duration_seconds": round(duration, 2),
|
||||||
"energy_wh": round((GPU_TDP_WATTS + SYSTEM_OVERHEAD_WATTS) * duration / 3600, 4),
|
"marginal_watts": round(mw, 1),
|
||||||
"estimated_cost": round(((GPU_TDP_WATTS + SYSTEM_OVERHEAD_WATTS) * duration / 3600 / 1000) * ELECTRICITY_RATE, 6),
|
"energy_wh": round(ewh, 4),
|
||||||
|
"estimated_cost": round(ecost, 6),
|
||||||
"total_cost": round(_stats["total_cost"], 4),
|
"total_cost": round(_stats["total_cost"], 4),
|
||||||
"budget_remaining": round(SPENDING_CAP - _stats["total_cost"], 4),
|
"budget_remaining": round(COST_CONFIG["spending_cap"] - _stats["total_cost"], 4),
|
||||||
|
"billing_mode": COST_CONFIG["billing_mode"],
|
||||||
}
|
}
|
||||||
|
|
||||||
self._send_json(r.status_code, data)
|
self._send_json(r.status_code, data)
|
||||||
@@ -225,17 +386,46 @@ class GatewayHandler(BaseHTTPRequestHandler):
|
|||||||
return
|
return
|
||||||
gpu = _get_gpu_utilization()
|
gpu = _get_gpu_utilization()
|
||||||
with _stats_lock:
|
with _stats_lock:
|
||||||
stats_copy = dict(_stats)
|
stats_copy = {k: v for k, v in _stats.items() if not k.startswith("_")}
|
||||||
stats_copy["gpu"] = gpu
|
stats_copy["gpu"] = gpu
|
||||||
stats_copy["config"] = {
|
stats_copy["cost_config"] = COST_CONFIG
|
||||||
"gpu_tdp_watts": GPU_TDP_WATTS,
|
|
||||||
"system_overhead_watts": SYSTEM_OVERHEAD_WATTS,
|
|
||||||
"electricity_rate": ELECTRICITY_RATE,
|
|
||||||
"spending_cap": SPENDING_CAP,
|
|
||||||
}
|
|
||||||
self._send_json(200, stats_copy)
|
self._send_json(200, stats_copy)
|
||||||
return
|
return
|
||||||
|
|
||||||
|
if parsed.path == "/config":
|
||||||
|
if not self._check_auth():
|
||||||
|
return
|
||||||
|
self._send_json(200, COST_CONFIG)
|
||||||
|
return
|
||||||
|
|
||||||
|
if parsed.path == "/ledger":
|
||||||
|
if not self._check_auth():
|
||||||
|
return
|
||||||
|
entries = _ledger_load()
|
||||||
|
total_cost = sum(e.get("cost", 0) for e in entries)
|
||||||
|
self._send_json(200, {
|
||||||
|
"entries": len(entries),
|
||||||
|
"total_cost": round(total_cost, 6),
|
||||||
|
"last_10": entries[-10:],
|
||||||
|
})
|
||||||
|
return
|
||||||
|
|
||||||
|
if parsed.path == "/reconcile":
|
||||||
|
if not self._check_auth():
|
||||||
|
return
|
||||||
|
entries = _ledger_load()
|
||||||
|
verification = _ledger_verify(entries)
|
||||||
|
total_cost = sum(e.get("cost", 0) for e in entries)
|
||||||
|
self._send_json(200, {
|
||||||
|
"ledger_entries": len(entries),
|
||||||
|
"ledger_total_cost": round(total_cost, 6),
|
||||||
|
"stats_total_cost": round(_stats.get("total_cost", 0), 6),
|
||||||
|
"discrepancy": round(abs(total_cost - _stats.get("total_cost", 0)), 6),
|
||||||
|
"hash_verification": verification,
|
||||||
|
"status": "OK" if verification["invalid"] == 0 else "TAMPERED",
|
||||||
|
})
|
||||||
|
return
|
||||||
|
|
||||||
if parsed.path == "/dashboard":
|
if parsed.path == "/dashboard":
|
||||||
self._serve_dashboard()
|
self._serve_dashboard()
|
||||||
return
|
return
|
||||||
@@ -252,36 +442,132 @@ class GatewayHandler(BaseHTTPRequestHandler):
|
|||||||
length = int(self.headers.get("Content-Length", 0))
|
length = int(self.headers.get("Content-Length", 0))
|
||||||
body = json.loads(self.rfile.read(length)) if length > 0 else None
|
body = json.loads(self.rfile.read(length)) if length > 0 else None
|
||||||
|
|
||||||
|
# Config update endpoint — adjust cost parameters live
|
||||||
|
if self.path == "/config" and body:
|
||||||
|
global COST_CONFIG
|
||||||
|
for key in body:
|
||||||
|
if key in COST_CONFIG:
|
||||||
|
COST_CONFIG[key] = type(_DEFAULT_COST_CONFIG.get(key, ""))(body[key])
|
||||||
|
_save_cost_config(COST_CONFIG)
|
||||||
|
self._send_json(200, {"status": "updated", "config": COST_CONFIG})
|
||||||
|
return
|
||||||
|
|
||||||
|
# Model update endpoint — downloads new GGUF and reloads
|
||||||
|
if self.path == "/admin/update-model" and body:
|
||||||
|
self._handle_model_update(body)
|
||||||
|
return
|
||||||
|
|
||||||
self._proxy_to_ollama(self.path, body)
|
self._proxy_to_ollama(self.path, body)
|
||||||
|
|
||||||
|
def _handle_model_update(self, body):
|
||||||
|
"""Download a new GGUF from a URL and reload the model.
|
||||||
|
Request: {"url": "https://mortdec.ai/dl/...", "name": "mortdecai:0.5.0"}
|
||||||
|
This is opt-in — the gateway operator must enable ALLOW_MODEL_UPDATES=true.
|
||||||
|
"""
|
||||||
|
if os.environ.get("ALLOW_MODEL_UPDATES", "false").lower() != "true":
|
||||||
|
self._send_json(403, {"error": "Model updates disabled. Set ALLOW_MODEL_UPDATES=true in .env to enable."})
|
||||||
|
return
|
||||||
|
|
||||||
|
url = body.get("url")
|
||||||
|
name = body.get("name", "mortdecai-latest")
|
||||||
|
if not url:
|
||||||
|
self._send_json(400, {"error": "url is required"})
|
||||||
|
return
|
||||||
|
|
||||||
|
try:
|
||||||
|
import subprocess
|
||||||
|
|
||||||
|
# Download GGUF
|
||||||
|
gguf_path = f"/models/{name}.gguf"
|
||||||
|
print(f"Downloading model from {url}...")
|
||||||
|
r = requests.get(url, stream=True, timeout=600)
|
||||||
|
r.raise_for_status()
|
||||||
|
with open(gguf_path, "wb") as f:
|
||||||
|
for chunk in r.iter_content(chunk_size=8192):
|
||||||
|
f.write(chunk)
|
||||||
|
|
||||||
|
# Create Modelfile and load
|
||||||
|
subprocess.run(
|
||||||
|
["docker", "exec", "mortdecai-ollama", "ollama", "create", name, "-f", "/models/Modelfile"],
|
||||||
|
timeout=120, check=True
|
||||||
|
)
|
||||||
|
|
||||||
|
self._send_json(200, {"status": "ok", "model": name, "message": "Model updated and loaded"})
|
||||||
|
except Exception as e:
|
||||||
|
self._send_json(500, {"error": f"Update failed: {e}"})
|
||||||
|
|
||||||
def _serve_dashboard(self):
|
def _serve_dashboard(self):
|
||||||
"""Simple HTML dashboard showing usage stats."""
|
"""Simple HTML dashboard showing usage stats."""
|
||||||
with _stats_lock:
|
with _stats_lock:
|
||||||
s = dict(_stats)
|
s = {k: v for k, v in _stats.items() if not k.startswith("_")}
|
||||||
gpu = _get_gpu_utilization()
|
gpu = _get_gpu_utilization()
|
||||||
|
c = COST_CONFIG
|
||||||
|
marginal_w = (c["gpu_load_watts"] - c["gpu_idle_watts"]) + (c["system_inference_watts"] - c["system_idle_watts"])
|
||||||
|
active = _check_budget()
|
||||||
|
avg_cost_per_req = s["total_cost"] / max(s["total_requests"], 1)
|
||||||
|
reqs_remaining = int((c["spending_cap"] - s["total_cost"]) / max(avg_cost_per_req, 0.000001)) if avg_cost_per_req > 0 else "∞"
|
||||||
|
|
||||||
html = f"""<!DOCTYPE html>
|
html = f"""<!DOCTYPE html>
|
||||||
<html><head><title>Mortdecai Gateway</title>
|
<html><head><title>Mortdecai Gateway</title>
|
||||||
<meta http-equiv="refresh" content="10">
|
<meta http-equiv="refresh" content="10">
|
||||||
<style>
|
<style>
|
||||||
body {{ font-family: monospace; background: #1a1a1a; color: #e0e0e0; padding: 2rem; }}
|
body {{ font-family: monospace; background: #1a1a1a; color: #e0e0e0; padding: 2rem; max-width: 700px; margin: 0 auto; }}
|
||||||
h1 {{ color: #D35400; }}
|
h1 {{ color: #D35400; }}
|
||||||
.stat {{ background: #252525; border: 1px solid #333; padding: 1rem; margin: 0.5rem 0; border-radius: 6px; }}
|
h2 {{ color: #D35400; font-size: 1rem; margin-top: 1.5rem; border-bottom: 1px solid #333; padding-bottom: 0.3rem; }}
|
||||||
|
.stat {{ background: #252525; border: 1px solid #333; padding: 0.8rem 1rem; margin: 0.3rem 0; border-radius: 4px; display: flex; justify-content: space-between; }}
|
||||||
.label {{ color: #999; }}
|
.label {{ color: #999; }}
|
||||||
.value {{ color: #D35400; font-size: 1.2rem; font-weight: bold; }}
|
.value {{ color: #D35400; font-weight: bold; }}
|
||||||
|
.ok {{ color: #4caf50; }}
|
||||||
|
.warn {{ color: #ff9800; }}
|
||||||
|
.bad {{ color: #f44336; }}
|
||||||
|
.bar {{ background: #333; border-radius: 3px; height: 20px; margin: 0.5rem 0; }}
|
||||||
|
.bar-fill {{ background: #D35400; height: 100%; border-radius: 3px; transition: width 0.5s; }}
|
||||||
</style></head><body>
|
</style></head><body>
|
||||||
<h1>Mortdecai Gateway</h1>
|
<h1>Mortdecai Gateway</h1>
|
||||||
<div class="stat"><span class="label">Status:</span> <span class="value">{"ACTIVE" if _check_budget() else "PAUSED (cap reached)"}</span></div>
|
|
||||||
<div class="stat"><span class="label">Total Requests:</span> <span class="value">{s['total_requests']}</span></div>
|
<div class="stat"><span class="label">Status</span>
|
||||||
<div class="stat"><span class="label">Tokens (in/out):</span> <span class="value">{s['total_tokens_in']:,} / {s['total_tokens_out']:,}</span></div>
|
<span class="value {'ok' if active else 'bad'}">{'● ACTIVE' if active else '● PAUSED (cap reached)'}</span></div>
|
||||||
<div class="stat"><span class="label">Inference Time:</span> <span class="value">{s['total_inference_seconds']:.0f}s</span></div>
|
|
||||||
<div class="stat"><span class="label">Energy Used:</span> <span class="value">{s['total_energy_wh']:.1f} Wh</span></div>
|
<h2>Usage</h2>
|
||||||
<div class="stat"><span class="label">Estimated Cost:</span> <span class="value">${s['total_cost']:.4f} / ${SPENDING_CAP:.2f}</span></div>
|
<div class="stat"><span class="label">Requests</span><span class="value">{s['total_requests']:,}</span></div>
|
||||||
<div class="stat"><span class="label">Rejected (over cap):</span> <span class="value">{s['requests_rejected']}</span></div>
|
<div class="stat"><span class="label">Tokens (in / out)</span><span class="value">{s['total_tokens_in']:,} / {s['total_tokens_out']:,}</span></div>
|
||||||
<div class="stat"><span class="label">GPU Utilization:</span> <span class="value">{gpu['utilization']}% ({gpu['source']})</span></div>
|
<div class="stat"><span class="label">Inference Time</span><span class="value">{s['total_inference_seconds']:.0f}s ({s['total_inference_seconds']/3600:.1f}h)</span></div>
|
||||||
<div class="stat"><span class="label">GPU Temperature:</span> <span class="value">{gpu['temperature']}°C</span></div>
|
<div class="stat"><span class="label">Avg per Request</span><span class="value">{s['total_inference_seconds']/max(s['total_requests'],1):.1f}s, {s['total_tokens_out']//max(s['total_requests'],1)} tokens</span></div>
|
||||||
<div class="stat"><span class="label">Last Request:</span> <span class="value">{s['last_request_at'] or 'never'}</span></div>
|
<div class="stat"><span class="label">Rejected (cap)</span><span class="value">{s['requests_rejected']}</span></div>
|
||||||
<div class="stat"><span class="label">Config:</span> <span class="value">TDP={GPU_TDP_WATTS}W + {SYSTEM_OVERHEAD_WATTS}W overhead @ ${ELECTRICITY_RATE}/kWh</span></div>
|
<div class="stat"><span class="label">Last Request</span><span class="value">{s['last_request_at'] or 'never'}</span></div>
|
||||||
|
|
||||||
|
<h2>Cost</h2>
|
||||||
|
<div class="bar"><div class="bar-fill" style="width: {min(s['total_cost']/max(c['spending_cap'],0.01)*100, 100):.0f}%"></div></div>
|
||||||
|
<div class="stat"><span class="label">Spent</span><span class="value">${s['total_cost']:.4f}</span></div>
|
||||||
|
<div class="stat"><span class="label">Budget</span><span class="value">${c['spending_cap']:.2f}</span></div>
|
||||||
|
<div class="stat"><span class="label">Remaining</span><span class="value">${c['spending_cap'] - s['total_cost']:.4f} (~{reqs_remaining} requests)</span></div>
|
||||||
|
<div class="stat"><span class="label">Avg Cost/Request</span><span class="value">${avg_cost_per_req:.6f}</span></div>
|
||||||
|
<div class="stat"><span class="label">Energy Used</span><span class="value">{s['total_energy_wh']:.1f} Wh ({s['total_energy_wh']/1000:.4f} kWh)</span></div>
|
||||||
|
|
||||||
|
<h2>Labor &amp; Profit</h2>
|
||||||
|
<div class="stat"><span class="label">Labor Rate</span><span class="value">${c['labor_rate_per_hour']:.2f}/hr</span></div>
|
||||||
|
<div class="stat"><span class="label">Hours Logged</span><span class="value">{c['labor_hours_logged']:.1f}h</span></div>
|
||||||
|
<div class="stat"><span class="label">Labor Cost</span><span class="value">${c['labor_rate_per_hour'] * c['labor_hours_logged']:.2f}</span></div>
|
||||||
|
<div class="stat"><span class="label">Profit Margin</span><span class="value">{c['profit_margin']*100:.0f}%</span></div>
|
||||||
|
<div class="stat"><span class="label">Total Owed (electricity + labor + margin)</span><span class="value">${s['total_cost'] + c['labor_rate_per_hour'] * c['labor_hours_logged']:.4f}</span></div>
|
||||||
|
|
||||||
|
<h2>Power Model</h2>
|
||||||
|
<div class="stat"><span class="label">Billing Mode</span><span class="value">{c['billing_mode']}</span></div>
|
||||||
|
<div class="stat"><span class="label">GPU (idle → load)</span><span class="value">{c['gpu_idle_watts']}W → {c['gpu_load_watts']}W</span></div>
|
||||||
|
<div class="stat"><span class="label">System (idle → load)</span><span class="value">{c['system_idle_watts']}W → {c['system_inference_watts']}W</span></div>
|
||||||
|
<div class="stat"><span class="label">Marginal Draw</span><span class="value">{marginal_w}W per inference call</span></div>
|
||||||
|
<div class="stat"><span class="label">Electricity Rate</span><span class="value">${c['electricity_rate']}/kWh</span></div>
|
||||||
|
{'<div class="stat"><span class="label">Base Rate</span><span class="value">$' + f"{c['base_rate_per_hour']:.3f}" + '/hr</span></div>' if c['billing_mode'] == 'dedicated' else ''}
|
||||||
|
|
||||||
|
<h2>GPU</h2>
|
||||||
|
<div class="stat"><span class="label">Utilization</span><span class="value">{gpu['utilization']}%</span></div>
|
||||||
|
<div class="stat"><span class="label">Temperature</span><span class="value {'warn' if gpu['temperature'] > 75 else 'ok'}">{gpu['temperature']}°C</span></div>
|
||||||
|
<div class="stat"><span class="label">Power Draw</span><span class="value">{gpu['power_watts']}W</span></div>
|
||||||
|
<div class="stat"><span class="label">Source</span><span class="value">{gpu['source']}</span></div>
|
||||||
|
|
||||||
|
<p style="color:#555; font-size:0.8rem; margin-top:2rem;">
|
||||||
|
Config: GET /config | Update: POST /config | Stats: GET /stats (auth required)
|
||||||
|
</p>
|
||||||
</body></html>"""
|
</body></html>"""
|
||||||
|
|
||||||
self.send_response(200)
|
self.send_response(200)
|
||||||
@@ -293,12 +579,14 @@ h1 {{ color: #D35400; }}
|
|||||||
def main():
|
def main():
|
||||||
_load_stats()
|
_load_stats()
|
||||||
|
|
||||||
|
c = COST_CONFIG
|
||||||
print(f"Mortdecai Gateway starting")
|
print(f"Mortdecai Gateway starting")
|
||||||
print(f" Ollama: {OLLAMA_URL}")
|
print(f" Ollama: {OLLAMA_URL}")
|
||||||
print(f" Listen: 0.0.0.0:{LISTEN_PORT}")
|
print(f" Listen: 0.0.0.0:{LISTEN_PORT}")
|
||||||
print(f" TDP: {GPU_TDP_WATTS}W + {SYSTEM_OVERHEAD_WATTS}W overhead")
|
print(f" GPU: {c['gpu_idle_watts']}W idle → {c['gpu_load_watts']}W load")
|
||||||
print(f" Rate: ${ELECTRICITY_RATE}/kWh")
|
print(f" System: {c['system_idle_watts']}W idle → {c['system_inference_watts']}W load")
|
||||||
print(f" Cap: ${SPENDING_CAP}")
|
print(f" Rate: ${c['electricity_rate']}/kWh | Mode: {c['billing_mode']}")
|
||||||
|
print(f" Cap: ${c['spending_cap']}")
|
||||||
print(f" Dashboard: http://localhost:{LISTEN_PORT}/dashboard")
|
print(f" Dashboard: http://localhost:{LISTEN_PORT}/dashboard")
|
||||||
|
|
||||||
# Save stats periodically
|
# Save stats periodically
|
||||||
|
|||||||
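The dashboard's "Marginal Draw" figure can be reproduced by hand from the defaults in the `.env` template. The sketch below is a hedged illustration, not the gateway's actual billing code: `marginal_watts` follows the `marginal_w` expression in `_serve_dashboard`, while `request_cost` and the `DEFAULTS` dict are hypothetical names, and the per-request formula (marginal watts over the inference duration, priced per kWh) is an assumption about how "marginal" mode bills.

```python
# Values copied from the .env template; the dict itself is illustrative.
DEFAULTS = {
    "gpu_idle_watts": 15,
    "gpu_load_watts": 54,
    "system_idle_watts": 45,
    "system_inference_watts": 65,
    "electricity_rate": 0.15,  # $/kWh
}


def marginal_watts(cfg):
    """Extra draw during inference, same expression as marginal_w in the dashboard."""
    return (cfg["gpu_load_watts"] - cfg["gpu_idle_watts"]) + (
        cfg["system_inference_watts"] - cfg["system_idle_watts"]
    )


def request_cost(cfg, seconds):
    """Assumed marginal-mode cost: returns (energy in Wh, cost in $)."""
    energy_wh = marginal_watts(cfg) * seconds / 3600
    cost = energy_wh / 1000 * cfg["electricity_rate"]
    return energy_wh, cost


if __name__ == "__main__":
    wh, cost = request_cost(DEFAULTS, 60)
    print(f"{marginal_watts(DEFAULTS)} W, {wh:.3f} Wh, ${cost:.6f}")
```

With the template defaults the marginal draw is 59 W, so a 60-second request would use just under 1 Wh under this assumed formula.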
@@ -0,0 +1,147 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Ledger Receiver — runs on YOUR server to collect transaction records from remote gateways.
|
||||||
|
|
||||||
|
Each gateway POSTs transactions here. You keep an independent copy of every
|
||||||
|
transaction with hash verification. If the gateway operator resets their stats,
|
||||||
|
your ledger still has the full history.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python3 ledger_receiver.py
|
||||||
|
LEDGER_SECRET=shared_secret python3 ledger_receiver.py
|
||||||
|
|
||||||
|
Endpoints:
|
||||||
|
POST /transaction — receive a transaction from a gateway
|
||||||
|
GET /ledger — totals plus the 20 most recent transactions
|
||||||
|
GET /reconcile/<host> — compare your ledger against a gateway's (no handler for this path appears in do_GET below)
|
||||||
|
GET /summary — total cost by gateway
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import hashlib
|
||||||
|
import threading
|
||||||
|
import time
|
||||||
|
from http.server import HTTPServer, BaseHTTPRequestHandler
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
|
||||||
|
LISTEN_PORT = int(os.environ.get("RECEIVER_PORT", "8435"))
|
||||||
|
LEDGER_DIR = os.environ.get("LEDGER_DIR", "/var/lib/mortdecai-ledger")
|
||||||
|
LEDGER_SECRET = os.environ.get("LEDGER_SECRET", "change_me_shared_secret")
|
||||||
|
|
||||||
|
_lock = threading.Lock()
|
||||||
|
|
||||||
|
|
||||||
|
def _verify_hash(entry):
|
||||||
|
raw = f"{entry['id']}|{entry['tokens_in']}|{entry['tokens_out']}|{entry['duration']}|{entry['cost']}|{LEDGER_SECRET}"
|
||||||
|
expected = hashlib.sha256(raw.encode()).hexdigest()[:16]
|
||||||
|
return entry.get("hash") == expected
|
||||||
|
|
||||||
|
|
||||||
|
def _save_transaction(entry, source_ip):
|
||||||
|
"""Save a transaction to the per-gateway ledger file."""
|
||||||
|
entry["_received_at"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
|
||||||
|
entry["_source_ip"] = source_ip
|
||||||
|
entry["_hash_valid"] = _verify_hash(entry)
|
||||||
|
|
||||||
|
os.makedirs(LEDGER_DIR, exist_ok=True)
|
||||||
|
# One file per source IP
|
||||||
|
safe_ip = source_ip.replace(":", "_").replace(".", "_")
|
||||||
|
path = os.path.join(LEDGER_DIR, f"ledger_{safe_ip}.jsonl")
|
||||||
|
|
||||||
|
with _lock:
|
||||||
|
with open(path, "a") as f:
|
||||||
|
f.write(json.dumps(entry) + "\n")
|
||||||
|
|
||||||
|
|
||||||
|
def _load_all():
|
||||||
|
"""Load all ledger entries from all gateways."""
|
||||||
|
all_entries = {}
|
||||||
|
try:
|
||||||
|
for fname in os.listdir(LEDGER_DIR):
|
||||||
|
if fname.endswith(".jsonl"):
|
||||||
|
gateway = fname.replace("ledger_", "").replace(".jsonl", "")
|
||||||
|
entries = []
|
||||||
|
with open(os.path.join(LEDGER_DIR, fname)) as f:
|
||||||
|
for line in f:
|
||||||
|
if line.strip():
|
||||||
|
entries.append(json.loads(line))
|
||||||
|
all_entries[gateway] = entries
|
||||||
|
except (OSError, json.JSONDecodeError):
|
||||||
|
pass
|
||||||
|
return all_entries
|
||||||
|
|
||||||
|
|
||||||
|
class ReceiverHandler(BaseHTTPRequestHandler):
|
||||||
|
def log_message(self, fmt, *args):
|
||||||
|
pass
|
||||||
|
|
||||||
|
def _send_json(self, status, data):
|
||||||
|
body = json.dumps(data, indent=2).encode()
|
||||||
|
self.send_response(status)
|
||||||
|
self.send_header("Content-Type", "application/json")
|
||||||
|
self.end_headers()
|
||||||
|
self.wfile.write(body)
|
||||||
|
|
||||||
|
def do_POST(self):
|
||||||
|
if self.path == "/transaction":
|
||||||
|
length = int(self.headers.get("Content-Length", 0))
|
||||||
|
entry = json.loads(self.rfile.read(length))
|
||||||
|
source_ip = self.client_address[0]
|
||||||
|
|
||||||
|
valid = _verify_hash(entry)
|
||||||
|
_save_transaction(entry, source_ip)
|
||||||
|
|
||||||
|
self._send_json(200, {
|
||||||
|
"status": "recorded",
|
||||||
|
"id": entry.get("id"),
|
||||||
|
"hash_valid": valid,
|
||||||
|
})
|
||||||
|
return
|
||||||
|
|
||||||
|
self._send_json(404, {"error": "not found"})
|
||||||
|
|
||||||
|
def do_GET(self):
|
||||||
|
parsed = urlparse(self.path)
|
||||||
|
|
||||||
|
if parsed.path == "/summary":
|
||||||
|
all_data = _load_all()
|
||||||
|
summary = {}
|
||||||
|
for gateway, entries in all_data.items():
|
||||||
|
total_cost = sum(e.get("cost", 0) for e in entries)
|
||||||
|
total_tokens = sum(e.get("tokens_out", 0) for e in entries)
|
||||||
|
valid = sum(1 for e in entries if e.get("_hash_valid", False))
|
||||||
|
invalid = len(entries) - valid
|
||||||
|
summary[gateway] = {
|
||||||
|
"transactions": len(entries),
|
||||||
|
"total_cost": round(total_cost, 6),
|
||||||
|
"total_tokens_out": total_tokens,
|
||||||
|
"hashes_valid": valid,
|
||||||
|
"hashes_invalid": invalid,
|
||||||
|
}
|
||||||
|
self._send_json(200, summary)
|
||||||
|
return
|
||||||
|
|
||||||
|
if parsed.path == "/ledger":
|
||||||
|
all_data = _load_all()
|
||||||
|
flat = []
|
||||||
|
for entries in all_data.values():
|
||||||
|
flat.extend(entries)
|
||||||
|
flat.sort(key=lambda e: e.get("timestamp", ""))
|
||||||
|
|
||||||
|
total = sum(e.get("cost", 0) for e in flat)
|
||||||
|
self._send_json(200, {
|
||||||
|
"total_transactions": len(flat),
|
||||||
|
"total_cost": round(total, 6),
|
||||||
|
"last_20": flat[-20:],
|
||||||
|
})
|
||||||
|
return
|
||||||
|
|
||||||
|
self._send_json(404, {"error": "not found"})
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
os.makedirs(LEDGER_DIR, exist_ok=True)
|
||||||
|
print(f"Ledger Receiver on port {LISTEN_PORT}")
|
||||||
|
print(f" Ledger dir: {LEDGER_DIR}")
|
||||||
|
HTTPServer(("0.0.0.0", LISTEN_PORT), ReceiverHandler).serve_forever()
|
||||||
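For a transaction to pass `_verify_hash`, the sending gateway has to build the same string and truncate the digest the same way. The sketch below mirrors the receiver's check; `sign_entry` is a hypothetical helper name, the example entry values are made up, and the gateway's real signing code is not shown in this diff.

```python
import hashlib
import json


def sign_entry(entry, secret):
    """Return the 16-hex-char digest that the receiver's _verify_hash expects."""
    raw = (
        f"{entry['id']}|{entry['tokens_in']}|{entry['tokens_out']}|"
        f"{entry['duration']}|{entry['cost']}|{secret}"
    )
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


# Illustrative values only; the secret is the receiver's default placeholder.
SECRET = "change_me_shared_secret"
entry = {"id": "tx-0001", "tokens_in": 12, "tokens_out": 48,
         "duration": 1.5, "cost": 0.000025}
entry["hash"] = sign_entry(entry, SECRET)

# The body a gateway would POST to /transaction.
payload = json.dumps(entry)
print(payload)
```

Because the fields are interpolated as strings, both sides need to see identical values after the JSON round trip. A 16-character prefix of an unkeyed SHA-256 is a lightweight tamper check, not a full MAC.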
@@ -1,10 +1,15 @@
|
|||||||
#!/bin/bash
|
#!/bin/bash
|
||||||
# Quick setup for Mortdecai Gateway
|
# Mortdecai Gateway — fully automated setup
|
||||||
# Run this after cloning the repo
|
# Just run: ./setup.sh
|
||||||
|
# Everything downloads and configures automatically.
|
||||||
|
|
||||||
set -e
|
set -e
|
||||||
|
|
||||||
|
MODEL_URL="${MODEL_URL:-https://mortdec.ai/dl/m4gguf/mortdecai-0.4.0.gguf}"
|
||||||
|
MODEL_NAME="mortdecai:0.4.0"
|
||||||
|
|
||||||
echo "=== Mortdecai Gateway Setup ==="
|
echo "=== Mortdecai Gateway Setup ==="
|
||||||
|
echo ""
|
||||||
|
|
||||||
# Generate API key if not set
|
# Generate API key if not set
|
||||||
if [ ! -f .env ]; then
|
if [ ! -f .env ]; then
|
||||||
@@ -20,30 +25,52 @@ EOF
|
|||||||
echo "Saved to .env"
|
echo "Saved to .env"
|
||||||
else
|
else
|
||||||
echo ".env already exists"
|
echo ".env already exists"
|
||||||
|
KEY=$(grep '^API_KEY=' .env | cut -d= -f2-)
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# Start containers
|
# Start containers
|
||||||
|
echo ""
|
||||||
echo "Starting containers..."
|
echo "Starting containers..."
|
||||||
docker compose up -d
|
docker compose up -d
|
||||||
|
|
||||||
# Wait for Ollama to be ready
|
# Wait for Ollama to be ready
|
||||||
echo "Waiting for Ollama..."
|
echo "Waiting for Ollama to start..."
|
||||||
for i in $(seq 1 30); do
|
for i in $(seq 1 60); do
|
||||||
if curl -s http://localhost:11434/api/tags > /dev/null 2>&1; then
|
if curl -s http://localhost:11434/api/tags > /dev/null 2>&1; then
|
||||||
echo "Ollama is ready"
|
echo "Ollama is ready"
|
||||||
break
|
break
|
||||||
fi
|
fi
|
||||||
|
if [ $i -eq 60 ]; then
|
||||||
|
echo "ERROR: Ollama failed to start after 2 minutes"
|
||||||
|
echo "Check: docker logs mortdecai-ollama"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
sleep 2
|
sleep 2
|
||||||
done
|
done
|
||||||
|
|
||||||
# Load the model if GGUF exists
|
# Check if model already loaded
|
||||||
if ls models/*.gguf 1>/dev/null 2>&1; then
|
LOADED=$(curl -s http://localhost:11434/api/tags 2>/dev/null | python3 -c "import sys,json; print('yes' if any('$MODEL_NAME' in m['name'] for m in json.load(sys.stdin).get('models',[])) else 'no')" 2>/dev/null || echo "no")
|
||||||
GGUF=$(ls models/*.gguf | head -1)
|
|
||||||
MODEL_NAME=$(basename "$GGUF" .gguf | tr '[:upper:]' '[:lower:]')
|
|
||||||
echo "Loading model from $GGUF..."
|
|
||||||
|
|
||||||
cat > /tmp/Modelfile << MEOF
|
if [ "$LOADED" = "yes" ]; then
|
||||||
FROM /models/$(basename $GGUF)
|
echo "Model $MODEL_NAME already loaded"
|
||||||
|
else
|
||||||
|
# Download GGUF
|
||||||
|
mkdir -p models
|
||||||
|
GGUF_PATH="models/${MODEL_NAME}.gguf"
|
||||||
|
|
||||||
|
if [ ! -f "$GGUF_PATH" ]; then
|
||||||
|
echo ""
|
||||||
|
echo "Downloading model (~5.3 GB)..."
|
||||||
|
echo "Source: $MODEL_URL"
|
||||||
|
curl -L -o "$GGUF_PATH" "$MODEL_URL" --progress-bar
|
||||||
|
echo "Download complete"
|
||||||
|
else
|
||||||
|
echo "GGUF already downloaded"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Create Modelfile
|
||||||
|
cat > models/Modelfile << 'MEOF'
|
||||||
|
FROM /models/mortdecai:0.4.0.gguf
|
||||||
TEMPLATE """{{- if .Messages }}
|
TEMPLATE """{{- if .Messages }}
|
||||||
{{- if or .System .Tools }}<|im_start|>system
|
{{- if or .System .Tools }}<|im_start|>system
|
||||||
{{- if .System }}
|
{{- if .System }}
|
||||||
@@ -51,11 +78,11 @@ TEMPLATE """{{- if .Messages }}
|
|||||||
{{- end }}
|
{{- end }}
|
||||||
<|im_end|>
|
<|im_end|>
|
||||||
{{ end }}
|
{{ end }}
|
||||||
{{- range \$m := .Messages }}
|
{{- range $m := .Messages }}
|
||||||
{{- if eq \$m.Role "user" }}<|im_start|>user
|
{{- if eq $m.Role "user" }}<|im_start|>user
|
||||||
{{ \$m.Content }}<|im_end|>
|
{{ $m.Content }}<|im_end|>
|
||||||
{{- else if eq \$m.Role "assistant" }}<|im_start|>assistant
|
{{- else if eq $m.Role "assistant" }}<|im_start|>assistant
|
||||||
{{ \$m.Content }}<|im_end|>
|
{{ $m.Content }}<|im_end|>
|
||||||
{{- end }}
|
{{- end }}
|
||||||
{{- end }}<|im_start|>assistant
|
{{- end }}<|im_start|>assistant
|
||||||
{{ end }}"""
|
{{ end }}"""
|
||||||
@@ -64,22 +91,40 @@ PARAMETER stop <|im_start|>
|
|||||||
PARAMETER temperature 0.7
|
PARAMETER temperature 0.7
|
||||||
MEOF
|
MEOF
|
||||||
|
|
||||||
docker exec mortdecai-ollama ollama create mortdecai-v4 -f /tmp/Modelfile
|
echo "Loading model into Ollama..."
|
||||||
echo "Model loaded as mortdecai-v4"
|
docker exec mortdecai-ollama ollama create "$MODEL_NAME" -f /models/Modelfile
|
||||||
|
echo "Model loaded as $MODEL_NAME"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Quick test
|
||||||
|
echo ""
|
||||||
|
echo "Running test inference..."
|
||||||
|
RESULT=$(curl -s http://localhost:8434/api/chat \
|
||||||
|
-H "Authorization: Bearer $KEY" \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d "{\"model\": \"$MODEL_NAME\", \"messages\": [{\"role\": \"user\", \"content\": \"say hello\"}], \"stream\": false}" 2>/dev/null)
|
||||||
|
|
||||||
|
if echo "$RESULT" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d['message']['content'][:80])" 2>/dev/null; then
|
||||||
|
echo "Test passed!"
|
||||||
else
|
else
|
||||||
echo "No GGUF found in models/ — place your GGUF file there and run:"
|
echo "Test inference returned unexpected result (model may still be loading)"
|
||||||
echo " docker exec mortdecai-ollama ollama create mortdecai-v4 -f Modelfile"
|
echo "Try again in a minute: curl -s http://localhost:8434/health"
|
||||||
fi
|
fi
|
||||||
|
|
||||||
echo ""
|
echo ""
|
||||||
echo "=== Setup Complete ==="
|
echo "========================================="
|
||||||
echo "Dashboard: http://localhost:8434/dashboard"
|
echo " Mortdecai Gateway is ready!"
|
||||||
echo "API Key: $(grep API_KEY .env | cut -d= -f2)"
|
echo "========================================="
|
||||||
echo ""
|
echo ""
|
||||||
echo "Test: curl -s http://localhost:8434/health"
|
echo " Dashboard: http://localhost:8434/dashboard"
|
||||||
|
echo " Health: http://localhost:8434/health"
|
||||||
|
echo " API Key: $KEY"
|
||||||
echo ""
|
echo ""
|
||||||
echo "To use from remote:"
|
echo " Send this to Seth:"
|
||||||
echo " curl -X POST http://YOUR_IP:8434/api/chat \\"
|
echo " - Your public IP"
|
||||||
echo " -H 'Authorization: Bearer YOUR_API_KEY' \\"
|
echo " - Port: 8434"
|
||||||
echo " -H 'Content-Type: application/json' \\"
|
echo " - API Key: $KEY"
|
||||||
echo " -d '{\"model\": \"mortdecai-v4\", \"messages\": [{\"role\": \"user\", \"content\": \"test\"}]}'"
|
echo ""
|
||||||
|
echo " To stop: docker compose down"
|
||||||
|
echo " To start: docker compose up -d"
|
||||||
|
echo "========================================="
|
||||||
|
|||||||
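The "already loaded" check in `setup.sh` is a one-line Python expression embedded in the shell script. Written out as a function it is easier to read and test. This is a sketch under assumptions: `model_loaded` is a hypothetical name, and the sample payload only imitates the shape the script relies on (a `models` list whose items have a `name`).

```python
import json


def model_loaded(tags_response, model_name):
    """True if any model in an /api/tags-style JSON payload has model_name in its name."""
    models = json.loads(tags_response).get("models", [])
    return any(model_name in m.get("name", "") for m in models)


# Made-up sample payload for illustration.
sample = '{"models": [{"name": "mortdecai:0.4.0"}]}'
print(model_loaded(sample, "mortdecai:0.4.0"))
```

Like the shell version, this is a substring match, so a longer tag that contains the requested name would also count as loaded.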
Executable
+34
@@ -0,0 +1,34 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
# Update Mortdecai model to a new version
|
||||||
|
# Usage: ./update-model.sh [url] [name]
|
||||||
|
# Example: ./update-model.sh https://mortdec.ai/dl/m5gguf/mortdecai-0.5.0.gguf mortdecai:0.5.0
|
||||||
|
|
||||||
|
set -e
|
||||||
|
|
||||||
|
URL="${1:-https://mortdec.ai/dl/m4gguf/mortdecai-0.4.0.gguf}"
|
||||||
|
NAME="${2:-mortdecai:0.4.0}"
|
||||||
|
|
||||||
|
echo "=== Mortdecai Model Update ==="
|
||||||
|
echo " URL: $URL"
|
||||||
|
echo " Name: $NAME"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Download
|
||||||
|
echo "Downloading..."
|
||||||
|
mkdir -p models
|
||||||
|
curl -L -o "models/${NAME}.gguf" "$URL" --progress-bar
|
||||||
|
echo "Download complete"
|
||||||
|
|
||||||
|
# Load into Ollama
|
||||||
|
echo "Loading into Ollama..."
|
||||||
|
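# The Modelfile written by setup.sh is pinned to the 0.4.0 GGUF, so point a copy at the file just downloaded
|
||||||
|
sed "s|^FROM .*|FROM /models/${NAME}.gguf|" models/Modelfile > "models/Modelfile.${NAME}"
|
||||||
|
docker exec mortdecai-ollama ollama create "$NAME" -f "/models/Modelfile.${NAME}"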
|
||||||
|
echo "Model loaded as $NAME"
|
||||||
|
|
||||||
|
# Verify
|
||||||
|
echo ""
|
||||||
|
echo "Verifying..."
|
||||||
|
docker exec mortdecai-ollama ollama list | grep "$NAME"
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
echo "=== Update complete ==="
|
||||||
|
echo "Model $NAME is ready"
|
||||||