0.6.0 training session: Oracle Bot, RL combat, Mind's Eye, multilingual pipeline

Major changes from this session:

Training:
- 0.6.0 training running: 9B on steel141 3090 Ti, 27B on rented H100 NVL
- 7,256 merged training examples (up from 3,183)
- New training data: failure modes (85), midloop messaging (27),
  prompt injection defense (29), personality (32), gold from quarantine
  bank (232), new tool examples (30), claude's own experience (10)
- All training data RCON-validated at 100% pass rate
- Bake-off: gemma3:27b 66%, qwen3.5:27b 61%, translategemma:27b 56%

Oracle Bot (Mind's Eye):
- Invisible spectator bot (mineflayer) streams world state via WebSocket
- HTML5 Canvas frontend at mind.mortdec.ai
- Real-time tool trace visualization with expandable entries
- Streaming model tokens during inference
- Gateway integration: fire-and-forget POST /trace on every tool call

Reinforcement Learning:
- Gymnasium environment wrapping mineflayer bot (minecraft_env.py)
- PPO training via Stable Baselines3 (10K param policy network)
- Behavioral cloning pretraining (97.5% accuracy on expert policy)
- Infinite training loop with auto-restart and checkpoint resume
- Bot learns combat, survival, navigation from raw experience

Bot Army:
- 8-soldier marching formation with autonomous combat
- Combat bots using mineflayer-pvp, pathfinder, armor-manager
- Multilingual prayer bots via translategemma:27b (18 languages)
- Frame-based AI architecture: LLM planner + reactive micro-scripts

Infrastructure:
- Fixed mattpc.sethpc.xyz billing gateway (API key + player list parser)
- Billing gateway now tracks all LAN traffic (LAN auto-auth)
- Gateway fallback for empty god-mode responses
- Updated mortdec.ai landing page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Seth
2026-03-22 20:22:50 -04:00
parent baab24f8b1
commit 5b28002001
44 changed files with 20873 additions and 4352 deletions
+211
View File
@@ -0,0 +1,211 @@
#!/bin/bash
# ──────────────────────────────────────────────────────────────────────────────
# Mortdecai Batch Training Pipeline
#
# Trains both 9B and 14B models sequentially on a rented GPU, then exports
# GGUFs at all required quant levels for the fleet.
#
# Usage (on rented H100):
# # 1. Upload this repo + dataset
# rsync -avz --exclude='.git' . gpu-host:~/mortdecai/
#
# # 2. SSH in and run
# cd ~/mortdecai
# bash training/scripts/batch_train.sh
#
# # 3. Monitor from another machine (pick one):
# ssh gpu-host "tail -f ~/mortdecai/training_progress.jsonl"
# # OR set DISCORD_WEBHOOK for push notifications:
# export DISCORD_WEBHOOK="https://discord.com/api/webhooks/..."
# bash training/scripts/batch_train.sh
#
# # 4. Download checkpoints when done
# rsync -avz gpu-host:~/mortdecai/training/checkpoints/mortdecai-0.6.0-* ./training/checkpoints/
#
# Prerequisites on the rented machine:
# pip install unsloth torch transformers datasets peft trl
# ──────────────────────────────────────────────────────────────────────────────
set -euo pipefail
VERSION="0.6.0"
DATASET="data/processed/merged_training_v06.jsonl"
CHECKPOINT_DIR="training/checkpoints"
PROGRESS_LOG="training_progress.jsonl"
# Discord bot token + channel for progress notifications
DISCORD_TOKEN="${DISCORD_TOKEN:-REDACTED_DISCORD_TOKEN_2}"
DISCORD_CHANNEL="${DISCORD_CHANNEL:-1485160229573361664}"
# ── Progress reporting ────────────────────────────────────────────────────────
notify() {
local stage="$1"
local message="$2"
local ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
# Log to file
echo "{\"ts\":\"$ts\",\"stage\":\"$stage\",\"message\":\"$message\"}" >> "$PROGRESS_LOG"
# Print locally
echo " [$ts] $stage: $message"
# Discord bot API
if [ -n "$DISCORD_TOKEN" ]; then
curl -s -X POST "https://discord.com/api/v10/channels/${DISCORD_CHANNEL}/messages" \
-H "Authorization: Bot ${DISCORD_TOKEN}" \
-H "Content-Type: application/json" \
-d "{\"content\":\"**Mortdecai Training** [${stage}] ${message}\"}" \
> /dev/null 2>&1 || true
fi
}
# Models to train
MODELS=(
"Qwen/Qwen3.5-9B"
"Qwen/Qwen3.5-14B"
)
# Quant levels per model (mapped to target GPUs)
# 9B: Q4=RTX4000(8GB), Q6=2080Ti(11GB), Q8=3090Ti(24GB)
# 14B: Q3=RTX4000(8GB), Q4=2080Ti(11GB), Q6=3090Ti(24GB), F16=StrixHalo(64GB)
declare -A QUANTS
QUANTS["Qwen/Qwen3.5-9B"]="Q3_K_M Q4_K_M Q6_K Q8_0"
QUANTS["Qwen/Qwen3.5-14B"]="Q3_K_M Q4_K_M Q6_K Q8_0"
# ── Preflight ─────────────────────────────────────────────────────────────────
if [ ! -f "$DATASET" ]; then
echo "ERROR: Dataset not found at $DATASET"
echo "Run: python3 training/scripts/merge_datasets.py"
exit 1
fi
EXAMPLE_COUNT=$(wc -l < "$DATASET")
echo "╔══════════════════════════════════════════════════════════╗"
echo "║ Mortdecai Batch Training Pipeline v${VERSION}"
echo "╠══════════════════════════════════════════════════════════╣"
echo "║ Dataset: ${EXAMPLE_COUNT} examples ║"
echo "║ Models: ${#MODELS[@]} ($(printf '%s ' "${MODELS[@]}" | sed 's|Qwen/||g'))║"
echo "╚══════════════════════════════════════════════════════════╝"
echo ""
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null || echo "WARNING: No GPU detected"
echo ""
# ── Conda/venv setup ──────────────────────────────────────────────────────────
if command -v conda &>/dev/null; then
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate mc-train 2>/dev/null || echo "No mc-train env, using current"
fi
mkdir -p "$CHECKPOINT_DIR"
# ── Training loop ─────────────────────────────────────────────────────────────
for BASE_MODEL in "${MODELS[@]}"; do
MODEL_SHORT=$(echo "$BASE_MODEL" | sed 's|Qwen/||; s|\.|-|g' | tr '[:upper:]' '[:lower:]')
CKPT_NAME="mortdecai-${VERSION}-${MODEL_SHORT}"
CKPT_PATH="${CHECKPOINT_DIR}/${CKPT_NAME}"
MERGED_PATH="${CKPT_PATH}-merged"
GGUF_DIR="${CKPT_PATH}-gguf"
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo " Training: ${BASE_MODEL}${CKPT_NAME}"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
TRAIN_START=$(date +%s)
# ── Step 1: LoRA fine-tune ──
if [ -d "$CKPT_PATH" ] && [ -f "$CKPT_PATH/adapter_model.safetensors" ]; then
notify "SKIP" "${CKPT_NAME} LoRA checkpoint exists"
else
notify "TRAIN" "Starting ${BASE_MODEL} LoRA fine-tune (${EXAMPLE_COUNT} examples)"
python3 training/scripts/train_lora.py \
--model "$BASE_MODEL" \
--dataset "$DATASET" \
--output "$CKPT_PATH" \
--epochs 3 \
--batch-size 4 \
--lr 2e-4 \
--rank 64 \
--alpha 128 \
--save-steps 100 \
2>&1 | tee "${CKPT_PATH}.train.log"
notify "TRAIN" "${CKPT_NAME} LoRA training complete"
fi
# ── Step 2: Merge LoRA into base ──
if [ -d "$MERGED_PATH" ] && [ -f "$MERGED_PATH/model.safetensors.index.json" ]; then
notify "SKIP" "${CKPT_NAME} merged weights exist"
else
notify "MERGE" "Merging ${CKPT_NAME} LoRA into base model..."
python3 -c "
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained('${CKPT_PATH}')
model.save_pretrained_merged('${MERGED_PATH}', tokenizer, save_method='merged_16bit')
print('Merge complete: ${MERGED_PATH}')
"
fi
# ── Step 3: Convert to GGUF (F16) ──
mkdir -p "$GGUF_DIR"
F16_GGUF="${GGUF_DIR}/${MODEL_SHORT}.F16.gguf"
if [ -f "$F16_GGUF" ]; then
notify "SKIP" "${CKPT_NAME} F16 GGUF exists"
else
notify "GGUF" "Converting ${CKPT_NAME} to F16 GGUF..."
LLAMA_CONVERT=$(find / -name "convert_hf_to_gguf.py" 2>/dev/null | head -1)
if [ -z "$LLAMA_CONVERT" ]; then
echo " WARNING: convert_hf_to_gguf.py not found, skipping GGUF export"
echo " Run manually: python3 convert_hf_to_gguf.py $MERGED_PATH --outfile $F16_GGUF --outtype f16"
continue
fi
python3 "$LLAMA_CONVERT" "$MERGED_PATH" --outfile "$F16_GGUF" --outtype f16
fi
# ── Step 4: Quantize ──
LLAMA_QUANTIZE=$(find / -name "llama-quantize" -o -name "quantize" 2>/dev/null | head -1)
if [ -z "$LLAMA_QUANTIZE" ]; then
echo " WARNING: llama-quantize not found, skipping quantization"
echo " Run manually on steel141 after downloading F16 GGUF"
else
echo " [4/4] Quantizing..."
for QUANT in ${QUANTS[$BASE_MODEL]}; do
QFILE="${GGUF_DIR}/${MODEL_SHORT}.${QUANT}.gguf"
if [ -f "$QFILE" ]; then
echo " [SKIP] $QUANT exists"
else
echo " Quantizing $QUANT..."
"$LLAMA_QUANTIZE" "$F16_GGUF" "$QFILE" "$QUANT"
fi
done
fi
TRAIN_END=$(date +%s)
ELAPSED=$(( (TRAIN_END - TRAIN_START) / 60 ))
notify "DONE" "${CKPT_NAME} complete in ${ELAPSED}m"
echo ""
echo "${CKPT_NAME} complete in ${ELAPSED}m"
echo " LoRA: $CKPT_PATH"
echo " Merged: $MERGED_PATH"
echo " GGUFs: $GGUF_DIR/"
ls -lh "$GGUF_DIR/"*.gguf 2>/dev/null | awk '{print " " $NF " (" $5 ")"}'
done
# ── Summary ───────────────────────────────────────────────────────────────────
echo ""
echo "╔══════════════════════════════════════════════════════════╗"
echo "║ All training complete! ║"
echo "╠══════════════════════════════════════════════════════════╣"
echo "║ Next steps: ║"
echo "║ 1. Download checkpoints to steel141 ║"
echo "║ 2. Register in Ollama: ║"
echo "║ ollama create mortdecai:0.6.0-9b -f Modelfile.9b ║"
echo "║ ollama create mortdecai:0.6.0-14b -f Modelfile.14b ║"
echo "║ 3. Run bake-off against 0.5.0 ║"
echo "║ 4. Deploy winner to prod ║"
echo "╚══════════════════════════════════════════════════════════╝"