Root cause: self-play opened and closed a new TCP socket for every RCON
command (hundreds/minute). Paper's RCON listener creates a thread per
connection, so the connection churn overwhelmed the server until it stopped.
Fix: PersistentRCON class maintains a single connection per server with
auto-reconnect. Thread-safe via lock. Connection pool keyed by host:port.
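
Sketch of the approach, for illustration only (not the exact contents of
persistent_rcon.py). Only PersistentRCON, the lock, auto-reconnect, and the
host:port pool key come from this change; encode_packet, get_rcon,
RCONAuthError, command(), and the timeout parameter are hypothetical names.
It assumes the standard Source RCON framing and a single auth reply packet
from the server:

```python
import socket
import struct
import threading

AUTH, COMMAND = 3, 2  # Source RCON packet types: login, command


class RCONAuthError(Exception):
    """Hypothetical error type: the server rejected the RCON password."""


def encode_packet(req_id: int, ptype: int, body: str) -> bytes:
    # Frame: int32 length, int32 id, int32 type, body, two NUL bytes (little-endian).
    payload = struct.pack("<ii", req_id, ptype) + body.encode("utf-8") + b"\x00\x00"
    return struct.pack("<i", len(payload)) + payload


class PersistentRCON:
    def __init__(self, host: str, port: int, password: str, timeout: float = 5.0):
        self.host, self.port, self.password = host, port, password
        self.timeout = timeout
        self._sock = None                 # connected lazily on first command
        self._lock = threading.Lock()     # one request/response at a time

    def _recv_exact(self, n: int) -> bytes:
        buf = b""
        while len(buf) < n:
            chunk = self._sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("RCON connection closed by server")
            buf += chunk
        return buf

    def _roundtrip(self, req_id: int, ptype: int, body: str):
        self._sock.sendall(encode_packet(req_id, ptype, body))
        (length,) = struct.unpack("<i", self._recv_exact(4))
        data = self._recv_exact(length)
        resp_id, _ = struct.unpack("<ii", data[:8])
        return resp_id, data[8:-2].decode("utf-8", errors="replace")

    def _connect(self) -> None:
        self._sock = socket.create_connection((self.host, self.port), timeout=self.timeout)
        resp_id, _ = self._roundtrip(1, AUTH, self.password)
        if resp_id == -1:                 # id -1 signals a failed login
            self._close()
            raise RCONAuthError("RCON authentication failed")

    def _close(self) -> None:
        sock, self._sock = self._sock, None
        if sock is not None:
            sock.close()

    def command(self, cmd: str) -> str:
        with self._lock:
            for attempt in (1, 2):        # second pass is the auto-reconnect
                try:
                    if self._sock is None:
                        self._connect()
                    return self._roundtrip(2, COMMAND, cmd)[1]
                except (OSError, struct.error):
                    self._close()
                    if attempt == 2:
                        raise


_pool = {}
_pool_lock = threading.Lock()


def get_rcon(host: str, port: int, password: str) -> PersistentRCON:
    # One shared client per server, keyed by host:port.
    key = f"{host}:{port}"
    with _pool_lock:
        if key not in _pool:
            _pool[key] = PersistentRCON(host, port, password)
        return _pool[key]
```

Callers would use get_rcon(host, port, password).command("list") instead of
opening a socket per command. Note that in this sketch a command is resent
after a reconnect, so a command whose reply was lost may run twice.
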
Applied to:
- mc_aigod_paper.py (prod paper-ai + dev)
- mc_aigod.py (shrink-world)
- self_play.py (training data generation)
- persistent_rcon.py (shared module)
Before: 100+ RCON connections/minute → server crash
After: 3 persistent connections total → stable
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>