Three-tier constraint model, mode-aware eval, boundary examples, playtest tooling

Eval harness:
- Mode-aware scoring: sudo=strict (exact match), pray/god=soft (category match, in-character, appropriate intensity)
- New metrics: cmd_category_match, appropriate_intensity, scoring_mode breakdown
- Eval defaults to steel141 (192.168.0.141); prod GPU reserved for serving

Dataset (213 examples):
- Added 31 boundary/adversarial examples (safety edges, abstention, near-boundary)
- Updated pray example reasoning: character-driven logic, not prescriptive outputs
- Tagged pray examples with scoring_mode=soft

Playtest tooling:
- whitelist.sh: add/remove/list across all 3 servers
- FRIENDS_INVITE.md + Discord version: playtester recruitment docs
- Server addresses and implementation details for both training servers

PLAN.md:
- Three-tier constraint model documented (sudo/pray/god_system)
- Success criteria split by scoring mode
- All session decisions logged

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,81 @@
# Playtest My Minecraft AI: I Need Your Bad Ideas

Hey, I built something for my Minecraft server that I think you'll get a kick out of, and I need people to come break it.

## What It Is

I have an AI running on my server that listens to in-game chat and does things in the world based on what you say. Two modes:

> **`sudo <anything>`**: Talk to the server in plain English. "sudo give me a diamond sword with sharpness 5" and it just... does it. "sudo build a house here" and it places blocks. "sudo kill all the zombies" and they die. It translates whatever you type into real server commands and runs them live.

> **`pray <anything>`**: Talk to God. Literally. There's an AI character playing God on the server. Pray for items, pray for smiting your enemies, pray something offensive and get punished. It responds in character with dramatic messages and then actually grants or denies your request with real effects, items, lightning bolts, whatever it decides.

There's also `bug_log <what happened>`. If something goes wrong or doesn't do what you expected, type that and it captures the whole interaction so I can fix it.
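
For the curious, here's roughly how a chat line could be routed to one of those keywords. This is a minimal sketch, not the server's actual code: it assumes chat shows up in the log as `<Player> message`, and the names `Trigger` and `parse_trigger` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical names throughout; the real service may work differently.
class Trigger(NamedTuple):
    player: str
    mode: str   # "sudo", "pray", or "bug_log"
    text: str   # whatever the player typed after the keyword

# Matches the "<Player> message" part of a chat log line, wherever it starts.
CHAT_RE = re.compile(r"<(?P<player>\w{1,16})> (?P<msg>.*)$")
MODES = ("sudo", "pray", "bug_log")

def parse_trigger(log_line: str) -> Optional[Trigger]:
    """Return a Trigger if the chat message starts with a known keyword."""
    match = CHAT_RE.search(log_line)
    if match is None:
        return None  # not a chat line (join message, death message, ...)
    keyword, _, rest = match.group("msg").strip().partition(" ")
    if keyword.lower() not in MODES or not rest.strip():
        return None  # ordinary chat, or a keyword with nothing after it
    return Trigger(match.group("player"), keyword.lower(), rest.strip())
```

Anything that isn't chat, or doesn't start with one of the three keywords, is simply ignored, so normal conversation never reaches the model.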

## What's Actually Happening Under the Hood

The AI is a small open-source language model (7 billion parameters) running on a GPU in my server closet. No cloud, no OpenAI, no API costs: it's all local hardware. The model reads your chat message, figures out which Minecraft commands would accomplish what you asked for, and the server executes them. There's a safety layer that blocks dangerous stuff (it won't `/stop` the server or `/op` anyone, even if you ask nicely).

The interesting part: the model isn't great yet. It gets maybe 60 to 75% of requests right on the first try. It sometimes uses outdated command syntax, hallucinates item names that don't exist, or just doesn't understand what you want. **That's where you come in.**
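
To make the safety layer mentioned above a bit more concrete: one simple way to build such a thing is a blocklist checked against the first word of every generated command before it reaches the server. The sketch below assumes that design. Only `/stop` and `/op` come from the description above; the function names are hypothetical and a real blocklist would likely be longer.

```python
# Commands the model is never allowed to run. Only "stop" and "op" are named
# in the text above; everything else about this filter is an assumption.
BLOCKED_COMMANDS = {"stop", "op"}

def is_allowed(command: str) -> bool:
    """Return False if the command's root word is on the blocklist."""
    parts = command.strip().lstrip("/").split()
    if not parts:
        return False  # empty output from the model: nothing to run
    root = parts[0].lower()
    # Strip an optional namespace, so "minecraft:op" is treated like "op".
    root = root.split(":", 1)[-1]
    return root not in BLOCKED_COMMANDS

def filter_commands(commands: list[str]) -> list[str]:
    """Keep only the commands that pass the safety check."""
    return [cmd for cmd in commands if is_allowed(cmd)]
```

This only inspects the first word, so it would not catch a blocked command nested inside another one (for example after `execute ... run`); a real guard would need to handle that too.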

## Why I Need You

I'm building a training dataset to fine-tune the model so it actually gets good at this. Every interaction you have (every sudo command, every prayer, every bug report) gets logged as a structured training example. The more variety I get, the better the model becomes.

What I can't do is generate this data myself. I've been writing my own test cases and I'm out of ideas. I need real people who will:
- Ask for things I'd never think of
- Phrase requests in ways I wouldn't
- Try to confuse it, trick it, or find edge cases
- Actually play the game and use it organically, not just run a test script

You don't need to do anything special. Just play Minecraft and talk to the AI when you feel like it. The logging happens automatically.
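
For a sense of what "logged as a structured training example" could look like in practice, here's a hedged sketch that appends one JSON object per interaction to a file. The field names and the `log_interaction` helper are guesses for illustration, not the project's real schema.

```python
import json
import time
from dataclasses import asdict, dataclass, field

@dataclass
class Interaction:
    # Hypothetical fields; the real schema may differ.
    player: str
    mode: str                 # "sudo" or "pray"
    request: str              # what the player typed
    commands: list[str]       # what the model asked the server to run
    timestamp: float = field(default_factory=time.time)

def log_interaction(record: Interaction, path: str) -> None:
    """Append one interaction as a single JSON line (JSONL)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")
```

One line per interaction keeps the file append-only and easy to filter later, which suits data that trickles in while people play.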

## The Servers

Both are Java Edition 1.21.x, whitelisted, always up. They run different AI implementations, so I'm collecting data from both.

### `sethpc.xyz:25567`: Paper AI Server (the full experience)
Paper server with the complete AI stack. This is the main training server.
- `pray` and `sudo` both work for **all players**
- LangGraph session gateway: the AI can use tools (wiki lookups, web search) mid-conversation
- FastAsyncWorldEdit for building commands
- Divine interventions on a random timer: God will occasionally just... do things
- Prayer memory: God remembers your previous prayers and holds grudges
- Full training audit logging: every interaction is captured as structured data

### `sethpc.xyz:25566`: Shrink World (the challenge server)
Vanilla survival with a twist: the world border shrinks every time someone dies, and creeper spawns are 5x normal. Hard difficulty.
- `pray` works, `sudo` is admin-only here
- Simpler AI implementation: no gateway, no tools, no templates
- Same God persona but less capable (fewer max commands, shorter context)
- Starter kit on first join
- This one is more about playing the game and using pray organically when you need help surviving

The Paper server is where I need the most data, but the shrink server gives me a different kind of interaction: players praying under pressure when they're actually in trouble, not just testing.

## What You Need

- Minecraft Java Edition
- Your username so I can whitelist you

**DM me your Minecraft username and I'll add you.**

## If You Want to Nerd Out

The whole project is on my Gitea:

> **Main project** *(private, ask for access)*
> <https://git.sethpc.xyz/Seth/Minecraft-AI-model>
> 213 training examples and counting. An eval harness that scores models on command accuracy, syntax, and safety compliance. A live bake-off tool that runs commands on the actual server and compares results.

> **The AI God service** *(private, ask for access)*
> <https://git.sethpc.xyz/Seth/minecraft-ai-god-paper-fork>
> ~3800 lines of Python: log tailing, RCON execution, LLM integration, safety guardrails. Prayer memory so God remembers what you said. Automatic syntax repair for common model mistakes. Divine interventions on a random timer (you'll see).

> **Model bake-off results** *(public)*
> <https://git.sethpc.xyz/Seth/small-llm-bakeoff>
> Tested 7 models from 3.8B to 30B parameters on the same tasks. The 7B model beat the 30B model on every metric. Found a bug where Qwen models were using all their tokens *thinking* and returning empty answers.

## TL;DR

Come play Minecraft on my server. Talk to the AI. Try to break it. Every weird thing you ask it makes the model better. **DM me your username.**