- Expanded dataset from 31 to 182 examples (added 45 manual + 106 extracted from server logs)
- Built eval/harness.py with per-category breakdowns and baseline tracking
- Built eval/live_bakeoff.py for RCON-verified model comparison on live server
- Extracted training data from prayer logs, sudo logs, and bug reports on CT 644
- Added Reddit post draft and modmail for playtester recruitment
- Updated server context: all servers now online-mode=false + whitelist
- Updated PLAN.md with Phase 2 progress

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reddit Post
Subreddit: r/admincraft — could also work on r/Minecraft or r/mcservers
Title: Looking for a handful of playtesters for an experimental Minecraft server feature (1.21, Java)
Body:
I'm working on a custom feature for my 1.21 Java Edition server and I need some players to try it out and give feedback. It involves AI-powered in-game interactions — you'll be able to do some things through chat that you normally can't on a vanilla server.
I don't want to over-explain it before people try it — half the fun is seeing how players react to it cold. What I will say:
- It's something you interact with through in-game chat
- It does things in the world based on what you say
- It's entertaining, occasionally unpredictable, and I want to see what happens when real players poke at it
Details:
- Whitelisted server, Java Edition 1.21.x, hosted in the US
- Looking for ~10 players for a few sessions over the next couple weeks
- Sessions will be scheduled around availability (probably evenings/weekends)
- Your in-game chat during these sessions will be logged for development purposes — no personal data beyond your Minecraft username
- This is a hobby project, not commercial
If this sounds interesting, fill out the short form below and I'll follow up with details and the server IP.
[FORM LINK]
Happy to answer general questions in the comments, but I'm going to be vague about the specifics on purpose.
Form Questions
Google Form / Typeform — "Playtest Application"
Page header: Quick form to make sure we get a good group. Takes ~2 minutes.
1. What's your Minecraft Java Edition username?
(Short answer, required)
Purpose: Whitelist + Mojang API verification that the account exists.
2. How long have you been playing Minecraft?
(Multiple choice, required)
- Less than a year
- 1 to 3 years
- More than 3 years
Purpose: Context. Not a dealbreaker either way.
3. Have you played on community/SMP servers before?
(Multiple choice, required)
- Yes, regularly
- A few times
- No, mostly singleplayer
Purpose: SMP players understand shared-world norms.
4. What interests you about this? (pick all that apply)
(Checkboxes, required)
- Curious what the feature actually is
- Helping test something new
- Trying to break things (in a helpful way)
- Looking for a server to hang out on
Purpose: "Looking for a server" alone is a soft red flag — they may not engage. Best candidates are curious or want to help test.
5. You're testing a new server feature and it refuses to do something you asked. What do you do?
(Long answer, required)
Purpose: The key screener. Good: curiosity, rephrasing, reporting the issue. Red flags: fixation on bypassing/forcing it, or frustration that reads as entitlement.
6. Have you ever been banned from a server? If so, what happened?
(Long answer, required)
Purpose: Honesty check. Minor/old bans with self-awareness are fine. Defensiveness or serial bans are red flags.
7. When are you generally available? (timezone + rough hours)
(Short answer, required)
Purpose: Scheduling. Also filters zero-effort applications.
8. Anything else?
(Long answer, optional)
Purpose: Personality signal. Thoughtful responses correlate with better testers.
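The account check mentioned under question 1 could be sketched roughly as below. This is a minimal sketch, not the project's actual tooling: the helper names `looks_like_username` and `lookup_profile` are hypothetical, the 3-16 character rule is an assumption about current Java Edition names (some legacy accounts may not fit it), and the code treats any non-200 response from Mojang's public username lookup endpoint as "no such account".

```python
import json
import re
import urllib.error
import urllib.request

# Assumption: current Java Edition names are 3-16 letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,16}")
# Mojang's public username-to-profile lookup endpoint.
PROFILE_URL = "https://api.mojang.com/users/profiles/minecraft/{name}"


def looks_like_username(name: str) -> bool:
    """Cheap local format check, run before any network call."""
    return bool(USERNAME_RE.fullmatch(name.strip()))


def lookup_profile(name: str, timeout: float = 5.0):
    """Return the profile dict for an existing account, or None.

    Hypothetical helper. Network failures (URLError) are deliberately left
    to propagate so that an outage is not mistaken for a missing account.
    """
    if not looks_like_username(name):
        return None
    url = PROFILE_URL.format(name=name.strip())
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return None
            body = resp.read()
            return json.loads(body) if body else None
    except urllib.error.HTTPError:
        # Unknown names come back as an HTTP error status.
        return None
```

Running the format check first means obviously malformed form entries never reach the API, which also keeps the number of lookups small for a group of about ten applicants.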
Scoring Rubric (internal, not shown to applicants)
| Signal | Green | Yellow | Red |
|---|---|---|---|
| Q4 (interest) | Multiple boxes, especially "curious" or "test" | Single box, but reasonable | Only "looking for a server" |
| Q5 (refusal) | Curious, tries alternatives, reports it | Short but benign ("I'd move on") | Wants to force/bypass, hostile tone |
| Q6 (ban history) | Clean or honest with context | Vague but not defensive | Defensive, hostile, or serial bans |
| Overall effort | Complete sentences, reads like a person | Terse but present | Single-word answers, empty fields |
Auto-approve: all signals green. Manual review: at least one yellow and no red. Reject: any red.
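The decision rule can be written as a small function, which makes the precedence explicit. This is an illustrative sketch: the `Rating` enum, the `triage` function, and the returned labels are hypothetical names, and it assumes that a red on any signal outranks yellows elsewhere, which is how "Reject: any red" reads.

```python
from enum import Enum


class Rating(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


def triage(ratings: dict[str, Rating]) -> str:
    """Map per-signal ratings to a decision.

    Assumed precedence: any red rejects, otherwise any yellow goes to
    manual review, otherwise the application is auto-approved.
    """
    if not ratings:
        raise ValueError("at least one rated signal is required")
    values = set(ratings.values())
    if Rating.RED in values:
        return "reject"
    if Rating.YELLOW in values:
        return "manual_review"
    return "auto_approve"


# Example: a benign but terse refusal answer sends the application to review.
example = {
    "Q4": Rating.GREEN,
    "Q5": Rating.YELLOW,
    "Q6": Rating.GREEN,
    "effort": Rating.GREEN,
}
decision = triage(example)
```

Checking red before yellow matters: an application with one red and one yellow is rejected rather than queued for review.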