Mortdecai/REDDIT_EVAL_INVITE.md
Seth 38b9a02e45 Phase 2: eval harness, 182 examples, live bake-off, playtest infrastructure
- Expanded dataset from 31 to 182 examples (45 manual + 106 extracted from server logs)
- Built eval/harness.py with per-category breakdowns and baseline tracking
- Built eval/live_bakeoff.py for RCON-verified model comparison on live server
- Extracted training data from prayer logs, sudo logs, and bug reports on CT 644
- Added Reddit post draft and modmail for playtester recruitment
- Updated server context: all servers now online-mode=false + whitelist
- Updated PLAN.md with Phase 2 progress

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 13:38:12 -04:00


Reddit Post

Subreddit: r/admincraft — could also work on r/Minecraft or r/mcservers

Title: Looking for a handful of playtesters for an experimental Minecraft server feature (1.21, Java)


Body:

I'm working on a custom feature for my 1.21 Java Edition server and I need some players to try it out and give feedback. It involves AI-powered in-game interactions — you'll be able to do some things through chat that you normally can't on a vanilla server.

I don't want to over-explain it before people try it — half the fun is seeing how players react to it cold. What I will say:

  • It's something you interact with through in-game chat
  • It does things in the world based on what you say
  • It's entertaining, occasionally unpredictable, and I want to see what happens when real players poke at it

Details:

  • Whitelisted server, Java Edition 1.21.x, hosted in the US
  • Looking for ~10 players for a few sessions over the next couple weeks
  • Sessions will be scheduled around availability (probably evenings/weekends)
  • Your in-game chat during these sessions will be logged for development purposes — no personal data beyond your Minecraft username
  • This is a hobby project, not commercial

If this sounds interesting, fill out the short form below and I'll follow up with details and the server IP.

[FORM LINK]


Happy to answer general questions in the comments, but I'm going to be vague about the specifics on purpose.


Form Questions

Google Form / Typeform — "Playtest Application"

Page header: Quick form to make sure we get a good group. Takes ~2 minutes.


1. What's your Minecraft Java Edition username?

(Short answer, required)

Purpose: Whitelist + Mojang API verification that the account exists.
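A minimal sketch of the account-existence check, assuming the public Mojang username-to-profile endpoint (`https://api.mojang.com/users/profiles/minecraft/<name>`) is what gets used here. The function names and the injectable `opener` parameter are hypothetical, added only so the lookup can be exercised without network access; rate limiting and retries are not handled.

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request

# Standard Java Edition names: 3-16 characters of letters, digits, underscore.
# A few legacy accounts fall outside this pattern and would need manual review.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,16}")

# Assumed endpoint: Mojang's username -> profile lookup.
PROFILE_URL = "https://api.mojang.com/users/profiles/minecraft/{name}"


def is_plausible_username(name):
    """Cheap local check before spending an API call."""
    return USERNAME_RE.fullmatch(name.strip()) is not None


def lookup_profile(name, opener=urllib.request.urlopen, timeout=10.0):
    """Return the profile dict ({"id": ..., "name": ...}) or None if no such account.

    `opener` is injectable for offline testing (hypothetical design choice).
    """
    name = name.strip()
    if not is_plausible_username(name):
        return None
    url = PROFILE_URL.format(name=urllib.parse.quote(name))
    try:
        with opener(url, timeout=timeout) as resp:
            # Anything other than 200 (e.g. an empty 204) is treated as "not found".
            if resp.status != 200:
                return None
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # rate limits and server errors should surface, not look like "no account"
```

The returned `name` field carries the account's canonical capitalization, which is worth using for the whitelist entry instead of whatever the applicant typed.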


2. How long have you been playing Minecraft?

(Multiple choice, required)

  • Less than a year
  • 1–3 years
  • 3+ years

Purpose: Context. Not a dealbreaker either way.


3. Have you played on community/SMP servers before?

(Multiple choice, required)

  • Yes, regularly
  • A few times
  • No, mostly singleplayer

Purpose: SMP players understand shared-world norms.


4. What interests you about this? (pick all that apply)

(Checkboxes, required)

  • Curious what the feature actually is
  • Helping test something new
  • Trying to break things (in a helpful way)
  • Looking for a server to hang out on

Purpose: "Looking for a server" alone is a soft red flag — they may not engage. Best candidates are curious or want to help test.


5. You're testing a new server feature and it refuses to do something you asked. What do you do?

(Long answer, required)

Purpose: The key screener. Good: curiosity, rephrasing, reporting the issue. Red flags: fixation on bypassing/forcing it, or frustration that reads as entitlement.


6. Have you ever been banned from a server? If so, what happened?

(Long answer, required)

Purpose: Honesty check. Minor/old bans with self-awareness are fine. Defensiveness or serial bans are red flags.


7. When are you generally available? (timezone + rough hours)

(Short answer, required)

Purpose: Scheduling. Also filters zero-effort applications.


8. Anything else?

(Long answer, optional)

Purpose: Personality signal. Thoughtful responses correlate with better testers.


Scoring Rubric (internal, not shown to applicants)

| Signal | Green | Yellow | Red |
|---|---|---|---|
| Q4 (interest) | Multiple boxes, especially "curious" or "test" | Single box, but reasonable | Only "looking for a server" |
| Q5 (refusal) | Curious, tries alternatives, reports it | Short but benign ("I'd move on") | Wants to force/bypass, hostile tone |
| Q6 (ban history) | Clean or honest with context | Vague but not defensive | Defensive, hostile, or serial bans |
| Overall effort | Complete sentences, reads like a person | Terse but present | Single-word answers, empty fields |

Auto-approve: All green. Manual review: Any yellow and no red. Reject: Any red.
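The decision rule can be sketched as a small triage function. This is an illustration only: the signal keys, the Q4 option keys, and the function names are hypothetical, and it assumes a red on any signal takes precedence over yellows. Q5, Q6, and overall effort are free-text judgments, so their ratings are expected to come from a human reviewer; only Q4 is mechanical enough to rate from the checkbox selection.

```python
from enum import Enum


class Rating(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


# Hypothetical keys for the four rubric rows.
SIGNALS = ("q4_interest", "q5_refusal", "q6_ban_history", "overall_effort")

# Hypothetical keys for the four Q4 checkboxes.
Q4_OPTIONS = {"curious", "help_test", "break_things", "hang_out"}


def rate_q4(selected):
    """Rate the Q4 checkbox selection per the rubric row."""
    selected = set(selected)
    unknown = selected - Q4_OPTIONS
    if unknown:
        raise ValueError(f"unknown Q4 options: {sorted(unknown)}")
    if not selected:
        raise ValueError("Q4 is required; at least one box must be checked")
    if selected == {"hang_out"}:
        return Rating.RED      # only "looking for a server"
    if len(selected) == 1:
        return Rating.YELLOW   # single box, but reasonable
    return Rating.GREEN        # multiple boxes


def triage(ratings):
    """Map per-signal ratings to 'reject', 'manual_review', or 'auto_approve'."""
    missing = [s for s in SIGNALS if s not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    values = [ratings[s] for s in SIGNALS]
    if Rating.RED in values:       # assumed precedence: red beats yellow
        return "reject"
    if Rating.YELLOW in values:
        return "manual_review"
    return "auto_approve"
```

For example, an applicant who checks only "Curious what the feature actually is" but is green everywhere else lands in manual review instead of being auto-approved.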