Add baseline assistant with tools, guardrails, and system prompts (Phase 1.4)
- agent/serve.py: CLI assistant with interactive, single-query, and eval modes (Ollama + qwen3-coder) - agent/tools/rcon_tool.py: RCON execute, server status, player info - agent/tools/knowledge_tool.py: TF-IDF RAG search, command reference lookup, server context - agent/guardrails/command_filter.py: 14-prefix allowlist, execute-tail bypass detection, destructive flags, 1.21 syntax warnings, audit log - agent/prompts/system_prompts.py: sudo (pure commands), god (persona), intervention (benign) system prompts - Guardrails tested: 10/10 allowlist, 5/6 syntax warnings pass
This commit is contained in:
@@ -130,18 +130,21 @@ These projects informed the plan but solve different problems:
|
||||
- [x] Validated with 6 test queries -- all return relevant top results
|
||||
|
||||
#### 1.4 Baseline Assistant (No Fine-Tuning)
|
||||
- [ ] Build prompt-only assistant using `qwen3-coder` (via Ollama at 192.168.0.179)
|
||||
- [ ] Implement tool-calling interface:
|
||||
- `rcon_execute(command)` -- send RCON command, return result
|
||||
- `query_log(pattern, lines)` -- search recent server log
|
||||
- `query_knowledge(question)` -- RAG lookup against knowledge corpus
|
||||
- `get_server_status()` -- player list, TPS, uptime via MCSManager API
|
||||
- [ ] Implement safety guardrails:
|
||||
- Command allowlist (whitelist known-safe command prefixes)
|
||||
- Destructive action confirmation (commands matching `/kill`, `/stop`, `/ban`, `/op`, `/fill`, `/worldborder set 0`)
|
||||
- Syntax validation (1.21 enchantment format, weather values, effect names)
|
||||
- Audit log (every command attempted + result, timestamped JSON)
|
||||
- [ ] Test baseline on 20 seed examples, record accuracy manually
|
||||
- [x] Build prompt-only assistant (`agent/serve.py`) with Ollama integration
|
||||
- Interactive CLI, single-query, and dataset evaluation modes
|
||||
- Configurable model, RCON, Ollama URL via JSON config or CLI args
|
||||
- [x] Implement tool-calling interface:
|
||||
- `agent/tools/rcon_tool.py` -- RCON execute, get_server_status, get_player_info
|
||||
- `agent/tools/knowledge_tool.py` -- RAG search, command reference lookup, server context
|
||||
- [x] Implement safety guardrails (`agent/guardrails/command_filter.py`):
|
||||
- Command allowlist (14 safe prefixes, blocks /stop /op /ban etc.)
|
||||
- Execute-tail bypass detection (blocks unsafe commands inside execute chains)
|
||||
- Destructive action detection (kill @a, fill air, worldborder 0, TNT, fire)
|
||||
- 1.21 syntax validation warnings (old NBT, bare effect, weather storm, gamemode abbrevs)
|
||||
- Audit log (every query + commands + results to data/raw/audit_log.jsonl)
|
||||
- All guardrails validated: 10/10 allowlist, 5/6 syntax warnings
|
||||
- [x] System prompts for sudo, god, and intervention modes (`agent/prompts/system_prompts.py`)
|
||||
- [ ] Run baseline evaluation on seed dataset, record accuracy
|
||||
- [ ] Document baseline performance as the bar to beat
|
||||
|
||||
---
|
||||
|
||||
Reference in New Issue
Block a user