13debc8a59
data/ingest_audit.py:
- Pulls training audit logs from CT 644 (dev + prod)
- Filters: language mismatch (Chinese output for English input), system prompt leaks, empty responses, duplicates
- Keeps multilingual examples where input/output languages match
- Converts to dataset schema, appends to seed_dataset.jsonl
- --dry-run to preview, --source dev/prod/both

Tested: 237 entries → 112 kept (16 lang mismatch, 10 prompt leak, 86 dupe, 13 empty dropped)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
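
Note: a minimal sketch of the filter pass described above, not the actual contents of data/ingest_audit.py. The field names ("input", "output"), the LEAK_MARKERS strings, the CJK-range language check, the hash-based dedup key, and the order of the checks are all assumptions made for illustration.

```python
import hashlib
import re
from collections import Counter

# Hypothetical leak markers; the commit does not say how leaks are detected.
LEAK_MARKERS = ("system prompt:", "you are a helpful assistant")

# Rough stand-in for language detection: any CJK Unified Ideograph.
CJK = re.compile(r"[\u4e00-\u9fff]")


def has_cjk(text):
    return bool(CJK.search(text))


def classify(entry, seen):
    """Return a drop reason, or None if the entry should be kept."""
    prompt = entry.get("input", "")
    output = entry.get("output", "")
    if not output.strip():
        return "empty"
    # Chinese output for a non-Chinese input is a mismatch;
    # Chinese input with Chinese output is kept.
    if has_cjk(output) and not has_cjk(prompt):
        return "lang_mismatch"
    lowered = output.lower()
    if any(marker in lowered for marker in LEAK_MARKERS):
        return "prompt_leak"
    key = hashlib.sha256((prompt + "\x00" + output).encode("utf-8")).hexdigest()
    if key in seen:
        return "dupe"
    seen.add(key)
    return None


def filter_entries(entries):
    """Split entries into kept ones and a per-reason drop tally."""
    seen = set()
    kept = []
    dropped = Counter()
    for entry in entries:
        reason = classify(entry, seen)
        if reason is None:
            kept.append(entry)
        else:
            dropped[reason] += 1
    return kept, dropped


sample = [
    {"input": "Hello", "output": "Hi there"},
    {"input": "Hello", "output": "Hi there"},
    {"input": "Hello", "output": "你好"},
    {"input": "你好", "output": "你好，有什么可以帮你？"},
    {"input": "Hi", "output": "   "},
    {"input": "Hi", "output": "System prompt: be nice"},
]
kept, dropped = filter_entries(sample)
print(len(kept), dict(dropped))
```

The tally in the Tested line is self-consistent: 16 + 10 + 86 + 13 = 125 dropped, and 237 - 125 = 112 kept.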