Files
gemma4-research/scripts/native-bakeoff/runs/matt-26b/ollama-native/research.json
T
Mortdecai df5542f7d6 feat: native-bakeoff scaffold — Ollama JSON vs native-token tool-calling
Three-arm harness under scripts/native-bakeoff/:
- arm A: /api/chat with JSON tools (current default)
- arm B: /api/generate raw:true with canonical HF jinja template rendered directly
- arm C: google-deepmind/gemma JAX ToolSampler (env-gated, JAX required)

Interim finding from A+B sweep on matt-strix gemma4:26b Q4: Ollama's
bidirectional JSON↔native tool-call translator is faithful. The "long"
multi-tool task produces identical behavior (7 steps / 6 tools) on both
arms. Earlier arm-B parser bug that looked like a divergence was a
harness issue: preserving the model's <|channel>thought\n<channel|>
prefix as assistant content tripped the jinja template's
tool_response-following conditional, appending a spurious <turn|>\n
that corrupted the next step's prompt. Fixed by dropping the channel
prefix on the assistant message.

Arm C left as scaffolded-but-not-run — the JAX/bf16 reference path
would answer "does the GGUF runtime diverge from DeepMind's
implementation" but requires a separate env with the `gemma` PyPI
package. Parked pending SDXL eviction or vast-h100 session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 05:45:12 -04:00

61 lines
4.0 KiB
JSON

{
"arm": "ollama-native",
"model": "gemma4:26b",
"num_ctx": 8192,
"num_predict": 2048,
"started_at": 1776600748.3832083,
"turns": [
{
"step": 1,
"elapsed_s": 0.8,
"prompt_eval_count": 1311,
"eval_count": 21,
"content_len": 95,
"tool_call_count": 1,
"stop_reason": "stop",
"history_chars_before_append": 2677,
"raw_completion_head": "<|channel>thought\n<channel|><|tool_call>call:web_search{query:<|\"|>what is Home Assistant<|\"|>}",
"raw_completion_tail": "",
"prompt_tail": "he deal with the proxmox cluster nodes again<turn|>\n<|turn>model\nFour nodes: pve173 (tank ZFS host, PowerEdge R820), pve112 (workhorse), pve241 (caddy + game servers), and pve197 (GPU inference). Corosync on 10.10.10.0/24.<turn|>\n<|turn>user\n[2026-04-18 14:20] @seth:sethpc.xyz: Look up what Home Assistant is, then check chat history for any prior mentions of it in this server.<turn|>\n<|turn>model\n",
"prompt_head": "<bos><|turn>system\nYou are Mort, a direct and witty AI assistant on Seth's Matrix server. Powered by Gemma 4. Current time: Saturday, April 18 2026 02:30 PM EDT.\n\nWhen a tool can answer the question, "
},
{
"step": 2,
"elapsed_s": 0.84,
"prompt_eval_count": 1472,
"eval_count": 19,
"content_len": 88,
"tool_call_count": 1,
"stop_reason": "stop",
"history_chars_before_append": 3272,
"raw_completion_head": "<|channel>thought\n<channel|><|tool_call>call:chat_search{query:<|\"|>Home Assistant<|\"|>}",
"raw_completion_tail": "",
"prompt_tail": "a community discussion with multiple perspectives and useful links to follow up on. https://example.com/two\n3. Example result three \u2014 official documentation or reference material. https://example.com/three\n4. Example result four \u2014 a recent news article with relevant context. https://example.com/four\n5. Example result five \u2014 a tutorial or how-to guide. https://example.com/five<|\"|>}<tool_response|>",
"prompt_head": "<bos><|turn>system\nYou are Mort, a direct and witty AI assistant on Seth's Matrix server. Powered by Gemma 4. Current time: Saturday, April 18 2026 02:30 PM EDT.\n\nWhen a tool can answer the question, "
},
{
"step": 3,
"elapsed_s": 2.43,
"prompt_eval_count": 1604,
"eval_count": 97,
"content_len": 502,
"tool_call_count": 0,
"stop_reason": "stop",
"history_chars_before_append": 3499,
"raw_completion_head": "<|channel>thought\n<channel|>Home Assistant is an open-source home automation platform that acts as a central hub for controlling smart devices (lights, sensors, thermostats, etc.) from different manufacturers, allowing you to create complex",
"raw_completion_tail": "oards.\n\nAs for the chat history, I couldn't find any prior mentions of \"Home Assistant\" in this server. The only recent infrastructure-related discussions I found were regarding setting up a shared Grafana dashboard for the Proxmox cluster.",
"prompt_tail": ".com/five<|\"|>}<tool_response|><|tool_call>call:chat_search{query:<|\"|>Home Assistant<|\"|>}<tool_call|><|tool_response>response:chat_search{value:<|\"|>chat_search results:\n[2026-03-14 22:00] @seth:sethpc.xyz in #general: we should set up a shared grafana dashboard for the proxmox cluster\n[2026-03-20 18:30] @seth:sethpc.xyz in #infra: done, it's on CT 300 at grafana.sethpc.xyz<|\"|>}<tool_response|>",
"prompt_head": "<bos><|turn>system\nYou are Mort, a direct and witty AI assistant on Seth's Matrix server. Powered by Gemma 4. Current time: Saturday, April 18 2026 02:30 PM EDT.\n\nWhen a tool can answer the question, "
}
],
"final": {
"halt_reason": "no_tool_calls",
"steps_used": 3,
"tool_calls_total": 2,
"wall_clock_s": 4.07,
"final_message_count": 18,
"final_history_chars": 4001
},
"task": "research",
"task_prompt": "Look up what Home Assistant is, then check chat history for any prior mentions of it in this server."
}