Patches the top-level corpus docs with the 13 findings flagged during the 2026-04-18 canonical tooling research pass. tooling/README.md now marks each finding [merged: <file>] or [flagged] for provenance.

- CORPUS_ollama_variants.md: annotate gemma4:26b as MoE (25.2B total / 3.8B active, 8-of-128 experts + 1 shared). Note Q4_K_M inference is standard (the "MoE quality degrades at 4-bit" caveat is training-only). Add note that audio on E-series is NOT available via Ollama; llama.cpp mmproj or vLLM only.
- CORPUS_capabilities.md: native system role, configurable thinking mode, first trained tool use (vs Gemma 1/2/3 proof-of-concept), native object detection with bbox output in 1000x1000 coords, pointer to EmbeddingGemma for retrieval (Gemma 4 has no embedding mode).
- CORPUS_tool_calling_format.md: add Chat Template Context section documenting the <|turn>/<turn|> asymmetric brackets (new in Gemma 4, replaced <start_of_turn>/<end_of_turn>) plus <|think>, <|channel>, <|image>, <|audio> tokens. Add HF transformers Alternative section showing processor.parse_response with response_schema.
- GOTCHAS.md: add MEDIUM gotcha for abandoned google/gemma_pytorch (no Gemma 4 support since 2025-05-30). Expand fine-tuning section with FA2/FA4 head_dim=512 break, fused LoRA kernel issues, 26B A4B training-quant guidance, new tool-call tokens as learned embeddings.
- SYNTHESIS.md: add banner pointing to tooling/ for canonical upstream material. Add embeddinggemma row to Model Selection table.

Also:

- Add .gitignore excluding .backup/ (local scratch per global CLAUDE.md convention, not needed in tracked history) and __pycache__/.
- Add .claude/handoffs/2026-04-18-canonical-tooling-research.md so future sessions can pick up cold: facts verified, open threads, what changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Handoff — 2026-04-18: Canonical Tooling Research
TL;DR for the next session
A parallel research pass pulled 147 files / 14 MB of first-party Gemma 4 tooling into tooling/, and the 13 findings that contradicted or extended the existing corpus were merged into the top-level SYNTHESIS.md / GOTCHAS.md / CORPUS_*.md docs. The repo is in a clean, coherent state.
If you're opening this repo for Gemma 4 implementation work, SYNTHESIS.md is still the right first read. The new tooling/README.md is the receipts layer — read it when you need authoritative source material (model cards, chat templates, serving commands, sibling-model briefs).
What shipped
Commit eecebe7 (master, pushed to git.sethpc.xyz/Seth/gemma4-research): added tooling/ with five subdirs — google-official/, huggingface/, inference-frameworks/, gemma-family/, fine-tuning/. Each subdir has its own indexing README.
Follow-up commit (same session): patched top-level corpus docs with the 9 findings worth merging. The tooling/README.md "Findings" list now marks each one [merged: <file>] or [flagged] for provenance.
Key confirmed facts
| Claim | Verified against |
|---|---|
| gemma4:26b is a MoE (25.2B total, 3.8B active, 8 of 128 experts + 1 shared) | HF model card at tooling/huggingface/model-cards/gemma-4-26B-A4B-it-README.md |
| Q4_K_M inference on the MoE is fine (standard practice) | Mixtral/DeepSeek precedent; card neutral on inference quant |
| Gemma 4 changed turn tokens from `<start_of_turn>` to `<\|turn>`/`<turn\|>` | |
| Tool use is trained in Gemma 4, not a proof-of-concept as in Gemma 1/2/3 | DeepMind tool-use colab at tooling/google-official/deepmind-gemma/colab_tool_use.ipynb |
| google/gemma_pytorch is abandoned for Gemma 4 | Last push 2025-05-30, variants validator |
| No Gemma 4 technical report PDF as of 2026-04-18 | DeepMind repo README + direct URL probes |
| No specialized siblings on Gemma 4 base yet (ShieldGemma 2, CodeGemma, PaliGemma 2, EmbeddingGemma all still on Gemma 2/3) | Per-sibling model cards in tooling/gemma-family/ |
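The MoE figures in the first row can be sanity-checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in the table; the variable names are illustrative, not taken from any model card or tool.

```python
# Figures quoted in the table above for gemma4:26b.
TOTAL_PARAMS_B = 25.2    # total parameters, in billions
ACTIVE_PARAMS_B = 3.8    # parameters active per token, in billions
ROUTED_EXPERTS = 8       # experts selected per token
TOTAL_EXPERTS = 128      # routed experts available

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
routed_fraction = ROUTED_EXPERTS / TOTAL_EXPERTS

print(f"active parameters per token: {active_fraction:.1%}")   # 15.1%
print(f"routed experts per token:    {routed_fraction:.2%}")   # 6.25%
```

The active-parameter share is larger than the routed-expert share, which is what one would expect if the shared expert and the non-expert weights run for every token. That reading is an inference from the numbers, not something the table sources state.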
Open threads — flagged but not implemented
These came out of the mort-bot impact review later in the session. All three are high-value but out of scope for this research pass:
- EmbeddingGemma (308M) as a drop-in upgrade for mort-bot's `chat_search`/`memory_read` tools. Mort currently uses FTS5 keyword-only, which misses semantic matches. EmbeddingGemma's Matryoshka sizes (768/512/256/128) + 100+ languages make it a clean fit. Integration sketch in the session conversation; full research at `tooling/gemma-family/embeddinggemma.md`. Starter notebook at `tooling/google-official/cookbook/tutorials_RAG_EmbeddingGemma.ipynb`. Next steps: (a) `ollama pull embeddinggemma` on steel141, (b) A/B against existing `nomic-embed-text` on actual mort chat logs before committing to backfill.
- ShieldGemma 2 (4B) as a `generate_image` pre-filter for mort-bot. Mort's SDXL tool has no safety gate. ShieldGemma 2 is Gemma-3-based but scoped exactly to image safety. Would run on steel141 alongside `gemma4:26b` (3090 has headroom).
- Native object detection for mort's `vision_describe`. Gemma 4 does grounded bbox output natively: "Detect the X" → `{box_2d: [ymin, xmin, ymax, xmax]}` in 1000×1000 coords. Mort currently only does free-form vision description.
None of these were implemented in this session.
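For the object-detection thread, the main integration step would be mapping the 1000×1000 coordinates back onto the real image. The sketch below assumes the coordinates are normalized independently per axis (0 to 1000 regardless of aspect ratio) and that the model output has already been parsed into a dict; `box_2d_to_pixels` and the sample detection are hypothetical, not part of mort-bot or any Gemma API.

```python
def box_2d_to_pixels(box_2d, image_width, image_height, grid=1000):
    """Convert [ymin, xmin, ymax, xmax] on a grid-by-grid scale
    to pixel coordinates (left, top, right, bottom)."""
    ymin, xmin, ymax, xmax = box_2d
    left = round(xmin / grid * image_width)
    top = round(ymin / grid * image_height)
    right = round(xmax / grid * image_width)
    bottom = round(ymax / grid * image_height)
    return left, top, right, bottom


# Hypothetical parsed detection, in the key order described above.
detection = {"box_2d": [100, 250, 500, 750]}

print(box_2d_to_pixels(detection["box_2d"], image_width=1920, image_height=1080))
# (480, 108, 1440, 540)
```

Note the y-before-x ordering in `box_2d`: swapping it silently produces plausible but wrong boxes, so it is worth asserting on a known image before wiring this into `vision_describe`.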
Files changed this session
- New: `tooling/` (147 files), `tooling/README.md`, `.claude/handoffs/2026-04-18-canonical-tooling-research.md` (this file)
- Edited: `README.md` (added `tooling/` row), `SYNTHESIS.md` (banner + model-selection table), `GOTCHAS.md` (added gemma_pytorch abandonment + expanded fine-tuning section), `CORPUS_tool_calling_format.md` (added Chat Template Context + HF transformers Alternative), `CORPUS_ollama_variants.md` (annotated 26b as MoE + audio note), `CORPUS_capabilities.md` (native system role, thinking, object detection, embedding pointer)
- Unchanged: `IMPLEMENTATIONS.md` (Simon/AI_Visualizer specific, not affected), `CORPUS_architecture.md` (already had MoE details right), `CORPUS_benchmarks.md` (still current)
What future sessions should know
- The research is the receipts, not the source of truth. The top-level `SYNTHESIS.md` / `GOTCHAS.md` / `CORPUS_*.md` docs are the working reference. `tooling/` backs them up with downloaded upstream material when you need provenance or a working script.
- Don't re-research the same ground. Every `tooling/*/README.md` lists what's there and the source URL. Grep the tooling corpus before spawning new web searches.
- The 26B-is-MoE and Q4_K_M-is-fine facts were the main things that would have been re-litigated without this handoff. If you see a claim that conflicts with those, check the HF model card first (`tooling/huggingface/model-cards/gemma-4-26B-A4B-it-README.md`): Google's own documentation, not secondhand.
- Sibling-model generation lag. When reaching for ShieldGemma / CodeGemma / PaliGemma / EmbeddingGemma, don't assume a Gemma-4 base; they're still on 2 or 3. Use them anyway; just don't confuse generations.
- Mort-bot is where the low-hanging fruit is if Seth wants a next practical project. Three items above; EmbeddingGemma is the biggest lever.
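On the EmbeddingGemma lever: the Matryoshka sizes listed in the open threads (768/512/256/128) are normally used by keeping the leading dimensions of the full vector and re-normalizing. The sketch below shows that step on toy vectors. It assumes the embeddings arrive as plain lists of floats; `truncate_embedding` and `cosine` are hypothetical helpers, not part of any EmbeddingGemma or Ollama API.

```python
import math


def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale to unit length."""
    if dim > len(vector):
        raise ValueError("dim exceeds embedding length")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return list(head)
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


# Toy 8-dimensional stand-ins for real 768-dimensional embeddings.
full_a = [0.9, 0.1, 0.3, 0.2, 0.05, 0.01, 0.02, 0.03]
full_b = [0.8, 0.2, 0.25, 0.3, 0.01, 0.04, 0.02, 0.00]

for dim in (8, 4, 2):
    a = truncate_embedding(full_a, dim)
    b = truncate_embedding(full_b, dim)
    print(dim, round(cosine(a, b), 4))
```

For the planned A/B against `nomic-embed-text`, the same helper would let one index be evaluated at several sizes without re-embedding the chat logs, since the smaller vectors are prefixes of the larger ones.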
Session narrative (for context, not action)
- Started with the existing corpus (SYNTHESIS + GOTCHAS + 5 CORPUS files, ~22KB total). Goal: add canonical upstream tooling.
- Dispatched five parallel `general-purpose` agents covering Google official, HF, inference frameworks, Gemma family, fine-tuning.
- All five returned clean: 147 files downloaded, each indexed per subdir.
- Wrote `tooling/README.md` with 10 findings from the agents. Initial plan: flag only, don't touch the older corpus.
- Seth asked how the findings affect mort-bot. Read mort's CLAUDE.md / DECISIONS.md / llm.py / config.py / tools.py. Ranked: EmbeddingGemma (high), ShieldGemma 2 (high), bbox detection (high), E-series audio (medium), everything else (low/none because Ollama hides transformers changes).
- Seth ran `ollama show gemma4:26b`; output confirmed MoE (25.8B, Q4_K_M). Walked back the earlier "worth A/B testing" extrapolation; that was training guidance misapplied to inference. Q4_K_M on the MoE is fine.
- Seth asked "did you update synthesis?" No, I hadn't. He authorized the updates. Patched 5 top-level docs; updated `tooling/README.md` findings list to mark merged-vs-flagged.
- Wrote this handoff.
Don't do these things next session
- Don't commit the ipynb files with `--no-verify` unless you ask again. The secrets-hook false positives (base64 notebook outputs, example Ed25519 keys) are documented, but re-bypassing without asking would be scope creep. If you add more ipynb content, strip outputs with `jupyter nbconvert --ClearOutputPreprocessor.enabled=True` first.
- Don't restructure the folder. It's organized fine: `README.md` → `SYNTHESIS.md` (primary) → specialized `CORPUS_*.md` / `GOTCHAS.md` / `IMPLEMENTATIONS.md` → `tooling/` (receipts). New material goes into one of those buckets, not a new top-level thing.
- Don't assume the Gemma 3 technical report covers Gemma 4. It's the closest thing we have but it predates Gemma 4.
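If `jupyter` is not installed on the machine doing the commit, output stripping can also be sketched with the standard library alone. This is a minimal alternative, not the method used in this session. It assumes the notebooks follow the nbformat 4 layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); `strip_outputs` and `strip_file` are hypothetical helper names.

```python
import json
from pathlib import Path


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts from every code cell, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_file(path) -> None:
    """Rewrite one .ipynb file with its code-cell outputs removed."""
    target = Path(path)
    notebook = json.loads(target.read_text(encoding="utf-8"))
    cleaned = json.dumps(strip_outputs(notebook), indent=1, ensure_ascii=False)
    target.write_text(cleaned + "\n", encoding="utf-8")
```

Because the base64 false positives come from stored cell outputs, clearing `outputs` should remove most of them. Example keys that appear in cell source would still trip the hook and still need the documented handling.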