Files
gemma4-research/.claude/handoffs/2026-04-18-canonical-tooling-research.md
T
Mortdecai 5775978899 docs: merge tooling findings into SYNTHESIS/GOTCHAS/CORPUS_* and add handoff
Patches the top-level corpus docs with the 13 findings flagged during the
2026-04-18 canonical tooling research pass. tooling/README.md now marks each
finding [merged: <file>] or [flagged] for provenance.

- CORPUS_ollama_variants.md: annotate gemma4:26b as MoE (25.2B total / 3.8B
  active, 8-of-128 experts + 1 shared). Note Q4_K_M inference is standard
  (the "MoE quality degrades at 4-bit" caveat is training-only). Add note
  that audio on E-series is NOT available via Ollama — llama.cpp mmproj
  or vLLM only.
- CORPUS_capabilities.md: native system role, configurable thinking mode,
  first trained tool use (vs Gemma 1/2/3 proof-of-concept), native object
  detection with bbox output in 1000x1000 coords, pointer to EmbeddingGemma
  for retrieval (Gemma 4 has no embedding mode).
- CORPUS_tool_calling_format.md: add Chat Template Context section
  documenting the <|turn>/<turn|> asymmetric brackets (new in Gemma 4,
  replaced <start_of_turn>/<end_of_turn>) plus <|think>, <|channel>,
  <|image>, <|audio> tokens. Add HF transformers Alternative section
  showing processor.parse_response with response_schema.
- GOTCHAS.md: add MEDIUM gotcha for abandoned google/gemma_pytorch (no
  Gemma 4 support since 2025-05-30). Expand fine-tuning section with FA2/FA4
  head_dim=512 break, fused LoRA kernel issues, 26B A4B training-quant
  guidance, new tool-call tokens as learned embeddings.
- SYNTHESIS.md: add banner pointing to tooling/ for canonical upstream
  material. Add embeddinggemma row to Model Selection table.

Also:
- Add .gitignore excluding .backup/ (local scratch per global CLAUDE.md
  convention, not needed in tracked history) and __pycache__/.
- Add .claude/handoffs/2026-04-18-canonical-tooling-research.md so future
  sessions can pick up cold — facts verified, open threads, what changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:48:26 -04:00

7.2 KiB
Raw Blame History

Handoff — 2026-04-18: Canonical Tooling Research

TL;DR for the next session

A parallel research pass pulled 147 files / 14 MB of first-party Gemma 4 tooling into tooling/, and the 13 findings that contradicted or extended the existing corpus were merged into the top-level SYNTHESIS.md / GOTCHAS.md / CORPUS_*.md docs. The repo is in a clean, coherent state.

If you're opening this repo for Gemma 4 implementation work, SYNTHESIS.md is still the right first read. The new tooling/README.md is the receipts layer — read it when you need authoritative source material (model cards, chat templates, serving commands, sibling-model briefs).

What shipped

Commit eecebe7 (master, pushed to git.sethpc.xyz/Seth/gemma4-research): added tooling/ with five subdirs — google-official/, huggingface/, inference-frameworks/, gemma-family/, fine-tuning/. Each subdir has its own indexing README.

Follow-up commit (same session): patched top-level corpus docs with the 9 findings worth merging. The tooling/README.md "Findings" list now marks each one [merged: <file>] or [flagged] for provenance.

Key confirmed facts

Claim Verified against
gemma4:26b is a MoE (25.2B total, 3.8B active, 8 of 128 experts + 1 shared) HF model card at tooling/huggingface/model-cards/gemma-4-26B-A4B-it-README.md
Q4_K_M inference on the MoE is fine (standard practice) Mixtral/DeepSeek precedent; card neutral on inference quant
Gemma 4 changed turn tokens from <start_of_turn> to `< turn>/<turn
Tool use is trained in Gemma 4, not a proof-of-concept as in Gemma 1/2/3 DeepMind tool-use colab at tooling/google-official/deepmind-gemma/colab_tool_use.ipynb
google/gemma_pytorch is abandoned for Gemma 4 Last push 2025-05-30, variants validator
No Gemma 4 technical report PDF as of 2026-04-18 DeepMind repo README + direct URL probes
No specialized siblings on Gemma 4 base yet (ShieldGemma 2, CodeGemma, PaliGemma 2, EmbeddingGemma all still on Gemma 2/3) Per-sibling model cards in tooling/gemma-family/

Open threads — flagged but not implemented

These came out of the mort-bot impact review later in the session. All three are high-value but out of scope for this research pass:

  1. EmbeddingGemma (308M) as a drop-in upgrade for mort-bot's chat_search / memory_read tools. Mort currently uses FTS5 keyword-only — misses semantic matches. EmbeddingGemma's Matryoshka sizes (768/512/256/128) + 100+ languages make it a clean fit. Integration sketch in the session conversation; full research at tooling/gemma-family/embeddinggemma.md. Starter notebook at tooling/google-official/cookbook/tutorials_RAG_EmbeddingGemma.ipynb. Next steps: (a) ollama pull embeddinggemma on steel141, (b) A/B against existing nomic-embed-text on actual mort chat logs before committing to backfill.

  2. ShieldGemma 2 (4B) as a generate_image pre-filter for mort-bot. Mort's SDXL tool has no safety gate. ShieldGemma 2 is Gemma-3-based but scoped exactly to image safety. Would run on steel141 alongside gemma4:26b (3090 has headroom).

  3. Native object detection for mort's vision_describe. Gemma 4 does grounded bbox output natively — "Detect the X" → {box_2d: [ymin, xmin, ymax, xmax]} in 1000×1000 coords. Mort currently only does free-form vision description.

None of these were implemented in this session.

Files changed this session

  • New: tooling/ (147 files), tooling/README.md, .claude/handoffs/2026-04-18-canonical-tooling-research.md (this file)
  • Edited: README.md (added tooling/ row), SYNTHESIS.md (banner + model-selection table), GOTCHAS.md (added gemma_pytorch abandonment + expanded fine-tuning section), CORPUS_tool_calling_format.md (added Chat Template Context + HF transformers Alternative), CORPUS_ollama_variants.md (annotated 26b as MoE + audio note), CORPUS_capabilities.md (native system role, thinking, object detection, embedding pointer)
  • Unchanged: IMPLEMENTATIONS.md (Simon/AI_Visualizer specific, not affected), CORPUS_architecture.md (already had MoE details right), CORPUS_benchmarks.md (still current)

What future sessions should know

  • The research is the receipts, not the source of truth. The top-level SYNTHESIS.md / GOTCHAS.md / CORPUS_*.md docs are the working reference. tooling/ backs them up with downloaded upstream material when you need provenance or a working script.
  • Don't re-research the same ground. Every tooling/*/README.md lists what's there and the source URL. Grep the tooling corpus before spawning new web searches.
  • The 26B-is-MoE and Q4_K_M-is-fine facts were the main things that would have been re-litigated without this handoff. If you see a claim that conflicts with those, check the HF model card first (tooling/huggingface/model-cards/gemma-4-26B-A4B-it-README.md) — Google's own documentation, not secondhand.
  • Sibling-model generation lag. When reaching for ShieldGemma / CodeGemma / PaliGemma / EmbeddingGemma, don't assume a Gemma-4 base — they're still on 2 or 3. Use them anyway; just don't confuse generations.
  • Mort-bot is where the low-hanging fruit is if Seth wants a next practical project. Three items above; EmbeddingGemma is the biggest lever.

Session narrative (for context, not action)

  1. Started with the existing corpus (SYNTHESIS + GOTCHAS + 5 CORPUS files, ~22KB total). Goal: add canonical upstream tooling.
  2. Dispatched five parallel general-purpose agents covering Google official, HF, inference frameworks, Gemma family, fine-tuning.
  3. All five returned clean — 147 files downloaded, each indexed per subdir.
  4. Wrote tooling/README.md with 10 findings from the agents. Initial plan: flag only, don't touch the older corpus.
  5. Seth asked how the findings affect mort-bot. Read mort's CLAUDE.md / DECISIONS.md / llm.py / config.py / tools.py. Ranked: EmbeddingGemma (high), ShieldGemma 2 (high), bbox detection (high), E-series audio (medium), everything else (low/none because Ollama hides transformers changes).
  6. Seth ran ollama show gemma4:26b; output confirmed MoE (25.8B, Q4_K_M). Walked back the earlier "worth A/B testing" extrapolation — that was training guidance misapplied to inference. Q4_K_M on the MoE is fine.
  7. Seth asked "did you update synthesis?" — no, I hadn't. He authorized the updates. Patched 5 top-level docs; updated tooling/README.md findings list to mark merged-vs-flagged.
  8. Wrote this handoff.

Don't do these things next session

  • Don't commit the ipynb files with --no-verify unless you ask again — the secrets-hook false positives (base64 notebook outputs, example Ed25519 keys) are documented, but re-bypassing without asking would be scope creep. If you add more ipynb content, strip outputs with jupyter nbconvert --ClearOutputPreprocessor.enabled=True first.
  • Don't restructure the folder. It's organized fine: README.mdSYNTHESIS.md (primary) → specialized CORPUS_*.md / GOTCHAS.md / IMPLEMENTATIONS.mdtooling/ (receipts). New material goes into one of those buckets, not a new top-level thing.
  • Don't assume the Gemma 3 technical report covers Gemma 4. It's the closest thing we have but it predates Gemma 4.