Mortdecai eecebe7ef5 docs: add canonical tooling corpus (147 files) from Google/HF/frameworks

Five-lane parallel research pass. Each subdir under tooling/ has its own
README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts,
  gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev
  HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files,
  tokenizer_config.json, transformers gemma4/ source, launch blog posts,
  official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI
  comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2,
  Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE),
  TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md
(not applied) — notably the new <|turn>/<turn|> prompt format, gemma_pytorch
abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM,
FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech
report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization — flagged "secrets"
are base64 notebook cell outputs and example Ed25519 keys in the HDP
agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-18 12:24:48 -04:00

axolotl
google-cookbook
huggingface-recipes
ollama-llamacpp
trl
unsloth
README.md
recipe-recommendation.md

All entries last changed by: docs: add canonical tooling corpus (147 files) from Google/HF/frameworks (2026-04-18 12:24:48 -04:00)

README.md

Gemma 4 Fine-Tuning Tooling — Index

Research captured 2026-04-18. All downloads verified against upstream repos.

TL;DR

Tool	Gemma 4 coverage	GPU floor (LoRA)	GPU floor (full FT)	Best at
Unsloth	Full parity — all 4 sizes, text/vision/audio/GRPO/RL	E2B: 8 GB, E4B: 17 GB, 26B A4B: ~40 GB, 31B QLoRA: 22 GB	Not recommended locally	Fastest path, Google-blessed, free Colab
TRL	Partial — no `sft_gemma4.py` yet; `sft_gemma3.py` + `AutoModelForImageTextToText` works	Same as Unsloth w/ `load_in_4bit`	2x H100 min for 31B	Research-grade control, DPO/GRPO/online RL, VLM GRPO on Gemma 4 (CARLA)
Axolotl	Native Gemma 4 configs shipped (`examples/gemma4/`)	Single 5090 (32 GB) for 26B A4B QLoRA validated	>80 GB, "not tested" per README	Declarative YAML, multi-GPU FSDP, MoE expert LoRA
Google cookbook	`docs/core/*` notebooks default to `google/gemma-4-E2B`	Depends on Colab tier	L4 (22 GB) for E4B QLoRA	Canonical baseline, paired with ai.google.dev docs
HF gemma-recipes	Inference + one GRPO VLM script (CARLA)	E2B on T4	—	VLM GRPO with tool-calling environment
Ollama	Serves fine-tuned Gemma 4 via Modelfile `ADAPTER`	—	—	Final serving step

Recommendation for Seth: Unsloth. See recipe-recommendation.md.

1. Unsloth (`unsloth/`)

Upstream: unslothai/notebooks, unslothai/unsloth
License: LGPL-3.0 (notebooks), Apache-2.0 (library)

Published Gemma 4 Dynamic quants:

unsloth/gemma-4-{E2B,E4B,31B,26B-A4B}-{,it}-unsloth-bnb-4bit (dynamic 4-bit)
unsloth/gemma-4-{E2B,E4B,31B,26B-A4B}-it-GGUF (GGUF for inference)
Collection: https://huggingface.co/collections/unsloth/gemma-4

Downloaded files (local paths under this directory):

unsloth/notebooks/Gemma4_(E2B)-Text.ipynb — canonical SFT notebook, T4-compatible
unsloth/notebooks/Gemma4_(E4B)-Text.ipynb — 10 GB VRAM, higher accuracy
unsloth/notebooks/Gemma4_(26B_A4B)-Text.ipynb — MoE SFT (needs A100+)
unsloth/notebooks/Gemma4_(31B)-Text.ipynb — dense 31B SFT
unsloth/notebooks/Gemma4_(E2B|E4B|26B_A4B|31B)-Vision.ipynb — vision SFT w/ UnslothVisionDataCollator
unsloth/notebooks/Gemma4_(E2B|E4B)-Audio.ipynb — audio SFT (E2B/E4B only — 31B/26B have no audio encoder)
unsloth/notebooks/Gemma4_(E2B)_GRPO.ipynb — GRPO RL w/ Python reward funcs
unsloth/notebooks/Gemma4_(E2B)_Reinforcement_Learning_{2048,Sudoku}_Game.ipynb — game-playing RL
unsloth/python_scripts/*.py — same content as .py scripts (easier to grep/modify)
unsloth/kaggle/Gemma4_(31B)-Text.ipynb, unsloth/kaggle/Gemma4_(E4B)-Text.ipynb — Kaggle-flavored variants
unsloth/docs/unsloth-README.md — top-level Unsloth README

Upstream URLs (useful to share):

SFT E4B Colab: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma4_(E4B)-Text.ipynb
GRPO Colab: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma4_(E2B)_GRPO.ipynb
Unsloth Gemma 4 docs: https://unsloth.ai/docs/models/gemma-4/train

Unsloth chat-template & masking detail (CRITICAL for Gemma 4)

Gemma 4 does not use Gemma 3's <start_of_turn> / <end_of_turn>. The new format is:

<bos><|turn>user
Hello<turn|>
<|turn>model
Hey there!<turn|>
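
To make the layout concrete, here is a minimal sketch that renders a message list into the turn format shown above. `render_gemma4_turns` is a hypothetical helper, not part of any library, and mapping the `assistant` role to `model` is an assumption drawn from the example. In real training the tokenizer's own chat template should produce this string; the sketch only helps when eyeballing a formatted sample.

```python
# Hypothetical helper for illustration only. Real code should rely on
# tokenizer.apply_chat_template rather than hand-built strings.
BOS = "<bos>"
TURN_OPEN = "<|turn>"
TURN_CLOSE = "<turn|>"


def render_gemma4_turns(messages):
    """Render [{'role': ..., 'content': ...}] in the turn layout shown above."""
    turns = []
    for message in messages:
        # Assumption: the assistant side is spelled "model" in the prompt.
        role = "model" if message["role"] == "assistant" else message["role"]
        turns.append(f"{TURN_OPEN}{role}\n{message['content']}{TURN_CLOSE}")
    # One <bos> at the very start, turns separated by newlines.
    return BOS + "\n".join(turns)


example = render_gemma4_turns([
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hey there!"},
])
print(example)
```

Whitespace beyond what the example shows (for instance a trailing newline after the last turn) is not covered by this sketch.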

Unsloth's helper:

from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(tokenizer, chat_template = "gemma-4")  # literal "gemma-4", not "gemma4"

Response-only masking (matches Unsloth's convention; everything before response_part is loss-masked):

from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|turn>user\n",
    response_part    = "<|turn>model\n",
)
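
The idea behind response-only masking can be sketched on plain token-id lists. This is not Unsloth's implementation: `mask_non_response` and the toy ids are hypothetical, and the only convention assumed is that a label of -100 is ignored by the loss. Everything outside a span that follows the response marker gets that label.

```python
IGNORE_INDEX = -100  # label value that the loss skips


def find_subsequence(sequence, pattern, start=0):
    """Return the first index >= start where pattern occurs, or -1."""
    for i in range(start, len(sequence) - len(pattern) + 1):
        if sequence[i:i + len(pattern)] == pattern:
            return i
    return -1


def mask_non_response(input_ids, instruction_ids, response_ids):
    """Keep labels only for tokens after a response marker.

    A response span ends at the next instruction marker or at the
    end of the sequence.
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    position = 0
    while True:
        marker = find_subsequence(input_ids, response_ids, position)
        if marker == -1:
            break
        start = marker + len(response_ids)
        next_instruction = find_subsequence(input_ids, instruction_ids, start)
        end = len(input_ids) if next_instruction == -1 else next_instruction
        labels[start:end] = input_ids[start:end]
        position = end
    return labels


# Toy ids: [1, 2] stands in for "<|turn>user\n", [1, 3] for "<|turn>model\n".
toy_ids = [0, 1, 2, 10, 11, 9, 1, 3, 20, 21, 9, 1, 2, 12, 9, 1, 3, 22, 9]
toy_labels = mask_non_response(toy_ids, [1, 2], [1, 3])
print(toy_labels)
```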

<bos> gotcha: apply_chat_template prepends <bos>; Unsloth's formatting_prompts_func strips it with .removeprefix('<bos>') because the SFTTrainer's data collator adds its own — double <bos> silently degrades training.
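
A quick way to see the double-`<bos>` problem is to count leading `<bos>` markers on a formatted sample. The helpers below are hypothetical and operate on plain strings; they assume some later step prepends its own `<bos>`, as described above.

```python
BOS = "<bos>"


def strip_leading_bos(text):
    """Drop one leading <bos>, mirroring .removeprefix('<bos>')."""
    return text.removeprefix(BOS)


def count_leading_bos(text):
    """Count how many <bos> markers sit at the start of the string."""
    count = 0
    while text.startswith(BOS):
        count += 1
        text = text[len(BOS):]
    return count


templated = "<bos><|turn>user\nHello<turn|>\n"

# Simulate a later step that prepends its own <bos>.
without_strip = BOS + templated
with_strip = BOS + strip_leading_bos(templated)

print(count_leading_bos(without_strip), count_leading_bos(with_strip))
```

Running a check like this on a handful of formatted training samples is cheap and catches the problem before any GPU time is spent.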

Tool tokens (<|tool>, <|tool_call>, <|tool_response>, <|"|>) are not masked in Unsloth's default setup — they flow through as plain text inside user/assistant turns. If you're fine-tuning on tool-call data, include full <|tool_call>...<tool_call|> markup in the assistant content field; the template doesn't need a special role=tool branch.

Unsloth MoE note

For 26B A4B (128 experts): Unsloth explicitly recommends bf16/16-bit LoRA, NOT 4-bit QLoRA ("MoE QLoRA not recommended, dense 31B is fine"). Their notebook uses load_in_4bit = True at >40 GB but the docs flag this as suboptimal.

2. TRL (`trl/`)

Upstream: huggingface/trl
License: Apache-2.0

Gemma 4-specific scripts: NONE in examples/scripts/ as of 2026-04-18. The canonical Gemma 4 TRL example lives in huggingface-gemma-recipes/scripts/carla_vlm_gemma.py (see section 5, HuggingFace gemma-recipes).

Closest-match Gemma 3 scripts downloaded (drop-in for Gemma 4 — change model_id to google/gemma-4-*-it, keep AutoModelForImageTextToText):

trl/sft_gemma3.py — use this as the Gemma 4 SFT template. Pure text SFT (Codeforces-COTS).
trl/sft_vlm_gemma3.py — vision SFT template (uses AutoModelForImageTextToText, all-linear LoRA).
trl/sft.py, trl/trl_scripts_sft.py — the generic SFTTrainer wrappers.
trl/sft_vlm.py — model-agnostic VLM SFT.
trl/dpo.py — DPO (1-liner using TrlParser).
trl/grpo_agent.py, trl/grpo_vlm.py — GRPO with tool-calling environments.
trl/sft_tiny_aya_tool_calling.py — tool-calling SFT pattern.

Chat template / masking detail: TRL's SFTTrainer uses tokenizer.apply_chat_template end-to-end and delegates to the tokenizer's built-in Jinja template. For google/gemma-4-*-it, that template already produces <|turn>user…<turn|>. For response-only loss, TRL exposes SFTConfig(assistant_only_loss=True) (TRL ≥ 0.22), which computes the loss on assistant-turn tokens only, so no manual instruction_part plumbing is needed. This is a separate option from completion_only_loss, which applies to prompt/completion datasets.

Official HF blog says (verbatim):

"Gemma 4 is fully supported for fine-tuning with TRL. … we have prepared an example on how to fine-tune Gemma 4 with TRL on Vertex AI using SFT, to showcase how to extend the function calling capabilities, whilst freezing both the vision and audio towers." (see huggingface-recipes/hf-blog-gemma4.md §634-687)

3. Axolotl (`axolotl/`)

Upstream: axolotl-ai-cloud/axolotl, examples/gemma4/
License: Apache-2.0
Gemma 4 status: Native support shipped, day-one-class parity.

Downloaded files:

axolotl/README.md — official Axolotl Gemma 4 guide
axolotl/31b-qlora.yaml — 31B dense QLoRA, 1x80GB @ ~44 GB VRAM
axolotl/31b-qlora-flex.yaml — 31B dense QLoRA + Flex Attention, 1x80GB @ ~26 GB (40% less VRAM, 50% throughput cost)
axolotl/26b-a4b-moe-qlora.yaml — 26B MoE QLoRA + ScatterMoE expert-quantized + Expert-LoRA. Validated: 50 steps FineTome, loss 8.8→1.8, single RTX 5090 (32 GB), 21 GiB peak
axolotl/e2b-vision-lora.yaml — E2B vision LoRA with freeze_mm_modules: true

Run command (from Axolotl README):

axolotl train examples/gemma4/26b-a4b-moe-qlora.yaml
axolotl train examples/gemma4/31b-qlora.yaml
axolotl train examples/gemma4/31b-qlora-flex.yaml

Axolotl chat template & masking detail

chat_template: gemma4
datasets:
  - path: mlabonne/FineTome-100k
    type: chat_template
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value

chat_template: gemma4 (no dash — Axolotl's key is different from Unsloth's "gemma-4"). The template applies Gemma 4 turn tokens (<|turn>user … <turn|>). Masking is handled automatically by type: chat_template — only the assistant turn counts toward loss.
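
What `field_messages` and `message_property_mappings` express in the YAML can be sketched in a few lines of Python. This is not Axolotl's code: `apply_property_mappings` is a hypothetical function, and the sample record with `human`/`gpt` speaker names is an assumed ShareGPT-style row, not one copied from the dataset.

```python
def apply_property_mappings(record, field_messages="conversations",
                            role_key="from", content_key="value"):
    """Rename per-turn keys to the role/content shape a chat template expects."""
    return [
        {"role": turn[role_key], "content": turn[content_key]}
        for turn in record[field_messages]
    ]


# Assumed ShareGPT-style row for illustration.
sample_record = {
    "conversations": [
        {"from": "human", "value": "What is LoRA?"},
        {"from": "gpt", "value": "A low-rank adapter method."},
    ]
}

mapped = apply_property_mappings(sample_record)
print(mapped)
```

The sketch only renames keys. Any normalisation of speaker names to the roles the template expects is left to the framework.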

Axolotl hard limitations for Gemma 4 (from their README)

Flash Attention OFF. FA2 caps head_dim at 256; FA4 at 128; Gemma 4's global_head_dim=512 exceeds both. Use SDP or Flex Attention. (sdp_attention: true in every yaml.)
LoRA kernels OFF. Due to Gemma 4's shared-KV layers (last N layers reuse K/V tensors): lora_mlp_kernel: false, lora_qkv_kernel: false, lora_o_kernel: false.
lora_target_linear is incompatible with multimodal models. You MUST use lora_target_modules with the regex shown below to restrict LoRA to the text decoder and keep it off the vision/audio encoders.
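
The Flash Attention limitation is simple arithmetic on the numbers quoted above, which the following sketch spells out. The caps dictionary just restates the figures from this list; `usable_flash_backends` is a hypothetical helper, not an Axolotl or flash-attn API.

```python
# Head-dimension caps as quoted in the list above.
FLASH_ATTENTION_HEAD_DIM_CAPS = {"FA2": 256, "FA4": 128}
GEMMA4_GLOBAL_HEAD_DIM = 512


def usable_flash_backends(head_dim, caps=FLASH_ATTENTION_HEAD_DIM_CAPS):
    """Return the Flash Attention variants whose cap covers head_dim."""
    return [name for name, cap in caps.items() if head_dim <= cap]


gemma4_backends = usable_flash_backends(GEMMA4_GLOBAL_HEAD_DIM)
print(gemma4_backends)  # empty list: fall back to SDP or Flex Attention
```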

Axolotl's canonical regex restricts LoRA to text layers only:

model.language_model.layers.[\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj
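
Before launching a run, the pattern can be sanity-checked against module names with Python's `re` module. The module names below are hypothetical examples rather than names read from a real checkpoint, and applying the pattern with `re.fullmatch` is an assumption about how the framework consumes it. Note that the dots in the pattern are unescaped, so each one matches any single character.

```python
import re

# Pattern copied verbatim from the line above.
TEXT_ONLY_LORA_PATTERN = (
    r"model.language_model.layers.[\d]+."
    r"(_checkpoint_wrapped_module.)?(mlp|self_attn)."
    r"(up|down|gate|q|k|v|o)_proj"
)


def is_lora_target(module_name):
    """True when the whole module name matches the text-decoder pattern."""
    return re.fullmatch(TEXT_ONLY_LORA_PATTERN, module_name) is not None


# Hypothetical module names for illustration.
candidates = [
    "model.language_model.layers.12.self_attn.q_proj",
    "model.language_model.layers.5._checkpoint_wrapped_module.mlp.down_proj",
    "model.vision_tower.encoder.layers.0.self_attn.q_proj",
    "model.language_model.layers.3.self_attn.q_norm",
]
matches = {name: is_lora_target(name) for name in candidates}
for name, matched in matches.items():
    print(matched, name)
```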

For 26B A4B MoE, additionally target expert 3D tensors:

lora_target_parameters:
  - experts.gate_up_proj
  - experts.down_proj

4. Google Cookbook (`google-cookbook/`)

Upstream: google-gemma/cookbook, docs/core/
License: Apache-2.0
Gemma 4 status: The docs/core/*.ipynb fine-tuning notebooks default to google/gemma-4-E2B as model_id, so they ARE the Gemma 4 path despite their generic filenames.

Downloaded files:

google-cookbook/huggingface_text_finetune_qlora.ipynb — text-to-SQL QLoRA tutorial (gretel-synthetic-text-to-sql dataset, philschmid/gretel-synthetic-text-to-sql). This is the one ai.google.dev links to as the "official" fine-tune path.
google-cookbook/huggingface_text_full_finetune.ipynb — full-weights fine-tune variant
google-cookbook/huggingface_vision_finetune_qlora.ipynb — vision QLoRA on product descriptions
google-cookbook/lora_tuning.ipynb — LoRA concepts tutorial
google-cookbook/function-calling-gemma4.ipynb — official Google function-calling notebook (not a fine-tune, but the authoritative reference for tool-call tokens)
google-cookbook/Gemma_4_HDP_Agentic_Security.ipynb + Gemma_4_HDP_README.md — full-app fine-tune example (agentic security)

Upstream URLs:

Google cookbook chat template & masking detail (VERY IMPORTANT)

The cookbook notebooks use TRL's SFTTrainer with standard messages list (role/content) — chat-template is applied automatically by the tokenizer's built-in Jinja. No manual instruction_part/response_part.

The non-obvious detail is the LoraConfig:

peft_config = LoraConfig(
    lora_alpha=16, lora_dropout=0.05, r=16, bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
    modules_to_save=["lm_head", "embed_tokens"],  # NOTE
    ensure_weight_tying=True,                      # NOTE
)

modules_to_save=["lm_head","embed_tokens"] + ensure_weight_tying=True is required because Gemma 4 introduced new special tokens (<|turn>, <|tool>, <|tool_call>, <|tool_response>, <|"|>) that need their embeddings to be trainable in a fine-tune. PEFT 0.15+ added ensure_weight_tying specifically for this case. Skipping it causes the adapter to see frozen random embeddings for the new tokens and training silently underperforms.

For vision, Google's cookbook uses plain target_modules="all-linear" (NO exclude_modules) — meaning it does train LoRA adapters on the vision tower. This is a different tradeoff from Axolotl (freeze_mm_modules: true) and from TRL's CARLA recipe (exclude_modules=["vision_tower", "multi_modal_projector"]). Pick based on whether your task needs the vision encoder to adapt (e.g., new image domain) or just the text decoder (most cases).

5. HuggingFace gemma-recipes (`huggingface-recipes/`)

Upstream: huggingface/huggingface-gemma-recipes License: Apache-2.0

Downloaded files:

huggingface-recipes/carla_vlm_gemma.py — The canonical TRL + Gemma 4 example. GRPO VLM training in a CARLA driving environment with tool calls. Shows exclude_modules=["vision_tower", "multi_modal_projector"], chat_template_kwargs={"enable_thinking": False}, max_tool_calling_iterations=10.
huggingface-recipes/Gemma4_(E2B)-Multimodal.ipynb — inference-only multimodal demo (vision, video, audio, function calling, object detection). Not a fine-tune but necessary reference for the input format the training data must match.
huggingface-recipes/README.md — HF's top-level recipes index
huggingface-recipes/hf-blog-gemma4.md — the HF blog post's raw markdown (§630-707 is the fine-tuning section)

Run command for the CARLA VLM RL example:

pip install git+https://github.com/huggingface/trl.git
python examples/scripts/openenv/carla_vlm_gemma.py \
    --env-urls https://sergiopaniego-carla-env.hf.space https://sergiopaniego-carla-env-2.hf.space \
    --model google/gemma-4-E2B-it

Known gap: HF's gemma-recipes repo has fine-tuning notebooks for Gemma 3 and Gemma 3n (free T4 Colab) but no pure-SFT Gemma 4 fine-tuning notebook yet — the Gemma 4 Colab is inference only. Their blog points users to Unsloth Studio for the easy path.

6. Ollama / llama.cpp LoRA serving (`ollama-llamacpp/`)

Downloaded: ollama-llamacpp/ollama-import-lora.md — distilled from https://docs.ollama.com/import (2026-04-18 fetch).

Short answer: Yes, you can serve a Gemma 4 LoRA via Ollama. Two paths:

Merge then serve (simpler, recommended): model.save_pretrained_merged("out", tokenizer, save_method="merged_16bit") → llama.cpp/convert_hf_to_gguf.py → llama.cpp/quantize to Q4_K_M → ollama create mymodel -f Modelfile with FROM ./gemma4-mortdecai.gguf.
Adapter-only serve: llama.cpp/convert_lora_to_gguf.py on the PEFT directory → Modelfile with FROM gemma4:e4b-it-q8_0 + ADAPTER ./adapter.gguf.
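
The two Modelfile shapes implied by these paths can be generated with a small script. The function names are hypothetical; the `FROM` and `ADAPTER` lines and the example paths come from the two bullets above, and nothing here invokes Ollama itself.

```python
def merged_modelfile(gguf_path):
    """Modelfile for the merge-then-serve path: a single FROM line."""
    return f"FROM {gguf_path}\n"


def adapter_modelfile(base_model, adapter_path):
    """Modelfile for the adapter-only path: base model plus ADAPTER."""
    return f"FROM {base_model}\nADAPTER {adapter_path}\n"


merged_text = merged_modelfile("./gemma4-mortdecai.gguf")
adapter_text = adapter_modelfile("gemma4:e4b-it-q8_0", "./adapter.gguf")

print(merged_text)
print(adapter_text)
```

Either string can be written to a file named Modelfile and passed to `ollama create mymodel -f Modelfile`, as in the first path.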

Ollama's docs list supported architectures as Llama/Mistral/Gemma 1-2 — Gemma 4 isn't explicitly listed, but llama.cpp has day-one Gemma 4 support and in practice the path works. (Vision-adapter serving via Ollama is still a grey area.)

7. Datasets the canonical tutorials pair with Gemma 4

Tutorial	Dataset	Format	Notes
Unsloth Gemma4 E4B Text	`mlabonne/FineTome-100k`	ShareGPT-style `conversations` field	Also the Axolotl default
Unsloth Gemma4 GRPO	Synthetic kernel-optimization prompts in-notebook	Python reward funcs	RL w/ `function_works` / `check_only_stdlib_imports`
Unsloth Gemma4 Vision	`unsloth/LaTeX_OCR`	HF image-text pairs	Demonstrates `UnslothVisionDataCollator`
Google cookbook text QLoRA	`philschmid/gretel-synthetic-text-to-sql`	chat `messages` list	Google's "official" demo dataset for Gemma 4
Google cookbook vision QLoRA	`philschmid/amazon-product-descriptions-vlm`	image + text pairs	Product-description generation
Axolotl Gemma 4 (all sizes)	`mlabonne/FineTome-100k`	`type: chat_template`	Validated in axolotl README
Axolotl E2B vision LoRA	`HuggingFaceH4/llava-instruct-mix-vsft`	vision-language SFT	Same as HF's VLM template
TRL sft_gemma3 (transfers)	`open-r1/codeforces-cots`	`messages` list	Chain-of-thought coding
TRL carla_vlm_gemma (Gemma 4 VLM GRPO)	CARLA simulator (live)	environment rollouts	Multimodal tool responses

No one uses Alpaca or UltraChat as the canonical Gemma 4 pair. FineTome-100k is the unofficial standard — both Unsloth and Axolotl default to it.

8. Chat-template-and-masking matrix (the debugging cheat sheet)

Framework	chat_template key	Turn tokens	Response masking API	BOS handling
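Unsloth	`"gemma-4"`	`<|turn>role\n...<turn|>`	`train_on_responses_only(instruction_part="<|turn>user\n", response_part="<|turn>model\n")`	Strip the template's `<bos>` with `.removeprefix('<bos>')` to avoid a double `<bos>`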
TRL	tokenizer's built-in Jinja (no key needed)	same	`SFTConfig(assistant_only_loss=True)`	Tokenizer handles automatically
Axolotl	`chat_template: gemma4` (no dash)	same	automatic via `type: chat_template`	Automatic
Google cookbook	tokenizer built-in Jinja	same	automatic via `SFTTrainer` + `messages`	Automatic

Tool tokens (<|tool>, <|tool_call>, <|tool_response>, <|"|>) ride inside message content — none of the frameworks mask them specially, and none provide a role="tool" branch in the default template. If you're training tool-call data, put the complete <|tool_call>call:{...}<tool_call|> block in the assistant message content.

Also: all Gemma 4 fine-tunes should set modules_to_save=["lm_head","embed_tokens"] and ensure_weight_tying=True in LoraConfig when using PEFT directly, because the new special-token embeddings need to be trainable. Unsloth and Axolotl handle this for you; naïve TRL + PEFT scripts do NOT by default.

What's NOT here (and why)

Kaggle/Colab free-tier notebooks as a separate category — the Unsloth notebooks are the free-tier notebooks. E2B Text runs on a free T4; 31B/26B-A4B need A100 Colab Pro. I pulled 2 Kaggle-flavored variants to unsloth/kaggle/ for completeness.
Google's DeepMind JAX/Flax Gemma 4 fine-tune script — Google's DeepMind-gemma repo ships inference/reference code, not a SFT script. Google's canonical fine-tune path is the HF+TRL notebook in google-gemma/cookbook (above), NOT JAX. If you want JAX, see the archived .archive/Gemma/[Gemma_1]Finetune_distributed.ipynb pattern — not ported to Gemma 4.
Full-weights 31B fine-tuning commands — Axolotl's README says "heavy and has not been tested." Skip unless Seth rents an 8×H100 pod.
Prompt engineering / inference-only notebooks — per scope.
