docs: add canonical tooling corpus (147 files) from Google/HF/frameworks

Five-lane parallel research pass. Each subdir under tooling/ has its own README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts, gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files, tokenizer_config.json, transformers gemma4/ source, launch blog posts, official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2, Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE), TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md (not applied), notably: the new <|turn>/<turn|> prompt format, gemma_pytorch abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM, FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization: flagged "secrets" are base64 notebook cell outputs and example Ed25519 keys in the HDP agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:24:48 -04:00
parent 5011059f5d
commit eecebe7ef5
149 changed files with 181297 additions and 0 deletions
@@ -0,0 +1,226 @@
+# Google-official Gemma tooling (as of 2026-04-18)
+
+Downloaded corpus of canonical Google / Google-DeepMind Gemma tooling. This
+directory mirrors only **upstream-authored** material — no third-party forks,
+no community ports, no Ollama-specific content (that lives in
+`../../CORPUS_ollama_variants.md`).
+
+Reach for this directory when you need to verify what the canonical code/docs
+actually say (prompt tokens, API shapes, supported variants) versus what a
+third-party wrapper claims they say.
+
+## Top-line findings (flag for cross-check with rest of corpus)
+
+1. **Canonical JAX/Flax library (`google-deepmind/gemma`) has first-class
+   Gemma 4 support today** — `gm.nn.Gemma4_E4B()`,
+   `gm.ckpts.CheckpointPath.GEMMA4_E4B_IT`, and the unified `ChatSampler` /
+   `ToolSampler` API explicitly lists "2, 3, 3n, 4" as supported. This is the
+   least-friction Python path if you want the actual reference behavior.
+2. **`google/gemma_pytorch` has NO Gemma 4 support** as of last push
+   (2025-05-30). `scripts/run.py` validates variant in
+   `['2b', '2b-v2', '7b', '9b', '27b', '1b']`; `scripts/run_multimodal.py` in
+   `['4b', '12b', '27b_v3']` (all Gemma 3). If someone tells you to "use
+   the official PyTorch repo" for Gemma 4, they're wrong — it's stale.
+3. **`google/gemma.cpp` README says Gemma 2-3 + PaliGemma 2 only** (no Gemma 4
+   yet), but the repo is actively pushed and explicitly notes active work
+   happens on the `dev` branch. Worth rechecking `dev` for Gemma 4 support.
+4. **Gemma 4 uses a NEW prompt-token syntax** distinct from Gemma 1/2/3:
+   - Gemma 1/2/3: `<start_of_turn>` / `<end_of_turn>` (symmetric angle brackets)
+   - Gemma 4: `<|turn>` / `<turn|>` (asymmetric pipe-brackets)
+   - Plus Gemma-4-new: `<|tool>`/`<tool|>`, `<|tool_call>`/`<tool_call|>`,
+     `<|tool_response>`/`<tool_response|>`, `<|think|>`,
+     `<|channel>`/`<channel|>`, `<|image>`/`<image|>`, `<|audio>`/`<audio|>`,
+     string delimiter `<|"|>`.
+   - Roles are named directly: `system`, `user`, `model` (no role brackets).
+   Any chat template built against Gemma 3 tokens is therefore wrong for
+   Gemma 4. `CORPUS_tool_calling_format.md` already captures the tool tokens
+   correctly but does NOT yet document the turn-token change or the thinking
+   tokens.
+5. **`gemma.cpp` ships an HTTP API server (`gemma_api_server`) that speaks
+   the Google Gemini API protocol** (`POST /v1beta/models/<model>:generateContent`,
+   SSE streaming, session management). This is a canonical Google-built
+   alternative to Ollama that implements the *real* Gemini REST API locally.
+   See `gemma-cpp/API_SERVER_README.md`.
+6. **Tool use was NOT a trained capability in Gemma 1/2/3** — the DeepMind
+   `colabs/tool_use.ipynb` explicitly disclaims: *"The Gemma 1, 2 and 3 models
+   were not specifically trained for tool use. This is more a proof-of-concept
+   than an officially supported feature."* Gemma 4 is notably absent from that
+   caveat; the cookbook and blog confirm Gemma 4 has **native function
+   calling** as a first-class trained capability.
+7. **No Gemma 4 technical-report PDF exists yet.** All conventional URLs
+   (`storage.googleapis.com/deepmind-media/gemma/Gemma4Report.pdf`,
+   `goo.gle/gemma4report`) return 404/redirect-to-google.com, and the
+   DeepMind repo README explicitly says "Gemma 4 (Coming soon)". Current
+   most-authoritative scientific document for the family is the Gemma 3
+   technical report (arXiv:2503.19786), downloaded here.
+8. **Cookbook ships a Gemma-4-specific agentic reference app**
+   (`apps/Gemma_4_HDP_Agentic_Security/`) demonstrating how to cryptographically
+   gate Gemma 4's native function calls with Ed25519-signed delegation tokens
+   (IETF draft `draft-helixar-hdp-agentic-delegation-00`). A more
+   production-shaped pattern than the toy `tool_use.ipynb`.
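Finding 4 means a stale chat template can be spotted mechanically by looking at which turn tokens it contains. The following is a minimal sketch under that assumption; `prompt_family` is a hypothetical helper written for this README, not part of any Gemma library, and it only checks the turn tokens listed above.

```python
# Turn tokens copied from finding 4 above.
GEMMA4_MARKERS = ("<|turn>", "<turn|>")
LEGACY_MARKERS = ("<start_of_turn>", "<end_of_turn>")


def prompt_family(template: str) -> str:
    """Classify a chat template by the turn tokens it contains.

    Hypothetical helper: returns "gemma4", "gemma1-3", "mixed", or "unknown".
    """
    has_new = any(tok in template for tok in GEMMA4_MARKERS)
    has_old = any(tok in template for tok in LEGACY_MARKERS)
    if has_new and has_old:
        return "mixed"  # both syntaxes present: almost certainly a bug
    if has_new:
        return "gemma4"
    if has_old:
        return "gemma1-3"
    return "unknown"


print(prompt_family("<start_of_turn>user\nhi<end_of_turn>\n"))
print(prompt_family("<|turn>user\nhi<turn|>\n"))
```

A check like this could run over third-party templates before they are trusted for Gemma 4; a `"gemma1-3"` or `"mixed"` result is the signal to compare against the snapshotted ai.google.dev page.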
+
+## File index
+
+### `deepmind-gemma/` — JAX/Flax reference (the primary Python library)
+Upstream: https://github.com/google-deepmind/gemma (`main`, pushed 2026-04-17).
+
+| File | What | Why keep |
+|------|------|----------|
+| `README.md` | PyPI `gemma` package entry point | Shows canonical `gm.nn.Gemma4_E4B()` API, `ChatSampler` multi-turn/multi-modal example |
+| `example_multimodal.py` | Image-captioning fine-tune (Kauldron config) | Canonical end-to-end SFT example; docstring shows exact `<start_of_turn>user / <start_of_image> / <end_of_turn>` interleave for Gemma 3 |
+| `example_lora.py` | LoRA fine-tuning recipe | Reach for this if doing PEFT against a Gemma 4 checkpoint |
+| `example_dpo.py` | Direct Preference Optimization recipe | Reference for preference-alignment post-training |
+| `example_classification.py` | Classification fine-tune | Shows Gemma as a feature extractor |
+| `example_sharding.py` | Multi-device sharding | Reference for running models larger than E4B on multi-GPU/TPU |
+| `colab_tool_use.ipynb` | Tool-use demo (`ToolSampler`) | Important caveat inside: "not specifically trained for tool use" for Gemma 1/2/3; shows the `gm.tools.Tool` base class API |
+| `colab_sampling.ipynb` | Basic inference / chat notebook | Starter-grade canonical sampling example |
+
+Other scripts in the repo were not downloaded (the files above are a cherry-picked subset): `seq2seq.py`, `npo.py`, colabs for `quantization_aware_training`, `sharding`, `tokenizer`, `multimodal`, `finetuning`, `lora_finetuning`, `lora_sampling`. Fetch directly from https://github.com/google-deepmind/gemma/tree/main when needed.
+
+### `gemma-pytorch/` — PyTorch reference (STALE for Gemma 4)
+Upstream: https://github.com/google/gemma_pytorch (`main`, pushed 2025-05-30).
+
+| File | What | Why keep |
+|------|------|----------|
+| `README.md` | Entry-point docs | Only documents up through Gemma 3; no Gemma 4 |
+| `run.py` | Text-only inference entry point | Variant whitelist `['2b','2b-v2','7b','9b','27b','1b']`: Gemma 1/2 sizes plus the text-only Gemma 3 1B; no Gemma 4 |
+| `run_multimodal.py` | Multimodal inference entry point | Variant whitelist `['4b','12b','27b_v3']`, Gemma 3 only. Shows the exact interleaved pattern: `<start_of_turn>user\n`, image, text, `<end_of_turn>\n<start_of_turn>model` |
+| `run_xla.py` | TPU/XLA inference | Reference for running Gemma 3 on TPU |
+
+**Do not reach for this repo for Gemma 4 work** until it's updated. Use the
+DeepMind JAX lib, Hugging Face `transformers`, or gemma.cpp instead.
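The whitelist behaviour described above can be illustrated with a short sketch. The two lists are copied from the table; `accepted_by` is a hypothetical helper, and the Gemma 4 labels passed to it (`"e4b"`, `"31b"`) are assumed spellings used only to show that no such string appears in either list.

```python
# Variant whitelists as recorded in the table above.
TEXT_VARIANTS = ["2b", "2b-v2", "7b", "9b", "27b", "1b"]   # scripts/run.py
MULTIMODAL_VARIANTS = ["4b", "12b", "27b_v3"]              # scripts/run_multimodal.py


def accepted_by(variant: str) -> list[str]:
    """Return the gemma_pytorch entry points whose whitelist contains `variant`."""
    scripts = []
    if variant in TEXT_VARIANTS:
        scripts.append("run.py")
    if variant in MULTIMODAL_VARIANTS:
        scripts.append("run_multimodal.py")
    return scripts


# Assumed Gemma 4 labels: neither matches any whitelisted string.
for label in ("e4b", "31b"):
    print(label, accepted_by(label))
```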
+
+### `gemma-cpp/` — C++ reference inference
+Upstream: https://github.com/google/gemma.cpp (`main`, pushed 2026-04-17; active dev on `dev` branch).
+
+| File | What | Why keep |
+|------|------|----------|
+| `README.md` | Project overview, build instructions | States "Gemma 2-3 + PaliGemma 2" in features; Gemma 4 status unclear from `main` — check `dev` branch |
+| `API_SERVER_README.md` | HTTP API server that speaks Gemini API protocol | **Most interesting find** — canonical drop-in for apps written against the Gemini API, runs locally. `POST /v1beta/models/<model>:generateContent`, SSE streaming, session KV-cache |
+| `examples_README.md` | Pointer to `hello_world` / `simplified_gemma` minimal embedding examples | Starting point for embedding gemma.cpp into your own C++ binary |
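Because `gemma_api_server` is described as speaking the Gemini API protocol, a client request can be sketched with the standard library alone. The endpoint path comes from the table above and the body follows the public Gemini REST shape (`contents` / `parts` / `text`); the host, port, and model name are assumptions, and whether the server accepts every Gemini field has not been checked here. The request is only built, not sent.

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Gemini-style generateContent POST request."""
    url = f"{base_url.rstrip('/')}/v1beta/models/{model}:generateContent"
    body = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed values: a server on localhost:8080 serving a model named "gemma3-4b".
req = build_generate_request("http://localhost:8080", "gemma3-4b", "Hello")
print(req.get_method(), req.full_url)
```

To actually send it, pass `req` to `urllib.request.urlopen` once a server is running; see `gemma-cpp/API_SERVER_README.md` for the real launch flags and model names.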
+
+### `cookbook/` — Official recipes and end-to-end apps
+Upstream: https://github.com/google-gemma/cookbook (`main`, pushed 2026-04-17).
+**Note:** `google-gemini/gemma-cookbook` now 301-redirects here; use the
+`google-gemma/cookbook` URL going forward.
+
+| File | What | Why keep |
+|------|------|----------|
+| `README.md` | Cookbook index | Authoritative list of Gemma variants incl. Gemma 4 (E2B / E4B / 26B A4B / 31B) and the wider ecosystem (FunctionGemma, MedGemma, PaliGemma 2, RecurrentGemma, ShieldGemma 2, T5Gemma, TranslateGemma, TxGemma, VaultGemma, EmbeddingGemma) |
+| `tutorials_RAG_EmbeddingGemma.ipynb` | RAG with EmbeddingGemma | Currently the only notebook in `tutorials/` — reflects the "latest tested" tier |
+| `docs_gemma_chat.ipynb` | Chatbot with Gemma on Keras | Documents the `__START_TURN_USER__ = "<start_of_turn>user\n"` / `__END_TURN__ = "<end_of_turn>\n"` format explicitly; Gemma 2 example, but the class is the canonical illustration of the Gemma 1/2/3 chat template |
+| `apps_Gemma4_HDP_AgenticSecurity_README.md` | README for the HDP agentic-security reference app | Gemma-4-specific demo; real production pattern for gating native function calls |
+| `apps_Gemma4_HDP_hdp_middleware.py` | Drop-in middleware (`HDPMiddleware.gate()`) | Wraps any Gemma 4 tool executor with Ed25519-signed HDT verification |
+| `apps_Gemma4_HDP_AgenticSecurity.ipynb` | Walkthrough notebook | End-to-end: load Gemma 4, issue tokens, gate function calls |
+
+Other cookbook content worth noting (not downloaded — fetch on demand):
+- `docs/capabilities/thinking.ipynb` (438 KB) — Gemma 4 thinking-mode notebook
+- `docs/capabilities/audio.ipynb` — audio-input capability
+- `docs/functiongemma/{finetuning-with-functiongemma,full-function-calling-sequence-with-functiongemma,function-calling-with-hf}.ipynb` — **FunctionGemma** is a separate fine-tune on the Gemma 3 270M IT checkpoint specifically for function calling; distinct from Gemma 4's native function calling
+- `docs/core/pytorch_gemma.ipynb`, `keras_inference.ipynb`, `huggingface_*.ipynb` — framework-specific recipes
+- `docs/integrations/langchain.ipynb` — LangChain integration
+- `experiments/{MedGemma,TxGemma}/` and `experiments/[T5Gemma]Example.ipynb`, `[VaultGemma]FineTuning_Inference_Huggingface.ipynb`, etc. — domain-specific Gemma variants
+
+### `docs/` — Canonical ai.google.dev pages (HTML cached)
+Verified URLs below; HTML snapshots saved for verbatim preservation.
+
+| File | Source URL |
+|------|-----------|
+| `ai-google-dev_core.html` | https://ai.google.dev/gemma/docs/core — Gemma 4 overview |
+| `ai-google-dev_model_card_4.html` | https://ai.google.dev/gemma/docs/core/model_card_4 — Gemma 4 model card |
+| `ai-google-dev_prompt_formatting_gemma4.html` | https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4 — **Gemma 4 prompt tokens (new `<\|turn>`/`<turn\|>` syntax)** |
+| `ai-google-dev_function_calling_gemma4.html` | https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4 — **Gemma 4 native function calling spec** |
+| `ai-google-dev_formatting.html` | https://ai.google.dev/gemma/docs/formatting — Gemma 1/2/3 prompt format (`<start_of_turn>`/`<end_of_turn>`) |
+| `blog_announcement.html` | https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/ — Gemma 4 launch blog, 2026-04-02 |
+
+Other canonical doc URLs (verified to exist, not snapshotted here — visit
+directly):
+- https://ai.google.dev/gemma/docs — top-level Gemma hub
+- https://ai.google.dev/gemma/docs/releases — release history
+- https://ai.google.dev/gemma/docs/functiongemma — FunctionGemma variant
+- https://ai.google.dev/gemma/docs/core/deploy_to_cloud_run_from_ai_studio — AI Studio → Cloud Run
+- https://docs.cloud.google.com/vertex-ai/generative-ai/docs/open-models/use-gemma — Vertex AI
+- https://aistudio.google.com — AI Studio
+- https://gemma-llm.readthedocs.io — DeepMind JAX lib docs
+- https://www.kaggle.com/models/google/gemma-4 — Gemma 4 on Kaggle
+- https://huggingface.co/collections/google/gemma-4 — Gemma 4 on HF
+
+### `tech-report/`
+| File | What | Source |
+|------|------|--------|
+| `Gemma3Report.pdf` | **Gemma 3 Technical Report** (arXiv:2503.19786, 2025-03-12) | https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf |
+
+No Gemma 4 technical report exists yet. Probed paths that fail:
+- `Gemma4Report.pdf`, `gemma4-report.pdf`, `Gemma4Report_v1.pdf` under
+  `storage.googleapis.com/deepmind-media/gemma/` (404)
+- `goo.gle/gemma4report` (not configured; redirects to google.com)
+
+DeepMind repo README line on the report: **"Gemma 4 (Coming soon)"**. Until it
+ships, the Gemma 3 report remains the most authoritative Google-DeepMind
+scientific document for the family and is the correct citation for
+architecture fundamentals:
+
+- Grouped-Query Attention with post-norm/pre-norm RMSNorm
+- 5:1 local/global attention layer interleave, 1024-token local sliding window
+- RoPE base 1M on global layers / 10k on local layers
+- SigLIP 400M vision encoder at 896×896, shared across 4B/12B/27B and frozen
+  during training
+- SentencePiece tokenizer with 262k vocab, shared with Gemini 2.0
+- Knowledge distillation during pre-training
+- QAT checkpoints via 5k-step fine-tune for int4/SFP8
+
+Per-variant parameter counts for Gemma 3 (non-embedding + embedding):
+1B = 698M + 302M, 4B = 3209M + 675M, 12B = 10759M + 1012M,
+27B = 25600M + 1416M.
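The per-variant counts above can be summed to see how much of each model is embedding table. This is a small arithmetic sketch; the helper names are made up for illustration, and the totals cover only the two components listed (the vision encoder is not included).

```python
# (non-embedding, embedding) parameter counts in millions, from the text above.
GEMMA3_PARAMS_M = {
    "1B": (698, 302),
    "4B": (3209, 675),
    "12B": (10759, 1012),
    "27B": (25600, 1416),
}


def total_millions(variant: str) -> int:
    """Non-embedding plus embedding parameters, in millions."""
    non_embedding, embedding = GEMMA3_PARAMS_M[variant]
    return non_embedding + embedding


def embedding_share(variant: str) -> float:
    """Fraction of the listed parameters that sit in the embedding table."""
    _, embedding = GEMMA3_PARAMS_M[variant]
    return embedding / total_millions(variant)


for name in GEMMA3_PARAMS_M:
    print(f"{name}: {total_millions(name)}M total, {embedding_share(name):.1%} embedding")
```

The embedding share shrinks steadily with model size, from about 30% for 1B to about 5% for 27B, which is why the 1B figure looks small once embeddings are set aside.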
+
+## Canonical Gemma 4 prompt format (verified 2026-04-18)
+
+**Source:** https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4 and
+https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4
+
+Note that paired delimiters such as `<|turn>` / `<turn|>` are asymmetric: the
+opening token has the pipe on the left, the closing token has it on the right.
+Standalone tokens (`<|think|>`, `<|"|>`) carry a pipe on both sides.
+
+```
+<|turn>system
+<|think|>  (optional — activates thinking mode)
+<|tool>declaration:FUNCTION_NAME{description:<|"|>...<|"|>,parameters:{properties:{...},required:[...]}}<tool|>
+You are a helpful assistant.<turn|>
+<|turn>user
+What's the weather in Tokyo?<turn|>
+<|turn>model
+<|channel>thought
+...internal reasoning...<channel|>
+<|tool_call>call:get_current_weather{location:<|"|>Tokyo, JP<|"|>}<tool_call|>
+<|tool_response>response:get_current_weather{temperature:15,weather:<|"|>sunny<|"|>}<tool_response|>
+The current weather in Tokyo is 15 degrees and sunny.<turn|>
+```
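The turn structure shown above can be rendered programmatically. What follows is a minimal sketch, assuming the newline placement in the example is exact; `render_gemma4_prompt` is a hypothetical helper, it does not add a BOS token, and its output should be checked against the official chat template before use.

```python
from typing import Iterable, Tuple

TURN_OPEN = "<|turn>"
TURN_CLOSE = "<turn|>"
VALID_ROLES = ("system", "user", "model")


def render_gemma4_prompt(
    turns: Iterable[Tuple[str, str]],
    add_generation_prompt: bool = True,
) -> str:
    """Render (role, text) pairs in the Gemma 4 turn syntax shown above."""
    parts = []
    for role, text in turns:
        if role not in VALID_ROLES:
            raise ValueError(f"unknown role: {role!r}")
        parts.append(f"{TURN_OPEN}{role}\n{text}{TURN_CLOSE}\n")
    if add_generation_prompt:
        # Leave an open model turn for the model to complete.
        parts.append(f"{TURN_OPEN}model\n")
    return "".join(parts)


prompt = render_gemma4_prompt([
    ("system", "You are a helpful assistant."),
    ("user", "What's the weather in Tokyo?"),
])
print(prompt)
```

In practice the tokenizer's own chat template is the safer route; a hand-rolled renderer like this is mainly useful for diffing against what a third-party wrapper emits.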
+
+Recommended sampling (per model card, verified):
+`temperature=1.0, top_p=0.95, top_k=64`. Tokenizer vocab = **262k** (same as
+Gemini 2.0). **BOS token required** — prepend `[BOS]` / set `add_bos=True`.
+
+**Gemma 1/2/3 prompt format (different — for reference):**
+```
+<start_of_turn>user
+[message]<end_of_turn>
+<start_of_turn>model
+[response]<end_of_turn>
+```
+Gemma 1/2/3 have no trained tool-use or thinking tokens. PT models end with
+`<eos>`; IT models end with `<end_of_turn>`.
+
+## Gemma 4 variants (canonical spec from model card)
+
+| Variant | Params | Layers | Active params | Context | Modalities |
+|---------|--------|--------|---------------|---------|------------|
+| Gemma 4 E2B | 2.3B effective (5.1B w/ embeddings) | 35 | n/a | 128K | text+image+audio (30s max) |
+| Gemma 4 E4B | 4.5B effective (8B w/ embeddings) | 42 | n/a | 128K | text+image+audio (30s max) |
+| Gemma 4 26B A4B | 25.2B total (MoE) | 30 | 3.8B | 256K | text+image |
+| Gemma 4 31B | 30.7B dense | 60 | n/a | 256K | text+image |
+
+All variants: Apache 2.0, base + instruction-tuned (`-it`), 140+ languages,
+native function calling, native structured JSON output. Vision encoder = 150M
+(E2B/E4B) or 550M (26B/31B). Image resolution token budgets: 70, 140, 280,
+560, 1120. Released 2026-04-02.
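For scripted model selection, the table above can be encoded as data. This is an illustrative sketch only: the `Variant` class and `candidates` function are hypothetical, and the values are transcribed from the table rather than read from any model metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    context_k: int   # context window in thousands of tokens
    audio: bool      # whether audio input is listed for this variant


# Transcribed from the variants table above.
VARIANTS = [
    Variant("Gemma 4 E2B", 128, True),
    Variant("Gemma 4 E4B", 128, True),
    Variant("Gemma 4 26B A4B", 256, False),
    Variant("Gemma 4 31B", 256, False),
]


def candidates(min_context_k: int, need_audio: bool) -> list[str]:
    """Names of variants meeting a context-length and audio requirement."""
    return [
        v.name
        for v in VARIANTS
        if v.context_k >= min_context_k and (v.audio or not need_audio)
    ]


print(candidates(min_context_k=200, need_audio=False))
print(candidates(min_context_k=128, need_audio=True))
```

Per the table, audio input and the 256K context are never available together, so `candidates(200, True)` comes back empty.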
+
+## Fetched using
+
+All files fetched via `curl -sL` from `raw.githubusercontent.com` on
+2026-04-18. Repos enumerated via the GitHub API
+(`https://api.github.com/repos/<owner>/<repo>/contents/<path>`). Google docs
+pages fetched via WebFetch tool. No GitHub auth was needed: the raw files are
+public, and the unauthenticated API rate limit (60 req/hr) was sufficient for
+the enumeration calls.
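The fetch procedure above can be reproduced without `curl`. The sketch below builds the two URL shapes mentioned and downloads with the standard library, which follows redirects much as `curl -L` does. The function names are hypothetical, and the owner, repo, branch, and path in the usage line are just one example taken from the file index.

```python
import urllib.request


def raw_url(owner: str, repo: str, branch: str, path: str) -> str:
    """URL of a single file on raw.githubusercontent.com."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"


def contents_api_url(owner: str, repo: str, path: str = "") -> str:
    """GitHub API URL that lists the contents of a repo directory."""
    return f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


# Example: the README listed under deepmind-gemma/ in the file index.
print(raw_url("google-deepmind", "gemma", "main", "README.md"))
print(contents_api_url("google-deepmind", "gemma", "colabs"))
```

Only the API listing calls count against the 60 requests/hour unauthenticated limit, so enumerating directories sparingly and fetching files through the raw host keeps a full refresh well inside it.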