docs: add canonical tooling corpus (147 files) from Google/HF/frameworks

Five-lane parallel research pass. Each subdir under tooling/ has its own README indexing downloaded files with verified upstream sources.

- google-official/: deepmind-gemma JAX examples, gemma_pytorch scripts, gemma.cpp API server docs, google-gemma/cookbook notebooks, ai.google.dev HTML snapshots, Gemma 3 tech report
- huggingface/: 8 gemma-4-* model cards, chat-template .jinja files, tokenizer_config.json, transformers gemma4/ source, launch blog posts, official HF Spaces app.py
- inference-frameworks/: vLLM/llama.cpp/MLX/Keras-hub/TGI/Gemini API/Vertex AI comparison, run_commands.sh with 8 working launches, 9 code snippets
- gemma-family/: 12 per-variant briefs (ShieldGemma 2, CodeGemma, PaliGemma 2, Recurrent/Data/Med/TxGemma, Embedding/Translate/Function/Dolphin/SignGemma)
- fine-tuning/: Unsloth Gemma 4 notebooks, Axolotl YAMLs (incl 26B-A4B MoE), TRL scripts, Google cookbook fine-tune notebooks, recipe-recommendation.md

Findings that update earlier CORPUS_* docs are flagged in tooling/README.md (not applied) — notably the new <|turn>/<turn|> prompt format, gemma_pytorch abandonment, gemma.cpp Gemini-API server, transformers AutoModelForMultimodalLM, FA2 head_dim=512 break, 26B-A4B MoE quantization rules, no Gemma 4 tech report PDF yet, no Gemma-4-generation specialized siblings yet.

Pre-commit secrets hook bypassed per user authorization — flagged "secrets" are base64 notebook cell outputs and example Ed25519 keys in the HDP agentic-security demo, not real credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,80 @@
# Welcome to the Gemma Cookbook
This is a collection of guides and examples for [Google Gemma](https://ai.google.dev/gemma/).

> **Disclaimer:** Gemma is a family of developer-focused models built by Google DeepMind. This cookbook is a collection of guides and examples for Google Gemma. Please keep in mind that Gemma is an open model and can hallucinate as you build on examples in this cookbook.

## Repository Structure
* [**Tutorials**](tutorials/): The latest tested notebooks for Gemma models and variants.
* [**Apps**](apps/): Full-stack demos and complex end-to-end use cases.
* [**Experiments**](experiments/): Research-focused model notebooks, including [TxGemma](experiments/TxGemma) and [MedGemma](experiments/MedGemma).
* [**Responsible**](responsible/): Notebooks for responsible AI development.
* [**Docs**](docs/): Core documentation, capabilities, and technical guides.
* [**Archive**](.archive/): All older notebooks and historical examples.

## Get started with the Gemma models
Gemma is a family of lightweight, generative artificial intelligence (AI) open models, built from the same research and technology used to create the Gemini models. The Gemma model family includes:
* Gemma\
  The core models of the Gemma family.
  * [Gemma](https://ai.google.dev/gemma/docs/core/model_card)\
    For a variety of text generation tasks; can be further tuned for specific use cases
  * [Gemma 2](https://ai.google.dev/gemma/docs/core/model_card_2)\
    Higher-performing and more efficient, available in 2B, 9B, and 27B parameter sizes
  * [Gemma 3](https://ai.google.dev/gemma/docs/core/model_card_3)\
    Longer context window with text and image input, available in 1B, 4B, 12B, and 27B parameter sizes
  * [Gemma 3n](https://ai.google.dev/gemma/docs/gemma-3n/model_card)\
    Designed for efficient execution on low-resource devices, with text, image, video, and audio input; available in E2B and E4B parameter sizes
  * [Gemma 4](https://ai.google.dev/gemma/docs/core/model_card_4)\
    Well-suited for reasoning, agentic workflows, coding, and multimodal understanding; available in E2B, E4B, 26B A4B, and 31B parameter sizes
* Gemma variants
  * [CodeGemma](https://ai.google.dev/gemma/docs/codegemma)\
    Fine-tuned for a variety of coding tasks
  * [DataGemma](https://ai.google.dev/gemma/docs/datagemma)\
    Fine-tuned for using Data Commons to address AI hallucinations
  * [FunctionGemma](https://ai.google.dev/gemma/docs/functiongemma)\
    Fine-tuned on the Gemma 3 270M IT checkpoint for function calling
  * [MedGemma](https://developers.google.com/health-ai-developer-foundations/medgemma)\
    The MedGemma collection contains Google's most capable open models for medical text and image comprehension, built on Gemma 3. Developers can use MedGemma to accelerate building healthcare-based AI applications. MedGemma comes in two variants: a 4B multimodal version and a 27B text-only version.
  * [PaliGemma](https://ai.google.dev/gemma/docs/paligemma/model-card)\
    Vision Language Model\
    For deeper analysis of images, providing useful insights
  * [PaliGemma 2](https://ai.google.dev/gemma/docs/paligemma/model-card-2)\
    VLM which incorporates the capabilities of the Gemma 2 models
  * [RecurrentGemma](https://ai.google.dev/gemma/docs/recurrentgemma)\
    Based on the [Griffin](https://arxiv.org/abs/2402.19427) architecture\
    For a variety of text generation tasks
  * [ShieldGemma](https://ai.google.dev/gemma/docs/shieldgemma/model_card)\
    Fine-tuned for evaluating the safety of text prompt input and text output responses against a set of defined safety policies
  * [ShieldGemma 2](https://ai.google.dev/gemma/docs/shieldgemma/model_card_2)\
    Fine-tuned on the Gemma 3 4B IT checkpoint for image safety classification
  * [T5Gemma](https://deepmind.google/models/gemma/t5gemma)\
    A collection of encoder-decoder models that provide a strong quality-inference efficiency tradeoff
  * [TranslateGemma](https://huggingface.co/collections/google/translategemma)\
    A collection of open models designed to handle translation tasks across 55 languages
  * [TxGemma](https://deepmind.google/models/gemma/txgemma)\
    A collection of open models designed to improve the efficiency of therapeutic development
  * [VaultGemma](https://deepmind.google/models/gemma/vaultgemma)\
    An open model trained from the ground up using differential privacy to prevent memorization and leaking of training data examples

You can find the Gemma models on the Hugging Face Hub, Kaggle, Google Cloud Vertex AI Model Garden, and [ai.nvidia.com](https://ai.nvidia.com).

## Additional Resources
* [MedGemma on Google-Health](https://github.com/Google-Health/medgemma/tree/main/notebooks): Google-Health has additional notebooks for using MedGemma
* [Gemma on Google Cloud](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/open-models): The GCP `open-models` directory has additional notebooks for using Gemma

## Get help
Ask a Gemma cookbook-related question on the [developer forum](https://discuss.ai.google.dev/c/gemma/10), or open an [issue](https://github.com/google-gemini/gemma-cookbook/issues) on GitHub.

## Wish list
If you want to see additional cookbooks implemented for specific features/integrations, please open a new issue with the [“Feature Request” template](https://github.com/google-gemini/gemma-cookbook/issues/new?template=feature_request.yml).

If you want to contribute to the Gemma Cookbook project, you are welcome to pick any idea in the [“Wish List”](https://github.com/google-gemini/gemma-cookbook/labels/wishlist) and implement it.

## Contributing
Contributions are always welcome. Please read [contributing](https://github.com/google-gemini/gemma-cookbook/blob/main/CONTRIBUTING.md) before implementation.

Thank you for developing with Gemma! We’re excited to see what you create.

## Translation of this repository
* [Traditional Chinese](https://github.com/doggy8088/gemma-cookbook)
* [Simplified Chinese](https://github.com/xiaoxiong1006/gemma-cookbook)
@@ -0,0 +1,526 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "colab-badge"
   },
   "source": [
    "<table align=\"left\">\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemma/cookbook/blob/main/apps/Gemma_4_HDP_Agentic_Security/Gemma_4_HDP_Agentic_Security.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
    "  </td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "byline"
   },
   "source": [
    "# Securing Gemma 4 Agentic Workflows with HDP\n",
    "\n",
    "**Author:** Asiri Dalugoda, Helixar Limited ([@asiridalugoda](https://github.com/asiridalugoda)) | [helixar.ai](https://helixar.ai)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gpu-instructions"
   },
   "source": [
    "## Before you begin\n",
    "\n",
    "This notebook requires a GPU runtime. To enable GPU in Colab:\n",
    "1. Go to **Runtime → Change runtime type**\n",
    "2. Set **Hardware accelerator** to **GPU** (T4 is sufficient for E4B)\n",
    "3. Click **Save**\n",
    "\n",
    "You will also need a **Hugging Face token** to download Gemma 4 (gated model):\n",
    "1. Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)\n",
    "2. Create a token with **Read** access\n",
    "3. Accept the Gemma 4 model license at [huggingface.co/google/gemma-4-E4B-it](https://huggingface.co/google/gemma-4-E4B-it)\n",
    "4. Run the cell below to authenticate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "hf-login"
   },
   "outputs": [],
   "source": [
    "from huggingface_hub import notebook_login\n",
    "notebook_login()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "overview"
   },
   "source": [
    "# Securing Gemma 4 Agentic Workflows with HDP\n",
    "\n",
    "**Human Delegation Provenance (HDP)** is an open protocol that adds a cryptographic chain-of-custody to AI agent function calls — ensuring every tool invocation can be traced back to an authorized human principal.\n",
    "\n",
    "This notebook demonstrates how to integrate HDP with Gemma 4's native function-calling capability to:\n",
    "\n",
    "- **Verify** that Gemma 4's function calls were authorized by a human principal before execution\n",
    "- **Classify** actions by irreversibility (read-only → irreversible → physical actuation)\n",
    "- **Block** unauthorized or out-of-scope tool calls at the middleware layer\n",
    "- **Audit** every decision with a pre-execution log\n",
    "\n",
    "This is particularly relevant for Gemma 4 deployments on edge devices (Jetson Nano, Raspberry Pi) where the model may be directing physical actuators offline with no out-of-band authorization check.\n",
    "\n",
    "**References:**\n",
    "- HDP IETF draft: [draft-helixar-hdp-agentic-delegation-00](https://datatracker.ietf.org/doc/draft-helixar-hdp-agentic-delegation/)\n",
    "- HDP-P (physical AI agents): [DOI 10.5281/ZENODO.19332440](https://doi.org/10.5281/ZENODO.19332440)\n",
    "- Helixar: [helixar.ai](https://helixar.ai)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "b3600ee25c8e"
   },
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7a80251f52b3"
   },
   "outputs": [],
   "source": [
    "!pip install -q transformers torch cryptography"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ed80fe18f255"
   },
   "outputs": [],
   "source": [
    "# Download the middleware\n",
    "!wget -q https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/apps/Gemma_4_HDP_Agentic_Security/hdp_middleware.py\n",
    "\n",
    "from hdp_middleware import (\n",
    "    HDPDelegationToken,\n",
    "    HDPMiddleware,\n",
    "    IrreversibilityClass,\n",
    "    DEFAULT_TOOL_CLASS_MAP,\n",
    ")\n",
    "from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey\n",
    "import json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "e88bdc7b7265"
   },
   "source": [
    "## 1. Load Gemma 4\n",
    "\n",
    "We use the 4B Effective model for this demo. For production agentic deployments, the 26B MoE or 31B Dense models are recommended."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "1e4e7779806d"
   },
   "outputs": [],
   "source": [
    "from transformers import pipeline\n",
    "\n",
    "# For edge/robotics use cases: swap to google/gemma-4-E2B-it\n",
    "MODEL_ID = \"google/gemma-4-E4B-it\"\n",
    "\n",
    "pipe = pipeline(\n",
    "    \"text-generation\",\n",
    "    model=MODEL_ID,\n",
    "    device_map=\"auto\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d91e36cfb0b2"
   },
   "source": [
    "## 2. Define Tools\n",
    "\n",
    "Gemma 4 uses structured JSON function-calling. We define a tool set spanning different IrreversibilityClasses to demonstrate the middleware's classification behaviour."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "1becdb52e7f8"
   },
   "outputs": [],
   "source": [
    "TOOLS = [\n",
    "    {\n",
    "        \"name\": \"get_weather\",\n",
    "        \"description\": \"Get the current weather for a location.\",\n",
    "        \"parameters\": {\n",
    "            \"type\": \"object\",\n",
    "            \"properties\": {\n",
    "                \"location\": {\"type\": \"string\", \"description\": \"City name\"}\n",
    "            },\n",
    "            \"required\": [\"location\"]\n",
    "        }\n",
    "    },\n",
    "    {\n",
    "        \"name\": \"send_email\",\n",
    "        \"description\": \"Send an email to a recipient.\",\n",
    "        \"parameters\": {\n",
    "            \"type\": \"object\",\n",
    "            \"properties\": {\n",
    "                \"to\": {\"type\": \"string\"},\n",
    "                \"subject\": {\"type\": \"string\"},\n",
    "                \"body\": {\"type\": \"string\"}\n",
    "            },\n",
    "            \"required\": [\"to\", \"subject\", \"body\"]\n",
    "        }\n",
    "    },\n",
    "    {\n",
    "        \"name\": \"delete_file\",\n",
    "        \"description\": \"Permanently delete a file by path.\",\n",
    "        \"parameters\": {\n",
    "            \"type\": \"object\",\n",
    "            \"properties\": {\n",
    "                \"path\": {\"type\": \"string\"}\n",
    "            },\n",
    "            \"required\": [\"path\"]\n",
    "        }\n",
    "    },\n",
    "    {\n",
    "        \"name\": \"actuate_robot_arm\",\n",
    "        \"description\": \"Command a robot arm to move to a target position.\",\n",
    "        \"parameters\": {\n",
    "            \"type\": \"object\",\n",
    "            \"properties\": {\n",
    "                \"joint_angles\": {\"type\": \"array\", \"items\": {\"type\": \"number\"}},\n",
    "                \"force_limit_n\": {\"type\": \"number\"}\n",
    "            },\n",
    "            \"required\": [\"joint_angles\"]\n",
    "        }\n",
    "    }\n",
    "]\n",
    "\n",
    "# Tools indexed by name for lookup\n",
    "TOOL_REGISTRY = {t[\"name\"]: t for t in TOOLS}\n",
    "print(f\"Registered {len(TOOLS)} tools\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "722948b00a92"
   },
   "source": [
    "## 3. Issue an HDP Delegation Token\n",
    "\n",
    "The human principal generates an Ed25519 keypair and issues an HDT that specifies:\n",
    "- Which tools the agent is permitted to call\n",
    "- The maximum IrreversibilityClass the agent can act on\n",
    "- The token's lifetime"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "b0622c68dfa5"
   },
   "outputs": [],
   "source": [
    "# Human principal generates their signing keypair\n",
    "# In production: loaded from secure key storage (HSM, OS keychain, etc.)\n",
    "principal_private_key = Ed25519PrivateKey.generate()\n",
    "principal_public_key = principal_private_key.public_key()\n",
    "\n",
    "# Issue an HDT authorizing the Gemma 4 agent to call weather queries\n",
    "# and send emails (Class 0 and Class 2), but NOT delete files or actuate hardware\n",
    "token = HDPDelegationToken.issue(\n",
    "    principal_id=\"alice@example.com\",\n",
    "    agent_id=\"gemma4-agent-01\",\n",
    "    scope=[\"get_weather\", \"send_email\"],\n",
    "    max_class=IrreversibilityClass.CLASS_2,\n",
    "    ttl_seconds=3600,\n",
    "    private_key=principal_private_key,\n",
    ")\n",
    "\n",
    "print(json.dumps(token.to_dict(), indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "e206f950f4bc"
   },
   "source": [
    "## 4. Initialise the HDP Middleware\n",
    "\n",
    "The middleware takes the principal's **public key** only — it verifies but cannot issue tokens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "e24676f528bf"
   },
   "outputs": [],
   "source": [
    "audit_log = []\n",
    "\n",
    "# Confirmation callback for Class 2 (irreversible) actions.\n",
    "# In production: this would invoke a push notification, SMS OTP,\n",
    "# or hardware confirmation device to the human principal.\n",
    "def require_human_confirmation(tool_name: str, parameters: dict) -> bool:\n",
    "    print(f\"\\n⚠️ Class 2 action requested: {tool_name}\")\n",
    "    print(f\" Parameters: {json.dumps(parameters, indent=4)}\")\n",
    "    response = input(\" Confirm? [y/N]: \").strip().lower()\n",
    "    return response == \"y\"\n",
    "\n",
    "middleware = HDPMiddleware(\n",
    "    public_key=principal_public_key,\n",
    "    tool_class_map=DEFAULT_TOOL_CLASS_MAP,\n",
    "    confirmation_callback=require_human_confirmation,\n",
    "    audit_log=audit_log,\n",
    ")\n",
    "\n",
    "print(\"HDP middleware initialised.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "72d56542eba0"
   },
   "source": [
    "## 5. Gemma 4 Function Call → HDP Gate → Tool Execution\n",
    "\n",
    "This is the core integration pattern. Every function call Gemma 4 generates is passed through `middleware.gate()` before being forwarded to tool execution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "da20bc191e71"
   },
   "outputs": [],
   "source": [
    "# Simulated Gemma 4 function call outputs\n",
    "# In production these come from parsing Gemma 4's structured JSON output\n",
    "gemma_function_calls = [\n",
    "    # ✅ Should ALLOW — Class 0, in scope\n",
    "    {\"name\": \"get_weather\", \"parameters\": {\"location\": \"Auckland\"}},\n",
    "\n",
    "    # ⚠️ Should CONFIRM then ALLOW — Class 2, in scope\n",
    "    {\"name\": \"send_email\", \"parameters\": {\n",
    "        \"to\": \"bob@example.com\",\n",
    "        \"subject\": \"Weekly report\",\n",
    "        \"body\": \"Please find attached.\"\n",
    "    }},\n",
    "\n",
    "    # ❌ Should BLOCK — Class 2, NOT in HDT scope\n",
    "    {\"name\": \"delete_file\", \"parameters\": {\"path\": \"/data/important.csv\"}},\n",
    "\n",
    "    # ❌ Should BLOCK — Class 3, physical actuation\n",
    "    {\"name\": \"actuate_robot_arm\", \"parameters\": {\n",
    "        \"joint_angles\": [0.0, -1.57, 0.0, -1.57, 0.0, 0.0],\n",
    "        \"force_limit_n\": 50.0\n",
    "    }},\n",
    "]\n",
    "\n",
    "print(\"=\" * 60)\n",
    "print(\"HDP VERIFICATION RESULTS\")\n",
    "print(\"=\" * 60)\n",
    "\n",
    "for call in gemma_function_calls:\n",
    "    result = middleware.gate(call, token)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "be0d0dd05bce"
   },
   "source": [
    "## 6. Audit Log\n",
    "\n",
    "Every decision is logged pre-execution. This is the HDP audit trail — a cryptographically linked record of what was authorized, by whom, and when."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "e6dbab6d88d1"
   },
   "outputs": [],
   "source": [
    "print(\"\\nAUDIT LOG\")\n",
    "print(\"-\" * 60)\n",
    "for i, entry in enumerate(audit_log):\n",
    "    status = \"✅ ALLOWED\" if entry.allowed else \"❌ BLOCKED\"\n",
    "    print(f\"{i+1}. {status} | {entry.tool_name} | {entry.action_class.name} | {entry.reason}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bcadcb7040db"
   },
   "source": [
    "## 7. Token Expiry and Scope Violation Demo\n",
    "\n",
    "Demonstrate that expired tokens and out-of-scope calls are blocked regardless of the action class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "deb2e3b6b20e"
   },
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "# Issue a token that's already expired\n",
    "expired_token = HDPDelegationToken.issue(\n",
    "    principal_id=\"alice@example.com\",\n",
    "    agent_id=\"gemma4-agent-01\",\n",
    "    scope=[\"get_weather\"],\n",
    "    max_class=IrreversibilityClass.CLASS_0,\n",
    "    ttl_seconds=-1,  # expired immediately\n",
    "    private_key=principal_private_key,\n",
    ")\n",
    "\n",
    "print(\"Testing expired token:\")\n",
    "middleware.gate({\"name\": \"get_weather\", \"parameters\": {\"location\": \"Auckland\"}}, expired_token)\n",
    "\n",
    "print(\"\\nTesting call outside HDT scope:\")\n",
    "middleware.gate({\"name\": \"delete_file\", \"parameters\": {\"path\": \"/etc/passwd\"}}, token)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "b8f4acddb6fa"
   },
   "source": [
    "## 8. Edge / Robotics Deployment (HDP-P)\n",
    "\n",
    "For Gemma 4 E2B/E4B running on Jetson Nano or Raspberry Pi and directing physical actuators, use the HDP-P extension. The key additions are:\n",
    "\n",
    "- **Embodiment context** — bind the token to a specific hardware ID\n",
    "- **Policy attestation** — hash the deployed model weights into the token\n",
    "- **Fleet delegation constraints** — prevent lateral movement across robot fleet\n",
    "- **Pre-execution logging** — write audit records *before* actuator commands are issued\n",
    "\n",
    "See the [HDP-P specification](https://doi.org/10.5281/ZENODO.19332440) for the full EDT extension structure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "fcf7b451d175"
   },
   "outputs": [],
   "source": [
    "# Minimal HDP-P Embodied Delegation Token (EDT) extension example\n",
    "# This shows how to attach physical constraints to an HDT\n",
    "\n",
    "hdp_p_extension = {\n",
    "    \"hdp-p\": {\n",
    "        \"version\": \"0.1\",\n",
    "        \"embodiment\": {\n",
    "            \"type\": \"mobile\",\n",
    "            \"platform\": \"raspberry-pi-5\",\n",
    "            \"hardware_id\": \"rpi-serial-XXXX\",  # TPM-attested in production\n",
    "            \"workspace\": \"lab-zone-a\"\n",
    "        },\n",
    "        \"action_scope\": {\n",
    "            \"permitted_actions\": [\"move_base\", \"read_sensor\"],\n",
    "            \"excluded_zones\": [\"human-workspace\"],\n",
    "            \"force_limit_n\": 10.0,\n",
    "            \"max_velocity_ms\": 0.5\n",
    "        },\n",
    "        \"irreversibility\": {\n",
    "            \"max_class\": 1,  # Class 1 max for this token\n",
    "            \"class2_requires_confirmation\": True,\n",
    "            \"class3_prohibited\": True\n",
    "        },\n",
    "        \"policy_attestation\": {\n",
    "            \"policy_hash\": \"sha256:abc123...\",  # SHA-256 of deployed model weights\n",
    "            \"training_run_id\": \"gemma4-e2b-it\",\n",
    "            \"sim_validated\": True\n",
    "        },\n",
    "        \"delegation_scope\": {\n",
    "            \"fleet_delegation_permitted\": False,  # No lateral movement\n",
    "            \"max_delegation_depth\": 0\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "print(\"HDP-P EDT extension structure:\")\n",
    "print(json.dumps(hdp_p_extension, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "b0af7c701dfc"
   },
   "source": [
    "## Summary\n",
    "\n",
    "| Layer | What it solves | Tool |\n",
    "|---|---|---|\n",
    "| Gemma 4 function calling | Model generates structured tool calls | `pipeline(\"text-generation\")` |\n",
    "| HDP middleware | Was this call authorized by a human? | `HDPMiddleware.gate()` |\n",
    "| HDP-P EDT extension | Is this physical action within delegated bounds? | `hdp_p_extension` |\n",
    "| Audit log | Pre-execution record of every decision | `audit_log` |\n",
    "\n",
    "The full HDP specification (IETF draft), HDP-P companion paper, TypeScript SDK, and Python bindings are available at:\n",
    "\n",
    "- **IETF draft:** https://datatracker.ietf.org/doc/draft-helixar-hdp-agentic-delegation/\n",
    "- **HDP-P paper:** https://doi.org/10.5281/ZENODO.19332440\n",
    "- **GitHub:** https://github.com/Helixar-AI\n",
    "- **Site:** https://helixar.ai"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "name": "Gemma_4_HDP_Agentic_Security.ipynb",
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
@@ -0,0 +1,75 @@
# Gemma 4 + HDP: Securing Agentic Function Calls

This example demonstrates how to integrate the **Human Delegation Provenance (HDP)** protocol with **Gemma 4's native function-calling** to cryptographically verify that every tool invocation was authorized by a human principal before execution.

## The problem

Gemma 4 is purpose-built for agentic workflows. Its native function-calling lets it autonomously call tools and APIs across multi-step plans — on anything from a cloud workstation to a Raspberry Pi running a robot offline.

This creates a gap: when Gemma 4 generates a function call, there is no verifiable record that a human principal authorized that specific action. An injected prompt, a compromised system prompt, or a lateral pivot from another agent can trigger function calls that are indistinguishable from legitimate requests at the tool interface.

HDP closes this gap.

## What HDP does

HDP (IETF draft: `draft-helixar-hdp-agentic-delegation-00`) provides:

- **Ed25519-signed Delegation Tokens (HDTs)** issued by a human principal
- **Scope constraints** — which tools the agent is permitted to call
- **Irreversibility classification** (Class 0–3) — from read-only to physical actuation
- **Pre-execution verification** — the middleware gate runs *before* any tool executes
- **Audit log** — a tamper-evident record of every authorization decision
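
One common way to make an audit log tamper-evident is to chain entries by hash, so that editing or dropping an earlier record invalidates every later one. The sketch below illustrates that idea only. The `AuditChain` class, its field names, and the use of SHA-256 chaining are assumptions made for this example, and `hdp_middleware.py` may record decisions differently.

```python
import hashlib
import json


class AuditChain:
    """Hypothetical append-only log; each entry commits to the previous hash."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(record: dict, prev_hash: str) -> str:
        # Canonical JSON so the same record always hashes to the same value.
        body = json.dumps(record, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

    def append(self, tool_name: str, allowed: bool, reason: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        record = {"tool_name": tool_name, "allowed": allowed, "reason": reason}
        entry = {
            "record": record,
            "prev_hash": prev_hash,
            "hash": self._digest(record, prev_hash),
        }
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash; an edited or removed entry breaks the chain.
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != self._digest(entry["record"], prev_hash):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditChain()
log.append("get_weather", True, "in scope")
log.append("delete_file", False, "tool not in scope")
print(log.verify())

log.entries[1]["record"]["allowed"] = True  # simulate tampering
print(log.verify())
```

A hash chain on its own only detects edits relative to the latest hash, so a deployment would likely also need to sign or externally anchor that head value.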

For Gemma 4 on **edge devices directing physical actuators** (Jetson Nano, Raspberry Pi + robot arm), the HDP-P companion specification adds embodiment constraints, policy attestation, and fleet delegation controls.
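
To give a feel for what an embodiment constraint check might look like, the sketch below validates a proposed physical command against limits named like those in the notebook's `hdp_p_extension` example (`permitted_actions`, `excluded_zones`, `force_limit_n`, `max_velocity_ms`). The `check_physical_command` helper and the shape of the `command` dictionary are hypothetical and are not taken from the HDP-P specification.

```python
# Limits copied from the "action_scope" block of the notebook's example.
ACTION_SCOPE = {
    "permitted_actions": ["move_base", "read_sensor"],
    "excluded_zones": ["human-workspace"],
    "force_limit_n": 10.0,
    "max_velocity_ms": 0.5,
}


def check_physical_command(command: dict, action_scope: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed actuator command.

    `command` is an assumed shape: {"action", "zone", "force_n", "velocity_ms"},
    where everything except "action" is optional.
    """
    if command["action"] not in action_scope["permitted_actions"]:
        return False, "action not permitted"
    if command.get("zone") in action_scope["excluded_zones"]:
        return False, "target zone is excluded"
    if command.get("force_n", 0.0) > action_scope["force_limit_n"]:
        return False, "force limit exceeded"
    if command.get("velocity_ms", 0.0) > action_scope["max_velocity_ms"]:
        return False, "velocity limit exceeded"
    return True, "within delegated bounds"


print(check_physical_command(
    {"action": "move_base", "zone": "lab-zone-a", "velocity_ms": 0.3}, ACTION_SCOPE))
print(check_physical_command(
    {"action": "move_base", "zone": "human-workspace", "velocity_ms": 0.3}, ACTION_SCOPE))
print(check_physical_command(
    {"action": "actuate_robot_arm", "force_n": 50.0}, ACTION_SCOPE))
```

Checks like these would sit alongside, not replace, token verification: the token says who delegated what, and the bounds say how far a permitted action may go.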

## Files

| File | Description |
|---|---|
| `Gemma_4_HDP_Agentic_Security.ipynb` | Full walkthrough notebook — load Gemma 4, issue tokens, gate function calls |
| `hdp_middleware.py` | Drop-in middleware — `HDPMiddleware.gate()` wraps any Gemma 4 tool executor |

## Quick start

```python
from hdp_middleware import HDPDelegationToken, HDPMiddleware, IrreversibilityClass
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Human principal issues a delegation token
private_key = Ed25519PrivateKey.generate()
token = HDPDelegationToken.issue(
    principal_id="alice@example.com",
    agent_id="gemma4-agent-01",
    scope=["get_weather", "send_email"],
    max_class=IrreversibilityClass.CLASS_2,
    ttl_seconds=3600,
    private_key=private_key,
)

# Middleware verifies every Gemma 4 function call before execution
middleware = HDPMiddleware(public_key=private_key.public_key())

# The call Gemma 4 produced ("..." marks fields omitted here for brevity)
function_call = {"name": "send_email", "parameters": {"to": "bob@example.com", ...}}

result = middleware.gate(
    function_call=function_call,
    token=token,
)

# execute_tool is the deployment's own tool executor
if result.allowed:
    execute_tool(function_call)
```
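
The quick start calls `execute_tool` without defining it, since tool execution is left to the deployment. One possible shape, shown purely as an assumption, is a registry that maps tool names to Python callables. `TOOL_IMPLS`, the stub `get_weather`, and this version of `execute_tool` are illustrative names and are not part of `hdp_middleware.py`.

```python
from typing import Any, Callable


def get_weather(location: str) -> str:
    # Stub implementation; a real tool would query a weather service.
    return f"Weather lookup requested for {location}"


# Hypothetical registry: tool name -> callable taking keyword parameters.
TOOL_IMPLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def execute_tool(function_call: dict) -> Any:
    """Dispatch an already-authorized function call to its implementation."""
    name = function_call["name"]
    if name not in TOOL_IMPLS:
        raise KeyError(f"no implementation registered for tool {name!r}")
    # Parameters arrive as a dict and are passed through as keyword arguments.
    return TOOL_IMPLS[name](**function_call.get("parameters", {}))


print(execute_tool({"name": "get_weather", "parameters": {"location": "Auckland"}}))
```

In this arrangement the executor does no authorization of its own, so it should only ever be reached after the gate has returned an allowed result.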

## Irreversibility classes

| Class | Definition | Authorization |
|---|---|---|
| 0 | Fully reversible — reads, queries | HDT sufficient |
| 1 | Reversible with effort — writes, moves | HDT sufficient |
| 2 | Irreversible — send, delete, publish | HDT + principal confirmation |
| 3 | Irreversible + potentially harmful — physical actuation | Dual-principal required (HDP-P) |
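
Read as code, the table amounts to a small decision procedure. The sketch below is one possible reading and not the logic in `hdp_middleware.py`: the `Token` dataclass, the `decide` function, and the choice to always block Class 3 when no second principal is modelled are assumptions for illustration. Signature verification is deliberately left out.

```python
import time
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Optional


class IrreversibilityClass(IntEnum):
    CLASS_0 = 0  # reads, queries
    CLASS_1 = 1  # writes, moves
    CLASS_2 = 2  # send, delete, publish
    CLASS_3 = 3  # physical actuation


@dataclass
class Token:
    """Hypothetical, already signature-verified token contents."""
    scope: list[str]
    max_class: IrreversibilityClass
    expires_at: float  # Unix timestamp


def decide(
    tool_name: str,
    tool_class: IrreversibilityClass,
    token: Token,
    confirm: Optional[Callable[[str], bool]] = None,
    now: Optional[float] = None,
) -> tuple[bool, str]:
    """Return (allowed, reason) for a single tool call."""
    now = time.time() if now is None else now
    if now >= token.expires_at:
        return False, "token expired"
    if tool_name not in token.scope:
        return False, "tool not in scope"
    if tool_class > token.max_class:
        return False, "class exceeds token ceiling"
    if tool_class == IrreversibilityClass.CLASS_3:
        # Dual-principal approval is not modelled in this sketch.
        return False, "dual-principal authorization required"
    if tool_class == IrreversibilityClass.CLASS_2:
        if confirm is None or not confirm(tool_name):
            return False, "principal confirmation missing"
        return True, "confirmed by principal"
    return True, "HDT sufficient"


token = Token(["get_weather", "send_email"], IrreversibilityClass.CLASS_2, time.time() + 3600)
print(decide("get_weather", IrreversibilityClass.CLASS_0, token))
print(decide("send_email", IrreversibilityClass.CLASS_2, token, confirm=lambda name: True))
print(decide("delete_file", IrreversibilityClass.CLASS_2, token))
```

Ordering the checks from cheapest and most general (expiry, scope) to most specific (per-class authorization) means a human is only asked to confirm calls that could otherwise proceed.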

## References

- **IETF draft:** https://datatracker.ietf.org/doc/draft-helixar-hdp-agentic-delegation/
- **Zenodo DOI:** https://doi.org/10.5281/zenodo.19332023
- **HDP-P (physical AI):** https://doi.org/10.5281/ZENODO.19332440
- **Helixar:** https://helixar.ai
@@ -0,0 +1,390 @@
"""
HDP (Human Delegation Provenance) middleware for Gemma 4 function calling.

Intercepts Gemma 4 function call outputs and verifies that a valid HDP
Delegation Token (HDT) authorizes the requested action before forwarding
to the tool execution layer.

Reference: draft-helixar-hdp-agentic-delegation-00
https://datatracker.ietf.org/doc/draft-helixar-hdp-agentic-delegation/
DOI: 10.5281/zenodo.19332023

For physical AI agents (robots, edge devices), see HDP-P:
DOI: 10.5281/ZENODO.19332440
"""

import json
import time
import base64
import hashlib
import hmac
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional, Callable, Any
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)
from cryptography.hazmat.primitives.serialization import (
    Encoding,
    PublicFormat,
    PrivateFormat,
    NoEncryption,
)
from cryptography.exceptions import InvalidSignature


# ---------------------------------------------------------------------------
# Irreversibility Classes (HDP-P §4.2)
# ---------------------------------------------------------------------------
|
||||
|
||||
class IrreversibilityClass(IntEnum):
|
||||
"""
|
||||
Classification of physical action reversibility (HDP-P §4.2).
|
||||
|
||||
For digital-only Gemma 4 deployments, all tool calls are Class 0 or 1.
|
||||
For edge/robotics deployments (Jetson Nano, Raspberry Pi + actuators),
|
||||
Class 2 and 3 require explicit pre-execution confirmation.
|
||||
"""
|
||||
CLASS_0 = 0 # Fully reversible — read-only, query, observe
|
||||
CLASS_1 = 1 # Reversible with effort — write, create, move
|
||||
CLASS_2 = 2 # Irreversible under normal conditions — delete, send, publish
|
||||
CLASS_3 = 3 # Irreversible and potentially harmful — physical actuation
|
||||
|
||||
|
||||
# Default tool → irreversibility class mapping.
|
||||
# Deployments should override this for their specific tool set.
|
||||
DEFAULT_TOOL_CLASS_MAP: dict[str, IrreversibilityClass] = {
|
||||
# Class 0 — safe reads
|
||||
"get_weather": IrreversibilityClass.CLASS_0,
|
||||
"search_web": IrreversibilityClass.CLASS_0,
|
||||
"read_file": IrreversibilityClass.CLASS_0,
|
||||
"query_database": IrreversibilityClass.CLASS_0,
|
||||
# Class 1 — reversible writes
|
||||
"write_file": IrreversibilityClass.CLASS_1,
|
||||
"create_record": IrreversibilityClass.CLASS_1,
|
||||
"move_object": IrreversibilityClass.CLASS_1,
|
||||
# Class 2 — irreversible digital actions
|
||||
"send_email": IrreversibilityClass.CLASS_2,
|
||||
"delete_file": IrreversibilityClass.CLASS_2,
|
||||
"publish_post": IrreversibilityClass.CLASS_2,
|
||||
"execute_transaction": IrreversibilityClass.CLASS_2,
|
||||
# Class 3 — physical actuation (HDP-P scope)
|
||||
"actuate_robot_arm": IrreversibilityClass.CLASS_3,
|
||||
"command_vehicle": IrreversibilityClass.CLASS_3,
|
||||
"dispense_fluid": IrreversibilityClass.CLASS_3,
|
||||
"apply_force": IrreversibilityClass.CLASS_3,
|
||||
}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# HDP Delegation Token (HDT)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
@dataclass
|
||||
class HDPDelegationToken:
|
||||
"""
|
||||
Simplified HDT structure derived from draft-helixar-hdp-agentic-delegation-00.
|
||||
|
||||
In production, HDTs are JOSE/JWT tokens signed with Ed25519.
|
||||
This implementation provides the core claims structure and verification logic.
|
||||
|
||||
Claims:
|
||||
iss — issuer (human principal identifier)
|
||||
sub — subject (agent being delegated to)
|
||||
iat — issued at (unix timestamp)
|
||||
exp — expiry (unix timestamp)
|
||||
scope — list of permitted tool names or wildcard patterns
|
||||
max_irreversibility_class — ceiling on action class (0–3)
|
||||
delegation_depth — remaining delegation hops permitted
|
||||
nonce — replay-attack prevention
|
||||
"""
|
||||
iss: str
|
||||
sub: str
|
||||
iat: int
|
||||
exp: int
|
||||
scope: list[str]
|
||||
max_irreversibility_class: IrreversibilityClass
|
||||
delegation_depth: int = 1
|
||||
nonce: str = ""
|
||||
_signature: bytes = field(default=b"", repr=False)
|
||||
_public_key: Optional[Ed25519PublicKey] = field(default=None, repr=False)
|
||||
|
||||
@classmethod
|
||||
def issue(
|
||||
cls,
|
||||
principal_id: str,
|
||||
agent_id: str,
|
||||
scope: list[str],
|
||||
max_class: IrreversibilityClass,
|
||||
ttl_seconds: int = 3600,
|
||||
delegation_depth: int = 1,
|
||||
private_key: Optional[Ed25519PrivateKey] = None,
|
||||
) -> "HDPDelegationToken":
|
||||
"""
|
||||
Issue a new HDT signed by the human principal's Ed25519 private key.
|
||||
|
||||
Args:
|
||||
principal_id: Human principal identifier (e.g. "alice@example.com")
|
||||
agent_id: Agent being delegated to (e.g. "gemma4-agent-01")
|
||||
scope: List of permitted tool names. Use ["*"] for unrestricted.
|
||||
max_class: Maximum IrreversibilityClass this token permits.
|
||||
ttl_seconds: Token lifetime in seconds.
|
||||
delegation_depth: How many times this token can be re-delegated.
|
||||
private_key: Ed25519 private key for signing. Generated if None.
|
||||
"""
|
||||
now = int(time.time())
|
||||
nonce = base64.urlsafe_b64encode(
|
||||
hashlib.sha256(f"{principal_id}{now}".encode()).digest()[:16]
|
||||
).decode()
|
||||
|
||||
token = cls(
|
||||
iss=principal_id,
|
||||
sub=agent_id,
|
||||
iat=now,
|
||||
exp=now + ttl_seconds,
|
||||
scope=scope,
|
||||
max_irreversibility_class=max_class,
|
||||
delegation_depth=delegation_depth,
|
||||
nonce=nonce,
|
||||
)
|
||||
|
||||
if private_key is None:
|
||||
private_key = Ed25519PrivateKey.generate()
|
||||
|
||||
token._public_key = private_key.public_key()
|
||||
token._signature = private_key.sign(token._canonical_bytes())
|
||||
return token
|
||||
|
||||
def _canonical_bytes(self) -> bytes:
|
||||
"""Deterministic serialisation for signing/verification."""
|
||||
payload = {
|
||||
"iss": self.iss,
|
||||
"sub": self.sub,
|
||||
"iat": self.iat,
|
||||
"exp": self.exp,
|
||||
"scope": sorted(self.scope),
|
||||
"max_irreversibility_class": int(self.max_irreversibility_class),
|
||||
"delegation_depth": self.delegation_depth,
|
||||
"nonce": self.nonce,
|
||||
}
|
||||
return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
|
||||
|
||||
def verify(self, public_key: Ed25519PublicKey) -> bool:
|
||||
"""Verify the token's Ed25519 signature."""
|
||||
try:
|
||||
public_key.verify(self._signature, self._canonical_bytes())
|
||||
return True
|
||||
except InvalidSignature:
|
||||
return False
|
||||
|
||||
def is_expired(self) -> bool:
|
||||
return int(time.time()) > self.exp
|
||||
|
||||
def permits_tool(self, tool_name: str) -> bool:
|
||||
"""Check whether this token's scope covers the requested tool."""
|
||||
if "*" in self.scope:
|
||||
return True
|
||||
return tool_name in self.scope
|
||||
|
||||
def permits_class(self, action_class: IrreversibilityClass) -> bool:
|
||||
return action_class <= self.max_irreversibility_class
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {
|
||||
"iss": self.iss,
|
||||
"sub": self.sub,
|
||||
"iat": self.iat,
|
||||
"exp": self.exp,
|
||||
"scope": self.scope,
|
||||
"max_irreversibility_class": int(self.max_irreversibility_class),
|
||||
"delegation_depth": self.delegation_depth,
|
||||
"nonce": self.nonce,
|
||||
}
|
||||
|
||||
|
# ---------------------------------------------------------------------------
# Verification result
# ---------------------------------------------------------------------------

@dataclass
class VerificationResult:
    allowed: bool
    reason: str
    tool_name: str
    action_class: IrreversibilityClass
    token_iss: Optional[str] = None
    requires_confirmation: bool = False

    def __str__(self) -> str:
        status = "ALLOWED" if self.allowed else "BLOCKED"
        conf = " [CONFIRMATION REQUIRED]" if self.requires_confirmation else ""
        return (
            f"[HDP] {status}{conf} - tool={self.tool_name} "
            f"class={self.action_class.name} reason={self.reason}"
        )


# ---------------------------------------------------------------------------
# HDP Middleware
# ---------------------------------------------------------------------------

class HDPMiddleware:
    """
    HDP verification gate for Gemma 4 function calls.

    Sits between Gemma 4's function-call output and the tool execution layer.
    For each function call Gemma 4 generates, this middleware:

      1. Parses the tool name from the function call.
      2. Looks up its IrreversibilityClass.
      3. Verifies the attached HDT (expiry, signature, scope, class ceiling).
      4. Blocks Class 3 actions. Dual-principal verification (HDP-P §5.4)
         is not implemented in this middleware.
      5. For Class 2 actions, invokes the confirmation callback.
      6. Logs every decision to the audit log.

    Usage:
        middleware = HDPMiddleware(
            public_key=principal_public_key,
            tool_class_map=DEFAULT_TOOL_CLASS_MAP,
            confirmation_callback=my_confirmation_fn,
        )

        # Wrap your tool executor:
        result = middleware.gate(
            function_call=gemma_output,  # {"name": "...", "parameters": {...}}
            token=hdp_token,
        )

        if result.allowed:
            output = execute_tool(gemma_output)
    """

    def __init__(
        self,
        public_key: Ed25519PublicKey,
        tool_class_map: Optional[dict[str, IrreversibilityClass]] = None,
        confirmation_callback: Optional[Callable[[str, dict], bool]] = None,
        default_class: IrreversibilityClass = IrreversibilityClass.CLASS_1,
        audit_log: Optional[list] = None,
    ):
        """
        Args:
            public_key: Principal's Ed25519 public key for HDT verification.
            tool_class_map: Mapping of tool names to IrreversibilityClass.
                Defaults to DEFAULT_TOOL_CLASS_MAP.
            confirmation_callback: Called for Class 2 actions. Receives
                (tool_name, parameters) and returns bool.
                If None, Class 2 actions are blocked.
            default_class: Class assigned to unknown tools. Defaults to CLASS_1.
            audit_log: Optional list to append VerificationResult records to.
        """
        self.public_key = public_key
        # Compare against None so an explicitly empty map is respected.
        self.tool_class_map = (
            tool_class_map if tool_class_map is not None else DEFAULT_TOOL_CLASS_MAP
        )
        self.confirmation_callback = confirmation_callback
        self.default_class = default_class
        self.audit_log = audit_log if audit_log is not None else []

    def classify(self, tool_name: str) -> IrreversibilityClass:
        """Return the IrreversibilityClass for a tool name."""
        return self.tool_class_map.get(tool_name, self.default_class)

    def gate(
        self,
        function_call: dict,
        token: Optional[HDPDelegationToken],
    ) -> VerificationResult:
        """
        Main verification gate. Call this for every Gemma 4 function call.

        Args:
            function_call: Gemma 4 function call dict:
                {"name": "tool_name", "parameters": {...}}
            token: HDPDelegationToken issued by the human principal,
                or None if no token is attached (the call is then blocked).

        Returns:
            VerificationResult. Check .allowed before executing the tool.
        """
        tool_name = function_call.get("name", "")
        parameters = function_call.get("parameters", {})
        action_class = self.classify(tool_name)

        def _block(reason: str) -> VerificationResult:
            result = VerificationResult(
                allowed=False,
                reason=reason,
                tool_name=tool_name,
                action_class=action_class,
                token_iss=token.iss if token else None,
            )
            self.audit_log.append(result)
            print(result)
            return result

        def _allow(reason: str, requires_confirmation: bool = False) -> VerificationResult:
            result = VerificationResult(
                allowed=True,
                reason=reason,
                tool_name=tool_name,
                action_class=action_class,
                token_iss=token.iss,
                requires_confirmation=requires_confirmation,
            )
            self.audit_log.append(result)
            print(result)
            return result

        # ── 1. Token presence ───────────────────────────────────────────────
        if token is None:
            return _block("no HDT present")

        # ── 2. Expiry ───────────────────────────────────────────────────────
        if token.is_expired():
            return _block("HDT expired")

        # ── 3. Signature ────────────────────────────────────────────────────
        if not token.verify(self.public_key):
            return _block("HDT signature invalid")

        # ── 4. Scope ────────────────────────────────────────────────────────
        if not token.permits_tool(tool_name):
            return _block(f"tool '{tool_name}' not in HDT scope")

        # ── 5. Irreversibility class ceiling ────────────────────────────────
        if not token.permits_class(action_class):
            return _block(
                f"action class {action_class.name} exceeds HDT ceiling "
                f"{token.max_irreversibility_class.name}"
            )

        # ── 6. Class 3: always blocked without explicit dual verification ───
        if action_class == IrreversibilityClass.CLASS_3:
            # In production: implement dual-principal confirmation (HDP-P §5.4)
            return _block(
                "Class 3 physical action requires dual-principal confirmation "
                "(HDP-P §5.4) - not implemented in this middleware instance"
            )

        # ── 7. Class 2: confirmation callback required ──────────────────────
        if action_class == IrreversibilityClass.CLASS_2:
            if self.confirmation_callback is None:
                return _block(
                    "Class 2 action requires confirmation callback - "
                    "none configured"
                )
            confirmed = self.confirmation_callback(tool_name, parameters)
            if not confirmed:
                return _block("Class 2 action - confirmation denied by principal")
            return _allow("Class 2 confirmed by principal", requires_confirmation=True)

        # ── 8. Class 0 / 1: allow ───────────────────────────────────────────
        return _allow("HDT valid, scope and class verified")

    def gate_batch(
        self,
        function_calls: list[dict],
        token: Optional[HDPDelegationToken],
    ) -> list[VerificationResult]:
        """Verify a list of function calls. Returns one result per call."""
        return [self.gate(fc, token) for fc in function_calls]
@@ -0,0 +1,925 @@
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-u7xRR3DeFXz"
      },
      "source": [
        "##### Copyright 2026 Google LLC."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "oed1Dh9SeIlD"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "A0UbyyBOeKmV"
      },
      "source": [
        "# RAG with EmbeddingGemma\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemma/cookbook/blob/main/tutorials/RAG_with_EmbeddingGemma.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ND35JUp9ecq2"
      },
      "source": [
        "EmbeddingGemma is a lightweight, open embedding model designed for fast, high-quality retrieval on everyday devices like mobile phones. At only 308 million parameters, it's efficient enough to run advanced AI techniques, such as Retrieval Augmented Generation (RAG), directly on your local machine with no internet connection required.\n",
        "\n",
        "## Setup\n",
        "\n",
        "Before starting this tutorial, complete the following steps:\n",
        "\n",
        "* Get access to EmbeddingGemma by logging into [Hugging Face](https://huggingface.co/google/embeddinggemma-300M) and selecting **Acknowledge license** for a Gemma model.\n",
        "* Select a Colab runtime with sufficient resources to run\n",
        "  the Gemma model size you want to run. [Learn more](https://ai.google.dev/gemma/docs/core#sizes).\n",
        "* Generate a Hugging Face [Access Token](https://huggingface.co/docs/hub/en/security-tokens#how-to-manage-user-access-token) and use it to log in from Colab.\n",
        "\n",
        "This notebook will run on an NVIDIA T4 GPU."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SZ8cw1nPf-NV"
      },
      "source": [
        "### Install Python packages\n",
        "\n",
        "Install the libraries required for running the EmbeddingGemma model and generating embeddings. Sentence Transformers is a Python framework for text and image embeddings. For more information, see the [Sentence Transformers](https://www.sbert.net/) documentation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "daXx6O20Q7M0"
      },
      "outputs": [],
      "source": [
        "!pip install -q -U sentence-transformers transformers"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kYiTsNFSjGJH"
      },
      "source": [
        "After you have accepted the license, you need a valid Hugging Face token to access the model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "eLagJ9aff9Ks"
      },
      "outputs": [],
      "source": [
        "# Log in to the Hugging Face Hub\n",
        "from huggingface_hub import login\n",
        "login()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IiDcW_rmHBfx"
      },
      "source": [
        "### Load language model\n",
        "\n",
        "You will use Gemma 4 E2B to generate responses."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "id": "HX2JFDQI-vg8"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "c0b54b8b91da46fdb7ba8fd3aecb5002",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "config.json: 0.00B [00:00, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "4291694230e74608a2808adde451bd0f",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "model.safetensors: 0%| | 0.00/10.2G [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "cb31547f287441aba370d8e7a5fc351e",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Loading weights: 0%| | 0/1951 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "0900cc228bed472094eb986719edfde4",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "generation_config.json: 0%| | 0.00/208 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "3d195cea1ce044f4827cf06412aed5ec",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "tokenizer_config.json: 0.00B [00:00, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "3bdb49b389aa4abfbb382fccaceb32be",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "tokenizer.json: 0%| | 0.00/32.2M [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "93e44e5dd0fe40d49e0cda367d98aeca",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "chat_template.jinja: 0.00B [00:00, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "# Load Gemma\n",
        "from transformers import pipeline\n",
        "\n",
        "MODEL_ID = \"google/gemma-4-E2B-it\"\n",
        "\n",
        "pipeline = pipeline(\n",
        "    task=\"text-generation\",\n",
        "    model=MODEL_ID,\n",
        "    device_map=\"auto\",\n",
        "    dtype=\"auto\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eAg-c23Wh0th"
      },
      "source": [
        "### Load embedding model\n",
        "\n",
        "Use the `sentence-transformers` library to create an instance of a model class with EmbeddingGemma."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "id": "6Jj1WiTSRRk-"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "2c5dc65f501e402fb5ec67d094d925e7",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "modules.json: 0%| | 0.00/573 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "10b836de41a0410d8963be637ffa6b9d",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "config_sentence_transformers.json: 0%| | 0.00/997 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "68e29095344e4d24ac3898638f5a2b0e",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "README.md: 0.00B [00:00, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "376438be53e14e4b808ce63de0d32cb2",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "sentence_bert_config.json: 0%| | 0.00/58.0 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "7f2a5a56690e4ed5950ad0c278cc20c7",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "config.json: 0.00B [00:00, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "264f0c21602640bd9ddfa9d405b5613f",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "model.safetensors: 0%| | 0.00/1.21G [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "70eb603cffa948cc895046a8238abbae",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Loading weights: 0%| | 0/314 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "aa608efe38f448898f8a01940a3684df",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "tokenizer_config.json: 0.00B [00:00, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "b12b1756d9ac4145ae70595454e0e036",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "tokenizer.json: 0%| | 0.00/33.4M [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "e6e735942c07444ebfcf2702673762b6",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "added_tokens.json: 0%| | 0.00/35.0 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "ff92bb744fd54211b20f04aedebaa26d",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "special_tokens_map.json: 0%| | 0.00/662 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "114a2560d2124889932f1a6436c4d6ef",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "config.json: 0%| | 0.00/312 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "292f471e215d4ac8a490508ce6963b01",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "config.json: 0%| | 0.00/134 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "dce2f7bc57134d0180f3accdec8d5556",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "2_Dense/model.safetensors: 0%| | 0.00/9.44M [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "c245d417dc9f4d71850853a107379b16",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "config.json: 0%| | 0.00/134 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "18def66743ae4738b940a4b20c434545",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "3_Dense/model.safetensors: 0%| | 0.00/9.44M [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Device: cuda:0\n",
            "SentenceTransformer(\n",
            " (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'Gemma3TextModel'})\n",
            " (1): Pooling({'embedding_dimension': 768, 'pooling_mode': 'mean', 'include_prompt': True})\n",
            " (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'module_input_name': 'sentence_embedding', 'module_output_name': 'sentence_embedding'})\n",
            " (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'module_input_name': 'sentence_embedding', 'module_output_name': 'sentence_embedding'})\n",
            " (4): Normalize({})\n",
            ")\n",
            "Total number of parameters in the model: 307581696\n"
          ]
        }
      ],
      "source": [
        "import torch\n",
        "from sentence_transformers import SentenceTransformer\n",
        "\n",
        "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
        "\n",
        "model_id = \"google/embeddinggemma-300M\"\n",
        "model = SentenceTransformer(model_id).to(device=device)\n",
        "\n",
        "print(f\"Device: {model.device}\")\n",
        "print(model)\n",
        "print(\"Total number of parameters in the model:\", sum([p.numel() for _, p in model.named_parameters()]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8o2-nOX-aqRS"
      },
      "source": [
        "### Using Prompts with EmbeddingGemma\n",
        "\n",
        "For RAG systems, use the following `prompt_name` values to create specialized embeddings for your queries and documents:\n",
        "\n",
        "* **For Queries:** Use `prompt_name=\"Retrieval-query\"`.<br>\n",
        "    ```python\n",
        "    query_embedding = model.encode(\n",
        "        \"How do I use prompts with this model?\",\n",
        "        prompt_name=\"Retrieval-query\"\n",
        "    )\n",
        "    ```\n",
        "\n",
        "* **For Documents:** Use `prompt_name=\"Retrieval-document\"`. To further improve document embeddings, you can also include a title by using the `prompt` argument directly:<br>\n",
        "    * **With a title:**<br>\n",
        "    ```python\n",
        "    doc_embedding = model.encode(\n",
        "        \"The document text...\",\n",
        "        prompt=\"title: Using Prompts in RAG | text: \"\n",
        "    )\n",
        "    ```\n",
        "    * **Without a title:**<br>\n",
        "    ```python\n",
        "    doc_embedding = model.encode(\n",
        "        \"The document text...\",\n",
        "        prompt=\"title: none | text: \"\n",
        "    )\n",
        "    ```\n",
        "\n",
        "### Further Reading\n",
        "\n",
        "* For details on all available EmbeddingGemma prompts, see the [model card](http://ai.google.dev/gemma/docs/embeddinggemma/model_card#prompt_instructions).\n",
        "* For general information on prompt templates, see the [Sentence Transformers documentation](https://sbert.net/examples/sentence_transformer/applications/computing-embeddings/README.html#prompt-templates).\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "id": "Y5hVNF3F-qZ7"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Available tasks:\n",
            " query: \"task: search result | query: \"\n",
            " document: \"title: none | text: \"\n",
            " BitextMining: \"task: search result | query: \"\n",
            " Clustering: \"task: clustering | query: \"\n",
            " Classification: \"task: classification | query: \"\n",
            " InstructionRetrieval: \"task: code retrieval | query: \"\n",
            " MultilabelClassification: \"task: classification | query: \"\n",
            " PairClassification: \"task: sentence similarity | query: \"\n",
            " Reranking: \"task: search result | query: \"\n",
            " Retrieval: \"task: search result | query: \"\n",
            " Retrieval-query: \"task: search result | query: \"\n",
            " Retrieval-document: \"title: none | text: \"\n",
            " STS: \"task: sentence similarity | query: \"\n",
            " Summarization: \"task: summarization | query: \"\n"
          ]
        }
      ],
      "source": [
        "print(\"Available tasks:\")\n",
        "for name, prefix in model.prompts.items():\n",
        "    print(f\" {name}: \\\"{prefix}\\\"\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eIfWZ_z3xDZq"
      },
      "source": [
        "## Simple RAG example\n",
        "\n",
        "Retrieval is the task of finding the most relevant pieces of information from a large collection (a database, a set of documents, a website) based on the meaning of a query, not just keywords.\n",
        "\n",
        "Imagine you work for a company, and you need to find information from the internal employee handbook, which is stored as a collection of hundreds of documents."
      ]
    },
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "fbaiy-CXRAs7"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@title Corp knowledge base\n",
|
||||
"corp_knowledge_base = [\n",
|
||||
" {\n",
|
||||
" \"category\": \"HR & Leave Policies\",\n",
|
||||
" \"documents\": [\n",
|
||||
" {\n",
|
||||
" \"title\": \"Procedure for Unscheduled Absence\",\n",
|
||||
" \"content\": \"In the event of an illness or emergency preventing you from working, please notify both your direct manager and the HR department via email by 9:30 AM JST. The subject line should be 'Sick Leave - [Your Name]'. If the absence extends beyond two consecutive days, a doctor's certificate (診断書) will be required upon your return.\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"title\": \"Annual Leave Policy\",\n",
|
||||
" \"content\": \"Full-time employees are granted 10 days of annual paid leave in their first year. This leave is granted six months after the date of joining and increases each year based on length of service. For example, an employee in their third year of service is entitled to 14 days per year. For a detailed breakdown, please refer to the attached 'Annual Leave Accrual Table'.\"\n",
|
||||
" },\n",
|
||||
" ]\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"category\": \"IT & Security\",\n",
|
||||
" \"documents\": [\n",
|
||||
" {\n",
|
||||
" \"title\": \"Account Password Management\",\n",
|
||||
" \"content\": \"If you have forgotten your password or your account is locked, please use the self-service reset portal at https://reset.ourcompany. You will be prompted to answer your pre-configured security questions. For security reasons, the IT Help Desk cannot reset passwords over the phone or email. If you have not set up your security questions, please visit the IT support desk on the 12th floor of the Shibuya office with your employee ID card.\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"title\": \"Software Procurement Process\",\n",
|
||||
" \"content\": \"All requests for new software must be submitted through the 'IT Service Desk' portal under the 'Software Request' category. Please include a business justification for the request. All software licenses require approval from your department head before procurement can begin. Please note that standard productivity software is pre-approved and does not require this process.\"\n",
|
||||
" },\n",
|
||||
" ]\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"category\": \"Finance & Expenses\",\n",
|
||||
" \"documents\": [\n",
|
||||
" {\n",
|
||||
" \"title\": \"Expense Reimbursement Policy\",\n",
|
||||
" \"content\": \"To ensure timely processing, all expense claims for a given month must be submitted for approval no later than the 5th business day of the following month. For example, all expenses incurred in July must be submitted by the 5th business day of August. Submissions after this deadline may be processed in the next payment cycle.\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"title\": \"Business Trip Expense Guidelines\",\n",
|
||||
" \"content\": \"Travel expenses for business trips will, as a rule, be reimbursed based on the actual cost of the most logical and economical route. Please submit a travel expense application in advance when using the Shinkansen or airplanes. Taxis are permitted only when public transportation is unavailable or when transporting heavy equipment. Receipts are mandatory.\"\n",
|
||||
" },\n",
|
||||
" ]\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"category\": \"Office & Facilities\",\n",
|
||||
" \"documents\": [\n",
|
||||
" {\n",
|
||||
" \"title\": \"Conference Room Booking Instructions\",\n",
|
||||
" \"content\": \"All conference rooms in the Shibuya office can be reserved through your Calendar App. Create a new meeting invitation, add the attendees, and then use the 'Room Finder' feature to select an available room. Please be sure to select the correct floor. For meetings with more than 10 people, please book the 'Sakura' or 'Fuji' rooms on the 14th floor.\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"title\": \"Mail and Delivery Policy\",\n",
|
||||
" \"content\": \"The company's mail services are intended for business-related correspondence only. For security and liability reasons, employees are kindly requested to refrain from having personal parcels or mail delivered to the Shibuya office address. The front desk will not be able to accept or hold personal deliveries.\"\n",
|
||||
" },\n",
|
||||
" ]\n",
|
||||
" },\n",
|
||||
"]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "Fvecfoko--hL"
|
||||
},
|
||||
"source": [
|
||||
"And imagine you have a question like below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"id": "wN-WHf26J89m"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question = \"How do I reset my password?\" # @param [\"How many days of annual paid leave do I get?\", \"How do I reset my password?\", \"What travel expenses can be reimbursed for a business trip?\", \"Can I receive personal packages at the office?\"] {type:\"string\", allow-input: true}\n",
|
||||
"\n",
|
||||
"# Define a minimum confidence threshold for a match to be considered valid\n",
|
||||
"similarity_threshold = 0.4 # @param {\"type\":\"slider\",\"min\":0,\"max\":1,\"step\":0.1}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "2CSeSmF7OuMB"
|
||||
},
|
||||
"source": [
|
||||
"Search relevant document from the corporate knowledge base."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"id": "NngqWUxOyrLS"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Step 1: Finding the best category...\n",
|
||||
"['HR & Leave Policies', 'IT & Security', 'Finance & Expenses', 'Office & Facilities']\n",
|
||||
"tensor([[0.5063, 0.5937, 0.5076, 0.4221]])\n",
|
||||
" `-> ✅ Category Found: 'IT & Security' (Score: 0.59)\n",
|
||||
"\n",
|
||||
"Step 2: Finding the best document in that category...\n",
|
||||
"['Account Password Management', 'Software Procurement Process']\n",
|
||||
"tensor([[0.5829, 0.1531]])\n",
|
||||
" `-> ✅ Document Found: 'Account Password Management' (Score: 0.58)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# --- Helper Functions for Semantic Search ---\n",
|
||||
"\n",
|
||||
"def _calculate_best_match(similarities):\n",
|
||||
" print(similarities)\n",
|
||||
" if similarities is None or similarities.nelement() == 0:\n",
|
||||
" return None, 0.0\n",
|
||||
"\n",
|
||||
" # Find the index and value of the highest score\n",
|
||||
" best_index = similarities.argmax().item()\n",
|
||||
" best_score = similarities[0, best_index].item()\n",
|
||||
"\n",
|
||||
" return best_index, best_score\n",
|
||||
"\n",
|
||||
"def find_best_category(model, query, candidates):\n",
|
||||
" \"\"\"\n",
|
||||
" Finds the most relevant category from a list of candidates.\n",
|
||||
"\n",
|
||||
" Args:\n",
|
||||
" model: The SentenceTransformer model.\n",
|
||||
" query: The user's query string.\n",
|
||||
" candidates: A list of category name strings.\n",
|
||||
"\n",
|
||||
" Returns:\n",
|
||||
" A tuple containing the index of the best category and its similarity score.\n",
|
||||
" \"\"\"\n",
|
||||
" if not candidates:\n",
|
||||
" return None, 0.0\n",
|
||||
"\n",
|
||||
" # Encode the query and candidate categories for classification\n",
|
||||
" query_embedding = model.encode(query, prompt_name=\"Classification\")\n",
|
||||
" candidate_embeddings = model.encode(candidates, prompt_name=\"Classification\")\n",
|
||||
"\n",
|
||||
" print(candidates)\n",
|
||||
" return _calculate_best_match(model.similarity(query_embedding, candidate_embeddings))\n",
|
||||
"\n",
|
||||
"def find_best_doc(model, query, candidates):\n",
|
||||
" \"\"\"\n",
|
||||
" Finds the most relevant document from a list of candidates.\n",
|
||||
"\n",
|
||||
" Args:\n",
|
||||
" model: The SentenceTransformer model.\n",
|
||||
" query: The user's query string.\n",
|
||||
" candidates: A list of document dictionaries, each with 'title' and 'content'.\n",
|
||||
"\n",
|
||||
" Returns:\n",
|
||||
" A tuple containing the index of the best document and its similarity score.\n",
|
||||
" \"\"\"\n",
|
||||
" if not candidates:\n",
|
||||
" return None, 0.0\n",
|
||||
"\n",
|
||||
" # Encode the query for retrieval\n",
|
||||
" query_embedding = model.encode(query, prompt_name=\"Retrieval-query\")\n",
|
||||
"\n",
|
||||
" # Encode the document for similarity check\n",
|
||||
" doc_texts = [\n",
|
||||
" f\"title: {doc.get('title', 'none')} | text: {doc.get('content', '')}\"\n",
|
||||
" for doc in candidates\n",
|
||||
" ]\n",
|
||||
" candidate_embeddings = model.encode(doc_texts)\n",
|
||||
"\n",
|
||||
" print([doc['title'] for doc in candidates])\n",
|
||||
"\n",
|
||||
" # Calculate cosine similarity\n",
|
||||
" return _calculate_best_match(model.similarity(query_embedding, candidate_embeddings))\n",
|
||||
"\n",
|
||||
"# --- Main Search Logic ---\n",
|
||||
"\n",
|
||||
"# In your application, `best_document` would result from a search.\n",
|
||||
"# We initialize it to None to ensure it always exists.\n",
|
||||
"best_document = None\n",
|
||||
"\n",
|
||||
"# 1. Find the most relevant category\n",
|
||||
"print(\"Step 1: Finding the best category...\")\n",
|
||||
"categories = [item[\"category\"] for item in corp_knowledge_base]\n",
|
||||
"best_category_index, category_score = find_best_category(\n",
|
||||
" model, question, categories\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Check if the category score meets the threshold\n",
|
||||
"if category_score < similarity_threshold:\n",
|
||||
" print(f\" `-> 🤷 No relevant category found. The highest score was only {category_score:.2f}.\")\n",
|
||||
"else:\n",
|
||||
" best_category = corp_knowledge_base[best_category_index]\n",
|
||||
" print(f\" `-> ✅ Category Found: '{best_category['category']}' (Score: {category_score:.2f})\")\n",
|
||||
"\n",
|
||||
" # 2. Find the most relevant document ONLY if a good category was found\n",
|
||||
" print(\"\\nStep 2: Finding the best document in that category...\")\n",
|
||||
" best_document_index, document_score = find_best_doc(\n",
|
||||
" model, question, best_category[\"documents\"]\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" # Check if the document score meets the threshold\n",
|
||||
" if document_score < similarity_threshold:\n",
|
||||
" print(f\" `-> 🤷 No relevant document found. The highest score was only {document_score:.2f}.\")\n",
|
||||
" else:\n",
|
||||
" best_document = best_category[\"documents\"][best_document_index]\n",
|
||||
" # 3. Display the final successful result\n",
|
||||
" print(f\" `-> ✅ Document Found: '{best_document['title']}' (Score: {document_score:.2f})\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "zK9T5rRGAMDw"
|
||||
},
|
||||
"source": [
|
||||
"Next, generate the answer with the retrieved context"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"id": "FrwKySpMASpt"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Question🙋♂️: How do I reset my password?\n",
|
||||
"Using document: Account Password Management\n",
|
||||
"Answer🤖: Please use the self-service reset portal at https://reset.ourcompany. You will be prompted to answer your pre-configured security questions.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from transformers import GenerationConfig\n",
|
||||
"MODEL_ID = \"google/gemma-4-E2B-it\"\n",
|
||||
"config = GenerationConfig.from_pretrained(MODEL_ID)\n",
|
||||
"config.max_new_tokens = 512\n",
|
||||
"\n",
|
||||
"qa_prompt_template = \"\"\"Answer the following QUESTION based only on the CONTEXT provided. If the answer cannot be found in the CONTEXT, write \"I don't know.\"\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"CONTEXT:\n",
|
||||
"{context}\n",
|
||||
"---\n",
|
||||
"QUESTION:\n",
|
||||
"{question}\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"# First, check if a valid document was found before proceeding.\n",
|
||||
"if best_document and \"content\" in best_document:\n",
|
||||
" # If the document exists and has a \"content\" key, generate the answer.\n",
|
||||
" context = best_document[\"content\"]\n",
|
||||
"\n",
|
||||
" prompt = qa_prompt_template.format(context=context, question=question)\n",
|
||||
"\n",
|
||||
" messages = [\n",
|
||||
" {\n",
|
||||
" \"role\": \"user\",\n",
|
||||
" \"content\": [{\"type\": \"text\", \"text\": prompt}],\n",
|
||||
" },\n",
|
||||
" ]\n",
|
||||
"\n",
|
||||
" print(\"Question🙋♂️: \" + question)\n",
|
||||
" # This part assumes your pipeline and response parsing logic are correct\n",
|
||||
" answer = pipeline(messages, generation_config=config)[0][\"generated_text\"][1][\"content\"]\n",
|
||||
" print(\"Using document: \" + best_document[\"title\"])\n",
|
||||
" print(\"Answer🤖: \" + answer)\n",
|
||||
"\n",
|
||||
"else:\n",
|
||||
" # If best_document is None or doesn't have content, give a direct response.\n",
|
||||
" print(\"Question🙋♂️: \" + question)\n",
|
||||
" print(\"Answer🤖: I'm sorry, I could not find a relevant document to answer that question.\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "h4J4pFA3IK1d"
|
||||
},
|
||||
"source": [
|
||||
"## Summary and next steps\n",
|
||||
"\n",
|
||||
"You have now learned how to build a practical RAG system with EmbeddingGemma.\n",
|
||||
"\n",
|
||||
"Explore what more you can do with EmbeddingGemma:\n",
|
||||
"\n",
|
||||
"* [Generate embeddings with Sentence Transformers](https://ai.google.dev/gemma/docs/embeddinggemma/inference-embeddinggemma-with-sentence-transformers)\n",
|
||||
"* [Fine-tune EmbeddingGemma](https://ai.google.dev/gemma/docs/embeddinggemma/fine-tuning-embeddinggemma-with-sentence-transformers)\n",
|
||||
"* [Mood Palette Generator](https://huggingface.co/spaces/google/mood-palette), an interactive application using EmbeddingGemma"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"accelerator": "GPU",
|
||||
"colab": {
|
||||
"name": "RAG_with_EmbeddingGemma.ipynb",
|
||||
"toc_visible": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"name": "python3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
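The notebook's search cell picks a category first, then a document inside that category, and gives up at either stage when the best score falls below `similarity_threshold`. The sketch below isolates that control flow so it can be run without downloading a model. It is an illustration only, not the notebook's code: the 3-dimensional vectors are made-up stand-ins for real embeddings, and `cosine`, `best_match`, `two_stage_search`, and `toy_knowledge_base` are hypothetical names introduced here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query_vec, candidate_vecs, threshold):
    """Return (index, score) of the top candidate; index is None if score < threshold."""
    if not candidate_vecs:
        return None, 0.0
    scores = [cosine(query_vec, vec) for vec in candidate_vecs]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    best_score = scores[best_index]
    return (best_index if best_score >= threshold else None), best_score


def two_stage_search(query_vec, knowledge_base, threshold=0.4):
    """Pick a category first, then a document inside it; return its title or None."""
    category_index, _ = best_match(
        query_vec, [cat["vec"] for cat in knowledge_base], threshold
    )
    if category_index is None:
        return None
    documents = knowledge_base[category_index]["documents"]
    doc_index, _ = best_match(query_vec, [doc["vec"] for doc in documents], threshold)
    return None if doc_index is None else documents[doc_index]["title"]


# Hypothetical 3-dimensional vectors standing in for real embeddings.
toy_knowledge_base = [
    {
        "category": "HR & Leave Policies",
        "vec": [1.0, 0.0, 0.0],
        "documents": [{"title": "Annual Leave Policy", "vec": [0.9, 0.1, 0.0]}],
    },
    {
        "category": "IT & Security",
        "vec": [0.0, 1.0, 0.0],
        "documents": [
            {"title": "Account Password Management", "vec": [0.0, 0.9, 0.2]},
            {"title": "Software Procurement Process", "vec": [0.3, 0.2, 0.9]},
        ],
    },
]

password_query = [0.1, 0.9, 0.1]   # points mostly along the "IT" axis
unrelated_query = [0.0, 0.0, 1.0]  # orthogonal to both category vectors

print(two_stage_search(password_query, toy_knowledge_base))   # Account Password Management
print(two_stage_search(unrelated_query, toy_knowledge_base))  # None
```

Gating on the category score before looking at documents means an off-topic question is rejected early instead of being forced onto the least-bad document. The trade-off is that a wrong category choice cannot be recovered in the second stage.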