# Source: vllm-project/vllm main branch, vllm/model_executor/models/registry.py
# Verified 2026-04-18 via GitHub API.
#
# Line 99 (text-only Gemma 4 CausalLM):
#     "Gemma4ForCausalLM": ("gemma4", "Gemma4ForCausalLM"),
#
# Line 230 (multimodal Gemma 4: vision + audio + video):
#     "Gemma4ForCausalLM": ("gemma4_mm", "Gemma4ForConditionalGeneration"),
#
# The second (_mm) registration maps Gemma4ForCausalLM -> gemma4_mm.Gemma4ForConditionalGeneration,
# which wires in:
#   - vision_tower (pixel_values, pixel_position_ids)
#   - audio_tower (input_features_padded, input_features_mask) [E2B/E4B only]
#   - video path (pixel_values_videos, decomposed to frames, up to 32 frames @ 70 soft tokens)
#
# vLLM dispatches based on whether the HF config has audio_config populated.
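#
# A minimal sketch of the dispatch rule described above, not vLLM's actual code.
# The table names (TEXT_MODELS, MULTIMODAL_MODELS) and the resolve() helper are
# hypothetical; only the two (module, class) tuples come from the registry lines
# quoted above. It assumes the HF config can be read as a plain mapping and that
# a populated audio_config is what selects the multimodal entry.

```python
from typing import Any, Dict, Mapping, Tuple

# Hypothetical tables mirroring the two registrations quoted above.
# Keeping them separate matters: in a single dict literal, the second
# "Gemma4ForCausalLM" key would silently overwrite the first.
TEXT_MODELS: Dict[str, Tuple[str, str]] = {
    "Gemma4ForCausalLM": ("gemma4", "Gemma4ForCausalLM"),
}
MULTIMODAL_MODELS: Dict[str, Tuple[str, str]] = {
    "Gemma4ForCausalLM": ("gemma4_mm", "Gemma4ForConditionalGeneration"),
}


def resolve(arch: str, hf_config: Mapping[str, Any]) -> Tuple[str, str]:
    """Return (module, class) for an architecture name.

    Assumption: a non-empty audio_config selects the multimodal entry;
    anything else falls back to the text-only entry.
    """
    if hf_config.get("audio_config") and arch in MULTIMODAL_MODELS:
        return MULTIMODAL_MODELS[arch]
    return TEXT_MODELS[arch]
```

# Usage: resolve("Gemma4ForCausalLM", {"audio_config": {...}}) picks the gemma4_mm
# entry, while a config with no audio_config (or an empty one) picks gemma4.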