Gemma 3
Google's multimodal open model: image + text understanding with 140+ language support.
General Info
- Publisher: Google
- Release Date: March 2025
- Parameters: 1B, 4B, 12B, 27B
- Context Window: 128K tokens (4B+), 32K tokens (1B)
- License: Gemma Terms of Use
- Key Capabilities: Multimodal (Image + Text), 140+ Languages, Function Calling
Gemma 3 brings Google's Gemini research to the open-source community with native multimodality. The 27B model runs on a single consumer GPU (RTX 3090) while outperforming many larger models. Its custom SigLIP vision encoder enables sophisticated image understanding, from document analysis to visual Q&A.
Hello World Guide
Run Gemma 3 with the Hugging Face transformers library (a recent version with Gemma 3 support is required).
Python
from transformers import pipeline
import torch

# Load the instruction-tuned 27B model; bfloat16 roughly halves memory vs. float32.
pipe = pipeline(
    "text-generation",
    model="google/gemma-3-27b-it",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",  # place layers across available GPUs automatically
)

messages = [
    {"role": "user", "content": "Explain how neural networks learn in simple terms."},
]

outputs = pipe(messages, max_new_tokens=256)

# The pipeline returns the full chat history; the last message is the model's reply.
print(outputs[0]["generated_text"][-1]["content"])
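Because Gemma 3 is natively multimodal, the same chat format also carries image inputs. The sketch below shows the content-part message structure used by Hugging Face chat templates for vision models; `image_question` is a hypothetical helper written for this example, and the URL is a placeholder.

```python
def image_question(image_url: str, question: str) -> list[dict]:
    """Build a multimodal chat message: one image part followed by a text part.

    Hypothetical helper; the content-part schema ("type"/"image"/"text")
    follows the Hugging Face chat-template convention for vision models.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = image_question(
    "https://example.com/chart.png",  # placeholder image URL
    "What trend does this chart show?",
)
# A message list like this can be passed to an "image-text-to-text" pipeline,
# analogous to the text-only example above.
```

The nested content list (rather than a plain string) is what lets one user turn interleave images and text.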
Industry Usage
Visual Document Analysis
Multimodal capabilities enable extraction of data from images, charts, and scanned documents.
Consumer Hardware Deployment
Quantized versions run locally on a single RTX 3090, enabling privacy-preserving applications that never send data off-device.
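Back-of-the-envelope arithmetic shows why quantization matters for a 24 GB card like the RTX 3090. This is a sketch that counts weight memory only, ignoring activations, the KV cache, and runtime overhead:

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

# 27B parameters at different precisions:
bf16 = weight_memory_gb(27, 16)  # 54.0 GB -> needs multiple GPUs
int8 = weight_memory_gb(27, 8)   # 27.0 GB -> still over 24 GB
int4 = weight_memory_gb(27, 4)   # 13.5 GB -> weights fit on an RTX 3090
```

In practice the runtime adds several GB on top of the weights, which is why 4-bit (rather than 8-bit) quantization is the usual choice for the 27B model on a single consumer GPU.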
Academic Research
Open weights and strong benchmarks make it a go-to baseline for AI research projects.