Gemma 3
Google's multimodal open model: image + text understanding with 140+ language support.
General Info
- Publisher: Google
- Release Date: March 2025
- Parameters: 1B, 4B, 12B, 27B
- Context Window: 128K tokens (4B+), 32K tokens (1B)
- License: Gemma Terms of Use
- Key Capabilities: Multimodal (Image + Text), 140+ Languages, Function Calling
Gemma 3 brings Google's Gemini research to the open-source community with native multimodality. The 27B model runs on a single consumer GPU (RTX 3090) while outperforming many larger models. Its custom SigLIP vision encoder enables sophisticated image understanding, from document analysis to visual Q&A.
Hello World Guide
Run Gemma 3 with the Hugging Face transformers library (a recent version with Gemma 3 support is required).
Python
from transformers import pipeline
import torch

# Load the instruction-tuned 27B model; bfloat16 roughly halves memory vs. float32.
pipe = pipeline(
    "text-generation",
    model="google/gemma-3-27b-it",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",  # place layers across available GPUs automatically
)

messages = [
    {"role": "user", "content": "Explain how neural networks learn in simple terms."},
]

outputs = pipe(messages, max_new_tokens=256)

# The pipeline returns the full chat history; the last message is the model's reply.
print(outputs[0]["generated_text"][-1]["content"])
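Because Gemma 3 is natively multimodal, the same chat format also carries image inputs. The sketch below shows the content-part message structure used by Hugging Face chat templates for vision models; `image_question` is a hypothetical helper written for this example, and the URL is a placeholder.

```python
def image_question(image_url: str, question: str) -> list[dict]:
    """Build a multimodal chat message: one image part followed by a text part.

    Hypothetical helper; the content-part schema ("type"/"image"/"text")
    follows the Hugging Face chat-template convention for vision models.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = image_question(
    "https://example.com/chart.png",  # placeholder image URL
    "What trend does this chart show?",
)
# A message list like this can be passed to an "image-text-to-text" pipeline,
# analogous to the text-only example above.
```

The nested content list (rather than a plain string) is what lets one user turn interleave images and text.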
Industry Usage
Visual Document Analysis
Multimodal capabilities enable extraction of data from images, charts, and scanned documents.
Consumer Hardware Deployment
Quantized versions run locally on a single RTX 3090, enabling privacy-preserving applications that never send data off-device.
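Back-of-the-envelope arithmetic shows why quantization matters for a 24 GB card like the RTX 3090. This is a sketch that counts weight memory only, ignoring activations, the KV cache, and runtime overhead:

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

# 27B parameters at different precisions:
bf16 = weight_memory_gb(27, 16)  # 54.0 GB -> needs multiple GPUs
int8 = weight_memory_gb(27, 8)   # 27.0 GB -> still over 24 GB
int4 = weight_memory_gb(27, 4)   # 13.5 GB -> weights fit on an RTX 3090
```

In practice the runtime adds several GB on top of the weights, which is why 4-bit (rather than 8-bit) quantization is the usual choice for the 27B model on a single consumer GPU.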
Academic Research
Open weights and strong benchmarks make it a go-to baseline for AI research projects.