
Gemma 3

Google's multimodal open model: image + text understanding with 140+ language support.

Generic Info

  • Publisher: Google
  • Release Date: March 2025
  • Parameters: 1B, 4B, 12B, 27B
  • Context Window: 128K tokens (4B+), 32K tokens (1B)
  • License: Gemma Terms of Use
  • Key Capabilities: Multimodal (Image + Text), 140+ Languages, Function Calling
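
Gemma 3 exposes function calling through structured prompting rather than a dedicated API field. The sketch below illustrates that pattern with a hypothetical tool schema and a simulated model reply; the JSON format shown is our own illustration, not an official Gemma convention.

```python
import json
import re

# Hypothetical tool schema embedded in the prompt (illustrative only --
# not an official Gemma 3 format).
TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {"city": {"type": "string"}},
}

def build_prompt(user_query: str) -> str:
    """Prompt the model to answer with a JSON tool call when appropriate."""
    return (
        "You can call this tool by replying with JSON "
        '{"tool": ..., "arguments": ...}:\n'
        f"{json.dumps(TOOL)}\n\nUser: {user_query}"
    )

def parse_tool_call(model_output: str):
    """Extract the first JSON object from the model's reply, if any."""
    match = re.search(r"\{.*\}", model_output, re.DOTALL)
    return json.loads(match.group()) if match else None

# Simulated model reply, for demonstration purposes only.
reply = '{"tool": "get_weather", "arguments": {"city": "Berlin"}}'
call = parse_tool_call(reply)
print(call)  # {'tool': 'get_weather', 'arguments': {'city': 'Berlin'}}
```

In production you would send `build_prompt(...)` through the chat template, parse the reply, execute the named function, and feed the result back as a follow-up message.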

Gemma 3 brings Google's Gemini research to the open-weights community with native multimodality. The 27B model fits on a single consumer GPU such as the RTX 3090 when quantized, while outperforming many larger models. Its SigLIP vision encoder enables sophisticated image understanding, from document analysis to visual Q&A.
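
A back-of-the-envelope weight-memory estimate shows why quantization matters for fitting the 27B model on a 24 GiB card. Activations and the KV cache are ignored here, so real usage is higher than these lower bounds.

```python
# Rough VRAM needed just for the 27B model's weights at different precisions.
# Activation memory and KV cache are NOT included; treat these as lower bounds.
PARAMS = 27e9
GIB = 1024**3

def weight_gib(bytes_per_param: float) -> float:
    return PARAMS * bytes_per_param / GIB

bf16 = weight_gib(2.0)   # ~50 GiB: far too large for a 24 GiB RTX 3090
int8 = weight_gib(1.0)   # ~25 GiB: still slightly over
int4 = weight_gib(0.5)   # ~12.6 GiB: fits, with headroom for the KV cache

print(f"bf16: {bf16:.1f} GiB, int8: {int8:.1f} GiB, int4: {int4:.1f} GiB")
```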

Hello World Guide

Run the instruction-tuned Gemma 3 27B model with Hugging Face transformers. The model is gated: accept the Gemma Terms of Use on the Hub and authenticate before downloading.

Python
from transformers import pipeline
import torch

# device_map="auto" requires the accelerate package.
pipe = pipeline(
    "text-generation",
    model="google/gemma-3-27b-it",
    model_kwargs={"torch_dtype": torch.bfloat16},  # bf16 halves memory vs fp32
    device_map="auto",  # spread weights across available GPUs/CPU
)

messages = [
    {"role": "user", "content": "Explain how neural networks learn in simple terms."},
]

outputs = pipe(
    messages,
    max_new_tokens=256,  # cap the length of the generated reply
)

# The pipeline returns the full conversation; the last message is the
# assistant's reply.
print(outputs[0]["generated_text"][-1])

Industry Usage

Visual Document Analysis

Multimodal capabilities enable extraction of data from images, charts, and scanned documents.
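
With transformers, an image is passed through the same chat-message structure as text, using an `image-text-to-text` pipeline. A minimal sketch of the message format; the image URL and the question are placeholders of our own.

```python
# Multimodal chat message in the format transformers expects for
# image-text-to-text pipelines. The URL below is a placeholder.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/invoice.png"},
            {"type": "text", "text": "List every line item and its total."},
        ],
    }
]

# With the model downloaded and a GPU available, this would run as:
# from transformers import pipeline
# pipe = pipeline("image-text-to-text", model="google/gemma-3-27b-it",
#                 device_map="auto")
# out = pipe(text=messages, max_new_tokens=256)
# print(out[0]["generated_text"][-1]["content"])
```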

Consumer Hardware Deployment

Quantized builds (e.g., 4-bit) run locally on a single RTX 3090, keeping data on-device for privacy-sensitive applications.

Academic Research

Open weights and strong benchmarks make it a go-to baseline for AI research projects.