This page collects the public integration surface for the model: supported endpoints, available request parameters, and example calls through the NagaAI API.
Chat Completions

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model developed by Google DeepMind. It has 25.2B parameters in total, but only 3.8B are activated per token during inference, delivering performance close to that of a 31B model at a much lower computational cost. It supports multimodal inputs: text, images, and video (up to 60 seconds at 1 fps). Key features include a 256K-token context window, native function calling, an adjustable thinking/reasoning mode, and support for structured outputs. The model is released under the Apache 2.0 license.
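A minimal sketch of a chat-completions request body for this model. The endpoint URL, header names, and model identifier below are assumptions for illustration (NagaAI exposes an OpenAI-compatible surface, but this page does not confirm the exact values); substitute the values from your NagaAI dashboard.

```python
import json

# Assumed endpoint and model identifier -- verify against the NagaAI docs.
BASE_URL = "https://api.naga.ac/v1/chat/completions"  # assumption
MODEL_ID = "gemma-4-26b-a4b-it"                       # assumption

# OpenAI-compatible chat-completions payload: a list of role/content messages.
payload = {
    "model": MODEL_ID,
    "messages": [
        {"role": "user", "content": "Summarize MoE routing in one sentence."}
    ],
    "max_tokens": 128,
}

body = json.dumps(payload)

# To actually send the request (requires an API key; network call shown
# commented out so the snippet stays self-contained):
#
# import urllib.request
# req = urllib.request.Request(
#     BASE_URL,
#     data=body.encode("utf-8"),
#     headers={
#         "Authorization": "Bearer <API_KEY>",
#         "Content-Type": "application/json",
#     },
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))

print(body)
```

The same payload shape extends to the model's other capabilities: adding a `tools` array enables function calling, and a `response_format` field requests structured outputs, following the usual OpenAI-compatible conventions (again, confirm field names against the NagaAI reference).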