Grok 2 Vision

grok-2-vision-1212
by x-ai|Created May 26, 2025

xAI’s Grok 2 Vision 1212 is a next-generation vision-language model designed for advanced image-based AI applications. It features robust visual comprehension, refined instruction-following, and strong multilingual support. The model excels at object recognition, style analysis, and visual reasoning, empowering developers to build intuitive, visually aware applications. Enhanced steerability and reasoning capabilities make it a solid foundation for next-generation image solutions. For more details, see the official [xAI announcement](https://x.ai/blog/grok-1212).

Pricing

Pay-as-you-go rates for this model. More details can be found here.

Input Tokens (1M)

$1.00

Output Tokens (1M)

$5.00

Capabilities

Input Modalities

Text
Image

Output Modalities

Text

Usage Analytics

Token usage across the last 30 active days

Throughput

Time-To-First-Token (TTFT)