Grok 2 Vision
grok-2-vision-1212
by x-ai|Created May 26, 2025
xAI’s Grok 2 Vision 1212 is a next-generation vision-language model designed for advanced image-based AI applications. It features robust visual comprehension, refined instruction-following, and strong multilingual support. The model excels at object recognition, style analysis, and visual reasoning, empowering developers to build intuitive, visually aware applications. Enhanced steerability and reasoning capabilities make it a solid foundation for next-generation image solutions. For more details, see the official [xAI announcement](https://x.ai/blog/grok-1212).
Pricing
Pay-as-you-go rates for this model. More details can be found here.
Input Tokens (1M)
$1.00
Output Tokens (1M)
$5.00
Capabilities
Input Modalities
Text
Image
Output Modalities
Text
Usage Analytics
Token usage across the last 30 active days