Qwen3 VL 235B A22B Instruct
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that combines strong text generation with advanced visual understanding of images and video. It is designed for general vision-language tasks such as visual question answering (VQA), document parsing, chart and table extraction, and multilingual OCR. The model emphasizes robust perception, spatial (2D/3D) understanding, and long-form visual comprehension, with competitive results on public benchmarks. Qwen3-VL also supports agentic interaction and tool use: it can follow complex instructions in multi-image dialogues, align text to video timelines, operate GUIs for automation, and support visual coding workflows such as turning sketches into code or debugging UIs. Its text-only capabilities are on par with the Qwen3 language models, making it suitable for document AI, OCR, UI/software assistance, spatial reasoning, and vision-language agent research.
Pricing
Pay-as-you-go rates for this model, per 1 million tokens.
Input Tokens (1M): $0.35
Output Tokens (1M): $1.40
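As a rough illustration of how these rates translate into per-request cost, the sketch below multiplies token counts by the listed prices. It assumes billing is strictly linear in tokens with no minimums or rounding, and it does not say how images or video are converted into input tokens, since the page does not state this. The function name `estimate_cost_usd` is hypothetical, not part of any SDK.

```python
# Rates copied from the pricing listed above (USD per 1M tokens).
INPUT_USD_PER_MILLION = 0.35
OUTPUT_USD_PER_MILLION = 1.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: cost is linear in token counts, with no minimum
    charge or rounding applied by the provider.
    """
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Example: a 2,000-token prompt with a 500-token reply.
print(f"${estimate_cost_usd(2_000, 500):.6f}")
```

Token counts for a real request can usually be read from the `usage` field of the API response, if the provider returns one.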
Code Example
Example code for using this model through our API with the Python OpenAI SDK. Replace YOUR_API_KEY with your own API key.
The basic request below sends a single text prompt. Make sure your API key has permission to use this model. For more details, see our documentation.
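from openai import OpenAI

# Point the OpenAI SDK at the API's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",  # replace with your own key
)

# Send a single-turn chat request to the model.
resp = client.chat.completions.create(
    model="qwen3-vl-235b-a22b-instruct",
    messages=[
        {"role": "user", "content": "What's 2+2?"}
    ],
    temperature=0.2,  # low temperature for more deterministic output
)

# Print the text of the first returned choice.
print(resp.choices[0].message.content)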
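Because this is a vision-language model, a request will often include an image alongside the text. The sketch below shows one way this might look, assuming the endpoint accepts the OpenAI-style `image_url` content part with a base64 data URL; this page does not confirm that, so check the documentation before relying on it. The helper names `build_image_messages` and `ask_about_image` are hypothetical and exist only for illustration.

```python
import base64


def build_image_messages(image_bytes: bytes, question: str,
                         mime_type: str = "image/png") -> list:
    """Build a chat message list containing one text part and one image part.

    Assumption: the endpoint accepts OpenAI-style "image_url" content
    parts carrying a base64 data URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]


def ask_about_image(client, image_path: str, question: str) -> str:
    """Send a local image and a question using an existing OpenAI client."""
    with open(image_path, "rb") as f:
        messages = build_image_messages(f.read(), question)
    resp = client.chat.completions.create(
        model="qwen3-vl-235b-a22b-instruct",
        messages=messages,
    )
    return resp.choices[0].message.content
```

With the `client` created in the basic example, a call such as `ask_about_image(client, "chart.png", "Summarize this chart.")` would send the file as a single user turn; `chart.png` is a placeholder path.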