Qwen3.5-397B-A17B is the native vision-language model of the Qwen3.5 series. It is built on a hybrid architecture that combines linear attention with a sparse mixture-of-experts (MoE) design, delivering improved inference efficiency. The model achieves state-of-the-art results, comparable to top-tier models, across a broad range of tasks, including language understanding, logical reasoning, code generation, agent-based tasks, image understanding, video understanding, and graphical user interface (GUI) interaction. Thanks to its strong code-generation and agentic capabilities, it generalizes well across a wide variety of agent scenarios.
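To make the two architectural ingredients concrete, here is a minimal, purely illustrative sketch (not the actual Qwen3.5 implementation; all shapes, the feature map, the expert count, and the routing scheme are hypothetical) of why linear attention is cheap and how sparse top-k MoE routing works:

```python
# Illustrative sketch only: a toy hybrid block combining linear attention
# (O(n) in sequence length) with a sparse top-k mixture-of-experts FFN.
# None of these choices are taken from Qwen3.5 itself.
import numpy as np

def linear_attention(q, k, v):
    """Linear attention: phi(q) @ (phi(k)^T v) instead of softmax(q k^T) v.

    With a non-negative feature map phi, the (d x d) state phi(k)^T v is
    accumulated once, so cost is O(n * d^2) rather than O(n^2 * d).
    """
    phi = lambda x: np.maximum(x, 0.0) + 1e-6   # simple non-negative feature map
    q, k = phi(q), phi(k)
    kv = k.T @ v                                # (d, d) summary state
    z = k.sum(axis=0)                           # per-feature normalizer
    return (q @ kv) / (q @ z)[:, None]

def moe_ffn(x, experts, router_w, top_k=2):
    """Sparse MoE: route each token to its top-k experts, mix their outputs."""
    logits = x @ router_w                       # (n, num_experts)
    out = np.zeros_like(x)
    for i, row in enumerate(logits):
        idx = np.argsort(row)[-top_k:]          # experts chosen for this token
        gate = np.exp(row[idx]) / np.exp(row[idx]).sum()
        for g, e in zip(gate, idx):
            w1, w2 = experts[e]                 # only top_k experts run
            out[i] = out[i] + g * (np.maximum(x[i] @ w1, 0.0) @ w2)
    return out

rng = np.random.default_rng(0)
n, d, num_experts = 8, 16, 4
x = rng.normal(size=(n, d))
attn_out = linear_attention(x, x, x)            # toy self-attention
experts = [(rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d)))
           for _ in range(num_experts)]
router_w = rng.normal(size=(d, num_experts))
y = attn_out + moe_ffn(attn_out, experts, router_w)  # residual hybrid block
print(y.shape)
```

Only `top_k` of the experts run per token, which is why a very large total parameter count (e.g. 397B) can coexist with a much smaller active parameter count (e.g. 17B) at inference time.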
Pricing: $0.30 / $1.30