Llama 3.2 11B Vision Instruct — AI Model Comparison | NagaAI
Review Llama 3.2 11B Vision Instruct on key metrics including price, context length, throughput, and model features.
Author: Meta Llama
Context Length: 131.1k
Supports Tools
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed for tasks combining visual and textual data. It excels at image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it is ideal for content creation, AI-driven customer service, and research.
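Since the model is typically reached through an OpenAI-compatible chat completions endpoint, a visual question answering request combines text and an image reference in a single user message. Below is a minimal sketch of that payload; the helper name `build_vision_messages` is illustrative, and the exact model id and endpoint used to send it depend on the provider.

```python
def build_vision_messages(question: str, image_url: str) -> list[dict]:
    """Build a chat message that pairs a text question with an image URL,
    following the common multimodal content-part format."""
    return [
        {
            "role": "user",
            "content": [
                # Text part: the question about the image
                {"type": "text", "text": question},
                # Image part: a URL the endpoint can fetch
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]

messages = build_vision_messages(
    "What is shown in this image?",
    "https://example.com/photo.jpg",
)
print(messages[0]["content"][0]["text"])  # → What is shown in this image?
```

These messages would then be passed as the `messages` argument of a chat completions call, with the model name set to whatever identifier the provider assigns to Llama 3.2 11B Vision Instruct.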