Gemini 3.1 Flash Lite Preview vs Llama 3.2 1B Instruct — AI Model Comparison | NagaAI
Compare Gemini 3.1 Flash Lite Preview and Llama 3.2 1B Instruct on key metrics including price, context length, throughput, and other model features.
Author: Google
Context Length: 1.0M
Supports Tools: Yes
Gemini 3.1 Flash Lite Preview is Google’s high-efficiency model designed for high-volume, high-throughput use cases. It delivers better overall quality than Gemini 2.5 Flash Lite and comes close to Gemini 2.5 Flash across core capabilities, with improvements in audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. It supports the full range of thinking levels (minimal, low, medium, high), enabling fine-grained cost/performance tuning. Pricing is set at half the cost of Gemini 3 Flash.
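To illustrate the cost/performance tuning mentioned above, here is a minimal sketch of building a chat request that selects a thinking level. The model ID string and the `thinking_level` field name are assumptions for illustration and are not confirmed by this page; check the provider's API reference for the exact parameter.

```python
import json

# Sketch: build a chat-completions request body with a selectable thinking level.
# "gemini-3.1-flash-lite-preview" and the "thinking_level" key are assumed names.
VALID_LEVELS = {"minimal", "low", "medium", "high"}

def build_request(prompt: str, thinking_level: str = "minimal") -> dict:
    """Return a request payload; thinking_level trades cost for reasoning depth."""
    if thinking_level not in VALID_LEVELS:
        raise ValueError(f"thinking_level must be one of {sorted(VALID_LEVELS)}")
    return {
        "model": "gemini-3.1-flash-lite-preview",           # assumed model ID
        "messages": [{"role": "user", "content": prompt}],
        "thinking_level": thinking_level,                    # assumed field name
    }

# Low thinking keeps latency and cost down for simple, high-volume tasks.
payload = build_request("Summarize this support ticket.", thinking_level="low")
print(json.dumps(payload, indent=2))
```

In practice you would pick `minimal` or `low` for bulk tasks like extraction and snippet ranking, and reserve `high` for requests where answer quality justifies the extra cost.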
Llama 3.2 1B Instruct is Meta's 1-billion-parameter language model focused on efficient natural language tasks, including summarization, dialogue, and multilingual text analysis. Its small size allows deployment in low-resource environments while maintaining strong performance across eight core languages.