Gemini 2.5 Flash Lite vs Llama 3.2 1B Instruct — AI Model Comparison | NagaAI
Gemini 2.5 Flash Lite vs Llama 3.2 1B Instruct
Compare Gemini 2.5 Flash Lite and Llama 3.2 1B Instruct on key metrics including price, context length, throughput, and other model features.
Author: Google
Context Length: 1.0M
Supports Tools: Yes
Gemini 2.5 Flash-Lite is a streamlined reasoning model from the Gemini 2.5 family, designed for extremely low latency and cost-effectiveness. It delivers higher throughput, quicker token generation, and enhanced performance on standard benchmarks compared to previous Flash models.
Llama 3.2 1B is a 1-billion-parameter language model focused on efficient natural language tasks, including summarization, dialogue, and multilingual text analysis. Its small size allows for deployment in low-resource environments while maintaining strong performance across eight core languages.
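One common way to compare the two models head to head on metrics like latency and output quality is to send an identical prompt to each through a chat-completions-style API. The sketch below only builds the request payloads; the model identifiers (`gemini-2.5-flash-lite`, `llama-3.2-1b-instruct`) and the payload shape are assumptions modeled on OpenAI-compatible endpoints, not confirmed NagaAI identifiers.

```python
# Hypothetical sketch: constructing identical chat-completion payloads for
# both models so they can be benchmarked on the same prompt. The model IDs
# and request fields are assumptions, not confirmed provider identifiers.

def build_payload(model_id: str, prompt: str, max_tokens: int = 256) -> dict:
    """Construct a chat-completion request body for the given model."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# The same prompt goes to both models for a side-by-side comparison.
prompt = (
    "Summarize the trade-offs between a large-context hosted model "
    "and a small model deployable in low-resource environments."
)
payloads = {
    model_id: build_payload(model_id, prompt)
    for model_id in ("gemini-2.5-flash-lite", "llama-3.2-1b-instruct")
}
```

Each payload could then be POSTed to the provider's chat-completions endpoint, timing the response to compare throughput alongside the per-token price of each model.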