Llama 4 Scout

Provider: Meta
Modality: MULTIMODAL
Parameters: 109B total (17B active)
Context window: 10,000,000 tokens
Weights: Open
Released: 5 Apr 2025

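Llama 4 Scout is the lightweight member of the Llama 4 family: a mixture-of-experts model (17B active of 109B total parameters) sized to fit a single high-end GPU, with an advertised 10M-token context window. That combination makes it one of the most self-hostable open multimodal models for cost-constrained teams.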

Best for

Industry-leading 10M-token context for long docs / whole codebases
Single-GPU self-hosting — the most deployable Llama 4
Cost-efficient multimodal assistant (17B active)

How it compares — and the India angle

Its 10M context + single-GPU footprint make it the most practical Llama 4 for Indian SMEs/colleges with limited hardware.
Open-weight and free to download under the Llama 4 community licence: self-hosting in-region keeps data under your own control, unlike sending it to a US-hosted API.
Independent tests found real-world long-context retrieval weaker than the 10M headline — validate on your own data.
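
One way to validate long-context retrieval on your own data is a simple "needle in a haystack" check: hide one known fact at different depths of a long document and see whether the model can repeat it. The sketch below is illustrative only. The function names, the `ask_model` callable and the stub model are hypothetical placeholders, not part of any Llama tooling; in practice you would swap the stub for a call to your own deployment and use your own documents as filler.

```python
import random


def build_haystack(filler_sentences, needle, depth, n_sentences, seed=0):
    """Build a long document and hide `needle` at a relative depth (0.0 to 1.0)."""
    rng = random.Random(seed)
    body = [rng.choice(filler_sentences) for _ in range(n_sentences)]
    body.insert(int(depth * n_sentences), needle)
    return " ".join(body)


def retrieval_accuracy(ask_model, filler_sentences, needle, question, answer,
                       depths, n_sentences):
    """Fraction of depths at which the model's reply contains the expected answer."""
    hits = 0
    for depth in depths:
        document = build_haystack(filler_sentences, needle, depth, n_sentences)
        prompt = f"{document}\n\nQuestion: {question}\nAnswer:"
        if answer.lower() in ask_model(prompt).lower():
            hits += 1
    return hits / len(depths)


def stub_model(prompt):
    """Hypothetical stand-in for a real model call; replace with your own client."""
    return "7421" if "7421" in prompt else "unknown"


FILLER = ["The sky was grey that morning.", "Trains ran late again."]

score = retrieval_accuracy(
    stub_model,
    FILLER,
    needle="The vault code is 7421.",
    question="What is the vault code?",
    answer="7421",
    depths=[0.0, 0.25, 0.5, 0.75, 1.0],
    n_sentences=200,
)
print(f"retrieval accuracy: {score:.2f}")
```

Increasing `n_sentences` until the prompt approaches the context length you actually plan to use, and plotting accuracy by depth, gives a rough picture of where retrieval starts to degrade.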

Benchmarks

Representative public scores (approximate, higher is better) for relative comparison. Check the provider for the latest official results.

MMLU: 80

GPQA Diamond: 57

How to access

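Free to self-host under the Llama 4 community licence. The model is a mixture-of-experts (MoE) with 17B active parameters out of 109B total; Meta states that it fits a single H100 GPU with Int4 quantization.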
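
The single-GPU claim can be sanity-checked with back-of-envelope arithmetic. The sketch below estimates weight storage only, at a few precisions, using the 109B total and 17B active figures from this page. The 80 GB capacity is an assumption standing in for a "high-end GPU", and the estimate ignores the KV cache, activations and runtime overhead, all of which grow with context length.

```python
def weight_memory_gb(total_params_billions, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 10^9 bytes); weights only."""
    return total_params_billions * bits_per_weight / 8


TOTAL_PARAMS_B = 109   # total parameters, from this page
ACTIVE_PARAMS_B = 17   # active parameters per token, from this page
GPU_MEMORY_GB = 80     # assumption: one 80 GB accelerator

for label, bits in [("bf16", 16), ("int8", 8), ("int4", 4)]:
    size = weight_memory_gb(TOTAL_PARAMS_B, bits)
    verdict = "fits" if size < GPU_MEMORY_GB else "does not fit"
    print(f"{label}: {size:.1f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")

# All experts must be resident in memory even though only a fraction is used per token.
print(f"active fraction per token: {ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.1%}")
```

Under these assumptions only the 4-bit case leaves headroom on one card, so the memory footprint is set by the 109B total, while the 17B active count mainly affects compute per token.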
