Llama 4 Scout
Meta · Multimodal · Open weight
- Provider: Meta
- Modality: Multimodal
- Parameters: 109B
- Context window: 10,000,000 tokens
- Weights: Open
- Released: 5 Apr 2025
Llama 4 Scout is the lightweight Llama 4 model: it fits on a single high-end GPU and offers a record 10M-token context window, which makes it one of the most self-hostable open multimodal models for cost-constrained teams.
Best for
- Industry-leading 10M-token context for long documents and whole codebases
- Single-GPU self-hosting: the most deployable Llama 4
- Cost-efficient multimodal assistant (17B active parameters)
How it compares, and the India angle
- Its 10M-token context and single-GPU footprint make it the most practical Llama 4 for Indian SMEs and colleges with limited hardware.
- Open-weight and free to use: self-host in-region for full data control instead of sending data to a US-hosted API.
- Independent tests found real-world long-context retrieval weaker than the 10M-token headline suggests, so validate on your own data.
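One way to validate long-context retrieval on your own data is a small "needle in a haystack" check: hide a known fact at different depths in a long context and count how often the model returns it. The sketch below is only an illustration under stated assumptions. `ask_model` is a hypothetical callable that you would write around your own deployment (it is not part of any Llama API), and the function names and parameters are made up for this example.

```python
import random
from typing import Callable, List, Sequence


def build_haystack(filler: Sequence[str], needle: str, depth: float,
                   n_sentences: int, seed: int = 0) -> str:
    """Return n_sentences of filler with the needle inserted at a relative depth (0.0 to 1.0)."""
    rng = random.Random(seed)  # seeded so every run builds the same context
    sentences: List[str] = [rng.choice(filler) for _ in range(n_sentences)]
    sentences.insert(int(depth * n_sentences), needle)
    return " ".join(sentences)


def retrieval_accuracy(ask_model: Callable[[str, str], str],
                       filler: Sequence[str], needle: str, question: str,
                       expected: str, depths: Sequence[float],
                       n_sentences: int) -> float:
    """Fraction of depths at which the model's answer contains the expected string.

    ask_model(context, question) is a hypothetical wrapper around your own
    self-hosted model; it must return the model's answer as plain text.
    """
    hits = 0
    for i, depth in enumerate(depths):
        context = build_haystack(filler, needle, depth, n_sentences, seed=i)
        answer = ask_model(context, question)
        if expected.lower() in answer.lower():
            hits += 1
    return hits / len(depths)
```

In practice you would use sentences from your own documents as filler, grow `n_sentences` toward the context lengths you actually need, and compare accuracy across depths and lengths before relying on the headline figure.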
Benchmarks
Representative public scores (approximate, higher is better) for relative comparison. Check the provider for the latest official results.
- MMLU: 80
- GPQA Diamond: 57
How to access
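Free to self-host under the Llama 4 community licence. Scout is a mixture-of-experts (MoE) model with 17B active parameters out of 109B total. It fits on a single high-end GPU when quantized (Meta cites a single H100 with Int4 quantization).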
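A rough way to sanity-check the single-GPU claim is to estimate the memory needed for the weights alone at different precisions. The sketch below is back-of-the-envelope arithmetic, not a sizing tool. It assumes 1 GB = 10^9 bytes, uses an 80 GB card purely as an example capacity, and ignores the KV cache, activations, and runtime overhead, all of which add to the real requirement. Note that all 109B parameters must be resident even though only 17B are active per token, because any expert may be selected.

```python
TOTAL_PARAMS = 109e9  # total MoE parameters, from the figures above

# Bytes per parameter for common weight formats (assumed, weights only).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

GPU_MEMORY_GB = 80.0  # example capacity only; substitute your own card


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GB (10^9 bytes)."""
    return total_params * bytes_per_param / 1e9


def fits(total_params: float, precision: str, gpu_gb: float) -> bool:
    """True if the weights alone fit; KV cache and activations are not counted."""
    return weight_memory_gb(total_params, BYTES_PER_PARAM[precision]) <= gpu_gb


if __name__ == "__main__":
    for precision, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
        verdict = "fits" if fits(TOTAL_PARAMS, precision, GPU_MEMORY_GB) else "does not fit"
        print(f"{precision}: {gb:.1f} GB of weights, {verdict} in {GPU_MEMORY_GB:.0f} GB")
```

Under these assumptions only the 4-bit weights fit within the 80 GB example budget, and very long contexts consume additional memory on top of the weights.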