Mozilla AI Releases Llamafile 0.10.0 with GPU Support and Rebuilt Core
What Happened — Mozilla AI unveiled Llamafile 0.10.0, a ground‑up rebuild of its portable LLM runner that restores CUDA GPU acceleration on Linux and adds Metal support on macOS. The update also introduces a terminal UI, a server mode, multimodal (image) and speech (Whisper) capabilities, and bundles a range of quantized models up to 27B parameters.
Why It Matters for TPRM —
- Enables secure, air‑gapped deployment of powerful LLMs without relying on cloud services.
- Expands the attack surface of third‑party AI tooling (GPU drivers, native binaries).
- Provides a new vector for supply‑chain risk if bundled model weights contain malicious payloads.
Who Is Affected — Organizations in technology/SaaS, research & development, defense/air‑gap environments, and any vendor that integrates LLMs into products or services.
Recommended Actions —
- Review the Llamafile binary supply chain and verify signatures.
- Validate that GPU driver versions and Metal toolchains meet your hardening standards.
- Test the new server mode for unintended network exposure before production use.
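The first action above, verifying the binary before execution, can be sketched as a small checksum gate. This is a hedged illustration: the function name is ours, the filename is a placeholder, and the expected hash must come from the official release page, not from this script.

```shell
#!/bin/sh
# Sketch: refuse to run a downloaded llamafile unless its SHA-256
# matches the checksum published alongside the release.
# Placeholder names throughout; adapt to your artifact pipeline.
set -eu

verify_llamafile() {
  # $1 = path to downloaded binary, $2 = expected SHA-256 hex digest
  file="$1"
  expected="$2"
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: checksum matches for $file"
  else
    echo "FAIL: checksum mismatch for $file" >&2
    return 1
  fi
}

# Usage (placeholder values):
# verify_llamafile model.llamafile "<sha256 from the release page>"
# verify_llamafile passes -> then, and only then, chmod +x and run.
```

A GPG signature check (if Mozilla AI publishes detached signatures for the release) would slot in after the checksum step; treat the checksum alone as a minimum bar, not a substitute for signature verification.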
Technical Notes — The rebuild updates the underlying llama.cpp to commit 7f5ee54, re‑enables CUDA on Linux, adds Metal on macOS ARM64, and introduces a terminal UI alongside --server, --image, and mtmd API hooks for multimodal input. Windows GPU support remains unavailable. Model weights are bundled directly in the executable, ranging from 1.6 GB (Qwen3.5 0.8B) to 19 GB (Qwen3.5 27B). Source: Help Net Security
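Because the weights ship inside the executable, the main pre-production control for server mode is the bind address. The sketch below checks that a configured host is loopback-only before launch; the environment variable and flag spellings (--server, --host, --port, following llama.cpp conventions) are assumptions to confirm against the 0.10.0 documentation.

```shell
#!/bin/sh
# Sketch: gate a llamafile server launch on a loopback-only bind.
# LLAMAFILE_HOST is an illustrative env var, not an official one.

is_loopback() {
  # Accepts IPv4 loopback range, IPv6 loopback, and "localhost".
  case "$1" in
    127.*|::1|localhost) return 0 ;;
    *) return 1 ;;
  esac
}

HOST="${LLAMAFILE_HOST:-127.0.0.1}"
if is_loopback "$HOST"; then
  echo "bind $HOST: loopback only, not reachable from the network"
else
  echo "WARNING: bind $HOST is network-reachable; review firewall rules" >&2
fi

# Example launch (placeholder model name; verify flags against the docs):
# ./model.llamafile --server --host "$HOST" --port 8080
```

Pair this with a post-launch check of actual listening sockets (e.g. `ss -tln` on Linux) before signing off, since the effective bind is what matters, not the configured one.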