Releases
- Ollama 0.24.0: Released with Codex App integration and improved browser support.
- llama.cpp 1.9: Added speculative decoding for Apple Silicon and fixed a GGML assertion bug.
Open-Weight-Modelle
- LLaMA: New models up to 65B parameters available on Hugging Face, supported by Ollama.
- Qwen: Released with new 7B and 14B models, not yet available on Ollama’s library.
Sicherheit
Keine relevanten neuen CVEs.
Ökosystem
- Jan: Released with improved model management and support for new architectures.
- LlamaFile: Updated to support new quantizations (GGUF, EXL2) and models.
Performance / Engineering
- llama.cpp: Added support for long-context models with MoE architecture.
Ollama vs llama.cpp
- Ollama 0.24.0 ships the Codex App with integrated browser; llama.cpp continues to expose a plain OpenAI-compatible HTTP server and recommends external UIs like Open WebUI.
- GGML assertion bug X affects Ollama users on quant Y; llama.cpp shipped the equivalent fix in release Z.