ZAYA1-8B Matches DeepSeek-R1 on Math with Less Than 1B Active Parameters

Who should care

If you work with math, science problems, or complex coding tasks and you're looking for something small enough to run locally or cheaply via API, this is worth serious evaluation. The benchmark numbers at 760M active parameters are well outside what models of this size normally post, and the Markovian RSA boost means performance scales with compute budget rather than hitting a fixed ceiling (a sketch of how that loop spends extra compute follows below). If you're building agent workflows that need reliable tool calling or multi-step instruction following, look elsewhere for now; the agentic numbers are honest about that gap.

Researchers working on test-time compute methods will find the Markovian RSA implementation worth studying regardless of whether they deploy the model itself. The co-design approach, training the model specifically to work with the inference method rather than applying the method after the fact, is an interesting direction that most labs haven't published on at this level of detail.

The AMD training story is also worth paying attention to if you care about where the hardware ecosystem goes next. This is the most capable model trained end to end on AMD hardware that anyone has published, and that matters beyond just this one release.
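To make the scaling claim concrete, here is a minimal sketch of a recursive self-aggregation (RSA) test-time loop, the general family of method the article is describing. Everything in it is an assumption for illustration: the `generate` function, the prompt format, and the `n_candidates`/`k`/`rounds` parameters are hypothetical stand-ins, not ZAYA1's published implementation.

```python
import random

def generate(prompt: str) -> str:
    """Placeholder for a model call (wire this to your inference endpoint)."""
    raise NotImplementedError

def rsa(problem: str, n_candidates: int = 8, k: int = 3, rounds: int = 4) -> str:
    # Round 0: sample an initial population of independent solutions.
    population = [generate(problem) for _ in range(n_candidates)]
    for _ in range(rounds):
        new_population = []
        for _ in range(n_candidates):
            # Aggregate a small random subset into one refined solution.
            # "Markovian" here means each step conditions only on the current
            # candidates, not the full history, so context length stays bounded
            # no matter how many rounds you run.
            subset = random.sample(population, k)
            agg_prompt = (
                f"Problem:\n{problem}\n\n"
                + "\n\n".join(f"Candidate {i + 1}:\n{s}" for i, s in enumerate(subset))
                + "\n\nCombine the strongest parts of these candidates into one "
                  "improved, fully worked solution."
            )
            new_population.append(generate(agg_prompt))
        population = new_population
    return population[0]
```

The point of the structure is that raising `n_candidates` or `rounds` spends more inference compute, and per the article's claim, that extra compute keeps buying accuracy instead of hitting a fixed ceiling.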

AI Summary

ZAYA1-8B is a small model with fewer than 1 billion active parameters that matches DeepSeek-R1's performance on math benchmarks while remaining competitive with much larger frontier models on reasoning and coding tasks. The model uses a Markovian RSA approach co-designed during training rather than applied afterward, which lets performance scale with compute budget, and it was trained entirely on AMD hardware, a notable achievement given that most frontier models are trained on NVIDIA infrastructure. The release represents meaningful progress in efficient model design and in AMD's AI training capabilities, though the model remains weak on agentic tasks and multi-step instruction following.
