EAGLE 3.1: Collaboration Between the EAGLE Team, vLLM Team, and TorchSpec Team
The EAGLE series — including EAGLE 1, EAGLE 2, and EAGLE 3 — has become one of the most widely adopted and practically deployed families of speculative decoding
AI Summary
EAGLE 3.1 is a new speculative decoding algorithm, developed jointly by the EAGLE Team, vLLM, and TorchSpec, that addresses stability issues in previous versions. The update targets "attention drift," a phenomenon in which the drafter model becomes less stable at deeper speculation depths because of imbalanced input representations and growing hidden-state magnitudes. EAGLE 3.1 introduces two architectural changes: FC normalization after the target hidden states, and feeding post-norm hidden states into the next decoding step. Together these are intended to improve robustness and performance across different chat templates, long-context inputs, and system prompts.
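To make the two changes concrete, here is a minimal toy sketch in plain Python. It is not the EAGLE 3.1 implementation. It assumes an unweighted RMS normalization, and `toy_drafter_step`, `fuse_target_features`, and `rollout` are hypothetical names invented for illustration. The drafter layer is replaced by a stand-in that simply amplifies its input. The sketch shows only the general idea: normalizing the FC output fixes the scale of the fused target features, and feeding the normalized state back keeps magnitudes bounded as speculation depth grows.

```python
import math


def rms(x):
    """Root-mean-square magnitude of a vector."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def rms_norm(x, eps=1e-6):
    """Unweighted RMS normalization (assumed here; the real norm may differ)."""
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale for v in x]


def linear(x, weight):
    """Bias-free fully connected layer; each row of `weight` yields one output."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def fuse_target_features(target_hidden, weight):
    """Hypothetical helper: project target hidden states, then normalize."""
    return rms_norm(linear(target_hidden, weight))


def toy_drafter_step(hidden, gain=1.5):
    """Stand-in for a drafter layer; it just amplifies its input (an assumption)."""
    return [gain * v + 0.1 for v in hidden]


def rollout(hidden, depth, feed_post_norm):
    """Run `depth` draft steps and record the magnitude of the state fed forward."""
    magnitudes = []
    for _ in range(depth):
        out = toy_drafter_step(hidden)
        # Feed either the normalized output or the raw output into the next step.
        hidden = rms_norm(out) if feed_post_norm else out
        magnitudes.append(rms(hidden))
    return magnitudes


# Toy target features with large magnitude, projected from 4 dims to 2.
fused = fuse_target_features([10.0, -20.0, 30.0, 40.0],
                             [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
pre_norm_magnitudes = rollout(fused, depth=5, feed_post_norm=False)
post_norm_magnitudes = rollout(fused, depth=5, feed_post_norm=True)
```

In this toy setting, `pre_norm_magnitudes` grows with each step while `post_norm_magnitudes` stays close to 1. That is the kind of magnitude growth the summary associates with attention drift, though the real drafter dynamics are of course more complex than a fixed gain.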








