arxiv:2511.16652

Evolution Strategies at the Hyperscale

Published on Nov 20

Abstract

AI-generated summary

EGGROLL, an evolution strategies algorithm using low-rank matrix perturbations, scales efficiently to large neural networks while preserving performance and enabling integer-datatype pre-training.

We introduce Evolution Guided General Optimization via Low-rank Learning (EGGROLL), an evolution strategies (ES) algorithm designed to scale backprop-free optimization to large population sizes for modern large neural network architectures with billions of parameters. ES is a set of powerful blackbox optimisation methods that can handle non-differentiable or noisy objectives, with excellent scaling potential through parallelisation. Naïve ES becomes prohibitively expensive at scale due to the computational and memory costs of generating matrix perturbations E ∈ R^{m×n} and the batched matrix multiplications needed to compute per-member forward passes. EGGROLL overcomes these bottlenecks by generating random matrices A ∈ R^{m×r}, B ∈ R^{n×r} with r ≪ min(m,n) to form a low-rank matrix perturbation AB^⊤ that is used in place of the full-rank perturbation E. Because the overall update is an average across a population of N workers, this still results in a high-rank update, but with significant memory and computation savings: the auxiliary storage drops from mn to r(m+n) per layer, and the cost of a forward pass drops from O(mn) to O(r(m+n)) compared to full-rank ES. A theoretical analysis shows that our low-rank update converges to the full-rank update at a fast O(1/r) rate. Our experiments show that (1) EGGROLL does not compromise the performance of ES in tabula-rasa RL settings, despite being faster, (2) it is competitive with GRPO as a technique for improving LLM reasoning, and (3) EGGROLL enables stable pre-training of nonlinear recurrent language models that operate purely in integer datatypes.
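To make the low-rank perturbation idea from the abstract concrete, here is a minimal NumPy sketch of one ES step, not the authors' implementation: each population member perturbs a weight matrix W with A B^⊤ instead of a full-rank Gaussian E, and the fitness-weighted average over the population forms the update. The sampling scheme, hyperparameter names (pop_size, rank, sigma, lr), the 1/√r scaling, and the plain score-function estimator (no antithetic sampling or fitness shaping) are illustrative assumptions.

```python
# Hedged sketch of a low-rank ES update in the spirit of EGGROLL (not the paper's code).
import numpy as np

def low_rank_es_step(W, fitness_fn, pop_size=64, rank=4, sigma=0.01, lr=0.1, rng=None):
    """One ES step where each worker perturbs W with a low-rank A @ B.T
    instead of a full-rank Gaussian E; hyperparameters are assumed, not from the paper."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = W.shape
    grad_est = np.zeros((m, n), dtype=float)
    for _ in range(pop_size):
        # Low-rank factors: per-worker storage is r*(m+n) instead of m*n.
        A = rng.standard_normal((m, rank))
        B = rng.standard_normal((n, rank))
        E = (A @ B.T) / np.sqrt(rank)      # low-rank perturbation (assumed scaling)
        f = fitness_fn(W + sigma * E)       # evaluate the perturbed parameters
        grad_est += f * E                   # fitness-weighted perturbation
    # Averaging over the population generally yields a high-rank update.
    grad_est /= (pop_size * sigma)
    return W + lr * grad_est

# Toy usage: drive W toward a fixed target matrix by maximising negative distance.
target = np.ones((8, 8))
W = np.zeros((8, 8))
for _ in range(200):
    W = low_rank_es_step(W, lambda M: -np.linalg.norm(M - target))
```

Note that this sketch materialises E for readability; the forward-pass savings described in the abstract come from keeping only the factors A and B and applying the perturbation as (x A) B^⊤-style products inside the layer, so the full m×n matrix is never formed per worker.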
