Modeling the Cosmological Lyman-$α$ Forest at the Field Level
Authors
Roger de Belsunce, Mikhail M. Ivanov, James M. Sullivan, Kazuyuki Akitsu, Shi-Fan Chen
Abstract
The distribution of absorption lines in the spectra of distant quasars, called the Lyman-$α$ (Ly-$α$) forest, is a unique probe of cosmology and the intergalactic medium at high redshifts and small scales. The statistical power of ongoing redshift surveys demands precise theoretical tools to model the Ly-$α$ forest. We address this challenge by developing an analytic, perturbative forward model to predict the Ly-$α$ forest at the field level for a given set of cosmological initial conditions. Our model shows a remarkable performance when compared with the Sherwood hydrodynamic simulations: it reproduces the flux distribution, the Ly-$α$ - dark matter halo cross-correlations, and the count-in-cell statistics at the percent level down to scales of a few Mpc. Our work provides crucial tools that bridge analytic modeling on large scales with simulations on small scales, enabling field-level inference from Ly-$α$ forest data and simulation-based priors for cosmological analyses. This is especially timely for realizing the full scientific potential of the Ly-$α$ forest measurements by the Dark Energy Spectroscopic Instrument.
Concepts
The Big Picture
Imagine trying to read the universe’s autobiography, written in light. When we observe distant quasars, the blazing cores of galaxies billions of light-years away, their light has traveled for eons, passing through vast clouds of diffuse gas. Along the way, pockets of neutral hydrogen absorb certain wavelengths, leaving a characteristic pattern of dark lines stamped onto the spectrum. This pattern, called the Lyman-α forest, is one of the richest signals in observational cosmology.
Read carefully, it encodes the temperature of intergalactic gas, the nature of dark matter, the mass of neutrinos, even hints about dark energy.
The problem? We’re about to be flooded with more of this data than we know how to use. The Dark Energy Spectroscopic Instrument (DESI) is collecting spectra from roughly one million quasars over its five-year run. Current theoretical tools summarize the forest by averaging how similar the absorption is at pairs of points, as a function of their separation. That compresses all the three-dimensional information into a single curve and throws away enormous amounts of potential signal. It’s like trying to reconstruct a symphony from only the average volume at each frequency.
A team from MIT, Lawrence Berkeley, KEK, and the Institute for Advanced Study has now built a new kind of theoretical model, one that works at the full field level. Rather than summarizing the Lyman-α forest statistically, it predicts the entire thing in three dimensions from first principles, point by point.
Key Insight: Instead of comparing only statistical averages, this new analytic model matches the actual amplitudes and phases of every Fourier mode in the Lyman-α forest, passing a far stricter test of accuracy than any previous approach.
How It Works
The core idea is effective field theory (EFT), a framework borrowed from particle physics. EFT doesn’t require knowing all the microscopic physics in detail. Instead, it organizes predictions by scale: it describes large-scale behavior using symmetry principles and dimensional analysis, then accounts for smaller-scale complexity through free parameters calibrated to simulations.
The model starts with a deceptively simple equation. The transmitted flux fluctuation, how much the forest deviates from its average absorption at any given point, is written as a sum of contributions from the underlying dark matter density field and its line-of-sight velocity gradient. These two terms come with bias parameters that capture how closely the gas traces the dark matter distribution.
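Written out, the linear relation described above takes roughly the following form. The symbols are notational assumptions, since this summary does not fix them: $\delta_F$ for the flux fluctuation, $\delta$ for the matter overdensity, $\eta$ for the dimensionless line-of-sight velocity gradient, and $b_1$, $b_\eta$ for the two bias parameters. The sign convention for $\eta$ is one common choice and may differ from the paper's.

```latex
\begin{align}
\delta_F(\mathbf{x}) &= b_1\,\delta(\mathbf{x}) + b_\eta\,\eta(\mathbf{x}),
\qquad
\eta \equiv -\frac{1}{aH}\,\frac{\partial v_\parallel}{\partial x_\parallel}, \\
% Linear theory in Fourier space, with \mu = k_\parallel / k and growth rate f:
\delta_F(\mathbf{k}) &= \left(b_1 + b_\eta f \mu^2\right)\delta(\mathbf{k}).
\end{align}
```

The second line uses the linear-theory result $\eta(\mathbf{k}) = f\mu^2\,\delta(\mathbf{k})$, which is why the predicted flux field depends on the angle between a Fourier mode and the line of sight.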
The Lyman-α forest has an important wrinkle. Absorption happens along lines of sight to quasars, so the physics depends strongly on direction. The model therefore includes line-of-sight operators, mathematical terms accounting for this broken symmetry. Space no longer looks the same from every angle.
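The direction dependence can be made concrete with a small numerical sketch. It assumes linear theory only, in which the velocity-gradient term contributes a factor f μ² in Fourier space, where μ is the cosine of the angle between a mode and the line of sight and f is the growth rate. The function name `linear_flux_field`, the grid setup, and the parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def linear_flux_field(delta, b1, b_eta, f, los_axis=2):
    """Apply delta_F(k) = (b1 + b_eta * f * mu^2) * delta(k) on a periodic cubic grid."""
    n = delta.shape[0]
    # Wavenumbers in grid units; only the direction of k matters for mu.
    k1d = np.fft.fftfreq(n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k_los = (kx, ky, kz)[los_axis]
    # mu^2 = (k_parallel / k)^2, set to zero for the k = 0 mode.
    mu2 = np.divide(k_los**2, k2, out=np.zeros_like(k2), where=k2 > 0)
    delta_k = np.fft.fftn(delta)
    flux_k = (b1 + b_eta * f * mu2) * delta_k
    return np.fft.ifftn(flux_k).real

# Illustrative values only (hypothetical, not fitted to any simulation).
rng = np.random.default_rng(0)
delta = rng.standard_normal((16, 16, 16))
delta_f = linear_flux_field(delta, b1=-0.15, b_eta=-0.2, f=0.97)
```

A density wave running along the line of sight is rescaled by `b1 + b_eta * f`, while a wave running across it is rescaled by `b1` alone, which is the broken symmetry the paragraph describes.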

Beyond the linear model, the team adds three layers of higher-order corrections:
- Nonlinear bias terms, capturing how denser regions absorb more light than a simple proportional relationship would predict
- Higher-derivative operators, accounting for gas responding to its environment over a range of scales, not just locally
- Stochastic noise terms, representing small-scale gas physics that can’t be captured analytically, acting as a source of random scatter
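Schematically, the three layers extend the linear relation as follows. This is a sketch with representative operators only. The symbols ($\delta_F$ for flux, $\delta$ for density, $\eta$ for the line-of-sight velocity gradient, $\mathcal{G}_2$ for the tidal operator, $\epsilon$ for the stochastic field) and the particular terms shown are assumptions for illustration, and the paper's full operator basis is larger.

```latex
\begin{align}
\delta_F ={}& \underbrace{b_1\,\delta + b_\eta\,\eta}_{\text{linear}}
 + \underbrace{\tfrac{b_2}{2}\,\delta^2 + b_{\delta\eta}\,\delta\,\eta
   + b_{\eta^2}\,\eta^2 + b_{\mathcal{G}_2}\,\mathcal{G}_2 + \dots}_{\text{nonlinear bias}} \\
 &+ \underbrace{b_{\nabla^2}\,\nabla^2\delta + \dots}_{\text{higher derivative}}
 + \underbrace{\epsilon}_{\text{stochastic}}
\end{align}
```

Each coefficient is a free parameter of the kind the text describes as calibrated to simulations.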
Validation comes from the Sherwood simulation, a high-fidelity hydrodynamic simulation of the intergalactic medium that directly evolves gas, dark matter, gravity, and radiative processes together. The test is stringent: the model must not only reproduce statistical averages like the power spectrum but match the actual spatial distribution of absorption, cell by cell, at the few-percent level down to scales of just a few megaparsecs.
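One common way to quantify agreement mode by mode in field-level work is the cross-correlation coefficient r(k) between a model field and a reference field. The sketch below is illustrative: this summary does not say which diagnostics the paper uses, and the function name, binning, and toy fields are assumptions. Note that r(k) is insensitive to an overall rescaling of either field, so it tests phases; amplitudes need a separate check, such as a power spectrum ratio.

```python
import numpy as np

def cross_correlation_coefficient(field_a, field_b, n_bins=8):
    """Return bin centers and r(k) = P_ab / sqrt(P_aa * P_bb) in spherical k-bins."""
    n = field_a.shape[0]
    a_k = np.fft.fftn(field_a)
    b_k = np.fft.fftn(field_b)
    # Integer wavenumbers in units of the fundamental mode of the box.
    k1d = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k_mag = np.sqrt(kx**2 + ky**2 + kz**2).ravel()
    edges = np.linspace(0.5, n // 2 + 0.5, n_bins + 1)
    idx = np.digitize(k_mag, edges) - 1
    # Drop the k = 0 mode and modes beyond the last bin edge.
    valid = (idx >= 0) & (idx < n_bins)

    def binned(values):
        return np.bincount(idx[valid], weights=values.ravel()[valid], minlength=n_bins)

    p_ab = binned((a_k * np.conj(b_k)).real)
    p_aa = binned(np.abs(a_k) ** 2)
    p_bb = binned(np.abs(b_k) ** 2)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, p_ab / np.sqrt(p_aa * p_bb)

# Toy check: a "model" equal to the reference plus 10% white noise.
rng = np.random.default_rng(1)
truth = rng.standard_normal((16, 16, 16))
model = truth + 0.1 * rng.standard_normal(truth.shape)
k_centers, r_k = cross_correlation_coefficient(truth, model)
```

Values of r(k) near 1 mean the model reproduces the phases of the reference field at that scale, which is a stricter requirement than matching the power spectrum.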

The model passes. It reproduces the flux probability distribution function, the cross-correlation between the Lyman-α forest and dark matter halos, and count-in-cell statistics (a sensitive probe of how flux values vary across different regions of space) all at the percent level. It even captures the extreme tail of the flux distribution, where most of the signal from rare, heavily absorbed regions lives.
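A count-in-cell measurement can be sketched in a few lines, taking the usual meaning of the term: average the field over non-overlapping cells and study the distribution of the cell values. The helper names, the cell size, and the toy field are hypothetical choices for illustration and do not reproduce the paper's pipeline.

```python
import numpy as np

def counts_in_cells(field, cell_size):
    """Average a cubic grid over non-overlapping cubic cells of `cell_size` voxels."""
    n = field.shape[0]
    if n % cell_size != 0:
        raise ValueError("grid size must be a multiple of cell_size")
    m = n // cell_size
    blocks = field.reshape(m, cell_size, m, cell_size, m, cell_size)
    return blocks.mean(axis=(1, 3, 5))

def cell_pdf(field, cell_size, bins=20):
    """Normalized histogram of the cell-averaged field values."""
    cells = counts_in_cells(field, cell_size).ravel()
    pdf, edges = np.histogram(cells, bins=bins, density=True)
    return pdf, edges

# Toy flux-like field with values between 0 and 1 (illustrative only).
rng = np.random.default_rng(2)
flux = rng.uniform(0.0, 1.0, size=(32, 32, 32))
pdf, edges = cell_pdf(flux, cell_size=4)
```

Comparing such histograms between a model field and a simulated field, for several cell sizes, probes the full one-point distribution, including its tails, instead of only its variance.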
Why It Matters
Until now, Lyman-α forest analyses have relied on two-point statistics, correlations between pairs of points. But the universe’s density field carries far more information than pairwise comparisons can extract.
Field-level inference, directly comparing a predicted three-dimensional field to observations location by location, changes the game. It’s the difference between comparing fingerprints ridge by ridge and merely matching the overall whorl pattern.
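A toy experiment illustrates what pairwise statistics miss. The sketch below randomizes the Fourier phases of a field while keeping every mode's amplitude: the power spectrum is unchanged, yet the field itself is completely different. The function name and the toy field are assumptions made for illustration, not part of the paper.

```python
import numpy as np

def scramble_phases(field, rng):
    """Return a real field with the same |delta(k)| but randomized Fourier phases."""
    field_k = np.fft.fftn(field)
    # Phases taken from a real white-noise field are Hermitian-symmetric,
    # so the scrambled field is real up to rounding error.
    random_phase = np.angle(np.fft.fftn(rng.standard_normal(field.shape)))
    scrambled_k = np.abs(field_k) * np.exp(1j * random_phase)
    # Keep the k = 0 mode so the mean of the field is unchanged.
    scrambled_k.flat[0] = field_k.flat[0]
    return np.fft.ifftn(scrambled_k).real

# Toy non-Gaussian field: the exponential of Gaussian noise.
rng = np.random.default_rng(3)
field = np.exp(rng.standard_normal((16, 16, 16)))
scrambled = scramble_phases(field, rng)
```

A two-point analysis cannot tell `field` and `scrambled` apart, whereas a location-by-location comparison distinguishes them immediately.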
The model also sits at a useful boundary between two approaches. Analytic methods are fast and scale to arbitrarily large volumes but lose accuracy at small scales. Hydrodynamic simulations are accurate at small scales but can’t cover cosmological volumes. This EFT-based forward model works analytically on large scales while using simulation-calibrated parameters to incorporate small-scale physics, making it a natural fit for simulation-based inference pipelines where machine learning algorithms explore the full range of possible cosmological models.
The timing is right. DESI’s Lyman-α forest measurements are already yielding new evidence about dynamical dark energy, and next-generation instruments (DESI-II, Spec-S5) will push even further. A theoretical framework that handles the full statistical content of those datasets, not just a compressed summary, will be essential to turning raw photon counts into fundamental physics.
Bottom Line: By casting the Lyman-α forest in the language of effective field theory and validating it against state-of-the-art simulations at the few-percent level, this work lays the theoretical groundwork for a new generation of cosmological analyses, where we finally use all the information the universe is giving us.
IAIFI Research Highlights
This work brings effective field theory from particle physics together with large-scale structure cosmology and hydrodynamic simulation science to build the first analytic field-level model of the Lyman-α forest.
The forward model provides the analytically tractable likelihood function that simulation-based inference and machine-learning-driven parameter estimation need to scale Lyman-α forest analyses to full cosmological volumes.
Field-level extraction of cosmological parameters from quasar spectra creates new opportunities to constrain neutrino masses, dark matter properties, and the thermal history of the intergalactic medium across cosmic time.
Future work will connect this analytic framework directly to DESI data pipelines and extend it to smaller scales using simulation-based priors; the paper is available at [arXiv:2507.00284](https://arxiv.org/abs/2507.00284).