The Compressed 3D Lyman-Alpha Forest Bispectrum
Authors
Roger de Belsunce, James M. Sullivan, Patrick McDonald
Abstract
Cosmological studies of the Lyman-Alpha (Lya) forest typically constrain parameters using two-point statistics. However, higher-order statistics, such as the three-point function (or its Fourier counterpart, the bispectrum) offer additional information and help break the degeneracy between the mean flux and power spectrum amplitude, albeit at a significant computational cost. To address this, we extend an existing highly informative compression of the bispectrum, the skew spectra, to the Lya forest. We derive the tree-level bispectrum of Lya forest fluctuations in the framework of effective field theory (EFT) directly in redshift space and validate our methodology on synthetic Lya forest data. We measure the anisotropic cross-spectra between the transmitted flux fraction and all quadratic operators arising in the bispectrum, yielding a set of 26 skew spectra. Using idealized 3D Gaussian smoothing (R=10 Mpc/h), we find good agreement (1-2 sigma level based on the statistical errors of the mocks) with the theoretical tree-level bispectrum prediction for monopole and quadrupole up to k <= 0.17 h/Mpc. To enable the cosmological analysis of Lya forest data from the currently observing Dark Energy Spectroscopic Instrument (DESI), where we cannot do 3D smoothing, we use a line-of-sight smoothing and introduce a new statistic, the shifted skew spectra. These probe non-squeezed bispectrum triangles and avoid locally applying quadratic operators to the field by displacing one copy of the field in the radial direction. Using a fixed displacement of 40 Mpc/h (and line-of-sight smoothing of 10 Mpc/h) yields a similar agreement with the theory prediction. For the special case of correlating the squared (and displaced) field with the original one, we analytically forward model the window function making this approach readily applicable to DESI data.
The Big Picture
Imagine trying to understand the structure of a distant city by listening to echoes. A single echo tells you something, but comparing multiple echoes together tells you far more. Cosmologists face a version of this problem when studying the Lyman-alpha forest: the imprint of hydrogen gas scattered across billions of light-years, recorded as dark absorption lines in quasar spectra.
For decades, astronomers have extracted cosmic information from the Lyman-alpha forest by measuring how pairs of points in the signal relate to each other. Powerful, but incomplete. Gravity has sculpted the cosmic web into filaments, voids, and clusters, and pairwise comparisons can’t capture these complex shapes.
Getting at the hidden information means comparing three or more points at a time, picking up on asymmetries in cosmic structure that two-point statistics miss. The most natural such statistic is the bispectrum, the three-point analog of the power spectrum (which measures how strongly the signal fluctuates at different scales). The catch: computing the full bispectrum means summing over every triangle that three points in the signal can form (in Fourier space, every closed triangle of wavevectors), which makes it orders of magnitude more expensive than the power spectrum.
A team from MIT, UC Berkeley, and Lawrence Berkeley National Laboratory has found a shortcut. By extending a compression technique called skew spectra to the three-dimensional Lyman-alpha forest, they’ve built a practical way to extract bispectrum information from current and future surveys, including DESI, which is collecting quasar spectra right now.
Key Insight: Skew spectra compress the bispectrum into a set of cross-power spectra that depend on only a single wavenumber. They’re nearly as cheap to compute as the power spectrum itself, yet retain most of the cosmological information the full bispectrum carries.
How It Works
Instead of computing the bispectrum directly, you compute cross-spectra between the original field and a set of quadratic operators applied to that field. These cross-spectra pick up how squared or mixed combinations of the field correlate with the field itself, which is exactly the non-Gaussian signal the bispectrum encodes.
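The recipe above can be sketched for the simplest quadratic operator, the squared field. This is a minimal illustration, not the authors' pipeline: it assumes a periodic cubic box, and the function names (`gaussian_smooth`, `cross_power`, `skew_spectrum_delta2`), the binning, and the choice to cross the smoothed square with the unsmoothed field are all assumptions made for this example.

```python
import numpy as np

def gaussian_smooth(delta, box_size, R):
    """Smooth a periodic 3D field with an isotropic Gaussian of width R."""
    n = delta.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    delta_k = np.fft.fftn(delta) * np.exp(-0.5 * k2 * R**2)
    return np.fft.ifftn(delta_k).real

def cross_power(a, b, box_size, n_bins=8):
    """Spherically averaged cross-power spectrum of two periodic 3D fields."""
    n = a.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2).ravel()
    # Per-mode estimate P_ab(k) = V * a(k) b*(k), with a(k) = FFT(a) / N^3.
    pk3d = (np.fft.fftn(a) * np.conj(np.fft.fftn(b))).real.ravel()
    pk3d *= box_size**3 / n**6
    # Linear bins from the fundamental mode to the Nyquist frequency.
    k_fund = 2.0 * np.pi / box_size
    edges = np.linspace(k_fund, k_fund * n / 2, n_bins + 1)
    idx = np.digitize(kmag, edges) - 1
    valid = (idx >= 0) & (idx < n_bins)
    counts = np.maximum(np.bincount(idx[valid], minlength=n_bins), 1)
    k_mean = np.bincount(idx[valid], weights=kmag[valid], minlength=n_bins) / counts
    power = np.bincount(idx[valid], weights=pk3d[valid], minlength=n_bins) / counts
    return k_mean, power

def skew_spectrum_delta2(delta, box_size, R, n_bins=8):
    """Cross-spectrum of the squared smoothed field with the original field."""
    smoothed = gaussian_smooth(delta, box_size, R)
    squared = smoothed**2
    squared -= squared.mean()  # remove the k = 0 contribution
    return cross_power(squared, delta, box_size, n_bins)
```

For a Gaussian field this cross-spectrum scatters around zero; a nonzero value signals the three-point correlations that the bispectrum measures. The other quadratic operators would replace the plain square with differently weighted products of the field.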
For the Lyman-alpha forest, the relevant quantity is the transmitted flux fraction: how much quasar light passes through the hydrogen gas at each point. The team:
- Derived the tree-level bispectrum (the leading-order three-point statistic, valid on large scales where fluctuations are small) using the Effective Field Theory of Large-Scale Structure (EFT-LSS), a perturbative framework built on symmetry principles.
- Worked directly in redshift space, accounting for gas velocities that distort apparent positions along the line of sight, a phenomenon called redshift-space distortions.
- Identified all quadratic operators in the bispectrum expansion and measured cross-spectra between each and the original flux field, yielding 26 skew spectra.
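In equations, the link between a skew spectrum and the bispectrum can be written as below. This is a sketch in generic notation, not the paper's: the symbols $D$ (kernel of a quadratic operator), $W_R$ (Gaussian smoothing kernel), and $K_1$, $K_2$ (first- and second-order redshift-space kernels) are placeholders, and the specific Lyman-alpha kernels derived by the authors are not reproduced here.

```latex
% A quadratic operator with kernel D acting on the smoothed flux field:
\mathcal{D}[\delta_F^R](\mathbf{k})
  = \int \frac{d^3 q}{(2\pi)^3}\, D(\mathbf{q}, \mathbf{k}-\mathbf{q})\,
    \delta_F^R(\mathbf{q})\, \delta_F^R(\mathbf{k}-\mathbf{q}),
\qquad
\delta_F^R(\mathbf{k}) = W_R(k)\, \delta_F(\mathbf{k}),
\quad
W_R(k) = e^{-k^2 R^2 / 2}.

% Its cross-spectrum with the field is an integral over the bispectrum B_F:
\langle \mathcal{D}[\delta_F^R](\mathbf{k})\, \delta_F(\mathbf{k}') \rangle
  = (2\pi)^3\, \delta_{\rm D}(\mathbf{k}+\mathbf{k}')\, \mathcal{P}_D(\mathbf{k}),
\qquad
\mathcal{P}_D(\mathbf{k})
  = \int \frac{d^3 q}{(2\pi)^3}\, D(\mathbf{q}, \mathbf{k}-\mathbf{q})\,
    W_R(q)\, W_R(|\mathbf{k}-\mathbf{q}|)\,
    B_F(\mathbf{q}, \mathbf{k}-\mathbf{q}, -\mathbf{k}).

% Generic tree-level structure of the bispectrum:
B_F(\mathbf{k}_1, \mathbf{k}_2, \mathbf{k}_3)
  = 2\, K_1(\mathbf{k}_1)\, K_1(\mathbf{k}_2)\, K_2(\mathbf{k}_1, \mathbf{k}_2)\,
    P_{\rm lin}(k_1)\, P_{\rm lin}(k_2)
  + \text{2 cyclic permutations}.
```

Each quadratic operator that appears in $K_2$ supplies one kernel $D$, and hence one skew spectrum that depends on a single wavevector $\mathbf{k}$ rather than on a full triangle.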
They validated the theory against two mock datasets. The first uses synthetic 3D Lyman-alpha fields generated with second-order perturbation theory, where the theory is known to hold exactly. The second uses large-volume mocks built from AbacusSummit, a suite of N-body simulations that track gravitational clustering across cosmic time.
With idealized three-dimensional Gaussian smoothing at R = 10 Mpc/h, theory and measurement agree at the 1–2 sigma level (based on the statistical errors of the mocks) for both the monopole and the quadrupole, for wavenumbers k ≤ 0.17 h/Mpc.
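The monopole and quadrupole are Legendre-weighted angular averages of an anisotropic spectrum. One way to estimate them on a grid is sketched below, assuming a periodic box with the line of sight fixed along the z axis; the function name `cross_power_multipoles` and the binning scheme are illustrative choices, not taken from the paper.

```python
import numpy as np

def cross_power_multipoles(a, b, box_size, ells=(0, 2), n_bins=8):
    """Legendre multipoles of the cross-power spectrum of two periodic 3D fields.

    Assumes a plane-parallel line of sight along the z axis.
    """
    n = a.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2)
    # Cosine of the angle between each wavevector and the line of sight.
    mu = np.divide(kz, kmag, out=np.zeros_like(kmag), where=kmag > 0)
    # Per-mode estimate P_ab(k) = V * a(k) b*(k), with a(k) = FFT(a) / N^3.
    pk3d = (np.fft.fftn(a) * np.conj(np.fft.fftn(b))).real * box_size**3 / n**6

    # Linear bins in |k| from the fundamental mode to the Nyquist frequency.
    k_fund = 2.0 * np.pi / box_size
    edges = np.linspace(k_fund, k_fund * n / 2, n_bins + 1)
    idx = np.digitize(kmag.ravel(), edges) - 1
    valid = (idx >= 0) & (idx < n_bins)
    counts = np.maximum(np.bincount(idx[valid], minlength=n_bins), 1)
    k_mean = np.bincount(idx[valid], weights=kmag.ravel()[valid],
                         minlength=n_bins) / counts

    multipoles = {}
    for ell in ells:
        coeffs = np.zeros(ell + 1)
        coeffs[ell] = 1.0  # selects the Legendre polynomial of order ell
        legendre = np.polynomial.legendre.legval(mu, coeffs)
        weighted = ((2 * ell + 1) * legendre * pk3d).ravel()
        multipoles[ell] = np.bincount(idx[valid], weights=weighted[valid],
                                      minlength=n_bins) / counts
    return k_mean, multipoles
```

Passing a quadratic field as `a` and the flux field as `b` would give the monopole (`ell = 0`) and quadrupole (`ell = 2`) of the corresponding skew spectrum.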
Real data introduces a complication, though. The Lyman-alpha forest is observed as individual one-dimensional spectra along lines of sight to quasars, not as a fully sampled three-dimensional volume. DESI will collect up to one million quasar spectra over its lifetime, but standard 3D smoothing can’t be applied to such data. So the team introduced a new variant: shifted skew spectra.
The Shifted Solution
Instead of squaring the field locally, shifted skew spectra displace one copy of the field in the radial (line-of-sight) direction before computing the product. A fixed displacement of 40 Mpc/h lets the statistic probe non-squeezed bispectrum triangles (those in which no side is much shorter than the other two) while requiring only line-of-sight smoothing at 10 Mpc/h.
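A displaced product of this kind can be sketched as follows. The sketch assumes a periodic box with the line of sight along the z axis and takes the quadratic field to be the product of the smoothed field with a radially displaced copy of itself; the paper's exact definition may differ, and the function names are hypothetical.

```python
import numpy as np

def los_smooth_and_shift(delta, box_size, R_los, shift=0.0):
    """Gaussian-smooth a periodic field along z only and displace it by `shift`.

    Returns the smoothed field evaluated at z + shift. A shift that is not a
    whole number of cells is only approximate at the Nyquist mode.
    """
    n_z = delta.shape[2]
    kz = 2.0 * np.pi * np.fft.fftfreq(n_z, d=box_size / n_z)
    # 1D Gaussian kernel times the Fourier phase of a translation along z.
    kernel = np.exp(-0.5 * (kz * R_los) ** 2) * np.exp(1j * kz * shift)
    delta_kz = np.fft.fft(delta, axis=2) * kernel
    return np.fft.ifft(delta_kz, axis=2).real

def shifted_quadratic_field(delta, box_size, R_los, shift):
    """Product of the smoothed field with its radially displaced copy."""
    smoothed = los_smooth_and_shift(delta, box_size, R_los)
    displaced = los_smooth_and_shift(delta, box_size, R_los, shift)
    product = smoothed * displaced
    return product - product.mean()  # remove the k = 0 contribution
```

With `R_los = 10.0` and `shift = 40.0` (in Mpc/h) this mirrors the values quoted above; the shifted skew spectrum would then be the cross-spectrum of this field with the original one. Every operation acts along z only, so no smoothing across neighbouring lines of sight is needed.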
For the special case of correlating the squared-and-displaced field with the original field, the team analytically forward-modeled the window function. Real survey data has irregular geometry and completeness masks that complicate any measurement, and this analytic forward model is what makes shifted skew spectra readily applicable to DESI data.
Even with conservative line-of-sight-only smoothing, the new statistic matches theoretical predictions about as well as the idealized 3D case.
Why It Matters
The Lyman-alpha forest occupies a cosmological sweet spot. It probes redshifts z = 2–5, when the universe was actively assembling its large-scale structure but perturbation theory still works. Galaxy surveys struggle to reach these redshifts. The Lyman-alpha forest does not.
Higher-order statistics like the bispectrum can break degeneracies that plague two-point analyses. The mean flux of the forest and the power spectrum amplitude are notoriously hard to separate using the power spectrum alone, but the bispectrum responds differently to each. Skew spectra give us a computationally affordable way in.
DESI is currently taking data, and future surveys like WEAVE-QSO, the Prime Focus Spectrograph, and 4MOST are on the horizon. This compressed bispectrum framework lets cosmologists extract non-Gaussian information from all of them at a fraction of the usual computational cost. The same compression approach carries over to other tracers of large-scale structure.
Bottom Line: Twenty-six skew spectra, derived from the Lyman-alpha forest bispectrum and validated against N-body simulations, provide a computationally cheap route to non-Gaussian cosmological information from DESI and future spectroscopic surveys.
IAIFI Research Highlights
This work bridges theoretical cosmology and survey data analysis, building an analytic EFT framework that applies directly to the world's largest ongoing spectroscopic survey.
Skew spectrum compression illustrates a principle familiar across data science: rather than computing an expensive full statistic, find low-dimensional projections that preserve the most information. The same idea recurs throughout scientific machine learning.
Higher-order statistics in the three-dimensional Lyman-alpha forest could sharpen constraints on neutrino masses, dark matter properties, and primordial non-Gaussianity at redshifts that current galaxy surveys cannot reach.
The analytic window function treatment makes shifted skew spectra ready for use with DESI data, and the approach is directly portable to future spectroscopic surveys. The full methodology is detailed in [arXiv:2510.23597](https://arxiv.org/abs/2510.23597).
Original Paper Details
The Compressed 3D Lyman-Alpha Forest Bispectrum
2510.23597
Roger de Belsunce, James M. Sullivan, Patrick McDonald