How DREAMS are made: Emulating Satellite Galaxy and Subhalo Populations with Diffusion Models and Point Clouds
Authors
Tri Nguyen, Francisco Villaescusa-Navarro, Siddharth Mishra-Sharma, Carolina Cuesta-Lazaro, Paul Torrey, Arya Farahi, Alex M. Garcia, Jonah C. Rose, Stephanie O'Neil, Mark Vogelsberger, Xuejian Shen, Cian Roche, Daniel Anglés-Alcázar, Nitya Kallivayalil, Julian B. Muñoz, Francis-Yan Cyr-Racine, Sandip Roy, Lina Necib, Kassidy E. Kollmann
Abstract
The connection between galaxies and their host dark matter (DM) halos is critical to our understanding of cosmology, galaxy formation, and DM physics. To maximize the return of upcoming cosmological surveys, we need an accurate way to model this complex relationship. Many techniques have been developed to model this connection, from Halo Occupation Distribution (HOD) to empirical and semi-analytic models to hydrodynamic simulations. Hydrodynamic simulations can incorporate more detailed astrophysical processes but are computationally expensive; HODs, on the other hand, are computationally cheap but have limited accuracy. In this work, we present NeHOD, a generative framework based on a variational diffusion model and a Transformer, for painting galaxies/subhalos on top of DM with the accuracy of hydrodynamic simulations but at a computational cost similar to HOD. By modeling galaxies/subhalos as point clouds, instead of binning or voxelization, we can resolve small spatial scales down to the resolution of the simulations. For each halo, NeHOD predicts the positions, velocities, masses, and concentrations of its central and satellite galaxies. We train NeHOD on the TNG-Warm DM suite of the DREAMS project, which consists of 1024 high-resolution zoom-in hydrodynamic simulations of Milky Way-mass halos with varying warm DM mass and astrophysical parameters. We show that our model captures the complex relationships between subhalo properties as a function of the simulation parameters, including the mass functions, stellar-halo mass relations, concentration-mass relations, and spatial clustering. Our method can be used for a large variety of downstream applications, from galaxy clustering to strong lensing studies.
The Big Picture
Imagine trying to understand traffic patterns in a city. You could deploy sensors everywhere, run a full physics simulation of every car, or use a statistical model trained on real data to generate realistic flows in seconds. Cosmologists face a similar problem: they need to know how galaxies cluster around dark matter halos (vast, invisible clumps of dark matter that act as gravitational scaffolding), but their most accurate tool, the hydrodynamic simulation, consumes computing resources at a staggering rate. These simulations are supercomputer programs that model the full physics of gas, stars, and dark matter simultaneously.
The stakes are real. Rubin Observatory, the Roman Space Telescope, and DESI will soon deliver data on billions of galaxies. To extract physics from those observations, theorists need fast, accurate models of how galaxies populate dark matter halos. The gap between “fast but crude” and “accurate but slow” has been a stubborn bottleneck.
A team led by Tri Nguyen at MIT, with collaborators across IAIFI, Princeton, Harvard, and Flatiron, has built a framework that closes that gap: NeHOD, a neural network that generates realistic galaxy populations with hydrodynamic fidelity at a fraction of the computational cost.
Key Insight: NeHOD uses a diffusion model and Transformer to generate satellite galaxy populations as point clouds. It matches hydrodynamic-level accuracy at speeds comparable to traditional statistical models.
How It Works
The traditional workhorse for placing galaxies in simulations is the Halo Occupation Distribution (HOD), a set of statistical rules that say, roughly, “a halo of this mass probably hosts this many galaxies.” It’s fast, but it ignores spatial detail and the influence of baryonic physics (processes involving ordinary matter, like supernova explosions blowing gas out of galaxies). At the other extreme, hydrodynamic simulations solve full fluid dynamics equations for gas, stars, and dark matter all at once. Beautiful physics, but a single run can take millions of CPU hours.
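To make those "statistical rules" concrete, here is a minimal sketch of a five-parameter occupation model of the kind common in the HOD literature: a smooth step for central galaxies and a power law for satellites. It is not taken from the NeHOD paper, and every parameter value below is a placeholder chosen for illustration.

```python
import math

# Illustrative parameter values only; they are NOT fitted to any survey or simulation.
HOD_PARAMS = {
    "log_m_min": 12.0,   # log10 halo mass at which half of halos host a central
    "sigma_log_m": 0.3,  # width of the central-galaxy transition
    "log_m0": 12.2,      # log10 cutoff mass below which there are no satellites
    "log_m1": 13.3,      # log10 mass scale that normalizes the satellite count
    "alpha": 1.0,        # power-law slope of the satellite occupation
}

def mean_centrals(log_m, p=HOD_PARAMS):
    """Mean number of central galaxies (between 0 and 1) for a halo of mass 10**log_m."""
    return 0.5 * (1.0 + math.erf((log_m - p["log_m_min"]) / p["sigma_log_m"]))

def mean_satellites(log_m, p=HOD_PARAMS):
    """Mean number of satellites: a power law above the cutoff mass, zero below it."""
    m, m0, m1 = 10.0 ** log_m, 10.0 ** p["log_m0"], 10.0 ** p["log_m1"]
    if m <= m0:
        return 0.0
    return mean_centrals(log_m, p) * ((m - m0) / m1) ** p["alpha"]

for log_m in (11.5, 12.0, 13.0, 14.0):
    print(log_m, round(mean_centrals(log_m), 3), round(mean_satellites(log_m), 3))
```

Note what the model returns: only expected counts as a function of halo mass. Positions, velocities, and any dependence on baryonic physics have to be added by separate assumptions, which is the limitation described above.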
NeHOD sits between these extremes. It learns the mapping from dark matter halo properties to galaxy populations by training on 1,024 high-resolution zoom-in simulations from the DREAMS project (the TNG-Warm DM suite). A zoom-in simulation concentrates computing power on a single halo. Here, every run models a Milky Way-mass halo while varying the warm dark matter particle mass and the astrophysical parameters that set feedback strength.
The architecture has two interlocking pieces:
- A Transformer encoder reads the dark matter halo as a cloud of particle positions and compresses it into a compact summary of the halo’s structure, mass profile, and shape.
- A variational diffusion model generates the satellite galaxy population conditioned on that summary and the simulation parameters. Starting from random noise, it iteratively refines a set of points into realistic galaxy positions, velocities, stellar masses, and concentration parameters.
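The "starting from random noise" step is easier to picture alongside the forward process that a diffusion model is trained to invert. The sketch below shows only that forward noising, using the variance-preserving parameterization common to variational diffusion models (squared noise level given by a sigmoid of a log-SNR schedule). The linear schedule, its endpoints, and the toy features are assumptions for illustration; the learned denoiser, which in NeHOD would be the conditioned network, is not shown.

```python
import math
import random

GAMMA_MIN, GAMMA_MAX = -13.3, 5.0  # assumed endpoints of a fixed, linear schedule

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def schedule(t):
    """Return (alpha_t, sigma_t) for t in [0, 1], with alpha_t**2 + sigma_t**2 == 1."""
    gamma = GAMMA_MIN + (GAMMA_MAX - GAMMA_MIN) * t
    return math.sqrt(sigmoid(-gamma)), math.sqrt(sigmoid(gamma))

def add_noise(points, t, rng):
    """Forward process z_t = alpha_t * x + sigma_t * eps, applied to every feature."""
    alpha, sigma = schedule(t)
    return [[alpha * x + sigma * rng.gauss(0.0, 1.0) for x in point] for point in points]

# Toy "satellites": each point holds normalized (x, y, z, log-mass) features (made up).
satellites = [[0.1, -0.3, 0.2, 0.8], [-0.5, 0.4, 0.0, -1.1]]
rng = random.Random(0)
for t in (0.0, 0.5, 1.0):
    alpha, sigma = schedule(t)
    print(f"t={t}: alpha={alpha:.4f}, sigma={sigma:.4f}")
    print("   ", add_noise(satellites, t, rng))
```

At t = 0 the points are almost untouched; at t = 1 they are nearly pure Gaussian noise. Generation runs the other way: the model starts from noise and repeatedly applies its learned denoising step until a clean point cloud remains.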
Treating galaxies as a point cloud (individual points in space rather than values on a fixed grid) lets NeHOD avoid the resolution limits of grid-based approaches. It resolves spatial scales down to the simulation’s native resolution, which matters most for small satellite galaxies orbiting close to their host.
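A small sketch makes the resolution argument concrete. It places two hypothetical satellites a few kiloparsecs apart and shows that a coarse grid assigns them to the same cell, while the point-cloud representation keeps their true separation. The coordinates and the 50 kpc cell size are made-up values, not numbers from the paper.

```python
import math

def voxel_index(position, cell_size):
    """Map a 3D position to the integer index of the grid cell that contains it."""
    return tuple(math.floor(coord / cell_size) for coord in position)

# Two hypothetical satellites (coordinates in kpc; values are made up).
sat_a = (102.0, 40.0, 10.0)
sat_b = (105.0, 44.0, 10.0)

cell_size = 50.0  # kpc, an assumed grid resolution

# On the grid, both satellites collapse into one cell and become indistinguishable.
same_cell = voxel_index(sat_a, cell_size) == voxel_index(sat_b, cell_size)

# As points, their separation is preserved exactly.
separation = math.dist(sat_a, sat_b)

print("same voxel:", same_cell)
print("separation in kpc:", separation)
```

Shrinking the cells recovers the separation, but the number of cells grows with the cube of the resolution, and nearly all of them are empty for a sparse satellite population.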
NeHOD correctly reproduces the statistics that matter: the subhalo mass function, the stellar-to-halo mass relation, the concentration-mass relation, and spatial clustering, all as a function of the simulation parameters. When warm dark matter is lighter and suppresses small-scale structure, NeHOD generates fewer low-mass satellites. Changing feedback parameters shifts stellar masses accordingly.
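As an illustration of how one of these summary statistics is measured, the sketch below computes a cumulative subhalo mass function, N(>M), from a list of subhalo masses. The two mass lists are invented to mimic the qualitative effect described above (fewer low-mass objects in the warm case); they are not simulation or NeHOD outputs.

```python
def cumulative_mass_function(log_masses, thresholds):
    """Count subhalos with log10 mass above each threshold, i.e. N(> M)."""
    return [sum(1 for m in log_masses if m > thr) for thr in thresholds]

# Made-up log10 subhalo masses (in solar masses) for two hypothetical halos.
cold_like = [7.2, 7.5, 7.9, 8.1, 8.4, 8.8, 9.3, 9.9, 10.4]
warm_like = [8.4, 8.8, 9.3, 9.9, 10.4]  # low-mass end removed by hand

thresholds = [7.0, 8.0, 9.0, 10.0]
print("thresholds:      ", thresholds)
print("cold-like N(>M): ", cumulative_mass_function(cold_like, thresholds))
print("warm-like N(>M): ", cumulative_mass_function(warm_like, thresholds))
```

Comparing such curves between generated and simulated populations, at matched simulation parameters, is one simple way to check an emulator of this kind.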
The model also captures correlations between properties that simpler models miss. In a real galaxy system, a satellite’s position, velocity, mass, and concentration all co-evolved under the same gravitational and baryonic forces. The diffusion model learns these joint distributions implicitly. No one has to specify them by hand.
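The value of learning joint distributions can be seen in a toy experiment. The sketch below builds a population in which concentration depends on mass, then mimics a model that draws each property from its own marginal distribution by shuffling one column. The marginals are unchanged, yet the correlation disappears. The mass-concentration relation used here is invented for the demonstration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(42)

# Toy population: concentration falls with log-mass, plus scatter (relation is made up).
log_mass = [rng.uniform(7.0, 11.0) for _ in range(500)]
concentration = [20.0 - 1.5 * m + rng.gauss(0.0, 0.5) for m in log_mass]

# "Independent" model: same concentration values, but no longer paired with their masses.
shuffled = concentration[:]
rng.shuffle(shuffled)

r_joint = pearson(log_mass, concentration)
r_independent = pearson(log_mass, shuffled)
print("correlation, joint sample:       ", round(r_joint, 3))
print("correlation, independent draws:  ", round(r_independent, 3))
```

A generative model that produces whole points at once, as a point-cloud diffusion model does, has no such shuffling step, so correlations present in the training data can carry through to the samples.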
The computational payoff is large: populating a halo with NeHOD takes seconds on a GPU. The equivalent hydrodynamic simulation takes months on a supercomputer cluster.
Why It Matters
Two major efforts in modern cosmology converge here. Surveys like SAGA and ELVES are already cataloging satellite galaxies around Milky Way analogs, with Rubin and Roman set to expand that census by orders of magnitude. At the same time, physicists are hunting for signatures of warm dark matter, a hypothetical variant whose fast-moving particles would smooth out small clumps and leave detectable gaps in satellite galaxy counts. Detecting that signal requires running thousands of forward models across wide parameter spaces. Only a fast emulator like NeHOD makes that feasible.
Accurate subhalo populations also feed directly into strong gravitational lensing studies, where a massive galaxy bends light from a distant source into arcs or rings. The exact pattern depends on the detailed matter distribution, including small satellites. The same holds for satellite kinematics, which uses dwarf galaxy motions to infer the mass profile of the host halo. The authors flag both as target applications.
There’s a methodological point worth pulling out, too. Point-cloud diffusion models, borrowed from machine learning work on 3D object generation, fit astrophysical structures well. Galaxies are inherently sparse and irregular in space, exactly the kind of data these architectures were built for.
Bottom Line: NeHOD delivers hydrodynamic-quality galaxy populations at HOD-level speed, making it practical to explore dark matter physics and galaxy formation across wide parameter spaces using a neural generative model trained on the DREAMS simulation suite.
IAIFI Research Highlights
This work brings variational diffusion models and Transformers from computer vision and NLP into astrophysical simulation. Point-cloud generative models, it turns out, can emulate complex hydrodynamic outputs with high fidelity.
NeHOD applies conditional variational diffusion models to scientific data in a new way, learning correlated, multi-property distributions over variable-size point sets conditioned on physical parameters.
Fast, accurate emulation of satellite galaxy populations across warm dark matter and baryonic parameter spaces speeds up the search for observational signatures of dark matter's particle nature.
Future work could extend NeHOD to larger cosmological volumes, additional dark matter models, or observational mock catalogs for upcoming surveys like Rubin LSST. The paper is available at [arXiv:2409.02980](https://arxiv.org/abs/2409.02980).
Original Paper Details
How DREAMS are made: Emulating Satellite Galaxy and Subhalo Populations with Diffusion Models and Point Clouds
arXiv:2409.02980