
Differentiable Stochastic Halo Occupation Distribution with Galaxy Intrinsic Alignments

Astrophysics

Authors

Sneh Pandya, Jonathan Blazek

Abstract

We present diffHOD-IA, a fully differentiable implementation of a halo occupation distribution (HOD) model that incorporates galaxy intrinsic alignments (IA). Motivated by the diffHOD framework, we create a new implementation that extends differentiable galaxy population modeling to include orientation-dependent statistics crucial for weak gravitational lensing analyses. Our implementation combines this HOD formulation with an IA model, enabling end-to-end automatic differentiation from HOD and IA parameters through to the galaxy field. We additionally extend this framework to differentiably model two-point correlation functions, including galaxy clustering and IA statistics. We validate diffHOD-IA against the reference halotools-IA implementation using the Bolshoi-Planck simulation, demonstrating excellent agreement across both one-point and two-point statistics. We verify the accuracy of gradients computed via automatic differentiation by comparison with finite-difference estimates for both HOD and IA parameters. We present science use cases leveraging gradients in the simulations to recover the IA parameters of a galaxy field representative of the TNG300 simulation. Finally, we apply diffHOD-IA in a Hamiltonian Monte Carlo analysis and compare its performance with halotools-IA and a neural-network-based emulator, IAEmu. Unlike emulator-based approaches for statistics, diffHOD-IA provides differentiability at the galaxy catalog level, enabling integration into field-level inference pipelines and extension to arbitrary summary statistics for next-generation weak-lensing analyses. Our code is publicly available.

Concepts

halo occupation distribution, intrinsic alignments, differentiable galaxy modeling, simulation-based inference, bayesian inference, monte carlo methods, stochastic processes, cosmological simulation, posterior estimation, dark matter, emulation, inverse problems, hamiltonian systems

The Big Picture

Imagine trying to read a blurry photograph of the universe. Every galaxy in that photograph is slightly distorted, some stretched, some compressed, by the gravitational pull of all the matter between us and them. That distortion is the signal cosmologists call weak gravitational lensing, and it’s one of the most powerful tools we have for mapping the invisible dark matter that shapes the cosmos.

But there’s a catch: galaxies weren’t perfectly round to begin with. They came into the universe already tilted and stretched by the same dark matter structures that later bent their light. Untangling these two effects, lensing versus born-that-way shape, is one of the central challenges of modern cosmology.

The “born-that-way” effect has a name: intrinsic alignments (IA). Galaxies don’t point randomly; they tend to align with the vast web of dark matter threads and filaments (the cosmic web) that they formed inside. As next-generation telescopes like the Rubin Observatory, Euclid, and the Nancy Grace Roman Space Telescope prepare to image billions of galaxies, getting IA modeling right is non-negotiable. A bad IA model contaminates the cosmological signal and biases our understanding of dark energy, dark matter, and the universe’s expansion history.

Researchers Sneh Pandya and Jonathan Blazek at Northeastern University have built diffHOD-IA, a fully differentiable simulation of how galaxies populate dark matter halos, complete with realistic orientations. “Differentiable” means the simulation can compute gradients, mathematical measures of how outputs change when you tweak any input, which enables far more efficient parameter searches than blind trial-and-error.

Key Insight: By making galaxy simulations differentiable end-to-end, diffHOD-IA unlocks gradient-based inference methods that can be dramatically faster than traditional approaches, an increasingly important capability as next-generation surveys push into previously intractable regions of parameter space.

How It Works

To understand why differentiability matters, consider the traditional approach. Cosmologists use a framework called the Halo Occupation Distribution (HOD), a statistical recipe that says: given a dark matter halo of a certain mass, how many galaxies live inside it? HOD models populate simulated halos with galaxies, and by tuning a handful of parameters, you can reproduce the observed clustering of real galaxies.
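
As a rough illustration of what such a recipe looks like, here is a minimal sketch of mean occupation functions. The summary above does not say which functional form diffHOD-IA adopts, so the error-function and power-law forms below (a widely used parameterization in the HOD literature) are an assumption, and every parameter name and default value is a placeholder chosen for illustration.

```python
import math


def mean_ncen(log_mass, log_mmin=12.0, sigma_logm=0.3):
    """Mean number of central galaxies in a halo of mass 10**log_mass.

    Assumed form: a smooth step that rises from 0 to 1 around log_mmin.
    """
    return 0.5 * (1.0 + math.erf((log_mass - log_mmin) / sigma_logm))


def mean_nsat(log_mass, log_m0=11.5, log_m1=13.3, alpha=1.0):
    """Mean number of satellite galaxies; zero at or below the cutoff mass M0.

    Assumed form: a power law in (M - M0) / M1 with slope alpha.
    """
    mass = 10.0 ** log_mass
    m0 = 10.0 ** log_m0
    m1 = 10.0 ** log_m1
    if mass <= m0:
        return 0.0
    return ((mass - m0) / m1) ** alpha


# Illustrative halo masses in log10 units (placeholder values).
for log_mass in (11.0, 12.0, 13.0, 14.0):
    print(log_mass, round(mean_ncen(log_mass), 3), round(mean_nsat(log_mass), 3))
```

These functions only give expected counts. Turning them into an actual catalog requires random draws per halo, which is where the difficulty described next comes from.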

The problem? Populating halos involves random draws (in effect, flipping coins and rolling dice) to decide which halos get galaxies. Those discrete yes/no steps break differentiation: a binary outcome changes in jumps, so its derivative with respect to a parameter is zero almost everywhere and undefined at the jump. You can’t take a gradient through a coin flip.

So scientists are stuck with gradient-free methods like standard Markov Chain Monte Carlo (MCMC), which explores parameter space one random step at a time. For the kind of high-dimensional, high-precision analyses that next-generation surveys will demand, this becomes prohibitively expensive: potentially years of wall-clock compute for a single full weak-lensing analysis.

diffHOD-IA solves this with the Gumbel-Softmax trick. Instead of drawing discrete yes/no samples for whether a halo hosts a galaxy, the trick produces a continuous, smooth approximation of that random draw, with a temperature parameter controlling how closely it approaches the discrete limit. The approximation stays close enough to the discrete draw to reproduce the reference statistics, yet it is differentiable. Gradients can now flow backward through the entire simulation, from summary statistics all the way back to the physical parameters governing galaxy populations and orientations.
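
To make the idea concrete, here is a minimal sketch of the two-category (yes/no) Gumbel-Softmax relaxation in plain Python. It is not taken from the diffHOD-IA code: the function names, the temperature `tau`, and the occupation probability `p` are hypothetical, and a real implementation would rely on an automatic differentiation framework instead of the hand-written derivative used here. The sketch uses the fact that the difference of two Gumbel variables follows a logistic distribution, so a single uniform number `u` supplies the noise.

```python
import math
import random


def logit(x):
    return math.log(x) - math.log(1.0 - x)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relaxed_bernoulli(p, u, tau):
    """Smooth stand-in for the hard draw (u > 1 - p).

    logit(u) is logistic noise (the difference of two Gumbel variables).
    As tau -> 0 the output approaches the hard 0/1 sample.
    """
    return sigmoid((logit(p) + logit(u)) / tau)


def d_relaxed_dp(p, u, tau):
    """Derivative of the relaxed sample with respect to p (chain rule)."""
    y = relaxed_bernoulli(p, u, tau)
    return y * (1.0 - y) / (tau * p * (1.0 - p))


rng = random.Random(0)
p, tau = 0.3, 0.1  # placeholder occupation probability and temperature
# Clip the noise away from 0 and 1 so logit stays finite.
noise = [min(max(rng.random(), 1e-12), 1.0 - 1e-12) for _ in range(20000)]

hard = sum(u > 1.0 - p for u in noise) / len(noise)
soft = sum(relaxed_bernoulli(p, u, tau) for u in noise) / len(noise)
grad = sum(d_relaxed_dp(p, u, tau) for u in noise) / len(noise)
print(hard, soft, grad)
```

The hard and relaxed averages should both land near `p`, and the averaged derivative should land near 1, which is the derivative of the expected count with respect to `p`. The hard draw, by contrast, has a per-sample derivative of zero almost everywhere.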

Figure 1

The IA component adds another layer. Galaxy orientations are modeled using the Dimroth-Watson distribution, a probability distribution describing how strongly a galaxy’s shape aligns with its host dark matter halo. Pandya and Blazek implemented a differentiable sampler using inverse CDF methods, a technique that converts simple random numbers into galaxy orientations by running them through a precisely constructed mathematical function while preserving the ability to compute gradients. Differentiating through galaxy orientation assignments works just as smoothly as through galaxy counts.
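
The inverse-CDF idea can be sketched as follows. This is an illustrative toy, not the authors' sampler: it assumes the density of the misalignment cosine x = cos θ is proportional to exp(-κ x²) on [-1, 1] (a sign convention under which negative κ means stronger alignment), tabulates the CDF numerically, and inverts it by linear interpolation. The function names and grid size are hypothetical.

```python
import bisect
import math
import random


def watson_cdf_table(kappa, n_grid=2001):
    """Tabulate the CDF of a density proportional to exp(-kappa * x**2) on [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (n_grid - 1) for i in range(n_grid)]
    pdf = [math.exp(-kappa * x * x) for x in xs]
    cdf = [0.0]
    for i in range(1, n_grid):
        # Trapezoid rule for the running integral of the unnormalized density.
        cdf.append(cdf[-1] + 0.5 * (pdf[i] + pdf[i - 1]) * (xs[i] - xs[i - 1]))
    norm = cdf[-1]
    return xs, [c / norm for c in cdf]


def inverse_cdf(u, xs, cdf):
    """Map a uniform number u in [0, 1] to x by interpolating the table."""
    j = bisect.bisect_left(cdf, u)
    j = min(max(j, 1), len(cdf) - 1)
    t = (u - cdf[j - 1]) / (cdf[j] - cdf[j - 1])
    return xs[j - 1] + t * (xs[j] - xs[j - 1])


rng = random.Random(0)
for kappa in (-5.0, 0.0, 5.0):  # placeholder alignment strengths
    xs, cdf = watson_cdf_table(kappa)
    draws = [inverse_cdf(rng.random(), xs, cdf) for _ in range(5000)]
    mean_cos2 = sum(x * x for x in draws) / len(draws)
    print(kappa, round(mean_cos2, 3))
```

Because each sample is a smooth function of κ for a fixed uniform number u, the map can in principle be differentiated, for example by implicitly differentiating F(x; κ) = u. Whether the paper takes that route is not stated in this summary.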

The full pipeline:

  1. Start with a dark matter halo catalog from an N-body simulation (Bolshoi-Planck)
  2. Differentiably assign central and satellite galaxies to halos using HOD parameters
  3. Differentiably assign each galaxy an orientation based on IA parameters and halo shape
  4. Differentiably compute two-point statistics: galaxy clustering, position-orientation correlations, and orientation-orientation correlations
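
For a sense of what a position-orientation statistic measures in step 4, here is a brute-force sketch for a single radial bin. It assumes one common definition, ω(r) = ⟨|ê · r̂|²⟩ − 1/3, where ê is a galaxy's unit orientation vector and r̂ is the unit separation vector to a neighbor. The function name is hypothetical, the estimator ignores periodic boundaries, and it uses hard bin edges, so it is not itself differentiable with respect to positions and is not the paper's estimator.

```python
import math


def ed_correlation(positions, orientations, r_min, r_max):
    """Mean of (e_i . r_hat_ij)**2 minus 1/3 over ordered pairs in one bin.

    positions and orientations are lists of 3-tuples; orientations are
    assumed to be unit vectors. Pairs with r_min <= r < r_max are counted.
    """
    total, n_pairs = 0.0, 0
    for i, (p_i, e_i) in enumerate(zip(positions, orientations)):
        for j, p_j in enumerate(positions):
            if i == j:
                continue
            sep = [b - a for a, b in zip(p_i, p_j)]
            r = math.sqrt(sum(s * s for s in sep))
            if r == 0.0 or not (r_min <= r < r_max):
                continue
            cos_angle = sum(e * s for e, s in zip(e_i, sep)) / r
            total += cos_angle ** 2
            n_pairs += 1
    if n_pairs == 0:
        return float("nan")
    return total / n_pairs - 1.0 / 3.0


# Two galaxies separated along x (toy data).
pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
aligned = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]        # pointing along the separation
perpendicular = [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]  # pointing across it
print(ed_correlation(pos, aligned, 0.5, 1.5))
print(ed_correlation(pos, perpendicular, 0.5, 1.5))
```

Random orientations average to zero under this definition, perfectly radial ones give 2/3, and perfectly perpendicular ones give −1/3.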

The team validated diffHOD-IA against the established halotools-IA reference implementation, finding agreement within sample variance for galaxy number counts and less than 2% error in two-point statistics. They also verified that gradients computed via automatic differentiation (where code automatically calculates exact mathematical derivatives) match finite-difference estimates (which measure how outputs shift when you slightly nudge each input). Agreement between the two is the standard check that the gradients are correct.
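
The gradient check can be illustrated on a toy function. The sketch below implements a tiny forward-mode automatic differentiation scheme with dual numbers and compares it with a central finite difference. The `Dual` class, the `soft_occupation` function, and all numerical values are hypothetical stand-ins, not part of diffHOD-IA.

```python
import math


class Dual:
    """Number val + der * eps with eps**2 = 0; der carries the derivative."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return self._lift(other) - self

    def __truediv__(self, other):
        o = self._lift(other)
        return Dual(self.val / o.val,
                    (self.der * o.val - self.val * o.der) / o.val ** 2)

    def __rtruediv__(self, other):
        return self._lift(other) / self


def exp(x):
    """exp that propagates derivatives when given a Dual."""
    if isinstance(x, Dual):
        e = math.exp(x.val)
        return Dual(e, e * x.der)
    return math.exp(x)


def soft_occupation(log_mmin, log_mass=12.3, width=0.25):
    """Toy smooth occupation: sigmoid((log_mass - log_mmin) / width)."""
    return 1.0 / (1.0 + exp((log_mmin - log_mass) / width))


x0, h = 12.0, 1e-5
# Automatic differentiation: seed the input derivative with 1.
ad_grad = soft_occupation(Dual(x0, 1.0)).der
# Central finite difference: nudge the input in both directions.
fd_grad = (soft_occupation(x0 + h) - soft_occupation(x0 - h)) / (2.0 * h)
print(ad_grad, fd_grad)
```

The two numbers should agree to many digits. The finite-difference estimate carries truncation and round-off error that depends on the step `h`, which is why it serves as a check on automatic differentiation and not as a replacement for it.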

Figure 2

The real payoff shows up in the inference comparison. When recovering IA parameters from a galaxy field matching TNG300, the largest-volume run of the IllustrisTNG hydrodynamical simulation suite, diffHOD-IA paired with Hamiltonian Monte Carlo (HMC) significantly outperforms both standard MCMC with halotools-IA and a neural-network emulator called IAEmu. HMC exploits gradients to make intelligent leaps through parameter space rather than stumbling randomly.
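
To show how HMC uses gradients, here is a minimal one-dimensional sketch with a leapfrog integrator and a Metropolis accept step. The target is a toy standard normal distribution, not the diffHOD-IA likelihood, and the step size, trajectory length, and chain length are arbitrary placeholder choices.

```python
import math
import random


def neg_log_post(q):
    # Toy target: standard normal, up to an additive constant.
    return 0.5 * q * q


def grad_neg_log_post(q):
    # In a differentiable simulator this would come from automatic differentiation.
    return q


def hmc_step(q, rng, step_size=0.2, n_leapfrog=10):
    """One HMC transition: sample a momentum, integrate, accept or reject."""
    p = rng.gauss(0.0, 1.0)
    h_old = neg_log_post(q) + 0.5 * p * p

    q_new, p_new = q, p
    p_new -= 0.5 * step_size * grad_neg_log_post(q_new)  # initial half step
    for i in range(n_leapfrog):
        q_new += step_size * p_new
        if i < n_leapfrog - 1:
            p_new -= step_size * grad_neg_log_post(q_new)
    p_new -= 0.5 * step_size * grad_neg_log_post(q_new)  # final half step

    h_new = neg_log_post(q_new) + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new, True
    return q, False


rng = random.Random(1)
q, samples, n_accept = 3.0, [], 0
n_steps, n_burn = 3000, 500
for it in range(n_steps):
    q, accepted = hmc_step(q, rng)
    n_accept += accepted
    if it >= n_burn:
        samples.append(q)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var, n_accept / n_steps)
```

Each proposal follows the gradient along a simulated trajectory, so it can move far from the current point while keeping a high acceptance rate. A random-walk proposal has to choose between small steps and frequent rejections.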

Figure 3

Why It Matters

Most emulator-based approaches give you differentiability only at the level of summary statistics. You train a neural network on correlation functions, and it spits out correlation functions.

diffHOD-IA is differentiable at the catalog level: gradients flow with respect to the actual galaxy field, not a compressed summary of it. That opens the door to field-level inference, where cosmologists fit models directly to the full spatial distribution of galaxies rather than discarding information by compressing to a handful of numbers.

Why does this matter? The next generation of surveys will drown us in data. Rubin alone will image roughly 20 billion galaxies. Picking a few summary statistics and fitting them with MCMC will leave an enormous amount of cosmological information on the table. Field-level inference with differentiable simulators is a fundamentally different philosophy: use all the data, use all the gradients, trust the physics. diffHOD-IA is one of the first tools to combine galaxy clustering and intrinsic alignments within this differentiable paradigm, and the team has made the code publicly available for the community to build on.

Bottom Line: diffHOD-IA makes galaxy simulations with intrinsic alignments fully differentiable for the first time, enabling gradient-powered inference that’s faster and more data-efficient than existing methods. As next-generation surveys demand both precision and scale, tools like this will be essential.

IAIFI Research Highlights

Interdisciplinary Research Achievement
diffHOD-IA combines deep-learning infrastructure (automatic differentiation, Gumbel-Softmax relaxations) with physical galaxy models (HOD + intrinsic alignment), showing how AI tools can directly improve the computational machinery of cosmological inference.
Impact on Artificial Intelligence
The work extends differentiable programming to stochastic galaxy population modeling, showing that continuous relaxations of discrete distributions can faithfully reproduce physical observables while enabling gradient-based optimization and HMC sampling.
Impact on Fundamental Interactions
By enabling field-level inference of galaxy clustering and intrinsic alignments jointly, diffHOD-IA provides a new way to extract unbiased cosmological constraints from weak gravitational lensing data in Stage IV surveys like Rubin, Euclid, and Roman.
Outlook and References
Future work can extend diffHOD-IA to higher-order statistics, redshift-space distortions, and full joint cosmological inference pipelines. The paper is available at [arXiv:2602.04977](https://arxiv.org/abs/2602.04977), and the code is publicly released for community use.

Original Paper Details

Title
Differentiable Stochastic Halo Occupation Distribution with Galaxy Intrinsic Alignments
arXiv ID
[2602.04977](https://arxiv.org/abs/2602.04977)
Authors
Sneh Pandya, Jonathan Blazek