Variational diffusion transformers for conditional sampling of supernovae spectra
Authors
Yunyi Shen, Alexander T. Gagliano
Abstract
Type Ia Supernovae (SNe Ia) have become the most precise distance indicators in astrophysics due to their incredible observational homogeneity. Increasing discovery rates, however, have revealed multiple sub-populations with spectroscopic properties that are both diverse and difficult to interpret using existing physical models. These peculiar events are hard to identify from sparsely sampled observations and can introduce systematics in cosmological analyses if not flagged early; they are also of broader importance for building a cohesive understanding of thermonuclear explosions. In this work, we introduce DiTSNe-Ia, a variational diffusion-based generative model conditioned on light curve observations and trained to reproduce the observed spectral diversity of SNe Ia. In experiments with realistic light curves and spectra from radiative transfer simulations, DiTSNe-Ia achieves significantly more accurate reconstructions than the widely used SALT3 templates across a broad range of observation phases (from 10 days before peak light to 30 days after it). DiTSNe-Ia yields a mean squared error of 0.108 across all phases, five times lower than SALT3's 0.508, and an after-peak error of just 0.0191, an order of magnitude smaller than SALT3's 0.305. Additionally, our model produces well-calibrated credible intervals with near-nominal coverage, particularly at post-peak phases. DiTSNe-Ia is a powerful tool for rapidly inferring the spectral properties of SNe Ia and other transient astrophysical phenomena for which a physical description does not yet exist.
Concepts
The Big Picture
Supernovae give astronomers two kinds of data: a light curve (how the explosion brightens and fades over time) and a spectrum (a wavelength-by-wavelength chemical fingerprint captured at a single moment). Light curves are cheap to collect. Spectra, which carry the real physical insight, require expensive telescope time and often arrive too late, if they arrive at all.
This gap is about to get much worse. The Vera C. Rubin Observatory, now beginning its decade-long Legacy Survey of Space and Time (LSST), will discover roughly one million supernovae per year. Spectroscopic follow-up can cover at most a tenth of a percent of those events. Without a way to infer spectra from light curves alone, astronomers face a flood of explosions they can’t classify, including “peculiar” subtypes that quietly corrupt the distance measurements used to study dark energy.
Yunyi Shen and Alexander Gagliano at MIT and IAIFI have built a model called DiTSNe-Ia that predicts full supernova spectra directly from light curve data. It uses a generative diffusion model and beats existing templates by a wide margin.
Key Insight: DiTSNe-Ia achieves five times lower reconstruction error than the standard SALT3 template model. At post-peak phases, the improvement reaches an order of magnitude. All of this comes from photometry alone; no spectrograph is required.
How It Works
The core problem is translation. Light curves and spectra both encode the same underlying explosion, but they slice it differently: one through time, one through wavelength. DiTSNe-Ia learns the mapping from time-domain photometry to wavelength-domain spectra, conditioned on the phase of observation.

The architecture combines two ideas from modern deep learning. A variational diffusion model starts from random noise and iteratively refines it into a realistic spectrum. Because it samples from a probability distribution rather than predicting a single point estimate, each prediction comes with calibrated uncertainty bounds.
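The start-from-noise, iteratively-refine structure can be sketched in a few lines. The toy below swaps the paper's learned, light-curve-conditioned Diffusion Transformer for a fixed target "spectrum" and a linear shrinkage rule (both invented here purely for illustration), but the ancestral-sampling skeleton, denoise a step, re-inject a shrinking amount of noise, repeat, is the same shape of computation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the learned denoiser. In DiTSNe-Ia this is a
# transformer conditioned on the light curve; here a fixed target
# spectrum and linear shrinkage are illustrative placeholders only.
target = np.sin(np.linspace(0.0, 3.0 * np.pi, 64))

def predict_clean(x_t, t, num_steps):
    """Crude denoiser: blend the noisy state toward the target spectrum."""
    signal_frac = 1.0 - t / num_steps
    return signal_frac * x_t + (1.0 - signal_frac) * target

def sample(num_steps=50, sigma=0.1):
    """Ancestral sampling: start from pure noise, iteratively refine."""
    x = rng.standard_normal(64)                      # pure Gaussian noise
    for t in range(num_steps, 0, -1):
        x = predict_clean(x, t, num_steps)           # denoise one step
        if t > 1:                                    # re-inject shrinking noise
            x = x + sigma * ((t - 1) / num_steps) * rng.standard_normal(64)
    return x

spectrum = sample()
```

Because each call to `sample()` draws a fresh trajectory through the noise, repeated calls yield an ensemble of spectra, which is exactly what makes the credible intervals possible.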
The denoising engine is a Diffusion Transformer (DiT). Transformers handle irregularly spaced sequences well, and that describes astronomical data perfectly: observations come at odd cadences, through different filters, at arbitrary wavelengths.
The model takes three inputs at once:
- Light curve observations: brightness measurements through multiple color filters, each tagged with filter type, phase, and magnitude
- Observation phase: days before or after maximum light
- Diffusion timestep: the internal noise level in the generative process
Instead of concatenating these signals, DiTSNe-Ia uses cross-attention, letting the spectrum under construction query relevant light curve information at each network layer. Positional information (wavelength and time) is encoded with sinusoidal embeddings before entering the network, borrowing the original transformer paper’s trick for handling irregular positions.
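Both ingredients are concrete enough to sketch. The NumPy snippet below (with made-up dimensions and random stand-in features, not the paper's actual layers) shows sinusoidal embeddings of irregular positions feeding a single cross-attention step, in which spectrum tokens indexed by wavelength query light-curve tokens indexed by phase.

```python
import numpy as np

def sinusoidal_embedding(positions, dim=16, max_period=1.0e4):
    """Embed scalar positions (wavelengths, phases, ...) as sin/cos
    features; works for arbitrary real values, so irregular sampling
    needs no special handling."""
    positions = np.asarray(positions, dtype=float)[:, None]      # (N, 1)
    freqs = np.exp(-np.log(max_period) * np.arange(dim // 2) / (dim // 2))
    angles = positions * freqs[None, :]                          # (N, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention: each query (a spectrum
    token) pulls in a weighted mix of the light-curve value vectors."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)                 # stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

# Spectrum tokens at chosen wavelengths; light-curve tokens at an
# irregular observing cadence (all values hypothetical).
wavelengths = np.linspace(3500.0, 9000.0, 32)           # angstroms
phases = np.array([-9.7, -3.2, 0.0, 4.5, 18.1, 29.9])   # days from peak

q = sinusoidal_embedding(wavelengths)                   # (32, 16)
k = sinusoidal_embedding(phases)                        # (6, 16)
v = np.random.default_rng(0).standard_normal((6, 16))   # stand-in features

out = cross_attention(q, k, v)                          # (32, 16)
```

The point of the cross-attention layout is visible in the shapes: the output has one row per spectrum token, each built from whichever light-curve observations that wavelength found most relevant, with no requirement that the two sequences share a length or a grid.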

Training and evaluation used realistic radiative transfer simulations of light passing through exploding stellar material, covering phases from 10 days before peak brightness to 30 days after. Across all phases, DiTSNe-Ia achieves a mean squared error of 0.108, compared to SALT3's 0.508. After peak, the gap is even starker: 0.0191 versus 0.305. The model's uncertainty intervals are also well calibrated: a nominal 90% credible interval contains the true spectrum close to 90% of the time, particularly at post-peak phases.
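Calibration of this kind can be checked empirically: draw posterior samples, form nominal credible intervals from their quantiles, and count how often the truth lands inside. The toy below uses synthetic Gaussian data (not the paper's simulations) just to show the bookkeeping for a 90% interval.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: truths and posterior samples are drawn from the
# same distribution, so the intervals are calibrated by construction.
n_events, n_samples = 2000, 500
posterior = rng.normal(0.0, 1.0, size=(n_events, n_samples))
truth = rng.normal(0.0, 1.0, size=n_events)

nominal = 0.90
lo = np.quantile(posterior, (1 - nominal) / 2, axis=1)
hi = np.quantile(posterior, 1 - (1 - nominal) / 2, axis=1)

# Empirical coverage: fraction of events whose truth falls in the interval.
coverage = np.mean((truth >= lo) & (truth <= hi))
```

A calibrated model yields `coverage` close to `nominal`; overconfident models fall short of it, underconfident models exceed it. The same tally, applied per phase bin, is how "near-nominal coverage, particularly at post-peak phases" becomes a measurable statement.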

Why It Matters
When LSST alerts start rolling in, astronomers will have hours to decide which events deserve scarce spectroscopic follow-up. DiTSNe-Ia can flag the strangest candidates (underluminous 1991bg-like events, overluminous 1991T-likes) from light curves alone. If these outliers slip through undetected, they silently bias the distance measurements behind our best dark energy constraints.

There’s a deeper point here about methodology. SALT3, the current workhorse, is a hand-built template that encodes known spectral correlations. It handles normal supernovae well enough but breaks down on diverse, physically interesting outliers. Those are exactly the cases where accurate classification matters most.
DiTSNe-Ia makes no such assumptions. It learns the full distribution of spectral behaviors from data, including behaviors that existing physical models can’t yet explain. That flexibility makes it a blueprint for any astrophysical transient (gamma-ray bursts, tidal disruption events, kilonovae) where a complete physical theory doesn’t yet exist.


Bottom Line: Generative diffusion models can reconstruct detailed physical spectra from sparse photometric data far more accurately than hand-crafted templates, putting rapid, unbiased classification of millions of LSST-era supernovae within reach.
IAIFI Research Highlights
This work brings transformer-based generative models, originally developed for language and image generation, to bear on a fundamental observational gap in supernova astronomy.
DiTSNe-Ia introduces a practical architecture for conditional generation over irregularly sampled scientific sequences, combining variational diffusion with cross-attention in a way that extends to any domain with sparse, heterogeneous observations.
Rapid spectral inference from photometry alone addresses a real bottleneck in the cosmological distance ladder, with implications for precision dark energy measurements and the physics of thermonuclear explosions.
Future work will extend DiTSNe-Ia to observed supernova spectra and other transient classes. The paper is available at [arXiv:2505.03063](https://arxiv.org/abs/2505.03063) and represents a collaboration between MIT and IAIFI.
Original Paper Details
Variational diffusion transformers for conditional sampling of supernovae spectra
[arXiv:2505.03063](https://arxiv.org/abs/2505.03063)
Yunyi Shen, Alexander T. Gagliano