A Poisson Process AutoDecoder for X-ray Sources

Astrophysics

Authors

Yanke Song, Victoria Ashley Villar, Juan Rafael Martinez-Galarza, Steven Dillmann

Abstract

X-ray observing facilities, such as the Chandra X-ray Observatory and eROSITA, have detected millions of astronomical sources associated with high-energy phenomena. The arrival of photons as a function of time follows a Poisson process and can vary by orders of magnitude, presenting obstacles for common tasks such as source classification, physical property derivation, and anomaly detection. Previous work has either failed to directly capture the Poisson nature of the data or focused only on Poisson rate function reconstruction. In this work, we present the Poisson Process AutoDecoder (PPAD). PPAD is a neural field decoder that maps fixed-length latent features to continuous Poisson rate functions across energy band and time via unsupervised learning. PPAD reconstructs the rate function and yields a representation at the same time. We demonstrate the efficacy of PPAD via reconstruction, regression, classification and anomaly detection experiments using the Chandra Source Catalog.

Concepts

autoencoders, representation learning, stochastic processes, neural field decoding, poisson rate reconstruction, likelihood estimation, anomaly detection, embeddings, self-supervised learning, classification, neural operators, regression

The Big Picture

Imagine trying to understand a conversation where most of the words have been dropped. You catch a few syllables here, a phrase there, and from those sparse fragments you need to reconstruct what was said, classify the speaker, and flag anything unusual. That’s roughly the challenge astronomers face when studying X-ray sources.

X-ray telescopes like NASA’s Chandra X-ray Observatory don’t capture smooth, continuous streams of light. They detect individual photons, one by one. For faint sources, a telescope might collect only a handful of photons across an entire observation. Those photons still carry real information: what kind of object is emitting them (a neutron star? a black hole? a galaxy?), how the emission changes over time, and whether anything unusual is happening.

With facilities like Chandra, XMM-Newton, and eROSITA collectively cataloging roughly two million X-ray sources, analyzing them by hand is not an option.

A team from Harvard, the Center for Astrophysics, and Stanford has developed a new framework called the Poisson Process AutoDecoder (PPAD). The name reflects the core idea: photons from an X-ray source arrive randomly in time, but at a predictable average rate. Mathematicians call this pattern a Poisson process. Rather than treating the randomness as noise to smooth away, PPAD learns to describe and categorize sources directly from it.

PPAD learns compact, useful descriptions of X-ray sources directly from raw photon arrival data, without ever needing human-assigned labels, by modeling the data as what it actually is: a Poisson process.

How It Works

When a photon arrives at Chandra’s detector, the telescope records two things: when it arrived and how much energy it had. This produces what astronomers call an event file, a list of timestamps and energies.

The underlying process is Poisson: each photon arrives unpredictably, but the average arrival rate tells you something real about the source. Think of raindrops hitting a window. Each drop is random, but the rate per minute reflects actual weather. Traditional machine learning approaches sidestep all of this by binning data into histograms, turning the list into a grid of counts. That throws away information, especially for faint sources where a single bin might contain zero or one photon.
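To make the raindrop picture concrete, here is a minimal Python sketch that draws photon arrival times from a time-varying rate using the standard thinning method, then bins them at two resolutions. The rate function `flare_rate`, its parameters, the observation length, and the helper names are all hypothetical and chosen for illustration; none of them come from the paper.

```python
import math
import random

def simulate_photons(rate_fn, rate_max, t_end, rng):
    """Draw arrival times on [0, t_end) from a Poisson process with
    time-varying rate rate_fn, using thinning. Requires rate_fn(t) <= rate_max."""
    times = []
    t = 0.0
    while True:
        # Candidates arrive as a constant-rate process at the bounding rate.
        t += rng.expovariate(rate_max)
        if t >= t_end:
            return times
        # Keep each candidate with probability rate_fn(t) / rate_max.
        if rng.random() < rate_fn(t) / rate_max:
            times.append(t)

def bin_counts(times, t_end, n_bins):
    """Histogram the arrival times into n_bins equal-width bins."""
    counts = [0] * n_bins
    width = t_end / n_bins
    for t in times:
        counts[min(int(t / width), n_bins - 1)] += 1
    return counts

def flare_rate(t):
    # Hypothetical faint source (photons per second): a low baseline
    # plus a short flare centred on t = 600 s.
    return 0.005 + 0.05 * math.exp(-0.5 * ((t - 600.0) / 40.0) ** 2)

T_END = 1000.0                      # assumed observation length in seconds
rng = random.Random(0)
photons = simulate_photons(flare_rate, rate_max=0.055, t_end=T_END, rng=rng)

coarse = bin_counts(photons, T_END, 10)    # 100 s bins
fine = bin_counts(photons, T_END, 200)     # 5 s bins, mostly zeros and ones
print(len(photons), "photons")
print("coarse bins:", coarse)
print("nonzero fine bins:", sum(1 for c in fine if c > 0))
```

With only a handful of photons, coarse bins blur the timing of the flare while fine bins are almost all empty. That is the trade-off a binned representation forces you to make.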

PPAD takes a different approach. At its core is a neural field, a neural network that represents a continuous, smooth function rather than a discrete lookup table. Instead of counting photons per bin, the model learns a smooth function that, given any moment in time and any energy band, outputs the expected photon arrival rate. This captures fine structure in source variability without committing to any particular time resolution.
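As a rough illustration of what "a function you can query at any time and energy band" means, the sketch below defines a tiny, untrained network in Python with NumPy. The class name `RateField`, the layer sizes, the one-hot encoding of the energy band, and the softplus output are assumptions made for this example; the paper's actual architecture may differ.

```python
import numpy as np

def softplus(x):
    # Smooth map to positive values, so the output can serve as a rate.
    return np.logaddexp(0.0, x)

class RateField:
    """Hypothetical neural field: (latent vector, time, energy band) -> rate."""

    def __init__(self, latent_dim, n_bands, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = latent_dim + 1 + n_bands      # latent + time + one-hot band
        self.n_bands = n_bands
        self.W1 = rng.standard_normal((in_dim, hidden)) / np.sqrt(in_dim)
        self.b1 = np.zeros(hidden)
        self.W2 = rng.standard_normal((hidden, 1)) / np.sqrt(hidden)
        self.b2 = np.zeros(1)

    def __call__(self, z, t, band):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        one_hot = np.zeros((t.size, self.n_bands))
        one_hot[:, band] = 1.0
        # Every query point sees the same latent vector for this source.
        x = np.concatenate([np.tile(z, (t.size, 1)), t[:, None], one_hot], axis=1)
        h = np.tanh(x @ self.W1 + self.b1)
        return softplus(h @ self.W2 + self.b2)[:, 0]

field = RateField(latent_dim=4, n_bands=3)
z = np.zeros(4)                                  # placeholder latent vector
# Times are arbitrary real numbers, not bin indices.
rates = field(z, [0.0, 0.1234, 0.5, 0.987], band=1)
print(rates)
```

Because time enters as a continuous input, the same network can be evaluated on a coarse grid, a fine grid, or exactly at the recorded photon timestamps.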

Figure 1

The learning architecture uses an autodecoder, a twist on the familiar autoencoder. In a standard autoencoder, an encoder compresses input data into a latent vector (a compact numerical fingerprint of the input), and a decoder reconstructs the data from that fingerprint. PPAD drops the encoder entirely. Each X-ray source gets its own learnable latent vector, initialized randomly and refined during training.

A single shared decoder network takes these latent vectors and reconstructs the original photon arrival patterns. Training optimizes the decoder’s weights and all the latent vectors jointly. When training converges, each latent vector has become a compact, meaningful description of its source, all learned without labels.
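The joint optimization can be sketched in a few dozen lines of Python with NumPy. To keep the gradients short enough to write by hand, this example swaps the neural network for a deliberately simple stand-in decoder: the log-rate is a bilinear function of the latent vector and fixed Fourier features of time, and there is a single energy band. The synthetic sources, feature set, learning rate, and all names are assumptions for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
T_END, N_GRID, LATENT_DIM = 1.0, 200, 2

def features(t):
    """Fixed Fourier features of time, shape (len(t), 5)."""
    t = np.asarray(t, dtype=float)
    return np.stack([np.ones_like(t),
                     np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),
                     np.sin(4 * np.pi * t), np.cos(4 * np.pi * t)], axis=1)

DT = T_END / N_GRID
PHI_GRID = features((np.arange(N_GRID) + 0.5) * DT)   # midpoints for the integral

def nll_and_grads(z, W, phi_ev):
    """Negative Poisson-process log-likelihood for one source, with
    log rate(t) = z @ W @ features(t), plus gradients w.r.t. z and W."""
    lam_grid = np.exp(PHI_GRID @ W.T @ z)
    nll = lam_grid.sum() * DT - (phi_ev @ W.T @ z).sum()
    # Gradient with respect to the combined feature weights W.T @ z.
    g_feat = (lam_grid[:, None] * PHI_GRID).sum(axis=0) * DT - phi_ev.sum(axis=0)
    return nll, W @ g_feat, np.outer(z, g_feat)

# Synthetic event lists: even-numbered sources emit early, odd ones late.
events = [rng.beta(2, 5, size=rng.poisson(30)) if s % 2 == 0
          else rng.beta(5, 2, size=rng.poisson(30)) for s in range(8)]
phi_events = [features(ev) for ev in events]

Z = 0.1 * rng.standard_normal((len(events), LATENT_DIM))   # one latent per source
W = 0.1 * rng.standard_normal((LATENT_DIM, 5))             # shared decoder weights

def total_nll(Z, W):
    return sum(nll_and_grads(z, W, p)[0] for z, p in zip(Z, phi_events))

loss_before = total_nll(Z, W)
LR = 2e-4
for step in range(2000):
    grad_W = np.zeros_like(W)
    for s, phi_ev in enumerate(phi_events):
        _, g_z, g_W = nll_and_grads(Z[s], W, phi_ev)
        Z[s] -= LR * g_z           # update this source's own latent vector
        grad_W += g_W              # accumulate the shared decoder gradient
    W -= LR * grad_W               # update the decoder used by every source
loss_after = total_nll(Z, W)
print(loss_before, loss_after)
print(Z)                           # learned per-source representations
```

There is no encoder anywhere in the loop: the rows of `Z` are free parameters, updated by the same gradient descent that updates `W`. The variable-length event lists never need to be turned into a fixed-size network input.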

The loss function is what makes this physically principled. Rather than mean squared error (which implicitly assumes Gaussian noise), PPAD minimizes the negative Poisson log-likelihood, the statistically appropriate way to score a predicted rate against photon arrival data. A total variation penalty, which charges the rate function for how much it changes from one moment to the next, keeps the reconstructed rates smooth rather than artificially spiky.
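For a Poisson process with rate λ(t) observed on [0, T], the log-likelihood of photon arrival times t_1, ..., t_n is the sum of log λ(t_i) over the photons minus the integral of λ(t) from 0 to T. The Python sketch below evaluates the negative of that quantity with a midpoint-rule integral and adds a discretized total variation term. The grid size, the penalty weight `tv_weight`, the example timestamps, and the function names are assumptions for illustration; the paper's exact discretization and weighting are not given here.

```python
import math

def poisson_process_nll(rate_fn, event_times, t_end, n_grid=1000):
    """Negative log-likelihood: integral of the rate over [0, t_end]
    minus the sum of log-rates at the observed arrival times."""
    dt = t_end / n_grid
    integral = sum(rate_fn((j + 0.5) * dt) for j in range(n_grid)) * dt
    log_terms = sum(math.log(rate_fn(t)) for t in event_times)
    return integral - log_terms

def total_variation(rate_fn, t_end, n_grid=1000):
    """Sum of absolute changes of the rate between neighbouring grid points."""
    dt = t_end / n_grid
    vals = [rate_fn((j + 0.5) * dt) for j in range(n_grid)]
    return sum(abs(b - a) for a, b in zip(vals, vals[1:]))

def penalized_loss(rate_fn, event_times, t_end, tv_weight=0.1):
    # tv_weight is a hypothetical hyperparameter trading fit against smoothness.
    return (poisson_process_nll(rate_fn, event_times, t_end)
            + tv_weight * total_variation(rate_fn, t_end))

events = [1.0, 2.5, 2.6, 7.0, 9.2]   # made-up arrival times, in seconds
T_END = 10.0

def constant(c):
    return lambda t: c

def wiggly(t):
    return 0.5 + 0.4 * math.sin(20.0 * t)

for c in (0.2, 0.5, 1.0):
    print(c, poisson_process_nll(constant(c), events, T_END))
print("TV constant:", total_variation(constant(0.5), T_END))
print("TV wiggly:", total_variation(wiggly, T_END))
```

Among constant rates, the likelihood term alone is minimized at c = n / T, which here is 5 photons over 10 s, or 0.5 per second. That is the familiar "counts divided by exposure" estimate. The penalty is zero for a constant rate and grows with every wiggle.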

Figure 2

Results on the Chandra Source Catalog hold up across multiple tasks:

  • Reconstruction: PPAD accurately recovers light curves and spectral shapes even for faint sources with very few photons, outperforming binning-based approaches.
  • Regression: The learned latent representations correlate strongly with physical properties like X-ray luminosity and spectral hardness, properties that normally require specialized analysis.
  • Classification: When tested on sources with known types, the latent vectors cluster meaningfully. Simple classifiers trained on these latent vectors distinguish source classes with competitive accuracy.
  • Anomaly detection: Sources whose photon arrival patterns don’t fit the general population register as outliers in latent space, allowing automated flagging without anyone having to define what “unusual” looks like in advance.
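One simple way to turn "outlier in latent space" into a number is to score each source by its average distance to its nearest neighbours. The Python sketch below does this on synthetic two-dimensional latent vectors with one planted outlier. The k-nearest-neighbour score, the value of `k`, and the synthetic data are assumptions for illustration; the detector actually used in the paper is not described here.

```python
import math
import random

def knn_outlier_scores(latents, k=3):
    """Mean distance from each latent vector to its k nearest neighbours.
    Larger scores mean the source sits in a sparser part of latent space."""
    scores = []
    for i, zi in enumerate(latents):
        dists = sorted(math.dist(zi, zj)
                       for j, zj in enumerate(latents) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

rng = random.Random(0)
# Synthetic stand-ins for learned latent vectors: a single Gaussian cloud...
latents = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
# ...plus one planted source far from everything else.
latents.append((10.0, 10.0))

scores = knn_outlier_scores(latents)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print("most anomalous index:", most_anomalous)
```

No definition of "unusual" is supplied anywhere: the score depends only on how the population is distributed in latent space.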

Figure 3

Why It Matters

For astronomers, PPAD operates at the scale modern surveys demand: feed millions of raw event files in, get a structured, searchable catalog of source representations out. No labels required, no manual feature engineering, and no bin-size choices that might wash out real physical signals.

The framework handles rare or surprising objects especially well. Anomaly detection in latent space flags unusual sources automatically, without anyone having to know what to look for ahead of time. That matters for the kind of serendipitous discoveries that large surveys occasionally turn up.

On the machine learning side, PPAD shows what happens when you match your statistical model to the actual data-generating process. The Poisson likelihood isn’t a physics-specific trick. It’s the right tool whenever your data consists of event counts, whether that’s click-stream analysis, neuroscience spike trains, or particle physics detectors. The autodecoder is also a practical choice when input data has no fixed structure, since you never need to build an encoder.

Open questions remain. The current work focuses on the Chandra Source Catalog; extending PPAD to the larger eROSITA dataset, or to multi-telescope joint observations, would be a natural next step. Adding multi-wavelength data (optical, radio, infrared) could yield richer representations. And as new X-ray facilities come online, this kind of statistically principled automation will only become more necessary.

PPAD turns the statistical challenge of sparse X-ray photon data into a feature rather than a bug, learning physically meaningful source representations directly from raw event files, with no labels, no binning, and no approximations of the underlying physics.


IAIFI Research Highlights

Interdisciplinary Research Achievement
The work brings together neural field methods from computer vision and Poisson statistics from physics, producing a tool that respects the quantum nature of light detection while drawing on modern representation learning.
Impact on Artificial Intelligence
Swapping generic loss functions for domain-appropriate likelihoods turns out to substantially improve learned representations. The principle applies to any count-data domain, not just astrophysics.
Impact on Fundamental Interactions
Label-free analysis of millions of X-ray sources at scale speeds up discovery at major observatories and makes it easier to identify new classes of high-energy astrophysical objects.
Outlook and References
Future extensions to multi-telescope and multi-wavelength datasets could make PPAD a standard tool for next-generation X-ray astronomy; the paper is available at [arXiv:2502.01627](https://arxiv.org/abs/2502.01627).