
The Ensemble Inverse Problem: Applications and Methods

Foundational AI

Authors

Zhengyan Huan, Camila Pazos, Martin Klassen, Vincent Croft, Pierre-Hugues Beauchemin, Shuchin Aeron

Abstract

We introduce a new multivariate statistical problem that we refer to as the Ensemble Inverse Problem (EIP). The aim of EIP is to invert for an ensemble that is distributed according to the pushforward of a prior under a forward process. In high energy physics (HEP), this is related to a widely known problem called unfolding, which aims to reconstruct the true physics distribution of quantities, such as momentum and angle, from measurements that are distorted by detector effects. In recent applications, the EIP also arises in full waveform inversion (FWI) and inverse imaging with unknown priors. We propose non-iterative inference-time methods that construct posterior samplers based on a new class of conditional generative models, which we call ensemble inverse generative models. For the posterior modeling, these models additionally use the ensemble information contained in the observation set on top of single measurements. Unlike existing methods, our proposed methods avoid explicit and iterative use of the forward model at inference time via training across several sets of truth-observation pairs that are consistent with the same forward model, but originate from a wide range of priors. We demonstrate that this training procedure implicitly encodes the likelihood model. The use of ensemble information helps posterior inference and enables generalization to unseen priors. We benchmark the proposed method on several synthetic and real datasets in inverse imaging, HEP, and FWI. The codes are available at https://github.com/ZhengyanHuan/The-Ensemble-Inverse-Problem--Applications-and-Methods.

Concepts

inverse problems, generative models, ensemble inverse learning, posterior estimation, unfolding, simulation-based inference, bayesian inference, normalizing flows, collider physics, diffusion models, density estimation, full waveform inversion

The Big Picture

Imagine trying to reconstruct a signal from a broken radio transmission. You have the crackling, distorted output, but the radio itself is a black box. You can’t peer inside it, can’t run it backwards, and you only get one shot at each transmission.

Now imagine the “radio” is a multi-billion-dollar particle detector, and the “signals” are fundamental particles colliding at nearly the speed of light.

This is the daily reality of physicists at colliders like the LHC. Detectors smear, blur, and corrupt the true physics. To learn what actually happened in a collision (the particles’ true momenta and angles), physicists must undo the detector’s distortions. The same mathematical challenge appears in earthquake seismology and medical imaging. All three fields have developed their own solutions, often in isolation, each requiring expensive repeated simulations of the very process they’re trying to undo.

Researchers at Tufts University and IAIFI have now unified these scattered problems under a single framework, the Ensemble Inverse Problem, and proposed a family of machine learning methods that solve them faster, more flexibly, and without ever re-running the physical process they’re trying to invert.

Key Insight: Treating collections of measurements as a group, rather than solving each one in isolation, lets a model learn to invert complex physical processes without simulating them at deployment. It can even generalize to distributions it has never seen before.

How It Works

The core observation is simple: in practice, you never receive just one measurement. A particle physics experiment logs millions of collision events. A seismologist records waveforms across a dense sensor array. These collections, or ensembles, carry information that no single measurement can.

Traditional approaches treat each measurement independently: given one blurry detector readout, invert the physics to recover the truth. The EIP framework asks a different question. Given the whole set of observations together, what underlying distribution of true physics must have generated them?
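A toy calculation makes the difference concrete. The sketch below is not from the paper: it assumes a one-dimensional truth with a Gaussian prior, additive Gaussian noise as the forward model, and a known prior width, so the posterior has a closed form. The function name `gaussian_posterior` and all numbers are illustrative. The same readout, y = 0, points to different truths depending on which ensemble it arrived in.

```python
import numpy as np

def gaussian_posterior(y, prior_mean, prior_std, noise_std):
    """Closed-form posterior of x given y = x + noise (Gaussian prior, Gaussian noise)."""
    w = prior_std**2 / (prior_std**2 + noise_std**2)  # weight placed on the observation
    post_mean = w * y + (1.0 - w) * prior_mean
    post_std = np.sqrt(w) * noise_std
    return post_mean, post_std

rng = np.random.default_rng(0)
noise_std, prior_std = 1.0, 1.0   # assumed known in this toy
y_single = 0.0                    # the same single readout in both scenarios

results = {}
for true_prior_mean in (-2.0, 2.0):
    # Simulate the ensemble of observations produced under this prior.
    x = rng.normal(true_prior_mean, prior_std, size=5000)
    y_ensemble = x + rng.normal(0.0, noise_std, size=x.shape)
    # The noise is zero-mean, so the ensemble mean estimates the prior mean.
    est_prior_mean = y_ensemble.mean()
    results[true_prior_mean] = gaussian_posterior(
        y_single, est_prior_mean, prior_std, noise_std
    )

for mu, (m, s) in results.items():
    print(f"prior mean {mu:+.1f}: posterior mean for y=0 is {m:+.2f}, std {s:.2f}")
```

In this setup the two posterior means land near -1 and +1. Looking at y = 0 alone cannot tell the two cases apart; the rest of the ensemble can.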

The researchers formalize two related problems:

  • EIP-I asks for the prior distribution, the natural spread of true physical quantities before any particular observation is taken into account.
  • EIP-II asks for the posterior: given a specific observation and the full ensemble context, which true values could have produced that particular event, and how probable is each? This is a full distribution rather than a single best guess, and it combines one measurement with everything the ensemble reveals about the broader physics.

EIP-I is the coarser target; EIP-II is more precise. Solving EIP-II automatically gives you EIP-I, because averaging the per-event posteriors over the observed ensemble recovers the prior. The converse fails: a set of wrong posteriors can still average to the right prior, as Figure 1 illustrates.
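In symbols, the relationship can be written as follows. The notation is an assumption introduced here, not taken from the paper: π is the prior over true values x, p(y | x) is the likelihood defined by the forward model, and q_π is the resulting distribution of observations y.

```latex
\begin{align}
q_\pi(y) &= \int p(y \mid x)\, \pi(x)\, dx
  && \text{(distribution of the observed ensemble)} \\
p_\pi(x \mid y) &= \frac{p(y \mid x)\, \pi(x)}{q_\pi(y)}
  && \text{(EIP-II target: posterior for one observation)} \\
\pi(x) &= \int p_\pi(x \mid y)\, q_\pi(y)\, dy
  && \text{(EIP-I target: the prior)}
\end{align}
```

The third line follows by substituting the second and using \int p(y \mid x)\, dy = 1. A simple counterexample for the reverse direction: a sampler that ignores y and always returns draws from π satisfies the third line exactly, yet it is not the posterior.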

Figure 1

Their solution is a new class of models called ensemble inverse generative models, trained with a three-step recipe:

  1. Collect many datasets, each generated by a different prior distribution fed through the same forward model (the physical process being inverted, whether detector response, seismic wave equations, or an imaging system).
  2. Train a conditional generative model that takes both a single observation and the full ensemble as inputs.
  3. The model learns to sample from the posterior, conditioned on all available context.
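Step 1 of the recipe can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: `forward_model` is a hypothetical stand-in (additive Gaussian smearing), the priors are one-dimensional Gaussians with random mean and width, and the ensemble is compressed into a simple mean-and-spread summary. Every function name and number here is made up for the example.

```python
import numpy as np

def forward_model(x, rng, noise_std=0.5):
    """Hypothetical stand-in for the fixed forward process: additive Gaussian smearing."""
    return x + rng.normal(0.0, noise_std, size=x.shape)

def make_episode(rng, ensemble_size=256):
    """Draw one prior, sample truths from it, and push them through the forward model."""
    prior_mean = rng.uniform(-3.0, 3.0)   # each episode gets its own prior
    prior_std = rng.uniform(0.5, 2.0)
    x = rng.normal(prior_mean, prior_std, size=ensemble_size)  # truth samples
    y = forward_model(x, rng)                                  # paired observations
    return x, y

def to_training_rows(x, y):
    """Pair every observation with a summary of the ensemble it belongs to."""
    summary = np.array([y.mean(), y.std()])   # unchanged if the ensemble is reordered
    context = np.tile(summary, (len(y), 1))
    inputs = np.column_stack([y, context])    # columns: y_i, mean(Y), std(Y)
    return inputs, x

rng = np.random.default_rng(0)
episodes = [make_episode(rng) for _ in range(100)]
rows = [to_training_rows(x, y) for x, y in episodes]
inputs = np.concatenate([r[0] for r in rows])
targets = np.concatenate([r[1] for r in rows])
print(inputs.shape, targets.shape)  # (25600, 3) (25600,)
```

A conditional generative model for steps 2 and 3 would then be trained to produce `targets` given `inputs`. The hand-picked summary is the simplest possible choice; a learned encoder of the whole observation set could take its place.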

This training procedure implicitly encodes the forward model’s behavior into the network’s weights. At inference time, you never run the forward model again. No expensive simulations, no iterative loops over the forward model. You sample directly from the trained network, and you have your answer.
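The claim that training across priors bakes in the forward model can be checked on a linear-Gaussian toy. The sketch below is an assumption-laden illustration, not the paper's method: a least-squares fit with Gaussian residuals stands in for the conditional generative model, the forward model is additive unit-variance noise, and the priors differ only in their mean. Names such as `simulate`, `features`, and `sample_posterior` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
NOISE_STD, PRIOR_STD = 1.0, 1.0   # used only to simulate data, never at inference

def simulate(prior_mean, n):
    """Truth-observation pairs for one prior under the fixed forward model."""
    x = rng.normal(prior_mean, PRIOR_STD, size=n)
    y = x + rng.normal(0.0, NOISE_STD, size=n)
    return x, y

def features(y):
    """Per-event features: the observation, the ensemble mean, and a bias term."""
    return np.column_stack([y, np.full_like(y, y.mean()), np.ones_like(y)])

# Training: many priors, one forward model.
train = [simulate(mu, 500) for mu in rng.uniform(-3.0, 3.0, size=200)]
A = np.concatenate([features(y) for _, y in train])
b = np.concatenate([x for x, _ in train])
coef, *_ = np.linalg.lstsq(A, b, rcond=None)
resid_std = np.std(b - A @ coef)   # spread of truth around the fitted mean

def sample_posterior(y_ensemble, n_samples=1000):
    """Draw truth samples for every event using only the fitted parameters."""
    mean = features(y_ensemble) @ coef
    return mean + resid_std * rng.normal(size=(n_samples, len(y_ensemble)))

# Inference on a prior mean well outside the training range.
x_true, y_obs = simulate(6.0, 2000)
samples = sample_posterior(y_obs)
print("learned weights (y, ensemble mean):", np.round(coef[:2], 2))
print("posterior std:", round(float(resid_std), 2))
print("pooled samples mean/std:", round(float(samples.mean()), 2), round(float(samples.std()), 2))
```

In this toy the exact posterior mean is 0.5·y + 0.5·(prior mean) with standard deviation about 0.71, and the fit should land close to those values even though the noise level was never given to it. Pooling the samples over all events should also come close to the unseen prior, with mean 6 and standard deviation 1.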

Why It Matters

In particle physics, unfolding (the standard name for this inverse problem) is a prerequisite for nearly every precision measurement. If the LHC is going to detect subtle deviations from the Standard Model, physicists need to strip away detector artifacts without introducing biases.

Current state-of-the-art methods like OmniFold require iterative reweighting of simulated events, which is both computationally expensive and dependent on explicit access to the forward model. The proposed ensemble inverse generative models avoid both at inference time.

Figure 2

In seismology, full waveform inversion already demands supercomputers running physics simulations for days. A method that learns the forward model implicitly and inverts new observations without rerunning those simulations could change how geoscientists image the Earth’s interior. The authors show this isn’t hypothetical: they benchmark the method on synthetic and real datasets in inverse imaging, particle physics, and FWI, with competitive or superior performance across all three domains. The ensemble context proves especially powerful when generalizing to distributions the model never encountered during training.

There’s also a connection to in-context learning, the ability of AI systems to pick up new tasks from examples provided at query time without any retraining. Large language models do this routinely: show them a few examples in the prompt, and they adapt on the fly. The EIP framework formalizes an analogous capability for physical inverse problems. The ensemble acts as context that lets the model adapt to a new prior without modification.

Bottom Line: The Ensemble Inverse Problem gives a unified mathematical home to inverse challenges across physics, geoscience, and imaging. The proposed generative model approach solves them without forward-model simulations at inference time, while generalizing to distributions never seen during training.

IAIFI Research Highlights

Interdisciplinary Research Achievement
The paper puts particle physics unfolding, seismic full waveform inversion, and inverse imaging under a single statistical framework. Fields that have historically solved these problems in isolation can now share methods directly.
Impact on Artificial Intelligence
The ensemble inverse generative model is a new kind of conditional generative model that uses observation-set context for prior generalization, with implications for in-context learning and scalable Bayesian inference beyond physics applications.
Impact on Fundamental Interactions
Removing the need for explicit forward-model simulation at inference time could speed up precision measurements at particle colliders, easing a major computational bottleneck in Standard Model tests and searches for new physics.
Outlook and References
Future work could scale to higher-dimensional physics observables and tighter integration with LHC data pipelines; the paper is available as [arXiv:2601.22029](https://arxiv.org/abs/2601.22029).

Original Paper Details

Title
The Ensemble Inverse Problem: Applications and Methods
arXiv ID
2601.22029
Authors
Zhengyan Huan, Camila Pazos, Martin Klassen, Vincent Croft, Pierre-Hugues Beauchemin, Shuchin Aeron