Preserving New Physics while Simultaneously Unfolding All Observables

Theoretical Physics

Authors

Patrick Komiske, W. Patrick McCormack, Benjamin Nachman

Abstract

Direct searches for new particles at colliders have traditionally been factorized into model proposals by theorists and model testing by experimentalists. With the recent advent of machine learning methods that allow for the simultaneous unfolding of all observables in a given phase space region, there is a new opportunity to blur these traditional boundaries by performing searches on unfolded data. This could facilitate a research program where data are explored in their natural high dimensionality with as little model bias as possible. We study how the information about physics beyond the Standard Model is preserved by full phase space unfolding using an important physics target at the Large Hadron Collider (LHC): exotic Higgs boson decays involving hadronic final states. We find that if the signal cross section is high enough, information about the new physics is visible in the unfolded data. We will show that in some cases, quantifiably all of the information about the new physics is encoded in the unfolded data. Finally, we show that there are still many cases when the unfolding does not work fully or precisely, such as when the signal cross section is small. This study will serve as an important benchmark for enhancing unfolding methods for the LHC and beyond.

Concepts

unfolding, new physics searches, collider physics, BSM, unfolding fidelity, classification, simulation-based inference, likelihood ratio, signal detection, anomaly detection, standard model, inverse problems, event reconstruction

The Big Picture

Imagine trying to identify a rare flower in a dense forest using only a blurry photograph. That’s roughly the situation physicists face at the Large Hadron Collider (LHC). Particle detectors don’t record clean physics events. They record a smeared, distorted version of reality filtered through layers of hardware. Mathematically “unblurring” that image to recover the underlying physics is called unfolding, and for decades it has been a bottleneck in searches for new particles.

Traditional unfolding methods were designed for a world where we already know what we’re looking for. They compress data into a handful of hand-selected measurements and use simulations of known physics (the Standard Model, our best theory of matter and forces) to correct for detector distortions. But if something genuinely new is hiding in the data, something whose detector signature differs from anything we’ve simulated, the standard approach could systematically erase exactly the signal we’re trying to find.

A team from MIT and Lawrence Berkeley National Laboratory asked a sharp question: if you unfold everything at once (all particles, all their properties, the full shape of every collision event), can you preserve the fingerprint of new physics nobody was specifically looking for? Their answer, with important caveats, is yes.

Key Insight: Machine learning-powered full phase space unfolding can preserve signatures of exotic new particles in LHC data. This makes model-agnostic searches viable on corrected data, without requiring access to raw detector output, but only when signals are strong enough to survive the procedure.

How It Works

At the center of this effort is OmniFold, a machine learning method that replaces the traditional bin-and-invert correction workflow. OmniFold trains neural network classifiers to iteratively reweight simulated events until they match actual experimental data. What comes out is a corrected, detector-free snapshot of particle collisions, not for one measurement at a time, but for all measurements simultaneously.

Figure 1

The OmniFold loop runs in two alternating steps:

  1. Step 1 (match detector-level simulation to data): Train a classifier to distinguish real detector data from simulated detector events. Convert the classifier output f into an event weight via the likelihood ratio f/(1 − f), pushing the simulation to look like the data.
  2. Step 2 (push weights back to particle level): Apply those detector-level weights to the corresponding particle-level simulated events. Train a second classifier to distinguish the particle-level simulation carrying its previous weights (unweighted in the first iteration) from the same events carrying the new weights, and convert its output into weights that depend only on particle-level quantities. These particle-level weights are the unfolded result.

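The two steps are then repeated, each pass starting from the particle-level weights of the previous one, until the weights stop changing appreciably.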
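The two-step loop can be sketched on a one-dimensional toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian "truth" and smearing are invented for the example, a small logistic regression on the features (x, x²) stands in for the neural network classifiers, and the function names `fit_logistic`, `density_ratio`, and `omnifold_toy` are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, w, n_iter=25, ridge=1e-8):
    """Weighted logistic regression fitted with Newton's method."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(A @ beta, -30, 30)))
        grad = A.T @ (w * (p - y)) + ridge * beta
        hess = (A * (w * p * (1 - p))[:, None]).T @ A + ridge * np.eye(A.shape[1])
        beta -= np.linalg.solve(hess, grad)
    return beta

def density_ratio(x_num, w_num, x_den, w_den, x_eval):
    """Estimate p_num(x) / p_den(x) at x_eval as f / (1 - f) of a classifier."""
    feats = lambda x: np.column_stack([x, x**2])
    X = np.vstack([feats(x_num), feats(x_den)])
    y = np.concatenate([np.ones(len(x_num)), np.zeros(len(x_den))])
    # Give both classes the same total weight so f / (1 - f) is a density ratio.
    w = np.concatenate([w_num / w_num.sum(), w_den / w_den.sum()])
    beta = fit_logistic(X, y, w)
    logit = np.column_stack([np.ones(len(x_eval)), feats(x_eval)]) @ beta
    return np.exp(logit)  # equals f / (1 - f) for a logistic model

def omnifold_toy(t_sim, r_sim, r_data, n_iter=4):
    """Return particle-level weights for the simulated events."""
    nu = np.ones(len(t_sim))  # particle-level weights, start unweighted
    ones = np.ones(len(r_data))
    for _ in range(n_iter):
        # Step 1: reweight detector-level simulation to look like the data.
        omega = nu * density_ratio(r_data, ones, r_sim, nu, r_sim)
        # Step 2: pull the weights back to a function of particle-level t only.
        nu = nu * density_ratio(t_sim, omega, t_sim, nu, t_sim)
    return nu

rng = np.random.default_rng(0)
n = 20000
t_sim = rng.normal(0.0, 1.0, n)           # simulated particle-level values
r_sim = t_sim + rng.normal(0.0, 0.5, n)   # simulated detector smearing
t_true = rng.normal(0.5, 1.0, n)          # hidden "nature" (toy assumption)
r_data = t_true + rng.normal(0.0, 0.5, n) # what the detector records

weights = omnifold_toy(t_sim, r_sim, r_data)
unfolded_mean = np.average(t_sim, weights=weights)
print(f"unfolded particle-level mean: {unfolded_mean:.3f}")
```

In this toy, the weighted mean of `t_sim` should move from 0 toward the value 0.5 used to generate the pseudo-data, even though the loop only ever sees `r_data` at detector level.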

OmniFold can tackle the full phase space (the complete description of every particle produced in every collision) because it uses particle cloud networks: neural architectures built to process any number of particles in any order, exactly as they emerge from a collision. Every particle track, every jet (a spray of particles from a quark or gluon), every photon gets included.
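The "any number of particles in any order" property can be illustrated with a sum-pooling network. The sketch below is an assumption-laden toy, not the architecture used in the paper: the weights are random and untrained, the three per-particle features are placeholders (for example transverse momentum and two angles), and `event_score` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(1)
W_phi = rng.normal(size=(3, 8))  # per-particle map: 3 features -> 8 latent values
W_f = rng.normal(size=(8, 1))    # event-level map: 8 latent values -> 1 score

def event_score(particles):
    """Score an event given an (n_particles, 3) array of particle features."""
    latent = np.tanh(particles @ W_phi)  # same map applied to every particle
    pooled = latent.sum(axis=0)          # summing removes any dependence on order
    return float(1.0 / (1.0 + np.exp(-(pooled @ W_f)[0])))

event = rng.normal(size=(5, 3))          # an event with five particles
shuffled = event[rng.permutation(5)]     # same particles, different order
bigger_event = rng.normal(size=(12, 3))  # a different multiplicity also works

print(event_score(event), event_score(shuffled), event_score(bigger_event))
```

Because the per-particle outputs are summed before the final map, reordering the particles leaves the score unchanged up to floating-point rounding, and events of different multiplicity need no padding.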

The benchmark physics target is deliberately tricky: exotic Higgs boson decays into a Z boson (carrier of the weak nuclear force) plus a light neutral particle that decays into ordinary matter. This produces a distinctive particle cluster buried under ordinary Z+jets production, the everyday process where a Z boson appears alongside jets. Simple bump-hunting in a one-dimensional mass spectrum would miss it.

Figure 2

Two variants were tested. OmniFold processes the full event as a point cloud, treating each collision as a swarm of particles with no assumed structure, which makes it sensitive to the complete event geometry. MultiFold operates on a compact numerical summary of the event (a fixed set of observables), trading some information for computational tractability. Comparing the two reveals where information actually lives in high-dimensional event space.

Why It Matters

The results are nuanced, which is part of what makes them useful. When the signal cross section (roughly, how often the exotic decay occurs relative to all collisions) is high enough, OmniFold successfully preserves the new physics information. The authors quantify this using Fisher information, a measure of how much a dataset “knows” about a particular parameter. In favorable cases, the Fisher information in the unfolded data matches that in the detector-level data, so nothing is lost in the unfolding.
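For reference, the textbook definition of Fisher information for a single parameter is given below. The symbol θ is a placeholder (it could stand for, say, a signal strength); this summary does not specify which parameter or estimator the authors use, so the formula should be read as general background rather than the paper's exact construction.

```latex
\[
  I(\theta) \;=\; \mathbb{E}_{x \sim p(x \mid \theta)}
  \left[ \left( \frac{\partial}{\partial \theta} \log p(x \mid \theta) \right)^{2} \right]
\]
```

By the Cramér-Rao bound, any unbiased estimator of θ built from n independent events has variance of at least 1/(n I(θ)). Equal Fisher information before and after unfolding would therefore mean the unfolded data can constrain θ as tightly as the detector-level data.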

Rare signals are another story. Classifiers trained overwhelmingly on Standard Model events don’t correctly model the rare signal’s detector response. Corrected data can then misrepresent the new physics in ways that are difficult to catch after the fact.

What happens when new physics is included in the simulation used for unfolding? This is a more realistic scenario where physicists have some prior suspicion of what to look for. Including the signal in the generation step improves fidelity substantially. Hybrid strategies, part model-agnostic and part model-informed, may offer the best practical path forward.

For most of the LHC’s history, the workflow has been linear: theorists propose a model, experimentalists design a targeted analysis, and the two communities meet when results are published. Full phase space unfolding breaks that pattern. Experimentalists could publish unfolded datasets that any theorist can analyze with any model, no access to raw detector data or proprietary simulation chains required.

The stakes are high. With the High-Luminosity upgrade, the LHC will produce data at rates that make exhaustive model-by-model searches increasingly impractical. Methods that preserve new physics information in a corrected, model-agnostic form could become the primary interface between raw collision data and the broader physics community. This paper provides an honest benchmark, failure modes included, that the field needs before deploying these methods at scale.

Bottom Line: OmniFold-style full phase space unfolding can preserve exotic new physics signatures in LHC data when signals are strong, making model-agnostic searches on corrected data practical. Signal rarity is still the hard problem, and solving it is where the next generation of methods will be won or lost.

IAIFI Research Highlights

Interdisciplinary Research Achievement
This work combines deep learning (neural network classifiers and particle cloud architectures) with high-energy experimental physics. AI-powered unfolding now offers a direct bridge between theorist-designed models and experimentalist-driven searches at the LHC.
Impact on Artificial Intelligence
The study tests how iterative neural reweighting methods preserve or lose information across a high-dimensional transformation, with direct lessons for scientific machine learning pipelines more broadly.
Impact on Fundamental Interactions
Full phase space unfolding can preserve signatures of exotic Higgs boson decays, broadening the LHC's ability to search for physics beyond the Standard Model without committing to a specific signal model in advance.
Outlook and References
Future work will focus on improving unfolding fidelity at low signal cross sections and integrating these methods into real experimental analyses; the paper is available at [arXiv:2105.09923](https://arxiv.org/abs/2105.09923).

Original Paper Details

Title
Preserving New Physics while Simultaneously Unfolding All Observables
arXiv ID
2105.09923
Authors
Patrick Komiske, W. Patrick McCormack, Benjamin Nachman