Anomaly preserving contrastive neural embeddings for end-to-end model-independent searches at the LHC
Authors
Kyle Metzger, Lana Xu, Mia Sodini, Thea K. Arrestad, Katya Govorkova, Gaia Grosso, Philip Harris
Abstract
Anomaly detection (identifying deviations from Standard Model predictions) is a key challenge at the Large Hadron Collider due to the size and complexity of its datasets. This is typically addressed by transforming high-dimensional detector data into lower-dimensional, physically meaningful features. We tackle feature extraction for anomaly detection by learning powerful low-dimensional representations via contrastive neural embeddings. This approach preserves potential anomalies indicative of new physics and enables rare signal extraction using novel machine learning-based statistical methods for signal-independent hypothesis testing. We compare supervised and self-supervised contrastive learning methods, for both MLP- and Transformer-based neural embeddings, trained on the kinematic observables of physics objects in LHC collision events. The learned embeddings serve as input representations for signal-agnostic statistical detection methods in inclusive final states. We achieve significant improvement in discovery power for both rare new physics signals and rare Standard Model processes across diverse final states, demonstrating its applicability for efficiently searching for diverse signals simultaneously. We study the impact of architectural choices, contrastive loss formulations, supervision levels, and embedding dimensionality on anomaly detection performance. We show that the optimal representation for background classification does not always maximize sensitivity to new physics signals, revealing an inherent trade-off between background structure preservation and anomaly enhancement. We demonstrate that combining compression with domain knowledge for label encoding produces the most effective data representation for statistical discovery of anomalies.
Concepts
The Big Picture
Imagine searching for a single misspelled word in every book ever printed, not knowing what word you’re looking for, or even what language it might be in. That’s roughly the challenge physicists face at the Large Hadron Collider. Every second, the LHC produces about 40 million proton collisions, and somewhere in that avalanche of data might be a particle or interaction that upends our understanding of fundamental physics. The catch: you don’t know what it looks like.
This is the core problem of model-independent anomaly detection, hunting for signs of new physics without a specific theory to guide the search. Standard approaches compress the LHC’s raw sensor data into a handful of physically meaningful numbers: particle energies, momenta, masses, and directions. But measurements hand-picked to capture known physics might throw away exactly the subtle patterns that would signal something genuinely new.
A new study from MIT and ETH Zürich, led by Kyle Metzger and collaborators at IAIFI, takes a different route: let the machine learn its own compact representation of collision events, built from the ground up to preserve anomalies rather than smooth them away.
Key Insight: Contrastive neural embeddings can learn low-dimensional representations of LHC collision data that are far more sensitive to rare new physics signals than classical feature engineering, but only when the right balance is struck between organizing background events and leaving room for anomalies to stand out.
How It Works
The core idea borrows from contrastive learning, a technique that teaches a neural network what “similar” and “different” mean without explicit labels. The network sees pairs or groups of events and learns to map them into a compact geometric space. Events that look alike cluster together; unusual events get pushed to the outskirts.
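The "pull together, push apart" objective can be written down concretely. The sketch below implements a standard NT-Xent (normalized temperature-scaled cross-entropy) contrastive loss in plain NumPy on toy embeddings. It is a generic textbook formulation of the idea, not necessarily the exact loss used in the paper, and the batch size, embedding dimension, and temperature are illustrative.

```python
import numpy as np

def ntxent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss for a batch of paired embeddings,
    where z1[i] and z2[i] are two views of the same event.
    Generic formulation; hyperparameters here are illustrative."""
    z = np.concatenate([z1, z2], axis=0)                 # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # unit-normalize
    sim = z @ z.T / temperature                          # scaled cosine similarities
    n = len(z1)
    # the positive partner of index i is i+n (and vice versa)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    np.fill_diagonal(sim, -np.inf)                       # exclude self-similarity
    # cross-entropy of each row's softmax against its positive partner
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logsumexp - sim[np.arange(2 * n), pos]))

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 4))                             # 8 events, 4-dim embeddings
z2 = z1 + 0.05 * rng.normal(size=(8, 4))                 # slightly perturbed views
loss = ntxent_loss(z1, z2)
```

Minimizing this loss pulls the two views of each event together while pushing all other events in the batch away, which is what produces the clustering geometry described above.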

The team trained two flavors of contrastive embeddings (compact mathematical fingerprints of each collision event):
- Supervised contrastive learning, where physics labels like particle types and process categories guide the clustering.
- Self-supervised contrastive learning, where the network discovers structure on its own by randomly dropping or perturbing particle objects within an event to create different “views” of the same underlying collision.
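As a rough illustration of the self-supervised recipe, the sketch below builds two "views" of a single event by randomly dropping reconstructed objects and smearing their features. The drop probability, jitter scale, and feature layout are assumptions for illustration, not the paper's exact augmentation pipeline.

```python
import numpy as np

def make_views(event, drop_prob=0.2, jitter=0.05, rng=None):
    """Create two augmented views of one event for self-supervised
    contrastive training. `event` is an (n_objects, n_features) array
    of kinematic features. Augmentation parameters are illustrative."""
    rng = rng or np.random.default_rng()
    views = []
    for _ in range(2):
        keep = rng.random(len(event)) > drop_prob        # randomly drop objects
        if not keep.any():                               # always keep at least one
            keep[rng.integers(len(event))] = True
        noise = 1 + jitter * rng.normal(size=(keep.sum(), event.shape[1]))
        views.append(event[keep] * noise)                # smear surviving features
    return views

rng = np.random.default_rng(0)
event = rng.normal(size=(6, 4))                          # 6 objects x 4 features
view_a, view_b = make_views(event, rng=rng)
```

Because both views come from the same underlying collision, the network is trained to map them to nearby embeddings, forcing it to learn features that are stable under these perturbations.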
Both were tested with two neural architectures. Multilayer perceptrons (MLPs) process a fixed set of particle-level features in a standard feed-forward way. Transformers, the same architecture behind modern language models, treat each collision event as a variable-length sequence of physics objects and capture relationships between particles more naturally.
The inputs are measured kinematic properties of reconstructed physics objects: jets (sprays of particles from quark or gluon scattering), leptons (electrons, muons, and their heavier cousins), and missing energy, a signature of particles that escape the detector unseen. The network compresses all of this into an embedding of just a few dimensions.
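A minimal picture of that compression step: a small per-object network followed by order-independent pooling maps any number of physics objects to a fixed, unit-normalized, low-dimensional embedding. This Deep-Sets-style encoder is a simplified stand-in for the MLP and Transformer encoders in the paper; the weights and dimensions below are random placeholders.

```python
import numpy as np

def embed_event(objects, W1, b1, W2, b2):
    """Map a variable-length set of objects (n_obj, n_feat) to a fixed
    low-dimensional embedding: per-object MLP, mean-pooling, projection.
    A permutation-invariant stand-in for the paper's encoders."""
    h = np.maximum(objects @ W1 + b1, 0.0)   # per-object hidden layer (ReLU)
    pooled = h.mean(axis=0)                  # order-independent pooling over objects
    z = pooled @ W2 + b2                     # project to the embedding dimension
    return z / np.linalg.norm(z)             # unit-normalize, as in contrastive setups

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 16)), np.zeros(16)   # 5 input features -> 16 hidden
W2, b2 = rng.normal(size=(16, 4)), np.zeros(4)    # 16 hidden -> 4-dim embedding
event = rng.normal(size=(7, 5))                   # 7 objects x 5 kinematic features
z = embed_event(event, W1, b1, W2, b2)            # fixed 4-dimensional embedding
```

The pooling makes the output independent of the order in which objects are listed, and the same encoder handles events with different numbers of jets or leptons.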

These embeddings then feed into signal-agnostic statistical detection methods, algorithms that scan for unusual clusters in the embedding space without being told what signal to look for. Tested against a suite of benchmark new-physics signals and rare Standard Model processes, the contrastive embeddings consistently outperform hand-crafted features, showing higher sensitivity to injected rare signals at fixed false-positive rates.
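One simple way such a signal-agnostic scan can work, sketched here with a k-nearest-neighbor density estimate (a generic detector, not the paper's specific statistical test): events that land far from any background-populated region of the embedding space receive high anomaly scores, with no signal model required.

```python
import numpy as np

def knn_anomaly_scores(reference, sample, k=5):
    """Score each sample embedding by its distance to the k-th nearest
    reference (background) embedding: points in sparse regions score high.
    Generic density-based detector; k and all data here are illustrative."""
    d = np.linalg.norm(sample[:, None, :] - reference[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k - 1]

rng = np.random.default_rng(0)
background = rng.normal(size=(500, 4))            # background-only embeddings
sample = np.vstack([rng.normal(size=(95, 4)),     # mostly background...
                    rng.normal(loc=4.0, size=(5, 4))])  # ...plus a displaced cluster
scores = knn_anomaly_scores(background, sample)   # last 5 events score highest
```

In a real analysis the excess of high-score events would then be turned into a p-value against the background-only hypothesis; the quality of the embedding determines how cleanly injected signals separate from the bulk.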
Why It Matters
The most interesting finding here isn’t a benchmark number. It’s a conceptual warning.

There’s a tension between two goals that might seem aligned: learning to classify background events accurately and learning to detect anomalies. When you optimize a representation to organize known background processes as cleanly as possible, you suppress the very irregularities that make new physics stand out. A perfectly organized background leaves no room for the unexpected. The best representation for known physics is not the best representation for discovering unknown physics. Future searches need to be designed with that tension in mind.
The sweet spot turns out to be a combination of data compression, forcing the network to be selective about what it preserves, and domain knowledge, using physics-motivated labels to guide which background structures get organized. Supervised labels without compression risk encoding too much background detail. Compression without guidance can lose signal-relevant structure. Put the two together and the embeddings land in a useful middle ground.
The LHC’s Run 3 is generating data at unprecedented rates, and the High-Luminosity LHC upgrade will push this further. Model-independent searches are essential complements to targeted searches, but their power depends entirely on the quality of the representation feeding into them. A single trained embedding can simultaneously search for diverse signals across different final states, covering the full complexity of LHC collision data at once.
Open questions remain. How do these embeddings behave under detector systematics and real experimental noise? Can self-supervised versions be deployed in real-time trigger systems to catch anomalies before data is even written to disk?
Bottom Line: By learning compact, anomaly-preserving representations of LHC collision events through contrastive training, the team has built a more sensitive and broadly applicable tool for new physics searches, one that doesn’t need to know what it’s looking for.
IAIFI Research Highlights
This work brings contrastive representation learning from computer vision and NLP into high-energy physics, adapting both self-supervised and supervised objectives to the particular structure of LHC collision data.
The study exposes a trade-off between background classification performance and anomaly sensitivity in learned representations. This tension is not specific to particle physics; it applies to any ML system doing open-ended anomaly detection.
The contrastive embeddings substantially improve discovery power for rare new physics signals at the LHC, giving model-independent searches a sharper tool for probing physics beyond the Standard Model.
Next steps include deploying these embeddings in real-time LHC trigger systems and testing them under realistic experimental conditions; the full study is available at [arXiv:2502.15926](https://arxiv.org/abs/2502.15926).
Original Paper Details
Anomaly preserving contrastive neural embeddings for end-to-end model-independent searches at the LHC
arXiv:2502.15926
Kyle Metzger, Lana Xu, Mia Sodini, Thea K. Arrestad, Katya Govorkova, Gaia Grosso, Philip Harris
Anomaly detection (identifying deviations from Standard Model predictions) is a key challenge at the Large Hadron Collider due to the size and complexity of its datasets. This is typically addressed by transforming high-dimensional detector data into lower-dimensional, physically meaningful features. We tackle feature extraction for anomaly detection by learning powerful low-dimensional representations via contrastive neural embeddings. This approach preserves potential anomalies indicative of new physics and enables rare signal extraction using novel machine learning-based statistical methods for signal-independent hypothesis testing. We compare supervised and self-supervised contrastive learning methods, for both MLP- and Transformer-based neural embeddings, trained on the kinematic observables of physics objects in LHC collision events. The learned embeddings serve as input representations for signal-agnostic statistical detection methods in inclusive final states. We achieve significant improvement in discovery power for both rare new physics signals and rare Standard Model processes across diverse final states, demonstrating its applicability for efficiently searching for diverse signals simultaneously. We study the impact of architectural choices, contrastive loss formulations, supervision levels, and embedding dimensionality on anomaly detection performance. We show that the optimal representation for background classification does not always maximize sensitivity to new physics signals, revealing an inherent trade-off between background structure preservation and anomaly enhancement. We demonstrate that combining compression with domain knowledge for label encoding produces the most effective data representation for statistical discovery of anomalies.