Product Manifold Machine Learning for Physics
Authors
Nathaniel S. Woodward, Sang Eon Park, Gaia Grosso, Jeffrey Krupa, Philip Harris
Abstract
Physical data are representations of the fundamental laws governing the Universe, hiding complex compositional structures often well captured by hierarchical graphs. Hyperbolic spaces are endowed with a non-Euclidean geometry that naturally embeds those structures. To leverage the benefits of non-Euclidean geometries in representing natural data we develop machine learning on $\mathcal P \mathcal M$ spaces, Cartesian products of constant curvature Riemannian manifolds. As a use case we consider the classification of "jets", sprays of hadrons and other subatomic particles produced by the hadronization of quarks and gluons in collider experiments. We compare the performance of $\mathcal P \mathcal M$-MLP and $\mathcal P \mathcal M$-Transformer models across several possible representations. Our experiments show that $\mathcal P \mathcal M$ representations generally perform equal to or better than fully Euclidean models of similar size, with the most significant gains found for highly hierarchical jets and small models. We discover a significant correlation between the degree of hierarchical structure at a per-jet level and classification performance with the $\mathcal P \mathcal M$-Transformer in top tagging benchmarks. This is a promising result highlighting a potential direction for further improving machine learning model performance through tailoring geometric representation at a per-sample level in hierarchical datasets. These results reinforce the view of geometric representation as a key parameter in maximizing both performance and efficiency of machine learning on natural data.
Concepts
The Big Picture
Imagine trying to draw a family tree on a flat piece of paper. The further back you go, the more ancestors you have, and branches multiply exponentially. Sooner or later, the page runs out of room. Physicists face exactly this problem when teaching machines to understand particle jets: the geometry of the data doesn’t fit the geometry of the math.
When a quark or gluon gets knocked loose in a collider like the Large Hadron Collider, it doesn’t travel alone. It breaks apart in a chain reaction, splitting into daughter particles, which split again and again, producing a tight spray of hundreds of particles called a jet. The process is hierarchical, like a family tree written in subatomic ink.
Standard machine learning operates in flat Euclidean space, the ordinary geometry of straight lines and right angles, and it struggles to capture branching, tree-like data. A team from MIT’s Laboratory for Nuclear Science and IAIFI decided to fix that mismatch at its root. Rather than forcing jet data into flat geometry, they built models that operate on curved spaces matched to the data’s natural shape. Matching geometry to structure, it turns out, makes a measurable difference in classification performance.
Key Insight: Embedding jet data in curved, non-Euclidean spaces that accommodate hierarchical structure yields better classification performance with smaller models, especially for the most complex, deeply branching jets.
How It Works
The core idea is the product manifold (PM) space: a combination of several constant-curvature geometric spaces, mixed and matched to represent different aspects of the data. Think of it as blending a saddle-shaped surface (hyperbolic space, which expands exponentially and fits tree-like data), a flat plane (Euclidean space, for local structure), and a sphere (positive curvature, for cyclic relationships). Each component handles different features of the jet.
Hyperbolic space does the heavy lifting. Available room grows exponentially with distance from any point, mirroring how a branching tree grows: at each level, the number of branches multiplies. Fitting a deep tree into flat space requires enormous distortion. Hyperbolic space accommodates it almost for free.
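The distance structure of a PM space is simple to state: each component manifold contributes its own geodesic distance, and the product combines them (here via the usual squared-sum rule). Below is a minimal sketch in plain Python, with one Euclidean, one hyperbolic (Poincaré ball, curvature -1), and one spherical (curvature +1) component. The function names and the choice of components are illustrative, not taken from the paper.

```python
import math

def euclidean_dist(u, v):
    # Flat component: ordinary straight-line distance.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def poincare_dist(u, v):
    # Hyperbolic component (curvature -1), Poincare ball model.
    # Distances blow up near the unit-ball boundary, which is
    # where the "exponentially growing room" comes from.
    nu = sum(a * a for a in u)
    nv = sum(b * b for b in v)
    duv = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1 + 2 * duv / ((1 - nu) * (1 - nv)))

def sphere_dist(u, v):
    # Spherical component (curvature +1): great-circle angle
    # between unit vectors.
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.acos(dot)

def product_manifold_dist(parts_u, parts_v, metrics):
    # Product-manifold distance: combine the per-component
    # geodesic distances with the standard l2 rule.
    return math.sqrt(sum(d(pu, pv) ** 2
                         for pu, pv, d in zip(parts_u, parts_v, metrics)))
```

Note that the same Euclidean coordinate displacement costs far more hyperbolic distance near the ball's edge than near its center, which is exactly the property that lets deep trees embed with low distortion.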
The researchers quantify how tree-like a dataset is using Gromov-δ hyperbolicity. Small δ means highly tree-like (an exact tree has δ = 0); large δ means more tangled.
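Gromov-δ has a concrete, computable definition via the standard four-point condition: for every quadruple of points, form the three pairwise-sum "matchings" of distances; δ is half the gap between the two largest sums, maximized over all quadruples. A direct (brute-force, O(n⁴)) sketch, fine for small point sets but not the paper's implementation:

```python
import itertools

def gromov_delta(points, dist):
    # Four-point condition: for each quadruple (x, y, z, w), the two
    # largest of the three matching sums should be nearly equal in a
    # tree-like space. A genuine tree metric gives delta = 0.
    delta = 0.0
    for x, y, z, w in itertools.combinations(points, 4):
        sums = sorted([dist(x, y) + dist(z, w),
                       dist(x, z) + dist(y, w),
                       dist(x, w) + dist(y, z)])
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta
```

For example, points on a line with distance |a - b| form a (degenerate) tree and give δ = 0, while the four corners of a unit square under Euclidean distance give δ = √2 − 1 ≈ 0.41.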

The team built two architectures that operate natively in curved spaces:
- PM-MLP: A multilayer perceptron where each layer’s operations are generalized to product manifolds, using non-Euclidean analogs of addition, distance, and aggregation.
- PM-Transformer: A transformer (the same family behind large language models) extended to product manifold representations. It processes each particle individually, then aggregates into a jet-level representation.
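To make "non-Euclidean analogs of addition" concrete: on the hyperbolic (Poincaré ball) component, vector addition is replaced by Möbius addition, the gyrovector-space operation that keeps points inside the ball. A minimal sketch of the standard curvature -1 formula, illustrative rather than the paper's actual layer code:

```python
def mobius_add(u, v):
    # Mobius addition on the Poincare ball (curvature -1):
    #   u (+) v = [(1 + 2<u,v> + |v|^2) u + (1 - |u|^2) v]
    #             / (1 + 2<u,v> + |u|^2 |v|^2)
    # It reduces to ordinary addition near the origin and maps
    # ball points to ball points.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u)
    nv = sum(b * b for b in v)
    denom = 1 + 2 * dot + nu * nv
    cu = (1 + 2 * dot + nv) / denom
    cv = (1 - nu) / denom
    return [cu * a + cv * b for a, b in zip(u, v)]
```

A hyperbolic linear layer built on this operation applies a Euclidean-style weight matrix in the tangent space and then uses Möbius addition for the bias term, so every intermediate representation stays on the manifold.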
Both were tested on JetClass, a large-scale dataset of simulated jets spanning ten classes: top quarks, Higgs bosons decaying to quark pairs, quark/gluon jets, and more. The experiments varied PM combinations from pure Euclidean to pure hyperbolic to mixed, looking for which geometry worked best for which jet type.

PM representations matched or outperformed Euclidean models of similar parameter count. The largest gains appeared for the most hierarchical jet types and for small models. When you’re tight on compute, curved space does more work with fewer parameters.
Why It Matters
The real payoff isn’t that PM spaces help on average. It’s when they help. The researchers measured the Gromov-δ hyperbolicity of individual jets and found a clear correlation: jets with lower δ (more tree-like, more hierarchical) are classified more accurately by the PM-Transformer than by its Euclidean counterpart. The geometry of the model and the geometry of the data are genuinely aligned.

If the benefit of non-Euclidean geometry tracks the hierarchical structure of individual data points, future models could adapt their geometry on the fly, weighting different manifold components based on how tree-like each jet is. That would be a new kind of inductive bias: not architecture or training data, but the shape of the mathematical space itself, tuned per input.
The same principle applies outside particle physics. Biological data like protein interaction networks, evolutionary trees, and neural connectomes are all hierarchical. So are social networks and language structures. Any domain where data branches and stratifies stands to gain from this kind of geometric matching. Jets just happen to be a clean test case because the branching physics is well understood.
Bottom Line: Matching the geometry of a machine learning model to the hierarchical geometry of physical data isn’t just mathematically elegant. It measurably improves performance, and the improvement is largest exactly where it matters most: deeply branching jets and tight computational budgets.
IAIFI Research Highlights
This work connects differential geometry and Riemannian manifold theory to deep learning for experimental particle physics. Abstract mathematical structure translates directly into better classification on one of collider physics' core problems.
The PM-Transformer shows that transformer architectures can be extended to non-Euclidean product manifolds without losing generality. The same approach could apply to any hierarchical dataset, from particle jets to biological networks.
Better jet classification directly improves sensitivity in searches for new physics at the LHC, including Higgs boson measurements and signatures of dark matter production, by more accurately distinguishing signal jets from QCD background.
The per-sample correlation between classification accuracy and Gromov-δ hyperbolicity points toward adaptive-geometry models that tune their manifold representation per input. The full paper is available at [arXiv:2412.07033](https://arxiv.org/abs/2412.07033).
Original Paper Details
Product Manifold Machine Learning for Physics
arXiv:2412.07033
Nathaniel S. Woodward, Sang Eon Park, Gaia Grosso, Jeffrey Krupa, Philip Harris