Power Counting Energy Flow Polynomials

The Big Picture

Imagine trying to identify a person in a crowd using every possible facial measurement: the distance between each pair of freckles, every angle of every feature. You’d drown in data. Now imagine a physicist’s version of that problem, applied to the chaos of particle collisions at the Large Hadron Collider.

Every time protons smash together at CERN, the resulting debris sprays into narrow streams of particles called jets. The internal patterns of these jets, their substructure, carry fingerprints of the fundamental interactions that created them.

For decades, physicists have built specialized measurements to read those fingerprints. The field has since pushed toward complete mathematical toolkits: collections of measurements that, in principle, capture everything about a jet. One such toolkit, energy flow polynomials (EFPs), forms an overcomplete linear basis for jet observables. Each EFP probes a different combination of angular relationships between particles in the jet, organized by their degree of complexity.

Completeness comes at a steep price, though. Counting every EFP up to degree six already gives 314 distinct observables, 212 of them at degree six alone. Using all of them is computationally painful and conceptually murky.

This paper by Pedro Cal, Jesse Thaler, and Wouter J. Waalewijn cuts through that excess. Using power counting, a standard physics technique for organizing calculations by importance, they show that the EFP toolkit contains enormous redundancy for realistic jets and that a much smaller subset performs just as well.

Key Insight: By applying power counting arguments appropriate for quark and gluon jets, the authors reduce the EFP basis by nearly a factor of ten at degree six (from 212 down to 22 elements), without sacrificing machine learning performance.

How It Works

EFPs rest on a clean idea: represent each observable as a graph. Nodes correspond to particles in the jet; edges encode angular correlations between them. The degree of an EFP is the number of edges in its graph, so it controls how many angular correlations the observable probes. Computing all EFPs up to degree six means evaluating 314 such graphs, each summed over every pair, triplet, or larger tuple of particles in a jet, and the computational cost scales steeply with graph complexity.
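The graph picture can be made concrete with a short sketch. It assumes the standard EFP form: a sum over all assignments of graph nodes to particles, with each term weighted by the particles' energy fractions and by one pairwise angle per edge. The function name `efp`, the edge-list encoding, and the three-particle toy jet are illustrative choices, not taken from the paper.

```python
import itertools


def efp(edges, zs, thetas):
    """Naively evaluate one EFP (illustrative helper, not a library API).

    edges  : list of (k, l) node pairs; a repeated pair encodes a multi-edge
    zs     : energy fractions of the M particles in the jet
    thetas : M x M symmetric matrix of pairwise angular distances
    """
    n_nodes = 1 + max(max(edge) for edge in edges)
    total = 0.0
    # Each graph node is assigned to each particle in turn: M**n_nodes terms.
    for assignment in itertools.product(range(len(zs)), repeat=n_nodes):
        term = 1.0
        for particle in assignment:
            term *= zs[particle]
        for k, l in edges:
            term *= thetas[assignment[k]][assignment[l]]
        total += term
    return total


# Hypothetical three-particle jet; the numbers are chosen only for illustration.
zs = [0.5, 0.3, 0.2]
thetas = [[0.0, 0.1, 0.2],
          [0.1, 0.0, 0.3],
          [0.2, 0.3, 0.0]]

single_edge = efp([(0, 1)], zs, thetas)               # degree 1, two nodes
triangle = efp([(0, 1), (1, 2), (0, 2)], zs, thetas)  # degree 3, three nodes
```

The loop over `itertools.product` is the source of the steep cost: every extra node in the graph multiplies the number of terms by the number of particles.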

Power counting comes from effective field theory, the physicist’s standard approach for isolating effects that matter at a given scale while systematically dropping the rest. Not all contributions are equally important. By identifying what’s “big” and what’s “small” in typical jet configurations, you can rank observables by relative importance and discard subleading terms. The authors apply three power counting schemes:

Strongly-ordered expansion: Assumes jet emissions are hierarchically ordered in both energy and angle, a realistic approximation for leading-logarithmic calculations.
1-collinear expansion: Keeps one energetic collinear emission (a particle flying nearly parallel to the jet axis) with full angular information, expanding around that configuration.
2-collinear expansion: Extends to two energetic collinear emissions, capturing next-to-leading logarithmic physics at finer precision.

Within each scheme, power counting reveals linear relationships between EFPs. Two graphically distinct EFPs may be numerically equivalent, to the precision of the approximation, once you account for how jets actually form. The authors derive these relations analytically, then test them on millions of simulated jets from Pythia, a standard Monte Carlo simulator for particle collisions.
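A toy case shows how two different graphs can collapse onto one another. For a jet with exactly two particles, evaluating the standard EFP sum directly gives z(1-z)θ² for the three-node path and 2z(1-z)θ² for the doubled edge, so the two differ only by a factor of two. The sketch below checks that numerically. It relies on the two-particle assumption and is not one of the relations derived in the paper; the function names and sample values are hypothetical.

```python
import itertools


def efp(edges, zs, thetas):
    """Naive EFP sum over all node-to-particle assignments (illustrative)."""
    n_nodes = 1 + max(max(edge) for edge in edges)
    total = 0.0
    for assignment in itertools.product(range(len(zs)), repeat=n_nodes):
        term = 1.0
        for particle in assignment:
            term *= zs[particle]
        for k, l in edges:
            term *= thetas[assignment[k]][assignment[l]]
        total += term
    return total


def two_particle_jet(z, theta):
    """Toy jet: energy fractions (1 - z, z) separated by angle theta."""
    return [1.0 - z, z], [[0.0, theta], [theta, 0.0]]


path = [(0, 1), (1, 2)]          # three nodes in a line, degree 2
doubled_edge = [(0, 1), (0, 1)]  # two nodes joined twice, degree 2

# Residual of the candidate relation 2 * EFP(path) - EFP(doubled edge) = 0.
residuals = []
for z, theta in [(0.01, 0.3), (0.1, 0.05), (0.4, 0.2)]:
    zs, thetas = two_particle_jet(z, theta)
    residuals.append(2.0 * efp(path, zs, thetas) - efp(doubled_edge, zs, thetas))
```

With more than two particles the residual is generally nonzero, which is where an expansion in small quantities is needed to say how good such a relation remains.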

So how much trimming does power counting buy you? For the strongly-ordered basis, the degree-six EFP count drops from 212 to just 22. The 2-collinear basis needs 37 elements, retaining more angular information, but that’s still tiny compared to the full set.
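The 314 and 212 figures quoted in this article count different things, and the bookkeeping is easy to check. The per-degree counts below are the number of multigraphs with d edges and no isolated nodes, a standard combinatorial sequence that is assumed here rather than quoted from the text, and the degree-zero entry is assumed to be included in the total. The reduced sizes 22 and 37 are the ones given above.

```python
# Multigraphs with d edges and no isolated nodes, for d = 0..6 (assumed input).
efps_per_degree = [1, 1, 3, 8, 23, 66, 212]

total_up_to_six = sum(efps_per_degree)  # all EFPs with degree <= 6
at_degree_six = efps_per_degree[6]      # EFPs with degree exactly 6

# Reduced basis sizes at degree six, as quoted in the article.
reduced_sizes = {"strongly-ordered": 22, "2-collinear": 37}
reduction_factors = {
    name: at_degree_six / size for name, size in reduced_sizes.items()
}
```

Under these assumptions the strongly-ordered basis is smaller than the full degree-six set by a factor of about 9.6, and the 2-collinear basis by about 5.7.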

The predicted linear relations hold up in Pythia with excellent numerical agreement.

Why It Matters

The immediate payoff is practical. The authors run quark/gluon tagging (distinguishing jets initiated by quarks from those initiated by gluons) using logistic regression on both the full EFP set and their reduced bases. The reduced bases match full-set performance using a fraction of the inputs. For regression tasks predicting continuous jet properties, the story is the same.
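One way to see why a redundant input cannot help a linear classifier: if a third feature is an exact linear combination of two others, any logistic-regression score built on all three can be reproduced with two by folding the extra weight into the remaining ones. The sketch below checks this on made-up numbers. The weights, coefficients, and feature values are hypothetical, and since the relations in the paper hold only to a given order in the power counting, agreement on real jets would be approximate rather than exact.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


# Assumed exact relation between features: x3 = a * x1 + b * x2.
a, b = 2.0, -0.5

# Hypothetical weights of a logistic regression trained on (x1, x2, x3).
w_full = [0.7, -1.2, 0.4]
bias = 0.1

# Equivalent weights on (x1, x2) alone: absorb w3 through the relation.
w_reduced = [w_full[0] + a * w_full[2], w_full[1] + b * w_full[2]]

samples = [(0.3, 0.8), (1.5, 0.2), (0.05, 0.6)]
full_scores, reduced_scores = [], []
for x1, x2 in samples:
    x3 = a * x1 + b * x2
    full_scores.append(
        sigmoid(w_full[0] * x1 + w_full[1] * x2 + w_full[2] * x3 + bias)
    )
    reduced_scores.append(sigmoid(w_reduced[0] * x1 + w_reduced[1] * x2 + bias))
```

Both score lists agree to floating-point precision, so the two-feature model separates the samples exactly as the three-feature model does.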

But there’s a deeper point. EFPs were always intended as a complete basis, yet completeness without structure is just a list. Power counting gives that list a hierarchy: it tells you which observables matter most, and for which physics.

Machine learning alone can’t give you that understanding. A neural network might implicitly learn to ignore redundant EFPs; power counting explains why they’re redundant and when that redundancy breaks down.

There’s a computational bonus in the 1-collinear case, too. Computing an N-point EFP on M particles naively costs O(M^N), so the cost balloons as both particle count and observable complexity grow. Power counting lets the authors “cut open” high-complexity graphs and express them as products of simpler ones, reducing the tree-width of the underlying computation. Fewer floating-point operations per jet matter at the LHC, where jets are reconstructed millions of times per second.
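The effect of reorganizing a sum can be seen without any power counting. For a star graph, where one central node connects to N-1 others, the nested sum factorizes exactly, so an O(M^N) loop becomes O(M^2). This is plain algebraic factorization, a different and simpler mechanism than the power-counting cuts described above, shown only to illustrate how graph structure drives cost. The function names and the toy jet are hypothetical.

```python
import itertools


def star_naive(n_leaves, zs, thetas):
    """Star-graph EFP by brute force: M**(n_leaves + 1) terms."""
    total = 0.0
    for assignment in itertools.product(range(len(zs)), repeat=n_leaves + 1):
        center, leaves = assignment[0], assignment[1:]
        term = zs[center]
        for leaf in leaves:
            term *= zs[leaf] * thetas[center][leaf]
        total += term
    return total


def star_factorized(n_leaves, zs, thetas):
    """Same quantity in O(M**2): every leaf contributes an identical inner sum."""
    total = 0.0
    for i in range(len(zs)):
        inner = sum(zs[j] * thetas[i][j] for j in range(len(zs)))
        total += zs[i] * inner ** n_leaves
    return total


# Hypothetical four-particle jet; the numbers are chosen only for illustration.
zs = [0.4, 0.3, 0.2, 0.1]
thetas = [[0.0, 0.1, 0.2, 0.3],
          [0.1, 0.0, 0.15, 0.25],
          [0.2, 0.15, 0.0, 0.05],
          [0.3, 0.25, 0.05, 0.0]]

naive_value = star_naive(3, zs, thetas)       # 4**4 = 256 terms
fast_value = star_factorized(3, zs, thetas)   # 4 * 4 = 16 inner products
```

Graphs with loops do not factorize this cleanly, which is why a controlled approximation that replaces them with products of simpler graphs can save real work.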

The authors focus on single-prong quark and gluon jets here. Multi-prong jets from top quarks or W bosons, and whether these basis reductions could inform better neural network architectures for jet physics, remain open questions they flag for future work.

Bottom Line: Power counting trims the 212 degree-six EFPs (out of 314 up to that degree) to a compact, physically motivated toolkit of 22 to 37 elements, enabling faster computation, cleaner interpretability, and undiminished machine learning performance on jet classification tasks.

IAIFI Research Highlights

Interdisciplinary Research Achievement
This work connects effective field theory techniques from high-energy physics with modern machine learning approaches to jet substructure, showing that physics-informed basis reduction matches brute-force completeness in performance while being far more compact.

Impact on Artificial Intelligence
Principled physical reasoning compresses a high-dimensional feature space by roughly 6x to 10x (212 degree-six inputs down to 37 or 22) without performance loss, offering a template for physics-guided feature selection in scientific machine learning.

Impact on Fundamental Interactions
Power counting reveals previously unrecognized linear dependencies among jet observables, providing new analytic insight into the structure of infrared-and-collinear safe observables at the LHC.

Outlook and References
Future extensions to multi-prong jets (top quarks, W/Z bosons) could expand the scope of this framework; the full paper is available at [arXiv:2205.06818](https://arxiv.org/abs/2205.06818).

Original Paper Details

Title
Power Counting Energy Flow Polynomials

arXiv ID
2205.06818

Authors
Pedro Cal, Jesse Thaler, Wouter J. Waalewijn

Abstract
Power counting is a systematic strategy for organizing collider observables and their associated theoretical calculations. In this paper, we use power counting to characterize a class of jet substructure observables called energy flow polynomials (EFPs). EFPs provide an overcomplete linear basis for infrared-and-collinear safe jet observables, but it is known that in practice, a small subset of EFPs is often sufficient for specific jet analysis tasks. By applying power counting arguments, we obtain linear relationships between EFPs that hold for quark and gluon jets to a specific order in the power counting. We test these relations in the parton shower generator Pythia, finding excellent agreement. Power counting allows us to truncate the basis of EFPs without affecting performance, which we corroborate through a study of quark-gluon tagging and regression.
