
B-jet Tagging Using a Hybrid Edge Convolution and Transformer Architecture

Experimental Physics

Authors

Diego F. Vasquez Plaza, Vidya Manian

Abstract

Jet flavor tagging plays an important role in precision Standard Model measurements, enabling the extraction of the mass dependence of jet-quark and quark-gluon plasma (QGP) interactions. It also enables inferring the nature of particles produced in high-energy particle collisions that contain heavy quarks. The classification of bottom jets is vital for exploring new-physics scenarios in proton-proton collisions. In this research, we present a hybrid deep learning architecture that integrates edge convolutions with transformer self-attention mechanisms into a single architecture, the Edge Convolution Transformer (ECT) model, for bottom-quark jet tagging. ECT processes track-level features (impact parameters, momentum, and their significances) alongside jet-level observables (vertex information and kinematics) to achieve state-of-the-art performance. The study utilizes the ATLAS simulation dataset. We demonstrate that ECT achieves 0.9333 AUC for b-jet versus combined charm and light jet discrimination, surpassing ParticleNet (0.8904 AUC) and the pure transformer baseline (0.9216 AUC). The model maintains inference latency below 0.060 ms per jet on modern GPUs, meeting the stringent requirements for real-time event selection at the LHC. Our results demonstrate that hybrid architectures combining local and global features offer superior performance for challenging jet classification tasks. The proposed architecture achieves good results in b-jet tagging, particularly excelling in charm jet rejection (the most challenging task), while maintaining competitive light-jet discrimination comparable to pure transformer models.

Concepts

jet physics, b-jet tagging, graph neural networks, transformers, hybrid graph-transformer, attention mechanisms, particle tracking, classification, collider physics, secondary vertex, trigger systems, geometric deep learning, feature extraction

The Big Picture

Imagine trying to identify a single ingredient in a bowl of soup after it’s already been blended. That’s roughly the challenge physicists face when a proton-proton collision at the Large Hadron Collider sprays thousands of particles outward in a fraction of a second. These cascades, called jets, carry fingerprints of the fundamental particles that created them. Reading those fingerprints in real time, while the LHC generates millions of collisions per second, is one of the hardest data problems in modern physics.

The flavor of a jet matters enormously. Jets born from bottom quarks (b-jets) are central to Higgs boson physics, top quark studies, and searches for supersymmetry, a theoretical framework predicting a heavier partner for every known particle. But b-jets look disturbingly similar to jets from charm quarks (c-jets), making the two notoriously hard to tell apart.

The key clue is a tiny displacement. Particles containing bottom quarks travel about 460 micrometers before decaying, leaving tracks that don’t quite point back to the original collision point. That’s about half a millimeter, several times the width of a human hair. Charm particles leave a similar signature at about 150 micrometers, so the difference is small and the confusion is real.

Diego Vasquez Plaza and Vidya Manian of the University of Puerto Rico at Mayagüez built a hybrid deep learning model that combines two powerful architectural ideas into one, outperforming both individually while hitting the speed requirements for real-time LHC deployment.

Key Insight: By fusing local geometric reasoning (EdgeConv) with global pattern recognition (transformer attention) into a single architecture, the ECT model pushes b-jet tagging accuracy beyond what either approach achieves alone, especially on the hardest task: rejecting charm jets.

How It Works

The new model is called the Edge Convolution Transformer (ECT). It gives the algorithm two complementary senses: one that notices fine local structure, like feeling the texture of a surface, and one that sees the big picture all at once, like stepping back to view a whole painting.

The local sense comes from EdgeConv blocks, borrowed from the ParticleNet architecture. EdgeConv builds a K-nearest neighbor graph in the η-φ plane (a coordinate system mapping where particles land in the detector) by connecting each particle to its closest neighbors and applying convolutions along those connections. The model learns local vertex topology: which tracks cluster together, how their displacements correlate, what the nearby geometry looks like. This is exactly the kind of reasoning that helps spot a displaced secondary vertex, the point where a bottom-quark-containing particle decays slightly off-center from the original collision.
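To make the neighborhood idea concrete, here is a minimal EdgeConv-style block in plain NumPy. It is an illustrative sketch, not the ECT implementation: the shared "MLP" is a single random linear layer, and `k` and all weights are untrained placeholders.

```python
import numpy as np

def edge_conv(features, coords, k=3, seed=0):
    """Sketch of one EdgeConv block: build a kNN graph in the (eta, phi)
    plane, form edge features [x_i, x_j - x_i] for each neighbor, pass
    them through a shared linear layer + ReLU, and max-aggregate.
    Weights here are random placeholders, not trained parameters."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    w = rng.standard_normal((2 * d, d))      # shared "MLP" (one linear layer)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)     # pairwise eta-phi distances
    np.fill_diagonal(dist, np.inf)           # a particle is not its own neighbor
    out = np.empty_like(features)
    for i in range(n):
        nbrs = np.argsort(dist[i])[:k]       # k nearest neighbors of particle i
        edges = np.concatenate(
            [np.tile(features[i], (k, 1)), features[nbrs] - features[i]],
            axis=1)                          # edge feature: concat(x_i, x_j - x_i)
        out[i] = np.maximum.reduce(np.maximum(edges @ w, 0.0))  # ReLU, then max
    return out

feats = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.], [1., 1., 0.]])
coords = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])  # (eta, phi)
print(edge_conv(feats, coords, k=2).shape)   # (4, 3): one updated vector per track
```

The max-aggregation makes the output order-independent in the neighbors, which is why this style of convolution suits unordered sets of tracks.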

The global sense comes from multi-head self-attention, the core mechanism of transformer architectures. After the EdgeConv layers extract local features, every particle’s representation gets compared against every other particle’s simultaneously. A learned class token, a special vector that aggregates information from the whole jet, is then fused with jet-level features before the final classification.
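The class-token mechanism can be sketched in a few lines of single-head attention (the real model uses multi-head attention; the projection matrices and class token below are random, untrained placeholders):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_with_class_token(tokens, seed=0):
    """Single-head self-attention sketch with a prepended class token.
    W_q / W_k / W_v and the class token are random stand-ins for what
    would be learned parameters in a trained transformer."""
    rng = np.random.default_rng(seed)
    d = tokens.shape[1]
    cls = rng.standard_normal((1, d))        # stand-in for the learned class token
    x = np.vstack([cls, tokens])             # class token sees every particle
    wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(d))     # every token attends to every other
    out = attn @ v
    return out[0]                            # the class-token row summarizes the jet

particles = np.random.default_rng(1).standard_normal((5, 4))  # 5 tracks, 4 features
print(attend_with_class_token(particles).shape)  # (4,): one jet-summary vector
```

In ECT this jet-summary vector is then concatenated with the jet-level observables before the classification head.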

Figure 1

The model processes two input streams:

  • Track-level features: impact parameter components and their significances, track momentum, and angular coordinates (η, φ)
  • Jet-level features: secondary vertex multiplicity, invariant mass of tracks from secondary vertices, and overall jet kinematics
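As an illustration of how the two streams might be assembled, the toy snippet below builds a per-track feature matrix and a jet-level vector and fuses them by simple mean-pooling plus concatenation. All numerical values are hypothetical, and the pooling choice is a simplification, not the ECT fusion scheme:

```python
import numpy as np

# Hypothetical per-track features: [d0, z0, sig_d0, sig_z0, pT, eta, phi]
tracks = np.array([
    [0.05, 0.10, 2.1, 1.8, 12.3, 0.4, 1.2],
    [0.46, 0.30, 9.7, 4.2,  8.1, 0.5, 1.1],  # displaced track: large d0 significance
    [0.02, 0.05, 0.9, 0.7,  3.4, 0.3, 1.3],
])
# Hypothetical jet-level features: [n_secondary_vertices, sv_mass_GeV, jet_pT, jet_eta]
jet = np.array([1.0, 1.8, 45.0, 0.4])

# Toy fusion: pool the track features (mean over tracks), then concatenate
# the jet-level observables to form one fixed-length input vector.
fused = np.concatenate([tracks.mean(axis=0), jet])
print(fused.shape)  # (11,)
```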

Training used ATLAS simulation data across three binary classification tasks: b vs. light jets, b vs. charm jets, and b vs. the combined charm-plus-light pool.

The numbers tell a clear story. ECT achieves 0.9333 AUC on the combined b vs. (charm + light) task. AUC, the area under the ROC curve, runs from 0 to 1: a score of 0.5 corresponds to random guessing, and 1.0 means perfect discrimination. ParticleNet, a well-established graph neural network used in CMS trigger systems, reaches only 0.8904 AUC on the same task. A pure transformer baseline, the Particle Transformer (ParT), scores 0.9216 AUC. ECT beats both by a meaningful margin.
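For readers who want the metric pinned down: AUC equals the probability that a randomly chosen signal jet receives a higher tagger score than a randomly chosen background jet. A minimal rank-based implementation on made-up toy scores:

```python
import numpy as np

def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative (ties count half) -- the Mann-Whitney formulation."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = ((pos[:, None] > neg[None, :]).sum()
            + 0.5 * (pos[:, None] == neg[None, :]).sum())
    return wins / (len(pos) * len(neg))

# Toy tagger outputs: label 1 = b-jet, 0 = charm/light (values invented)
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.1]
labels = [1,   1,   0,   1,   0,    0]
print(auc(scores, labels))  # 8/9 ≈ 0.889 for this toy set
```

On this toy set, 8 of the 9 possible signal-background pairs are ranked correctly, hence 8/9.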

Figure 2

The most striking gains appear on charm jet rejection, exactly where physicists need the most help. Charm jets share so much topological similarity with b-jets that pure attention models struggle to draw the boundary sharply. The EdgeConv layers give ECT a local geometric vocabulary that pure transformers lack, and that vocabulary proves decisive for disentangling subtle secondary vertex differences.

ECT doesn’t sacrifice speed for accuracy either. Inference runs in under 0.060 milliseconds per jet on modern GPUs, fast enough for real-time deployment in the LHC high-level trigger.
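A per-jet latency budget like this is typically checked by timing batched inference and dividing by the batch size. The sketch below does exactly that with a stand-in model on CPU; the actual measurement would use the trained ECT model on GPU with proper warm-up and device synchronization:

```python
import time
import numpy as np

def dummy_model(batch):
    # Stand-in for ECT inference: a couple of matrix multiplies with a ReLU.
    w1 = np.ones((batch.shape[1], 64))
    w2 = np.ones((64, 1))
    return np.maximum(batch @ w1, 0.0) @ w2

def per_jet_latency_ms(model, batch, n_repeats=20):
    """Average wall-clock inference time per jet, in milliseconds."""
    t0 = time.perf_counter()
    for _ in range(n_repeats):
        model(batch)
    elapsed = time.perf_counter() - t0
    return 1e3 * elapsed / (n_repeats * batch.shape[0])

batch = np.random.default_rng(0).standard_normal((256, 11))  # 256 jets, 11 features
print(f"{per_jet_latency_ms(dummy_model, batch):.4f} ms/jet")
```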

Figure 3

Why It Matters

The lesson here goes beyond b-jet tagging. In many scientific domains, data carries both local structure (nearby points that cluster in meaningful ways) and global structure (long-range correlations across the whole system). Graph neural networks are good at the former; transformers are good at the latter. The usual approach has been to pick one.

ECT shows that fusing both paradigms into a unified architecture, not stacking them loosely but integrating EdgeConv and attention together, captures both scales of structure without paying a latency penalty.

For the LHC physics program, better b-tagging directly enables more precise Higgs measurements, sharper searches for new particles, and cleaner studies of the quark-gluon plasma, an exotic state of matter briefly recreated in high-energy collisions. As the LHC moves toward its high-luminosity phase, an upcoming upgrade that will dramatically increase collisions per second, the combination of accuracy and speed becomes even more valuable.

Future work could extend ECT to multi-class tagging (b vs. c vs. light simultaneously), incorporate flavor-tagging uncertainties for systematic studies, or adapt the architecture for other jet substructure tasks where local-global feature fusion might pay similar dividends.

Bottom Line: ECT’s hybrid architecture achieves 0.9333 AUC on b-jet discrimination, surpassing both ParticleNet and the Particle Transformer, while running fast enough for real-time LHC trigger deployment. Combining local graph convolutions with global self-attention turns out to be a winning strategy for particle physics classification.

IAIFI Research Highlights

Interdisciplinary Research Achievement
This work connects graph neural network theory and transformer architectures with experimental particle physics, deploying a hybrid AI model on ATLAS simulation data to address a core LHC reconstruction challenge.
Impact on Artificial Intelligence
The ECT architecture shows that integrating EdgeConv local feature extraction with transformer self-attention into a unified model outperforms either paradigm alone, offering a general template for tasks that require simultaneous local and global reasoning.
Impact on Fundamental Interactions
Improved b-jet tagging with real-time capability enhances precision measurements of the Higgs boson, top quark properties, and searches for physics beyond the Standard Model in proton-proton collisions at 13 TeV.
Outlook and References
Future extensions include multi-class jet flavor tagging and deployment in the LHC high-luminosity era; the full paper is available at [arXiv:2603.21326](https://arxiv.org/abs/2603.21326).

Original Paper Details

Title
B-jet Tagging Using a Hybrid Edge Convolution and Transformer Architecture
arXiv ID
2603.21326
Authors
Diego F. Vasquez Plaza, Vidya Manian