
SymbolNet: Neural Symbolic Regression with Adaptive Dynamic Pruning for Compression

Foundational AI

Authors

Ho Fung Tsoi, Vladimir Loncar, Sridhara Dasu, Philip Harris

Abstract

Compact symbolic expressions have been shown to be more efficient than neural network models in terms of resource consumption and inference speed when implemented on custom hardware such as FPGAs, while maintaining comparable accuracy. These capabilities are highly valuable in environments with stringent computational resource constraints, such as high-energy physics experiments at the CERN Large Hadron Collider. However, finding compact expressions for high-dimensional datasets remains challenging due to the inherent limitations of genetic programming, the search algorithm of most symbolic regression methods. Contrary to genetic programming, the neural network approach to symbolic regression offers scalability to high-dimensional inputs and leverages gradient methods for faster equation searching. Common ways of constraining expression complexity often involve multistage pruning with fine-tuning, which can result in significant performance loss. In this work, we propose SymbolNet, a neural network approach to symbolic regression specifically designed as a model compression technique, aimed at enabling low-latency inference for high-dimensional inputs on custom hardware such as FPGAs. This framework allows dynamic pruning of model weights, input features, and mathematical operators in a single training process, where both training loss and expression complexity are optimized simultaneously. We introduce a sparsity regularization term for each pruning type, which can adaptively adjust its strength, leading to convergence at a target sparsity ratio. Unlike most existing symbolic regression methods that struggle with datasets containing more than O(10) inputs, we demonstrate the effectiveness of our model on the LHC jet tagging task (16 inputs), MNIST (784 inputs), and SVHN (3072 inputs).

Concepts

symbolic regression, dynamic pruning, sparse models, model compression, interpretability, jet physics, feature extraction, scalability, trigger systems, regression, loss function design, collider physics

The Big Picture

Imagine you’re a physicist at CERN, and a particle collision happens every 25 nanoseconds. Your detector must decide in real time whether to keep or discard that event, with the compute budget of a calculator, not a supercomputer. Deep learning models are too slow, too hungry. What you want is a compact formula written on a napkin.

Symbolic regression tries to find that formula automatically. Instead of fitting data to a predefined equation, it searches the space of all possible mathematical expressions (addition, multiplication, sine, square roots) to discover the one that fits best. It’s a modern version of what Max Planck did in 1900: stare at data from glowing hot objects, find a formula that fits the curve, and accidentally invent quantum mechanics.

The problem? Classical symbolic regression algorithms choke when data has more than roughly ten input variables. Physics datasets at the LHC have hundreds or thousands.

SymbolNet is a neural network framework that does symbolic regression on high-dimensional inputs while pruning away unnecessary variables, operators, and connections, all in a single training run.

SymbolNet treats symbolic regression as a compression problem, using adaptive dynamic pruning to optimize both model accuracy and expression compactness. This scales symbolic regression to thousands of input features for the first time.

How It Works

The standard approach to symbolic regression is genetic programming, an evolutionary algorithm that breeds and mutates mathematical formulas across generations, selecting the fittest survivors. It works for small problems but becomes intractably slow as input dimensionality grows. SymbolNet replaces the evolutionary search with a neural network whose architecture produces human-readable symbolic expressions by construction.

Figure 1

Each neuron applies one function from a library of activation functions: addition, multiplication, square, sine, absolute value, and so on. The network learns which operators to keep and which to simplify. A complex sine function might collapse to a linear term if the data doesn’t need it, reducing complexity with no hand-tuning. This is operator pruning.
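To make operator pruning concrete, here is a minimal Python sketch of a single symbolic node. The operator library, the gate variable, and the collapse-to-identity rule are illustrative assumptions, not the paper's actual implementation (which learns gates and thresholds jointly by gradient descent): a node whose learned gate falls below its threshold downgrades from a complex operator to a simple pass-through.

```python
import math

# Hypothetical operator library (names and complexity scores are
# illustrative): each entry maps an operator to (function, complexity).
OPERATORS = {
    "id":  (lambda x: x,     1),
    "sin": (math.sin,        4),
    "sq":  (lambda x: x * x, 2),
    "abs": (abs,             2),
}

def apply_node(x, op_name, gate, threshold):
    """Apply one symbolic node. If the learned gate magnitude falls
    below its threshold, the operator is pruned: the node collapses to
    the identity (a linear term), shrinking expression complexity
    without a separate fine-tuning stage."""
    fn, complexity = OPERATORS[op_name]
    if abs(gate) < threshold:          # operator pruned away
        return x, OPERATORS["id"][1]   # pass-through, complexity 1
    return fn(x), complexity

# A sine node whose gate has shrunk below threshold reduces to x itself.
y, c = apply_node(0.5, "sin", gate=0.02, threshold=0.1)
# y == 0.5, c == 1: the sine has collapsed to a linear term
```

In the real framework this decision is made continuously during training, so an operator that the data does not need is simplified away rather than frozen in at design time.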

Three things get pruned at once during a single training pass:

  • Model weights — individual connection strengths, the standard form of network sparsity
  • Input features — entire input variables, performing automatic feature selection
  • Mathematical operators — complex functions downgraded to simpler arithmetic

Each prunable element carries a trainable threshold that determines what survives. A weight stays if its magnitude exceeds its threshold; otherwise it’s masked to zero. The network continuously renegotiates what to keep based on each element’s impact on accuracy.
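The thresholding rule above can be sketched in a few lines of Python. This is a hard mask for illustration only; in SymbolNet both the weights and their thresholds are trained jointly, using a differentiable relaxation so gradients can flow through the masking decision.

```python
def masked_weight(w, t):
    """Dynamic pruning mask: a weight survives only if its magnitude
    exceeds its (trainable) threshold; otherwise it is zeroed out.
    Here t is a fixed constant for illustration."""
    return w if abs(w) > t else 0.0

weights    = [0.8, -0.03, 0.4, 0.01]
thresholds = [0.1,  0.1,  0.1, 0.1]
pruned = [masked_weight(w, t) for w, t in zip(weights, thresholds)]
# pruned == [0.8, 0.0, 0.4, 0.0] — half the weights are masked
```

The same mechanism, with separate thresholds, governs input features and operators, which is what lets all three pruning types happen in one training pass.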

Figure 2

A self-adaptive regularization term prevents the network from ignoring sparsity. This penalty adjusts its own strength during training: if the expression is still too complex, the penalty increases; if sparsity is already at target, it relaxes. The user specifies a desired sparsity ratio (say, keep only 10% of weights) and training converges there automatically, without manual coefficient tuning.
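One way to picture this self-adjusting penalty is the toy update rule below. The specific formula and the `rate` parameter are assumptions for illustration, not the paper's actual regularizer: the point is only that the penalty strengthens while the model is denser than the target and relaxes once the target sparsity is reached.

```python
def update_penalty(strength, current_sparsity, target_sparsity, rate=0.1):
    """Illustrative self-adaptive sparsity penalty (not the paper's
    exact update rule). While sparsity is below target, the penalty
    coefficient grows, pushing more elements below their thresholds;
    once the target is met or exceeded, it shrinks so the remaining
    weights can focus on accuracy."""
    gap = target_sparsity - current_sparsity  # > 0 means still too dense
    return max(0.0, strength * (1.0 + rate * gap / max(target_sparsity, 1e-8)))
```

Run once per training step, an update like this removes the need to hand-tune a fixed regularization coefficient: the user states a target (say, 90% of weights pruned) and the coefficient finds its own level.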

Why It Matters

The team tested SymbolNet on three datasets at very different scales. For LHC jet tagging (16 inputs), it matched or outperformed baseline neural symbolic regression methods while producing sparser expressions. On MNIST (784 pixel inputs) and binary SVHN (3,072 inputs, street house numbers from photographs), it produced compact symbolic expressions for image-scale inputs, a regime where prior symbolic regression tools could not operate.

The payoff for compactness is speed. When the extracted expressions were synthesized onto an FPGA (a reconfigurable hardware chip used in LHC trigger systems), inference ran in nanoseconds with a fraction of the resource usage of a conventional neural network. For jet tagging, the FPGA implementation achieved latency comparable to hls4ml-compressed networks (hls4ml converts neural networks into hardware firmware) but with a far simpler, auditable expression underneath.

At the LHC, the first-level hardware trigger must process 40 million collisions per second and reduce that stream to a manageable size in microseconds. Every nanosecond counts, and every FPGA lookup table is a scarce resource shared across an entire detector.

But speed isn’t the only advantage. A physicist can read these expressions, check them against physical intuition, and trust them in a way that a 50-layer neural network never allows. Symbolic regression has long promised a route to AI-assisted discovery, with machines uncovering physical laws the way Planck or Kepler did by hand. Scalability was always the missing piece.

The results on 3,072-dimensional image data show that neural symbolic regression is no longer confined to toy problems. Whether the expressions it finds on real physics datasets encode genuinely new physical insight remains an open question, and exactly the right question to be asking next.

SymbolNet makes symbolic regression practical for high-dimensional real-world datasets by combining neural networks with single-phase adaptive pruning. The result: compact mathematical expressions that run at nanosecond latency on FPGA hardware, directly useful for physics experiments at the LHC and beyond.

IAIFI Research Highlights

Interdisciplinary Research Achievement
SymbolNet sits at the intersection of machine learning and experimental particle physics, turning neural symbolic regression into a hardware-deployable compression technique validated on LHC jet tagging tasks.
Impact on Artificial Intelligence
The adaptive dynamic pruning framework simultaneously optimizes weights, input features, and operators with self-adjusting regularization, scaling neural symbolic regression to datasets with thousands of inputs.
Impact on Fundamental Interactions
Real-time symbolic inference on FPGAs at nanosecond latency directly supports the LHC's hardware trigger pipeline, where ultra-fast, resource-efficient models are essential for capturing rare collision events.
Outlook and References
Future work includes extending SymbolNet to multiclass tasks and exploring whether discovered expressions encode novel physical principles. The paper is available as arXiv:2401.09949 (https://arxiv.org/abs/2401.09949).

Original Paper Details

Title
SymbolNet: Neural Symbolic Regression with Adaptive Dynamic Pruning for Compression
arXiv ID
2401.09949
Authors
Ho Fung Tsoi, Vladimir Loncar, Sridhara Dasu, Philip Harris