AI Feynman: a Physics-Inspired Method for Symbolic Regression
Authors
Silviu-Marian Udrescu, Max Tegmark
Abstract
A core challenge for both physics and artificial intelligence (AI) is symbolic regression: finding a symbolic expression that matches data from an unknown function. Although this problem is likely to be NP-hard in principle, functions of practical interest often exhibit symmetries, separability, compositionality and other simplifying properties. In this spirit, we develop a recursive multidimensional symbolic regression algorithm that combines neural network fitting with a suite of physics-inspired techniques. We apply it to 100 equations from the Feynman Lectures on Physics, and it discovers all of them, while previous publicly available software cracks only 71; for a more difficult test set, we improve the state of the art success rate from 15% to 90%.
Concepts
The Big Picture
Johannes Kepler spent four years and made 40 failed attempts before he realized that Mars traces an ellipse around the sun. He had the data, precise astronomical tables compiled by Tycho Brahe, but extracting the underlying equation from raw numbers was brutally hard. Today, scientists face the same challenge millions of times over, staring at experimental data and asking: what formula is hiding in here?
This is symbolic regression: discovering a mathematical expression that exactly matches a dataset. Not just a curve that fits the data, but the actual equation, written in symbols, that could appear in a textbook. It’s a different problem from what most machine learning solves.
A neural network that predicts planetary positions with 99.9% accuracy is useful. But it doesn’t tell you the orbit is an ellipse. Kepler’s law, written in four symbols, does.
The trouble is that the space of possible mathematical expressions grows exponentially with length. There are more candidate formulas than atoms in the observable universe. No brute-force approach could ever work. Researchers at MIT, led by Silviu-Marian Udrescu and Max Tegmark, took a different approach: instead of searching blindly through that exponential space, they asked what physicists know about how equations tend to behave, and built those insights directly into an algorithm called AI Feynman.
Key Insight: By encoding physics heuristics (symmetry detection, dimensional analysis, separability) into a recursive neural network framework, AI Feynman discovered all 100 equations from the Feynman Lectures on Physics and pushed the success rate on a harder benchmark from 15% to 90%.
How It Works
The equations physicists care about aren’t random. They have structure. They respect units. They decompose into simpler pieces and exhibit symmetry. AI Feynman builds on this observation with a recursive algorithm that chips away at complex equations by exploiting whichever simplifications apply.
The algorithm works like this:
- Dimensional analysis first. If the variables have known physical units, the algorithm applies the Buckingham Pi theorem, a rule from physics that lets you combine variables into unit-free ratios, reducing the number of independent variables you need to track. Newton’s law of gravity, with 9 variables, can collapse to 6 such ratios. Fewer variables means a much simpler search.
- Neural network fitting is the workhorse. A standard feedforward neural network is trained on the mystery data. The network itself isn’t the answer; it’s a probe. Once trained, the algorithm uses it to test for hidden structure.
- Symmetry detection uses the trained network to check whether the function remains unchanged when variables are shifted or scaled. If adding the same constant to $x_2$ and $x_3$ doesn’t change the output, those variables appear in the formula only through the difference $x_2 - x_3$, and one variable disappears. This kind of translational symmetry detection can recursively strip variables from the problem.
- Separability detection checks whether the function factors into a product or sum of two parts with no shared variables. If $f(x_1, x_2, x_3) = g(x_1) \cdot h(x_2, x_3)$, the problem splits in two. The algorithm tests this by evaluating the trained network at mixed combinations of input points and checking whether the outputs are consistent with a factored structure.
- Polynomial fitting handles the case where the function, or a simplified sub-function, is a polynomial. This reduces to solving a linear system: fast and exact.
- Brute-force symbolic search is the last resort for small, simple sub-expressions: try all formulas up to some length using a library of elementary functions.
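The dimensional-analysis step can be sketched in a few lines. This is a toy illustration, not the paper's code: the example is a pendulum rather than the paper's 9-variable gravity problem, and the variable names are our own. The key fact is that dimensionless groups correspond to null-space vectors of the matrix of unit exponents.

```python
from sympy import Matrix

# Toy Buckingham Pi sketch (illustrative; not the paper's implementation).
# Columns: T (period), L (length), g (gravity), m (mass).
# Rows: exponent of each base unit (mass, length, time) in each variable.
D = Matrix([
    [0, 0, 0, 1],    # mass:   T^0, L^0, g^0, m^1
    [0, 1, 1, 0],    # length: T^0, L^1, g^1, m^0
    [1, 0, -2, 0],   # time:   T^1, L^0, g^-2, m^0
])

# Each null-space vector of D is an exponent tuple (a, b, c, d) giving a
# unit-free group Pi = T**a * L**b * g**c * m**d.
for v in D.nullspace():
    print(list(v))   # [2, -1, 1, 0]  ->  Pi = g * T**2 / L
```

A four-variable problem collapses to a single dimensionless combination, mirroring the 9-to-6 reduction the algorithm achieves for Newton's law.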
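The polynomial-fitting step likewise reduces to one linear solve. Below is a minimal sketch under our own assumptions (a degree-2 target, a hand-built monomial basis, and `numpy.linalg.lstsq` as the solver; none of these details come from the paper):

```python
import numpy as np

# Sketch of polynomial fitting as a linear system (illustrative only).
rng = np.random.default_rng(1)
x1, x2 = rng.uniform(-1, 1, size=(2, 200))
y = 3.0 * x1 ** 2 - 2.0 * x1 * x2 + 0.5  # hidden polynomial to recover

# Design matrix: one column per monomial up to degree 2.
A = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1*x2, x2**2])

# The unknown coefficients solve A @ coef = y: fast and (numerically) exact.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # coefficients of [1, x1, x2, x1**2, x1*x2, x2**2]
```

Because the model is linear in its coefficients, no search is needed: the recovered vector is (0.5, 0, 0, 3, -2, 0) up to floating-point error.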
Newton’s gravitational law illustrates how these steps chain together. Starting with 9 variables, dimensional analysis reduces the problem to 6 unit-free combinations. The neural network then detects two translational symmetries (the force depends only on differences of coordinates, not absolute positions), dropping the count to 4 variables. Multiplicative separability splits the 4-variable problem into two smaller ones.
Each sub-problem gets solved independently, one by polynomial fitting after a simple inversion. The original 9-variable problem is cracked without ever searching through formulas with 9 arguments.
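The symmetry and separability probes that drive this decomposition can be sketched numerically. In the illustration below, a known function stands in for the trained neural network; the helper names are ours, and the separability check uses a simple point-mixing identity rather than the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the trained network: any callable mapping an (n, dim)
# sample matrix to n outputs. Using a known function lets us verify
# that the probes report the structure we built in.
def f(X):
    x1, x2, x3, x4 = X.T
    return np.exp(-((x1 - x2) ** 2)) * (x3 * x4)

def translation_symmetric(f, i, j, dim, shift=0.5, n=500, tol=1e-8):
    """Does shifting x_i and x_j by the same constant leave f unchanged?"""
    X = rng.uniform(-1, 1, size=(n, dim))
    Xs = X.copy()
    Xs[:, [i, j]] += shift
    return np.max(np.abs(f(X) - f(Xs))) < tol

def multiplicatively_separable(f, left, right, dim, n=500, tol=1e-8):
    """Does f factor as g(x_left) * h(x_right)?  For separable f,
    f(a) * f(b) == f(a_left, b_right) * f(b_left, a_right)."""
    A = rng.uniform(-1, 1, size=(n, dim))
    B = rng.uniform(-1, 1, size=(n, dim))
    AB, BA = A.copy(), B.copy()
    AB[:, right] = B[:, right]   # left half from A, right half from B
    BA[:, right] = A[:, right]   # and vice versa
    return np.max(np.abs(f(A) * f(B) - f(AB) * f(BA))) < tol

print(translation_symmetric(f, 0, 1, dim=4))                 # True
print(multiplicatively_separable(f, [0, 1], [2, 3], dim=4))  # True
print(multiplicatively_separable(f, [0, 2], [1, 3], dim=4))  # False
```

A positive symmetry test replaces $(x_1, x_2)$ with the single variable $x_1 - x_2$; a positive separability test splits the problem into two independent sub-problems, exactly the recursion described above.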
Why It Matters
AI Feynman recovers all 100 equations from the Feynman Lectures on Physics. The previous best publicly available software, Eureqa (based on genetic algorithms), found only 71. On a harder test set, the gap widens: 90% versus 15%.
What matters more is how the algorithm wins. The right way to bring AI into physics isn’t to throw a generic optimizer at the problem and hope it converges. It’s to encode what physicists already know about units, symmetries, and compositional structure, then let the AI search within that constrained space.
Neural networks here aren’t black-box predictors. They’re probes for detecting hidden structure in data. The trained network is interrogated, not trusted: does the function have a symmetry? Does it factorize? The answers guide the decomposition. Physical intuition sets the constraints; machine learning handles the search within them.
That same recipe applies well beyond physics. Any field where underlying laws might be compact and structured, from materials science to fluid dynamics, could benefit from symbolic regression that knows how to decompose before it searches.
Bottom Line: AI Feynman doesn’t just fit data better. It uses physics-inspired tricks to recursively decompose hard symbolic regression problems into solvable pieces, jumping from 15% to 90% on challenging benchmarks where previous methods stalled.
IAIFI Research Highlights
AI Feynman combines neural networks with physics-inspired techniques, using symmetry, dimensional analysis, and separability to recover real equations from data. It sits at the boundary of machine learning and theoretical physics.
On hard symbolic regression benchmarks, the method jumps from 15% to 90% success by replacing brute-force formula search with recursive decomposition guided by neural network probes.
The method automatically rediscovers equations from the Feynman Lectures, including multi-variable laws like Newton's gravitation. If it can find known laws from data, it may eventually find unknown ones.
The natural next targets are noisy data, larger equation spaces, and problems where the governing equations are truly unknown. Published in *Science Advances* (2020); the benchmark dataset is publicly available. See [arXiv:1905.11481](https://arxiv.org/abs/1905.11481).
Original Paper Details
AI Feynman: a Physics-Inspired Method for Symbolic Regression
[1905.11481](https://arxiv.org/abs/1905.11481)
Silviu-Marian Udrescu, Max Tegmark