
Man, Machine, and Mathematics

Foundational AI

Authors

Akshunna S. Dogra

Abstract

Nonlinear models and optimization methods have successfully tackled a rapidly growing set of problems in recent years. Indeed, a relatively small toolbox of such models and methods can provide sufficient performance across a large landscape of tasks: deep learning alone has made significant recent contributions in scientific modelling, natural language processing, visual analysis, etc. A similar relationship exists between physical theories and phenomena, where many applications and observations emerge neatly from remarkably minimal foundations. It is natural to wonder if sparse unified frameworks could be built to steer discussion and discovery in the fields concerned with learning, optimization, and modelling. In this work, we posit and examine a possible outline for such a unified theory, interpreting the notion of “learning” in a broad sense. In particular, we pursue our goals by viewing learning as an inter-connected process on multiple levels: problem setup, choosing methods, and the analysis of their interplay via imposed optimisation dynamics. We begin by proposing a precise yet versatile definition for “solvable” problems. We then define the “parametrised methods” by which their solution(s) may be “learned”. Our goal is to sketch a “universal convergence theorem”, specifying how and when solvable problems become amenable to the methods chosen for them. We find these constructions reduce the study of learning down to remarkably few ideas and tools - many of which are simply adapted from existing ones in dynamical systems theory, geometry, and fundamental physics.

Concepts

universal convergence theorem, parametrized learning methods, loss function design, Łojasiewicz inequality, sparse models, Hamiltonian systems, geometric deep learning, Kolmogorov-Arnold networks, Lagrangian methods, manifold learning, optimal transport, stochastic processes

The Big Picture

Physics has a power that machine learning has long envied: unification. Electromagnetism, optics, and radio waves, wildly different phenomena, all fall out of four compact equations. General relativity describes the bending of starlight and the orbit of Mercury from a single geometric principle.

Machine learning, by contrast, sits on a sprawling collection of techniques (neural networks, support vector machines, gradient descent, Kolmogorov-Arnold networks), each developed for different problems and connected more by analogy than by theorem. A researcher training a language model and a biologist studying how grid cells in a rat’s brain encode space are both doing something called “learning,” but there’s no shared mathematical roof over their heads.

Akshunna S. Dogra, an IAIFI researcher, wants to build that roof. His paper “Man, Machine, and Mathematics” proposes the skeleton of a unified theory of learning, one rigorous enough to produce theorems, broad enough to cover biological and machine intelligence, and sparse enough to feel like physics.

The paper doesn’t deliver a finished theory. It delivers something arguably more useful: a precise map of what such a theory must contain, and a first serious attempt to draw its outlines using dynamical systems theory, differential geometry, and fundamental physics.

Key Insight: Learning, whether by a neural network, a mammal navigating space, or a numerical solver, can be formalized as a gradient flow problem on a structured space. The conditions for that flow to converge are governed by the same mathematics that physicists use to study stability in dynamical systems.

How It Works

Dogra organizes the theory around three cornerstones that also supply the paper’s title:

  • “Man” — the problem to be solved
  • “Machine” — the parametrized method: a specific algorithm or model with adjustable numerical settings
  • “Mathematics” — the formal analysis of how and when the method works on the problem

Figure 1

The starting point is defining what it means for a problem to be “solvable.” A problem is a map F between metric spaces, spaces where you can measure the distance between any two points. A problem counts as well-behaved if there exists a solution Φ, a neighborhood around it, and a loss functional L measuring distance from the solution, such that a gradient flow can be defined, runs without blowing up, and converges to Φ. Gradient flow is the continuous-time version of gradient descent: instead of taking discrete steps downhill on the loss landscape, you follow the steepest descent direction at every instant.
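A minimal sketch of that relationship, assuming a toy quadratic loss (the matrix `A`, the starting point, and the function names are illustrative choices, not taken from the paper): gradient descent is the explicit Euler discretization of the flow dx/dt = -∇L(x), and shrinking the step size brings the discrete iterates closer to the continuous trajectory.

```python
import numpy as np

def loss(x, A):
    """Quadratic loss L(x) = 0.5 * x^T A x, minimized at x = 0."""
    return 0.5 * x @ A @ x

def grad(x, A):
    """Gradient of the quadratic loss (A is assumed symmetric)."""
    return A @ x

def gradient_descent(x0, A, step, n_steps):
    """Explicit Euler discretization of the gradient flow dx/dt = -grad L(x)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - step * grad(x, A)
    return x

# Hypothetical toy problem: a diagonal, positive-definite quadratic.
A = np.array([[2.0, 0.0], [0.0, 0.5]])
x0 = np.array([1.0, -1.0])
T = 5.0  # total flow time; step * n_steps = T in both runs below

# For diagonal A the flow has a closed form: x_i(t) = exp(-a_i * t) * x_i(0).
exact = np.exp(-np.diag(A) * T) * x0

coarse = gradient_descent(x0, A, step=0.5, n_steps=10)
fine = gradient_descent(x0, A, step=0.005, n_steps=1000)

coarse_error = np.linalg.norm(coarse - exact)
fine_error = np.linalg.norm(fine - exact)
print(coarse_error, fine_error)
```

Because this quadratic has a closed-form flow, the discretization error can be measured directly. Both runs cover the same total time, and the run with the smaller step tracks the continuous trajectory more closely.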

That structure underlies nearly all of modern machine learning. Training a transformer? Gradient descent on a loss. Solving a partial differential equation numerically with neural networks? Same schema. Even biological neural firing patterns can be cast in this form.

But there’s a catch. Even if a gradient flow converges in principle, the space G where solutions live is often infinite-dimensional. You can’t run infinite-dimensional gradient descent on a laptop.

This is where architectures come in. An architecture maps a finite set of M real-valued parameters into G, carving out a finite-dimensional surface G_M within the larger space. Training a neural network means optimizing over G_M rather than all of G. How many layers a network has, how its components connect: these choices determine the shape of that surface.
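The idea of an architecture as a map from parameters to functions can be sketched as follows. Everything here is an illustrative assumption rather than the paper's construction: the grid `xs` stands in for the function's domain, `target` plays the role of the solution Φ, the three-parameter `tanh` family plays the role of G_M with M = 3, and the gradient is taken by finite differences for simplicity.

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 50)   # grid standing in for the function's domain
target = np.sin(np.pi * xs)       # hypothetical solution Phi, sampled on the grid

def architecture(theta):
    """Map M = 3 real parameters to a function in G (here, its grid samples)."""
    a, w, b = theta
    return a * np.tanh(w * xs + b)

def loss(theta):
    """Mean squared distance between the parametrized function and the target."""
    return np.mean((architecture(theta) - target) ** 2)

def grad(theta, eps=1e-6):
    """Central finite-difference gradient of the loss in parameter space."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (loss(theta + e) - loss(theta - e)) / (2 * eps)
    return g

# Optimize over the 3-parameter surface G_M rather than over all functions.
theta = np.array([1.0, 1.0, 0.0])
initial_loss = loss(theta)
for _ in range(2000):
    theta = theta - 0.05 * grad(theta)
final_loss = loss(theta)

print(initial_loss, final_loss)
```

The optimizer never sees the space of all functions on the grid. It only moves the three numbers in `theta`, and the best it can do is the closest point to the target on the surface those parameters carve out.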

A linear architecture produces at most an M-dimensional flat subspace. A nonlinear architecture can fold that same M-parameter family through the space, reaching directions that no single M-dimensional subspace contains, the way a crumpled sheet of paper occupies a volume that a flat sheet cannot. In this framing, that geometric flexibility is a key reason nonlinear models, and deep networks in particular, can outperform linear ones with the same number of parameters. They exploit geometry rather than fighting it.
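One rough way to see the flat-versus-folded distinction numerically is to sample many parameter vectors, stack the resulting functions as rows of a matrix, and ask how many linearly independent directions they span. The polynomial basis, the `tanh` family, and the sampling range below are hypothetical choices made for illustration only.

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 50)

def linear_architecture(theta):
    """theta -> theta[0]*1 + theta[1]*x + theta[2]*x^2 over a fixed basis."""
    basis = np.stack([np.ones_like(xs), xs, xs ** 2])
    return theta @ basis

def nonlinear_architecture(theta):
    """theta -> a * tanh(w*x + b): parameters enter inside the nonlinearity."""
    a, w, b = theta
    return a * np.tanh(w * xs + b)

rng = np.random.default_rng(0)
samples = rng.uniform(-2.0, 2.0, size=(200, 3))  # 200 random 3-parameter settings

linear_outputs = np.stack([linear_architecture(t) for t in samples])
nonlinear_outputs = np.stack([nonlinear_architecture(t) for t in samples])

# Number of independent directions spanned by each family of sampled functions.
linear_rank = np.linalg.matrix_rank(linear_outputs)
nonlinear_rank = np.linalg.matrix_rank(nonlinear_outputs)

print(linear_rank, nonlinear_rank)
```

With three parameters, the linear family can never span more than three directions. The nonlinear family is still only a three-parameter surface, but it curves through the ambient space, so its samples span more directions than any three-dimensional subspace could hold.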

The final piece specifies when optimization on G_M actually converges. Dogra’s central tool is the Łojasiewicz inequality, a condition on the loss near a solution that guarantees gradient flows will converge rather than stall. It requires that the gradient of L grows at least as fast as a fractional power of the loss gap, preventing the optimizer from decelerating forever before reaching the minimum. This is one of the weakest known sufficient conditions for convergence, which makes it close to ideal for a general theorem.
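For reference, one common textbook form of the Łojasiewicz gradient inequality, together with the short argument for why it forces a gradient flow to converge, can be written as below. The constants c and θ, the neighborhood U, and the trajectory x(t) are notation assumed here, and the paper's own statement may use a different normalization or setting.

```latex
% Lojasiewicz gradient inequality near a minimizer \Phi (assumed notation):
% there exist c > 0 and \theta \in [1/2, 1) such that, for all x in a
% neighborhood U of \Phi,
\[
  \bigl( L(x) - L(\Phi) \bigr)^{\theta} \;\le\; c \, \lVert \nabla L(x) \rVert .
\]
% Along the gradient flow \dot{x}(t) = -\nabla L(x(t)), while x(t) stays in U
% and L(x(t)) > L(\Phi), the chain rule and the inequality above give
\[
  \frac{d}{dt} \bigl( L(x(t)) - L(\Phi) \bigr)^{1-\theta}
  = -(1-\theta) \,
    \frac{\lVert \nabla L(x(t)) \rVert^{2}}{\bigl( L(x(t)) - L(\Phi) \bigr)^{\theta}}
  \;\le\; -\frac{1-\theta}{c} \, \lVert \dot{x}(t) \rVert .
\]
% Integrating in time bounds the total path length of the trajectory:
\[
  \int_{0}^{\infty} \lVert \dot{x}(t) \rVert \, dt
  \;\le\; \frac{c}{1-\theta} \, \bigl( L(x(0)) - L(\Phi) \bigr)^{1-\theta} .
\]
```

A trajectory of finite length must settle at a single limit point, which is the sense in which the inequality rules out endless drifting. In the special case θ = 1/2 the condition is often called the Polyak-Łojasiewicz inequality, and the same computation gives exponential decay of the loss gap.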

The end goal is a universal convergence theorem: a result specifying how and when a problem solvable in principle becomes tractable to a chosen method in practice. Think of it as the learning-theory analog of the universal approximation theorem, which tells you that sufficiently large neural networks can approximate any continuous function on a compact domain to arbitrary accuracy, but going further and also telling you how to get there.

Why It Matters

Machine learning currently advances largely through empirical trial and error. Practitioners discover that certain architectures work, certain optimizers behave well, certain learning rate schedules converge faster, and the community accumulates lore. A unified convergence theory would turn that lore into deductive science, letting researchers derive which architectures suit which problems instead of finding out through expensive compute.

The physics connection goes deeper than analogy. Dogra draws parallels between gradient flows and equations from quantum mechanics, between Łojasiewicz-type stability conditions and equilibrium analysis in classical mechanics, between the folding of parameter spaces and the differential geometry used in general relativity. Many of the tools physicists built to understand how systems evolve toward equilibrium (Lyapunov functions, desingularization techniques, results from immersion theory) are precisely what’s needed to understand convergence in learning. As the paper puts it: “the tools needed for learning about our universe often being the tools that describe how we learn.”

Open questions remain. Fully rigorous results will require extending Łojasiewicz-type analysis to infinite-dimensional spaces, characterizing which architectures produce well-initialized parameter flows, and connecting the abstract framework to practical optimizers like Adam and stochastic gradient descent. But the scaffolding is now visible in a way it wasn’t before.

Bottom Line: By casting learning as a gradient flow problem governed by conditions from dynamical systems and physics, Dogra outlines a unified mathematical framework that could eventually explain why machine learning works, not just observe that it does.

IAIFI Research Highlights

Interdisciplinary Research Achievement
This work connects AI and physics by importing tools from dynamical systems theory, differential geometry, and mechanics to formalize when and why learning algorithms converge.
Impact on Artificial Intelligence
The paper proposes precise formal definitions of "solvable problems" and "parametrized methods," sketching a universal convergence theorem that could unify the theoretical foundations of deep learning, numerical PDE solvers, and beyond.
Impact on Fundamental Interactions
By drawing structural parallels between gradient flows and equations from quantum mechanics, and between stability conditions in optimization and in physics, the work suggests that the mathematical language of fundamental physics may be the natural home for a theory of learning.
Outlook and References
Future work must extend the convergence analysis to fully infinite-dimensional settings and connect the abstract framework to specific practical optimizers. The paper is available as a preprint ([arXiv:2604.27052](https://arxiv.org/abs/2604.27052), May 2026) from IAIFI researcher Akshunna S. Dogra (adogra@nyu.edu).

Original Paper Details

Title
Man, Machine, and Mathematics
arXiv ID
[arXiv:2604.27052](https://arxiv.org/abs/2604.27052)
Authors
Akshunna S. Dogra