Renormalization Group flow, Optimal Transport and Diffusion-based Generative Model

Foundational AI

Authors

Artan Sheshmani, Yi-Zhuang You, Baturalp Buyukates, Amir Ziashahabi, Salman Avestimehr

Abstract

Diffusion-based generative models represent a forefront direction in generative AI research today. Recent studies in physics have suggested that the renormalization group (RG) can be conceptualized as a diffusion process. This insight motivates us to develop a novel diffusion-based generative model by reversing the momentum-space RG flow. We establish a framework that interprets RG flow as optimal transport gradient flow, which minimizes a functional analogous to the Kullback-Leibler divergence, thereby bridging statistical physics and information theory. Our model applies forward and reverse diffusion processes in Fourier space, exploiting the sparse representation of natural images in this domain to efficiently separate signal from noise and manage image features across scales. By introducing a scale-dependent noise schedule informed by a dispersion relation, the model optimizes denoising performance and image generation in Fourier space, taking advantage of the distinct separation of macro and microscale features. Experimental validations on standard datasets demonstrate the model's capability to generate high-quality images while significantly reducing training time compared to existing image-domain diffusion models. This approach not only enhances our understanding of the generative processes in images but also opens new pathways for research in generative AI, leveraging the convergence of theoretical physics, optimal transport, and machine learning principles.

Concepts

diffusion models, optimal transport, renormalization, generative models, Fourier-space diffusion, scale-dependent noise schedule, quantum field theory, spectral methods, score-based models, stochastic processes, likelihood estimation

The Big Picture

Imagine trying to describe a painting by zooming out, layer by layer, until all you see is a blur of average colors. That’s roughly what physicists do when they apply the renormalization group, a mathematical procedure for simplifying a system by progressively averaging over fine details. The technique was invented to tame the runaway infinities of quantum field theory, and for decades it belonged almost exclusively to particle physics and statistical mechanics.

A team from Harvard, MIT, UC San Diego, and USC has now turned that procedure inside out and used it to build a faster, more principled image-generation AI. The core insight: the mathematical machinery physicists use to erase small-scale variables from a physical system is equivalent to the forward diffusion process that corrupts images in modern generative AI. Run the renormalization group forward, and you destroy structure. Run it backward, and you create it.

The result is the Fourier-Domain Diffusion Model (FDDM), a generative model that works in frequency space rather than pixel space. Just as a sound can be broken into individual musical notes, FDDM decomposes images into their frequency components. It trains faster than conventional diffusion models and ties together statistical physics, information theory, and machine learning with actual math rather than loose analogy.

Key Insight: By interpreting the renormalization group as an optimal transport process in Fourier space, the team built a diffusion model that separates image features by scale. Broad strokes and fine details get handled at the right stages of generation, rather than treating all pixels equally.

How It Works

Getting from raw physics to a working AI model involves three conceptual steps.

Step 1: RG as Optimal Transport. The paper first establishes a formal equivalence between RG flow and optimal transport, a mathematical theory originally developed to solve logistics problems. (How do you move a pile of sand to fill a hole while minimizing total work?) A central quantity here is the Wasserstein distance, which measures how much “effort” it takes to morph one probability distribution into another. RG flow, the authors show, can be described as a process that minimizes a functional closely related to the Kullback-Leibler (KL) divergence, the standard information-theoretic measure of how different two probability distributions are. This isn’t a metaphor; it’s a precise mathematical statement.
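The two distances at play here can be made concrete in a few lines. The snippet below (an illustration of the general concepts, not code from the paper) computes both the KL divergence and the 1-Wasserstein distance between two small discrete distributions; the distributions themselves are arbitrary examples:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Two discrete probability distributions over the same support.
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
q = np.array([0.3, 0.3, 0.2, 0.1, 0.1])
support = np.arange(len(p))

# Kullback-Leibler divergence: an information-theoretic (asymmetric,
# non-metric) measure of how different q is from p.
kl = np.sum(p * np.log(p / q))

# 1-Wasserstein distance: the minimal "earth-mover" cost of morphing p into q.
w1 = wasserstein_distance(support, support, u_weights=p, v_weights=q)

print(f"KL(p||q) = {kl:.4f}")
print(f"W1(p, q) = {w1:.4f}")
```

In the paper's framing, these two quantities play complementary roles: RG flow descends a KL-like functional, and optimal transport (the Wasserstein geometry) supplies the notion of "flow" along which that descent happens.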

Figure 1

Step 2: Move everything to Fourier space. Rather than adding and removing noise pixel by pixel, FDDM operates in the Fourier domain, the space of spatial frequencies that compose an image. The motivation comes directly from physics: in quantum field theory, the renormalization group eliminates high-frequency (small-scale) components first, then progressively moves toward low-frequency (large-scale) structure. Natural images share this multi-scale character. Most of their energy sits in relatively few frequency components, making them sparse in Fourier space. High frequencies encode sharp edges and fine textures; low frequencies encode overall composition and color.
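That sparsity is easy to check numerically. The toy snippet below (the synthetic image and the 1% cutoff are illustrative assumptions, not from the paper) builds a smooth image with a little texture and measures how much spectral energy its largest Fourier coefficients carry:

```python
import numpy as np

# A synthetic stand-in for a natural image: smooth large-scale structure
# plus a small amount of fine-grained texture.
x = np.arange(64) / 64
X, Y = np.meshgrid(x, x)
img = (np.sin(2 * np.pi * X) + 0.5 * np.cos(2 * np.pi * Y)
       + 0.05 * np.random.default_rng(0).standard_normal((64, 64)))

# Energy per Fourier coefficient.
F = np.fft.fft2(img)
energy = np.abs(F) ** 2

# Fraction of total energy carried by the top 1% of coefficients.
top = np.sort(energy.ravel())[::-1]
k = max(1, top.size // 100)
frac = top[:k].sum() / top.sum()
print(f"Top 1% of Fourier coefficients hold {frac:.1%} of the energy")
```

For this toy image, a handful of coefficients carry almost all the energy; real photographs are less extreme but show the same concentration, which is what makes the Fourier domain an efficient place to work.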

By diffusing in Fourier space, the model selectively corrupts and recovers different scales at the right times:

  • Forward diffusion (renormalization): High-frequency components are noised first. Fine details get destroyed before coarse structure, just as RG eliminates microscopic degrees of freedom before macroscopic ones.
  • Reverse diffusion (generation): The model denoises low frequencies first, establishing the large-scale layout, then progressively recovers fine details. The big picture comes before the brushstrokes.
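The forward direction above can be sketched in code. This is a minimal toy version (an assumption-laden illustration, not the authors' implementation): Fourier coefficients are damped and noised at a mode-dependent rate, here taken to be omega(k) = |k|^2, so high frequencies are destroyed first while low frequencies survive:

```python
import numpy as np

rng = np.random.default_rng(1)

def forward_step(F, omega, dt):
    """One forward (noising) step on Fourier coefficients F.

    Each mode decays by exp(-omega * dt) and receives complex Gaussian
    noise scaled so that a unit-variance mode stays unit-variance.
    """
    decay = np.exp(-omega * dt)
    noise_std = np.sqrt(1.0 - decay**2)
    noise = (rng.standard_normal(F.shape)
             + 1j * rng.standard_normal(F.shape)) / np.sqrt(2)
    return decay * F + noise_std * noise

# Integer frequency grid and an assumed diffusive rate omega(k) = |k|^2.
n = 32
k = np.fft.fftfreq(n, d=1.0 / n)
KX, KY = np.meshgrid(k, k)
omega = KX**2 + KY**2

F = np.fft.fft2(rng.standard_normal((n, n)))
for _ in range(10):
    F = forward_step(F, omega, dt=0.05)

# After t = 0.5, the deterministic signal surviving in each mode is
# exp(-omega * t): essentially intact at low |k|, gone at high |k|.
survival = np.exp(-omega * 0.5)
print(f"low-|k| survival  : {survival[0, 1]:.3f}")
print(f"high-|k| survival : {survival[0, n // 2]:.3e}")
```

Reverse diffusion would run this process backward with a learned denoiser, recovering the slow (low-frequency) modes first, but that part requires training and is beyond a toy sketch.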

Figure 2

Step 3: A physics-informed noise schedule. Conventional diffusion models use a noise schedule that is largely empirical. FDDM replaces this with a dispersion relation, a concept from wave physics that describes how different frequencies decay at different rates. Each frequency mode gets noised at a rate appropriate to its scale: high-frequency components decay rapidly, low-frequency components evolve slowly.
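One way to see what such a schedule does: with an assumed diffusive dispersion omega(k) = |k|^2, each mode's signal-to-noise ratio decays like exp(-2 * omega(k) * t), so one can solve for the time at which any given scale is effectively noised out. (The functional forms here are illustrative, not the paper's exact schedule.)

```python
import numpy as np

def time_to_noise(k_mag, snr_threshold=0.01):
    """Time at which a mode of magnitude |k| drops below the SNR threshold,
    assuming SNR(t) = exp(-2 * omega * t) with omega = |k|^2."""
    omega = k_mag**2
    return -np.log(snr_threshold) / (2.0 * omega)

k_vals = np.array([1.0, 2.0, 4.0, 8.0])
t_star = time_to_noise(k_vals)
for k, t in zip(k_vals, t_star):
    print(f"|k| = {k:4.1f}  ->  noised out by t = {t:.3f}")
```

The pattern is the point: each doubling of frequency cuts the survival time by a factor of four, so the schedule automatically staggers when each scale is handled rather than treating all of them uniformly.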

Figure 3

On standard image benchmarks, FDDM produced competitive image quality while cutting training time compared to pixel-domain diffusion models. The gains trace back to the sparsity of natural images in Fourier space.

Why It Matters

The practical payoff (faster training, good image quality) is real, but the deeper significance is conceptual. For years, researchers have noticed suggestive similarities between deep learning and physics: neural networks as statistical field theories, attention mechanisms and tensor networks, diffusion models and RG flows. Most of these analogies have stayed at the level of inspiration. This paper takes the analogy seriously enough to derive a working algorithm from it, and the algorithm performs well.

That changes the conversation. The mathematical structures physicists have refined over decades aren’t just metaphors for machine learning. They can directly produce better algorithms. The optimal transport connection matters here because it clarifies what diffusion models are actually optimizing. That kind of clarity could guide future architectural choices well beyond FDDM itself.

The authors point to several open directions: extending the framework to other data modalities (audio, molecular structures, scientific data) and testing whether alternative RG schemes yield further gains.

Bottom Line: FDDM shows that the renormalization group, a tool forged in quantum field theory, can be run in reverse to build a faster, more principled image generator. Physics didn’t just inspire the model; it built it.

IAIFI Research Highlights

Interdisciplinary Research Achievement
This work draws a formal mathematical line from renormalization group theory in quantum field theory, through optimal transport in mathematics, to diffusion-based generative models in machine learning, turning a physics abstraction into a working AI algorithm.
Impact on Artificial Intelligence
The Fourier-Domain Diffusion Model matches existing image generation quality while cutting training time, by working in the sparse frequency-space representation of natural images with a physics-derived, scale-dependent noise schedule.
Impact on Fundamental Interactions
The paper formalizes RG flow as an optimal transport gradient flow minimizing a KL-divergence-like functional, giving physicists a new information-theoretic lens on coarse-graining in statistical and quantum field theory.
Outlook and References
Future work may extend the FDDM framework to scientific data modalities and test alternative RG schemes for further performance gains. The paper is available as [arXiv:2402.17090](https://arxiv.org/abs/2402.17090), from researchers at Harvard, MIT, UC San Diego, and USC.
