Towards Understanding Grokking: An Effective Theory of Representation Learning
Authors
Ziming Liu, Ouail Kitouni, Niklas Nolte, Eric J. Michaud, Max Tegmark, Mike Williams
Abstract
We aim to understand grokking, a phenomenon where models generalize long after overfitting their training set. We present both a microscopic analysis anchored by an effective theory and a macroscopic analysis of phase diagrams describing learning performance across hyperparameters. We find that generalization originates from structured representations whose training dynamics and dependence on training set size can be predicted by our effective theory in a toy setting. We observe empirically the presence of four learning phases: comprehension, grokking, memorization, and confusion. We find representation learning to occur only in a "Goldilocks zone" (including comprehension and grokking) between memorization and confusion. We find on transformers the grokking phase stays closer to the memorization phase (compared to the comprehension phase), leading to delayed generalization. The Goldilocks phase is reminiscent of "intelligence from starvation" in Darwinian evolution, where resource limitations drive discovery of more efficient solutions. This study not only provides intuitive explanations of the origin of grokking, but also highlights the usefulness of physics-inspired tools, e.g., effective theories and phase diagrams, for understanding deep learning.
Concepts
The Big Picture
Imagine studying for an exam by memorizing answers without understanding the concepts behind them. You ace every practice test, but when the real exam asks a slightly different question, you’re lost. Then, weeks later, something clicks: you finally understand the material. That’s roughly what happens inside a neural network during a phenomenon called grokking. The model memorizes training data almost instantly, seems stuck, and then, sometimes thousands of training steps later, abruptly figures out the actual rule.
Grokking was first reported in 2021, and nobody had a good explanation for it. Why would a model that has already memorized its training data bother learning to generalize? Why does it sometimes take so long?
These questions go straight to the mechanics of how neural networks learn. A team from MIT’s Institute for AI and Fundamental Interactions went after the problem with an unlikely toolkit: theoretical physics.
Their answer is cleaner than you’d expect. Generalization isn’t magic; it’s geometry. And the timing depends on where a model sits in a phase diagram, a map of possible learning behaviors, like the ones physicists use to show when water becomes ice or steam.
Key Insight: Grokking happens when a neural network is forced to discover structured, geometric representations of its input. Delayed generalization occurs because the network is trapped in a memorization phase before finding that structure.
How It Works
The analysis works on two levels: a microscopic view using what physicists call an effective theory, and a macroscopic view using phase diagrams. Both tell the same story.

Start with the microscopic picture. The team studied a toy model learning modular addition: given two numbers a and b, predict (a + b) mod p for some fixed modulus p. Each number gets a learnable embedding vector (a point in some geometric space), and a small network called a decoder reads off the sum from those points.
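As a concrete (if non-authoritative) sketch of that setup, here is a minimal PyTorch version; the modulus, embedding size, and decoder width are illustrative choices rather than the authors' exact configuration:

```python
import torch
import torch.nn as nn

p = 59        # modulus (illustrative choice)
d_embed = 32  # embedding dimension (illustrative choice)

class ToyAdder(nn.Module):
    """Toy model for (a + b) mod p: learnable embeddings plus a small decoder."""
    def __init__(self, p, d_embed, d_hidden=128):
        super().__init__()
        self.embed = nn.Embedding(p, d_embed)   # one learnable vector per number
        self.decoder = nn.Sequential(           # small MLP reads off the answer
            nn.Linear(d_embed, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, p),
        )

    def forward(self, a, b):
        # Feeding the decoder the *sum* of the two embeddings hard-codes additive
        # structure: pairs whose embeddings sum to the same point must share an answer.
        z = self.embed(a) + self.embed(b)
        return self.decoder(z)                  # logits over the p possible answers

model = ToyAdder(p, d_embed)
a, b = torch.tensor([3, 2]), torch.tensor([5, 6])
logits = model(a, b)                            # shape (2, p); train with cross-entropy
```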
What do the embeddings look like when a model generalizes versus when it memorizes? In the memorization regime, embeddings are scattered randomly. Each number gets an arbitrary point in space, and the decoder learns to look up answers from a giant internal table. When the model generalizes, the embeddings snap into structure: they form parallelograms.
If the embedding of 3 plus the embedding of 5 equals the embedding of 2 plus the embedding of 6 (since 3 + 5 = 2 + 6 = 8), then any decoder that works only with the sum of the two input embeddings is forced to give the same answer for both pairs: the geometry enforces the arithmetic. For modular addition, the structure is even more specific: the embeddings arrange themselves on a circle, so that adding numbers becomes adding angles and the wrap-around of modular arithmetic becomes a trip around the circle.
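A quick numerical check (not the trained network, just the geometry) shows that a circle really can carry modular arithmetic: place each number at an angle proportional to its value, add the angles, and read the answer back off. The embedding and decoder below are hand-built for illustration:

```python
import numpy as np

p = 7  # small modulus for the check

def embed(n):
    """Place number n at angle 2*pi*n/p on the unit circle."""
    theta = 2 * np.pi * n / p
    return np.array([np.cos(theta), np.sin(theta)])

def decode(za, zb):
    """Add the two angles (compose the rotations) and see which number lives there."""
    angle = np.arctan2(za[1], za[0]) + np.arctan2(zb[1], zb[0])
    return int(round(angle * p / (2 * np.pi))) % p

for a in range(p):
    for b in range(p):
        assert decode(embed(a), embed(b)) == (a + b) % p  # the geometry does the arithmetic
```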

The team formalized this with a Representation Quality Index (RQI) that measures how parallelogram-like the embeddings are: an RQI near zero signals memorization; near one signals generalization. From there, they derived an effective theory (a standard physics move: replace a complicated system with a simpler model that captures the same behavior) that accurately predicts how the RQI evolves during training and how much training data is needed to force structured representations.
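The exact RQI formula is in the paper; as a rough stand-in, one can count how often quadruples with equal sums actually close into parallelograms. The tolerance and the toy embeddings below are illustrative:

```python
import numpy as np

def parallelogram_score(E, tol=1e-2):
    """Rough RQI-like score: fraction of quadruples (a, b, c, d) with a + b = c + d
    whose embedding sums nearly coincide. E is an (n, dim) array of embeddings."""
    n = len(E)
    scale = np.linalg.norm(E - E.mean(axis=0), axis=1).mean()  # typical embedding size
    hits, total = 0, 0
    for a in range(n):
        for b in range(a, n):
            for c in range(a + 1, n):
                d = a + b - c
                if not (c <= d < n):
                    continue          # keep (c, d) a distinct pair with c + d = a + b
                total += 1
                gap = np.linalg.norm(E[a] + E[b] - E[c] - E[d])
                hits += gap < tol * scale
    return hits / max(total, 1)

rng = np.random.default_rng(0)
scattered = rng.normal(size=(10, 4))                      # memorization-like embeddings
on_a_line = np.outer(np.arange(10), rng.normal(size=4))   # structured, parallelogram-forming
print(parallelogram_score(scattered), parallelogram_score(on_a_line))  # ~0.0 vs 1.0
```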
Now the macroscopic picture. By running a grid search over hyperparameters like learning rate and weight decay, the team mapped out a phase diagram with four distinct learning phases:
- Comprehension: The model generalizes quickly and efficiently.
- Grokking: The model eventually generalizes, but only after a long detour through memorization.
- Memorization: The model fits training data but never generalizes.
- Confusion: The model fails even to fit the training data.
Structured representations only show up in what the authors call the Goldilocks zone: the region covering comprehension and grokking, sandwiched between memorization and confusion. Regularization (techniques that discourage a network from fitting training data too literally) is the main control knob. Too little, and the network takes the easy path of memorization. Too much, and it can’t learn at all.
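One plausible way to turn such a grid search into a phase diagram is to label each run from its accuracy histories; the thresholds below, and the commented-out train_run helper, are hypothetical illustrations rather than the authors' procedure:

```python
def classify_phase(train_acc, test_acc, threshold=0.9, grok_gap=1000):
    """Label one training run with a learning phase, given per-step accuracy histories."""
    fit_step = next((t for t, acc in enumerate(train_acc) if acc >= threshold), None)
    gen_step = next((t for t, acc in enumerate(test_acc) if acc >= threshold), None)
    if fit_step is None:
        return "confusion"       # never even fits the training set
    if gen_step is None:
        return "memorization"    # fits the training set, never generalizes
    if gen_step - fit_step > grok_gap:
        return "grokking"        # generalizes only long after fitting
    return "comprehension"       # fits and generalizes promptly

# Filling in the diagram: sweep the knobs and record one phase label per cell.
# (train_run is a hypothetical helper that trains the toy model with the given
#  learning rate and weight decay, returning the two accuracy histories.)
# phase_diagram = {
#     (lr, wd): classify_phase(*train_run(lr, wd))
#     for lr in (1e-3, 1e-2, 1e-1)
#     for wd in (0.0, 1e-3, 1e-2, 1e-1)
# }
```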
The grokking phase sits right at the edge of memorization. Generalization is delayed because the model is barely nudged away from pure memorization; it takes a long time for training to push it across the boundary.
Why It Matters
The paper borrows an analogy from biology: “intelligence from starvation,” inspired by Darwinian evolution. When resources are scarce, organisms are forced toward efficient solutions. Something similar happens in neural networks. When training data is limited and regularization applies gentle pressure, the network discovers compact, structured rules rather than memorizing individual examples. The Goldilocks zone is where this pressure works: enough to prevent lazy memorization, not so much that learning collapses.
Grokking can also be sped up. By navigating the phase diagram with informed hyperparameter choices, you can move a model from the slow grokking phase into the fast comprehension phase. Knowing why the delay happens tells you how to avoid it.
There’s a broader argument here, too. Physics-inspired tools (effective theories, phase diagrams, representation analysis) are underused in machine learning. The same methods physicists apply to phase transitions in magnets or quantum materials turn out to work for phase transitions in learning.
Open questions remain. The effective theory works cleanly in the toy setting; extending it to full transformers and real-world tasks is still an open problem. The precise mechanism by which regularization pushes embeddings toward structured configurations hasn’t been fully worked out.
Bottom Line: Grokking isn’t a mystery anymore. It’s a phase transition in representation space, and the theoretical tools now exist to predict when it happens, why it’s delayed, and how to speed it up.
IAIFI Research Highlights
This work takes two foundational tools from theoretical physics, effective theories and phase diagrams, and uses them to explain a core puzzle in deep learning. Methods from statistical physics translate directly to neural network dynamics.
The study explains grokking's delayed generalization as a consequence of representation learning, giving practitioners a concrete framework for tuning hyperparameters to achieve faster, more reliable generalization.
By showing how structured geometric representations emerge during learning and connecting this to phase transitions and symmetry in embedding space, the work draws a direct line between physics and the theory of intelligence.
Future work will extend the effective theory to larger transformer architectures and real-world tasks; the full paper is available at [arXiv:2205.10343](https://arxiv.org/abs/2205.10343).
Original Paper Details
Ziming Liu, Ouail Kitouni, Niklas Nolte, Eric J. Michaud, Max Tegmark, Mike Williams. "Towards Understanding Grokking: An Effective Theory of Representation Learning." [arXiv:2205.10343](https://arxiv.org/abs/2205.10343).