Learning Linear Groups in Neural Networks

Foundational AI

Authors

Emmanouil Theodosis, Karim Helwani, Demba Ba

Abstract

Employing equivariance in neural networks leads to greater parameter efficiency and improved generalization performance through the encoding of domain knowledge in the architecture; however, the majority of existing approaches require an a priori specification of the desired symmetries. We present a neural network architecture, Linear Group Networks (LGNs), for learning linear groups acting on the weight space of neural networks. Linear groups are desirable due to their inherent interpretability, as they can be represented as finite matrices. LGNs learn groups without any supervision or knowledge of the hidden symmetries in the data and the groups can be mapped to well known operations in machine learning. We use LGNs to learn groups on multiple datasets while considering different downstream tasks; we demonstrate that the linear group structure depends on both the data distribution and the considered task.

Concepts

linear group learning · equivariant neural networks · group theory · weight space symmetry · symmetry preservation · cyclic group construction · geometric deep learning · interpretability · convolutional networks · representation learning · self-supervised learning · sparse models

The Big Picture

Imagine designing a robot arm without knowing in advance how many joints it needs, then watching it figure out its own structure just by handling thousands of different objects. That’s roughly the challenge facing modern equivariant neural networks, and until recently, nobody had a clean solution.

Neural networks benefit enormously from symmetry. When a network “knows” that rotating an image of a cat still shows a cat, it can share what it learned across orientations, train faster, and generalize better. This property, equivariance (the network’s output transforms predictably when its input is transformed), underlies everything from image recognition to physics simulators.
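
Equivariance can be stated concretely: transforming the input and then applying the network gives the same result as applying the network and then transforming the output. A minimal sketch using circular 1-D convolution, a standard shift-equivariant operation (an illustration of the property, not the paper's code):

```python
import numpy as np

def circ_conv(x, k):
    """Circular 1-D convolution computed in the Fourier domain."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, len(x))))

rng = np.random.default_rng(0)
x = rng.standard_normal(8)   # toy "signal"
k = rng.standard_normal(3)   # toy filter

# Shift-equivariance: convolving a shifted input equals shifting the output.
shifted_in = circ_conv(np.roll(x, 2), k)
shifted_out = np.roll(circ_conv(x, k), 2)
assert np.allclose(shifted_in, shifted_out)
```

Here the symmetry (shifting) is known in advance; LGNs tackle the case where it is not.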

The catch: someone has to tell the network which symmetries matter. For images, it’s obvious that shifting a photo left or right shouldn’t change what’s in it. For molecular dynamics, knowing a molecule looks the same from any angle helps. But for a novel dataset where the relevant symmetries are unknown? You’re stuck guessing, or ignoring symmetry altogether and leaving performance on the table.

A team at Harvard and Amazon Web Services took a different approach. Their framework, Linear Group Networks (LGNs), lets a network discover its own symmetries directly from data, with no prior knowledge of what those symmetries might be. What it finds maps onto operations that researchers already use.

Key Insight: LGNs automatically learn the symmetry structure hidden in a dataset by discovering elements of the general linear group acting on neural network weights. The learned groups correspond to recognizable operations like rotations and median filtering.

How It Works

Everything builds on the general linear group, GL_d(K): the set of all invertible d×d matrices over a field K. Think of it as square grids of numbers where every transformation can be undone. Rotations, reflections, shears, and many other transformations all live inside it as subgroups. Rather than picking a subgroup in advance, LGNs learn a single matrix generator and use it to build a cyclic group, a set of transformations obtained by repeatedly applying that generator.
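
As a toy illustration of this construction (not the paper's learned generator), repeatedly multiplying a 90-degree rotation matrix by itself traces out a cyclic subgroup of GL_2(ℝ) of order 4:

```python
import numpy as np

# Hypothetical generator: a 90-degree rotation of the plane, an element of GL_2(R).
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

def cyclic_group(gen, max_order=32, tol=1e-8):
    """Collect powers {I, A, A^2, ...} until some A^n returns to the identity."""
    d = gen.shape[0]
    elements, power = [np.eye(d)], gen.copy()
    while not np.allclose(power, np.eye(d), atol=tol) and len(elements) < max_order:
        elements.append(power.copy())
        power = power @ gen
    return elements

group = cyclic_group(A)
print(len(group))  # 4: the rotation generator yields {I, A, A^2, A^3}
```

In an LGN the entries of A are learned parameters rather than a hand-picked rotation.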

The construction works like this:

  1. The network learns a matrix A (the group generator) alongside its normal weights.
  2. A finite cyclic group is constructed as {I, A, A², A³, …, A^(n-1)}, where A^n = I (the identity).
  3. This group acts on the weight space of the network, transforming the filters themselves rather than the input data directly.
  4. The result is a set of filters that are geometrically related to each other by the learned transformation.
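
The steps above can be sketched in a few lines; the generator, filter, and dimensions here are illustrative toys, not the paper's learned quantities:

```python
import numpy as np

A = np.array([[0.0, -1.0],        # toy generator: 90-degree rotation, so A^4 = I
              [1.0,  0.0]])

base_filter = np.array([1.0, 0.5])                        # one learned filter
group = [np.linalg.matrix_power(A, k) for k in range(4)]  # {I, A, A^2, A^3}

# Acting on weight space (step 3): the orbit of one filter under the group
# yields a geometrically related filter bank (step 4) at no extra parameter cost.
filter_bank = np.stack([g @ base_filter for g in group])
print(filter_bank.shape)  # (4, 2): four related filters from one set of parameters
```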

Figure 1

Applying the group transformation to learned filters rather than to raw input images is a deliberate design choice. Keeping the transformations in this moderate-dimensional space makes the network both interpretable and computationally tractable.

Figure 2

The authors use unfolded networks: architectures built by unrolling an iterative problem-solving algorithm into a fixed sequence of layers. These networks reconstruct an approximation of the input at every layer, which keeps the filters tethered to the data space where human inspection is possible.
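
A generic unrolling sketch (ISTA-style sparse coding, a common template for unfolded networks, assumed here rather than taken from the paper) shows how each layer keeps a reconstruction in the data space:

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of the l1 norm: the nonlinearity in unrolled sparse coding."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def unfolded_forward(x, W, n_layers=5, lam=0.1, step=0.1):
    """Unroll ISTA-style iterations into a fixed stack of layers.

    Each "layer" refines a code z so that W @ z approximates the input x,
    keeping intermediate representations tethered to the data space.
    """
    z = np.zeros(W.shape[1])
    for _ in range(n_layers):
        residual = x - W @ z                        # reconstruct, compare to input
        z = soft_threshold(z + step * W.T @ residual, lam)
    return z, W @ z                                 # final code and reconstruction

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32))                   # toy dictionary
x = W @ (rng.random(32) < 0.1)                      # signal with a sparse code
z, x_hat = unfolded_forward(x, W)
```

Because every layer produces an `x_hat` in the same space as `x`, the learned filters stay inspectable.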

Training is entirely self-supervised with respect to the group. The network simultaneously learns the weights and the group generator, with no labels indicating what symmetries are present. The only signal comes from the downstream task itself.

So what does the network actually discover? On natural image datasets, the learned group actions cluster into recognizable categories:

  • Skew-symmetric actions: transformations with a strong rotational character.
  • Toeplitz structures: constant-diagonal matrices arising in convolution and sliding-window operations.
  • Multi-scale actions: coarse-to-fine structure reminiscent of wavelet decompositions.
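
The Toeplitz connection is concrete: sliding a kernel across a signal is exactly multiplication by a constant-diagonal matrix, so Toeplitz structure in a learned group action is a signature of convolution. A small check (illustrative, not from the paper):

```python
import numpy as np

n = 6
kernel = np.array([1.0, -2.0, 1.0])
kp = np.pad(kernel, (0, n - len(kernel)))       # kernel zero-padded to length n

# Circulant (circular Toeplitz) matrix: column j is the kernel shifted down by j.
C = np.stack([np.roll(kp, j) for j in range(n)], axis=1)

x = np.arange(n, dtype=float)
fft_conv = np.real(np.fft.ifft(np.fft.fft(kp) * np.fft.fft(x)))
assert np.allclose(C @ x, fft_conv)             # matrix multiply == circular convolution
```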

Ablation studies show that certain filter sets have group actions strongly correlated with compositions of rotations and median filtering. That second operation is closely related to pooling in standard deep learning architectures.
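
Median filtering summarizes each local window by a single value, which is why it behaves like pooling; a minimal sketch (hypothetical helper, not the paper's implementation):

```python
import numpy as np

def median_filter_1d(x, width=3):
    """Sliding-window median: a robust, pooling-like nonlinearity."""
    pad = width // 2
    xp = np.pad(x, pad, mode='edge')            # replicate edges at the boundary
    return np.array([np.median(xp[i:i + width]) for i in range(len(x))])

x = np.array([1.0, 1.0, 9.0, 1.0, 1.0])        # one outlier spike
print(median_filter_1d(x))                      # the spike is suppressed; each window
                                                # collapses to a single summary value
```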

One of the paper’s sharpest findings: the learned group structure depends on both the data distribution and the task. Training the same architecture on the same images for different objectives (reconstruction versus classification) produces different groups. Symmetry isn’t a fixed property of a dataset. It’s a joint property of the data and what you’re trying to do with it.

Why It Matters

The payoff extends well past image classification. Across particle physics, cosmology, protein folding, and materials discovery, researchers embed known symmetries into neural architectures to improve sample efficiency and physical plausibility. But known symmetries are the easy case.

The harder, more common case is exactly what LGNs target: datasets where the relevant symmetry structure is unknown, approximate, or emergent from the interaction of domain and measurement process.

Because the learned groups are finite matrices, you can inspect them. You can correlate them with known operations. You can ask whether the symmetry the network found corresponds to something physically meaningful, and if it does, that’s a genuine scientific insight, not just an engineering convenience.

The authors envision LGNs not just as a better architecture, but as a probe for understanding what symmetries matter in real-world data. Open questions remain: scaling to very high-dimensional weight spaces, handling continuous rather than discrete groups, and whether discovered symmetries transfer across datasets. But the core idea, that symmetry can be learned rather than imposed, is on solid footing.

Bottom Line: Linear Group Networks show that neural networks can discover their own symmetries from scratch, and those symmetries are interpretable, mapping onto operations researchers already know and use. This reframes symmetry not as a design choice baked into architecture, but as something a sufficiently flexible model can learn on its own.

IAIFI Research Highlights

Interdisciplinary Research Achievement
This work connects mathematical group theory with practical deep learning, showing that abstract algebraic structures (cyclic subgroups of GL_d) can be learned end-to-end and mapped onto physically interpretable operations.
Impact on Artificial Intelligence
LGNs sidestep a long-standing limitation of equivariant deep learning: the need to specify symmetry groups by hand. This opens equivariant architectures to domains where the relevant symmetries aren't known in advance.
Impact on Fundamental Interactions
The framework gives physicists a tool to probe hidden symmetries in scientific datasets, with direct applications to problems where the symmetry group of the data is itself an open question.
Outlook and References
Future work may extend LGNs to continuous groups and higher-dimensional weight spaces, enabling symmetry discovery in large-scale physics simulations. The paper by Theodosis, Helwani, and Ba is available at [arXiv:2305.18552](https://arxiv.org/abs/2305.18552).
