Rapid Locomotion via Reinforcement Learning
Authors
Gabriel B Margolis, Ge Yang, Kartik Paigwar, Tao Chen, Pulkit Agrawal
Abstract
Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots. We present an end-to-end learned controller that achieves record agility for the MIT Mini Cheetah, sustaining speeds up to 3.9 m/s. This system runs and turns fast on natural terrains like grass, ice, and gravel and responds robustly to disturbances. Our controller is a neural network trained in simulation via reinforcement learning and transferred to the real world. The two key components are (i) an adaptive curriculum on velocity commands and (ii) an online system identification strategy for sim-to-real transfer leveraged from prior work. Videos of the robot's behaviors are available at: https://agility.csail.mit.edu/
Concepts
The Big Picture
Watch a cheetah run. It isn’t fast because someone programmed every muscle twitch and paw placement. Millions of years of trial and error shaped instincts that adapt on the fly to slick grass, loose sand, uneven rock. Now imagine giving a robot that same kind of fluid, instinctive speed.
Legged robots have long promised to go where wheeled robots can’t: uneven terrain, stairs, the messy real environments that humans navigate daily. But moving fast changes everything.
At walking speeds, the dynamics forgive small modeling errors. At sprinting speeds, they don’t. Turning too quickly builds up outward forces that can topple the robot, motors hit their physical limits, and the impact of each footfall spikes sharply. The gap between a careful walk and an all-out sprint isn’t just a difference in speed; sprinting is a qualitatively harder control problem.
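To make the turning limit concrete, here is a back-of-envelope check in Python. The friction coefficient and the command pair are our illustrative assumptions, not numbers from the paper: holding a turn at forward speed v and yaw rate ω requires a lateral (centripetal) acceleration of v·ω, while friction caps what the feet can supply at roughly μg.

```python
# Back-of-envelope feasibility check for a turn (illustrative numbers only).
MU_GRASS = 0.6   # assumed friction coefficient for grass
G = 9.81         # gravitational acceleration, m/s^2

v, omega = 3.0, 2.5           # commanded forward speed (m/s) and yaw rate (rad/s)
a_required = v * omega        # centripetal acceleration needed: 7.5 m/s^2
a_available = MU_GRASS * G    # friction budget: ~5.9 m/s^2

print(a_required <= a_available)  # False: the feet would slip at this command
```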
Researchers at MIT’s Improbable AI Lab and IAIFI broke through that barrier. They trained a single neural network controller entirely in simulation, then deployed it on the MIT Mini Cheetah, where it hit 3.9 m/s (a record for the platform) across natural terrains from grass to ice.
Key Insight: By combining an adaptive training schedule with real-time inference of terrain and body properties, a reinforcement learning agent can learn to sprint and spin on natural terrain without ever practicing on real ground.
How It Works
The core idea is reinforcement learning (RL): an agent tries things, collects rewards for good outcomes, and gradually improves. Here, the agent is a neural network that watches sensor data (just a gyroscope and joint encoders) and outputs joint position commands 50 times per second. No cameras, no LiDAR, no explicit terrain model. Just raw sensor readings mapped to leg movements.
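As a sketch of what such a controller can look like, here is a minimal policy network in PyTorch. The layer sizes and observation dimension are our assumptions for illustration, not the authors’ exact architecture:

```python
import torch
import torch.nn as nn

N_OBS = 42      # assumed observation size: joint states, gyro, last command
N_JOINTS = 12   # the Mini Cheetah has 12 actuated joints

# A small MLP mapping raw sensor readings to desired joint positions.
policy = nn.Sequential(
    nn.Linear(N_OBS, 256), nn.ELU(),
    nn.Linear(256, 128), nn.ELU(),
    nn.Linear(128, N_JOINTS),   # targets tracked by a low-level PD controller
)

obs = torch.randn(1, N_OBS)     # one step of sensor readings
joint_targets = policy(obs)     # queried 50 times per second on the robot
```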

Training happens entirely in simulation. The researchers randomize everything that might differ between virtual and real: ground friction (from near-frictionless ice to sticky rubber), ground restitution, payload mass, motor strength, even the robot’s center of mass. This technique, called domain randomization, forces the policy to develop strategies that work across a wide range of conditions rather than strategies tuned to a single idealized simulator.
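A minimal sketch of what that per-episode randomization could look like; the parameter ranges below are illustrative guesses, not the paper’s values:

```python
import random

def sample_physics():
    """Draw fresh physical parameters before each simulated episode."""
    return {
        "friction":     random.uniform(0.05, 1.5),   # near-ice to sticky rubber
        "restitution":  random.uniform(0.0, 0.5),    # ground bounciness
        "payload_kg":   random.uniform(-1.0, 3.0),   # added or removed mass
        "motor_scale":  random.uniform(0.8, 1.2),    # motor strength multiplier
        "com_offset_m": [random.uniform(-0.05, 0.05) for _ in range(3)],
    }

params = sample_physics()   # applied to the simulator, hidden from the policy
```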
Earlier approaches hit a wall here. Ask an RL agent to learn the full range of speeds at once, from slow walking through a full sprint, and training collapses: the high-speed commands yield almost no reward early on, so the agent never gets traction on them. The fix, sketched in code after this list, is straightforward:
- Start narrow. Begin training with only low-speed, easily achievable velocity commands.
- Expand dynamically. As the agent masters those tasks, the curriculum automatically widens the commanded speed range, but only to velocities the agent is nearly ready to handle.
- Respect physics. The curriculum tracks which combinations of linear speed and turning rate are actually achievable given the robot’s dynamics, never wasting training on physically impossible targets.
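Here is one way such a curriculum could be implemented, loosely in the spirit of the paper’s grid over velocity commands; the bin ranges, reward threshold, and friction limit below are our assumptions:

```python
import numpy as np

N = 21
v_bins = np.linspace(-4.0, 4.0, N)    # candidate forward speeds (m/s)
w_bins = np.linspace(-6.0, 6.0, N)    # candidate yaw rates (rad/s)
unlocked = np.zeros((N, N), dtype=bool)
unlocked[9:12, 9:12] = True           # start with low-speed commands only

def feasible(v, w, mu=0.8, g=9.81):
    """Reject command pairs whose centripetal demand exceeds the friction limit."""
    return abs(v * w) <= mu * g

def maybe_expand(i, j, tracking_reward, threshold=0.8):
    """When command (i, j) is tracked well, unlock its feasible neighbors."""
    if tracking_reward < threshold:
        return
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ni, nj = i + di, j + dj
            if 0 <= ni < N and 0 <= nj < N and feasible(v_bins[ni], w_bins[nj]):
                unlocked[ni, nj] = True

maybe_expand(11, 10, tracking_reward=0.9)   # good tracking at a faster command
print(int(unlocked.sum()))                  # 9 cells initially; now 12
```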
This adaptive velocity curriculum is one half of the approach. The other is online system identification: inferring the robot’s properties and terrain conditions in real time while it moves.
The implementation uses a teacher-student setup. During simulation training, a “teacher” policy has access to privileged information: exact terrain parameters, true friction coefficients, precise mass. A “student” policy learns to mimic the teacher using only the limited sensor suite available on the real robot.
The student also includes an adaptation module, a small network that reads recent sensor history and infers a compact representation of terrain and robot properties on the fly.
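A sketch of that adaptation module in PyTorch; the history length, observation size, and latent dimension are illustrative assumptions:

```python
import torch
import torch.nn as nn

HIST_LEN, N_OBS, LATENT = 30, 42, 8   # assumed sizes

# Compresses a window of recent observations into a latent estimate of the
# terrain and body properties the teacher saw directly in simulation.
adaptation_module = nn.Sequential(
    nn.Flatten(),                               # (batch, HIST_LEN * N_OBS)
    nn.Linear(HIST_LEN * N_OBS, 256), nn.ELU(),
    nn.Linear(256, LATENT),
)

history = torch.randn(1, HIST_LEN, N_OBS)   # recent sensor readings
z = adaptation_module(history)              # fed to the policy with current obs

# Student training (schematically): regress z toward the teacher's encoding
# of the true, privileged parameters, e.g. mse(z, encode(privileged_params)).
```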

When dropped onto real-world terrain it has never physically experienced, the policy figures out what kind of ground it’s on and adjusts within a fraction of a second. No fine-tuning after deployment.
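Putting the pieces together, a zero-shot deployment loop might look like the following; every function name, network, and shape here is a stand-in, not the authors’ interface:

```python
import collections
import torch
import torch.nn as nn

HIST_LEN, N_OBS, LATENT, N_JOINTS = 30, 42, 8, 12   # assumed sizes

policy = nn.Linear(N_OBS + LATENT, N_JOINTS)        # stand-in networks
adaptation_module = nn.Linear(HIST_LEN * N_OBS, LATENT)

def read_sensors():            # stub for gyro + joint encoder readings
    return torch.randn(N_OBS)

def send_joint_targets(q):     # stub for the low-level PD controller
    pass

history = collections.deque([torch.zeros(N_OBS)] * HIST_LEN, maxlen=HIST_LEN)

for tick in range(500):        # the real loop runs at 50 Hz, indefinitely
    obs = read_sensors()
    history.append(obs)
    z = adaptation_module(torch.cat(list(history)))   # infer ground/body state
    send_joint_targets(policy(torch.cat([obs, z])))
```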
Why It Matters
The numbers: 3.9 m/s sustained sprint, 3.4 m/s average over a 10-meter outdoor dash through grass, 5.7 rad/s spinning. All from a single neural network running on an onboard computer with nothing but a gyroscope and leg encoders. But the raw speed isn’t the most interesting part.
Look at what the researchers didn’t design. They didn’t hand-engineer gait patterns or specify when legs should lift or how to compensate for ice. The policy figured these out on its own. It also developed behaviors nobody programmed in: automatic recovery from tripping, real-time compensation for a malfunctioning motor. These aren’t edge cases somebody added to the reward function. They fell out of the training process naturally.
The lesson: better robots come from richer simulations and smarter training, not more elaborate hand-coded rules.
The minimal sensor requirements matter too. Since the controller only needs sensors already built into most quadrupeds, it could plausibly work on platforms beyond this one prototype. Agile locomotion might become a software upgrade rather than a hardware redesign.
Bottom Line: A neural network trained entirely in simulation, using an adaptive speed curriculum and on-the-fly terrain inference, breaks the speed record for the MIT Mini Cheetah and runs reliably on grass, ice, and gravel. RL can deliver agile, adaptable locomotion that was once thought to require hand-crafted physics models.
IAIFI Research Highlights
This work connects reinforcement learning and classical mechanics by encoding physical constraints (centrifugal force limits, actuator dynamics, contact force regulation) directly into the training curriculum. Physical laws guide the learning process instead of blocking it.
The adaptive velocity curriculum and teacher-student architecture extend beyond locomotion. Any multi-task RL problem where tasks vary in difficulty along physically constrained axes could use the same approach.
Zero-shot sim-to-real transfer at record agility levels shows that simulated training can capture real-world physics well enough to need no real-world practice at all.
Extensions to more complex body plans, richer sensing, and higher-level navigation are natural next steps. The full system and videos are available at [arXiv:2205.02824](https://arxiv.org/abs/2205.02824) alongside the RSS 2022 proceedings.
Original Paper Details
Gabriel B Margolis, Ge Yang, Kartik Paigwar, Tao Chen, Pulkit Agrawal. “Rapid Locomotion via Reinforcement Learning.” RSS 2022. arXiv:2205.02824.