Learn to Teach

Sample-efficient privileged learning for humanoid locomotion over diverse terrains.

Overview

Learn to Teach (L2T) targets a practical bottleneck in humanoid reinforcement learning: policies can become robust in simulation, but training them across enough terrain, contact, and disturbance variation usually consumes enormous numbers of simulator samples.

The paper reframes teacher-student learning as a one-stage privileged learning problem. Rather than training a privileged teacher first and distilling it into a deployable student afterward, L2T learns the teacher and student together. Because the two policies share the same simulated dynamics, samples collected while training one policy can be reused to train the other. This synchronizes the learning trajectories of the two policies, recycles simulator data, and reduces both sample complexity and training time.
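The one-stage data flow described above can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not the L2T implementation. A hypothetical 1-D task stands in for Digit. The teacher is a linear policy that sees a privileged per-episode disturbance `d` and is trained by simple random search, not the RL algorithm used in the paper. The student sees only its observation history and is fit by least-squares imitation on the very rollouts the teacher just produced, so it consumes no simulator samples of its own. The sketch does not model the student-side RL of the L2T-RL variant.

```python
import random

# Hypothetical toy task (not Digit): x_{t+1} = x_t + a_t + d, where the
# per-episode disturbance d is privileged (teacher-only) and the goal is x ~ 0.
T = 20          # steps per episode
EPISODES = 16   # fixed episode set, reused for every evaluation
NOISE = 0.1     # std of exploration noise added to executed actions


def make_episodes(seed):
    rng = random.Random(seed)
    return [(rng.uniform(-1.0, 1.0),                      # initial state x0
             rng.uniform(-0.5, 0.5),                      # privileged disturbance d
             [rng.gauss(0.0, NOISE) for _ in range(T)])   # action noise
            for _ in range(EPISODES)]


def rollout(w, episode):
    """Run the teacher a = w[0]*x + w[1]*d; return (return, trajectory)."""
    x, d, noise = episode
    traj, ret = [], 0.0
    for t in range(T):
        a_mean = w[0] * x + w[1] * d      # teacher action (also the student label)
        a_exec = a_mean + noise[t]        # action actually applied in the simulator
        traj.append((x, a_mean, a_exec))
        x = x + a_exec + d
        ret -= x * x                      # penalize distance from 0
    return ret, traj


def student_dataset(trajs):
    """Student sees (x_t, x_{t-1}, a_{t-1}) but never d; target is the teacher action."""
    feats, targets = [], []
    for traj in trajs:
        for t in range(1, len(traj)):
            x_t, a_mean_t, _ = traj[t]
            x_prev, _, a_prev = traj[t - 1]
            feats.append((x_t, x_prev, a_prev))
            targets.append(a_mean_t)
    return feats, targets


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][k] * v[k] for k in range(r + 1, n))) / M[r][r]
    return v


def fit_student(feats, targets):
    """Least-squares imitation of the teacher via the normal equations."""
    n = len(feats[0])
    A = [[sum(f[i] * f[j] for f in feats) for j in range(n)] for i in range(n)]
    b = [sum(f[i] * y for f, y in zip(feats, targets)) for i in range(n)]
    return solve(A, b)


def student_mse(v, feats, targets):
    errs = [(sum(vi * fi for vi, fi in zip(v, f)) - y) ** 2 for f, y in zip(feats, targets)]
    return sum(errs) / len(errs)


def train(iters=200, sigma=0.1, seed=0):
    rng = random.Random(seed + 1)
    episodes = make_episodes(seed)

    def evaluate(w):
        results = [rollout(w, ep) for ep in episodes]
        return sum(r for r, _ in results), [traj for _, traj in results]

    w = [0.0, 0.0]
    ret, trajs = evaluate(w)
    initial_ret = ret
    feats, targets = student_dataset(trajs)
    v = fit_student(feats, targets)
    for _ in range(iters):
        # Teacher step: random-search update of the privileged policy.
        cand = [wi + rng.gauss(0.0, sigma) for wi in w]
        cand_ret, cand_trajs = evaluate(cand)
        if cand_ret > ret:
            w, ret, trajs = cand, cand_ret, cand_trajs
        # Student step in the same loop, reusing the teacher's rollouts
        # instead of collecting new simulator samples.
        feats, targets = student_dataset(trajs)
        v = fit_student(feats, targets)
    return w, v, initial_ret, ret, student_mse(v, feats, targets)
```

Calling `train()` returns the teacher weights, the student weights, the teacher's initial and final returns on the fixed episode set, and the student's imitation error. In this toy the student can recover the privileged disturbance from its history (d = x_t - x_{t-1} - a_{t-1}), so its weights should match (w[0] + w[1], -w[1], -w[1]); on a real robot the student would only approximate the teacher.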

On the Digit humanoid, the RL variant (L2T-RL) demonstrates zero-shot sim-to-real transfer. The hardware videos cover outdoor and indoor walking, concrete, gravel, grass, sand, rocky terrain, slippery surfaces, elevation changes, payload carrying, turning, and recovery from pushes and pulls.

Why it matters

Project video

Hardware videos

Outdoor environments

Outdoor campus: Robust walking in an outdoor campus environment.
Concrete: Stable walking on a flat concrete surface in strong wind.
Indoor corridors: Navigation through indoor corridors and open spaces.

Challenging terrains

Rocky terrain: Traversing uneven rocky terrain.
Sand: Walking on loose sand.
Gravel pavement: Stable walking on gravel pavement.
Grass: Walking on natural grass.

Dynamic behaviors

Turning: Smooth turning behavior.
Football field: Forward walking on a football field.
Football field turning: Turning on a football field.

External forces

Forward push: Recovery from a forward push.
Backward push: Recovery from a backward push.
Pulling force: Recovery from a pulling force.

Special conditions

Slippery surface demo: A demonstration of the slippery surface.
Slippery surface with L2T: Walking on a slippery surface with the L2T policy.
Slippery surface baseline: Walking on a slippery surface with the controller provided by Agility Robotics.
Payload: Carrying a payload while walking.
Elevation: Walking on an elevated surface.

Publications

Feiyang Wu, Xavier Nal, Jaehwi Jang, Wei Zhu, Zhaoyuan Gu, Anqi Wu, Ye Zhao
RA-L 2025, ICRA 2026 Oral
Feiyang Wu, Zhaoyuan Gu, Hanran Wu, Anqi Wu, Ye Zhao
ICRA 2024
