quadruped-dog-rl

May 16, 2026 · View on GitHub

Quadruped robot dog simulation, walking control, and reinforcement learning policy training workspace.

Supports: Unitree Go2, Boston Dynamics Spot, MIT Mini Cheetah, ANYmal B/C, Mini Pupper.

Unitree Go2 in RViz2

Unitree Go2 in Gazebo Garden

Repository Structure

quadruped-dog-rl/
├── urdf/                    # Robot URDF and mesh files
│   ├── go1_config/          # Unitree Go1
│   ├── go2_unitree/         # Unitree Go2 (with DAE meshes)
│   ├── spot_config/         # Boston Dynamics Spot
│   ├── mini_cheetah_config/ # MIT Mini Cheetah
│   ├── mini_pupper_config/  # Mini Pupper
│   ├── anymal_b_config/     # ANYmal B (ETH Zurich)
│   └── anymal_c_config/     # ANYmal C (ETH Zurich)
├── ros2/                    # ROS2 packages (CHAMP framework, ros2 branch)
│   ├── champ/               # Core locomotion controller
│   ├── champ_base/          # Hardware abstraction layer
│   ├── champ_bringup/       # Launch files
│   ├── champ_config/        # Robot-specific configs
│   ├── champ_description/   # URDF loading
│   ├── champ_gazebo/        # Gazebo simulation
│   ├── champ_navigation/    # Navigation stack
│   ├── champ_teleop/        # Keyboard/joystick teleoperation
│   └── robots/              # Pre-configured robot packages
├── launch/                  # Top-level launch files
│   ├── view_go2.launch.py   # View Go2 URDF in RViz2
│   ├── gazebo_go2.launch.py # Spawn Go2 in Gazebo Garden
│   ├── gazebo_sim.launch.py # Generic Gazebo sim launcher (CHAMP)
│   ├── rviz_view.launch.py  # Generic RViz2 viewer
│   └── policy_deploy.launch.py # Deploy trained RL policy (MuJoCo)
├── scripts/                 # Shell scripts for common tasks
│   ├── train_policy.sh      # Train walking policy
│   ├── play_policy.sh       # Visualize trained policy
│   ├── launch_sim.sh        # Launch CHAMP Gazebo sim
│   ├── spawn_go2_gazebo.sh  # Direct Gazebo spawning
│   └── make_go2_stand.py    # Convert URDF → standing SDF
├── training/                # RL policy training
│   ├── legged_gym/          # Isaac Gym PPO environments (original)
│   ├── envs/                # MuJoCo + Gazebo Gymnasium environments
│   │   ├── go2_mujoco_env.py   # Go2 MuJoCo env (SB3 PPO)
│   │   ├── go2_gazebo_env.py   # Go2 Gazebo env (ROS2 bridge)
│   │   └── go2_scene.xml       # MuJoCo MJCF scene
│   ├── train_mujoco.py      # MuJoCo training script
│   ├── train_gazebo.py      # Gazebo training script
│   ├── teleop_mujoco.py     # Keyboard teleop in MuJoCo
│   ├── launch/              # ROS2 launch files for Gazebo RL
│   ├── deploy/              # Policy deployment (MuJoCo / real robot)
│   └── setup.py
├── intelligence/            # Higher-level autonomy stack
│   ├── gait/                # Gait scheduler
│   ├── perception/          # Terrain estimator
│   ├── navigation/          # Waypoint navigator (ROS2)
│   ├── terrain/             # Adaptive controller
│   └── llm_commander/       # Natural language → robot commands
├── description/             # Robot description docs and joint conventions
└── interfaces/              # Custom ROS2 msgs, srvs, actions

System Requirements

Ubuntu 22.04
ROS2 Humble
Gazebo Garden (gz-sim7) — works with ros_gz_sim
Python 3.8+
NVIDIA GPU with 10GB+ VRAM for RL training

Build ROS2 Packages

cd ros2
source /opt/ros/humble/setup.bash
colcon build --symlink-install --cmake-args -DBUILD_TESTING=OFF
source install/setup.bash

View Go2 in RViz2

source /opt/ros/humble/setup.bash
ros2 launch launch/view_go2.launch.py

Opens RViz2 with the full Go2 mesh and a joint slider GUI to pose the legs.

Spawn Go2 in Gazebo Garden

Terminal 1 — Launch simulation

source /opt/ros/humble/setup.bash
ros2 launch launch/gazebo_go2.launch.py

Starts Gazebo Garden, spawns the Go2, bridges topics to ROS2, and opens RViz2 alongside it.

Terminal 2 — Control the robot

Publish a single velocity command:

ros2 topic pub /cmd_vel geometry_msgs/msg/Twist \
  "{linear: {x: 0.5, y: 0.0, z: 0.0}, angular: {z: 0.0}}" --once

Drive continuously (stream at 10 Hz):

ros2 topic pub /cmd_vel geometry_msgs/msg/Twist \
  "{linear: {x: 0.3}, angular: {z: 0.2}}" --rate 10

Useful commands:

Action	Command
Move forward	`linear.x = 0.3`
Move backward	`linear.x = -0.3`
Strafe left	`linear.y = 0.2`
Turn left	`angular.z = 0.5`
Turn right	`angular.z = -0.5`
Stop	all zeros

Keyboard teleoperation (CHAMP):

# In a second terminal (after sourcing both ROS2 and ros2/install/setup.bash)
source /opt/ros/humble/setup.bash
source ros2/install/setup.bash
ros2 launch champ_teleop teleop.launch.py

Use arrow keys / WASD to drive.

CHAMP Locomotion Simulation

For the full walking gait controller using CHAMP:

source /opt/ros/humble/setup.bash
source ros2/install/setup.bash
ros2 launch ros2/champ_config/launch/gazebo.launch.py

Then in a second terminal:

source /opt/ros/humble/setup.bash
source ros2/install/setup.bash
ros2 launch champ_teleop teleop.launch.py

RL Policy Training

Three backends are supported. Use the unified helper script:

./scripts/train_policy.sh [backend] [options]

MuJoCo backend (default — no Isaac Gym needed)

Trains directly in MuJoCo using Gymnasium + Stable-Baselines3 PPO. Headless, fast, CUDA-accelerated.

Features enabled by default:

Domain randomization — body mass ±15%, floor friction ±30%, motor kp ±15% each episode
Curriculum learning — command velocity starts slow (0.3 m/s max) and scales to 1.2 m/s as the policy improves
Foot contact observations — 4 touch sensor readings in the 49-dim observation vector
Richer reward — velocity tracking + base height + orientation + foot contact + action smoothness (8 terms)
VecNormalize — running obs + reward normalisation across all parallel envs
TensorBoard — each reward term logged separately under reward/lin, reward/contact, etc.
Tuned PPO — lr=3e-4, n_steps=2048, n_epochs=10

# Install deps once
pip install -r requirements.txt

# Train Go2 (default 2M steps, 8 parallel envs)
./scripts/train_policy.sh mujoco

# Custom run
./scripts/train_policy.sh mujoco --timesteps 5000000 --n_envs 16 --cmd 1.0 0.0 0.0

# Resume from checkpoint (VecNormalize stats auto-loaded from checkpoints/ dir)
./scripts/train_policy.sh mujoco --resume training/logs/mujoco/checkpoints/go2_mujoco_500000_steps.zip

Output: training/logs/mujoco/ — checkpoints + vecnorm_<steps>_steps.pkl every 50k steps.

# View reward curves in TensorBoard
tensorboard --logdir training/logs/mujoco

Gazebo backend (Gazebo Harmonic + ROS2)

Trains with real Gazebo Harmonic physics via ROS2 topics. Uses JointPositionController plugins for PD control, bridged via ros_gz_bridge.

source /opt/ros/humble/setup.bash
source ros2/install/setup.bash

# Build ROS2 workspace first (once)
cd ros2 && colcon build --symlink-install --cmake-args -DBUILD_TESTING=OFF && cd ..

# Train (auto-launches Gazebo headlessly)
./scripts/train_policy.sh gazebo

# Use an already-running Gazebo (no auto-launch)
./scripts/train_policy.sh gazebo --no-launch

# Launch Gazebo headlessly standalone
ros2 launch training/launch/gazebo_rl.launch.py

Robot URDF variants:

urdf/go2_unitree/urdf/go2.urdf — base model
urdf/go2_unitree/urdf/go2_gz.urdf — with Gazebo Harmonic joint controllers (for RL training)

Isaac Gym backend (requires NVIDIA Isaac Gym)

# Download from https://developer.nvidia.com/isaac-gym
pip install -e training/
./scripts/train_policy.sh isaac go2
./scripts/train_policy.sh isaac go2 --headless

Registered Isaac tasks: go2, h1, h1_2, g1

Headless IK Controller (no RL)

Run the Go2 immediately without a trained policy using a pure IK trot/walk/bound controller. Gait switches automatically with speed via the GaitScheduler:

Speed (m/s)	Gait
0 – 0.05	Stand
0.05 – 0.4	Walk
0.4 – 1.5	Trot
1.5 – 2.5	Canter
2.5 – 4.0	Bound
4.0+	Pronk

pip install -r requirements.txt

# Run interactive viewer
python3 training/headless_control.py

# Record a video
python3 training/headless_control.py --record out.mp4

Key	Action
W / S	Forward / Backward
A / D	Strafe Left / Right
Q / E	Yaw Left / Right
Space	Stop
R	Reset simulation
ESC	Quit

Play Trained Policy (OpenCV viewer)

Runs a trained checkpoint in the same headless OpenCV viewer as the IK controller. VecNormalize stats are auto-detected from the checkpoint directory.

# Auto-detect vecnorm stats from the same directory as the model
python3 training/play_policy.py --model training/logs/mujoco/best_model.zip

# Explicit vecnorm path or custom command velocity
python3 training/play_policy.py --model best_model.zip --vecnorm vecnorm_final.pkl --cmd 0.8 0 0

# Record a video
python3 training/play_policy.py --model best_model.zip --record policy_demo.mp4

HUD shows: commanded velocity, actual velocity, per-step reward, action magnitude, episode count.

Key	Action
R	Reset episode
ESC	Quit

Keyboard Teleop (MuJoCo, with RL policy)

Control the Go2 interactively with a trained policy or random actions:

# With trained model
python3 training/teleop_mujoco.py --model training/logs/mujoco/best_model.zip

# Without model (random actions, for testing the sim)
python3 training/teleop_mujoco.py

Key	Action
W / S	Forward / Backward
A / D	Strafe Left / Right
Q / E	Yaw Left / Right
R	Reset episode
ESC	Quit

Deploy Trained Policy in MuJoCo

# For H1/H1_2/G1 with pre-trained weights
python3 training/deploy/deploy_mujoco/deploy_mujoco.py h1.yaml

# Via ROS2 launch (Go2)
ros2 launch launch/policy_deploy.launch.py checkpoint:=/path/to/policy.pt task:=go2

Available Robots

Robot	URDF Path	RL Task
Unitree Go2	`urdf/go2_unitree/urdf/go2.urdf`	`go2`
Unitree H1	—	`h1`, `h1_2`
Unitree G1	—	`g1`
Boston Dynamics Spot	`urdf/spot_config/`	—
MIT Mini Cheetah	`urdf/mini_cheetah_config/`	—
ANYmal B	`urdf/anymal_b_config/`	—
ANYmal C	`urdf/anymal_c_config/`	—
Mini Pupper	`urdf/mini_pupper_config/`	—

Intelligence Modules

Higher-level autonomy stack built on top of the base simulation and RL policy.

intelligence/
├── locomotion_manager.py       # ROS2 node — fuses all modules into one running stack
├── gait/
│   └── gait_scheduler.py       # Auto-select gait (walk/trot/canter/bound) by speed
├── perception/
│   └── terrain_estimator.py    # Classify terrain (flat/slope/stairs/rough) from IMU + foot forces
├── navigation/
│   └── waypoint_navigator.py   # Autonomous waypoint following via pure pursuit (ROS2 node)
├── terrain/
│   └── adaptive_controller.py  # Fuse terrain + gait into safe velocity commands
└── llm_commander/
    └── llm_commander.py        # Natural language -> robot commands via Claude API

Gait Scheduler

Auto-selects the right gait based on commanded speed:

Speed (m/s)	Gait	Foot pattern
0 – 0.05	Stand	All feet down
0.05 – 0.4	Walk	One foot at a time
0.4 – 1.5	Trot	Diagonal pairs (FL+RR, FR+RL)
1.5 – 2.5	Canter	Three-beat
2.5 – 4.0	Bound	Front pair then rear pair
4.0+	Pronk	All four feet airborne

Terrain Estimator

Classifies terrain from IMU and foot contact forces, outputs recommended speed limit and foot clearance:

from intelligence.perception.terrain_estimator import TerrainEstimator
estimator = TerrainEstimator()
result = estimator.estimate(imu_roll=0.1, imu_pitch=0.05, contacts=[120, 115, 118, 122])
# TerrainEstimate(terrain_type=flat, slope_deg=6.38, recommended_speed_limit=3.0)

Waypoint Navigator (ROS2)

Autonomous point-to-point navigation using pure pursuit. Run directly as a Python node:

source /opt/ros/humble/setup.bash
python3 intelligence/navigation/waypoint_navigator.py \
    --ros-args -p waypoints:="[[2.0,0.0],[2.0,2.0],[0.0,2.0],[0.0,0.0]]" \
               -p linear_speed:=0.5

LLM Commander (Natural Language)

Control the robot with plain English using Claude API:

export ANTHROPIC_API_KEY=your_key
python3 intelligence/llm_commander/llm_commander.py

Then publish commands:

ros2 topic pub /natural_language_cmd std_msgs/msg/String "data: 'trot forward at medium speed'"
ros2 topic pub /natural_language_cmd std_msgs/msg/String "data: 'turn left slowly'"
ros2 topic pub /natural_language_cmd std_msgs/msg/String "data: 'stop'"

Adaptive Controller

Combines terrain estimation + gait scheduling into a single safe command output:

from intelligence.terrain.adaptive_controller import AdaptiveController
ctrl = AdaptiveController()
cmd = ctrl.adapt(desired_speed=1.2, imu_pitch=0.12, contacts=[110,115,108,120])
# AdaptedCommand(linear_x=1.0, gait='trot', terrain='slope', foot_clearance=0.08)

Locomotion Manager (ROS2 node)

Wires all three modules into one running ROS2 node. Subscribes to IMU + foot forces + raw velocity commands; publishes safe adapted commands and JSON status.

/cmd_vel_raw  (Twist)              →┐
/imu          (Imu)                →┤  LocomotionManager  →  /cmd_vel (Twist)
/foot_forces  (Float32MultiArray)  →┘                     →  /locomotion_status (String, JSON)

source /opt/ros/humble/setup.bash
python3 intelligence/locomotion_manager.py

# With custom params
python3 intelligence/locomotion_manager.py --ros-args -p max_speed:=1.2 -p update_rate:=50.0

Monitor the adapted output:

ros2 topic echo /cmd_vel
ros2 topic echo /locomotion_status

The /locomotion_status JSON payload includes:

{"terrain": "slope", "gait": "trot", "speed": 0.8, "angular": 0.0, "slope_deg": 12.5, "foot_clearance": 0.08}

Pipe WaypointNavigator → LocomotionManager → gait controller for a fully autonomous stack:

# Terminal 1 — locomotion manager (terrain-aware speed clamping)
python3 intelligence/locomotion_manager.py

# Terminal 2 — waypoint navigator (publishes to /cmd_vel_raw)
python3 intelligence/navigation/waypoint_navigator.py \
    --ros-args -p waypoints:="[[2.0,0.0],[2.0,2.0],[0.0,0.0]]" \
               -r /cmd_vel:=/cmd_vel_raw

References

CHAMP Framework — ROS2 locomotion controller
Unitree RL Gym — PPO policy training
legged_gym (ETH Zurich) — original RL gym
Isaac Lab — modern GPU training framework