Learning whole-body control for locomotion and arm motions in a single policy is challenging because the two tasks pursue conflicting goals. For instance, efficient locomotion typically favors a horizontal base orientation, while end-effector tracking may benefit from tilting the base to extend reachability. Additionally, current Reinforcement Learning (RL) approaches that use a pose-based task specification cannot directly control the end-effector velocity, which makes smooth trajectory execution difficult. To address these limitations, we propose an RL-based framework for dynamic, velocity-aware whole-body end-effector control. Our method introduces a multi-critic actor architecture that decouples the reward signals for locomotion and manipulation, simplifying reward tuning and allowing the policy to resolve task conflicts more effectively. Furthermore, we design a twist-based end-effector task formulation that can track both discrete poses and motion trajectories. We validate our approach through simulation and hardware experiments on a quadruped robot equipped with a robotic arm. The resulting controller can walk and move its end-effector simultaneously, and it exhibits emergent whole-body behaviors in which the base assists the arm in extending the workspace, despite the lack of any explicit formulation for such coordination.
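The twist-based formulation commands end-effector velocities rather than poses. As a minimal illustrative sketch (not the authors' command generator), a proportional pose-to-twist conversion could look as follows; the gains `kp_lin`, `kp_ang` and the clipping limits are hypothetical placeholders:

```python
import numpy as np
from scipy.spatial.transform import Rotation


def pose_to_twist(p_cur, R_cur, p_goal, R_goal,
                  kp_lin=1.0, kp_ang=1.0, v_max=0.5, w_max=1.0):
    """Convert a pose error into a bounded desired end-effector twist.

    p_cur, p_goal: (3,) positions in the world frame.
    R_cur, R_goal: scipy Rotation objects for the current/goal orientation.
    Returns (v, w): linear and angular velocity commands (world frame).
    Gains and limits are illustrative, not values from the paper.
    """
    # Linear part: proportional to the position error, clipped to v_max.
    v = kp_lin * (np.asarray(p_goal) - np.asarray(p_cur))
    n = np.linalg.norm(v)
    if n > v_max:
        v *= v_max / n

    # Angular part: log map of the relative rotation (axis-angle vector),
    # scaled by the gain and clipped to w_max.
    w = kp_ang * (R_goal * R_cur.inv()).as_rotvec()
    n = np.linalg.norm(w)
    if n > w_max:
        w *= w_max / n
    return v, w
```

Clipping keeps the commanded twist within a bounded range; a scheme like this can serve both discrete pose targets (the twist decays to zero as the error vanishes) and continuous trajectories (the goal pose is advanced along the reference).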
Given a randomly sampled start and goal pose, a command generator provides the desired end-effector twist, base twist, and foot height as commands to the policy. Rewards are computed per category and consumed by separate critics, each yielding its own value function. The advantage is estimated per critic, normalized, and summed to obtain the total advantage, and the teacher policy is then optimized with Proximal Policy Optimization (PPO).
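To make the multi-critic advantage computation concrete, below is a minimal sketch assuming Generalized Advantage Estimation (GAE) per critic; the category names and hyperparameters are illustrative, not taken from the paper:

```python
import numpy as np


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for a single critic.

    rewards: (T,) per-step rewards for one reward category.
    values:  (T+1,) value predictions from that category's critic,
             including the bootstrap value for the final state.
    """
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv


def multi_critic_advantage(reward_groups, value_groups, eps=1e-8):
    """Estimate advantages per critic, normalize each, and sum.

    reward_groups: dict mapping a reward category (e.g. 'locomotion',
                   'manipulation' -- hypothetical names) to its (T,) rewards.
    value_groups:  dict mapping the same categories to (T+1,) values.
    """
    total = 0.0
    for name, rewards in reward_groups.items():
        adv = gae(rewards, value_groups[name])
        # Normalizing per critic puts all task advantages on a common
        # scale before summation, so no single reward group dominates.
        adv = (adv - adv.mean()) / (adv.std() + eps)
        total = total + adv
    return total
```

Normalizing each advantage stream before summation is what removes the need to tune the locomotion and manipulation reward magnitudes against each other.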
We evaluate our approach both in simulation and on a real quadruped robot with a robotic arm. The policy is trained in simulation using the proposed multi-critic architecture and twist-based end-effector control. We assess the learned controller on various tasks, including static pose tracking, dynamic trajectory following, and obstacle negotiation. The results demonstrate that our method achieves higher end-effector tracking accuracy than baseline approaches while maintaining stable locomotion. Additionally, we observe emergent behaviors where the robot uses its base to help the arm reach targets beyond its nominal workspace.
To assess the tracking performance of the policy, we evaluate its ability to track the following trajectory types:
- An end-effector trajectory tracking task, demonstrating precise tracking of a moving end-effector target during simultaneous locomotion.
- A chicken-head control task, in which the end-effector is held stationary in the world frame while the base keeps moving.
@inproceedings{vijayan2025multi,
author = {Vijayan, Aravind Elanjimattathil and Cramariuc, Andrei and Risiglione, Mattia and Gehring, Christian and Hutter, Marco},
title = {Multi-critic Learning for Whole-body End-effector Twist Tracking},
booktitle = {Conference on Robot Learning (CoRL)},
year = {2025},
}