CycleRL: Sim-to-Real Deep Reinforcement Learning for Robust Autonomous Bicycle Control

Anonymous1¹, Anonymous2¹, Anonymous3¹, Anonymous4¹, Anonymous5¹, Anonymous6¹,*
¹Affiliation, *Corresponding author

Abstract

Autonomous bicycles offer a promising, agile solution for urban mobility and last-mile logistics. However, conventional control strategies often struggle with their underactuated, nonlinear dynamics: they are sensitive to model mismatch and adapt poorly to real-world uncertainties. To address these limitations, this paper presents CycleRL, the first sim-to-real deep reinforcement learning (DRL) framework designed for robust autonomous bicycle control. Our approach trains an end-to-end neural control policy in the high-fidelity NVIDIA Isaac Sim environment, using Proximal Policy Optimization (PPO) to avoid the need for an explicit dynamics model. The framework features a composite reward function tailored for concurrent balance maintenance, velocity tracking, and steering control. Crucially, systematic domain randomization is employed to bridge the simulation-to-reality gap and enable direct transfer. In simulation, CycleRL achieves a 99.90% balance success rate, a steering tracking error of 1.15°, and a velocity tracking error of 0.18 m/s. These quantitative results, coupled with successful hardware transfer, validate DRL as an effective paradigm for autonomous bicycle control, offering superior adaptability over traditional methods. Video demonstrations are available at https://anony6f05.github.io/CycleRL/.
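
As a rough illustration of what a composite reward for concurrent balance maintenance, velocity tracking, and steering control might look like, the sketch below sums three bounded tracking terms. It is not the reward used in CycleRL: the function name, the Gaussian-shaped kernels, the weights, the scale constants, the roll limit, and the fall penalty are all hypothetical choices made for this example.

```python
import math


def composite_reward(roll, speed, steer, target_speed, target_steer,
                     w_balance=1.0, w_speed=0.5, w_steer=0.5,
                     roll_limit=math.radians(30.0)):
    """Illustrative composite reward (hypothetical weights and scales).

    roll          -- bicycle roll (lean) angle in radians
    speed         -- forward speed in m/s
    steer         -- steering angle in radians
    target_speed  -- commanded forward speed in m/s
    target_steer  -- commanded steering angle in radians
    """
    # Hypothetical termination rule: treat a large lean angle as a fall.
    if abs(roll) > roll_limit:
        return -10.0

    # Each term equals 1.0 at zero error and decays smoothly toward 0.
    r_balance = math.exp(-(roll / 0.2) ** 2)
    r_speed = math.exp(-((speed - target_speed) / 0.5) ** 2)
    r_steer = math.exp(-((steer - target_steer) / 0.1) ** 2)

    return w_balance * r_balance + w_speed * r_speed + w_steer * r_steer


# Perfect tracking collects the full weighted sum of the three terms.
best = composite_reward(roll=0.0, speed=2.0, steer=0.0,
                        target_speed=2.0, target_steer=0.0)
# A speed error lowers only the velocity-tracking term.
slow = composite_reward(roll=0.0, speed=1.5, steer=0.0,
                        target_speed=2.0, target_steer=0.0)
```

Keeping each term bounded in [0, 1] means the weights alone set the relative priority of the three objectives, which is one common way to keep a multi-objective reward easy to tune.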

Video

CycleRL: A robust sim-to-real deep reinforcement learning framework for autonomous bicycle control. Our approach achieves direct policy transfer from high-fidelity simulation to physical hardware without explicit dynamics modeling. The video demonstrates validation across scenarios that challenge the stability of traditional controllers: physical variations (payload, abnormal tire pressure, and varying speed), agile maneuvering on diverse terrains (asphalt, gravel, uneven lawn, slopes, and speed bumps), and lateral perturbations. It concludes with demonstrations of autonomous lane tracking and long-duration operation, confirming the system's reliability.
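
The physical variations shown in the video (payload, tire condition, speed, lateral pushes) are the kind of parameters that domain randomization typically perturbs during training. The sketch below shows one minimal way to sample such parameters per episode. The parameter names, the use of tire friction as a stand-in for tire pressure, and all numeric ranges are assumptions made for illustration, not the values used in CycleRL, and the sketch is independent of any Isaac Sim API.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodeParams:
    payload_kg: float        # extra mass carried by the bicycle
    tire_friction: float     # hypothetical proxy for tire pressure changes
    target_speed_mps: float  # commanded forward speed
    push_force_n: float      # magnitude of a lateral disturbance


# Hypothetical uniform ranges (low, high); real ranges are task-specific.
RANGES = {
    "payload_kg": (0.0, 10.0),
    "tire_friction": (0.6, 1.2),
    "target_speed_mps": (1.0, 4.0),
    "push_force_n": (0.0, 30.0),
}


def sample_episode_params(rng):
    """Draw one set of physical parameters for a training episode."""
    values = {name: rng.uniform(low, high)
              for name, (low, high) in RANGES.items()}
    return EpisodeParams(**values)


# Seeding the generator makes the sampled episodes reproducible.
params = sample_episode_params(random.Random(0))
```

Resampling these parameters at every episode reset exposes the policy to a family of slightly different bicycles rather than a single nominal model, which is the basic idea behind using randomization to narrow the sim-to-real gap.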