Autonomous bicycles offer a promising, agile solution for urban mobility and last-mile logistics. However, conventional control strategies often struggle with the underactuated, nonlinear dynamics of bicycles: they are sensitive to model mismatch and adapt poorly to real-world uncertainties. To address this, we develop CycleRL, a comprehensive sim-to-real framework for robust autonomous bicycle control. Our approach learns a direct perception-to-action mapping in the high-fidelity NVIDIA Isaac Sim environment, using Proximal Policy Optimization (PPO) to optimize the control policy. The framework uses a composite reward function designed to jointly encourage balance maintenance, velocity tracking, and steering control. Crucially, systematic domain randomization reduces reliance on precise system modeling, bridges the simulation-to-reality gap, and enables direct policy transfer. In simulation, CycleRL achieves strong performance, including a 99.90% balance success rate, a heading tracking error of 1.15°, and a velocity tracking error of 0.18 m/s. These quantitative results, together with successful hardware deployment, validate deep reinforcement learning (DRL) as an effective paradigm for autonomous bicycle control, offering greater adaptability than traditional methods. Video demonstrations are available at https://anony6f05.github.io/CycleRL/.
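The abstract names a composite reward covering balance, velocity tracking, and steering, plus systematic domain randomization, but does not give their exact form. The sketch below shows one common way such components are written, assuming a weighted sum of bounded exponential tracking terms (with steering expressed as heading tracking) and uniform per-episode sampling of physical parameters. Every name, weight, scale, and range in it (`composite_reward`, `RewardWeights`, `sample_randomized_params`, the 0.2 rad roll scale, the friction range, and so on) is hypothetical and not taken from CycleRL.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; CycleRL's actual values are not stated in the abstract.
    balance: float = 1.0
    velocity: float = 0.5
    heading: float = 0.5
    fall_penalty: float = 10.0


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians to [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def composite_reward(roll: float, speed: float, target_speed: float,
                     heading: float, target_heading: float, fallen: bool,
                     w: RewardWeights = RewardWeights()) -> float:
    """Assumed composite reward: weighted sum of balance, velocity and heading terms."""
    if fallen:
        # Terminal penalty when the bicycle tips over.
        return -w.fall_penalty
    # Balance: 1.0 when upright, decaying with roll angle (hypothetical 0.2 rad scale).
    r_balance = math.exp(-(roll / 0.2) ** 2)
    # Velocity tracking: 1.0 at the target speed, decaying with |error| in m/s.
    r_velocity = math.exp(-abs(target_speed - speed))
    # Heading tracking: wrapped error, so +179 deg vs -179 deg counts as 2 deg apart.
    r_heading = math.exp(-abs(wrap_angle(target_heading - heading)))
    return w.balance * r_balance + w.velocity * r_velocity + w.heading * r_heading


def sample_randomized_params(rng: random.Random) -> dict:
    """Draw one episode's physical parameters (hypothetical ranges)."""
    return {
        "mass_scale": rng.uniform(0.8, 1.2),        # e.g. payload variation
        "tire_friction": rng.uniform(0.5, 1.2),     # e.g. surface changes
        "steer_gain_scale": rng.uniform(0.9, 1.1),  # actuator mismatch
        "action_delay_steps": rng.randint(0, 2),    # control latency, inclusive range
    }


if __name__ == "__main__":
    rng = random.Random(0)
    params = sample_randomized_params(rng)
    r = composite_reward(roll=0.0, speed=3.0, target_speed=3.0,
                         heading=0.0, target_heading=0.0, fallen=False)
    print(params, r)  # r is 2.0 for perfect tracking with the default weights
```

Because each exponential term lies in (0, 1], the weights directly set the relative priority of staying upright versus tracking speed and heading. This is a design choice of the sketch, not a detail reported for CycleRL.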
CycleRL: a robust sim-to-real deep reinforcement learning framework for autonomous bicycle control. Our approach transfers the policy directly from high-fidelity simulation to physical hardware without explicit dynamic modeling. The video demonstrates validation across scenarios that challenge traditional control stability, ranging from physical variations (payload, abnormal tire pressure, and varying speed) to agile maneuvering on diverse terrains (asphalt road, gravel, uneven lawn, slopes, and speed bumps) and under lateral perturbations. It concludes with demonstrations of autonomous lane tracking, long-duration runs, and cross-platform validation, confirming the system's reliability and generalizability.