BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO

1Universitat Pompeu Fabra, 2ICREA, 3PyTorch Team, 4Meta

Abstract

We present BricksRL, a platform designed to democratize access to robotics for reinforcement learning research and education. BricksRL facilitates the creation, design, and training of custom LEGO robots in the real world by interfacing them with the TorchRL library for reinforcement learning agents. The integration of TorchRL with the LEGO hubs, via bidirectional Bluetooth communication, enables state-of-the-art reinforcement learning training on GPUs for a wide variety of LEGO builds. This offers a flexible and cost-efficient approach to scaling, and provides a robust infrastructure for robot-environment-algorithm communication. We report experiments across a range of tasks and robot configurations, providing build plans and training results. Furthermore, we demonstrate that inexpensive LEGO robots can be trained end-to-end in the real world to achieve simple tasks, with training times typically under 120 minutes on a standard laptop. Moreover, we show how users can extend the platform's capabilities, exemplified by the successful integration of non-LEGO sensors. By enhancing accessibility to both robotics and reinforcement learning, BricksRL establishes a strong foundation for democratized robotic learning in research and educational settings.
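To give a concrete sense of the workflow, here is a minimal sketch of driving a BricksRL-style environment from TorchRL. The bricksrl import path and the RoboArmEnv constructor are illustrative assumptions, not the platform's confirmed API; check_env_specs and rollout are real TorchRL functions.

```python
# Minimal sketch of the BricksRL workflow. The `bricksrl` package name and
# `RoboArmEnv` constructor are assumptions for illustration only.
from torchrl.envs.utils import check_env_specs

from bricksrl import RoboArmEnv  # hypothetical: a TorchRL EnvBase subclass

env = RoboArmEnv()                    # opens the Bluetooth link to the LEGO hub
check_env_specs(env)                  # sanity-check observation/action specs
rollout = env.rollout(max_steps=50)   # random-policy rollout on the real robot
print(rollout["next", "reward"].sum())  # total reward collected in the rollout
```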

RoboArm

The videos show the evaluation of two SAC agents: one trained entirely in the real-world environment, RoboArm-v0 (left), and another trained in the simulation environment, RoboArmSim-v0 (right). The four goal positions displayed at the top of each video, which the agent must reach to demonstrate task completion, were selected to evaluate the effectiveness of the trained policies.


RoboArm Mixed Observation


This video (left) presents a sequence of successive evaluation trials of the SAC agent in the RoboArm_mixed-v0 environment, which combines direct sensor readings of the robot arm's joint angles with image inputs. The objective is to move the red ball, held by the robotic arm, to a randomly sampled target position indicated by a green dot in the image. The plot (right) shows the training performance of the RoboArm robot in the RoboArm_mixed-v0 environment, reporting both the reward and the number of episode steps required to reach the target location.
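As a rough sketch of what such a mixed observation might look like, the snippet below describes an image-plus-joint-angles dictionary observation using Gymnasium spaces. The shapes, bounds, and key names are illustrative assumptions, not the environment's confirmed specification.

```python
import numpy as np
from gymnasium import spaces

# Hypothetical sketch of a mixed observation space in the spirit of
# RoboArm_mixed-v0: a camera image plus the arm's motor angles. Shapes,
# bounds, and key names are assumptions, not the environment's actual spec.
mixed_observation_space = spaces.Dict(
    {
        "image": spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8),
        "joint_angles": spaces.Box(low=-180.0, high=180.0, shape=(4,), dtype=np.float32),
    }
)

sample = mixed_observation_space.sample()  # e.g. for testing preprocessing code
print(sample["image"].shape, sample["joint_angles"])
```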

Walker

This example features a DroQ agent trained in the Walker-v0 environment, which is implemented directly in the real world (left), and a DroQ agent trained in the WalkerSim-v0 simulation (right). The comparison demonstrates a successful simulation-to-reality (sim2real) transfer of the policy; both agents are tasked with learning a forward-moving walking gait.
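The sim2real protocol itself is simple: train in simulation, freeze the actor, and evaluate the same network on the real robot. Below is a minimal sketch assuming hypothetical Gym-style wrappers WalkerSimEnv and WalkerEnv and placeholder observation/action dimensions; none of these names are BricksRL's confirmed API.

```python
import torch

# Hypothetical Gym-style wrappers for the simulated and real Walker; the
# import path and class names are placeholders for illustration.
from bricksrl import WalkerEnv, WalkerSimEnv


def mean_return(env, actor: torch.nn.Module, episodes: int = 5) -> float:
    """Run the frozen actor for a few episodes and average the returns."""
    totals = []
    for _ in range(episodes):
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            with torch.no_grad():
                action = actor(torch.as_tensor(obs, dtype=torch.float32))
            obs, reward, terminated, truncated, _ = env.step(action.numpy())
            total += float(reward)
            done = terminated or truncated
        totals.append(total)
    return sum(totals) / len(totals)


# Stand-in for a DroQ actor trained in WalkerSim-v0 (training loop omitted);
# the observation dim (8) and action dim (4) are assumptions.
actor = torch.nn.Sequential(
    torch.nn.Linear(8, 64), torch.nn.Tanh(), torch.nn.Linear(64, 4)
)

print("sim :", mean_return(WalkerSimEnv(), actor))
print("real:", mean_return(WalkerEnv(), actor))  # zero-shot sim2real transfer
```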


2Wheeler

These videos show the evaluation of a trained SAC agent in the RunAway-v0 environment (left) and a trained TD3 agent in the Spinning-v0 environment (right), each performing a simple yet distinct task. In RunAway-v0, the agent's objective is to maximize the distance measured by an ultrasonic sensor. In Spinning-v0, the agent's task is to turn the 2Wheeler left or right based on indicators provided to the agent.
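The RunAway-v0 objective lends itself to a simple shaping term: reward the per-step increase in the ultrasonic distance reading. A minimal sketch follows, assuming readings in millimetres; the exact reward used by BricksRL may differ.

```python
def runaway_reward(prev_distance_mm: float, distance_mm: float) -> float:
    """Hypothetical shaping for RunAway-v0: reward the per-step increase in
    the ultrasonic distance reading, converted from millimetres to metres.
    The reward actually used by BricksRL may differ."""
    return (distance_mm - prev_distance_mm) / 1000.0


# Example: the robot moved from 120 mm to 180 mm away from the obstacle.
print(runaway_reward(120.0, 180.0))  # 0.06 -> positive reward for moving away
```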
