pydrake.gym
Drake Gym
THIS FEATURE IS EXPERIMENTAL. As per our [guidelines](https://drake.mit.edu/stable.html) for experimental code, development is ongoing and no guarantees against deprecation are provided for any file under this directory.
Drake Gym is an implementation of Farama’s “Gymnasium” interface for reinforcement learning that uses a Drake Simulator as its backend. The Gym interface, provided by [the Python gymnasium module](https://pypi.org/project/gymnasium/), simply models a time-stepped process with an action space, a reward function, and some form of state observation.
Note that pydrake.gym is an optional component of pydrake and will only work when the gymnasium package is also installed. As such, DrakeGymEnv and the related code are not available for import as part of pydrake.all.
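For example, assuming gymnasium is installed, the environment class is imported directly from its module rather than via pydrake.all:

```python
import gymnasium as gym              # required for pydrake.gym to be usable
from pydrake.gym import DrakeGymEnv  # not re-exported by `from pydrake.all import *`
```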
A note on dependencies
In order to use a Gym, code must implement a gymnasium.Env (in this case the actual Simulator wrapped via DrakeGymEnv) and must run the Gym within an RL engine of some sort (that’s usually an algorithm chosen from [Stable Baselines 3](https://stable-baselines3.readthedocs.io/en/master/index.html) but could also be from nevergrad or some other source of gradient-free optimizers).
Stable Baselines 3 itself has too large and heavy a dependency tree for Drake to require it; as such, you will need to provide it yourself to use these examples. If you are going to train Drake Gym examples on your machine, you should install Stable Baselines 3 (for instance, pip install stable_baselines3 inside of a virtual environment). The training examples will not run without it, and Drake does not come with it (Drake does include a subset of stable_baselines3 for testing purposes, but not enough to perform training).
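A minimal sketch of checking for the optional dependency before running a training example (the virtual-environment setup shown in the comments is just one possible workflow):

```python
# One possible setup (run outside Python):
#   python3 -m venv rl_env && source rl_env/bin/activate
#   pip install drake stable_baselines3
import importlib.util

if importlib.util.find_spec("stable_baselines3") is None:
    raise RuntimeError("Install Stable Baselines 3 to run the training examples; "
                       "Drake does not ship it.")
```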
- class pydrake.gym.DrakeGymEnv(simulator: Union[pydrake.systems.analysis.Simulator, Callable[[pydrake.common.RandomGenerator], pydrake.systems.analysis.Simulator]], time_step: float, action_space: gymnasium.spaces.space.Space, observation_space: gymnasium.spaces.space.Space, reward: Union[Callable[[pydrake.systems.framework.System, pydrake.systems.framework.Context], float], pydrake.systems.framework.OutputPortIndex, str], action_port_id: Optional[Union[pydrake.systems.framework.InputPort, pydrake.systems.framework.InputPortIndex, str]] = None, observation_port_id: Optional[Union[pydrake.systems.framework.OutputPortIndex, str]] = None, render_rgb_port_id: Optional[Union[pydrake.systems.framework.OutputPortIndex, str]] = None, render_mode: str = 'human', reset_handler: Optional[Callable[[pydrake.systems.analysis.Simulator, pydrake.systems.framework.Context], None]] = None, info_handler: Optional[Callable[[pydrake.systems.analysis.Simulator], dict]] = None, hardware: bool = False)
Bases: gymnasium.core.Env
DrakeGymEnv provides a gym.Env interface for a Drake System (often a Diagram) using a Simulator.
- __init__(simulator: Union[pydrake.systems.analysis.Simulator, Callable[[pydrake.common.RandomGenerator], pydrake.systems.analysis.Simulator]], time_step: float, action_space: gymnasium.spaces.space.Space, observation_space: gymnasium.spaces.space.Space, reward: Union[Callable[[pydrake.systems.framework.System, pydrake.systems.framework.Context], float], pydrake.systems.framework.OutputPortIndex, str], action_port_id: Optional[Union[pydrake.systems.framework.InputPort, pydrake.systems.framework.InputPortIndex, str]] = None, observation_port_id: Optional[Union[pydrake.systems.framework.OutputPortIndex, str]] = None, render_rgb_port_id: Optional[Union[pydrake.systems.framework.OutputPortIndex, str]] = None, render_mode: str = 'human', reset_handler: Optional[Callable[[pydrake.systems.analysis.Simulator, pydrake.systems.framework.Context], None]] = None, info_handler: Optional[Callable[[pydrake.systems.analysis.Simulator], dict]] = None, hardware: bool = False)
- Parameters
  - simulator – Either a drake.systems.analysis.Simulator, or a function that produces a (randomized) Simulator.
  - time_step – Each call to step() will advance the simulator by time_step seconds.
  - action_space – Defines the gym.spaces.Space for the actions. If the action port is vector-valued, then passing None defaults to a gym.spaces.Box of the correct dimension with bounds at negative and positive infinity. Note: Stable Baselines 3 strongly encourages normalizing the action_space to [-1, 1].
  - observation_space – Defines the gym.spaces.Space for the observations. If the observation port is vector-valued, then passing None defaults to a gym.spaces.Box of the correct dimension with bounds at negative and positive infinity.
  - reward – The reward can be specified in one of two ways: (1) by passing a callable with the signature value = reward(system, context), or (2) by passing a scalar (single-element) vector-valued output port of simulator’s system.
  - action_port_id – The ID of an input port of simulator’s system compatible with the action_space. Each Env must have an action port; passing None defaults to using the first input port (inspired by InputPortSelection.kUseFirstInputIfItExists).
  - observation_port_id – An output port of simulator’s system compatible with the observation_space. Each Env must have an observation port (it seems that gym doesn’t support empty observation spaces / open-loop policies); passing None defaults to using the first output port (inspired by OutputPortSelection.kUseFirstOutputIfItExists).
  - render_rgb_port_id – An optional output port of simulator’s system that returns an ImageRgba8U; often the color_image port of a Drake RgbdSensor.
  - render_mode – The render mode of the environment, determined at initialization. Defaults to human, which uses visualizers inside the System (e.g. MeshcatVisualizer, PlanarSceneGraphVisualizer, etc.). render_mode equal to rgb_array evaluates the render_rgb_port, and ansi calls __repr__ on the system Context.
  - reset_handler – A function that sets the home state (plant and/or env.) at reset(). The reset state can be specified in one of two ways: if reset_handler is None, by setting a random context using a Drake random_generator (e.g. joint.set_random_pose_distribution() using the reset() seed); otherwise, by calling reset_handler().
  - info_handler – A function that returns a dict[str, Any] containing auxiliary diagnostic information (helpful for debugging, learning, and logging). Note: if step() terminates with a RuntimeError, then, to avoid unexpected behavior, info_handler() will not be called and an empty info will be returned instead.
  - hardware – If True, prevents setting a random context at reset() when using random_generator, but still executes reset_handler() if given.
Notes (using env as an instance of this class):

- You may set simulator/integrator preferences by using env.simulator directly.
- The done condition returned by step() is always False by default. Use env.simulator.set_monitor() to use Drake’s monitor functionality for specifying termination conditions.
- You may additionally wish to directly set env.reward_range and/or env.spec. See the docs for gym.Env for more details.
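As a concrete illustration, here is a minimal, hedged construction sketch. It wraps a PendulumPlant in a Diagram; the exported port names ("actions", "observations"), the reward, and the termination condition are arbitrary placeholders chosen for illustration rather than a tuned example:

```python
import numpy as np
import gymnasium as gym

from pydrake.examples import PendulumPlant
from pydrake.gym import DrakeGymEnv
from pydrake.systems.analysis import Simulator
from pydrake.systems.framework import DiagramBuilder, EventStatus

# Build a Diagram that exposes an input port for actions and an output port
# for observations, then wrap it in a Simulator.
builder = DiagramBuilder()
plant = builder.AddSystem(PendulumPlant())
builder.ExportInput(plant.get_input_port(0), "actions")
builder.ExportOutput(plant.get_state_output_port(), "observations")
diagram = builder.Build()
simulator = Simulator(diagram)


def reward_fn(system, context):
    # Any scalar function of the system and its Context can serve as the
    # reward; this quadratic state penalty is only a placeholder.
    plant_context = plant.GetMyContextFromRoot(context)
    state = plant_context.get_continuous_state_vector().CopyToVector()
    return -float(np.square(state).sum())


env = DrakeGymEnv(
    simulator=simulator,
    time_step=0.01,
    # Stable Baselines 3 strongly encourages normalizing the action space.
    action_space=gym.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float64),
    observation_space=gym.spaces.Box(low=-np.inf, high=np.inf, shape=(2,),
                                     dtype=np.float64),
    reward=reward_fn,
    action_port_id="actions",
    observation_port_id="observations",
    render_mode="human",
)

# step() never reports `done` on its own; use Drake's monitor mechanism to
# terminate episodes (here, an arbitrary angle limit).
env.simulator.set_monitor(
    lambda context: EventStatus.ReachedTermination(diagram, "angle limit")
    if abs(context.get_continuous_state_vector().CopyToVector()[0]) > 2.0 * np.pi
    else EventStatus.DidNothing())
```

If a simulator factory is passed instead of a concrete Simulator, per-simulator settings such as the monitor are better applied inside the factory, since reset() may construct a fresh Simulator.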
- render()
Rendering in human mode is accomplished by calling ForcedPublish on system. This should cause visualizers inside the System (e.g. MeshcatVisualizer, PlanarSceneGraphVisualizer, etc.) to draw their outputs. To be fully compliant, those visualizers should set their default publishing period to np.inf (do not publish periodically).

Rendering in ascii mode calls __repr__ on the system Context.

Rendering in rgb_array mode is enabled by passing a compatible render_rgb_port to the class constructor.
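For example (a hedged sketch continuing the construction above, and assuming the Diagram additionally exports an ImageRgba8U-valued port named "color_image", e.g. the color_image port of an RgbdSensor):

```python
env = DrakeGymEnv(
    simulator=simulator,
    time_step=0.01,
    action_space=gym.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float64),
    observation_space=gym.spaces.Box(low=-np.inf, high=np.inf, shape=(2,),
                                     dtype=np.float64),
    reward=reward_fn,
    action_port_id="actions",
    observation_port_id="observations",
    render_rgb_port_id="color_image",  # assumed exported ImageRgba8U port
    render_mode="rgb_array",
)
env.reset()
frame = env.render()  # evaluates the render_rgb_port and returns an image frame
```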
- reset(*, seed: Optional[int] = None, options: Optional[dict] = None)
If a callable “simulator factory” was passed to the constructor, then a new simulator is created. Otherwise this method simply resets the simulator and its Context.
- step(action)
Implements gym.Env.step to advance the simulation forward by one self.time_step.
- Parameters
  - action – An element from self.action_space.
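Putting the pieces together, a standard Gymnasium rollout loop (a hedged sketch; env is the DrakeGymEnv constructed above, and the random actions stand in for a trained policy):

```python
obs, info = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()  # replace with a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    env.render()  # in "human" mode this force-publishes to the visualizers
    if terminated or truncated:
        obs, info = env.reset()
```

With Stable Baselines 3 installed, the same env can also be handed to an algorithm, e.g. PPO("MlpPolicy", env).learn(total_timesteps=...).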