pydrake.gym
Drake Gym
THIS FEATURE IS EXPERIMENTAL. As per our [guidelines](https://drake.mit.edu/stable.html) for experimental code, development is ongoing and no guarantees against deprecation are provided for any file under this directory.
Drake Gym is an implementation of Farama’s “Gymnasium” interface for reinforcement learning that uses a Drake Simulator as its backend. The Gym interface, provided by [the Python gymnasium module](https://pypi.org/project/gymnasium/), simply models a time-stepped process with an action space, a reward function, and some form of state observation.
Note that pydrake.gym is an optional component of pydrake and will only work when the gymnasium package is also installed. As such, DrakeGymEnv and the related code are not available for import as part of pydrake.all.
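For example, assuming gymnasium is installed, the environment class is imported directly from its module rather than via pydrake.all:

```python
import gymnasium as gym              # required for pydrake.gym to be usable
from pydrake.gym import DrakeGymEnv  # not re-exported by `from pydrake.all import *`
```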
A note on dependencies
In order to use a Gym, code must implement a gymnasium.Env (in this case the actual Simulator wrapped via DrakeGymEnv) and must run the Gym within an RL engine of some sort (that’s usually an algorithm chosen from [Stable Baselines 3](https://stable-baselines3.readthedocs.io/en/master/index.html) but could also be from nevergrad or some other source of gradient-free optimizers).
Stable Baselines 3 itself has too large and heavy a dependency tree for Drake to require it; as such, you will need to provide it yourself to use these examples. If you are going to train Drake Gym examples on your machine, you should install Stable Baselines 3 (for instance, pip install stable_baselines3 inside of a virtual environment). The training examples will not run without it, and Drake does not come with it (Drake does include a subset of stable_baselines3 for testing purposes, but not enough to perform training).
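A minimal sketch of checking for the optional dependency before running a training example (the virtual-environment setup shown in the comments is just one possible workflow):

```python
# One possible setup (run outside Python):
#   python3 -m venv rl_env && source rl_env/bin/activate
#   pip install drake stable_baselines3
import importlib.util

if importlib.util.find_spec("stable_baselines3") is None:
    raise RuntimeError("Install Stable Baselines 3 to run the training examples; "
                       "Drake does not ship it.")
```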
- class pydrake.gym.DrakeGymEnv(simulator: Union[pydrake.systems.analysis.Simulator, Callable[[pydrake.common.RandomGenerator], pydrake.systems.analysis.Simulator]], time_step: float, action_space: gymnasium.spaces.space.Space, observation_space: gymnasium.spaces.space.Space, reward: Union[Callable[[pydrake.systems.framework.System, pydrake.systems.framework.Context], float], pydrake.systems.framework.OutputPortIndex, str], action_port_id: Optional[Union[pydrake.systems.framework.InputPort, pydrake.systems.framework.InputPortIndex, str]] = None, observation_port_id: Optional[Union[pydrake.systems.framework.OutputPortIndex, str]] = None, render_rgb_port_id: Optional[Union[pydrake.systems.framework.OutputPortIndex, str]] = None, render_mode: str = 'human', reset_handler: Optional[Callable[[pydrake.systems.analysis.Simulator, pydrake.systems.framework.Context], None]] = None, info_handler: Optional[Callable[[pydrake.systems.analysis.Simulator], dict]] = None, hardware: bool = False)
Bases: gymnasium.core.Env
DrakeGymEnv provides a gym.Env interface for a Drake System (often a Diagram) using a Simulator.
- __init__(simulator: Union[pydrake.systems.analysis.Simulator, Callable[[pydrake.common.RandomGenerator], pydrake.systems.analysis.Simulator]], time_step: float, action_space: gymnasium.spaces.space.Space, observation_space: gymnasium.spaces.space.Space, reward: Union[Callable[[pydrake.systems.framework.System, pydrake.systems.framework.Context], float], pydrake.systems.framework.OutputPortIndex, str], action_port_id: Optional[Union[pydrake.systems.framework.InputPort, pydrake.systems.framework.InputPortIndex, str]] = None, observation_port_id: Optional[Union[pydrake.systems.framework.OutputPortIndex, str]] = None, render_rgb_port_id: Optional[Union[pydrake.systems.framework.OutputPortIndex, str]] = None, render_mode: str = 'human', reset_handler: Optional[Callable[[pydrake.systems.analysis.Simulator, pydrake.systems.framework.Context], None]] = None, info_handler: Optional[Callable[[pydrake.systems.analysis.Simulator], dict]] = None, hardware: bool = False)
- Parameters
  - simulator – Either a drake.systems.analysis.Simulator, or a function that produces a (randomized) Simulator.
  - time_step – Each call to step() will advance the simulator by time_step seconds.
  - action_space – Defines the gym.spaces.Space for the actions. If the action port is vector-valued, then passing None defaults to a gym.spaces.Box of the correct dimension with bounds at negative and positive infinity. Note: Stable Baselines 3 strongly encourages normalizing the action_space to [-1, 1].
  - observation_space – Defines the gym.spaces.Space for the observations. If the observation port is vector-valued, then passing None defaults to a gym.spaces.Box of the correct dimension with bounds at negative and positive infinity.
  - reward – The reward can be specified in one of two ways: (1) by passing a callable with the signature value = reward(system, context), or (2) by passing a scalar (single-element) vector-valued output port of simulator’s system.
  - action_port_id – The ID of an input port of simulator’s system compatible with the action_space. Each Env must have an action port; passing None defaults to using the first input port (inspired by InputPortSelection.kUseFirstInputIfItExists).
  - observation_port_id – An output port of simulator’s system compatible with the observation_space. Each Env must have an observation port (it seems that gym doesn’t support empty observation spaces / open-loop policies); passing None defaults to using the first output port (inspired by OutputPortSelection.kUseFirstOutputIfItExists).
  - render_rgb_port_id – An optional output port of simulator’s system that returns an ImageRgba8U; often the color_image port of a Drake RgbdSensor.
  - render_mode – The render mode of the environment, determined at initialization. Defaults to human, which uses visualizers inside the System (e.g. MeshcatVisualizer, PlanarSceneGraphVisualizer, etc.). render_mode equal to rgb_array evaluates the render_rgb_port, and ansi calls __repr__ on the system Context.
  - reset_handler – A function that sets the home state (plant and/or env.) at reset(). The reset state can be specified in one of two ways: if reset_handler is None, by setting a random context using a Drake random_generator (e.g. joint.set_random_pose_distribution() using the reset() seed); otherwise, by calling reset_handler().
  - info_handler – A function that returns a dict[str, Any] containing auxiliary diagnostic information (helpful for debugging, learning, and logging). Note: if step() terminates with a RuntimeError, then, to avoid unexpected behavior, info_handler() will not be called and an empty info will be returned instead.
  - hardware – If True, prevents setting a random context at reset() when using random_generator, but still executes reset_handler() if given.
Notes (using env as an instance of this class):

- You may set simulator/integrator preferences by using env.simulator directly.
- The done condition returned by step() is always False by default. Use env.simulator.set_monitor() to use Drake’s monitor functionality for specifying termination conditions.
- You may additionally wish to directly set env.reward_range and/or env.spec. See the docs for gym.Env for more details.
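As a concrete illustration, here is a minimal, hedged construction sketch. It wraps a PendulumPlant in a Diagram; the exported port names ("actions", "observations"), the reward, and the termination condition are arbitrary placeholders chosen for illustration rather than a tuned example:

```python
import numpy as np
import gymnasium as gym

from pydrake.examples import PendulumPlant
from pydrake.gym import DrakeGymEnv
from pydrake.systems.analysis import Simulator
from pydrake.systems.framework import DiagramBuilder, EventStatus

# Build a Diagram that exposes an input port for actions and an output port
# for observations, then wrap it in a Simulator.
builder = DiagramBuilder()
plant = builder.AddSystem(PendulumPlant())
builder.ExportInput(plant.get_input_port(0), "actions")
builder.ExportOutput(plant.get_state_output_port(), "observations")
diagram = builder.Build()
simulator = Simulator(diagram)


def reward_fn(system, context):
    # Any scalar function of the system and its Context can serve as the
    # reward; this quadratic state penalty is only a placeholder.
    plant_context = plant.GetMyContextFromRoot(context)
    state = plant_context.get_continuous_state_vector().CopyToVector()
    return -float(np.square(state).sum())


env = DrakeGymEnv(
    simulator=simulator,
    time_step=0.01,
    # Stable Baselines 3 strongly encourages normalizing the action space.
    action_space=gym.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float64),
    observation_space=gym.spaces.Box(low=-np.inf, high=np.inf, shape=(2,),
                                     dtype=np.float64),
    reward=reward_fn,
    action_port_id="actions",
    observation_port_id="observations",
    render_mode="human",
)

# step() never reports `done` on its own; use Drake's monitor mechanism to
# terminate episodes (here, an arbitrary angle limit).
env.simulator.set_monitor(
    lambda context: EventStatus.ReachedTermination(diagram, "angle limit")
    if abs(context.get_continuous_state_vector().CopyToVector()[0]) > 2.0 * np.pi
    else EventStatus.DidNothing())
```

If a simulator factory is passed instead of a concrete Simulator, per-simulator settings such as the monitor are better applied inside the factory, since reset() may construct a fresh Simulator.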
- render()
Rendering in human mode is accomplished by calling ForcedPublish on system. This should cause visualizers inside the System (e.g. MeshcatVisualizer, PlanarSceneGraphVisualizer, etc.) to draw their outputs. To be fully compliant, those visualizers should set their default publishing period to np.inf (do not publish periodically).

Rendering in ascii mode calls __repr__ on the system Context.

Rendering in rgb_array mode is enabled by passing a compatible render_rgb_port to the class constructor.
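For example (a hedged sketch continuing the construction above, and assuming the Diagram additionally exports an ImageRgba8U-valued port named "color_image", e.g. the color_image port of an RgbdSensor):

```python
env = DrakeGymEnv(
    simulator=simulator,
    time_step=0.01,
    action_space=gym.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float64),
    observation_space=gym.spaces.Box(low=-np.inf, high=np.inf, shape=(2,),
                                     dtype=np.float64),
    reward=reward_fn,
    action_port_id="actions",
    observation_port_id="observations",
    render_rgb_port_id="color_image",  # assumed exported ImageRgba8U port
    render_mode="rgb_array",
)
env.reset()
frame = env.render()  # evaluates the render_rgb_port and returns an image frame
```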
- reset(*, seed: Optional[int] = None, options: Optional[dict] = None)
If a callable “simulator factory” was passed to the constructor, then a new simulator is created. Otherwise this method simply resets the simulator and its Context.
- step(action)
Implements gym.Env.step to advance the simulation forward by one self.time_step.
- Parameters
  - action – An element from self.action_space.
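Putting the pieces together, a standard Gymnasium rollout loop (a hedged sketch; env is the DrakeGymEnv constructed above, and the random actions stand in for a trained policy):

```python
obs, info = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()  # replace with a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    env.render()  # in "human" mode this force-publishes to the visualizers
    if terminated or truncated:
        obs, info = env.reset()
```

With Stable Baselines 3 installed, the same env can also be handed to an algorithm, e.g. PPO("MlpPolicy", env).learn(total_timesteps=...).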