AMaze

November 22, 2023

🧭 Partially-observable navigation in procedural mazes.

Maze Overview

The AMaze environment reproduces the MiniGrid-based, partially-observable maze navigation environments featured in previous works. Specifically, AMaze provides feature parity with the previous reference implementation of the maze environment in facebookresearch/dcd.

Student environment

View source: envs/maze/maze.py

Static EnvParams

The table below summarizes the configurable static environment parameters of AMaze and marks those that minimax.train accepts by default. The corresponding command-line argument is the name of the parameter, preceded by the prefix maze_, e.g. maze_n_walls for specifying n_walls.

Similarly, evaluation parameters can be specified via the prefix maze_eval_, e.g. maze_eval_see_agent for specifying see_agent. Currently, minimax.train only accepts maze_eval_see_agent and maze_eval_normalize_obs.
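
The naming rule above can be sketched as a small helper. The function cli_arg_name is hypothetical and not part of minimax; it only illustrates how a parameter name maps to its argument name under the stated prefixes.

```python
def cli_arg_name(param: str, prefix: str = "maze") -> str:
    """Join a prefix and an EnvParams field name with an underscore.

    Hypothetical helper for illustration only; minimax does not expose it.
    """
    return f"{prefix}_{param}"


# Training-time argument name for the n_walls parameter.
train_arg = cli_arg_name("n_walls")
# Evaluation-time argument name for the see_agent parameter.
eval_arg = cli_arg_name("see_agent", prefix="maze_eval")
```

Here train_arg is "maze_n_walls" and eval_arg is "maze_eval_see_agent", matching the examples in the text.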

Note that AMaze treats height and width as parameterizing only the portion of the maze grid that can vary, and thus excludes the 1-tile wall border surrounding each maze instance. Thus, a 15x15 maze in the prior MiniGrid-based implementation corresponds to an AMaze parameterization with height=13 and width=13.
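
As a quick check of the size convention, the conversion from a MiniGrid-style side length (border included) to the AMaze value can be written as follows. The function name minigrid_to_amaze is an assumption made for this sketch.

```python
def minigrid_to_amaze(size_with_border: int, border: int = 1) -> int:
    """Strip the wall border from both sides of a MiniGrid side length.

    Hypothetical helper: AMaze height/width count only the tiles that can vary.
    """
    inner = size_with_border - 2 * border
    if inner <= 0:
        raise ValueError("grid is too small to contain any free tiles")
    return inner


# A 15x15 MiniGrid maze maps to these AMaze parameters.
height = minigrid_to_amaze(15)
width = minigrid_to_amaze(15)
```

Both height and width evaluate to 13, as stated above.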

| Parameter | Description | Command-line support |
| --- | --- | --- |
| height | Height of maze | ✅ |
| width | Width of maze | ✅ |
| n_walls | Number of walls to place per maze | ✅ |
| agent_view_size | Size of forward-facing partial observation seen by agent | ✅ |
| replace_wall_pos | Wall positions are sampled with replacement if True | ✅ |
| see_agent | Agent sees itself in its partial observation if True | ✅ |
| normalize_obs | Scale observation values to [0,1] if True | ✅ |
| sample_n_walls | Sample # walls placed between [0, n_walls] if True | ✅ |
| obs_agent_pos | Include agent_pos in the partial observation | ✅ |
| max_episode_steps | Maximum # steps per episode | ✅ |
| singleton_seed | Fix the random seed to this value, making the environment a singleton | |
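
To make the interaction of n_walls, replace_wall_pos, and sample_n_walls concrete, here is a minimal sketch of wall sampling in plain Python. It is not the AMaze implementation (which is written in JAX and also places the goal and agent); the function names, the row-major flattening, and the use of the standard random module are all assumptions for illustration.

```python
import random


def sample_wall_map(height, width, n_walls, replace_wall_pos=False,
                    sample_n_walls=False, seed=0):
    """Return a height x width grid of bools, True where a wall is placed."""
    rng = random.Random(seed)
    n_cells = height * width
    if sample_n_walls:
        # Draw the wall count uniformly from [0, n_walls], both ends included.
        n_walls = rng.randint(0, n_walls)
    if replace_wall_pos:
        # With replacement: repeated positions collapse into a single wall.
        idxs = [rng.randrange(n_cells) for _ in range(n_walls)]
    else:
        # Without replacement: every sampled position is distinct.
        idxs = rng.sample(range(n_cells), min(n_walls, n_cells))
    wall_map = [[False] * width for _ in range(height)]
    for idx in idxs:
        y, x = divmod(idx, width)  # assumed row-major flattening
        wall_map[y][x] = True
    return wall_map


def count_walls(wall_map):
    """Count the True entries of a wall map."""
    return sum(sum(row) for row in wall_map)
```

With replace_wall_pos=False the resulting map contains exactly n_walls walls; with replace_wall_pos=True it may contain fewer, since duplicate draws land on the same tile.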

State space

| Variable | Description |
| --- | --- |
| agent_pos | Agent's (x,y) position |
| agent_dir | Agent's orientation vector |
| agent_dir_idx | Agent's orientation enum |
| goal_pos | Goal (x,y) position |
| wall_map | H x W bool tensor, True in wall positions |
| maze_map | Full maze map with all objects for rendering |
| time | Time step |
| terminal | True iff episode is done |

Observation space

| Variable | Description |
| --- | --- |
| image | Partial observation seen by agent |
| agent_dir | Agent's orientation enum |
| agent_pos | Agent's (x,y) position (not included by default) |

Action space

| Action index | Description |
| --- | --- |
| 0 | Left |
| 1 | Right |
| 2 | Forward |
| 3 | Pick up |
| 4 | Drop |
| 5 | Toggle |
| 6 | Done |

Note that the navigation environments only use actions 0 through 2; however, all actions are included for parity with the original MiniGrid-based environments.
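
The action indices can be mirrored in client code with an integer enum. The class name Actions and the helper is_navigation_action are assumptions for this sketch; AMaze itself may represent actions differently.

```python
from enum import IntEnum


class Actions(IntEnum):
    """Action indices as listed in the action space table (name is hypothetical)."""
    LEFT = 0
    RIGHT = 1
    FORWARD = 2
    PICK_UP = 3
    DROP = 4
    TOGGLE = 5
    DONE = 6


# Only the first three actions are used by the navigation environments.
NAVIGATION_ACTIONS = (Actions.LEFT, Actions.RIGHT, Actions.FORWARD)


def is_navigation_action(action: int) -> bool:
    """Return True if the index is one of the actions used for navigation."""
    return action in NAVIGATION_ACTIONS
```

Because IntEnum members compare equal to plain integers, is_navigation_action accepts either an Actions member or a raw index.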

Teacher environment

View source: envs/maze/maze_ued.py

To support autocurricula generated by a co-adapting teacher policy (e.g. PAIRED), AMaze includes UEDMaze, which implements the teacher's MDP for designing Maze instances. By design, a pair of Maze and UEDMaze objects (corresponding to a specific setting of EnvParams) can be wrapped into a UEDEnvironment object for use in a training runner (see PAIREDRunner for an example).

The parameters that can be provided via minimax.train by default are denoted in the table below. Their corresponding command-line argument is the name of the parameter, preceded by the prefix maze_ued_, e.g. maze_ued_n_walls for specifying n_walls. Note that when the corresponding maze_* and maze_ued_* arguments conflict, those specified in maze_* take precedence.
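
The precedence rule can be illustrated with plain dictionaries. This is only a sketch of the stated behavior, not the argument-parsing code in minimax; resolve_ued_params and the example values are hypothetical.

```python
def resolve_ued_params(maze_args: dict, maze_ued_args: dict) -> dict:
    """Resolve teacher parameters, letting maze_* values win on shared names.

    Keys are parameter names with the maze_ / maze_ued_ prefix already removed.
    """
    resolved = dict(maze_ued_args)
    for name, value in maze_args.items():
        if name in resolved:
            # Conflict: the student-side (maze_*) setting takes precedence.
            resolved[name] = value
    return resolved


# Illustrative values only.
maze_args = {"height": 13, "width": 13, "n_walls": 25}
maze_ued_args = {"height": 15, "n_walls": 60, "noise_dim": 10}
resolved = resolve_ued_params(maze_args, maze_ued_args)
```

In this example the resolved teacher parameters use height=13 and n_walls=25 from the maze_* side, while noise_dim, which has no maze_* counterpart, keeps its maze_ued_* value.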

Static EnvParams

| Variable | Description | Command-line support |
| --- | --- | --- |
| height | Height of maze | ✅ |
| width | Width of maze | ✅ |
| n_walls | Wall budget | ✅ |
| noise_dim | Size of noise vector in the observation | ✅ |
| replace_wall_pos | If True, placing an object over an existing wall replaces it. Otherwise, the object is placed in a random unused position. | ✅ |
| fixed_n_wall_steps | First n_walls actions are wall positions if True. Otherwise, the first action only determines the fraction of wall budget to use. | ✅ |
| first_wall_pos_sets_budget | First wall position also determines the fraction of wall budget to use (rather than using a separate first action to determine this fraction) | ✅ |
| set_agent_dir | If True, the action in an extra last time step determines the agent's initial orientation index | ✅ |
| normalize_obs | If True, scale observation values to [0,1] | ✅ |

State space

| Variable | Description |
| --- | --- |
| encoding | A 1D vector encoding the running action sequence of the teacher |
| time | Current time step |
| terminal | True if the episode is done |

Observation space

| Variable | Description |
| --- | --- |
| image | Full maze_map of the maze instance under construction |
| time | Time step |
| noise | A noise vector sampled from Uniform(0,1) |

Action space

The action space corresponds to integers in [0,height*width]. Each action corresponds to a selected wall location in the flattened maze grid, with the exception of the last two actions, which correspond to the goal position and the agent's starting position. This interpretation of the action sequence can change based on the specific configuration of EnvParams:

  • If params.fixed_n_wall_steps=False, the first action determines the fraction of the wall budget to use in the current episode, rather than a wall position.

  • If params.set_agent_dir=True, an additional step is appended to the episode, where the action corresponds to the agent's initial orientation index.
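
The default interpretation of the teacher's action sequence can be sketched as a small decoder. It covers only the case in which every action before the last two is a wall position, plus the optional orientation step; it does not model the wall-budget step or how occupied tiles are handled. The function name, the row-major flattening, and the returned dictionary layout are assumptions, not the UEDMaze API.

```python
def decode_teacher_actions(actions, width, set_agent_dir=False):
    """Split a finished teacher action sequence into maze components.

    Assumes flat indices are row-major, i.e. index = y * width + x.
    """
    actions = list(actions)
    # With set_agent_dir, the extra last step is the initial orientation index.
    agent_dir_idx = actions.pop() if set_agent_dir else None
    if len(actions) < 2:
        raise ValueError("need at least a goal action and an agent action")
    # The last two remaining actions are the goal and the agent start position.
    *wall_idxs, goal_idx, agent_idx = actions

    def to_xy(idx):
        y, x = divmod(idx, width)
        return (x, y)

    return {
        "wall_pos": [to_xy(i) for i in wall_idxs],
        "goal_pos": to_xy(goal_idx),
        "agent_pos": to_xy(agent_idx),
        "agent_dir_idx": agent_dir_idx,
    }


# Three wall placements, then goal, agent start, and orientation (width=13).
decoded = decode_teacher_actions([0, 14, 27, 168, 5, 3], width=13,
                                 set_agent_dir=True)
```

For this sequence the walls land at (0, 0), (1, 1), and (1, 2), the goal at (12, 12), and the agent at (5, 0) with orientation index 3.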

OOD test environments

The AMaze module includes the set of human-designed, out-of-distribution (OOD) environments used in previous studies to test zero-shot transfer (see the figure above for a summary of these environments). Several of these environments are procedurally generated:

  • Maze-SmallCorridor
  • Maze-LargeCorridor
  • Maze-FourRooms
  • Maze-Crossing
  • Maze-PerfectMaze*

The OOD maze environments are defined in envs/maze/maze_ood.py. They each subclass Maze and support customization via the EnvParams configuration, e.g. changing the default height or width values to generate larger or smaller instances.