Tasks

October 27, 2021

The available values for level_name are as follows:

'architecture_comparison/fast_map_three_objs'
'num_generalization/fast_map_eight_objs'
'num_generalization/fast_map_five_objs'
'num_generalization/fast_map_three_objs'
'new_obj_generalization/fast_map_three_objs_global_five'
'new_obj_generalization/fast_map_three_objs_global_ten'
'new_obj_generalization/fast_map_three_objs_global_three'
'new_obj_generalization/fast_map_three_objs_global_twenty'
'new_obj_generalization/fast_map_heldout_test_objs'
'intrinsic_motivation/fast_map_three_objs_no_shaping_reward'
'fast_slow/fast_map_three_objs'
'fast_slow/fast_map_three_objs_bed_tray'
'fast_slow/fast_map_three_objs_bed_tray_putting_near'
'fast_slow/fast_map_three_objs_bed_tray_putting_on'
'fast_slow/slow_learn_three_objs_bed_tray_lifting'
'fast_slow/slow_learn_three_objs_bed_tray_putting_near'
'fast_slow/slow_learn_three_objs_bed_tray_putting_on'
'fast_slow/two_phase_slow_learn_three_objs_bed_tray_putting_near'
'fast_slow/two_phase_slow_learn_three_objs_bed_tray_putting_on'
'fast_slow/test_holdout_fast_map_three_objs_bed_tray_putting_on'
'with_distractors/eval_fast_map_two_episodes_three_objs_five_distractor'
'with_distractors/eval_fast_map_three_episodes_three_objs_five_distractor'
'with_distractors/eval_fast_map_four_episodes_three_objs_no_distractor'
'with_distractors/eval_fast_map_four_episodes_three_objs_one_distractor'
'with_distractors/eval_fast_map_three_objs_ten_distractor'
'with_distractors/eval_fast_map_three_objs_twenty_distractor'
'with_distractors/fast_map_three_objs_no_distractor'
'with_distractors/fast_map_three_objs_one_distractor'
'with_distractors/fast_map_three_objs_two_distractor'
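
Since level_name is passed as a plain string, it can help to check it against the list above before loading a level. The following is a minimal sketch of such a check. AVAILABLE_LEVELS is copied from the list above; check_level_name is a hypothetical helper written for illustration and is not part of dm_fast_mapping.

```python
import difflib

# Copied from the list of available level_name values above.
AVAILABLE_LEVELS = (
    'architecture_comparison/fast_map_three_objs',
    'num_generalization/fast_map_eight_objs',
    'num_generalization/fast_map_five_objs',
    'num_generalization/fast_map_three_objs',
    'new_obj_generalization/fast_map_three_objs_global_five',
    'new_obj_generalization/fast_map_three_objs_global_ten',
    'new_obj_generalization/fast_map_three_objs_global_three',
    'new_obj_generalization/fast_map_three_objs_global_twenty',
    'new_obj_generalization/fast_map_heldout_test_objs',
    'intrinsic_motivation/fast_map_three_objs_no_shaping_reward',
    'fast_slow/fast_map_three_objs',
    'fast_slow/fast_map_three_objs_bed_tray',
    'fast_slow/fast_map_three_objs_bed_tray_putting_near',
    'fast_slow/fast_map_three_objs_bed_tray_putting_on',
    'fast_slow/slow_learn_three_objs_bed_tray_lifting',
    'fast_slow/slow_learn_three_objs_bed_tray_putting_near',
    'fast_slow/slow_learn_three_objs_bed_tray_putting_on',
    'fast_slow/two_phase_slow_learn_three_objs_bed_tray_putting_near',
    'fast_slow/two_phase_slow_learn_three_objs_bed_tray_putting_on',
    'fast_slow/test_holdout_fast_map_three_objs_bed_tray_putting_on',
    'with_distractors/eval_fast_map_two_episodes_three_objs_five_distractor',
    'with_distractors/eval_fast_map_three_episodes_three_objs_five_distractor',
    'with_distractors/eval_fast_map_four_episodes_three_objs_no_distractor',
    'with_distractors/eval_fast_map_four_episodes_three_objs_one_distractor',
    'with_distractors/eval_fast_map_three_objs_ten_distractor',
    'with_distractors/eval_fast_map_three_objs_twenty_distractor',
    'with_distractors/fast_map_three_objs_no_distractor',
    'with_distractors/fast_map_three_objs_one_distractor',
    'with_distractors/fast_map_three_objs_two_distractor',
)


def check_level_name(level_name):
    """Returns level_name if it is listed, otherwise raises ValueError."""
    if level_name in AVAILABLE_LEVELS:
        return level_name
    # Suggest the closest listed names to help spot typos.
    suggestions = difflib.get_close_matches(level_name, AVAILABLE_LEVELS, n=3)
    raise ValueError(
        f'Unknown level_name {level_name!r}; closest matches: {suggestions}')
```

A checked name can then be passed on as the level_name setting.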

Experiments from "Grounded Language Learning: Fast and Slow"

These tasks correspond to different experiments:

architecture_comparison (Table 1, Section 4.0):
- Train on 'architecture_comparison/fast_map_three_objs'
num_generalization (Figure 2, Section 4.1). E.g.:
- Train on 'num_generalization/fast_map_three_objs'
- Test on 'num_generalization/fast_map_five_objs', 'num_generalization/fast_map_eight_objs'
new_obj_generalization (Figure 3, Section 4.1). E.g.:
- Train on 'new_obj_generalization/fast_map_three_objs_global_ten'
- Test on 'new_obj_generalization/fast_map_heldout_test_objs'
intrinsic_motivation (Figure 5, Section 4.2):
- Train on 'intrinsic_motivation/fast_map_three_objs_no_shaping_reward'
fast_slow (Figure 6, Section 4.3). E.g. (unfamiliar objects, unfamiliar task):
- Train on 'fast_slow/slow_learn_three_objs_bed_tray_lifting', 'fast_slow/slow_learn_three_objs_bed_tray_putting_near', 'fast_slow/slow_learn_three_objs_bed_tray_putting_on', 'fast_slow/fast_map_three_objs', 'fast_slow/fast_map_three_objs_bed_tray'
- Test on 'fast_slow/test_holdout_fast_map_three_objs_bed_tray_putting_on'

The sections refer to the version of the paper hosted on arXiv on 1 November 2020 (arxiv). Note that we do not release the experiments involving ShapeNet assets (Figure 4, Section 4.1) for copyright reasons.
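
The train/test pairings above can be kept in one place so that training and evaluation code read from the same table. This is a sketch only: it records just the examples listed above (other splits are possible), and FAST_AND_SLOW_EXAMPLES and levels_for are hypothetical names introduced here, not part of dm_fast_mapping.

```python
# Train/test levels for the examples listed above. An empty 'test' list means
# no separate test level is listed for that experiment.
FAST_AND_SLOW_EXAMPLES = {
    'architecture_comparison': {
        'train': ['architecture_comparison/fast_map_three_objs'],
        'test': [],
    },
    'num_generalization': {
        'train': ['num_generalization/fast_map_three_objs'],
        'test': [
            'num_generalization/fast_map_five_objs',
            'num_generalization/fast_map_eight_objs',
        ],
    },
    'new_obj_generalization': {
        'train': ['new_obj_generalization/fast_map_three_objs_global_ten'],
        'test': ['new_obj_generalization/fast_map_heldout_test_objs'],
    },
    'intrinsic_motivation': {
        'train': ['intrinsic_motivation/fast_map_three_objs_no_shaping_reward'],
        'test': [],
    },
    'fast_slow': {
        'train': [
            'fast_slow/slow_learn_three_objs_bed_tray_lifting',
            'fast_slow/slow_learn_three_objs_bed_tray_putting_near',
            'fast_slow/slow_learn_three_objs_bed_tray_putting_on',
            'fast_slow/fast_map_three_objs',
            'fast_slow/fast_map_three_objs_bed_tray',
        ],
        'test': ['fast_slow/test_holdout_fast_map_three_objs_bed_tray_putting_on'],
    },
}


def levels_for(experiment, phase):
    """Returns the level names for an experiment and phase ('train' or 'test')."""
    return list(FAST_AND_SLOW_EXAMPLES[experiment][phase])
```

For example, levels_for('num_generalization', 'test') yields the five-object and eight-object levels.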

Experiments from "Towards mental time travel: a hierarchical memory for RL..."

The tasks prefixed with with_distractors are the rapid-word-learning tasks from Figure 5, Section 3.3:

Length generalization (Fig. 5d):
- Train on 'with_distractors/fast_map_three_objs_no_distractor', 'with_distractors/fast_map_three_objs_one_distractor', 'with_distractors/fast_map_three_objs_two_distractor'
- Test on 'with_distractors/eval_fast_map_three_objs_ten_distractor'
Generalization to multi-episode evaluation (Fig. 5e-f):
- Train on the same levels as above: 'with_distractors/fast_map_three_objs_no_distractor', 'with_distractors/fast_map_three_objs_one_distractor', 'with_distractors/fast_map_three_objs_two_distractor'
- Test on 'with_distractors/eval_fast_map_four_episodes_three_objs_no_distractor', 'with_distractors/eval_fast_map_two_episodes_three_objs_five_distractor'

The section and figure numbers refer to the updated version of the paper (arXiv) posted in October 2021.
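
To make the length-generalization split concrete, the sketch below reads the number word that precedes "_distractor" in each level name. Treating that word as a distractor count is an assumption based on the naming, and distractor_count is a hypothetical helper, not part of dm_fast_mapping.

```python
# Number words that appear before '_distractor' in the level names above.
_NUMBER_WORDS = {'no': 0, 'one': 1, 'two': 2, 'five': 5, 'ten': 10, 'twenty': 20}


def distractor_count(level_name):
    """Reads the count encoded in the '<word>_distractor' suffix of a level name."""
    task = level_name.split('/')[-1]
    parts = task.split('_')
    if parts[-1] != 'distractor':
        raise ValueError(f'No distractor suffix in {level_name!r}')
    # Assumption: the word before 'distractor' is the number of distractors.
    return _NUMBER_WORDS[parts[-2]]


TRAIN_LEVELS = [
    'with_distractors/fast_map_three_objs_no_distractor',
    'with_distractors/fast_map_three_objs_one_distractor',
    'with_distractors/fast_map_three_objs_two_distractor',
]
TEST_LEVEL = 'with_distractors/eval_fast_map_three_objs_ten_distractor'

train_counts = [distractor_count(name) for name in TRAIN_LEVELS]
test_count = distractor_count(TEST_LEVEL)
```

Under that reading, the training levels cover zero to two distractors, while the test level uses ten.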

Actions

The environment provides the following actions:

STRAFE_LEFT_RIGHT
MOVE_BACK_FORWARD
LOOK_LEFT_RIGHT
LOOK_DOWN_UP
HAND_ROTATE_AROUND_RIGHT
HAND_ROTATE_AROUND_UP
HAND_ROTATE_AROUND_FORWARD
HAND_PUSH_PULL
HAND_GRIP

Each action is a double scalar, with an inclusive range of [-1.0, 1.0] except for HAND_GRIP, which is a binary action taking values 0 or 1. It is not compulsory to send a value for each action every step, but note that actions are "sticky", meaning an action's value will only change when a new value is provided. For example:

import dm_fast_mapping

# `settings` is an EnvironmentSettings instance (see "Configurable environment settings").
env = dm_fast_mapping.load_from_docker(settings)
env.reset()
env.step({'STRAFE_LEFT_RIGHT': -1.0})  # Result: strafe left.
env.step({'MOVE_BACK_FORWARD': 1.0})  # Result: strafe left & move forward.

env.step({'STRAFE_LEFT_RIGHT': 0.0,
          'MOVE_BACK_FORWARD': 0.0})  # Result: stationary.
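
The sticky behaviour can also be modelled without the environment. The sketch below tracks the value each action would hold after a sequence of partial action dicts and checks the documented ranges. StickyActions is a hypothetical class written for illustration. It assumes every action starts at zero after a reset, and it rejects out-of-range values outright, which may differ from how the environment itself treats them.

```python
CONTINUOUS_ACTIONS = (
    'STRAFE_LEFT_RIGHT', 'MOVE_BACK_FORWARD', 'LOOK_LEFT_RIGHT', 'LOOK_DOWN_UP',
    'HAND_ROTATE_AROUND_RIGHT', 'HAND_ROTATE_AROUND_UP',
    'HAND_ROTATE_AROUND_FORWARD', 'HAND_PUSH_PULL',
)
BINARY_ACTIONS = ('HAND_GRIP',)


class StickyActions:
    """Tracks the value each action would hold under sticky semantics."""

    def __init__(self):
        # Assumption: every action starts at zero after a reset.
        self._current = {name: 0.0 for name in CONTINUOUS_ACTIONS}
        self._current.update({name: 0 for name in BINARY_ACTIONS})

    def step(self, actions):
        """Validates a partial action dict and returns the effective values."""
        for name, value in actions.items():
            if name in CONTINUOUS_ACTIONS:
                if not -1.0 <= value <= 1.0:
                    raise ValueError(f'{name} must lie in [-1.0, 1.0], got {value}')
            elif name in BINARY_ACTIONS:
                if value not in (0, 1):
                    raise ValueError(f'{name} must be 0 or 1, got {value}')
            else:
                raise KeyError(f'Unknown action {name!r}')
        # Only the supplied actions change; everything else keeps its value.
        self._current.update(actions)
        return dict(self._current)


# Mirrors the three calls in the example above.
state = StickyActions()
first = state.step({'STRAFE_LEFT_RIGHT': -1.0})
second = state.step({'MOVE_BACK_FORWARD': 1.0})
third = state.step({'STRAFE_LEFT_RIGHT': 0.0, 'MOVE_BACK_FORWARD': 0.0})
```

After the second call STRAFE_LEFT_RIGHT is still -1.0 even though it was not resent, and only the explicit zeros in the third call stop the movement.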

Note that when using the provided script human_agent.py to try the tasks, only the STRAFE_LEFT_RIGHT (keys a, d), MOVE_BACK_FORWARD (s, w), LOOK_LEFT_RIGHT (left_arrow, right_arrow), LOOK_DOWN_UP (down_arrow, up_arrow) and HAND_GRIP (spacebar) actions are available.

Observations

For the 8 Unity-based tasks, the environment provides the following observations:

RGB_INTERLEAVED: First-person RGB camera observation. The width and height can be adjusted through the EnvironmentSettings, but the observation will always have a fixed 4:3 aspect ratio.
TEXT: A string indicating the instructions or language information provided by the environment.
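
Because the aspect ratio is fixed at 4:3, it can be convenient to derive the height from the width before requesting a custom observation size. The helpers below are a hypothetical sketch of that arithmetic. How the environment handles a width and height that are not in a 4:3 ratio is not specified here, so the sketch only checks the values up front.

```python
def is_four_by_three(width, height):
    """Returns True if width:height is exactly 4:3."""
    return width > 0 and height > 0 and width * 3 == height * 4


def height_for_width(width):
    """Returns the 4:3 height for a width, which must be a multiple of 4."""
    if width <= 0 or width % 4 != 0:
        raise ValueError(f'width must be a positive multiple of 4, got {width}')
    return width * 3 // 4
```

For a width of 96 pixels this gives a height of 72 pixels.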

Configurable environment settings

Required attributes:

seed: Seed to initialize the environment's RNG.
level_name: Name of the level to load.

Optional attributes:

width: Width (in pixels) of the desired RGB observation; defaults to 96.
height: Height (in pixels) of the desired RGB observation; defaults to 72.
episode_length_seconds: Maximum episode length (in seconds); defaults to 120.
num_action_repeats: Number of times to step the environment with the provided action in calls to step().
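
The attributes above can be summarised as a small data structure. SettingsSketch below is a hypothetical stand-in written for illustration, not the library's EnvironmentSettings class. It only mirrors the documented names and defaults, and because no default is documented for num_action_repeats it uses None to mean "not specified".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SettingsSketch:
    """Hypothetical stand-in listing the attributes documented above."""
    # Required attributes.
    seed: int
    level_name: str
    # Optional attributes with the documented defaults.
    width: int = 96
    height: int = 72
    episode_length_seconds: float = 120.0
    # No default is documented above, so None means "not specified".
    num_action_repeats: Optional[int] = None


# Only the required attributes are given; the rest fall back to defaults.
settings_sketch = SettingsSketch(
    seed=123, level_name='fast_slow/fast_map_three_objs')
```

Omitting seed or level_name raises a TypeError, which reflects their required status.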