RoboSpatial Annotation Generation Logic
April 25, 2025 ยท View on GitHub
For those who wants to hack the codebase!
Annotation Generation Details
This section provides a more detailed overview of the logic used to generate each type of spatial annotation and highlights the key configuration parameters found in configs/example_config.yaml (and other configuration files) that control this process.
1. Object Grounding (spatial_analysis/grounding.py)
- Purpose: Generates a tight 2D axis-aligned bounding box (
clipped_bbox) encompassing all visible pixels of an object in the image. - Logic:
- Relies on a pre-calculated 2D boolean
occupancy_mapfor each object, which indicates the precise pixels covered by the object's 3D model when projected onto the image. - It finds the minimum and maximum
x(column) andy(row) coordinates within this occupancy map. - These bounds directly define the
[xmin, ymin, xmax, ymax]coordinates of the 2D bounding box.
- Relies on a pre-calculated 2D boolean
- Key Parameters: None directly in the configuration for this step; it depends on the accuracy of the input 3D models and the camera parameters used to generate the
occupancy_map.
2. Spatial Context (spatial_analysis/context/context.py)
- Purpose: Samples points in the empty space surrounding a reference object and categorizes them based on their spatial relationship (infront, behind, left, right) in three different frames:
objectcentric,cameracentric, andworldcentric. - Logic:
- Calculates empty space on the floor using a top-down 2D grid based on environment geometry.
- Identifies empty grid points within a specific distance (
threshold) from the reference object's 2D footprint. - Projects these candidate 3D points (at the object's base height) onto the image, filtering those outside the view or behind the camera.
- Checks if the projected 2D points are occluded by other objects using a pre-computed environment occupancy map.
- Categorizes the non-occluded points based on their position relative to the reference object in the three frames (using object orientation for object-centric, and pixel/depth coordinates for camera/world-centric).
- Randomly samples up to
num_samplesnon-occluded points for each valid category (frame + direction).
- Key Parameters (
configs/example_config.yaml->data_generation.generation_options):context_threshold: Maximum distance (in world units, e.g., meters) from the reference object's footprint to consider sampling points.context_grid_resolution: The size of each cell in the temporary top-down 2D grid used for finding nearby empty space. Smaller values are more precise but computationally more expensive.context_num_samples: The maximum number of points to sample and output for each valid category (e.g., max 10 points for 'camera_centric' 'left').
3. Spatial Configuration (spatial_analysis/configuration/configuration.py)
- Purpose: Determines the pairwise spatial relationship between two objects (
obj1relative toobj2) across the three reference frames (camera, world, object). Relationships include left/right, infront/behind, above/below, and overlapping. - Logic:
- Calculates various geometric metrics for both objects (projected 2D bounds, average visible depth, world Z bounds, etc.) using their individual pre-computed
occupancy_maps. - Camera/World-centric: Compares these metrics. The
strictnessparameter controls the comparison method:'strict': Uses the absolute min/max bounds. Requires clear separation; considers objects overlapping if their projected bounds intersect at all. Sensitive to partial occlusions.'lenient': Uses projected centers, average visible depths, and average Z coordinates. More robust to partial occlusion but might misclassify tightly packed objects.
- Object-centric: Uses the Separating Axis Theorem (SAT) on the 3D OBBs to check for overlap. If not overlapping, it determines the direction based on the relative position of
obj1's center projected ontoobj2's local forward and right axes. Above/below still uses world Z coordinates.
- Calculates various geometric metrics for both objects (projected 2D bounds, average visible depth, world Z bounds, etc.) using their individual pre-computed
- Key Parameters (
configs/example_config.yaml->data_generation.generation_options):spatial_configuration_strictness: ('strict'or'lenient') Selects the comparison logic for camera-centric and world-centric frames. Default is'lenient'.pairwise_relationship_mode: ('unique_categories_only'or'all_visible_objects') Determines which pairs of objects are considered for configuration analysis.'unique_categories_only'only considers pairs where each object is the only instance of its category visible, while'all_visible_objects'considers all permutations of visible objects.
4. Spatial Compatibility (spatial_analysis/compatibility/compatibility.py)
- Purpose: Assesses whether one object (
obj_a) could be placed in the empty space relative to another (obj_b) without collision. It checks directions like left, right, in front, behind, and specificallyon_top. - Logic:
- Samples potential placement points around
obj_busing the Spatial Context logic (get_point_in_space_relative_to_object), using a dynamic threshold based on the sizes ofobj_aandobj_b. - For each sampled point, it simulates placing
obj_ahorizontally centered at that point's 2D location. - It checks for collisions between the placed
obj_a(potentially with abuffer_ratio) and:- The static environment (using a 2D occupancy grid).
- The reference object
obj_b(maintaining amin_distance).
- A relationship (e.g., 'left') is considered compatible (
True) if any sampled point corresponding to that relationship allowsobj_ato fit. - A separate, simpler check (
can_fit_on_top) determines the 'on_top' relationship by comparing the horizontal dimensions ofobj_aandobj_b, but only ifobj_ais placeable andobj_bhas a flat surface.
- Samples potential placement points around
- Key Parameters (
configs/example_config.yaml->data_generation.generation_options):compatibility_grid_resolution: Resolution of the 2D grid used for collision checking against the environment.compatibility_num_samples: How many potential placement points to sample aroundobj_b.compatibility_min_distance: The minimum required distance (in world units) between the placedobj_aand the referenceobj_b.compatibility_buffer_ratio: A ratio applied toobj_a's dimensions during collision checks, effectively adding a safety margin. 0 means no buffer, 0.1 means 10% buffer.context_threshold: The base threshold used for sampling points (dynamically increased based on object sizes).
Project Structure
configs/: Contains YAML configuration files (e.g.,example_config.yaml).data_loader/: Modules for loading and interfacing with different 3D datasets. Includesembodiedscan_loader.pyand a README explaining how to add custom loaders.spatial_analysis/: Modules performing the core spatial reasoning and annotation generation logic.context/: Logic for spatial context (points relative to an object).configuration/: Logic for spatial configuration (relative position between objects).compatibility/: Logic for spatial compatibility (fitting assessment).grounding.py: Logic for 2D object grounding.relationships.py: High-level wrappers for spatial analysis functions.relationship_utils.py: Utility functions for geometry and projections.topdown_map.py: Functions for creating 2D top-down occupancy grids.obj_properties.py: Lists defining object properties (e.g.,items_with_face).
annotation_generator.py: Orchestrates the generation process for a single scene.run_generation.py: Main script to run annotation generation across datasets/scenes.
Output Files
<output_dir>/<dataset>/<scene_id>/<image_name>.annotations.json: The primary output. Contains the generated spatial annotations for a single image, structured by type (grounding, unary relations, pairwise relations).generation_progress.json: Stores a map of datasets to lists of scene names that have been successfully processed. Allows the script to resume if interrupted. Located in the directory whererun_generation.pyis executed.generation_stats.json: Contains aggregated statistics about the generated annotations (e.g., counts of each annotation type) overall and per-dataset. Located in the directory whererun_generation.pyis executed.