Implementing a Custom Data Loader
April 25, 2025
This document outlines the steps and requirements for implementing a custom data loader compatible with the RoboSpatial annotation generation pipeline.
Overview
The data loader is responsible for interfacing with your specific dataset format and providing the necessary information (scenes, images, object instances, metadata) to the generation pipeline. To ensure compatibility, your custom loader must inherit from the BaseLoader abstract base class (robospatial.data_loader.base_loader.BaseLoader) and implement its required methods.
BaseLoader Interface
Your custom loader class must implement the following methods:
__init__(self, config)
- Purpose: Initializes the data loader. This typically involves loading annotations, setting up paths, and potentially pre-processing metadata.
- Args:
  - config (dict): A dictionary containing the data_loading section from the configuration file (e.g., configs/embodiedscan.yaml). This allows access to dataset paths, annotation file locations, selected datasets, splits, and other relevant parameters.
- Implementation Notes:
  - Use the config dictionary to locate and load your dataset's annotation files.
  - Store necessary metadata, such as class labels and mappings, as instance variables.
  - Organize the loaded data in a way that facilitates efficient retrieval by the other methods (e.g., nested dictionaries keyed by dataset and scene name, as seen in EmbodiedScanLoader).
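The implementation notes for __init__ can be sketched roughly as follows. This is a minimal illustration, not the EmbodiedScanLoader code: the BaseLoader stand-in exists only so the snippet runs on its own (the real class lives in robospatial.data_loader.base_loader), and the class name MyLoader, the config key annotation_files, and the JSON layout are all hypothetical.

```python
import json
from collections import defaultdict


class BaseLoader:
    """Stand-in so this sketch runs alone; in practice import the real
    robospatial.data_loader.base_loader.BaseLoader instead."""


class MyLoader(BaseLoader):
    def __init__(self, config):
        # `config` is the data_loading section of the configuration file.
        self.image_root = config.get("image_root", "")
        # Nested index: dataset name -> scene name -> scene annotation.
        self.scenes = defaultdict(dict)
        # "annotation_files" is a hypothetical key holding JSON file paths.
        for path in config["annotation_files"]:
            with open(path) as f:
                # Hypothetical layout: each file holds a list of scene dicts.
                for scene in json.load(f):
                    self.scenes[scene["dataset"]][scene["scene_name"]] = scene
```

Keeping the index nested by dataset and then by scene lets the other methods look up a scene with two dictionary accesses instead of scanning the raw annotations.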
list_scenes(self, dataset_list)
- Purpose: Provides a generator that yields information about each scene within the specified datasets.
- Args:
  - dataset_list (list): A list of dataset names (strings) requested by the pipeline (e.g., ['scannet', '3rscan']).
- Returns:
  - generator: Yields tuples of (dataset_name, scene_idx, scene_name).
    - dataset_name (str): The name of the dataset the scene belongs to.
    - scene_idx (int): A unique index for the scene within its dataset (can be a simple counter).
    - scene_name (str): A unique identifier for the scene, often including the dataset prefix (e.g., 'scannet/scene0000_00'). This name is used in subsequent calls.
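A generator with this contract might look like the sketch below. It is written as a plain function over a hypothetical index (dataset name mapped to a dictionary of scenes) so that it runs on its own. In a loader it would be a method reading the same structure from self. The scene names in the sample index are placeholders.

```python
def list_scenes(scenes, dataset_list):
    """Yield (dataset_name, scene_idx, scene_name) for each requested dataset.

    `scenes` is a hypothetical index: dataset name -> {scene name -> annotation}.
    """
    for dataset_name in dataset_list:
        # Requested datasets that were never loaded are skipped silently.
        for scene_idx, scene_name in enumerate(scenes.get(dataset_name, {})):
            yield dataset_name, scene_idx, scene_name


index = {
    "scannet": {"scannet/scene0000_00": {}, "scannet/scene0001_00": {}},
    "3rscan": {"3rscan/example_scene": {}},
}
for item in list_scenes(index, ["scannet", "3rscan"]):
    print(item)
```

Here scene_idx restarts at 0 for each dataset, which matches the "simple counter" option described above.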
list_images(self, dataset_name, scene_name)
- Purpose: Lists all images (or viewpoints) associated with a specific scene.
- Args:
  - dataset_name (str): The name of the dataset.
  - scene_name (str): The unique identifier of the scene (obtained from list_scenes).
- Returns:
  - dict: A dictionary where keys are unique image identifiers (e.g., '<scene_name>/<image_filename>' or '<scene_name>/<frame_id>') and values are dictionaries containing image-specific annotations. Each image annotation dictionary must include:
    - extrinsic (or equivalent): 4x4 transformation matrix (e.g., NumPy array or list of lists) from camera coordinates to the global/world coordinate system of the scene.
      # Example: [[ -0.9897, 0.1085, 0.0927, 1.2120], [ -0.0330, 0.4577, -0.8884, 0.3075], [ -0.1388, -0.8824, -0.4494, 1.4804], [ 0. , 0. , 0. , 1. ]]
    - intrinsic: 4x4 camera intrinsics matrix (e.g., NumPy array or list of lists).
      # Example: [[ 1170.18, 0. , 647.75, 0. ], [ 0. , 1170.18, 483.75, 0. ], [ 0. , 0. , 1. , 0. ], [ 0. , 0. , 0. , 1. ]]
    - img_path: Absolute or relative path to the image file. img_path is joined with the image_root path from the config file.
    - Any other metadata required by list_objects (e.g., visible_instance_ids in EmbodiedScanLoader).
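To make the expected shape concrete, here is a sketch of one list_images return value together with a small sanity check. The matrices are the example values quoted above. The image identifier, the img_path value, the visible_instance_ids contents, and the helper check_image_annotations are hypothetical and not part of the pipeline.

```python
REQUIRED_KEYS = ("extrinsic", "intrinsic", "img_path")


def check_image_annotations(images):
    """Raise ValueError if an entry lacks a required key or a 4x4 matrix."""
    for image_id, ann in images.items():
        for key in REQUIRED_KEYS:
            if key not in ann:
                raise ValueError(f"{image_id}: missing '{key}'")
        for key in ("extrinsic", "intrinsic"):
            # list(row) also works when the matrix is a NumPy array.
            rows = [list(row) for row in ann[key]]
            if len(rows) != 4 or any(len(row) != 4 for row in rows):
                raise ValueError(f"{image_id}: '{key}' must be 4x4")


images = {
    "scannet/scene0000_00/00000.jpg": {
        "extrinsic": [
            [-0.9897, 0.1085, 0.0927, 1.2120],
            [-0.0330, 0.4577, -0.8884, 0.3075],
            [-0.1388, -0.8824, -0.4494, 1.4804],
            [0.0, 0.0, 0.0, 1.0],
        ],
        "intrinsic": [
            [1170.18, 0.0, 647.75, 0.0],
            [0.0, 1170.18, 483.75, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0],
        ],
        "img_path": "scannet/scene0000_00/00000.jpg",  # placeholder path
        "visible_instance_ids": [0, 3],  # extra metadata for list_objects
    }
}
check_image_annotations(images)  # passes silently when the structure is valid
```

Running a check like this inside your loader's tests catches a missing key or a 3x3 intrinsics matrix before the pipeline consumes the data.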
list_objects(self, dataset_name, scene_name, image_ann)
- Purpose: Identifies and processes object instances visible from a specific viewpoint (image). It organizes objects based on visibility and category, handles duplicate categories, and calculates scene bounds.
- Args:
  - dataset_name (str): The name of the dataset.
  - scene_name (str): The unique identifier of the scene.
  - image_ann (dict): The annotation dictionary for a single image, obtained from the output of list_images.
- Returns:
  - tuple: A 5-element tuple containing:
    - vis_objs (dict): Dictionary of visible, non-environmental objects.
      - Keys: Object category name. If multiple instances of the same category are visible, append an index (e.g., 'chair_0', 'chair_1'). Environmental objects like 'wall', 'floor', 'ceiling', and generic 'object' categories should be excluded.
      - Values: Instance annotation dictionaries. Each dictionary should contain at least:
        - category (str): The original object category label.
        - name (str): The potentially indexed name used as the key in vis_objs.
        - bbox_3d (or equivalent, optional but recommended): The original 3D bounding box representation from your dataset (e.g., 9 DoF parameters: center, size, orientation). While the pipeline primarily uses the obb for calculations, this original bbox_3d is saved in the final annotations if provided.
        - obb: The Open3D OrientedBoundingBox representation (open3d.geometry.OrientedBoundingBox). This is crucial for spatial relationship calculations. Your list_objects implementation is responsible for creating this, often by converting from bbox_3d (see EmbodiedScanLoader line ~241 for an example using _9dof_to_box) or by generating it directly if your dataset provides OBBs.
    - unique_vis_categories (set): A set of category names (strings) for objects that appear exactly once in the vis_objs dictionary (excluding environmental/generic categories).
    - multi_vis_categories (set): A set of category names (strings) for objects that appear multiple times in the vis_objs dictionary (excluding environmental/generic categories).
    - floor_bound (list): A list containing two numpy.ndarrays representing the minimum and maximum coordinates [min_bound, max_bound] that encompass the floor and all non-environmental objects. This is often derived from the combined OBBs of relevant objects.
    - all_objs (dict): Dictionary of all non-environmental objects associated with the scene (not just the current view), keyed by their potentially indexed name (e.g., 'chair_0'). Used for occupancy map generation or other downstream tasks. The structure mirrors vis_objs but includes objects not necessarily visible in the current image_ann. Each object dictionary must contain at least category, name, and obb. Including bbox_3d is recommended if available. Note: Depending on your dataset structure, you might populate this similarly to vis_objs based on visible_instance_ids or load all scene objects separately.
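The key-naming and category-set rules for vis_objs can be sketched in plain Python as below. The helper name_objects and the ENVIRONMENT constant are hypothetical, and whether EmbodiedScanLoader assigns indices in exactly this order is an assumption. Creating the obb is left out because it requires Open3D. Any bbox_3d or obb fields already present on an instance pass through unchanged.

```python
from collections import Counter

# Categories the text above says to exclude.
ENVIRONMENT = {"wall", "floor", "ceiling", "object"}


def name_objects(instances):
    """Return (objs, unique_categories, multi_categories).

    Each input dict needs a 'category' key; all other keys are copied over.
    """
    kept = [inst for inst in instances if inst["category"] not in ENVIRONMENT]
    counts = Counter(inst["category"] for inst in kept)
    seen = Counter()  # how many instances of each category were named so far
    objs = {}
    for inst in kept:
        cat = inst["category"]
        if counts[cat] > 1:
            name = f"{cat}_{seen[cat]}"  # e.g. 'chair_0', 'chair_1'
            seen[cat] += 1
        else:
            name = cat  # a single instance keeps the bare category name
        objs[name] = {**inst, "name": name}
    unique = {c for c, n in counts.items() if n == 1}
    multi = {c for c, n in counts.items() if n > 1}
    return objs, unique, multi


instances = [
    {"category": "chair"},
    {"category": "table"},
    {"category": "chair"},
    {"category": "wall"},
]
vis_objs, unique_vis_categories, multi_vis_categories = name_objects(instances)
print(sorted(vis_objs))  # ['chair_0', 'chair_1', 'table']
```

The same helper could be applied to the full list of scene instances to build all_objs, provided your dataset exposes that list.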
Configuration
To use your custom data loader, update the data_loading section in your configuration file (e.g., configs/example_config.yaml):
    data_loading:
      # ... other settings ...
      loader_class: path.to.your.module.YourCustomLoaderClassName  # Update this line
      # Provide any custom keys your loader's __init__ needs
      your_custom_annotation_path:
        train: /path/to/your/train_annotations.pkl
        val: /path/to/your/val_annotations.pkl
      # ... other dataset-specific paths or parameters ...
- Set loader_class to the fully qualified Python path of your custom loader class.
- Ensure any necessary configuration parameters (like annotation file paths) needed by your loader's __init__ method are present in the data_loading section.
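A fully qualified path such as the loader_class value is commonly resolved with importlib. The sketch below shows that general technique. Whether the RoboSpatial pipeline resolves loader_class in exactly this way is an assumption, and load_class is a hypothetical helper. It is still a quick way to confirm that your path is importable before launching a run.

```python
import importlib


def load_class(dotted_path):
    """Import 'package.module.ClassName' and return the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Demonstrated with a standard-library class, since a real loader path
# depends on your project layout.
cls = load_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

An ImportError or AttributeError from this helper usually points to a typo in the path or a module that is not on PYTHONPATH.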
Example
Refer to data_loader.embodiedscan_loader.EmbodiedScanLoader for a concrete implementation example using datasets like ScanNet, Matterport3D, and 3RScan.
Additionally, refer to data_loader.example_loader.py for a simpler implementation tailored specifically to the JSON annotation format found in the example_data/ directory. This loader demonstrates how to handle the example annotations provided for testing the pipeline.
Visualizing Your Loader Output
To verify that your custom data loader is producing the correct outputs (specifically the object instances with their 3D bounding boxes and camera parameters), you can use the provided visualization script: scripts/visualize_input.py.
Purpose:
This script takes an image file and a corresponding intermediate annotation JSON file (similar to those in example_data/annotations/, representing the data your loader would prepare for a single image) as input. It reads the camera parameters (extrinsic, intrinsic) and the object information (specifically bbox_3d) from the JSON. It then projects the 3D bounding boxes onto the 2D image and displays the result.
This helps you visually confirm:
- Camera parameters (extrinsic, intrinsic) are correct.
- Oriented object bounding boxes (derived from bbox_3d) align with the objects in the image.
- The data format your loader prepares is interpreted correctly before it is passed to the main pipeline.
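The projection that such a check relies on can be illustrated with a short pinhole-camera sketch. This is not the code from scripts/visualize_input.py. The function world_to_pixel is hypothetical, and it assumes that the extrinsic is a rigid camera-to-world transform (as described for list_images) and that the camera looks along its +z axis, which the text above does not state.

```python
def world_to_pixel(point_world, extrinsic, intrinsic):
    """Project a 3D world point to (u, v) pixels, or None if behind the camera.

    extrinsic: 4x4 camera-to-world matrix, assumed rigid (rotation + translation).
    intrinsic: 4x4 matrix whose top-left 3x3 block is the usual K matrix.
    """
    # Invert the rigid transform: p_cam = R^T (p_world - t).
    d = [point_world[i] - extrinsic[i][3] for i in range(3)]
    x, y, z = (sum(extrinsic[r][c] * d[r] for r in range(3)) for c in range(3))
    if z <= 0:
        return None  # the point is not in front of the camera
    # Apply K, then divide by depth to get pixel coordinates.
    u = (intrinsic[0][0] * x + intrinsic[0][1] * y + intrinsic[0][2] * z) / z
    v = (intrinsic[1][0] * x + intrinsic[1][1] * y + intrinsic[1][2] * z) / z
    return u, v


identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
K = [[1170.18, 0.0, 647.75, 0.0], [0.0, 1170.18, 483.75, 0.0],
     [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
# A point straight ahead of the camera lands on the principal point.
print(world_to_pixel([0.0, 0.0, 2.0], identity, K))  # (647.75, 483.75)
```

If the corners of a box that should be visible project far outside the image, a common cause is an extrinsic stored as world-to-camera instead of camera-to-world.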
Important Note:
The provided visualization script, scripts/visualize_input.py, is designed to help debug your custom loader's output before running the full generation pipeline. It reads an intermediate JSON file (like those in example_data/annotations/) which represents the data your loader passes for a single image.
Currently, this script expects the JSON to contain an objects array. For each object in this array, it specifically looks for a bbox_3d field containing a list with 9 DoF parameters (center, size, rotation) as its first element. It uses these parameters to generate an Open3D OrientedBoundingBox (obb) via the _9dof_to_box function for visualization.
- If your custom loader generates an intermediate JSON where the 3D bounding box information is stored differently (e.g., different format within bbox_3d, different field name, or only providing a pre-computed obb), you will need to modify the visualize_single_image function in scripts/visualize_input.py (around line 195) to correctly parse your data and create the o3d_box for drawing.