LGT-Net

May 18, 2023

This is the PyTorch implementation of our paper "LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network" (CVPR 2022). [Supplemental Materials] [Video] [Presentation] [Poster]

(Figure: network architecture)

Update

  • 2023.5.18: Updated post-processing. If you want to reproduce the post-processing results of the paper, please switch to the old commit. Check out Post-Porcessing.md for more information.

Demo

Installation

Install our dependencies:

pip install -r requirements.txt

Preparing Dataset

MatterportLayout

The official MatterportLayout dataset is available here.

If you have problems using this dataset, please refer to this issue.

Make sure the dataset files are stored as follows:

src/dataset/mp3d
|-- image
|   |-- 17DRP5sb8fy_08115b08da534f1aafff2fa81fc73512.png
|-- label
|   |-- 17DRP5sb8fy_08115b08da534f1aafff2fa81fc73512.json
|-- split
    |-- test.txt
    |-- train.txt
    |-- val.txt
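Given the layout above, each image under image/ is paired with a same-named JSON label under label/. A minimal sketch of that pairing (the helper name label_path_for is our own, not part of the repo):

```python
from pathlib import Path

# Sketch only: map an mp3d image path to its label path, assuming every
# image `<id>.png` under image/ has a label `<id>.json` under label/.
def label_path_for(image_path: str, root: str = "src/dataset/mp3d") -> Path:
    stem = Path(image_path).stem
    return Path(root) / "label" / f"{stem}.json"

print(label_path_for(
    "src/dataset/mp3d/image/17DRP5sb8fy_08115b08da534f1aafff2fa81fc73512.png"))
```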


Statistics

| Split | All  | 4 Corners | 6 Corners | 8 Corners | >=10 Corners |
|-------|------|-----------|-----------|-----------|--------------|
| All   | 2295 | 1210      | 502       | 309       | 274          |
| Train | 1647 | 841       | 371       | 225       | 210          |
| Val   | 190  | 108       | 46        | 21        | 15           |
| Test  | 458  | 261       | 85        | 63        | 49           |

ZInd

The official ZInd dataset is available here.

Make sure the dataset files are stored as follows:

src/dataset/zind
|-- 0000
|   |-- panos
|   |   |-- floor_01_partial_room_01_pano_14.jpg
|   |-- zind_data.json
|-- room_shape_simplicity_labels.json
|-- zind_partition.json
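The pano filenames encode floor, room, and pano indices. A small sketch (our own helper, not part of the repo) that parses filenames of the form shown above:

```python
import re

# Hedged sketch: extract floor / room / pano indices from a ZInd pano
# filename like "floor_01_partial_room_01_pano_14.jpg". The pattern is
# inferred from the example above and may not cover every ZInd filename.
PANO_RE = re.compile(r"floor_(\d+)_partial_room_(\d+)_pano_(\d+)")

def parse_pano_name(name):
    m = PANO_RE.search(name)
    if m is None:
        return None
    floor, room, pano = (int(g) for g in m.groups())
    return {"floor": floor, "room": room, "pano": pano}

print(parse_pano_name("floor_01_partial_room_01_pano_14.jpg"))
# {'floor': 1, 'room': 1, 'pano': 14}
```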

Statistics

| Split | All   | 4 Corners | 5 Corners | 6 Corners | 7 Corners | 8 Corners | 9 Corners | >=10 Corners | Manhattan | No-Manhattan (%) |
|-------|-------|-----------|-----------|-----------|-----------|-----------|-----------|--------------|-----------|------------------|
| All   | 31132 | 17293     | 1803      | 7307      | 774       | 2291      | 238       | 1426         | 26664     | 4468 (14.35%)    |
| Train | 24882 | 13866     | 1507      | 5745      | 641       | 1791      | 196       | 1136         | 21228     | 3654 (14.69%)    |
| Val   | 3080  | 1702      | 153       | 745       | 81        | 239       | 22        | 138          | 2647      | 433 (14.06%)     |
| Test  | 3170  | 1725      | 143       | 817       | 52        | 261       | 20        | 152          | 2789      | 381 (12.02%)     |

PanoContext and Stanford 2D-3D

We follow the same pano/s2d3d preprocessing proposed by HorizonNet. You can also directly download the preprocessed dataset file here.

Make sure the dataset files are stored as follows:

src/dataset/pano_s2d3d
|-- test
|   |-- img
|   |   |-- camera_0000896878bd47b2a624ad180aac062e_conferenceRoom_3_frame_equirectangular_domain_.png
|   |-- label_cor
|       |-- camera_0000896878bd47b2a624ad180aac062e_conferenceRoom_3_frame_equirectangular_domain_.txt
|-- train
|   |-- img
|   |-- label_cor
|-- valid
    |-- img
    |-- label_cor
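A minimal sketch of reading one of the label_cor text files. This assumes the HorizonNet-style format of one corner per line as whitespace-separated "x y" pixel coordinates in the equirectangular image; verify against the HorizonNet repo before relying on it.

```python
# Hedged sketch (not part of the repo): parse a label_cor file's contents,
# assuming one "x y" corner coordinate pair per line.
def read_corners(text):
    corners = []
    for line in text.strip().splitlines():
        x, y = line.split()
        corners.append((float(x), float(y)))
    return corners

sample = "104 85\n104 222\n378 87\n378 221\n"
print(read_corners(sample))
# [(104.0, 85.0), (104.0, 222.0), (378.0, 87.0), (378.0, 221.0)]
```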
     

Downloading Pre-trained Weights

We provide pre-trained weights for the individual datasets here.

Make sure the pre-trained weight files are stored as follows:

checkpoints
|-- SWG_Transformer_LGT_Net
|   |-- ablation_study_full
|   |   |-- best.pkl
|   |-- mp3d
|   |   |-- best.pkl
|   |-- pano
|   |   |-- best.pkl
|   |-- s2d3d
|   |   |-- best.pkl
|   |-- zind
|       |-- best.pkl
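The layout above keeps one best.pkl per dataset subdirectory, so locating a checkpoint is just path composition. A hypothetical convenience helper (not in the repo):

```python
from pathlib import Path

# Illustrative helper mirroring the checkpoint layout above:
# checkpoints/SWG_Transformer_LGT_Net/<dataset>/best.pkl
def best_checkpoint(dataset, root="checkpoints/SWG_Transformer_LGT_Net"):
    return Path(root) / dataset / "best.pkl"

print(best_checkpoint("mp3d"))
```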

Evaluation

You can evaluate by executing the following command:

  • MatterportLayout dataset
    python main.py --cfg src/config/mp3d.yaml --mode test --need_rmse
    
  • ZInd dataset
    python main.py --cfg src/config/zind.yaml --mode test --need_rmse
    
  • PanoContext dataset
    python main.py --cfg src/config/pano.yaml --mode test --need_cpe --post_processing manhattan --force_cube
    
  • Stanford 2D-3D dataset
    python main.py --cfg src/config/s2d3d.yaml --mode test --need_cpe --post_processing manhattan --force_cube
    
    • --post_processing: type of post-processing approach. For the manhattan constraint, we use DuLa-Net's post-processing, improved by adding occlusion detection (described here); manhattan_old is the original method. For the atalanta constraint, we use a DP algorithm. Disabled by default.
    • --need_rmse: evaluate root mean squared error and delta error. Disabled by default.
    • --need_cpe: evaluate corner error and pixel error. Disabled by default.
    • --need_f1: evaluate corner metrics (precision, recall, and F1-score) with a 10-pixel threshold (code from here). Disabled by default.
    • --force_cube: force a cube shape when evaluating. Disabled by default.
    • --wall_num: evaluate only layouts with the given corner number. Default: all.
    • --save_eval: save the visualized evaluation results of each panorama; the outputs are saved in the corresponding checkpoint directory (e.g., checkpoints/SWG_Transformer_LGT_Net/mp3d/results/test). Disabled by default.
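As a sketch of how the flags above combine, the following argparse declaration mirrors the command-line interface described here. It is our own reconstruction; the real main.py may declare its arguments differently (defaults and choices are assumptions).

```python
import argparse

# Hypothetical reconstruction of the evaluation CLI described above;
# not the repo's actual argument parser.
parser = argparse.ArgumentParser()
parser.add_argument("--cfg", required=True)
parser.add_argument("--mode", choices=["train", "test"], default="train")
parser.add_argument("--post_processing",
                    choices=["manhattan", "manhattan_old", "atalanta"],
                    default=None)
parser.add_argument("--need_rmse", action="store_true")
parser.add_argument("--need_cpe", action="store_true")
parser.add_argument("--need_f1", action="store_true")
parser.add_argument("--force_cube", action="store_true")
parser.add_argument("--wall_num", type=int, default=None)
parser.add_argument("--save_eval", action="store_true")

# Example: the PanoContext evaluation command from above.
args = parser.parse_args(["--cfg", "src/config/pano.yaml", "--mode", "test",
                          "--need_cpe", "--post_processing", "manhattan",
                          "--force_cube"])
print(args.need_cpe, args.post_processing, args.force_cube)
# True manhattan True
```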

Training

Execute the following commands to train (e.g., MatterportLayout dataset):

python main.py --cfg src/config/mp3d.yaml --mode train

You can copy and modify the YAML configuration file to train on other datasets.
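As a rough illustration of this copy-and-override workflow (the configuration keys below are invented for the example, not the repo's actual schema), a derived config can be built like this:

```python
# Illustrative only: derive a new training config from an existing one by
# overriding a few keys. Key names here are made up for the sketch.
base = {"dataset": "mp3d", "epochs": 1000, "lr": 1e-4, "batch_size": 6}

def override(cfg, **changes):
    new_cfg = dict(cfg)   # shallow copy so the base config is untouched
    new_cfg.update(changes)
    return new_cfg

zind_cfg = override(base, dataset="zind", batch_size=8)
print(zind_cfg["dataset"], zind_cfg["batch_size"])
# zind 8
```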

Inference

We provide an inference script (inference.py) that predicts the layout of your own panoramas. Execute the following command (e.g., using the pre-trained weights of the MatterportLayout dataset):

python inference.py --cfg src/config/mp3d.yaml --img_glob src/demo/demo1.png --output_dir src/output --post_processing manhattan

It will output JSON files (xxx_pred.json, in the same format as PanoAnnotator) and visualization images (xxx_pred.png) under output_dir.

  • --img_glob a panorama path or directory path for prediction.

  • --post_processing If manhattan is selected, we preprocess the panorama so that its vanishing points align with the axes before post-processing. Note that after preprocessing, the predicted results will no longer align with your input panorama; you can use the output vanishing-point file (vp.txt) to reverse the alignment manually.

  • --visualize_3d Show a 3D visualization of the output results (requires installing the dependencies and a GUI desktop environment).

  • --output_3d output the object file of 3D mesh reconstruction.
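A sketch of how an --img_glob argument might be expanded (the real inference.py may handle this differently): a directory is turned into a glob over common image extensions; anything else is treated as a glob pattern or single path.

```python
import glob
import os

# Hedged sketch, not the repo's code: resolve an --img_glob value into a
# sorted list of panorama paths.
def resolve_img_paths(img_glob):
    if os.path.isdir(img_glob):
        patterns = [os.path.join(img_glob, ext) for ext in ("*.png", "*.jpg")]
    else:
        patterns = [img_glob]
    paths = []
    for p in patterns:
        paths.extend(glob.glob(p))
    return sorted(paths)

# Demo on a throwaway directory containing two fake panoramas.
import tempfile
import pathlib
with tempfile.TemporaryDirectory() as d:
    pathlib.Path(d, "a.png").touch()
    pathlib.Path(d, "b.jpg").touch()
    print([os.path.basename(p) for p in resolve_img_paths(d)])
# ['a.png', 'b.jpg']
```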

Acknowledgements

The code style is adapted from Swin-Transformer.

Some components refer to the following projects:

Citation

If you use this code for your research, please cite:

@InProceedings{jiang2022lgt,
    author    = {Jiang, Zhigang and Xiang, Zhongzheng and Xu, Jinhua and Zhao, Ming},
    title     = {LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2022}
}