May 28, 2025

MESC-3D: Mining Effective Semantic Cues for 3D Reconstruction from a Single Image (CVPR 2025)

Paper

This repository contains the code for the paper MESC-3D: Mining Effective Semantic Cues for 3D Reconstruction from a Single Image.

In this work, we propose a novel single-image 3D reconstruction method, Mining Effective Semantic Cues for 3D Reconstruction from a Single Image (MESC-3D), which actively mines effective semantic cues from entangled features. Specifically, we design an Effective Semantic Mining Module (ESM) that establishes connections between point clouds and image semantic attributes, enabling the point clouds to autonomously select the information they need. Furthermore, a single image may carry insufficient semantic information, for example because of occlusions. Inspired by the human ability to represent 3D objects using prior knowledge drawn from daily experience, we introduce the 3DSPL module. It incorporates semantic understanding of spatial structures, enabling the model to interpret and reconstruct 3D objects with greater accuracy and realism, closely mirroring human perception of complex 3D environments. Extensive evaluations show that our method achieves significant improvements in reconstruction quality and robustness compared to prior works. Further experiments validate its strong generalization capability and show that it excels in zero-shot performance on unseen classes.

Method

Overview of MESC-3D. Our network is composed of two main components. (a) The 3DSPL aligns point cloud features with text features, aiming to capture the unique 3D geometric characteristics of each category. (b) The ESM establishes a connection between the semantic feature F_i and the 3D point cloud at the i-th stage, allowing each point to autonomously select the most valuable semantic information.
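As a rough illustration of the idea in (b), the sketch below uses plain cross-attention in NumPy: each point feature acts as a query over a set of image semantic features and takes a softmax-weighted mix of them. This is an assumption about the general mechanism, not the ESM implementation in this repository; the function name `select_semantics`, the shapes, and the random projection matrices are all hypothetical.

```python
import numpy as np


def select_semantics(point_feats, sem_feats, w_q, w_k, w_v):
    """Toy cross-attention: points (queries) attend over semantic features.

    point_feats: (N, C) per-point features
    sem_feats:   (M, C) image semantic features (F_i in the text)
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical, random here)
    Returns the attended features (N, D) and the attention weights (N, M).
    """
    q = point_feats @ w_q
    k = sem_feats @ w_k
    v = sem_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])         # (N, M) scaled similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over semantic features
    return weights @ v, weights


rng = np.random.default_rng(0)
N, M, C, D = 8, 5, 16, 16
points = rng.standard_normal((N, C))
semantics = rng.standard_normal((M, C))
w_q, w_k, w_v = (rng.standard_normal((C, D)) / np.sqrt(C) for _ in range(3))

attended, weights = select_semantics(points, semantics, w_q, w_k, w_v)
print(attended.shape, weights.shape)
```

Each row of `weights` sums to 1, so every point receives its own convex combination of the semantic features, which is one way to read "each point selects the most valuable semantic information".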

Installation

Clone this repository and install the required packages:

Install Python dependencies.


git clone https://github.com/QINGQINGLE/MESC-3D.git
cd MESC-3D

conda create -n mesc3d python=3.9
conda activate mesc3d
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

pip install -r requirements.txt

Compile PyTorch 3rd-party modules.


cd package/Pointnet2_PyTorch-master/
pip install -e .
pip install pointnet2_ops_lib/.

cd -
cd package/KNN_CUDA-master/
make && make install
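After the steps above, a quick check that the packages can be found may save a failed training run. The snippet below is a minimal sketch using only the standard library; the import names `pointnet2_ops` and `knn_cuda` are assumed to be the module names these two packages install, and `check_modules` is a hypothetical helper, not part of this repository.

```python
import importlib.util


def check_modules(names):
    """Return {module_name: True/False} without importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumed import names for the packages installed above.
required = ["torch", "torchvision", "pointnet2_ops", "knn_cuda"]
status = check_modules(required)
for name, found in status.items():
    print(f"{name:15s} {'ok' if found else 'MISSING'}")
```

A module reported as MISSING usually means the corresponding install step failed or ran in a different conda environment.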

CLIP Usage

The following steps install CLIP and modify its text encoder.


pip install git+https://github.com/openai/CLIP.git
Or
pip install clip

In the installed CLIP package (clip/model.py), replace

def encode_text(self, text):
    x = self.token_embedding(text).type(self.dtype)  # [batch_size, n_ctx, d_model]
    x = x + self.positional_embedding.type(self.dtype)
    ...

with

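def encode_token(self, token):
    # Embedding lookup only; the positional embedding is added in encode_text.
    x = self.token_embedding(token)
    return x

def encode_text(self, text, token):
    # `text` is now an already-embedded tensor (e.g. from encode_token), not token ids.
    #x = self.token_embedding(text).type(self.dtype)  # [batch_size, n_ctx, d_model]
    x = text.type(self.dtype) + self.positional_embedding.type(self.dtype)
    ...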
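One plausible reason for this split (an assumption in the style of prompt-learning methods, not something this README states) is that the caller can edit the token embeddings, for example by swapping in learned context vectors, before the positional embedding is added. The NumPy mock below mimics only the two modified lines; `ToyTextEncoder`, its sizes, and the `ctx` vectors are hypothetical stand-ins and do not use the real CLIP API.

```python
import numpy as np


class ToyTextEncoder:
    """Minimal stand-in for the two modified CLIP methods (not the real model)."""

    def __init__(self, vocab_size=100, n_ctx=8, d_model=4, seed=0):
        rng = np.random.default_rng(seed)
        self.token_embedding = rng.standard_normal((vocab_size, d_model))
        self.positional_embedding = rng.standard_normal((n_ctx, d_model))

    def encode_token(self, token):
        # Embedding lookup only: (batch, n_ctx) ids -> (batch, n_ctx, d_model)
        return self.token_embedding[token]

    def encode_text(self, text, token):
        # `text` is already an embedding; `token` (the ids) is unused in this
        # mock because the rest of the real method is elided in the README.
        return text + self.positional_embedding


enc = ToyTextEncoder()
token = np.array([[1, 5, 5, 5, 7, 2, 0, 0]])  # hypothetical token ids
emb = enc.encode_token(token)                  # shape (1, 8, 4)

ctx = np.zeros((3, 4))                         # hypothetical learned context vectors
emb[:, 1:4, :] = ctx                           # overwrite the placeholder positions

x = enc.encode_text(emb, token)
print(x.shape)
```

With the original single-method `encode_text`, the embedding lookup and the positional addition happen in one call, so there is no point at which the embeddings could be modified like this.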
