May 28, 2025
MESC-3D: Mining Effective Semantic Cues for 3D Reconstruction from a Single Image (CVPR 2025)
Paper
This repository contains the code for the paper MESC-3D: Mining Effective Semantic Cues for 3D Reconstruction from a Single Image.
Abstract
In this work, we propose a novel single-image 3D reconstruction method called Mining Effective Semantic Cues for 3D Reconstruction from a Single Image (MESC-3D), which actively mines effective semantic cues from entangled features. Specifically, we design an Effective Semantic Mining Module (ESM) to establish connections between point clouds and image semantic attributes, enabling the point clouds to autonomously select the necessary information. Furthermore, a single image may carry insufficient semantic information, for example because of occlusions. Inspired by the human ability to represent 3D objects using prior knowledge drawn from daily experience, we introduce the 3DSPL module to address this. It incorporates semantic understanding of spatial structures, enabling the model to interpret and reconstruct 3D objects with greater accuracy and realism, closely mirroring human perception of complex 3D environments. Extensive evaluations show that our method achieves significant improvements in reconstruction quality and robustness compared to prior works. Further experiments validate the method's strong generalization capabilities and show that it excels in zero-shot performance on unseen classes.
Method
Overview of MESC-3D. Our network is composed of two main components. (a) The 3DSPL aligns point cloud modality features with text features, aiming to capture the unique 3D geometric characteristics of each category. (b) The ESM establishes a connection between the semantic feature F_i and the 3D point cloud at the i-th stage, allowing each point to autonomously select the most valuable semantic information.
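To make the idea of per-point selection concrete, here is a minimal, dependency-free sketch of scaled dot-product cross-attention, where each point feature acts as a query over a set of semantic features. This is only an illustration of the general mechanism, not the ESM implementation from the paper: the function names `softmax` and `select_semantics` are hypothetical, there are no learned projections, and the toy vectors are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_semantics(point_feats, semantic_feats):
    """For each point feature (query), mix the semantic features by attention weight.

    point_feats:    N vectors of length d, one per 3D point (toy values)
    semantic_feats: M vectors of length d, standing in for the semantic feature F_i
    Returns (selected, weights): N mixed vectors and the N x M attention weights.
    """
    d = len(point_feats[0])
    selected, weights = [], []
    for q in point_feats:
        # Similarity between this point and every semantic vector.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in semantic_feats]
        w = softmax(scores)
        # Weighted sum of semantic vectors: the information this point "selects".
        mixed = [sum(w_j * k[c] for w_j, k in zip(w, semantic_feats)) for c in range(d)]
        selected.append(mixed)
        weights.append(w)
    return selected, weights


# Toy data (assumed): two points, three semantic vectors, feature size 2.
points = [[1.0, 0.0], [0.0, 1.0]]
semantics = [[4.0, 0.0], [0.0, 4.0], [1.0, 1.0]]
selected, weights = select_semantics(points, semantics)
for w in weights:
    print([round(x, 3) for x in w])
```

Each row of `weights` sums to 1, and each point puts most of its weight on the semantic vector it is most similar to. A real module would typically add learned query, key, and value projections and operate on batched tensors.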
Installation
Clone this repository and install the required packages:
- Install Python dependencies.
git clone https://github.com/QINGQINGLE/MESC-3D.git
cd MESC-3D
conda create -n mesc3d python=3.9
conda activate mesc3d
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
- Compile PyTorch 3rd-party modules.
cd package/Pointnet2_PyTorch-master/
pip install -e .
pip install pointnet2_ops_lib/.
cd -
cd package/KNN_CUDA-master/
make && make install
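After compiling, it can help to confirm that Python can locate the packages before moving on. The short check below is a sketch under assumptions: the import names `pointnet2_ops` and `knn_cuda` are the names these third-party projects usually expose and are not stated in this README, and `find_missing` is a hypothetical helper. It only tests that the modules can be found, not that their CUDA kernels load correctly.

```python
import importlib.util

# Assumed import names; adjust if your installation exposes different ones.
REQUIRED_MODULES = ["torch", "pointnet2_ops", "knn_cuda"]


def find_missing(modules):
    """Return the names in `modules` that Python cannot locate on the current path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All listed modules were found.")
```

Run it inside the activated `mesc3d` environment. If a module is reported as not found, rerun the corresponding install step above.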
- CLIP usage. Install CLIP, then modify its text encoder as described in this step.
pip install git+https://github.com/openai/CLIP.git
Or
pip install clip
Replace
def encode_text(self, text):
    x = self.token_embedding(text).type(self.dtype)  # [batch_size, n_ctx, d_model]
    x = x + self.positional_embedding.type(self.dtype)
    ...
with
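def encode_token(self, token):
    # Embedding lookup only: token ids -> token embeddings.
    x = self.token_embedding(token)
    return x

def encode_text(self, text, token):
    # `text` is now expected to hold token embeddings (e.g. from encode_token), not token ids.
    # x = self.token_embedding(text).type(self.dtype)  # [batch_size, n_ctx, d_model]
    x = text.type(self.dtype) + self.positional_embedding.type(self.dtype)
    ...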
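The toy class below illustrates one plausible reading of why the modification splits embedding lookup from encoding: once `encode_text` accepts embeddings instead of ids, custom vectors can be swapped into the sequence before encoding, while the ids are still needed to find the end-of-text position that the original CLIP pools from (it takes the argmax of the token ids). `ToyTextEncoder` is a hypothetical stand-in, not the real CLIP class. It handles a single sequence, skips the transformer layers, and uses made-up embedding tables. Whether the elided part of the modified `encode_text` uses `token` in exactly this way is an assumption.

```python
class ToyTextEncoder:
    """Hypothetical stand-in for a CLIP-style text encoder; not the real CLIP class."""

    def __init__(self, vocab_size, n_ctx, d_model):
        # Deterministic toy tables in place of learned parameters.
        self.token_table = [[float(t + c) for c in range(d_model)] for t in range(vocab_size)]
        self.positional = [[0.1 * p] * d_model for p in range(n_ctx)]

    def encode_token(self, token):
        # Embedding lookup only, mirroring the added encode_token method.
        return [self.token_table[t] for t in token]

    def encode_text(self, text, token):
        # `text` holds embeddings (not ids), so positional embeddings are added directly.
        x = [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(text, self.positional)]
        # The transformer layers would run here; they are omitted in this toy.
        # Pool at the end-of-text position, found as the argmax of the token ids.
        # This is why the ids must be passed in alongside the embeddings.
        eot = max(range(len(token)), key=lambda i: token[i])
        return x[eot]


enc = ToyTextEncoder(vocab_size=10, n_ctx=4, d_model=2)
token = [1, 5, 9, 0]          # 9 plays the role of the end-of-text id (the highest id)
emb = enc.encode_token(token)
emb[1] = [0.5, -0.5]          # swap in a custom vector at position 1 before encoding
feature = enc.encode_text(emb, token)
print(feature)
```

With the real, modified CLIP model, the analogous two-step call would first obtain embeddings with `encode_token` and then pass both the embeddings and the original token ids to `encode_text`.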
Dataset
Pretrained models
We provide the following pretrained models: BaseModel, ProModel, ULIPModel, etc. Please download them from Google Drive.
Training
Testing
The remaining code is on the way.