Replication Package for RealClassEval
October 31, 2025
This repository contains the replication package for the paper:
Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation
Authors: Musfiqur Rahman, SayedHassan Khatoonabadi, Emad Shihab
Published at: arXiv.org, 2025
The goal of this repository is to ensure reproducibility of all experiments, figures, and results presented in the paper.
Clone the repository
git clone https://github.com/mrsumitbd/RealClassEval-Replication.git
cd RealClassEval-Replication
Repository Structure
RealClassEval-Replication/
├── notebooks/                            # Jupyter notebooks for experiments, analysis, and figures
│   └── plot_generator.ipynb
│
├── src/                                  # Python source code (modules, utilities, pipelines)
│   ├── __init__.py
│   ├── rag/
│   ├── ...
│   └── utils.py
│
├── data/                                 # Placeholder for datasets and metadata
│   ├── functional_correctness_data/
│   └── generated_code/
│
├── results/                              # Output results (figures, metrics, etc.)
│   ├── rq1/
│   ├── ...
│   └── rq4/
│
├── rag_experiments/                      # All files generated while running the RAG experiments
│
├── functional_correctness_test_folder/   # Where the functional correctness tests were run; kept separate for easier access and organization
│
├── setup.sh                              # Setup script for Linux/macOS
├── .gitignore                            # Ignore unnecessary files
├── .env.example                          # Template for environment variables
├── requirements.txt                      # Python dependencies
├── environment.yml                       # Conda environment file
├── README.md                             # Documentation
└── LICENSE                               # License file
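To confirm that a fresh clone matches the layout listed above, a small check along the following lines may help. This is only an illustrative sketch and not part of the repository: the function name `missing_entries` is hypothetical, and some directories (for example `rag_experiments/`) may only appear after experiments have been run.

```python
from pathlib import Path

# Top-level entries copied from the repository structure listed above.
EXPECTED_ENTRIES = [
    "notebooks", "src", "data", "results", "rag_experiments",
    "functional_correctness_test_folder", "setup.sh", ".gitignore",
    ".env.example", "requirements.txt", "environment.yml",
    "README.md", "LICENSE",
]


def missing_entries(repo_root, expected=EXPECTED_ENTRIES):
    """Return the expected top-level entries that are absent from repo_root.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    root = Path(repo_root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("Repository layout matches the documented structure.")
```

Run it from the repository root; any names it prints are entries from the documented tree that were not found on disk.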
Setup Instructions (Linux/macOS)
Option 1: Quick setup (recommended)
bash setup.sh
This will:
- Verify that Python 3.11 is installed
- Create a virtual environment in venv/
- Install all dependencies from requirements.txt
After running the script, activate the environment manually:
source venv/bin/activate
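Once the environment is activated, you can sanity-check it from Python. The sketch below is an assumption about what a useful check looks like, not a copy of what setup.sh does: it compares the interpreter version against 3.11 and uses the standard `sys.prefix` versus `sys.base_prefix` comparison to detect a venv. The function names are hypothetical. The venv test applies to Options 1 and 2; a Conda environment would not be detected this way.

```python
import sys

REQUIRED = (3, 11)  # Python version named in the setup instructions


def is_required_python(version_info=sys.version_info, required=REQUIRED):
    """True when the interpreter's major.minor version equals `required`."""
    return tuple(version_info[:2]) == required


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return prefix != base_prefix


if __name__ == "__main__":
    print("Python 3.11:", is_required_python())
    print("Virtual environment active:", in_virtualenv())
```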
Option 2: Manual setup
1. Create a virtual environment (Python 3.11)
python3.11 -m venv venv
source venv/bin/activate # On macOS/Linux
2. Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
Option 3: Using Conda
If you prefer Conda, use the provided environment.yml:
# Create the environment
conda env create -f environment.yml
# Activate it
conda activate OpenClassGen-replication
After setting up the environment (using any of the three options above), create a .env file in the repository root by copying .env.example:
cp .env.example .env # Linux/macOS
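If the template later gains new variables, it can be handy to see which ones your local .env lacks. The following is a minimal sketch that assumes both files contain simple `KEY=VALUE` lines (optionally prefixed with `export`); the helper names `env_keys` and `missing_env_keys` are hypothetical, and the actual variable names are whatever .env.example defines.

```python
from pathlib import Path


def env_keys(path):
    """Collect variable names from a file of simple KEY=VALUE lines."""
    keys = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines without an assignment
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.add(key)
    return keys


def missing_env_keys(example_path=".env.example", env_path=".env"):
    """Return names defined in the template but absent from the local file."""
    return sorted(env_keys(example_path) - env_keys(env_path))
```

Calling `missing_env_keys()` from the repository root returns a sorted list of variable names; an empty list means every name in the template also appears in .env. It does not check whether the values are filled in.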
System Dependencies
In addition to the Python environment, some scripts require external tools.
cloc (Count Lines of Code)
This project uses cloc to count lines of code.
You need to install it separately on your system.
macOS
If you use Homebrew:
brew install cloc
Linux (Debian/Ubuntu)
sudo apt-get update
sudo apt-get install cloc
Linux (Fedora/RHEL)
sudo dnf install cloc
Once installed, you can verify with:
cloc --version
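The same check can be done from Python before launching scripts that depend on cloc. This is a sketch under the assumption that a simple PATH lookup is sufficient; `tool_version` is a hypothetical helper, not something the repository provides.

```python
import shutil
import subprocess


def tool_version(name="cloc"):
    """Return the output of `<name> --version`, or None if the tool is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = tool_version()
    if version is None:
        print("cloc was not found on PATH; install it as described above.")
    else:
        print("cloc version:", version)
```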
Datasets
Datasets are included in the data/ folder.
Results
All results (figures, tables) are stored in the results/ directory.
Pre-generated results are provided for reference where possible.
License
This project is licensed under the MIT License.
Citation
If you use this replication package, please cite our paper:
@misc{rahman2025syntheticbenchmarksevaluatingllm,
title={Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation},
author={Musfiqur Rahman and SayedHassan Khatoonabadi and Emad Shihab},
year={2025},
eprint={2510.26130},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2510.26130},
}