BBScore calculation
March 11, 2025
This repository contains the code for the paper "BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence", published at AAAI 2024.
To calculate BBScore, follow these steps:
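- Train a Brownian bridge (BB) encoder that projects text into a latent space.
- At inference time, first pass the training set through the encoder to estimate the Brownian bridge parameter sigma.
- Pass the text to be scored, together with the estimated sigma, to compute its BBScore.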
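The sigma-estimation step can be illustrated with a minimal sketch. It assumes each document has already been encoded as a list of per-sentence latent vectors z_0, ..., z_T (plain lists of floats), uses the sentence index as the bridge time t, and assumes a Brownian bridge pinned at the first and last latent, whose marginal at time t has mean (1 - t/T) z_0 + (t/T) z_T and per-dimension variance sigma^2 * t(T - t)/T. The function name `estimate_sigma` is hypothetical, and the moment-matching estimator (pooled over dimensions and interior time steps) is an illustrative choice, not necessarily the estimator used in this repository.

```python
import math
from typing import List, Sequence

# A document is a sequence of latent vectors, one per sentence (assumed layout).
Doc = Sequence[Sequence[float]]


def estimate_sigma(train_docs: List[Doc]) -> float:
    """Hypothetical helper: moment-matching estimate of the Brownian bridge sigma.

    Under the assumed bridge pinned at z_0 and z_T, each interior coordinate
    z_t[j] has mean (1 - t/T) z_0[j] + (t/T) z_T[j] and variance
    sigma^2 * t(T - t)/T, so (z_t[j] - mean)^2 / (t(T - t)/T) estimates sigma^2.
    """
    total, count = 0.0, 0
    for doc in train_docs:
        T = len(doc) - 1
        if T < 2:
            continue  # no interior time steps to learn from
        z0, zT = doc[0], doc[-1]
        for t in range(1, T):
            a = t / T
            var_factor = t * (T - t) / T
            for x0, xt, xT in zip(z0, doc[t], zT):
                mean = (1 - a) * x0 + a * xT
                total += (xt - mean) ** 2 / var_factor
                count += 1
    if count == 0:
        raise ValueError("need at least one document with 3 or more sentences")
    return math.sqrt(total / count)


if __name__ == "__main__":
    # Toy 1-D latents: the middle point deviates by 1.0 from the bridge mean 0.0,
    # and the variance factor at t=1, T=2 is 0.5, so the estimate is sqrt(2).
    docs = [[[0.0], [1.0], [0.0]]]
    print(estimate_sigma(docs))
```

Documents with fewer than three sentences are skipped because a bridge pinned at both ends has no interior points to compare against its mean.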
Due to file size constraints, we provide a pre-trained encoder trained on the WikiSection training data with a latent dimension of 8. If you use this encoder, you can skip directly to step 2 (Compute BBScore). However, because this encoder is domain-specific, we encourage you to train a custom encoder on your own data for best performance.
1. Training the encoder
- Create a virtual environment:
conda create -n bbscore python=3.10.10
- Activate the environment (conda activate bbscore) and install the required libraries:
pip install -r requirements.txt
- In src/encoder/config/config.yaml, update the paths and other settings as needed.
- Run src/encoder/train_encoder.py to train a BB encoder from a pretrained model (i.e., GPT-2).
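For intuition about what the training step optimizes, here is a minimal, dependency-free sketch of a Brownian bridge contrastive loss in the style of Time Control (Wang et al., 2022, "Language modeling via stochastic processes"). It assumes the encoder is trained so that the latent of an intermediate sentence z_t lies close to the bridge mean between the first and last latents, and that negatives are latents of other sentences scored at the same time t. Both the objective and the names `bridge_logit` and `bb_contrastive_loss` are assumptions for illustration; consult src/encoder/train_encoder.py for the loss actually used. A real implementation would operate on batched tensors from the GPT-2-based encoder rather than Python lists.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def bridge_logit(z0: Vec, zt: Vec, zT: Vec, t: int, T: int) -> float:
    """Score zt as the time-t point of a bridge from z0 (t=0) to zT (t=T).

    Returns -||zt - mean_t||^2 / (2 * t(T - t)/T), i.e. a Gaussian
    log-density up to a constant, using the bridge variance factor t(T - t)/T.
    """
    if not 0 < t < T:
        raise ValueError("t must be an interior time step (0 < t < T)")
    a = t / T
    var = t * (T - t) / T
    sq = sum((x - ((1 - a) * x0 + a * xT)) ** 2 for x0, x, xT in zip(z0, zt, zT))
    return -sq / (2 * var)


def bb_contrastive_loss(z0: Vec, zt: Vec, zT: Vec, t: int, T: int,
                        negatives: List[Vec]) -> float:
    """InfoNCE-style loss (hypothetical): the true zt should outscore negatives."""
    logits = [bridge_logit(z0, zt, zT, t, T)]
    logits += [bridge_logit(z0, zn, zT, t, T) for zn in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_sum_exp - logits[0]  # equals -log softmax(logits)[0], always >= 0


if __name__ == "__main__":
    # Positive sits exactly on the bridge mean; one negative is far away,
    # so the loss should be close to zero.
    print(bb_contrastive_loss([0.0], [0.5], [1.0], t=1, T=2, negatives=[[10.0]]))
```

Minimizing this loss pulls each intermediate latent toward the straight-line interpolation of its endpoints, with the largest tolerance in the middle of the document where t(T - t)/T peaks.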
2. Compute BBScore
- Run scripts/get_latents.sh to generate latents for the input text. Note: before running it, edit the script to set the paths to the trained encoder, the training corpus, and the input text, as well as the output directory.
- Once the latents are computed, run src/scores/bbscore.py, specifying the latent directory and the output directory, to get the results.
- The output is a list whose length equals the number of lines (documents) in the input text file.
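The scoring step can be sketched as follows, turning per-document latents plus an estimated sigma into a list with one score per document, so its length matches the number of lines in the input file. The sketch assumes each document's latents are a list of per-sentence vectors (plain lists of floats) and scores a document by the average Gaussian log-density of its interior latents under a Brownian bridge pinned at its first and last latent, so a higher score means a more bridge-like trajectory. This is an illustrative stand-in, not the exact BBScore definition from the paper or src/scores/bbscore.py; the names `bridge_log_likelihood` and `score_documents` are hypothetical.

```python
import math
from typing import List, Sequence

Doc = Sequence[Sequence[float]]  # one latent vector per sentence (assumed)


def bridge_log_likelihood(doc: Doc, sigma: float) -> float:
    """Mean log-density of the interior latents under a Brownian bridge
    pinned at doc[0] (t=0) and doc[-1] (t=T) with scale sigma > 0."""
    T = len(doc) - 1
    if T < 2:
        return math.nan  # no interior sentences: score undefined
    z0, zT = doc[0], doc[-1]
    total = 0.0
    for t in range(1, T):
        a = t / T
        var = sigma ** 2 * t * (T - t) / T
        for x0, xt, xT in zip(z0, doc[t], zT):
            mean = (1 - a) * x0 + a * xT
            total += -0.5 * math.log(2 * math.pi * var) - (xt - mean) ** 2 / (2 * var)
    return total / (T - 1)  # average over interior time steps


def score_documents(docs: List[Doc], sigma: float) -> List[float]:
    """One score per document, in input order."""
    return [bridge_log_likelihood(doc, sigma) for doc in docs]


if __name__ == "__main__":
    docs = [
        [[0.0], [0.5], [1.0]],  # middle latent on the bridge mean
        [[0.0], [2.0], [1.0]],  # middle latent far from it
    ]
    scores = score_documents(docs, sigma=1.0)
    print(len(scores), scores[0] > scores[1])  # 2 True
```

Averaging over interior time steps keeps scores comparable across documents of different lengths; whether and how the repository normalizes is not stated here.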
@inproceedings{sheng2024bbscore,
title={BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence},
author={Sheng, Zhecheng and Zhang, Tianhao and Jiang, Chen and Kang, Dongyeop},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={38},
number={13},
pages={14937--14945},
year={2024}
}