NLP4Code

April 25, 2023

Repository for the NLP4Code project at the LILY lab.

Installation

[Recommended] Create a virtualenv or conda environment:

conda create -n nlp4code python=3.8
conda activate nlp4code

Then, install the dependencies:

pip install -r requirements.txt

(Optional) If at any point you run into Python import problems (e.g., ModuleNotFoundError), try running this in the main (NLP4Code) directory:

export PYTHONPATH=`pwd`

To run LLaMA-based models, you need to install the development version of the transformers library:

pip install git+https://github.com/huggingface/transformers

We use Wandb for experiment tracking. Before running experiments, please register and ask Ansong for an invitation to the Wandb Yale-LILY team. When you are ready to run experiments and log them to the cloud, do the following:

wandb login

Paste your API key to complete the login. When you start running experiments, you should see something like:

wandb: Tracking run with wandb version 0.12.11
wandb: Run data is saved locally in /home/ansongni/Code/NLP4Code/wandb/run-20220309_150158-1ebacxm4
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run mathqa-gpt-finetuning
wandb: ⭐️ View project at https://wandb.ai/yale-lily/unified-codegen
wandb: 🚀 View run at https://wandb.ai/yale-lily/unified-codegen/runs/1ebacxm4

If you want to do some test runs without logging to the cloud, run wandb offline first, as suggested in the output above.
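As an alternative for one-off test runs launched from a script, wandb also reads the WANDB_MODE environment variable. The sketch below is only an illustration and is not part of this repository; the helper name offline_env is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def offline_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: copy an environment and switch wandb to offline mode."""
    env = dict(os.environ if base is None else base)
    # WANDB_MODE=offline tells wandb to keep run data local instead of syncing.
    env["WANDB_MODE"] = "offline"
    return env


# Example: pass the result as `env=` to subprocess.run when launching a test run.
env = offline_env()
print(env["WANDB_MODE"])  # offline
```

Unlike running wandb offline, this only affects processes started with the modified environment, so later runs sync again without any extra step.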

Naming of the experiments

In the *.yaml configuration file, you should see a line like:

default_root_dir: &exp_name results/mathqa-gpt_neo_1.3B-finetuning

The experiment name is taken automatically from the string after the /, and the tags for the experiment are generated automatically by splitting that name on -. In this case, the experiment will be named mathqa-gpt_neo_1.3B-finetuning and the tags will be ["mathqa", "gpt_neo_1.3B", "finetuning"]. Please follow this convention so that the name and tags only need to be written in one place.
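The convention can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: the function name parse_experiment_name is hypothetical, and it assumes the name is everything after the last / in default_root_dir.

```python
from typing import List, Tuple


def parse_experiment_name(default_root_dir: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: derive (name, tags) from a default_root_dir value."""
    # Assumption: the experiment name is whatever follows the last "/".
    exp_name = default_root_dir.rstrip("/").split("/")[-1]
    # Tags come from splitting the name on "-".
    tags = exp_name.split("-")
    return exp_name, tags


name, tags = parse_experiment_name("results/mathqa-gpt_neo_1.3B-finetuning")
print(name)  # mathqa-gpt_neo_1.3B-finetuning
print(tags)  # ['mathqa', 'gpt_neo_1.3B', 'finetuning']
```

Because - is the tag separator, use _ inside a single component (as in gpt_neo_1.3B) to keep it from being split into several tags.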

Fine-tuning

(If you are ready to run experiments, read the previous sections first.) For fine-tuning, run the following in the main directory:

python finetuning/trainer.py fit --config finetuning/training_configs/*.yaml

Testing

There are some basic tests in the tests folder (see the unittest documentation for more). To run all the tests, do:

python -m unittest discover <test_directory>
# or
python -m unittest discover -s <directory> -p '*_test.py'