README.txt

January 20, 2026

Directions on how to set up the code.

Step 1 (Download Datasets):

Waterbirds Dataset:
-- Backgrounds (we used the augmented backgrounds linked in the repo): https://github.com/mymakar/causally_motivated_shortcut_removal/tree/master/waterbirds
-- Birds and Segmentations: https://www.vision.caltech.edu/datasets/cub_200_2011/
KOA Dataset: https://nda.nih.gov/oai
Food Review: https://www.kaggle.com/datasets/snap/amazon-fine-food-reviews

Step 2 (Fill in constants to link dataset files to experiment scripts):

Once the dataset files have been downloaded, create a directory to store the files for each dataset. Next, fill in the appropriate constants in the const.py file with the directory locations. The constants for the dataset files downloaded from the internet have "RAW_DATA_DIR" in the name. Additionally, create a second directory for each dataset to store the generated datasets. The constants for these directories have "DATASET_DIR" in the name.
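The directory layout described in Step 2 could be sketched as follows. This is only an illustration: the dataset keys, the folder layout, and the helper make_dataset_dirs are hypothetical, and the real constant names are whatever const.py defines. Only the "RAW_DATA_DIR" and "DATASET_DIR" suffixes come from this README.

```python
from pathlib import Path

# Hypothetical dataset keys; check const.py for the real constant names.
DATASETS = ("WATERBIRDS", "KOA", "FOOD_REVIEW")


def make_dataset_dirs(base_dir):
    """Create a raw-data and a generated-dataset directory for each dataset.

    Returns a dict mapping an illustrative constant name to the created path.
    """
    base = Path(base_dir)
    constants = {}
    for name in DATASETS:
        for suffix in ("RAW_DATA_DIR", "DATASET_DIR"):
            # Assumed layout: <base>/<dataset>/<raw_data_dir|dataset_dir>
            path = base / name.lower() / suffix.lower()
            path.mkdir(parents=True, exist_ok=True)
            constants[f"{name}_{suffix}"] = str(path)
    return constants
```

Calling make_dataset_dirs("data") returns the created paths, which can then be copied into the matching constants in const.py by hand.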

Step 3 (Process datasets):

Once the constants have been filled in, process the raw data. For the KOA data, first run the extract_zip_data.py and dicom_to_png.py scripts.
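The two KOA preprocessing scripts can be run one after the other with a small wrapper such as the sketch below. It assumes the scripts take no command-line arguments, are run from the repository root, and should run in the order they are listed here. None of this is confirmed by the README, so check the scripts themselves first. The helper names build_commands and run_in_order are hypothetical.

```python
import subprocess
import sys

# Script names from Step 3; the order is assumed to match the listing.
KOA_SCRIPTS = ["extract_zip_data.py", "dicom_to_png.py"]


def build_commands(scripts, python=sys.executable):
    """Return one argument list per script, with no extra arguments (assumed)."""
    return [[python, script] for script in scripts]


def run_in_order(scripts, dry_run=False):
    """Run each script in turn and stop at the first failure."""
    for cmd in build_commands(scripts):
        print("Running:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
```

Calling run_in_order(KOA_SCRIPTS, dry_run=True) prints the commands without executing them, which is a quick way to confirm the order before processing the data.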

Step 4 (Create the datasets):

Run create_datasets.py to create all of the datasets.

Step 5 (Run experiments):

teacher_cross_val.py: performs cross-validation for the teacher models used in TIPMI.
mm_cross_val.py: performs cross-validation for the mediator models used in MBM.
cross_val.py: performs cross-validation for each model.
evaluate.py: generates the final results for each model.