Preprocessing from raw data 从原始数据处理

February 20, 2024 · View on GitHub

The following preprocessing steps can be quite tedious. Please post issues if you cannot run the scripts.
datasets: Amazon
-- Rating file in Files/Small subsets for experimentation
-- Meta files in Per-category files, [metadata], [image features]

There has been an issue with the dataset site lately, as it automatically redirects to an updated version of the dataset. Keep pressing ESC to stop the redirecting action.

Step by step

Performing 5-core filtering, re-indexing - run 0rating2inter.ipynb
Train/valid/test data splitting - run 1spliting.ipynb
Reindexing feature IDs with generated IDs in step 1 - run 2reindex-feat.ipynb
Encoding text/image features - run 3feat-encoder.ipynb
Filling your data description file *.yaml under src/configs/dataset with the generated file names *.inter, *-feat.npy, etc.
Specifying your evaluated dataset by cmd: python -d sports -m BM3.

DualGNN requires additional operation to generate the u-u graph

Run dualgnn-gen-u-u-matrix.py on a dataset baby:
python dualgnn-gen-u-u-matrix.py -d baby
The generated u-u graph should be located in the same dir as the dataset.