Preprocessing from raw data
February 20, 2024
- The following preprocessing steps can be quite tedious. Please open an issue if you cannot run the scripts.
- Datasets: Amazon
-- Rating file in `Files/Small subsets for experimentation`
-- Meta files in `Per-category files`, [metadata], [image features]
The dataset site has recently started redirecting automatically to an updated version of the dataset.
Keep pressing ESC while the page loads to stop the redirect.
Step by step
- Performing 5-core filtering, re-indexing - run `0rating2inter.ipynb`
- Train/valid/test data splitting - run `1spliting.ipynb`
- Reindexing feature IDs with generated IDs in step 1 - run `2reindex-feat.ipynb`
- Encoding text/image features - run `3feat-encoder.ipynb`
- Filling your data description file `*.yaml` under `src/configs/dataset` with the generated file names `*.inter`, `*-feat.npy`, etc.
- Specifying your evaluated dataset by cmd: `python -d sports -m BM3`.
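For readers unfamiliar with the first step, the following is a minimal sketch of 5-core filtering followed by re-indexing. It is not the code in `0rating2inter.ipynb`. The function names `k_core_filter` and `reindex` are hypothetical, and it assumes interactions are given as `(user, item)` pairs with no duplicates.

```python
from collections import Counter


def k_core_filter(interactions, k=5):
    """Repeatedly drop users and items with fewer than k interactions.

    One pass is not enough: removing an item can push a user below k,
    so the filter loops until nothing more is removed.
    """
    rows = list(interactions)
    while True:
        user_counts = Counter(u for u, _ in rows)
        item_counts = Counter(i for _, i in rows)
        kept = [(u, i) for u, i in rows
                if user_counts[u] >= k and item_counts[i] >= k]
        if len(kept) == len(rows):
            return kept
        rows = kept


def reindex(rows):
    """Map raw user/item IDs to consecutive integers starting at 0."""
    user_map, item_map = {}, {}
    indexed = []
    for u, i in rows:
        uid = user_map.setdefault(u, len(user_map))
        iid = item_map.setdefault(i, len(item_map))
        indexed.append((uid, iid))
    return indexed, user_map, item_map


# Toy example with k=2 (hypothetical data, not from the Amazon dataset).
toy = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "x"), ("c", "z")]
filtered = k_core_filter(toy, k=2)
indexed, user_map, item_map = reindex(filtered)
```

In this toy run, item `z` is dropped first because it has a single interaction, which in turn leaves user `c` with one interaction, so `c` is dropped on the next pass. The returned ID maps are the kind of lookup a later step would need in order to align feature files with the new IDs.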
DualGNN requires an additional step to generate the user-user (u-u) graph
- Run `dualgnn-gen-u-u-matrix.py` on a dataset `baby`:
  `python dualgnn-gen-u-u-matrix.py -d baby`
- The generated u-u graph should be located in the same dir as the dataset.
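To give a rough idea of what a u-u graph can look like, here is a minimal sketch that links each user to the users sharing the most interacted items. This is an assumption about the general technique (a co-occurrence graph), not a description of what `dualgnn-gen-u-u-matrix.py` does. The function name `build_user_user_graph` and the `top_k` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def build_user_user_graph(interactions, top_k=10):
    """Return {user: [(neighbor, shared_item_count), ...]} with at most top_k neighbors.

    Assumes `interactions` is an iterable of (user, item) pairs.
    """
    rows = list(interactions)

    # Group users by the items they interacted with.
    item_users = defaultdict(set)
    for user, item in rows:
        item_users[item].add(user)

    # Count, for every pair of distinct users, how many items they share.
    co_counts = defaultdict(Counter)
    for users in item_users.values():
        for u in users:
            for v in users:
                if u != v:
                    co_counts[u][v] += 1

    # Keep the top_k neighbors per user; ties are broken by neighbor ID.
    graph = {}
    for u in {user for user, _ in rows}:
        ranked = sorted(co_counts[u].items(), key=lambda kv: (-kv[1], kv[0]))
        graph[u] = ranked[:top_k]
    return graph


# Toy example with re-indexed IDs (hypothetical data).
toy = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1)]
uu_graph = build_user_user_graph(toy, top_k=1)
```

The pairwise loop is quadratic in the number of users per item, so a sketch like this only suits small datasets. A real script would more likely rely on sparse matrix operations.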