
April 29, 2022

RecLearn


RecLearn (Recommender Learning), which summarizes the contents of the master branch of Recommender System with TF2.0, is a recommendation learning framework based on Python and TensorFlow 2.x for students and beginners. Of course, if you are more comfortable with the master branch, you can clone the entire package, run the algorithms in example, and update or modify the contents of model and layer. The implemented recommendation algorithms are classified according to the two application stages in industry:

  • matching recommendation stage (Top-K recommendation)
  • ranking recommendation stage (CTR prediction)

Update

04/23/2022: updated all matching models.

Installation

Package

RecLearn is on PyPI, so you can install it with pip:

pip install reclearn

Dependencies:

  • Python 3.8+
  • TensorFlow 2.5+ (GPU or CPU)
  • scikit-learn 0.23+

Local

Clone RecLearn locally:

git clone -b reclearn git@github.com:ZiyaoGeng/RecLearn.git

Quick Start

In example, we provide a demo for each of the implemented models.

Matching

1. Divide the dataset.

Set the path of the raw dataset:

file_path = 'data/ml-1m/ratings.dat'

Divide the dataset into training, validation, and test sets. If you use MovieLens-1M, Amazon-Beauty, Amazon-Games, or STEAM, you can call RecLearn's data/datasets/* methods directly:

train_path, val_path, test_path, meta_path = ml.split_seq_data(file_path=file_path)

meta_path is the path of the metafile, which stores the maximum user and item indices.
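For intuition, the split is a per-user leave-one-out scheme: each user's last interaction goes to the test set and the second-to-last to validation. Below is a minimal sketch of that idea; `split_seq_data` here is a hypothetical standalone helper for illustration, not RecLearn's implementation:

```python
from collections import defaultdict

def split_seq_data(interactions):
    """Leave-one-out split per user.

    interactions: list of (user, item) pairs in chronological order.
    Returns per-user train lists plus single validation/test items.
    """
    seqs = defaultdict(list)
    for user, item in interactions:
        seqs[user].append(item)
    train, val, test = {}, {}, {}
    for user, items in seqs.items():
        if len(items) < 3:
            train[user] = items  # too short to split; keep all for training
            continue
        train[user] = items[:-2]   # everything but the last two interactions
        val[user] = items[-2]      # second-to-last interaction
        test[user] = items[-1]     # last interaction
    return train, val, test
```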

2. Load the dataset.

Load the training, validation, and test datasets, and generate several negative samples (by random sampling) for each positive sample. The data format is a dictionary:

data = {'pos_item':, 'neg_item': , ['user': , 'click_seq': ,...]}

If you are building a sequential recommendation model, you also need to include click sequences. RecLearn provides data-loading methods for the four datasets above:

# general recommendation model
train_data = ml.load_data(train_path, neg_num, max_item_num)
# sequential recommendation model, also using the user feature
train_data = ml.load_seq_data(train_path, "train", seq_len, neg_num, max_item_num, contain_user=True)
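The random negative sampling mentioned above can be sketched as follows; `sample_negatives` is a hypothetical helper for illustration, not part of the RecLearn API:

```python
import random

def sample_negatives(pos_item, neg_num, max_item_num):
    """Draw neg_num random item ids in [1, max_item_num] that differ
    from the positive item, mirroring the random sampling that the
    data-loading methods perform for each positive sample."""
    negs = []
    while len(negs) < neg_num:
        candidate = random.randint(1, max_item_num)
        if candidate != pos_item:
            negs.append(candidate)
    return negs
```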

3. Set hyper-parameters.

The model's hyperparameters must be specified. Take the BPR model as an example:

model_params = {
        'user_num': max_user_num + 1,
        'item_num': max_item_num + 1,
        'embed_dim': FLAGS.embed_dim,
        'use_l2norm': FLAGS.use_l2norm,
        'embed_reg': FLAGS.embed_reg
    }
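`FLAGS` above comes from the command-line flags used in the example scripts. If you are not using flags, plain values work just as well. The numbers below are illustrative defaults, not tuned settings (the user/item counts shown are the well-known MovieLens-1M sizes; in practice, read them from the metafile):

```python
# Plain-value equivalent of the FLAGS-based setup above.
max_user_num, max_item_num = 6040, 3952  # ml-1m sizes; normally read from meta_path

model_params = {
    'user_num': max_user_num + 1,  # +1 so the maximum index is a valid embedding row
    'item_num': max_item_num + 1,
    'embed_dim': 64,               # illustrative embedding size
    'use_l2norm': False,
    'embed_reg': 1e-6,             # illustrative embedding regularization weight
}
```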

4. Build and compile the model.

Select or build the model you need and compile it. Take BPR as an example:

model = BPR(**model_params)
model.compile(optimizer=Adam(learning_rate=FLAGS.learning_rate))

If you are unsure about the model's structure, call the summary method after compilation to print it:

model.summary()
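For reference, BPR optimizes a pairwise objective, -log σ(score_pos − score_neg), pushing each positive item to score above its sampled negative. A minimal pure-Python illustration of that loss (not RecLearn's implementation):

```python
import math

def bpr_loss(pos_scores, neg_scores):
    """BPR pairwise loss: -log(sigmoid(pos - neg)), averaged over pairs.

    The loss approaches 0 when positives score far above negatives,
    and equals log(2) when the two scores are tied.
    """
    total = 0.0
    for pos, neg in zip(pos_scores, neg_scores):
        total += -math.log(1.0 / (1.0 + math.exp(-(pos - neg))))
    return total / len(pos_scores)
```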

5. Train the model and evaluate it on the test dataset.

from time import time  # time() is used below to report fit/evaluate durations

for epoch in range(1, epochs + 1):
    t1 = time()
    model.fit(
        x=train_data,
        epochs=1,
        validation_data=val_data,
        batch_size=batch_size
    )
    t2 = time()
    eval_dict = eval_pos_neg(model, test_data, ['hr', 'mrr', 'ndcg'], k, batch_size)
    print('Iteration %d Fit [%.1f s], Evaluate [%.1f s]: HR = %.4f, MRR = %.4f, NDCG = %.4f'
          % (epoch, t2 - t1, time() - t2, eval_dict['hr'], eval_dict['mrr'], eval_dict['ndcg']))
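The metrics reported by eval_pos_neg rank each positive test item against its sampled negatives. A minimal sketch of how HR@k, MRR@k, and NDCG@k are computed for one test case (`rank_metrics` is a hypothetical helper, not the RecLearn API):

```python
import math

def rank_metrics(pos_score, neg_scores, k):
    """HR@k, MRR@k, NDCG@k for one test case with a single positive item
    ranked against its negative candidates."""
    rank = 1 + sum(1 for s in neg_scores if s > pos_score)  # 1-based rank
    if rank > k:
        return 0.0, 0.0, 0.0          # positive item missed the top-k list
    hr = 1.0                          # hit: positive is within the top k
    mrr = 1.0 / rank                  # reciprocal of the positive's rank
    ndcg = 1.0 / math.log2(rank + 1)  # DCG of a single relevant item
    return hr, mrr, ndcg
```

Averaging these values over all test cases gives the HR, MRR, and NDCG numbers printed above.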

Ranking

Waiting......

Results

The experimental environment used by RecLearn differs from that of some papers, so the results may deviate somewhat. Please refer to Experiments for details.

Matching

| Model | ml-1m HR@10 | ml-1m MRR@10 | ml-1m NDCG@10 | Beauty HR@10 | Beauty MRR@10 | Beauty NDCG@10 | STEAM HR@10 | STEAM MRR@10 | STEAM NDCG@10 |
|---|---|---|---|---|---|---|---|---|---|
| BPR | 0.5768 | 0.2392 | 0.3016 | 0.3708 | 0.2108 | 0.2485 | 0.7728 | 0.4220 | 0.5054 |
| NCF | 0.5834 | 0.2219 | 0.3060 | 0.5448 | 0.2831 | 0.3451 | 0.7768 | 0.4273 | 0.5103 |
| DSSM | 0.5498 | 0.2148 | 0.2929 | - | - | - | - | - | - |
| YoutubeDNN | 0.6737 | 0.3414 | 0.4201 | - | - | - | - | - | - |
| MIND (Error) | 0.6366 | 0.2597 | 0.3483 | - | - | - | - | - | - |
| GRU4Rec | 0.7969 | 0.4698 | 0.5483 | 0.5211 | 0.2724 | 0.3312 | 0.8501 | 0.5486 | 0.6209 |
| Caser | 0.7916 | 0.4450 | 0.5280 | 0.5487 | 0.2884 | 0.3501 | 0.8275 | 0.5064 | 0.5832 |
| SASRec | 0.8103 | 0.4812 | 0.5605 | 0.5230 | 0.2781 | 0.3355 | 0.8606 | 0.5669 | 0.6374 |
| AttRec | 0.7873 | 0.4578 | 0.5363 | 0.4995 | 0.2695 | 0.3229 | - | - | - |
| FISSA | 0.8106 | 0.4953 | 0.5713 | 0.5431 | 0.2851 | 0.3462 | 0.8635 | 0.5682 | 0.6391 |

Ranking

| Model | Criteo (5M subset) Log Loss | Criteo (5M subset) AUC | Criteo Log Loss | Criteo AUC |
|---|---|---|---|---|
| FM | 0.4765 | 0.7783 | 0.4762 | 0.7875 |
| FFM | - | - | - | - |
| WDL | 0.4684 | 0.7822 | 0.4692 | 0.7930 |
| Deep Crossing | 0.4670 | 0.7826 | 0.4693 | 0.7935 |
| PNN | - | 0.7847 | - | - |
| DCN | - | 0.7823 | 0.4691 | 0.7929 |
| NFM | 0.4773 | 0.7762 | 0.4723 | 0.7889 |
| AFM | 0.4819 | 0.7808 | 0.4692 | 0.7871 |
| DeepFM | - | 0.7828 | 0.4650 | 0.8007 |
| xDeepFM | 0.4690 | 0.7839 | 0.4696 | 0.7919 |

Model List

1. Matching Stage

| Paper | Model | Published | Author |
|---|---|---|---|
| BPR: Bayesian Personalized Ranking from Implicit Feedback | MF-BPR | UAI, 2009 | Steffen Rendle |
| Neural Collaborative Filtering | NCF | WWW, 2017 | Xiangnan He |
| Learning Deep Structured Semantic Models for Web Search using Clickthrough Data | DSSM | CIKM, 2013 | Po-Sen Huang |
| Deep Neural Networks for YouTube Recommendations | YoutubeDNN | RecSys, 2016 | Paul Covington |
| Session-based Recommendations with Recurrent Neural Networks | GRU4Rec | ICLR, 2016 | Balázs Hidasi |
| Self-Attentive Sequential Recommendation | SASRec | ICDM, 2018 | UCSD |
| Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding | Caser | WSDM, 2018 | Jiaxi Tang |
| Next Item Recommendation with Self-Attentive Metric Learning | AttRec | AAAI, 2019 | Shuai Zhang |
| FISSA: Fusing Item Similarity Models with Self-Attention Networks for Sequential Recommendation | FISSA | RecSys, 2020 | Jing Lin |

2. Ranking Stage

| Paper | Model | Published | Author |
|---|---|---|---|
| Factorization Machines | FM | ICDM, 2010 | Steffen Rendle |
| Field-aware Factorization Machines for CTR Prediction | FFM | RecSys, 2016 | Criteo Research |
| Wide & Deep Learning for Recommender Systems | WDL | DLRS, 2016 | Google Inc. |
| Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features | Deep Crossing | KDD, 2016 | Microsoft Research |
| Product-based Neural Networks for User Response Prediction | PNN | ICDM, 2016 | Shanghai Jiao Tong University |
| Deep & Cross Network for Ad Click Predictions | DCN | ADKDD, 2017 | Stanford University, Google Inc. |
| Neural Factorization Machines for Sparse Predictive Analytics | NFM | SIGIR, 2017 | Xiangnan He |
| Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks | AFM | IJCAI, 2017 | Zhejiang University, National University of Singapore |
| DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | DeepFM | IJCAI, 2017 | Harbin Institute of Technology, Noah's Ark Research Lab (Huawei) |
| xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems | xDeepFM | KDD, 2018 | University of Science and Technology of China |
| Deep Interest Network for Click-Through Rate Prediction | DIN | KDD, 2018 | Alibaba Group |

Discussion

  1. If you have any suggestions or questions about the project, you can leave a comment in Issues.
  2. wechat: