PyTorch RoFormer & RoFormer-V2

May 18, 2022

The RoFormer and RoFormer-V2 models.

Updates

  • 2022/05/18

Added training results for the paddle version of RoFormerV2 on the classification tasks.

  • 2022/05/11

Thanks to Jianlin Su for the reminder: added a note clarifying that RoFormerV2* denotes the RoFormerV2 model trained without multi-task learning.

  • 2022/05/01

Added code and dev-set results for the CLUE classification tasks. The code lives in the examples/clue folder; install whatever dependencies are missing, e.g. pip install -U accelerate.

  • 2022/04/30

One detail to note: when fine-tuning, Jianlin Su sets token_type_id to 0 regardless of whether the input is a single text or a text pair.

To stay consistent with his setup, pass return_token_type_ids=False when calling the tokenizer; the model then handles the token type IDs internally (filling them with zeros).

Otherwise, for text-pair inputs the tokenizer returns token_type_id values of both 0 and 1. A minimal call is sketched below.
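
A minimal sketch of that tokenizer call (the checkpoint name is just one of the models from the weight table further down):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("junnyu/roformer_v2_chinese_char_base")
# With return_token_type_ids=False the tokenizer omits token_type_ids, and the
# model builds an all-zero tensor internally for both text and text-pair inputs.
inputs = tokenizer("句子一", "句子二", return_token_type_ids=False, return_tensors="pt")
print(inputs.keys())  # no token_type_ids key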

  • 2022/04/02

(1) Updated RoFormerForCausalLM to support roformer-sim and added an example; see examples/test_sim.py.

(2) Simplified the apply_rotary implementation.

def apply_rotary(x, sinusoidal_pos=None):
    if sinusoidal_pos is None:
        return x
    sin, cos = sinusoidal_pos
    # x.shape: [batch, num_heads, seq_len, head_size]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Each (x1, x2) pair is rotated by the 2x2 matrix:
    # [cos_nθ, -sin_nθ] [x1]
    # [sin_nθ,  cos_nθ] [x2]
    # => [x1 * cos_nθ - x2 * sin_nθ, x1 * sin_nθ + x2 * cos_nθ]
    # Su's original rotary implementation interleaves the rotated pairs:
    # return torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1).flatten(-2, -1)
    # Since attention scores are dot products over the last dimension,
    # torch.einsum("bhmd,bhnd->bhmn", q, k), any fixed permutation applied to both
    # q and k leaves the scores unchanged, so the two halves can simply be
    # concatenated instead of interleaved.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
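
The dot-product invariance above is the whole justification for switching from stack+flatten to cat, so here is a quick standalone check (not from the repository) that both layouts yield identical attention scores when applied to q and k alike:

import torch

batch, heads, seq_len, head_size = 2, 4, 8, 16
q = torch.randn(batch, heads, seq_len, head_size)
k = torch.randn(batch, heads, seq_len, head_size)
theta = torch.randn(seq_len, head_size // 2)  # arbitrary angles for the test
sin, cos = theta.sin(), theta.cos()

def rotate_interleaved(x):
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1).flatten(-2, -1)

def rotate_concat(x):
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

scores_a = torch.einsum("bhmd,bhnd->bhmn", rotate_interleaved(q), rotate_interleaved(k))
scores_b = torch.einsum("bhmd,bhnd->bhmn", rotate_concat(q), rotate_concat(k))
print(torch.allclose(scores_a, scores_b, atol=1e-5))  # True
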
  • 2022/03/21 Added the roformer-v2 weights. Note: you must use this repository's code; the code in the transformers library will not work with them!

Installation

# v2
pip install "roformer>=0.4.3"
# v1 (the code has been merged into the huggingface transformers repository; use a recent version)
pip install -U transformers
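
As a quick sanity check of the install, one can load a checkpoint from the weight table below (a minimal sketch; RoFormerForMaskedLM is the same class used in the MLM tests later in this README):

from transformers import BertTokenizer
from roformer import RoFormerForMaskedLM

tokenizer = BertTokenizer.from_pretrained("junnyu/roformer_v2_chinese_char_base")
model = RoFormerForMaskedLM.from_pretrained("junnyu/roformer_v2_chinese_char_base")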

Evaluation results

CLUE dev-set classification results (base and large models).

| model | iflytek | tnews | afqmc | cmnli | ocnli | wsc | csl | avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BERT | 60.06 | 56.80 | 72.41 | 79.56 | 73.93 | 78.62 | 83.93 | 72.19 |
| RoBERTa | 60.64 | 58.06 | 74.05 | 81.24 | 76.00 | 87.50 | 84.50 | 74.57 |
| RoFormer | 60.91 | 57.54 | 73.52 | 80.92 | 76.07 | 86.84 | 84.63 | 74.35 |
| RoFormerV2* | 60.87 | 56.54 | 72.75 | 80.34 | 75.36 | 80.92 | 84.67 | 73.06 |
| GAU-α | 61.41 | 57.76 | 74.17 | 81.82 | 75.86 | 79.93 | 85.67 | 73.8 |
| RoFormer-pytorch (this repo) | 60.60 | 57.51 | 74.44 | 80.79 | 75.67 | 86.84 | 84.77 | 74.37 |
| RoFormerV2-pytorch (this repo) | 62.87 | 59.03 | 76.20 | 80.85 | 79.73 | 87.82 | 91.87 | 76.91 |
| GAU-α-pytorch (Adafactor) | 61.18 | 57.52 | 73.42 | 80.91 | 75.69 | 80.59 | 85.5 | 73.54 |
| GAU-α-pytorch (AdamW wd0.01 warmup0.1) | 60.68 | 57.95 | 73.08 | 81.02 | 75.36 | 81.25 | 83.93 | 73.32 |
| RoFormerV2-large-pytorch (this repo) | 61.75 | 59.21 | 76.14 | 82.35 | 81.73 | 91.45 | 91.5 | 77.73 |
| Chinesebert-large-pytorch | 61.25 | 58.67 | 74.70 | 82.65 | 79.63 | 87.83 | 84.97 | 75.67 |
| RoFormerV2-base-paddle | 63.76 | 59.53 | 77.06 | 81.58 | 81.56 | 87.83 | 86.73 | 76.87 |
| RoFormerV2-large-paddle | 64.02 | 60.08 | 77.92 | 82.87 | 83.9 | 92.43 | 86.87 | 78.30 |

CLUE 1.0 test-set classification results (base and large models).

| model | iflytek | tnews | afqmc | cmnli | ocnli | wsc | csl | avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RoFormer-pytorch (this repo) | 59.54 | 57.34 | 74.46 | 80.23 | 73.67 | 80.69 | 84.57 | 72.93 |
| RoFormerV2-pytorch (this repo) | 63.15 | 58.24 | 75.42 | 80.59 | 74.17 | 83.79 | 83.73 | 74.16 |
| GAU-α-pytorch (Adafactor) | 61.38 | 57.08 | 74.05 | 80.37 | 73.53 | 74.83 | 85.6 | 72.41 |
| GAU-α-pytorch (AdamW wd0.01 warmup0.1) | 60.54 | 57.67 | 72.44 | 80.32 | 72.97 | 76.55 | 84.13 | 72.09 |
| RoFormerV2-large-pytorch (this repo) | 61.85 | 59.13 | 76.38 | 80.97 | 76.23 | 85.86 | 84.33 | 74.96 |
| Chinesebert-large-pytorch | 61.54 | 58.57 | 74.8 | 81.94 | 76.93 | 79.66 | 85.1 | 74.08 |
| RoFormerV2-large-paddle | 64.23 | 59.99 | 76.85 | 81.97 | 76.57 | 84.48 | 83.37 | 75.35 |

Notes:

  • RoFormerV2* denotes the RoFormerV2 model trained without multi-task learning; Jianlin Su has not open-sourced that model (thanks to him for pointing this out).
  • Results without a pytorch suffix are copied from the GAU-alpha repository.
  • Results with a pytorch suffix were trained by us.
  • Su's code classifies directly on the cls token, while this repository uses the classification head below, which adds two dropout calls, one dense layer, and one ReLU activation.
  • The paddle results were tuned with grid search!
import torch.nn as nn
from transformers.activations import ACT2FN

class RoFormerClassificationHead(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.out_proj = nn.Linear(config.hidden_size, config.num_labels)

        self.config = config

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # take the first token (equiv. to [CLS])
        x = self.dropout(x)
        x = self.dense(x)
        x = ACT2FN[self.config.hidden_act](x)  # relu here
        x = self.dropout(x)
        x = self.out_proj(x)
        return x

Tips:

  • Experiments were run on an RTX 3090.

Leaderboard screenshots

RoFormer-sim example

import torch
import numpy as np
from roformer import RoFormerForCausalLM, RoFormerConfig
from transformers import BertTokenizer

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# Available checkpoints:
# junnyu/roformer_chinese_sim_char_small, junnyu/roformer_chinese_sim_char_base
# junnyu/roformer_chinese_sim_char_ft_small, junnyu/roformer_chinese_sim_char_ft_base
pretrained_model = "junnyu/roformer_chinese_sim_char_base"
tokenizer = BertTokenizer.from_pretrained(pretrained_model)
config = RoFormerConfig.from_pretrained(pretrained_model)
config.is_decoder = True
config.eos_token_id = tokenizer.sep_token_id
config.pooler_activation = "linear"
model = RoFormerForCausalLM.from_pretrained(pretrained_model, config=config)
model.to(device)
model.eval()

def gen_synonyms(text, n=100, k=20):
    '''Generate n paraphrases of text and return the k most similar ones.
    Approach: generate candidates with seq2seq decoding, then score and rank
    them with the encoder.
    '''
    # generate candidate similar sentences
    r = []
    inputs1 = tokenizer(text, return_tensors="pt").to(device)
    for _ in range(n):
        # strip spaces and remove the original text from the generated output
        output = tokenizer.batch_decode(model.generate(**inputs1, top_p=0.95, do_sample=True, max_length=128), skip_special_tokens=True)[0].replace(" ", "").replace(text, "")
        r.append(output)

    # rank the candidates with the encoder
    r = [i for i in set(r) if i != text and len(i) > 0]
    r = [text] + r
    inputs2 = tokenizer(r, padding=True, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs2)
        Z = outputs.pooler_output.cpu().numpy()
    Z /= (Z**2).sum(axis=1, keepdims=True)**0.5  # L2-normalize each sentence vector
    argsort = np.dot(Z[1:], -Z[0]).argsort()  # candidate indices, most similar first
    
    return [r[i + 1] for i in argsort[:k]]

out = gen_synonyms("广州和深圳哪个好?")
print(out)
# ['深圳和广州哪个好?',
#  '广州和深圳哪个好',
#  '深圳和广州哪个好',
#  '深圳和广州哪个比较好。',
#  '深圳和广州哪个最好?',
#  '深圳和广州哪个比较好',
#  '广州和深圳那个比较好',
#  '深圳和广州哪个更好?',
#  '深圳与广州哪个好',
#  '深圳和广州,哪个比较好',
#  '广州与深圳比较哪个好',
#  '深圳和广州哪里比较好',
#  '深圳还是广州比较好?',
#  '广州和深圳哪个地方好一些?',
#  '广州好还是深圳好?',
#  '广州好还是深圳好呢?',
#  '广州与深圳哪个地方好点?',
#  '深圳好还是广州好',
#  '广州好还是深圳好',
#  '广州和深圳哪个城市好?']

Model weight mapping

Chinese models: roformer-v2

| huggingface.co | bert4keras |
| --- | --- |
| roformer_v2_chinese_char_small | chinese_roformer-v2-char_L-6_H-384_A-6.zip (download code: ttn4) |
| roformer_v2_chinese_char_base | chinese_roformer-v2-char_L-12_H-768_A-12.zip (download code: pfoh) |
| roformer_v2_chinese_char_large | chinese_roformer-v2-char_L-24_H-1024_A-16.zip (download code: npfv) |

Chinese models: roformer-v1

| huggingface.co | bert4keras |
| --- | --- |
| roformer_chinese_base | chinese_roformer_L-12_H-768_A-12.zip (download code: xy9x) |
| roformer_chinese_small | chinese_roformer_L-6_H-384_A-6.zip (download code: gy97) |
| roformer_chinese_char_base | chinese_roformer-char_L-12_H-768_A-12.zip (download code: bt94) |
| roformer_chinese_char_small | chinese_roformer-char_L-6_H-384_A-6.zip (download code: a44c) |
| roformer_chinese_sim_char_base | chinese_roformer-sim-char_L-12_H-768_A-12.zip (download code: 2cgz) |
| roformer_chinese_sim_char_small | chinese_roformer-sim-char_L-6_H-384_A-6.zip (download code: h68q) |
| roformer_chinese_sim_char_ft_base | chinese_roformer-sim-char-ft_L-12_H-768_A-12.zip (download code: w15n) |
| roformer_chinese_sim_char_ft_small | chinese_roformer-sim-char-ft_L-6_H-384_A-6.zip (download code: gty5) |

English models (small models trained on OpenWebText using the ELECTRA training method, with rotary value = True)

| huggingface.co |
| --- |
| roformer_small_generator |
| roformer_small_discriminator |

RoFormer-v2 MLM test

import torch
import tensorflow as tf
from transformers import BertTokenizer
from roformer import RoFormerForMaskedLM, TFRoFormerForMaskedLM

text = "今天[MASK]很好,我[MASK]去公园玩。"
tokenizer = BertTokenizer.from_pretrained("junnyu/roformer_v2_chinese_char_base")
pt_model = RoFormerForMaskedLM.from_pretrained("junnyu/roformer_v2_chinese_char_base")
tf_model = TFRoFormerForMaskedLM.from_pretrained(
    "junnyu/roformer_v2_chinese_char_base", from_pt=True
)
pt_inputs = tokenizer(text, return_tensors="pt")
tf_inputs = tokenizer(text, return_tensors="tf")
# pytorch
with torch.no_grad():
    pt_outputs = pt_model(**pt_inputs).logits[0]
pt_outputs_sentence = "pytorch: "
for i, id in enumerate(tokenizer.encode(text)):
    if id == tokenizer.mask_token_id:
        tokens = tokenizer.convert_ids_to_tokens(pt_outputs[i].topk(k=5)[1])
        pt_outputs_sentence += "[" + "||".join(tokens) + "]"
    else:
        pt_outputs_sentence += "".join(
            tokenizer.convert_ids_to_tokens([id], skip_special_tokens=True)
        )
print(pt_outputs_sentence)
# tf
tf_outputs = tf_model(**tf_inputs, training=False).logits[0]
tf_outputs_sentence = "tf: "
for i, id in enumerate(tokenizer.encode(text)):
    if id == tokenizer.mask_token_id:
        tokens = tokenizer.convert_ids_to_tokens(tf.math.top_k(tf_outputs[i], k=5)[1])
        tf_outputs_sentence += "[" + "||".join(tokens) + "]"
    else:
        tf_outputs_sentence += "".join(
            tokenizer.convert_ids_to_tokens([id], skip_special_tokens=True)
        )
print(tf_outputs_sentence)
# small
# pytorch: 今天[的||,||是||很||也]很好,我[要||会||是||想||在]去公园玩。
# tf: 今天[的||,||是||很||也]很好,我[要||会||是||想||在]去公园玩。
# base
# pytorch: 今天[我||天||晴||园||玩]很好,我[想||要||会||就||带]去公园玩。
# tf: 今天[我||天||晴||园||玩]很好,我[想||要||会||就||带]去公园玩。
# large
# pytorch: 今天[天||气||我||空||阳]很好,我[又||想||会||就||爱]去公园玩。
# tf: 今天[天||气||我||空||阳]很好,我[又||想||会||就||爱]去公园玩。

RoFormer-v1 MLM test

import torch
import tensorflow as tf
from transformers import RoFormerForMaskedLM, RoFormerTokenizer, TFRoFormerForMaskedLM

text = "今天[MASK]很好,我[MASK]去公园玩。"
tokenizer = RoFormerTokenizer.from_pretrained("junnyu/roformer_chinese_base")
pt_model = RoFormerForMaskedLM.from_pretrained("junnyu/roformer_chinese_base")
tf_model = TFRoFormerForMaskedLM.from_pretrained(
    "junnyu/roformer_chinese_base", from_pt=True
)
pt_inputs = tokenizer(text, return_tensors="pt")
tf_inputs = tokenizer(text, return_tensors="tf")
# pytorch
with torch.no_grad():
    pt_outputs = pt_model(**pt_inputs).logits[0]
pt_outputs_sentence = "pytorch: "
for i, id in enumerate(tokenizer.encode(text)):
    if id == tokenizer.mask_token_id:
        tokens = tokenizer.convert_ids_to_tokens(pt_outputs[i].topk(k=5)[1])
        pt_outputs_sentence += "[" + "||".join(tokens) + "]"
    else:
        pt_outputs_sentence += "".join(
            tokenizer.convert_ids_to_tokens([id], skip_special_tokens=True)
        )
print(pt_outputs_sentence)
# tf
tf_outputs = tf_model(**tf_inputs, training=False).logits[0]
tf_outputs_sentence = "tf: "
for i, id in enumerate(tokenizer.encode(text)):
    if id == tokenizer.mask_token_id:
        tokens = tokenizer.convert_ids_to_tokens(tf.math.top_k(tf_outputs[i], k=5)[1])
        tf_outputs_sentence += "[" + "||".join(tokens) + "]"
    else:
        tf_outputs_sentence += "".join(
            tokenizer.convert_ids_to_tokens([id], skip_special_tokens=True)
        )
print(tf_outputs_sentence)
# pytorch: 今天[天气||天||心情||阳光||空气]很好,我[想||要||打算||准备||喜欢]去公园玩。
# tf:      今天[天气||天||心情||阳光||空气]很好,我[想||要||打算||准备||喜欢]去公园玩。

Manual weight conversion

python convert_roformer_original_tf_checkpoint_to_pytorch.py \
    --tf_checkpoint_path=xxxxxx/chinese_roformer_L-12_H-768_A-12/bert_model.ckpt \
    --bert_config_file=pretrained_models/chinese_roformer_base/config.json \
    --pytorch_dump_path=pretrained_models/chinese_roformer_base/pytorch_model.bin
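
Once converted, the dump can be loaded like any local checkpoint (a sketch; it assumes config.json and pytorch_model.bin sit together in the pretrained_models/chinese_roformer_base directory used above):

from transformers import RoFormerConfig, RoFormerForMaskedLM

config = RoFormerConfig.from_pretrained("pretrained_models/chinese_roformer_base")
model = RoFormerForMaskedLM.from_pretrained("pretrained_models/chinese_roformer_base", config=config)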

TF/PyTorch precision alignment

small model
bert4keras vs pytorch
mean diff : tensor(5.9108e-07)
max diff : tensor(5.7220e-06)
bert4keras vs tf2.0
mean diff : tensor(4.5976e-07)
max diff : tensor(3.5763e-06)

base model
python compare_model.py
bert4keras vs pytorch
mean diff : tensor(4.3340e-07)
max diff : tensor(5.7220e-06)
bert4keras vs tf2.0
mean diff : tensor(3.4319e-07)
max diff : tensor(5.2452e-06)
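
compare_model.py ships with this repository; the numbers above are the statistics it reports. A minimal sketch of the same kind of PyTorch-vs-TF comparison (not the actual script), reusing the classes from the v1 MLM test above:

import torch
from transformers import RoFormerForMaskedLM, RoFormerTokenizer, TFRoFormerForMaskedLM

name = "junnyu/roformer_chinese_base"
tokenizer = RoFormerTokenizer.from_pretrained(name)
pt_model = RoFormerForMaskedLM.from_pretrained(name).eval()
tf_model = TFRoFormerForMaskedLM.from_pretrained(name, from_pt=True)

text = "今天天气很好。"
with torch.no_grad():
    pt_logits = pt_model(**tokenizer(text, return_tensors="pt")).logits
tf_logits = tf_model(**tokenizer(text, return_tensors="tf"), training=False).logits

# report the same statistics as the output above
diff = (pt_logits - torch.from_numpy(tf_logits.numpy())).abs()
print("mean diff :", diff.mean())
print("max diff :", diff.max())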

References

https://github.com/pengming617/bert_classification

https://github.com/bojone/bert4keras

https://github.com/ZhuiyiTechnology/roformer

https://github.com/lonePatient/NeZha_Chinese_PyTorch

https://github.com/lonePatient/TorchBlocks

https://github.com/huggingface/transformers

Citation

BibTeX:

@misc{su2021roformer,
      title={RoFormer: Enhanced Transformer with Rotary Position Embedding}, 
      author={Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu},
      year={2021},
      eprint={2104.09864},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@techreport{roformerv2,
  title={RoFormerV2: A Faster and Better RoFormer - ZhuiyiAI},
  author={Jianlin Su and Shengfeng Pan and Bo Wen and Yunfeng Liu},
  year={2022},
  url={https://github.com/ZhuiyiTechnology/roformer-v2},
}