Rectified Linear Attention
April 21, 2021
This repo contains a PyTorch implementation of Sparse Attention with Linear Units. It is not the official repo, so some details may differ from the paper.
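For orientation, here is a minimal sketch of the idea behind rectified linear attention: the softmax over attention scores is replaced by a ReLU, and the attention output is stabilized with RMSNorm. This is not the code in this repo. The class names `RMSNorm` and `ReLASketch`, the hyperparameter defaults, and the choice to normalize each head separately are all assumptions made for illustration, and the paper's gated variant is left out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square normalization with a learned per-feature gain."""

    def __init__(self, dim, eps=1e-8):
        super().__init__()
        self.eps = eps
        self.gain = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Divide by the RMS of the last dimension, then rescale.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return x / rms * self.gain


class ReLASketch(nn.Module):
    """Hypothetical multi-head attention block using ReLU instead of softmax."""

    def __init__(self, dim, heads=4, dim_head=16):
        super().__init__()
        inner = heads * dim_head
        self.heads = heads
        self.scale = dim_head ** -0.5
        self.to_qkv = nn.Linear(dim, inner * 3, bias=False)
        self.norm = RMSNorm(dim_head)  # assumption: normalize per head
        self.to_out = nn.Linear(inner, dim)

    def forward(self, x):
        b, n, _ = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # (batch, tokens, heads * dim_head) -> (batch, heads, tokens, dim_head)
        q, k, v = (t.reshape(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))
        scores = torch.matmul(q, k.transpose(-1, -2)) * self.scale
        # ReLU zeroes out negative scores, so weights are sparse and
        # do not have to sum to one.
        attn = F.relu(scores)
        out = self.norm(torch.matmul(attn, v))
        out = out.transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)
```

Calling `ReLASketch(dim=32)(torch.randn(2, 5, 32))` returns a tensor with the same shape as its input. Because the ReLU weights are unnormalized, the output scale can vary a lot from token to token, which is why the normalization step follows the weighted sum.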
Citation:
@misc{zhang2021sparse,
title={Sparse Attention with Linear Units},
author={Biao Zhang and Ivan Titov and Rico Sennrich},
year={2021},
eprint={2104.07012},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
References:
- Transformer components and initial attention code are from lucidrains' vit-pytorch
- RMSNorm code is from this repo.