other_results.md

August 24, 2022

GLUE results

We also evaluate the language understanding performance of Uni-Perceiver on the GLUE benchmark. The results are listed below.

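| Dataset | MNLI | QNLI | QQP | RTE | SST-2 | MRPC | CoLA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Metric | Acc | Acc | F1 | Acc | Acc | F1 | Acc |
| Uni-Perceiver_BASE | 79.7 | 87.3 | 86.7 | 71.1 | 89.3 | 86.0 | 43.1 |
| Uni-Perceiver-MoE_BASE | 81.5 | 88.2 | 87.8 | 75.8 | 90.9 | 87.1 | 52.2 |
| Uni-Perceiver_LARGE | 82.5 | 89.2 | 87.7 | 73.7 | 91.2 | 90.2 | 52.0 |
| Uni-Perceiver-MoE_LARGE | 85.7 | 91.9 | 89.5 | 78.4 | 93.4 | 91.2 | 57.4 |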
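To make the comparison between the dense and MoE variants easier to read, the per-task differences can be computed directly from the table. The following is a minimal sketch; the names `TASKS`, `SCORES`, and `moe_gain` are illustrative and not part of the Uni-Perceiver code base, and the numbers are simply copied from the table above.

```python
# Scores copied from the results table above, in column order.
TASKS = ["MNLI", "QNLI", "QQP", "RTE", "SST-2", "MRPC", "CoLA"]
SCORES = {
    "Uni-Perceiver_BASE": [79.7, 87.3, 86.7, 71.1, 89.3, 86.0, 43.1],
    "Uni-Perceiver-MoE_BASE": [81.5, 88.2, 87.8, 75.8, 90.9, 87.1, 52.2],
    "Uni-Perceiver_LARGE": [82.5, 89.2, 87.7, 73.7, 91.2, 90.2, 52.0],
    "Uni-Perceiver-MoE_LARGE": [85.7, 91.9, 89.5, 78.4, 93.4, 91.2, 57.4],
}


def moe_gain(size):
    """Per-task score difference (MoE minus dense) for size 'BASE' or 'LARGE'."""
    dense = SCORES[f"Uni-Perceiver_{size}"]
    moe = SCORES[f"Uni-Perceiver-MoE_{size}"]
    return {task: round(m - d, 1) for task, d, m in zip(TASKS, dense, moe)}


if __name__ == "__main__":
    for size in ("BASE", "LARGE"):
        print(size, moe_gain(size))
```

Under these numbers, the difference is positive on every listed task for both model sizes, with the largest gap on CoLA.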

All fine-tuning experiments are performed on a single GPU.
We use the hyper-parameters for GLUE tasks from fairseq:

| Dataset | MNLI | QNLI | QQP | RTE | SST-2 | MRPC | CoLA | STS-B |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `--num-classes` | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 1 |
| `--lr` | 5e-6 | 1e-5 | 1e-5 | 1e-5 | 5e-6 | 2e-5 | 2e-5 | 2e-5 |
| `bsz` | 128 | 32 | 32 | 32 | 128 | 64 | 64 | 32 |
| `--total-num-update` | 30968 | 33112 | 113272 | 1018 | 5233 | 1148 | 1334 | 1799 |
| `--warmup-updates` | 1858 | 1986 | 6796 | 61 | 314 | 68 | 80 | 107 |
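For scripting the fine-tuning runs, the table can be kept as a small lookup structure. The sketch below is only an illustration: the dictionary name `GLUE_HPARAMS`, its key names, and the helper `warmup_ratio` are assumptions rather than identifiers from Uni-Perceiver or fairseq. It also checks one relationship that can be read off the table: the warmup length is roughly 6% of the total number of updates for every task.

```python
# Values copied from the hyper-parameter table above.
# Key names are illustrative; they mirror the table's row labels.
GLUE_HPARAMS = {
    "MNLI":  {"num_classes": 3, "lr": 5e-6, "bsz": 128, "total_num_update": 30968,  "warmup_updates": 1858},
    "QNLI":  {"num_classes": 2, "lr": 1e-5, "bsz": 32,  "total_num_update": 33112,  "warmup_updates": 1986},
    "QQP":   {"num_classes": 2, "lr": 1e-5, "bsz": 32,  "total_num_update": 113272, "warmup_updates": 6796},
    "RTE":   {"num_classes": 2, "lr": 1e-5, "bsz": 32,  "total_num_update": 1018,   "warmup_updates": 61},
    "SST-2": {"num_classes": 2, "lr": 5e-6, "bsz": 128, "total_num_update": 5233,   "warmup_updates": 314},
    "MRPC":  {"num_classes": 2, "lr": 2e-5, "bsz": 64,  "total_num_update": 1148,   "warmup_updates": 68},
    "CoLA":  {"num_classes": 2, "lr": 2e-5, "bsz": 64,  "total_num_update": 1334,   "warmup_updates": 80},
    "STS-B": {"num_classes": 1, "lr": 2e-5, "bsz": 32,  "total_num_update": 1799,   "warmup_updates": 107},
}


def warmup_ratio(task):
    """Fraction of the total updates spent in learning-rate warmup."""
    cfg = GLUE_HPARAMS[task]
    return cfg["warmup_updates"] / cfg["total_num_update"]


if __name__ == "__main__":
    for task in GLUE_HPARAMS:
        print(f"{task:6s} warmup ratio = {warmup_ratio(task):.4f}")
```

STS-B is the only task with `num_classes` set to 1, which is consistent with it being a regression task.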

Following RoBERTa, we fine-tune RTE, STS-B, and MRPC starting from the MNLI single-task model rather than from the baseline pretrained model.
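The initialization rule above could be expressed as a small helper when launching runs. This is a hedged sketch only: `MNLI_INIT_TASKS`, `init_checkpoint`, and the checkpoint paths in the usage example are hypothetical and do not correspond to files or functions in the repository.

```python
# Tasks that start from the MNLI single-task model, per the text above.
# "STS-B" follows the task name used in the hyper-parameter table.
MNLI_INIT_TASKS = {"RTE", "STS-B", "MRPC"}


def init_checkpoint(task, pretrained_ckpt, mnli_ckpt):
    """Return the checkpoint a fine-tuning run for `task` should load.

    `pretrained_ckpt` and `mnli_ckpt` are caller-supplied paths; this
    helper only chooses between them.
    """
    return mnli_ckpt if task in MNLI_INIT_TASKS else pretrained_ckpt


if __name__ == "__main__":
    # Hypothetical paths, used purely for illustration.
    for task in ("MNLI", "RTE", "CoLA"):
        print(task, "->", init_checkpoint(task, "pretrained.pth", "mnli_finetuned.pth"))
```

With this rule, the MNLI run has to finish before the RTE, STS-B, and MRPC runs can start, while the remaining tasks can be launched independently.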