BERT ICLR 2020

ICLR 2020 received many BERT-related submissions; I list here only the ones I find most interesting, grouped by theme, and follow each group with a short code sketch of its core idea.

BERT Compression:

  • MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer pdf
  • Reweighted Proximal Pruning for Large-Scale Language Representation pdf
  • Extreme Language Model Compression with Optimal Subwords and Shared Projections pdf
  • Faster and Just As Accurate: A Simple Decomposition for Transformer Models pdf
  • Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning pdf
  • Well-Read Students Learn Better: On the Importance of Pre-training Compact Models pdf
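
A common thread running through several of these submissions (MobileBERT's progressive knowledge transfer, the extreme-compression paper, Well-Read Students) is knowledge distillation: a small student is trained to match a large BERT teacher's output distribution. Below is a minimal sketch of the standard distillation loss in PyTorch; it assumes both models emit classification logits, and the temperature and mixing weight are illustrative defaults, not any paper's settings.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft loss against the teacher with the usual hard-label loss."""
    # Soft targets: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients match the hard-loss scale
    # Hard targets: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```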

BERT for Text Generation:

  • Incorporating BERT into Neural Machine Translation pdf
  • Distilling the Knowledge of BERT for Text Generation pdf
  • BERTScore: Evaluating Text Generation with BERT pdf
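
BERTScore, for instance, replaces n-gram overlap with greedy matching of contextual token embeddings under cosine similarity. A minimal sketch of its F1 computation, assuming the embeddings already come from a BERT encoder and omitting the paper's optional IDF weighting:

```python
import torch.nn.functional as F

def bert_score_f1(cand_emb, ref_emb):
    """cand_emb: (n, d) and ref_emb: (m, d) contextual token embeddings."""
    cand = F.normalize(cand_emb, dim=-1)
    ref = F.normalize(ref_emb, dim=-1)
    sim = cand @ ref.T                        # (n, m) pairwise cosine similarity
    recall = sim.max(dim=0).values.mean()     # best match per reference token
    precision = sim.max(dim=1).values.mean()  # best match per candidate token
    return 2 * precision * recall / (precision + recall)
```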

Improving BERT and the GLUE State of the Art:

  • StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding pdf
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations pdf
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach pdf
  • FreeLB: Enhanced Adversarial Training for Language Understanding pdf
  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators pdf
  • HUBERT Untangles BERT to Improve Transfer across NLP Tasks pdf
  • Understanding and Improving Information Transfer in Multi-Task Learning pdf
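
As one concrete example from this group, ALBERT cuts embedding parameters by factorizing the vocab-size-by-hidden (V x H) embedding matrix into V x E and E x H pieces with E much smaller than H. A minimal sketch of that factorization; the sizes below are illustrative, not ALBERT's exact configuration.

```python
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Token lookup into a small space E, then a linear map up to hidden size H.

    Parameter count drops from V*H to V*E + E*H.
    """
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=768):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, embed_dim)  # V x E parameters
        self.project = nn.Linear(embed_dim, hidden_dim)    # E x H parameters

    def forward(self, token_ids):
        return self.project(self.lookup(token_ids))
```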

BERT for Long Documents:

  • Blockwise Self-Attention for Long Document Understanding pdf
  • BERT-AL: BERT for Arbitrarily Long Document Understanding pdf
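
Both papers address the quadratic cost of full self-attention over long inputs by restricting which positions may attend to which. Below is a minimal, generic sketch of a blockwise attention mask, illustrating the idea rather than either paper's exact masking scheme.

```python
import torch

def block_mask(seq_len, block_size):
    """Boolean (seq_len, seq_len) mask: True where attention is allowed."""
    block_id = torch.arange(seq_len) // block_size
    return block_id.unsqueeze(0) == block_id.unsqueeze(1)

# Usage: mask attention scores before the softmax, giving cost
# O(n * block_size) instead of O(n^2).
scores = torch.randn(16, 16)                                   # toy score matrix
scores = scores.masked_fill(~block_mask(16, 4), float("-inf"))
weights = torch.softmax(scores, dim=-1)                        # block-diagonal attention
```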