BERT ICLR 2020 - Gatech-Flash/NLP GitHub Wiki
There are many other related submissions at ICLR 2020; I have selected only the ones I find most interesting.
BERT compression:
- MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer pdf
- Reweighted Proximal Pruning for Large-Scale Language Representation pdf
- Extreme Language Model Compression with Optimal Subwords and Shared Projections pdf
- Faster and Just As Accurate: A Simple Decomposition for Transformer Models pdf
- Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning pdf
- Well-Read Students Learn Better: On the Importance of Pre-training Compact Models pdf
BERT for Text Generation:
- Incorporating BERT into Neural Machine Translation pdf
- Distilling the Knowledge of BERT for Text Generation pdf
- BERTScore: Evaluating Text Generation with BERT pdf
Improving BERT and the GLUE state of the art:
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding pdf
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations pdf
- RoBERTa: A Robustly Optimized BERT Pretraining Approach pdf
- FreeLB: Enhanced Adversarial Training for Language Understanding pdf
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators pdf
- HUBERT Untangles BERT to Improve Transfer across NLP Tasks pdf
- Understanding and Improving Information Transfer in Multi-Task Learning pdf
BERT for Long Documents: