BERT ICLR 2020

ICLR 2020 received many BERT-related submissions; I list here only the ones I find most interesting, grouped by theme, and follow each group with a short code sketch of its core idea.

BERT Compression:

  • MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer pdf
  • Reweighted Proximal Pruning for Large-Scale Language Representation pdf
  • Extreme Language Model Compression with Optimal Subwords and Shared Projections pdf
  • Faster and Just As Accurate: A Simple Decomposition for Transformer Models pdf
  • Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning pdf
  • Well-Read Students Learn Better: On the Importance of Pre-training Compact Models pdf
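
A common thread running through several of these submissions (MobileBERT's progressive knowledge transfer, the extreme-compression paper, Well-Read Students) is knowledge distillation: a small student is trained to match a large BERT teacher's output distribution. Below is a minimal sketch of the standard distillation loss in PyTorch; it assumes both models emit classification logits, and the temperature and mixing weight are illustrative defaults, not any paper's settings.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft loss against the teacher with the usual hard-label loss."""
    # Soft targets: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients match the hard-loss scale
    # Hard targets: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```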

BERT for Text Generation:

  • Incorporating BERT into Neural Machine Translation pdf
  • Distilling the Knowledge of BERT for Text Generation pdf
  • BERTScore: Evaluating Text Generation with BERT pdf
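
BERTScore, for instance, replaces n-gram overlap with greedy matching of contextual token embeddings under cosine similarity. A minimal sketch of its F1 computation, assuming the embeddings already come from a BERT encoder and omitting the paper's optional IDF weighting:

```python
import torch.nn.functional as F

def bert_score_f1(cand_emb, ref_emb):
    """cand_emb: (n, d) and ref_emb: (m, d) contextual token embeddings."""
    cand = F.normalize(cand_emb, dim=-1)
    ref = F.normalize(ref_emb, dim=-1)
    sim = cand @ ref.T                        # (n, m) pairwise cosine similarity
    recall = sim.max(dim=0).values.mean()     # best match per reference token
    precision = sim.max(dim=1).values.mean()  # best match per candidate token
    return 2 * precision * recall / (precision + recall)
```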

Improving BERT and the GLUE State of the Art:

  • StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding pdf
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations pdf
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach pdf
  • FreeLB: Enhanced Adversarial Training for Language Understanding pdf
  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators pdf
  • HUBERT Untangles BERT to Improve Transfer across NLP Tasks pdf
  • Understanding and Improving Information Transfer in Multi-Task Learning pdf
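
As one concrete example from this group, ALBERT cuts embedding parameters by factorizing the vocab-size-by-hidden (V x H) embedding matrix into V x E and E x H pieces with E much smaller than H. A minimal sketch of that factorization; the sizes below are illustrative, not ALBERT's exact configuration.

```python
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Token lookup into a small space E, then a linear map up to hidden size H.

    Parameter count drops from V*H to V*E + E*H.
    """
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=768):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, embed_dim)  # V x E parameters
        self.project = nn.Linear(embed_dim, hidden_dim)    # E x H parameters

    def forward(self, token_ids):
        return self.project(self.lookup(token_ids))
```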

BERT for Long Documents:

  • Blockwise Self-Attention for Long Document Understanding pdf
  • BERT-AL: BERT for Arbitrarily Long Document Understanding pdf
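
Both papers address the quadratic cost of full self-attention over long inputs by restricting which positions may attend to which. Below is a minimal, generic sketch of a blockwise attention mask, illustrating the idea rather than either paper's exact masking scheme.

```python
import torch

def block_mask(seq_len, block_size):
    """Boolean (seq_len, seq_len) mask: True where attention is allowed."""
    block_id = torch.arange(seq_len) // block_size
    return block_id.unsqueeze(0) == block_id.unsqueeze(1)

# Usage: mask attention scores before the softmax, giving cost
# O(n * block_size) instead of O(n^2).
scores = torch.randn(16, 16)                                   # toy score matrix
scores = scores.masked_fill(~block_mask(16, 4), float("-inf"))
weights = torch.softmax(scores, dim=-1)                        # block-diagonal attention
```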