transformer - AshokBhat/ml GitHub Wiki

About

  • Deep learning architecture built on the self-attention mechanism, introduced in the 2017 paper "Attention Is All You Need"

Benefits vs RNN

  • Does not require in-order processing of sequential data
  • Far more parallelizable than RNNs, since all positions are processed at once (see the self-attention sketch below)
  • Shorter training time as a result
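
A minimal NumPy sketch of scaled dot-product self-attention illustrates the parallelism point: every position attends to every other position through one batched matrix multiplication, with no sequential loop over time steps. The shapes and names here are illustrative, not taken from any particular implementation.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Self-attention over a whole (seq_len, d_k) sequence in one shot."""
    d_k = q.shape[-1]
    # Every pair of positions interacts in a single matmul -- no time-step loop,
    # which is what makes the computation easy to parallelize.
    scores = q @ k.T / np.sqrt(d_k)                    # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v                                 # (seq_len, d_k)

# Toy example: 4 token embeddings of width 8; in self-attention Q = K = V = x.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)     # (4, 8)
```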

Adoption

  • Widely adopted in NLP, replacing older recurrent models such as the LSTM
  • Its parallelism enabled training on much larger datasets than was practical with RNNs

Pretrained Transformer Models - BERT, GPT

  • Led to the development of pre-trained systems such as BERT and GPT
  • These models are trained on huge general-purpose language corpora and can then be fine-tuned for specific language tasks (see the sketch after this list)
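
A rough sketch of that workflow, assuming the Hugging Face `transformers` library with a PyTorch backend is installed (the checkpoint name and the two-label head are illustrative choices, not prescribed by this page): a pre-trained BERT checkpoint is loaded and given a fresh classification head for fine-tuning.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Reuse general-purpose pre-trained weights (illustrative checkpoint name).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # fresh task head, e.g. positive/negative sentiment
)

# Only further training on task-specific data (fine-tuning) is needed;
# the general language knowledge comes from pre-training.
inputs = tokenizer("Transformers fine-tune well.", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2]) -- one score per label
```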

Types of transformers

Comparative Analysis: BERT vs T5 vs GPT-3

| Feature | BERT (2018) | T5 (2019) | GPT-3 (2020) |
|---|---|---|---|
| Company | Google AI | Google AI | OpenAI |
| Architecture | Bidirectional Transformer (encoder-only) | Text-to-Text Transformer (encoder-decoder) | Generative Pre-trained Transformer 3 (decoder-only) |
| Encoder | Bidirectional | Bidirectional | None (decoder-only) |
| Decoder | None | Autoregressive | Autoregressive |
| Attention mechanism | Bidirectional self-attention | Bidirectional (encoder) and causal (decoder) self-attention | Causal (autoregressive) self-attention |
| Pre-training objective | Masked Language Model (MLM) | Unified text-to-text denoising | Autoregressive language modeling on diverse, large-scale datasets |
| Fine-tunable | Yes | Yes | Yes |
| Contextual understanding | Captures bidirectional context | Treats every input and output as a text sequence | Left-to-right context from autoregressive training |
| Use of pre-training | Pre-trained for various NLP tasks | Pre-trained for diverse NLP tasks | Pre-trained for a wide range of tasks |
| Model size | Multiple sizes (Base, Large) | Multiple sizes (Small to 11B) | Extremely large (up to 175B parameters) |
| Notable achievements | Introduced bidirectional context in pre-training | Unified text-to-text framing of NLP tasks | Massive scale and few-shot versatility |
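
The architectural split above can be exercised directly. The hedged sketch below again assumes the Hugging Face `transformers` library; GPT-3 itself is only served through OpenAI's API, so the smaller open GPT-2 checkpoint stands in for the decoder-only family, and all checkpoint names are illustrative.

```python
from transformers import pipeline

# Encoder-only (BERT): masked-token prediction with bidirectional context.
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("Transformers replaced [MASK] models in NLP.")[0]["token_str"])

# Encoder-decoder (T5): every task cast as text-to-text.
t2t = pipeline("text2text-generation", model="t5-small")
print(t2t("translate English to German: The book is on the table.")[0]["generated_text"])

# Decoder-only (GPT-2 as an open stand-in for GPT-3): autoregressive,
# left-to-right generation.
gen = pipeline("text-generation", model="gpt2")
print(gen("Transformers are", max_new_tokens=20)[0]["generated_text"])
```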

FAQ

  1. How are transformers different from RNN-based models?
  2. In which areas have transformers achieved state-of-the-art performance?

See also
