Transformer

About

  • A deep learning architecture based on the self-attention mechanism, introduced in the 2017 paper "Attention Is All You Need"

Benefits vs RNN

  • Does not require in-order processing of sequential data
  • Allows far greater parallelization than RNNs, since all positions are attended to at once (see the self-attention sketch after this list)
  • Reduced training time as a result
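
The parallelism claim can be made concrete with a minimal NumPy sketch of scaled dot-product attention (not taken from this wiki; names and shapes are illustrative). A single matrix product computes attention for every position at once, instead of stepping through the sequence token by token as an RNN must.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed for all positions in one shot."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len, seq_len) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of value vectors

# Toy example: 4 tokens with model dimension 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
# A real layer would use learned Q/K/V projections of x; identity is used here for brevity.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```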

Adoption

  • Adopted widely in NLP, replacing older RNN models such as the LSTM
  • Enabled training on larger datasets

Pretrained Transformer Models - BERT, GPT

  • Led to the development of pre-trained systems such as BERT and GPT
  • These models are trained on huge general-purpose language corpora and can then be fine-tuned for specific language tasks (a minimal loading sketch follows this list)
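
A minimal sketch of the pre-train/fine-tune workflow, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint purely as an example (neither is prescribed by this wiki):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Reuse encoder weights learned on a large general-purpose corpus...
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # ...and attach a small task-specific head to be trained during fine-tuning
)

inputs = tokenizer("Transformers process whole sequences at once.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2]) - untrained head, ready for fine-tuning
```

Fine-tuning then updates the head (and usually the encoder as well) on task-specific labelled examples, e.g. with the `Trainer` API or a plain PyTorch training loop.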

Types of transformers

Comparative Analysis: BERT vs T5 vs GPT-3

| Feature | [[BERT]] (2018) | [[T5]] (2019) | GPT-3 (2020) |
|---------|-----------------|---------------|--------------|
| Company | Google AI | Google AI | OpenAI |
| Architecture | Bidirectional Transformer | Text-to-Text Transformer | Generative Pre-trained Transformer 3 |
| Encoder | Bidirectional | Bidirectional | Unidirectional |
| Decoder | None | Autoregressive | Autoregressive |
| Attention Mechanism | Bidirectional self-attention | Bidirectional self-attention | Autoregressive self-attention |
| Pre-training | Masked Language Model (MLM) | Text-to-Text (Unified) | Diverse datasets, large-scale |
| Fine-tunable | Yes | Yes | Yes |
| Contextual Understanding | Captures bidirectional context | Processes input and output as a sequence | Contextual understanding through autoregressive training |
| Use of Pre-training | Pre-trained for various NLP tasks | Pre-trained for diverse NLP tasks | Pre-trained for a wide range of tasks |
| Model Size | Varied sizes | Varied sizes | Extremely large-scale model |
| Notable Achievements | Introduced bidirectional context in pre-training | Unified approach to NLP tasks | Massive scale and versatility |
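
The distinction the table draws between bidirectional and autoregressive self-attention comes down to the attention mask. A small sketch, with a toy sequence length chosen only for illustration:

```python
import numpy as np

seq_len = 5

# Bidirectional self-attention (BERT, T5 encoder): every token may attend to
# every other token, so the mask permits the full seq_len x seq_len matrix.
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

# Autoregressive self-attention (GPT-3-style decoder): token i may attend only
# to positions <= i, enforced with a lower-triangular (causal) mask.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

print(bidirectional_mask.astype(int))
print(causal_mask.astype(int))
```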

FAQ

  1. How are transformers different?
  2. In which areas have transformers achieved state-of-the-art performance?

See also