Transformer

About

  • A deep learning architecture based on the self-attention mechanism, introduced in the 2017 paper "Attention Is All You Need"

Benefits vs RNN

  • Does not require in-order processing of sequential data
  • Allows far greater parallelization than RNNs, since all positions are attended to at once (see the self-attention sketch after this list)
  • Reduced training time as a result
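
The parallelism claim can be made concrete with a minimal NumPy sketch of scaled dot-product attention (not taken from this wiki; names and shapes are illustrative). A single matrix product computes attention for every position at once, instead of stepping through the sequence token by token as an RNN must.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed for all positions in one shot."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len, seq_len) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of value vectors

# Toy example: 4 tokens with model dimension 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
# A real layer would use learned Q/K/V projections of x; identity is used here for brevity.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```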

Adoption

  • Adopted widely in NLP, replacing older RNN models such as the LSTM
  • Enabled training on larger datasets

Pretrained Transformer Models - BERT, GPT

  • Led to the development of pre-trained systems such as BERT and GPT
  • These models are trained on huge general-purpose language corpora and can then be fine-tuned for specific language tasks (a minimal loading sketch follows this list)
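
A minimal sketch of the pre-train/fine-tune workflow, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint purely as an example (neither is prescribed by this wiki):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Reuse encoder weights learned on a large general-purpose corpus...
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # ...and attach a small task-specific head to be trained during fine-tuning
)

inputs = tokenizer("Transformers process whole sequences at once.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2]) - untrained head, ready for fine-tuning
```

Fine-tuning then updates the head (and usually the encoder as well) on task-specific labelled examples, e.g. with the `Trainer` API or a plain PyTorch training loop.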

Types of transformers

Comparative Analysis: BERT vs T5 vs GPT-3

| Feature | [[BERT]] (2018) | [[T5]] (2019) | GPT-3 (2020) |
|---------|-----------------|---------------|--------------|
| Company | Google AI | Google AI | OpenAI |
| Architecture | Bidirectional Transformer | Text-to-Text Transformer | Generative Pre-trained Transformer 3 |
| Encoder | Bidirectional | Bidirectional | Unidirectional |
| Decoder | None | Autoregressive | Autoregressive |
| Attention Mechanism | Bidirectional self-attention | Bidirectional self-attention | Autoregressive self-attention |
| Pre-training | Masked Language Model (MLM) | Text-to-Text (Unified) | Diverse datasets, large-scale |
| Fine-tunable | Yes | Yes | Yes |
| Contextual Understanding | Captures bidirectional context | Processes input and output as a sequence | Contextual understanding through autoregressive training |
| Use of Pre-training | Pre-trained for various NLP tasks | Pre-trained for diverse NLP tasks | Pre-trained for a wide range of tasks |
| Model Size | Varied sizes | Varied sizes | Extremely large-scale model |
| Notable Achievements | Introduced bidirectional context in pre-training | Unified approach to NLP tasks | Massive scale and versatility |
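
The distinction the table draws between bidirectional and autoregressive self-attention comes down to the attention mask. A small sketch, with a toy sequence length chosen only for illustration:

```python
import numpy as np

seq_len = 5

# Bidirectional self-attention (BERT, T5 encoder): every token may attend to
# every other token, so the mask permits the full seq_len x seq_len matrix.
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

# Autoregressive self-attention (GPT-3-style decoder): token i may attend only
# to positions <= i, enforced with a lower-triangular (causal) mask.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

print(bidirectional_mask.astype(int))
print(causal_mask.astype(int))
```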

FAQ

  1. How are transformers different?
  2. In which areas have transformers achieved state-of-the-art performance?

See also