Transformer
About
- Deep learning model architecture
- For handling sequential data
- Introduced in 2017 in the field of NLP (core self-attention operation sketched below)
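A minimal sketch of the scaled dot-product self-attention operation at the heart of the architecture, written in NumPy; the function name, dimensions, and random inputs are illustrative only:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core attention operation: Q, K, V are (seq_len, d_k) arrays (toy single-head case)."""
    d_k = Q.shape[-1]
    # Similarity of every position with every other position, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension gives attention weights per query position
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mix of all value vectors
    return weights @ V

# Toy example: 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)  # (4, 8)
```

A full transformer stacks many such attention layers, together with multiple heads, feed-forward layers, and positional encodings, in an encoder and/or decoder.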
Benefits vs RNN
- Does not require in-order processing of sequential data
- Higher parallelization than RNNs (contrast sketched below)
- Reduced training time
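To illustrate the parallelization point, here is a toy NumPy contrast between recurrent processing (each step waits for the previous hidden state) and attention-style processing (one matrix product covers every position at once); the weight matrix and shapes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
seq = rng.standard_normal((6, 8))   # 6 tokens, 8-dim embeddings (toy data)
W = rng.standard_normal((8, 8))     # shared weight matrix (illustrative only)

# RNN-style: each step depends on the previous hidden state,
# so positions must be processed one after another.
h = np.zeros(8)
rnn_states = []
for x_t in seq:
    h = np.tanh(x_t @ W + h)
    rnn_states.append(h)

# Transformer-style: pairwise attention touches every position in one shot,
# so the whole sequence can be computed (and parallelized) in a single pass.
scores = (seq @ W) @ seq.T
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
attn_out = weights @ seq

print(len(rnn_states), attn_out.shape)  # 6 (6, 8)
```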
Adoption
- Adopted widely in NLP, replacing older RNN models such as the LSTM
- Enabled training on larger datasets
Pretrained Transformer Models - BERT, GPT
- Led to the development of pre-trained systems such as BERT and GPT
- They are trained on huge general-language datasets and can be fine-tuned for specific language tasks (see the fine-tuning sketch below).
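A minimal fine-tuning sketch, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint; the texts, labels, and hyperparameters are placeholders, not a recommended training setup:

```python
# Requires: pip install transformers torch
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained BERT checkpoint and attach a fresh classification head
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy labelled batch for a sentiment-style task
texts = ["This movie was great!", "Terrible, do not watch."]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One fine-tuning step: the pre-trained weights are updated for the new task
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))
```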
Types of transformers
- Vision - Vision Transformer
- NLP - BERT, GPT
Comparative Analysis: BERT vs T5 vs GPT-3
| Feature | [[BERT]] (2018) | [[T5]] (2019) | GPT-3 (2020) |
|---------|-----------------|---------------|--------------|
| Company | Google AI | Google AI | OpenAI |
| Architecture | Bidirectional Transformer | Text-to-Text Transformer | Generative Pre-trained Transformer 3 |
| Encoder | Bidirectional | Bidirectional | Unidirectional |
| Decoder | None | Autoregressive | Autoregressive |
| Attention Mechanism | Bidirectional self-attention | Bidirectional self-attention | Autoregressive self-attention |
| Pre-training | Masked Language Model (MLM) | Text-to-Text (Unified) | Diverse datasets, large-scale |
| Fine-tunable | Yes | Yes | Yes |
| Contextual Understanding | Captures bidirectional context | Processes input and output as a sequence | Contextual understanding through autoregressive training |
| Use of Pre-training | Pre-trained for various NLP tasks | Pre-trained for diverse NLP tasks | Pre-trained for a wide range of tasks |
| Model Size | Varied sizes | Varied sizes | Extremely large-scale model |
| Notable Achievements | Introduced bidirectional context in pre-training | Unified approach to NLP tasks | Massive scale and versatility |
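The three pre-training styles in the table can be tried out with short pipelines, assuming the Hugging Face `transformers` library; GPT-3 itself is not openly available, so `gpt2` stands in here for the autoregressive style:

```python
# Requires: pip install transformers torch
from transformers import pipeline

# BERT: masked language model, predicts a hidden token using context from both directions
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("Transformers were introduced in [MASK].")[0]["token_str"])

# T5: text-to-text, every task is phrased as input text -> output text
t2t = pipeline("text2text-generation", model="t5-small")
print(t2t("translate English to German: The model is small.")[0]["generated_text"])

# GPT-style: autoregressive generation, continues the prompt left to right
gen = pipeline("text-generation", model="gpt2")
print(gen("Transformers are a deep learning architecture that", max_new_tokens=20)[0]["generated_text"])
```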
FAQ
- How are transformers different from RNNs?
- In which areas have transformers achieved state-of-the-art performance?