transformer - AshokBhat/ml GitHub Wiki

About

  • Deep learning architecture built on the self-attention mechanism, introduced in the 2017 paper "Attention Is All You Need"

Benefits vs RNN

  • Does not require in-order processing of sequential data
  • Far more parallelizable than RNNs, since all positions are processed at once (see the self-attention sketch below)
  • Shorter training time as a result
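
A minimal NumPy sketch of scaled dot-product self-attention illustrates the parallelism point: every position attends to every other position through one batched matrix multiplication, with no sequential loop over time steps. The shapes and names here are illustrative, not taken from any particular implementation.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Self-attention over a whole (seq_len, d_k) sequence in one shot."""
    d_k = q.shape[-1]
    # Every pair of positions interacts in a single matmul -- no time-step loop,
    # which is what makes the computation easy to parallelize.
    scores = q @ k.T / np.sqrt(d_k)                    # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v                                 # (seq_len, d_k)

# Toy example: 4 token embeddings of width 8; in self-attention Q = K = V = x.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)     # (4, 8)
```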

Adoption

  • Widely adopted in NLP, replacing older recurrent models such as the LSTM
  • Its parallelism enabled training on much larger datasets than was practical with RNNs

Pretrained Transformer Models - BERT, GPT

  • Led to the development of pre-trained systems such as BERT and GPT
  • These models are trained on huge general-purpose language corpora and can then be fine-tuned for specific language tasks (see the sketch after this list)
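
A rough sketch of that workflow, assuming the Hugging Face `transformers` library with a PyTorch backend is installed (the checkpoint name and the two-label head are illustrative choices, not prescribed by this page): a pre-trained BERT checkpoint is loaded and given a fresh classification head for fine-tuning.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Reuse general-purpose pre-trained weights (illustrative checkpoint name).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # fresh task head, e.g. positive/negative sentiment
)

# Only further training on task-specific data (fine-tuning) is needed;
# the general language knowledge comes from pre-training.
inputs = tokenizer("Transformers fine-tune well.", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2]) -- one score per label
```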

Types of transformers

Comparative Analysis: BERT vs T5 vs GPT-3

| Feature | BERT (2018) | T5 (2019) | GPT-3 (2020) |
|---|---|---|---|
| Company | Google AI | Google AI | OpenAI |
| Architecture | Bidirectional Transformer (encoder-only) | Text-to-Text Transformer (encoder-decoder) | Generative Pre-trained Transformer 3 (decoder-only) |
| Encoder | Bidirectional | Bidirectional | None (decoder-only) |
| Decoder | None | Autoregressive | Autoregressive |
| Attention mechanism | Bidirectional self-attention | Bidirectional (encoder) and causal (decoder) self-attention | Causal (autoregressive) self-attention |
| Pre-training objective | Masked Language Model (MLM) | Unified text-to-text denoising | Autoregressive language modeling on diverse, large-scale datasets |
| Fine-tunable | Yes | Yes | Yes |
| Contextual understanding | Captures bidirectional context | Treats every input and output as a text sequence | Left-to-right context from autoregressive training |
| Use of pre-training | Pre-trained for various NLP tasks | Pre-trained for diverse NLP tasks | Pre-trained for a wide range of tasks |
| Model size | Multiple sizes (Base, Large) | Multiple sizes (Small to 11B) | Extremely large (up to 175B parameters) |
| Notable achievements | Introduced bidirectional context in pre-training | Unified text-to-text framing of NLP tasks | Massive scale and few-shot versatility |
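
The architectural split above can be exercised directly. The hedged sketch below again assumes the Hugging Face `transformers` library; GPT-3 itself is only served through OpenAI's API, so the smaller open GPT-2 checkpoint stands in for the decoder-only family, and all checkpoint names are illustrative.

```python
from transformers import pipeline

# Encoder-only (BERT): masked-token prediction with bidirectional context.
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("Transformers replaced [MASK] models in NLP.")[0]["token_str"])

# Encoder-decoder (T5): every task cast as text-to-text.
t2t = pipeline("text2text-generation", model="t5-small")
print(t2t("translate English to German: The book is on the table.")[0]["generated_text"])

# Decoder-only (GPT-2 as an open stand-in for GPT-3): autoregressive,
# left-to-right generation.
gen = pipeline("text-generation", model="gpt2")
print(gen("Transformers are", max_new_tokens=20)[0]["generated_text"])
```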

FAQ

  1. How are transformers different from RNN-based models?
  2. In which areas have transformers achieved state-of-the-art performance?

See also
