Links: Transformer Networks - touretzkyds/ai4k12 GitHub Wiki
Transformer networks are deep neural networks now widely used in natural language processing, including handling search queries, question answering, image captioning, and translating between languages.
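At the core of every Transformer layer is self-attention, which lets each token weigh every other token when computing its representation. Below is a minimal NumPy sketch of scaled dot-product self-attention; it is illustrative only (real models add multiple heads, residual connections, layer normalization, and feed-forward sublayers), and the matrix shapes are chosen arbitrarily for the example.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X.

    X:          (seq_len, d_model) input token embeddings
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    """
    Q = X @ Wq                       # queries
    K = X @ Wk                       # keys
    V = X @ Wv                       # values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every token pair
    # Softmax over each row: attention weights sum to 1 per token.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # each output mixes all value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))          # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                     # (5, 4): one 4-dim output per token
```

The tutorials linked below walk through this same computation in much more detail, including how multiple attention heads run in parallel.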
Introductory Tutorials on Transformers
- video (9:10) and text: Transformers Explained: Understand the Model Behind GPT, BERT, and T5
- Transformer: A Novel Neural Network Architecture for Language Understanding, Google AI blog. Very accessible introduction.
- Language Processing with BERT: The 3 Minute Intro (Deep learning for NLP)
- How Transformers Work in Deep Learning and NLP
- Getting Meaning From Text
- A deep dive into BERT: How BERT launched a rocket into natural language understanding
More Technical Tutorials
- Transformers From Scratch (Rohrer)
- Transformers From Scratch (Bloem)
- The Annotated Transformer
- Transformer model for language understanding (TensorFlow tutorial on language translation)
- Language modeling with nn.Transformer and TorchText (PyTorch tutorial)
Technical Videos on Transformers
- The Narrated Transformer Language Model
- Tensor2Tensor Transformers
- GPT-3: Language Models are Few-Shot Learners (Paper Explained) (1:04:29)
- A Visual Guide to Transformer Neural Networks (series)
- Rasa Algorithm Whiteboard - Transformers & Attention 1: Self Attention
Question Answering Demos Using Transformers
- Google BERT demo [direct link]
- ML4K BERT Q&A model [direct link]
Text Generation Demos Using Transformers
- Talk to Transformer [direct link]
- TextSynth [direct link]
Important Papers
- Attention Is All You Need, Vaswani et al. 2017.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al. 2019.
- Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, Wu et al. 2016.
Capabilities of Large Language Models
- Google's AI Is Something Even Stranger Than Conscious, Stephen Marche, The Atlantic, June 19, 2022
- How Does ChatGPT Work? Tracing the Evolution of AIGC, DTonomy, December 31, 2022
Other Resources
- Simple Transformer Language Model (Python notebook in Colab)
- SQuAD: Stanford Question Answering Dataset used to train some BERT models