Transformers - BKJackson/BKJackson_Wiki GitHub Wiki

Notebook tutorials

Building a Transformer with PyTorch - DataCamp's comprehensive guide to building a transformer model with PyTorch, with detailed explanations and code examples.
Compact Vision Transformer with CIFAR10 data

Transformers for time series

Timeseries classification with a Transformer model - project idea: try tuning the hyperparameters with Keras Tuner or Optuna to improve classification performance

Probabilistic Time Series Forecasting with 🤗 Transformers - tutorial with code (Dec. 2022) Related: GluonTS

Making transformers faster, better

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention - published 2020
Abstract: Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, with respect to the input’s length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from O(N^2) to O(N), where N is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregressive transformers and reveals their relationship to recurrent neural networks. Our linear transformers achieve similar performance to vanilla transformers and they are up to 4000x faster on autoregressive prediction of very long sequences.
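The reduction described in the abstract can be sketched in a few lines of NumPy. Using the feature map φ(x) = elu(x) + 1 from the paper, attention becomes φ(Q)(φ(K)ᵀV) normalized by φ(Q)·Σφ(K); associativity lets us compute φ(K)ᵀV first, so the N×N attention matrix is never formed. Function names here are illustrative, not from the authors' code.

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1 > 0, the kernel feature map used in the paper
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """O(N) attention: phi(Q) (phi(K)^T V) / (phi(Q) . sum_i phi(K_i))."""
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)  # (N, d) each
    KV = Kf.T @ V                    # (d, d_v): cost independent of N^2
    Z = Qf @ Kf.sum(axis=0)          # (N,) normalizer
    return (Qf @ KV) / Z[:, None]

rng = np.random.default_rng(0)
N, d = 6, 4
Q, K, V = rng.normal(size=(3, N, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (6, 4)
```

For autoregressive decoding, the paper maintains φ(K)ᵀV and Σφ(K) as running sums updated one step at a time, which is exactly the RNN-like iterative formulation the title refers to.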

Transformer Engine - (Github) Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.

Papers

Escaping the Big Data Paradigm with Compact Transformers - Hassani et al., 2022
Deep Residual Learning for Image Recognition - He et al., 2015, Microsoft Research

Articles

Steps for Training Your Own Transformer Models
The Illustrated Transformer
The Annotated Transformer

Books with Code

Transformers for NLP and Computer Vision - Denis Rothman
Fast.ai Book - Book published as Jupyter Notebooks

Courses

Practical Deep Learning for Coders - Fast.ai
Fast.ai Course 2022 part 2 - autoencoders, diffusion, etc.

Videos

LLM Foundations - Sergey, May 11, 2023

Terms to know

Inductive Biases

"Transformers lack some of the inductive biases inherent to CNNs, such as translation equivariance and locality, and therefore do not generalize well when trained on insufficient amounts of data." Dosovitskiy et al., 2020

Translation Equivariance - shifting the input shifts the output correspondingly: a convolutional feature detector responds the same way to a pattern wherever it appears in the image.

Locality - each convolution output depends only on a small neighborhood of the input, encoding the assumption that nearby pixels are more strongly related than distant ones.
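Both biases can be checked directly in NumPy. The sketch below (`circ_conv` is an illustrative circular 1-D convolution, not a library function) shows that shifting the input of a convolution shifts its output by the same amount, while each output value only reads a small local window of the input.

```python
import numpy as np

def circ_conv(x, k):
    # Circular 1-D convolution: output i reads only the local window
    # x[i], x[i+1], ..., x[i+len(k)-1] (mod n) -- that's locality.
    n = len(x)
    return np.array([
        sum(k[j] * x[(i + j) % n] for j in range(len(k)))
        for i in range(n)
    ])

x = np.arange(8.0)
k = np.array([1.0, -2.0, 1.0])
shift = 3

# Equivariance: convolving a shifted input == shifting the convolved output.
lhs = circ_conv(np.roll(x, shift), k)
rhs = np.roll(circ_conv(x, k), shift)
assert np.allclose(lhs, rhs)
```

A transformer's self-attention has neither property built in: every token attends to every other token, so these regularities must be learned from data, which is why ViT-style models need large training sets (per the Dosovitskiy et al. quote above).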