Transformers - BKJackson/BKJackson_Wiki GitHub Wiki
Notebook tutorials
Building a Transformer with PyTorch - DataCamp offers a comprehensive guide on building a transformer model with PyTorch, which includes detailed explanations and code examples.
Compact Vision Transformer with CIFAR10 data
Transformers for time series
Timeseries classification with a Transformer model - project idea: try tuning the hyperparameters with Keras Tuner or Optuna to improve model classification performance
Probabilistic Time Series Forecasting with 🤗 Transformers - tutorial with code (Dec. 2022) Related: GluonTS
Making transformers faster, better
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention - published 2020
Abstract: Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, with respect to the input’s length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from O(N^2) to O(N), where N is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregressive transformers and reveals their relationship to recurrent neural networks. Our linear transformers achieve similar performance to vanilla transformers and they are up to 4000x faster on autoregressive prediction of very long sequences.
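The associativity trick from the abstract can be sketched in NumPy for the non-causal case. The feature map φ(x) = elu(x) + 1 follows the paper; function and variable names here are my own. Computing φ(Q)(φ(K)ᵀV) avoids ever forming the N×N attention matrix, so the cost is linear in sequence length N:

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, the positive feature map used in the paper
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """O(N) attention: use associativity to compute phi(Q) @ (phi(K).T @ V)
    instead of (phi(Q) @ phi(K).T) @ V, never materializing the N x N matrix."""
    Qp, Kp = elu_feature_map(Q), elu_feature_map(K)
    KV = Kp.T @ V                       # (d, d_v) summary of all keys/values
    Z = Qp @ Kp.sum(axis=0)             # (N,) normalizers
    return (Qp @ KV) / Z[:, None]

def quadratic_attention(Q, K, V):
    # Same computation done the O(N^2) way, for comparison
    Qp, Kp = elu_feature_map(Q), elu_feature_map(K)
    A = Qp @ Kp.T                       # (N, N) attention scores
    return (A @ V) / A.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, d = 16, 4
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))
# Both orderings of the matrix products give identical results
assert np.allclose(linear_attention(Q, K, V), quadratic_attention(Q, K, V))
```

The causal (autoregressive) case in the paper keeps running sums of `KV` and `Z` per position, which is what yields the RNN view and the iterative speedup.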
Transformer Engine - (Github) Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
Papers
Escaping the Big Data Paradigm with Compact Transformers - Hassani et al., 2022
Deep Residual Learning for Image Recognition - He et al., 2015, Microsoft Research
Articles
Steps for Training Your Own Transformer Models
The Illustrated Transformer
The Annotated Transformer
Books with Code
Transformers for NLP and Computer Vision - Denis Rothman
Fast.ai Book - Book published as Jupyter Notebooks
Courses
Practical Deep Learning for Coders - Fast.ai
Fast.ai Course 2022 part 2 - autoencoders, diffusion, etc.
Videos
LLM Foundations - Sergey Karayev, May 11, 2023
Terms to know
Inductive Biases
"Transformers lack some of the inductive biases inherent to CNNs, such as translation equivariance and locality, and therefore do not generalize well when trained on insufficient amounts of data." Dosovitskiy et al., 2020
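Translation equivariance, one of the CNN inductive biases the quote mentions, can be illustrated with a small NumPy sketch (1D convolution; the signal and kernel values are made up for the example). Shifting the input shifts the convolution output by the same amount, a property self-attention does not build in:

```python
import numpy as np

# Toy 1D signal and edge-detecting kernel
x = np.array([0., 1., 2., 3., 0., 0., 0., 0.])
k = np.array([1., -1.])

y = np.convolve(x, k, mode="full")

# Shift the input right by 2 positions (trailing zeros make roll safe here)
x_shift = np.roll(x, 2)
y_shift = np.convolve(x_shift, k, mode="full")

# Equivariance: convolving the shifted input equals shifting the output
assert np.allclose(np.roll(y, 2), y_shift)
```

Because this property is baked into the convolution operation itself, a CNN does not have to learn it from data, whereas a ViT must, which is why ViTs tend to need larger training sets.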