Transformer Book Notes - doraithodla/notes GitHub Wiki

Started 10/3

So what is it about transformers that changed the field almost overnight? Like many great scientific breakthroughs, it was the synthesis of several ideas that were percolating in the research community at the time: attention, transfer learning, and scaling up neural networks.

Notes

  • RNNs
  • Attention Mechanisms
  • Self Attention
  • Transfer Learning

Books

  • Hands-On Machine Learning with Scikit-Learn and TensorFlow, by Aurélien Géron (O’Reilly)

  • Deep Learning for Coders with fastai and PyTorch, by Jeremy Howard and Sylvain Gugger (O’Reilly)

  • Natural Language Processing with PyTorch, by Delip Rao and Brian McMahan (O’Reilly)

  • The Hugging Face Course, by the open source team at Hugging Face

Hugging Face Transformers offers several layers of abstraction for using and training transformer models. We’ll start with the easy-to-use pipelines that allow us to pass text examples through the models and investigate the predictions in just a few lines of code. Then we’ll move on to tokenizers, model classes, and the Trainer API, which allow us to train models for our own use cases.
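
The pipeline workflow described above can be tried in a few lines. A minimal sketch, assuming the transformers package is installed; with no model argument the library downloads a default sentiment model on first use, and the exact default may vary by library version:

```python
from transformers import pipeline

# Build a text-classification pipeline; the library picks a default
# sentiment model when none is specified (downloaded on first use).
classifier = pipeline("text-classification")

# Raw text goes straight through tokenization, the model, and
# post-processing; predictions come back as label/score dicts.
outputs = classifier("Transformers changed the field almost overnight!")
print(outputs)  # a list of {'label': ..., 'score': ...} dicts
```

Later chapters drop down a level to the tokenizer and model classes that this one call wraps.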

Online resources

  • Google Colaboratory
  • Kaggle Notebooks
  • Paperspace Gradient Notebooks

Transformer - a network architecture for sequence modeling.
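
The self-attention listed in the notes above is the core of this architecture, and it can be sketched in a few lines of NumPy. A minimal single-head illustration of scaled dot-product attention (function and variable names are mine; real transformers add projections, multiple heads, and more):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each position builds its output as a weighted mix of all positions."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every query to every key
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V               # attention-weighted sum of values

# Toy sequence of 3 tokens with 4-dim embeddings (random, illustration only)
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (3, 4) -- same shape as the input sequence
```

Because queries, keys, and values all come from the same sequence, every token can attend to every other token in one step, unlike an RNN, which must pass information along position by position.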

Well-known transformers:

  • Generative Pretrained Transformer (GPT)
  • Bidirectional Encoder Representations from Transformers (BERT)

By combining the transformer architecture with unsupervised learning, these models removed the need to train task-specific architectures from scratch.

Encoder Decoder Framework