Model Architecture
When selecting a model for working with Large Language Models (LLMs), there are several types of existing models to consider, each with its own characteristics and use cases. Here are some common types of LLMs, along with guidance on how to pick one and examples for each type:
Transformer-based models, such as OpenAI's GPT (Generative Pre-trained Transformer) series, are among the most popular and powerful LLMs. They utilize self-attention mechanisms to process input data in parallel, allowing them to capture long-range dependencies and relationships in text effectively. Transformer-based models are suitable for a wide range of natural language processing tasks, including text generation, summarization, translation, and question answering. Consider factors such as model size, computational resources required for training and inference, and performance on specific tasks when choosing a transformer-based model.
Figure: Transformer model¹
Examples:
- GPT-3 (Generative Pre-trained Transformer 3) by OpenAI
- BERT (Bidirectional Encoder Representations from Transformers) by Google https://www.coursera.org/articles/bert-model
- T5 (Text-to-Text Transfer Transformer) by Google https://huggingface.co/docs/transformers/en/model_doc/t5
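To make the self-attention idea concrete, here is a minimal PyTorch sketch of scaled dot-product self-attention, the core operation of a transformer layer. The class name, dimensions, and single-head setup are illustrative assumptions for this wiki, not the design of any of the models listed above.

```python
# A minimal sketch of scaled dot-product self-attention (single head),
# assuming illustrative dimensions rather than any real model's configuration.
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # One projection each for queries, keys, and values.
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Scores compare every position with every other position in parallel,
        # which is what lets transformers capture long-range dependencies.
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        weights = scores.softmax(dim=-1)
        return weights @ v

tokens = torch.randn(2, 16, 64)          # batch of 2 sequences, 16 tokens, d_model=64
print(SelfAttention(64)(tokens).shape)   # torch.Size([2, 16, 64])
```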
RNN-based models, such as LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit), capture sequential dependencies in text data by processing tokens one step at a time. This makes them well suited to tasks where the temporal order of the data matters, such as sequence prediction, language modeling, and text generation. Consider factors such as model architecture, training stability, and performance on sequential tasks when selecting an RNN-based model.
Figure: Recurrent Neural Network (RNN) model²
Examples:
- LSTM (Long Short-Term Memory) models https://medium.com/@rebeen.jaff/what-is-lstm-introduction-to-long-short-term-memory-66bd3855b9ce
- GRU (Gated Recurrent Unit) models https://medium.com/@anishnama20/understanding-gated-recurrent-unit-gru-in-deep-learning-2e54923f3e2
- Seq2Seq (Sequence-to-Sequence) models https://sh-tsang.medium.com/review-empirical-evaluation-of-gated-recurrent-neural-networks-on-sequence-modeling-gru-2adb86559257
Figure: Most used RNN models³
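As an illustration of how recurrence handles temporal order, below is a minimal PyTorch sketch of an LSTM-based language model that produces next-token logits at every step. The vocabulary size, dimensions, and class name are illustrative assumptions.

```python
# A minimal sketch of an LSTM language model; sizes are illustrative assumptions.
import torch
import torch.nn as nn

class TinyLSTMLM(nn.Module):
    def __init__(self, vocab_size: int = 1000, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # The LSTM consumes the sequence step by step, carrying a hidden state
        # that summarizes everything seen so far.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> logits over the next token at each step
        hidden_states, _ = self.lstm(self.embed(token_ids))
        return self.out(hidden_states)

batch = torch.randint(0, 1000, (2, 20))   # 2 sequences of 20 token ids
print(TinyLSTMLM()(batch).shape)          # torch.Size([2, 20, 1000])
```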
Hybrid models combine elements of transformers and convolutional neural networks (CNNs) to leverage their respective strengths. These models often incorporate CNNs for feature extraction from text data before feeding it into transformer layers for further processing. Hybrid models are suitable for tasks where both local and global dependencies in text data are important, such as text classification and sentiment analysis. Consider factors such as model architecture, computational efficiency, and performance on specific tasks when selecting a hybrid model.
Examples:
- BERT-CNN by Google https://huggingface.co/docs/transformers/en/model_doc/bert and https://blog.invgate.com/gpt-3-vs-bert#:~:text=However%2C%20due%20to%20their%20differences,for%20sentiment%20analysis%20or%20NLU.
- Transformer-XL by Google Brain https://huggingface.co/docs/transformers/en/model_doc/transfo-xl
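The sketch below shows one way such a hybrid could be assembled in PyTorch: a 1D convolution extracts local features from token embeddings before a transformer encoder layer models global dependencies, followed by a pooled classification head. It is an illustrative composition under assumed dimensions, not the architecture of BERT-CNN or Transformer-XL.

```python
# An illustrative hybrid: local convolutional features feeding a transformer
# encoder layer for a text-classification head. All sizes are assumptions.
import torch
import torch.nn as nn

class HybridTextClassifier(nn.Module):
    def __init__(self, vocab_size: int = 1000, d_model: int = 64, num_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Local (n-gram-like) feature extraction over a window of 3 tokens.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        # Global mixing of the convolved features via self-attention.
        self.encoder = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)                          # (batch, seq_len, d_model)
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)   # Conv1d expects (batch, channels, seq_len)
        x = self.encoder(x)
        return self.classifier(x.mean(dim=1))              # pool over tokens, then classify

batch = torch.randint(0, 1000, (2, 20))
print(HybridTextClassifier()(batch).shape)                 # torch.Size([2, 2])
```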
Memory-augmented models, such as the Neural Turing Machine (NTM) and the Differentiable Neural Computer (DNC), incorporate external memory modules to store and retrieve information dynamically during processing, which lets them learn and reason over structured data more effectively than architectures limited to a fixed-size internal state. Memory-augmented models are suitable for tasks requiring complex reasoning and symbolic manipulation, such as question answering and algorithmic problem solving. Consider factors such as memory capacity, read/write operations, and performance on memory-intensive tasks when selecting a memory-augmented model.
Examples:
- Neural Turing Machine (NTM) by DeepMind https://medium.com/data-science-in-your-pocket/neural-turing-machines-explained-9acdbe8897de
- Differentiable Neural Computer (DNC) by DeepMind https://towardsdatascience.com/rps-intro-to-differentiable-neural-computers-e6640b5aa73a
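The toy sketch below illustrates the central mechanism these models share: differentiable, content-based reads and writes against an external memory matrix. The single head, the erase-then-add update, and the tensor sizes are simplifying assumptions; real NTM/DNC addressing is considerably more elaborate.

```python
# A toy sketch of content-based memory addressing, in the spirit of NTM/DNC.
# Single head and erase-then-add write are simplifying assumptions.
import torch
import torch.nn.functional as F

def content_read(memory, key):
    """Return (read_vector, weights) via soft content-based addressing."""
    # memory: (slots, width), key: (width,)
    weights = F.softmax(F.cosine_similarity(memory, key.unsqueeze(0), dim=-1), dim=0)
    return weights @ memory, weights

def content_write(memory, weights, erase, add):
    """Erase-then-add update, applied softly across all memory slots."""
    memory = memory * (1 - weights.unsqueeze(1) * erase.unsqueeze(0))
    return memory + weights.unsqueeze(1) * add.unsqueeze(0)

memory = torch.randn(8, 16) * 0.1                # 8 slots of width 16
key = torch.randn(16)
_, w = content_read(memory, key)                 # addressing weights for this key
memory = content_write(memory, w, torch.ones(16), torch.randn(16))
read_vector, _ = content_read(memory, key)       # retrieve what was just stored
print(read_vector.shape)                         # torch.Size([16])
```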
Sparse attention models, such as Sparse Transformer and Linformer, use sparse attention mechanisms to reduce computational complexity while preserving performance. These models are designed to scale more efficiently to longer sequences and larger datasets. Sparse attention models are suitable for tasks involving long-range dependencies in text data, such as language modeling and text generation. Consider factors such as sparsity pattern, memory efficiency, and performance on large-scale tasks when selecting a sparse attention model.
Figure: Sparse model⁴
Examples:
- Sparse Transformer by OpenAI https://openai.com/index/sparse-transformer/
- Linformer by Facebook AI https://serp.ai/linformer/#:~:text=Linformer%20is%20a%20linear%20Transformer%20model%20designed%20to%20make%20transformer,model%20while%20reducing%20computational%20costs.
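As a concrete illustration of sparsity, the sketch below applies a local (sliding-window) attention mask so each token only attends to its neighbours instead of the full sequence. The window size and masking scheme are illustrative assumptions; Sparse Transformer and Linformer use different, more sophisticated patterns to reduce the quadratic cost of full attention.

```python
# An illustrative local-attention sketch: each position attends only within a
# fixed window. Window size and the boolean-mask approach are assumptions.
import math
import torch

def local_attention(x: torch.Tensor, window: int = 4) -> torch.Tensor:
    # x: (seq_len, d_model); each position may only attend within +/- window.
    seq_len, d_model = x.shape
    scores = x @ x.transpose(0, 1) / math.sqrt(d_model)
    positions = torch.arange(seq_len)
    allowed = (positions.unsqueeze(0) - positions.unsqueeze(1)).abs() <= window
    scores = scores.masked_fill(~allowed, float("-inf"))   # block far-away pairs
    return scores.softmax(dim=-1) @ x

tokens = torch.randn(32, 64)
print(local_attention(tokens).shape)   # torch.Size([32, 64])
```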
When picking a specific model, it's essential to evaluate its performance on relevant tasks, consider the computational resources required for training and inference, and assess any pre-trained versions available for transfer learning. Additionally, consider factors such as model interpretability, robustness to adversarial attacks, and alignment with our project's requirements and constraints.
1: The Illustrated Transformer (https://jalammar.github.io/illustrated-transformer/)
2: Introduction to Recurrent Neural Network (https://www.geeksforgeeks.org/introduction-to-recurrent-neural-network/)
3: A systematic review and outlook on machine-learning-based methods for spam-filtering (https://www.researchgate.net/figure/There-are-three-most-used-RNN-models-as-shown-in-the-figure-GRU-LSTM-and-RNN_fig2_372210745)
4: The Rise of Sparse Mixtures of Experts: Switch Transformers (https://mlfrontiers.substack.com/p/the-rise-of-sparse-mixtures-of-experts)