Attention - AshokBhat/ml GitHub Wiki
About
- Attention models, also called attention mechanisms, are neural network components
- They allow a model to focus on the most relevant parts of a complex input before forming an understanding of the whole picture
How do they work?
- Attention models, like humans, focus on one part of a complex input first
- Once that part is understood, they move on to the rest to build up the whole picture
Limitations
- Higher computational cost: standard self-attention scales quadratically with input length
- Interpretability: attention weights show where the model looked, but it is still hard to explain why it made a given decision
Steps involved in an attention mechanism
+------------------+
|    Input Data    |
|   (Sequence of   |
|     Vectors)     |
+------------------+
         |
         | Embedding
         v
+------------------+
| Query, Key, Value|
|   Computation    |
+------------------+
         |
         | Q, K, V
         v
+------------------+
| Similarity Score |
|   Calculation    |
+------------------+
         |
         | Raw Scores
         v
+------------------+
| Softmax Function |
| (Normalization)  |
+------------------+
         |
         | Attention Weights
         v
+------------------+
| Weighted Sum of  |
|      Values      |
+------------------+
         |
         | Integration
         v
+------------------+
|  Final Context   |
|      Vector      |
+------------------+
FAQ
- What is an attention model?
- How does attention work?
- What are the benefits of using attention models?
- What are the limitations of attention models?
- What are some applications of attention models?
- What are the future directions of attention models?
See also