attention - AshokBhat/ml GitHub Wiki

About

  • Attention models, also called attention mechanisms
  • Let a model focus on the most relevant parts of a complex input before processing the whole picture

How do they work?

  • Attention models, like humans, focus on one part of a complex input first.
  • Once that part is understood, they move on to the rest to build up the whole picture.
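As a toy illustration of this "focusing", a softmax over relevance scores concentrates most of the weight on the highest-scoring input position (the scores here are made up for illustration):

```python
import numpy as np

# Hypothetical relevance scores for three input positions
scores = np.array([4.0, 1.0, 0.5])

# Softmax turns scores into a probability distribution;
# the highest-scoring position dominates
weights = np.exp(scores) / np.exp(scores).sum()
print(weights)  # most of the mass lands on the first position
```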

Limitations

  • Higher computational cost: in standard self-attention, the score computation scales quadratically with input length
  • Interpretability: it is hard to trace how attention weights drive a model's decisions
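The cost point can be made concrete: in dot-product self-attention, every position attends to every other position, so the score matrix grows quadratically with sequence length (the sizes below are illustrative, not from the source):

```python
import numpy as np

n, d = 512, 64                      # sequence length, head dimension (illustrative)
Q = np.zeros((n, d), dtype=np.float32)
K = np.zeros((n, d), dtype=np.float32)

scores = Q @ K.T                    # one n x n score matrix per attention head
print(scores.shape)                 # (512, 512)
print(scores.nbytes)                # 512 * 512 * 4 bytes = 1 MiB per head
```

Doubling the sequence length quadruples this matrix, which is why long inputs are expensive.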

Steps involved in an attention mechanism

   +-----------------+
   |    Input Data   |
   |  (Sequence of   |
   |     Vectors)    |
   +-----------------+
              |
              |  Embedding
              v
     +------------------+
     | Query, Key, Value|  
     |  Computation     |
     +------------------+
              |
              |  Dot Products (Q·K)
              v
     +------------------+
     | Similarity Score | 
     |    Calculation   |
     +------------------+
              |
              |  Raw Scores
              v
     +------------------+
     |  Softmax Function| 
     |   (Normalization)|
     +------------------+
              |
              |  Attention Weights
              v
     +------------------+
     |  Weighted Sum of | 
     |     Values       |
     +------------------+
              |
              |  Integration
              v
     +------------------+
     |  Final Context   | 
     |    Vector        |
     +------------------+
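The steps above can be sketched in NumPy as scaled dot-product attention, one common concrete form of the mechanism (the matrix shapes and random inputs below are assumptions for illustration, not from the source):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    """Scaled dot-product attention over a sequence of embedded vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # query, key, value computation
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # similarity score calculation
    weights = softmax(scores, axis=-1)         # normalization: each row sums to 1
    context = weights @ V                      # weighted sum of values
    return context, weights                    # final context vectors

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 16))               # 5 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.standard_normal((16, 8)) for _ in range(3))
context, weights = attention(X, Wq, Wk, Wv)
print(context.shape, weights.shape)            # (5, 8) (5, 5)
```

Each row of `weights` shows how much one token attends to every other token, which maps directly onto the "focus on specific bits of the input" idea above.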

FAQ

  • What is an attention model?
  • How does attention work?
  • What are the benefits of using attention models?
  • What are the limitations of attention models?
  • What are some applications of attention models?
  • What are the future directions of attention models?

See also
