AI_LabReport_Week8 - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki

Add an entry to your dev diary containing your answers to the following questions and outputs of running your code above.

Causal Attention Class Parameter Values

In Section 3.5.3, there is a code listing for a compact causal attention class (CausalAttention).

On line 2 are the parameters to the constructor.
    When we call this class later in the section, what exact numbers do we give for each parameter value?
        d_in = 3
        d_out = 2
        context_length = 6
        dropout = 0.0
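In the Section 3.5.3 listing, the constructor uses context_length only to size the causal mask it registers as a buffer (built there with torch.triu(torch.ones(context_length, context_length), diagonal=1)). As a rough illustration, the plain-Python sketch below (a stand-in, not the PyTorch code) builds the same 6-by-6 mask for context_length = 6; a 1 marks a future token that will be masked out.

```python
def causal_mask(context_length):
    """Plain-Python equivalent of torch.triu(torch.ones(n, n), diagonal=1).

    Entry [i][j] is 1 when token j comes after token i (a future token
    that query i must not attend to), and 0 otherwise.
    """
    return [[1 if j > i else 0 for j in range(context_length)]
            for i in range(context_length)]


mask = causal_mask(6)  # context_length = 6, as passed to the constructor
for row in mask:
    print(row)
```

Row i has 1s only to the right of the diagonal, so the first token can see only itself and the last token can see all six.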

Give the values of the variables on line 16:

b = 2           // batch size (number of input sequences)
num_tokens = 6  // tokens per input sequence
d_in = 3        // dimension of each input token embedding
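In Section 3.5.3 the batch is made by stacking two copies of the same 6-token, 3-dimensional input along a new first dimension (torch.stack((inputs, inputs), dim=0)), which is where these three numbers come from. The sketch below mimics that with nested lists of placeholder zeros (dummy values, not the real embeddings) and a hypothetical shape() helper standing in for x.shape.

```python
def shape(t):
    """Hypothetical helper: shape of a rectangular nested list, like tensor.shape."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


# Placeholder input: 6 tokens, each a 3-dimensional embedding (dummy values).
inputs = [[0.0, 0.0, 0.0] for _ in range(6)]

# Stack two copies along a new leading (batch) dimension, like torch.stack(..., dim=0).
batch = [inputs, inputs]

b, num_tokens, d_in = shape(batch)
print(b, num_tokens, d_in)  # 2 6 3
```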

Give the shape of the variables on lines 18-20:

keys: (b, num_tokens, d_out) = (2, 6, 2)
values: (b, num_tokens, d_out) = (2, 6, 2)
queries: (b, num_tokens, d_out) = (2, 6, 2)
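Lines 18-20 apply the key, query, and value projections, which in the listing are nn.Linear(d_in, d_out) layers. nn.Linear stores its weight with shape (out_features, in_features) and computes x @ weight.T (plus a bias if enabled) on the last dimension only, so the batch and token dimensions pass through unchanged. A plain-Python sketch with a dummy all-ones weight and no bias (hypothetical values) shows the (2, 6, 3) to (2, 6, 2) change:

```python
def shape(t):
    """Shape of a rectangular nested list, like tensor.shape (hypothetical helper)."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


def linear(x, weight):
    """Apply y = x @ weight.T to the last dimension of a (b, n, d_in) nested list.

    weight has shape (d_out, d_in), the same layout nn.Linear uses.
    """
    return [[[sum(w_k * x_k for w_k, x_k in zip(w_row, token))
              for w_row in weight]   # one output feature per weight row
             for token in seq]       # every token in the sequence
            for seq in x]            # every sequence in the batch


d_in, d_out = 3, 2
batch = [[[1.0, 2.0, 3.0] for _ in range(6)] for _ in range(2)]  # dummy (2, 6, 3) input
W_key = [[1.0] * d_in for _ in range(d_out)]                     # dummy (d_out, d_in) weight

keys = linear(batch, W_key)
print(shape(batch), "->", shape(keys))  # (2, 6, 3) -> (2, 6, 2)
```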

Give the shape of the variable on line 22:

  • attn_scores = queries @ keys.transpose(1, 2)
    • = (b, num_tokens, d_out) @ (b, d_out, num_tokens)
    • = (2, 6, 2) @ (2, 2, 6)
    • = (2, 6, 6)
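keys.transpose(1, 2) swaps the token and feature dimensions of each batch element, and @ on 3-D tensors performs one matrix product per batch element, so the inner dimension (d_out = 2) must match and then disappears. A plain-Python sketch with dummy values and hypothetical helper names:

```python
def shape(t):
    """Shape of a rectangular nested list, like tensor.shape (hypothetical helper)."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


def transpose_12(x):
    """Swap dims 1 and 2 of a (b, n, d) nested list -> (b, d, n), like x.transpose(1, 2)."""
    return [[list(col) for col in zip(*mat)] for mat in x]


def bmm(lhs, rhs):
    """Batched matrix product: (b, n, k) @ (b, k, m) -> (b, n, m), one product per batch element."""
    out = []
    for A, B in zip(lhs, rhs):
        assert len(A[0]) == len(B), "inner dimensions must match"
        cols = list(zip(*B))
        out.append([[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A])
    return out


queries = [[[1.0, 0.0] for _ in range(6)] for _ in range(2)]  # dummy (2, 6, 2)
keys = [[[0.5, 2.0] for _ in range(6)] for _ in range(2)]     # dummy (2, 6, 2)

keys_t = transpose_12(keys)
attn_scores = bmm(queries, keys_t)
print(shape(queries), "@", shape(keys_t), "=", shape(attn_scores))
# (2, 6, 2) @ (2, 2, 6) = (2, 6, 6)
```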

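and on line 25:

  • attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1)
    • scaling by sqrt(keys.shape[-1]) = sqrt(d_out) = sqrt(2) and softmax over dim=-1 leave the shape of attn_scores unchanged
    • = (2, 6, 6)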
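In the listing, the future positions of attn_scores are filled with -inf (in place, via masked_fill_) before line 25, which then scales by 1/sqrt(d_out) and applies softmax along the last dimension. The plain-Python sketch below works on a single 6-by-6 matrix of dummy scores (hypothetical values and function name) and shows that the shape is preserved, every row sums to 1, and masked positions get weight exactly 0 because exp(-inf) = 0.

```python
import math


def masked_scaled_softmax(scores, d_k):
    """Row-wise softmax(scores / sqrt(d_k)) after causal masking.

    Mirrors masked_fill_ with -inf above the diagonal followed by
    torch.softmax(scores / d_k**0.5, dim=-1) for a single (n, n) matrix.
    """
    weights = []
    for i, row in enumerate(scores):
        # Causal mask: positions j > i are future tokens, set to -inf.
        masked = [s if j <= i else -math.inf for j, s in enumerate(row)]
        scaled = [s / math.sqrt(d_k) for s in masked]
        # Subtract the row max for numerical stability (does not change the result).
        m = max(scaled)
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


scores = [[float(i + j) for j in range(6)] for i in range(6)]  # dummy (6, 6) scores
attn_weights = masked_scaled_softmax(scores, d_k=2)            # d_k = keys.shape[-1] = d_out = 2

print(len(attn_weights), len(attn_weights[0]))  # 6 6
print([round(w, 3) for w in attn_weights[0]])    # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Since dropout = 0.0 in this section, the dropout step that follows in the listing leaves these weights unchanged.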

[Screenshot from 2024-05-25 19-36-51]

Human Writing
