AI‐Homework‐07 - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki
Read Chapter 3 in the Raschka LLM book, typing and adapting the code samples to your own dataset (which we'll do as part of Lab 07)
Read Appendix A in the same book, from Section A.6, on PyTorch and tensors.
Attempt a response to the following questions in a dev diary entry.
How would you describe a one-hot encoding in English?

Hint: In MNIST, x is a 784-pixel image, and y is a single character label from 0 to 9.
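To make the hint concrete, here is a minimal sketch in plain Python of turning a digit label from 0 to 9 into a one-hot vector. The helper name `one_hot` and the default of 10 classes are assumptions chosen for illustration; they are not taken from the book.

```python
def one_hot(label, num_classes=10):
    """Return a list of num_classes zeros with a single 1 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in range(num_classes)")
    vec = [0] * num_classes
    vec[label] = 1  # the only non-zero entry marks the class
    return vec

encoded = one_hot(3)
print(encoded)  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

Looking at which position holds the 1 is a useful starting point for describing the encoding in your own words.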
text in a sliding window (or batches of words).
Match each of the following terms (A, B, C) with its definition below (i, ii, iii):
A. max_length
in
B. stride
C. batch size
i. the number of token IDs to "slide" forward from one (x,y) training example to the next (x,y) training example
ii. chunk size, or number of token IDs to group together into one x or y of a training example (x,y)
iii. the number of (x,y) training examples returned in each call to next of our dataloader's iterator.
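The three quantities can be watched in action with a small sliding-window sketch. This is a dependency-free approximation using plain lists, not the book's PyTorch dataset class; the function names `make_examples` and `batches` and the placeholder token IDs are hypothetical.

```python
def make_examples(token_ids, max_length, stride):
    """Slide a window over token_ids; each y is its x shifted right by one token."""
    examples = []
    for start in range(0, len(token_ids) - max_length, stride):
        x = token_ids[start:start + max_length]
        y = token_ids[start + 1:start + max_length + 1]
        examples.append((x, y))
    return examples

def batches(examples, batch_size):
    """Yield groups of up to batch_size examples, like a dataloader's iterator."""
    for i in range(0, len(examples), batch_size):
        yield examples[i:i + batch_size]

token_ids = list(range(10, 20))  # placeholder token IDs 10..19
examples = make_examples(token_ids, max_length=4, stride=2)
loader = batches(examples, batch_size=2)
first_batch = next(loader)

for x, y in first_batch:
    print(x, y)
```

Try changing one argument at a time and note what changes in the printed output: the length of each x, the gap between the starts of consecutive x lists, or the number of pairs printed per call to `next`.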
Question 3. Because embedding is stored as a matrix, and we studied how neural network weights can also be stored
in a matrix, we can view the operation of transforming an input vector into an embedding as a two-layer neural network.
For example, this neural network has layers of [4,3], meaning 4 nodes in the first layer and 3 nodes in the 2nd layer.

We ignore biases for now, or assume they are all zeros.
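The weights for the above neural network form a matrix that takes vectors of size 4 to vectors of size 3. Following the layout of an embedding matrix (one row per input node, one column per output node), the size of this matrix is 4x3, or 4 rows by 3 columns. Under the column-vector convention y = Wx, the same weights would be written as the 3x4 transpose.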
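As a small check on the shapes, the sketch below multiplies a size-4 vector by a 4x3 weight matrix using plain Python lists. It assumes the input is written as a row vector on the left of the matrix; the weight values and the helper name `vec_mat` are made up for illustration.

```python
def vec_mat(x, W):
    """Multiply a row vector x (length n) by a matrix W (n rows, m columns)."""
    assert len(x) == len(W), "x must have one entry per row of W"
    n_cols = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(n_cols)]

# Hypothetical weights: 4 rows (first-layer nodes) by 3 columns (second-layer nodes).
W = [
    [1, 0, 2],
    [0, 1, 0],
    [3, 0, 1],
    [0, 2, 0],
]
x = [1, 2, 0, 1]   # one activation per first-layer node
y = vec_mat(x, W)  # one activation per second-layer node

print(len(W), len(W[0]), y)  # 4 3 [1, 4, 2]
```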
Suppose the embeddings matrix takes you from a vocabulary size of 7 to an output dimension of 128. What is the shape of that matrix?
token ID (as a one-hot encoding) that we wish to convert to its embedding (in a higher-dimensional feature space).
To embed a batch of 8 chunks, we form a matrix from the column vectors of each chunk, and multiply that by the embeddings matrix.
If the embeddings matrix goes from a vocabulary of size 6 to an output dimension of 12, what is the shape of the output matrix when we embed a batch of 8 chunks?
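The batch case can be explored with a toy example that deliberately uses different sizes from the question. This sketch assumes each item in the batch is a single one-hot row vector and that the embeddings matrix has one row per token ID; the values and helper names are hypothetical.

```python
def one_hot(token_id, vocab_size):
    """One-hot row vector with a 1.0 at position token_id."""
    vec = [0.0] * vocab_size
    vec[token_id] = 1.0
    return vec

def matmul(A, B):
    """Multiply A (r x n) by B (n x c), returning an r x c list of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

vocab_size, output_dim = 5, 2
embeddings = [  # vocab_size rows, output_dim columns
    [0.1, 0.2],
    [1.1, 1.2],
    [2.1, 2.2],
    [3.1, 3.2],
    [4.1, 4.2],
]

token_ids = [4, 0, 2]                               # a batch of 3 token IDs
batch = [one_hot(t, vocab_size) for t in token_ids] # shape: 3 x vocab_size
embedded = matmul(batch, embeddings)                # shape: 3 x output_dim

print(len(embedded), len(embedded[0]))
print(embedded)
```

Each output row equals the row of `embeddings` selected by the corresponding token ID, which is why an embedding layer can be implemented as a lookup instead of a full matrix multiplication.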
Question 5. Suppose your embedding matrix (created in Section 2.7 of the book) looks like the example below:

If you select and print out the embedding for a particular token ID, you get
tensor([[ 1.8960, -0.1750, 1.3689, -1.6033]], grad_fn=<EmbeddingBackward0>)
(Ignore the requires_grad and grad_fn attributes for now.)
A) Which token ID did you get an embedding for? (Remember it is 0-based indexing)
B) Which of the following is true?
i) Your vocabulary has 4 token IDs in it, and the embedding output dimension is 7
ii) Your vocabulary has 7 token IDs in it, and the embedding output dimension is 4
iii) Both
iv) Neither
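One way to reason about both parts is to treat the embedding matrix as a list of rows, one per token ID. The sketch below uses a made-up 3-row, 2-column matrix rather than the one from the question, and the helper names are hypothetical. In PyTorch, the weight of `torch.nn.Embedding(num_embeddings, embedding_dim)` has the analogous shape `(num_embeddings, embedding_dim)`.

```python
# Hypothetical embedding matrix: one row per token ID.
embedding_rows = [
    [0.5, -1.0],
    [1.5, 0.25],
    [-0.75, 2.0],
]

vocab_size = len(embedding_rows)      # number of rows
output_dim = len(embedding_rows[0])   # number of columns

def lookup(token_id):
    """Return the embedding (row) for a 0-based token ID."""
    return embedding_rows[token_id]

def find_token_id(vector):
    """Return the 0-based row index whose values match the given vector."""
    return embedding_rows.index(vector)

print(vocab_size, output_dim)          # 3 2
print(lookup(1))                       # [1.5, 0.25]
print(find_token_id([-0.75, 2.0]))     # 2
```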
Read the following two articles about energy usage in computing
Generative AI’s environmental costs are soaring — and mostly secret, by
In the same dev diary entry as above, write a response of at least 500 words addressing the following questions:
- How would you summarize the main point or thesis of each article?
- Try to summarize them each in one sentence.
- How would you divide and expand the thesis of each article above into two or three main parts? That is:
- for the permacomputing article, how would you summarize its main sections?
- for the AI energy article, how would you summarize its main sections?
- What are two pieces of evidence or arguments that each article provides to support its thesis?
- Provide two pieces of evidence or arguments for the permacomputing article.
- Provide another two pieces of evidence or arguments for the AI energy article.
- What is a related piece of evidence or argument that you've found independently (e.g., through reading or watching the news, or using search engines)?
- Find one piece of evidence or argument that supports or refutes the permacomputing article.
- Find one piece of evidence or argument that supports or refutes the AI energy article.
- How are the two readings similar or different?
- How would you describe the overall attitude of each article?
- How would you describe the approach each article takes?
- To what extent do you agree or disagree with the thesis of each article, as you've stated it above?
- Do you find the pieces of evidence or arguments that you provide convincing? Why or why not?
If you use an AI chat to help develop your response, include a link to your chat and attempt to make it a 50%-50% chat: write prompts, questions, or your own summaries that are at least as long (in number of words) as the responses the AI gives you, or ask the AI to deliberately shorten its answers.
Note: Come up with your own thesis statement after reading each article and before talking to anyone else, whether a classmate or an AI. Do not ask another entity to develop ideas from scratch or come to a conclusion for you.
You may wish to use other sources only to check your own understanding, knowing that you should independently verify and do additional work outside of this conversation to make sure the contributions are usable, true, and support your main point.