Paul Pham Dev Diary - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki

Paul Pham Dev Diary

2024-06-10

Here's an example of code that will save your PyTorch model file at the end of every epoch, instead of waiting for all epochs to finish, in the function train_model_simple.
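A minimal sketch of what that can look like (the loop body and loss computation are simplified from the real train_model_simple, and the checkpoint filename is my own choice):

import torch

def train_model_simple(model, train_loader, optimizer, device, num_epochs,
                       checkpoint_path="model_checkpoint.pth"):
    model.train()
    for epoch in range(num_epochs):
        for input_batch, target_batch in train_loader:
            input_batch = input_batch.to(device)
            target_batch = target_batch.to(device)
            optimizer.zero_grad()
            logits = model(input_batch)
            loss = torch.nn.functional.cross_entropy(
                logits.flatten(0, 1), target_batch.flatten())
            loss.backward()
            optimizer.step()
        # Save the model (and optimizer, so training can resume) at the
        # end of every epoch instead of only once after all epochs
        torch.save({
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
            "epoch": epoch,
        }, checkpoint_path)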

Then you can Ctrl+C to kill your training job when the training loss stops improving (probably after 100 or 200 epochs) and not worry about losing your model.

2024-06-04

I tried to learn how to make something as seemingly simple as a binary search tree in Rust and was mostly utterly defeated.

This is the current score, with some lovely compiler feedback, showing how my ass is being handed to me.

I've also been reading through Too Many Linked Lists, which seems to zero in on the issues I've been having. The tone is ... interesting, and the technical perspective seems unique in that none of the official Rust books talk about making data structures from scratch (which is kind of my main thing).

2024-06-01

I attempted to set up SSH forwarding to allow GPU access for classmates who want to train larger datasets for their GPT.

I created a systemd service that looks like this

[Unit]
Description=AutoSSH for AI
After=network.target

[Service]
Type=simple
ExecStart=/home/ppham/src/upper-division-cs/ai-24sp/scripts/autossh-ai.sh
RemainAfterExit=true

[Install]
WantedBy=default.target

with the script autossh-ai.sh looking like this, using advice from this Stackoverflow answer

#!/bin/sh

# Forward connections to indira port 9999 to anti-villain port 8888
autossh -M 30000 \
        -o "ServerAliveCountMax=3" \
        -o "ServerAliveInterval=60" \
        -o "ExitOnForwardFailure=yes" \
        -v -N -R 9999:localhost:8888 [email protected]

The ExitOnForwardFailure option seems key: when the forwarding fails after a disconnect, which appears to happen soon after the systemd service starts, the ssh client exits so autossh can re-establish the connection instead of leaving a dead tunnel that prevents reconnecting.

I was able to load a Jupyter notebook server locally on anti-villain, my office computer,

image

and detect the GPU with PyTorch.

Unfortunately, even after opening two different TCP ports for my AWS EC2 instance, telnet to the public domain or the public IPv4 address resulted in a refused connection.

image

Running autossh-ai.sh manually appears to show the connection getting broken and re-established.

debug1: Local connections to LOCALHOST:30000 forwarded to remote address 127.0.0.1:30000
debug1: Local forwarding listening on ::1 port 30000.
debug1: channel 0: new [port listener]
debug1: Local forwarding listening on 127.0.0.1 port 30000.
debug1: channel 1: new [port listener]
debug1: Remote connections from LOCALHOST:30000 forwarded to local address 127.0.0.1:30001
debug1: Remote connections from LOCALHOST:9999 forwarded to local address localhost:8888
debug1: ssh_init_forwarding: expecting replies for 2 forwards
debug1: Requesting [email protected]
debug1: Entering interactive session.
debug1: pledge: filesystem
debug1: client_input_global_request: rtype [email protected] want_reply 0
debug1: client_input_hostkeys: searching /home/ppham/.ssh/known_hosts for indira.arcology.builders / (none)
debug1: client_input_hostkeys: searching /home/ppham/.ssh/known_hosts2 for indira.arcology.builders / (none)
debug1: client_input_hostkeys: hostkeys file /home/ppham/.ssh/known_hosts2 does not exist
debug1: client_input_hostkeys: no new or deprecated keys from server
debug1: Remote: /home/ubuntu/.ssh/authorized_keys:7: key options: agent-forwarding port-forwarding pty user-rc x11-forwarding
debug1: Remote: /home/ubuntu/.ssh/authorized_keys:7: key options: agent-forwarding port-forwarding pty user-rc x11-forwarding
debug1: remote forward success for: listen 30000, connect 127.0.0.1:30001
debug1: remote forward success for: listen 9999, connect localhost:8888
debug1: forwarding_success: all expected forwarding replies received
debug1: client_input_channel_open: ctype forwarded-tcpip rchan 4 win 2097152 max 32768
debug1: client_request_forwarded_tcpip: listen localhost port 9999, originator 127.0.0.1 port 51706
debug1: connect_next: host localhost ([127.0.0.1]:8888) in progress, fd=6
debug1: channel 2: new [127.0.0.1]
debug1: confirm forwarded-tcpip
debug1: channel 2: connected to localhost port 8888
debug1: channel 2: free: 127.0.0.1, nchannels 3
debug1: Connection to port 30000 forwarding to 127.0.0.1 port 30000 requested.
debug1: channel 2: new [direct-tcpip]
debug1: client_input_channel_open: ctype forwarded-tcpip rchan 5 win 2097152 max 32768
debug1: client_request_forwarded_tcpip: listen localhost port 30000, originator 127.0.0.1 port 55752
debug1: connect_next: host 127.0.0.1 ([127.0.0.1]:30001) in progress, fd=7
debug1: channel 3: new [127.0.0.1]
debug1: confirm forwarded-tcpip
debug1: channel 3: connected to 127.0.0.1 port 30001
debug1: channel 2: free: direct-tcpip: listening port 30000 for 127.0.0.1 port 30000, connect from 127.0.0.1 port 34994 to 127.0.0.1 port 30000, nchannels 4
debug1: channel 3: free: 127.0.0.1, nchannels 3

Starting a Jupyter notebook like this on anti-villain gives a clue: despite the command-line flag to listen for incoming connections on any hostname, we may still need to add the correct setting to the config file (c.ServerApp.ip).

$ jupyter notebook --port 8888 --ip '0.0.0.0'
[I 2024-06-01 14:29:25.832 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2024-06-01 14:29:25.834 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2024-06-01 14:29:25.837 ServerApp] jupyterlab | extension was successfully linked.
[W 2024-06-01 14:29:25.838 JupyterNotebookApp] 'ip' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[I 2024-06-01 14:29:25.840 ServerApp] notebook | extension was successfully linked.
[I 2024-06-01 14:29:26.042 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-06-01 14:29:26.065 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-06-01 14:29:26.067 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-06-01 14:29:26.068 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-06-01 14:29:26.071 LabApp] JupyterLab extension loaded from /home/ppham/.pyenv/versions/llama/lib/python3.10/site-packages/jupyterlab
[I 2024-06-01 14:29:26.071 LabApp] JupyterLab application directory is /home/ppham/.pyenv/versions/llama/share/jupyter/lab
[I 2024-06-01 14:29:26.071 LabApp] Extension Manager is 'pypi'.
[I 2024-06-01 14:29:26.111 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-06-01 14:29:26.112 ServerApp] notebook | extension was successfully loaded.
[I 2024-06-01 14:29:26.113 ServerApp] Serving notebooks from local directory: /home/ppham/src
[I 2024-06-01 14:29:26.113 ServerApp] Jupyter Server 2.14.0 is running at:
[I 2024-06-01 14:29:26.113 ServerApp] http://anti-villain:8888/tree?token=7f7aa47e0c412d28bd48c58a3c1efde6e3b191c54eb92ada
[I 2024-06-01 14:29:26.113 ServerApp]     http://127.0.0.1:8888/tree?token=7f7aa47e0c412d28bd48c58a3c1efde6e3b191c54eb92ada
[I 2024-06-01 14:29:26.113 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2024-06-01 14:29:26.176 ServerApp]

    To access the server, open this file in a browser:
        file:///home/ppham/.local/share/jupyter/runtime/jpserver-3020350-open.html
    Or copy and paste one of these URLs:
        http://anti-villain:8888/tree?token=7f7aa47e0c412d28bd48c58a3c1efde6e3b191c54eb92ada
        http://127.0.0.1:8888/tree?token=7f7aa47e0c412d28bd48c58a3c1efde6e3b191c54eb92ada

Here's a Stackoverflow post describing how to generate the Jupyter config file.

After running this command initially

jupyter notebook --generate-config

This generates the file

~/.jupyter/jupyter_notebook_config.py

and the following line should be added at the end
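Presumably this is the c.ServerApp.ip setting mentioned above, so the server listens on all interfaces:

c.ServerApp.ip = '0.0.0.0'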

Still no luck. It's not the Jupyter notebook; even a netcat listener is not receiving requests.

Solution

Some stackoverflow answers about sshd_config provided the solution.

The two options we need to enable in /etc/ssh/sshd_config on the public AWS jump server are

AllowTcpForwarding yes
GatewayPorts yes

and restarting

sudo systemctl restart sshd

Now, after running autossh-ai.sh, installing JupyterHub, and starting it with

jupyterhub --ip 0.0.0.0

and reverse-proxying with nginx to use our TLS certs, we have success

image

Well, after a fashion. I can't reset the admin password, and still can't log in. But until next time.

2024-05-30 AI Week 09

My goal today was to generate text in a loop based on user input. I added a command-line option to switch between training and inference to gpt_train.py.
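A minimal sketch of how such a switch could look (the flag name and placeholder bodies are my own; the real gpt_train.py would call its existing training and generation functions):

import argparse

def main():
    parser = argparse.ArgumentParser(description="Train a GPT or generate text from it")
    parser.add_argument("--mode", choices=["train", "infer"], default="train",
                        help="'train' runs the training loop, 'infer' loops over user prompts")
    args = parser.parse_args()

    if args.mode == "train":
        # call the existing training loop here, e.g. train_model_simple(...)
        print("training...")
    else:
        # generate text in a loop based on user input, until Ctrl+C
        while True:
            prompt = input("Enter a prompt: ")
            # call the existing generation code here to continue the prompt
            print(f"(would generate a continuation of: {prompt!r})")

if __name__ == "__main__":
    main()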

I was able to train for 10 epochs, but I'm not sure whether the dataset was the Mark Twain autobiography or another one, because I was working in a shared GitPod workspace.

Ep 10 (Step 000195): Train loss 0.459, Val loss 7.429
Every effort moves you don't have the secrets, the answers, so this question still rings true, Jon looks up and I was youthful, and I was looking for... this is the kind of thing I'm talking about.    These are the things from

The validation loss still remains high relative to the training loss, so the next thing I want to try is to see how the validation data is sliced from the main training data, or to give it a seed phrase taken from the original text.

2024-05-20 SC Week 08 Prep

I found the step I was missing from the tutorial to connect our webpack'd web app to our custom wasm file.

I made a change to the lib.rs Rust application

and re-ran

wasm-pack build

I did have to install it again

cargo install wasm-pack

which I believe is because GitPod only persists certain temporary layers within its Dockerfile, which are those that are tracked inside the git repo.

I changed the greet message in www/index.js, ran

npm run build

to webpack it, and

npm start

to serve it and made the web app available locally.

2024-05-16 AI Week 07 Lab

Starting Chapter 3, to train simple attention weights.

Section 3.3.1: untrainable weights first

["Your", "journey", "starts", "with", "one", "step"]

It looks like we compute them first for a single query token (the second token $x^2$), and the weights represent how much that query token should pay attention to the other tokens in the chunk.

image

$ python3 3_3_1_untrainable.py 
tensor([0.9544, 1.4950, 1.4754, 0.8434, 0.7070, 1.0865])

The 3-dimensional embeddings for the above chunk were copied and pasted from the book, and the dot-product between the query token and each of the other tokens in the chunk is their overlap or similarity.

Rolling our own dot product should give us the same result (and it does)

image

Overlap should be 0.9544: 0.9544000625610352

and summing the attention weights / overlaps gives us a total for normalizing.

image

Attention weights: tensor([0.1455, 0.2278, 0.2249, 0.1285, 0.1077, 0.1656])
Sum:  tensor(1.0000)

We smooth / normalize with softmax, which I suspect is like the sigmoid function we used for clamping / smoothing neural network inputs.

Attention weights: tensor([0.1385, 0.2379, 0.2333, 0.1240, 0.1082, 0.1581])
Sum: tensor(1.)
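Putting the steps together, here is a small sketch of the whole computation for the second query token, using the embedding values copied from the book; it reproduces the scores and both normalizations printed above.

import torch

# 3-dimensional embeddings for the six tokens in the chunk,
# copied from the book's example
inputs = torch.tensor([
    [0.43, 0.15, 0.89],  # Your
    [0.55, 0.87, 0.66],  # journey
    [0.57, 0.85, 0.64],  # starts
    [0.22, 0.58, 0.33],  # with
    [0.77, 0.25, 0.10],  # one
    [0.05, 0.80, 0.55],  # step
])

query = inputs[1]             # x^2, the second token ("journey")
attn_scores = inputs @ query  # dot product of the query with every token

# Naive normalization: divide by the sum so the weights add up to 1
attn_weights_naive = attn_scores / attn_scores.sum()

# Softmax normalization, which is what we use going forward
attn_weights = torch.softmax(attn_scores, dim=0)

print("Scores: ", attn_scores)
print("Naive:  ", attn_weights_naive, attn_weights_naive.sum())
print("Softmax:", attn_weights, attn_weights.sum())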

2024-05-15 AI Week 07

Questions that came up during morning discussion.

Are Embedding Matrices multiplied with a single chunk vector, or a batch matrix, on the right or left?

Is a scalar a zero-dimensional tensor?

Why is adding the positional embedding to the token embedding the right thing to do?

Can we just keep the two embedding outputs separate?

2024-05-13 SC Lab 07

Committed beginning Rustwasm Game of Life in PR

image

Stuck on customizing the greet method to include my name in the string.

image

2024-05-09 AI Lab 06

Show Overlapping (x,y) Pairs

This code encodes text with tiktoken and prints out a sample (x,y) pair, shifted by one token so that y contains the predicted next token.

import tiktoken

# raw_text is the training corpus loaded earlier; the GPT-2 encoding is
# an assumption, matching what the dataloader below uses
tokenizer = tiktoken.get_encoding("gpt2")
integers = tokenizer.encode(raw_text)

enc_sample = integers[50:]

context_size = 4

x = enc_sample[:context_size]
y = enc_sample[1:context_size+1]

print(f"x: {x}")
print(f"y:        {y}")

which prints

x: [220, 220, 2048, 645]
y:        [220, 2048, 645, 8733]

Batches with stride and max_length

This code creates a new dataloader by combining our custom dataset training example creator with PyTorch's built-in dataloader.

from dataloader import create_dataloader_v1

dataloader = create_dataloader_v1(raw_text, batch_size=1, max_length=4, stride=1, shuffle=False)

data_iter = iter(dataloader)
first_batch = next(data_iter)
print(first_batch)

second_batch = next(data_iter)
print(second_batch)

This produces the following output

[tensor([47044,    46,  3483, 49656]), tensor([   46,  3483, 49656, 31300])]
[tensor([   46,  3483, 49656, 31300]), tensor([ 3483, 49656, 31300,    56])]

We set a max_length (chunk size) of 4, so each training example has 4 tokens. The stride is 1, so successive (x,y) pairs overlap, and the x of the second batch advances by one token (the sliding window) and is the same as the y of the first batch.

We also see that each call to next on the Python iterator returns a batch of size one, which means just one (x,y) pair.
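For reference, create_dataloader_v1 itself can be sketched roughly like this, following the book's pattern (the class name, parameter defaults, and GPT-2 encoding are assumptions):

import torch
import tiktoken
from torch.utils.data import Dataset, DataLoader

class GPTDatasetV1(Dataset):
    # Slides a window of max_length tokens over the text with the given
    # stride, producing (input, target) pairs shifted by one token
    def __init__(self, txt, tokenizer, max_length, stride):
        token_ids = tokenizer.encode(txt)
        self.input_ids = []
        self.target_ids = []
        for i in range(0, len(token_ids) - max_length, stride):
            self.input_ids.append(torch.tensor(token_ids[i:i + max_length]))
            self.target_ids.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return self.input_ids[idx], self.target_ids[idx]

def create_dataloader_v1(txt, batch_size=4, max_length=256, stride=128, shuffle=True):
    tokenizer = tiktoken.get_encoding("gpt2")
    dataset = GPTDatasetV1(txt, tokenizer, max_length, stride)
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)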

2024-05-06 SC Lab 06

Step 1

In this generated city

Hello, city!
Added Road 8
Added Road 14
Added Road 17
Added Road 25
Added Road 33
Added Road 41
Added Road 46
Added Road 5
Added Road 8
Added Road 14
Added Road 17
Added Road 22
Added Road 27
Added Road 34
Added Road 40
Added Road 47
The number of generated addresses is 1086
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
##################################################
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
##################################################
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
##################################################
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
##################################################
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
##################################################
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
##################################################
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
##################################################
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
##################################################
##################################################
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
##################################################
##################################################
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..
##################################################
##################################################
oooooooo#ooooo#oo#ooooooo#ooooooo#ooooooo#oooo#ooo
.......o#o...o#oo#o.....o#o.....o#o.....o#o..o#o..

1,086 addresses were generated and populated into the hash map.

As an overestimate:

  • City of size 50

  • For 7 avenues, with locations both east and west, 7 * (50 + 50) = 700

    • This is an overestimate because some avenues may be leftmost or rightmost
  • For 9 streets, with locations both north and south, 9 * (50 + 50) = 900

    • Some streets may be northmost or southmost.
  • Overestimate total is 1,600

  • Three double-streets above (two streets directly adjacent)

    • "two-lane" or "double-wide" means subtract 3 * 2 * 50
  • 1,600 - 300 = 1,300

  • Every street and avenue intersection overlaps in 4 addresses.

  • 7 * 9 = 63 intersections, so subtract 4 * 63 addresses.

  • 1,300 - 252 = 1,048

Three generated addresses (from a separate random city) showing that the code from this morning works.

The address at coordinates (47, 9) is 667 Avenue 48 
The address at coordinates (4, 3) is 6 Avenue 5 
The address at coordinates (14, 16) is 129 Avenue 13 

Morgan found the bug for addresses around East-West streets.

image

After moving the code that initializes address_counter so that it resets after every road

image

we print out all the addresses numbered 16 to show that the number is being re-used.

Reused address number 16 Avenue 6
Reused address number 16 Avenue 13
Reused address number 16 Avenue 17
Reused address number 16 Avenue 24
Reused address number 16 Avenue 33
Reused address number 16 Avenue 38
Reused address number 16 Avenue 44
Reused address number 16 Avenue 49
Reused address number 16 Street 3
Reused address number 16 Street 6
Reused address number 16 Street 12
Reused address number 16 Street 15
Reused address number 16 Street 20
Reused address number 16 Street 29
Reused address number 16 Street 36
Reused address number 16 Street 45
Reused address number 16 Street 49

2024-04-28 Rust

Rustlings exercises

primitive_types4.rs

    let nice_slice = &a[1..4];

I have to remember that the end index is not inclusive, like in JavaScript and Python. I don't completely understand why the borrow & is necessary here; does it even make sense to slice an array without borrowing?

primitive_types5.rs

    let cat = ("Furry McFurson", 3.5);
    let (name, age) = cat;

    println!("{} is {} years old.", name, age);

A tuple will work here but not an array, as unlike Python lists, Rust arrays apparently need all their elements to be the same type.

primitive_types6.rs

    let numbers = (1, 2, 3);
    // Replace below ??? with the tuple indexing syntax.
    let second = numbers.1;

    assert_eq!(2, second,
        "This is not the 2nd number in the tuple!")

The Rust tuple indexing seems a little goofy and inconsistent. Why shouldn't we index them with the same square brackets as Vecs? Presumably because they are fixed size. But then, how can you programmatically access a tuple element?

Hmm, it is the static type checking that is preventing dynamic tuple access.

move_semantics1.rs

    let mut vec = vec;

    vec.push(88);

Software Construction

To make progress on a tic-tac-toe "engine" that lets two players play each other over the network, I copied the single-threaded TCP listener from the Rust Book web server example.

I also used Gavin's tic-tac-toe board display formatter as a nicer ASCII representation and added a "Next " string at the end to indicate whose turn is next. This, I think, will be important as soon as we are passing boards back and forth between two solvers and need to validate that the right player is making the next move.

-------------
| X | O | O |
-------------
|   | O | X |
-------------
| X | O | X |
-------------
Next O

I'm not completely settled on the name "engine" but it does tick the game state forward.

2024-04-27 Discrete Math

Today I'm finishing some identities for use by Discrete Math students with some example problems, including what counts as allowable "moves" or algebraic manipulations.

I'd also like to demonstrate some proof bugs to better understand the proof process myself and generalize some lessons.

I realized I was perhaps drilling down a bit too deep when I found Andrew Alexander's site with a good summary of exponent identities and example problems.

I wrote a brute force string edit-distance algorithm using recursion for Smarty Plants. The next step is to try to combine it with some kind of string finder, like a prefix or suffix tree.
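The brute-force recursion looks roughly like this (a Python sketch of the idea, not the Smarty Plants code itself):

def edit_distance(a, b):
    # Brute force: the minimum number of insertions, deletions, and
    # substitutions needed to turn string a into string b
    if not a:
        return len(b)  # insert the rest of b
    if not b:
        return len(a)  # delete the rest of a
    if a[0] == b[0]:
        return edit_distance(a[1:], b[1:])
    return 1 + min(
        edit_distance(a[1:], b),      # delete a[0]
        edit_distance(a, b[1:]),      # insert b[0]
        edit_distance(a[1:], b[1:]),  # substitute a[0] with b[0]
    )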

2024-04-25 AI Afternoon Lab

"Which voice will you choose for finetuning / synthesizing in the lab this afternoon? What ethical case can be made for this choice?"

I will be using my own voice for this afternoon, but I will collect it from a publicly available recording from an Evergreen youtube video. My justification is that it is my voice and I give myself consent for this particular one-time use, and I'd like to exercise the toolchain of LiveRecorder plugin to webm to ogg to splitting it into short WAV files.

My initial use of Warren Buffett's voice for reading Berkshire Hathaway annual letters I now find problematic, especially since the passing of his partner Charlie Munger. They both have had decades to consent to such recordings, and already have a wealth of their voice online saying exactly what they want to say and offer to the world.

As a precedent, people play recordings of loved ones who have passed on to remember them without special consent. This is not considered compelling others to perform on-command. My standard would be: if this person were alive and present for this playing of a recording or voice synthesis, would they be pleased by it? If I'm playing a recording to remember someone I miss, I think that would be well received. If I'm doing a functional task, like driving on a long road trip and are choosing to have someone's voice read a book, that seems like a much more gray area, especially if they have not licensed their voice for this purpose, like audiobook authors have done.

Post Activity Reflection: What emotional reaction do I have from hearing several synthetic voices, including mine, Richard's, and Steve Jobs?

I simulated all of them reading the novel Middlemarch by George Eliot, because it was famous as being this great piece of British literature with good characterization, it was written by a woman under a male pseudonym, and I've always meant to read it.

Hearing my own voice produced a mix of surprise and fascination at how believable I thought it was and embarrassment that it was self-indulgent. Hearing Richard's voice was humorous, because I didn't think it was very close but also reminiscent, and also irritated me to think I could find an optimization to improve the likeness. Hearing Steve Jobs's voice was the closest to the original of all, and I felt the same surprise and fascination at the technology but also anxiety that it somehow lessens original human readers, or the recordings that we do have of Steve Jobs where the words have a consciousness behind them.

Hearing a synthetic voice read to us can entertain or inform us, but it seems close to digital servitude in a way that listening to recordings doesn't, because we choose the words as well as the occasion. Examining my goals, I seem to want the novelty of famous or other people helping me absorb a written work, which the author themselves probably intended to be enacted and imagined by the reader (in silence) or in a live performance.

2024-04-16 MNIST Training

2024-04-06 GPU Installation

26 March 2024 - Tuesday

I started reading Chapters 1 and 2 of Sebastian Raschka's "Build a Large Language Model (LLM) from Scratch" published by Manning.

There is no code in Chapter 1, just a brief introduction to how artificial intelligence is related to other terms that have become popular in computation:

  • machine learning
  • deep learning
  • generative AI

I agree with most of the definitions, and the Venn diagrams are helpful.

How are Large Language Models related to Generative Pre-trained Transformers? Are both AI systems, in the sense that they are a data structure which represents trained weights and parameters (the model), as well as algorithms which build that model from an input training dataset and then use the model to perform inference tasks?

Is inference the same as running the model in reverse with randomness to do the generative part?

Where does the randomness come from?

According to Raschka, there are LLMs which are not GPTs, and vice versa, so that neither one is a proper subset of the other. I remain skeptical and will keep looking for examples.

Other useful demos to get students in the right frame of mind:

  • MNIST demo of handwritten digit recognition, to generate and visualize a neural network
  • Eleven Labs, or whichever zero-shot TTS voice offshoot I tried this summer, that grew out of a paper I tried to read

Some questions I want to explore tomorrow are related to the parameters of LLMs

  • How is the size of the encoding related to the number of parameters?
    • For example, calling GPT-3 a 175 billion parameter model
    • I downloaded and tried to use LLaMa's 13 billion parameter version, but it needed about 40 GB to load

I'm working on translating this Python code into Rust

https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/ch02.ipynb

12 March 2024 - Tuesday

Morning - Web Front-end Class

I created a static profile.html with a hard-coded JSON to practice rendering a logged-in user's data. This will come from a fetch to a public API when it's ready.

This is data from two random U.S. cities from my database

[
    {
        "id": 12357,
        "name": "Barstow",
        "createdAt": "2024-03-05T17:56:43.302Z",
        "latitude": 34.8661,
        "longitude": -117.0471,
        "population": 25235,
        "authorId": 1
    },
    {
        "id": 12358,
        "name": "Avon Lake",
        "createdAt": "2024-03-05T17:56:43.322Z",
        "latitude": 41.4944,
        "longitude": -82.0159,
        "population": 25220,
        "authorId": 1
    }
]

I added this script to render the above data, one "card" per city.

    const bodyTag = document.getElementsByTagName("body")[0];

    for (let i in jsonData) {
        const cityCard = document.createElement("div");
        cityCard.classList.add("city-card");

        const cityName = document.createElement("header");
        cityName.innerHTML = `<b>${jsonData[i].name}</b>`;
        
        cityCard.appendChild(cityName);

        const cityLat = document.createElement("p");
        cityLat.innerHTML = jsonData[i].latitude;

        const cityLong = document.createElement("p");
        cityLong.innerHTML = jsonData[i].longitude;

        cityCard.appendChild(cityLat);
        cityCard.appendChild(cityLong);
        
        bodyTag.appendChild(cityCard);
    }

The page rendering two cities currently looks like this

image

My next step is to add text labels to the fields: latitude, longitude, population, and put them on separate lines. Maybe add an image or link to a Google Maps page for that city.

Office Hours - Afternoon

I worked with Duc to run his API server locally on his Windows laptop. I recommended moving his local repository to WSL because of a TypeScript / npm package signature error.

After moving all the files in upper-division-cs into his WSL volume, they all have DOS line endings (!!), which we shouldn't commit into git.

I found this Stackoverflow post which recommended the find command with some flags that I wasn't familiar with.

find . -type f -print0 | xargs -0 dos2unix

This link explains what the -type f does:

The -type f option here tells the find command to return only files. If you don't use it, the find command will return files, directories, and other things like named pipes and device files that match the name pattern you specify. If you don't care about that, just leave the -type f option off your command.

This other link explains the -print0 flag, which outputs all the filenames separated by the ASCII null character. This allows filenames to have spaces in them, which could realistically happen on a Windows filesystem; the Stackoverflow post was just being safe.

The xargs -0 interprets this null separator correctly, and then converts that standard input into arguments for the dos2unix command.

A near-equivalent way (as long as no filenames contain spaces) is to run dos2unix on the output of the find command

dos2unix $(find . -type f)