Jonah AI HW 5 - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki
Reading
Chapter 1 Questions
-
GPT (an acronym for "generative pretrained transformer") is a specific kind of LLM specialized for text completion; "LLM" is the broader category of deep learning models, most of which are now based on the transformer architecture.
-
[A] Posts and responses from StackOverflow best resemble question/answer pairs.
Each response constitutes a label for the training datapoint. Votes on each response could be used to evaluate and compare the responses' correctness.
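As a rough illustration (the field names and example text here are invented, not taken from StackOverflow or the book), one post/response pair could be stored as an input-label record, with its vote count kept for comparing or filtering responses:

```python
# Hypothetical example (fields and text invented for illustration) of turning
# one StackOverflow-style post/response pair into a supervised training record:
# the post is the input and the response serves as its label.
training_example = {
    "input": "How do I reverse a list in Python?",              # the post (question)
    "label": "Use the built-in reversed() or list.reverse().",  # a response (answer)
    "score": 42,  # vote count, usable to compare or filter responses
}
```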
-
[C] The architecture described in Attention Is All You Need was designed to translate human languages.
-
The neural network of a deep learning model must have at least three layers (an input layer, one or more hidden layers, and an output layer).
-
Statements:
A. Pre-training is usually much more expensive and time-consuming than fine-tuning.
TRUE
B. Pre-training is usually done with meticulously labeled data while fine-tuning is usually done on large amounts of unlabeled or self-labeling data.
FALSE
C. A model can be fine-tuned by different people than the ones who originally pre-trained a model.
TRUE
D. Pre-training is to produce a more general-purpose model, and fine-tuning specializes it for certain tasks.
TRUE
E. Fine-tuning usually uses less data than pre-training.
TRUE
F. Pre-training can produce a model from scratch, but fine-tuning can only change an existing model.
TRUE
-
[A] GPT uses the words it has already produced earlier in the sentence as input when predicting the next word.
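A minimal sketch of that autoregressive loop is below; next_token is a hypothetical stand-in for a trained model's next-word prediction, not a real API.

```python
# Sketch of autoregressive generation: each predicted word is appended to the
# input before the next prediction. next_token is a hypothetical placeholder
# for a trained GPT's next-word prediction, which would score every vocabulary
# word given the context so far.
def next_token(words: list[str]) -> str:
    return "<next>"  # placeholder; a real model returns a likely next word

def generate(prompt: str, num_words: int) -> str:
    words = prompt.split()
    for _ in range(num_words):
        words.append(next_token(words))  # previously produced words feed back in as input
    return " ".join(words)

print(generate("The quick brown", 3))  # "The quick brown <next> <next> <next>"
```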
-
Link to chat: https://chatgpt.com/share/1dc03e02-4c6c-4df5-aefb-593c2e168892
Both answering questions from provided question-answer pairs and identifying spam emails can be framed as next-word text completion, the core capability of GPT. GPT models can solve a wide variety of text-processing and text-generation problems simply because next-word prediction, given a sufficient amount of training, is so versatile.
-
Both [A] and [C], the encoder and the decoder, can be implemented as neural networks, since these are used to identify features in text and to convert features back into text. The tokenizer can be implemented using simple text-processing tools; tokenizing does not require a neural network.
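As a sketch of that last point, a tokenizer can be nothing more than a regular-expression split over whitespace and punctuation (the exact punctuation set here is my own choice, not necessarily the book's):

```python
# Minimal whitespace-and-punctuation tokenizer built from plain text tools;
# no neural network involved. The punctuation set chosen here is illustrative.
import re

def tokenize(text: str) -> list[str]:
    pieces = re.split(r'([,.:;?!"()\']|--|\s)', text)
    return [p.strip() for p in pieces if p.strip()]

print(tokenize("Hello, world. Is this-- a test?"))
# ['Hello', ',', 'world', '.', 'Is', 'this', '--', 'a', 'test', '?']
```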
-
[A, B, C] as these models have been given no prior specific examples related to the inputs.
-
Zero-shot learning is the capability of an LLM to generate responses to inputs on which it hasn't been specifically trained. In few-shot learning, the user includes a few examples of desired input-response pairs to demonstrate the intended behavior.
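For example (these prompts are invented for illustration), the difference shows up directly in how the prompt is written:

```python
# Illustrative prompts (invented for this example). A zero-shot prompt states
# only the task; a few-shot prompt prepends example input-response pairs for
# the model to imitate.
zero_shot_prompt = "Translate to French: cheese"

few_shot_prompt = (
    "Translate to French:\n"
    "sea otter -> loutre de mer\n"
    "peppermint -> menthe poivrée\n"
    "cheese -> "
)
```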
-
GPT-3 has 175 billion model parameters.
Chapter 2 Questions
-
LLMs cannot operate on words directly as the operations used to train and implement neural networks are designed to use numerical data.
-
An embedding is a representation of categorical data (e.g. words, sentences, images) as continuous, real-valued vectors.
-
At the end of the chapter, the author implements 256-dimensional embeddings.
-
These steps in order (a short sketch of the full pipeline follows the list):
C. Breaking up natural human text into tokens
B. Giving unique token IDs (numbers) to each token
D. Converting token IDs to their embeddings, for example, using Word2Vec
A. Adding position embeddings to the token word embeddings
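Below is a minimal sketch of steps C through A using the tiktoken GPT-2 encoding and PyTorch embedding layers, broadly in the spirit of the chapter; the context length chosen here is an illustrative assumption, while the 256-dimensional embedding matches the figure mentioned above.

```python
# Minimal sketch of the pipeline above: tokenize text (C), map tokens to IDs (B),
# look up token embeddings (D), and add position embeddings (A). Uses the
# tiktoken GPT-2 encoding and PyTorch; context_len is an illustrative choice.
import tiktoken
import torch

tokenizer = tiktoken.get_encoding("gpt2")            # steps C and B: tokens and their IDs
ids = torch.tensor(tokenizer.encode("Hello, world!")).unsqueeze(0)  # shape: (1, num_tokens)

vocab_size, context_len, embed_dim = 50257, 1024, 256
tok_emb_layer = torch.nn.Embedding(vocab_size, embed_dim)
pos_emb_layer = torch.nn.Embedding(context_len, embed_dim)

tok_embs = tok_emb_layer(ids)                               # step D: token embeddings
pos_embs = pos_emb_layer(torch.arange(ids.shape[1]))        # one vector per position
input_embs = tok_embs + pos_embs                            # step A: add position embeddings
print(input_embs.shape)                                     # torch.Size([1, num_tokens, 256])
```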
Human Writing
David Deutsch understands creativity to be the mental faculty of creation, "a bold conjecture that goes somewhere." A creative intelligence is capable of creating concepts and ideas which do not yet exist, or which are inconceivable within the dominant ideological and intellectual frameworks. These concepts and ideas are not produced at random but are a logical development of their predecessors. Still, truly creative works aren't merely a continuation of, or a cumulative development on, what has come before; they constitute qualitative leaps with the ability to transform fields in their entirety. This is akin to what Kuhn describes as the revolutionary periods of scientific research in The Structure of Scientific Revolutions.
It's not unusual today to see AGI optimists make bold predictions about the imminent and inevitable arrival of true artificial intelligence from large language models. To determine whether an LLM could be capable of intelligence would require an understanding of just what it is to be intelligent. LLMs convert inputs into numerical representations. What they "perceive" are numbers and their quantitative relationships: common patterns found in their training data, represented as points and vectors in high-dimensional space. Humans differ in that we can think and understand meanings. Humans can conceive what we perceive, transforming sense data into complexes of beings with quality rather than networks of data points which only differ. Humans think meanings with significance.
Creativity, as defined above, is dependent on this meaning-making faculty. Creation requires an understanding of concepts and how they relate to one another qualitatively. In other words, the creative must understand the meaning of the elements of their field if they are to develop it and further its principles. An LLM may be capable of devising never-before-seen sentences, paragraphs, books, etc., but these can never constitute a true conceptual development of their predecessors and their principles. No matter the quantitative increase in processing power and complexity, LLMs are structurally incapable of bridging the qualitative gap between mere pattern identification/synthesis and actual understanding. It seems unlikely, then, that the current dominant models of AI will be able to successfully create without human intervention and guidance.
I'm interested in the therapeutic use of GPT-like chatbots. Bots could introduce patients with social phobias, anxiety, or developmental disorders to social situations which might be overstimulating or intense if encountered in the real world. They could provide patients with a safe, controlled environment for exposure therapy and experimentation. This could be combined with voice generation and 3D characters rigged to speak along with the voice, simulating semi-realistic social situations chosen to meet the needs of the specific patient. Some of these principles are already in use, for instance in any one of the infamous "virtual girlfriend" apps which have become popular over the past few years.
If this technology were actually to be used medically, it would probably need strict guardrails to ensure patient safety.