The UFA (the core AI concept) explained - terrytaylorbonn/auxdrone GitHub Wiki
26.0306 Lab notes
This page is no longer maintained. See the new page
I love work that involves analysis, getting to the gist of complex topics. But AI has been a hard nut to crack. I am making progress, but still work to do. This page is a WIP; working notes are in docx #600.
This wiki page TOC (currently 3 and 5 are kind of the same... they won't be after I clean them up)
- 2 What is human intelligence?
- 3 How LLM AI really works (GPT-3) (transformer (TF) basics).
- 1 The last half century of development that has led to practical AI
- 1.1 Binary computing based on integrated semiconductors
- 1.2 AI based on binary computing
- 4 Why TFs can be used for many types of applications (not just for LLMs).
- 5 TF (GPT-3) algorithm details (my unique analysis). AI is built from binary computational algorithms, so understanding it requires a bit of effort. This section tries to minimize that effort.
2 What is human intelligence?
A mystery that is hosted/incarnated in the human brain. The most fascinating thing in the universe.
Human conversation is basically:
- Input = hear or read words.
- Language = a very primitive, inexact method of communicating thoughts, concepts, and ideas.
- Each language has different structures for distilling/encoding meaning into written/audio form.
- You read words, you formulate thoughts, and you distill your response into language-specific text/words.
3 How LLM AI really works (GPT-3)
3.1 AI (on the other hand)
- Has no intelligence, no thoughts.
- Cannot duplicate intelligent conversation, human or otherwise. It can only simulate it.
3.2 So how does AI do what it does?
- Works at electronic speeds and power levels (the brain is limited to signal speeds of 10-100 m/sec and about 20 W).
- Thus it can use massive brute force computing
- to convert input human language text into "machine" language that is vastly more deterministic than human language.
3.3 12288 floating-point numbers per token
LLM machine language "spells" a token with 12288 FP numbers (in reality those numbers do a lot more than just define the token; more on that later).
At first you input tokens: integer IDs that the tokenizer assigns to pieces of the text (not 8-bit ASCII; modern LLMs use byte-pair encodings). Then the TF converts these to embeddings (12288 numbers for each token). Then the TF starts to run massive computational statistical analysis on these numbers. From then on the 12288 numbers are no longer embeddings. They are "hidden" states. Which means they are encoding deep meaning of
- the token (what it means based on other tokens)
- the hierarchical collective meaning of the complete set of input tokens
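The token-to-embedding step is just a table lookup into a learned matrix. A minimal numpy sketch (the vocabulary size, dimensions, and token IDs here are toy stand-ins; GPT-3 uses a ~50k-token vocabulary and 12288 dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy settings: GPT-3 uses a ~50k vocabulary and d_model = 12288;
# here both are shrunk so the sketch runs instantly.
VOCAB_SIZE, D_MODEL = 16, 8

# The embedding table is a learned matrix: one row of d_model numbers per token ID.
W_embed = rng.standard_normal((VOCAB_SIZE, D_MODEL))

def embed(token_ids):
    """Turn integer token IDs into d_model-dimensional vectors by table lookup."""
    return W_embed[token_ids]          # shape: (seq_len, d_model)

tokens = np.array([3, 7, 1])           # a hypothetical 3-token prompt
x = embed(tokens)
print(x.shape)                         # (3, 8): one 8-number vector per token
```

After this lookup the numbers are still "just spelling"; the layers below are what turn them into hidden states.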
3.4 The TF has 96 layers in which these numbers are processed.
Each layer produces a more exact representation of each token and the collective meaning.
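The layer-by-layer refinement can be sketched as a loop over a "residual stream": each layer reads the current per-token numbers and adds its refinement on top. A toy sketch (the stand-in layer functions here are hypothetical; real AH/FFN blocks are matrix operations, shown later):

```python
def transformer_forward(x, layers):
    """Each layer reads the current per-token vectors and ADDS its refinement
    (residual connections), so meaning accumulates layer by layer."""
    for attn, ffn in layers:        # GPT-3 has 96 such (AH, FFN) pairs
        x = x + attn(x)             # tokens exchange information
        x = x + ffn(x)              # each token's vector is refined in place
    return x

# Hypothetical toy "layers" operating on a plain float instead of vectors:
layers = [(lambda x: x * 0.1, lambda x: x * 0.01)] * 3
print(transformer_forward(1.0, layers))
```

The additive (residual) structure is why the 12288 numbers can keep their original token identity while gradually absorbing deeper meaning.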
Following shows the TF (transformer) overall loop. Each layer has 2 main parts: AH / FFN. The AH/FFN are inside the blue box (subloop B).
3.4.1 Attention head (AH)
AHs basically share information between tokens. This is critical for 2 things:
- Determine token context (how word meanings change based on other words; river "bank" vs. savings "bank", for example).
- Build a hierarchical high-level context of the meaning of all tokens taken together.
PS: The name "attention" is a misleading marketing term that gives non-experts the feeling that AI has intelligence. TFs have no "attention" ability. It's all number crunching and statistics.
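And that number crunching is surprisingly small to write down. A minimal single-head sketch in numpy (toy dimensions; names like `attention_head` are mine, not from any library): each token's vector is replaced by a weighted mix of (projections of) all the tokens' vectors, with the weights computed from dot products.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_head(x, W_q, W_k, W_v):
    """One attention head: update each token's vector with a weighted
    mix of (projections of) the other tokens' vectors."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # how strongly token i "looks at" token j
    # Causal mask: a token may only mix in info from itself and earlier tokens.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -np.inf
    weights = softmax(scores)                # each row sums to 1
    return weights @ V                       # context-mixed vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 4
x = rng.standard_normal((seq_len, d_model))
W_q, W_k, W_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))
out = attention_head(x, W_q, W_k, W_v)
print(out.shape)   # (4, 4)
```

Note there is no "attending" anywhere: just matrix multiplies and a normalized weighting, which is the point of the PS above.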
3.4.2 FFN (feed forward network)
FFN is a neural network (NN) that
- inputs the 12288 numbers for a token,
- detects high level meaning from those numbers, and then
- adds those detected (or modified) meanings back into those 12288 numbers for the same token.
PS: FFN is also a very misleading term. It does not say what the FFN is, but rather what it is not. Before TFs there were RNNs, a different kind of NN that had a feedback mechanism. But modern TFs have no feed "back", only "feed forward". :) I never read this honest, straightforward explanation of the name anywhere (I figured it out after repeated chats with GPT).
Below: Simple FFN (vastly simplified, needs work) from docx #600
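The three bullets above (input the numbers, detect meanings, add them back) can be sketched in a few lines of numpy. Toy dimensions again; the GELU nonlinearity and the 4x expansion factor match GPT-style models, but the weights here are random stand-ins:

```python
import numpy as np

def gelu(z):
    # Smooth nonlinearity used by GPT-style FFNs (tanh approximation).
    return 0.5 * z * (1 + np.tanh(np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)))

def ffn(x, W1, b1, W2, b2):
    """Position-wise FFN: expand each token's vector (GPT-3: 12288 -> 49152),
    apply a nonlinearity, project back down, then ADD the result to the input
    (the residual connection) -- i.e. write detected meanings back into the
    same 12288 numbers."""
    h = gelu(x @ W1 + b1)          # detect high-level features in the numbers
    return x + h @ W2 + b2         # add them back into the token's vector

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32              # GPT-3: 12288 and 4 * 12288
x = rng.standard_normal((3, d_model))
W1, b1 = rng.standard_normal((d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.standard_normal((d_ff, d_model)), np.zeros(d_model)
print(ffn(x, W1, b1, W2, b2).shape)   # (3, 8)
```

Note the FFN touches each token's vector independently; it is the AH, not the FFN, that moves information between tokens.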
3.5 "Simple Emergent Structure Diagrams (LLM Attention Across Layers)" (requires 96 AH/FFN layers)
Those 96 layers gradually compute the detailed hierarchical meaning of the entire set of input tokens (original prompt + previously generated answer tokens).
These structures are what I consider to be foundational tools used to create seemingly intelligent chat responses. See the chat notes in doc "#600_core_AI_concepts_v04_26.0302" chapter "26.0302-3 afternoon: KEY CONCEPT: AH forms structure".
Below: diagrams for the input tokens, a middle layer, a late layer, and the final layer.
3.6 Determine the next token
The final hierarchical meaning is stored in the last token's 12288 numbers. These are analyzed to determine what the next token should be. Note that this is not simply computing the next token from the last token (as you always hear).
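A sketch of that last step, assuming toy dimensions and a random stand-in unembedding matrix: only the last position's vector is turned into one score (logit) per vocabulary entry, and a softmax turns the scores into probabilities.

```python
import numpy as np

def next_token(hidden_states, W_unembed):
    """Pick the next token from the LAST position's hidden state only.
    That vector has already absorbed the meaning of the whole sequence
    through the attention layers -- which is why this is not 'just'
    predicting from the last word."""
    h_last = hidden_states[-1]                 # the last token's numbers
    logits = h_last @ W_unembed                # one score per vocabulary entry
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                       # softmax: probability per token
    return int(np.argmax(probs)), probs        # greedy pick (real LLMs often sample)

rng = np.random.default_rng(0)
d_model, vocab = 8, 16                         # GPT-3: 12288 and ~50k
hidden = rng.standard_normal((5, d_model))     # hypothetical 5-token sequence
tok, probs = next_token(hidden, rng.standard_normal((d_model, vocab)))
print(tok, probs.shape)
```

Swapping the greedy `argmax` for temperature sampling over `probs` is what chat products actually do, but the point here is where the prediction comes from: one vector, carrying the whole sequence's meaning.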
3.7 Why generate the final token like this?
Because this is how the LLM was trained (programmed). The above inference (runtime) process in general is a mirror image of the training process. The design of the LLM is driven primarily by the requirements of training.
1 The last half century of development that has led to practical AI
There is no better example of AI hype than the beginning of this video, where Marc Andreessen claims that the computer industry took a wrong turn 80 years ago when it decided to build binary computers rather than AI computers. That is one very bizarre claim (I find it hard to believe Andreessen really said that). So I asked GPT: "Were 'AI computers' possible 80 years ago?" The answer:
"Modern AI systems require:
- Massive matrix multiplications
- Floating-point arithmetic
- Huge memory bandwidth
- Billions of parameters
Even a small GPT-class model would have been unimaginable in 1950 hardware terms. So saying the industry “chose binary instead of AI” is historically misleading. AI requires digital computation. It was not an alternative path."
"Historically misleading". Typical GPT politicized answer.
The following summarizes the basic history of binary and AI computing (my own perspective, from the first page of docx #600). AI is booming now because the required foundational tech only recently became available.
1.1 Binary computing based on integrated semiconductors
1.1a For the past 50 years, we sometimes did not even know why certain aspects of integrated semiconductors worked. Researchers simply tested, discovered, and then tried to figure out why it worked.
1.1b But it all stabilized gradually. Binary computing based on integrated semiconductor transistors has been making unimaginable gains ever since. When I worked in an IBM VLSI test lab in the early 1980s, state of the art was 2-3 microns (2000-3000 nm) at 25 MHz with yields ~10%. Now state of the art (set by world leader Taiwan) is 3-5 nm at clock frequencies in the GHz range.
Below: Intel 4004 from 1971
1.2 AI based on binary computing
1.2a The era of practical AI has only recently started thanks to recent gains in performance in the semiconductor transistor-based GPUs that it runs on. Experts are often not sure exactly how certain aspects of AI work, but they continue to test, discover, and then develop theories that explain it all.
1.2b But AI is developing at a much faster rate than semiconductors. Where this will lead is hard to say. But as I recently mentioned in a chat with ChatGPT, the stunning performance and usefulness of GPT (something I could not have imagined just a few years ago) shows that the massive investments in AI are not a bubble.
Below: GeForce RTX (left), UFA 3d map (center; from Welch labs), and resulting logic map (right)
4 The analogous similarities of AI GPU algorithms for different types of apps
(basic idea: for example, CNN and TF similarities)
5 Basic LLM transformer (TF) algorithm details (GPT-3)
This (optional) section goes into tech detail (with my own diagrams).
- 5.1 Overall TF loop
- 5.2 AH
- 5.3 FFN
5.1 Overall loop
Diagrams below from "__ziptieai_BOOK2_LLM_v86_251230.docx" sections
- "5.1.1 Inference workflow overview" and
- "17.1.1 AH (attention heads) 25.1215"
Following shows how the input to the TF starts out as a prompt, and then after each loop the new token (computed by the TF) is added to the input.
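That prompt-grows-by-one-token loop is simple enough to sketch directly. Here `model_step` is a hypothetical stand-in for the whole 96-layer TF; any function from a token list to a next token will show the loop's shape:

```python
def generate(prompt_tokens, model_step, n_new, eos=None):
    """Autoregressive loop: run the whole transformer, take ONE new token,
    append it to the input, and repeat."""
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        nxt = model_step(tokens)    # the full TF forward pass goes here
        tokens.append(nxt)          # new token becomes part of the next input
        if nxt == eos:
            break
    return tokens

# Hypothetical toy "model": next token = (sum of tokens so far) mod 10.
toy_model = lambda toks: sum(toks) % 10
print(generate([1, 2, 3], toy_model, 4))   # [1, 2, 3, 6, 2, 4, 8]
```

This is also why long answers are expensive: every new token reruns the model over the whole (growing) sequence.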
Following shows the TF (transformer) overall loop. The AH/FFN are inside the blue box (subloop B).
Following shows the AH / FFN inside the subloop B.
5.2 AH algorithm basics
The AH algorithm is perhaps more complex than that of the FFN. For details see "__ziptieai_BOOK2_LLM_v86_251230.docx" section
- "5.1.1 Inference workflow overview" and
- "17.1.1 AH (attention heads) 25.1215"
Note that AH's for all tokens are run in parallel.
Following shows how AH is computed. Again, there is much more to this than meets the eye.
In section 17.1.1 of the docx I created my own notation that makes the details clear (it requires a bit of study to figure out). GPT confirmed that my notation and text are correct.
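The "all tokens in parallel" point deserves a sketch of its own: in practice all heads and all tokens are computed as batched matrix multiplies, with the per-head split done by reshaping. Toy dimensions and random weights again; GPT-3 uses 96 heads over 12288 dimensions:

```python
import numpy as np

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """All heads, all tokens, computed in parallel as batched matmuls.
    Wq/Wk/Wv map d_model -> d_model; reshaping splits that into n_heads slices."""
    seq, d_model = x.shape
    d_head = d_model // n_heads

    def split(M):   # (seq, d_model) -> (n_heads, seq, d_head)
        return (x @ M).reshape(seq, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(Wq), split(Wk), split(Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores[:, mask] = -np.inf                  # causal mask, same for every head
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)         # softmax per head, per token
    heads = w @ V                              # (n_heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ Wo                         # mix the head outputs back together

rng = np.random.default_rng(0)
seq, d_model, n_heads = 4, 8, 2
x = rng.standard_normal((seq, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
print(multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads).shape)   # (4, 8)
```

Each head gets its own slice of the 12288 numbers and its own weighting pattern, which is how different heads can track different kinds of context at once.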
5.3 FFN algorithm basics
The FFN (to me) is the main part of the UFA. The most basic FFN example is discussed in "3 Very basic UFA example" of docx #600.