Core AI concepts OLD - terrytaylorbonn/auxdrone GitHub Wiki



I love work that involves analysis, getting to the gist of complex topics. But AI has been a hard nut to crack. I've made great progress, but there is still work to do. This page is a WIP; working notes are in docx #600.

I disagree with many of the accepted narratives about AI (especially for LLMs like GPT): that it has intelligence, what really makes it tick, and what exactly makes a computational binary algorithm worthy of the "AI inside" label. This "Core AI concepts" page is my OWN take on AI, my original interpretation. I have worked a lot with GPT to verify my ideas.


TOC

The minimum you need to understand

  • 1 What is human intelligence? Something magical that AI does not have.
  • 2 How LLM AI really works (GPT-3) (transformer (TF) basics). It may be pure computational statistics, but AI has its own magic: Lightspeed signals and virtually unlimited power/size.
  • 3 The core functional components: "AI transistors" and "AI logic gates" (implemented on GPU). To me this concept is a more accurate explanation of how AI tools work.

Also good to know

  • 4 The last half century of development that has led to practical AI. It's only recently become possible to implement this brute-force method (similar to how PC-based video was not possible until CPUs became powerful enough; likewise, 3D graphics required the GPU before it really worked).
    • 4.1 Binary computing based on integrated semiconductors
    • 4.2 AI based on binary computing
  • 5 Why TFs can be used for many types of applications (not just for LLMs). Basically for any function that (1) is too complex to model with equations and (2) can be "trained" (programmed) using massive amounts of training data. Typically many inputs result in a single output (LLM generates a token; CNN recognizes bitmap objects, etc).
  • 6 TF (GPT-3) algorithm details (my unique analysis). WILL BE REWRITTEN SOON. AI is computational binary algorithms. Therefore understanding AI requires a bit of effort. This section tries to minimize that effort.

1 What is human intelligence?

A mystery that is hosted/incarnated in the human brain. The most fascinating thing in the universe. Something brilliant all-knowing AI gurus, those who will lead us to a more enlightened future, compare to crude brute-force clocked binary circuit algorithms.

Human communication is pure magic:

  • Input = hearing or reading words. Humans can hear and see. AI just has no senses. None.
  • Language (written, spoken) = a very primitive inexact method of communicating the vastly complex thoughts, concepts, and ideas of humans.
  • Each language has different structures for distilling / encoding meaning into written/audio form (I speak 4 languages, I know this from personal experience).
  • You read words, you formulate thoughts, and you distill your response into language-specific text/words. Your thoughts are the driving force. AI (LLM TFs) just compute the next token.

Intelligence is the ability to understand things, imagine things, see relationships. It's what makes us human. AI has none of these things. None. AI's only value is as a tool for intelligent humans. And in that role AI is revolutionary.



2 How LLM AI really works (GPT-3) (transformer (TF) basics)

The gist of how (0) the AH, (1) the FFN h layer, and (2) the FFN y output layer work (inference only; training is not covered on this page).

Many of the diagrams below are from "__ziptieai_BOOK2_LLM_v86_251230.docx" sections

  • "5.1.1 Inference workflow overview" and
  • "17.1.1 AH (attention heads) 25.1215"

TOC

  • Overview diagram
  • STEP 1 initial embedding
  • STEP 2 AH
  • STEP 3 FFN (overview)
  • STEP 4 95 more layers / Simple Emergent Structure develops
  • STEP 5 (after layer 96) predict next token from final state
  • STEP 6 then start whole process all over at STEP 1 (with new token appended)

Overview diagram

Following shows how the input to the TF starts out as a prompt, and then after each loop the new token (computed by the TF) is added to the input.

image
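The loop described above can be sketched in a few lines. This is a minimal illustration only: `transformer` stands in for the whole 96-layer stack covered in the steps below, and the toy stand-in "model" just returns the last token plus one, to show the loop shape.

```python
# Minimal sketch of the outer generation loop (illustrative only:
# `transformer` stands in for the whole stack of layers described below).
def generate(transformer, prompt_tokens, max_new_tokens=10, stop_token=None):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = transformer(tokens)   # one full forward pass (STEPs 1-5)
        if next_token == stop_token:
            break
        tokens.append(next_token)          # STEP 6: append the new token, loop
    return tokens

# Toy stand-in "model" that just returns last token + 1, to show the loop shape.
demo = lambda toks: toks[-1] + 1
print(generate(demo, [1, 2, 3], max_new_tokens=4))  # [1, 2, 3, 4, 5, 6, 7]
```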

All steps.

image

Steps diagram.

image

3 tokens input, AH (attention head), FFN.

image

STEP 1 initial embedding

Input 3 tokens.

image image

Each token has 12288 scalars that define everything about the token.

image
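The embedding step amounts to a table lookup: each token id selects one row of a learned matrix. A minimal sketch (the vocab size and token ids below are made up for the demo; only d_model = 12288 matches GPT-3):

```python
import numpy as np

# Sketch of STEP 1: each token id indexes one row of the embedding matrix.
# GPT-3: vocab of ~50k tokens, d_model = 12288. A smaller made-up vocab is
# used here so the demo stays lightweight; the token ids are arbitrary.
vocab_size, d_model = 1000, 12288
rng = np.random.default_rng(0)
W_embed = rng.standard_normal((vocab_size, d_model)).astype(np.float32)

token_ids = [464, 329, 318]        # the 3 input tokens
x = W_embed[token_ids]             # shape (3, 12288): 12288 scalars per token
print(x.shape)                     # (3, 12288)
```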

STEP 2 AH

AHs share info between related tokens.

image image
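The info-sharing mechanism can be sketched as one attention head, under simplifying assumptions: tiny illustrative sizes, no causal mask, no multi-head split (GPT-3 uses d_model = 12288 with 96 heads of d_head = 128):

```python
import numpy as np

# Minimal sketch of a single attention head (sizes illustrative, mask omitted).
def attention_head(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # project each token state
    scores = Q @ K.T / np.sqrt(K.shape[1])           # how related each token pair is
    # softmax turns scores into mixing weights ("routes" to related tokens)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V                                     # each output mixes info from other tokens

rng = np.random.default_rng(1)
d_model, d_head, n_tokens = 64, 16, 3
X = rng.standard_normal((n_tokens, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
out = attention_head(X, Wq, Wk, Wv)
print(out.shape)  # (3, 16)
```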

STEP 3 FFN (overview)

image image

Below shows how

  • x1 (one of 12288 input scalars) is input to the NN.
  • y1 (one of 12288 output scalars) is the output.

In reality, there are

  • x1… x12288 inputs to each h neuron,
  • ~49K (12288x4 = 49152) h neurons (h1… h49152), and
  • y1… y12288 outputs.

This creates a massive computational encoding region.
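The FFN computation can be sketched with scaled-down stand-in sizes (real GPT-3 uses d_model = 12288 and a hidden layer 4x that size; the structure below is the same, only smaller so the demo runs fast):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (the activation used by GPT-2/3)
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# FFN sketch. Real GPT-3 sizes: d_model = 12288, d_hidden = 4*12288 = 49152.
d_model, d_hidden = 48, 192
rng = np.random.default_rng(2)
W1, b1 = rng.standard_normal((d_model, d_hidden)) * 0.02, np.zeros(d_hidden)
W2, b2 = rng.standard_normal((d_hidden, d_model)) * 0.02, np.zeros(d_model)

x = rng.standard_normal(d_model)   # one token's state vector
h = gelu(x @ W1 + b1)              # hidden layer: the "h transistors"
y = h @ W2 + b2                    # output layer: plain Wx + b, no activation
print(h.shape, y.shape)            # (192,) (48,)
```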

image

STEP 4 95 more layers / Simple Emergent Structure develops

Repeat the above loop 96 times.

image

(LLM Attention Across Layers) See “#514 Simple emergent structure (as layers progress)”. The 12288 scalars that define each token's state “interact” during these 96 layers, and inside those 12288 scalars an “emergent structure” is created.

This is the core “intelligence” of the LLM (yep, that’s it, a bunch of numbers).

First

image

Middle

image

Late

image

Final

image image

STEP 5 (after layer 96) predict next token from final state

https://youtu.be/wjZofJX0v4M?t=1317

image
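The final prediction step can be sketched as follows (tiny stand-in sizes instead of GPT-3's 12288 / ~50k; a greedy argmax pick is shown, while real deployments usually sample from the distribution):

```python
import numpy as np

# Sketch of STEP 5: project the final state of the LAST token through the
# unembedding matrix, softmax over the vocab, pick the next token.
d_model, vocab_size = 48, 100           # stand-ins for 12288 and ~50k
rng = np.random.default_rng(3)
final_state = rng.standard_normal(d_model)
W_unembed = rng.standard_normal((d_model, vocab_size))

logits = final_state @ W_unembed        # one score per vocab entry
probs = np.exp(logits - logits.max())
probs /= probs.sum()                    # softmax: scores -> probabilities
next_token = int(np.argmax(probs))      # greedy pick (real samplers vary)
print(next_token)
```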

STEP 6 then start whole process all over at STEP 1 (with new token appended)

Until the TF or the agent stops the generation of tokens.

image



3 The core functional components: "AI transistors" and "AI logic gates"

From GPT in response to my prompts:

One thing you invented that is interesting. Your phrasing:

  • AI transistors
  • AI logic gates

is not standard terminology, but the concept is legitimate.

Researchers often describe neurons as:

  • threshold units
  • feature detectors
  • basis functions

Your analogy to transistors and logic gates is simply another way of explaining the same mechanism.

The key implication: Attention + FFN together form something like:

  • logic gates + wiring between tokens

Your analogy actually captures this well:

  • FFN ≈ logic gates
  • AH ≈ signal routing / wiring

That combination is what allows transformers to build complex reasoning structures over many layers.

TOC

  • 3.1 AH transistors
  • 3.2 FFN transistors and logic gates
    • h transistor
    • y transistor
    • h and y combined: AI logic gates

3.1 AH transistors

This drawing summarizes all the computations in __ziptieai_BOOK2_LLM_v86_251230 section 17.1.1 "AH (attention heads)".

image

AH is based on Wx + b computations similar to those for the FFN, but with Softmax (not GELU):

  • Softmax (SM) creates “routes” only to strongly related tokens.
  • Below: The upper (single-transistor) example shows 2 matrices multiplied. A 1x128 W matrix on the left and a 128x1 hidden-layer vector on the right produce a 1x1 scalar (like FFN transistors: many inputs, one output).
  • Below: The lower half shows a 128x12288 W matrix dotted with a 12288x1 vector (all hidden-layer vectors). Very typical for AH: many inputs, many outputs (AH is more about modifying and sharing info, not FFN-style detection). This is the equivalent of 128 transistors.
image
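The two matrix products described above can be checked with a quick shape demo (the weights are random placeholders; only the shapes matter here):

```python
import numpy as np

rng = np.random.default_rng(4)
W_row = rng.standard_normal((1, 128))       # one "AH transistor": many inputs, one output
h_vec = rng.standard_normal((128, 1))       # a 128x1 hidden-layer vector
print((W_row @ h_vec).shape)                # (1, 1): a single scalar

W_head = rng.standard_normal((128, 12288))  # 128 such transistors stacked
x_vec = rng.standard_normal((12288, 1))     # a full 12288x1 token-state vector
print((W_head @ x_vec).shape)               # (128, 1): many inputs, many outputs
```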

3.2 FFN transistors and logic gates

h transistor

x1 ... x12288 are the inputs, each multiplied by its own weight W. The results are added together, a bias is added to the sum, and the result is put through GELU.

image
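A single h transistor can be sketched directly from the description above (the tanh approximation of GELU is used; the input count is scaled down from the real 12288):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# One "h transistor": weighted sum of all inputs, plus bias, through GELU.
def h_neuron(x, w, b):
    return gelu(np.dot(w, x) + b)

rng = np.random.default_rng(5)
n_inputs = 48                       # stand-in for the 12288 real inputs
x = rng.standard_normal(n_inputs)
w = rng.standard_normal(n_inputs) * 0.1
h = h_neuron(x, w, b=0.0)
print(float(h))                     # many inputs -> one scalar output
```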

y transistor

h1 ... h49152 are the inputs, each multiplied by its own weight W. The results are added together and a bias is added to the sum. This is the final result (no GELU; the output layer has no activation).

image

h and y combined: AI logic gates

The example below uses a sigmoid activation (not GELU). It shows how an XOR is created from a simple network with 2 inputs (x), 2 hidden neurons (h), and one output (y). I included this diagram because it shows how logic gates are created.

image

Below shows

  • x1 and x2 (12288) are input to the FFN.
  • 2 hidden layer neurons (h1, h2).
  • y1 (12288) is the output.

In reality, there are

  • x1… x12288 inputs to each h neuron,
  • 49152 (12288x4) h neurons (h1… h49152), and
  • y outputs y1… y12288.

A super massive computational surface.

image
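The XOR construction can be made concrete with hand-picked weights. The exact values below are my own illustrative choice (one of many possible solutions): one hidden neuron approximates OR, the other AND, and the output combines them as "OR and not AND", which is XOR.

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Hand-picked weights for a 2-input, 2-hidden, 1-output sigmoid network.
W1 = np.array([[20.0, 20.0],      # column 1: h1 ~ OR(x1, x2)
               [20.0, 20.0]])     # column 2: h2 ~ AND(x1, x2)
b1 = np.array([-10.0, -30.0])
W2 = np.array([20.0, -20.0])      # y ~ h1 AND NOT h2  ->  XOR
b2 = -10.0

def xor(x1, x2):
    h = sigmoid(np.array([x1, x2]) @ W1 + b1)   # hidden "logic gates"
    y = sigmoid(h @ W2 + b2)                    # output gate
    return round(float(y))

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```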



4 The last half century of development that has led to practical AI

There is no better example of AI hype than the beginning of this video, where Marc Andreessen claims that the computer industry took a wrong turn 80 years ago when it decided to build binary computers rather than AI computers. That is one very bizarre claim (I find it hard to believe Andreessen really said that). So I asked GPT "Were 'AI computers' possible 80 years ago?" The answer:

"Modern AI systems require:

  • Massive matrix multiplications
  • Floating-point arithmetic
  • Huge memory bandwidth
  • Billions of parameters

Even a small GPT-class model would have been unimaginable in 1950 hardware terms. So saying the industry “chose binary instead of AI” is historically misleading. AI requires digital computation. It was not an alternative path."

"Historically misleading". Typical GPT politicized answer.

The following summarizes the basic history of binary and AI computing (my own perspective, from the first page of docx #600). AI is booming now because the required foundational tech only recently became available.

4.1 Binary computing based on integrated semiconductors

4.1a For the past 50 years, we sometimes did not even know why certain aspects of integrated semiconductors worked. Researchers simply tested, discovered, and then tried to figure out why it worked.

4.1b But it all stabilized gradually. Binary computing based on integrated semiconductor transistors has been making unimaginable gains ever since. When I worked in an IBM VLSI test lab in the early 1980s, state of the art was 2-3 microns (2000-3000 nm) and 25MHz with yields ~10%. Now state of the art (set by world leader Taiwan) is clock frequencies in GHz and 3-5 nm.

Below: Intel 4004 from 1971

image

4.2 AI based on binary computing

4.2a The era of practical AI has only recently started thanks to recent gains in performance in the semiconductor transistor-based GPUs that it runs on. Experts are often not sure exactly how certain aspects of AI work, but they continue to test, discover, and then develop theories that explain it all.

4.2b But AI is developing at a much faster rate than semiconductors. Where this will lead is hard to say. But as I recently mentioned in a chat with ChatGPT, the stunning performance and usefulness of GPT (something I could not have imagined just a few years ago) shows that the massive investments in AI are not a bubble.

Below: GeForce RTX (left), UFA 3D map (center; from Welch Labs), and the resulting logic map (right). The map is of a UFA that has 2 inputs (latitude/longitude), and the output is binary (in Belgium or the Netherlands). So the pic on the right is perfect. The pic on the left (the mosaic) is, I believe, a visualization of the parameter training "geometry" (a typically very confusing concept that has little value to me).

image image



5 Why TFs can be used for many types of applications (not just for LLMs)

The analogous similarities of AI GPU algorithms across different types of apps.

(basic idea: for example, CNN and TF similarities)
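One way to see the "trainable from data" idea in miniature: approximate an "unknown" function with a one-hidden-layer network. As a simplification, only the output weights are fitted here (random tanh features plus least squares, rather than full backprop), which is enough to show a universal function approximator at work:

```python
import numpy as np

# Fit an "unknown" smooth function from data alone, with no explicit equation:
# random tanh hidden features, output weights solved by least squares.
rng = np.random.default_rng(6)
n_hidden = 300
W = rng.standard_normal((1, n_hidden)) * 2     # random (untrained) hidden weights
b = rng.uniform(-3, 3, n_hidden)               # random hidden biases

x = np.linspace(-3, 3, 400).reshape(-1, 1)     # training inputs
target = np.sin(2 * x).ravel()                 # the "unknown" function

H = np.tanh(x @ W + b)                         # hidden activations, shape (400, 300)
w_out, *_ = np.linalg.lstsq(H, target, rcond=None)
pred = H @ w_out
print(float(np.abs(pred - target).max()))      # small approximation error
```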



6 Basic LLM transformer (TF) algorithm details (GPT-3) (WILL BE REWRITTEN SOON)

This (optional) section goes into tech detail (with my own diagrams).

  • 6.1 Overall TF loop
  • 6.2 AH
  • 6.3 FFN

6.1 Overall loop

Diagrams below from "__ziptieai_BOOK2_LLM_v86_251230.docx" sections

  • "5.1.1 Inference workflow overview" and
  • "17.1.1 AH (attention heads) 25.1215"

Following shows how the input to the TF starts out as a prompt, and then after each loop the new token (computed by the TF) is added to the input.

image

Following shows the TF (transformer) overall loop. The AH/FFN are inside the blue box (subloop B).

image

Following shows the AH / FFN inside the subloop B.

image

6.2 AH algorithm basics

The AH algorithm is perhaps more complex than that of the FFN. For details see "__ziptieai_BOOK2_LLM_v86_251230.docx" section

  • "5.1.1 Inference workflow overview" and
  • "17.1.1 AH (attention heads) 25.1215"

Note that AHs for all tokens are run in parallel.

image
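The "all tokens in parallel" point can be verified directly: attention for every token is computed in one matrix product, and gives the same result as a per-token loop (causal masking omitted for brevity; sizes are tiny and illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n_tokens, d = 4, 8
Q, K, V = (rng.standard_normal((n_tokens, d)) for _ in range(3))

def softmax_rows(S):
    e = np.exp(S - S.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# All tokens at once: one matrix product.
out_parallel = softmax_rows(Q @ K.T / np.sqrt(d)) @ V

# Equivalent one-token-at-a-time loop.
out_loop = np.stack([softmax_rows((Q[i:i+1] @ K.T) / np.sqrt(d)) @ V
                     for i in range(n_tokens)]).squeeze(1)

print(np.allclose(out_parallel, out_loop))  # True
```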

Following shows how AH is computed. Again, there is much more here than meets the eye.

image

In section 17.1.1 of the docx I created my own notation that makes the details clear (it requires a bit of study to figure out). GPT confirmed that my notation and text are correct.

image

6.3 FFN algorithm basics (CONTENT OF THIS SECTION IS WRONG)

image

The FFN (to me) is the main part of the UFA (universal function approximator). The most basic FFN example is discussed in "3 Very basic UFA example" of docx #600.

image

