AI‐24sp‐2024‐04‐03‐Morning - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki

AI Self-hosting, Spring 2024

2024-04-03 - Week 01 - Morning

Welcome

The tone of this class will be one of technical understanding of artificial intelligence with a view outward to surrounding social considerations, context, and implications.

You will be programming, setting up hardware, doing devops, and practicing many skills learned in SOS and cyberdefense competitions last quarter, to deeply understand the engineering behind two simplified AI micro-systems:

first a neural network that reproduces the famous MNIST hand-written digit classifier
then a small generative pretrained transformer (GPT), a specific kind of large language model (LLM),
- trained on the upper-division-cs monorepo, or a dataset of your choice (public or a private one you have permission to use)

We will build and train these micro-systems here on a college campus to mimic larger systems that corporations develop and host in data centers, so that we can study considerations and a holistic context that might otherwise be hidden:

what is an AI model versus a system versus an algorithm?
how do we collect and prepare a dataset for AI training?
what are some parameters we can use to compare the size, quality, or other characteristics of AI models, systems, algorithms?
mechanical and electrical considerations:
- what is the electricity cost?
- how much space, what temperature and ventilation are needed?
- what is the equipment cost?
governance considerations
- what are some decisions that any creators / maintainers of an AI system must make, consciously or unconsciously?
- who chooses the decision makers over time?
- who pays for the system?
- who benefits from the system?
ethical considerations
- how does training an AI on data affect people who produce / generate the data, and the people who use it?
- does the AI have a responsibility to human creators or users?
- do we have responsibilities to AI systems we create?

We consider all of these in the context of holistic understanding, and of helping humans to evolve and co-exist with biodiverse life forms.
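The electricity-cost question above comes down to simple arithmetic once the power draw, running time, and local price are known. The sketch below is one rough way to set it up; the function name and all three input numbers are placeholders chosen for illustration, not measurements of any real machine or utility rate.

```python
def training_energy_cost(power_watts, hours, price_per_kwh):
    """Return (energy in kWh, cost in the currency of price_per_kwh)."""
    # Energy = power x time; divide by 1000 to convert watt-hours to kWh.
    energy_kwh = power_watts * hours / 1000
    return energy_kwh, energy_kwh * price_per_kwh


# Placeholder inputs (assumptions): replace them with a reading from a
# wall power meter and the rate on your own electricity bill.
energy, cost = training_energy_cost(power_watts=300, hours=24, price_per_kwh=0.10)
print(f"{energy:.1f} kWh, cost {cost:.2f}")
```

The same function could be reused to compare a campus machine against a rented one, provided both are measured the same way.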

A Brief History of AI

What is Artificial Intelligence and Why...?

... (Do We Study and Build It)

imitation?
understanding knowledge?
automation?
creation?
- synthetic life
- able to generate "novel" patterns

Babbage and Lovelace

Turing test

AI and Cybernetics

The Mechanical Turk, an illusion purporting to be a chess-playing automaton built in 1770
- toured Europe and America, playing many challengers including Benjamin Franklin and Napoleon Bonaparte
- won most (but not all) of its games
- could repeat the knight's tour, using a knight's moves to cover every square on a chess board exactly once
- was later revealed to be a hoax; a skilled human chess player was hidden inside, directing the Turk's moves.
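The knight's tour mentioned above is a small search problem in its own right. As a modern illustration (and not a claim about how the Turk's hidden operator performed it), here is a minimal sketch using Warnsdorff's heuristic with backtracking as a fallback. The function and variable names are invented for this example.

```python
# Sketch: knight's tour on an n x n board using Warnsdorff's heuristic.
KNIGHT_MOVES = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def knights_tour(n=8, start=(0, 0)):
    """Return a list of (row, col) squares visiting each square once, or None."""
    visited = [[False] * n for _ in range(n)]

    def free_neighbors(square):
        row, col = square
        return [(row + dr, col + dc) for dr, dc in KNIGHT_MOVES
                if 0 <= row + dr < n and 0 <= col + dc < n
                and not visited[row + dr][col + dc]]

    def extend(path):
        if len(path) == n * n:
            return True
        # Warnsdorff ordering: try the squares with the fewest onward moves first.
        candidates = sorted(free_neighbors(path[-1]),
                            key=lambda s: len(free_neighbors(s)))
        for nxt in candidates:
            visited[nxt[0]][nxt[1]] = True
            path.append(nxt)
            if extend(path):
                return True
            # Dead end: undo the move and try the next candidate.
            path.pop()
            visited[nxt[0]][nxt[1]] = False
        return False

    visited[start[0]][start[1]] = True
    path = [start]
    return path if extend(path) else None


tour = knights_tour()
```

Each consecutive pair of squares in `tour` should differ by one knight's move, which is an easy property to check by hand or in a test.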

We pause here because the Mechanical Turk is an important metaphor for this class and AI understanding.
- Many viewers in the 1770s believed it was a real machine. Some reactions:
  - to give up studying chess.
  - the Turk could be adapted to solve other problems
  - we could build multiple Turks
  - the government should outlaw this technology, or seize the Turk, or restrict its export to other countries
- Game-playing, especially chess, remained an important focus for the AI community for centuries
- The Turk was a large machine that probably took many days to move and install each time.
  - How could it be improved?

1970s AI

focusing on search problems through a graph of possibilities
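To make "search through a graph of possibilities" concrete, here is a minimal sketch of breadth-first search over puzzle states. The two-jug puzzle (3-litre and 5-litre jugs, measure exactly 4 litres) is a toy example chosen for illustration; it does not come from the course materials, and the function names are hypothetical.

```python
from collections import deque


def breadth_first_search(start, is_goal, successors):
    """Return a shortest list of states from start to a goal state, or None."""
    frontier = deque([start])
    parent = {start: None}          # also serves as the set of seen states
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            path = []
            while state is not None:    # walk back to the start
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None


def jug_moves(state):
    """All states reachable in one step: fill, empty, or pour between jugs."""
    a, b = state                    # litres in the 3-litre and 5-litre jug
    pour_ab = min(a, 5 - b)
    pour_ba = min(b, 3 - a)
    return {(3, b), (a, 5), (0, b), (a, 0),
            (a - pour_ab, b + pour_ab), (a + pour_ba, b - pour_ba)}


solution = breadth_first_search((0, 0), lambda s: 4 in s, jug_moves)
print(solution)
```

Each state is a node and each legal move is an edge, so the same function works for any puzzle whose states are hashable.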

General knowledge systems (Watson)
Kinds of machine learning
- supervised learning
Algorithms for machine learning
- support vector machines
- supervised learning
Deep learning (Geoff Hinton, Ian Goodfellow), 2006 -
- OpenAI, founded in 2015
- unsupervised learning
- AI art
- Deepfake images
- "uncanny valley"
GPT architecture 2017
- Re-animating dead performers, posthumous performances
ChatGPT in 2022
- general question answering
- "creative" acts like writing poetry, raps, lyrics, TV shows
Generative AI boom
- deepfake videos / voices / music
- text-to-video
Artificial General Intelligence (AGI)

AI & Ethics / Safety

alignment with humans
- apocalyptic / singularity human extinction scenarios
- even killing one human (Asimov's Laws of Robotics)
provenance against misinformation
- deepfake of Gary Gensler / bitcoin ruling

What is Self-Hosting and Why?

running software on your own hardware (or hardware that you control)
for economic sovereignty
for data sovereignty
- privacy: not revealing your queries, or your data
- related to right-to-repair
for learning
- to train and become an AI engineer and work in this industry
- to become a better user
ownership vs. licensing intellectual property
- end-user licenses of software

What is Artificial Intelligence and Why?

brief history
- Turing test / Dijkstra quote
- Relation to cybernetics
- Dartmouth Summer Research Workshop
- 1970s and chess playing
- General knowledge systems (Watson)
- Neural networks for machine learning
  - supervised learning
- Deep learning (Geoff Hinton, Ian Goodfellow), 2006 -
  - OpenAI
  - unsupervised learning
  - AI art
  - Deepfake images
  - "uncanny valley"
- GPT architecture 2017
- ChatGPT 2022
  - general question answering
  - "creative" acts like writing poetry, raps, lyrics, TV shows
- Generative AI boom
  - deepfake videos / voices / music
  - text-to-video
- AGI

What is Self-Hosting and Why?

running software on your own hardware (or hardware that you control)
for economic sovereignty
for data sovereignty
- privacy: not revealing your queries, or your data
- related to right-to-repair
to learn
ownership vs. licensing intellectual property
- end-user licenses

Neural Network Activity

Read and solve exercises from 3blue1brown's Chapter 1: What is a neural network?

Answer the following questions in a dev diary entry wiki page, linked to from your personal dev diary.

Draw a hand-drawn digit in the first demo box and click "check digit".
Click on several of the input nodes (in the first layer) and see a corresponding pixel light up yellow on your input hand-drawn digit.
- If you had to guess, describe in English one possible connection between nodes and pixels.
Click on several of the hidden nodes (in the second layer) and see two images pop up, one with all red/blue pixels, and one masked to just your hand-written digit.
- If you had to guess, describe in English what the two images might mean.
- See the incoming edges light up between your selected nodes and the input nodes in the first layer.
  - What does a bright blue edge mean in terms of the numerical weight on that edge? Is it less than zero, greater than zero, or equal to zero?
  - What does a bright red edge mean in terms of the numerical weight on that edge? Is it less than zero, greater than zero, or equal to zero?
  - Answer the above questions for a slightly blue or slightly red edge.
Click on your hand-drawn digit to reset the demo, and draw a new digit.
- Click on some hidden nodes (in the second layer). Do you think the colors (weights) of these edges are the same as for your previous hand-drawn digit?
- Reset the demo and draw another hand-drawn digit to test your theory.
- Whether it's the same or different, describe in English why this is the case.
- Conceptually, what would the author of this website have to do
Reset the demo and draw a digit to be as ambiguous as possible (for example, halfway between a 4 and a 9).
- Run the network forward to check the digit. See if you can get a halfway gray color between two or more output nodes (in the last layer), indicating equal (un)certainty.
- What would you do in this case, if you needed this network to classify the digit definitely? (e.g. if you work for the U.S. Postal Service)
A machine learning model is trained from data, like a binary executable file is compiled from source files.
- In what ways could this be a valid analogy? Consider this prediction:
  - If I give you a binary executable program that doesn't need the internet to run (for example, a tic-tac-toe game), and you airgap yourself from the internet, do you have everything you need to run the game?
  - If I give you an ML model and all the input images that I need classified, on the same computer, and you airgap yourself from the internet, do you have everything you need to classify the input images?
- Name some differences in this analogy; that is, how is an ML model trained from data different from a compiled binary file?
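If you want to experiment offline while working through the activity, the sketch below shows one plausible way the pieces could fit together: one input node per pixel of a 28 x 28 MNIST image, a hidden neuron computing a weighted sum, and a decision rule that refuses to classify when the top two outputs are nearly tied. The row-major pixel ordering, the sigmoid activation, the `margin` threshold, and the random untrained weights are all assumptions for illustration. They are not taken from the demo's source code.

```python
import math
import random

SIDE = 28                   # MNIST images are 28 x 28 pixels
N_INPUTS = SIDE * SIDE      # one input node per pixel


def pixel_for_input_node(index):
    """Map an input-node index to its (row, col) pixel, assuming row-major order."""
    return divmod(index, SIDE)


def sigmoid(z):
    """Squash any real number into (0, 1), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron_activation(pixels, weights, bias):
    """Weighted sum of pixel brightnesses plus a bias, passed through sigmoid."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return sigmoid(z)


def classify(output_activations, margin=0.1):
    """Return the most active output node, or None if the top two are too close."""
    ranked = sorted(range(len(output_activations)),
                    key=lambda d: output_activations[d], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if output_activations[best] - output_activations[runner_up] < margin:
        return None         # ambiguous: hand off to a human or ask for a rescan
    return best


rng = random.Random(0)
# Hypothetical, untrained weights. A positive weight pushes the neuron toward
# "on" when its pixel is bright; a negative weight pushes it toward "off".
weights = [rng.uniform(-1, 1) for _ in range(N_INPUTS)]
blank_image = [0.0] * N_INPUTS
print(neuron_activation(blank_image, weights, bias=0.0))
```

Note that `weights` is created once and reused for every image; only `pixels` changes between drawings.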

Credits

The two micro-systems we are building are based on:
- 3Blue1Brown's Neural Networks lesson, our primary source for the first half of this class
- Building an LLM from Scratch, our primary source for the second half of this class
https://www.nationalgeographic.com/science/article/alan-turing-test-artificial-intelligence-life-history
https://sitn.hms.harvard.edu/flash/2012/ai/
https://www.alanturing.net/turing_archive/pages/reference%20articles/theturingtest.html
https://en.wikipedia.org/wiki/Mechanical_Turk