stella dev diary AI - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki

HW1

4/17/24 Input Data: presenting the 784-pixel image (MNIST dataset) to the input layer. Layer Calculations: computing each subsequent layer's activations from the previous layer's activations and the connecting weights, with each neuron's bias acting as its firing threshold. Output Comparison: at the output layer, comparing the network's output to the true label (the target value). A sketch of these three steps follows.
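Below is a minimal numpy sketch of those three steps, assuming sigmoid activations and a made-up 784-30-10 layout (the 30-neuron hidden layer is an assumption for illustration, not from the homework):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical shapes: 784 input pixels -> 30 hidden neurons -> 10 outputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((30, 784)), rng.standard_normal((10, 30))]
biases = [rng.standard_normal((30, 1)), rng.standard_normal((10, 1))]

x = rng.random((784, 1))           # input data: one flattened 28x28 image
a = x
for w, b in zip(weights, biases):  # layer calculations
    a = sigmoid(w @ a + b)

y = np.zeros((10, 1)); y[3] = 1.0  # one-hot label for the digit 3 (made up)
print("output:", a.ravel())
print("cost:", 0.5 * np.sum((a - y) ** 2))  # output comparison against the label
```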

The legal case involving the resurrection of actor Peter Cushing in "Rogue One" is a complex matter centered on the ownership and exploitation of intellectual property rights. Tyburn Film Productions argues that it holds the rights to Peter Cushing's likeness under a 1993 agreement, claiming that the use of his image in the 2016 film constituted unjust enrichment for the defendants. They allege that the defendants, including Cushing's estate, have profited from a right that Tyburn was in a position to control. The defendants contend they had the right to resurrect Cushing based on a 1976 agreement with his production company and a subsequent 2016 agreement. They argue that these agreements granted them the necessary permissions, and thus no unjust enrichment occurred. Because those agreements were drafted in an era before artificial intelligence of this magnitude existed, it remains open to argument whether they cover a digital recreation of Cushing's likeness at all. This dispute draws parallels to other cases in the entertainment industry involving the use of deceased celebrities' likenesses, such as the legal battles over the holographic performances of musicians like Tupac, Frank Zappa and Amy Winehouse. These cases highlight the evolving nature of performers' rights and the challenges in navigating historical agreements alongside technological advancements like CGI and holography. The High Court's decision to proceed with a full factual inquiry, as noted by Master Kaye, underscores the unresolved nature of the legal issues at hand. The case's outcome could set a significant precedent for IP and performers' rights, potentially influencing future legislative reforms.

HW3

  1. Connectionism and Distributed Representation Approach: Connectionism uses artificial neural networks to simulate brain-like information flow. Distributed representation encodes knowledge across many neurons. In MNIST, these approaches help recognize patterns like circles in digits.

  2. Recent Progress in Deep Learning: Factors include abundant data, powerful hardware, and architectural innovations. Transfer learning and regularization techniques also contribute. Collaboration within the research community drives deep learning advancements.

  3. Neurons in the Human Brain vs. AI Supercomputers: Human brains have ~86 billion neurons, while the largest neuromorphic AI system, Intel's Hala Point, has 1.15 billion artificial neurons. In 2024, the DeepSouth supercomputer aims to simulate networks at human-brain scale.

  4. Numpy Array Shape: The given Numpy array has dimensions 3x4 (3 rows and 4 columns). Its shape is represented as a tuple: (3, 4). (See the shape-check sketch after this list.)

  5. Neural Network Weights and Biases Shape: Assuming the neural network net has layers [7, 13, 2], the shapes of self.weights and self.biases in the constructor of Network would be: self.weights, a list of weight matrices with shapes [(13, 7), (2, 13)]; self.biases, a list of bias vectors with shapes [(13,), (2,)]. (Some reference implementations, such as Nielsen's network.py, store the biases as column vectors with shapes [(13, 1), (2, 1)] instead.)

  6. Before training, a neural network’s initial weights and biases are randomly initialized. These initial parameters result in uninformed predictions. The network essentially “guesses” the output based on these random weights, leading to nonsensical or “trash” predictions. As training progresses, the network adjusts its parameters using backpropagation and gradient descent, gradually improving its predictions to match the correct labels. So, initially, the network’s output is far from reality due to its lack of knowledge about the data distribution and the task at hand.

  7. Ordering Cost Function Outputs: The correct order of cost function outputs (from least to most) is:

    1. Underfitting
    2. Optimal Fit
    3. Overfitting
  8. Effect of Learning Rate (η) in SGD: A large learning rate may lead to unstable training, with weight updates that oscillate wildly or overshoot. A small learning rate gives slow convergence or failure to train, and may leave the network stuck in local minima. Choosing an appropriate learning rate is crucial for effective training.

  9. Stochastic Gradient Descent (SGD): The term "stochastic" refers to using random samples (mini-batches) from the training data for weight updates. Unlike normal gradient descent (which uses the entire dataset), SGD processes smaller subsets, making it faster and more adaptable. SGD can escape local minima and handle large datasets efficiently. (A sketch of one SGD epoch follows this list, after the shape check.)
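For questions 4 and 5, a quick shape check in numpy (the array contents are arbitrary; the constructor lines are a sketch that reproduces the shapes listed above, in the style of Nielsen-like implementations):

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)  # a 3x4 array as in question 4
print(arr.shape)                   # (3, 4)

sizes = [7, 13, 2]                 # layer sizes from question 5
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
biases = [np.random.randn(y) for y in sizes[1:]]
print([w.shape for w in weights])  # [(13, 7), (2, 13)]
print([b.shape for b in biases])   # [(13,), (2,)]
```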
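And for question 9, a hedged sketch of the "stochastic" part: shuffle the data, then take one gradient step per random mini-batch. The update_mini_batch callback is hypothetical, standing in for whatever backpropagation step the network performs:

```python
import random

def sgd_epoch(training_data, mini_batch_size, update_mini_batch):
    """One epoch of stochastic gradient descent: rather than computing the
    gradient over the entire dataset, step on small random subsets."""
    random.shuffle(training_data)  # the source of randomness ("stochastic")
    for k in range(0, len(training_data), mini_batch_size):
        mini_batch = training_data[k:k + mini_batch_size]
        update_mini_batch(mini_batch)  # one noisy gradient step
```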

1. How do the essays hang together? Is 4's essay an improvement on 3.5's? 4's essay is a little more coherent, but it's still very stiff and robotic.

  1. What do you think of ChatGPT as a student writer? ChatGPT can be a valuable tool but it has severely declined in quality in the past few months.

  2. Would you use ChatGPT for an assignment? How? Yes and no. It really depends on the project. If I'm learning a new language, I'll maybe ask ChatGPT to break the code down line by line for me so I can get a better understanding. Bing's GPT is better for explaining complicated math problems. But you cannot blindly trust either machine; you need to look at further resources to make sure it is giving you correct information.

  3. How does framing relate to system prompts? Framing in journalism refers to how a story is presented. Similarly, system prompts guide AI responses. Both shape content by emphasizing specific angles or values.

  4. How does the "standard story" relate to AI chat systems? The "Standard Story" (e.g., mass incarceration analysis) influences narratives. AI chat systems like ChatGPT can replicate conventional writing tasks but may struggle with deep analysis.

  5. Connecting research assistance to learning technical topics or programming languages: AI chat tools can assist in summarizing content and refining essays, but they should not be relied on as much for finding credible sources. In some instances, ChatGPT 3.5/4 will 'create' sources and annotations that do not actually exist.

HW4

Symbols

A. σ

B. w

C. a

D. b

Meanings

i. Activations from previous layer C

ii. Bias of this neuron, or threshold for firing D

iii. Sigmoid, or squishing function, to smooth outputs to the 0.0 to 1.0 range? A

iv. Weights from the previous layer to this neuron B

  1. MSE for outputs 0-9: 3.3585 (see the sketch after this list for how a value like this is computed).

  2. Increase the bias associated with the digit-2 neuron and decrease the biases associated with all the other neurons.

  3. The phrase "neurons that fire together wire together" refers to the Hebbian learning principle, where connections between neurons that frequently activate together are strengthened. Increasing the weights of connections between highly activated neurons reinforces their influence on subsequent layers, enhancing the network's sensitivity to certain input features. This allows the network to better recognize and respond to patterns in the data that are predictive of successful outcomes. It's a learning optimization process that improves the model's performance by emphasizing relevant features through repeated activations.

  4. Changes to the weights of all the neurons "requested" by each training example.

    1. A cautious person calculating how to get down a hill.

    2. Making a small change to one law at a time, chosen by random groups of n people, until everyone in the country has been asked at least once.

  5. Each mini-batch has a size of 12.
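For question 1 above, here is a minimal sketch of how an MSE value like 3.3585 is computed from the ten output activations, assuming a sum-of-squared-errors convention (the activations below are made up, not the homework's):

```python
import numpy as np

# Hypothetical network output for one image (activations for digits 0-9)
output = np.array([0.1, 0.2, 0.9, 0.3, 0.1, 0.8, 0.2, 0.4, 0.6, 0.5])
label = np.zeros(10)
label[2] = 1.0                       # one-hot target: the digit is a 2

mse = np.sum((output - label) ** 2)  # squared error summed over the 10 outputs
print(mse)                           # conventions differ: some halve or average this
```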

Backpropagation Calculus

  1. A. Weights between input and hidden layer: 784×100 (each input neuron connects to each hidden neuron). Biases for hidden layer: 100 (one bias per hidden neuron). Weights between hidden and output layer: 100×10 (each hidden neuron connects to each output neuron). Biases for output layer: 10 (one bias per output neuron). Total: 79,510 elements (see the quick check after this list). B. Weights and biases significantly influence the cost function by controlling neuron activation levels and adjusting thresholds, and their influence is quantified and optimized using their gradients during training.

  2. Symbols

A. i. Activations from the previous layer
B. iv. Sigmoid, or squishing function, to smooth outputs to the 0.0 to 1.0 range
C. ii. Bias of the current layer
D. v. Weights from the previous layer to this neuron
E. iii. Activations of the current layer

  1. The bottom half of the second diagram is the same as the first diagram. The second diagram extends backward into the neural network, showing a previous layer whose outputs the layer depends on. The second diagram can be extended further back to layer L−2, and all the way to the first layer. Drawing a path between two quantities in either diagram shows which partial derivatives to "chain together" when calculating the gradient.
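A quick check of the 79,510 figure from question 1A:

```python
# Parameter count for a 784 -> 100 -> 10 network
sizes = [784, 100, 10]
num_weights = sum(x * y for x, y in zip(sizes[:-1], sizes[1:]))  # 78,400 + 1,000
num_biases = sum(sizes[1:])                                      # 100 + 10
print(num_weights + num_biases)                                  # 79510
```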

AI LAB 04

Make progress on the AI Lab 04 on Voice Cloning before attempting this writing exercise. (You don't have to complete it first).

Use the questions below to choose a topic about AI ethics.

Consider the Warren Buffett voice cloning demonstration. How does this compare to the unjust enrichment lawsuit against the estate of the actor Peter Cushing in the Week 01 reading? Consider also the $25 million lost to an AI-generated video call fraud. What safeguards, if any, should govern the right to voice clone actors or public figures from freely available recordings on the internet?

In my personal opinion, I believe that voice cloning should be completely illegal and punishable by life in prison.

What thoughts or feelings do you have right now, before hearing your synthesized voice?

Humans are inherently evil and should not be trusted with this technology.

Should you have the right to synthesize your own voice? What are possible advantages or misuses of it?

No, I should not, no one else should either. I don't see any advantages, I can only think of ways that it could ruin both my life and the lives of those around me. One obvious misuse would be using my voice to give verbal consent to empty my bank account.

Photographs, video and audio recordings are technologies that changed the way we remember loved ones who have died.

Should family members have the right to create AI simulated personalities of the deceased?

Absolutely not, I would be appalled if someone made a replica of one of my dead loved ones.

Death is one of our last true freedoms, replicating the likeness and voice of the dead for monetization would be one step closer to living in hell.

If generative AI allows us another way to interact with a personality of a person in the past, how does this compare with historical re-enactors, or movies depicting fictional events with real people?

It's the same thing: you aren't interacting with the person, just the idea of the person and what little we know about them.

HW5

  1. A GPT is a type of LLM; the two terms may be used synonymously depending on the context.

  2. D

  3. C

  4. Two or more layers, but there is no fixed number of layers.

  5. Yes it is.

  6. A. Pre-training is usually much more expensive and time-consuming than fine-tuning. True: it can take a vast amount of resources and may require a supercomputer.
     B. Pre-training is usually done with meticulously labeled data while fine-tuning is usually done on large amounts of unlabeled or self-labeling data. False: pre-training is usually done on large amounts of unlabeled data, while fine-tuning uses smaller, labeled datasets.
     C. A model can be fine-tuned by different people than the ones who originally pre-trained it. True: once a model is pre-trained, it can be fine-tuned by other developers.
     D. Pre-training is to produce a more general-purpose model, and fine-tuning specializes it for certain tasks. True: we've been doing this in class.
     E. Fine-tuning usually uses less data than pre-training. True: pre-training requires an extensive dataset to cover a broad spectrum of general knowledge, while fine-tuning typically uses much smaller, task-specific datasets that might only consist of thousands to tens of thousands of samples, like customer service interactions for a chatbot or medical journals for clinical information extraction.
     F. Pre-training can produce a model from scratch, but fine-tuning can only change an existing model. True: pre-training starts with a randomly initialized model and trains it on a large corpus of data to develop a base level of language understanding and generation capabilities. Fine-tuning, on the other hand, adjusts the weights of an already trained model to adapt to more specific tasks or data types.

  7. A, B & C

  8. https://www.bing.com/search?form=MOZLBR&pc=MOZI&q=ch&showconv=1 Bing's main focus is on: answering questions on a subject after being trained with question-answer examples, and predicting the next word in a sequence.

  9. A, B & C We have been studying and implementing all three components in our homework and labs.

  10. B & C

  11. Zero-shot learning refers to the ability of a model to correctly perform a task without having received any specific training examples for that task. It relies on a general understanding or learned representations that can be applied to new tasks. Few-shot learning involves the model performing tasks given only a few training examples. Many-shot learning refers to the traditional model training approach where the model sees many examples during training and learns to perform tasks specifically from large amounts of data.

  12. GPT-3: 175B
    GPT-3.5: 20B
    GPT-4: 175 - 500B
    

We are living in a casual arms race with no real ethical or consensual ways of collecting massive datasets. Generative AI, like Sora, is a prime example. OpenAI scraped billions of images and videos for its Sora training dataset, yet the exact sources and quantities remain undisclosed. When questioned, OpenAI representatives often deflect and avoid giving straight answers.

The lack of transparency in data collection for training generative models is alarming. Ethical implications arise when personal information, images, and videos are used without explicit consent, raising questions about data ownership and control. The public is often unaware their online activities and shared content are used in these vast datasets, breaching privacy rights. Using extensive datasets without clear sources or consent leads to data bias. If data is collected mainly from certain demographics, it results in biased AI outputs, perpetuating social inequalities and reinforcing stereotypes. These ethical concerns extend to the misuse of biased AI in decision-making processes, affecting employment, law enforcement, and social services.

Sora's case exemplifies broader issues in the AI industry. The lack of transparency when companies like OpenAI are questioned about their data practices increases public distrust. Accountability and ethical guidelines are crucial in AI development, emphasizing transparency, consent, and unbiased datasets. The casual arms race in AI development often overlooks ethical considerations, focusing on advancing AI capabilities at the expense of integrity. Society must advocate for ethical standards in AI, enforce regulatory frameworks for transparent data collection, and educate the public about data rights and implications.

HW6

  1. Red: encoding text from words to token IDs (integers). Blue: decoding from token IDs (integers) back to words.

  2. 783 <|endoftext|> 784

  3. An <|endoftext|> token appears at the end of, and therefore in between, each concatenated document. You can tell how many documents were concatenated by counting how many times <|endoftext|> appears.

  4. For each choice below, explain why it is an advantage or disadvantage of this approach.

    It lets the GPT learn connections between the long word and shorter words based on common parts, like "pneumonia", "microscope", or "volcano". This approach can work in any language, not just English. It would probably work in other non-Germanic/Latin-based languages too, but would likely make more mistakes in tonal languages where the same word, prefix, suffix, or character can have many different meanings depending on the context.

    All words are broken down in the same way, so the process is deterministic and results from repeated runs will be similar.

    The system will handle any word it encounters in chat or inference, even if it has never seen the word during training.

    It is easier than looking up the word in a hash table, not finding it, and using a single <|unk|> unknown special token.

  5. Monosyllabic words tend to have smaller token IDs, while more complex words have larger ones; larger token IDs tend to be associated with less common words or specialized terms. Ambiguity arises when a word is written the same way but means different things depending on the context. Tokens can also be smaller than words, including prefixes, suffixes, or single characters; these handle rare words, out-of-vocabulary words, or morphological variants. Sub-word tokens may carry a "##" prefix (checked with token_str.startswith("##")), in which case the decoder appends token_str[2:] without a leading space, as in the sketch below.
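A small sketch of that decoding step, assuming a WordPiece-style vocabulary where sub-word continuations carry a "##" prefix (the toy vocabulary here is invented; GPT-2's actual BPE marks word boundaries with a leading space instead):

```python
def decode(token_ids, vocab):
    """Decode token IDs to text, merging "##"-prefixed sub-word tokens."""
    decoded_string = ""
    for token_id in token_ids:
        token_str = vocab[token_id]
        if token_str.startswith("##"):
            decoded_string += token_str[2:]    # continuation: append w/o a space
        else:
            decoded_string += " " + token_str  # new word: add a space first
    return decoded_string.lstrip()

vocab = {0: "micro", 1: "##scope", 2: "is", 3: "small"}  # toy vocabulary
print(decode([0, 1, 2, 3], vocab))  # "microscope is small"
```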

The legal battle between the Internet Archive and major publishers over the Controlled Digital Lending (CDL) program highlights important issues about digital rights and the ethical use of digital resources, crucial for selecting training datasets for AI models like GPT. Hachette, HarperCollins, Wiley, and Penguin Random House argue that the Internet Archive's CDL program infringes on their copyrights by digitizing and lending books without permission. They claim this practice undermines sales, causes financial losses, and disincentivizes the production of new works by violating their exclusive rights to reproduce and distribute content.

The Internet Archive, led by founder Brewster Kahle and supported by the Electronic Frontier Foundation (EFF), argues that CDL is an extension of traditional library lending. They contend that CDL, which allows digital lending only as many times as physical copies are owned, benefits public access to knowledge and preserves texts, operating under fair use principles without significant financial harm to publishers. The Google Books Project faced similar lawsuits for scanning millions of books and providing searchable snippets. Courts ruled this as fair use due to its transformative nature and public benefit, contrasting with the ruling against the Internet Archive's CDL.

The Internet Archive's argument that CDL is an extension of traditional library practices is compelling, emphasizing fair use and public benefit. If I were the judge, I would consider the educational and non-commercial intent, alongside the controlled nature of digital lending, as strong factors in favor of fair use. Training a GPT model based on classmates' work would involve similar ethical considerations; obtaining explicit permissions and transparency would be ideal in a perfect world.

HW7

Question 0. Which of the following vectors is a one-hot encoding? How would you describe a one-hot encoding in English? The one-hot encoding is the vector containing a single 1 with 0s everywhere else. In English: a one-hot encoding represents a categorical variable as a vector with one position per category, where the position for the chosen category holds a 1 and all other positions hold 0.
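A minimal sketch with a made-up 10-class label:

```python
import numpy as np

num_classes = 10
label = 3                  # the category to encode (hypothetical)
one_hot = np.zeros(num_classes)
one_hot[label] = 1.0
print(one_hot)             # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```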

Question 1. What is an (x,y) training example (in English)?

An (x, y) training example consists of an input feature vector x and its corresponding target label y. In the context of MNIST, x is a 784-dimensional vector representing the pixel values of an image, and y is a single digit from 0 to 9 representing the label of the image.

Question 2. We call large texts for training a GPT "self-labeling" because we can sample from the text in a sliding window (or batches of words).

Match the following terms (A,B,C) with its definition below (1,2,3):

A. max_length
    ii. chunk size, or number of token IDs to group together into one x or y of a training example (x,y)

B. stride
    i. the number of token IDs to "slide" forward from one (x,y) training example to the next (x,y) training example

C. batch size
    iii. the number of (x,y) training examples returned in each call to next of our dataloader's iterator.
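A sketch of how these three terms interact, using a plain list of token IDs rather than the course's actual dataloader class (the numbers are arbitrary):

```python
def sliding_window_pairs(token_ids, max_length, stride):
    """Build (x, y) training examples from one long token sequence.
    x is a chunk of max_length token IDs; y is the same chunk shifted
    one position right, so each position's target is the next token."""
    pairs = []
    for start in range(0, len(token_ids) - max_length, stride):
        x = token_ids[start : start + max_length]
        y = token_ids[start + 1 : start + max_length + 1]
        pairs.append((x, y))
    return pairs

token_ids = list(range(20))  # stand-in for real token IDs
pairs = sliding_window_pairs(token_ids, max_length=4, stride=2)
print(pairs[0])              # ([0, 1, 2, 3], [1, 2, 3, 4])
batch = pairs[:3]            # batch size 3: examples per dataloader call
```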

Question 3: Weights Matrix Size. The size of the weights matrix is 3×4.

Question 4: Multiplying the batch matrix (8×6) by the embeddings matrix (6×12) produces a result of shape 8×12 (see the shape check after Question 5).

Question 5: Your vocabulary has 7 token IDs in it, and the embedding output dimension is 4, so the embedding weight matrix has shape 7×4.
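A shape check for questions 4 and 5 in PyTorch (the tensors are random stand-ins):

```python
import torch

vocab_size, output_dim = 7, 4           # question 5
embedding = torch.nn.Embedding(vocab_size, output_dim)
print(embedding.weight.shape)           # torch.Size([7, 4])

batch = torch.randn(8, 6)               # question 4: an (8x6) batch matrix...
embeddings_matrix = torch.randn(6, 12)  # ...times a (6x12) embeddings matrix
print((batch @ embeddings_matrix).shape)  # torch.Size([8, 12])
```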

Question 6: Writing

Essay on Permacomputing and AI Energy Consumption

The two articles, "Thoughts on Permacomputing" and "Generative AI’s Environmental Costs are Soaring — and Mostly Secret," present critical insights into the environmental impact of computing technologies. The first article advocates for sustainable computing practices, while the second highlights the hidden and escalating energy costs associated with generative AI.

Main Thesis Summaries:

  • Permacomputing Article: The article emphasizes the need for sustainable computing practices to mitigate environmental impacts and promote longevity in technology use.
  • AI Energy Article: This article argues that the environmental costs of generative AI are significant, rapidly increasing, and often obscured from public view.

Thesis Expansion:

  • Permacomputing Article:

    • Definition and Principles: The article begins by defining permacomputing, drawing parallels to permaculture in agriculture. It outlines principles such as low-energy consumption, long-term sustainability, and resilience.
    • Practical Applications: It discusses various practical approaches to implementing permacomputing, including using energy-efficient algorithms, extending the lifespan of hardware, and promoting software that requires minimal resources.
    • Challenges and Future Directions: The article concludes by addressing the challenges in adopting permacomputing widely and suggests future directions for research and practice.
  • AI Energy Article:

    • Current State of AI Energy Use: The article starts by detailing the current energy consumption of AI systems, particularly large language models, and the lack of transparency in reporting these figures.
    • Environmental Impact: It then delves into the environmental consequences, such as increased carbon emissions and the strain on energy resources.
    • Call for Accountability: The article calls for greater accountability and transparency from tech companies regarding their energy usage and advocates for policies to mitigate these impacts.

Evidence and Arguments:

  • Permacomputing Article:

    • Energy Efficiency: The article cites examples of energy-efficient computing practices, such as using algorithms that optimize power consumption and hardware designed for low energy use.
    • Longevity and Resilience: It argues that extending the lifespan of computing devices and building resilient systems can significantly reduce e-waste and the need for frequent replacements, thus benefiting the environment.
  • AI Energy Article:

    • Energy Consumption Data: The article references studies showing that training large AI models can consume as much energy as several hundred households do in a year.
    • Carbon Footprint: It provides evidence of the substantial carbon footprint associated with AI training processes, emphasizing the urgent need for sustainable practices in AI development.

Independent Evidence:

  • Supporting Permacomputing: A study published in the "Journal of Sustainable Computing" supports permacomputing by demonstrating that optimized algorithms can reduce energy consumption in data centers by up to 30%.
  • Supporting AI Energy Concerns: An article from "Nature" reports that the carbon emissions from training a single AI model are equivalent to five times the lifetime emissions of an average car, underscoring the environmental impact highlighted in the AI energy article.

Comparison and Contrast:

  • Attitude: The permacomputing article takes a proactive and optimistic approach, focusing on solutions and best practices for sustainable computing. In contrast, the AI energy article adopts a more critical stance, highlighting the lack of transparency and the urgent need for action.
  • Approach: The permacomputing article is solution-oriented, offering practical advice for implementation, whereas the AI energy article is more diagnostic, identifying problems and calling for accountability.

Personal Agreement:

I agree with the thesis of both articles. The evidence presented in the permacomputing article is convincing, particularly the focus on energy-efficient algorithms and hardware longevity, which are practical and actionable steps toward sustainability. The AI energy article also presents a compelling argument about the hidden costs of AI, backed by concrete data on energy consumption and carbon emissions. Transparency and accountability in AI energy usage are indeed crucial, as the growing reliance on AI technologies necessitates sustainable practices to mitigate their environmental impact. In conclusion, both articles highlight the urgent need for sustainable computing practices. While permacomputing offers practical solutions for reducing environmental impact, the AI energy article underscores the importance of transparency and accountability in the tech industry. Together, they provide a comprehensive view of the challenges and potential solutions for achieving a sustainable future in computing.

Lab 05/16/24/HW 8

tensor([0.9544, 1.4950, 1.4754, 0.8434, 0.7070, 1.0865])
tensor(0.9544)
tensor(0.9544)
Attention weights: tensor([0.1455, 0.2278, 0.2249, 0.1285, 0.1077, 0.1656])
Sum: tensor(1.0000)
Attention weights: tensor([0.1385, 0.2379, 0.2333, 0.1240, 0.1082, 0.1581])
Sum: tensor(1.)
Attention weights: tensor([0.1385, 0.2379, 0.2333, 0.1240, 0.1082, 0.1581])
Sum: tensor(1.)
tensor([0.4419, 0.6515, 0.5683])
tensor([[0.9995, 0.9544, 0.9422, 0.4753, 0.4576, 0.6310],
        [0.9544, 1.4950, 1.4754, 0.8434, 0.7070, 1.0865],
        [0.9422, 1.4754, 1.4570, 0.8296, 0.7154, 1.0605],
        [0.4753, 0.8434, 0.8296, 0.4937, 0.3474, 0.6565],
        [0.4576, 0.7070, 0.7154, 0.3474, 0.6654, 0.2935],
        [0.6310, 1.0865, 1.0605, 0.6565, 0.2935, 0.9450]])
tensor([[0.2098, 0.2006, 0.1981, 0.1242, 0.1220, 0.1452],
        [0.1385, 0.2379, 0.2333, 0.1240, 0.1082, 0.1581],
        [0.1390, 0.2369, 0.2326, 0.1242, 0.1108, 0.1565],
        [0.1435, 0.2074, 0.2046, 0.1462, 0.1263, 0.1720],
        [0.1526, 0.1958, 0.1975, 0.1367, 0.1879, 0.1295],
        [0.1385, 0.2184, 0.2128, 0.1420, 0.0988, 0.1896]])
Row 2 sum: 1.0
All row sums: tensor([1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000])
tensor([[0.4421, 0.5931, 0.5790],
        [0.4419, 0.6515, 0.5683],
        [0.4431, 0.6496, 0.5671],
        [0.4304, 0.6298, 0.5510],
        [0.4671, 0.5910, 0.5266],
        [0.4177, 0.6503, 0.5645]])
Previous 2nd context vector: tensor([0.4419, 0.6515, 0.5683])
tensor([0.4306, 1.4551])
keys.shape: torch.Size([6, 2])
values.shape: torch.Size([6, 2])
tensor(1.8524)
tensor([1.2705, 1.8524, 1.8111, 1.0795, 0.5577, 1.5440])
tensor([0.1500, 0.2264, 0.2199, 0.1311, 0.0906, 0.1820])
tensor([0.3061, 0.8210])

inputs = torch.tensor(
tensor([[0.2996, 0.8053],
        [0.3061, 0.8210],
        [0.3058, 0.8203],
        [0.2948, 0.7939],
        [0.2927, 0.7891],
        [0.2990, 0.8040]], grad_fn=<MmBackward0>)
Traceback (most recent call last):

multi_head_attention = MultiHeadAttentionWrapper(d_in=4, num_heads=2, d_out=2)


 keys shape: torch.Size([32, 128, 512])
queries shape: torch.Size([32, 128, 512])
values shape: torch.Size([32, 128, 512])
attn_scores shape: torch.Size([32, 128, 128])
attn_weights shape: torch.Size([32, 128, 128])
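The printed shapes above are consistent with a scaled dot-product self-attention pass. Here is a minimal sketch that reproduces them; the input and the query/key/value projection weights are random stand-ins, not the lab's actual parameters:

```python
import torch

torch.manual_seed(123)
batch, seq_len, d_in, d_out = 32, 128, 512, 512  # shapes matching the printout
x = torch.randn(batch, seq_len, d_in)

W_query = torch.nn.Linear(d_in, d_out, bias=False)
W_key = torch.nn.Linear(d_in, d_out, bias=False)
W_value = torch.nn.Linear(d_in, d_out, bias=False)

queries, keys, values = W_query(x), W_key(x), W_value(x)  # each (32, 128, 512)
attn_scores = queries @ keys.transpose(1, 2)              # (32, 128, 128)
attn_weights = torch.softmax(attn_scores / keys.shape[-1] ** 0.5, dim=-1)
context = attn_weights @ values                           # (32, 128, 512)
print(attn_weights.shape, context.shape)
```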

Writing

Cybernetics, a term coined by Norbert Wiener in the mid-20th century, is fundamentally the study of systems, feedback, and control in machines, living organisms, and organizations. It involves understanding how systems regulate themselves through feedback loops, a concept that has profound implications across various fields, including artificial intelligence (AI). The relationship between cybernetics and AI is intrinsic, as both fields explore the principles of information processing, learning, and adaptation within systems. Cybernetics can significantly impact AI by providing frameworks for developing adaptive and self-regulating systems. Principles such as feedback loops and system dynamics are crucial for creating robust AI models that can learn and adapt in real-time. Conversely, AI can influence the field of cybernetics by offering new tools and methodologies for analyzing and modeling complex systems. Recent progress in LLMs, for example, can enhance our understanding of natural language processing within cybernetic systems, leading to more sophisticated models of human-machine interaction. The exploration of ethics and second-order cybernetics offers an unnerving understanding of the interplay between human observers, technological systems, and the ethical implications of our interventions. As AI continues to evolve, incorporating principles from cybernetics will be essential for developing systems that are not only intelligent but also ethically sound and reflective of our responsibilities as creators and users of technology.
