Dominic Dev Diary for AI I guess? - TheEvergreenStateCollege/upper-division-cs-23-24 GitHub Wiki

AI HW 1

These instructions seem to just say, make this directory in the repo. Ok? done. Maybe the link is broken?

The reading: Warning, this is pretty lengthy and not well organized, but hopefully it's interesting.

Ok, so, where do I even start? I don't think this quandary is one of AI or ML but one of death and its importance in life, which makes this a pretty fun conversation to have. Life as we know it is finite. While we get closer and closer to making that less true, it's foundational to the evolution of life and consequently influential on how society acts. This starts to demonstrate how technology develops faster than society can handle, which has been true for much of human history. I'd like to point out the ever-loved example of how primitive firearms influenced Japan. There is also a good Star Trek episode with a similar idea, of introducing imbalance into the societal churning of a primitive planet with tech they are not ready for. On an even smaller scale, there's the refining of bronze and the influence it had on tools. Those things all took time to propagate, but nothing can spread faster than something digital, thanks to the internet.

So with that in mind, how is that related to death? For something to propagate it needs something to propagate through, like how waves propagate through water, or how a virus transfers from person to person. Ideas do a similar thing, and that's kind of what leads to change. There is a bit of a catch, though. Because life is intentionally volatile, people are not great at changing. A human who lives forever cannot, over time, sprout wings. I guess as I am not a biologist I don't technically know that, but I feel like it's a safe enough assumption. And just as we cannot sprout wings, it's hard to make room for new actors, new ideas, new technology, ideology, etc., if the current thing is left unchanged. So in that sense, death is necessary. Phoenix from the ashes, or whatever cliché you prefer.

Now, what does that mean for Star Wars? There is probably some minimal threshold where it starts to matter. The issue we probably care to avoid is just using the same actors over and over again to make the rest of all films, thus killing the need for actors, and it would be a shame to kill off any form of art. Maybe it would open new avenues instead; who am I to say. But there is only so much time. Hopefully that makes enough sense.

AI HW 3 (no 2?)

So, connectionism is this sort of idea of how to math the way the human brain seems to think; sort of the basis for this whole ML thing. If I'm understanding things correctly, this thought of distributed representation is that the ability to identify things isn't determined by a sole component. This must be what makes the human brain so flexible. There are a handful of cases where people lose like, 40% of their brain matter but function perfectly fine, because their brain's capacity is malleable enough, possibly because of this idea of distributed representation. But that's just me speculating. I can see that's what's going on from 3Blue1Brown's visualized MNIST. Each node (I want to say cluster? in bio world) is just holding some sort of blended bit of the puzzle of what this thing is. So, you can't point to one node for identifying one bit. I guess if you think about how the absence of something is still data, then even nodes that don't correlate still correlate and identify the thing. So that information is still there through all nodes. I guess? That is pretty neat if I'm getting that right.
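A toy way to poke at this (a hedged sketch of my own, not from the reading; the sizes and random weights are made up): zero out one hidden node in a little network and see how little the output moves, since the "knowledge" is smeared across all of them.

```python
import numpy as np

# Hypothetical toy network just to illustrate distributed representation;
# sizes and weights are made up, not from the course code.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(30, 784)) * 0.1, np.zeros(30)
W2, b2 = rng.normal(size=(10, 30)) * 0.1, np.zeros(10)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

x = rng.random(784)                # stand-in "image"
hidden = sigmoid(W1 @ x + b1)
out_full = sigmoid(W2 @ hidden + b2)

ablated = hidden.copy()
ablated[0] = 0.0                   # "lesion" a single hidden node
out_ablated = sigmoid(W2 @ ablated + b2)

# The output barely shifts: no single node carries the answer.
print(np.abs(out_full - out_ablated).max())
```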

Factors for the explosion. Well, we have talked to death about the magic of hidden layers. That's probably the biggest technical thing. But like, that's the entire premise. So surely the correct answer here is for me to talk about things like the insane amount of data we now have access to, or the computational power we have, or more efficient designs. Maybe the question isn't what I think it is.

Well, the number of neurons in the human brain isn't just like, one number, but it's around the 86 billion mark. Which is quite a lot. But it's also mega flexible and can be rearranged. Chat has some 175-ish billion params but, again, lacks that human flexibility. Kinda reminds me of the Lord of the Rings thing where humans are special because they control their fate.

OK, well, those unicode characters are not rendering correctly, so uh, that makes this a difficult question. This first one kinda feels strange, as "thing that knows nothing knows nothing" is a weird thing to need to confirm.

I don't think I understand the question here. What is the question here? Network([7,13,2]) is the shape, and the bits are defined above. So... what is it asking? I am not sure on these next three questions, to be honest. Seems vague. Maybe I just don't get it yet.
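While I'm here, a quick sketch (my own, assuming the usual Nielsen-style convention: weight matrices between consecutive layers, and a bias for every non-input neuron) of what that shape implies:

```python
# Sketch assuming weights connect consecutive layers and every layer
# except the input has one bias per neuron.
sizes = [7, 13, 2]

weights = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))  # 7*13 + 13*2 = 117
biases = sum(sizes[1:])                                      # 13 + 2 = 15

print(weights, biases, weights + biases)  # 117 15 132
```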

eta describes the step size: how radical a change any one instance makes to the network.

So far I think stochastic just means the behavior of the thing is a bit fuzzy, in the sense that it's not predictable. It's chaotic. We should call it Chaotic Gradient Descent. I'd listen to that band.
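To pin both of those down in one place, here's a minimal sketch of one stochastic update (my own; `grad_on` is a hypothetical stand-in for backprop, not a real function from the lab): pick a random mini-batch, estimate the gradient from just that batch (the stochastic, fuzzy part), and step by eta (the step-size part).

```python
import random

def sgd_step(params, grad_on, training_data, batch_size, eta):
    """One stochastic gradient descent step (illustrative sketch).

    grad_on(params, batch) is a hypothetical stand-in for backprop: it
    returns gradients estimated from just this mini-batch, which is why
    the walk downhill looks chaotic rather than cautious.
    """
    batch = random.sample(training_data, batch_size)
    grads = grad_on(params, batch)
    # eta scales how radical a change this one batch makes.
    return [p - eta * g for p, g in zip(params, grads)]
```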

That is not organized ideally. Maybe I'll fix that later.

AI HW 4 (I guess?)

Question 0. In this equation, match the symbols with their meaning:

  • Symbols

    A. $\sigma$ iii. Sigmoid, or squishing function, to smooth outputs to the 0.0 to 1.0 range?

    B. $w_i$ iv. Weights from the previous layer to this neuron

    C. $a_i$ i. Activations from previous layer

    D. $b$ ii. Bias of this neuron, or threshold for firing

  • Meanings (what even are meanings; the whole equation is sketched in code just below)
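Putting the matched symbols together as a minimal sketch (my own, not the course implementation):

```python
import math

def neuron(activations, weights, bias):
    """a = sigma(sum_i w_i * a_i + b), per the matching above."""
    z = sum(w * a for w, a in zip(weights, activations)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid squishes z into (0, 1)

print(neuron([0.2, 0.9], [0.5, -1.0], 0.1))
```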

Question 1.

Calculate the cost of this "trash" output of a neural network and our desired label of the digit "3"

3.3585
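For the record, the cost here is just the summed squared difference between the network's output vector and the one-hot label for "3". A sketch with a made-up "trash" output, since I'm not retyping the numbers from the image:

```python
# Hypothetical trash output; the real one is in the question's image.
output = [0.5, 0.1, 0.8, 0.2, 0.9, 0.3, 0.6, 0.1, 0.7, 0.4]
label_3 = [1.0 if i == 3 else 0.0 for i in range(10)]  # one-hot "3"

cost = sum((o - y) ** 2 for o, y in zip(output, label_3))
print(cost)
```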

Question 2.

Suppose we are feeding this image forward through our neural network and want to increase the classification of it as the digit "2"

I think we increase the bias for 2 and decrease it for all the others. However, I would expect that we adjust according to how the input registered wrongly (how the arrows in this image point). But I guess not.

Answer this question about decreasing the cost of the "output-2" node firing

Question 3.

What does the phrase "neurons that fire together wire together" mean in the context of increasing weights in layer $L$ in proportion to how they are activated in the previous layer $a_L$ ?

It's my understanding that the whole way this works in biology is that it's the familiar pathings that get reinforced over time. Same deal here: we're reinforcing the paths. That's how we identify things.

Question 4.

The following image shows which of the following:

  • changes to the weights of all the neurons "requested" by each training data
  • changes to the biases of all the neurons "requested" by each training data
  • changes to the activations of the previous layer
  • changes to the activation of the output layer

In addition to the answer you chose above, what other choices are changes that backpropagation can actually make to the neural network?

OK, so, I think this is just changing the weights, as the training data seems to imply. As for the other part, the biases also get shifted somehow; if not here, when?
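To answer my own "when": in the update step itself, both get nudged. A sketch (my own, toy numbers) of how the averaged "requests" from a mini-batch land on weights and biases together:

```python
# Sketch: backprop hands back nabla_w and nabla_b "requests" per training
# example; the update averages them over the mini-batch and shifts the
# weights AND the biases, every step.
def apply_mini_batch(w, b, nabla_ws, nabla_bs, eta):
    n = len(nabla_ws)
    w = w - (eta / n) * sum(nabla_ws)
    b = b - (eta / n) * sum(nabla_bs)
    return w, b

# Toy numbers: three examples "requesting" changes to one weight/bias.
print(apply_mini_batch(0.5, 0.1, [0.2, -0.1, 0.4], [0.05, 0.0, 0.1], eta=3.0))
```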

Question 5.

In the reading, calculating the cost function delta $\nabla C$ by mini-batches to find the direction of steepest descent is compared to a

  • a cautious person calculating how to get down a hill
  • a drunk stumbling quickly down a hill
  • a cat leaping gracefully down a hill
  • a bunch of rocks tumbling down a hill

A cautious person, as the steps are precise, just maybe ill-informed.

What is the closest analogy to calculating the best update changes $\nabla C$ by mini-batches?

  • passing laws by electing a new president and waiting for an entire election's paper ballots to be completely counted
  • asking a single pundit on a television show what laws should be changed
  • asking a random-sized group of people to make a small change to any law in the country, repeated $n$ times, allowing a person the possibility to be chosen multiple times
  • making a small change to one law at a time chosen by random groups of $n$ people, until everyone in the country has been asked at least once

Making small changes until we all have been asked. Man, can you imagine what that process would be like? I wonder if that would work for human law.

Question 6.

If each row in this image is a mini-batch, what is the mini-batch size?

Remember in our MNIST train.py in last week's lab, the mini-batch size was 10.

Well, if my eyes do not deceive me, that is groups of 12. Why is this a question?
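A sketch of the slicing itself, in the style of last week's train.py (stand-in data; 12 per row, per the image):

```python
# Chop the (shuffled) training data into mini-batches of size 12,
# one mini-batch per row of the image.
mini_batch_size = 12
training_data = list(range(120))  # stand-in for (image, label) pairs

mini_batches = [
    training_data[k:k + mini_batch_size]
    for k in range(0, len(training_data), mini_batch_size)
]
print(len(mini_batches), len(mini_batches[0]))  # 10 rows of 12
```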

Backpropagation Calculus

3Blue1Brown Chapter 4: Backpropagation Calculus

Question #1

For our neural network with layers of [784, 100, 10], what is the size (number of elements) of the $\nabla C$ (cost function changes) matrix below:

ok so, the input layer doesn't have biases, and the output doesn't have weights. So 784 * 100 * 2 + 10... 156810? The syntax of math is weird; L makes me think this is looking at the overall change layer by layer, not neuron by neuron. And that abstraction could be made. But I don't think that's what the question is assuming.
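Sanity-check sketch (my own): if $\nabla C$ holds one entry per weight and one per bias, the count comes out a bit differently than my guess above:

```python
sizes = [784, 100, 10]

weights = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))  # 78400 + 1000
biases = sum(sizes[1:])                                      # 100 + 10
print(weights + biases)  # 79510
```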

Answer the question again for this smaller neural network

I mean... 6? Man, this doesn't even feel like comp sci anymore.

Question #2

  • Symbols

    A. $a^{(L-1)}$ I believe this is our "output" for layer L-1.

    B. $\sigma$ our flattening function: sigmoid.

    C. $b^{(L)}$ bias of all nodes in layer L.

    D. $w^{(L)}$ weights of all nodes in layer L, meaning the ones coming in from L-1.

    E. $a^{(L)}$ The current layer's output. (These all assemble into one equation below.)
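Assembling those, in the video's notation:

$$z^{(L)} = w^{(L)} a^{(L-1)} + b^{(L)}, \qquad a^{(L)} = \sigma\left(z^{(L)}\right)$$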

Question 3.

In this tree diagram, we see how calculating the final cost function at the first training image 0, $C_0$, is dependent on the activation of the output layer $a^{(L)}$. In turn, $a^{(L)}$ is dependent on the weighted output (before the sigmoid function) $z^{(L)}$, which itself depends on the incoming weights $w^{(L)}$ and activations $a^{(L-1)}$ from the previous layer and the bias of the current layer $b^{(L)}$

What is the relationship of this second, extended diagram to the first one?

Choices (choose all that apply)

  1. There is no relationship
  2. The bottom half of the second diagram is the same as the first diagram
  3. The second diagram extends backward into the neural network, showing a previous layer $L-2$ whose outputs the layer $L-1$ depends on.
  4. The second diagram can be extended further back to layer $L-3$, all the way to the first layer $L$
  5. The first diagram is an improved version of the second diagram with fewer dependencies. (If only it were that simple.)
  6. Drawing a path between two quantities in either diagram will show you which partial derivatives to "chain together" in calculating $\nabla C$
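For choice 6 in particular, tracing the path from $w^{(L)}$ up to $C_0$ and chaining the partials along it gives:

$$\frac{\partial C_0}{\partial w^{(L)}} = \frac{\partial z^{(L)}}{\partial w^{(L)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \frac{\partial C_0}{\partial a^{(L)}}$$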

Human stuff.

So, I want to go back to this idea of death. I know, so fun. But once more, I think this whole weirdness stems from the idea that things end. It happens. Grief is an important thing to process, and I think having the voice of a loved one now passed is going to just... cause more trouble than it's worth. But I am not going to dictate that; that should be up to them. Voice actor stuff? Now that's dangerous. I'll cut the lengthy bit and just say the difference between them is money. But at the end of the day, I think it just can't be done convincingly. You can't capture the human element. Even if you did, it would change, because that is what humans do. Perhaps the problem is around what it means to be human and the identity of self. I think we collectively as humanity struggle to understand the extent to which we ourselves exist in the context of the world around us. I can't imagine a machine can capture that without us understanding ourselves better.

AI HW 05

OK questions for stuff.

1, What is the difference between a GPT and an LLM? Are the terms synonymous?

  • pretty sure they're not. GPT is a methodology; LLM is more a categorization of a result. I think.

2, Labeled training pairs of questions and answers, in the model of "InstructGPT" are most similar to which of the following?

  • OK, so. I don't think there's really much of a difference between A, B, C, and D, other than that Stack Overflow and, somewhat, Quora have some means of quality control that is directly correlated with relevance.

It might be worth looking into Grice's maxims for speech to explain that more.

3, The GPT architecture in the paper "Attention is All You Need" was originally designed for which task

  • C, English to German, which is probably fairly tricky.

4, How many layers of neural networks is considered "deep learning" in the Rashka text?

  • more than just one hidden layer, since it adds, ya know... depth.

5, Is our MNIST classifier a deep learning neural network by this definition?

  • technically it has more than one hidden layer, soooo, yea? Technically?

6, For each statement about how pre-training is related to fine-tuning for GPTs:

  • A), Pre-training is usually much more expensive and time-consuming than fine-tuning.
    • True. Starting from nothing is harder. What would it even take for this to not be true?
  • B), Pre-training is usually done with meticulously labeled data while fine-tuning is usually done on large amounts of unlabeled or self-labeling data.
    • This seems like a generalization without a clause, so I want to say false based on that alone. However, I feel it's also false with the clause of "larger".
  • C), A model can be fine-tuned by different people than the ones who originally pre-trained a model.
    • True, assuming access to the original. Data is still data.
  • D), Pre-training is to produce a more general-purpose model, and fine-tuning specializes it for certain tasks.
    • While fine-tuning is, I don't think pre-training is 100% of the time for more general purposes.
  • E), Fine-tuning usually uses less data than pre-training.
    • Well, yea, you're building off of something. But it again isn't necessarily the case.
  • F), Pre-training can produce a model from scratch, but fine-tuning can only change an existing model.
    • Can't fine-tune nothing. Pre-training would mean before training. I hope.

7, GPTs work by predicting the next word in a sequence, given which of the following as inputs or context?

  • A), The existing words in sentences it has already produced in the past. No, it's isolated... avoid the ouroboros problem.
  • B), Prompts from the user. Yep.
  • C), A system prompt that frames the conversation or instructs the GPT to behave in a certain role or manner. It'd be weird if it meant nothing.
  • D), New labeled pairs that represent up-to-date information that was not present at the time of training. How could it?
  • E), The trained model, which includes the encoder, decoder, and the attention mechanism weights and biases. That is the idea.

I no longer have the energy or time to format that as verbosely.

8, Funnily enough, I have already had this mental duel with Chat. It was so unwavering in this idea that questions and information recall are separate. I think this is silly. Like, it's the difference between "what is" and "is". It's the same process; the other just has the smallest bit of extra context about where "is" is to be used. Any question could lack the question mark and still be a question. The difference between saying "calculus" and "what is calculus" isn't miles apart, because the subject is the same. So silly.

all just pattern recognition anyways.

9, encoding is the network bit. That's the "what do words mean" part. Decoding probably is too, for similar reasons.

10, zero-shot learning is that idea of expanding knowledge based on information. I think IBM Watson was interesting because of a similar concept. B might; who knows. It's all just pitches and vibrations anyway. C, though: if you ask it, it says it doesn't, so maybe it doesn't, so I'm going to say it doesn't. But I'd like it to. The example I keep getting is image analysis, so in a similar sense I guess email filters could be? It seems to me it's more of a choice based on input, not output. Like, if you're training something with little info, you few-shot; if you're training with something that is definably rigid and finite, you want many-shot. If you want that sweet flexibility and have the data to boot, you want zero-shot. Perhaps it should be called infinite-shot, just to put many-shot and itself sorted in nomenclature as well as data size requirements.

Did I accidentally answer 11 enough?

12, both 3.5 and 3 are 175 billion, and OpenAI won't say how many 4.0 has. Probably more than 4. (This is a joke.)

Questions but the other ones, Chapter 2:

1, LLMs can't operate on words directly. To start, think of all the extra memory that would take. I kid. But what would that even mean? How do you math words? What would it mean to put words into vector space? You can't index by "what" instead of 324. Really it's a failure in the design of language that you cannot. If only we made words with mathematics in mind.

2, I believe embedding is the process of getting something as unmathable as words into vector space, so that math can be done to it.
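A sketch of the mechanics (toy sizes of my own choosing, not the book's code): token IDs index into a big matrix of learned vectors.

```python
import torch

# Toy embedding layer: vocab of 6 tokens, 3-dimensional vectors.
# Sizes are made up for illustration.
torch.manual_seed(123)
embedding = torch.nn.Embedding(num_embeddings=6, embedding_dim=3)

token_ids = torch.tensor([2, 3, 5, 1])   # "a sentence" as IDs
vectors = embedding(token_ids)           # shape [4, 3]: now words are mathable
print(vectors.shape)
```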

3, I don't quite know what the verdict would be. That's not dependent on the text but on what we do to it, right? I don't think we talked much beyond frequency. As for ChatGPT, if you think about how it has 175 billion params, and that this number spans whatever its vocabulary size and dimensionality are, then we can maybe get some idea. But it's pretty large.

As a side note, I'm wondering if multidimensional shapes might be a usable way of constructing an AI model on not just words but actual information. I could see the possibility that understanding what is correct could be seen as collision detection between "true" bits of info. I now need to look into higher-dimension collision detection, if that's even like... a thing.

4, C. Breaking up natural human text into tokens, which could include punctuation, whitespace, and special "meta" tokens like "end-of-text" and "unknown"; B. Giving unique token IDs (numbers) to each token; D. Converting token IDs to their embeddings, for example using Word2Vec; A. Adding position embeddings to the token word embeddings.

This last one is kinda weird wording-wise. I think it could be interpreted as the process by which we pair embedded vector space and the word itself, but it could also mean how one derives said translation; but as that is what Word2Vec is, I assume this is the order.
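That order as code, roughly (a sketch; the whitespace tokenizer and tiny sizes are stand-ins of my own, real GPTs use BPE):

```python
import torch

# Stand-in pipeline; real GPTs use a BPE tokenizer, not str.split().
text = "the cat sat"
tokens = text.split() + ["<|endoftext|>"]             # C: text -> tokens
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
token_ids = torch.tensor([vocab[t] for t in tokens])  # B: tokens -> IDs

torch.manual_seed(0)
tok_emb = torch.nn.Embedding(len(vocab), 4)           # D: IDs -> embeddings
pos_emb = torch.nn.Embedding(len(tokens), 4)          # A: position embeddings

x = tok_emb(token_ids) + pos_emb(torch.arange(len(tokens)))
print(x.shape)  # [4, 4]: one positioned vector per token
```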

Human bit again:

Creativity is admittedly hard to define. As I'm sitting here thinking instead of typing, I think the way I want to describe creativity is randomness that leads to constructive interference; constructive interference as in how waves in physics interact. So imagine, if you will, a pool of water that represents something like "what makes a good axe." I want to steer away from something as pure and subjective as raw art. There are preconceived notions about what makes a good axe, but it also depends on what kind of axe; there is a shocking amount of diversity in axes. Let's roll with what ML likes us to think about, in that peaks in this pool are what make an axe good at whatever the thing an axe needs to be good at.

The reason why I say creativity needs to be constructive is that anyone can make something at random and have a really terrible axe. For example, I'll grab a plastic cup, coil a garden hose around it, and try to pin it together with thumbtacks. Is this idea creative? Not in terms of being an axe. It wasn't even that good at being a random assortment of things; two of the three components deal with liquids, still. I can even tell you that the idea of wrapping it around the cup my brain probably stole from thinking of electromagnetism. Totally unoriginal. The randomness was just as bad at being constructive as it was at being random.

So, what I would ask next is whether what makes a good axe good is relative to humans. Like, some alien creature of intelligence who wants to extract resources out of this big tall biological structure, because it seems pretty sturdy and grows fast, might have different physiology, and thus an axe might sit differently in their hands. So in some sense, yes. But in another sense, the laws of physics the alien creature exists within are (probably; this is actually debatable, with the whole right-handedness in chemistry, fun stuff) the same ones we exist in. So it's not like an axe would cease working in the presence of some alien (again, probably). So there is some weakish objective means by which we can say an axe is good. But what is a creatively made axe? It could be some kind of innovation, or it could perhaps just look nice for one reason or another. It could simply feel the best in your hands due to the particular choice of wood and the way it was carved. Is creativity strictly intentional? I don't think so.

Getting back to ML. Can these machines be creative under these definitions? No. Remember that first bit? Random. That's the first reason. The bigger problem is curiosity. See, even those of us who are not that curious about things in depth can't help but make observations anyway. Curiosity, in a sense, is our informed randomness generator. No, it's not purely that. But it's a huge part of why children learn things so well: they have more things to ponder. The older we are, the more we assume we know things. So can we make a computer curious? That I am not so sure about.

AI HW 6

1, encoding is blue, decoding is red. Like... obviously.

2, junk and EOF. So we have a way of saying "thing we don't know" and "done".

3, EOF. OK, fine, end-of-text, since... now we're using it differently. Not quite what I was expecting.

4, a, advantage, because it helps with understanding relations. b, advantage... would you want it to only work on one language? Is that... ok, moving on. c, advantage... are these not... all advantages? I mean, I guess there's some trade-off; like, you probably want a system like an LLM to be interpretive. d, I mean, that's nice-ish. Like... that's useful. That's simple. Simple is often nice.

5, It's all about frequency, right? The more often a word shows up, the smaller the counter. This is kinda like a weak categorization of how, like, "studyable" a word can be, or more how significant it is. I actually find this a pretty intuitive way to think of language. A while back I was looking at constructed languages, and one of the principles I used was: the simpler the construct, the shorter the word. Same kind of idea. Fundamental thing, smaller. That allows you to make meaning out of combining words, which I believe is the point here.
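A toy sketch of the idea (my own, not the book's BPE implementation): count the most frequent adjacent pair and merge it into one token, so frequent stuff ends up with short encodings.

```python
from collections import Counter

# Toy byte-pair-encoding step: find the most common adjacent pair
# and merge it into one token. Frequent sequences get short encodings.
tokens = list("low low low lower lowest")

pairs = Counter(zip(tokens, tokens[1:]))
(a, b), _ = pairs.most_common(1)[0]

merged, i = [], 0
while i < len(tokens):
    if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
        merged.append(a + b)   # the merged pair becomes one token
        i += 2
    else:
        merged.append(tokens[i])
        i += 1

print((a, b), len(tokens), "->", len(merged))
```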

The human bit, again.

Ok, well, copyright law kinda just makes me sad. Oh, how the best of intentions can be continuously used for evil. Though I think this in particular has more implications for what learning is and the purpose of information and text. So let's ask some fundamental questions. Why do we write things down? There are two arguments. One: persistence, as we computer scientists call it. Archival. So that we may capture an idea. Two: to share that idea. Even if you do this for only one reason, the other is a more or less natural consequence; regardless of how well it does the job, it inevitably attempts both. It's also relatively fair to say that persistence is arguably just sharing with yourself, and thus just a specialized case of the other. You can start to see how they are pretty tightly coupled.

So, if that is the point of writing things down, then you would imagine a publisher's dream would be to have their book accessible and persistent. Alas, that is not what they actually do for a living. See, we got this strange concept called currency involved, and that kinda made things a little jank, to say the least. But refraining from that can of worms, I'd rather talk about what the transfer of information looked like pre-internet and pre-printing press.

Before the printing press, books were transcribed by hand. It was slow and arduous, as you can imagine. One of the reasons the printing press was such a big deal is that books became cheaper; information became more prevalent and easier to propagate. I guess I don't know it for a fact, but I imagine this led to a jump in humans similar to how the discovery of fire and cooking led to our brain development, as we were able to digest proteins better. I mean, we also had to get people reading more, but I suppose that probably wouldn't have happened without things to read being easier to get.

Before the internet, in contrast, it was all about finding the information tucked away in books across vast libraries. Librarians study and practice the act of finding and organizing information, after all. Any number of books could be the right book. It's easy to see how the internet became a natural extension of information, with the ability to query an insurmountable amount of it.

That all being said, what is the purpose of books if not to learn from? If I can learn from a book, why not an LLM?

I do, however, suspect using LLMs as a learning tool has a few problems with information condensing. CliffsNotes is a good example of this. The nuance of the original material is often lost, and that nuance holds its own kind of value. The ability to interpret for yourself is very important. So it makes a good starting point, like CliffsNotes might. We already see this is the correct use of informational ML and LLMs: it's a starting point. But just like CliffsNotes, it's not always being used correctly.

Is my friend who has read a book subject to copyright infringement if they tell me the plot? I would like to see how that plays out.

In a sense, all books are just tedious-to-make save files. This digitalization is a partial conclusion. But just as books have not replaced storytelling around a campfire, the internet hasn't replaced books, and LLMs can't replace the internet. They are all different mechanisms.

AI 07

ok this is the 3rd time I have written this now and IDK WHY it is not here. But here we go again... I thought I submitted this in a text box, but who knows... it's just gone now.

0, the vector is 0010. One-hot encoding is cool; it's in the TLB for memory fetching. It guarantees one output for any one input by using one "hot" bit.
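Quick sketch of the idea (mine):

```python
def one_hot(index, length):
    """Exactly one 'hot' bit set; every input maps to a unique pattern."""
    return [1 if i == index else 0 for i in range(length)]

print(one_hot(2, 4))  # [0, 0, 1, 0] -- the "0010" above
```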

1, our inputs are x, the training data, and our label is y.

2, stride = i max_len = ii batch size = iii

3, this... isn't a question. This is a statement. So, I agree. Affirmative. What is the actual question? Is this a game of Jeopardy?

PS: did you know this isn't even on Canvas as an assignment?

human bit once more.

You know, power consumption wasn't even a concern that crossed my mind, as far as the future of AI goes. I mean, obviously, with great computation comes great power draw. There is not a lot of disputing that. I can definitely tell you first-hand how my computer alone can keep the room it is in warm when put under any serious load. But the physical power draw isn't something I think about much. I think this speaks a lot to how inefficiently we make things nowadays. We don't program like we used to. Chrome eats all the RAM. That's power consumption somewhere. I assume, though, that AI shenanigans are probably, hopefully, better optimized than that. I do not know much about the power grid, but I do know a fair bit about energy.

Let's talk about batteries and battery alternatives. Flywheels, mentioned by one paper, are fun and all, but then you need to deal with friction. To start, this is our energy loss: in a vacuum this is great, but with air we lose a lot of energy really fast. Not to mention, if we spin too fast, that friction starts to become heat in its own right and could be dangerous. Then, they are also going to behave like gyroscopes, so I think you would have problems using something like this in anything that moves. But there is probably a lot of really nice potential in this idea. Angular momentum is pretty great. Good for space, not so great in atmosphere.

So what are some good on-planet energy batteries? Something we already do is heat up metal. This doesn't sound ideal either, but in mass, if you get something like iron and have enough excess energy to make it molten, it can retain heat pretty well. Insulate that a bit, and you get pretty good and effective energy storage. Great for if your energy production produces more energy at certain times but you need it more consistently. I know some power plants already do this. But it's still not perfect: with wear and tear on the material as you heat it and cool it, you will over time lose some of it, and at this scale, that loss can be significant. I am sure there are other problems, but I am working off of old knowledge.

So what about a material that we don't care if it degrades? How about water? Something we also do is have two man-made lakes, one higher than the other, and we store the energy as potential energy in water. This is fantastic. It's quite land-intensive, though. But there is no loss over time. Sure, some water might evaporate, but it also rains. Water is pretty used to that kind of life cycle. More importantly, the planet is already able to cycle water back into a usable state. I cannot say the same thing about iron, I don't think.

Those solutions are all pretty large-scale, though, so they don't help a whole lot on a smaller scale.

Nothing is better for free. By this I mean there is always a cost to something; things have trade-offs. At current, we are designing our technology to be sleeker and faster, pushing the possibilities instead of building for longevity, and I think these papers are seeing that. Imagine if we designed for sustainability instead. Would we even have the tech we have now?

Framework is an interesting company that is making modular, repairable laptops. They are pretty easy to understand; most laptops are complex and really difficult to open up. Framework puts QR codes on parts that link to documentation, which is just amazing. But there is an interesting debate here. Old computer parts can still be pretty useful: you can hand them off or use them as a server for this or that. But replacing parts is a bit more complicated. Selling computer parts second-hand is already kinda less than ideal, though at least hardware is pretty well defined nowadays. And some of Framework's modules were not very reusable. However, they have made some really cool strides to make those parts usable in different ways. One such example is how their GPU unit now has an adapter you could use in a normal computer.

Fairphone is an interesting company who pride themselves on using as many recycled materials as they can while also having a semi-modular design. Not as modular as something like Framework, but I think both have their own merit; after all, phones are a good deal smaller. But the secondary cost of this design is that, due to the form factor of smartphones, it's not going to be on the same cutting edge. They also don't have as large a market, since they're still really small-scale production, only available in Europe.

I didn't know this before, but they even have a recycling program, which is awesome. So how can we extend this mentality? We still use a lot of batteries. I think when I was younger there were a lot more batteries my family would need, but nowadays it seems primarily AA and AAA batteries, plus a handful of button batteries on rare occasions. My headphones, I know, use the rechargeable cellphone batteries used in flip phones. I would be curious if this could be simplified further. I think if we could get down to one reliable rechargeable battery format, and if we also start building a mentality of recycling these batteries in their own way, maybe we could get somewhere.

AI HW 8

0, on line 2 (which isn't like... guaranteed between implementations, I don't think): d_in = 3, d_out = 2, context_length = 6.

1, on line 16: d_in = 2, num_tokens = 6, b = 2.

2, on lines 18-20: keys [2, 6, 2], values [2, 6, 2], queries [2, 6, 2]... kinda starting to question if this code is right...

3, on lines 22/25: attn_scores = attn_weights = [2, 6, 6].

why did I type them out if I was just going to paste an image... [image]
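Reconstructing those shapes with a minimal sketch (sizes per my answers above, not the exact lab code; b = 2 batches, 6 tokens, d_in = 3 going in, d_out = 2 coming out, which would explain the [2, 6, 2]s):

```python
import torch

# Minimal self-attention shape check; sizes follow my answers above.
torch.manual_seed(123)
b, num_tokens, d_in, d_out = 2, 6, 3, 2
x = torch.randn(b, num_tokens, d_in)

W_q = torch.rand(d_in, d_out)
W_k = torch.rand(d_in, d_out)
W_v = torch.rand(d_in, d_out)

queries, keys, values = x @ W_q, x @ W_k, x @ W_v      # each [2, 6, 2]

attn_scores = queries @ keys.transpose(1, 2)           # [2, 6, 6]
attn_weights = torch.softmax(attn_scores / d_out**0.5, dim=-1)  # [2, 6, 6]
context = attn_weights @ values                        # [2, 6, 2]
print(keys.shape, attn_scores.shape, context.shape)
```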

#### Human bit. Sure.

This is actually a really interesting read. When I hear the word cybernetics, I think of the thing in video games that turns your player character into RoboCop. A fun misconception.

In reality, this thought of cybernetics is, in some form or another, speaking to this idea of how complex systems are affected by an observer. Very Schrödinger's cat. Not even sure you need to get to "complex" for this to be an issue: someone working versus someone working with a person watching over their shoulder is different.

I think, if I am understanding this correctly, cybernetics is the study of how complex systems interact. First-order cybernetics speaks from a place of absolute objective viewpoint, which is not really an obtainable thing, whereas second-order cybernetics acknowledges that the viewer of a system must have an effect on it, and that no outside observer can have an absolute viewpoint.

Any observation affects the system. But I suppose the point is that an observer that is in a system longer affects it less. The system gets comfortable; it adapts. Maybe? That part is a bit more wishy-washy.

I think what I want to talk most about here is the ways things can be subject to second-order cybernetics. Like, if I try to time a program, the very act of trying to time it affects its run time, because there is some amount of overhead to set and check those timers. In fact, the OS tries to account for this with knowledge from the scheduler about how a user uses a computer; it's not just a timer but a series of timers meaning different things. Or how a loading bar will always make loading take longer.
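A quick sketch of that observer effect (my own toy, not from the reading): time a loop bare, then time it again while "observing" every iteration with a timer call, and watch the measurement itself add time.

```python
import time

def work(n=1_000_000):
    total = 0
    for i in range(n):
        total += i
    return total

# Bare measurement: one timer around the whole thing.
t0 = time.perf_counter()
work()
bare = time.perf_counter() - t0

# "Observed" run: checking the clock every iteration is itself work,
# so the act of measuring inflates what we measure.
t0 = time.perf_counter()
total = 0
for i in range(1_000_000):
    t_inner = time.perf_counter()  # the observer
    total += i
observed = time.perf_counter() - t0

print(f"bare: {bare:.4f}s  observed: {observed:.4f}s")
```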

Humans kinda end up in a neat predicament in that all participants of a system are inherently observers in their own right, which I imagine balances things out in a peculiar way. Like, if we had an effective way of communicating thoughts from one person to another (maybe call it something like "words" or "speech"), then in theory we should be able to extract and process that information to get the best possible information from a system. But alas, this concept of words and speech is not rigid enough, and is in itself a complex system, and thus also susceptible to relativistic interpretation. Hence why science does what it does.

This reminds me of my favorite philosophical razor, Newton's flaming laser sword, also known as Alder's razor (but that name sucks). It (more or less) states that "if something cannot be settled by experiment or observation, then it is not worthy of debate." Philosophical razors are used to dismiss or "shave off" unnecessary things. But its kernel rests in this same idea: without concrete (or as close to it) information, nothing is really known. I often find myself talking about how "truth is only as true as the truth around it." Humans can be wrong. Once upon a time we thought the sun revolved around the earth. More recently, we had to rewrite the theory of superconductivity because we discovered bismuth can be a superconductor (which, before trying, we didn't think it could be). This whole LLM thing hinges on the ability of information to be absolute, but it only sort of can be. People use words wrong, and words evolve. Their vector space drifts as society does. So, that's probably something to be concerned about.