home5 26.0214 - terrytaylorbonn/auxdrone GitHub Wiki
26.0214 Lab notes (Gdrive) Git
(PAGE REORG 26.0208-14) This wiki documents my work with AI (for a video summary, see YouTube video (TODO)). I alone (with the help of ChatGPT) determined the structure, content, and main concepts. About the author.
Note: In PHASES 4-1 below (latest phases at top), UFA = universal function approximator. This is the best description of what an LLM transformer (TF) really is (LLM = TF + internal agent). The logic of many functions can be described by equations, but some (such as language) are far too complex. So a NN (neural network) is used as a UFA whose "coefficients" are "trained" into the NN using real-world data.
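As a toy illustration of the UFA idea (this sketch is mine, not from the lab notes; the network size, learning rate, and target function are all made-up), a one-hidden-layer NN can have its "coefficients" (weights) trained by gradient descent so that it approximates a function from sampled real-world-style data:

```python
import numpy as np

# Hypothetical UFA demo: train a tiny NN to approximate sin(x) from samples.
rng = np.random.default_rng(0)
X = rng.uniform(-np.pi, np.pi, size=(200, 1))
y = np.sin(X)

# Random "coefficients" to be "trained" into the NN (illustrative sizes)
W1 = rng.normal(0, 1.0, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.1, (32, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)        # hidden layer features
    return h, h @ W2 + b2           # prediction

_, pred0 = forward(X)
loss0 = np.mean((pred0 - y) ** 2)   # error before training

lr = 0.1
for _ in range(3000):
    h, pred = forward(X)
    err = (pred - y) / len(X)       # gradient of MSE w.r.t. prediction
    # Backpropagate through both layers
    gW2 = h.T @ err; gb2 = err.sum(0)
    dh = (err @ W2.T) * (1 - h ** 2)  # tanh derivative
    gW1 = X.T @ dh; gb1 = dh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(X)
loss = np.mean((pred - y) ** 2)     # error after training
print(f"MSE before: {loss0:.4f}, after: {loss:.4f}")
```

No equation for sin was given to the network; the approximation lives entirely in the trained weights, which is the sense in which a NN acts as a UFA.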
PHASE 4 (FUTURE 2027++)
-
ZAI 5 – GI (genuine intelligence). In the GPU-/CPU-based AI (simulated intelligence) world, terms like "neural network", "intelligence", "belief", "observation", etc. have vastly different meanings than in the biological world (true intelligence) (see my related Substack post #64). In the future world of GI (hosted on 3D electronic substrates), the meanings of such terms will align closely with those in the biological world.
-
ZAI 4 – ZAI Books. Bringing it all together in one conceptual whole (see Substack #66 for the basic idea). Books/blogs/videos summarizing my take on ZAI 1-3 AI (and ZAI 5 GI). To be distilled from the docx lab notes (Gdrive).
PHASE 3 (2026): Robotic intelligence (RI). This is the current focus.
SUMMARY: RI is
- Input -> computation -> output. Simulation of intelligent comprehension of the external world (there is no real intelligence).
- (3b) "Agents". They interface between the external world and the AI "sensors".
- (3a) AI "sensors"
- internal NN world models (JEPA) (inputs -> iAgent -> TF and TF -> iAgent -> outputs) (model transformers) use UFAs (universal function approximators, NNs) because no pure function can describe the complex logic.
- internal deterministic functions (f(x)) that define world physics.
-
ZAI 3b – Robotic control agents (see Robotic AI demos). The agent perceives the state of the external world, imagines action results using its own internal world model, chooses an action, and then interacts with the external world via actuators.
- ZAI 3a1 – JEPA Sensor ++ Simulation ("imagination") (see Robotic AI demos).
- (5 NNs; for demos see #525_RAI-5_.docx) JEPA world model (latent predictive representation). State becomes learned + predictive; enables simulation, imagination, planning. SIMULATION here is like "testing" prompt sequences in an LLM and then choosing the best one based on the output (for a robot, you want to evaluate the results of actions before actually sending those actions to the actuators).
- ZAI 3a – Robotic sensor (world model NN or deterministic function) demos (see Robotic AI demos). Drones (ZAI 1) rely on forgiving physical dynamics (air, inertia) to absorb error, but close-range robots and self-driving systems operate in unforgiving environments where errors cannot be tolerated (making functionality like uncertainty-aware belief maintenance a core requirement).
- (4 NNs; for demos see #524_RAI-4_.docx) Neural assist (correction / uncertainty / measurement learning). NNs that improve state estimation (e.g., by learning residuals); the NN improves belief quality but does not replace the explicit state.
- (2 FUNCTIONS; for demos see #522_RAI-2_.docx) Deterministic world state (explicit belief). Explicit math world (geometry, tracking, Kalman-style state, occlusion, persistence).
- (1 NNs) Perception from raw pixels: CNN / ViT features, detection, segmentation (but no world model, no state, no dynamics, no belief).
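The ZAI 3b/3a1 loop (perceive, imagine with an internal world model, choose, act) can be sketched minimally. Everything here is a made-up stand-in: `world_model` is a hand-written function playing the role of a learned (JEPA-style) model, and the 1D "robot" and goal are purely illustrative:

```python
# Hypothetical agent loop: simulate ("imagine") each candidate action with an
# internal world model before committing to it, like testing prompt sequences
# in an LLM and keeping the best one.

def world_model(state, action):
    # Stand-in for a learned dynamics model: position moves by `action`
    return state + action

def score(state, goal):
    # Lower is better: distance from goal
    return abs(goal - state)

def choose_action(state, goal, candidate_actions):
    # "Imagination": predict each action's result internally, pick the best
    imagined = {a: world_model(state, a) for a in candidate_actions}
    return min(imagined, key=lambda a: score(imagined[a], goal))

state, goal = 0.0, 3.0
actions = [-1.0, 0.0, 1.0]
trajectory = [state]
for _ in range(5):
    a = choose_action(state, goal, actions)
    state = world_model(state, a)   # in reality: actuate, then re-perceive
    trajectory.append(state)
print(trajectory)
```

The key point of the sketch: the agent never "tries" a bad action in the external world; bad actions are rejected inside the model.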
PHASE 2 (2024-2025): LLMs.
The latest AI tools (LLMs like GPT) made MERN stack dev (ZAI 1.4) vastly more efficient. These tools were obviously the future. I shifted my focus.
SUMMARY: LLM AI is:
- Input -> computation -> output. There is no real intelligence.
- (2b) "iAgents" (internal model agents) that interface between input and TF. They create the illusion of intelligent conversation.
- (2a) AI "sensors" (model transformers) use UFAs (universal function approximators, NNs) because no pure function can describe the complex logic.
-
ZAI 2b – TF control agent (binary programmed logic) demos: AI LLM stacks (shows demos of MERN stack access to self-deployed LLMs). An LLM has an internal agent (iAgent) that controls the UFA sensor (TF). This combination creates the simulation of intelligence. Note that you can also create your own (external) agent (the common meaning of the word "agent") that allows you to control the LLM programmatically (instead of typing in chat prompts).
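A minimal sketch of that external-agent idea, with `call_llm` as a stub standing in for a real request to a self-deployed LLM (the prompts, canned replies, and function names are all invented for the demo; a real agent would make an HTTP call here, and endpoint details vary by stack):

```python
# Hypothetical external agent: drive the LLM programmatically instead of
# typing prompts into a chat window.

def call_llm(prompt: str) -> str:
    # Stub: a real implementation would send `prompt` to the LLM server
    canned = {
        "List one MERN component.": "MongoDB",
        "List another MERN component.": "Express",
    }
    return canned.get(prompt, "UNKNOWN")

def agent(task_prompts):
    # Programmatic control loop: send each prompt, collect the responses
    return [call_llm(p) for p in task_prompts]

replies = agent(["List one MERN component.", "List another MERN component."])
print(replies)
```

The point is only the control flow: the same prompt/response cycle as chat, but owned by your own program.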
-
ZAI 2a – UFA sensors: transformers (TFs). Language TFs (GPT-3) are the core token sequence generators inside LLMs (storyline recognition for "language models"). You input the prompt plus the response words accumulated so far, and the LLM gives a single output that is decoded into the next word (repeat the cycle to get more response words). To me, a TF is also a "sensor".
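That one-word-per-cycle loop can be sketched with a toy stand-in for the TF (here a bigram lookup table invented for the demo; a real TF maps the whole token sequence to a distribution over the vocabulary):

```python
# Toy next-token cycle: feed prompt + accumulated response, get ONE token,
# append it, repeat. BIGRAMS is a made-up stand-in for a real transformer.
BIGRAMS = {"the": "drone", "drone": "flies", "flies": "home"}

def next_token(tokens):
    # Stand-in "TF": only looks at the last token; a real TF sees them all
    return BIGRAMS.get(tokens[-1], "<eos>")

def generate(prompt_tokens, max_new=5):
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        t = next_token(tokens)      # single output per cycle
        if t == "<eos>":
            break
        tokens.append(t)            # accumulated response is fed back in
    return tokens

print(generate(["the"]))
```

Note that "generation" is just this outer loop; the model itself only ever produces one next token.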
PHASE 1x (2024): WEBSITE DEV (a side trip that got me interested in LLM programming)
- I never really used AI tools for drone AI dev. I was working with (mainly Chinese) HW and open-source (chaos) SW, and doing test flights. Not the kind of thing you could use ChatGPT very effectively for (I just used search engines). I don't think ChatGPT (at least then) had been trained on that drone AI tech.
- But then I discovered LLM AI while programming tech (MERN) stacks. The drone (ZAI 1) market was saturated, working in Ukraine was not the best idea, and I was spending all my time debugging open-source SW and Chinese components. So I shifted focus to refreshing my website dev skills, mainly MERN stacks (but, rather unexpectedly, my interest quickly shifted to the AI tools (LLMs, ChatGPT) I was using to quickly create those stacks).
PHASE 1 (2023-2024): DRONES (my first programming with AI (CNNs))
SUMMARY AI Drones: The goal was to build AI (CNN object recognition) drones in Ukraine (AI includes code assist, search engines, "fuzzy" logic, etc., but this was the first time I programmed an AGENT + NN). Note: For the drone building part (FPV and Pixhawk builds), see EPIC 1 - Build/fly FPV drone and EPIC 2 - Build/fly Pixhawk drone.
- 1b Autonomous flight (AGENT + Pixhawk + Jetson Nano + CNN object recognition (on Nano/Pi)). For details see EPIC 4 – Basic Autonomy, EPIC 5 – Advanced Autonomy.
- 1a Deterministic (functions) and UFA (NN) sensors
- Image CNNs (vision AI used on drones). Add AI to Pixhawk drone, 10.3 SITL AP AI Yolo obj recog. CNN study is good prep for TFs (2024). A CNN is similar to a TF: you input many pixels, and you get a single output that is decoded to determine the image classification (therefore I refer to it as a "sensor"; it senses something). THIS WAS EASIER THAN KALMAN because I just used a pre-trained model on the Nvidia Jetson Nano via API.
- 15.1 AI Kamikadze (Kalman, PID). Very complex (dealing with actual equations).
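To show the kind of "actual equations" involved (the noise values, initial belief, and scalar setup below are all made-up for illustration; a real tracker would use vectors and matrices), here is a minimal 1D Kalman-style filter: predict, then correct with a noisy measurement:

```python
import random

# Illustrative 1D Kalman filter: x = belief, P = belief uncertainty.
def kalman_step(x, P, z, q=0.01, r=0.5):
    # Predict (identity dynamics for simplicity): process noise grows P
    P = P + q
    # Update: blend prediction with measurement z using the Kalman gain
    K = P / (P + r)                 # gain: how much to trust the measurement
    x = x + K * (z - x)             # corrected belief
    P = (1 - K) * P                 # uncertainty shrinks after measurement
    return x, P

random.seed(1)
true_pos = 10.0
x, P = 0.0, 1.0                     # initial belief: wrong but very uncertain
for _ in range(50):
    z = true_pos + random.gauss(0, 0.5)   # noisy sensor reading
    x, P = kalman_step(x, P, z)
print(round(x, 2), round(P, 3))     # belief converges toward true_pos
```

This explicit-belief style is exactly what the "(2 FUNCTIONS)" deterministic world state in PHASE 3 builds on, and what the CNN-via-API approach let me skip back in 2023.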
The "Aux" in "Auxdrone" is part of the name of an NGO I was working with that was active in Ukraine.