Phase concepts - terrytaylorbonn/auxdrone GitHub Wiki
26.0318 Lab notes (Gdrive) Git
NOTE: A drone is like an airliner: it flies in a very forgiving environment. As long as your sensors are good, you have flight plans, autopilot, etc. So the only AI for (my) drones is CNNs (object recognition).
- (1) A human, script, or camera sends image/video to CNN.
- (2) CNN returns classification.
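The two steps above can be sketched as a simple request/response flow. This is a minimal, runnable illustration, not a real model: `classify_image` is a stand-in for an actual trained CNN (e.g. a ResNet doing object recognition), and the label and confidence it returns are canned values.

```python
# Sketch of the image -> CNN -> classification flow.
# classify_image is a stand-in for a real CNN; a real implementation
# would run inference on a trained network here.

def classify_image(image_bytes: bytes) -> tuple[str, float]:
    """Stand-in CNN: returns (label, confidence)."""
    return ("landing_pad", 0.97)  # canned result for illustration

def send_frame_to_cnn(frame: bytes) -> str:
    # (1) A human, script, or camera sends the frame to the CNN.
    label, conf = classify_image(frame)
    # (2) The CNN returns a classification.
    return f"{label} ({conf:.2f})"

print(send_frame_to_cnn(b"\x89PNG..."))  # landing_pad (0.97)
```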
Chat is a very forgiving environment for AI, much like the one airliners and drones operate in. Errors in chats don't cause big problems, so this kind of AI took off quickly. The problem is that many assumed this meant robotic AI was also going to take off quickly.
- (1) Human (via browser chat) or script sends prompt / receives response.
- (2a) NN adds a token to response and loops again.
- (2b) Word / response is sent back to the human/script.
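The loop in steps (2a)/(2b) is autoregressive: the model appends one token at a time until it emits a stop token. The sketch below is runnable only because `next_token` is a stand-in that emits a fixed reply; a real NN would sample the next token from a probability distribution at that point.

```python
# Sketch of the autoregressive generation loop.
# next_token is a stand-in for the NN's sampling step.

def next_token(prompt: str, response_tokens: list[str]) -> str:
    """Stand-in NN: emits a canned reply, then a stop token."""
    canned = ["Hello", ",", " world", "<eos>"]
    return canned[len(response_tokens)]

def generate(prompt: str, max_tokens: int = 16) -> str:
    tokens: list[str] = []
    for _ in range(max_tokens):
        tok = next_token(prompt, tokens)  # (2a) NN adds a token...
        if tok == "<eos>":
            break
        tokens.append(tok)                # ...and loops again
    return "".join(tokens)                # (2b) response sent back

print(generate("Say hi"))  # Hello, world
```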
This is where the AI rubber (ducky) hits the road (reality; the school of hard knocks).
- (1a) Human (text/audio) or script sends commands.
- (1b) Camera input.
- (2) Robotic AI agent works with UFAs (CNN, ViT, JEPA) or deterministic methods (Kalman filters) to determine state, belief, etc.
- (3) Robot controller gets intel from Agent and commands actuators.
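As an example of the deterministic side of step (2), here is a minimal 1-D Kalman filter fusing noisy position measurements into a state estimate the controller in step (3) could act on. The process/measurement variances (`q`, `r`) and the measurement values are illustrative assumptions, not tuned numbers.

```python
# Minimal 1-D Kalman filter for a static state (illustrative sketch).

def kalman_1d(measurements, q=0.01, r=1.0):
    """Return filtered position estimates; q = process noise, r = measurement noise."""
    x, p = 0.0, 1.0              # initial state estimate and its variance
    estimates = []
    for z in measurements:
        p += q                   # predict: variance grows by process noise
        k = p / (p + r)          # Kalman gain
        x += k * (z - x)         # update: pull estimate toward measurement
        p *= (1 - k)             # update: variance shrinks
        estimates.append(x)
    return estimates

est = kalman_1d([1.2, 0.9, 1.1, 1.0])
print(est[-1])  # estimate converging toward the true position (~1.0)
```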
The problem with AI and robots (with arms, or "self-driving" cars) is that AI UFAs are not up to the task. I don't see JEPA changing this (as far as I understand, JEPA will still use UFAs).
I think this is the most common situation when people talk about agentic AI.
- (1) The human or script interacts with the agent.
- (2) The agent has the traditional coded logic for interacting with the human and the NN UFA, for constructing conversations, managing memory, etc.
- (2a) Agent searches for data via info APIs (DBs, etc).
- (2b) Agent executes APIs that modify things (the dangerous part).
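The split between (2a) and (2b) can be sketched as an agent tool router that treats read-only info APIs and mutating APIs differently, gating the dangerous ones behind an explicit flag. The tool names and the routing logic here are illustrative assumptions, not a real agent framework.

```python
# Sketch: agent-side (traditionally coded) tool routing.
# Read-only info APIs (2a) run freely; mutating APIs (2b) are gated.

def lookup_weather(city: str) -> str:       # (2a) read-only info API
    return f"{city}: sunny"                 # canned result for illustration

def delete_record(record_id: int) -> str:   # (2b) mutating API (the dangerous part)
    return f"record {record_id} deleted"

READ_ONLY = {"lookup_weather": lookup_weather}
MUTATING = {"delete_record": delete_record}

def agent_step(tool: str, arg, allow_mutations: bool = False) -> str:
    """Route one tool request from the NN through the agent's coded logic."""
    if tool in READ_ONLY:
        return READ_ONLY[tool](arg)
    if tool in MUTATING:
        if not allow_mutations:
            return "refused: mutation not allowed"
        return MUTATING[tool](arg)
    return "unknown tool"

print(agent_step("lookup_weather", "Berlin"))  # Berlin: sunny
print(agent_step("delete_record", 7))          # refused: mutation not allowed
```

The key design point is that the safety check lives in deterministic agent code, not in the NN: the model can only request a mutation, never perform one directly.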