3b‐1 Agent concepts - terrytaylorbonn/auxdrone GitHub Wiki

26.0404 (0331) Lab notes (Gdrive) Git

WIKI PAGE MOVED TO

https://github.com/terrytaylorbonn/auxdrone/wiki/4%E2%80%901-Agentic-AI-concepts


Agent demos

This page describes basic agent concepts:

  • 4 LLM outputs are unpredictable
  • 5 AI agents "on rails" are the only way forward
  • 7 My Substack posts
  • 8 Guru posts (mainly Youtube videos) **

** The last section on this page, "8.4 An AI guru who has far more modest expectations for AI", is an interview with Andrew Ng. The YouTube video in 8.4 is well worth watching.



4 LLM outputs are unpredictable

How can a python loop reliably interact with a model?
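To make the problem concrete, here is a minimal sketch (my own illustration, not code from the page). A deterministic Python loop assumes the model replies with a bare integer; `fake_llm` is a stand-in for a real model call that drifts between equivalent but differently formatted replies, which is exactly what breaks rigid parsing.

```python
# Sketch of the problem: a deterministic loop that assumes a strict
# output format from an LLM. fake_llm is a stand-in for a real model
# call; real models drift between these reply styles unpredictably.

def fake_llm(prompt: str, style: int) -> str:
    """Stand-in for an LLM: same answer, varying formats."""
    replies = [
        "4",                    # exactly what the loop expects
        "The answer is 4.",     # chatty preamble
        "**4** (2 + 2 = 4)",    # markdown decoration
    ]
    return replies[style % len(replies)]

def rigid_agent_step(reply: str) -> int:
    # A traditional binary program assumes the reply is a bare integer...
    return int(reply)  # ...and raises ValueError on anything else

results = []
for style in range(3):
    reply = fake_llm("What is 2 + 2? Reply with a number only.", style)
    try:
        results.append(rigid_agent_step(reply))
    except ValueError:
        results.append(None)  # the "crash" this page is talking about

print(results)  # [4, None, None] -- only the strictly formatted reply parses
```

Two of the three replies carry the right answer, yet the deterministic loop can use only one of them.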



5 AI agents "on rails" are the only way forward

Question: How can a traditional deterministic binary program (not AI, not "intelligent") — the "agent" — use an AI model (LLM) without crashing on bad LLM output?

Answer (the only answer): constraining the model's inputs and validating its outputs, i.e. "AI on rails" ("harness engineering").
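A minimal sketch of the "on rails" idea (my own illustration; `fake_llm` is a stand-in for a real model call): the deterministic harness validates every reply against a strict contract and retries with a tightened prompt, or fails safely, instead of crashing.

```python
# "AI on rails" sketch: wrap the unreliable model call in a
# deterministic harness that validates output against a strict
# contract and retries (or fails safely) rather than crashing.
import re

def fake_llm(prompt: str, attempt: int) -> str:
    # Stand-in for a real model: first reply is chatty, the retry complies.
    return "Sure! The answer is 4." if attempt == 0 else "4"

def harnessed_call(prompt: str, max_retries: int = 2):
    for attempt in range(max_retries + 1):
        reply = fake_llm(prompt, attempt)
        if re.fullmatch(r"\d+", reply.strip()):  # output is on the rails
            return int(reply.strip())
        prompt += "\nReply with digits only."    # tighten the rails, retry
    return None                                  # fail safely, never crash

print(harnessed_call("What is 2 + 2?"))  # 4
```

The key point is that the nondeterminism is fenced inside `harnessed_call`; everything the surrounding program sees is a validated integer or an explicit `None`.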


This is the latest (third) iteration of prompt engineering.


The term "AI on rails" is my own (borrowed from Ruby on Rails).




7 My Substack posts

These are typically quick rough draft posts on various topics.



8 Guru posts (mainly Youtube videos)

Recent videos, most from my favorite AI blogger "Best Partner", that are very relevant to my agent-demos focus.

  • 8.1 A VC analysis of the future agent market (it belongs to those who adapt quickly) (#375)
  • 8.2 The future of AI belongs to framework tools that maximize AI model reliability (such as Harness) (#377)
  • 8.3 An AI guru talks about the “bitter lesson” (bitter for AI gurus) of the unrealized promises of AI (#Welch)
  • 8.4 An AI guru who has far more modest expectations (and no empty promises) for AI (#376)  

8.1 A VC analysis of the future agent market (it belongs to those who adapt quickly) (#375)

https://www.youtube.com/watch?v=mNhHjkf_ahM [AI] Is Cursor finished? | Top VC Jerry Murdock warns of obsolescence | Autonomous agents | The main SaaS tsunami wave has not yet arrived | Orchestration-layer revolution | Dramatic shift in software buyers | Infrastructure rebuild | AI-native companies | Tech trends




8.2 The future of AI belongs to framework tools that maximize AI model reliability (such as Harness) (#377)

[AI] Agent Harness Engineering | Agent harnessing/control engineering | The flaws of long-duration tasks | The computer's operating system | General-purpose vs. vertical | The bitter lesson | Engineering practice https://www.youtube.com/watch?v=qua6FfJmydo




8.3 An AI guru talks about the “bitter lesson” (bitter for AI gurus) of the unrealized promises of AI (#Welch)

The following is a quote from Richard Sutton (AI guru) in the recent YouTube video "Can humans make AI any better?" from Welch Labs (https://youtu.be/2hcsmtkSzIw?t=547). The video argues that it was a bitter lesson to discover that the "contents" of minds are so "irredeemably" complex (intelligent) that AI cannot fake human intelligence.

Sutton's comments (I struggle to understand exactly what he is saying, but basically it sounds like: "(1) AI is not intelligent. (2) The answer: use the same AI tech to search and discover the world by itself."):


Note: Yann LeCun has some harsh words for LLMs. I spent a few months doing various demos of his new JEPA approach to AI. From what I could tell, it's a rehash of the LLM model concept (just another transformer-based UFA). I am still a bit confused about how Yann thinks this will make robots (1) intelligent, (2) able to perceive the world, (3) able to learn autonomously, or (4) safe enough to be allowed anywhere near humans (let alone in the home). Apparently Gemini (below) has been programmed ("trained") solely on text that supports LeCun's ideas.




8.4 An AI guru who has far more modest expectations (and no empty promises) for AI (#376)

Andrew Ng is brilliant, soft-spoken, and unassuming. This video is full of his insights (too many to list) on what AI really is. The biggest thing I took from it is that he originally tried to study the human brain to understand how it hosts intelligence, but the brain was too complex, so he dropped that idea and focused instead on GPU-based binary AI. Such honesty is refreshing. This video is a must-watch for anyone who really wants to understand AI.

[AI] AGI is still a long way off | Andrew Ng | Agentic AI | Scaling-law bottlenecks | The continual-learning problem | Catastrophic forgetting | Open-source model advantages | Empowering humans | Career restructuring | Industry bubble

https://www.youtube.com/watch?v=D5rlQiSvbek











(OLD) Main concepts (P3b as an evolution of phases 1,2,3)

The idea: this phase (P3b) will focus on agentic AI (what I jokingly refer to as eRobotic AI). My thinking:

  • 1 no agent in P1 (the CNN just spits out whatever you input)... just a barebones UFA
  • 2 agents are used in P2 and P3 (internal to the LLM and optionally external).
  • 3 in P2/P3 (LLMs and JEPA) the iAgent (CPU) builds conversations, manages history, etc. The UFA (NN on the GPU) can only spit out tokens that correspond to the input tokens (just like a CNN with images).
  • in P3 the focus was really on the UFA (NN) and "functional" intelligence (Kalman, etc.), not much on the external agent aspect. So...
  • 3b P3b focuses on "agentic AI", which is all the rage nowadays, and for good reason. I name it "eRobotic" = electronic robotic... (just thinking out loud)... the idea is that this is very much like P3 robotics.
    • 3b.1 just like in P3, you want to operate on an external world (for P3b: electronic, deterministic)
    • 3b.2 the world that you operate on is very unforgiving
      • in P3 you could not allow the robot to damage or injure physical objects
      • in P3b you must avoid damage to binary deterministic objects (DBs, OSs, misused APIs, etc.)
    • 3b.3 PROBLEM: you are dependent on a UFA in P3 (and P3b): unreliable, dangerous...
      • almost any human can be certified to operate dangerous machinery (cars, planes, etc.)
      • NO UFA has been certified (governments are bending safety rules for Tesla FSD, Waymo, etc.)
      • all traffic rules and infrastructure are designed for human thought patterns, speeds, etc.
    • 3b.4 SOLUTION: HUMAN MUST BE IN THE LOOP TO VERIFY
      • it's OK for the UFA to assist (drive assist, search engines, medical imagery analysis; note that JEPA does not even attempt at this point to do anything like surgery — its first applications will be medical analysis)
      • indeed, many of the claimed robotic performances are fake (humans are secretly controlling: Waymo monitoring drivers, humanoids being remotely controlled, etc.)
      • but never in critical situations (FSD, medical surgery)
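The human-in-the-loop idea above can be sketched as a deterministic gate (my own illustration; all names are hypothetical): the agent may propose actions, but anything touching a critical resource requires explicit human sign-off before it runs.

```python
# Hedged sketch of "human in the loop": the agent proposes actions,
# but a deterministic gate demands human approval for any action that
# touches a critical resource. Action names are illustrative only.

CRITICAL = {"drop_table", "delete_file", "send_payment"}

def approve(action: str, ask_human) -> bool:
    """Critical actions need human sign-off; assist-level ones pass through."""
    if action in CRITICAL:
        return ask_human(action)   # e.g. a console prompt or review UI
    return True                    # routine assist action, auto-approved

# Simulated reviewer that rejects everything critical:
decisions = {a: approve(a, lambda _: False)
             for a in ["summarize_logs", "drop_table"]}
print(decisions)  # {'summarize_logs': True, 'drop_table': False}
```

The UFA never acts directly on the unforgiving world; the deterministic gate (and the human behind it) does.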


