Agent S:example - chunhualiao/public-docs GitHub Wiki

Concrete Example Walkthrough of Agent S

Let's walk through a concrete example to illustrate how Agent S works, incorporating the technical details described in the paper.

Scenario

  • User Task (Tu):
    "In LibreOffice Calc, open the file /home/user/Documents/report.ods. Find the sum of column C (cells C2 to C50) and put the result in cell D1. Then, make the text in D1 bold."

  • Initial Environment:
    A standard Ubuntu desktop view. The file report.ods exists in the specified directory. LibreOffice Calc is not yet open.

Step 0: Initial Observation (O0)

  • ACI captures:
    • Screenshot of the desktop.
    • Accessibility Tree containing interactable elements (e.g., Files icon, Documents folder icon).
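
To make the shape of an observation concrete, here is a minimal sketch of how O0 might be held in memory. The class and field names (`Observation`, `Element`, `screenshot_png`, `find`) are hypothetical and chosen only for illustration; the walkthrough says only that an observation pairs a screenshot with an accessibility tree of interactable elements.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """One interactable node from the accessibility tree (hypothetical shape)."""
    element_id: int
    role: str
    name: str


@dataclass
class Observation:
    """A screenshot plus the interactable elements visible in it."""
    screenshot_png: bytes
    elements: List[Element] = field(default_factory=list)

    def find(self, name: str) -> Optional[Element]:
        # Return the first element whose name matches exactly, or None.
        for element in self.elements:
            if element.name == name:
                return element
        return None


# O0 for the scenario: a desktop with a Files icon and a Documents folder icon.
o0 = Observation(
    screenshot_png=b"",  # placeholder; a real capture would hold image bytes
    elements=[Element(0, "icon", "Files"), Element(1, "icon", "Documents")],
)
files_icon = o0.find("Files")
```

Referring to elements by an identifier rather than by pixel coordinates is one plausible way for an agent to name the target of a click.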

Step 1: Manager - Planning

  • Observation-Aware Query:

    • Q = LLM(Tu, O0) → Formulates a planning query from the task and the initial observation.
  • External Knowledge Retrieval:

    • Kweb = Retrieve(Web, Q) → Web instructions for, e.g., opening files, using the SUM formula, and bolding text.
  • Internal Narrative Memory Retrieval:

    • En = Retrieve(Mn, Q) → Abstract summary of similar full tasks.
  • Fusion & Subtask Planning:

    • {(s0, Cs0), (s1, Cs1), (s2, Cs2)} = MLLM(Kfused)
      (where Kfused = LLM(En, Kweb))

    • Subtasks:

      • s0: Open /home/user/Documents/report.ods using LibreOffice Calc.
      • s1: Calculate the sum of C2 to C50 into D1.
      • s2: Make the text in D1 bold.
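
The planning step above can be sketched as a chain of function calls. This is an illustration under stated assumptions, not Agent S's actual code: `plan`, `Subtask`, `echo_llm`, and `fake_planner` are hypothetical names, and the model and retrieval calls are passed in as plain callables so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Subtask:
    description: str  # s_i
    context: str      # C_si


def plan(
    task: str,
    observation: str,
    llm: Callable[[str], str],
    retrieve_web: Callable[[str], str],
    retrieve_narrative: Callable[[str], str],
    mllm_plan: Callable[[str], List[Subtask]],
) -> List[Subtask]:
    query = llm(f"Task: {task}\nObservation: {observation}")  # Q = LLM(Tu, O0)
    k_web = retrieve_web(query)                               # Kweb = Retrieve(Web, Q)
    e_n = retrieve_narrative(query)                           # En = Retrieve(Mn, Q)
    k_fused = llm(f"{e_n}\n{k_web}")                          # Kfused = LLM(En, Kweb)
    return mllm_plan(k_fused)                                 # subtasks = MLLM(Kfused)


def echo_llm(prompt: str) -> str:
    # Stand-in for a real model call: returns the prompt unchanged.
    return prompt


def fake_planner(fused_knowledge: str) -> List[Subtask]:
    # Stand-in for the MLLM: always returns the three subtasks of this scenario.
    return [
        Subtask("Open /home/user/Documents/report.ods using LibreOffice Calc.", "Cs0"),
        Subtask("Calculate the sum of C2 to C50 into D1.", "Cs1"),
        Subtask("Make the text in D1 bold.", "Cs2"),
    ]


subtasks = plan(
    task="Sum C2:C50 into D1 and make D1 bold",
    observation="desktop screenshot and accessibility tree",
    llm=echo_llm,
    retrieve_web=lambda q: "web: how to use the SUM formula",
    retrieve_narrative=lambda q: "narrative: summary of a similar spreadsheet task",
    mllm_plan=fake_planner,
)
```

Passing the models in as arguments keeps the data flow (query, two retrievals, fusion, planning) visible independently of which model or search backend is used.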

Step 2: Worker 0 - Execute Subtask s0 (Open File)

  • Episodic Memory Retrieval:

    • Es0 = Retrieve(Me, (Tu, s0, Cs0))
  • Trajectory Reflection:

    • Observes the worker's execution; no reflection is needed at the start, since nothing has been executed yet.
  • Action Generation & Execution:

    1. a0: Click Files icon → O1
    2. a1: Click Documents folder → O2
    3. a2: Double-click report.ods → O3 (LibreOffice Calc opens)
  • Subtask Completion:

    • Worker w0 sees report.ods open. Signals DONE.
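
The act-then-observe loop of a worker can be sketched as follows. The names `run_worker`, `scripted_policy`, and `fake_execute` are hypothetical, the step limit is an arbitrary assumption, and the action strings are illustrative rather than Agent S's real action syntax. The policy here replays the three actions of this step instead of calling a model.

```python
from typing import Callable, List, Tuple

DONE = "DONE"


def run_worker(
    subtask: str,
    first_observation: str,
    next_action: Callable[[str, str, List[Tuple[str, str]]], str],
    execute: Callable[[str], str],
    max_steps: int = 15,  # assumed safety limit, not taken from the source
) -> Tuple[bool, List[Tuple[str, str]]]:
    """Alternate between choosing an action and observing its result."""
    observation = first_observation
    trajectory: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        action = next_action(subtask, observation, trajectory)
        if action == DONE:
            return True, trajectory
        observation = execute(action)  # the ACI performs the action and re-observes
        trajectory.append((action, observation))
    return False, trajectory  # step budget exhausted without a DONE signal


# Scripted stand-ins so the sketch runs without a model or a desktop.
script = iter(["click(Files icon)", "click(Documents folder)",
               "double_click(report.ods)", DONE])
step_counter = {"n": 0}


def scripted_policy(subtask: str, observation: str, trajectory: list) -> str:
    return next(script)


def fake_execute(action: str) -> str:
    step_counter["n"] += 1
    return f"O{step_counter['n']}"


success, trajectory = run_worker(
    "Open report.ods in LibreOffice Calc", "O0", scripted_policy, fake_execute
)
```

After the run, `trajectory` pairs each action with the observation it produced, matching a0 → O1, a1 → O2, a2 → O3 above.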

Step 3: Self-Evaluator - Update Episodic Memory (Me)

  • Summary:
    • Rs0 = S(Episode_0)
  • Storage:
    • Save(Me, (Tu, s0, Cs0), Rs0)
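
A minimal sketch of the Save and Retrieve operations on Me follows. It assumes entries are keyed by the text of (Tu, s, Cs) and that retrieval returns the summary whose key is most similar to the query. The word-overlap (Jaccard) score is a simple stand-in chosen so the example needs no external libraries; it is not claimed to be the similarity measure Agent S uses. `EpisodicMemory` and its methods are hypothetical names.

```python
from typing import List, Optional, Set, Tuple


def _tokens(text: str) -> Set[str]:
    return set(text.lower().split())


class EpisodicMemory:
    """Stores subtask summaries keyed by (task, subtask, context) text."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str]] = []

    def save(self, task: str, subtask: str, context: str, summary: str) -> None:
        self._entries.append((" ".join((task, subtask, context)), summary))

    def retrieve(self, task: str, subtask: str, context: str) -> Optional[str]:
        if not self._entries:
            return None
        query = _tokens(" ".join((task, subtask, context)))

        def score(entry: Tuple[str, str]) -> float:
            # Jaccard similarity between the query words and the stored key words.
            key = _tokens(entry[0])
            union = query | key
            return len(query & key) / len(union) if union else 0.0

        return max(self._entries, key=score)[1]


memory = EpisodicMemory()
memory.save("spreadsheet task", "open report.ods in Calc", "file is in Documents",
            "Opened Files, then Documents, then double-clicked the file.")
memory.save("spreadsheet task", "make D1 bold", "D1 holds the sum",
            "Selected D1 and clicked Bold.")

# A later, similar subtask retrieves the closest stored experience.
recalled = memory.retrieve("another spreadsheet task", "make cell B2 bold",
                           "B2 holds a total")
```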

Step 4: Worker 1 - Execute Subtask s1 (Calculate Sum)

  • Episodic Memory Retrieval:

    • Es1 = Retrieve(Me, (Tu, s1, Cs1))
  • Action Generation & Execution:

    1. a3: Click D1 cell → O4
    2. a4: Type =SUM(C2:C50) and press Enter → O5
  • Subtask Completion:

    • Worker w1 sees the sum in D1. Signals DONE.

Step 5: Self-Evaluator - Update Episodic Memory (Me)

  • Summary:
    • Rs1 = S(Episode_1)
  • Storage:
    • Save(Me, (Tu, s1, Cs1), Rs1)

Step 6: Worker 2 - Execute Subtask s2 (Make Bold)

  • Episodic Memory Retrieval:

    • Es2 = Retrieve(Me, (Tu, s2, Cs2))
  • Action Generation & Execution:

    1. a5: Re-select D1 if needed → O6
    2. a6: Click Bold button → O7
  • Subtask Completion:

    • Worker w2 sees D1 text is bold. Signals DONE.

Step 7: Self-Evaluator - Update Episodic Memory (Me)

  • Summary:
    • Rs2 = S(Episode_2)
  • Storage:
    • Save(Me, (Tu, s2, Cs2), Rs2)

Step 8: Manager - Task Completion

  • Manager confirms that s0, s1, and s2 are all marked DONE. Task complete.

Step 9: Self-Evaluator - Update Narrative Memory (Mn)

  • Overall Summary:
    • Enu = S(G(Tu))
  • Storage:
    • Save(Mn, Q, Enu)

Flow Summary

  1. Manager plans high-level subtasks using web knowledge and narrative memory.
  2. Workers execute subtasks step-by-step using episodic memory and reflection.
  3. ACI handles perception and action execution.
  4. Self-Evaluator updates both episodic and narrative memories based on task outcomes.
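
The whole flow can be condensed into one control loop. This is a simplified sketch with hypothetical names (`run_task` and its parameters): the memories are plain dictionaries, narrative memory is keyed by the task text rather than by the query Q, and a failed subtask simply stops the run, with no attempt to model replanning.

```python
from typing import Callable, Dict, List, Optional, Tuple


def run_task(
    task: str,
    plan: Callable[[str], List[str]],
    run_subtask: Callable[[str, Optional[str]], bool],
    summarize: Callable[[str], str],
    episodic: Dict[Tuple[str, str], str],
    narrative: Dict[str, str],
) -> bool:
    for subtask in plan(task):                            # Manager: plan subtasks
        prior_experience = episodic.get((task, subtask))  # Worker: recall experience
        if not run_subtask(subtask, prior_experience):    # Worker: act through the ACI
            return False
        episodic[(task, subtask)] = summarize(subtask)    # Self-Evaluator: update Me
    narrative[task] = summarize(task)                     # Self-Evaluator: update Mn
    return True


episodic_memory: Dict[Tuple[str, str], str] = {}
narrative_memory: Dict[str, str] = {}
completed = run_task(
    task="Sum C2:C50 into D1 and make D1 bold",
    plan=lambda task: ["open report.ods", "sum C2:C50 into D1", "make D1 bold"],
    run_subtask=lambda subtask, prior: True,  # stand-in: every subtask succeeds
    summarize=lambda text: f"summary of: {text}",
    episodic=episodic_memory,
    narrative=narrative_memory,
)
```

In this sketch, episodic memory gains one entry per completed subtask, while narrative memory gains a single entry only after every subtask has succeeded.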