Agent S:example - chunhualiao/public-docs GitHub Wiki
Concrete Example Walkthrough of Agent S
Let's walk through a concrete example to illustrate how Agent S works, incorporating the technical details described in the paper.
Scenario
- User Task (`Tu`): "In LibreOffice Calc, open the file `/home/user/Documents/report.ods`. Find the sum of column C (cells C2 to C50) and put the result in cell D1. Then, make the text in D1 bold."
- Initial Environment: A standard Ubuntu desktop view. The file `report.ods` exists in the specified directory. LibreOffice Calc is not yet open.
Step 0: Initial Observation (O0)
- The Agent-Computer Interface (ACI) captures:
  - A screenshot of the desktop.
  - An accessibility tree containing interactable elements (e.g., Files icon, Documents folder icon).
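The observation `O0` can be pictured as a screenshot paired with an accessibility tree. The following is a minimal sketch of that pairing; the class names `A11yNode` and `Observation` and all field names are hypothetical and are not taken from the Agent S codebase.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class A11yNode:
    """One element of a (hypothetical) accessibility tree."""
    node_id: int
    role: str
    name: str
    interactable: bool = True
    children: list[A11yNode] = field(default_factory=list)


@dataclass
class Observation:
    """An observation O_t: raw screenshot bytes plus the accessibility tree."""
    screenshot: bytes
    tree: A11yNode

    def interactable_elements(self) -> list[A11yNode]:
        """Walk the tree depth-first and keep only interactable nodes."""
        found, stack = [], [self.tree]
        while stack:
            node = stack.pop()
            if node.interactable:
                found.append(node)
            # Reverse so children are visited in their original order.
            stack.extend(reversed(node.children))
        return found


# Toy desktop matching the scenario; the screenshot is left empty here.
desktop = A11yNode(0, "desktop", "Ubuntu Desktop", interactable=False, children=[
    A11yNode(1, "icon", "Files"),
    A11yNode(2, "icon", "Documents"),
])
o0 = Observation(screenshot=b"", tree=desktop)
names = [node.name for node in o0.interactable_elements()]
```

Here `names` lists the elements the agent could act on in `O0`. A real accessibility tree would carry more attributes (positions, states), which this sketch omits.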
Step 1: Manager - Planning
- Observation-Aware Query: `Q = LLM(Tu, O0)` → generate the query used for planning.
- External Knowledge Retrieval: `Kweb = Retrieve(Web, Q)` → instructions for opening files, using the SUM formula, and bolding text.
- Internal Narrative Memory Retrieval: `En = Retrieve(Mn, Q)` → abstract summaries of similar full tasks.
- Fusion & Subtask Planning: `{(s0, Cs0), (s1, Cs1), (s2, Cs2)} = MLLM(Kfused)`, where `Kfused = LLM(En, Kweb)`. Each subtask `si` is paired with its context `Csi`.
- Subtasks:
  - `s0`: Open `/home/user/Documents/report.ods` using LibreOffice Calc.
  - `s1`: Calculate the sum of C2 to C50 into D1.
  - `s2`: Make the text in D1 bold.
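The planning stages above form a simple chain: query, two retrievals, fusion, then subtask planning. The sketch below wires that chain together with canned stand-in functions so that it runs without any model or network access. The function `manager_plan`, its parameter names, and the canned strings are all assumptions made for illustration, not the Agent S API.

```python
from typing import Callable

Subtask = tuple[str, str]  # (subtask description s_i, context C_si)


def manager_plan(
    task: str,
    observation: str,
    make_query: Callable[[str, str], str],
    retrieve_web: Callable[[str], str],
    retrieve_narrative: Callable[[str], str],
    fuse: Callable[[str, str], str],
    plan_subtasks: Callable[[str], list[Subtask]],
) -> tuple[str, list[Subtask]]:
    """Chain the planning stages; return the query and the subtask queue."""
    query = make_query(task, observation)   # Q = LLM(Tu, O0)
    k_web = retrieve_web(query)             # Kweb = Retrieve(Web, Q)
    e_n = retrieve_narrative(query)         # En = Retrieve(Mn, Q)
    k_fused = fuse(e_n, k_web)              # Kfused = LLM(En, Kweb)
    return query, plan_subtasks(k_fused)    # {(si, Csi)} = MLLM(Kfused)


# Canned stand-ins (hypothetical outputs) in place of real model calls.
query, subtasks = manager_plan(
    task="Sum C2:C50 into D1 in report.ods, then make D1 bold",
    observation="Ubuntu desktop, LibreOffice Calc not open",
    make_query=lambda t, o: f"How to: {t} (starting from: {o})",
    retrieve_web=lambda q: "Use =SUM(range); the Bold button is in the toolbar.",
    retrieve_narrative=lambda q: "Similar task: open the file, edit a cell, format it.",
    fuse=lambda e_n, k_web: f"{e_n} {k_web}",
    plan_subtasks=lambda k: [
        ("Open /home/user/Documents/report.ods using LibreOffice Calc", k),
        ("Calculate the sum of C2 to C50 into D1", k),
        ("Make the text in D1 bold", k),
    ],
)
```

Passing each stage in as a callable keeps the control flow visible and lets any stage be swapped for a real model or retriever later.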
Step 2: Worker 0 - Execute Subtask s0 (Open File)
- Episodic Memory Retrieval: `Es0 = Retrieve(Me, (Tu, s0, Cs0))`
- Trajectory Reflection: Observes the execution (no reflection is needed initially).
- Action Generation & Execution:
  - `a0`: Click Files icon → `O1`
  - `a1`: Click Documents folder → `O2`
  - `a2`: Double-click `report.ods` → `O3` (LibreOffice Calc opens)
- Subtask Completion: Worker `w0` sees `report.ods` open and signals DONE.
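A Worker's behaviour on a subtask amounts to a loop: pick an action, execute it, observe the result, and stop on DONE. The sketch below shows one way to write that loop. The names `run_worker`, `scripted_policy`, and `fake_execute` are hypothetical, and the scripted policy stands in for the model that would actually choose actions.

```python
from typing import Callable

DONE = "DONE"
Step = tuple[str, str]  # (action a_t, resulting observation O_{t+1})


def run_worker(
    subtask: str,
    first_observation: str,
    next_action: Callable[[str, str, list[Step]], str],
    execute: Callable[[str], str],
    max_steps: int = 10,
) -> tuple[list[Step], bool]:
    """Act until the policy signals DONE or the step budget runs out."""
    trajectory: list[Step] = []
    observation = first_observation
    for _ in range(max_steps):
        action = next_action(subtask, observation, trajectory)
        if action == DONE:
            return trajectory, True
        observation = execute(action)  # the ACI would perform the action here
        trajectory.append((action, observation))
    return trajectory, False


# Scripted stand-in for the model: replay the three actions of subtask s0.
scripted = ["Click Files icon", "Click Documents folder", "Double-click report.ods"]


def scripted_policy(subtask: str, observation: str, trajectory: list[Step]) -> str:
    return scripted[len(trajectory)] if len(trajectory) < len(scripted) else DONE


counter = {"n": 0}


def fake_execute(action: str) -> str:
    """Pretend to act and return the label of the next observation."""
    counter["n"] += 1
    return f"O{counter['n']}"


trajectory, finished = run_worker("Open report.ods", "O0", scripted_policy, fake_execute)
```

The `max_steps` budget is an assumption of this sketch; it prevents the loop from running forever if the policy never signals DONE.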
Step 3: Self-Evaluator - Update Episodic Memory (Me)
- Summary: `Rs0 = S(Episode_0)`
- Storage: `Save(Me, (Tu, s0, Cs0), Rs0)`
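The `Save` and `Retrieve` operations on `Me` behave like a store keyed by the triple (task, subtask, context). The sketch below is one minimal way to picture that. It ranks entries by token overlap (Jaccard similarity), which is a stand-in chosen for this example and not necessarily the similarity measure Agent S uses. The class name, method names, and stored summaries are hypothetical.

```python
from typing import Optional

Key = tuple[str, str, str]  # (task Tu, subtask s_i, context C_si)


class EpisodicMemory:
    """Toy episodic memory: save summaries, retrieve the most similar one."""

    def __init__(self) -> None:
        self._entries: list[tuple[set[str], str]] = []

    @staticmethod
    def _tokens(key: Key) -> set[str]:
        return set(" ".join(key).lower().split())

    def save(self, key: Key, summary: str) -> None:
        self._entries.append((self._tokens(key), summary))

    def retrieve(self, key: Key) -> Optional[str]:
        """Return the summary whose key overlaps most, or None if nothing matches."""
        query = self._tokens(key)
        best_summary, best_score = None, 0.0
        for tokens, summary in self._entries:
            union = query | tokens
            score = len(query & tokens) / len(union) if union else 0.0
            if score > best_score:
                best_summary, best_score = summary, score
        return best_summary


me = EpisodicMemory()
task = "open report.ods sum column C bold D1"
# Placeholder summaries standing in for Rs0 and Rs2.
me.save((task, "open report.ods in LibreOffice Calc", "file manager"),
        "Rs0: Files > Documents > double-click the file")
me.save((task, "make D1 bold", "Calc toolbar"),
        "Rs2: select the cell, click Bold")

# A later, similar subtask retrieves the file-opening experience.
hit = me.retrieve(("open budget.ods", "open budget.ods in LibreOffice Calc", "file manager"))
```

Retrieval by similarity rather than exact key match is what lets experience from one task inform a different but related one.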
Step 4: Worker 1 - Execute Subtask s1 (Calculate Sum)
- Episodic Memory Retrieval: `Es1 = Retrieve(Me, (Tu, s1, Cs1))`
- Action Generation & Execution:
  - `a3`: Click D1 cell → `O4`
  - `a4`: Type `=SUM(C2:C50)` and press Enter → `O5`
- Subtask Completion: Worker `w1` sees the sum in D1 and signals DONE.
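To make the formula concrete, the sketch below mimics what `=SUM(C2:C50)` computes: the range is inclusive at both ends, so it covers 49 cells. The helper `column_range` and the cell values are made up for illustration; the real `report.ods` contents are not given in the scenario.

```python
def column_range(column: str, first_row: int, last_row: int) -> list[str]:
    """Expand a single-column range such as C2:C50 into cell references."""
    return [f"{column}{row}" for row in range(first_row, last_row + 1)]


cells = column_range("C", 2, 50)

# Placeholder data: C2 = 1.0, C3 = 2.0, ..., C50 = 49.0.
sheet = {ref: float(i) for i, ref in enumerate(cells, start=1)}

# Equivalent of typing =SUM(C2:C50) into D1.
sheet["D1"] = sum(sheet[ref] for ref in cells)
```

With these placeholder values, `sheet["D1"]` holds the sum of 1 through 49, which is 1225.0.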
Step 5: Self-Evaluator - Update Episodic Memory (Me)
- Summary: `Rs1 = S(Episode_1)`
- Storage: `Save(Me, (Tu, s1, Cs1), Rs1)`
Step 6: Worker 2 - Execute Subtask s2 (Make Bold)
- Episodic Memory Retrieval: `Es2 = Retrieve(Me, (Tu, s2, Cs2))`
- Action Generation & Execution:
  - `a5`: Re-select D1 if needed (pressing Enter in the previous step may have moved the selection) → `O6`
  - `a6`: Click Bold button → `O7`
- Subtask Completion: Worker `w2` sees that the D1 text is bold and signals DONE.
Step 7: Self-Evaluator - Update Episodic Memory (Me)
- Summary: `Rs2 = S(Episode_2)`
- Storage: `Save(Me, (Tu, s2, Cs2), Rs2)`
Step 8: Manager - Task Completion
- Manager confirms that `s0`, `s1`, and `s2` are all marked DONE. Task complete.
Step 9: Self-Evaluator - Update Narrative Memory (Mn)
- Overall Summary: `Enu = S(G(Tu))`
- Storage: `Save(Mn, Q, Enu)`
Flow Summary
- Manager plans high-level subtasks using retrieved web knowledge and narrative memory.
- Workers execute subtasks step-by-step using episodic memory and reflection.
- ACI handles perception and action execution.
- Self-Evaluator updates both episodic and narrative memories based on task outcomes.
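The whole flow can be condensed into one control loop: run each subtask, store an episodic summary after each DONE, and store a narrative summary once every subtask has finished. The sketch below shows that loop under simplifying assumptions: memories are plain dictionaries, the function `run_agent` and its parameters are hypothetical, and a failed subtask simply aborts the run. Whether and how failed runs are stored is left out.

```python
from typing import Callable

Subtask = tuple[str, str]  # (subtask s_i, context C_si)


def run_agent(
    task: str,
    query: str,
    subtasks: list[Subtask],
    run_subtask: Callable[[str, str], tuple[list[str], bool]],
    summarize: Callable[[list[str]], str],
    episodic: dict,
    narrative: dict,
) -> bool:
    """Run subtasks in order, updating both memories along the way."""
    completed_steps: list[str] = []
    for subtask, context in subtasks:
        trajectory, done = run_subtask(subtask, context)  # Worker executes s_i
        if not done:
            return False  # simplification: abort on the first failure
        # Self-Evaluator: Save(Me, (Tu, s_i, C_si), Rs_i)
        episodic[(task, subtask, context)] = summarize(trajectory)
        completed_steps.extend(trajectory)
    # All subtasks DONE. Self-Evaluator: Save(Mn, Q, Enu)
    narrative[query] = summarize(completed_steps)
    return True


episodic: dict = {}
narrative: dict = {}
ok = run_agent(
    task="Tu",
    query="Q",
    subtasks=[("s0", "Cs0"), ("s1", "Cs1"), ("s2", "Cs2")],
    run_subtask=lambda s, c: ([f"{s}: action"], True),   # stub Worker
    summarize=lambda steps: f"{len(steps)} step(s)",     # stub Self-Evaluator
    episodic=episodic,
    narrative=narrative,
)
```

After the run, `episodic` holds one entry per subtask and `narrative` holds a single entry keyed by the planning query, mirroring Steps 3, 5, 7, and 9.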