Agent S:example - chunhualiao/public-docs GitHub Wiki
Concrete Example Walkthrough of Agent S
Let's walk through a concrete example to illustrate how Agent S works, incorporating the technical details described in the paper.
Scenario
- User Task (Tu): "In LibreOffice Calc, open the file /home/user/Documents/report.ods. Find the sum of column C (cells C2 to C50) and put the result in cell D1. Then, make the text in D1 bold."
- Initial Environment: A standard Ubuntu desktop view. The file report.ods exists in the specified directory. LibreOffice Calc is not yet open.
Step 0: Initial Observation (O0)
- The ACI (Agent-Computer Interface) captures:
  - A screenshot of the desktop.
  - An accessibility tree containing the interactable elements (e.g., the Files icon and the Documents folder icon).
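To make Step 0 concrete, here is a minimal Python sketch of what an observation such as O0 might hold: a screenshot plus a list of accessibility-tree elements that a model can refer to by ID. The `UIElement` and `Observation` classes, their fields, and the one-line-per-element prompt format are hypothetical illustrations, not the ACI's actual data structures, and the coordinates are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UIElement:
    """One interactable node from the accessibility tree (hypothetical schema)."""
    element_id: int
    role: str                        # e.g. "icon", "push button"
    name: str                        # accessible name, e.g. "Files"
    bbox: tuple[int, int, int, int]  # (x, y, width, height) in screen pixels

    def center(self) -> tuple[int, int]:
        """Screen coordinates a click on this element would target."""
        x, y, w, h = self.bbox
        return (x + w // 2, y + h // 2)


@dataclass
class Observation:
    """O_t as captured by the ACI: a screenshot plus the accessibility tree."""
    screenshot_png: bytes
    elements: list[UIElement] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render one line per element so a model can refer to elements by ID."""
        return "\n".join(f"[{e.element_id}] {e.role} '{e.name}'" for e in self.elements)

    def find(self, name: str) -> UIElement | None:
        """Return the first element whose accessible name matches (case-insensitive)."""
        return next((e for e in self.elements if e.name.lower() == name.lower()), None)


# A toy O0 for the Ubuntu desktop (all values are made up).
O0 = Observation(
    screenshot_png=b"",  # placeholder; a real ACI would store the captured image here
    elements=[
        UIElement(0, "icon", "Files", (0, 100, 64, 64)),
        UIElement(1, "icon", "Documents", (200, 150, 64, 64)),
    ],
)
print(O0.to_prompt())
print(O0.find("files").center())  # (32, 132)
```

Exposing elements by ID lets the model say "click element 0" instead of predicting raw pixel coordinates; the ACI then maps the ID back to a screen position.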
Step 1: Manager - Planning
- Observation-Aware Query: Q = LLM(Tu, O0) → The Manager generates a planning query from the task and the initial observation.
- External Knowledge Retrieval: Kweb = Retrieve(Web, Q) → Instructions for opening files, using the SUM formula, and bolding text.
- Internal Narrative Memory Retrieval: En = Retrieve(Mn, Q) → Abstract summaries of similar full tasks from the narrative memory Mn.
- Fusion & Subtask Planning: {(s0, Cs0), (s1, Cs1), (s2, Cs2)} = MLLM(Kfused), where Kfused = LLM(En, Kweb) and each Csi is the context attached to subtask si.
- Subtasks:
  - s0: Open /home/user/Documents/report.ods using LibreOffice Calc.
  - s1: Calculate the sum of C2 to C50 into D1.
  - s2: Make the text in D1 bold.
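The planning steps above can be sketched as one function in which every model or search call is an injected callable. `plan`, `Subtask`, and the `fake_*` stubs are hypothetical; in Agent S these calls would go to an LLM/MLLM and a web search engine, and the prompt strings here are placeholders. The stubs record their call order so the sequence Q → Kweb → En → Kfused → subtasks is visible.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

TextFn = Callable[[str], str]  # text in, text out (stand-in for an LLM or search call)


@dataclass
class Subtask:
    description: str  # s_i
    context: str      # C_si: context handed to the Worker along with s_i


def plan(task: str, obs_text: str, llm: TextFn, web_search: TextFn,
         narrative_lookup: TextFn,
         planner: Callable[[str], list[Subtask]]) -> list[Subtask]:
    """Manager planning (Step 1) with every model call injected as a callable."""
    # Observation-aware query: Q = LLM(Tu, O0)
    query = llm(f"Task: {task}\nScreen: {obs_text}\nWrite a search query for planning.")
    k_web = web_search(query)       # Kweb = Retrieve(Web, Q)
    e_n = narrative_lookup(query)   # En = Retrieve(Mn, Q)
    # Fusion: Kfused = LLM(En, Kweb)
    k_fused = llm(f"Past experience:\n{e_n}\nWeb knowledge:\n{k_web}\nMerge into one guide.")
    return planner(k_fused)         # {(s_i, C_si)} = MLLM(Kfused)


# Toy stubs that record the call order instead of calling real models.
trace: list[str] = []


def fake_llm(prompt: str) -> str:
    trace.append("llm")
    return "sum a column and bold a cell in LibreOffice Calc"


def fake_web(query: str) -> str:
    trace.append("web")
    return "Type =SUM(range) in the target cell; use the Bold button for formatting."


def fake_narrative(query: str) -> str:
    trace.append("narrative")
    return "(no similar past task)"


def fake_planner(guide: str) -> list[Subtask]:
    trace.append("plan")
    return [
        Subtask("Open /home/user/Documents/report.ods using LibreOffice Calc", guide),
        Subtask("Calculate the sum of C2 to C50 into D1", guide),
        Subtask("Make the text in D1 bold", guide),
    ]


subtasks = plan("Sum C2:C50 into D1, then bold D1", "desktop with Files icon",
                fake_llm, fake_web, fake_narrative, fake_planner)
print(trace)  # ['llm', 'web', 'narrative', 'llm', 'plan']
print([s.description for s in subtasks])
```

Injecting the calls keeps the control flow testable without any model; swapping in real clients changes only the callables.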
Step 2: Worker 0 - Execute Subtask s0 (Open File)
- Episodic Memory Retrieval: Es0 = Retrieve(Me, (Tu, s0, Cs0))
- Trajectory Reflection: Observes the execution (initially, no reflection is needed).
- Action Generation & Execution:
  - a0: Click the Files icon → O1
  - a1: Click the Documents folder → O2
  - a2: Double-click report.ods → O3 (LibreOffice Calc opens)
- Subtask Completion:
  - Worker w0 sees report.ods open and signals DONE.
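The Worker's execution in Step 2 can be sketched as a loop: ask a policy for the next action, let the ACI execute it to obtain the next observation (a_t → O_{t+1}), and stop when the policy returns DONE. `run_worker`, the string actions, and the `max_steps` cap are assumptions for illustration; episodic-memory lookup and trajectory reflection are left out to keep the loop short.

```python
from __future__ import annotations

from typing import Callable

DONE = "DONE"


def run_worker(subtask: str, first_obs: str,
               policy: Callable[[str, str, list[str]], str],
               execute: Callable[[str], str],
               max_steps: int = 15) -> tuple[bool, list[tuple[str, str]]]:
    """Run one Worker until it signals DONE or hits max_steps.

    policy(subtask, observation, past_actions) -> next action, or DONE
    execute(action) -> next observation (the ACI's job)
    Returns (succeeded, episode), where episode holds (action, observation) pairs.
    """
    obs = first_obs
    episode: list[tuple[str, str]] = []
    for _ in range(max_steps):
        action = policy(subtask, obs, [a for a, _ in episode])
        if action == DONE:
            return True, episode
        obs = execute(action)  # a_t -> O_{t+1}
        episode.append((action, obs))
    return False, episode      # gave up without signalling DONE


# Usage: replay the three actions of Step 2 with a scripted policy.
script = ["click Files icon", "click Documents folder", "double-click report.ods"]


def scripted_policy(subtask: str, obs: str, past: list[str]) -> str:
    return script[len(past)] if len(past) < len(script) else DONE


screens = iter(["O1: Files window", "O2: Documents folder", "O3: report.ods open in Calc"])
ok, episode = run_worker("open report.ods", "O0: desktop", scripted_policy,
                         lambda action: next(screens))
print(ok, episode[-1])
```

The `max_steps` cap is a safety net so a policy that never signals DONE cannot loop forever; the returned episode is exactly what the Self-Evaluator summarizes next.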
Step 3: Self-Evaluator - Update Episodic Memory (Me)
- Summary: Rs0 = S(Episode_0)
- Storage: Save(Me, (Tu, s0, Cs0), Rs0)
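Steps 3, 5, and 7 store a summary keyed by (Tu, si, Csi), and later Workers retrieve by a similar key. The sketch below flattens that key into one string and ranks stored entries by word-overlap (Jaccard) similarity. The `Memory` class, the Jaccard measure, the `min_score` threshold, and `summarize_episode` (which simply joins the actions instead of calling an LLM as S does) are illustrative stand-ins, not necessarily what Agent S uses.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    """Lower-cased word set used for a crude similarity measure."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over word sets; 0.0 when both are empty."""
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class Memory:
    """Key -> summary store, usable for episodic (Me) or narrative (Mn) memory."""
    entries: list[tuple[str, str]] = field(default_factory=list)

    def save(self, key: str, summary: str) -> None:
        self.entries.append((key, summary))

    def retrieve(self, query: str, min_score: float = 0.2) -> str | None:
        """Return the summary whose key best matches `query`, or None if nothing is close."""
        best = max(self.entries, key=lambda e: jaccard(query, e[0]), default=None)
        if best is None or jaccard(query, best[0]) < min_score:
            return None
        return best[1]


def summarize_episode(episode: list[tuple[str, str]]) -> str:
    """Stand-in for S(Episode): just the action sequence that succeeded."""
    return " -> ".join(action for action, _ in episode)


episodic_memory = Memory()
key_s0 = "Tu: sum column C into D1 and bold it | s0: open report.ods in LibreOffice Calc"
episode_0 = [("click Files icon", "O1"), ("click Documents folder", "O2"),
             ("double-click report.ods", "O3")]
episodic_memory.save(key_s0, summarize_episode(episode_0))  # Save(Me, (Tu, s0, Cs0), Rs0)

# A later, similar subtask can reuse the stored experience.
print(episodic_memory.retrieve("open budget.ods in LibreOffice Calc"))
```

Returning None below a threshold keeps an unrelated past episode from being injected into the Worker's prompt.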
Step 4: Worker 1 - Execute Subtask s1 (Calculate Sum)
- Episodic Memory Retrieval: Es1 = Retrieve(Me, (Tu, s1, Cs1))
- Action Generation & Execution:
  - a3: Click cell D1 → O4
  - a4: Type =SUM(C2:C50) and press Enter → O5
- Subtask Completion:
  - Worker w1 sees the sum in D1 and signals DONE.
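Action a4 relies on spreadsheet SUM semantics: C2:C50 spans 49 cells, and, as in typical spreadsheet SUM behaviour, empty and text cells in the range do not contribute. The helpers below are a hypothetical Python model of that behaviour for single-column ranges only, handy for checking the value the Worker should expect to see in D1; they are not part of Agent S.

```python
from __future__ import annotations

import re


def expand_column_range(ref: str) -> list[str]:
    """Expand a single-column A1-style range such as 'C2:C50' into cell names."""
    match = re.fullmatch(r"([A-Z]+)(\d+):\1(\d+)", ref)
    if match is None:
        raise ValueError(f"only single-column ranges like 'C2:C50' are handled: {ref!r}")
    column, first, last = match.group(1), int(match.group(2)), int(match.group(3))
    return [f"{column}{row}" for row in range(first, last + 1)]


def spreadsheet_sum(cells: dict[str, object], ref: str) -> float:
    """Add up the numeric cells in `ref`; empty and text cells are skipped."""
    total = 0.0
    for name in expand_column_range(ref):
        value = cells.get(name)  # None for an empty cell
        if isinstance(value, (int, float)):
            total += value
    return total


# Toy sheet: C2=2, C3=3, ..., C50=50, with one text cell in the middle.
cells: dict[str, object] = {f"C{row}": row for row in range(2, 51)}
cells["C10"] = "n/a"
print(len(expand_column_range("C2:C50")))  # 49
print(spreadsheet_sum(cells, "C2:C50"))    # 1264.0
```

Here 2 + 3 + ... + 50 = 1274, and dropping the text cell that replaced C10 leaves 1264.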
Step 5: Self-Evaluator - Update Episodic Memory (Me)
- Summary: Rs1 = S(Episode_1)
- Storage: Save(Me, (Tu, s1, Cs1), Rs1)
Step 6: Worker 2 - Execute Subtask s2 (Make Bold)
- Episodic Memory Retrieval: Es2 = Retrieve(Me, (Tu, s2, Cs2))
- Action Generation & Execution:
  - a5: Re-select D1 if needed → O6
  - a6: Click the Bold button → O7
- Subtask Completion:
  - Worker w2 sees that the text in D1 is bold and signals DONE.
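Action a5 is conditional. By default, pressing Enter in Calc moves the cursor down one row, so after a4 the selection is usually D2 rather than D1. A minimal sketch of how a Worker might choose between one and two actions follows; `bold_cell_actions`, its action strings, and the `already_bold` flag are hypothetical, and how the current cell is read from the observation (for example from the Name Box) is left abstract.

```python
from __future__ import annotations


def bold_cell_actions(selected_cell: str, target: str = "D1",
                      already_bold: bool = False) -> list[str]:
    """Hypothetical action list for subtask s2 (make the text in `target` bold).

    `selected_cell` is whatever the Worker reads from the current observation;
    how it is extracted is left abstract here.
    """
    if already_bold:
        return []  # Bold is a toggle: clicking it again would remove the formatting
    actions: list[str] = []
    if selected_cell.upper() != target.upper():
        actions.append(f"click cell {target}")  # a5: re-select the target cell
    actions.append("click Bold button")          # a6: apply bold to the selection
    return actions


print(bold_cell_actions("D2"))  # ['click cell D1', 'click Bold button']
print(bold_cell_actions("D1"))  # ['click Bold button']
```

The `already_bold` check matters because Bold is a toggle; this is also why the Worker inspects O7 before signalling DONE rather than assuming the click succeeded.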
Step 7: Self-Evaluator - Update Episodic Memory (Me)
- Summary: Rs2 = S(Episode_2)
- Storage: Save(Me, (Tu, s2, Cs2), Rs2)
Step 8: Manager - Task Completion
- The Manager confirms that s0, s1, and s2 are all marked DONE. Task complete.
Step 9: Self-Evaluator - Update Narrative Memory (Mn)
- Overall Summary: Enu = S(G(Tu))
- Storage: Save(Mn, Q, Enu)
Flow Summary
- The Manager plans high-level subtasks using web knowledge and narrative memory.
- Workers execute subtasks step-by-step using episodic memory and reflection.
- ACI handles perception and action execution.
- Self-Evaluator updates both episodic and narrative memories based on task outcomes.
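The flow summary can be read as a control loop. The sketch below wires Steps 1 to 9 together with every model call abstracted into a callable. `run_agent` and its parameters are hypothetical; plain dictionaries stand in for Me and Mn, the episodic key drops Csi for brevity, and the sketch stops at the first failed subtask, so replanning after a failure is not modeled.

```python
from __future__ import annotations

from typing import Callable


def run_agent(task: str,
              plan: Callable[[str], tuple[str, list[str]]],
              work: Callable[[str], tuple[bool, list[str]]],
              summarize: Callable[[list[str]], str],
              episodic: dict[tuple[str, str], str],
              narrative: dict[str, str]) -> bool:
    """Control flow of Steps 1-9, with every model call abstracted into a callable."""
    query, subtasks = plan(task)                        # Step 1: Manager returns Q and subtasks
    for subtask in subtasks:                            # Steps 2, 4, 6: one Worker per subtask
        ok, episode = work(subtask)
        if not ok:
            return False                                # replanning is not modeled here
        episodic[(task, subtask)] = summarize(episode)  # Steps 3, 5, 7: Save(Me, ...)
    # Step 8: reaching this point means every subtask signalled DONE.
    narrative[query] = summarize(subtasks)              # Step 9: Save(Mn, Q, ...)
    return True


episodic: dict[tuple[str, str], str] = {}
narrative: dict[str, str] = {}
ok = run_agent(
    "Sum C2:C50 into D1 in report.ods, then bold D1",
    plan=lambda t: ("sum column and bold cell in Calc", ["open file", "sum into D1", "bold D1"]),
    work=lambda s: (True, [f"actions for {s}"]),
    summarize=lambda items: "; ".join(items),
    episodic=episodic,
    narrative=narrative,
)
print(ok, len(episodic))
print(narrative)
```

Keeping the memories as arguments makes the data flow explicit: episodic entries are written once per finished subtask, while the narrative entry is written only after the whole task succeeds.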