# Agentic Rubber Duck - CS312 Project Feedback
## What do I want to be able to do?
- Provide useful feedback to students on their CS312 projects.
- Currently, I want to have 5 feedback ducklings that each provide feedback on a unique item. More may be added later.
- These currently include:
  - code structure
  - report quality
  - code quality
  - time complexity
  - space complexity
- The Master Feedback Duck is able to call one, two, or all of the feedback ducklings based on the student's request.
If the student asks for feedback on everything, the conversation could look like this:
Student: I want feedback on everything. (uploads code) (uploads report)
Bot: Code implementation .... You have the correct implementation for the project .... etc. etc.
Bot: Code structure feedback.... You shouldn't duplicate code .... etc. etc.
Bot: Code quality feedback.... You shouldn't name functions `var1` .... etc. etc.
Bot: Report feedback.... `Blasphemy!` is spelled wrong ....
Bot: Time complexity feedback: Your functions.... you should .... ...
Bot: Space complexity feedback: Your functions.... you should .... ...
If the student asks for specific feedback on one thing, it might look like this:
Student: I want feedback on my code structure. (uploads code) (uploads report)
Bot: Code structure feedback.... You shouldn't duplicate code .... etc. etc.
Student: Now I want feedback on the time complexity
Bot: Your functions.... you should .... ...
## Known Details -- Master Duck and Ducklings
### General Rubric Ducklings
Three of my feedback ducklings are "general rubric" ducklings (a sketch of such a tool follows the example output object below).
- it uses: a 'general rubric' prompt (a `general_rubric_prompt.md` file)
- it takes in: student work (code or report), rubric items (YAML file/array of items), and context (a list of files, e.g. an example report, additional instructions provided to the student, etc.) (default=None)
- it returns: a list of feedback items corresponding to the rubric items passed in (Pydantic structured output)
Example rubric items input:
- code is understandable (i.e. a human could read it and not be confused)
- no duplication (i.e. functions are not copied and pasted)
- appropriate comments (i.e. no code commented out, and the comments included are clear and understandable)
- no outstanding "TODOs"
Example output object:

```python
from typing import List

from pydantic import BaseModel, Field

num_items = 4  # number of rubric items passed in (4 in the example above)

class FeedbackResponseItem(BaseModel):
    meets_criteria: bool
    citation: str
    comments: str

class FeedbackResponseItems(BaseModel):
    # exactly one feedback item per rubric item
    items: List[FeedbackResponseItem] = Field(min_length=num_items,
                                              max_length=num_items)
```
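A minimal sketch of what a general rubric duckling could look like as a tool function, assuming the OpenAI Python SDK's structured-output helper; the function name, model name, and the way the prompt, rubric, and context are assembled are illustrative assumptions, not the final tools.py API:

```python
from pathlib import Path

from openai import OpenAI

client = OpenAI()

def general_rubric_feedback(
    student_work: str,
    rubric_items: list[str],
    context_files: list[str] | None = None,
) -> FeedbackResponseItems:
    """Hypothetical tool: grade student work against the given rubric items."""
    system_prompt = Path("general_rubric_prompt.md").read_text()
    rubric = "\n".join(f"- {item}" for item in rubric_items)
    context = "\n\n".join(Path(f).read_text() for f in (context_files or []))

    completion = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Rubric items:\n{rubric}\n\n"
                                        f"Context:\n{context}\n\n"
                                        f"Student work:\n{student_work}"},
        ],
        # FeedbackResponseItems as defined above; see the implementation notes
        # below for making num_items track len(rubric_items) per call.
        response_format=FeedbackResponseItems,
    )
    return completion.choices[0].message.parsed
```

Because the rubric items and context files are parameters, the same tool could back the code structure, code quality, and report quality ducklings with different rubric files.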
### Code Complexity Ducklings
Two of my feedback ducklings are "code complexity" ducklings.
- A "code complexity" agent returns feedback on the student's code analysis.
- I need 2 "code complexity" ducklings -- time and space. They follow the exact same process, but use at least 1 different prompt file.
- Each "code complexity" duckling
  - it takes in: student code (a Python file), the student's complexity report (md file/pdf), and a list of major functions (YAML/Python list)
  - it returns: a list of complexity feedback items for each function (Pydantic structured output)
- There are several steps to this process:
  - Have a bot analyze the code
    - it uses: a bot-analyze-code prompt (an .md file -- unique for time vs. space complexity), the student code
    - it returns: the code annotated by the bot with its complexity analysis (a string, or a file)
  - Divvy up the functions from both the student analysis and the bot analysis. This agent would be called twice:
    - once to divvy up the bot-analyzed code into functions
    - once to divvy up the student analysis into individual functions
  - The divvy-up-functions function
    - it uses: a divvy-up-functions prompt, the code/report, and a list of functions to extract
    - it returns: a list of extracted, analyzed functions
  - Compare the student analysis with the bot analysis
    - it uses: a compare prompt, the list of bot-analyzed extracted functions, and the list of student-analyzed functions
    - it returns: a list of feedback for each of the functions
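A rough sketch of how these steps could be wired together explicitly (essentially "Option 1" in the implementation questions below). The prompt file names, the model, and the `FunctionFeedback` fields are assumptions for illustration, not the project's actual files:

```python
from pathlib import Path

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class FunctionFeedback(BaseModel):
    function_name: str
    matches_bot_analysis: bool
    comments: str

class ComplexityFeedback(BaseModel):
    items: list[FunctionFeedback]

def ask(prompt_file: str, content: str) -> str:
    """One chat completion using a specific prompt file as the system message."""
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": Path(prompt_file).read_text()},
            {"role": "user", "content": content},
        ],
    )
    return completion.choices[0].message.content

def time_complexity_feedback(student_code: str, student_report: str,
                             major_functions: list[str]) -> ComplexityFeedback:
    functions = ", ".join(major_functions)

    # Step 1: have a bot annotate the code with its own complexity analysis
    bot_analysis = ask("analyze_time_complexity_prompt.md", student_code)

    # Step 2: divvy up both analyses into the major functions (same agent, called twice)
    bot_functions = ask("divvy_up_functions_prompt.md",
                        f"Functions to extract: {functions}\n\n{bot_analysis}")
    student_functions = ask("divvy_up_functions_prompt.md",
                            f"Functions to extract: {functions}\n\n{student_report}")

    # Step 3: compare the student's analysis against the bot's, per function,
    # returning structured feedback
    comparison = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": Path("compare_prompt.md").read_text()},
            {"role": "user", "content": f"Bot analysis:\n{bot_functions}\n\n"
                                        f"Student analysis:\n{student_functions}"},
        ],
        response_format=ComplexityFeedback,
    )
    return comparison.choices[0].message.parsed
```

The space complexity duckling would follow the same flow with a different analyze-code prompt file.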
## Implementation/Open questions
- How do we implement it?
  - Creating a tool function (in tools.py), but specifying the parameters in the config
    - This would allow the general rubric prompt to be used in multiple places
- I need the tools to be able to make their own ChatGPT calls with specific prompts, context, and structured input/output.
  - I need structured input/output on most, if not all, of these calls. Is creating a tool with specific structured input/output sufficient?
  - I need to specify the length of the output list based on the number of functions/rubric items provided (see the sketch below).
  - I need specific prompts for the various calls. Is the way the config is set up right now sufficient?
  - I need to be able to provide context files for a lot of these calls.
    - Can we set up the config to provide those?
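For the output-list-length question, one option is to build the Pydantic response model per call with `create_model`, so the list constraint matches the number of rubric items or functions. Whether that min/max constraint carries through to the provider's structured-output schema is worth verifying, but the model at least validates the parsed response. A sketch (the item model is repeated from above so the snippet stands alone):

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError, create_model

class FeedbackResponseItem(BaseModel):
    meets_criteria: bool
    citation: str
    comments: str

def feedback_response_model(num_items: int) -> type[BaseModel]:
    """Build a response model whose items list must have exactly num_items entries."""
    return create_model(
        "FeedbackResponseItems",
        items=(List[FeedbackResponseItem],
               Field(min_length=num_items, max_length=num_items)),
    )

# Usage: build the model per call from that call's rubric items.
rubric_items = [
    "code is understandable",
    "no duplication",
    "appropriate comments",
    "no outstanding TODOs",
]
ResponseModel = feedback_response_model(len(rubric_items))

try:
    ResponseModel.model_validate({"items": []})  # wrong length: rejected
except ValidationError:
    print(f"expected exactly {len(rubric_items)} feedback items")
```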
- The complexity feedback requires several tool calls (at least 4).
  - Option 1: use our own logic to specify the order and do the chat completions ourselves, specifying the parameters (essentially the pipeline sketch above)
  - Option 2: provide the tools to the agent and tell it when to call what (see the sketch below)
    - Would creating multiple tools and specifying parameters in the config be sufficient?
  - Others?
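For Option 2, one possible shape is to expose the ducklings to the master duck as function-calling tools and let the model decide which to invoke. A hedged sketch using the OpenAI chat completions `tools` parameter, where the tool names, parameter schemas, and system prompt are all assumptions:

```python
import json

from openai import OpenAI

client = OpenAI()

# Hypothetical tool specs for two of the ducklings; the master duck's system
# prompt tells the model when each one should be called.
tools = [
    {
        "type": "function",
        "function": {
            "name": "general_rubric_feedback",
            "description": "Give rubric-based feedback on student code or a report.",
            "parameters": {
                "type": "object",
                "properties": {
                    "rubric_name": {
                        "type": "string",
                        "enum": ["code_structure", "code_quality", "report_quality"],
                    },
                },
                "required": ["rubric_name"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "complexity_feedback",
            "description": "Compare the student's complexity analysis against the bot's.",
            "parameters": {
                "type": "object",
                "properties": {
                    "kind": {"type": "string", "enum": ["time", "space"]},
                },
                "required": ["kind"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are the master feedback duck. "
                                      "Call a duckling tool for each kind of feedback the student asks for."},
        {"role": "user", "content": "I want feedback on my code structure and time complexity."},
    ],
    tools=tools,
)

# The model responds with tool calls; our code dispatches them to the ducklings.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```

Our dispatch code would then route each tool call to the matching duckling function and feed the results back to the model.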
- Handoff prompts -- How effective are they? Will we need to be more specific with them? What can we do to make them more effective?
## Potential Future Feature Requests
- Student can re-upload the report and see if the feedback issues were addressed
- Student can upload report and code separately.
- Example conversation:
Student: I want feedback on my code structure. (uploads code)
Bot: Code structure feedback.... You shouldn't duplicate code .... etc. etc.
Student: student asks a question about the provided feedback
Bot: answers question about feedback...
Student: Now I want feedback on the time complexity (uploads report)
Bot: Your functions.... you should .... ...
Student: I fixed it! How about now? (uploads report)
Bot: You still need to fix this ....
Student: I think I really fixed it this time. How about now? (uploads report)
Bot: Nice work. You are still missing....
Student: I think I really actually fixed it this time. How about now? (uploads report)
Bot: Looks good! Nice work!
### Potential Feature #2
- Bot probes if the student submitted their best work
Student: I want feedback on my code. (uploads code)
Bot: Is this your best work?
Student: No... I should remove the leftover comments in my code.
Bot: Do that and then submit again.
Student: I removed my comments (uploads code)
Bot: Do you have any unnecessarily duplicated code?
Student: yes... I'll go fix it.
Bot: Good job!
Student: I fixed it (uploads code)
Bot: Are you sure you fixed it? It looks like it might still have a lot of duplicated code
Student: Well... I tried, but it didn't go very well.
Bot: What did you try?
... conversation continues until the student fixes their own duplicated code ...
### Potential Feature #3
- Bot asks about each rubric item before providing feedback, making the student think about each one.