lab 2 - humphd/topics-in-open-source-2024 GitHub Wiki
Lab 2
Due Date
Friday Sept 20 by Midnight.
Overview
This week we are going to practice contributing and submitting Pull Requests to other repos we don't own. This lab will help you gain experience doing the following:
- forking and cloning other projects
- creating branches to work on new features and fix bugs
- working on code you didn't write, trying to maintain the original style and not break things
- learning more about working with LLMs
- creating pull requests
- collaborating with other developers on GitHub
- reviewing code changes
- updating your pull requests to include fixes for review comments
Step 1. Pick another Student Project
Pick another student's project from the Lab 1 Submissions - "Repo you Reviewed (URL)" list. You can work on any project other than your own, and you do not need to work with the same partner as last week. Try to make sure no one else is working on this repo (one student per repo is ideal for this lab, but not a requirement). You can start by messaging the owner on Slack.
Step 2. Add a New Feature: Token Info
When programming with LLMs it is necessary to understand how many tokens you are sending, receiving, and being billed for with a given request/response. In addition, all models have a fixed context length (i.e., how many tokens they can process), so it is important to stay within a given token budget.
To better understand this, you are asked to add a new command-line flag: --token-usage or -t. When the program is run with the --token-usage/-t flag set, extra information will be reported to stderr about the number of tokens that were sent in the prompt and returned in the completion.
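As a rough illustration of the flag itself, here is a minimal sketch using Python's standard argparse module. It assumes the project is a Python CLI; the program name mytool and the files argument are hypothetical placeholders, and a project written in another language or using another argument parser will look different.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical CLI that accepts input files."""
    # "mytool" and the "files" argument are placeholders, not part of the lab.
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument("files", nargs="*", help="input files to process")
    # A boolean switch: absent means False, present means True.
    parser.add_argument(
        "-t",
        "--token-usage",
        action="store_true",
        help="report prompt and completion token counts to stderr",
    )
    return parser


args = build_parser().parse_args(["-t", "example.txt"])
print(args.token_usage)  # True
```

argparse converts the dash in --token-usage to an underscore, so the value is read as args.token_usage.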
A typical OpenAI-style chat completion response looks like this:
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The 2020 World Series was played in Texas at Globe Life Field in Arlington.",
        "role": "assistant"
      },
      "logprobs": null
    }
  ],
  "created": 1677664795,
  "id": "chatcmpl-7QyqpwdfhqwajicIEznoc6Q47XAyW",
  "model": "gpt-4o-mini",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 17,
    "prompt_tokens": 57,
    "total_tokens": 74
  }
}
It includes a usage property, which holds information about the number of tokens that were used in the completion, including:
- completion_tokens: the number of tokens in the generated completion (i.e., the response)
- prompt_tokens: the number of tokens in the prompt
- total_tokens: the number of tokens used in the prompt and completion (i.e., completion_tokens + prompt_tokens)
Being able to obtain this information easily is useful when debugging.
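One way to read these fields and report them is sketched below. It assumes the response has already been parsed into a Python dict; the function name format_token_usage and the output wording are hypothetical, and the sample data is a trimmed copy of the response shown earlier.

```python
import json
import sys

# Trimmed copy of the sample chat completion response.
RESPONSE_JSON = """
{
  "choices": [{"message": {"role": "assistant", "content": "(completion text)"}}],
  "usage": {"completion_tokens": 17, "prompt_tokens": 57, "total_tokens": 74}
}
"""


def format_token_usage(response: dict) -> str:
    """Return a one-line summary of the usage object, tolerating missing fields."""
    usage = response.get("usage") or {}
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    # Fall back to the sum if the provider omits total_tokens.
    total = usage.get("total_tokens", prompt + completion)
    return f"Token usage: prompt={prompt} completion={completion} total={total}"


response = json.loads(RESPONSE_JSON)
# Diagnostics go to stderr so stdout still carries only the completion text.
print(format_token_usage(response), file=sys.stderr)
```

Writing to stderr keeps the report out of the way when the tool's normal output is piped or redirected to a file.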
Step 3. File an Issue
Search through the existing Issues to make sure no one has filed an Issue for this feature yet. If there is one already, move on to another project repo.
If there isn't, file an Issue to add this new feature. Describe what you want to do in detail, and mention that you'd like to work on this. Give enough information for the project owner to understand what you plan on doing, and to give feedback about how they want it done.
Step 4. Fork, Clone, Branch
Fork the other student's project on GitHub, then clone your fork. Next, create a branch for your work. If you filed Issue #5, name your new branch issue-5:
git checkout -b issue-5
Do all of your changes (i.e., all of your commits) on this branch, not the main branch.
Step 5. Write the Code
First, read the existing code. Get a sense of how it's organized (files, classes, functions), and make sure you can run it. If the code is broken before you begin making changes, it will be hard to test your work. If you're unclear about how something is written, ask the owner for tips. Remember, open source is a team sport. You don't need to struggle on your own in silence. Use the community to discuss your work and get help.
You will need to make the following changes to your partner's repo:
- Determine which LLM provider(s) they currently support, and research how to get token usage information from the completion response. Different providers report it in slightly different ways.
- Find the code where your partner handles command-line flags and try to understand where you'll add --token-usage/-t
- Find the code where your partner parses the LLM response, and figure out where you'll add the token usage extraction
- Find the code where your partner outputs their response and diagnostic info, and figure out how you'll integrate the token usage
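Because providers differ, it can help to map each response shape onto one common form before printing. The sketch below is only an illustration: normalize_usage and TokenUsage are hypothetical names, the first branch matches the OpenAI-style response shown earlier, and the second branch assumes a provider that reports input_tokens and output_tokens instead. Check the documentation of whichever provider your partner's project actually uses before relying on any field name.

```python
from typing import Optional, TypedDict


class TokenUsage(TypedDict):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def normalize_usage(response: dict) -> Optional[TokenUsage]:
    """Map a provider response onto one common shape, or None if no usage is found."""
    usage = response.get("usage")
    if not isinstance(usage, dict):
        return None
    if "prompt_tokens" in usage:
        # OpenAI-style field names, as in the sample response.
        prompt = usage["prompt_tokens"]
        completion = usage.get("completion_tokens", 0)
    elif "input_tokens" in usage:
        # Assumed alternative naming; verify against the provider's docs.
        prompt = usage["input_tokens"]
        completion = usage.get("output_tokens", 0)
    else:
        return None
    # Some providers omit a total, so compute it when it is missing.
    total = usage.get("total_tokens", prompt + completion)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
    }
```

Returning None when no usage data is present lets the calling code decide whether to print nothing or a short notice, instead of reporting misleading zeros.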
Once you understand the existing code, start making changes in order to implement your feature. Make sure you write your code as closely to the style of the original author as possible. Make it look like the same person wrote all the code. Pay attention to how they name things, how they do formatting, where they put things, etc. You're not trying to rewrite their code in your style, but write new code in their style.
Try to change as little as possible in the existing code. Don't start rewriting everything because you like a different style. Don't touch code that is unrelated to your changes. Don't fix bugs unrelated to your current work (that should be done in another issue, pull request, branch). Be focused! Touch only the code you need to in order to make your changes work. Write as little code as possible, while still making sure the feature works. NOTE: if you do find other bugs while you are working, feel free to file additional issues in the other student's repo.
As you work, commit changes to your branch. For example, you might start by adding support for the --token-usage and -t flags. Once that's written, you should commit your code before proceeding further. Your commits should be small, and tell a story: "Add --token-usage and -t flags", "Update response parsing to extract token usage information", etc.
Make sure your changes don't break the original code. Test, test, test, and test again. When you are satisfied that things are working, proceed to step 6.
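One quick check is that the new flag changes only stderr and leaves the normal output untouched. The sketch below shows the idea with a hypothetical run function standing in for the tool's real entry point; in practice you would call the project's own code, or run the program from a shell and redirect stderr to a file.

```python
import contextlib
import io
import sys


def run(show_usage: bool) -> None:
    """Hypothetical stand-in for the tool's main routine."""
    print("completion text")  # normal output on stdout
    if show_usage:
        print("Token usage: prompt=57 completion=17 total=74", file=sys.stderr)


def capture(show_usage: bool) -> tuple[str, str]:
    """Run the tool and return what it wrote to (stdout, stderr)."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        run(show_usage)
    return out.getvalue(), err.getvalue()


out_off, err_off = capture(show_usage=False)
out_on, err_on = capture(show_usage=True)

assert out_on == out_off      # stdout is identical with or without the flag
assert err_off == ""          # nothing extra is reported without the flag
assert "total=74" in err_on   # usage details appear on stderr with the flag
```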
Step 6. Update the Docs or Other files
Because your code is adding a feature, it's likely that you need to update other non-code files as well. For example, the docs (README) will need to be updated with info about this new feature, including what the reported information means and how to interpret it. There could be other files that need to be updated as well.
Making changes to a project often involves updating code, tests, dependencies, etc. Make sure you look for all the places you need to update things. Include all of these related changes in your branch.
Step 7. Create a Pull Request
When you're finished Steps 1-6, create a Pull Request. Start by pushing your branch to your fork on GitHub (i.e. your origin). Assuming you were working on a branch called issue-5:
git push origin issue-5
Replace issue-5 with the actual branch name you are using.
Follow these steps to create a Pull Request from your branch. Pay attention to the following:
- Pick the correct branch in your repo (e.g., issue-5 for you and master or main for the original repo). You want your work to get merged into the original project's master or main branch eventually
- Write a complete title for your pull request. For example, "Add support for --token-usage/-t flag"
- Write a complete description of what you did, including info that this Fixes #5 (or whatever Issue number you are fixing). GitHub will automatically link an Issue and Pull Request for you if you use the correct syntax. In your description, talk about what you changed in the code, how you did it, explain why you made certain choices, and discuss any problems you encountered or bugs you know about. Make sure the project owner can understand why and what you want to change with your pull request. Be detailed!
Step 8. Get Feedback and Update your Pull Request
Find the original repo's owner on Slack, and politely ask them to review your Pull Request. It is almost guaranteed that they will ask you to make changes (NOTE: if you are reviewing someone else's changes to your repo, please ask them to change something so they can practice this part, even if it's small).
When you are asked to make changes, go back to your code and make sure you are on the same branch that you submitted. For example, git checkout issue-5 to get on the issue-5 branch.
Edit the code to address the reviewer's comments. Make sure you deal with all of them! When you're done, add another commit to this branch:
git checkout issue-5
git add file1
git commit -m "Updating x, y, and z based on review feedback"
git push origin issue-5
Again, change issue-5 to whatever branch you are working on. Once you've done this, go back to the Pull Request on GitHub and leave a comment telling the reviewer you have completed all their changes, and what you did to accomplish them.
Repeat this cycle as many times as necessary for the project owner to Approve your changes and merge your work.
NOTE: if you are merging another student's work into your main branch, make sure you pull these changes to your local machine afterward (assuming you are working on the main branch):
git checkout main
git pull origin main
This will bring all of the new code changes into the repo on your local machine so that you can build on top of them. If you forget to do this, the changes will be included in your repo on GitHub but not in your locally cloned repo.
Also, make sure the original Issue gets closed once the Pull Request is merged. It might have happened automatically, depending on whether or not the Pull Request description included the text Fixes #5 (or whatever the issue number is).
Step 9. Write a Blog Post
Write a blog post about the process of contributing a code change to another project. In your post, include links to everything you discuss (e.g., the project repo, your issue, your pull request). Discuss what you did, the changes you made for your feature, and the process of getting your work accepted. What problems did you have? What did you learn? What would you do differently next time?
If your repo received a pull request, please also talk about what it was like to get a submission. How much did you need the author to change? How did that process go?
Submission
When you have completed all the requirements above, please add your details to the table below.