Developing Locally, Training Remotely on Runpod - Nerogar/OneTrainer GitHub Wiki

Developing on clouds

Finetunes or much larger LoRAs can require more VRAM than you have on a consumer card, requiring you to rent a GPU. On the other hand, development on rented GPUs can get expensive, especially when you leave the GPU idle while you are editing Python files.

Set up a private fork of OneTrainer

Create an empty repository (not a fork) on GitHub. Run locally:

git clone --bare https://github.com/Nerogar/OneTrainer
cd OneTrainer.git
git push --mirror https://github.com/your-username/OneTrainer-private
cd ..
git clone https://github.com/your-username/OneTrainer-private
cd OneTrainer-private
git remote add upstream https://github.com/Nerogar/OneTrainer
git remote set-url --push upstream DISABLE
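
Because the upstream remote above is fetch-only, it can later be used to pull new OneTrainer commits into the private fork. The following is a minimal sketch, assuming the branch is named master and that it is run inside the OneTrainer-private clone. The helper name sync_upstream is hypothetical and not part of OneTrainer.

```shell
# sync_upstream [BRANCH]: merge new upstream commits into the private fork.
# Assumes the remotes "origin" (private repo) and "upstream" (public repo)
# are configured as in the commands above. BRANCH defaults to master.
sync_upstream() {
    branch="${1:-master}"
    git fetch upstream &&
    git merge --no-edit "upstream/$branch" &&
    git push origin "$branch"
}
```

Run sync_upstream with the branch checked out and a clean working tree. Pushing to upstream stays impossible because its push URL is set to DISABLE.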

Setting up the cloud

  • Start a cloud server
  • Log in to GitHub on the cloud server using apt-get update && apt-get install gh && gh auth login
  • Run OneTrainer locally and change the cloud settings:
  • Start Training
  • Stop the cloud on RunPod while you edit your Python files locally. A stopped cloud is cheap: only disk costs remain, roughly $0.03 per hour for 100 GB.
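
To get a feeling for what a stopped cloud costs, the disk rate can be multiplied out. This is a rough sketch that uses the approximate $0.03 per hour figure from above as the default. Actual prices vary, and stopped_cost is a hypothetical helper name.

```shell
# stopped_cost HOURS [RATE]: disk cost in dollars while the cloud is stopped.
# RATE defaults to 0.03 dollars per hour, the rough 100 GB figure from the text.
stopped_cost() {
    hours="$1"
    rate="${2:-0.03}"
    LC_ALL=C awk -v h="$hours" -v r="$rate" 'BEGIN { printf "%.2f\n", h * r }'
}

stopped_cost 8      # an 8-hour editing session: prints 0.24
stopped_cost 720    # left stopped for 30 days: prints 21.60
```

Short pauses are nearly free, but a cloud that stays stopped for weeks still adds up, which is why deleting it when you are done is worthwhile.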

Developing

  • After editing files, run git commit -a -m "update" && git push locally to push the changes to your private repo
  • Press "Start Training" again - the remote cloud fetches the changes and trains. "Update OneTrainer" must be enabled.
  • Stop the cloud
  • When you are done, delete the cloud to avoid the disk costs. Next time, repeat the GitHub login above.
  • When you are ready to push your changes to a public branch, avoid the long commit history created by the commit commands above. One method is to create a patch using git diff master > diff-file in your private repo, and apply it with git apply diff-file in your public repo branch.
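
The patch method from the last step can be wrapped in two small helpers. This is a sketch under assumptions: export_patch and import_patch are hypothetical names, and the base ref is a parameter because git diff master only shows your changes if master still points at the unmodified upstream state. If you committed directly to master in the private repo, upstream/master may be the better base.

```shell
# export_patch BASE FILE: write every change between BASE and the current
# working tree (committed or not, tracked files only) into one patch file.
export_patch() {
    git diff --binary "$1" > "$2"
}

# import_patch FILE: verify that the patch applies cleanly, then apply it.
# The changes land in the working tree uncommitted, ready for one clean commit.
import_patch() {
    git apply --check "$1" && git apply "$1"
}

# In the private repo:        export_patch master /tmp/diff-file
# In the public repo branch:  import_patch /tmp/diff-file
```

Writing the patch file outside of both repositories, as in the commented usage lines, keeps it from showing up as an untracked file.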

Limitations

  • If you need pdb breakpoint()s, copy the training command displayed after "Start Training" in the local console, and run it on the cloud console.
  • When you try to resume a stopped RunPod, it is possible that all GPUs on the machine that holds your storage are occupied. If this is a regular occurrence, you can avoid it by using a network storage volume. See the RunPod website.