Running local LLMs

LM Studio is an application that allows you to run a Large Language Model (LLM) locally. This has the benefit of not compromising any potential IP, but it requires careful selection of the right LLM for the purpose so that your desktop computer is not bogged down. AutoCoder is the best model for coding, but it is quite heavy. Hermes-3-Llama-3.1-8b is normally an acceptable compromise, unless you are writing code with an exotic Python package.

LM Studio can be downloaded at: https://lmstudio.ai/download. Follow the installation instructions on this website. On a university desktop PC, a request to IT is required. Once installed, you can follow the steps below to get up and running.

  • Step 1: Click the 'Select a model to load' button near the top of the window, or press Ctrl+L.

  • Step 2: Search for 'Hermes llama'. The search will show 'No matching results', which means that no such LLM has been downloaded yet. Click the 'Search more results' button.

  • Step 3: You should now be able to select the Hermes-3-Llama-3.1-8b model. Click the Download button in the bottom right corner and wait for the download to finish.

  • Step 4: Now click on 'Select a model to load' again; Hermes-3-Llama-3.1-8b should show up as an available model.

  • Step 5: The loaded model's name now appears at the top of the window. You can type in your questions/prompts, and it is possible to include a file with your question (a programmatic alternative is sketched after this list).

  • Step 6: At the top right there is a button with a flask icon. This opens the settings for the model, which are also accessible with Ctrl+B.

  • Step 7: You can change these settings as you go along and observe how the model responds. To understand them, some additional reading is recommended. Different LLMs have different settings, but most of them should allow you to change the number of CPU cores and the amount of RAM dedicated to the model.
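
If you would rather call the loaded model from a script than type into the chat window, LM Studio can also expose a local server that speaks the OpenAI-compatible chat completions format. Below is a minimal sketch in Python, assuming the server has been enabled in LM Studio and is listening on its default port (1234); the model identifier and the prompt are illustrative placeholders, so adjust them to match your setup.

```python
# Minimal sketch: querying a model loaded in LM Studio from Python.
# Assumes LM Studio's local server is running on its default port (1234)
# and that a model (e.g. Hermes-3-Llama-3.1-8b) is loaded.
import requests

response = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        # Model identifier as shown in LM Studio (placeholder; check yours).
        "model": "hermes-3-llama-3.1-8b",
        "messages": [
            {"role": "user",
             "content": "Write a Python function that reverses a string."}
        ],
        # Sampling settings like these mirror what the flask-icon panel
        # from Steps 6 and 7 lets you change in the GUI.
        "temperature": 0.7,
        "max_tokens": 512,
    },
    timeout=120,  # local models can be slow to respond
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Because everything stays on localhost, this keeps the IP benefit mentioned above: the prompt and the reply never leave your machine.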