Troubleshooting most common problems - lahoramaker/awesome-lmstudio GitHub Wiki

Welcome to this page, where you'll find answers to the most common issues when running LM Studio.

Failed to load model 'The Bloke • Model name qX_0 gguf'?

If you are getting this error, you are probably trying to load a model bigger than your computer can handle. Running Large Language Models smoothly usually requires a GPU with plenty of memory.

By default, LM Studio will try to load the model into your system's RAM. When initializing, it reserves the size of the model (in GB), plus some extra GB for the context (the amount of text the model takes into consideration when generating new words in the output).

As a rule of thumb, loading a model requires about 1.25 times its file size in RAM or in your graphics card's VRAM. Smaller quantizations (compressions) reduce the amount of memory required.
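The rule of thumb above is simple arithmetic; here is a minimal sketch (the 4 GB figure for a Q4-quantized 7B model is only a rough illustrative assumption, not an official number):

```python
def estimate_ram_gb(model_file_size_gb: float, overhead_factor: float = 1.25) -> float:
    """Rough RAM/VRAM needed to load a model, per the 1.25x rule of thumb."""
    return model_file_size_gb * overhead_factor

# Example: a 7B model quantized to ~4 bits is roughly 4 GB on disk,
# so it needs about 5 GB of free RAM (or VRAM) to load.
print(estimate_ram_gb(4.0))  # 5.0
```

If the result exceeds your free memory, pick a smaller quantization of the same model.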

If you are getting errors with a larger quantization (Q8_0, Q6_K, or Q5_K_M), try downloading one of the smaller quantized variants (like Q4_K_S, Q3_K_M, etc.).


The performance I am getting is really slow. Is there any way to accelerate the performance?

By default, LM Studio uses your CPU to produce chat responses. To enable GPU acceleration and make generation much faster, follow these steps:

  • Open the chat window (third icon on the left)
  • Expand the Settings panel on the right hand side of the chat window
  • Expand Model Configuration carousel
  • Expand the Hardware Setting at the bottom
  • Mark the "GPU Acceleration" checkbox and set the appropriate number of layers.
  • Press Tab or click elsewhere in the window to save the settings. A new button should appear at the bottom of the chat area.
  • Click the new button to reload the model.
  • Enjoy the new speed

How many layers should I configure for GPU Acceleration?

The recommended way to find the appropriate number of layers is to start with a small number, like 5, and increase it slowly from there. This approach will help you accelerate any model, even one that can't be loaded fully into GPU memory.

The brute-force approach is to set the number of layers to 1000 and try to load the model. If it fails, reduce the number until it stops crashing. The main drawback of this approach is that, if the model is large, it can take a long time to find the sweet spot where it no longer crashes.
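The two strategies above can be combined into a grow-then-bisect search. The sketch below is only an illustration of that search logic: `can_load` is a hypothetical stand-in for "try to load the model with n layers offloaded to the GPU", since LM Studio itself is configured through its UI, not a scripting API.

```python
def find_max_layers(can_load, start=5, cap=1000):
    """Find the largest GPU layer count that still loads successfully.

    can_load(n) is a hypothetical placeholder: it attempts a load with
    n layers offloaded to the GPU and returns True on success.
    """
    if not can_load(start):
        return 0  # even the small starting guess fails; stay on CPU
    good = start
    # Grow quickly until loading fails (or we hit the cap).
    while good < cap and can_load(min(good * 2, cap)):
        good = min(good * 2, cap)
    if good == cap:
        return cap  # the whole model fits on the GPU
    # Bisect between the last success and the first failure.
    bad = min(good * 2, cap)
    while bad - good > 1:
        mid = (good + bad) // 2
        if can_load(mid):
            good = mid
        else:
            bad = mid
    return good
```

For example, if a model crashes whenever more than 33 layers are offloaded, `find_max_layers(lambda n: n <= 33)` settles on 33 after a handful of load attempts instead of counting down one by one from 1000.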

My Windows system has all minimum requirements but I'm still unable to load models

LM Studio requires a working Visual C++ Redistributable Package to operate. It is usually already present from previous software installations, but your installed version might not be up to date with the latest release. You can install the latest version from Microsoft Learn.

How do I set up a preset config for each of the models? I have heard about the Alpaca format, ChatML, etc.

LM Studio 0.2.8 adds auto-configuration for all models. You can also set it up manually in the Settings panel of the chat window.