1.2. Required Reading
Prompt Engineering
We'll need to get up to speed with best practices for writing prompts that produce the intended responses from large language models (LLMs). The best resource for this is the Awesome GPT Prompt Engineering repo. It's a comprehensive collection of the best prompt engineering guides and information, curated by the A.I. enthusiast community. Read it.
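To make that concrete, here is a minimal sketch of one common pattern those guides teach: keep the model's role, the task, the constraints, and the expected output format in clearly separated sections so the model has less room to guess our intent. The structure and field names below are my own illustration, not taken from the repo.

```python
# Sketch of a structured prompt builder. The role/task/constraints/format
# split is one widely used prompt-engineering pattern; the exact wording
# here is an assumption for illustration.

def build_prompt(role: str, task: str, constraints: list[str], output_format: str) -> str:
    """Assemble a structured prompt from its labeled parts."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"You are {role}.\n\n"
        f"Task: {task}\n\n"
        f"Constraints:\n{constraint_lines}\n\n"
        f"Respond as: {output_format}"
    )

print(build_prompt(
    role="a concise technical writer",
    task="Summarize the attached changelog for end users.",
    constraints=["Plain language, no jargon", "Under 100 words"],
    output_format="a single bullet list",
))
```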
Hardware Requirements
In order to run LLMs locally, hosted on our own machine/server, we'll need to understand the hardware requirements, costs, and the best hardware options available within our budget. The best resource for this is Tim Dettmers' comprehensive write-up, The Best GPUs for Deep Learning in 2023. Read it. If you can't be bothered to read the whole thing, refer to his charts: Best GPUs by Relative Performance, Best GPUs by Relative Performance Per Dollar, and the GPU Recommendation Chart.
Tim's post specifically covers loading LLMs into GPU memory (VRAM). He does not get into loading LLMs into unified/shared memory, as on Apple Silicon MacBook Pros. As of this writing, the only ways to load an LLM and get fast responses are to use a GPU, or to use certain MacBook models with large amounts of unified memory. While we can load LLMs into system memory (RAM) and run them on the system's CPU, it'll be incredibly slow compared to the other two methods.
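As a rough back-of-envelope check before buying hardware, the memory needed just to hold a model's weights is approximately parameter count times bytes per parameter. The sketch below is my own estimate, not from Tim's post; the ~20% overhead factor for KV cache and activations is an assumption, and real usage varies by runtime, quantization scheme, and context length.

```python
# Rough VRAM/RAM estimate for loading an LLM's weights.
# Assumption: total memory ~= weights * (1 + overhead), where overhead
# (default 20%) loosely covers KV cache and activations.

def estimate_vram_gb(params_billion: float, bytes_per_param: float, overhead: float = 0.2) -> float:
    """Approximate memory needed to load a model, in gigabytes."""
    weight_gb = params_billion * bytes_per_param  # 1B params at 1 byte/param ~= 1 GB
    return weight_gb * (1 + overhead)

if __name__ == "__main__":
    # A 7B-parameter model at fp16 (2 bytes/param) vs. 4-bit quantized (~0.5 bytes/param).
    print(f"7B fp16 : ~{estimate_vram_gb(7, 2.0):.1f} GB")   # ~16.8 GB
    print(f"7B 4-bit: ~{estimate_vram_gb(7, 0.5):.1f} GB")   # ~4.2 GB
```

Numbers like these explain why quantized models are popular for local hosting: a 4-bit 7B model fits comfortably in a consumer GPU's VRAM, while the fp16 version of the same model may not.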