1.3.0. The Stack - SamuraiBarbi/jttw-ai GitHub Wiki
This installation of components is being performed on a Linux system. I can't guarantee that installation of the components and their interactions will work as expected outside of a Linux environment, as I've not tested other environments. These are my system specs, confirmed via neofetch.
Property | Value |
---|---|
OS | Linux Mint 21.2 x86_64 |
Host | MS-7B93 1.0 |
Kernel | 5.15.0-91-generic |
Terminal | gnome-terminal |
CPU | AMD Ryzen 9 3950X (32) @ 3.500GHz |
GPU | NVIDIA GeForce RTX 3090 |
GPU | NVIDIA GeForce GTX 1080 Ti |
Memory | 13740MiB / 128731MiB |
Preparation
Let's complete our jttw workspace and environment setup, and then we'll move on to installing the components.
sudo apt-get install nala
sudo nala install python3.10-venv
sudo nala install python3-dev
sudo mkdir -p /opt/LLM/
sudo chown -R $(whoami):$(id -gn) /opt/LLM/
# Create the directory for our test files
mkdir -p /opt/LLM/jttw/tests
# Create the directory for our components
mkdir -p /opt/LLM/jttw/components/open-webui/
mkdir -p /opt/LLM/jttw/components/litellm/
# Create our LiteLLM models list config
touch /opt/LLM/jttw/components/litellm/llm_servers.yaml
# Create the directory for our JTTW python environment
mkdir -p /opt/LLM/jttw/jttw_venv
# Create our JTTW python environment
python3 -m venv /opt/LLM/jttw/jttw_venv
# Activate into our JTTW python environment, install/update pip, and clear python package cache
source /opt/LLM/jttw/jttw_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
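As a quick sanity check of the activate-and-verify pattern above, we can confirm a venv is actually active by inspecting $VIRTUAL_ENV and which python3 resolves first. This is a minimal sketch using a throwaway path; in practice substitute /opt/LLM/jttw/jttw_venv.

```shell
# Create and activate a throwaway venv to illustrate the pattern
venv_dir="$(mktemp -d)/demo_venv"
python3 -m venv "$venv_dir"
source "$venv_dir/bin/activate"
# The activate script sets $VIRTUAL_ENV to the venv root
[ "$VIRTUAL_ENV" = "$venv_dir" ] && echo "venv active"
# python3 now resolves to the venv's own interpreter, not the system one
case "$(command -v python3)" in "$venv_dir"/*) echo "venv python3 first on PATH";; esac
```

If either check fails after activating /opt/LLM/jttw/jttw_venv, later pip installs would land in the system Python instead of the JTTW environment.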
Ollama
- What is Ollama?
- Ollama is a local large language model server.
- Installation, Running, Downloading Models, Testing
- Documentation
- Repository
- Discord
This implementation of Ollama is confirmed working with Ollama version 0.1.18. Results beyond version 0.1.18 of Ollama may vary. Confirmed version via ollama --version
Name | ollama |
---|---|
Version | 0.1.18 |
Install Ollama
While you can run Ollama as a Docker container, I've found that the Docker version of Ollama does not work with LiteLLM, since LiteLLM automatically serves Ollama rather than presenting it. After determining that this incompatibility was an issue between the Docker version of Ollama and LiteLLM, I decided to use the sh install method for Ollama instead. Your mileage may vary if you instead decide to use the Ollama Docker image together with the Ollama-compatible LiteLLM Docker image.
Installing Ollama automatically installs it as a service. We're including OLLAMA_HOST=0.0.0.0:11434 to explicitly force Ollama to listen on port 11434. We also make sure to kill any existing processes running on port 11434 - the default port for Ollama - to ensure that Ollama launches on the expected port.
# Kill processes running on port number 11434
lsof -ti :11434 | xargs -r kill
# Stop, disable, and delete any existing Ollama service
sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm /etc/systemd/system/ollama.service
# Delete any existing Ollama files, any existing Ollama user, and any existing Ollama group
sudo rm $(which ollama)
sudo rm -r /usr/share/ollama
sudo userdel -r ollama
sudo groupdel ollama
# Install Ollama
curl https://ollama.ai/install.sh | sh
# Stop and update Ollama service to run Ollama on port 11434
sudo systemctl stop ollama
sudo sed -i '/^Environment=/ {/OLLAMA_HOST=0\.0\.0\.0:11434/! s/$/ OLLAMA_HOST=0.0.0.0:11434/}' /etc/systemd/system/ollama.service
sudo sed -i '/^Environment=/ {/OLLAMA_FLASH_ATTENTION=1/! s/$/ OLLAMA_FLASH_ATTENTION=1/}' /etc/systemd/system/ollama.service
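The two sed commands above append a variable to any existing Environment= line only when it isn't already present, which makes them safe to re-run. A minimal sketch of the idiom against a throwaway service file (the file and its contents here are made up for illustration):

```shell
# Build a throwaway service file with a bare Environment= line
svc="$(mktemp)"
printf '[Service]\nEnvironment="PATH=/usr/bin"\n' > "$svc"
# Append OLLAMA_HOST only to Environment= lines that don't already contain it
sed -i '/^Environment=/ {/OLLAMA_HOST=0\.0\.0\.0:11434/! s/$/ OLLAMA_HOST=0.0.0.0:11434/}' "$svc"
# Running the same command again is a no-op, so the variable is never duplicated
sed -i '/^Environment=/ {/OLLAMA_HOST=0\.0\.0\.0:11434/! s/$/ OLLAMA_HOST=0.0.0.0:11434/}' "$svc"
grep -c 'OLLAMA_HOST' "$svc"   # prints 1 - appended exactly once
```

The outer address matches Environment= lines; the inner negated address skips lines that already carry the variable, so only the remaining ones get the suffix appended.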
Next we're going to move the /usr/share/ollama directory to /opt/LLM/jttw/components/ollama, but first we'll need to elevate our user privileges in order to do so.
# Elevate privileges
sudo su
Now we'll move the /usr/share/ollama directory to /opt/LLM/jttw/components/ollama and create a symbolic link at /usr/share/ollama pointing to /opt/LLM/jttw/components/ollama.
mv /usr/share/ollama /opt/LLM/jttw/components/ollama
ln -s /opt/LLM/jttw/components/ollama /usr/share/ollama
We no longer need elevated privileges, so we'll exit the root shell.
exit
We're all set to reload the service with the changes we've made.
# Reload system services, enable and start Ollama service
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
Running Ollama Manually
Installing Ollama automatically installs it as a service that starts when the machine boots up. If for some reason the service is disabled or stopped and we need to run Ollama manually then we'd proceed as follows.
We're going to include OLLAMA_HOST=0.0.0.0:11434
to explicitly force Ollama to run on 11434.
# Launch Ollama server on port 11434
OLLAMA_FLASH_ATTENTION=1 OLLAMA_HOST=0.0.0.0:11434 ollama serve
Download Ollama Models
We'll need to download the LLMs now. There are various good models to choose from, documented at https://ollama.ai/library. For the purposes of this project we'll be using a number of different models for different purposes.
Models Trained for Chatting/General Assistance
# Download chat/general assistance trained language models for Ollama
ollama pull openhermes2.5-mistral
ollama pull dolphin2.2-mistral:7b-q6_K
ollama pull dolphin-mixtral
ollama pull eas/nous-capybara
ollama pull samantha-mistral
ollama pull gemma
ollama pull command-r
ollama pull mixtral
ollama pull mistral
ollama pull openchat
ollama pull qwen
ollama pull falcon
ollama pull wizard-vicuna-uncensored
ollama pull nous-hermes2-mixtral
ollama pull wizardlm-uncensored
ollama pull everythinglm
ollama pull yarn-mistral
ollama pull stablelm-zephyr
ollama pull deepseek-llm
ollama pull nexusraven
ollama pull alfred
ollama pull xwinlm
ollama pull wizardlm2:7b-q6_K
Models Trained for Coding/Programming
# Download coding/programming trained language models for Ollama
ollama pull codellama
ollama pull codeup
ollama pull deepseek-coder
ollama pull magicoder
ollama pull open-orca-platypus2
ollama pull phind-codellama
ollama pull starcoder
ollama pull wizardcoder
ollama pull codegemma
ollama pull nous-hermes2
ollama pull starcoder2
ollama pull dolphincoder
ollama pull codebooga
ollama pull codeqwen
Models Trained for SQL/Database Queries
# Download sql/database query trained language models for Ollama
ollama pull sqlcoder
ollama pull duckdb-nsql
Models Trained for Math/Calculations
# Download math/calculations trained language models for Ollama
ollama pull wizard-math
Models Trained for Image Analysis
# Download image analysis trained language models for Ollama
ollama pull llava
ollama pull bakllava
Models Trained for Medical Tasks
ollama pull medllama2
ollama pull meditron
Test Ollama
Now that we have Ollama running and serving at least one model, we need to test to make sure it's working properly.
To test - in a new tab we're going to send a curl request to the Ollama server, making sure to use one of the models we've downloaded in the "model": portion of the request. Since I've downloaded openhermes2.5-mistral, that is what I'm going to specify. It may take a moment, but we should see activity in the tab where Ollama is running.
# Send test prompt via curl to Ollama endpoint serving openhermes2.5-mistral model on port 11434
curl -X POST -H "Content-Type: application/json" -d '{"model": "openhermes2.5-mistral", "prompt": "Why is the sky blue?"}' http://localhost:11434/api/generate
This test sent a prompt request asking "Why is the sky blue?" to our Ollama server. Momentarily we should receive a response back explaining scientifically why the sky appears blue to us.
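The /api/generate endpoint streams its answer as one JSON object per line, each carrying a "response" fragment that concatenates into the full text. A minimal sketch of stitching the fragments together, using a canned two-line sample in place of a live server (the sample content is made up; real objects carry more fields):

```shell
# Two sample stream lines of the shape Ollama emits (abbreviated)
sample='{"model":"openhermes2.5-mistral","response":"The sky ","done":false}
{"model":"openhermes2.5-mistral","response":"is blue.","done":true}'
# Pull out each "response" fragment and join them into one string
answer="$(printf '%s\n' "$sample" | sed -n 's/.*"response":"\([^"]*\)".*/\1/p' | tr -d '\n')"
echo "$answer"   # prints: The sky is blue.
```

For real use, piping the curl output through jq would be more robust than sed, since sed can't handle escaped quotes inside the fragments.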
Open WebUI
- What is Open WebUI?
- Open WebUI is a ChatGPT-Style Web Interface for Ollama.
- Installation, Running, Testing
- Documentation
- Repository
- Discord
Install Open Web-UI
git clone https://github.com/open-webui/open-webui.git /opt/LLM/jttw/components/open-webui
mkdir -p /opt/LLM/jttw/components/open-webui/open-webui_venv
cd /opt/LLM/jttw/components/open-webui/
python3 -m venv /opt/LLM/jttw/components/open-webui/open-webui_venv
source /opt/LLM/jttw/components/open-webui/open-webui_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
cp -RPp .env.example .env
# Install NodeJS
sudo su
curl -fsSL https://deb.nodesource.com/setup_21.x | bash - && apt-get install -y nodejs
npm i
npm run build
exit
# Serving Frontend with the Backend
cd /opt/LLM/jttw/components/open-webui/backend
pip3 install -r requirements.txt -U
# Change the launch port from default 8080 to 11435
sed -i 's/PORT:-8080/PORT:-11435/g' /opt/LLM/jttw/components/open-webui/backend/start.sh
Running Open-WebUI as System Service
sudo nano /etc/systemd/system/open-webui.service
[Unit]
Description=Open WebUI Service
After=network.target
[Service]
Type=simple
User=owner
Group=owner
WorkingDirectory=/opt/LLM/jttw/components/open-webui/backend
ExecStart=/bin/bash -c ". /opt/LLM/jttw/components/open-webui/open-webui_venv/bin/activate && /bin/bash /opt/LLM/jttw/components/open-webui/backend/start.sh"
Restart=always
[Install]
WantedBy=default.target
sudo systemctl daemon-reload
sudo systemctl enable open-webui
sudo systemctl restart open-webui
Running Open-WebUI Manually
If for some reason the service is disabled or stopped and we need to run Open-WebUI manually then we'd proceed as follows.
source /opt/LLM/jttw/components/open-webui/open-webui_venv/bin/activate
cd /opt/LLM/jttw/components/open-webui/backend
sh /opt/LLM/jttw/components/open-webui/backend/start.sh
Test Open-WebUI
We should now be able to access our Open-WebUI server at http://0.0.0.0:11435 in any browser. We'll need to click the create account link when we first load it. The first account created is the administrator by default.
LiteLLM
- What is LiteLLM?
- LiteLLM is an OpenAI proxy server.
- Installation, Running, Testing
- Documentation
- Repository
- Discord
This implementation of LiteLLM -> Ollama is confirmed working with LiteLLM version 1.14.1. Results beyond version 1.14.1 of LiteLLM may vary. Confirmed version via pip3 show litellm
Name | litellm |
---|---|
Version | 1.14.1 |
Summary | Library to easily interface with LLM API providers |
Home-page | |
Author | BerriAI |
Author-email | |
License | MIT |
Location | /opt/LLM/jttw/jttw_venv/lib/python3.10/site-packages |
Requires | aiohttp, appdirs, certifi, click, importlib-metadata, jinja2, openai, python-dotenv, tiktoken, tokenizers |
Required-by |
Install LiteLLM
We're installing specifically version 1.14.1 of the LiteLLM python package because the latest version, 1.15.1, introduced a bug that broke Ollama requests. The fix should be out in the next release. Before launching LiteLLM I'm going to make sure to kill any existing processes running on port 8000 - the default port for LiteLLM. This ensures that LiteLLM launches on the expected port.
# Kill processes running on port number 8000
lsof -ti :8000 | xargs -r kill
# Activate into our LiteLLM python environment
mkdir -p /opt/LLM/jttw/components/litellm/litellm_venv
cd /opt/LLM/jttw/components/litellm/
python3 -m venv /opt/LLM/jttw/components/litellm/litellm_venv
source /opt/LLM/jttw/components/litellm/litellm_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
# Install LiteLLM python package and the required Async Generator python package
pip3 install 'litellm[proxy]'
pip3 install async_generator
Configure LiteLLM Servers
nano /opt/LLM/jttw/components/litellm/update_llm_servers.sh
In the /opt/LLM/jttw/components/litellm/update_llm_servers.sh
file we'll place the following contents and save the file
#!/bin/bash
{
echo "model_list:"
ollama list | awk 'NR>1 {sub(/:latest$/, "", $1); print "  - model_name:", $1, "\n    litellm_params:\n      model: ollama/"$1"\n      api_base: http://0.0.0.0:11434"}'
} > /opt/LLM/jttw/components/litellm/llm_servers.yaml
Now we need to make our /opt/LLM/jttw/components/litellm/update_llm_servers.sh
file executable by setting the appropriate chmod
chmod +x /opt/LLM/jttw/components/litellm/update_llm_servers.sh
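To preview what the script will generate without touching the real config, the same awk transformation can be fed a canned ollama list-style sample (a header row, then the model name in the first column; the names below are just examples from the pulls earlier):

```shell
# Canned `ollama list`-style output: header line, then NAME as the first column
sample='NAME                          ID      SIZE    MODIFIED
openhermes2.5-mistral:latest  abc123  4.1 GB  2 days ago
llava:latest                  def456  4.5 GB  3 days ago'
# Same transformation as update_llm_servers.sh: skip the header,
# strip :latest, and emit one model_list entry per model
yaml="$({
  echo "model_list:"
  printf '%s\n' "$sample" | awk 'NR>1 {sub(/:latest$/, "", $1); print "  - model_name:", $1, "\n    litellm_params:\n      model: ollama/"$1"\n      api_base: http://0.0.0.0:11434"}'
})"
printf '%s\n' "$yaml"
```

Each model becomes a model_list entry whose litellm_params.model is prefixed with ollama/, which is what makes the bare model name usable as an alias through the proxy.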
Running LiteLLM as System Service
We're automatically updating the models config file for LiteLLM any time the LiteLLM service starts; this way our list of available models is always up to date. We achieve this by echoing the formatted output from ollama list into our models config file /opt/LLM/jttw/components/litellm/llm_servers.yaml.
sudo nano /etc/systemd/system/litellm.service
[Unit]
Description=LiteLLM Service
After=network.target
[Service]
Type=simple
User=owner
Group=owner
WorkingDirectory=/opt/LLM/jttw/components/litellm/
ExecStart=/bin/bash -c ". /opt/LLM/jttw/components/litellm/update_llm_servers.sh && . /opt/LLM/jttw/components/litellm/litellm_venv/bin/activate && litellm --config /opt/LLM/jttw/components/litellm/llm_servers.yaml --host 0.0.0.0 --port 8000 --debug --add_function_to_prompt --drop_params"
Restart=always
[Install]
WantedBy=default.target
Now we just need to reload the system daemon, enable, and restart the LiteLLM service.
sudo systemctl daemon-reload
sudo systemctl enable litellm
sudo systemctl restart litellm
Running LiteLLM Manually
We're automatically updating the models config file for LiteLLM before launching LiteLLM; this way our list of available models is always up to date. We achieve this by echoing the formatted output from ollama list into our models config file /opt/LLM/jttw/components/litellm/llm_servers.yaml.
We are adding the --debug argument so that we can see detailed info about the requests LiteLLM receives and its responses back. We're also adding the --add_function_to_prompt and --drop_params arguments because MemGPT makes heavy use of function calling and params, which Ollama does not support; without these arguments, LiteLLM will produce errors indicating that Ollama doesn't support the function and param calls MemGPT is sending. All of our models are automatically aliased in /opt/LLM/jttw/components/litellm/llm_servers.yaml, allowing us to use the alias value when integrating with other software packages rather than the full --model value - for example, ollama/openhermes2.5-mistral is aliased as openhermes2.5-mistral, ollama/llava is aliased as llava, and so on.
# Activate into our LiteLLM python environment
source /opt/LLM/jttw/components/litellm/litellm_venv/bin/activate
# Update LiteLLM models config file and launch LiteLLM proxy server on port 8000, with Ollama serving models on port 11434
cd /opt/LLM/jttw/components/litellm
/opt/LLM/jttw/components/litellm/update_llm_servers.sh && litellm --config /opt/LLM/jttw/components/litellm/llm_servers.yaml --host 0.0.0.0 --port 8000 --debug --add_function_to_prompt --drop_params
Test LiteLLM
Now that we have LiteLLM running and serving as an OpenAI proxy for the Ollama endpoint, we need to test to make sure it's working properly.
To test - in a new tab we're going to send a curl request to the LiteLLM server, making sure to use one of the models we've downloaded in the "model": portion of the request. Since I've downloaded openhermes2.5-mistral, that is what I'm going to specify, though any model we've downloaded can be used. It may take a moment, but we should see activity in the tab where Ollama is running as well as the tab where LiteLLM is running.
# Send test prompt via curl to LiteLLM proxy server on port 8000 using Ollama serving the model openhermes2.5-mistral
curl --location 'http://0.0.0.0:8000/chat/completions' --header 'Content-Type: application/json' --data '{"model": "openhermes2.5-mistral", "messages": [{"role": "user", "content": "why is the sky blue?"}]}'
This test sent a prompt request asking "Why is the sky blue?" to our LiteLLM server. Momentarily we should receive a response back explaining scientifically why the sky appears blue to us.
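Unlike the raw Ollama endpoint, LiteLLM answers in the OpenAI chat-completions shape, with the text under choices[].message.content. A minimal sketch of pulling the content out of a canned, abbreviated response body (the field values here are made up; real responses carry more fields):

```shell
# Abbreviated OpenAI-style response of the kind the LiteLLM proxy returns
reply='{"id":"chatcmpl-123","choices":[{"message":{"role":"assistant","content":"Rayleigh scattering."}}],"model":"openhermes2.5-mistral"}'
# Crude extraction of the content field; fine for a smoke test,
# but use jq for real parsing since sed breaks on escaped quotes
content="$(printf '%s' "$reply" | sed -n 's/.*"content":"\([^"]*\)".*/\1/p')"
echo "$content"   # prints: Rayleigh scattering.
```

A non-empty content string confirms the request made the full round trip through LiteLLM to Ollama and back.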
Live Portrait
https://github.com/KwaiVGI/LivePortrait
Install Live Portrait
sudo nala install v4l2loopback-dkms
sudo modprobe v4l2loopback
git clone https://github.com/KwaiVGI/LivePortrait.git /opt/LLM/jttw/components/live-portrait
cd /opt/LLM/jttw/components/live-portrait/
conda create -p /opt/LLM/jttw/components/live-portrait/live-portrait_venv python=3.10
conda activate /opt/LLM/jttw/components/live-portrait/live-portrait_venv
conda install -p /opt/LLM/jttw/components/live-portrait/live-portrait_venv cudatoolkit cudnn
pip3 cache purge
pip3 install -r requirements.txt
Running Live Portrait
conda activate /opt/LLM/jttw/components/live-portrait/live-portrait_venv
cd /opt/LLM/jttw/components/live-portrait
python3 app.py
Once it's done you can access the interface via http://127.0.0.1:8890 by default.
Webcam Live Portrait
https://github.com/Mrkomiljon/Webcam_Live_Portrait
Install Webcam Live Portrait
sudo nala install v4l2loopback-dkms
sudo modprobe v4l2loopback
git clone https://github.com/Mrkomiljon/Webcam_Live_Portrait.git /opt/LLM/jttw/components/webcam-live-portrait
cd /opt/LLM/jttw/components/webcam-live-portrait/
conda create -p /opt/LLM/jttw/components/webcam-live-portrait/webcam-live-portrait_venv python=3.10
conda activate /opt/LLM/jttw/components/webcam-live-portrait/webcam-live-portrait_venv
conda install -p /opt/LLM/jttw/components/webcam-live-portrait/webcam-live-portrait_venv cudatoolkit cudnn
pip3 cache purge
pip3 install -r requirements.txt
Running Webcam Live Portrait
conda activate /opt/LLM/jttw/components/webcam-live-portrait/webcam-live-portrait_venv
cd /opt/LLM/jttw/components/webcam-live-portrait
python3 inference.py -s assets/examples/source/s9.jpg
Stream Diffusion
https://github.com/cumulo-autumn/StreamDiffusion
Install Stream Diffusion
sudo nala install v4l2loopback-dkms
sudo modprobe v4l2loopback
# Download the Stream Diffusion repository
git clone https://github.com/cumulo-autumn/StreamDiffusion.git /opt/LLM/jttw/components/stream-diffusion
cd /opt/LLM/jttw/components/stream-diffusion
conda create -p /opt/LLM/jttw/components/stream-diffusion/stream-diffusion_venv python=3.10
conda activate /opt/LLM/jttw/components/stream-diffusion/stream-diffusion_venv
conda install -p /opt/LLM/jttw/components/stream-diffusion/stream-diffusion_venv pytorch torchvision torchaudio pytorch-cuda -c pytorch -c nvidia
pip3 cache purge
pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu121
pip3 install git+https://github.com/cumulo-autumn/StreamDiffusion.git@main#egg=streamdiffusion[tensorrt]
python3 -m streamdiffusion.tools.install-tensorrt
Running Stream Diffusion Real-Time Img2Img
conda activate /opt/LLM/jttw/components/stream-diffusion/stream-diffusion_venv
cd /opt/LLM/jttw/components/stream-diffusion/demo/realtime-img2img
pip3 cache purge
pip3 install -r requirements.txt
chmod +x start.sh
./start.sh
This will take a while to complete because it's building a lot of files. Once it's done you can access the interface via http://127.0.0.1:7860 by default.
Running Stream Diffusion Real-Time Txt2Img
conda activate /opt/LLM/jttw/components/stream-diffusion/stream-diffusion_venv
cd /opt/LLM/jttw/components/stream-diffusion/demo/realtime-txt2img
pip3 cache purge
pip3 install -r requirements.txt
chmod +x start.sh
./start.sh
Once it's done you can access the interface via http://127.0.0.1:9090/ by default.
Fooocus
Install Fooocus
# Download the Fooocus repository
git clone https://github.com/lllyasviel/Fooocus.git /opt/LLM/jttw/components/fooocus
# Activate into our Fooocus python environment
mkdir -p /opt/LLM/jttw/components/fooocus/fooocus_venv
cd /opt/LLM/jttw/components/fooocus/
python3 -m venv /opt/LLM/jttw/components/fooocus/fooocus_venv
source /opt/LLM/jttw/components/fooocus/fooocus_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
pip3 install -r requirements_versions.txt
Because we want all of our generative outputs in one easy place ( /opt/LLM/jttw/output/ ), we're going to move the Fooocus output folder ( /opt/LLM/jttw/components/fooocus/outputs ) to the new location ( /opt/LLM/jttw/output/fooocus ), and then create a symbolic link in the old location pointing to the new location.
mkdir -p /opt/LLM/jttw/components/fooocus/outputs
mkdir -p /opt/LLM/jttw/output
mv /opt/LLM/jttw/components/fooocus/outputs /opt/LLM/jttw/output/fooocus
ln -s /opt/LLM/jttw/output/fooocus /opt/LLM/jttw/components/fooocus/outputs
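The move-then-link pattern used here generalizes to any component. A throwaway sketch (temp paths standing in for the real Fooocus tree) showing that writes through the old path land in the new central location:

```shell
# Throwaway stand-ins for a component's outputs dir and the central output dir
base="$(mktemp -d)"
mkdir -p "$base/component/outputs" "$base/output"
# Move the folder to the central location, then link the old path to it
mv "$base/component/outputs" "$base/output/component"
ln -s "$base/output/component" "$base/component/outputs"
# A file written via the old path appears in the new location
echo hello > "$base/component/outputs/test.txt"
cat "$base/output/component/test.txt"   # prints: hello
```

Note the mv destination must not already exist as a directory, otherwise mv would nest the folder inside it instead of renaming it.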
Running Fooocus as System Service
sudo nano /etc/systemd/system/fooocus.service
[Unit]
Description=Fooocus Service
After=network-online.target
[Service]
Type=simple
User=owner
Group=owner
WorkingDirectory=/opt/LLM/jttw/components/fooocus
ExecStart=/usr/bin/bash -c ". /opt/LLM/jttw/components/fooocus/fooocus_venv/bin/activate && python3 entry_with_update.py --port 8888"
Restart=always
RestartSec=3
[Install]
WantedBy=default.target
sudo systemctl daemon-reload
sudo systemctl enable fooocus
sudo systemctl restart fooocus
Running Fooocus Manually
cd /opt/LLM/jttw/components/fooocus
source /opt/LLM/jttw/components/fooocus/fooocus_venv/bin/activate
python3 entry_with_update.py --port 8888
Stable Diffusion WebUI
Install Stable Diffusion WebUI
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git /opt/LLM/jttw/components/stable-diffusion-webui
mkdir -p /opt/LLM/jttw/components/stable-diffusion-webui
mkdir -p /opt/LLM/jttw/components/stable-diffusion-webui/stable-diffusion-webui_venv
cd /opt/LLM/jttw/components/stable-diffusion-webui
python3 -m venv /opt/LLM/jttw/components/stable-diffusion-webui/stable-diffusion-webui_venv
source /opt/LLM/jttw/components/stable-diffusion-webui/stable-diffusion-webui_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
sudo apt-get install libgoogle-perftools4 libtcmalloc-minimal4 -y
bash webui.sh --server-name 0.0.0.0 --port 7860 --device-id 0 --api --listen --xformers --enable-insecure-extension-access
Because we want all of our generative outputs in one easy place ( /opt/LLM/jttw/output/ ), we're going to move the Stable Diffusion WebUI output folder ( /opt/LLM/jttw/components/stable-diffusion-webui/output ) to the new location ( /opt/LLM/jttw/output/stable-diffusion-webui ), and then create a symbolic link in the old location pointing to the new location.
mkdir -p /opt/LLM/jttw/output
mv /opt/LLM/jttw/components/stable-diffusion-webui/output /opt/LLM/jttw/output/stable-diffusion-webui
ln -s /opt/LLM/jttw/output/stable-diffusion-webui /opt/LLM/jttw/components/stable-diffusion-webui/output
See the Stable Diffusion WebUI CLI documentation for more information about the CLI arguments/options.
Configuring Stable Diffusion WebUI
Notes
For best results:
- SD 1.5 models: keep CFG Scale at or below x, using X Sampling Method
- SDXL Turbo models: keep CFG Scale at or below 2, using either Euler A or LCM Sampling Method; about 25 sampling steps or higher results in higher detail
- SDXL models: keep CFG Scale at or below 4, using either Euler A or LCM Sampling Method; about 25 sampling steps or higher results in higher detail
Extensions
- Stable-Diffusion-Webui-Civitai-Helper
- sd-perturbed-attention
- ultimate-upscale-for-automatic1111
- OneButtonPrompt
- sd-webui-infinite-image-browsing
- sd-webui-regional-prompter
- adetailer
- multidiffusion-upscaler-for-automatic1111
- sd-webui-reactor
- infinite-zoom-automatic1111-webui
- sd-webui-deforum
- sd-webui-controlnet
Models
- Juggernaut X
- DreamShaper XL
- ThinkDiffusionXL
- Starlight XL 星光 Animated
- ZavyChromaXL
- SDXL_Niji_Special Edition
- Crystal Clear XL
- epiCPhotoGasm
- AniMerge
- epiCRealism
- AbsoluteReality
- Realistic Freedom - SFW and NSFW
- AniVerse
- RealCartoon-Pixar
- ColorfulXL
Running Stable Diffusion WebUI as System Service
sudo nano /etc/systemd/system/stable-diffusion-webui.service
[Unit]
Description=Stable Diffusion WebUI Service
After=network-online.target
[Service]
Type=simple
User=owner
Group=owner
WorkingDirectory=/opt/LLM/jttw/components/stable-diffusion-webui
ExecStart=/usr/bin/bash -c ". /opt/LLM/jttw/components/stable-diffusion-webui/stable-diffusion-webui_venv/bin/activate && /opt/LLM/jttw/components/stable-diffusion-webui/webui.sh --server-name 0.0.0.0 --device-id 0 --api --listen --xformers --enable-insecure-extension-access"
Restart=always
RestartSec=3
[Install]
WantedBy=default.target
sudo systemctl daemon-reload
sudo systemctl enable stable-diffusion-webui
sudo systemctl restart stable-diffusion-webui
Guidance
- What is Guidance?
- Guidance is a large language model prompt templating language that enforces consistent, predictably formatted responses.
- Installation, Testing
- Documentation
- Example 1, Example 2, Example 3, Example 4
- Repository
This implementation of Guidance -> LiteLLM -> Ollama is confirmed working with Guidance version 0.1.10. Results beyond version 0.1.10 of Guidance may vary. Confirmed version via pip3 show guidance
Name | guidance |
---|---|
Version | 0.1.10 |
Summary | A guidance language for controlling large language models. |
Home-page | https://github.com/guidance-ai/guidance |
Author | Scott Lundberg and Marco Tulio Ribeiro |
Author-email | [email protected] |
License | |
Location | /opt/LLM/jttw/jttw_venv/lib/python3.10/site-packages |
Requires | aiohttp, diskcache, gptcache, msal, numpy, openai, ordered-set, platformdirs, pyformlang, requests, tiktoken |
Required-by |
Install Guidance
# Activate into our Guidance python environment
mkdir -p /opt/LLM/jttw/components/guidance/guidance_venv
cd /opt/LLM/jttw/components/guidance/
python3 -m venv /opt/LLM/jttw/components/guidance/guidance_venv
source /opt/LLM/jttw/components/guidance/guidance_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
# Install Guidance python package ( this requires that the litellm[proxy] version 1.14.1 package is already installed )
pip3 install guidance==0.1.10
pip3 install 'litellm[proxy]'
Test Guidance
nano /opt/LLM/jttw/tests/guidance_litellm_test_001.py
Now that we have Guidance in place, we need to test to make sure it's working properly. We'll create a python script in our /opt/LLM/jttw/tests/ directory and name the file guidance_litellm_test_001.py. Fun fact: the LiteLLM model is not called in Guidance the way other models traditionally are (for example, models.OpenAI), but rather by one of three methods detailed in the Guidance _lite_llm.py class in their repo - LiteLLMChat, LiteLLMInstruct, or LiteLLMCompletion. For this test we're going to use LiteLLMCompletion. The contents of /opt/LLM/jttw/tests/guidance_litellm_test_001.py will be the following
from guidance import models, gen
# Create our test prompt
prompt = """
why is the sky blue?
"""
# Create our configuration for the LiteLLM endpoint. API Key is required but the value can be anything.
# Guidance explicitly requires the exact model that was supplied in the LiteLLM --model argument, you cannot use the LiteLLM --alias value
model_endpoint = models.LiteLLMCompletion(
"ollama/openhermes2.5-mistral",
temperature=0.8,
api_base="http://0.0.0.0:8000"
)
# Initiate our model endpoint, append our test prompt, and generate a response
lm = model_endpoint
lm += prompt
lm += gen()
# Convert the response to a string value and print it so we can read it
print(str(lm))
Next we'll activate our Guidance python environment and run the python test script. If everything is working properly we should see output from our LLM, served from Guidance through LiteLLM to Ollama.
# Activate into our Guidance python environment
source /opt/LLM/jttw/components/guidance/guidance_venv/bin/activate
# Run our Guidance-to-LiteLLM test script
python3 /opt/LLM/jttw/tests/guidance_litellm_test_001.py
This test sent a prompt request asking "Why is the sky blue?" via Guidance to our LiteLLM server. Momentarily we should receive a response back explaining scientifically why the sky appears blue to us.
MemGPT
- What is MemGPT?
- MemGPT is a memory manager for LLMs that facilitates recalling information well beyond typical context length limits.
- Installation, Configuring, Testing
- Documentation
- Repository
- Discord
This implementation of MemGPT -> LiteLLM -> Ollama is confirmed working with MemGPT version 0.3.3. Results beyond version 0.3.3 of MemGPT may vary. Confirmed version via pip3 show pymemgpt
Name | pymemgpt |
---|---|
Version | 0.3.3 |
Summary | Teaching LLMs memory management for unbounded context |
Home-page | |
Author | Charles Packer |
Author-email | [email protected] |
License | Apache License |
Location | /opt/LLM/jttw/jttw_venv/lib/python3.10/site-packages |
Requires | chromadb, demjson3, docstring-parser, docx2txt, html2text, httpx, lancedb, llama-index, numpy, prettytable, pydantic, pypdf, python-box, pytz, pyyaml, questionary, setuptools, tiktoken, tqdm, typer |
Required-by |
Install MemGPT
# Activate into our MemGPT python environment
mkdir -p /opt/LLM/jttw/components/memgpt/memgpt_venv
cd /opt/LLM/jttw/components/memgpt/
python3 -m venv /opt/LLM/jttw/components/memgpt/memgpt_venv
source /opt/LLM/jttw/components/memgpt/memgpt_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
# Install MemGPT local python package and the required Transformers and PyTorch python packages
pip3 install 'pymemgpt==0.3.3'
pip3 install transformers
pip3 install torch
pip3 install 'pymemgpt[local]==0.3.3'
Configure MemGPT
MemGPT needs the OPENAI_API_BASE set to our LiteLLM Proxy endpoint, and the OPENAI_API_KEY environment variable to be set and have a value. It doesn't matter what the value is for OPENAI_API_KEY, it just can't be blank.
# Set OPENAI_API_BASE to our LiteLLM Proxy endpoint and OPENAI_API_KEY environment variable to "key-to-success"
export OPENAI_API_BASE=http://0.0.0.0:8000
export OPENAI_API_KEY=key-to-success
Next we need to configure MemGPT so that it works specifically with sending requests through LiteLLM to Ollama and processes responses back properly. To do this we need to make sure that the LLM inference provider is local, the LLM backend is vllm, and that the default endpoint points to our LiteLLM endpoint at http://0.0.0.0:8000. If everything has been done correctly, after finishing the configuration process we should see "Saving config" with the memgpt config path in terminal.
# Activate into our MemGPT python environment
source /opt/LLM/jttw/components/memgpt/memgpt_venv/bin/activate
# Configure MemGPT
memgpt configure
We'll want to make sure we answer the configuration questions as follows
? Select LLM inference provider: local
? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): vllm
? Enter default endpoint: http://localhost:8000
? Select default model: dolphin2.2-mistral:7b-q6_K
? Select default model wrapper (recommended: chatml): chatml-noforce-roles-grammar
? Is your LLM endpoint authenticated? (default no) No
? Select your model's context window (for Mistral 7B models, this is probably 8k / 8192): 8192
? Select embedding provider: local
? Select default preset: memgpt_chat
? Select default persona: sam_pov
? Select default human: basic
? Select storage backend for archival data: chroma
? Select chroma backend: persistent
? Select storage backend for recall data: sqlite
UPDATED NOTE: Upon further testing I've found that selecting either of the following combinations in memgpt configure will work with MemGPT -> LiteLLM -> Ollama.
- LLM inference local, LLM backend vllm
- LLM inference local, LLM backend lmstudio
Running MemGPT as System Service
MemGPT can be set up to run as a server on port 8283 by default with callable API. We're going to set up a MemGPT server to run as a system service that starts up automatically and restarts should it crash.
sudo nano /etc/systemd/system/memgpt.service
In our /etc/systemd/system/memgpt.service
file we'll place the following contents and save the file
[Unit]
Description=MemGPT Service
After=network.target
[Service]
Type=simple
WorkingDirectory=/opt/LLM/jttw/components/memgpt/
ExecStart=/bin/bash -c ". /opt/LLM/jttw/components/memgpt/memgpt_venv/bin/activate && memgpt server"
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable memgpt
sudo systemctl restart memgpt
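Before moving on, it's worth confirming the unit actually started. A minimal check (both commands are guarded so a failure only prints diagnostics):

```shell
# Confirm the MemGPT unit is active ("active" on success)
systemctl is-active memgpt || true
# Review the most recent service log lines for startup errors
journalctl -u memgpt -n 20 --no-pager || true
```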
Test MemGPT
Now that we have MemGPT running and sending requests to the LiteLLM endpoint, we need to test that it's working properly. We'll have a conversation with a MemGPT LLM agent served from MemGPT through LiteLLM to Ollama.
# Activate into our MemGPT python environment
source /opt/LLM/jttw/components/memgpt/memgpt_venv/bin/activate
# Launch MemGPT using Ollama serving the model dolphin2.2-mistral:7b-q6_K via LiteLLM proxy on port 8000
memgpt run --model-endpoint http://0.0.0.0:8000 --model dolphin2.2-mistral:7b-q6_K --model-endpoint-type vllm --debug
This test should bring up a MemGPT LLM agent for us to converse with. After asking the agent several questions we should have received responses back with no errors in terminal. We can also test our MemGPT server API via a curl request to the agent send-message endpoint.
Note: Remove /api from the endpoint URL. The documentation needs to be updated, but the Discord channel confirmed this.
A user_id
is also required. This is how MemGPT will remember/recall your conversations, by referencing the user_id
value that is passed in the arguments.
curl --request POST --url http://0.0.0.0:8283/agents/message --header 'accept: application/json' --header 'content-type: application/json' --data '{"user_id": "throwaway_user", "agent_id": "agent_1", "message": "why is the sky blue?", "stream": false, "role": "user"}'
This test sent a prompt request asking "Why is the sky blue?" via MemGPT to our LiteLLM server. Momentarily we should receive a response back explaining scientifically why the sky appears blue to us.
AutoGen
- What is AutoGen?
- AutoGen is a multi-agent framework for directing multiple LLMs to work together to complete tasks given to them as a group.
- Installation, Testing
- Documentation
- Example 1
- Repository
- Discord
This implementation of AutoGen -> LiteLLM -> Ollama is confirmed working with AutoGen version 0.2.8. Results beyond version 0.2.8 of AutoGen may vary. Confirmed version via pip3 show pyautogen
Name | pyautogen |
---|---|
Version | 0.2.8 |
Summary | Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework |
Home-page | https://github.com/microsoft/autogen |
Author | AutoGen |
Author-email | [email protected] |
License | UNKNOWN |
Location | /opt/LLM/jttw/jttw_venv/lib/python3.10/site-packages |
Requires | diskcache, docker, flaml, openai, pydantic, python-dotenv, termcolor, tiktoken |
Required-by | autogenstudio |
Install AutoGen
# Create and activate our AutoGen python environment
mkdir -p /opt/LLM/jttw/components/autogen/autogen_venv
cd /opt/LLM/jttw/components/autogen/
python3 -m venv /opt/LLM/jttw/components/autogen/autogen_venv
source /opt/LLM/jttw/components/autogen/autogen_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
# Install AutoGen python package
pip3 install pyautogen
pip3 install autogenstudio
Running AutoGen Studio as System Service
AutoGen Studio is a web interface, served on port 8081, that we can use to manage AutoGen agents, skills/tools for those agents, workflows, and sessions. We're going to set up AutoGen Studio to run as a system service that starts up automatically and restarts should it crash.
sudo nano /etc/systemd/system/autogenstudio.service
In our /etc/systemd/system/autogenstudio.service
file we'll place the following contents and save the file
[Unit]
Description=AutoGen Studio Service
After=network.target
[Service]
Type=simple
WorkingDirectory=/opt/LLM/jttw/components/autogen/
ExecStart=/bin/bash -c ". /opt/LLM/jttw/components/autogen/autogen_venv/bin/activate && autogenstudio ui --port 8081"
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable autogenstudio
sudo systemctl restart autogenstudio
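As with the MemGPT service, we can confirm the unit started and that the UI port is listening. A quick sketch, assuming the default port 8081 from the unit file above:

```shell
# Confirm the AutoGen Studio unit is running
systemctl is-active autogenstudio || true
# Check that something is listening on the UI port
ss -tln 2>/dev/null | grep 8081 || echo "nothing listening on 8081 yet"
```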
Running AutoGen Studio Manually
If for some reason the service is disabled or stopped and we need to run AutoGen Studio manually then we'd proceed as follows.
source /opt/LLM/jttw/components/autogen/autogen_venv/bin/activate
autogenstudio ui --port 8081
Test AutoGen
nano /opt/LLM/jttw/tests/autogen_litellm_test_001.py
Now that we have AutoGen in place we need to test to make sure it's working properly. We'll create a python script in our /opt/LLM/jttw/tests/
directory and name the file autogen_litellm_test_001.py
. The contents of /opt/LLM/jttw/tests/autogen_litellm_test_001.py
will be the following
from autogen import UserProxyAgent, ConversableAgent
# Create our configuration for the LiteLLM endpoint. API Key is required but the value can be anything.
config_list = [
{
"base_url": "http://0.0.0.0:8000",
"api_key": "key-to-success",
"model": "openhermes2.5-mistral"
}
]
# Create the agent that uses the LLM.
assistant = ConversableAgent(
"agent",
llm_config = {
"config_list": config_list
}
)
# Create the agent that represents the user in the conversation.
user_proxy = UserProxyAgent(
"user",
code_execution_config = False
)
# Let the assistant start the conversation. It will end when the user types exit.
assistant.initiate_chat(
user_proxy,
message = "How can I help you today?"
)
Next we'll enter our AutoGen python environment and run the python test script. If everything is working properly we should be having a conversation with an AutoGen LLM agent served from AutoGen through LiteLLM to Ollama.
# Activate into our AutoGen python environment
source /opt/LLM/jttw/components/autogen/autogen_venv/bin/activate
# Run our AutoGen-to-LiteLLM test script
python3 /opt/LLM/jttw/tests/autogen_litellm_test_001.py
This test should bring up an AutoGen LLM agent for us to converse with. After asking the agent several questions we should have received responses back with no errors in terminal.
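If the test errors out instead, a useful first check is whether LiteLLM is reachable and serving the model name that config_list references. A sketch - the /models route is an assumption based on LiteLLM's OpenAI-compatible proxy API:

```shell
# List the models the LiteLLM proxy is serving; the model named in config_list
# must appear here or AutoGen's requests will fail
curl -s --max-time 5 http://0.0.0.0:8000/models || echo "LiteLLM not reachable on port 8000"
```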
PromptFoo
- What is PromptFoo?
- PromptFoo is a unit testing and evaluation tool for prompts, allowing for better understanding of and insight into which prompts consistently behave as expected and which need refactoring.
- Installation
- Documentation
- Repository
- Discord
Install PromptFoo
mkdir -p /opt/LLM/jttw/components/promptfoo
mkdir -p /opt/LLM/jttw/components/promptfoo/promptfoo_venv
cd /opt/LLM/jttw/components/promptfoo
python3 -m venv /opt/LLM/jttw/components/promptfoo/promptfoo_venv
source /opt/LLM/jttw/components/promptfoo/promptfoo_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
npm install -g promptfoo
promptfoo init
This will create a configuration file called promptfooconfig.yaml
in /opt/LLM/jttw/components/promptfoo
which you can use to define the prompts you want to evaluate and their inputs.
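As a sketch of what a filled-in config can look like - the openai:chat provider id, apiBaseUrl, and apiKey keys below are assumptions based on PromptFoo's OpenAI-compatible provider support, pointed at our LiteLLM proxy instead of OpenAI:

```yaml
# Evaluate one prompt template against our LiteLLM-served model
prompts:
  - "Answer concisely: {{question}}"
providers:
  - id: openai:chat:openhermes2.5-mistral
    config:
      apiBaseUrl: http://0.0.0.0:8000
      apiKey: key-to-success   # LiteLLM requires a key but accepts any value
tests:
  - vars:
      question: "why is the sky blue?"
    assert:
      - type: contains
        value: "scatter"
```

Running promptfoo eval from this directory would then evaluate the prompt against our local model and record pass/fail for each assertion.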
Running PromptFoo as System Service
PromptFoo can be utilized with a web interface making it much easier to manage and review model evaluations. We're going to set PromptFoo to launch in web ui mode as a system service that starts up automatically and restarts should it crash.
sudo nano /etc/systemd/system/promptfoo.service
In our /etc/systemd/system/promptfoo.service
file we'll place the following contents and save the file. We'll need to make sure to set Environment=PATH=
to the output from echo $PATH
which gives us the value of our PATH environment variable, and ExecStart=
to the output of echo $(command -v node);
which gives us the path of our node application, followed by a space and the output of echo $(command -v promptfoo);
which gives us the path of our promptfoo application.
PromptFoo has the ability to cache evaluation results to save time and cost with paid service providers. However, we want fresh results each time we evaluate, so we'll turn off PromptFoo caching by explicitly specifying Environment=PROMPTFOO_CACHE_ENABLED=false
[Unit]
Description=PromptFoo Service
After=network-online.target
[Service]
Type=simple
WorkingDirectory=/opt/LLM/jttw/components/promptfoo
Environment=PATH=<output of echo $PATH>
Environment=PROMPTFOO_CACHE_ENABLED=false
ExecStart=<output of echo $(command -v node);> <output of echo $(command -v promptfoo);> view
Restart=always
RestartSec=3
[Install]
WantedBy=default.target
sudo systemctl daemon-reload
sudo systemctl enable promptfoo
sudo systemctl restart promptfoo
Running PromptFoo Manually
If for some reason the service is disabled or stopped and we need to run PromptFoo manually then we'd proceed as follows.
cd /opt/LLM/jttw/components/promptfoo
promptfoo view
Test PromptFoo
PromptFoo web UI uses port 15500 by default. To access the web UI and test, we'll open http://0.0.0.0:15500 in a browser.
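We can also confirm from terminal that the web UI is answering before opening the browser:

```shell
# Expect an HTTP status code (e.g. 200) if the PromptFoo web UI is up
curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 http://0.0.0.0:15500 || echo "PromptFoo web UI not reachable"
```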
Zrok
- What is Zrok?
- Zrok is a zero trust networking platform that allows private or public, instant, secure tunneling of applications from anywhere
- Installation, Testing
- Documentation
- Repository
Zrok can be utilized to make our services and endpoints each individually public facing when configured a particular way. It should be noted that Zrok creates a folder located in /var/lib/private/
for each share that is created. If at any point after the initialization/creation of a share changes are made to the .env file for that share, the service for that share needs to be stopped, the folder for the share in /var/lib/private/
needs to be deleted, and the API reference for the share needs to be deleted in order for the changes to take effect.
Install Zrok
For this we're going to use the Zrok frontdoor feature.
# Install Zrok
curl -sSLf https://get.openziti.io/install.bash \
| sudo bash -s zrok-share
Now we need to update the zrok-share.env
file with our Zrok environment token for our account. All of our zrok environment files are stored in /opt/openziti/etc/zrok/
by default.
sudo nano /opt/openziti/etc/zrok/zrok-share.env
If we've already created a zrok account we can find this token by logging in and navigating to the API page, clicking our email address in the top right hand corner of the page, clicking the 'Enable Environment' option, and copying the alphanumeric code that appears in the pop up window. So for example if the pop up displays 'zrok enable q8KVXTMNEp7T' we will want to copy 'q8KVXTMNEp7T'. Set the value of ZROK_ENABLE_TOKEN
to q8KVXTMNEp7T
ZROK_ENABLE_TOKEN="<YOUR ZROK ENABLE TOKEN>"
Running Zrok Proxy to Ollama as System Service
We're going to create the required .env file /opt/openziti/etc/zrok/zrok-share-ollama.env
which our Zrok Ollama share is going to use for our Zrok webtunnel proxy to Ollama endpoint.
sudo nano /opt/openziti/etc/zrok/zrok-share-ollama.env
In the /opt/openziti/etc/zrok/zrok-share-ollama.env
file we'll put the following contents, making sure to change the values for ZROK_ENABLE_TOKEN
and ZROK_UNIQUE_NAME
to our own specific values. We're also going to add basic authentication with ZROK_BASIC_AUTH formatted like user:pass to prevent unauthorized access.
ZROK_ENABLE_TOKEN="<your zrok enable token>"
ZROK_ENVIRONMENT_NAME="ollama"
ZROK_TARGET="http://0.0.0.0:11434"
ZROK_UNIQUE_NAME="<a unique alphanumeric name for the Ollama share>"
ZROK_BACKEND_MODE="proxy"
ZROK_BASIC_AUTH="<a username of your choosing>:<a password of your choosing>"
Next we'll create the Zrok Ollama service file /etc/systemd/system/zrok-share-ollama.service
.
sudo nano /etc/systemd/system/zrok-share-ollama.service
In our /etc/systemd/system/zrok-share-ollama.service
file we'll put the following contents.
[Unit]
Description=Zrok Ollama reserved public share service
After=network-online.target
[Service]
Type=simple
DynamicUser=yes
StateDirectory=zrok-share-ollama
UMask=0007
Environment=PFXLOG_NO_JSON=true
ExecStartPre=/opt/openziti/bin/zrok-enable.bash /opt/openziti/etc/zrok/zrok-share-ollama.env
ExecStart=/opt/openziti/bin/zrok-share.bash /opt/openziti/etc/zrok/zrok-share-ollama.env
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
We're going to stop and disable any currently existing zrok-share-ollama service, reload the service daemon, and then enable and start the zrok-share-ollama service for our changes to the system services to take effect.
sudo systemctl stop zrok-share-ollama
sudo systemctl disable zrok-share-ollama
sudo systemctl daemon-reload
sudo systemctl enable zrok-share-ollama
sudo systemctl restart zrok-share-ollama
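Once the service is up, zrok logs the public URL for the share. A sketch for pulling it out of the journal and probing the proxied Ollama endpoint through it - the grep pattern and Ollama's /api/tags route are assumptions; substitute the credentials from the .env file:

```shell
# Pull the public share URL out of the service log (empty if the unit isn't up yet)
SHARE_URL=$(journalctl -u zrok-share-ollama --no-pager 2>/dev/null | grep -o 'https://[^" ]*' | tail -n 1)
echo "share url: ${SHARE_URL}"
# Probe the share with the basic auth credentials from the .env file;
# Ollama's /api/tags route lists installed models, a cheap liveness check
ZROK_USER="your-username"; ZROK_PASS="your-password"
[ -n "${SHARE_URL}" ] && curl -s -u "${ZROK_USER}:${ZROK_PASS}" --max-time 10 "${SHARE_URL}/api/tags" || echo "share url not found in logs yet"
```

The same two checks work for each of the other zrok share services below by swapping in the unit name and an appropriate route.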
Running Zrok Proxy to Open WebUI as System Service
We're going to create the required .env file /opt/openziti/etc/zrok/zrok-share-open-webui.env
which our Zrok Open WebUI share is going to use for our Zrok webtunnel proxy to Open WebUI endpoint.
sudo nano /opt/openziti/etc/zrok/zrok-share-open-webui.env
In the /opt/openziti/etc/zrok/zrok-share-open-webui.env
file we'll put the following contents, making sure to change the values for ZROK_ENABLE_TOKEN
and ZROK_UNIQUE_NAME
to our own specific values.
ZROK_ENABLE_TOKEN="<your zrok enable token>"
ZROK_ENVIRONMENT_NAME="open-webui"
ZROK_TARGET="http://0.0.0.0:11435"
ZROK_UNIQUE_NAME="<a unique alphanumeric name for the Open WebUI share>"
ZROK_BACKEND_MODE="proxy"
Next we'll create the Zrok Open WebUI service file /etc/systemd/system/zrok-share-open-webui.service
.
sudo nano /etc/systemd/system/zrok-share-open-webui.service
In our /etc/systemd/system/zrok-share-open-webui.service
file we'll put the following contents.
[Unit]
Description=Zrok Open-WebUI reserved public share service
After=network-online.target
[Service]
Type=simple
DynamicUser=yes
StateDirectory=zrok-share-open-webui
UMask=0007
Environment=PFXLOG_NO_JSON=true
ExecStartPre=/opt/openziti/bin/zrok-enable.bash /opt/openziti/etc/zrok/zrok-share-open-webui.env
ExecStart=/opt/openziti/bin/zrok-share.bash /opt/openziti/etc/zrok/zrok-share-open-webui.env
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
We're going to stop and disable any currently existing zrok-share-open-webui service, reload the service daemon, and then enable and start the zrok-share-open-webui service for our changes to the system services to take effect.
sudo systemctl stop zrok-share-open-webui
sudo systemctl disable zrok-share-open-webui
sudo systemctl daemon-reload
sudo systemctl enable zrok-share-open-webui
sudo systemctl restart zrok-share-open-webui
Running Zrok Proxy to LiteLLM as System Service
We're going to create the required .env file /opt/openziti/etc/zrok/zrok-share-litellm.env
which our Zrok LiteLLM share is going to use for our Zrok webtunnel proxy to LiteLLM endpoint.
sudo nano /opt/openziti/etc/zrok/zrok-share-litellm.env
In the /opt/openziti/etc/zrok/zrok-share-litellm.env
file we'll put the following contents, making sure to change the values for ZROK_ENABLE_TOKEN
and ZROK_UNIQUE_NAME
to our own specific values.
ZROK_ENABLE_TOKEN="<your zrok enable token>"
ZROK_ENVIRONMENT_NAME="litellm"
ZROK_TARGET="http://0.0.0.0:8000"
ZROK_UNIQUE_NAME="<a unique alphanumeric name for the LiteLLM share>"
ZROK_BACKEND_MODE="proxy"
Next we'll create the Zrok LiteLLM service file /etc/systemd/system/zrok-share-litellm.service
.
sudo nano /etc/systemd/system/zrok-share-litellm.service
In our /etc/systemd/system/zrok-share-litellm.service
file we'll put the following contents.
[Unit]
Description=Zrok LiteLLM reserved public share service
After=network-online.target
[Service]
Type=simple
DynamicUser=yes
StateDirectory=zrok-share-litellm
UMask=0007
Environment=PFXLOG_NO_JSON=true
ExecStartPre=/opt/openziti/bin/zrok-enable.bash /opt/openziti/etc/zrok/zrok-share-litellm.env
ExecStart=/opt/openziti/bin/zrok-share.bash /opt/openziti/etc/zrok/zrok-share-litellm.env
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
We're going to stop and disable any currently existing zrok-share-litellm service, reload the service daemon, and then enable and start the zrok-share-litellm service for our changes to the system services to take effect.
sudo systemctl stop zrok-share-litellm
sudo systemctl disable zrok-share-litellm
sudo systemctl daemon-reload
sudo systemctl enable zrok-share-litellm
sudo systemctl restart zrok-share-litellm
Running Zrok Proxy to MemGPT as System Service
We're going to create the required .env file /opt/openziti/etc/zrok/zrok-share-memgpt.env
which our Zrok MemGPT share is going to use for our Zrok webtunnel proxy to MemGPT endpoint.
sudo nano /opt/openziti/etc/zrok/zrok-share-memgpt.env
In the /opt/openziti/etc/zrok/zrok-share-memgpt.env
file we'll put the following contents, making sure to change the values for ZROK_ENABLE_TOKEN
and ZROK_UNIQUE_NAME
to our own specific values.
ZROK_ENABLE_TOKEN="<your zrok enable token>"
ZROK_ENVIRONMENT_NAME="memgpt"
ZROK_TARGET="http://0.0.0.0:8283"
ZROK_UNIQUE_NAME="<a unique alphanumeric name for the MemGPT share>"
ZROK_BACKEND_MODE="proxy"
Next we'll create the Zrok MemGPT service file /etc/systemd/system/zrok-share-memgpt.service
.
sudo nano /etc/systemd/system/zrok-share-memgpt.service
In our /etc/systemd/system/zrok-share-memgpt.service
file we'll put the following contents.
[Unit]
Description=Zrok MemGPT reserved public share service
After=network-online.target
[Service]
Type=simple
DynamicUser=yes
StateDirectory=zrok-share-memgpt
UMask=0007
Environment=PFXLOG_NO_JSON=true
ExecStartPre=/opt/openziti/bin/zrok-enable.bash /opt/openziti/etc/zrok/zrok-share-memgpt.env
ExecStart=/opt/openziti/bin/zrok-share.bash /opt/openziti/etc/zrok/zrok-share-memgpt.env
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
We're going to stop and disable any currently existing zrok-share-memgpt service, reload the service daemon, and then enable and start the zrok-share-memgpt service for our changes to the system services to take effect.
sudo systemctl stop zrok-share-memgpt
sudo systemctl disable zrok-share-memgpt
sudo systemctl daemon-reload
sudo systemctl enable zrok-share-memgpt
sudo systemctl restart zrok-share-memgpt
Running Zrok Proxy to AutoGen Studio as System Service
We're going to create the required .env file /opt/openziti/etc/zrok/zrok-share-autogenstudio.env
which our Zrok AutoGen Studio share is going to use for our Zrok webtunnel proxy to AutoGen Studio endpoint.
sudo nano /opt/openziti/etc/zrok/zrok-share-autogenstudio.env
In the /opt/openziti/etc/zrok/zrok-share-autogenstudio.env
file we'll put the following contents, making sure to change the values for ZROK_ENABLE_TOKEN
and ZROK_UNIQUE_NAME
to our own specific values. We're also going to add basic authentication with ZROK_BASIC_AUTH
formatted like user:pass
to prevent unauthorized access. Authentication credentials will be requested when attempting to access the AutoGen Studio share.
ZROK_ENABLE_TOKEN="<your zrok enable token>"
ZROK_ENVIRONMENT_NAME="autogenstudio"
ZROK_TARGET="http://0.0.0.0:8081"
ZROK_UNIQUE_NAME="<a unique alphanumeric name for the AutoGen Studio share>"
ZROK_BACKEND_MODE="proxy"
ZROK_BASIC_AUTH="<a username of your choosing>:<a password of your choosing>"
Next we'll create the Zrok AutoGen Studio service file /etc/systemd/system/zrok-share-autogenstudio.service
.
sudo nano /etc/systemd/system/zrok-share-autogenstudio.service
In our /etc/systemd/system/zrok-share-autogenstudio.service
file we'll put the following contents.
[Unit]
Description=Zrok AutoGen Studio reserved public share service
After=network-online.target
[Service]
Type=simple
DynamicUser=yes
StateDirectory=zrok-share-autogenstudio
UMask=0007
Environment=PFXLOG_NO_JSON=true
ExecStartPre=/opt/openziti/bin/zrok-enable.bash /opt/openziti/etc/zrok/zrok-share-autogenstudio.env
ExecStart=/opt/openziti/bin/zrok-share.bash /opt/openziti/etc/zrok/zrok-share-autogenstudio.env
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
We're going to stop and disable any currently existing zrok-share-autogenstudio service, reload the service daemon, and then enable and start the zrok-share-autogenstudio service for our changes to the system services to take effect.
sudo systemctl stop zrok-share-autogenstudio
sudo systemctl disable zrok-share-autogenstudio
sudo systemctl daemon-reload
sudo systemctl enable zrok-share-autogenstudio
sudo systemctl restart zrok-share-autogenstudio
Running Zrok Proxy to PromptFoo as System Service
We're going to create the required .env file /opt/openziti/etc/zrok/zrok-share-promptfoo.env
which our Zrok PromptFoo share is going to use for our Zrok webtunnel proxy to PromptFoo endpoint.
sudo nano /opt/openziti/etc/zrok/zrok-share-promptfoo.env
In the /opt/openziti/etc/zrok/zrok-share-promptfoo.env
file we'll put the following contents, making sure to change the values for ZROK_ENABLE_TOKEN
and ZROK_UNIQUE_NAME
to our own specific values. We're also going to add basic authentication with ZROK_BASIC_AUTH
formatted like user:pass
to prevent unauthorized access. Authentication credentials will be requested when attempting to access the PromptFoo share.
ZROK_ENABLE_TOKEN="<your zrok enable token>"
ZROK_ENVIRONMENT_NAME="promptfoo"
ZROK_TARGET="http://0.0.0.0:15500"
ZROK_UNIQUE_NAME="<a unique alphanumeric name for the PromptFoo share>"
ZROK_BACKEND_MODE="proxy"
ZROK_BASIC_AUTH="<a username of your choosing>:<a password of your choosing>"
Next we'll create the Zrok PromptFoo service file /etc/systemd/system/zrok-share-promptfoo.service
.
sudo nano /etc/systemd/system/zrok-share-promptfoo.service
In our /etc/systemd/system/zrok-share-promptfoo.service
file we'll put the following contents.
[Unit]
Description=Zrok PromptFoo reserved public share service
After=network-online.target
[Service]
Type=simple
DynamicUser=yes
StateDirectory=zrok-share-promptfoo
UMask=0007
Environment=PFXLOG_NO_JSON=true
ExecStartPre=/opt/openziti/bin/zrok-enable.bash /opt/openziti/etc/zrok/zrok-share-promptfoo.env
ExecStart=/opt/openziti/bin/zrok-share.bash /opt/openziti/etc/zrok/zrok-share-promptfoo.env
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
We're going to stop and disable any currently existing zrok-share-promptfoo service, reload the service daemon, and then enable and start the zrok-share-promptfoo service for our changes to the system services to take effect.
sudo systemctl stop zrok-share-promptfoo
sudo systemctl disable zrok-share-promptfoo
sudo systemctl daemon-reload
sudo systemctl enable zrok-share-promptfoo
sudo systemctl restart zrok-share-promptfoo
Running Zrok Proxy to Stable Diffusion WebUI as System Service
We're going to create the required .env file /opt/openziti/etc/zrok/zrok-share-stable-diffusion-webui.env
which our Zrok Stable Diffusion WebUI share is going to use for our Zrok webtunnel proxy to Stable Diffusion WebUI endpoint.
sudo nano /opt/openziti/etc/zrok/zrok-share-stable-diffusion-webui.env
In the /opt/openziti/etc/zrok/zrok-share-stable-diffusion-webui.env
file we'll put the following contents, making sure to change the values for ZROK_ENABLE_TOKEN
and ZROK_UNIQUE_NAME
to our own specific values. We're also going to add basic authentication with ZROK_BASIC_AUTH
formatted like user:pass
to prevent unauthorized access. Authentication credentials will be requested when attempting to access the Stable Diffusion WebUI share.
ZROK_ENABLE_TOKEN="<your zrok enable token>"
ZROK_ENVIRONMENT_NAME="stable-diffusion-webui"
ZROK_TARGET="http://0.0.0.0:7860"
ZROK_UNIQUE_NAME="<a unique alphanumeric name for the Stable Diffusion WebUI share>"
ZROK_BACKEND_MODE="proxy"
ZROK_BASIC_AUTH="<a username of your choosing>:<a password of your choosing>"
Next we'll create the Zrok Stable Diffusion WebUI service file /etc/systemd/system/zrok-share-stable-diffusion-webui.service
.
sudo nano /etc/systemd/system/zrok-share-stable-diffusion-webui.service
In our /etc/systemd/system/zrok-share-stable-diffusion-webui.service
file we'll put the following contents.
[Unit]
Description=Zrok Stable Diffusion WebUI reserved public share service
After=network-online.target
[Service]
Type=simple
DynamicUser=yes
StateDirectory=zrok-share-stable-diffusion-webui
UMask=0007
Environment=PFXLOG_NO_JSON=true
ExecStartPre=/opt/openziti/bin/zrok-enable.bash /opt/openziti/etc/zrok/zrok-share-stable-diffusion-webui.env
ExecStart=/opt/openziti/bin/zrok-share.bash /opt/openziti/etc/zrok/zrok-share-stable-diffusion-webui.env
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
We're going to stop and disable any currently existing zrok-share-stable-diffusion-webui service, reload the service daemon, and then enable and start the zrok-share-stable-diffusion-webui service for our changes to the system services to take effect.
sudo systemctl stop zrok-share-stable-diffusion-webui
sudo systemctl disable zrok-share-stable-diffusion-webui
sudo systemctl daemon-reload
sudo systemctl enable zrok-share-stable-diffusion-webui
sudo systemctl restart zrok-share-stable-diffusion-webui
Auto-Restart Zrok Proxies
sudo crontab -e
# Restart each of our Zrok Proxies at 3:00 AM
0 3 * * * systemctl restart zrok-share-ollama
0 3 * * * systemctl restart zrok-share-open-webui
0 3 * * * systemctl restart zrok-share-litellm
0 3 * * * systemctl restart zrok-share-memgpt
0 3 * * * systemctl restart zrok-share-autogenstudio
0 3 * * * systemctl restart zrok-share-promptfoo
0 3 * * * systemctl restart zrok-share-stable-diffusion-webui
Additional Notes
For Those Using WSL 2
If we were using a WSL 2 instance as our Linux environment and we wanted to make the Ollama and/or LiteLLM endpoints available to network clients/other machines on our network, then we'd need to open the ports that are being used for Ollama and LiteLLM and then configure a proxy to forward incoming requests from the network to the WSL 2 instance.
This can be done by running the following commands in Windows Powershell as an administrator. In the following example my Windows host machine's local area network IP is 192.168.2.2 and the WSL 2 instance's IP is 172.22.74.100. You would want to change the IPs used in this example to those corresponding to the appropriate host and WSL 2 instance in your own setup.
netsh advfirewall firewall add rule name="Ollama LLM Server Allow Port 11434" dir=in action=allow protocol=TCP localport=11434
netsh interface portproxy add v4tov4 listenaddress=192.168.2.2 listenport=11434 connectaddress=172.22.74.100 connectport=11434
netsh advfirewall firewall add rule name="LiteLLM Proxy Server Allow Port 8000" dir=in action=allow protocol=TCP localport=8000
netsh interface portproxy add v4tov4 listenaddress=192.168.2.2 listenport=8000 connectaddress=172.22.74.100 connectport=8000
To verify that we're able to successfully send a request to the WSL 2 instance from another network client/other machine we'll use the following in terminal/cmd
curl --location "http://192.168.2.2:8000/chat/completions" --header "Content-Type: application/json" --data "{\"model\": \"openhermes2.5-mistral\", \"messages\": [{\"role\": \"user\", \"content\": \"why is the sky blue?\"}]}"
Langchain
- What is Langchain?
- Langchain is a framework for building applications using LLMs, implementing context awareness and reasoning.
- Installation, Testing
- Documentation
- Repository
- Discord
This implementation of Langchain -> Ollama is confirmed working with Langchain version 0.0.352. Results beyond version 0.0.352 of Langchain may vary. Confirmed version via pip3 show langchain
Name | langchain |
---|---|
Version | 0.0.352 |
Summary | Building applications with LLMs through composability |
Home-page | https://github.com/langchain-ai/langchain |
Author | |
Author-email | |
License | MIT |
Location | /opt/LLM/jttw/jttw_venv/lib/python3.10/site-packages |
Requires | aiohttp, async-timeout, dataclasses-json, jsonpatch, langchain-community, langchain-core, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity |
Required-by | langchain-experimental |
Install Langchain
# Activate into our JTTW python environment
source /opt/LLM/jttw/jttw_venv/bin/activate
# Install Langchain python package and the additional packages Playwright, and Beautiful Soup for web scraping
pip3 install langchain
pip3 install langchain-cli
pip3 install langchain-experimental
pip3 install playwright
pip3 install beautifulsoup4
# Install Playwright
playwright install
Test Langchain
Now that we have Langchain in place we need to test to make sure it's working properly. We'll create a python script in our /opt/LLM/jttw/tests/
directory and name the file langchain_ollama_test_001.py
.
The contents of /opt/LLM/jttw/tests/langchain_ollama_test_001.py
will be the following
from langchain.chat_models import ChatLiteLLM
from langchain.schema import HumanMessage
chat = ChatLiteLLM(
api_base="http://0.0.0.0:11434",
model="ollama/openhermes2.5-mistral",
)
messages = [
HumanMessage(
content="why is the sky blue?"
)
]
response = chat(messages)
print(response)
Next we'll enter our python environment for JTTW and run the python test script. If everything is working properly we should see output from our LLM served via Langchain to Ollama.
# Activate into our JTTW python environment
source /opt/LLM/jttw/jttw_venv/bin/activate
# Run our Langchain-to-Ollama test script
python3 /opt/LLM/jttw/tests/langchain_ollama_test_001.py
This test sent a prompt request asking "Why is the sky blue?" via Langchain to our Ollama server. Momentarily we should receive a response back explaining scientifically why the sky appears blue to us.
Aider
- What is Aider?
- Aider is a command line tool for instructing LLMs to create, update, revise, improve, and document/comment code in local git repositories.
- Installation
- Documentation
- Repository
Sweep
Llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git /opt/LLM/jttw/components/llama.cpp
mkdir -p /opt/LLM/jttw/components/llama.cpp/llama.cpp_venv
cd /opt/LLM/jttw/components/llama.cpp/
python3 -m venv /opt/LLM/jttw/components/llama.cpp/llama.cpp_venv
source /opt/LLM/jttw/components/llama.cpp/llama.cpp_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
pip3 install -r requirements.txt
TTS-WebUI
git clone https://github.com/rsxdalv/one-click-installers-tts.git /opt/LLM/jttw/components/tts-webui
mkdir -p /opt/LLM/jttw/components/tts-webui/tts-webui_venv
cd /opt/LLM/jttw/components/tts-webui/
python3 -m venv /opt/LLM/jttw/components/tts-webui/tts-webui_venv
source /opt/LLM/jttw/components/tts-webui/tts-webui_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
chmod +x /opt/LLM/jttw/components/tts-webui/start_linux.sh
./start_linux.sh
Rope
git clone https://github.com/Hillobar/Rope /opt/LLM/jttw/components/rope
mkdir -p /opt/LLM/jttw/components/rope/rope_venv
cd /opt/LLM/jttw/components/rope/
python3 -m venv /opt/LLM/jttw/components/rope/rope_venv
source /opt/LLM/jttw/components/rope/rope_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
pip3 install -r requirements.txt
DSPy
cd /opt/LLM/jttw
source /opt/LLM/jttw/jttw_venv/bin/activate
pip3 install git+https://github.com/stanfordnlp/dspy.git
python3 -m pip install --upgrade pip
pip3 cache purge
Neo4j
sudo add-apt-repository -y ppa:openjdk-r/ppa
sudo nala update
wget -O - https://debian.neo4j.com/neotechnology.gpg.key | sudo apt-key add -
echo 'deb https://debian.neo4j.com stable latest' | sudo tee -a /etc/apt/sources.list.d/neo4j.list
sudo nala update
sudo nala install neo4j
sudo systemctl start neo4j
WhisperX
git clone https://github.com/m-bain/whisperX /opt/LLM/jttw/components/whisperx
cd /opt/LLM/jttw/components/whisperx/
python3 -m venv /opt/LLM/jttw/components/whisperx/whisperx_venv
source /opt/LLM/jttw/components/whisperx/whisperx_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
pip3 install -e .
We'll have to log into Hugging Face and visit https://hf.co/pyannote/segmentation-3.0, https://huggingface.co/pyannote/voice-activity-detection, and https://huggingface.co/pyannote/speaker-diarization to accept the user conditions.
whisperx --model large-v2 --language en --vad_onset 0.10 --vad_offset 0.05 "<path to audio file>"