1.3.0. The Stack - SamuraiBarbi/jttw-ai GitHub Wiki

This installation of components is being performed on a Linux system. I can't guarantee that the components and their interactions will work as expected outside of a Linux environment, as I haven't tested other environments. These are my system specs, confirmed via neofetch.

Property Value
OS Linux Mint 21.2 x86_64
Host MS-7B93 1.0
Kernel 5.15.0-91-generic
Terminal gnome-terminal
CPU AMD Ryzen 9 3950X (32) @ 3.500GHz
GPU NVIDIA GeForce RTX 3090
GPU NVIDIA GeForce GTX 1080 Ti
Memory 13740MiB / 128731MiB

Preparation

Let's get our jttw space and environment setup completed, and then we'll move on to installing the components.

sudo apt-get install nala
sudo nala install python3.10-venv
sudo nala install python3-dev

sudo mkdir -p /opt/LLM/
sudo chown -R $(whoami):$(id -gn) /opt/LLM/

# Create the directory for our test files
mkdir -p /opt/LLM/jttw/tests

# Create the directory for our components
mkdir -p /opt/LLM/jttw/components/open-webui/
mkdir -p /opt/LLM/jttw/components/litellm/

# Create our LiteLLM models list config
touch /opt/LLM/jttw/components/litellm/llm_servers.yaml

# Create the directory for our JTTW python environment
mkdir -p /opt/LLM/jttw/jttw_venv

# Create our JTTW python environment
python3 -m venv /opt/LLM/jttw/jttw_venv

# Activate into our JTTW python environment, install/update pip, and clear python package cache
source /opt/LLM/jttw/jttw_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
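
As a quick sanity check ( optional ) we can confirm that the shell is now using the python and pip binaries from inside jttw_venv rather than the system ones.

# Confirm the active python/pip binaries live inside /opt/LLM/jttw/jttw_venv
which python3
which pip3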

Ollama

This implementation of Ollama is confirmed working with Ollama version 0.1.18. Results beyond version 0.1.18 of Ollama may vary. Confirmed version via ollama --version

Name ollama
Version 0.1.18

Install Ollama

While you can run Ollama as a Docker container, I've found that the Ollama Docker version does not work with LiteLLM, since LiteLLM automatically serves Ollama rather than presenting Ollama. After determining this incompatibility was specific to the Docker version of Ollama together with LiteLLM, I decided to use the sh install method for Ollama instead. Your mileage may vary if you decide to use the Ollama Docker version along with the Ollama-compatible LiteLLM Docker image.

Installing Ollama automatically installs it as a service. We're including OLLAMA_HOST=0.0.0.0:11434 to explicitly force Ollama to run on 11434. We make sure to kill any existing processes running on port 11434 - the default port for Ollama. This is to ensure that Ollama launches with the expected port.

# Kill processes running on port number 11434
lsof -ti :11434 | xargs -r kill

# Stop, disable, and delete any existing Ollama service
sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm /etc/systemd/system/ollama.service

# Delete any existing Ollama files, any existing Ollama user, and any existing Ollama group
sudo rm $(which ollama)
sudo rm -r /usr/share/ollama
sudo userdel -r ollama
sudo groupdel ollama

# Install Ollama 
curl https://ollama.ai/install.sh | sh
 
# Stop and update Ollama service to run Ollama on port 11434
sudo systemctl stop ollama
sudo sed -i '/^Environment=/ {/OLLAMA_HOST=0\.0\.0\.0:11434/! s/$/ OLLAMA_HOST=0.0.0.0:11434/}' /etc/systemd/system/ollama.service
sudo sed -i '/^Environment=/ {/OLLAMA_FLASH_ATTENTION=1/! s/$/ OLLAMA_FLASH_ATTENTION=1/}' /etc/systemd/system/ollama.service
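
To confirm that both sed edits landed we can print the Environment lines from the unit file; OLLAMA_HOST=0.0.0.0:11434 and OLLAMA_FLASH_ATTENTION=1 should both appear.

# Confirm the Ollama service file now carries our environment overrides
grep '^Environment=' /etc/systemd/system/ollama.service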

Next we're going to move the /usr/share/ollama directory to /opt/LLM/jttw/components/ollama, but first we'll need to elevate our privileges in order to do so.

# Let's first elevate privileges
sudo su

Now we'll move the /usr/share/ollama directory to /opt/LLM/jttw/components/ollama and create a symbolic link at /usr/share/ollama pointing to /opt/LLM/jttw/components/ollama.

mv /usr/share/ollama /opt/LLM/jttw/components/ollama
ln -s /opt/LLM/jttw/components/ollama /usr/share/ollama

We no longer need elevated privileges, so we'll exit the root shell.

exit

We're all set to reload the service with the changes we've made.

# Reload system services, enable and start Ollama service
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
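
As a quick sanity check ( optional ) we can confirm the service came back up and that the API is answering on port 11434 before moving on.

# Confirm the Ollama service is active and the API responds on port 11434
systemctl status ollama --no-pager
curl http://localhost:11434/api/tags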

Running Ollama Manually

Installing Ollama automatically installs it as a service that starts when the machine boots up. If for some reason the service is disabled or stopped and we need to run Ollama manually then we'd proceed as follows.

We're going to include OLLAMA_HOST=0.0.0.0:11434 to explicitly force Ollama to run on 11434.

# Launch Ollama server on port 11434
OLLAMA_FLASH_ATTENTION=1 OLLAMA_HOST=0.0.0.0:11434 ollama serve

Download Ollama Models

We'll need to download the LLMs now. There are various good models to choose from, documented at https://ollama.ai/library. For the purposes of this project we'll be using a number of different models for different purposes.

Models Trained for Chatting/General Assistance

# Download chat/general assistance trained language models for Ollama
ollama pull openhermes2.5-mistral
ollama pull dolphin2.2-mistral:7b-q6_K
ollama pull dolphin-mixtral
ollama pull eas/nous-capybara
ollama pull samantha-mistral
ollama pull gemma
ollama pull command-r
ollama pull mixtral
ollama pull mistral
ollama pull openchat
ollama pull qwen
ollama pull falcon
ollama pull wizard-vicuna-uncensored
ollama pull nous-hermes2-mixtral
ollama pull wizardlm-uncensored
ollama pull everythinglm
ollama pull yarn-mistral
ollama pull stablelm-zephyr
ollama pull deepseek-llm
ollama pull nexusraven
ollama pull alfred
ollama pull xwinlm
ollama pull wizardlm2:7b-q6_K

Models Trained for Coding/Programming

# Download coding/programming trained language models for Ollama
ollama pull codellama
ollama pull codeup
ollama pull deepseek-coder
ollama pull magicoder
ollama pull open-orca-platypus2
ollama pull phind-codellama
ollama pull starcoder
ollama pull wizardcoder
ollama pull codegemma
ollama pull nous-hermes2
ollama pull starcoder2
ollama pull dolphincoder
ollama pull codebooga
ollama pull codeqwen

Models Trained for SQL/Database Queries

# Download sql/database query trained language models for Ollama
ollama pull sqlcoder
ollama pull duckdb-nsql

Models Trained for Math/Calculations

# Download math/calculations trained language models for Ollama
ollama pull wizard-math

Models Trained for Image Analysis

# Download image analysis trained language models for Ollama
ollama pull llava
ollama pull bakllava

Models Trained for Medical Tasks

# Download medical task trained language models for Ollama
ollama pull medllama2
ollama pull meditron

Test Ollama

Now that we have Ollama running and serving at least one model we need to test to make sure it's working properly. To test, in a new tab we're going to send a curl request to the Ollama server, making sure to use one of the models we've downloaded in the "model": portion of the request. Since I've downloaded openhermes2.5-mistral that is what I'm going to specify in the "model": portion. It may take a moment, but we should see activity in the tab where Ollama is running.

# Send test prompt via curl to Ollama endpoint serving openhermes2.5-mistral model on port 11434
curl -X POST -H "Content-Type: application/json" -d '{"model": "openhermes2.5-mistral", "prompt": "Why is the sky blue?"}' http://localhost:11434/api/generate

This test sent a prompt request asking "Why is the sky blue?" to our Ollama server. Momentarily we should receive a response back explaining scientifically why the sky appears blue to us.
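
By default /api/generate streams the answer back as many small JSON chunks. If we'd rather receive one complete JSON object - purely a variation of the same test - we can pass "stream": false in the request body.

# Same test prompt, but asking Ollama for a single non-streamed JSON response
curl -X POST -H "Content-Type: application/json" -d '{"model": "openhermes2.5-mistral", "prompt": "Why is the sky blue?", "stream": false}' http://localhost:11434/api/generate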


Open WebUI

Install Open Web-UI

git clone https://github.com/open-webui/open-webui.git /opt/LLM/jttw/components/open-webui
mkdir -p /opt/LLM/jttw/components/open-webui/open-webui_venv
cd /opt/LLM/jttw/components/open-webui/
python3 -m venv /opt/LLM/jttw/components/open-webui/open-webui_venv
source /opt/LLM/jttw/components/open-webui/open-webui_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge

cp -RPp .env.example .env

# Install NodeJS
sudo su
curl -fsSL https://deb.nodesource.com/setup_21.x | bash - && apt-get install -y nodejs
npm i
npm run build
exit


# Serving Frontend with the Backend
cd /opt/LLM/jttw/components/open-webui/backend
pip3 install -r requirements.txt -U

# Change the launch port from default 8080 to 11435
sed -i 's/PORT:-8080/PORT:-11435/g' /opt/LLM/jttw/components/open-webui/backend/start.sh
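
To double check the port change took, we can grep start.sh and make sure it now falls back to 11435 instead of 8080.

# Confirm start.sh now defaults to port 11435
grep 'PORT' /opt/LLM/jttw/components/open-webui/backend/start.sh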

Running Open-WebUI as System Service

sudo nano /etc/systemd/system/open-webui.service
[Unit]
Description=Open WebUI Service
After=network.target

[Service]
Type=simple
WorkingDirectory=/opt/LLM/jttw/components/open-webui/backend
ExecStart=/bin/bash -c ". /opt/LLM/jttw/components/open-webui/open-webui_venv/bin/activate && sh /opt/LLM/jttw/components/open-webui/backend/start.sh"
Restart=always

[Install]
WantedBy=default.target
sudo systemctl daemon-reload
sudo systemctl enable open-webui
sudo systemctl restart open-webui

Running Open-WebUI Manually

If for some reason the service is disabled or stopped and we need to run Open-WebUI manually then we'd proceed as follows.

source /opt/LLM/jttw/components/open-webui/open-webui_venv/bin/activate
cd /opt/LLM/jttw/components/open-webui/backend
sh /opt/LLM/jttw/components/open-webui/backend/start.sh

Test Open-WebUI

We should now be able to access our Open-WebUI server by opening http://0.0.0.0:11435 in any browser. We'll need to click the create account link when we first load it. The first account created is the administrator by default.
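
If we'd like to confirm from the terminal first ( optional ) that something is answering on port 11435 before opening the browser, a plain HTTP request is enough.

# Confirm the Open-WebUI backend is answering on port 11435
curl -I http://0.0.0.0:11435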


LiteLLM

This implementation of LiteLLM -> Ollama is confirmed working with LiteLLM version 1.14.1. Results beyond version 1.14.1 of LiteLLM may vary. Confirmed version via pip3 show litellm

Name litellm
Version 1.14.1
Summary Library to easily interface with LLM API providers
Home-page
Author BerriAI
Author-email
License MIT
Location /opt/LLM/jttw/jttw_venv/lib/python3.10/site-packages
Requires aiohttp, appdirs, certifi, click, importlib-metadata, jinja2, openai, python-dotenv, tiktoken, tokenizers
Required-by

Install LiteLLM

We're installing specifically the 1.14.1 version of the LiteLLM python package because the latest version, 1.15.1, introduced a bug that broke Ollama requests; a fix should land in an upcoming release. Before launching LiteLLM I'm going to make sure to kill any existing processes running on port 8000 - the default port for LiteLLM. This is to ensure that LiteLLM launches with the expected port.

## Kill processes running on port number 8000
lsof -ti :8000 | xargs -r kill
# Activate into our LiteLLM python environment
mkdir -p /opt/LLM/jttw/components/litellm/litellm_venv
cd /opt/LLM/jttw/components/litellm/
python3 -m venv /opt/LLM/jttw/components/litellm/litellm_venv
source /opt/LLM/jttw/components/litellm/litellm_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge

# Install LiteLLM python package and the required Async Generator python package
pip3 install 'litellm[proxy]==1.14.1'
pip3 install async_generator

Configure LiteLLM Servers

nano /opt/LLM/jttw/components/litellm/update_llm_servers.sh

In the /opt/LLM/jttw/components/litellm/update_llm_servers.sh file we'll place the following contents and save the file

#!/bin/bash

{
    echo "model_list:"
    ollama list | awk 'NR>1 {sub(/:latest$/, "", $1); print "  - model_name:", $1, "\n    litellm_params:\n      model: ollama/"$1"\n      api_base: http://0.0.0.0:11434"}'
} > /opt/LLM/jttw/components/litellm/llm_servers.yaml

Now we need to make our /opt/LLM/jttw/components/litellm/update_llm_servers.sh file executable by setting the appropriate chmod

chmod +x /opt/LLM/jttw/components/litellm/update_llm_servers.sh
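
To see what the generated config looks like we can run the script once and cat the result. Assuming openhermes2.5-mistral has been pulled, each pulled model produces an entry roughly like the one shown in the comments below.

# Generate the models config and inspect it
/opt/LLM/jttw/components/litellm/update_llm_servers.sh
cat /opt/LLM/jttw/components/litellm/llm_servers.yaml
# Expected shape of each entry, for example:
# model_list:
#   - model_name: openhermes2.5-mistral
#     litellm_params:
#       model: ollama/openhermes2.5-mistral
#       api_base: http://0.0.0.0:11434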

Running LiteLLM as System Service

We're automatically regenerating the models config file for LiteLLM any time the LiteLLM service starts; this way our list of available models is always up to date. We achieve this by echoing the formatted output from ollama list into our models config file /opt/LLM/jttw/components/litellm/llm_servers.yaml.

sudo nano /etc/systemd/system/litellm.service
[Unit]
Description=LiteLLM Service
After=network.target

[Service]
Type=simple
WorkingDirectory=/opt/LLM/jttw/components/litellm/
ExecStart=/bin/bash -c 'sh /opt/LLM/jttw/components/litellm/update_llm_servers.sh && . /opt/LLM/jttw/components/litellm/litellm_venv/bin/activate && litellm --config /opt/LLM/jttw/components/litellm/llm_servers.yaml --host 0.0.0.0 --port 8000 --add_function_to_prompt --drop_params'
Restart=always

[Install]
WantedBy=default.target
sudo systemctl daemon-reload
sudo systemctl enable litellm
sudo systemctl restart litellm
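
Once the service has restarted we can follow its journal to make sure the config was read and the proxy came up without errors ( Ctrl+C to stop following ).

# Follow the LiteLLM service log
journalctl -u litellm -f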

Running LiteLLM Manually

We're automatically regenerating the models config file for LiteLLM before launching LiteLLM; this way our list of available models is always up to date. We achieve this by echoing the formatted output from ollama list into our models config file /opt/LLM/jttw/components/litellm/llm_servers.yaml.

We are adding the --debug argument so that we can see detailed info about the requests LiteLLM receives and its responses back. We're also adding the --add_function_to_prompt and --drop_params arguments when executing LiteLLM because MemGPT makes heavy use of function calling and params, which Ollama does not support, and without these arguments LiteLLM will produce errors indicating that Ollama doesn't support the function and param calls MemGPT is sending. All of our models are automatically aliased in /opt/LLM/jttw/components/litellm/llm_servers.yaml, allowing us to use the alias value when integrating with other software packages rather than needing to use the full ollama/<model> value - so for example ollama/openhermes2.5-mistral is aliased as openhermes2.5-mistral, ollama/llava is aliased as llava, and so on.

# Activate into our LiteLLM python environment
source /opt/LLM/jttw/components/litellm/litellm_venv/bin/activate
# Update the LiteLLM models config file and launch the LiteLLM proxy server on port 8000, proxying to Ollama on port 11434
cd /opt/LLM/jttw/components/litellm
/opt/LLM/jttw/components/litellm/update_llm_servers.sh && litellm --config /opt/LLM/jttw/components/litellm/llm_servers.yaml --host 0.0.0.0 --port 8000 --debug --add_function_to_prompt --drop_params

Test LiteLLM

Now that we have LiteLLM running and serving as an OpenAI-compatible proxy for the Ollama endpoint we need to test to make sure it's working properly. To test, in a new tab we're going to send a curl request to the LiteLLM server, making sure to use one of the models we've downloaded in the "model": portion of the request. Since I've downloaded openhermes2.5-mistral that is what I'm going to specify in the "model": portion, however any model we've downloaded can be used. It may take a moment, but we should see activity in the tab where Ollama is running as well as the tab where LiteLLM is running.

# Send test prompt via curl to LiteLLM proxy server on port 8000 using Ollama serving the model openhermes2.5-mistral
curl --location 'http://0.0.0.0:8000/chat/completions' --header 'Content-Type: application/json' --data '{"model": "openhermes2.5-mistral", "messages": [{"role": "user", "content": "why is the sky blue?"}]}'

This test sent a prompt request asking "Why is the sky blue?" to our LiteLLM server. Momentarily we should receive a response back explaining scientifically why the sky appears blue to us.


Fooocus

Install Fooocus

# Download the Fooocus repository
git clone https://github.com/lllyasviel/Fooocus.git /opt/LLM/jttw/components/fooocus
# Activate into our Fooocus python environment
mkdir -p /opt/LLM/jttw/components/fooocus/fooocus_venv
cd /opt/LLM/jttw/components/fooocus/
python3 -m venv /opt/LLM/jttw/components/fooocus/fooocus_venv
source /opt/LLM/jttw/components/fooocus/fooocus_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
pip3 install -r requirements_versions.txt

Because we want all of our generative outputs in one place ( /opt/LLM/jttw/output/ ) we're going to first move the Fooocus output folder ( /opt/LLM/jttw/components/fooocus/outputs ) to the new location ( /opt/LLM/jttw/output/fooocus ), and then create a symbolic link in the old location pointing to the new one.

# Make sure the shared output directory exists before moving the Fooocus outputs into it
mkdir -p /opt/LLM/jttw/output
mv /opt/LLM/jttw/components/fooocus/outputs /opt/LLM/jttw/output/fooocus
ln -s /opt/LLM/jttw/output/fooocus /opt/LLM/jttw/components/fooocus/outputs
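
A quick ls confirms that the old outputs path is now a symlink into the shared output directory.

# Confirm the Fooocus outputs path is a symlink to /opt/LLM/jttw/output/fooocus
ls -ld /opt/LLM/jttw/components/fooocus/outputs /opt/LLM/jttw/output/fooocus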

Running Fooocus as System Service

sudo nano /etc/systemd/system/fooocus.service
[Unit]
Description=Fooocus Service
After=network-online.target

[Service]
Type=simple
User=owner
Group=owner
WorkingDirectory=/opt/LLM/jttw/components/fooocus
ExecStart=/usr/bin/bash -c ". /opt/LLM/jttw/components/fooocus/fooocus_venv/bin/activate && python3 entry_with_update.py --listen 8888"
Restart=always
RestartSec=3

[Install]
WantedBy=default.target
sudo systemctl daemon-reload
sudo systemctl enable fooocus
sudo systemctl restart fooocus

Stable Diffusion WebUI

Install Stable Diffusion WebUI

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git /opt/LLM/jttw/components/stable-diffusion-webui
mkdir -p /opt/LLM/jttw/components/stable-diffusion-webui
mkdir -p /opt/LLM/jttw/components/stable-diffusion-webui/stable-diffusion-webui_venv
cd /opt/LLM/jttw/components/stable-diffusion-webui
python3 -m venv /opt/LLM/jttw/components/stable-diffusion-webui/stable-diffusion-webui_venv
source /opt/LLM/jttw/components/stable-diffusion-webui/stable-diffusion-webui_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
sudo apt-get install libgoogle-perftools4 libtcmalloc-minimal4 -y
bash webui.sh --server-name 0.0.0.0 --port 7860 --device-id 0 --api --listen --xformers --enable-insecure-extension-access

Because we want all of our generative outputs in one place ( /opt/LLM/jttw/output/ ) we're going to first move the Stable Diffusion WebUI output folder ( /opt/LLM/jttw/components/stable-diffusion-webui/output ) to the new location ( /opt/LLM/jttw/output/stable-diffusion-webui ), and then create a symbolic link in the old location pointing to the new one.

# Make sure the shared output directory exists before moving the Stable Diffusion WebUI output into it
mkdir -p /opt/LLM/jttw/output
mv /opt/LLM/jttw/components/stable-diffusion-webui/output /opt/LLM/jttw/output/stable-diffusion-webui
ln -s /opt/LLM/jttw/output/stable-diffusion-webui /opt/LLM/jttw/components/stable-diffusion-webui/output

See the Stable Diffusion WebUI CLI documentation for more information about the CLI arguments/options.

Configuring Stable Diffusion WebUI

Notes

For best results:

  • SD 1.5 models: keep CFG Scale at or below x, using X Sampling Method
  • SDXL Turbo models: keep CFG Scale at or below 2, using either the Euler A or LCM sampling method; about 25 sampling steps or higher results in higher detail
  • SDXL models: keep CFG Scale at or below 4, using either the Euler A or LCM sampling method; about 25 sampling steps or higher results in higher detail

Extensions

Models

Running Stable Diffusion WebUI as System Service

sudo nano /etc/systemd/system/stable-diffusion-webui.service
[Unit]
Description=Stable Diffusion WebUI Service
After=network-online.target

[Service]
Type=simple
User=owner
Group=owner
WorkingDirectory=/opt/LLM/jttw/components/stable-diffusion-webui
ExecStart=/usr/bin/bash -c ". /opt/LLM/jttw/components/stable-diffusion-webui/stable-diffusion-webui_venv/bin/activate && /opt/LLM/jttw/components/stable-diffusion-webui/webui.sh --server-name 0.0.0.0 --device-id 0 --api --listen --xformers --enable-insecure-extension-access"
Restart=always
RestartSec=3

[Install]
WantedBy=default.target
sudo systemctl daemon-reload
sudo systemctl enable stable-diffusion-webui
sudo systemctl restart stable-diffusion-webui

Guidance

This implementation of Guidance -> LiteLLM -> Ollama is confirmed working with Guidance version 0.1.10. Results beyond version 0.1.10 of Guidance may vary. Confirmed version via pip3 show guidance

Name guidance
Version 0.1.10
Summary A guidance language for controlling large language models.
Home-page https://github.com/guidance-ai/guidance
Author Scott Lundberg and Marco Tulio Ribeiro
Author-email [email protected]
License
Location /opt/LLM/jttw/jttw_venv/lib/python3.10/site-packages
Requires aiohttp, diskcache, gptcache, msal, numpy, openai, ordered-set, platformdirs, pyformlang, requests, tiktoken
Required-by

Install Guidance

# Activate into our Guidance python environment
mkdir -p /opt/LLM/jttw/components/guidance/guidance_venv
cd /opt/LLM/jttw/components/guidance/
python3 -m venv /opt/LLM/jttw/components/guidance/guidance_venv
source /opt/LLM/jttw/components/guidance/guidance_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
# Install Guidance python package ( this requires that the litellm[proxy] version 1.14.1 package is already installed )
pip3 install guidance==0.1.10
pip3 install 'litellm[proxy]==1.14.1'

Test Guidance

nano /opt/LLM/jttw/tests/guidance_litellm_test_001.py

Now that we have Guidance in place we need to test to make sure it's working properly. We'll create a python script in our /opt/LLM/jttw/tests/ directory and name the file guidance_litellm_test_001.py. Fun fact: the LiteLLM model is not called in Guidance the way other models traditionally are ( for example models.OpenAI ), but rather through one of three classes detailed in the Guidance _lite_llm.py module in their repo - LiteLLMChat, LiteLLMInstruct, or LiteLLMCompletion. For this test we're going to use LiteLLMCompletion. The contents of /opt/LLM/jttw/tests/guidance_litellm_test_001.py will be the following

from guidance import models, gen

# Create our test prompt
prompt = """
why is the sky blue?
"""
# Create our configuration for the LiteLLM endpoint. API Key is required but the value can be anything.
# Guidance explicitly requires the exact model that was supplied in the LiteLLM --model argument, you cannot use the LiteLLM --alias value
model_endpoint = models.LiteLLMCompletion(
    "ollama/openhermes2.5-mistral",
    temperature=0.8, 
    api_base="http://0.0.0.0:8000"
)

# Initiate our model endpoint, append our test prompt, and generate a response
lm = model_endpoint
lm += prompt
lm += gen()

# Convert the response to a string value and print it so we can read it
print(str(lm))

Next we'll enter our Guidance python environment and run the python test script. If everything is working properly we should see output from our LLM, served from Guidance through LiteLLM to Ollama.

# Activate into our Guidance python environment
source /opt/LLM/jttw/components/guidance/guidance_venv/bin/activate
# Run our Guidance-to-LiteLLM test script
python3 /opt/LLM/jttw/tests/guidance_litellm_test_001.py

This test sent a prompt request asking "Why is the sky blue?" via Guidance to our LiteLLM server. Momentarily we should receive a response back explaining scientifically why the sky appears blue to us.


MemGPT

This implementation of MemGPT -> LiteLLM -> Ollama is confirmed working with MemGPT version 0.3.3. Results beyond version 0.3.3 of MemGPT may vary. Confirmed version via pip3 show pymemgpt

Name pymemgpt
Version 0.3.3
Summary Teaching LLMs memory management for unbounded context
Home-page
Author Charles Packer
Author-email [email protected]
License Apache License
Location /opt/LLM/jttw/jttw_venv/lib/python3.10/site-packages
Requires chromadb, demjson3, docstring-parser, docx2txt, html2text, httpx, lancedb, llama-index, numpy, prettytable, pydantic, pypdf, python-box, pytz, pyyaml, questionary, setuptools, tiktoken, tqdm, typer
Required-by

Install MemGPT

# Activate into our MemGPT python environment
mkdir -p /opt/LLM/jttw/components/memgpt/memgpt_venv
cd /opt/LLM/jttw/components/memgpt/
python3 -m venv /opt/LLM/jttw/components/memgpt/memgpt_venv
source /opt/LLM/jttw/components/memgpt/memgpt_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge

# Install MemGPT local python package and the required Transformers and PyTorch python packages
pip3 install 'pymemgpt==0.3.3'
pip3 install transformers
pip3 install torch
pip3 install 'pymemgpt[local]==0.3.3'

Configure MemGPT

MemGPT needs the OPENAI_API_BASE set to our LiteLLM Proxy endpoint, and the OPENAI_API_KEY environment variable to be set and have a value. It doesn't matter what the value is for OPENAI_API_KEY, it just can't be blank.

# Set OPENAI_API_BASE to our LiteLLM Proxy endpoint and OPENAI_API_KEY environment variable to "key-to-success"
export OPENAI_API_BASE=http://0.0.0.0:8000
export OPENAI_API_KEY=key-to-success
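
These exports only last for the current shell. If we want them set in every new terminal as well ( optional, and assuming bash is our shell ) we can append them to ~/.bashrc.

# Optionally persist the variables for future shells (assumes bash)
echo 'export OPENAI_API_BASE=http://0.0.0.0:8000' >> ~/.bashrc
echo 'export OPENAI_API_KEY=key-to-success' >> ~/.bashrc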

Next we need to configure MemGPT so that it works specifically with sending requests through LiteLLM to Ollama, and processes responses back properly. In order to do this we need to make sure that the LLM inference provider is local, the LLM backend is vllm, and that we are pointing the default endpoint to our LiteLLM endpoint at http://0.0.0.0:8000. If everything has been done correctly, after finishing the configuration process we should see Saving config followed by the memgpt config path in the terminal.

# Activate into our MemGPT python environment
source /opt/LLM/jttw/components/memgpt/memgpt_venv/bin/activate
# Configure MemGPT
memgpt configure

We'll want to make sure we answer the configuration questions as follows

    ? Select LLM inference provider: local
    ? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): vllm
    ? Enter default endpoint: http://localhost:8000
    ? Select default model: dolphin2.2-mistral:7b-q6_K
    ? Select default model wrapper (recommended: chatml): chatml-noforce-roles-grammar
    ? Is your LLM endpoint authenticated? (default no) No
    ? Select your model's context window (for Mistral 7B models, this is probably 8k / 8192): 8192
    ? Select embedding provider: local
    ? Select default preset: memgpt_chat
    ? Select default persona: sam_pov
    ? Select default human: basic
    ? Select storage backend for archival data: chroma
    ? Select chroma backend: persistent
    ? Select storage backend for recall data: sqlite

UPDATED NOTE: Upon further testing I've found that selecting any of the following combinations in MemGPT configure will work with MemGPT -> LiteLLM -> Ollama.

  • LLM inference provider local, LLM backend vllm
  • LLM inference provider local, LLM backend lmstudio

Running MemGPT as System Service

MemGPT can be set up to run as a server, on port 8283 by default, with a callable API. We're going to set up a MemGPT server to run as a system service that starts up automatically and restarts should it crash.

sudo nano /etc/systemd/system/memgpt.service

In our /etc/systemd/system/memgpt.service file we'll place the following contents and save the file

[Unit]
Description=MemGPT Service
After=network.target

[Service]
Type=simple
WorkingDirectory=/opt/LLM/jttw/components/memgpt/
ExecStart=/bin/bash -c ". /opt/LLM/jttw/components/memgpt/memgpt_venv/bin/activate && memgpt server"
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable memgpt
sudo systemctl restart memgpt

Test MemGPT

Now that we have MemGPT running and sending requests to the LiteLLM endpoint we need to test to make sure it's working properly. We'll go through and have a conversation with a MemGPT LLM agent served from MemGPT through LiteLLM to Ollama.

# Activate into our MemGPT python environment
source /opt/LLM/jttw/components/memgpt/memgpt_venv/bin/activate
# Launch MemGPT using Ollama serving the model openhermes2.5-mistral via LiteLLM proxy on port 8000
memgpt run --model-endpoint http://0.0.0.0:8000 --model dolphin2.2-mistral:7b-q6_K --model-endpoint-type vllm --debug

This test should bring up a MemGPT LLM agent for us to converse with. After asking the agent several questions we should have received responses back with no errors in the terminal. We can also test our MemGPT server API via a curl request to the agent send message endpoint.

Note: Remove /api from the endpoint url. The documentation still shows /api, but the Discord channel indicated it should be dropped.

A user_id is also needed. This is how MemGPT will remember/recall our conversations - by referencing the user_id value passed in the arguments.

curl --request POST --url http://0.0.0.0:8283/agents/message --header 'accept: application/json' --header 'content-type: application/json' --data '{"user_id": "throwaway_user", "agent_id": "agent_1", "message": "why is the sky blue?", "stream": false, "role": "user"}'

This test sent a prompt request asking "Why is the sky blue?" via MemGPT to our LiteLLM server. Momentarily we should receive a response back explaining scientifically why the sky appears blue to us.


AutoGen

This implementation of AutoGen -> LiteLLM -> Ollama is confirmed working with AutoGen version 0.2.8. Results beyond version 0.2.8 of AutoGen may vary. Confirmed version via pip3 show pyautogen

Name pyautogen
Version 0.2.8
Summary Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Home-page https://github.com/microsoft/autogen
Author AutoGen
Author-email [email protected]
License UNKNOWN
Location /opt/LLM/jttw/jttw_venv/lib/python3.10/site-packages
Requires diskcache, docker, flaml, openai, pydantic, python-dotenv, termcolor, tiktoken
Required-by autogenstudio

Install AutoGen

# Activate into our AutoGen python environment
mkdir -p /opt/LLM/jttw/components/autogen/autogen_venv
cd /opt/LLM/jttw/components/autogen/
python3 -m venv /opt/LLM/jttw/components/autogen/autogen_venv
source /opt/LLM/jttw/components/autogen/autogen_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
# Install AutoGen python package
pip3 install pyautogen
pip3 install autogenstudio

Running AutoGen Studio as System Service

AutoGen Studio is a web interface, running on port 8081, that we can use to manage AutoGen agents, skills/tools for those agents, workflows, and sessions. We're going to set up AutoGen Studio to run as a system service that starts up automatically and restarts should it crash.

sudo nano /etc/systemd/system/autogenstudio.service

In our /etc/systemd/system/autogenstudio.service file we'll place the following contents and save the file

[Unit]
Description=AutoGen Studio Service
After=network.target

[Service]
Type=simple
WorkingDirectory=/opt/LLM/jttw/components/autogen/
ExecStart=/bin/bash -c ". /opt/LLM/jttw/components/autogen/autogen_venv/bin/activate && autogenstudio ui --port 8081"
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable autogenstudio
sudo systemctl restart autogenstudio

Running AutoGen Studio Manually

If for some reason the service is disabled or stopped and we need to run AutoGen Studio manually then we'd proceed as follows.

source /opt/LLM/jttw/components/autogen/autogen_venv/bin/activate
autogenstudio ui --port 8081

Test AutoGen

nano /opt/LLM/jttw/tests/autogen_litellm_test_001.py

Now that we have AutoGen in place we need to test to make sure it's working properly. We'll create a python script in our /opt/LLM/jttw/tests/ directory and name the file autogen_litellm_test_001.py. The contents of /opt/LLM/jttw/tests/autogen_litellm_test_001.py will be the following

from autogen import UserProxyAgent, ConversableAgent

# Create our configuration for the LiteLLM endpoint. API Key is required but the value can be anything.
config_list = [
    {
        "base_url": "http://0.0.0.0:8000",
        "api_key": "key-to-success",
        "model": "openhermes2.5-mistral"
    }
]

# Create the agent that uses the LLM.
assistant = ConversableAgent(
    "agent", 
    llm_config = {
        "config_list": config_list
    }
)

# Create the agent that represents the user in the conversation.
user_proxy = UserProxyAgent(
    "user", 
    code_execution_config = False
)

# Let the assistant start the conversation.  It will end when the user types exit.
assistant.initiate_chat(
    user_proxy, 
    message = "How can I help you today?"
)

Next we'll enter our AutoGen python environment and run the python test script. If everything is working properly we should be able to have a conversation with an AutoGen LLM agent served from AutoGen through LiteLLM to Ollama.

# Activate into our AutoGen python environment
source /opt/LLM/jttw/components/autogen/autogen_venv/bin/activate
# Run our AutoGen-to-LiteLLM test script
python3 /opt/LLM/jttw/tests/autogen_litellm_test_001.py

This test should bring up an AutoGen LLM agent for us to converse with. After asking the agent several questions we should have received responses back with no errors in the terminal.


PromptFoo

  • What is PromptFoo?
    • PromptFoo is a unit testing and evaluation tool for prompts, giving us a better understanding of, and insight into, which prompts consistently behave as expected and which need to be refactored.
  • Installation
  • Documentation
  • Repository
  • Discord

Install PromptFoo

mkdir -p /opt/LLM/jttw/components/promptfoo
mkdir -p /opt/LLM/jttw/components/promptfoo/promptfoo_venv
cd /opt/LLM/jttw/components/promptfoo
python3 -m venv /opt/LLM/jttw/components/promptfoo/promptfoo_venv
source /opt/LLM/jttw/components/promptfoo/promptfoo_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
npm install -g promptfoo
promptfoo init

This will create a configuration file called promptfooconfig.yaml in /opt/LLM/jttw/components/promptfoo, which you can use to define the prompts you want to evaluate and their inputs.

Running PromptFoo as System Service

PromptFoo provides a web interface that makes it much easier to manage and review model evaluations. We're going to set PromptFoo to launch in web ui mode as a system service that starts up automatically and restarts should it crash.

sudo nano /etc/systemd/system/promptfoo.service

In our /etc/systemd/system/promptfoo.service file we'll place the following contents and save the file. We'll need to make sure to set Environment=PATH= to the output of echo $PATH, which gives us the value of our PATH environment variable, and ExecStart= to the output of echo $(command -v node); ( the path of our node application ), followed by a space, the output of echo $(command -v promptfoo); ( the path of our promptfoo application ), and then view.

PromptFoo has the ability to cache evaluation results for the purpose of saving time and cost with paid service providers, however we want fresh results each time we evaluate, so we'll turn off PromptFoo caching by explicitly specifying Environment=PROMPTFOO_CACHE_ENABLED=false

[Unit]
Description=PromptFoo Service
After=network-online.target

[Service]
Type=simple
WorkingDirectory=/opt/LLM/jttw/components/promptfoo
Environment=PATH=<output of echo $PATH>
Environment=PROMPTFOO_CACHE_ENABLED=false
ExecStart=<output of echo $(command -v node);> <output of echo $(command -v promptfoo);> view
Restart=always
RestartSec=3

[Install]
WantedBy=default.target
sudo systemctl daemon-reload
sudo systemctl enable promptfoo
sudo systemctl restart promptfoo

Running PromptFoo Manually

If for some reason the service is disabled or stopped and we need to run PromptFoo manually then we'd proceed as follows.

cd /opt/LLM/jttw/components/promptfoo
promptfoo view

Test PromptFoo

PromptFoo's web ui uses port 15500 by default. To access the web ui and test, we'll open http://0.0.0.0:15500 in a browser.


Zrok

Zrok can be utilized to make each of our services and endpoints individually public facing when configured a particular way. It should be noted that Zrok creates a folder in /var/lib/private/ for each share that is created. If at any point after the initialization/creation of a share there are changes made to the .env file for that share, then the service for that share needs to be stopped, the folder for the share in /var/lib/private/ needs to be deleted, and the api reference for the share needs to be deleted in order for the changes to take effect.

Install Zrok

For this we're going to use the Zrok frontdoor feature.

# Install Zrok 
curl -sSLf https://get.openziti.io/install.bash \
| sudo bash -s zrok-share

Now we need to update the zrok-share.env file with the Zrok environment token for our account. All of our zrok environment files are stored in /opt/openziti/etc/zrok/ by default.

sudo nano /opt/openziti/etc/zrok/zrok-share.env

If we've already created a zrok account we can find this token by logging in, navigating to the api page, clicking our email address in the top right-hand corner of the page, clicking the 'Enable Environment' option, and copying the alphanumeric code that appears in the pop up window. So for example, if the pop up displays 'zrok enable q8KVXTMNEp7T' we will want to copy 'q8KVXTMNEp7T' and set the value of ZROK_ENABLE_TOKEN to q8KVXTMNEp7T

ZROK_ENABLE_TOKEN="<YOUR ZROK ENVIRONMENT TOKEN>"

Running Zrok Proxy to Ollama as System Service

We're going to create the required .env file /opt/openziti/etc/zrok/zrok-share-ollama.env, which our Zrok Ollama share will use for the Zrok webtunnel proxy to the Ollama endpoint.

sudo nano /opt/openziti/etc/zrok/zrok-share-ollama.env

In the /opt/openziti/etc/zrok/zrok-share-ollama.env file we'll put the following contents, making sure to change the values for ZROK_ENABLE_TOKEN and ZROK_UNIQUE_NAME to our own specific values. We're also adding basic authentication with ZROK_BASIC_AUTH, formatted like user:pass, to prevent unauthorized access.

ZROK_ENABLE_TOKEN="<your zrok enable token>"
ZROK_ENVIRONMENT_NAME="ollama"
ZROK_TARGET="http://0.0.0.0:11434"
ZROK_UNIQUE_NAME="<a unique alphanumeric name for the Ollama share>"
ZROK_BACKEND_MODE="proxy"
ZROK_BASIC_AUTH="<a username of your choosing>:<a password of your choosing>"

Next we'll create the Zrok Ollama service file /etc/systemd/system/zrok-share-ollama.service.

sudo nano /etc/systemd/system/zrok-share-ollama.service

In our /etc/systemd/system/zrok-share-ollama.service file we'll put the following contents.

[Unit]
Description=Zrok Ollama reserved public share service
After=network-online.target

[Service]
Type=simple
DynamicUser=yes
StateDirectory=zrok-share-ollama
UMask=0007
Environment=PFXLOG_NO_JSON=true
ExecStartPre=/opt/openziti/bin/zrok-enable.bash /opt/openziti/etc/zrok/zrok-share-ollama.env
ExecStart=/opt/openziti/bin/zrok-share.bash /opt/openziti/etc/zrok/zrok-share-ollama.env
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

We're going to stop and disable any existing zrok-share-ollama service, reload the service daemon, and then enable and start the zrok-share-ollama service for our changes to take effect.

sudo systemctl stop zrok-share-ollama
sudo systemctl disable zrok-share-ollama
sudo systemctl daemon-reload
sudo systemctl enable zrok-share-ollama
sudo systemctl restart zrok-share-ollama
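
The public URL reserved for the share should show up in the service's log output once the share comes up, so following the journal is an easy way to grab it ( Ctrl+C to stop following ).

# Watch the zrok Ollama share come up and note the public URL it reports
journalctl -u zrok-share-ollama -f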

Running Zrok Proxy to Open WebUI as System Service

We're going to create the required .env file /opt/openziti/etc/zrok/zrok-share-open-webui.env, which our Zrok Open WebUI share will use for the Zrok webtunnel proxy to the Open WebUI endpoint.

sudo nano /opt/openziti/etc/zrok/zrok-share-open-webui.env

In the /opt/openziti/etc/zrok/zrok-share-open-webui.env file we'll put the following contents, making sure to change the values for ZROK_ENABLE_TOKEN and ZROK_UNIQUE_NAME to our own specific values.

ZROK_ENABLE_TOKEN="<your zrok enable token>"
ZROK_ENVIRONMENT_NAME="open-webui"
ZROK_TARGET="http://0.0.0.0:11435"
ZROK_UNIQUE_NAME="<a unique alphanumeric name for the Open WebUI share>"
ZROK_BACKEND_MODE="proxy"

Next we'll create the Zrok Open WebUI service file /etc/systemd/system/zrok-share-open-webui.service.

sudo nano /etc/systemd/system/zrok-share-open-webui.service

In our /etc/systemd/system/zrok-share-open-webui.service file we'll put the following contents.

[Unit]
Description=Zrok Open-WebUI reserved public share service
After=network-online.target

[Service]
Type=simple
DynamicUser=yes
StateDirectory=zrok-share-open-webui
UMask=0007
Environment=PFXLOG_NO_JSON=true
ExecStartPre=/opt/openziti/bin/zrok-enable.bash /opt/openziti/etc/zrok/zrok-share-open-webui.env
ExecStart=/opt/openziti/bin/zrok-share.bash /opt/openziti/etc/zrok/zrok-share-open-webui.env
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

We're going to stop and disable any existing zrok-share-open-webui service, reload the service daemon, and then enable and start the zrok-share-open-webui service for our changes to take effect.

sudo systemctl stop zrok-share-open-webui
sudo systemctl disable zrok-share-open-webui
sudo systemctl daemon-reload
sudo systemctl enable zrok-share-open-webui
sudo systemctl restart zrok-share-open-webui

Running Zrok Proxy to LiteLLM as System Service

We're going to create the required .env file /opt/openziti/etc/zrok/zrok-share-litellm.env, which our Zrok LiteLLM share will use for the Zrok webtunnel proxy to the LiteLLM endpoint.

sudo nano /opt/openziti/etc/zrok/zrok-share-litellm.env

In the /opt/openziti/etc/zrok/zrok-share-litellm.env file we'll put the following contents, making sure to change the values for ZROK_ENABLE_TOKEN and ZROK_UNIQUE_NAME to our own specific values.

ZROK_ENABLE_TOKEN="<your zrok enable token>"
ZROK_ENVIRONMENT_NAME="litellm"
ZROK_TARGET="http://0.0.0.0:8000"
ZROK_UNIQUE_NAME="<a unique alphanumeric name for the LiteLLM share>"
ZROK_BACKEND_MODE="proxy"

Next we'll create the Zrok LiteLLM service file /etc/systemd/system/zrok-share-litellm.service.

sudo nano /etc/systemd/system/zrok-share-litellm.service

In our /etc/systemd/system/zrok-share-litellm.service file we'll put the following contents.

[Unit]
Description=Zrok LiteLLM reserved public share service
After=network-online.target

[Service]
Type=simple
DynamicUser=yes
StateDirectory=zrok-share-litellm
UMask=0007
Environment=PFXLOG_NO_JSON=true
ExecStartPre=/opt/openziti/bin/zrok-enable.bash /opt/openziti/etc/zrok/zrok-share-litellm.env
ExecStart=/opt/openziti/bin/zrok-share.bash /opt/openziti/etc/zrok/zrok-share-litellm.env
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

We're going to stop and disable any existing zrok-share-litellm service, reload the service daemon, and then enable and start the zrok-share-litellm service for our changes to take effect.

sudo systemctl stop zrok-share-litellm
sudo systemctl disable zrok-share-litellm
sudo systemctl daemon-reload
sudo systemctl enable zrok-share-litellm
sudo systemctl restart zrok-share-litellm

Running Zrok Proxy to MemGPT as System Service

We're going to create the required .env file /opt/openziti/etc/zrok/zrok-share-memgpt.env, which our Zrok MemGPT share will use for the Zrok webtunnel proxy to the MemGPT endpoint.

sudo nano /opt/openziti/etc/zrok/zrok-share-memgpt.env

In the /opt/openziti/etc/zrok/zrok-share-memgpt.env file we'll put the following contents, making sure to change the values for ZROK_ENABLE_TOKEN and ZROK_UNIQUE_NAME to our own specific values.

ZROK_ENABLE_TOKEN="<your zrok enable token>"
ZROK_ENVIRONMENT_NAME="memgpt"
ZROK_TARGET="http://0.0.0.0:8283"
ZROK_UNIQUE_NAME="<a unique alphanumeric name for the MemGPT share>"
ZROK_BACKEND_MODE="proxy"

Next we'll create the Zrok MemGPT service file /etc/systemd/system/zrok-share-memgpt.service.

sudo nano /etc/systemd/system/zrok-share-memgpt.service

In our /etc/systemd/system/zrok-share-memgpt.service file we'll put the following contents.

[Unit]
Description=Zrok MemGPT reserved public share service
After=network-online.target

[Service]
Type=simple
DynamicUser=yes
StateDirectory=zrok-share-memgpt
UMask=0007
Environment=PFXLOG_NO_JSON=true
ExecStartPre=/opt/openziti/bin/zrok-enable.bash /opt/openziti/etc/zrok/zrok-share-memgpt.env
ExecStart=/opt/openziti/bin/zrok-share.bash /opt/openziti/etc/zrok/zrok-share-memgpt.env
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

We're going to stop and disable any existing zrok-share-memgpt service, reload the service daemon, and then enable and start the zrok-share-memgpt service for our changes to take effect.

sudo systemctl stop zrok-share-memgpt
sudo systemctl disable zrok-share-memgpt
sudo systemctl daemon-reload
sudo systemctl enable zrok-share-memgpt
sudo systemctl restart zrok-share-memgpt

Running Zrok Proxy to AutoGen Studio as System Service

We're going to create the required .env file /opt/openziti/etc/zrok/zrok-share-autogenstudio.env, which our Zrok AutoGen Studio share will use for the Zrok webtunnel proxy to the AutoGen Studio endpoint.

sudo nano /opt/openziti/etc/zrok/zrok-share-autogenstudio.env

In the /opt/openziti/etc/zrok/zrok-share-autogenstudio.env file we'll put the following contents, making sure to change the values for ZROK_ENABLE_TOKEN and ZROK_UNIQUE_NAME to our own specific values. We're also going to add basic authentication with ZROK_BASIC_AUTH, formatted like user:pass, to prevent unauthorized access. Authentication credentials will be requested when attempting to access the AutoGen Studio share.

ZROK_ENABLE_TOKEN="<your zrok enable token>"
ZROK_ENVIRONMENT_NAME="autogenstudio"
ZROK_TARGET="http://0.0.0.0:8081"
ZROK_UNIQUE_NAME="<a unique alphanumeric name for the AutoGen Studio share>"
ZROK_BACKEND_MODE="proxy"
ZROK_BASIC_AUTH="<a username of your choosing>:<a password of your choosing>"

Next we'll create the Zrok AutoGen Studio service file /etc/systemd/system/zrok-share-autogenstudio.service.

sudo nano /etc/systemd/system/zrok-share-autogenstudio.service

In our /etc/systemd/system/zrok-share-autogenstudio.service file we'll put the following contents.

[Unit]
Description=Zrok AutoGen Studio reserved public share service
After=network-online.target

[Service]
Type=simple
DynamicUser=yes
StateDirectory=zrok-share-autogenstudio
UMask=0007
Environment=PFXLOG_NO_JSON=true
ExecStartPre=/opt/openziti/bin/zrok-enable.bash /opt/openziti/etc/zrok/zrok-share-autogenstudio.env
ExecStart=/opt/openziti/bin/zrok-share.bash /opt/openziti/etc/zrok/zrok-share-autogenstudio.env
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

We're going to stop and disable any existing zrok-share-autogenstudio service, reload the service daemon, and then enable and start the zrok-share-autogenstudio service for our changes to take effect.

sudo systemctl stop zrok-share-autogenstudio
sudo systemctl disable zrok-share-autogenstudio
sudo systemctl daemon-reload
sudo systemctl enable zrok-share-autogenstudio
sudo systemctl restart zrok-share-autogenstudio

Running Zrok Proxy to PromptFoo as System Service

We're going to create the required .env file /opt/openziti/etc/zrok/zrok-share-promptfoo.env, which our Zrok PromptFoo share will use for the Zrok webtunnel proxy to the PromptFoo endpoint.

sudo nano /opt/openziti/etc/zrok/zrok-share-promptfoo.env

In the /opt/openziti/etc/zrok/zrok-share-promptfoo.env file we'll put the following contents, making sure to change the values for ZROK_ENABLE_TOKEN and ZROK_UNIQUE_NAME to our own specific values. We're also going to add basic authentication with ZROK_BASIC_AUTH, formatted like user:pass, to prevent unauthorized access. Authentication credentials will be requested when attempting to access the PromptFoo share.

ZROK_ENABLE_TOKEN="<your zrok enable token>"
ZROK_ENVIRONMENT_NAME="promptfoo"
ZROK_TARGET="http://0.0.0.0:15500"
ZROK_UNIQUE_NAME="<a unique alphanumeric name for the PromptFoo share>"
ZROK_BACKEND_MODE="proxy"
ZROK_BASIC_AUTH="<a username of your choosing>:<a password of your choosing>"

Next we'll create the Zrok PromptFoo service file /etc/systemd/system/zrok-share-promptfoo.service.

sudo nano /etc/systemd/system/zrok-share-promptfoo.service

In our /etc/systemd/system/zrok-share-promptfoo.service file we'll put the following contents.

[Unit]
Description=Zrok PromptFoo reserved public share service
After=network-online.target

[Service]
Type=simple
DynamicUser=yes
StateDirectory=zrok-share-promptfoo
UMask=0007
Environment=PFXLOG_NO_JSON=true
ExecStartPre=/opt/openziti/bin/zrok-enable.bash /opt/openziti/etc/zrok/zrok-share-promptfoo.env
ExecStart=/opt/openziti/bin/zrok-share.bash /opt/openziti/etc/zrok/zrok-share-promptfoo.env
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

We're going to stop and disable any existing zrok-share-promptfoo service, reload the service daemon, and then enable and start the zrok-share-promptfoo service for our changes to take effect.

sudo systemctl stop zrok-share-promptfoo
sudo systemctl disable zrok-share-promptfoo
sudo systemctl daemon-reload
sudo systemctl enable zrok-share-promptfoo
sudo systemctl restart zrok-share-promptfoo

Running Zrok Proxy to Stable Diffusion WebUI as System Service

We're going to create the required .env file /opt/openziti/etc/zrok/zrok-share-stable-diffusion-webui.env, which our Zrok Stable Diffusion WebUI share will use for the Zrok webtunnel proxy to the Stable Diffusion WebUI endpoint.

sudo nano /opt/openziti/etc/zrok/zrok-share-stable-diffusion-webui.env

In the /opt/openziti/etc/zrok/zrok-share-stable-diffusion-webui.env file we'll put the following contents, making sure to change the values for ZROK_ENABLE_TOKEN and ZROK_UNIQUE_NAME to our own specific values. We're also going to add basic authentication with ZROK_BASIC_AUTH, formatted like user:pass, to prevent unauthorized access. Authentication credentials will be requested when attempting to access the Stable Diffusion WebUI share.

ZROK_ENABLE_TOKEN="<your zrok enable token>"
ZROK_ENVIRONMENT_NAME="stable-diffusion-webui"
ZROK_TARGET="http://0.0.0.0:7860"
ZROK_UNIQUE_NAME="<a unique alphanumeric name for the Stable Diffusion WebUI share>"
ZROK_BACKEND_MODE="proxy"
ZROK_BASIC_AUTH="<a username of your choosing>:<a password of your choosing>"

Next we'll create the Zrok Stable Diffusion WebUI service file /etc/systemd/system/zrok-share-stable-diffusion-webui.service.

sudo nano /etc/systemd/system/zrok-share-stable-diffusion-webui.service

In our /etc/systemd/system/zrok-share-stable-diffusion-webui.service file we'll put the following contents.

[Unit]
Description=Zrok Stable Diffusion WebUI reserved public share service
After=network-online.target

[Service]
Type=simple
DynamicUser=yes
StateDirectory=zrok-share-stable-diffusion-webui
UMask=0007
Environment=PFXLOG_NO_JSON=true
ExecStartPre=/opt/openziti/bin/zrok-enable.bash /opt/openziti/etc/zrok/zrok-share-stable-diffusion-webui.env
ExecStart=/opt/openziti/bin/zrok-share.bash /opt/openziti/etc/zrok/zrok-share-stable-diffusion-webui.env
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

We're going to stop and disable any existing zrok-share-stable-diffusion-webui service, reload the service daemon, and then enable and start the zrok-share-stable-diffusion-webui service for our changes to take effect.

sudo systemctl stop zrok-share-stable-diffusion-webui
sudo systemctl disable zrok-share-stable-diffusion-webui
sudo systemctl daemon-reload
sudo systemctl enable zrok-share-stable-diffusion-webui
sudo systemctl restart zrok-share-stable-diffusion-webui

Auto-Restart Zrok Proxies

crontab -e
# Restart each of our Zrok Proxies at 3:00 AM
0 3 * * * systemctl restart zrok-share-ollama
0 3 * * * systemctl restart zrok-share-open-webui
0 3 * * * systemctl restart zrok-share-litellm
0 3 * * * systemctl restart zrok-share-memgpt
0 3 * * * systemctl restart zrok-share-autogenstudio
0 3 * * * systemctl restart zrok-share-promptfoo
0 3 * * * systemctl restart zrok-share-stable-diffusion-webui

Additional Notes

For Those Using WSL 2

If we were using a WSL 2 instance as our Linux environment and we wanted to make the Ollama and/or LiteLLM endpoints available to network clients/other machines on our network, then we'd need to open the ports being used for Ollama and LiteLLM and then configure a proxy to forward incoming requests from the network to the WSL 2 instance.

This can be done by running the following commands in Windows Powershell as an administrator. In the following example my Windows host machine's local area network IP is 192.168.2.2 and the WSL 2 instance's IP is 172.22.74.100. You would want to change the IPs used in this example to those of the appropriate host and WSL 2 instance in your own setup.

netsh advfirewall firewall add rule name="Ollama LLM Server Allow Port 11434" dir=in action=allow protocol=TCP localport=11434
netsh interface portproxy add v4tov4 listenaddress=192.168.2.2 listenport=11434 connectaddress=172.22.74.100 connectport=11434

netsh advfirewall firewall add rule name="LiteLLM Proxy Server Allow Port 8000" dir=in action=allow protocol=TCP localport=8000
netsh interface portproxy add v4tov4 listenaddress=192.168.2.2 listenport=8000 connectaddress=172.22.74.100 connectport=8000

To verify that we're able to successfully send a request to the WSL 2 instance from another network client/machine, we'll use the following in a terminal/cmd prompt

curl --location "http://192.168.2.2:8000/chat/completions" --header "Content-Type: application/json" --data "{\"model\": \"openhermes2.5-mistral\", \"messages\": [{\"role\": \"user\", \"content\": \"why is the sky blue?\"}]}"

Langchain

This implementation of Langchain -> Ollama is confirmed working with Langchain version 0.0.352. Results beyond version 0.0.352 of Langchain may vary. Confirmed version via pip3 show langchain

Name langchain
Version 0.0.352
Summary Building applications with LLMs through composability
Home-page https://github.com/langchain-ai/langchain
Author
Author-email
License MIT
Location /opt/LLM/jttw/jttw_venv/lib/python3.10/site-packages
Requires aiohttp, async-timeout, dataclasses-json, jsonpatch, langchain-community, langchain-core, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by langchain-experimental

Install Langchain

# Activate into our JTTW python environment
source /opt/LLM/jttw/jttw_venv/bin/activate
# Install Langchain python package and the additional packages Playwright, and Beautiful Soup for web scraping
pip3 install langchain
pip3 install langchain-cli
pip3 install langchain-experimental
pip3 install playwright
pip3 install beautifulsoup4
# Install Playwright
playwright install

Test Langchain

Now that we have Langchain in place we need to test to make sure it's working properly. We'll create a python script in our /opt/LLM/jttw/tests/ directory and name the file langchain_ollama_test_001.py. The contents of /opt/LLM/jttw/tests/langchain_ollama_test_001.py will be the following

from langchain.chat_models import ChatLiteLLM
from langchain.schema import HumanMessage

chat = ChatLiteLLM(
        api_base="http://0.0.0.0:11434",
        model="ollama/openhermes2.5-mistral",
)

messages = [
    HumanMessage(
        content="why is the sky blue?"
    )
]

response = chat(messages)

print(response)

Next we'll enter our python environment for JTTW and run the python test script. If everything is working properly we should see output from our LLM, served from Langchain to Ollama.

# Activate into our JTTW python environment
source /opt/LLM/jttw/jttw_venv/bin/activate
# Run our Langchain-to-Ollama test script
python3 /opt/LLM/jttw/tests/langchain_ollama_test_001.py

This test sent a prompt request asking "Why is the sky blue?" via Langchain to our Ollama server. Momentarily we should receive a response back explaining scientifically why the sky appears blue to us.



  • Sweep

  • Llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git /opt/LLM/jttw/components/llama.cpp
mkdir -p /opt/LLM/jttw/components/llama.cpp/llama.cpp_venv
cd /opt/LLM/jttw/components/llama.cpp/
python3 -m venv /opt/LLM/jttw/components/llama.cpp/llama.cpp_venv
source /opt/LLM/jttw/components/llama.cpp/llama.cpp_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge

pip3 install -r requirements.txt
  • TTS-WebUI

git clone https://github.com/rsxdalv/one-click-installers-tts.git /opt/LLM/jttw/components/tts-webui
mkdir -p /opt/LLM/jttw/components/tts-webui/tts-webui_venv
cd /opt/LLM/jttw/components/tts-webui/
python3 -m venv /opt/LLM/jttw/components/tts-webui/tts-webui_venv
source /opt/LLM/jttw/components/tts-webui/tts-webui_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge

chmod +x /opt/LLM/jttw/components/tts-webui/start_linux.sh
./start_linux.sh
  • Rope

git clone https://github.com/Hillobar/Rope /opt/LLM/jttw/components/rope
mkdir -p /opt/LLM/jttw/components/rope/rope_venv
cd /opt/LLM/jttw/components/rope/
python3 -m venv /opt/LLM/jttw/components/rope/rope_venv
source /opt/LLM/jttw/components/rope/rope_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge

pip3 install -r requirements.txt
  • DSPy

cd /opt/LLM/jttw
source /opt/LLM/jttw/jttw_venv/bin/activate
pip3 install git+https://github.com/stanfordnlp/dspy.git
python3 -m pip install --upgrade pip
pip3 cache purge
  • Neo4j

sudo add-apt-repository -y ppa:openjdk-r/ppa
sudo nala update
wget -O - https://debian.neo4j.com/neotechnology.gpg.key | sudo apt-key add -
echo 'deb https://debian.neo4j.com stable latest' | sudo tee -a /etc/apt/sources.list.d/neo4j.list
sudo nala update
sudo nala install neo4j
sudo systemctl start neo4j
  • WhisperX

git clone https://github.com/m-bain/whisperX /opt/LLM/jttw/components/whisperx
cd /opt/LLM/jttw/components/whisperx/
python3 -m venv /opt/LLM/jttw/components/whisperx/whisperx_venv
source /opt/LLM/jttw/components/whisperx/whisperx_venv/bin/activate
python3 -m pip install --upgrade pip
pip3 cache purge
pip3 install -e .

We'll have to log into Hugging Face and visit https://hf.co/pyannote/segmentation-3.0, https://huggingface.co/pyannote/voice-activity-detection, and https://huggingface.co/pyannote/speaker-diarization to accept the user conditions.

whisperx --model large-v2 --language en --vad_onset 0.10 --vad_offset 0.05 "<path to audio file>"