Using Generative AI in Unity

In this section, we will create a Unity script that connects to our local AI server and sends it questions.


We should have Unity installed. Installation instructions can be found here.

For our AI, we will be using the open source llama.cpp project alongside the large language model llama-2-7B-Chat from TheBloke. We also have to make sure that the LLM we downloaded is a CHAT model, or else the AI won't respond correctly to our prompt.

Installing and hosting llama-server

Downloading llama.cpp

To install llama.cpp, we can either build the program ourselves by following the instructions on the project page, or we can go to the project's releases page and download the pre-built zip file that matches our system's architecture. Once we have downloaded the zip file and extracted it, we should have a number of executable files inside our folder.

We need to look for an executable called llama-server, as that is the program we will be using. The wiki page of llama.cpp has a llama-server section that goes into extensive detail on how to run the program. For this project, we just need to know enough to host the server with our llama-2 LLM.

Hosting llama-server

Note: For this tutorial, I will be using the llama-2-7b-chat.Q4_K_M.gguf model, but you are free to choose any of its variants. If you are using another GGUF file, make sure to replace the name llama-2-7b-chat.Q4_K_M.gguf in the commands below with the name of your file.

Assuming that we have already downloaded the GGUF file from TheBloke, we place the file in the same folder where llama-server resides. We then need to run the following command:


On Linux or macOS:

./llama-server -m llama-2-7b-chat.Q4_K_M.gguf

On Windows:

.\llama-server.exe -m llama-2-7b-chat.Q4_K_M.gguf
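A slightly more defensive way to launch the server can be sketched as a small shell function. This is only a sketch under stated assumptions: `start_server` is a hypothetical helper name, and the `--host` and `--port` flags are spelled out with the address the rest of this page relies on (localhost, port 8080), which I believe are also llama-server's defaults.

```shell
# Hypothetical helper: start_server MODEL [PORT]
# Checks that the GGUF file exists before launching llama-server.
start_server() {
    model="$1"
    port="${2:-8080}"   # assumption: 8080, the port used elsewhere on this page
    if [ ! -f "$model" ]; then
        echo "Model file not found: $model" >&2
        return 1
    fi
    # Assumes llama-server sits in the current folder, as in the commands above.
    ./llama-server -m "$model" --host 127.0.0.1 --port "$port"
}
```

Calling `start_server llama-2-7b-chat.Q4_K_M.gguf` then behaves like the Linux command above, but fails early with a readable message if the model file name is mistyped.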

If we have done everything correctly, we will get an output on our terminal that is similar to the image below.

With the server running, we can directly access its web interface by going to the URL


in our browser.

By scrolling to the bottom, we can see a prompt field where we can type our question to the AI.

After typing our question, we will hit "send" and the AI will generate a reply for us. Depending on our question, we might have to keep chatting with the AI until it can give us the complete answer.

Connecting our Llama-server to Unity

To connect Unity to llama-server, we will use Unity's HTTP request API (UnityWebRequest). The llama-server documentation has an example that uses the curl command to send a prompt to the AI. We will adapt this format in our Unity code:

curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'
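Before adapting this request, it can help to confirm that the server is reachable at all. The following is a minimal sketch, assuming llama-server's `/health` endpoint and the same `http://localhost:8080` address as the curl command above; `server_ready` is a hypothetical helper name.

```shell
# Hypothetical helper: server_ready [BASE_URL]
# Returns 0 if the server answers /health with a success status,
# and non-zero if it is unreachable or still loading the model.
server_ready() {
    curl --silent --fail --max-time 2 "${1:-http://localhost:8080}/health" >/dev/null
}

# Example use:
#   server_ready && echo "llama-server is up"
```

If this check fails, the Unity request will fail too, so it is a quick way to separate server problems from script problems.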

In Unity, we will create a new script and attach it to our camera object. We will follow Unity's documentation on sending POST requests, together with the format of the curl request above.

One last thing to consider: llama-2 has a prompt template that we need to follow. In the template, {question} stands for the question we want to ask, as in the Unity code further down.

[INST] <<SYS>>
You are a helpful assistant. Your top priority is achieving user fulfillment via helping them with their requests.
<</SYS>>
{question} [/INST]

This is very important because the AI will start giving us incoherent responses if we do not follow this format.
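To try the template from the command line before moving to Unity, the prompt and request body can be assembled with two small shell functions. This is a sketch, not the project's own tooling: `build_prompt` and `build_payload` are hypothetical names, the template is written on a single line in the same way the Unity script does it, and the `n_predict` default of 128 is taken from the curl example.

```shell
# Hypothetical helper: build_prompt SYSTEM QUESTION
# Wraps a system message and a question in the llama-2 chat template.
build_prompt() {
    printf '[INST] <<SYS>> %s <</SYS>> %s [/INST]' "$1" "$2"
}

# Hypothetical helper: build_payload SYSTEM QUESTION [N_PREDICT]
# Builds the JSON body for the /completion endpoint.
# Assumption: the inputs contain no double quotes or backslashes,
# because this sketch does no JSON escaping.
build_payload() {
    printf '{"prompt": "%s", "n_predict": %d}' "$(build_prompt "$1" "$2")" "${3:-128}"
}

# Example use, with the server running:
#   curl --request POST \
#       --url http://localhost:8080/completion \
#       --header "Content-Type: application/json" \
#       --data "$(build_payload 'You are a helpful assistant.' 'What is 3 + 4*(7-2)?')"
```

Keeping the template in one function means the system message can be changed without touching the request itself, which mirrors how the Unity script separates the two strings.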

Unity code:

using System;
using System.Collections;
using System.Diagnostics;
using UnityEngine;
using UnityEngine.Networking;

public class NewBehaviourScript : MonoBehaviour
{
    // Start is called before the first frame update
    void Start()
    {
        // Run the request as a coroutine so the main thread is not blocked.
        StartCoroutine(Upload());
    }

    IEnumerator Upload()
    {
        String question = "Can you tell me what the solution of 3 + 4*(7-2) is?"; // Put your question here
        String system = "You are a helpful assistant. Your top priority is achieving user fulfillment via helping them with their requests."; // Change the AI's behavior here if you need it to act differently
        // JSON field with the llama-2 template. The strings are not JSON-escaped, so avoid double quotes and backslashes in them.
        String prompt = $" \"prompt\": \"[INST] <<SYS>> {system} <</SYS>> {question} [/INST]\"  ";
        Stopwatch st = Stopwatch.StartNew(); // Time the round trip
        using (UnityWebRequest www = UnityWebRequest.Post("http://localhost:8080/completion", $"{{ {prompt} }}", "application/json"))
        {
            yield return www.SendWebRequest();

            if (www.result != UnityWebRequest.Result.Success)
            {
                UnityEngine.Debug.LogError(www.error);
            }
            else
            {
                UnityEngine.Debug.Log("Form upload complete!");
                PrintOutput(www.downloadHandler.text);
            }
        }
        st.Stop();
        UnityEngine.Debug.Log("This prompt took " + st.ElapsedMilliseconds / 1000.0 + " seconds to complete.");
    }

    // Holds the "content" field of the server's JSON reply.
    [Serializable]
    public class MyOutput
    {
        public string content;
        public static MyOutput CreateFromJSON(String jsonString)
        {
            return JsonUtility.FromJson<MyOutput>(jsonString);
        }
    }

    public void PrintOutput(String jsonString)
    {
        MyOutput test = MyOutput.CreateFromJSON(jsonString);
        string outString = test.content;
        UnityEngine.Debug.Log(outString);
    }
}