Plugin Context - SemlerPDX/OpenAI-VoiceAttack-Plugin GitHub Wiki

This is the complete list of all possible Plugin Context options for the OpenAI API Plugin for VoiceAttack, along with their descriptions, what they return and how, and any VoiceAttack variables used by that individual plugin call. Most are quite straightforward and self-explanatory, but two special context modifiers can be applied individually or simultaneously to ChatGPT context calls to the plugin: .Raw and .Session. The .Session modifier dictates a looping ChatGPT flow which continues until the OpenAI_Chatting# boolean variable becomes False. A Raw ChatGPT call (or Session) is a very trimmed-down version of the full-featured ChatGPT method used by this plugin for VoiceAttack. Refer to the example commands and functions in the provided OpenAI Sample Profile for VoiceAttack, which has notes from me (SemlerPDX) about all the various ways profile developers can make use of these systems, along with working voice commands for each system you can try for yourself. See complete details below.

Standard ChatGPT calls can access an integrated Whisper processing phase on the user input, which can transcribe or translate the dictation audio file into English text directly before sending the user input to ChatGPT. This feature adds a small 1-2 second delay to ChatGPT requests, which themselves can take 3-5 seconds to return, or longer depending on many factors including the length of the prompt, the complexity of the input and the response generated for it, or even heavy traffic on the OpenAI servers which provide this VoiceAttack Plugin access to their API. The tradeoff is very much worth it, which is why I designed this directly into ChatGPT context calls so users don't need to do their own pre-processing using possibly slower means (calling this plugin twice for the same command, for example). The Windows Speech Recognition Engine (which nearly all of us use for VoiceAttack) is not at all good at properly capturing what we said during Dictation, while the optional Whisper context modifier understands what we say with near-perfect accuracy. NOTE: Because this is a second call to the OpenAI API during a ChatGPT Request, which is itself a call to the OpenAI API, using this feature will add to the monetary cost of each Whisper.ChatGPT context call made from this OpenAI Plugin for VoiceAttack. That being said, because inputs from this VoiceAttack plugin are typically just a few sentences, this cost is quite negligible and, again, very much worth it!

Learn more about OpenAI API Call Pricing and Costs here: https://openai.com/pricing

See also OpenAI_Context

Profiles can use a variable in a {TXT} token to provide the Context for a call:
Using a Variable in Context Field


... or Context can be written directly into the Plugin Action in VoiceAttack:
Using Direct Text in Context Field



NOTE: All Plugin Context is case-insensitive! chatGpT.sEssion is the same as ChatGPT.Session


Context List



Key Form Menu


CONTEXT:

KeyForm


USAGE: Save or Delete the OpenAI API Key in a GUI Form Window

RETURNS:

None


DESCRIPTION:
This is a very basic GUI Menu I created for users to enter their OpenAI API Key and save it to file, or delete it from the file. For each Plugin Context listed below, the OpenAI Plugin for VoiceAttack will first attempt to read this keyfile; if the file is not present or does not contain a valid API key, it will attempt to read the OpenAI_API_Key variable, and if still no valid API Key has been set, the only Plugin Context call which will work is this one. Profile developers are more than encouraged to create their own means of allowing their users to set their API Key (if they have not already personally used this plugin and its included GUI Menu to set and save it).

The location of this keyfile will always be: %AppData%\OpenAI_VoiceAttack_Plugin\key.openai
Default keyfile format: OPENAI_API_KEY=sk-12345abcdefghijklmnopqrstuvwxyz
See also OpenAI_API_Org
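Given the keyfile location and format above, reading the key back out might be sketched like this in Python (a hypothetical illustration only; the plugin itself is a .NET assembly and does not use this code):

```python
import os

def read_openai_key(keyfile_text):
    """Parse an OpenAI API key from keyfile contents in the documented
    OPENAI_API_KEY=sk-... format; returns None if no valid key line is found."""
    for line in keyfile_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == "OPENAI_API_KEY" and value.strip().startswith("sk-"):
            return value.strip()
    return None

# The keyfile itself always lives at:
#   %AppData%\OpenAI_VoiceAttack_Plugin\key.openai
keyfile_path = os.path.join(
    os.environ.get("APPDATA", ""), "OpenAI_VoiceAttack_Plugin", "key.openai"
)
```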

VoiceAttack Variables used:


      |   (context list)   |   (table of contents)   |   (back to top)   |




Text Completion


CONTEXT:

Completion


USAGE: Send Text Completion Request to the OpenAI API

RETURNS:

OpenAI_Response value: Text Completion Response(s)


DESCRIPTION:
Send a text completion request to the OpenAI API using the supplied input, and receive a return containing the completed text, using the specified parameters (optional). The OpenAI_Model variable must contain one of the Completion models for OpenAI Text Completion Requests. The default DavinciText model will be used if this variable is Not set, or if a ChatGPT model is provided.

VoiceAttack Variables used:


      |   (context list)   |   (table of contents)   |   (back to top)   |




ChatGPT Request

| (view flow charts) |

```mermaid
graph LR;
    id1{{ChatGPT}}-->GetInput;
    GetInput-->Timeout;
    Timeout-->ChatComplete;
    GetInput-->UserInput;
    GetInput-->ExistingCommand;
    UserInput-->WhisperAudioToText;
    UserInput-->SendInput;
    WhisperAudioToText-->SendInput;
    SendInput-->GetResponse;
    SendInput-->ExistingCommand;
    GetResponse-->ChatComplete;
    GetResponse-->InternalResponders;
    GetResponse-->ExternalResponder;
    InternalResponders-->ChatComplete;
    InternalResponders-->ExternalResponder;
    ExternalResponder-->ChatComplete;
    ExistingCommand-->ChatComplete;
    ChatComplete-->id2[\PluginCallEnd/];
    ChatComplete-->LoopSession;
    LoopSession-->id1{{ChatGPT}};
    LoopSession-->|Timeout|WaitForProfile;
    WaitForProfile-->|''Hey VoiceAttack''|id1{{ChatGPT}};
```

CONTEXT:

ChatGPT

CONTEXT MODIFIERS ALLOWED:

ChatGPT.Session

Whisper.ChatGPT

Whisper.ChatGPT.Session

Whisper.Transcribe.ChatGPT

Whisper.Transcribe.ChatGPT.Session

Whisper.Translate.ChatGPT

Whisper.Translate.ChatGPT.Session


USAGE: Send ChatGPT Request(s) to the OpenAI API

RETURNS:

OpenAI_Response value: ChatGPT Response(s)

OpenAI_TTS_Response value: ChatGPT Response(s) post-processed [code blocks/URL's culled]

OpenAI_ResponseCode value: Code blocks culled from ChatGPT (TTS) Response(s)

OpenAI_ResponseLinks value: Hyperlinks culled from ChatGPT (TTS) Response(s), ";" (semicolon) delimited


DESCRIPTION:
Send a ChatGPT request to the OpenAI API and get a response tailored by the System Prompt (and input/output examples, if any) for the input prompt which was sent. When Whisper. is used with no modifier of type, it is the same as using a Whisper.Transcribe. context modifier. The primary design of the ChatGPT context is for vocal conversations with OpenAI's ChatGPT system, where responses are spoken using text-to-speech. This context will use a very specialized method that allows for several modifications to the manner in which it functions, from the way it gathers input to the way it presents responses, giving full control to profile builders to customize the flow and actions at each phase of a single or looping session ChatGPT Request. With no additional context modifiers, ChatGPT context will send a single chat request to the OpenAI API using the supplied input (if provided), using the specified parameters (optional). Valid responses will execute a Text-to-Speech command in VoiceAttack to speak the response if allowed.
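The post-processing behind the return values above (code blocks and URLs culled from the spoken response) can be sketched roughly as follows. This is a hypothetical Python illustration of the described behavior, not the plugin's actual C# implementation, and the regular expressions are assumptions:

```python
import re

def postprocess_response(response):
    """Split a ChatGPT response into TTS-safe text, code blocks, and links,
    mirroring the documented OpenAI_TTS_Response / OpenAI_ResponseCode /
    OpenAI_ResponseLinks returns."""
    # Cull fenced code blocks (```...```) from the spoken text
    code_blocks = re.findall(r"```.*?```", response, flags=re.DOTALL)
    tts = re.sub(r"```.*?```", "", response, flags=re.DOTALL)

    # Cull bare URLs from the spoken text
    links = re.findall(r"https?://\S+", tts)
    tts = re.sub(r"https?://\S+", "", tts)

    return {
        "OpenAI_TTS_Response": " ".join(tts.split()),
        "OpenAI_ResponseCode": "\n".join(code_blocks),
        "OpenAI_ResponseLinks": ";".join(links),  # ";" (semicolon) delimited
    }
```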

The plugin will begin the default or custom Listen and/or Dictation Stop commands if no input is provided. The default OpenAI_SystemPrompt, when Not set or empty, is preset as a specialized "ChatGPT for Text-to-Speech" system; see this text variable's description for more details, including the default system prompt contents.

When the .Session modifier is used, the call will loop back around to the GetInput phase after a response has been processed (by default, spoken with text-to-speech). Options for printing to the VoiceAttack Event Log, speaking or not speaking responses, executing an External Responder command, and even an External 'wait to continue' command are all available for profile developers. When a Whisper modifier is applied, with either the Transcribe. or Translate. option set, the default Dictation Stop method in the GetInput phase will save the short audio file of the last dictation input and then send this file to the OpenAI Whisper API for transcription into English text, which will then be used as the text input prompt sent to the ChatGPT request. This pre-process is quite fast compared to the processing of the input prompt by ChatGPT, with Whisper returning short transcriptions of a sentence or few within 2 seconds, as compared to the typically 3-5 second return of a ChatGPT response to an input prompt (if not longer).
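The single-call versus looping .Session flow described above can be sketched as a simple loop. This Python pseudocode is only an illustration of the control flow, not the plugin's implementation; the get_input, send_to_chatgpt, and speak callables are hypothetical stand-ins, and the dict stands in for VoiceAttack's variable store:

```python
def chatgpt_call(session, variables, get_input, send_to_chatgpt, speak):
    """Illustrative control flow for ChatGPT vs ChatGPT.Session context.

    A looping .Session continues until the OpenAI_Chatting# boolean
    variable becomes False, as documented in the wiki."""
    variables["OpenAI_Chatting#"] = True
    while True:
        # GetInput phase: use supplied input, else listen for dictation
        user_input = variables.get("OpenAI_UserInput") or get_input()
        response = send_to_chatgpt(user_input)   # the actual API request
        variables["OpenAI_Response"] = response
        speak(response)                          # default: text-to-speech
        variables["OpenAI_UserInput"] = None     # next pass gathers fresh input
        if not session or not variables.get("OpenAI_Chatting#"):
            break                                # single call, or session ended
    variables["OpenAI_Chatting#"] = False
```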

A note regarding System Prompts and Input/Output Examples: The larger the system prompt and the more example input/output pairs included, the more data that needs to be processed by the OpenAI API for each request. This can potentially increase the per-call cost and response time of each request, depending on the pricing model used by the OpenAI API and the specific task being performed. It's important to be mindful of the size of the system prompt and the number of example input/output pairs included, and to balance the need for fine-tuning with the cost of each API call. In general, it's a good idea to start with a relatively small system prompt and a few example inputs/outputs, and gradually increase the size and complexity as needed to achieve the desired performance. This can help keep the per-call cost of the API within reasonable limits while still achieving good results. See notes in the descriptions of these input/output example variables for more details.


VoiceAttack Variables used:


      |   (context list)   |   (table of contents)   |   (back to top)   |




Raw ChatGPT Request

| (view flow charts) |

```mermaid
graph LR;
    id1{{ChatGPT.Raw}}-->HasInput;
    HasInput-->SendInput;
    SendInput-->EmptyInput;
    EmptyInput-->Responded;
    SendInput-->GetResponse;
    GetResponse-->ExternalResponder;
    GetResponse-->Responded;
    ExternalResponder-->Responded;
    Responded-->ChatComplete;
    Responded-->WaitForProfile;
    WaitForProfile-->ChatComplete;
    ChatComplete-->PluginCallEnd;
    ChatComplete-->LoopSession;
    LoopSession-->HasInput;
```

CONTEXT:

ChatGPT.Raw

CONTEXT MODIFIERS ALLOWED:

ChatGPT.Raw.Session


USAGE: Send ChatGPT Data Request(s) to the OpenAI API

RETURNS:

OpenAI_Response value: ChatGPT Response(s)


DESCRIPTION:
Send a ChatGPT request to the OpenAI API and get a response tailored by the System Prompt (and input/output examples, if any) for the input prompt which was sent. The primary design of the ChatGPT.Raw context is for data input processing with OpenAI's ChatGPT system, where responses are never directly spoken using text-to-speech in VoiceAttack, and which can operate at the same time as a standard ChatGPT.Session (with any modifier but .Raw).

A Raw ChatGPT call (or Session) is a very trimmed-down version of the full-featured ChatGPT method used by this plugin for VoiceAttack. While it does allow the OpenAI_ExternalContinue and OpenAI_ExternalResponder flow controls, 'wait to continue' will not be available outside a .Session context and cannot use the OpenAI_ExternalContinue_Unstoppable option. There is no OpenAI_LogChat or OpenAI_SpeakChat after a return, it has no pre-listen or pre-process feedback options, and it only sets the OpenAI_Response text variable to the raw response from OpenAI ChatGPT.

The purpose and design of this method is for concurrent ChatGPT requests using a possibly different OpenAI_SystemPrompt.
A Session cannot be established in a Raw ChatGPT call if one is already enabled anywhere (another Raw Session or standard ChatGPT Session), and a non-Session Raw ChatGPT call will bypass OpenAI_ExternalContinue even when True. This exists so that profiles can use ChatGPT to process data or gather responses to input on the side during a simultaneous ChatGPT session with completely different call parameters, including the OpenAI_SystemPrompt.

A note regarding System Prompts and Input/Output Examples: the same considerations described in the ChatGPT Request section above apply here; larger system prompts and more example input/output pairs increase the data processed by the OpenAI API, and potentially the cost and response time, of each request.


VoiceAttack Variables used:



      |   (context list)   |   (table of contents)   |   (back to top)   |




Whisper Audio Processing


CONTEXT:

Whisper

CONTEXT MODIFIERS ALLOWED:

Whisper.Transcribe

Whisper.Translate


USAGE: Send Audio File as a Whisper Request to the OpenAI API

RETURNS:

OpenAI_Response value: The translated or transcribed text in English


DESCRIPTION:
Send an audio Transcription or Translation request to the OpenAI Whisper API, which turns English and non-English audio into English text. When no modifier is used, it is the same as using a Whisper.Transcribe context. The Windows Speech Recognition Engine (which nearly all of us use for VoiceAttack) is not at all good at properly capturing what we said during Dictation, while the Whisper system understands what we say with near-perfect accuracy. Whisper processing is quite fast as it applies to captured dictation phrases in VoiceAttack, though naturally it depends on the size of the file and the amount of speech to process into text. First, set the OpenAI_AudioFile text variable to the path of the audio file you wish to use for this Whisper call, set the context as desired, and execute the plugin.

The return will be swift, and can be detected by waiting for the plugin to complete or the value of the OpenAI_Response text variable to become anything other than Not set. You may also use the OpenAI_AudioPath text variable instead, which will not be cleared (Not set) by the plugin at the end of any/all plugin calls, if you need to retain this path for further post processing following a Whisper call - but be sure to clear its value to Not set when you are done with it, or any/all future plugin calls which use Whisper context (or context modifiers) will only read this file!
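The interplay between OpenAI_AudioFile and OpenAI_AudioPath described above can be sketched as follows. This Python illustration of the variable precedence is an assumption drawn purely from this description, not the plugin's actual code:

```python
def resolve_whisper_audio(variables):
    """Pick the audio file for a Whisper call per the documented behavior:
    a set OpenAI_AudioPath persists across calls and is always read first,
    while OpenAI_AudioFile is cleared (Not set) at the end of every call."""
    # While OpenAI_AudioPath remains set, Whisper calls will only read it
    path = variables.get("OpenAI_AudioPath") or variables.get("OpenAI_AudioFile")
    # OpenAI_AudioFile is cleared to Not set at the end of any/all plugin calls
    variables["OpenAI_AudioFile"] = None
    return path
```

This is why the wiki warns to clear OpenAI_AudioPath to Not set when done with it: unlike OpenAI_AudioFile, it is never cleared for you.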

VoiceAttack Variables used:


      |   (context list)   |   (table of contents)   |   (back to top)   |




Dall-E Image Request


CONTEXT:

DallE

CONTEXT MODIFIERS ALLOWED:

DallE.Generation

DallE.Editing

DallE.Editing.Bytes

DallE.Variation

DallE.Variation.Bytes


USAGE: Send Image Request to the OpenAI Dall-E API

RETURNS:

OpenAI_Response value: URL hyperlinks to Images, ";" (semicolon) delimited


DESCRIPTION:
Send an image request to the OpenAI Dall-E API, which uses artificial intelligence to generate, variate, and edit images. When no modifier is used, it is the same as using a DallE.Generation context. You must provide a user input prompt value contained in the OpenAI_UserInput text variable prior to executing a DallE.Generation or DallE.Editing plugin call. Using the .Bytes context modifier is an Upload (Bytes) option which can reduce the overhead of reading an image file from disk and transferring it over the internet, since it is directly uploading the binary data of the image. This can potentially result in faster response times and lower data transfer costs on average. Responses will always be in URL form, separated by ";" (semicolon), making it very easy to present these choices in a VoiceAttack 'Get Choice' action, which creates its lists of choices from values separated by ";" (semicolon).

The provided OpenAI Plugin Sample Profile contains working examples with extensive comments describing options for how profile developers could provide input to (and process responses from) Dall-E requests. Each request for an image will return at least one URL to that image; set the OpenAI_ImageSize text variable to the dimensions each returned image should have, and the OpenAI_ImageCount text variable to the number of image URLs to return. At time of writing, each individual image costs between $0.016 and $0.02 (USD), making this one of the most expensive features of OpenAI and this Plugin for VoiceAttack. Increasing OpenAI_ImageCount and using the maximum OpenAI_ImageSize may also increase the time to process a return. Refer to OpenAI Documentation on how to make use of Image Masks for Editing, and be sure to monitor your own usage of this feature. Since it can cost up to two cents per image, with a maximum of 10 images returned per call, real money could be wasted if not used wisely and within our individual means and budgets.
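Since responses are ";" (semicolon) delimited URLs and each image can cost up to $0.02, splitting a return and estimating a worst-case cost is trivial. A quick Python sketch using the figures above (the helper names and rates are illustrative, not part of the plugin):

```python
def parse_image_urls(response):
    """Split the ";"-delimited OpenAI_Response into individual image URLs,
    the same shape a VoiceAttack 'Get Choice' action expects."""
    return [url for url in response.split(";") if url.strip()]

def worst_case_cost(image_count, per_image_usd=0.02):
    """Estimate the maximum cost of one Dall-E call at the at-time-of-writing
    top rate of $0.02 per image (up to 10 images per call)."""
    return round(image_count * per_image_usd, 4)
```

For example, a maximum request of 10 images at the top rate would cost up to $0.20 per call.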

NOTE: At time of writing, the OpenAI documentation specifies that "The Images API is in beta. During this time the API and models will evolve based on your feedback. To ensure all users can prototype comfortably, the default rate limit is 50 images per minute."


Learn more about Pricing and Costs here: https://openai.com/pricing
Learn more about Rate Limits here: https://platform.openai.com/docs/guides/rate-limits
Learn more about the Images API here: https://platform.openai.com/docs/guides/images
Learn more about improving your Dall-E Image Prompts here:
https://help.openai.com/en/articles/6582391-how-can-i-improve-my-prompts-with-dall-e

VoiceAttack Variables used:


      |   (context list)   |   (table of contents)   |   (back to top)   |




Moderation Request


CONTEXT:

Moderation

CONTEXT MODIFIERS ALLOWED:

Moderation.Check

Moderation.Explain


USAGE: Send Text Input as a Moderation Request to the OpenAI API

RETURNS:

OpenAI_Response value: The original unmodified input, a flagged text-to-speech message, or flagged categories

OpenAI_ContentFlagged value: True (if flagged), or False


DESCRIPTION:
Send an input content flag request to the OpenAI Moderation API, which evaluates text input for inappropriate content. When no modifier is used, it is the same as using a Moderation.Check context. The Moderation context provides a means to check whether a body of text would be flagged by other OpenAI API systems which require an input prompt, and its categories align fairly well with general modern sensibilities and civility. The OpenAI_ContentFlagged boolean variable will become True following a Moderation request if the content has been flagged; if not, it will be False and the OpenAI_Response text variable will contain the original OpenAI_UserInput which was sent to the Moderation request.

When the .Explain context modifier is used, and content has been flagged, the value of the OpenAI_Response text variable will contain the categories which apply to the reasons the content was flagged, formatted for text-to-speech with "," (commas) and the word "and" before the final item (when more than one category applies). This is structured to be used at the end of a sentence which may begin such as, "The content was flagged for ", where the value of the return format is similar to "reason1, reason2, and reason3". When the .Check context modifier is used, and content has been flagged, the value of the OpenAI_Response text variable will contain the phrase "The input provided has been flagged as inappropriate." unless a custom phrase has been set to the OpenAI_TTS_ContentFlagged text-to-speech variable. The plugin sets the OpenAI_ContentFlagged boolean variable to False at the beginning of any Moderation context plugin call; it never reads this variable or clears it to Not set.
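The comma-and-"and" formatting of flagged categories described above can be sketched like so (an illustrative Python helper matching the documented return shape, not the plugin's own code):

```python
def format_categories(categories):
    """Join flagged category names for text-to-speech, e.g. to complete a
    sentence like 'The content was flagged for ...' in the documented
    'reason1, reason2, and reason3' style."""
    if not categories:
        return ""
    if len(categories) == 1:
        return categories[0]
    # Commas between items, with "and" before the final item
    return ", ".join(categories[:-1]) + ", and " + categories[-1]
```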



VoiceAttack Variables used:


      |   (context list)   |   (table of contents)   |   (back to top)   |




File Request


CONTEXT:

File

CONTEXT MODIFIERS ALLOWED:

File.List

File.Upload

File.Delete


USAGE: Send File Request to the OpenAI API

RETURNS:

OpenAI_Response value: The deleted file name, uploaded file ID, or list of file names

OpenAI_TTS_Response value: A success or failure text-to-speech message


DESCRIPTION:
Send a file request to the OpenAI Files API. The functions are as plain as the context modifiers. When no modifier is used, it is the same as using a File.List context. Refer to OpenAI Documentation regarding the use of files, file structure, format, and purpose. The OpenAI Plugin Sample Profile for VoiceAttack includes working demonstration commands and a sample fine-tuning file in the .JSON format for the purpose of testing upload, list, and delete commands. Uploaded files can be used across various endpoints/features. Currently, the total size of all files uploaded by one organization can be up to 1 GB, though users can contact OpenAI to request an increase to their storage limit. Similar to the Embedding context below, this is an advanced feature of the OpenAI API, and will require education and testing by users of this plugin or profile developers for VoiceAttack who wish to use this feature.

The .List context will return a ";" (semicolon) separated list of file names; the return will be empty if no files exist, in which case the OpenAI_TTS_Response will contain a text-to-speech message stating such. This command should be used as the means to acquire file name(s) to provide to a VoiceAttack command or function which would subsequently make a .Delete context plugin call, as the OpenAI Files API expects file names exactly as it stores them. If profile developers would like to create systems that also store the list of names locally for users, this could be kept in sync with the files on their OpenAI API account and eliminate the need to call a .List context plugin call first. These returns are quite fast, and unless the files to be processed are very large, the costs are also quite small. That said, the true use of these files may involve very large files for the purpose of fine-tuning an OpenAI model which allows it, so again, refer to OpenAI Documentation and Pricing information regarding this advanced feature. The complete details of the OpenAI Files API are beyond the scope of this guide, and this plugin merely provides a working interface VoiceAttack profile users and developers can access as needed. The plugin always clears the OpenAI_FilePath text variable to Not set at the end of every plugin call regardless of the context.

Learn more about Files here: https://platform.openai.com/docs/guides/fine-tuning



VoiceAttack Variables used:


      |   (context list)   |   (table of contents)   |   (back to top)   |




Embedding Request


CONTEXT:

Embedding


USAGE: Send Embedding Request to the OpenAI API

RETURNS:

OpenAI_EmbeddingResponse value: A "\r\n" (newline) and "; " (semicolon+space) separated string of returned metadata.


DESCRIPTION:
Send inputs as Embedding requests to the OpenAI API. Just like the File context above, this too is an advanced feature of the OpenAI API, and will require education and testing by users of this plugin or profile developers for VoiceAttack who wish to use this feature. The metadata returned will contain a "\r\n" (newline) and "; " (semicolon+space) separated list, or the return will be empty if an error occurs, in which case the OpenAI_Error will also have become set to True by the plugin. The entire topic of OpenAI API Embeddings is far beyond the scope of this guide, and again, like the Files context above, this plugin merely provides a working interface VoiceAttack profile users and developers can access as needed. The plugin never clears the OpenAI_EmbeddingInput text variable though it can be overwritten in a new call, and the plugin will always empty the value of the OpenAI_EmbeddingResponse text variable at the beginning of every Embedding context plugin call.
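Given the documented "\r\n" (newline) and "; " (semicolon+space) separated format of OpenAI_EmbeddingResponse, splitting it back apart might look like this (a hypothetical Python sketch; the exact field contents depend on what the API returns):

```python
def parse_embedding_response(response):
    """Split an OpenAI_EmbeddingResponse-style string into rows (split on
    "\r\n") of fields (split on "; "), skipping any blank lines."""
    return [
        row.split("; ")
        for row in response.split("\r\n")
        if row.strip()
    ]
```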

As best as ol' SemlerPDX here understands it so far, Embeddings are a way to represent text as a set of numbers that capture the meaning and context of the text, allowing for comparisons between different texts. They are generated using a deep learning model, where similar texts will have similar embeddings. Essentially, embeddings allow for the comparison of the similarity between two different texts, even if they are written differently.

So how might embeddings be applicable in a real-world example like VoiceAttack? One potential use case could be to create a database of all the possible commands that users can speak in their VoiceAttack profile, storing the inputs and corresponding responses along with their embeddings. When a new input is received that is not recognized by the system, the embeddings for that input are calculated on the fly, then compared to the stored embeddings in the database using cosine similarity. The most similar interactions would then be returned for reference, allowing a properly designed system to recognize and process unrecognized commands more accurately and quickly. Hypothetically, such a system could then execute the poorly recognized command (by name), hopefully faster than forcing users to repeat themselves. Just pulling that out of the clouds, but I hope the analogy is sound.
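The cosine similarity comparison mentioned above takes only a few lines of Python (a minimal sketch with standard-library math, not a production similarity search):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors: their dot product
    divided by the product of their magnitudes. 1.0 means the vectors point
    the same direction (very similar texts); 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```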

NOTE: A response will contain mostly metadata which you will be responsible for parsing.

Learn more about Embeddings here: https://platform.openai.com/docs/guides/embeddings/what-are-embeddings
Learn more about Embeddings Limitations and Risks here: https://platform.openai.com/docs/guides/embeddings/limitations-risks



VoiceAttack Variables used:



      |   (context list)   |   (table of contents)   |   (back to top)   |


